This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page. |
Archive 1 | Archive 2 | Archive 3 | Archive 4 |
Hi, I am trying to work out how useful categories are 1. I was trying to find out a count by month of # of new categories
[ [1]]
but I "Can't connect to MySQL server on 'wiki.analytics.db.svc.wikimedia.cloud' ([Errno -2] Name or service not known)"
2. Is it possible to find out how may times a reader clicks on a category on a page?
I asked a similar question to the help desk asking if there were reports Wakelamp d[@-@]b ( talk) 04:25, 20 August 2022 (UTC)
Could someone run a query on requests for adminship by number of participants (support/neutral/oppose) please? Information from it may be featured in an article in The Signpost. –– FormalDude (talk) 21:38, 25 September 2022 (UTC)
I haven't found this anywhere, but perhaps it could be answered with a fairly simple query? I'd like to know the number of inline citations (using <ref>
tags) in the median article. Ideally, this would exclude disambiguation pages and other non-articles, but indisputable evidence is not required. My guess is that it's more than one, but probably not more than five.
WhatamIdoing (
talk) 22:56, 3 October 2022 (UTC)
The pagetriage_page_tags table has around 19 rows per ptrpt_page_id. Each row corresponds to a unique page tag (ptrpt_tag_id). The lookup is stored in the table pagetriage_tags, which maps ptrt_tag_id to ptrt_tag_name, such as 'copyright', 'user_name', etc.
I'm trying to find the page ids for any pages in pagetriage_page_tags that only have 1 entry instead of 19.
The way I chose to do this was with a query that matched the bug report. The bug report had only a copyright tag, and no other tags, so I picked a random tag (user_name) to check in a subquery to see if it also existed.
Rough draft query here but I messed something up. Sample data with 3 pages x 19 tags. Help appreciated :) – Novem Linguae ( talk) 03:39, 21 October 2022 (UTC)
Category:Members of the Fifteen Year Society of Wikipedia editors has 562 members. It it possible to extract a list of those who are still active (those who have made an edit in the last 12 months)? Kudpung กุดผึ้ง ( talk) 00:21, 19 October 2022 (UTC)
I'd like to know, if possible, the proportion of biographies where the subject is:
Thanks. Thryduulf ( talk) 10:28, 12 October 2022 (UTC)
Looking to find out how often some profanity, curse words, are said in revision history. I have done the same thing manually for article, article talk, user, user talk. I suspect it is much higher on revision history
Wakelamp d[@-@]b ( talk) 15:15, 1 November 2022 (UTC)
fuck off
, cunt
, and fuckwit
for just one day on the revision history for article?
Wakelamp d[@-@]b (
talk) 13:51, 4 November 2022 (UTC)
This would make finding and fixing these container categories much easier. DemonStalker ( talk) 01:58, 23 November 2022 (UTC)
How can I get a list of all the individuals currently alive in the English Wikipedia over the age of say 95?
Thank You Dleigthful ( talk) 06:18, 21 November 2022 (UTC)
Hi Cryptic
For a meeting with the WMF this week we need
Sample period would need to be at least over the last 6 months to make any sense.
Is this something you can do for us? Thanks in advance. Kudpung กุดผึ้ง ( talk) 06:45, 6 December 2022 (UTC)
Would it be possible to construct a query that returns all articles that have two or more sections with the same name? I just merged one such instance at Concord, Massachusetts (which had two "Transportation" sections) and I suspect there may be many others. {{u| Sdkb}} talk 00:08, 9 December 2022 (UTC)
Hi! Could someone please tweak this query to only list existing talk pages? Thanks! 🐶 EpicPupper (he/him | talk) 02:58, 12 December 2022 (UTC)
Hello! I'd appreciate it if a query could be created to list all fully-protected templates and their number of transclusions. Additionally, it'd be great if the least-transcluded template, most-transcluded template, and median and average number of transclusions could also be calculated some way. Much thanks! 🐶 EpicPupper (he/him | talk) 04:42, 17 December 2022 (UTC)
Could we please have these for a meeting with the WMF next week:
Sample period would need to be at least over the last 6 months to make any sense. Thanks. Kudpung กุดผึ้ง ( talk) 18:46, 4 January 2023 (UTC)
I tried fiddling around in PetScan, and couldn't see a way to do this: I want to see all articles without short descriptions, sorted by pageviews (high to low). — PerfectSoundWhatever ( t; c) 02:19, 28 December 2022 (UTC)
Hi there, I'd really appreciate it if a query for the number of registered users (movement-wide) that aren't globally locked could be created. Thanks! EpicPupper ( talk) 01:59, 31 January 2023 (UTC)
Hello! Would it be possible to get a quarry on new pages created as a subpage of Wikipedia talk:Articles for creation, organized by date of creation? Thank you so much! House Blaster talk 21:37, 19 February 2023 (UTC)
Hi. I'm looking for:
I suspect to see a shift over the years from deleting articles to deleting drafts. I'm very curious to see how big the shift has been. If results could be broken down by month that would be even better. Thanks in advance! Clayoquot ( talk | contribs) 20:02, 21 February 2023 (UTC)
See quarry:query/69410. There are about 98 pages Category:Redirects_for_discussion_with_talk_page_redirects that I want excluded from my query results. It's easy to qualify by limiting to only pages populating a category, but seems not so much for the reverse. Many pages populating that category are populating at least one other category as well, and if they are they are still listed in the report when I don't want them there. wbm1058 ( talk) 21:10, 28 February 2023 (UTC)
-- your regular query goes here
AND p.page_title NOT IN (
-- the quarry:query/69410 query, without the concat part
)
Is it possible to access this table? The presence of example queries suggests it is, but it doesn't appear to exist on enwiki_p. BilledMammal ( talk) 20:29, 6 March 2023 (UTC)
This table is populated when edits are saved on pages, unless $wgDisableSearchUpdate is set to true, or an extension that implements its own search index (like CirrusSearch extension) is installed.We have CirrusSearch, so sadly the table may not be available. Certes ( talk) 20:34, 6 March 2023 (UTC)
At the Tea house, a user has asked for "the most used red links (ignoring templates and similar)". There seems to be an ambiguity here, where one reading is, 1. "unique red links with the most user clicks", and the other is, 2. "unique red links which appear a) in the greatest number overall", or, b) "...in the greatest number of articles" (i.e., ignoring dupes in the same article, which if they don't overlink, should be similar to 2a). I'm assuming they meant 2a. Can someone help with this request? Thanks, Mathglot ( talk) 08:08, 3 March 2023 (UTC)
{{
Asian topic||art}}
and similar, but {{
Asian topic}} itself doesn't link there (it's not about art) and so that link can't be detected as template-made. Someone would have to search a wikitext dump for [[Abkhaz art]] and variants, and repeat that search for thousands of other redlinks. Another complication is that we may want to count links via some templates: if someone deliberately links to {{
stn|Narnia}}
then that makes
Narnia station a wanted article.
Certes (
talk) 19:23, 6 March 2023 (UTC)
I revoked autopatrolled the other day from someone who had been creating (among other things) unsourced BLPs, and that left me wondering whether it was just an isolated incident or if there was a broader problem. Would it be possible to get a list of articles that 1) were created by someone with the autopatrol right and 2) were deleted with a log summary mentioning "BLPPROD" 3) within the last year? I'd certainly appreciate it. Thanks in advance! Extraordinary Writ ( talk) 20:53, 16 March 2023 (UTC)
I've been working underground, cleaning up a " trainwreck", which lead me to find pages like Buffalo Boys (disambiguation) and Chris Breen (disambiguation) transcluding {{ R to disambiguation page}} and thus populating Category:Redirects to disambiguation pages. But these aren't redirects to disambiguation, they are redirects to articles, and there is no active mechanism flagging them as errors needing correction, so I don't know how many other pages like these are lurking on Wikipedia.
I've been aware of this issue since at least April 2017, when I put it on my to-do list:
We're now up over 254,000 such redirects. I guess solving the issue hasn't gotten any easier over the past six years.
My first thought was to try to track the problem in real-time using template magic. I could modify {{ R to disambiguation page}} to examine the {{ target of}} the page transcluding the template to confirm that it populates Category:All disambiguation pages, and report all redirect targets that don't populate that category as errors. I found {{ if in category}} which might theoretically do that, if it didn't only detect categories directly on a page, not those added indirectly through templates. Looking at how that "very expensive" template works using {{ find page text}} I suppose it might be possible to search for the templates indirectly populating the category, including all their redirect-variant names. Ick.
So my second thought was to create a database query to find them. This query pulls up the first 200 pages in the redirect table, but as soon as I add:
it finds nothing. Is there a way to make this work? Or is it not possible because such a query is too inefficient and resource-intensive?
If a database query can't be made to work, I guess my third thought is that I need to write a PHP bot that likely will take a few hours to manually walk through all 254,000+ redirects and check them one-at-a-time and generate a list of the bad ones like Buffalo Boys (disambiguation) and Chris Breen (disambiguation). wbm1058 ( talk) 17:36, 20 March 2023 (UTC)
For purposes of WP:EDITORRETENTION, I'm interested in gathering some data on the contribution history of Wikipedia:Education program student editors after their course ends. (Sample Wiki Ed course page here.) These Wiki Ed student editors typically contribute for several weeks during the second half of their semester course, and then disappear immediately after the course ends, no doubt busy with college courses, graduation, and real life. Afaik, there's almost a total lack of data about what happens afterward, so it's not even clear what query to formulate, and I suspect it might be a refinement process, with early results informing later queries. Maybe one might start with a scoping query to see what we're dealing with, perhaps this:
Columns might be:
Notes:
Fall 2019
, at the end of the course name, but not all do. If we recast 'Winter 2019' as '2019 Q1', 'Spring 2019' as '2019 Q2', and so on, it would give us an easily sortable value. (Or just '2019-1', -2, -3, -4 if better.) If parsing is hard, try any 4-digit sequence in (2015..2023) in the course name, especially if it's anchored at the end of the string. If year or season/quarter cannot be determined, probably we should throw out the row.
[d]
This query is in support of this discussion at the Wikipedia:Education noticeboard. Thanks, Mathglot ( talk) 00:31, 31 March 2023 (UTC)
|end_date=
) but which are not currently added to the template by default. That will make this value much easier to determine, but we don't have that yet. Or maybe don't throw out the row, but consider year only, and ignore season.
As described over at the admin noticeboard, the ReferenceExpander bot hasn't always been expanding references. Instead, it does things like following a URL that now redirects to a page which doesn't have the information being cited, and then replacing a manually formatted footnote that includes an archive URL with a {{ cite web}} template that doesn't. It'll drop bylines from articles, eliminate quotes, ignore the second of two references put in the same footnote, etc. Is there a way to get a list of all the edits that include "ReferenceExpander" in the summary, maybe sorted by delta-article-size so that we can prioritize checking those cases where it made the article shrink? XOR'easter ( talk) 21:52, 1 April 2023 (UTC)
Result set seems a bit small. What'd I mess up? https://quarry.wmcloud.org/query/72954 – Novem Linguae ( talk) 01:53, 11 April 2023 (UTC)
Requested on Discord by @ Ixtal. They're interested in getting the # of edits each admin had at the time of their RFA.
I'm thinking this could possibly be done with a subquery, and feeding the SQL query some manual data. For example, if I had something like...
CREATE TEMPORARY TABLE list_of_admins (user_name varbinary(255), rfa_date binary(14));
INSERT INTO list_of_admins SET user_name = 'Test1', rfa_date = '20210101000000';
INSERT INTO list_of_admins SET user_name = 'Test2', rfa_date = '20220202000000';
SELECT user_name, rfa_date, edit_count AS (
SELECT COUNT(*)
FROM revision
JOIN actor ON rev_actor = actor_id
JOIN user ON actor_user = user_id
WHERE user.user_name = list_of_admins.user_name
AND rev_timestamp < rfa_date
)
FROM list_of_admins;
But quarry doesn't allow temporary tables. Any ideas? – Novem Linguae ( talk) 01:55, 7 April 2023 (UTC)
WITH list_of_admins (user_name, rfa_date) AS
(SELECT 'Test1', '20210101000000'
UNION SELECT 'Test2', '20220202000000'
UNION SELECT 'Test3', '20230303000000' -- ...
)
SELECT user_name, rfa_date, edit_count -- ...
Quarry:query/72815 produces a list of 626 editors with the timestamp of their earliest (IIRC, although I'm not sure how the group by works here) granting of sysop rights. Doesn't check if their current ug includes sysop, tho. Also, I'm looking for the # of edits for each admin after their RFA. — Ixtal ( T / C ) ⁂ Non nobis solum. 08:41, 7 April 2023 (UTC)
I run from time to time into former Lepidoptera articles that have since been redirected to another article, but where the WikiProject tags on the talkpage weren't updated and still claim it to be a stub (or start/etc., but Lepidoptera articles in classes other than stub are relatively scarce and have low enough numbers checking manually is feasible. Manually checking the near-100k stub-rated Lepidoptera pages, on the other hand, not so much.)
Could anyone by any chance write me a query, or point me in the right direction of how to do it myself, to identify all mainspace redirects where the accompanying talkpage is in Category:Stub-Class Lepidoptera articles? My gut feeling is that there are at least several hundreds, and quite possibly a few thousand, such mismatches. Depending on the scale of the results, I might either tackle the issue myself or request a bot task for it. AddWitty NameHere 09:59, 21 April 2023 (UTC)
I am trying to track the change in new page patrol outcomes over time. The backlog is rather low now, so this seems like an opportune time. I was wondering if anyone might try to query, perhaps grouped on a monthly basis:
1) the number of pages reviewed
2) the outcome - "approved" (ie, no action taken), CSDed, PROD, Draftify, BLAR. I think that would cover most cases.
I suspect (2) will be non-trivial. Maury Markowitz ( talk) 14:34, 22 April 2023 (UTC)
Hi! I'm tryna get a template-protected template edit made to disambiguate the targets of the display text of two parameters that currently link to the same target.
I'm trying to figure out the use cases of having only one of these two parameters defined, since I haven't been able to find any, and the existence of such a set of articles would be the strongest counterargument against disambiguating the wikilinks, and I want to make sure I understand the effects of the proposed disambiguation before I start a discussion about it.
Specifically, I'm looking for a query that will return the list of articles that both: call
Template:Infobox royalty and: pass it values for exactly one of {family_name
,clan_name
}. I understand this will involve parsing the source of pages, and this template has over 21,000 transclusions, so I do have a backup plan involving a proposal to add more complicated parsing logic to the template source if this query is not feasible. Thanks,
Folly Mox (
talk) 00:06, 28 April 2023 (UTC)
For this discussion, I'd like to have data on how many talk pages have {{ Vital article}} but not {{ Talk header}}, as well as how many have {{ Vital article}} but not {{ WikiProject banner shell}}. I tried using PetScan but either it's currently broken or I did something wrong, so it didn't work. Can anyone help? Thanks, {{u| Sdkb}} talk 21:28, 4 May 2023 (UTC)
Some users who edit a lot may have created some unused templates a long time ago, but they forgot about it. At this time, I can remind them through this query and let them delete them by themselves. Q 𝟤 𝟪 06:50, 22 May 2023 (UTC)
"the wiki nowiki and the roles I'm interested in are patruljør (patroller) and autopatruljerte (autopatrolled)."
Requested on Discord. I started a query here but got stuck on the "date of last contribution" column. How do I pull the timestamp of a user's most recent contribution from the revision table? Thanks. – Novem Linguae ( talk) 15:22, 27 May 2023 (UTC)
Would someone like to take a stab at a list of candidate edit counts at the time of their RFAs? Suggested algorithm: query all the subpages of WP:RFA, do a join on revision
to figure out the page created date, isolate the candidate name using REPLACE()
, then subquery the revision
table and do a count for that user before the page created date. 2nd and later RFAs won't isolate the username easily, which is fine, doesn't have to be perfect. Thanks. –
Novem Linguae (
talk) 16:43, 14 June 2023 (UTC)
I wrote a query to show short cycles (not longer than 3 for now) in paths column at quarry:query/74589. It simply queries category-links from categories recursively and keeps each path to find a cycle. The cycles are duplicate if they could be reached from different start points. That leads to inefficiency of this query, which will timeout if the parameter max_recursive_iterations is assigned larger than 3. I'm here to ask for a way to reduce duplicates and get each simple cycle only once during the query. NmWTfs85lXusaybq ( talk) 14:24, 19 June 2023 (UTC)
Could someone compile a list of mixed script redirects? For example, if Αlpha (Greek Alpha + latin lpha) → Alpha existed, it should be picked up. Or if Вaseball (Cyrillic B + latin aseball) → Baseball existed, also list it. Perhaps highlighting the non-dominant characters? E.g.
Mixed script | Target | Scripts |
---|---|---|
Αlpha | Alpha | Latin/Greek |
Вaseball | Baseball | Latin/Cyrillic |
Imagine the red links being blue, because I'm assuming they exist rather than be hypothetical. Headbomb { t · c · p · b} 14:24, 18 June 2023 (UTC)
In the above format, that would be
Mixed script | Target | Scripts |
---|---|---|
Belgrade Мunicipalities | Subdivisions of Belgrade | Latin/Cyrillic |
Lеlаng Соmmаndеry | Lelang Commandery | Latin/Cyrillic |
Headbomb { t · c · p · b} 14:47, 18 June 2023 (UTC)
If I could have one query per parameter, that would be great. Specifically, in all citation templates ({{
cite xxx}}
and {{
citation}}) find all instances of
|encyclopedia=Foobar
= |publisher=Foobar
|journal=Foobar
= |publisher=Foobar
|magazine=Foobar
= |publisher=Foobar
|newspaper=Foobar
= |publisher=Foobar
|series=Foobar
= |publisher=Foobar
|title=Foobar
= |publisher=Foobar
|website=Foobar
= |publisher=Foobar
|work=Foobar
= |publisher=Foobar
and also
|journal=Foobar
= |series=Foobar
The matches should be exact, save for whitespace. Headbomb { t · c · p · b} 12:31, 25 June 2023 (UTC)
\{\{[Cc]it(?:ation|e )[^}]*\|\s*encyclopedia\s*=\s*((?:[^][|}]|\[\[[^]|]+(?:\|[^]]*)?\]\])+)\s*(?:\|[^}]*)?\|\s*publisher\s*=\s*\1\s*[|}]
. You may also want to repeat it with encyclopedia and publisher swapped.
Certes (
talk) 17:45, 25 June 2023 (UTC)
\{\{[Cc]it(?:ation|e )[^}]*\|\s*(encyclopedia|journal|magazine|newspaper|series|title|website|work|publisher)\s*=\s*((?:[^][|}]|\[\[[^]|]+(?:\|[^]]*)?\]\])+)\s*(?:\|[^}]*)?\|\s*(encyclopedia|journal|magazine|newspaper|series|title|website|work|publisher)\s*=\s*\1\s*[|}]
(?:\|[^}]*)?
before the second encyclopedia|etc group to (?:\|[^}]*)*
.You're probably already aware this'll match {{citation|title=Foo|series=Foo}}, i.e. repetition of any two of your left-hand column, without necessarily having a matching publisher parameter.Be aware this won't match {{citation|newspaper=[[The New York Times|New York Times]]|publisher=New York Times}}, nor anything that uses another template (or single close-brace) in between the "citation|cite whatever" and the second encyclopedia/publisher/whatever parameter. Worse (though less likely to be encountered), if one of citation's params is a template invocation that itself has an "encyclopedia/whatever=" param in that, the regex will treat it like it was the outer citation template's. Proper parsing fixes the first problem cheaply, and the second and third for free. —
Cryptic 07:14, 26 June 2023 (UTC)
(?:\|[^}]*)?
→ (?:\|[^}]*)*
though, I wouldn't have thought of that right away.
Headbomb {
t ·
c ·
p ·
b} 10:14, 26 June 2023 (UTC)
(?:\|[^}]*)?
also matches "|this|that". I suggested ? as it looked marginally faster, though I didn't test that.
Certes (
talk) 11:33, 26 June 2023 (UTC)
python myprogram < mydumpfile > myoutput
or the equivalent in your operating system?
Certes (
talk) 23:14, 30 June 2023 (UTC)python3 dup-publisher.py --help
to see the options. You'll have to install mwparserfromhell if you haven't already; from a command line, python3 -m pip install mwparserfromhell
. —
Cryptic 00:06, 1 July 2023 (UTC)sample output from enwiki-20230620-pages-articles16.xml-p20460153p20570392.bz2 - 25635 pages, 68 matches, 4:08 runtime
|
---|
J. M. G. Le Clézio bibliography {{cite web}}:publisher,title="THE NOBEL FOUNDATION 2008" Hot Issue (EP) {{cite web}}:publisher,work="Asianpopcorn" Tokyo Day Trip {{cite web}}:publisher,website="Nonesuch Records." Per- and polyfluoroalkyl substances {{cite web}}:publisher,website="National Collaborating Centre for Environmental Health" Jake Parker {{cite news}}:publisher,work="NPR" German University of Technology in Oman {{cite web}}:publisher,title="ACQUIN" Where the City Meets the Sea {{cite web}}:publisher,title="iTunes Store" Khurshed Makhmudov {{cite web}}:publisher,website="today.tj" Caritas Christi Health Care {{cite web}}:publisher,title="Caritas Christi Health Care" Audi R15 TDI {{cite web}}:publisher,title="audir15tdi.com" SS Automedon {{cite book}}:publisher,title="Lloyd's Register" Grace Hospital (Winnipeg) {{cite web}}:publisher,title="Grace Hospital" History of the Jews in Mumbai {{cite web}}:publisher,title="JTA" Hlaing Township {{cite web}}:publisher,website="International School of Myanmar" Glen Coffee {{Cite news}}:publisher,work="SaturdayDownSouth" Doug Yates {{cite web}}:publisher,website="motorsport.com" Maryse Joissains {{cite web}}:publisher,title="Maryse Joissains-Masini" List of parishes in the Diocese of Salford {{cite web}}:publisher,title="Holy Souls Parish" North Country Supervisory Union {{Cite news}}:publisher,work="The Chronicle" Irish Baroque Orchestra {{cite web}}:publisher,title="Temple Bar Cultural Trust" Riverfront Park (Spokane, Washington) {{cite web}}:publisher,title="North Idaho Centennial Trail Foundation" Rainbow Theatre {{cite web}}:publisher,website="Historic England" World War I in popular culture {{Cite web}}:publisher,title="H-Nationalism" Scremerston {{cite web}}:publisher,website="Archeological Research Services" Sport in Lima {{cite web}}:publisher,website="World Baseball Softball Confederation" Svend Bayer {{cite web}}:publisher,title="Pucker Gallery" Supreeme {{cite web}}:publisher,title="Record Collection" Mumford & Sons {{cite web}}:publisher,website="Never Enough Notes" Hecht Company Warehouse {{cite web}}:publisher,work="Douglas Development Corporation" God's Squad {{cite web}}:publisher,work="CathNews" LGBT rights in the 19th century {{cite web}}:publisher,title="Sodomylaws.org" John Najarian {{cite web}}:publisher,website="Mprnews" Tame Impala {{cite web}}:publisher,title="Dave Fridmann" Escot, Talaton {{cite news}}:publisher,work="The Telegraph" Yasser Khalil {{cite web}}:publisher,title="Yasser Khalil" American Artists' Congress {{cite web}}:publisher,website="The Annex Galleries" Castle (TV series) {{cite web}}:publisher,title="ABC Medianet" Agnete Saba {{cite web}}:publisher,website="vg.no" Mark Simmonds (saxophonist) {{cite web}}:publisher,work="Australian Rock Database" Frederikssund {{cite web}}:publisher,title="vikingespil.dk" Air sock {{cite web}}:publisher,website="Prihoda UK" Halfway Bush {{cite web}}:publisher,title="Halfway Bush School" Baháʼí Faith in Dominica {{cite web}}:publisher,work="thearda.com" Middle Cyclone {{cite web}}:publisher,work="Stereogum" William Wellington House {{cite web}}:publisher,work="Kirkbride Buildings" Alfred Sankoh {{cite web}}:publisher,website="dt.no" Hadleigh Farm {{cite web}}:publisher,title="Hadleigh Mountain Bike Club" Drumbeg, Sutherland {{cite web}}:publisher,website="The Gazetteer for Scotland" Tetra Tech {{cite web}}:publisher,website="Engineer News-Record" Fort Nonsense (Annapolis, Maryland) {{cite web}}:publisher,title="Maryland Historical Trust" Funhouse (song) {{cite web}}:publisher,title="Nielsen Music Poland" Denis Haines {{cite web}}:publisher,website="Classic Rock Radio" Government Office for Science {{cite news}}:publisher,title="Government Office for Science" Yazidis {{cite web}}:publisher,work="Statistics of Russia" Reactions to the September 11 attacks {{cite web}}:publisher,website="georgewbush-whitehouse.archives.gov" New Eyes for the Needy {{cite web}}:publisher,title="New Eyes for the Needy" Advertisements for Myself {{cite web}}:publisher,website="Pbs.org" Medal of Honor Aircraft {{cite web}}:publisher,website="Airport-Data.com" Jonathan Leavitt (publisher) {{cite journal}}:publisher,journal="[[American Society for Eighteenth-Century Studies]]" T-Mobile US {{cite web}}:publisher,title="BillShrink.com" Cannabis in New Zealand {{cite web}}:publisher,website="Say Nope To Dope" Nashville Charter Amendment 1 {{Cite web}}:publisher,title="Nashville English First" List of windmills in Leicestershire {{cite web}}:publisher,title="Redmile Archive" Mattapany-Sewall Archeological Site {{cite web}}:publisher,title="Maryland Historical Trust" GCE Advanced Level (United Kingdom) {{cite web}}:publisher,website="UK Parliament" Jared Bernstein {{cite web}}:publisher,title="Cnbc.com" Jeff Glixman {{cite web}}:publisher,title="Starcityrecording.com" Yang Kaiqi {{cite web}}:publisher,title="Chessinchina.Net" |
To add a "creation date" column to a list of usernames, I need to get the creation date from logging WHERE log_type = 'newusers', right? What's the best way to incorporate it into this query? LEFT JOIN? Subquery? Thanks for the help. – Novem Linguae ( talk) 03:35, 3 July 2023 (UTC)
I 'd like to find all edits made by
User:Qwerfjkl (bot) where the edit summary contains ([[Wikipedia:Bots/Requests for approval/Qwerfjkl (bot) 19|Task 19)]]
[
sic] and the edit isn't a page creation. Then I'd like just the revid and page title. —
Qwerfjkl
talk 15:20, 4 July 2023 (UTC)
Hi, I need a query that lists all non-redirecting mainspace content with one or more of "Alpha", "Beta", etc. (case insensitive) in the title, and is a member of Category:Chemistry to any depth. — CX Zoom[he/him] ( let's talk • { C• X}) 06:46, 9 July 2023 (UTC)
If someone has the time, I would appreciate a query of Category:Players of American football for entries with less than 2,500 bytes and which is either unsourced or sourced only to databases (e.g. "pro-football-reference.com", "sports-reference.com", "profootballarchives.com", "espn.com" "footballdb.com", or "nfl.com"). Cbl62 ( talk) 17:28, 11 July 2023 (UTC)
Hi! The five of us still toiling away at the cleanup enabled by query 72741 (ReferenceExpander edits from 1 January 2023 to 2 April 2023) are nearing-ish the completion of the first phase of our task. Fewer than 350 edits remain to be checked. We already have a second phase prepared (based on query 72745), but we are going to need results for two more date ranges.
Given the above, I'd like to request a query similar to 72741 and 72745, with the following alterations:
using [[en:w:User:BrandonXLF/ReferenceExpander
(need a more specific match than before, since editors active in the cleanup project have been mentioning ReferenceExpander
in edit summaries, which will create a very high false positive rate for the post 2023-04-01 period)I don't remember enough SQL to do this, especially regarding string matching and syntax for escaping special characters. Thanks! Folly Mox ( talk) 20:15, 14 July 2023 (UTC)
The results of my prior request were quite helpful in allowing the creation of Wikipedia:WikiProject National Football League/Football biography cleanup, a promising effort to improve the quality of sub-stub articles on gridiron football players. In an effort to ensure its completeness, would someone we willing to create a simpler list from Category:Gridiron football players and its subcategories for articles with less than 1,500 total bytes? Cbl62 ( talk) 20:48, 23 July 2023 (UTC)
Can someone generate a list of all pages with this exact wikitext in it? WHILE WITH NO ACTION: -->[[Category:Wikipedia usernames with possible policy issues|
I need it for
an approved AWB bot task. Traditional search option doesn't seem to work, thanks to the -->
part. Thanks! —
CX Zoom[he/him] (
let's talk • {
C•
X}) 06:01, 27 July 2023 (UTC)
Could I get, based on this list, the subset of articles that were edited by Citation bot from July 20 of this year onwards. Or alternatively, that were not edited by Citation bot from July 20 of this year onwards?
Headbomb { t · c · p · b} 20:56, 27 July 2023 (UTC)
I am interested to see how many articles run counter to MOS:BOLDLINK within Category:Green Bay Packers, which in my experience is very rampant across the project. I would expect a query to look like:
'''[[Wikilink]]'''
)
Thank you « Gonzo fan2007 (talk) @ 18:54, 28 July 2023 (UTC)
Could someone create a query that shows a user's edits ordered by the size of content they add? TIA. –– FormalDude (talk) 22:58, 3 August 2023 (UTC)
Is it possible to find the top 100 most used templates? I need it for bnwiki. Thanks. আফতাবুজ্জামান ( talk) 13:54, 17 August 2023 (UTC)
In order to support a discussion at
WP:VPT regarding the lack of a dedicated "Script:" namespace, I am interested in analyses of unique editors in different namespace pages compared to unique editors in user script pages (likely defined as User subpages having a .js
suffix). Perhaps one or more of these might help:
Because the goal is to discover "how many are sharing the coding" not the documentation, /doc subpages or other non-code subpages of templates, modules, or scripts (usually just the script page name, minus the .js
suffix) should probably be excluded. Draftspace is kind of a control, because coding isn't involved there. (The common.js files aren't really scripts, and not sure if they need to be excluded to avoid skewing the tallies; there are a lot of them.)
What I'm looking for, is some sort of measure of "editor script shyness", defined as a kind of inverse of "editor collaboration", in an attempt to answer the question, "Do editors tend to avoid collaborating on user script pages because they appear to 'belong' to another user, more so than they would collaborating on modules, templates, or drafts, which have their own namespace and thus don't appear to 'belong' to anybody, and if so, by how much?" A large "editor script shyness quotient" might be an argument in support of creating a "Script" namespace; a low ESSQ would argue against it. Not sure what would be the most useful to help shed light on that question, so feel free to vary the suggestions above or toss them out in favor of whatever seems most useful, or doable. Thanks, Mathglot ( talk) 02:20, 21 August 2023 (UTC)
This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page. |
Archive 1 | Archive 2 | Archive 3 | Archive 4 |
Hi, I am trying to work out how useful categories are 1. I was trying to find out a count by month of # of new categories
[ [1]]
but I "Can't connect to MySQL server on 'wiki.analytics.db.svc.wikimedia.cloud' ([Errno -2] Name or service not known)"
2. Is it possible to find out how may times a reader clicks on a category on a page?
I asked a similar question to the help desk asking if there were reports Wakelamp d[@-@]b ( talk) 04:25, 20 August 2022 (UTC)
Could someone run a query on requests for adminship by number of participants (support/neutral/oppose) please? Information from it may be featured in an article in The Signpost. –– FormalDude (talk) 21:38, 25 September 2022 (UTC)
I haven't found this anywhere, but perhaps it could be answered with a fairly simple query? I'd like to know the number of inline citations (using <ref>
tags) in the median article. Ideally, this would exclude disambiguation pages and other non-articles, but indisputable evidence is not required. My guess is that it's more than one, but probably not more than five.
WhatamIdoing (
talk) 22:56, 3 October 2022 (UTC)
The pagetriage_page_tags table has around 19 rows per ptrpt_page_id. Each row corresponds to a unique page tag (ptrpt_tag_id). The lookup is stored in the table pagetriage_tags, which maps ptrt_tag_id to ptrt_tag_name, such as 'copyright', 'user_name', etc.
I'm trying to find the page ids for any pages in pagetriage_page_tags that only have 1 entry instead of 19.
The way I chose to do this was with a query that matched the bug report. The bug report had only a copyright tag, and no other tags, so I picked a random tag (user_name) to check in a subquery to see if it also existed.
Rough draft query here but I messed something up. Sample data with 3 pages x 19 tags. Help appreciated :) – Novem Linguae ( talk) 03:39, 21 October 2022 (UTC)
Category:Members of the Fifteen Year Society of Wikipedia editors has 562 members. It it possible to extract a list of those who are still active (those who have made an edit in the last 12 months)? Kudpung กุดผึ้ง ( talk) 00:21, 19 October 2022 (UTC)
I'd like to know, if possible, the proportion of biographies where the subject is:
Thanks. Thryduulf ( talk) 10:28, 12 October 2022 (UTC)
Looking to find out how often some profanity, curse words, are said in revision history. I have done the same thing manually for article, article talk, user, user talk. I suspect it is much higher on revision history
Wakelamp d[@-@]b ( talk) 15:15, 1 November 2022 (UTC)
fuck off
, cunt
, and fuckwit
for just one day on the revision history for article?
Wakelamp d[@-@]b (
talk) 13:51, 4 November 2022 (UTC)
This would make finding and fixing these container categories much easier. DemonStalker ( talk) 01:58, 23 November 2022 (UTC)
How can I get a list of all the individuals currently alive in the English Wikipedia over the age of say 95?
Thank You Dleigthful ( talk) 06:18, 21 November 2022 (UTC)
Hi Cryptic
For a meeting with the WMF this week we need
Sample period would need to be at least over the last 6 months to make any sense.
Is this something you can do for us? Thanks in advance. Kudpung กุดผึ้ง ( talk) 06:45, 6 December 2022 (UTC)
Would it be possible to construct a query that returns all articles that have two or more sections with the same name? I just merged one such instance at Concord, Massachusetts (which had two "Transportation" sections) and I suspect there may be many others. {{u| Sdkb}} talk 00:08, 9 December 2022 (UTC)
Hi! Could someone please tweak this query to only list existing talk pages? Thanks! 🐶 EpicPupper (he/him | talk) 02:58, 12 December 2022 (UTC)
Hello! I'd appreciate it if a query could be created to list all fully-protected templates and their number of transclusions. Additionally, it'd be great if the least-transcluded template, most-transcluded template, and median and average number of transclusions could also be calculated some way. Much thanks! 🐶 EpicPupper (he/him | talk) 04:42, 17 December 2022 (UTC)
Could we please have these for a meeting with the WMF next week:
Sample period would need to be at least over the last 6 months to make any sense. Thanks. Kudpung กุดผึ้ง ( talk) 18:46, 4 January 2023 (UTC)
I tried fiddling around in PetScan, and couldn't see a way to do this: I want to see all articles without short descriptions, sorted by pageviews (high to low). — PerfectSoundWhatever ( t; c) 02:19, 28 December 2022 (UTC)
Hi there, I'd really appreciate it if a query for the number of registered users (movement-wide) that aren't globally locked could be created. Thanks! EpicPupper ( talk) 01:59, 31 January 2023 (UTC)
Hello! Would it be possible to get a quarry on new pages created as a subpage of Wikipedia talk:Articles for creation, organized by date of creation? Thank you so much! House Blaster talk 21:37, 19 February 2023 (UTC)
Hi. I'm looking for:
I suspect to see a shift over the years from deleting articles to deleting drafts. I'm very curious to see how big the shift has been. If results could be broken down by month that would be even better. Thanks in advance! Clayoquot ( talk | contribs) 20:02, 21 February 2023 (UTC)
See quarry:query/69410. There are about 98 pages Category:Redirects_for_discussion_with_talk_page_redirects that I want excluded from my query results. It's easy to qualify by limiting to only pages populating a category, but seems not so much for the reverse. Many pages populating that category are populating at least one other category as well, and if they are they are still listed in the report when I don't want them there. wbm1058 ( talk) 21:10, 28 February 2023 (UTC)
-- your regular query goes here
AND p.page_title NOT IN (
-- the quarry:query/69410 query, without the concat part
)
Is it possible to access this table? The presence of example queries suggests it is, but it doesn't appear to exist on enwiki_p. BilledMammal ( talk) 20:29, 6 March 2023 (UTC)
This table is populated when edits are saved on pages, unless $wgDisableSearchUpdate is set to true, or an extension that implements its own search index (like CirrusSearch extension) is installed.We have CirrusSearch, so sadly the table may not be available. Certes ( talk) 20:34, 6 March 2023 (UTC)
At the Tea house, a user has asked for "the most used red links (ignoring templates and similar)". There seems to be an ambiguity here, where one reading is, 1. "unique red links with the most user clicks", and the other is, 2. "unique red links which appear a) in the greatest number overall", or, b) "...in the greatest number of articles" (i.e., ignoring dupes in the same article, which if they don't overlink, should be similar to 2a). I'm assuming they meant 2a. Can someone help with this request? Thanks, Mathglot ( talk) 08:08, 3 March 2023 (UTC)
{{
Asian topic||art}}
and similar, but {{
Asian topic}} itself doesn't link there (it's not about art) and so that link can't be detected as template-made. Someone would have to search a wikitext dump for [[Abkhaz art]] and variants, and repeat that search for thousands of other redlinks. Another complication is that we may want to count links via some templates: if someone deliberately links to {{
stn|Narnia}}
then that makes
Narnia station a wanted article.
Certes (
talk) 19:23, 6 March 2023 (UTC)
I revoked autopatrolled the other day from someone who had been creating (among other things) unsourced BLPs, and that left me wondering whether it was just an isolated incident or if there was a broader problem. Would it be possible to get a list of articles that 1) were created by someone with the autopatrol right and 2) were deleted with a log summary mentioning "BLPPROD" 3) within the last year? I'd certainly appreciate it. Thanks in advance! Extraordinary Writ ( talk) 20:53, 16 March 2023 (UTC)
I've been working underground, cleaning up a " trainwreck", which lead me to find pages like Buffalo Boys (disambiguation) and Chris Breen (disambiguation) transcluding {{ R to disambiguation page}} and thus populating Category:Redirects to disambiguation pages. But these aren't redirects to disambiguation, they are redirects to articles, and there is no active mechanism flagging them as errors needing correction, so I don't know how many other pages like these are lurking on Wikipedia.
I've been aware of this issue since at least April 2017, when I put it on my to-do list:
We're now up over 254,000 such redirects. I guess solving the issue hasn't gotten any easier over the past six years.
My first thought was to try to track the problem in real-time using template magic. I could modify {{ R to disambiguation page}} to examine the {{ target of}} the page transcluding the template to confirm that it populates Category:All disambiguation pages, and report all redirect targets that don't populate that category as errors. I found {{ if in category}} which might theoretically do that, if it didn't only detect categories directly on a page, not those added indirectly through templates. Looking at how that "very expensive" template works using {{ find page text}} I suppose it might be possible to search for the templates indirectly populating the category, including all their redirect-variant names. Ick.
So my second thought was to create a database query to find them. This query pulls up the first 200 pages in the redirect table, but as soon as I add:
it finds nothing. Is there a way to make this work? Or is it not possible because such a query is too inefficient and resource-intensive?
If a database query can't be made to work, I guess my third thought is that I need to write a PHP bot that likely will take a few hours to manually walk through all 254,000+ redirects and check them one-at-a-time and generate a list of the bad ones like Buffalo Boys (disambiguation) and Chris Breen (disambiguation). wbm1058 ( talk) 17:36, 20 March 2023 (UTC)
For purposes of WP:EDITORRETENTION, I'm interested in gathering some data on the contribution history of Wikipedia:Education program student editors after their course ends. (Sample Wiki Ed course page here.) These Wiki Ed student editors typically contribute for several weeks during the second half of their semester course, and then disappear immediately after the course ends, no doubt busy with college courses, graduation, and real life. Afaik, there's almost a total lack of data about what happens afterward, so it's not even clear what query to formulate, and I suspect it might be a refinement process, with early results informing later queries. Maybe one might start with a scoping query to see what we're dealing with, perhaps this:
Columns might be:
Notes:
Fall 2019
, at the end of the course name, but not all do. If we recast 'Winter 2019' as '2019 Q1', 'Spring 2019' as '2019 Q2', and so on, it would give us an easily sortable value. (Or just '2019-1', -2, -3, -4 if better.) If parsing is hard, try any 4-digit sequence in (2015..2023) in the course name, especially if it's anchored at the end of the string. If year or season/quarter cannot be determined, probably we should throw out the row.
[d]
This query is in support of this discussion at the Wikipedia:Education noticeboard. Thanks, Mathglot ( talk) 00:31, 31 March 2023 (UTC)
|end_date=
) but which are not currently added to the template by default. That will make this value much easier to determine, but we don't have that yet. Or maybe don't throw out the row, but consider year only, and ignore season.
As described over at the admin noticeboard, the ReferenceExpander bot hasn't always been expanding references. Instead, it does things like following a URL that now redirects to a page which doesn't have the information being cited, and then replacing a manually formatted footnote that includes an archive URL with a {{ cite web}} template that doesn't. It'll drop bylines from articles, eliminate quotes, ignore the second of two references put in the same footnote, etc. Is there a way to get a list of all the edits that include "ReferenceExpander" in the summary, maybe sorted by delta-article-size so that we can prioritize checking those cases where it made the article shrink? XOR'easter ( talk) 21:52, 1 April 2023 (UTC)
Result set seems a bit small. What'd I mess up? https://quarry.wmcloud.org/query/72954 – Novem Linguae ( talk) 01:53, 11 April 2023 (UTC)
Requested on Discord by @ Ixtal. They're interested in getting the # of edits each admin had at the time of their RFA.
I'm thinking this could possibly be done with a subquery, and feeding the SQL query some manual data. For example, if I had something like...
CREATE TEMPORARY TABLE list_of_admins (user_name varbinary(255), rfa_date binary(14));
INSERT INTO list_of_admins SET user_name = 'Test1', rfa_date = '20210101000000';
INSERT INTO list_of_admins SET user_name = 'Test2', rfa_date = '20220202000000';
SELECT user_name, rfa_date, edit_count AS (
SELECT COUNT(*)
FROM revision
JOIN actor ON rev_actor = actor_id
JOIN user ON actor_user = user_id
WHERE user.user_name = list_of_admins.user_name
AND rev_timestamp < rfa_date
)
FROM list_of_admins;
But quarry doesn't allow temporary tables. Any ideas? – Novem Linguae ( talk) 01:55, 7 April 2023 (UTC)
WITH list_of_admins (user_name, rfa_date) AS
(SELECT 'Test1', '20210101000000'
UNION SELECT 'Test2', '20220202000000'
UNION SELECT 'Test3', '20230303000000' -- ...
)
SELECT user_name, rfa_date, edit_count -- ...
Quarry:query/72815 produces a list of 626 editors with the timestamp of their earliest (IIRC, although I'm not sure how the group by works here) granting of sysop rights. Doesn't check if their current ug includes sysop, tho. Also, I'm looking for the # of edits for each admin after their RFA. — Ixtal ( T / C ) ⁂ Non nobis solum. 08:41, 7 April 2023 (UTC)
I run from time to time into former Lepidoptera articles that have since been redirected to another article, but where the WikiProject tags on the talkpage weren't updated and still claim it to be a stub (or start/etc., but Lepidoptera articles in classes other than stub are relatively scarce and have low enough numbers checking manually is feasible. Manually checking the near-100k stub-rated Lepidoptera pages, on the other hand, not so much.)
Could anyone by any chance write me a query, or point me in the right direction of how to do it myself, to identify all mainspace redirects where the accompanying talkpage is in Category:Stub-Class Lepidoptera articles? My gut feeling is that there are at least several hundreds, and quite possibly a few thousand, such mismatches. Depending on the scale of the results, I might either tackle the issue myself or request a bot task for it. AddWitty NameHere 09:59, 21 April 2023 (UTC)
I am trying to track the change in new page patrol outcomes over time. The backlog is rather low now, so this seems like an opportune time. I was wondering if anyone might try to query, perhaps grouped on a monthly basis:
1) the number of pages reviewed
2) the outcome - "approved" (ie, no action taken), CSDed, PROD, Draftify, BLAR. I think that would cover most cases.
I suspect (2) will be non-trivial. Maury Markowitz ( talk) 14:34, 22 April 2023 (UTC)
Hi! I'm tryna get a template-protected template edit made to disambiguate the targets of the display text of two parameters that currently link to the same target.
I'm trying to figure out the use cases of having only one of these two parameters defined, since I haven't been able to find any, and the existence of such a set of articles would be the strongest counterargument against disambiguating the wikilinks, and I want to make sure I understand the effects of the proposed disambiguation before I start a discussion about it.
Specifically, I'm looking for a query that will return the list of articles that both: call
Template:Infobox royalty and: pass it values for exactly one of {family_name
,clan_name
}. I understand this will involve parsing the source of pages, and this template has over 21,000 transclusions, so I do have a backup plan involving a proposal to add more complicated parsing logic to the template source if this query is not feasible. Thanks,
Folly Mox (
talk) 00:06, 28 April 2023 (UTC)
For this discussion, I'd like to have data on how many talk pages have {{ Vital article}} but not {{ Talk header}}, as well as how many have {{ Vital article}} but not {{ WikiProject banner shell}}. I tried using PetScan but either it's currently broken or I did something wrong, so it didn't work. Can anyone help? Thanks, {{u| Sdkb}} talk 21:28, 4 May 2023 (UTC)
Some users who edit a lot may have created some unused templates a long time ago, but they forgot about it. At this time, I can remind them through this query and let them delete them by themselves. Q 𝟤 𝟪 06:50, 22 May 2023 (UTC)
"the wiki nowiki and the roles I'm interested in are patruljør (patroller) and autopatruljerte (autopatrolled)."
Requested on Discord. I started a query here but got stuck on the "date of last contribution" column. How do I pull the timestamp of a user's most recent contribution from the revision table? Thanks. – Novem Linguae ( talk) 15:22, 27 May 2023 (UTC)
Would someone like to take a stab at a list of candidate edit counts at the time of their RFAs? Suggested algorithm: query all the subpages of WP:RFA, do a join on revision
to figure out the page created date, isolate the candidate name using REPLACE()
, then subquery the revision
table and do a count for that user before the page created date. 2nd and later RFAs won't isolate the username easily, which is fine, doesn't have to be perfect. Thanks. –
Novem Linguae (
talk) 16:43, 14 June 2023 (UTC)
I wrote a query to show short cycles (not longer than 3 for now) in paths column at quarry:query/74589. It simply queries category-links from categories recursively and keeps each path to find a cycle. The cycles are duplicate if they could be reached from different start points. That leads to inefficiency of this query, which will timeout if the parameter max_recursive_iterations is assigned larger than 3. I'm here to ask for a way to reduce duplicates and get each simple cycle only once during the query. NmWTfs85lXusaybq ( talk) 14:24, 19 June 2023 (UTC)
Could someone compile a list of mixed script redirects? For example, if Αlpha (Greek Alpha + latin lpha) → Alpha existed, it should be picked up. Or if Вaseball (Cyrillic B + latin aseball) → Baseball existed, also list it. Perhaps highlighting the non-dominant characters? E.g.
Mixed script | Target | Scripts |
---|---|---|
Αlpha | Alpha | Latin/Greek |
Вaseball | Baseball | Latin/Cyrillic |
Imagine the red links being blue, because I'm assuming they exist rather than be hypothetical. Headbomb { t · c · p · b} 14:24, 18 June 2023 (UTC)
In the above format, that would be
Mixed script | Target | Scripts |
---|---|---|
Belgrade Мunicipalities | Subdivisions of Belgrade | Latin/Cyrillic |
Lеlаng Соmmаndеry | Lelang Commandery | Latin/Cyrillic |
Headbomb { t · c · p · b} 14:47, 18 June 2023 (UTC)
If I could have one query per parameter, that would be great. Specifically, in all citation templates ({{
cite xxx}}
and {{
citation}}) find all instances of
|encyclopedia=Foobar
= |publisher=Foobar
|journal=Foobar
= |publisher=Foobar
|magazine=Foobar
= |publisher=Foobar
|newspaper=Foobar
= |publisher=Foobar
|series=Foobar
= |publisher=Foobar
|title=Foobar
= |publisher=Foobar
|website=Foobar
= |publisher=Foobar
|work=Foobar
= |publisher=Foobar
and also
|journal=Foobar
= |series=Foobar
The matches should be exact, save for whitespace. Headbomb { t · c · p · b} 12:31, 25 June 2023 (UTC)
\{\{[Cc]it(?:ation|e )[^}]*\|\s*encyclopedia\s*=\s*((?:[^][|}]|\[\[[^]|]+(?:\|[^]]*)?\]\])+)\s*(?:\|[^}]*)?\|\s*publisher\s*=\s*\1\s*[|}]
. You may also want to repeat it with encyclopedia and publisher swapped.
Certes (
talk) 17:45, 25 June 2023 (UTC)
\{\{[Cc]it(?:ation|e )[^}]*\|\s*(encyclopedia|journal|magazine|newspaper|series|title|website|work|publisher)\s*=\s*((?:[^][|}]|\[\[[^]|]+(?:\|[^]]*)?\]\])+)\s*(?:\|[^}]*)?\|\s*(encyclopedia|journal|magazine|newspaper|series|title|website|work|publisher)\s*=\s*\1\s*[|}]
(?:\|[^}]*)?
before the second encyclopedia|etc group to (?:\|[^}]*)*
.You're probably already aware this'll match {{citation|title=Foo|series=Foo}}, i.e. repetition of any two of your left-hand column, without necessarily having a matching publisher parameter.Be aware this won't match {{citation|newspaper=[[The New York Times|New York Times]]|publisher=New York Times}}, nor anything that uses another template (or single close-brace) in between the "citation|cite whatever" and the second encyclopedia/publisher/whatever parameter. Worse (though less likely to be encountered), if one of citation's params is a template invocation that itself has an "encyclopedia/whatever=" param in that, the regex will treat it like it was the outer citation template's. Proper parsing fixes the first problem cheaply, and the second and third for free. —
Cryptic 07:14, 26 June 2023 (UTC)
(?:\|[^}]*)?
→ (?:\|[^}]*)*
though, I wouldn't have thought of that right away.
Headbomb {
t ·
c ·
p ·
b} 10:14, 26 June 2023 (UTC)
(?:\|[^}]*)?
also matches "|this|that". I suggested ? as it looked marginally faster, though I didn't test that.
Certes (
talk) 11:33, 26 June 2023 (UTC)
python myprogram < mydumpfile > myoutput
or the equivalent in your operating system?
Certes (
talk) 23:14, 30 June 2023 (UTC)python3 dup-publisher.py --help
to see the options. You'll have to install mwparserfromhell if you haven't already; from a command line, python3 -m pip install mwparserfromhell
. —
Cryptic 00:06, 1 July 2023 (UTC)sample output from enwiki-20230620-pages-articles16.xml-p20460153p20570392.bz2 - 25635 pages, 68 matches, 4:08 runtime
|
---|
J. M. G. Le Clézio bibliography {{cite web}}:publisher,title="THE NOBEL FOUNDATION 2008" Hot Issue (EP) {{cite web}}:publisher,work="Asianpopcorn" Tokyo Day Trip {{cite web}}:publisher,website="Nonesuch Records." Per- and polyfluoroalkyl substances {{cite web}}:publisher,website="National Collaborating Centre for Environmental Health" Jake Parker {{cite news}}:publisher,work="NPR" German University of Technology in Oman {{cite web}}:publisher,title="ACQUIN" Where the City Meets the Sea {{cite web}}:publisher,title="iTunes Store" Khurshed Makhmudov {{cite web}}:publisher,website="today.tj" Caritas Christi Health Care {{cite web}}:publisher,title="Caritas Christi Health Care" Audi R15 TDI {{cite web}}:publisher,title="audir15tdi.com" SS Automedon {{cite book}}:publisher,title="Lloyd's Register" Grace Hospital (Winnipeg) {{cite web}}:publisher,title="Grace Hospital" History of the Jews in Mumbai {{cite web}}:publisher,title="JTA" Hlaing Township {{cite web}}:publisher,website="International School of Myanmar" Glen Coffee {{Cite news}}:publisher,work="SaturdayDownSouth" Doug Yates {{cite web}}:publisher,website="motorsport.com" Maryse Joissains {{cite web}}:publisher,title="Maryse Joissains-Masini" List of parishes in the Diocese of Salford {{cite web}}:publisher,title="Holy Souls Parish" North Country Supervisory Union {{Cite news}}:publisher,work="The Chronicle" Irish Baroque Orchestra {{cite web}}:publisher,title="Temple Bar Cultural Trust" Riverfront Park (Spokane, Washington) {{cite web}}:publisher,title="North Idaho Centennial Trail Foundation" Rainbow Theatre {{cite web}}:publisher,website="Historic England" World War I in popular culture {{Cite web}}:publisher,title="H-Nationalism" Scremerston {{cite web}}:publisher,website="Archeological Research Services" Sport in Lima {{cite web}}:publisher,website="World Baseball Softball Confederation" Svend Bayer {{cite web}}:publisher,title="Pucker Gallery" Supreeme {{cite web}}:publisher,title="Record Collection" Mumford & Sons {{cite web}}:publisher,website="Never Enough Notes" Hecht Company Warehouse {{cite web}}:publisher,work="Douglas Development Corporation" God's Squad {{cite web}}:publisher,work="CathNews" LGBT rights in the 19th century {{cite web}}:publisher,title="Sodomylaws.org" John Najarian {{cite web}}:publisher,website="Mprnews" Tame Impala {{cite web}}:publisher,title="Dave Fridmann" Escot, Talaton {{cite news}}:publisher,work="The Telegraph" Yasser Khalil {{cite web}}:publisher,title="Yasser Khalil" American Artists' Congress {{cite web}}:publisher,website="The Annex Galleries" Castle (TV series) {{cite web}}:publisher,title="ABC Medianet" Agnete Saba {{cite web}}:publisher,website="vg.no" Mark Simmonds (saxophonist) {{cite web}}:publisher,work="Australian Rock Database" Frederikssund {{cite web}}:publisher,title="vikingespil.dk" Air sock {{cite web}}:publisher,website="Prihoda UK" Halfway Bush {{cite web}}:publisher,title="Halfway Bush School" Baháʼí Faith in Dominica {{cite web}}:publisher,work="thearda.com" Middle Cyclone {{cite web}}:publisher,work="Stereogum" William Wellington House {{cite web}}:publisher,work="Kirkbride Buildings" Alfred Sankoh {{cite web}}:publisher,website="dt.no" Hadleigh Farm {{cite web}}:publisher,title="Hadleigh Mountain Bike Club" Drumbeg, Sutherland {{cite web}}:publisher,website="The Gazetteer for Scotland" Tetra Tech {{cite web}}:publisher,website="Engineer News-Record" Fort Nonsense (Annapolis, Maryland) {{cite web}}:publisher,title="Maryland Historical Trust" Funhouse (song) {{cite web}}:publisher,title="Nielsen Music Poland" Denis Haines {{cite web}}:publisher,website="Classic Rock Radio" Government Office for Science {{cite news}}:publisher,title="Government Office for Science" Yazidis {{cite web}}:publisher,work="Statistics of Russia" Reactions to the September 11 attacks {{cite web}}:publisher,website="georgewbush-whitehouse.archives.gov" New Eyes for the Needy {{cite web}}:publisher,title="New Eyes for the Needy" Advertisements for Myself {{cite web}}:publisher,website="Pbs.org" Medal of Honor Aircraft {{cite web}}:publisher,website="Airport-Data.com" Jonathan Leavitt (publisher) {{cite journal}}:publisher,journal="[[American Society for Eighteenth-Century Studies]]" T-Mobile US {{cite web}}:publisher,title="BillShrink.com" Cannabis in New Zealand {{cite web}}:publisher,website="Say Nope To Dope" Nashville Charter Amendment 1 {{Cite web}}:publisher,title="Nashville English First" List of windmills in Leicestershire {{cite web}}:publisher,title="Redmile Archive" Mattapany-Sewall Archeological Site {{cite web}}:publisher,title="Maryland Historical Trust" GCE Advanced Level (United Kingdom) {{cite web}}:publisher,website="UK Parliament" Jared Bernstein {{cite web}}:publisher,title="Cnbc.com" Jeff Glixman {{cite web}}:publisher,title="Starcityrecording.com" Yang Kaiqi {{cite web}}:publisher,title="Chessinchina.Net" |
To add a "creation date" column to a list of usernames, I need to get the creation date from logging WHERE log_type = 'newusers', right? What's the best way to incorporate it into this query? LEFT JOIN? Subquery? Thanks for the help. – Novem Linguae ( talk) 03:35, 3 July 2023 (UTC)
I 'd like to find all edits made by
User:Qwerfjkl (bot) where the edit summary contains ([[Wikipedia:Bots/Requests for approval/Qwerfjkl (bot) 19|Task 19)]]
[
sic] and the edit isn't a page creation. Then I'd like just the revid and page title. —
Qwerfjkl
talk 15:20, 4 July 2023 (UTC)
Hi, I need a query that lists all non-redirecting mainspace content with one or more of "Alpha", "Beta", etc. (case insensitive) in the title, and is a member of Category:Chemistry to any depth. — CX Zoom[he/him] ( let's talk • { C• X}) 06:46, 9 July 2023 (UTC)
If someone has the time, I would appreciate a query of Category:Players of American football for entries with less than 2,500 bytes and which is either unsourced or sourced only to databases (e.g. "pro-football-reference.com", "sports-reference.com", "profootballarchives.com", "espn.com" "footballdb.com", or "nfl.com"). Cbl62 ( talk) 17:28, 11 July 2023 (UTC)
Hi! The five of us still toiling away at the cleanup enabled by query 72741 (ReferenceExpander edits from 1 January 2023 to 2 April 2023) are nearing-ish the completion of the first phase of our task. Fewer than 350 edits remain to be checked. We already have a second phase prepared (based on query 72745), but we are going to need results for two more date ranges.
Given the above, I'd like to request a query similar to 72741 and 72745, with the following alterations:
using [[en:w:User:BrandonXLF/ReferenceExpander
(need a more specific match than before, since editors active in the cleanup project have been mentioning ReferenceExpander
in edit summaries, which will create a very high false positive rate for the post 2023-04-01 period)I don't remember enough SQL to do this, especially regarding string matching and syntax for escaping special characters. Thanks! Folly Mox ( talk) 20:15, 14 July 2023 (UTC)
The results of my prior request were quite helpful in allowing the creation of Wikipedia:WikiProject National Football League/Football biography cleanup, a promising effort to improve the quality of sub-stub articles on gridiron football players. In an effort to ensure its completeness, would someone we willing to create a simpler list from Category:Gridiron football players and its subcategories for articles with less than 1,500 total bytes? Cbl62 ( talk) 20:48, 23 July 2023 (UTC)
Can someone generate a list of all pages with this exact wikitext in it? WHILE WITH NO ACTION: -->[[Category:Wikipedia usernames with possible policy issues|
I need it for
an approved AWB bot task. Traditional search option doesn't seem to work, thanks to the -->
part. Thanks! —
CX Zoom[he/him] (
let's talk • {
C•
X}) 06:01, 27 July 2023 (UTC)
Could I get, based on this list, the subset of articles that were edited by Citation bot from July 20 of this year onwards. Or alternatively, that were not edited by Citation bot from July 20 of this year onwards?
Headbomb { t · c · p · b} 20:56, 27 July 2023 (UTC)
I am interested to see how many articles run counter to MOS:BOLDLINK within Category:Green Bay Packers, which in my experience is very rampant across the project. I would expect a query to look like:
'''[[Wikilink]]'''
)
Thank you « Gonzo fan2007 (talk) @ 18:54, 28 July 2023 (UTC)
Could someone create a query that shows a user's edits ordered by the size of content they add? TIA. –– FormalDude (talk) 22:58, 3 August 2023 (UTC)
Is it possible to find the top 100 most used templates? I need it for bnwiki. Thanks. আফতাবুজ্জামান ( talk) 13:54, 17 August 2023 (UTC)
In order to support a discussion at
WP:VPT regarding the lack of a dedicated "Script:" namespace, I am interested in analyses of unique editors in different namespace pages compared to unique editors in user script pages (likely defined as User subpages having a .js
suffix). Perhaps one or more of these might help:
Because the goal is to discover "how many are sharing the coding" not the documentation, /doc subpages or other non-code subpages of templates, modules, or scripts (usually just the script page name, minus the .js
suffix) should probably be excluded. Draftspace is kind of a control, because coding isn't involved there. (The common.js files aren't really scripts, and not sure if they need to be excluded to avoid skewing the tallies; there are a lot of them.)
What I'm looking for, is some sort of measure of "editor script shyness", defined as a kind of inverse of "editor collaboration", in an attempt to answer the question, "Do editors tend to avoid collaborating on user script pages because they appear to 'belong' to another user, more so than they would collaborating on modules, templates, or drafts, which have their own namespace and thus don't appear to 'belong' to anybody, and if so, by how much?" A large "editor script shyness quotient" might be an argument in support of creating a "Script" namespace; a low ESSQ would argue against it. Not sure what would be the most useful to help shed light on that question, so feel free to vary the suggestions above or toss them out in favor of whatever seems most useful, or doable. Thanks, Mathglot ( talk) 02:20, 21 August 2023 (UTC)