
This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.

You may also be interested in the following:

  • If you are interested in writing SQL queries or helping out here, visit our tips page.
  • If you need to obtain a list of article titles that meet certain criteria, consider using PetScan ( user manual) or the default search. PetScan can generate lists of articles in subcategories, articles which transclude a given template, etc.
  • If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
  • For long-term review and checking, database reports are available.

Quarry does not have access to page content, so queries which require checking wikitext cannot be answered with Quarry. However, someone may be able to assist by using Quarry in another way (e.g. checking the table of category links rather than the "Category:" text) or suggest an alternative tool.
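For instance, a request like "articles whose wikitext contains [[Category:Living people]]" can usually be answered from the categorylinks table instead of the wikitext. A minimal sketch of that kind of substitution (the category here is only an example):

    SELECT page_title
    FROM page
    JOIN categorylinks ON cl_from = page_id
    WHERE page_namespace = 0            -- articles only
      AND cl_to = 'Living_people'       -- category title, without the "Category:" prefix
    LIMIT 100;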

ListeriaBot issue

I already have a query, albeit in SPARQL, but I've been running into this issue and I thought somebody here could help. Thanks! Assem Khidhr ( talk) 05:11, 12 May 2024 (UTC) reply

You might have better luck asking at d:WD:RAQ. — Cryptic 12:09, 16 May 2024 (UTC) reply

Help with expanding the self-transcluded-templates report

Hey. I was wondering if this is even possible and if someone here might be able to help with this. The query in User:Jonesey95/self-transcluded-templates collects templates that either have zero transclusions or are only transcluded on the template itself. I'd like to extend this to also the sub-pages of the template.

The new logic would be:

  • Get all templates with fewer than 6 transclusions (the 5 "valid" transclusions being the template's own page, its talk page, and its /sandbox, /doc and /testcases subpages).
  • Keep in the report all templates whose transclusions are only on the above pages.

Is this possible? Gonnym ( talk) 09:17, 29 May 2024 (UTC) reply

What you ask for in your first paragraph and in your bulleted list aren't the same things, or even terribly similar. Also, while we could literally do what you're asking for in your bulleted list - first generate a list of templates with 0-5 transclusions, and then cull that - it would be complex, quite likely very slow, and possibly wouldn't do quite what you want. Generally it's a better idea to ask for what you actually want, rather than how to get it, and that's especially true with SQL: since it's primarily a descriptive language, that's what the query-writer does too. (At least, until something goes wrong.)
What I'm guessing you're really after is to exclude transclusions on
  1. the template's own talk page, and
  2. either
    1. the template's own "/sandbox", "/doc", or "/testcases" subpages, or
    2. all of the template's own subpages.
Which of B1 or B2 are you after? — Cryptic 17:01, 29 May 2024 (UTC) reply
"What you ask for in your first paragraph and in your bulleted list aren't the same things" - I know... The first paragraph is what we currently have at User:Jonesey95/self-transcluded-templates. What I asked in the bulleted list is what I hoped we could modify it to.
Regarding your second question, B2. Exclude all of a template's own subpages (and talk page). So the final result would be a database report with templates that have "zero" transclusions (but might have actual transclusions on their own pages). Gonnym ( talk) 17:09, 29 May 2024 (UTC) reply
If I've understood correctly: as well as the existing condition tl_from <> page_id, tl_from also has to differ from the page ids of all subpages and any talk page. Certes ( talk) 17:23, 29 May 2024 (UTC) reply
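(For illustration, that condition might look roughly like this, assuming the current templatelinks/linktarget schema on the replicas; this is only a sketch, not the actual report query, and LIKE wildcards in titles would need escaping in a real run.)

    -- Sketch: templates with no transclusions outside their own pages
    SELECT tpl.page_title
    FROM page AS tpl
    WHERE tpl.page_namespace = 10
      AND NOT EXISTS (
        SELECT 1
        FROM linktarget
        JOIN templatelinks ON tl_target_id = lt_id
        JOIN page AS src   ON src.page_id = tl_from
        WHERE lt_namespace = 10
          AND lt_title = tpl.page_title
          AND tl_from <> tpl.page_id                                              -- not the template itself
          AND NOT (src.page_namespace = 11 AND src.page_title = tpl.page_title)   -- nor its talk page
          AND NOT (src.page_namespace IN (10, 11)
                   AND src.page_title LIKE CONCAT(tpl.page_title, '/%'))          -- nor its own subpages
      );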
My knowledge of SQL is limited so can't answer that. Gonnym ( talk) 17:34, 29 May 2024 (UTC) reply
(I meant the "I'd like to extend this to also the sub-pages of the template." sentence.)
Do these results look right? — Cryptic 17:49, 29 May 2024 (UTC) reply
No. The transclusion check is meant to reduce the number of valid transclusions to zero. So for example, if a template is transcluded only on itself, then it should be on the report. Template:Article length bar/L0 was removed, but it shouldn't have been, as it's unused other than on itself. Gonnym ( talk) 17:53, 29 May 2024 (UTC) reply
Yeah, just realized that I was only removing items from the list that already showed no transclusions. — Cryptic 17:55, 29 May 2024 (UTC) reply
I've created Template:Test SQL report to help test this. It should appear on the report. Gonnym ( talk) 17:59, 29 May 2024 (UTC) reply
Take 2. — Cryptic 20:12, 29 May 2024 (UTC) reply
Looks great! I've browsed it and everything looks good. I'll have a deeper dive into it and if I find something I'll let you know, but so far works as requested. Thank you! Gonnym ( talk) 20:14, 29 May 2024 (UTC) reply

How do the patrol/pagetriage-curation logs really work?

Does anyone really know, definitively, what the distinction between (log_type = 'patrol' AND log_action = 'patrol') and (log_type = 'pagetriage-curation' AND log_action IN ('reviewed', 'reviewed-article', 'reviewed-redirect')) is? In particular, why do you variously get one, the other, or both even for patrols by the same person within minutes of each other (example: both, pagetriage only, patrol only); exactly which on-wiki actions result in those cases; and is there a good way to distinguish genuine multiple reviews of the same title without double-counting the various log_type/log_action combinations (perhaps group by log_page)?

(Context is Wikipedia:Requests for adminship/Elli#General comments, starting with User:Hey man im josh's comment at 21:42, 31 May 2024; my best guesses for the answers are in the description of quarry:query/83443.) — Cryptic 00:05, 1 June 2024 (UTC) reply

Wikipedia:New pages patrol#Patrol versus review is my attempt to document this. It is confusing. I think the native patrol system has some unintuitive behaviors (maybe autoreviewing in certain cases, maybe being revision-based instead of page-based, not exactly sure, but I sense some differences) that I haven't quite wrapped my head around yet and that make it not correspond 1:1 to reviewing. Also, the original PageTriage authors tried to keep the patrol and review logs in sync but missed some cases such as page moves, so there are some bugs. phab:T346215, phab:T337356. Finally, PageTriage reviewing only occurs in the main namespace (and formerly the user namespace), whereas native patrolling with the "[Mark this page as patrolled]" link can occur in any namespace.
When evaluating an NPP's number of reviews, one should use the pagetriage-curation log exclusively because it corresponds to the NPP clicking the reviewed button in the Page Curation toolbar. The patrol log is less accurate. When the Page Curation toolbar is open, as it is for most NPPs unless they close it, the "[Mark this page as patrolled]" link is hidden.
Confusingly, XTools only displays the patrol count, not the reviewed count. quarry:query/70425 is an old query I wrote to get a count of a person's reviews only. I think I'll go file an XTools ticket to use "reviews" instead of "patrols" for enwiki. phab:T366397
One other thing about 'reviewed', 'reviewed-article', and 'reviewed-redirect': it used to be only 'reviewed', then we got rid of 'reviewed' and split it into 'reviewed-article' and 'reviewed-redirect' for easier querying. This split happened about a year ago. phab:T349048 – Novem Linguae ( talk) 05:08, 1 June 2024 (UTC) reply
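For anyone following along, a bare-bones sketch of counting one user's reviews from the pagetriage-curation log only ('Example' is a placeholder; COUNT(DISTINCT log_page) is one way to avoid double-counting repeat reviews of the same page, per the original question):

    SELECT COUNT(DISTINCT log_page) AS pages_reviewed
    FROM logging_userindex
    JOIN actor ON actor_id = log_actor
    WHERE actor_name = 'Example'        -- placeholder reviewer name
      AND log_type = 'pagetriage-curation'
      AND log_action IN ('reviewed', 'reviewed-article', 'reviewed-redirect');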

Page table slow query

Any way to speed up this page table query? CirrusSearch does it fast, but maybe Elasticsearch has its own indexing or something that helps. – Novem Linguae ( talk) 18:08, 15 June 2024 (UTC) reply

Yes, CirrusSearch has full-text indices on page titles, while the main db only has normal (prefix) ones.
Using the page_name_title index speeds this sort of query up considerably, since titles not matching '%.js' can be filtered out there (even though every title still needs to be looked at, the whole row won't need to be fetched). There's no way to force that, though, since we only have access to views. Sometimes you can fool the optimizer into the indexed search if you say which namespaces you want instead of the ones you don't, even if the list is long, something like WHERE page_namespace IN (0,1,3,4,5,6,7,9,10,11,12,13,14,15,100,101,118,119,710,711,828,829), but that doesn't work here. So you're going to have to partition it into several (well, lots of) queries that look small enough that the optimizer uses the index instead of a full table scan - WHERE page_namespace = 0 AND page_title < 'M', WHERE page_namespace = 0 AND page_title >= 'M', and so on for the other possible namespaces. — Cryptic 00:38, 22 June 2024 (UTC) reply
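To illustrate, one slice of such a partitioned version of the '.js' query might look like this (the split point 'M' is arbitrary; repeat with page_title >= 'M' and for each other namespace):

    SELECT page_namespace, page_title
    FROM page
    WHERE page_namespace = 0
      AND page_title < 'M'           -- keep each slice small enough that the index gets used
      AND page_title LIKE '%.js';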
An alternate approach: since you know that there won't be many titles ending in '.js' except in userspace and mediawiki:, find the titles yourself by grepping enwiki-20240601-all-titles.gz from a dump. Link to all of them from a page in your userspace, then you can use pagelinks to find them in a query and check page_content_model and page_restrictions. — Cryptic 04:19, 22 June 2024 (UTC) reply
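A rough sketch of that second step, assuming the link-index page is at User:Example/js-titles and that pagelinks still exposes pl_namespace/pl_title on the replicas (both are assumptions, not details from this thread):

    SELECT tgt.page_namespace, tgt.page_title, tgt.page_content_model, tgt.page_restrictions
    FROM page AS idx
    JOIN pagelinks   ON pl_from = idx.page_id
    JOIN page AS tgt ON tgt.page_namespace = pl_namespace AND tgt.page_title = pl_title
    WHERE idx.page_namespace = 2
      AND idx.page_title = 'Example/js-titles';   -- hypothetical userspace index page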

Links in common between two pages, with a twist

I realize PetScan can be used to show links that appear on two pages, but I have a need for it to show such a result sorted by the order in which they appear on the first page. Based on my limited understanding of the database, I don't think the order of the links on a page is tracked. But just in case I'm wrong, can someone show me a query to do this? Or suggest an alternative approach? Thanks. Stefen Towers among the rest! GabGruntwerk 23:49, 22 June 2024 (UTC) reply

No, that's not possible; Quarry doesn't contain information about the content of the page.
Could you give some additional information about what you want this for? There are several alternative approaches that would work, but I need to know a little more about the use case. BilledMammal ( talk) 23:53, 22 June 2024 (UTC) reply
I thought Quarry knows the links on a page, so that's where I was coming from. Anyway, I have a page that shows links sorted by popularity (views, descending) and another page that shows links to articles having old issues. I'd like an intersection of the links between them, in order of popularity (order of appearance on first page). Stefen Towers among the rest! GabGruntwerk 00:03, 23 June 2024 (UTC) reply
It can see which pages a given page links to (or is linked from), but doesn't have any information on what order it's done in - that would need the page content.
Is this a one-time query, or will you need it repeated? — Cryptic 00:08, 23 June 2024 (UTC) reply
I'd like for it to be repeatable as the underlying pages will change. I'm fine with having to run it manually. Stefen Towers among the rest! GabGruntwerk 00:10, 23 June 2024 (UTC) reply
( edit conflict) It does, but it doesn't have information beyond that, such as the text of the page.
I'm not aware of any tools that can help you with that, but I threw together the information you wanted using a quick and dirty script:
Extended content
  1. Deion Sanders
  2. Ford Explorer
  3. Maurice Lucas
  4. Josh Hamilton
  5. Fort Knox
  6. Diane Sawyer
  7. Aroldis Chapman
  8. Secretariat (film)
  9. UPS Airlines
  10. Joe Torre
  11. Presbyterian Church (USA)
  12. Damaris Phillips
  13. Jim Beam
  14. Louis Brandeis
  15. Jack McCall
  16. My Morning Jacket
  17. Carlton Fisk
  18. David Pajo
  19. Adam Dunn
  20. Kentucky Colonels
  21. Humana
  22. Earl Weaver
  23. Pope Lick Monster
  24. Meriwether Lewis Clark Jr.
  25. John Marshall Harlan
  26. B. Brian Blair
  27. Frank Ramsey (basketball)
  28. Interstate 71
  29. A. J. Foyt IV
  30. Oldham County, Kentucky
  31. Susanne Zenor
  32. Andy Van Slyke
  33. Harvey Fuqua
  34. Dan Uggla
  35. Homer Bailey
  36. Louisville Cardinals
  37. Aristides (horse)
  38. Playa (band)
  39. Greg Page (boxer)
  40. Terry Pendleton
  41. Kentucky Derby Festival
  42. Louisville Metro Police Department
  43. 5th Cavalry Regiment
  44. Taylor Nichols
  45. David Grissom
  46. Valley of the Drums
  47. Ward Hill Lamon
  48. Jefferson C. Davis
  49. Robert Nardelli
  50. Jim Caldwell (American football)
  51. John Cowan
  52. Mildred J. Hill
  53. Johnny Edwards (musician)
  54. Lance Burton
  55. IWA Mid-South
  56. Mickie Knuckles
  57. Run for the Roses (song)
  58. Taeler Hendrix
  59. Sovereign Grace Churches
  60. Fabian Ver
  61. Tori Hall
  62. Larry Collmus
  63. New Grass Revival
  64. Rebel (bourbon)
  65. 2011 Kentucky Derby
  66. Bullitt County, Kentucky
  67. Catherine McCord
  68. Shelley Duncan
  69. Dan Boyle (ice hockey)
  70. History of Louisville, Kentucky
  71. Rudy Rucker
  72. Big Four Bridge
  73. Optimist International
  74. Larnelle Harris
  75. C. J. Mahaney
  76. Thunder Over Louisville
  77. Belle of Louisville
  78. Bertha Palmer
  79. Sports in Louisville, Kentucky
  80. Gary Matthews Jr.
  81. George Devol
  82. John Yarmuth
  83. Travis Stone
  84. June of 44
  85. Ted Washington
  86. Larry Elmore
  87. Parents Involved in Community Schools v. Seattle School District No. 1
  88. Interstate 64 in Kentucky
  89. Corey Patterson
  90. Stith Thompson
  91. Roman Catholic Archdiocese of Louisville
  92. Louisville Zoo
  93. Boyce Watkins
  94. James Speed
  95. Jefferson County Public Schools (Kentucky)
BilledMammal ( talk) 00:23, 23 June 2024 (UTC) reply
I see you want something that can run repeatedly. I don't have time right now to put something together for you, but if Cryptic doesn't come up with something I'll do it sometime in the next couple of weeks - if I don't, feel free to remind me on my talk page. BilledMammal ( talk) 00:24, 23 June 2024 (UTC) reply
Thanks for the list - that's good for a start. Would you mind giving me a few clues on your approach for the script you did? It might snap me into figuring it out. Also, I have thought of using a spreadsheet or text compare software, but was hoping for an on-wiki or otherwise online approach. Stefen Towers among the rest! GabGruntwerk 00:40, 23 June 2024 (UTC) reply
A few regex operations to get just the links in order, and then a basic Python script that works down the first list and outputs each item that also exists on the second. Unfortunately, nothing online ATM. BilledMammal ( talk) 00:49, 23 June 2024 (UTC) reply
Thanks for the tips! Stefen Towers among the rest! GabGruntwerk 00:50, 23 June 2024 (UTC) reply
Any pure-sql-against-the-wmf-databases approach would have to start with something similar to either "Make a page in your userspace with redlinks to 1!Tom Cruise, 2!Muhammad Ali, ... 1000!Frank Torre" or "manually edit this stupidly long query that includes all that data" (like how quarry:query/81948 includes the namespace names, but with a thousand items instead of 30). — Cryptic 01:04, 23 June 2024 (UTC) reply
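The "stupidly long query" variant would look roughly like this, with the first page's links baked in as an ordered list (the titles and the second page are placeholders, the real thing would carry a thousand such rows, and it assumes pagelinks still exposes pl_namespace/pl_title):

    WITH wanted (ord, title) AS (
      SELECT 1, 'Tom_Cruise'      UNION ALL
      SELECT 2, 'Muhammad_Ali'    UNION ALL
      SELECT 1000, 'Frank_Torre'           -- ...and so on for every link on the first page, in order
    )
    SELECT ord, title
    FROM wanted
    JOIN page AS src ON src.page_namespace = 2
                    AND src.page_title = 'Example/old-issues-list'   -- placeholder for the second page
    JOIN pagelinks   ON pl_from = src.page_id
                    AND pl_namespace = 0
                    AND pl_title = title
    ORDER BY ord;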
Indeed, those don't seem like tenable approaches. But this discussion has helped me figure out a solution, not optimal but workable:
  1. Copy the popular-articles wikitext into a text editor and break it down into a flat list using a macro I constructed with regex and other tricks.
  2. Use PetScan to create a flat list of articles with old issues (this didn't have to be in any particular order).
  3. Insert both flat lists into their own columns in a spreadsheet, find matches of first-column entries in the second column, then filter on the matches, and voila.
Stefen Towers among the rest! GabGruntwerk 04:18, 23 June 2024 (UTC) reply