|
||||
This page has archives. Sections older than 14 days may be automatically archived by Lowercase sigmabot III when more than 4 sections are present. |
This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.
You may also be interested in the following:
Quarry does not have access to page content, so queries which require checking wikitext cannot be answered with Quarry. However, someone may be able to assist by using Quarry in another way (e.g. checking the table of category links rather than the "Category:" text) or suggest an alternative tool.
I kindly requested a query of Category:Green Bay Packers Hall of Fame to produce a wikitable with three columns:
The table would look something like (but for all 160+ articles in the cat):
Name | Size (bytes) | Rating |
---|---|---|
Bob Monnett | 2,347 | Stub |
This is to help me prioritize important articles under WP:PACKERS. Thank you! « Gonzo fan2007 (talk) @ 18:39, 21 March 2024 (UTC)
|sql=
parameter. If you're copying the SQL into Quarry or some other SQL client that doesn't parse wikitext, you'll need to change {{!}} to | manually.
Certes (
talk) 10:34, 22 March 2024 (UTC)
I'm not exactly requesting a query, but help with one we already have:
Wikipedia:WikiProject_Women_in_Red/Redlinks/C2.
This version ran in 37 seconds. Some time between 20 and 25 February, it stopped working, instead throwing SQL Error: ER_DATA_TOO_LONG: Data too long for column 'subcat' at row 2
. This is because it initially populates a table with one row, a fairly short category name, then attempts to add longer names. This used to work but something in the database or engine must have changed, as the same SQL now fails. I've fixed it by seeding the table with a long dummy value which doesn't affect the results, but
that version takes nearly two hours to run. Does anyone know what changed? Is part of the forthcoming categorylinks "upgrade" causing the failure and wrecking performance? (A simple search for a link to details of that change timed out too – perhaps there is a more general temporary performance problem but VPT has nothing, and performance was just as bad last night.)
Certes (
talk) 15:24, 24 March 2024 (UTC)
WITH names AS ( /* the long cte */ ) SELECT COUNT(*) FROM names JOIN pagelinks ON pl_namespace = 0 AND pl_title = name GROUP BY name
, but every variant of "pl_title starts with name" I could come up with - name=SUBSTRING_INDEX like you used, pl_title LIKE CONCAT(name, '%'), LEFT(pl_title, LENGTH(name)) = name, and so on - was unindexed. Which is why your query taken as a whole is looking through all of page first, when I suspect it was able to make use of the Cn..D range limit before. Irritatingly, putting the CTE results directly in the query instead like
quarry:query/81913 does work, but I don't know a workaround to force use of the index when we don't have access to the real pagelinks table, just a view. (And no, I don't know what changed.)Something like SELECT CAST('Feminine_given_names' AS VARCHAR(256)), 0
would be cleaner than your long 'This category does not exist but...' dummy value, but obviously that's not the real problem here.What, precisely, are you trying to do? Maybe we can find another way to do it? —
Cryptic 17:50, 10 April 2024 (UTC)
+------+-----------------+---------------+--------+------------------------------------------------------------------------------+-----------------+---------+---------------------------------+----------+-----------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-----------------+---------------+--------+------------------------------------------------------------------------------+-----------------+---------+---------------------------------+----------+-----------------------------------------------------------+ | 1 | PRIMARY | page | index | PRIMARY,page_name_title | page_name_title | 261 | NULL | 57680411 | Using where; Using index; Using temporary; Using filesort | | 1 | PRIMARY | pagelinks | ref | PRIMARY,pl_namespace,pl_backlinks_namespace,pl_backlinks_namespace_target_id | PRIMARY | 8 | enwiki.page.page_id,const | 4 | Using where | | 1 | PRIMARY | <derived5> | ref | key0 | key0 | 258 | func | 1 | Using where | | 1 | PRIMARY | page | eq_ref | page_name_title | page_name_title | 261 | const,enwiki.pagelinks.pl_title | 1 | Using index | | 5 | DERIVED | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | Using where; Using temporary | | 5 | DERIVED | categorylinks | ref | PRIMARY,cl_timestamp,cl_sortkey | cl_sortkey | 258 | deepcat.subcat,const | 19 | Using where; Using index | | 5 | DERIVED | page | eq_ref | PRIMARY,page_name_title | PRIMARY | 4 | enwiki.categorylinks.cl_from | 1 | Using where | | 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | | 3 | UNION | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | | 4 | RECURSIVE UNION | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | Using where | | 4 | RECURSIVE UNION | categorylinks | ref | PRIMARY,cl_timestamp,cl_sortkey | cl_sortkey | 258 | deepcat.subcat,const | 19 | Using where; Using index | | 4 | RECURSIVE UNION | page | eq_ref | PRIMARY,page_name_title | PRIMARY | 4 | enwiki.categorylinks.cl_from | 1 | Using where | | NULL | UNION RESULT | <union2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | | +------+-----------------+---------------+--------+------------------------------------------------------------------------------+-----------------+---------+---------------------------------+----------+-----------------------------------------------------------+
and the same for WHERE pl_from_namespace = 0:
+------+-----------------+---------------+--------+------------------------------------------------------------------------------+------------------------+---------+---------------------------------+----------+-----------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-----------------+---------------+--------+------------------------------------------------------------------------------+------------------------+---------+---------------------------------+----------+-----------------------------------------------------------+ | 1 | PRIMARY | pagelinks | range | PRIMARY,pl_namespace,pl_backlinks_namespace,pl_backlinks_namespace_target_id | pl_backlinks_namespace | 265 | NULL | 50732398 | Using where; Using index; Using temporary; Using filesort | | 1 | PRIMARY | <derived5> | ref | key0 | key0 | 258 | func | 1 | Using where | | 1 | PRIMARY | page | eq_ref | page_name_title | page_name_title | 261 | const,enwiki.pagelinks.pl_title | 1 | Using index | | 1 | PRIMARY | page | eq_ref | PRIMARY,page_name_title | PRIMARY | 4 | enwiki.pagelinks.pl_from | 1 | Using where | | 5 | DERIVED | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | Using where; Using temporary | | 5 | DERIVED | categorylinks | ref | PRIMARY,cl_timestamp,cl_sortkey | cl_sortkey | 258 | deepcat.subcat,const | 19 | Using where; Using index | | 5 | DERIVED | page | eq_ref | PRIMARY,page_name_title | PRIMARY | 4 | enwiki.categorylinks.cl_from | 1 | Using where | | 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | | 3 | UNION | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | | 4 | RECURSIVE UNION | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | Using where | | 4 | RECURSIVE UNION | categorylinks | ref | PRIMARY,cl_timestamp,cl_sortkey | cl_sortkey | 258 | deepcat.subcat,const | 19 | Using where; Using index | | 4 | RECURSIVE UNION | page | eq_ref | PRIMARY,page_name_title | PRIMARY | 4 | enwiki.categorylinks.cl_from | 1 | Using where | | NULL | UNION RESULT | <union2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | | +------+-----------------+---------------+--------+------------------------------------------------------------------------------+------------------------+---------+---------------------------------+----------+-----------------------------------------------------------+
The first doesn't look like it'll look through so many more rows, but it's definitely taking forever, and the table order makes less sense. — Cryptic 20:03, 10 April 2024 (UTC)
@ Cryptic: Just as we got this working, the WMF is about to improve the pagelinks table by removing the useful columns. I've tried updating a query but of course it now times out. Any further help would be very welcome. Certes ( talk) 20:45, 18 April 2024 (UTC)
quarry:query/81887. The part of my query that limits things to the last year (log_id > subquery) is running too slow and is causing it to timeout. Any suggestions to speed it up? The query works fine when I hard-code a log_id like I did in quarry:query/81844. But I'd prefer the query to be more dynamic than just hard-coding a log_id. Thanks. – Novem Linguae ( talk) 13:55, 9 April 2024 (UTC)
logging_logindex
and actor_logging
sped things up a lot. As a side note, since text fields in the replicas are either VARBINARY or BLOB type, the LIKE clause is in fact case-sensitive. If you first convert the fields to UTF-8 then LIKE works as expected. Also, unless you don't want the oldest results to appear first, there's no reason to sort by the log_id
ascending since that's already the clustered index of the table.
Uhai (
talk) 03:25, 10 April 2024 (UTC)
I attempted this in quarry:query/82243, but I may well have missed things that would make it more efficient. I’m wondering if the time it takes to run is purely a symptom of having to look through two very large categories, or if anyone’s aware of things that might improve it. All the best :) — a smart kitten[ meow 18:14, 24 April 2024 (UTC)
I'm wondering if it would be possible to check for articles that use both the Parent and Owner field in infobox company. The reason being, I have noticed that many articles incorrectly use the infobox company template to show higher-level ownership. This is not allowed per the infobox documentation. I have fixed this on most pages. However, there are many that I cannot find. Wii nter U 22:45, 27 April 2024 (UTC)
|
||||
This page has archives. Sections older than 14 days may be automatically archived by Lowercase sigmabot III when more than 4 sections are present. |
This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.
You may also be interested in the following:
Quarry does not have access to page content, so queries which require checking wikitext cannot be answered with Quarry. However, someone may be able to assist by using Quarry in another way (e.g. checking the table of category links rather than the "Category:" text) or suggest an alternative tool.
I kindly requested a query of Category:Green Bay Packers Hall of Fame to produce a wikitable with three columns:
The table would look something like (but for all 160+ articles in the cat):
Name | Size (bytes) | Rating |
---|---|---|
Bob Monnett | 2,347 | Stub |
This is to help me prioritize important articles under WP:PACKERS. Thank you! « Gonzo fan2007 (talk) @ 18:39, 21 March 2024 (UTC)
|sql=
parameter. If you're copying the SQL into Quarry or some other SQL client that doesn't parse wikitext, you'll need to change {{!}} to | manually.
Certes (
talk) 10:34, 22 March 2024 (UTC)
I'm not exactly requesting a query, but help with one we already have:
Wikipedia:WikiProject_Women_in_Red/Redlinks/C2.
This version ran in 37 seconds. Some time between 20 and 25 February, it stopped working, instead throwing SQL Error: ER_DATA_TOO_LONG: Data too long for column 'subcat' at row 2
. This is because it initially populates a table with one row, a fairly short category name, then attempts to add longer names. This used to work but something in the database or engine must have changed, as the same SQL now fails. I've fixed it by seeding the table with a long dummy value which doesn't affect the results, but
that version takes nearly two hours to run. Does anyone know what changed? Is part of the forthcoming categorylinks "upgrade" causing the failure and wrecking performance? (A simple search for a link to details of that change timed out too – perhaps there is a more general temporary performance problem but VPT has nothing, and performance was just as bad last night.)
Certes (
talk) 15:24, 24 March 2024 (UTC)
WITH names AS ( /* the long cte */ ) SELECT COUNT(*) FROM names JOIN pagelinks ON pl_namespace = 0 AND pl_title = name GROUP BY name
, but every variant of "pl_title starts with name" I could come up with - name=SUBSTRING_INDEX like you used, pl_title LIKE CONCAT(name, '%'), LEFT(pl_title, LENGTH(name)) = name, and so on - was unindexed. Which is why your query taken as a whole is looking through all of page first, when I suspect it was able to make use of the Cn..D range limit before. Irritatingly, putting the CTE results directly in the query instead like
quarry:query/81913 does work, but I don't know a workaround to force use of the index when we don't have access to the real pagelinks table, just a view. (And no, I don't know what changed.)Something like SELECT CAST('Feminine_given_names' AS VARCHAR(256)), 0
would be cleaner than your long 'This category does not exist but...' dummy value, but obviously that's not the real problem here.What, precisely, are you trying to do? Maybe we can find another way to do it? —
Cryptic 17:50, 10 April 2024 (UTC)
+------+-----------------+---------------+--------+------------------------------------------------------------------------------+-----------------+---------+---------------------------------+----------+-----------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-----------------+---------------+--------+------------------------------------------------------------------------------+-----------------+---------+---------------------------------+----------+-----------------------------------------------------------+ | 1 | PRIMARY | page | index | PRIMARY,page_name_title | page_name_title | 261 | NULL | 57680411 | Using where; Using index; Using temporary; Using filesort | | 1 | PRIMARY | pagelinks | ref | PRIMARY,pl_namespace,pl_backlinks_namespace,pl_backlinks_namespace_target_id | PRIMARY | 8 | enwiki.page.page_id,const | 4 | Using where | | 1 | PRIMARY | <derived5> | ref | key0 | key0 | 258 | func | 1 | Using where | | 1 | PRIMARY | page | eq_ref | page_name_title | page_name_title | 261 | const,enwiki.pagelinks.pl_title | 1 | Using index | | 5 | DERIVED | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | Using where; Using temporary | | 5 | DERIVED | categorylinks | ref | PRIMARY,cl_timestamp,cl_sortkey | cl_sortkey | 258 | deepcat.subcat,const | 19 | Using where; Using index | | 5 | DERIVED | page | eq_ref | PRIMARY,page_name_title | PRIMARY | 4 | enwiki.categorylinks.cl_from | 1 | Using where | | 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | | 3 | UNION | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | | 4 | RECURSIVE UNION | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | Using where | | 4 | RECURSIVE UNION | categorylinks | ref | PRIMARY,cl_timestamp,cl_sortkey | cl_sortkey | 258 | deepcat.subcat,const | 19 | Using where; Using index | | 4 | RECURSIVE UNION | page | eq_ref | PRIMARY,page_name_title | PRIMARY | 4 | enwiki.categorylinks.cl_from | 1 | Using where | | NULL | UNION RESULT | <union2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | | +------+-----------------+---------------+--------+------------------------------------------------------------------------------+-----------------+---------+---------------------------------+----------+-----------------------------------------------------------+
and the same for WHERE pl_from_namespace = 0:
+------+-----------------+---------------+--------+------------------------------------------------------------------------------+------------------------+---------+---------------------------------+----------+-----------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-----------------+---------------+--------+------------------------------------------------------------------------------+------------------------+---------+---------------------------------+----------+-----------------------------------------------------------+ | 1 | PRIMARY | pagelinks | range | PRIMARY,pl_namespace,pl_backlinks_namespace,pl_backlinks_namespace_target_id | pl_backlinks_namespace | 265 | NULL | 50732398 | Using where; Using index; Using temporary; Using filesort | | 1 | PRIMARY | <derived5> | ref | key0 | key0 | 258 | func | 1 | Using where | | 1 | PRIMARY | page | eq_ref | page_name_title | page_name_title | 261 | const,enwiki.pagelinks.pl_title | 1 | Using index | | 1 | PRIMARY | page | eq_ref | PRIMARY,page_name_title | PRIMARY | 4 | enwiki.pagelinks.pl_from | 1 | Using where | | 5 | DERIVED | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | Using where; Using temporary | | 5 | DERIVED | categorylinks | ref | PRIMARY,cl_timestamp,cl_sortkey | cl_sortkey | 258 | deepcat.subcat,const | 19 | Using where; Using index | | 5 | DERIVED | page | eq_ref | PRIMARY,page_name_title | PRIMARY | 4 | enwiki.categorylinks.cl_from | 1 | Using where | | 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | | 3 | UNION | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | | 4 | RECURSIVE UNION | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | Using where | | 4 | RECURSIVE UNION | categorylinks | ref | PRIMARY,cl_timestamp,cl_sortkey | cl_sortkey | 258 | deepcat.subcat,const | 19 | Using where; Using index | | 4 | RECURSIVE UNION | page | eq_ref | PRIMARY,page_name_title | PRIMARY | 4 | enwiki.categorylinks.cl_from | 1 | Using where | | NULL | UNION RESULT | <union2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | | +------+-----------------+---------------+--------+------------------------------------------------------------------------------+------------------------+---------+---------------------------------+----------+-----------------------------------------------------------+
The first doesn't look like it'll look through so many more rows, but it's definitely taking forever, and the table order makes less sense. — Cryptic 20:03, 10 April 2024 (UTC)
@ Cryptic: Just as we got this working, the WMF is about to improve the pagelinks table by removing the useful columns. I've tried updating a query but of course it now times out. Any further help would be very welcome. Certes ( talk) 20:45, 18 April 2024 (UTC)
quarry:query/81887. The part of my query that limits things to the last year (log_id > subquery) is running too slow and is causing it to timeout. Any suggestions to speed it up? The query works fine when I hard-code a log_id like I did in quarry:query/81844. But I'd prefer the query to be more dynamic than just hard-coding a log_id. Thanks. – Novem Linguae ( talk) 13:55, 9 April 2024 (UTC)
logging_logindex
and actor_logging
sped things up a lot. As a side note, since text fields in the replicas are either VARBINARY or BLOB type, the LIKE clause is in fact case-sensitive. If you first convert the fields to UTF-8 then LIKE works as expected. Also, unless you don't want the oldest results to appear first, there's no reason to sort by the log_id
ascending since that's already the clustered index of the table.
Uhai (
talk) 03:25, 10 April 2024 (UTC)
I attempted this in quarry:query/82243, but I may well have missed things that would make it more efficient. I’m wondering if the time it takes to run is purely a symptom of having to look through two very large categories, or if anyone’s aware of things that might improve it. All the best :) — a smart kitten[ meow 18:14, 24 April 2024 (UTC)
I'm wondering if it would be possible to check for articles that use both the Parent and Owner field in infobox company. The reason being, I have noticed that many articles incorrectly use the infobox company template to show higher-level ownership. This is not allowed per the infobox documentation. I have fixed this on most pages. However, there are many that I cannot find. Wii nter U 22:45, 27 April 2024 (UTC)