This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page. |
Now that we've got Last edit and Last user columns for WP:Database reports/Empty categories, how hard would it be to add the same features to WP:Database reports/Uncategorized categories? - Stepheng3 ( talk) 21:36, 5 November 2008 (UTC)
I've begun working through the Wikipedia:Database reports/Empty categories, and I have a few suggestions:
I hope these ideas inspire someone to upgrade this valuable report. Thanks in advance, - Stepheng3 ( talk) 07:35, 30 October 2008 (UTC)
(unindent) w00t! Looks like I've got it working now. :D I'm going to hold off on #6 while discussion continues (though it is trivial to implement once there's a decision about what words / phrases to filter out). -- MZMcBride ( talk) 20:14, 2 November 2008 (UTC)
We could use some review of the long-term semi-protection on many articles. We could run the report daily, or even weekly. Thanks, HiDrNick! 15:47, 12 November 2008 (UTC)
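A minimal sketch of the kind of query such a report could start from (untested; assumes the Toolserver replica's page and page_restrictions tables, and that indefinite protections store pr_expiry = 'infinity'):

```sql
-- Articles under indefinite semi-protection (edit = autoconfirmed, no expiry)
SELECT page_title
FROM page
JOIN page_restrictions
  ON pr_page = page_id
WHERE page_namespace = 0
  AND pr_type = 'edit'
  AND pr_level = 'autoconfirmed'
  AND pr_expiry = 'infinity';
```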
If these reports already exist, or if these pages are tagged/deleted by bots, that would be nice to know. If not, let's start with monthly reports and see how much cleanup work there is. - Stepheng3 ( talk) 22:54, 31 October 2008 (UTC)
Also, you may be interested in the /Magic subpage in my userspace where I list all of the things you've listed here and more. Cheers. -- MZMcBride ( talk) 23:23, 31 October 2008 (UTC)
How about a report showing categories that contain themselves as members? I've run into this situation dozens of times in the past year. Some editor (who is probably a bit unclear on how categories are intended to work) creates a category Category:Foo with the text [[Category:Foo]] in it. Or someone copying categories from the main article forgets to delete the category in question. The resulting category is categorized, non-zero length, and non-empty. I think there might be some cleanup work to do in this area.
And if you get ambitious, maybe you could write a report to identify cycles in the category hierarchy.
A third alternative might be to modify Wikipedia:Database reports/Uncategorized categories so that the category in question is excluded from being counted as a parent. This would be the least thorough approach, though probably the easiest. - Stepheng3 ( talk) 19:47, 1 November 2008 (UTC)
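The self-containing case can be checked with a single join; a minimal sketch (untested, assuming the standard MediaWiki schema, where cl_to stores the bare category title without the namespace prefix):

```sql
-- Category pages that appear in their own member list
SELECT page_title
FROM page
JOIN categorylinks
  ON cl_from = page_id
WHERE page_namespace = 14  -- Category:
  AND cl_to = page_title;
```

Detecting longer cycles in the category graph would need repeated self-joins or an external graph traversal, since the replica's MySQL of the era had no recursive queries.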
Well, the general trick is to wrap the [[Category:Foo]] links in a {{{category}}} parameter and then define it as null where you don't want to have the category show up. So, for example:
Template:
Foo {{{1}}} ''{{{2}}}''<includeonly>[[Category:Bar]]</includeonly><noinclude> Docs... </noinclude>
That would be changed to:
Foo {{{1}}} ''{{{2}}}''<includeonly>{{{category|[[Category:Bar]]}}}</includeonly><noinclude> Docs... </noinclude>
Transclusions of the template need to be changed from:
{{template}}
To:
{{template|category=}}
Essentially, you're defining a category parameter (and defaulting it to the regular category) while then setting it as null for specific pages. -- MZMcBride ( talk) 04:26, 4 November 2008 (UTC)
<includeonly>{{#ifeq:{{ns:2}}|{{NAMESPACE}}|[[Category:FOO|{{PAGENAME}}]]|}}</includeonly>
Why does Category:Slovenia national football team templates keep showing up in this report? - Stepheng3 ( talk) 01:28, 22 December 2008 (UTC)
mysql> SELECT * FROM categorylinks WHERE cl_to="Slovenia_national_football_team_templates";
+----------+-------------------------------------------+-----------------------------------+---------------------+
| cl_from | cl_to | cl_sortkey | cl_timestamp |
+----------+-------------------------------------------+-----------------------------------+---------------------+
| 7314367 | Slovenia_national_football_team_templates | Slovenia Squad 2002 World Cup | 2006-10-06 04:20:14 |
| 17894927 | Slovenia_national_football_team_templates | Template:Slovenia Squad 2000 Euro | 2008-06-11 19:27:50 |
+----------+-------------------------------------------+-----------------------------------+---------------------+
2 rows in set (0.02 sec)
That looks right to me. The Toolserver is still a bit lagged (about four days), so I'm inclined to wait until the next update to see if this is resolved. -- MZMcBride ( talk) 04:53, 5 January 2009 (UTC)
I notice that none of the reports were updated on Wednesday or today (Thursday) -- not even the ones marked "daily". Is something broken? - Stepheng3 ( talk) 18:16, 1 January 2009 (UTC)
Is there a way to generate these with more current data? The galleries display images included in {{ infobox lake}} as "image_lake". The name of the lake displayed comes from "lake_name" and links back to the lake's article. I break it down manually into pages with 200 images.
Currently I use Templatetiger. Its data is updated once or twice a year. Its January 2009 update is with data from 2008-10-08. -- User:Docu
So I'd need to read Infobox lake and then each article directly from the site? Maybe I can get this to work with pywikipediabot. BTW is there one that reads template fields? -- User:Docu
After looking through some of the tools, with a few changes, archivebot.py might do it. -- User:Docu
This report appears to be broken. Is anyone looking into this? - Stepheng3 ( talk) 04:49, 24 January 2009 (UTC)
I've been unsuccessful at trying to convince rst20xx to remove this exemption he added to the C1 criteria. Could you filter these categories out of the empty categories report? Here are a couple examples:
While I've got your attention, I'm reiterating my request for categories whose names contain the following phrases to be filtered out, or at least segregated in some way from the main list of empty categories:
In my experience, categories of this sort are "project categories that by their nature may become empty on occasion". -- Stepheng3 ( talk) 21:00, 12 February 2009 (UTC)
Over at Wikipedia:Improving referencing efforts we're trying to improve article referencing. It would be helpful if we had statistics on the proportion of articles with references, average references per article, etc. Ideally, I would like to compare a year ago with this year, or even see monthly stats if that would be possible. Thanks. - Peregrine Fisher ( talk) ( contribs) 22:03, 15 December 2008 (UTC)
Pages with a protection level [edit=none], [move=autoconfirmed], or with an [edit=sysop] and not [move=sysop]. As it confuses bots and protection templates (through {{PROTECTIONLEVEL}}). Thanks, Cenarium ( talk) 14:19, 13 February 2009 (UTC)
Someone should probably check my logic here, but this should be what you want.
mysql> SELECT
-> CONCAT('# [[',ns_name,':',page_title,']]')
-> FROM page_restrictions AS pr1
-> JOIN page
-> ON pr1.pr_page = page_id
-> JOIN toolserver.namespace
-> ON page_namespace = ns_id
-> AND dbname = "enwiki_p"
-> WHERE pr1.pr_type = "move"
-> AND pr1.pr_level = "autoconfirmed"
-> AND NOT EXISTS (SELECT
-> 1
-> FROM page_restrictions AS pr2
-> WHERE pr1.pr_page = pr2.pr_page
-> AND pr2.pr_type != "move");
+----------------------------------------------+
| CONCAT('# [[',ns_name,':',page_title,']]') |
+----------------------------------------------+
| # [[:Lysergic_acid_diethylamide]] |
| # [[Talk:Serbo-Croatian_language/Archive_2]] |
| # [[:Scrubs_(TV_series)]] |
| # [[:Interscope_Records]] |
| # [[:Lost]] |
| # [[User talk:Anonymous_editor]] |
| # [[User talk:212.85.18.7]] |
| # [[Template:Intelligent_Design]] |
| # [[Portal:Taiwan]] |
| # [[User:Titoxd/Sandbox2]] |
| # [[Template:Essay]] |
| # [[:DADVSI]] |
| # [[User talk:210.217.33.187]] |
| # [[User talk:Hillman]] |
| # [[:George_Augustus_Stallings]] |
| # [[User talk:Jersyko/archive10]] |
| # [[User talk:Kafka_Liz]] |
| # [[:Glen_A._Wilson_High_School]] |
| # [[User talk:Gavin.collins/Archive_6]] |
| # [[User talk:Peter_Damian_(old)]] |
| # [[User talk:69.180.82.43]] |
| # [[User talk:EEMIV/Archive10]] |
| # [[User talk:EEMIV]] |
| # [[Talk:Serbo-Croatian_language]] |
+----------------------------------------------+
24 rows in set (0.18 sec)
mysql> SELECT
-> CONCAT('# [[',ns_name,':',page_title,']]')
-> FROM page_restrictions AS pr1
-> JOIN page
-> ON pr1.pr_page = page_id
-> JOIN toolserver.namespace
-> ON page_namespace = ns_id
-> AND dbname = "enwiki_p"
-> WHERE pr1.pr_type = "edit"
-> AND pr1.pr_level = "sysop"
-> AND NOT EXISTS (SELECT
-> 1
-> FROM page_restrictions AS pr2
-> WHERE pr1.pr_page = pr2.pr_page
-> AND pr2.pr_type = "move"
-> AND pr2.pr_level = "sysop");
+--------------------------------------------+
| CONCAT('# [[',ns_name,':',page_title,']]') |
+--------------------------------------------+
| # [[Template:WP_Mesoamerica_subpage]] |
| # [[User talk:Railer_198]] |
| # [[Template:IPsock]] |
| # [[User:Happy-melon/sandbox4]] |
| # [[User:Kingturtle/Warningkt]] |
| # [[User:Sceptre/contactme]] |
| # [[User:Sceptre/userpage]] |
| # [[User talk:Fatloss]] |
| # [[User talk:Stifle/wizard/images]] |
+--------------------------------------------+
9 rows in set (0.48 sec)
Don't have time at the moment to make a proper database report, but at least you have the results. :-) -- MZMcBride ( talk) 15:31, 14 February 2009 (UTC)
SELECT *
FROM page
INNER JOIN pagelinks ON page_id = pl_from
WHERE page_namespace = 0
AND pl_namespace=0 /* optional */
/* AND page_len > 2024 to skip stubs */
AND page_title IN (SELECT rd_title
FROM redirect
INNER JOIN page ON rd_from = page_id
WHERE page_title = pl_title AND page_namespace=pl_namespace)
LIMIT 50;
This query will find links that redirect back to the originating page. This can happen because:
In all cases it is annoying for the reader to have to reload the same page. — Dispenser 17:25, 13 February 2009 (UTC)
Not sure I understand this query. Here are sample results of the query for enwiki_p with LIMIT 5.
mysql> SELECT *
-> FROM page
-> INNER JOIN pagelinks ON page_id = pl_from
-> WHERE page_namespace = 0
-> AND pl_namespace=0 /* optional */
-> /* AND page_len > 2024 to skip stubs */
-> AND page_title IN (SELECT rd_title
-> FROM redirect
-> INNER JOIN page ON rd_from = page_id
-> WHERE page_title = pl_title AND page_namespace=pl_namespace)
-> LIMIT 5;
+----------+----------------+-------------------------------------------+-------------------+--------------+------------------+-------------+----------------+----------------+-------------+----------+----------+--------------+-------------------------------------------+
| page_id | page_namespace | page_title | page_restrictions | page_counter | page_is_redirect | page_is_new | page_random | page_touched | page_latest | page_len | pl_from | pl_namespace | pl_title |
+----------+----------------+-------------------------------------------+-------------------+--------------+------------------+-------------+----------------+----------------+-------------+----------+----------+--------------+-------------------------------------------+
| 19813130 | 0 | "D"_Is_for_Dubby_-_The_Lustmord_Dub_Mixes | | 0 | 0 | 0 | 0.671963747482 | 20081214074626 | 257866114 | 1356 | 19813130 | 0 | "D"_is_for_Dubby_-_The_Lustmord_Dub_Mixes |
| 9582421 | 0 | "Them" | | 0 | 0 | 0 | 0.415525086755 | 20090211013015 | 265439711 | 4810 | 9582421 | 0 | "Them"_(King_Diamond_album) |
| 12571133 | 0 | "V"_Is_for_Vagina | | 0 | 0 | 0 | 0.364994536024 | 20090213022315 | 270356944 | 6165 | 12571133 | 0 | "V"_is_for_Vagina |
| 16079543 | 0 | "V"_Is_for_Viagra._The_Remixes | | 0 | 0 | 0 | 0.409772390435 | 20081216101637 | 258331209 | 3187 | 16079543 | 0 | "V"_is_for_Viagra._The_Remixes |
| 7630017 | 0 | "Weird_Al"_Yankovic_discography | | 0 | 0 | 0 | 0.110680981041 | 20090211013015 | 269028703 | 20424 | 7630017 | 0 | List_of_singles_by_"Weird_Al"_Yankovic |
+----------+----------------+-------------------------------------------+-------------------+--------------+------------------+-------------+----------------+----------------+-------------+----------+----------+--------------+-------------------------------------------+
5 rows in set (9.06 sec)
I don't think this is finding what you're after (self-redirects).
-- MZMcBride ( talk) 18:18, 13 February 2009 (UTC)
Oh, I think I just understood what you're talking about. This is probably not feasible for a wiki page (likely far too many results), but it can certainly be saved to a text file (either as a one-time thing or regularly). The pagelinks table is enormous, however, so it could take a very long time to run and may require LIMITs and such.... -- MZMcBride ( talk) 18:41, 13 February 2009 (UTC)
In the toolbox area on the left of the "Special pages" page, several links don't work. How do I delete them? There is nothing on the "Unused files" page. There is nothing on the "Wanted pages" page. There is nothing on the "Unused templates" page. I mentioned the issue on ANI and it was suggested I come here. ChildofMidnight ( talk) 07:43, 9 March 2009 (UTC)
I think we should have a template for exclusion from this report. Category:Wikipedia noindex pages, for example, should probably not be in it, even though it fits the formal criteria. עוד מישהו Od Mishehu 09:31, 9 March 2009 (UTC)
Quick note. I forgot to sync this file to the Toolserver, so when it ran just recently, it was using the old query. I'm re-running it manually now to exclude the tagged category pages. -- MZMcBride ( talk) 09:21, 21 March 2009 (UTC)
I think a monthly report of articles with direct use (i.e not using templates) of categories ending with the word "stubs" would be useful. עוד מישהו Od Mishehu 13:43, 11 March 2009 (UTC)
I've switched nearly all of the scripts to wikitools from pywikipedia. Most of the changes were untested, so invariably something has broken. Let me know if you see anything and I'll fix it as soon as possible. Cheers. -- MZMcBride ( talk) 19:53, 11 March 2009 (UTC)
There are many pages with remains of test edits, a database report would be an efficient way to detect them, maybe one for articles, and another one for non-user pages (it'll probably return a lot of talk pages, in talk:, file talk:, etc). We can start with pages containing one of those elements:
"'''Bold text'''", "''Italic text''", "[[Link title]]", "[http://www.example.com link title]", "== Headline text ==", "[[File:Example.jpg]]", "[[Image:Example.jpg]]",
"[[Media:Example.ogg]]", "<math>Insert formula here</math>", "<nowiki>Insert non-formatted text here</nowiki>", "#REDIRECT [[Target page name]]", "<s>Strike-through text</s>", "<sup>Superscript text</sup>", "<sub>Subscript text</sub>", "<small>Small Text</small>", "<!-- Comment -->", "Image:Example.jpg|Caption" or "<ref>Insert footnote text here</ref>".
Thanks, Cenarium ( talk) 17:09, 25 March 2009 (UTC)
^Somewhat self-explanatory. Monthly? – xeno ( talk) 18:40, 30 March 2009 (UTC)
There are subpages of Wikipedia:Articles for deletion that are created with strange content such as this or this. There are also AFD noms that are not transcluded, or transcluded very late ( example). A list of subpages of AFD that are orphaned (orphaned in project space if possible, to include the latter) and are not redirects would probably detect many of those and allow cleanup of any inappropriate content. If this is possible, monthly should be enough (as DumbBOT monitors recently created incomplete nominations). Cenarium ( talk) 16:28, 4 April 2009 (UTC)
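A rough sketch of a query for the orphaned-subpage part of this request (untested; the Wikipedia project namespace is 4, and the backslashes keep LIKE's _ wildcard from matching arbitrary characters):

```sql
-- AfD subpages with no incoming links from project space
SELECT p.page_title
FROM page AS p
WHERE p.page_namespace = 4
  AND p.page_title LIKE 'Articles\_for\_deletion/%'
  AND p.page_is_redirect = 0
  AND NOT EXISTS (SELECT 1
                  FROM pagelinks AS pl
                  JOIN page AS linker
                    ON pl.pl_from = linker.page_id
                  WHERE pl.pl_namespace = 4
                    AND pl.pl_title = p.page_title
                    AND linker.page_namespace = 4);
```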
I've just created this report. I don't deal with stubs, so I have no idea how useful it will be, but let me know if you think the page length check should be increased or decreased. Cheers. -- MZMcBride ( talk) 08:41, 5 April 2009 (UTC)
Hi MZMcBride, do you think it would be possible to periodically re-generate the list found at Special:MostLinkedTemplates? That list ceased to update itself a few months ago, and I found it quite useful for tracking widely used new templates so I could add them to my watchlist or check if they need protection. Since the official list isn't being regenerated at all, perhaps you could make a script that replicates its functionality at Wikipedia:Database reports instead. Thanks, -- CapitalR ( talk) 02:59, 8 April 2009 (UTC)
Just wondering if a few more items could be excluded from the Wikipedia:Database reports/Empty categories report
-- WOSlinker ( talk) 13:06, 4 May 2009 (UTC)
Would it be possible to run a report on articles using "plainlinks" as span tag. Special:Search/plainlinks is probably not exhaustive. While this formatting is useful in template namespace, I don't think it should be in article namespace. -- User:Docu
Category:Templates generating hCards and Geo lists templates used by articles that should generally receive coordinates. I would be interested in knowing which of these articles don't transclude {{ coord missing}} or {{ coor URL}} yet. That means that they still need to be tagged with {{ coord missing}}. If it helps, I can clean out the /doc subtemplates first, but it shouldn't matter, they aren't used in articles. -- User:Docu
The below should work. It should render at least Roma_Street_railway_station,_Brisbane. -- User:Docu 21:45, 5 May 2009 (UTC) - updated 21:59, 5 May 2009 (UTC)
SELECT p1.page_title, p2.page_title FROM categorylinks, page As P1, templatelinks As tl1, page AS P2
WHERE cl_to = 'Templates_generating_hCards_and_Geo'
AND cl_from = P1.page_id
AND p1.page_title = tl1.tl_title
AND tl1.tl_from = P2.page_id
AND P2.page_namespace = 0
AND P2.page_id NOT IN (SELECT tl_from FROM templatelinks WHERE tl_title = 'Coor_URL' OR tl_title = 'Coord_missing')
MySQL squawked about the capitalization of "P1" versus "p1". I ran the query with LIMIT 1 and these were the results:
mysql> SELECT p1.page_title, p2.page_title FROM categorylinks, page AS p1, templatelinks AS tl1, page AS p2
-> WHERE cl_to = 'Templates_generating_hCards_and_Geo'
-> AND cl_from = p1.page_id
-> AND p1.page_title = tl1.tl_title
-> AND tl1.tl_from = p2.page_id
-> AND p2.page_namespace = 0
-> AND p2.page_id NOT IN (SELECT tl_from FROM templatelinks WHERE tl_title = 'Coord' OR tl_title = 'Coord_missing')
-> LIMIT 1;
+--------------------+-------------------------+
| page_title | page_title |
+--------------------+-------------------------+
| Infobox_Skyscraper | 1000_Connecticut_Avenue |
+--------------------+-------------------------+
1 row in set (1 min 56.61 sec)
Seems a bit slow; I may be able to optimize the query a bit. Is this the type of output you want? And do you want an individual report (updated regularly) or do you want a one-time run (in a sandbox or something)? -- MZMcBride ( talk) 22:01, 5 May 2009 (UTC)
To simplify processing the result, would you add the categories of the resulting articles as well? Can the query below still run on toolserver?
SELECT p2.page_title As article, p1.page_title As infobox, GROUP_CONCAT(cl2.cl_to) As categories
FROM categorylinks as cl1, page As p1, templatelinks As tl1, page AS p2, categorylinks as cl2
WHERE cl1.cl_to = 'Templates_generating_hCards_and_Geo'
AND cl1.cl_from = p1.page_id
AND p1.page_title = tl1.tl_title
AND tl1.tl_from = p2.page_id
AND p2.page_namespace = 0
AND p2.page_id NOT IN
(SELECT tl2.tl_from FROM templatelinks As tl2 WHERE tl2.tl_title = 'Coor_URL' OR tl2.tl_title = 'Coord_missing')
AND p2.page_id = cl2.cl_from
GROUP BY p2.page_id
I'd be glad if you'd set it up to run once a week on toolserver. -- User:Docu
Current progress:
Id: 1758577
User: mzmcbride
Host: nightshade.toolserver.org:52148
db: enwiki_p
Command: Query
Time: 15352
State: preparing
Info: SELECT p1.page_title, p2.page_title FROM categorylinks, page AS p1, templatelinks AS tl1, page AS p2
WHERE cl_to = 'Templates_generating_hCards_and_Geo'
AND cl_from = p1.page_id
AND p1.page_title = tl1.tl_title
AND tl1.tl_from = p2.page_id
AND p2.page_namespace = 0
AND p2.page_id NOT IN (SELECT tl_from FROM templatelinks WHERE tl_title = 'Coor_URL' OR tl_title = 'Coord_missing')
Over four hours and still running. Hmm. -- MZMcBride ( talk) 04:22, 6 May 2009 (UTC)
And... this is what I get for not writing my own queries. ;-) The primary problem with your query is that it's not using an index when it selects from templatelinks. If you don't specify tl_namespace = 10, it requires scanning the entire table (about 134,616,320 rows). And doing it recursively. Ouch.
The other issue is that I'm fairly sure MySQL hates OR statements. I re-wrote the query (albeit before I saw you wanted the categories as well...).
mysql> SELECT
-> p1.page_title,
-> tl_title
-> FROM templatelinks
-> JOIN page AS p1
-> ON tl_from = p1.page_id
-> JOIN (SELECT
-> cl_from,
-> p2.page_title
-> FROM categorylinks
-> JOIN page AS p2
-> ON cl_from = p2.page_id
-> WHERE cl_to = "Templates_generating_hCards_and_Geo"
-> AND p2.page_namespace = 10) AS cltmp
-> ON cltmp.page_title = tl_title
-> WHERE tl_namespace = 10
-> AND p1.page_namespace = 0
-> AND page_id NOT IN (SELECT
-> tl_from
-> FROM templatelinks
-> WHERE tl_namespace = 10
-> AND tl_title ="Coord_missing")
-> AND page_id NOT IN (SELECT
-> tl_from
-> FROM templatelinks
-> WHERE tl_namespace = 10
-> AND tl_title ="Coor_URL")
-> LIMIT 1000;
+---------------------------------------------------------------+-----------------------------------+
| page_title | tl_title |
+---------------------------------------------------------------+-----------------------------------+
| Gijón | Infobox_City_Spain |
| Jerez_de_la_Frontera | Infobox_City_Spain |
| Almagro,_Ciudad_Real | Infobox_City_Spain |
| Bárcena_de_Pie_de_Concha | Infobox_City_Spain |
[Removed results here....]
| Shire_of_Capel | Infobox_Australian_Place |
| Shire_of_Carnamah | Infobox_Australian_Place |
| Shire_of_Carnarvon | Infobox_Australian_Place |
| Shire_of_Chapman_Valley | Infobox_Australian_Place |
| Shire_of_Collie | Infobox_Australian_Place |
| Shire_of_Cranbrook | Infobox_Australian_Place |
| Shire_of_Cue | Infobox_Australian_Place |
+---------------------------------------------------------------+-----------------------------------+
1000 rows in set (24.44 sec)
Seems to be much, much faster now. Do these results look correct?
I'm not sure how feasible this is for wiki pages. There are probably thousands and thousands of results. Do you want it in a different format (text file, CSV, etc.)? Or do you want me to throw a LIMIT on it? I should have the total number of results in a few minutes. -- MZMcBride ( talk) 04:57, 6 May 2009 (UTC)
SELECT
lgtmp.page_title,
lgtmp.tl_title,
GROUP_CONCAT(ctgl.cl_to)
FROM categorylinks AS ctgl
JOIN (SELECT
p1.page_id,
p1.page_title,
tl_title
FROM templatelinks
JOIN page AS p1
ON tl_from = p1.page_id
JOIN (SELECT
cl_from,
p2.page_title
FROM categorylinks
JOIN page AS p2
ON cl_from = p2.page_id
WHERE cl_to = "Templates_generating_hCards_and_Geo"
AND p2.page_namespace = 10) AS cltmp
ON cltmp.page_title = tl_title
WHERE tl_namespace = 10
AND p1.page_namespace = 0
AND page_id NOT IN (SELECT
tl_from
FROM templatelinks
WHERE tl_namespace = 10
AND tl_title ="Coord_missing")
AND page_id NOT IN (SELECT
tl_from
FROM templatelinks
WHERE tl_namespace = 10
AND tl_title ="Coor_URL")) AS lgtmp
ON lgtmp.page_id = ctgl.cl_from
GROUP BY lgtmp.page_title;
SELECT
lgtmp.page_title,
lgtmp.tl_title,
GROUP_CONCAT(ctgl.cl_to)
FROM categorylinks AS ctgl
RIGHT JOIN (SELECT
p1.page_id,
p1.page_title,
tl_title
FROM templatelinks
JOIN page AS p1
ON tl_from = p1.page_id
JOIN (SELECT
cl_from,
p2.page_title
FROM categorylinks
JOIN page AS p2
ON cl_from = p2.page_id
WHERE cl_to = "Templates_with_coordinates_fields"
AND p2.page_namespace = 10) AS cltmp
ON cltmp.page_title = tl_title
WHERE tl_namespace = 10
AND p1.page_namespace = 0
AND page_id NOT IN (SELECT
tl_from
FROM templatelinks
WHERE tl_namespace = 10
AND tl_title ="Coord_missing")
AND page_id NOT IN (SELECT
tl_from
FROM templatelinks
WHERE tl_namespace = 10
AND tl_title ="Coor_URL")) AS lgtmp
ON lgtmp.page_id = ctgl.cl_from
GROUP BY lgtmp.page_title;
-- Updated version above. -- User:Docu
To help with the decisions at Wikipedia:Reviewers. Run as a one-off (maybe once per year in case there are any trends)? AndrewRT( Talk) 01:11, 9 May 2009 (UTC)
The user_registration field isn't really very accurate. For example, Versageek is an admin and registered a long time ago (before me, even), yet here's her info:
mysql> SELECT * FROM user WHERE user_name="Versageek";
+---------+-----------+-------------------+----------------+
| user_id | user_name | user_registration | user_editcount |
+---------+-----------+-------------------+----------------+
| 146418 | Versageek | NULL | 19408 |
+---------+-----------+-------------------+----------------+
1 row in set (0.00 sec)
There are a lot of NULL values in the field. I'll use MIN(rev_timestamp). -- MZMcBride ( talk) 20:03, 10 May 2009 (UTC)
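A sketch of that fallback (untested; uses the historical rev_user column and will be slow for high-edit-count accounts):

```sql
-- Approximate registration date: user_registration if set,
-- otherwise the timestamp of the account's earliest edit
SELECT user_name,
       COALESCE(user_registration, MIN(rev_timestamp)) AS registered
FROM user
LEFT JOIN revision
  ON rev_user = user_id
WHERE user_name = 'Versageek'
GROUP BY user_id, user_name, user_registration;
```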
Yes, these blocks are unusually long, but some aren't excessive, e.g. the IP blocks of Scientology IPs per the recent Arbcom case. ⟳ausa کui × 11:31, 21 June 2009 (UTC)
Just wondering if it would also be possible to show the number of pages that the template is used on as well as the number of transclusions.
Would this SQL do it:
SELECT tl_title, COUNT(*) AS transclusion_count, COUNT(DISTINCT tl_from) AS page_count FROM templatelinks WHERE tl_namespace = 10 GROUP BY tl_title ORDER BY COUNT(*) DESC LIMIT 1000;
??? -- WOSlinker ( talk) 20:50, 30 June 2009 (UTC)
Not sure if anyone has noticed, but the protection-related reports have been broken for some time. It seems the query they were using stopped working well; any instance of it hung indefinitely. Probably a MySQL bug or something. Anyway, with this commit I've updated all six reports to now use the API to pull their data. Preliminary tests indicate that the scripts work, but not all of them have been tested. Things like missing titles may break the reports entirely; I didn't have a chance to fully stress-test.
The reports will update at the beginning of Thursday UTC. If you notice any strangeness (or a lack of updating), please shout here or on my talk page. Cheers. -- MZMcBride ( talk) 08:24, 14 July 2009 (UTC)
On July 2, the Toolserver had a power outage causing data corruption in the S1 server (which holds en.wiki's database and a copy of Commons' database). This is a documented breakage and there's currently no estimated time for a re-import of the databases. Because there is data corruption, some reports will be inaccurate. Please act accordingly. Apologies in advance for the inconvenience. -- MZMcBride ( talk) 14:36, 14 July 2009 (UTC)
This may exist already. This report used to trawl the database and report any article which contained ten or more non-free images - it looked like this. Run monthly? Black Kite 08:02, 22 July 2009 (UTC)
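One possible way to express this as a query (untested; assumes non-free files are all gathered in Category:All_non-free_media, which may not hold exactly):

```sql
-- Articles embedding ten or more non-free files
SELECT a.page_title, COUNT(*) AS nonfree_images
FROM imagelinks AS il
JOIN page AS a
  ON il.il_from = a.page_id
JOIN page AS f
  ON f.page_namespace = 6
  AND f.page_title = il.il_to
JOIN categorylinks AS cl
  ON cl.cl_from = f.page_id
WHERE a.page_namespace = 0
  AND cl.cl_to = 'All_non-free_media'
GROUP BY a.page_id
HAVING COUNT(*) >= 10;
```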
I'd love to run AWB over these reports but it doesn't see the links, just templates. Please fix this or create a raw toolserver tool to use.-- Ipatrol ( talk) 19:33, 22 July 2009 (UTC)
I know that technically it's not related to this page, but could you update Wikipedia:WikiProject Stub sorting/Stub type sizes/data on a monthly basis? עוד מישהו Od Mishehu 08:25, 17 August 2009 (UTC)
Per this discussion, is it possible to have a trawl through the database to find how many stubs of less than 250 bytes have not been edited in the last twelve months? It may help the discussion to see how many there are, and to have a list so a few examples could be reviewed. If this is not the correct venue, could some kind soul advise me what is. Cheers, LessHeard vanU ( talk) 20:40, 21 August 2009 (UTC)
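A sketch of what such a trawl might look like (untested; joins each article's latest revision via page_latest, since page_touched reflects cache invalidation rather than editing):

```sql
-- Articles under 250 bytes whose latest edit is over a year old
SELECT p.page_title
FROM page AS p
JOIN revision AS r
  ON p.page_latest = r.rev_id
WHERE p.page_namespace = 0
  AND p.page_is_redirect = 0
  AND p.page_len < 250
  AND r.rev_timestamp < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 YEAR),
                                    '%Y%m%d%H%i%s');
```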
I'm not sure what the actual objective of this list was / is; if that's made more clear, perhaps I can find another way to get the data you're after. 1500-ish random pages doesn't seem to be your primary goal. -- MZMcBride ( talk) 09:00, 22 August 2009 (UTC)
I'd like a monthly report on all pages which fit the following criteria:
Query used:
SELECT
pg5.page_id, pg5.page_namespace, pg5.page_title, /* page id, namespace, and title */
pg6.page_namespace, pg6.page_title /* redirect namespace and title */
FROM page AS pg6
JOIN (SELECT
*
FROM page AS pg3
JOIN (SELECT
pl_from,
rd_namespace,
rd_title,
rd_from
FROM page AS pg1
JOIN (SELECT
pl_from,
page_id
FROM pagelinks
JOIN page AS pg4
ON pl_namespace = pg4.page_namespace
AND pl_title = pg4.page_title) AS pgl1
ON pgl1.page_id = pg1.page_id
JOIN redirect
ON rd_from = pg1.page_id) AS pg2
ON pg2.pl_from = pg3.page_id
WHERE rd_title = page_title
AND rd_namespace = page_namespace) AS pg5
ON pg5.rd_from = pg6.page_id;
Almost identical to my Self redirects request; the only problem with the queries is that the section anchor is not included in the database, so some of these might be valid redirects to sections. — Dispenser 07:02, 1 September 2009 (UTC)
Per WP:R2D ("exceptions"), templates such as navboxes shouldn't link to redirects, as it can create self-referential linking similar to that described in the section above. A list of templates that link to redirects would be helpful for fixing these (maybe weekly or biweekly?) – Drilnoth ( T • C • L) 20:58, 31 August 2009 (UTC)
My bad, I see that this was within the scope of the last requests. Having a separate list for this would be helpful, though. – Drilnoth ( T • C • L) 21:04, 31 August 2009 (UTC)
If a file doesn't have a template on it, it usually doesn't have a license. Having a list of these (perhaps weekly) would greatly help in tracking down copyright violations. – Drilnoth ( T • C • L) 20:59, 31 August 2009 (UTC)
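A first approximation in SQL (untested; the join on image restricts the list to locally uploaded files, and "no transclusions at all" is only a proxy for "no license tag"):

```sql
-- Local File: pages that transclude no templates whatsoever
SELECT page_title
FROM page
JOIN image
  ON img_name = page_title
WHERE page_namespace = 6
  AND page_is_redirect = 0
  AND NOT EXISTS (SELECT 1
                  FROM templatelinks
                  WHERE tl_from = page_id);
```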
Files which consist only of headlines and templates, other than {{ Information}} and the various templates in Category:Non-free use rationale templates, usually lack essential information such as a source. Having a list of these (maybe biweekly?) would be helpful in tracking down copyvios. – Drilnoth ( T • C • L) 21:01, 31 August 2009 (UTC)
The lack of metadata in an image is a common sign of copyvios. Any image that is tagged as being available under one of the "self" free licenses, like {{ PD-self}}, {{ PD-user}}, {{ GFDL-self}}, or just the plain old {{ Self}}, and which has no metadata, would make a useful list for finding copyvios. Perhaps biweekly? – Drilnoth ( T • C • L) 21:03, 31 August 2009 (UTC)
SELECT
CONCAT('# [[:File:',page_title,']]')
FROM page
JOIN image
ON img_name = page_title
JOIN categorylinks
ON cl_from = page_id
WHERE cl_to = 'Self-published_work'
AND page_namespace = 6
AND page_is_redirect = 0
AND img_metadata = '0'
LIMIT 30;
+-------------------------------------------------------------------------------------------------------------------------+
| CONCAT('# [[:File:',page_title,']]')                                                                                    |
+-------------------------------------------------------------------------------------------------------------------------+
| # [[:File:"Bebop-rebop"_early_bop_phrase.png]]                                                                          |
| # [[:File:"Riven_Rock"_wine_label_by_Alden_Marin.jpg]]                                                                  |
| # [[:File:'Standing_Youth',_cast_stone_sculpture_by_Wilhelm_Lehmbruck,_1913,_Museum_of_Modern_Art_(New_York_City).jpg]] |
| # [[:File:(Az)Xeus.jpg]]                                                                                                |
| # [[:File:-Logo_of_the_15_helicopter_battalion_and_battalion_of_air_transport.jpg]]                                     |
| # [[:File:-model-baxe.jpg]]                                                                                             |
| # [[:File:001front.jpg]]                                                                                                |
| # [[:File:009w.jpg]]                                                                                                    |
| # [[:File:01201_1t.jpg]]                                                                                                |
| # [[:File:014w.jpg]]                                                                                                    |
| # [[:File:016w.jpg]]                                                                                                    |
| # [[:File:018w.jpg]]                                                                                                    |
| # [[:File:020202.JPG]]                                                                                                  |
| # [[:File:021w.jpg]]                                                                                                    |
| # [[:File:02210506l.jpg]]                                                                                               |
| # [[:File:023_23.jpg]]                                                                                                  |
| # [[:File:023w.jpg]]                                                                                                    |
| # [[:File:024w.jpg]]                                                                                                    |
| # [[:File:03_heritage_monument.JPG]]                                                                                    |
| # [[:File:033_33.jpg]]                                                                                                  |
| # [[:File:035_35.jpg]]                                                                                                  |
| # [[:File:036_36.jpg]]                                                                                                  |
| # [[:File:037_37.jpg]]                                                                                                  |
| # [[:File:04_Jan_2003_(11).jpg]]                                                                                        |
| # [[:File:048_48.jpg]]                                                                                                  |
| # [[:File:049_49.jpg]]                                                                                                  |
| # [[:File:05_Flag_Presentation.jpg]]                                                                                    |
| # [[:File:050_50.jpg]]                                                                                                  |
| # [[:File:051206d.JPG]]                                                                                                 |
| # [[:File:05deciraq2.gif]]                                                                                              |
+-------------------------------------------------------------------------------------------------------------------------+
30 rows in set (0.01 sec)
Thanks for running this. I'll take a look through them in the next couple days and see if I can refine the request. Thanks! – Drilnoth ( T • C • L) 20:00, 1 September 2009 (UTC)
That's a pretty inspired idea. Assuming I wrote this query correctly, some sample data is provided below. Helpful? Unhelpful? -- MZMcBride ( talk) 23:00, 1 September 2009 (UTC)
+--------------------------------------------------------------------------------------------------------------------------+
| CONCAT('[[',pg.page_title,']] | [[:File:',il_to,']]') |
+--------------------------------------------------------------------------------------------------------------------------+
| [[Alden_Marin]] | [[:File:"Riven_Rock"_wine_label_by_Alden_Marin.jpg]] |
| [[Chong_Chul_Rhee]] | [[:File:070616-Janggeom-Master.jpg]] |
| [[Huang_Yong_Ping]] | [[:File:100_Arme_der_Guan-yin.jpg]] |
| [[Sui_Jianguo]] | [[:File:100sj.jpg]] |
| [[Peter_Dobbie]] | [[:File:10728-225-017t.jpg]] |
| [[Brian_Chase]] | [[:File:1161-02chasemisterka14.jpg]] |
| [[Bruno_Wizard]] | [[:File:1178-06homo09.jpg]] |
| [[Lisle_Ellis]] | [[:File:1255-04_lisle13.jpg]] |
| [[Jack_Dishel]] | [[:File:1342moldy09.jpg]] |
| [[Jan_Inge_Reilstad]] | [[:File:149px-JanInge.jpg]] |
| [[Giorgio_Gomelsky]] | [[:File:1528_giorgio08_crop.jpg]] |
| [[Chris_Caffery]] | [[:File:16CC16a_CC_s.jpg]] |
| [[András_Petöcz]] | [[:File:19-gurul_500.jpg]] |
| [[Wray_Carlton]] | [[:File:1965WrayCarlton.png]] |
| [[Sonny_Black]] | [[:File:1978-m07-d00_-_Bracknell_Folk_Festival_-_Bill_Boazman.jpg]] |
| [[Dave_Concepción]] | [[:File:198009-012WillieMcCoveyLastCandlestick.gif]] |
| [[Willie_McCovey]] | [[:File:198009-012WillieMcCoveyLastCandlestick.gif]] |
| [[Fawzi_Al_Shammari]] | [[:File:19810_W400XH600.jpg]] |
| [[Maggie_Holland]] | [[:File:1983-m08-d07_-_Goodwood_Folk_Festival_-_English_Country_Blues_Band_(Maggie_Holland).jpg]] |
| [[Boris_Dragojevic]] | [[:File:1993.Leute_Moj..._47x37cm.jpg]] |
| [[George_W._Della,_Jr.]] | [[:File:1gdella.jpg]] |
| [[D._Page_Elmore]] | [[:File:1pelmore.jpg]] |
| [[Perry_McCarthy]] | [[:File:2003_pq_paddocks-perry-mccarthy.jpg]] |
| [[Sean_Gabb]] | [[:File:2003-08-sig-sweden.jpg]] |
| [[Fred_Wedlock]] | [[:File:2003-m06-d21_(11)_-_Somerset_-_Fred_Wedlock_(Allerford_M-8bd.jpg]] |
| [[Mia_Michaels]] | [[:File:2005_Mia-Michaels_face.jpg]] |
| [[Horacio_Lavandera]] | [[:File:20050731elpepiesp_1_I_SCO.jpg]] |
| [[Bob_Chiarelli]] | [[:File:2006_Chiarelli_vote.png]] |
| [[Jonathan_Roberts_(dancer)]] | [[:File:2006_Jonathan-Roberts_face.jpg]] |
| [[Alex_Munter]] | [[:File:2006_Munter_vote.png]] |
| [[Larry_O'Brien_(Canadian_politician)]] | [[:File:2006_O'Brien_vote.png]] |
| [[Anna_Trebunskaya]] | [[:File:2007_Anna-Trebunskaya_face.jpg]] |
| [[Ted_Vasin]] | [[:File:2007_ted_vasin_the_i_behind_this_i_M.jpg]] |
| [[Jaume_Plensa]] | [[:File:20070621_Crown_Fountain_Water.JPG]] |
| [[Piyaporn_Deejing]] | [[:File:2008thailand.JPG]] |
| [[Pleumjit_Thinkaow]] | [[:File:200px-PJ0@DOHA.jpg]] |
| [[Toby_Goodshank]] | [[:File:212bday01.jpg]] |
| [[Hector_Luis_Bustamante]] | [[:File:249657869_l-1-.jpg]] |
| [[Csaba_Markus]] | [[:File:250px-Csaba_markus_965.jpg]] |
| [[Tim_Ellis]] | [[:File:2645.jpg]] |
+--------------------------------------------------------------------------------------------------------------------------+
40 rows in set (5.61 sec)
The above ones gave me an idea: what about file pages that don't contain files, if that makes sense? I.e. where some user has created a file:example.jpg page but didn't upload a file to it, so it just contains text or something. Possible? Or would it just list every file that's been moved to Commons? -- Jac16888 Talk 21:12, 31 August 2009 (UTC)
SELECT
img_name
FROM image
WHERE NOT EXISTS (SELECT
page_title
FROM page
WHERE img_name = page_title
AND page_namespace = 6)
AND NOT EXISTS (SELECT
page_title
FROM commonswiki_p.page
WHERE img_name = page_title
AND page_namespace = 6);
+--------------------------------------------------------+
| img_name |
+--------------------------------------------------------+
| WLVR_logo.jpg |
| 2005_jorge_ramos_new_3.jpg |
| Lunar_eclipse_chart_close-1944Dec29.png |
| Nas\x7e\x7e\x7e\x7e\x7e\x7e\x7e\x7e_bridgingt_101b.jpg |
| Ruben-studdard-love-is-album-cover.jpg |
| Hebb_bobby\x7e_sunny\x7e\x7e\x7e\x7e_101b.jpg |
| Church_2.jpg |
| St._Theresa_of_Lisieux_Church_2.jpg |
| AK-74,_Modernized.png |
| Lunar_eclipse_from_moon-2053Aug29.png |
| Lunar_eclipse_from_moon-1901May03.png |
| Lunar_eclipse_from_moon-1948Oct18.png |
| Birthplace_of_the_Computer.jpg |
| Lunar_eclipse_from_moon-1908Dec07.png |
| Destination_x.jpg |
| Mike_ford.jpg |
+--------------------------------------------------------+
16 rows in set (16.36 sec)
Done: Wikipedia:Database reports/File description pages without an associated file. -- MZMcBride ( talk) 16:08, 4 September 2009 (UTC)
Hi. Having seen cases where disruptive users could edit semi-protected articles and other things restricted to autoconfirmed users because of edits in talk or user namespaces, I was wondering if requiring 10 content edits for autoconfirmation, content edits being defined as those in the main, category, template or portal namespaces, could be useful or would generate too much collateral damage. So it would be useful to have a representative subset of users who wouldn't have been autoconfirmed if we required content edits. A one-time report of autoconfirmed users without 10 edits in content namespaces, including blocked users, would be useful. Thanks, Cenarium ( talk) 01:22, 3 September 2009 (UTC)
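For reference, one rough way to approximate this on the replica is sketched below. It is untested; the namespace list and the 10-edit threshold come from the request above, and users with zero content edits would need a LEFT JOIN to appear at all.

```sql
-- Sketch (untested): users with at least 10 edits overall but fewer
-- than 10 in content namespaces (0 = article, 10 = template,
-- 14 = category, 100 = portal). Scans the revision table, so slow.
SELECT rev_user, user_name, COUNT(*) AS content_edits
FROM revision
JOIN user ON rev_user = user_id
JOIN page ON rev_page = page_id
WHERE page_namespace IN (0, 10, 14, 100)
  AND user_editcount >= 10
GROUP BY rev_user
HAVING COUNT(*) < 10;
```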
Basically, a list of all files in Category:All non-free media which: A) Have more than one image revision (e.g., a resizing or other edit made to the image itself, not to the description page), B) do not transclude {{ Non-free reduce}}, and C) are not in Category:All orphaned non-free use Wikipedia files. The older revisions of these images usually need to be deleted per WP:CSD#F5 because they are unused non-free images. – Drilnoth ( T • C • L) 15:47, 14 September 2009 (UTC)
Query:
SELECT
page_title
FROM page AS pg1
JOIN categorylinks
ON cl_from = page_id
WHERE cl_to = 'All_non-free_media'
AND pg1.page_namespace = 6
AND EXISTS (SELECT
1
FROM oldimage
WHERE oi_name = pg1.page_title)
AND NOT EXISTS (SELECT
1
FROM categorylinks
WHERE cl_from = pg1.page_id
AND cl_to = 'All_orphaned_non-free_use_Wikipedia_files')
AND NOT EXISTS (SELECT
1
FROM templatelinks
WHERE tl_from = pg1.page_id
AND tl_namespace = 10
AND tl_title = 'Non-free_reduce');
Sample results:
Please check the sample data. It looks like there are 38,000 results total, so I'll probably limit output to the first 1,000 entries. -- MZMcBride ( talk) 02:17, 17 September 2009 (UTC)
How about a report showing categories that contain themselves as members? I've run into this situation dozens of times in the past year. Some editor (who is probably a bit unclear on how categories are intended to work) creates a category Category:Foo with the text [[Category:Foo]] in it. Or someone copying categories from the main article forgets to delete the category in question. The resulting category is categorized, non-zero length, and non-empty. I think there might be some cleanup work to do in this area.
And if you get ambitious, maybe you could write a report to identify cycles in the category hierarchy.
A third alternative might be to modify Wikipedia:Database reports/Uncategorized categories so that the category in question is excluded from being counted as a parent. This would be the least thorough approach, though probably the easiest. - Stepheng3 ( talk) 19:47, 1 November 2008 (UTC)
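For reference, the self-containment case described above reduces to a single join; a sketch (untested) against the replica schema:

```sql
-- Sketch: category pages that list themselves as a member.
-- cl_to stores the category title without the namespace prefix,
-- as does page_title for namespace 14, so they compare directly.
SELECT page_title
FROM page
JOIN categorylinks ON cl_from = page_id
WHERE page_namespace = 14
  AND cl_to = page_title;
```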
Well, the general trick is to wrap the [[Category:Foo]] links in a {{{category}}} parameter and then define it as null where you don't want to have the category show up. So, for example:
Template:
Foo {{{1}}} ''{{{2}}}''<includeonly>[[Category:Bar]]</includeonly><noinclude> Docs... </noinclude>
That would be changed to:
Foo {{{1}}} ''{{{2}}}''<includeonly>{{{category|[[Category:Bar]]}}}</includeonly><noinclude> Docs... </noinclude>
Transclusions of the template need to be changed from:
{{template}}
To:
{{template|category=}}
Essentially, you're defining a category parameter (and defaulting it to the regular category) while then setting it as null for specific pages. -- MZMcBride ( talk) 04:26, 4 November 2008 (UTC)
<includeonly>{{#ifeq:{{ns:2}}|{{NAMESPACE}}|[[Category:FOO|{{PAGENAME}}]]|}}</includeonly>
Why does Category:Slovenia national football team templates keep showing up in this report? - Stepheng3 ( talk) 01:28, 22 December 2008 (UTC)
mysql> SELECT * FROM categorylinks WHERE cl_to="Slovenia_national_football_team_templates";
+----------+-------------------------------------------+-----------------------------------+---------------------+
| cl_from | cl_to | cl_sortkey | cl_timestamp |
+----------+-------------------------------------------+-----------------------------------+---------------------+
| 7314367 | Slovenia_national_football_team_templates | Slovenia Squad 2002 World Cup | 2006-10-06 04:20:14 |
| 17894927 | Slovenia_national_football_team_templates | Template:Slovenia Squad 2000 Euro | 2008-06-11 19:27:50 |
+----------+-------------------------------------------+-----------------------------------+---------------------+
2 rows in set (0.02 sec)
That looks right to me. The Toolserver is still a bit lagged (about four days), so I'm inclined to wait until the next update to see if this is resolved. -- MZMcBride ( talk) 04:53, 5 January 2009 (UTC)
I notice that none of the reports were updated on Wednesday or today (Thursday) -- not even the ones marked "daily". Is something broken? - Stepheng3 ( talk) 18:16, 1 January 2009 (UTC)
Is there a way to generate these with more current data? The galleries display images included in {{ infobox lake}} as "image_lake". The name of the lake displayed is from "lake_name" and links back to the lakes article. I break it down manually in pages with 200 images.
Currently I use Templatetiger. Its data is updated once or twice a year. Its January 2009 update is with data from 2008-10-08. -- User:Docu
So I'd need to read Infobox lake and then each article directly from the site? Maybe I can get this to work with pywikipediabot. BTW is there one that reads template fields? -- User:Docu
After looking through some of the tools, with a few changes, archivebot.py might do it. -- User:Docu
This report appears to be broken. Is anyone looking into this? - Stepheng3 ( talk) 04:49, 24 January 2009 (UTC)
I've been unsuccessful at trying to convince rst20xx to remove this exemption he added to the C1 criteria. Could you filter these categories out of the empty categories report? Here are a couple examples:
While I've got your attention, I'm reiterating my request for categories whose names contain the following phrases to be filtered out, or at least segregated in some way from the main list of empty categories:
In my experience, categories of this sort are "project categories that by their nature may become empty on occasion". -- Stepheng3 ( talk) 21:00, 12 February 2009 (UTC)
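A name-based filter along these lines could be bolted onto the report query; the sketch below is untested, and the LIKE patterns are illustrative placeholders rather than the actual phrase list from the request.

```sql
-- Sketch: empty categories, excluding maintenance-style names.
-- The NOT LIKE patterns are hypothetical examples only.
SELECT page_title
FROM page
LEFT JOIN categorylinks ON cl_to = page_title
WHERE page_namespace = 14
  AND cl_from IS NULL                      -- no members, i.e. empty
  AND page_title NOT LIKE '%articles\_missing%'
  AND page_title NOT LIKE '%articles\_to\_be%'
  AND page_title NOT LIKE '%\_stubs';
```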
Over at Wikipedia:Improving referencing efforts we're trying to improve article referencing. It would be helpful if we had statistics on the proportion of articles with references, average references per article, etc. Ideally, I would like to see to compare a year ago to this year, or even monthly stats if that would be possible. Thanks. - Peregrine Fisher ( talk) ( contribs) 22:03, 15 December 2008 (UTC)
Pages with a protection level [edit=none], [move=autoconfirmed], or with a [edit=sysop] and not [move=sysop]. As it confuses bots and protection templates (through {{PROTECTIONLEVEL}}). Thanks, Cenarium ( talk) 14:19, 13 February 2009 (UTC)
Someone should probably check my logic here, but this should be what you want.
mysql> SELECT
-> CONCAT('# [[',ns_name,':',page_title,']]')
-> FROM page_restrictions AS pr1
-> JOIN page
-> ON pr1.pr_page = page_id
-> JOIN toolserver.namespace
-> ON page_namespace = ns_id
-> AND dbname = "enwiki_p"
-> WHERE pr1.pr_type = "move"
-> AND pr1.pr_level = "autoconfirmed"
-> AND NOT EXISTS (SELECT
-> 1
-> FROM page_restrictions AS pr2
-> WHERE pr1.pr_page = pr2.pr_page
-> AND pr2.pr_type != "move");
+----------------------------------------------+
| CONCAT('# [[',ns_name,':',page_title,']]') |
+----------------------------------------------+
| # [[:Lysergic_acid_diethylamide]] |
| # [[Talk:Serbo-Croatian_language/Archive_2]] |
| # [[:Scrubs_(TV_series)]] |
| # [[:Interscope_Records]] |
| # [[:Lost]] |
| # [[User talk:Anonymous_editor]] |
| # [[User talk:212.85.18.7]] |
| # [[Template:Intelligent_Design]] |
| # [[Portal:Taiwan]] |
| # [[User:Titoxd/Sandbox2]] |
| # [[Template:Essay]] |
| # [[:DADVSI]] |
| # [[User talk:210.217.33.187]] |
| # [[User talk:Hillman]] |
| # [[:George_Augustus_Stallings]] |
| # [[User talk:Jersyko/archive10]] |
| # [[User talk:Kafka_Liz]] |
| # [[:Glen_A._Wilson_High_School]] |
| # [[User talk:Gavin.collins/Archive_6]] |
| # [[User talk:Peter_Damian_(old)]] |
| # [[User talk:69.180.82.43]] |
| # [[User talk:EEMIV/Archive10]] |
| # [[User talk:EEMIV]] |
| # [[Talk:Serbo-Croatian_language]] |
+----------------------------------------------+
24 rows in set (0.18 sec)
mysql> SELECT
-> CONCAT('# [[',ns_name,':',page_title,']]')
-> FROM page_restrictions AS pr1
-> JOIN page
-> ON pr1.pr_page = page_id
-> JOIN toolserver.namespace
-> ON page_namespace = ns_id
-> AND dbname = "enwiki_p"
-> WHERE pr1.pr_type = "edit"
-> AND pr1.pr_level = "sysop"
-> AND NOT EXISTS (SELECT
-> 1
-> FROM page_restrictions AS pr2
-> WHERE pr1.pr_page = pr2.pr_page
-> AND pr2.pr_type = "move"
-> AND pr2.pr_level = "sysop");
+--------------------------------------------+
| CONCAT('# [[',ns_name,':',page_title,']]') |
+--------------------------------------------+
| # [[Template:WP_Mesoamerica_subpage]] |
| # [[User talk:Railer_198]] |
| # [[Template:IPsock]] |
| # [[User:Happy-melon/sandbox4]] |
| # [[User:Kingturtle/Warningkt]] |
| # [[User:Sceptre/contactme]] |
| # [[User:Sceptre/userpage]] |
| # [[User talk:Fatloss]] |
| # [[User talk:Stifle/wizard/images]] |
+--------------------------------------------+
9 rows in set (0.48 sec)
Don't have time at the moment to make a proper database report, but at least you have the results. :-) -- MZMcBride ( talk) 15:31, 14 February 2009 (UTC)
SELECT *
FROM page
INNER JOIN pagelinks ON page_id = pl_from
WHERE page_namespace = 0
AND pl_namespace=0 /* optional */
/* AND page_len > 2024 to skip stubs */
AND page_title IN (SELECT rd_title
FROM redirect
INNER JOIN page ON rd_from = page_id
WHERE page_title = pl_title AND page_namespace=pl_namespace)
LIMIT 50;
This query will find links that redirect back to the originating page. This can happen because:
In all cases it is annoying for the reader to have to reload the same page. — Dispenser 17:25, 13 February 2009 (UTC)
Not sure I understand this query. Here are sample results of the query for enwiki_p with LIMIT 5.
mysql> SELECT *
-> FROM page
-> INNER JOIN pagelinks ON page_id = pl_from
-> WHERE page_namespace = 0
-> AND pl_namespace=0 /* optional */
-> /* AND page_len > 2024 to skip stubs */
-> AND page_title IN (SELECT rd_title
-> FROM redirect
-> INNER JOIN page ON rd_from = page_id
-> WHERE page_title = pl_title AND page_namespace=pl_namespace)
-> LIMIT 5;
+----------+----------------+-------------------------------------------+-------------------+--------------+------------------+-------------+----------------+----------------+-------------+----------+----------+--------------+-------------------------------------------+
| page_id | page_namespace | page_title | page_restrictions | page_counter | page_is_redirect | page_is_new | page_random | page_touched | page_latest | page_len | pl_from | pl_namespace | pl_title |
+----------+----------------+-------------------------------------------+-------------------+--------------+------------------+-------------+----------------+----------------+-------------+----------+----------+--------------+-------------------------------------------+
| 19813130 | 0 | "D"_Is_for_Dubby_-_The_Lustmord_Dub_Mixes | | 0 | 0 | 0 | 0.671963747482 | 20081214074626 | 257866114 | 1356 | 19813130 | 0 | "D"_is_for_Dubby_-_The_Lustmord_Dub_Mixes |
| 9582421 | 0 | "Them" | | 0 | 0 | 0 | 0.415525086755 | 20090211013015 | 265439711 | 4810 | 9582421 | 0 | "Them"_(King_Diamond_album) |
| 12571133 | 0 | "V"_Is_for_Vagina | | 0 | 0 | 0 | 0.364994536024 | 20090213022315 | 270356944 | 6165 | 12571133 | 0 | "V"_is_for_Vagina |
| 16079543 | 0 | "V"_Is_for_Viagra._The_Remixes | | 0 | 0 | 0 | 0.409772390435 | 20081216101637 | 258331209 | 3187 | 16079543 | 0 | "V"_is_for_Viagra._The_Remixes |
| 7630017 | 0 | "Weird_Al"_Yankovic_discography | | 0 | 0 | 0 | 0.110680981041 | 20090211013015 | 269028703 | 20424 | 7630017 | 0 | List_of_singles_by_"Weird_Al"_Yankovic |
+----------+----------------+-------------------------------------------+-------------------+--------------+------------------+-------------+----------------+----------------+-------------+----------+----------+--------------+-------------------------------------------+
5 rows in set (9.06 sec)
I don't think this is finding what you're after (self-redirects).
-- MZMcBride ( talk) 18:18, 13 February 2009 (UTC)
Oh, I think I just understood what you're talking about. This is probably not feasible for a wiki page (likely far too many results), but it can certainly be saved to a text file (either as a one-time thing or regularly). The pagelinks table is enormous, however, so it could take a very long time to run and may require LIMITs and such.... -- MZMcBride ( talk) 18:41, 13 February 2009 (UTC)
In the toolbox area on the left of the Wikipedia page on the "Special pages" page, several links don't work. How do I delete them? There is nothing on the "Unused files" page. There is nothing on the "wanted pages" page. There is nothing on the "unused templates page". I mentioned the issue on ANI and it was suggested I come here. ChildofMidnight ( talk) 07:43, 9 March 2009 (UTC)
I think we should have a template for exclusion from this report. Category:Wikipedia noindex pages, for example, should probably not be in it, even though it fits the formal criteria. עוד מישהו Od Mishehu 09:31, 9 March 2009 (UTC)
Quick note. I forgot to sync this file to the Toolserver, so when it ran just recently, it was using the old query. I'm re-running it manually now to exclude the tagged category pages. -- MZMcBride ( talk) 09:21, 21 March 2009 (UTC)
I think a monthly report of articles with direct use (i.e not using templates) of categories ending with the word "stubs" would be useful. עוד מישהו Od Mishehu 13:43, 11 March 2009 (UTC)
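Direct versus template-added category placement cannot be distinguished in categorylinks alone, but one approximation (a sketch, untested; the "-stub" template naming convention is an assumption) is to list stub-categorized articles that transclude no stub template:

```sql
-- Sketch: articles in a "... stubs" category with no stub template,
-- which usually means the category was typed into the article directly.
SELECT cl_to, p.page_title
FROM categorylinks
JOIN page AS p ON cl_from = p.page_id
WHERE cl_to LIKE '%\_stubs'
  AND p.page_namespace = 0
  AND NOT EXISTS (SELECT 1
                  FROM templatelinks
                  WHERE tl_from = p.page_id
                    AND tl_namespace = 10
                    AND tl_title LIKE '%-stub');
```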
I've switched nearly all of the scripts to wikitools from pywikipedia. Most of the changes were untested, so invariably something has broken. Let me know if you see anything and I'll fix it as soon as possible. Cheers. -- MZMcBride ( talk) 19:53, 11 March 2009 (UTC)
There are many pages with remains of test edits, a database report would be an efficient way to detect them, maybe one for articles, and another one for non-user pages (it'll probably return a lot of talk pages, in talk:, file talk:, etc). We can start with pages containing one of those elements:
"'''Bold text'''", "''Italic text''", "[[Link title]]", [http://www.example.com link title]", "== Headline text ==", "[[File:Example.jpg]]", "[[Image:Example.jpg]]",
"[[Media:Example.ogg]]", "<math>Insert formula here</math>", "<nowiki>Insert non-formatted text here</nowiki>", "#REDIRECT [[Target page name]]", "<s>Strike-through text</s>", "<sup>Superscript text</sup>", "<sub>Subscript text</sub>", "<small>Small Text</small>", "<!-- Comment -->", "Image:Example.jpg|Caption" or "<ref>Insert footnote text here</ref>" .
Thanks, Cenarium ( talk) 17:09, 25 March 2009 (UTC)
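Since the text table is not on the replicas, this would have to be a dump or API pass; a minimal check of each page's wikitext against the default edit-toolbar insertions (the fragments listed in the request above) could look like this sketch:

```python
# Sketch: detect leftover edit-toolbar boilerplate in wikitext.
# The fragments are the MediaWiki edit toolbar's default insertions,
# as listed in the request above.
BOILERPLATE = [
    "'''Bold text'''",
    "''Italic text''",
    "[[Link title]]",
    "[http://www.example.com link title]",
    "== Headline text ==",
    "[[File:Example.jpg]]",
    "[[Image:Example.jpg]]",
    "[[Media:Example.ogg]]",
    "<math>Insert formula here</math>",
    "<nowiki>Insert non-formatted text here</nowiki>",
    "#REDIRECT [[Target page name]]",
    "<s>Strike-through text</s>",
    "<sup>Superscript text</sup>",
    "<sub>Subscript text</sub>",
    "<small>Small Text</small>",
    "<!-- Comment -->",
    "<ref>Insert footnote text here</ref>",
]

def find_test_edit_remnants(wikitext):
    """Return the toolbar boilerplate fragments present in the text."""
    return [frag for frag in BOILERPLATE if frag in wikitext]
```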
^Somewhat self-explanatory. Monthly? – xeno ( talk) 18:40, 30 March 2009 (UTC)
There are subpages of Wikipedia:Articles for deletion that are created with strange content such as this or this There are also AFD noms that are not transcluded, or very late ( example). A list of subpages of AFD that are orphaned (orphan in project space if possible, to include the latter) and are not redirects would probably detect many of those and allow to cleanup any inappropriate content. If this is possible, monthly should be enough (as DumbBOT monitors recently created incomplete nominations). Cenarium ( talk) 16:28, 4 April 2009 (UTC)
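A sketch (untested) of such a report, relying on the fact that transclusions of project-space pages are recorded in templatelinks with tl_namespace = 4:

```sql
-- Sketch: AfD subpages that are not redirects and are never
-- transcluded (e.g. never listed on a daily log page).
SELECT page_title
FROM page
WHERE page_namespace = 4
  AND page_title LIKE 'Articles\_for\_deletion/%'
  AND page_is_redirect = 0
  AND NOT EXISTS (SELECT 1
                  FROM templatelinks
                  WHERE tl_namespace = 4
                    AND tl_title = page_title);
```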
I've just created this report. I don't deal with stubs, so I have no idea how useful it will be, but let me know if you think the page length check should be increased or decreased. Cheers. -- MZMcBride ( talk) 08:41, 5 April 2009 (UTC)
Hi MZMcBride, do you think it would be possible to periodically re-generate the list found at Special:MostLinkedTemplates. That list ceased to update itself a few months ago, and I found it quite useful for tracking widely used new templates so I could add them to my watchlist or check if they need protection. Since the official list isn't being regenerated at all, perhaps you could make a script replicates its functionality at Wikipedia:Database reports instead. Thanks, -- CapitalR ( talk) 02:59, 8 April 2009 (UTC)
Just wondering if a few more items could be excluded from the Wikipedia:Database reports/Empty categories report
-- WOSlinker ( talk) 13:06, 4 May 2009 (UTC)
Would it be possible to run a report on articles using "plainlinks" as span tag. Special:Search/plainlinks is probably not exhaustive. While this formatting is useful in template namespace, I don't think it should be in article namespace. -- User:Docu
Category:Templates generating hCards and Geo lists templates used by articles that should generally receive coordinates. I would be interested in knowing which of these articles don't transclude {{ coord missing}} or {{ coor URL}} yet. That means that they still need to be tagged with {{ coord missing}}. If it helps, I can clean out the /doc subtemplates first, but it shouldn't matter, they aren't used in articles. -- User:Docu
The below should work. It should render at least Roma_Street_railway_station,_Brisbane. -- User:Docu 21:45, 5 May 2009 (UTC) - updated 21:59, 5 May 2009 (UTC)
SELECT p1.page_title, p2.page_title FROM categorylinks, page As P1, templatelinks As tl1, page AS P2
WHERE cl_to = 'Templates_generating_hCards_and_Geo'
AND cl_from = P1.page_id
AND p1.page_title = tl1.tl_title
AND tl1.tl_from = P2.page_id
AND P2.page_namespace = 0
AND P2.page_id NOT IN (SELECT tl_from FROM templatelinks WHERE tl_title = 'Coor_URL' OR tl_title = 'Coord_missing')
MySQL squawked about the capitalization of "P1" versus "p1". I ran the query with LIMIT 1 and these were the results:
mysql> SELECT p1.page_title, p2.page_title FROM categorylinks, page AS p1, templatelinks AS tl1, page AS p2
-> WHERE cl_to = 'Templates_generating_hCards_and_Geo'
-> AND cl_from = p1.page_id
-> AND p1.page_title = tl1.tl_title
-> AND tl1.tl_from = p2.page_id
-> AND p2.page_namespace = 0
-> AND p2.page_id NOT IN (SELECT tl_from FROM templatelinks WHERE tl_title = 'Coord' OR tl_title = 'Coord_missing')
-> LIMIT 1;
+--------------------+-------------------------+
| page_title | page_title |
+--------------------+-------------------------+
| Infobox_Skyscraper | 1000_Connecticut_Avenue |
+--------------------+-------------------------+
1 row in set (1 min 56.61 sec)
Seems a bit slow; I may be able to optimize the query a bit. Is this the type of output you want? And do you want an individual report (updated regularly) or do you want a one-time run (in a sandbox or something)? -- MZMcBride ( talk) 22:01, 5 May 2009 (UTC)
To simplify processing the result, would you add the categories of the resulting articles as well? Can the query below still run on toolserver?
SELECT p2.page_title As article, p1.page_title As infobox, GROUP_CONCAT(cl2.cl_to) As categories
FROM categorylinks as cl1, page As p1, templatelinks As tl1, page AS p2, categorylinks as cl2
WHERE cl1.cl_to = 'Templates_generating_hCards_and_Geo'
AND cl1.cl_from = p1.page_id
AND p1.page_title = tl1.tl_title
AND tl1.tl_from = p2.page_id
AND p2.page_namespace = 0
AND p2.page_id NOT IN
(SELECT tl2.tl_from FROM templatelinks As tl2 WHERE tl2.tl_title = 'Coor_URL' OR tl2.tl_title = 'Coord_missing')
AND p2.page_id = cl2.cl_from
GROUP BY p2.page_id
I'd be glad if you'd set it up to run once a week on toolserver. -- User:Docu
Current progress:
Id: 1758577
User: mzmcbride
Host: nightshade.toolserver.org:52148
db: enwiki_p
Command: Query
Time: 15352
State: preparing
Info: SELECT p1.page_title, p2.page_title FROM categorylinks, page AS p1, templatelinks AS tl1, page AS p2
WHERE cl_to = 'Templates_generating_hCards_and_Geo'
AND cl_from = p1.page_id
AND p1.page_title = tl1.tl_title
AND tl1.tl_from = p2.page_id
AND p2.page_namespace = 0
AND p2.page_id NOT IN (SELECT tl_from FROM templatelinks WHERE tl_title = 'Coor_URL' OR tl_title = 'Coord_missing')
Over four hours and still running. Hmm. -- MZMcBride ( talk) 04:22, 6 May 2009 (UTC)
And... this is what I get for not writing my own queries. ;-) The primary problem with your query is that it's not using an index when it selects from templatelinks. If you don't specify tl_namespace = 10, it requires scanning the entire table (about 134,616,320 rows). And doing it recursively. Ouch.
The other issue is that I'm fairly sure MySQL hates OR statements. I re-wrote the query (albeit before I saw you wanted the categories as well...).
mysql> SELECT
-> p1.page_title,
-> tl_title
-> FROM templatelinks
-> JOIN page AS p1
-> ON tl_from = p1.page_id
-> JOIN (SELECT
-> cl_from,
-> p2.page_title
-> FROM categorylinks
-> JOIN page AS p2
-> ON cl_from = p2.page_id
-> WHERE cl_to = "Templates_generating_hCards_and_Geo"
-> AND p2.page_namespace = 10) AS cltmp
-> ON cltmp.page_title = tl_title
-> WHERE tl_namespace = 10
-> AND p1.page_namespace = 0
-> AND page_id NOT IN (SELECT
-> tl_from
-> FROM templatelinks
-> WHERE tl_namespace = 10
-> AND tl_title ="Coord_missing")
-> AND page_id NOT IN (SELECT
-> tl_from
-> FROM templatelinks
-> WHERE tl_namespace = 10
-> AND tl_title ="Coor_URL")
-> LIMIT 1000;
+---------------------------------------------------------------+-----------------------------------+
| page_title | tl_title |
+---------------------------------------------------------------+-----------------------------------+
| Gijón | Infobox_City_Spain |
| Jerez_de_la_Frontera | Infobox_City_Spain |
| Almagro,_Ciudad_Real | Infobox_City_Spain |
| Bárcena_de_Pie_de_Concha | Infobox_City_Spain |
[Removed results here....]
| Shire_of_Capel | Infobox_Australian_Place |
| Shire_of_Carnamah | Infobox_Australian_Place |
| Shire_of_Carnarvon | Infobox_Australian_Place |
| Shire_of_Chapman_Valley | Infobox_Australian_Place |
| Shire_of_Collie | Infobox_Australian_Place |
| Shire_of_Cranbrook | Infobox_Australian_Place |
| Shire_of_Cue | Infobox_Australian_Place |
+---------------------------------------------------------------+-----------------------------------+
1000 rows in set (24.44 sec)
Seems to be much, much faster now. Do these results look correct?
I'm not sure how feasible this is for wiki pages. There are probably thousands and thousands of results. Do you want it in a different format (text file, CSV, etc.)? Or do you want me to throw a LIMIT on it? I should have the total number of results in a few minutes. -- MZMcBride ( talk) 04:57, 6 May 2009 (UTC)
SELECT
lgtmp.page_title,
lgtmp.tl_title,
GROUP_CONCAT(ctgl.cl_to)
FROM categorylinks AS ctgl
JOIN (SELECT
p1.page_id,
p1.page_title,
tl_title
FROM templatelinks
JOIN page AS p1
ON tl_from = p1.page_id
JOIN (SELECT
cl_from,
p2.page_title
FROM categorylinks
JOIN page AS p2
ON cl_from = p2.page_id
WHERE cl_to = "Templates_generating_hCards_and_Geo"
AND p2.page_namespace = 10) AS cltmp
ON cltmp.page_title = tl_title
WHERE tl_namespace = 10
AND p1.page_namespace = 0
AND page_id NOT IN (SELECT
tl_from
FROM templatelinks
WHERE tl_namespace = 10
AND tl_title ="Coord_missing")
AND page_id NOT IN (SELECT
tl_from
FROM templatelinks
WHERE tl_namespace = 10
AND tl_title ="Coor_URL")) AS lgtmp
ON lgtmp.page_id = ctgl.cl_from
GROUP BY lgtmp.page_title;
SELECT
lgtmp.page_title,
lgtmp.tl_title,
GROUP_CONCAT(ctgl.cl_to)
FROM categorylinks AS ctgl
RIGHT JOIN (SELECT
p1.page_id,
p1.page_title,
tl_title
FROM templatelinks
JOIN page AS p1
ON tl_from = p1.page_id
JOIN (SELECT
cl_from,
p2.page_title
FROM categorylinks
JOIN page AS p2
ON cl_from = p2.page_id
WHERE cl_to = "Templates_with_coordinates_fields"
AND p2.page_namespace = 10) AS cltmp
ON cltmp.page_title = tl_title
WHERE tl_namespace = 10
AND p1.page_namespace = 0
AND page_id NOT IN (SELECT
tl_from
FROM templatelinks
WHERE tl_namespace = 10
AND tl_title ="Coord_missing")
AND page_id NOT IN (SELECT
tl_from
FROM templatelinks
WHERE tl_namespace = 10
AND tl_title ="Coor_URL")) AS lgtmp
ON lgtmp.page_id = ctgl.cl_from
GROUP BY lgtmp.page_title;
-- Updated version above. -- User:Docu
To help with the decisions at Wikipedia:Reviewers. Run as a one-off (maybe once per year in case there are any trends)? AndrewRT( Talk) 01:11, 9 May 2009 (UTC)
The user_registration field isn't really very accurate. For example, Versageek is an admin and registered a long time ago (before me, even), yet here's her info:
mysql> SELECT * FROM user WHERE user_name="Versageek";
+---------+-----------+-------------------+----------------+
| user_id | user_name | user_registration | user_editcount |
+---------+-----------+-------------------+----------------+
| 146418 | Versageek | NULL | 19408 |
+---------+-----------+-------------------+----------------+
1 row in set (0.00 sec)
There are a lot of NULL values in the field. I'll use MIN(rev_timestamp). -- MZMcBride ( talk) 20:03, 10 May 2009 (UTC)
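The fallback can be folded into one expression; a sketch (untested) of the NULL-safe version:

```sql
-- Sketch: use the earliest revision timestamp when user_registration
-- is NULL (accounts created before the field existed).
SELECT user_name,
       COALESCE(user_registration,
                (SELECT MIN(rev_timestamp)
                 FROM revision
                 WHERE rev_user = user_id)) AS registered
FROM user
WHERE user_name = 'Versageek';
```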
Yes, these blocks are unusually long, but some aren't excessive, e.g. the IP blocks of Scientology IPs per the recent Arbcom case. ⟳ausa کui × 11:31, 21 June 2009 (UTC)
Just wondering if it would also be possible to show the number of pages that the template is used on as well as the number of transclusions.
Would this SQL do it:
SELECT tl_title, COUNT(*) AS transclusion_count, COUNT(DISTINCT tl_from) AS page_count FROM templatelinks WHERE tl_namespace = 10 GROUP BY tl_title ORDER BY COUNT(*) DESC LIMIT 1000;
??? -- WOSlinker ( talk) 20:50, 30 June 2009 (UTC)
Not sure if anyone has noticed, but the protection-related reports have been broken for some time. It seems the query they were using stopped working well; any instance of it hung indefinitely. Probably a MySQL bug or something. Anyway, with this commit I've updated all six reports to now use the API to pull their data. Preliminary tests indicate that the scripts work, but not all of them have been tested. Things like missing titles may break the reports entirely; I didn't have a chance to fully stress-test.
The reports will update at the beginning of Thursday UTC. If you notice any strangeness (or a lack of updating), please shout here or on my talk page. Cheers. -- MZMcBride ( talk) 08:24, 14 July 2009 (UTC)
On July 2, the Toolserver had a power outage causing data corruption in the S1 server (which holds en.wiki's database and a copy of Commons' database). This is a documented breakage and there's currently no estimated time for a re-import of the databases. Because there is data corruption, some reports will be inaccurate. Please act accordingly. Apologies in advance for the inconvenience. -- MZMcBride ( talk) 14:36, 14 July 2009 (UTC)
This may exist already. This report used to trawl the database and report any article which contained ten or more non-free images - it looked like this. Run monthly? Black Kite 08:02, 22 July 2009 (UTC)
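A rough sketch of how such a report might be generated, assuming "non-free image" means a file whose description page sits in Category:All non-free media (the category used elsewhere on this page); the exact category and threshold would need confirming:

```sql
-- Sketch: articles embedding ten or more distinct files whose local
-- description pages are in Category:All_non-free_media.
SELECT p.page_title,
       COUNT(DISTINCT il.il_to) AS nonfree_images
FROM page AS p
JOIN imagelinks AS il
  ON il.il_from = p.page_id
JOIN page AS fp
  ON fp.page_namespace = 6
 AND fp.page_title = il.il_to
JOIN categorylinks AS cl
  ON cl.cl_from = fp.page_id
WHERE p.page_namespace = 0
  AND cl.cl_to = 'All_non-free_media'
GROUP BY p.page_title
HAVING COUNT(DISTINCT il.il_to) >= 10
ORDER BY nonfree_images DESC;
```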
I'd love to run AWB over these reports but it doesn't see the links, just templates. Please fix this or create a raw toolserver tool to use.-- Ipatrol ( talk) 19:33, 22 July 2009 (UTC)
I know that technically it's not related to this page, but could you update Wikipedia:WikiProject Stub sorting/Stub type sizes/data on a monthly basis? עוד מישהו Od Mishehu 08:25, 17 August 2009 (UTC)
Per this discussion, is it possible to have a trawl through the database to find how many stubs of less than 250 bytes have not been edited in the last twelve months? It may help the discussion to see how many there are, and to have a list so a few examples could be reviewed. If this is not the correct venue, could some kind soul advise me what is. Cheers, LessHeard vanU ( talk) 20:40, 21 August 2009 (UTC)
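For what it's worth, a sketch of the query, under two assumptions: "stub" is approximated as any non-redirect article under 250 bytes (page_len), and "not edited in twelve months" is checked against the timestamp of the latest revision:

```sql
-- Sketch: short mainspace pages whose most recent edit is over a year old.
-- rev_timestamp is stored as a YYYYMMDDHHMMSS string, hence the DATE_FORMAT.
SELECT page_title, page_len, rev_timestamp
FROM page
JOIN revision
  ON rev_id = page_latest
WHERE page_namespace = 0
  AND page_is_redirect = 0
  AND page_len < 250
  AND rev_timestamp < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 12 MONTH),
                                  '%Y%m%d%H%i%S')
LIMIT 1000;
```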
I'm not sure what the actual objective of this list was / is; if that's made more clear, perhaps I can find another way to get the data you're after. 1500-ish random pages doesn't seem to be your primary goal. -- MZMcBride ( talk) 09:00, 22 August 2009 (UTC)
I'd like a monthly report on all pages which fit the following criteria:
Query used:
SELECT
pg5.page_id, pg5.page_namespace, pg5.page_title, /* page id, namespace, and title */
pg6.page_namespace, pg6.page_title /* redirect namespace and title */
FROM page AS pg6
JOIN (SELECT
*
FROM page AS pg3
JOIN (SELECT
pl_from,
rd_namespace,
rd_title,
rd_from
FROM page AS pg1
JOIN (SELECT
pl_from,
page_id
FROM pagelinks
JOIN page AS pg4
ON pl_namespace = pg4.page_namespace
AND pl_title = pg4.page_title) AS pgl1
ON pgl1.page_id = pg1.page_id
JOIN redirect
ON rd_from = pg1.page_id) AS pg2
ON pg2.pl_from = pg3.page_id
WHERE rd_title = page_title
AND rd_namespace = page_namespace) AS pg5
ON pg5.rd_from = pg6.page_id;
Almost identical to my Self redirects request; the only problem with the queries is that the section is not included in the database, so some of these might be valid redirects to sections. — Dispenser 07:02, 1 September 2009 (UTC)
Per WP:R2D ("exceptions"), templates such as navboxes shouldn't link to redirects, as it can create self-referential linking similar to that described in the section above. A list of templates that link to redirects would be helpful for fixing these (maybe weekly or biweekly?) – Drilnoth ( T • C • L) 20:58, 31 August 2009 (UTC)
My bad, I see that this was within the scope of the last requests. Having a separate list for this would be helpful, though. – Drilnoth ( T • C • L) 21:04, 31 August 2009 (UTC)
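One way such a list might be pulled, as a sketch: join the templates' outgoing links (pagelinks) back to the page table and keep only targets flagged as redirects. Section-targeted redirects can't be distinguished here, for the same reason noted in the request above.

```sql
-- Sketch: Template-namespace pages whose wikilinks point at redirects.
SELECT tp.page_title AS template,
       rp.page_namespace AS redirect_ns,
       rp.page_title AS redirect
FROM page AS tp
JOIN pagelinks AS pl
  ON pl.pl_from = tp.page_id
JOIN page AS rp
  ON rp.page_namespace = pl.pl_namespace
 AND rp.page_title = pl.pl_title
WHERE tp.page_namespace = 10
  AND rp.page_is_redirect = 1
LIMIT 1000;
```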
If a file doesn't have a template on it, it usually doesn't have a license. Having a list of these (perhaps weekly) would greatly help in tracking down copyright violations. – Drilnoth ( T • C • L) 20:59, 31 August 2009 (UTC)
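A minimal sketch of this one, assuming "doesn't have a template on it" means the file description page has no template transclusions at all (false positives would include pages that substitute rather than transclude their license text):

```sql
-- Sketch: local, non-redirect file description pages with zero rows in
-- templatelinks, i.e. no transcluded templates of any kind.
SELECT page_title
FROM page
WHERE page_namespace = 6
  AND page_is_redirect = 0
  AND NOT EXISTS (SELECT 1
                  FROM templatelinks
                  WHERE tl_from = page_id)
LIMIT 1000;
```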
File description pages which consist only of section headings and templates, other than {{ Information}} and the various templates in Category:Non-free use rationale templates, usually lack essential information such as a source. Having a list of these (maybe biweekly?) would be helpful in tracking down copyvios. – Drilnoth ( T • C • L) 21:01, 31 August 2009 (UTC)
The lack of metadata in an image is a common sign of copyvios. Any image that is tagged as being available under one of the "self" free licenses, like {{ PD-self}}, {{ PD-user}}, {{ GFDL-self}}, or just the plain old {{ Self}}, and which has no metadata, would make a useful list for finding copyvios. Perhaps biweekly? – Drilnoth ( T • C • L) 21:03, 31 August 2009 (UTC)
SELECT
CONCAT('# [[:File:',page_title,']]')
FROM page
JOIN image
ON img_name = page_title
JOIN categorylinks
ON cl_from = page_id
WHERE cl_to = 'Self-published_work'
AND page_namespace = 6
AND page_is_redirect = 0
AND img_metadata = '0'
LIMIT 30;
+-------------------------------------------------------------------------------------------------------------------------+
| CONCAT('# [[:File:',page_title,']]') |
+-------------------------------------------------------------------------------------------------------------------------+
| # [[:File:"Bebop-rebop"_early_bop_phrase.png]] |
| # [[:File:"Riven_Rock"_wine_label_by_Alden_Marin.jpg]] |
| # [[:File:'Standing_Youth',_cast_stone_sculpture_by_Wilhelm_Lehmbruck,_1913,_Museum_of_Modern_Art_(New_York_City).jpg]] |
| # [[:File:(Az)Xeus.jpg]] |
| # [[:File:-Logo_of_the_15_helicopter_battalion_and_battalion_of_air_transport.jpg]] |
| # [[:File:-model-baxe.jpg]] |
| # [[:File:001front.jpg]] |
| # [[:File:009w.jpg]] |
| # [[:File:01201_1t.jpg]] |
| # [[:File:014w.jpg]] |
| # [[:File:016w.jpg]] |
| # [[:File:018w.jpg]] |
| # [[:File:020202.JPG]] |
| # [[:File:021w.jpg]] |
| # [[:File:02210506l.jpg]] |
| # [[:File:023_23.jpg]] |
| # [[:File:023w.jpg]] |
| # [[:File:024w.jpg]] |
| # [[:File:03_heritage_monument.JPG]] |
| # [[:File:033_33.jpg]] |
| # [[:File:035_35.jpg]] |
| # [[:File:036_36.jpg]] |
| # [[:File:037_37.jpg]] |
| # [[:File:04_Jan_2003_(11).jpg]] |
| # [[:File:048_48.jpg]] |
| # [[:File:049_49.jpg]] |
| # [[:File:05_Flag_Presentation.jpg]] |
| # [[:File:050_50.jpg]] |
| # [[:File:051206d.JPG]] |
| # [[:File:05deciraq2.gif]] |
+-------------------------------------------------------------------------------------------------------------------------+
30 rows in set (0.01 sec)
Thanks for running this. I'll take a look through them in the next couple days and see if I can refine the request. Thanks! – Drilnoth ( T • C • L) 20:00, 1 September 2009 (UTC)
That's a pretty inspired idea. Assuming I wrote this query correctly, some sample data is provided below. Helpful? Unhelpful? -- MZMcBride ( talk) 23:00, 1 September 2009 (UTC)
+--------------------------------------------------------------------------------------------------------------------------+
| CONCAT('[[',pg.page_title,']] | [[:File:',il_to,']]') |
+--------------------------------------------------------------------------------------------------------------------------+
| [[Alden_Marin]] | [[:File:"Riven_Rock"_wine_label_by_Alden_Marin.jpg]] |
| [[Chong_Chul_Rhee]] | [[:File:070616-Janggeom-Master.jpg]] |
| [[Huang_Yong_Ping]] | [[:File:100_Arme_der_Guan-yin.jpg]] |
| [[Sui_Jianguo]] | [[:File:100sj.jpg]] |
| [[Peter_Dobbie]] | [[:File:10728-225-017t.jpg]] |
| [[Brian_Chase]] | [[:File:1161-02chasemisterka14.jpg]] |
| [[Bruno_Wizard]] | [[:File:1178-06homo09.jpg]] |
| [[Lisle_Ellis]] | [[:File:1255-04_lisle13.jpg]] |
| [[Jack_Dishel]] | [[:File:1342moldy09.jpg]] |
| [[Jan_Inge_Reilstad]] | [[:File:149px-JanInge.jpg]] |
| [[Giorgio_Gomelsky]] | [[:File:1528_giorgio08_crop.jpg]] |
| [[Chris_Caffery]] | [[:File:16CC16a_CC_s.jpg]] |
| [[András_Petöcz]] | [[:File:19-gurul_500.jpg]] |
| [[Wray_Carlton]] | [[:File:1965WrayCarlton.png]] |
| [[Sonny_Black]] | [[:File:1978-m07-d00_-_Bracknell_Folk_Festival_-_Bill_Boazman.jpg]] |
| [[Dave_Concepción]] | [[:File:198009-012WillieMcCoveyLastCandlestick.gif]] |
| [[Willie_McCovey]] | [[:File:198009-012WillieMcCoveyLastCandlestick.gif]] |
| [[Fawzi_Al_Shammari]] | [[:File:19810_W400XH600.jpg]] |
| [[Maggie_Holland]] | [[:File:1983-m08-d07_-_Goodwood_Folk_Festival_-_English_Country_Blues_Band_(Maggie_Holland).jpg]] |
| [[Boris_Dragojevic]] | [[:File:1993.Leute_Moj..._47x37cm.jpg]] |
| [[George_W._Della,_Jr.]] | [[:File:1gdella.jpg]] |
| [[D._Page_Elmore]] | [[:File:1pelmore.jpg]] |
| [[Perry_McCarthy]] | [[:File:2003_pq_paddocks-perry-mccarthy.jpg]] |
| [[Sean_Gabb]] | [[:File:2003-08-sig-sweden.jpg]] |
| [[Fred_Wedlock]] | [[:File:2003-m06-d21_(11)_-_Somerset_-_Fred_Wedlock_(Allerford_M-8bd.jpg]] |
| [[Mia_Michaels]] | [[:File:2005_Mia-Michaels_face.jpg]] |
| [[Horacio_Lavandera]] | [[:File:20050731elpepiesp_1_I_SCO.jpg]] |
| [[Bob_Chiarelli]] | [[:File:2006_Chiarelli_vote.png]] |
| [[Jonathan_Roberts_(dancer)]] | [[:File:2006_Jonathan-Roberts_face.jpg]] |
| [[Alex_Munter]] | [[:File:2006_Munter_vote.png]] |
| [[Larry_O'Brien_(Canadian_politician)]] | [[:File:2006_O'Brien_vote.png]] |
| [[Anna_Trebunskaya]] | [[:File:2007_Anna-Trebunskaya_face.jpg]] |
| [[Ted_Vasin]] | [[:File:2007_ted_vasin_the_i_behind_this_i_M.jpg]] |
| [[Jaume_Plensa]] | [[:File:20070621_Crown_Fountain_Water.JPG]] |
| [[Piyaporn_Deejing]] | [[:File:2008thailand.JPG]] |
| [[Pleumjit_Thinkaow]] | [[:File:200px-PJ0@DOHA.jpg]] |
| [[Toby_Goodshank]] | [[:File:212bday01.jpg]] |
| [[Hector_Luis_Bustamante]] | [[:File:249657869_l-1-.jpg]] |
| [[Csaba_Markus]] | [[:File:250px-Csaba_markus_965.jpg]] |
| [[Tim_Ellis]] | [[:File:2645.jpg]] |
+--------------------------------------------------------------------------------------------------------------------------+
40 rows in set (5.61 sec)
The above ones gave me an idea: what about file pages that don't contain files, if that makes sense. I.e., where some user has created a File:Example.jpg page but didn't upload a file to it, so it just contains text or something. Possible? Or would it just list every file that's been moved to Commons? -- Jac16888 Talk 21:12, 31 August 2009 (UTC)
SELECT
img_name
FROM image
WHERE NOT EXISTS (SELECT
page_title
FROM page
WHERE img_name = page_title
AND page_namespace = 6)
AND NOT EXISTS (SELECT
page_title
FROM commonswiki_p.page
WHERE img_name = page_title
AND page_namespace = 6);
+--------------------------------------------------------+
| img_name |
+--------------------------------------------------------+
| WLVR_logo.jpg |
| 2005_jorge_ramos_new_3.jpg |
| Lunar_eclipse_chart_close-1944Dec29.png |
| Nas~~~~~~~~_bridgingt_101b.jpg |
| Ruben-studdard-love-is-album-cover.jpg |
| Hebb_bobby~_sunny~~~~_101b.jpg |
| Church_2.jpg |
| St._Theresa_of_Lisieux_Church_2.jpg |
| AK-74,_Modernized.png |
| Lunar_eclipse_from_moon-2053Aug29.png |
| Lunar_eclipse_from_moon-1901May03.png |
| Lunar_eclipse_from_moon-1948Oct18.png |
| Birthplace_of_the_Computer.jpg |
| Lunar_eclipse_from_moon-1908Dec07.png |
| Destination_x.jpg |
| Mike_ford.jpg |
+--------------------------------------------------------+
16 rows in set (16.36 sec)
Done: Wikipedia:Database reports/File description pages without an associated file. -- MZMcBride ( talk) 16:08, 4 September 2009 (UTC)
Hi. Having seen disruptive users who could edit semi-protected articles and other pages restricted to autoconfirmed users, thanks to edits in the talk or user namespaces, I was wondering whether requiring 10 content edits for autoconfirmation (content edits being those in the main, category, template, or portal namespaces) would be useful, or would generate too much collateral damage. It would therefore be useful to have a representative subset of users who wouldn't have been autoconfirmed if we required content edits. A one-time report of autoconfirmed users without 10 edits in content namespaces, including blocked users, would be useful. Thanks, Cenarium ( talk) 01:22, 3 September 2009 (UTC)
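A rough sketch of such a one-off, with two stated simplifications: user_editcount (total edits) is used as a stand-in for the edit half of autoconfirmation, and account age is not checked at all. The correlated subquery over revision would be expensive at en.wiki scale, so this is an illustration of the shape of the query rather than something to run as-is:

```sql
-- Sketch: users with at least 10 edits overall but fewer than 10 edits in the
-- content namespaces: main (0), Template (10), Category (14), Portal (100).
SELECT user_name, user_editcount
FROM user
WHERE user_editcount >= 10
  AND (SELECT COUNT(*)
       FROM revision
       JOIN page
         ON page_id = rev_page
       WHERE rev_user = user_id
         AND page_namespace IN (0, 10, 14, 100)) < 10
LIMIT 1000;
```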
Basically, a list of all files in Category:All non-free media which: A) Have more than one image revision (e.g., a resizing or other edit made to the image itself, not to the description page), B) do not transclude {{ Non-free reduce}}, and C) are not in Category:All orphaned non-free use Wikipedia files. The older revisions of these images usually need to be deleted per WP:CSD#F5 because they are unused non-free images. – Drilnoth ( T • C • L) 15:47, 14 September 2009 (UTC)
Query:
SELECT
page_title
FROM page AS pg1
JOIN categorylinks
ON cl_from = page_id
WHERE cl_to = 'All_non-free_media'
AND pg1.page_namespace = 6
AND EXISTS (SELECT
1
FROM oldimage
WHERE oi_name = pg1.page_title)
AND NOT EXISTS (SELECT
1
FROM categorylinks
WHERE cl_from = pg1.page_id
AND cl_to = 'All_orphaned_non-free_use_Wikipedia_files')
AND NOT EXISTS (SELECT
1
FROM templatelinks
WHERE tl_from = pg1.page_id
AND tl_namespace = 10
AND tl_title = 'Non-free_reduce');
Sample results:
Please check the sample data. It looks like there are 38,000 results total, so I'll probably limit output to the first 1,000 entries. -- MZMcBride ( talk) 02:17, 17 September 2009 (UTC)