This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Archive 1 | Archive 2 | Archive 3 |
I've just noticed that WP:5000 (or Wikipedia:5000, which behaves identically) now redirects to Wikipedia:List of Wikipedians by number of edits, but it sure used to redirect to User:West.andrew.g/Popular pages. The strange thing is that when you follow the redirect and then go back by clicking on (Redirected from Wikipedia:5000), voilà: you see the page http://en.wikipedia.org/?title=Wikipedia:5000&redirect=no which says that no, the redirect is actually still pointing to User:West.andrew.g/Popular pages. Could anyone fix this inconsistency? Or redirect the problem to a proper place? -- Kubanczyk ( talk) 19:17, 2 August 2013 (UTC)
Hi, I just had two questions:
Thanks for any answers you can provide. I used to track trends on Twitter, so I'm always curious to see what topics are drawing interest, and I'm glad I came across your page. Newjerseyliz ( talk) 14:08, 13 August 2013 (UTC)
- "for the week" should be spelled out. I wasn't clear and had to check one. Otherwise very useful, thanks. A calculated column with the daily average would be helpful too, as daily views are the typical measure we are used to seeing. Johnbod ( talk) 13:49, 8 September 2013 (UTC)
Andrew, I was thinking this table would be more useful if people could sort by article class:
That would be useful! Additionally, I was trying to figure out why some articles that do have class ratings don't have an icon displayed. -- phoebe / ( talk to me) 18:27, 26 November 2013 (UTC)
Curious about the odd character that takes the place of "ú", I looked at one of the logs [1] (large file) and found this (first column is the site, where "en" is the English Wikipedia; second column is the page title; third column is the number of requests; and fourth column is the bytes served):
en Canc%C3%BAn 75 2499038
en Canc%C3%BAn%2C_Mexico 1 30782
en Canc%C3%BAn%2C_Quintana_Roo 2 0
en Canc%C3%BAn,_Quintana_Roo 1 30782
en Canc%C3%BAn_International_Airport 7 219001
en Canc%FAn 689 0
en Canc\xC3\xBAn 4 126709
[...]
en Cancun 2 61560
en Cancun,_Mexico 3 92352
en Cancun_International_Airport 3 369228
en Cancun_Underwater_Museum 2 19440
en Cancun_airport 1 31341
The requests for Canc%FAn seem to be what make this "popular". The wiki only partly supports that encoding; putting it into a URL takes me to the intended article [2] but an attempt at making a wiki-link looks like this: [[Canc%FAn]]. Looking further in the log file, I noticed that there were no requests to other Wikipedias, or for other articles or files, with the word encoded as "Canc%FAn". Requests with the "Canc%C3%BAn" encoding had much more variation:
commons.m File:Aeropuerto_de_Canc%C3%BAn.JPG 1 11518
commons.m File:Canc%C3%BAn,_Quintana_Roo_Collage.jpg 5 79278
commons.m File:Hard_Rock_Cafe_Canc%C3%BAn.JPG 1 9196
commons.m File:Hotel_Bah%C3%ADa_Pr%C3%ADncipe-Chacumal-Estrada_federal_307_Canc%C3%BAn-Chetumal-1.jpg 1 0
de Canc%C3%BAn 3 55545
de UN-Klimakonferenz_in_Canc%C3%BAn 1 46264
en Amante_bandido_-_Miguel_Bos%C3%A9_en_Canc%C3%BAn_(acercamiento_con_binocular) 1 7153
en Aut%C3%B3dromo_de_Canc%C3%BAn 1 18823
en Canc%C3%BAn 75 2499038
en Canc%C3%BAn%2C_Mexico 1 30782
en Canc%C3%BAn%2C_Quintana_Roo 2 0
en Canc%C3%BAn,_Quintana_Roo 1 30782
en Canc%C3%BAn_International_Airport 7 219001
en Category:People_from_Canc%C3%BAn 4 32788
en File:Canc%C3%BAn%2C_Quintana_Roo_Collage.jpg 6 57540
en File:Canc%C3%BAn,_Quintana_Roo_Collage.jpg 5 47950
en Talk:Canc%C3%BAn 1 24111
es Aeropuerto_Internacional_de_Canc%C3%BAn 9 385164
es Canc%C3%BAn 51 3217033
es Estadio_Canc%C3%BAn_86 1 0
es Pioneros_de_Canc%C3%BAn 1 10636
eu Canc%C3%BAn 1 13625
fr Canc%C3%BAn 3 48714
fr Les_Marseillais_%C3%A0_Canc%C3%BAn 7 117465
hr Canc%C3%BAn 1 11223
it Aeroporto_Internazionale_di_Canc%C3%BAn 1 17328
it Canc%C3%BAn 1 17144
ko eu:Canc%C3%BAn 1 20
mr %E0%A4%9A%E0%A4%BF%E0%A4%A4%E0%A5%8D%E0%A4%B0:Sala_embarque_aeropuerto_de_Canc%C3%BAn.JPG 1 12132
pl Canc%C3%BAn 5 186749
pt Canc%C3%BAn 16 396723
tr Canc%C3%BAn 1 12754
I noticed especially that on the Spanish Wikipedia, there were 51 requests for "Canc%C3%BAn" but none for "Canc%FAn" whereas on the English Wikipedia, there were 4 requests for "Canc\xC3\xBAn" and 689 for "Canc%FAn". — rybec 22:01, 25 December 2013 (UTC)
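The two encodings seen in the log differ only in the byte sequence behind "ú": `%C3%BA` is its UTF-8 form and `%FA` is its Latin-1 (ISO-8859-1) form. A minimal sketch of how a log line could be parsed and both forms folded into one title is given below. The function names `decode_title` and `parse_line` are hypothetical, and falling back to Latin-1 when UTF-8 decoding fails is an assumption about what the requester intended, not something the logs confirm. Titles containing a literal `\xC3\xBA` are left untouched by this sketch.

```python
from urllib.parse import unquote_to_bytes


def decode_title(raw: str) -> str:
    """Decode a percent-encoded title, trying UTF-8 first, then Latin-1."""
    data = unquote_to_bytes(raw)
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: bytes that are not valid UTF-8 (such as a lone 0xFA)
        # were meant as Latin-1. Latin-1 decoding never fails.
        return data.decode("latin-1")


def parse_line(line: str):
    """Split one log line into (site, decoded title, requests, bytes served)."""
    site, title, requests, size = line.split(" ")
    return site, decode_title(title), int(requests), int(size)


if __name__ == "__main__":
    for sample in ("en Canc%C3%BAn 75 2499038", "en Canc%FAn 689 0"):
        print(parse_line(sample))
```

With this kind of folding, the 689 requests for `Canc%FAn` and the 75 for `Canc%C3%BAn` would be counted under the same title, which may or may not be desirable if the `%FA` traffic is automated.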
The Web site stats.grok.se has graphs of the traffic. For last week's list, I noticed that many of the most-requested articles about food, ecology, politics and geography had similar graphs (for Climatic Research Unit email controversy and two others, the similarity to all the others begins after a drastic increase in traffic).
— rybec 10:32, 29 December 2013 (UTC)
I deliberately exclude the climate change articles' views from my reports, because I assume they are artificially generated; the fact that they follow similar patterns would appear to support that. Serendi pod ous 11:03, 29 December 2013 (UTC)
( edit conflict) The ones that didn't match were mainly about current events or entertainment (my computer mangled some of the diacritical marks):
I think the traffic to articles in the first list is mostly automated. On 1 November, noticing a massive number of requests for Harlan Watson, I wrote the article. Several of the sources I found call the man Harlan L. Watson. Since the beginning of November, there have been over a million requests for Harlan Watson, but only 24 for Harlan L. Watson. Also striking is the fact that no one else has edited the article or its talk page. Along the same lines, I notice that:
Ratio of November 2013 to November 2012 requests

| Article (group) | Nov 2013 / Nov 2012 requests | Ratio |
|---|---|---|
| Main Page (non-eco group) | 352080385/271083976 | 1.30 |
| Meat (eco group) | 1664329/45763 | 36.4 |
| Beef (non-eco group) | 56685/94833 | 0.60 |
| Quinoa (eco group) | 792725/258832 | 3.06 |
| Food (eco group) | 659638/123008 | 5.36 |
| India (perhaps I shouldn't have included this in the first group) | 1228738/976577 | 1.26 |
| Finland (non-eco group) | 225774/182604 | 1.24 |
| Denmark (eco group) | 735967/244273 | 3.01 |
| San_Francisco (eco group) | 741764/237712 | 3.12 |
| Oakland (non-eco group) | 3702/5902 | 0.62 |
| Environmentalism (non-eco group) | 20921/27438 | 0.76 |
— rybec 04:40, 30 December 2013 (UTC)
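The year-over-year comparison above can be reproduced with a few lines of Python. This is only a sketch using a subset of the counts quoted in the comment; the dictionary name `counts` and the helper `yoy_ratio` are hypothetical, and the cutoff of 2.0 used to flag a title as a possible outlier is an arbitrary illustrative threshold, not one proposed in the discussion.

```python
# (November 2013 requests, November 2012 requests), copied from the comment.
counts = {
    "Main Page": (352080385, 271083976),
    "Meat": (1664329, 45763),
    "Beef": (56685, 94833),
    "Quinoa": (792725, 258832),
    "Food": (659638, 123008),
    "Finland": (225774, 182604),
    "Denmark": (735967, 244273),
}


def yoy_ratio(nov_2013: int, nov_2012: int) -> float:
    """Ratio of one month's requests to the same month a year earlier."""
    return nov_2013 / nov_2012


ratios = {title: yoy_ratio(*pair) for title, pair in counts.items()}

# Hypothetical threshold: flag titles that grew far faster than the Main Page.
suspicious = sorted(t for t, r in ratios.items() if r > 2.0)

if __name__ == "__main__":
    for title, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{title:12s} {ratio:6.2f}")
    print("flagged:", suspicious)
```

Using the Main Page ratio as a baseline is one way to separate site-wide growth from growth specific to a group of articles.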
FYI, the WMF statistical backend malfunctioned for nearly 35 hours over 5 and 6 January. Notice the empty (4k) hourly files in the usual location. I don't know if this is something they can recover, but if not, it will certainly have great bearing on our next WP:5000 and its comparisons to previous editions of this list. West.andrew.g ( talk) 15:30, 8 January 2014 (UTC)
@ The ed17: @ Serendipodous: @ Milowent: @ Yaris678: -- Code is currently running to spit out a 2013 statistical summary equivalent in format to WP:5000. This is no trivial task, and I expect it to take on the order of a couple days to do the massive database join. Once it is done, I am thinking it will be a valuable and fun resource. I can also spin off a couple of tables for the "biggest hours" or "biggest days" for certain events/articles. Framed with discussion this should make a nice Signpost article, and given the success of our last attempt, I'd again like to see this pushed to Reddit, Slashdot and all the other outlets we can think of. Who is on board?
In related news, I'd like to combine these statistics, our previous discussion/analysis in the Signpost, and some novel processing towards an academic publication (a conference deadline friendly to this topic is coming in late February). I'd like to invite those who I interact with regularly here to be my co-authors in that effort. While the Signpost is great for Wikipedia folks, it would be nice to reach out to the larger web research community and perhaps get others interested in the data. West.andrew.g ( talk) 18:34, 2 January 2014 (UTC)
Is the update late this week? Hope you're not too overloaded. Serendi pod ous 16:47, 5 January 2014 (UTC)
Below are the busiest "article hours" in 2013, that is, the articles receiving the most traffic in a one-hour period. Only the most popular hour for each title is shown, and I've excluded the main page. I've pasted the first 500 entries in raw form. Recall that these dates are in UTC. If someone would like to wikify and extend this table, perhaps we could try to publicize it a bit?
ARTICLE | UTC DATE | VIEWS | REASON
----------------------------------------------------------------------
[[Jorge_Bergoglio]] | March 13, 2013 | 1,460,586 | Papal ascension
[[Shakuntala_Devi]] | November 4, 2013 | 766,256 | Google Doodle
[[Paul_Walker]] | December 1, 2013 | 752,770 | Death
[[Grace_Hopper]] | December 9, 2013 | 621,694 | Google Doodle
[[Nelson_Mandela]] | December 5, 2013 | 484,966 | Death
[[Jodie_Foster]] | January 14, 2013 | 451,270 | Came out at Golden Globes
[[Beyonc%C3%A9_Knowles]] | February 4, 2013 | 378,923 | Super bowl halftime
[[Nicolaus_Copernicus]] | February 19, 2013 | 336,836 | Google Doodle
[[Seth_MacFarlane]] | February 25, 2013 | 320,999 | Hosted the Oscars
[[Daniel_Day-Lewis]] | February 25, 2013 | 318,839 | Oscars
[[Society_of_Jesus]] | March 13, 2013 | 287,568 | Papal ascension
[[Mindy_McCready]] | February 18, 2013 | 282,679 | Death
[[Hermann_Rorschach]] | November 8, 2013 | 276,072 | Google Doodle
[[Edith_Head]] | October 28, 2013 | 263,915 | Google Doodle
[[Raymond_Loewy]] | November 5, 2013 | 258,301 | Google Doodle
[[Margaret_Thatcher]] | April 8, 2013 | 252,906 | Death
[[Pope_Francis]] | March 13, 2013 | 248,753 | Papal ascension
[[Peter_Capaldi]] | August 4, 2013 | 244,667 | Announced as next Dr. Who
Thanks, West.andrew.g ( talk) 20:30, 9 January 2014 (UTC)
After many computer cycles, the list has been generated. I did the top 10k with quality annotations. Give it a while to load, as there is a ton of table processing that has to go on for that page to generate:
The top 10,000 for 2013 -- I would appreciate it if people could re-post to whatever talk pages or venues might find this interesting. Thanks, West.andrew.g ( talk) 17:02, 13 January 2014 (UTC)
I am currently working on organizing pageview stats for my own purposes, although it may prove useful to others as well if things go well. This seemed to be the best (most watched) place to get the attention of multiple "page view gurus"...
Specifically, I am looking to use the logs to analyze how the Olympics drove traffic on athlete articles. However, depending on performance, I may be inspired to expand the project to a longer range of data and make a stats service like grok/wikistats (but more focused on traffic jumps). My first dilemma is how to structure the database, specifically for scalability. My thought was table 1: id (primary key), pagename (indexed). Table 2: id, date, hour (3-column primary key), hits. Initial calculations suggest that will be fine for 1 month of data, but if extended I'm not so sure. Any advice/experiences to share?
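The two-table layout described above could be sketched in SQLite as follows. This is an illustration under stated assumptions rather than a recommendation: the table names `page` and `hourly_hits`, the column names, the use of ISO date strings, and the helper `record_hits` are all hypothetical, and SQLite is chosen only because it ships with Python. Scalability beyond a month of data is not addressed here.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS page (
    id       INTEGER PRIMARY KEY,
    pagename TEXT NOT NULL UNIQUE          -- UNIQUE also creates an index
);
CREATE TABLE IF NOT EXISTS hourly_hits (
    page_id INTEGER NOT NULL REFERENCES page(id),
    date    TEXT    NOT NULL,              -- ISO date, e.g. '2014-02-07'
    hour    INTEGER NOT NULL CHECK (hour BETWEEN 0 AND 23),
    hits    INTEGER NOT NULL,
    PRIMARY KEY (page_id, date, hour)      -- the 3-column key from the comment
);
"""


def record_hits(conn, pagename, date, hour, hits):
    """Add hits for one page and hour, creating rows as needed."""
    conn.execute("INSERT OR IGNORE INTO page (pagename) VALUES (?)", (pagename,))
    (page_id,) = conn.execute(
        "SELECT id FROM page WHERE pagename = ?", (pagename,)
    ).fetchone()
    cur = conn.execute(
        "UPDATE hourly_hits SET hits = hits + ? "
        "WHERE page_id = ? AND date = ? AND hour = ?",
        (hits, page_id, date, hour),
    )
    if cur.rowcount == 0:  # no existing row for this page and hour
        conn.execute(
            "INSERT INTO hourly_hits (page_id, date, hour, hits) VALUES (?, ?, ?, ?)",
            (page_id, date, hour, hits),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    record_hits(conn, "Example_Athlete", "2014-02-07", 18, 120)
    print(conn.execute("SELECT * FROM hourly_hits").fetchall())
```

Keeping the page name in its own table means the large hits table stores only a small integer per row, which is the main space saving this layout offers.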
Second, any thoughts about combining equivalent hits (example "First_Last" vs. "First%20Last")? Currently I "un-uri-encode" the data and combine identical titles. This makes import slower, but I think it is "correct" as the two requests should resolve to the same page. Is there any valid reason not to combine?
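A minimal sketch of the "un-uri-encode and combine" step is shown below. It relies on the fact that MediaWiki treats spaces and underscores in titles as equivalent. The function names `normalize_title` and `combine` are hypothetical, and the sketch assumes titles are UTF-8 percent-encoded; it does not handle other differences that may also resolve to the same page, such as first-letter capitalization or redirects.

```python
from collections import Counter
from urllib.parse import unquote


def normalize_title(raw: str) -> str:
    """Percent-decode a requested title and use underscores for spaces."""
    # unquote() decodes as UTF-8 and replaces undecodable bytes by default.
    return unquote(raw).replace(" ", "_")


def combine(rows):
    """Sum hit counts over rows of (raw_title, hits) after normalization."""
    totals = Counter()
    for raw, hits in rows:
        totals[normalize_title(raw)] += hits
    return totals


if __name__ == "__main__":
    rows = [("First_Last", 3), ("First%20Last", 2), ("Other", 1)]
    print(combine(rows))
```

One possible reason not to combine is diagnostic: unusual encodings can be a fingerprint of automated traffic, so keeping the raw form alongside the normalized one may be worth the extra storage.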
I will have more questions later about people's preferred way to handle several data handling choices if I decide to pursue the public stats service idea. -- ThaddeusB ( talk) 03:09, 3 March 2014 (UTC)
lots of Red Links on this page
Following a bug report at User_talk:West.andrew.g#Weird_topics_in_top_5000_list_and_Stats.Grok.Se I have discovered that this aggregation has not been handling colon characters properly. Previous code used colons as an indication that an article was outside of namespace 0 (the "main" or "article" namespace). Therefore article titles that contained colons, such as Call_of_Duty_4:_Modern_Warfare, would have been excluded from this list. That bug has now been fixed. However, this also represents a non-trivial change to the very inner loops of the aggregation routines. Please check for odd behavior at the next update, especially as it pertains to namespaces and titles with colons. Thanks, West.andrew.g ( talk) 15:04, 10 June 2014 (UTC)
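To illustrate the class of bug described above, here is a sketch of a namespace check that looks only at a known prefix before the first colon instead of treating any colon as a namespace marker. This is not the actual fix applied to the aggregation code, which is not shown in this thread. The name `is_main_namespace` is hypothetical, and the prefix list is a partial, illustrative set of English Wikipedia namespaces, not an authoritative one.

```python
# Illustrative, incomplete list of namespace prefixes (lowercased).
KNOWN_NAMESPACES = frozenset({
    "talk", "user", "user_talk", "wikipedia", "wikipedia_talk",
    "file", "file_talk", "mediawiki", "mediawiki_talk",
    "template", "template_talk", "help", "help_talk",
    "category", "category_talk", "portal", "portal_talk",
    "special", "media",
})


def is_main_namespace(title: str) -> bool:
    """Return True unless the text before the first colon is a known namespace."""
    prefix, sep, _rest = title.partition(":")
    if not sep:
        return True  # no colon at all, so it is an ordinary article title
    # Namespace prefixes are case-insensitive; spaces and underscores match.
    return prefix.replace(" ", "_").lower() not in KNOWN_NAMESPACES


if __name__ == "__main__":
    for t in ("Call_of_Duty_4:_Modern_Warfare", "Talk:Main_Page", "Paul_Walker"):
        print(t, is_main_namespace(t))
```

A check of this kind keeps titles such as Call_of_Duty_4:_Modern_Warfare in the article namespace while still excluding Talk: or Category: pages.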
I noted that the WP:5000 did not get updated on 17 Aug; is it scheduled to occur soon? Cheers.-- Milowent • has spoken 02:57, 20 August 2014 (UTC)
Do these numbers include views to the mobile versions of these pages? Thanks. Biosthmors ( talk) pls notify me (i.e. {{ U}}) while signing a reply, thx 17:07, 8 August 2014 (UTC)
A significant statistical issue has come to my attention. Quite simply, the WMF does not record/report per-article mobile views, and thus they are unavailable for my aggregation. This means the numbers I present significantly under-report the actual number of total views, as the WMF provides only the "desktop" (non-mobile) perspective.
This has been confirmed via WMF staff. They have indicated to me the processing infrastructure of the WMF is insufficient to handle the workload at this time.
Frankly, this came as a surprise to me. It is a bit perplexing to me why English "mobile" pageviews can't be included in the per-page aggregates for English "desktop" views; they are, after all, the exact same content. This very well could be an artifact of an earlier system design that was not prepared to handle mobile views. I am in no position to comment on that hypothesis.
To say our numbers (limited only to "desktop" views) are under-representing actual views is quite an understatement. The one thing the WMF does monitor in both desktop and mobile formats is project-scale view counts, as can be seen in the 2nd and 3rd graphs at [420]. Based on ~9.5B total en.wp views at the last snapshot, ~3B of which were mobile, the average per-article total under-reports by a factor of 1.38x. We might imagine this factor is even higher for entries found in WP:5000 and WP:Top25Report, whose pop-culture nature might lend itself more to mobile audiences.
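The under-reporting factor is the ratio of total views to desktop-only views. A short worked sketch follows; the function name `underreport_factor` is hypothetical. Note that with the rounded figures quoted above (~9.5B total, ~3B mobile) the ratio comes out at roughly 1.46, so the 1.38x stated in the comment presumably rests on unrounded counts or a slightly smaller mobile share. That reading is an assumption, not something confirmed in the thread.

```python
def underreport_factor(total_views: float, mobile_views: float) -> float:
    """How many times larger total views are than desktop-only views."""
    desktop_views = total_views - mobile_views
    if desktop_views <= 0:
        raise ValueError("mobile views must be smaller than total views")
    return total_views / desktop_views


# Rounded figures from the comment; the exact snapshot counts are not given.
factor = underreport_factor(9.5e9, 3.0e9)

if __name__ == "__main__":
    print(f"desktop-only counts under-report by about {factor:.2f}x")
```

The same function can be applied per project or per time window once mobile and total counts are both available.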
If/when the WMF starts reporting per-article mobile views, I'll be quick to integrate them into my reporting infrastructure. Until then? Community awareness of the issue might bring a more rapid solution. Also, should we consider designing a template (with link back to this thread) that points out this fact, and put it atop all of the prior reports? Thanks, West.andrew.g ( talk) 18:36, 4 September 2014 (UTC)