All feedback appreciated. Smallbones (smalltalk) 17:55, 24 February 2016 (UTC)
User:Smallbones/1000 random has, dare I say it, 1000 articles. Yet User:Smallbones/1000 random results discusses a sample size of 1001. Where did the mysterious 1001st article go (or come from)? -- GRuban (talk) 19:37, 24 February 2016 (UTC)
A few days later, while still gathering data, I found that another article had been merged (essentially out of existence).
Merged article started 28 November 2001, 22835 pvs.
This was a bit more difficult: there seem to have been two parallel articles. Which article history should be used? Since I was not clear on what to do and had two "extra" articles, the natural thing seemed to be to just delete it.
I'd guess these types of fuzzy choices, based on unexpected situations, are pretty common at the start of any data set. The only real question is whether the choices biased the data set somehow. I can't see what the bias would be, and in any case it would be very small (0.1%), and the choices were made in good faith. Smallbones (smalltalk) 02:03, 25 February 2016 (UTC)
Presumably some of the 1001 had content removed as well as added? So an article that begins with 2,000 bytes, then has 250 bytes added, 50 bytes removed, would then be 2,200 bytes.
Was there any calculation done to see if this was more or less likely in women's bios than in men's bios? And what would the end figures for BLPs have been if no content had been deleted? -- The Vintage Feminist (talk) 12:08, 23 March 2016 (UTC)
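To make the arithmetic in the question above concrete, here is a minimal sketch in Python. It only restates the hypothetical example from the comment (2,000 bytes to start, 250 added, 50 removed); the function names are made up for illustration and are not part of the actual analysis behind the 1000 random results.

```python
def net_size(start_bytes, added, removed):
    """Final article size once both additions and removals are counted."""
    return start_bytes + added - removed


def size_without_deletions(start_bytes, added):
    """Hypothetical size if no content had ever been removed."""
    return start_bytes + added


# Figures taken from the example in the comment, not from the data set.
start, added, removed = 2000, 250, 50

final = net_size(start, added, removed)
no_delete = size_without_deletions(start, added)

print(final)      # 2200
print(no_delete)  # 2250
```

Comparing the two figures per article, split by men's and women's bios, would be one way to approach the question, assuming the added and removed byte counts are available for each article.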
Hi Smallbones, my name is Shira Klein. I'm a historian at Chapman University (Orange County, CA), and I'm currently working on a project related to Wikipedia's impact on society. I came across this tree map you created a while ago: https://commons.wikimedia.org/wiki/File:Size_of_English_Wikipedia_(1000_vol).svg I am trying to find a breakdown of Wikipedia by topic for the last year or two, or as recently as possible. Is this (/info/en/?search=User:Smallbones/1000_random) the latest data you found? Do you know if others have carried out such a survey more recently? I thought perhaps there would be a wiki page providing this kind of information, but my Google searches haven't yielded anything. I also haven't found anything using Google Scholar or other scholarly search engines. Many thanks in advance for any ideas you might have! Shira (feel free to email me also, at sklein at chapman.edu) -- Chapmansh (talk) 18:27, 26 February 2022 (UTC)