Lies, damned lies, and statistics. -- Benjamin Disraeli
This user page needs to be updated. The reason given is: New data for the current year should be added. Please help update this user page to reflect recent events or newly available information. Relevant discussion may be found on the talk page. (August 2023)
One constantly hears claims that Wikipedia has "enough articles" and is unlikely to grow, and those predictions are constantly proven wrong. In summer 2006, there were about 2 million articles in need of translation from non-English Wikipedias, and more than 50 million specialized topics in need of creation (I justify those numbers below). In summer 2011, Wikipedia boasted 3.5 million articles, still covering less than 10% of what would be, roughly, comprehensive coverage of the world's notable subjects. Wikipedia is just in its infancy...
One of the interesting questions about Wikipedia is "how much more information is there for Wikipedia to assimilate?"
Part of that answer, if we look at English Wikipedia, is the number of articles from non-English Wikipedias that need to be translated. I was somewhat surprised that we have no such statistics; at least I was unable to find information on how many articles on a given wiki (for example, Polish wiki) have interwiki links to a specific (English) wiki.
I checked the pages of User:YurikBot, Wikipedia:Interwikimedia link, Wikipedia:Interlanguage links (shouldn't those two be merged?), and Wikipedia:Multilingual coordination, but they don't seem to have the answer (or I can't find it :>).
Note: while the initial comparison (Polish Wikipedia, PSB) was done by me (Piotr Konieczny aka Prokonsul Piotrus talk), please don't hesitate to edit this page and add more information (from 'to do' lists or whatever seems appropriate to you). But let's discuss it at the discussion page, not here.
So I decided to run a little test: take a random sample of 100 pages from Polish Wikipedia (the 4th largest Wikipedia, with over 250,000 articles) and see how many have interwiki links to en wiki. The sample was taken by clicking the 'random page' button and noting down whether each article has an interwiki link to en wiki or not.
Results: out of 100 pages randomly selected on Polish Wikipedia, 72 had no interwiki links to en Wikipedia. (test as of 22 July 2006; Wikipedia at that time had about 1,350,000 articles.)
Notes:
Conclusion: While generalizing from Polish wiki to other wikis is not recommended, and we should do similar tests for other wikis, it appears that between 60% and 70% of articles on Polish wiki have not yet been translated to en wiki. According to the Wikipedia article, Wikipedia had more than 4,600,000 articles in many languages, including more than 1,200,000 in the English-language version, which leaves about 3,400,000 articles in other languages. Therefore it is possible that just by translating articles from non-English Wikipedias into English we would increase the size of English Wikipedia by ~2,000,000 articles.
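The arithmetic behind that ~2,000,000 figure can be sketched as follows. This is a rough illustration, not part of the original test: the Wilson score interval is my own addition to show how much uncertainty a sample of 100 carries, and the sketch assumes (as the estimate above implicitly does) that every non-English article is a distinct topic and that the Polish ratio holds for other wikis.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Figures quoted in the text above.
SAMPLE_SIZE = 100
MISSING_IN_SAMPLE = 72          # sampled pl-wiki pages with no en-wiki link
ALL_LANGUAGES = 4_600_000
ENGLISH = 1_200_000

non_english = ALL_LANGUAGES - ENGLISH

# Uncertainty of the 72/100 sample (an added check, not in the original test).
low, high = wilson_interval(MISSING_IN_SAMPLE, SAMPLE_SIZE)

# Backlog implied by the 60%-70% range used in the conclusion.
# Assumption: no topic is counted twice across non-English wikis.
backlog_low = non_english * 60 // 100
backlog_high = non_english * 70 // 100

print(f"non-English articles: {non_english:,}")
print(f"sample interval for 'missing' share: {low:.2f} to {high:.2f}")
print(f"translation backlog: {backlog_low:,} to {backlog_high:,}")
```

With a sample of only 100 pages the interval is wide, so the backlog is best read as "around two million" and not as a precise count.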
Next, I decided to run a comparison of 'how many articles from a random encyclopedic publication' are missing on Wikipedia. The publication I selected, Polski Słownik Biograficzny (an encyclopedia of famous Poles), was not completely random, but as far as I know there is no project dedicated to creating relevant stubs on en-wiki, and thanks to one of my past projects there is a nice index at User:Piotrus/List of Poles. Note also that PSB is not a general knowledge encyclopedia but a specialized knowledge encyclopedia.
Results: as of 22 July 2006, out of 1,000 selected entries from User:Piotrus/List of Poles/Kisielinski-Korzelinski, about 30 entries have blue links (I ignored entries in need of disambiguation, like the 10 entries for Konrad). Wikipedia at that time had about 1,350,000 articles.
Notes:
Conclusion (as of June '06): assuming PSB represents the average coverage of specialized knowledge on (English) Wikipedia, Wikipedia currently covers about 3% of such knowledge. If 3% = 1,350,000 articles, then 100% would equal, roughly, 40,000,000 (40 million) articles. Therefore Wikipedia will be approaching somewhat comprehensive coverage of specialized knowledge when we have about 40,000,000 articles. This is a very rough estimate, but it is my reply to some people who said there is not enough encyclopedic knowledge to merit 2,000,000 articles, as well as to the very optimistic estimates of Wikipedia:WikiProject Missing encyclopedic articles (Biographies - 92.6% done ?? who are they kidding? :D )
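The extrapolation can be written out as a short sketch. The blue-link count is given only as "about 30", so the counts of 25 and 35 below are hypothetical values I picked to show how sensitive the result is; they are not measurements. Note that the unrounded division gives 45 million, which the conclusion above rounds to roughly 40 million.

```python
# Figures quoted in the text above.
TOTAL_ENTRIES = 1000            # PSB index entries checked
BLUE_LINKS = 30                 # "about 30" entries already had articles
ARTICLES_JULY_2006 = 1_350_000  # en-wiki article count at the time

def implied_total(articles: int, covered: int, checked: int) -> int:
    """Scale the current article count up by the inverse of the coverage ratio."""
    return articles * checked // covered

coverage = BLUE_LINKS / TOTAL_ENTRIES
estimate = implied_total(ARTICLES_JULY_2006, BLUE_LINKS, TOTAL_ENTRIES)

# Hypothetical alternative counts, only to illustrate the sensitivity
# of the estimate to the imprecise "about 30".
estimate_if_25 = implied_total(ARTICLES_JULY_2006, 25, TOTAL_ENTRIES)
estimate_if_35 = implied_total(ARTICLES_JULY_2006, 35, TOTAL_ENTRIES)

print(f"coverage: {coverage:.0%}")
print(f"implied comprehensive size: {estimate:,}")
print(f"if 25 or 35 blue links: {estimate_if_25:,} or {estimate_if_35:,}")
```

A shift of five blue links moves the estimate by many millions of articles, which is why the figure should be treated as an order-of-magnitude guess.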
Preliminary analysis suggests a coverage improvement of ~1% per year, with estimated completion around the turn of the century, assuming a linear growth model...
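A minimal sketch of that linear model, assuming that "~1% per year" means one percentage point of coverage gained per year and that the starting point is the 3% coverage measured in 2006 (both are my reading of the text, not stated explicitly):

```python
def year_of_full_coverage(start_year: int,
                          start_coverage_pct: float,
                          gain_pct_points_per_year: float) -> float:
    """Year in which coverage reaches 100% under constant (linear) growth."""
    years_needed = (100 - start_coverage_pct) / gain_pct_points_per_year
    return start_year + years_needed

# Assumed inputs: 3% coverage in 2006, +1 percentage point per year.
completion = year_of_full_coverage(2006, 3, 1)
print(f"estimated completion: {completion:.0f}")
```

Under these assumptions the model lands just after 2100, which matches the "turn of the century" remark; a slower or faster rate shifts the date proportionally.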
Updated conclusion (as of February '11): It appears that Wikipedia is growing faster in some other areas than in Polish biographies. Wikipedia's coverage of Polish biographies doubled between June '06 and December '10, but its total number of articles grew almost threefold in that period (well, around 2.6 times). If we were to take the June '09 or Dec '10 numbers and try to estimate the size of a complete Wikipedia, we would get ~60 million instead of the 40 million that the June '06 data would suggest. Of course, as the growth in Polish biographies has not kept pace with the growth of Wikipedia, it is obvious that it is hardly a perfect estimator. Assuming it is some kind of an estimator, we might as well take the average of those two results and call the ultimate, comprehensive size 50 million.
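The updated figure follows from the same proportion, sketched below. The 6% coverage value is my inference from "doubled" applied to the earlier 3%; the page does not state it directly. The unrounded result is 58.5 million, which the text rounds to ~60 million.

```python
# Figures from the text; COVERAGE_2006 doubled is an inferred value.
ARTICLES_2006 = 1_350_000
COVERAGE_2006 = 0.03
ARTICLE_GROWTH = 2.6    # total articles grew about 2.6 times
COVERAGE_GROWTH = 2.0   # Polish biography coverage doubled

articles_2010 = ARTICLES_2006 * ARTICLE_GROWTH
coverage_2010 = COVERAGE_2006 * COVERAGE_GROWTH
estimate_2010 = articles_2010 / coverage_2010

# The text averages the two rounded estimates (40 and 60 million).
averaged = (40_000_000 + 60_000_000) // 2

print(f"articles around Dec '10: {articles_2010:,.0f}")
print(f"implied comprehensive size: {estimate_2010:,.0f}")
print(f"average of the rounded estimates: {averaged:,}")
```

Because the estimate is the article count divided by coverage, it rises whenever articles grow faster than the sampled coverage, which is exactly the drift from 40 to 60 million described above.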