A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
This paper [1] provides evidence that the quality of an article is not a simple function of its popularity, or, in the words of the authors, that there is "extensive misalignment between production and consumption" in peer communities such as Wikipedia. As the authors note, reader demand for some topics (e.g. LGBT topics or pages about countries) is poorly satisfied, whereas there is an overabundance of quality on topics of comparatively little interest, such as military history.
Rank | Popular and underdeveloped topics | High-quality, not popular topics |
---|---|---|
1 | Countries | Cricket |
2 | Pop music | Tropical cyclones |
3 | Internet | Middle Ages |
4 | Comedy | Politics |
5 | Technology | Fungi |
6 | Religion | Birds |
7 | Science fiction | Military history |
8 | Rock music | Ships |
9 | Psychology | England |
10 | LGBT studies | Australia |
The authors arrived at this conclusion by comparing page view data for articles on the English, French, Russian, and Portuguese Wikipedias with the articles' quality ratings under Wikipedia:Assessment (and its equivalents on the other language editions). They note that quality and popularity are well aligned for at most 10% of Wikipedia articles; in turn, over 50% of high-quality articles concern topics in relatively little demand (as measured by page views). They estimate that about half of the page views on Wikipedia – billions each month – go to articles that would be of better quality if popularity translated directly into quality. The authors identify 4,135 articles that are of high interest but poor quality, and suggest that the Wikipedia community may want to focus on improving them.
Specific examples at this extreme include Start-class articles with high view counts, such as wedding (1k views per day) or cisgender (2.5k views per day). For examples of high-quality topics with little readership, one need only glance at a random entry in Wikipedia:Featured articles; the authors use the example of the 10 Featured Articles about members of the Australian cricket team in England in 1948 (itself a Good Article; 30 views per day). Interestingly, based on their study of WikiProjects, popularity and quality, the authors find that, contrary to some popular claims, pop culture topics are also among those that are underdeveloped. The authors also note that even within WikiProjects, labor is not efficiently organized: for example, within military history there are numerous featured articles about individual naval ships, while broader topics of more popular interest, such as NATO, receive less attention.
In conclusion, the authors encourage the Wikipedia community to focus on such topics, and to recruit participants for improvement drives using tools such as User:SuggestBot.
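To make the comparison of page views and quality ratings concrete, here is a minimal Python sketch of one way to score misalignment between the two. It is an illustration under stated assumptions, not the authors' actual procedure: the class ordering in `QUALITY_ORDER`, the rank-matching rule, the function names, and the "Hypothetical stub" entry are all assumptions made for this example, and the view counts are simply the approximate daily figures quoted above.

```python
from collections import Counter

# Assumed ordering of assessment classes, lowest to highest (illustrative only).
QUALITY_ORDER = ["Stub", "Start", "C", "B", "GA", "FA"]

# (title, assessed quality class, approximate daily page views).
# The first three rows use figures quoted in the review; the last is hypothetical.
SAMPLE = [
    ("Wedding", "Start", 1000),
    ("Cisgender", "Start", 2500),
    ("Australian cricket team in England in 1948", "GA", 30),
    ("Hypothetical stub", "Stub", 5),
]


def expected_classes(articles):
    """Return the class each article would hold if quality tracked views exactly.

    The existing supply of quality classes is kept fixed and redistributed:
    the least-viewed articles receive the lowest classes, the most-viewed
    articles receive the highest ones.
    """
    counts = Counter(quality for _, quality, _ in articles)
    ranked = sorted(articles, key=lambda a: a[2])  # ascending by views
    expected = {}
    position = 0
    for cls in QUALITY_ORDER:
        for _ in range(counts.get(cls, 0)):
            expected[ranked[position][0]] = cls
            position += 1
    return expected


def misalignment(articles):
    """Score each article as (actual class level) - (expected class level).

    Negative scores mean quality lags behind popularity; positive scores
    mean quality exceeds what popularity alone would predict.
    """
    level = {cls: n for n, cls in enumerate(QUALITY_ORDER)}
    expected = expected_classes(articles)
    return {
        title: level[quality] - level[expected[title]]
        for title, quality, _ in articles
    }


if __name__ == "__main__":
    for title, score in sorted(misalignment(SAMPLE).items(), key=lambda kv: kv[1]):
        print(f"{score:+d}  {title}")
```

In this toy data set the heavily viewed Start-class article receives a negative score (popular but underdeveloped) and the lightly viewed Good Article a positive one. With only four rows the numbers mean nothing in themselves; the sketch only shows the shape of the comparison.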
Paul J. Heald and his coauthors at the University of Glasgow continued their extremely valuable studies of the public domain, publishing "The Valuation of Unprotected Works". [2] The study finds that "massive social harm was done by the most recent copyright term extension that has prevented millions of works from falling into the public domain since 1998" which "provides strong justification for the enactment of orphan works legislation."
In recent years, authorities have started acknowledging possible errors in past copyright legislation, which an evidence-based approach would have prevented. Heald mentions the Hargreaves Report (2011), endorsed by the UK's IP office, but other examples can be found in World Intellectual Property Organization reports. This awakening follows work by researchers and think tanks to demonstrate the importance of the public domain and certain harms caused by copyright. [supp 1]
The importance of evidence-based legislation can't be overstated, especially in the current process of EU copyright revision.
As Heald notes, past copyright policy has relied on a number of incorrect assumptions, in short:
Recent studies, some of which are mentioned in this paper (Pollock, Waldfogel, Heald), have instead found strong indicators that:
In short, it seems that "the public is better off when a work becomes freely available", insofar as copyright has been "robust enough to stimulate the creation of the work in the first place" and that a work "must remain available to the public after it falls into the public domain".
However, it is impossible to measure the value of knowledge acquired by society and, even considering only monetary value, it is impossible to measure transactions which did not happen. The authors use the English Wikipedia as a dataset because its history is open to inspection and its content is unencumbered by copyright payments, so every "transaction" is public.
In particular, the study measures what it would cost if gratis images were not available for use in English Wikipedia articles, as a proxy for (i) the consumer surplus generated by those images, (ii) their private value, and (iii) their contribution to social welfare. If a positive value is found, this shows that more restrictive copyright would be harmful, and we can reasonably infer that reducing copyright restrictions would make society richer.
The calculation is done in three steps.
Clearly, the number of inferences is large, but the authors believe the findings to be robust. The pageview increase, depending on the method, was 6%, 17% or 19%, and in every case positive. The authors with the most images were those who died before 1880, an outcome which has no possible technological reason nor any welfare justification: it is clearly a distortion produced by copyright.
For those fond of price tags, the English Wikipedia images were estimated to be worth about $30,000/year for those 362 writers, or about $30m in hypothetical advertising revenue for English Wikipedia, or $200m–230m in hypothetical costs of image purchase.
At any rate, this reviewer thinks that the positive impact of the lack of copyright royalties is proven and confirms the authors' thesis. It is quite challenging to extend the finding to the whole English Wikipedia, all Wikimedia projects, the entire free knowledge landscape and finally the overall cultural works market; putting a price tag on it is even more fragile. However, this kind of one-number communication device is widely used to explain the impact of legislation, and the numbers traditionally used by legislators are far more fragile than this. Moreover, the study makes it possible to demonstrate a positive impact on important literary authors and their lives, i.e. their reputation, which is supposed to be the aim of copyright laws, while financial transactions are only a means.
There are several possible observations to be made about details of the study.
A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.
Discuss this story
I haven't read the first paper yet, but I think two factors might explain some of it. Perhaps editors feel that it is easier, more manageable, and less intimidating to tackle a smaller-scale subject, such as the SS Minnow, as opposed to an article covering an extremely broad topic like the entire US Navy. Also, perhaps editors mistakenly assume that these broad topics are already well covered by the encyclopedia. Gamaliel (talk) 02:47, 1 May 2015 (UTC)
So, Comedy and Science Fiction topics are underdeveloped, while Politics and Birds are High-quality ... and this is a problem? Curly Turkey ¡gobble! 04:36, 1 May 2015 (UTC)
One of those cricket FAs in the topic mentioned is Donald Bradman. That isn't so very unpopular - it typically gets 500-1000 views a day, which ain't bad for an article about a sportsman who retired 50 years ago. -- Dweller (talk) 09:23, 5 May 2015 (UTC)
Hi everyone, and apologies for being late to the party! In case you don't know, I'm the first author on the paper about popularity/quality. Thank you all for a very interesting discussion, I have jotted down notes from it once already, and will re-read it and write down more notes. The links to previous discussions along these lines are also very helpful, although I haven't yet had the time to read all of them (some of them are quite large). Let me comment on a few specific things, before I go dish out actual thanks to everyone. I'll be adding this talk page to my watchlist in case there are follow-up questions, and I welcome questions or comments on my talk page as well, of course, and I can be emailed if you want to reach me off-wiki.
Gamaliel brings up an important point with regards to why these general subjects don't have FAs (size of the topic), and Karanacs' work on Texas Revolution is a good example (massive kudos for that effort!) We think along the same lines in the paper, although perhaps not as clearly. Figuring out why something occurs was outside the scope of this paper (it's analytical, we try to describe what the world looks like, so to speak), but as I continue my research I am interested in building tools to support contributors who are interested in working on these types of articles, and then those types of issues are of course very important.
Maury Markowitz and Curly Turkey mentioned the long tail, and Jack mentioned contributors choosing from self-interest. The latter is part of our motivation for studying this and something we point to several times in the paper; we wanted to know more about how that type of work selection affects systems like Wikipedia. When it comes to the long tail, it's typically not a "problem" in the popularity context. In all four languages we studied the majority of articles are stub/start quality and they do not get a lot of views, so there is no issue there. It's also clear that because Wikipedia's contributors are volunteers, they're free to leave, and therefore a central decision process on what to work on is unlikely to happen (we discuss this in the paper). Yet, I'm thinking that it would be great if we could figure out a way to serve high-quality content to a larger portion of Wikipedia's audience, which as Karanacs pointed out doesn't mean I'd want to decrease other parts.
Lastly, a technical detail: cricket is, as Dweller and Jack point out, not an unpopular topic. In our paper we were interested in understanding what topics are in the two extremes: highly-popular non-FAs, and FAs that aren't particularly popular. In the latter group, the relative risk of encountering an article from WikiProject Cricket is very high, which is why that project made our list. In other words, we didn't try to define the entirety of topics as popular/not-popular, we instead looked at specific subsets of articles to understand more about them.
Thanks again for the comments, everyone, and please do ask if you have questions! Regards, Nettrom (talk) 22:33, 5 May 2015 (UTC)