A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
A paper [1] presented at the International Conference on Pattern Recognition last year (earlier poster) describes an automated method to improve Wikipedia's coverage of theatre plays ("only about 10% of the plays in our dataset have corresponding Wikipedia pages"). It searches for playscripts and related documents on the web and extracts key information from them, including the play's main characters and relevant sentences from online synopses of the play. It also collects mentions in Google Books and the Google News archive in an attempt to ensure that the play satisfies Wikipedia's notability criteria. It then compiles this information into an automatically generated Wikipedia article. Two of the 15 articles submitted as a result of this method were accepted by Wikipedia editors. For the first, Chitra by Rabindranath Tagore, the initial bot-created submission underwent significant changes by other editors ("the final page reflects some of the improvements we can incorporate in our bot"). The second one, Fourteen by Alice Gerstenberg, "was moved into Wikipedia mainspace with minimal changes. All the references, quotes and paragraphs were retained".
A study of the German Wikipedia [2] examines the diversity of editor contributions across the 8 "main categories" and finds a relationship between editor diversity and article quality. The authors start by defining an editor's "interest profile": the proportion of bytes that the editor contributed to each category. They then propose an entropy measure that scores an interest profile higher the more widely it is distributed across categories, that is, the more polymath the editing style.
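As a rough illustration of how such a measure behaves, here is a minimal sketch that assumes the diversity score is plain Shannon entropy (in bits) over the interest profile. The paper's exact formula may differ, and the function names and category labels below are hypothetical placeholders, not taken from the study.

```python
import math


def interest_profile(bytes_by_category):
    """Turn raw byte counts per category into proportions that sum to 1."""
    total = sum(bytes_by_category.values())
    if total <= 0:
        raise ValueError("editor has no contributed bytes")
    return {cat: n / total for cat, n in bytes_by_category.items()}


def diversity(profile):
    """Shannon entropy in bits (an assumed stand-in for the paper's measure).

    Categories with zero share are skipped, since p * log(p) tends to 0.
    """
    return -sum(p * math.log2(p) for p in profile.values() if p > 0)


# Hypothetical placeholder labels for the 8 main categories.
CATEGORIES = [f"category_{i}" for i in range(1, 9)]

# A specialist puts every byte into a single category.
specialist = {cat: 0 for cat in CATEGORIES}
specialist["category_1"] = 8000

# A polymath spreads the same number of bytes evenly over all 8.
polymath = {cat: 1000 for cat in CATEGORIES}

specialist_score = diversity(interest_profile(specialist))
polymath_score = diversity(interest_profile(polymath))
```

Under this assumption, an editor who works in only one category scores 0, and an editor who spreads contributions evenly over all 8 categories reaches the maximum of log2(8) = 3 bits; every other profile falls in between.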
The study shows a correlation between the average diversity of contributors and the quality level of the articles they have contributed to. Article quality is determined by whether the article is a "Good Article", a "Featured Article", or neither. Total productivity, measured by bytes contributed, also appears to be linked to diversity, although this relationship narrowly misses statistical significance. Finally, a logistic regression indicates that diversity, more than productivity, is a significant predictor of article quality.
Despite many simplifications (e.g. a single language, naive article quality ratings, overly broad categories), the researchers' methods are well-defined, clear, and convincing within a limited scope, and they put a finger on the notion that our most lauded editors tend to run all over Wikipedia.
A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.
Discuss this story
Fantastic report! Lots of studies to read, thank you. :-) -- Atlasowa (talk) 14:26, 30 January 2015 (UTC)
This is an amazing report! I'll be even more polymath. Before reading this, I edited mostly insects, hopping from family to family, but now I have diversified to GA reviewing, editing a few Romania-related articles, and mostly working on animals. Gug01 (talk) 13:57, 31 January 2015 (UTC)