A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
An arXiv preprint titled "Highlighting entanglement of cultures via ranking of multilingual Wikipedia articles" [1], authored by a group of physicists from France, examines Wikipedia articles about individuals and their positions in the hyperlink network of each Wikipedia language edition. Nine language editions are studied. The authors try to identify the most "important" individuals ("heroes") in each language edition by calculating link-based ranking scores: PageRank, CheiRank, and 2DRank, which combines the two. After compiling lists of the 30 highest-ranked individuals in each language edition, they investigate overlaps between the lists and introduce the notions of local and global "heroes". The lists of "global heroes" are topped by Napoleon for PageRank and Michael Jackson for 2DRank. The authors show that both local and global heroes exist: global heroes owe their central position in the network to links from multiple other central nodes, whereas local heroes are notable mostly because of the large number of links pointing directly to them. Finally, based on the nationality (language of origin) of the highly ranked individuals, a network of languages is constructed, and the position of each language in this network is analyzed by calculating rank scores. The authors also analyzed the fields of activity of these individuals and found politicians and scientists to be frequently among the most important ones.
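For readers unfamiliar with the two link-based scores named above, the following is a minimal sketch, not the preprint authors' implementation. It assumes the standard definitions: PageRank rewards pages that receive links from other highly ranked pages, and CheiRank is PageRank computed on the same network with every link reversed, so it rewards pages with many outgoing links. The toy link graph, the page names, the damping factor of 0.85 and the function names are all hypothetical choices made for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {page: [pages it links to]}."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives the same "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this page's score evenly among its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


def cheirank(links, damping=0.85, iterations=100):
    """CheiRank: PageRank of the network with every link reversed."""
    reversed_links = {}
    for source, targets in links.items():
        reversed_links.setdefault(source, [])
        for target in targets:
            reversed_links.setdefault(target, []).append(source)
    return pagerank(reversed_links, damping, iterations)


# Hypothetical toy network: "Hub" is linked to by many pages,
# while "Index" links out to many pages but receives no links.
links = {
    "A": ["Hub"],
    "B": ["Hub"],
    "C": ["Hub"],
    "Hub": ["A"],
    "Index": ["A", "B", "C", "Hub"],
}

pr = pagerank(links)
cr = cheirank(links)
top_pagerank = max(pr, key=pr.get)   # page with the most valuable incoming links
top_cheirank = max(cr, key=cr.get)   # page with the most valuable outgoing links
print(top_pagerank, top_cheirank)
```

In this toy network the heavily linked-to page leads the PageRank list and the page with the most outgoing links leads the CheiRank list, which illustrates why the two scores can single out different "heroes".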
The chapter "Wikipedia as Cultural Reference" in Richard A. Rogers' book "Digital Methods" [2] can be read as an example of "digital methods" applied to Wikipedia, or as a contribution to the emerging literature comparing the same or similar encyclopedia articles across language versions and cultures in the global Wikipedia projects. Not to be confused with "big methods", "virtual methods", etc. [3], the Digital Methods Initiative (DMI) is a school of Internet researchers at the University of Amsterdam, led by Rogers, that aims to 'create a platform to display the tools and methods to perform research that ... take advantage of "web epistemology"'. The DMI has so far built some basic Wikipedia research tools that help social scientists analyze cross-lingual images, anonymous edits, tables of contents, etc. Thus, as part of Rogers' research agenda of advocating "digital methods", the Wikipedia projects become both a data set and a set of analytical devices that can be repurposed for social research: "as a cultural reference, a vigilant community, a scandal machine and a controversy diagnostic machine" [4].
Self-defined as "cultural research with Wikipedia", the chapter compares the Srebrenica articles (The Fall of Srebrenica, the Srebrenica Massacre, and the Srebrenica Genocide) across six language versions: Dutch, English, Bosnian, Croatian, Serbian, and Serbo-Croatian. Drawing on various kinds of data, ranging from creation dates, edits by interlanguage article editors and top ten editors, and the numbers of victims to tables of contents, referenced websites and images used, the chapter finds that the principle of neutral point of view does not automatically make Wikipedia articles universal (or at least similar) across language versions. The differences, especially those specific to the wiki medium, can be used for cultural analysis of the selected topics. The resulting content is found to reflect the dynamics between power editors defending their sources and content using Wikipedia policies. Among these "umbrella articles", the English version is highly contested among many interlanguage editors, whereas the Serbo-Croatian version is much softer and more unifying in tone and has very few editors.
Adopting and extending the digital methods, two groups of participants at the DMI summer school 2013 examined cross-language-version differences on two topics: art and menstruation. The "Cross Lingual Art Spaces on Wikipedia" project (by Sangeet Kumar, Garance Coggins, Sarah Mc Monagle, Stephan Schlögl, Han-Teng Liao, Michael Stevenson, Federica Bardelli, and Anat Ben-David) sought to find the universal and specific articulations of the concept of art through (1) images and (2) concepts (i.e. strongly related articles), producing an image network visualization for 154 language versions and a concept network visualization for eight selected language versions. A Wikidata scraping tool was developed to identify different names for the same content, as part of a process called "concept reference disambiguation".
The second project, "Menstruation Across Cultures Online" (by Astrid Bigoni, Loes Bogers, Zuzana Karascakova, Emily Stacey and Sarah Mc Monagle), examined cultural differences in Wikipedia images and Google autocomplete suggestions to find associated images and search queries. In addition, the English version of the article on menstruation was compared with other English-language sources such as Urban Dictionary and Twitter, producing an interesting cross-platform comparative tag cloud. While not full research articles, the outcomes of the two projects nonetheless demonstrate potential directions for cross-cultural and cross-platform comparison, whether Wikipedia projects are compared among themselves or with other online platforms that contain user-generated content and/or activities.
A conference paper titled "Does the Acquaintance Relation Close up the Administrator Community of Polish Wikipedia?" [5] investigates why the Polish Wikipedia's community of administrators is growing more slowly than expected, as measured by a decrease in successful requests for adminship (RfAs). The paper presents a useful literature review of related academic work on RfA, and is a welcome study of the under-researched population of editors at non-English Wikipedias. It focuses on the computer science dimension, with a well-developed statistics section but little discussion of theory. In this reviewer's opinion, it would have been stronger if the authors had engaged with more social science theory, such as the iron law of oligarchy.
The authors first suggest that such a decline may occur because administrators are chosen on the basis of acquaintance, thus creating a closed group which people lacking the right connections cannot join. Later, they conclude that this is unlikely, pointing instead to growing expectations of new candidates. Both would be valid hypotheses, but neither is clearly tied to any theory or previous study. The authors' analysis of the data is problematic; at one point they contradict themselves, noting that "[One of the observed phenomena] could indicate, however, that the community is closing up after all", while their conclusion later states: "Our conclusion is that it cannot be claimed with certainty that the Polish Wikipedia community is closing up."
The authors also misunderstand how the WP:RFA process works on English Wikipedia, noting that one of the key differences between Polish and English Wikipedia is voting, as in "in the case of English version of Wikipedia, new administrators are elected not by voting, but by discussion". That the authors are ready to take such policy claims at face value does cast a little doubt on the applicability of their findings.
Overall, the paper presents some interesting statistical data on trends in an understudied community, and contributes to our understanding of the governance of Wikipedia. The analysis of the data is, however, rather lacking, particularly in its weak ties to the literature on leadership, volunteer motivation and related social science areas.
A Portuguese-language dissertation at the University of Évora, titled "Colaboração em Massa ou Amadorismo em Massa?" ("Mass collaboration or mass amateurism?") [6], compared the quality of English Wikipedia with that of Encyclopaedia Britannica. As summarized in English on the author's blog, a representative random sample of 245 article pairs from both encyclopedias was generated, and "reformatted to hide [their] source and then graded by an expert in its subject area using a five-point scale. We asked experts to concentrate only on some [...] intrinsic aspects of the articles' quality, namely accuracy and objectivity, and discard the contextual, representational and accessibility aspects. Whenever possible, the experts invited to participate in the study are University teachers, because they are used to grading students' work not using the reputation of the source." The experts rated "90% of the Wikipedia articles ... as having equivalent or better quality than their Britannica counterparts".
The annual WikiSym research conference is taking place in Hong Kong from August 5 to 7. Since June, the organizers have been featuring the abstracts of the conference's papers on the conference blog, with online publication of full texts planned for August 5. But several authors have already made their papers available elsewhere:
The fact that Wikipedia's editing community has a huge gender gap (with vastly more male than female editors contributing to the encyclopedia) was first brought to wider attention by a 2008 survey of Wikipedia readers and editors, whose results were published by UNU-MERIT and the Wikimedia Foundation in 2010. It found that only 17.8% of US-based editors were female, and 12.7% globally. As reported in the Signpost at the time, some concerns were voiced about the possible impact of participation bias on the results (an effect which is frequent in volunteer web surveys), for example because the survey had also found a gender gap among Wikipedia readers (39.9% female in the US), in contrast to other research that estimated the share of female readers at closer to 50%.
A new PLoS ONE paper titled "The Wikipedia Gender Gap Revisited: Characterizing Survey Response Bias with Propensity Score Estimation" [12] has made it possible for the first time to quantify this participation bias for the subset of US-based editors. Using a method of propensity adjustment for web surveys first published in a 2011 statistical paper, the authors compare the 2008 survey with Pew Research data from around the same time, which is assumed to be free of the same kind of bias because it was based on a different methodology (a phone survey), and which had found 49.0% of US Wikipedia readers to be female. The authors write: "We estimate that the proportion of female US adult editors was 27.5% higher than the original study reported (22.7%, versus 17.8%), and that the total proportion of female editors was 26.8% higher (16.1%, versus 12.7%)." Likewise, they find evidence that the proportion of editors who are "married, or parents, [had] been underestimated, while the proportions of immigrants and students [had] been overestimated."
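The "higher" figures in the quotation are relative increases over the originally reported shares, not differences in percentage points. The following short check, a sketch using only the four percentages quoted above (the function and variable names are hypothetical), makes the distinction explicit.

```python
def relative_increase(adjusted, reported):
    """Relative increase of `adjusted` over `reported`, in percent."""
    return (adjusted - reported) / reported * 100


# Shares of female editors in percent, as quoted from the paper.
us_reported, us_adjusted = 17.8, 22.7
total_reported, total_adjusted = 12.7, 16.1

us_relative = relative_increase(us_adjusted, us_reported)
total_relative = relative_increase(total_adjusted, total_reported)

# Differences in percentage points, for comparison.
us_points = us_adjusted - us_reported
total_points = total_adjusted - total_reported

print(round(us_relative, 1), round(us_points, 1))
print(round(total_relative, 1), round(total_points, 1))
```

The relative increases round to 27.5% and 26.8%, matching the quotation, while the corresponding absolute differences are 4.9 and 3.4 percentage points.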
The authors emphasize that their results do not negate the existence of the gender gap in general ("the basic takeaways in regards to the underrepresentation of women in the WMF/UNU-MERIT survey remain intact"), and actually call for "the Wikimedia Foundation's strategic goal to increase female editorship to 25% [...] to be raised in light of these adjusted estimates." They observe that their method is not applicable to the three subsequent editor surveys conducted by the Wikimedia Foundation in 2011/12 (the most recent one by this reviewer), because those surveys focused solely on editors, so the necessary reader comparison data (e.g. the data from Pew Research surveys) is not available. Still, the paper's results should benefit the research efforts by the Foundation and others to better understand the demographics of the Wikipedia editing community.