A paper to appear in a special issue of American Behavioral Scientist (summarized in the research index) sheds new light on the English Wikipedia's declining editor growth and retention trends. The paper describes how "several changes that the Wikipedia community made to manage quality and consistency in the face of a massive growth in participation have lead to a more restrictive environment for newcomers". [1] The number of active Wikipedia editors has been declining since 2007, and research examining data up to September 2009 [2] has shown that the root of the problem is the declining retention of new editors. The authors show that this decline is driven mainly by a decline among desirable, good-faith newcomers, and point to three factors contributing to the increasingly "restrictive environment" they face.
First, Wikipedia is increasingly likely to reject desirable newcomers' contributions, whether through reverts or deletions. Second, it is increasingly likely to greet them with impersonal messages; the authors cite a study showing that by mid-2008 over half of new users received their first message in a depersonalized format, usually as a warning from a bot or from an editor using a semi-automated tool [3]. They show a correlation between the growing use of various depersonalized tools for dealing with newcomers and the dropping retention of newcomers. The authors speculate that unwanted but good-faith contributions were likely handled differently in the early years of the project: unwanted changes were fixed and non-notable articles were merged. Startlingly, the authors find that a significant number of first-time editors ask about their reverted edit on the talk page of the article concerned, only to be ignored by the Wikipedians who reverted them. Specifically, editors who use vandal-fighting tools like Huggle or Twinkle are increasingly unlikely to follow the Wikipedia:Bold, revert, discuss cycle and respond to discussions about their reverts.
As a third factor, the authors note that the majority of Wikipedia rules were created before 2007 and have not changed much since. New editors thus face an environment in which they have little influence on the rules that govern their behavior and, more importantly, on the rules that govern how others should behave toward them. The authors note that this violates Ostrom's third principle for stable local common pool resource management, by effectively excluding a group that is very vulnerable to certain rules from being able to influence them.
The authors recognize that automated tools and extensive rules are needed to deal with vandalism and manage a complex project, but they caution that the customs and procedures that have evolved are not sustainable in the long term. They suggest that Wikipedia editors could copy the strategy of distributed, automated tools that have proven so effective at dealing with vandalism (e.g. Huggle and User:ClueBot NG) to build tools that help identify and support desirable newcomers (a task in which Wikipedia increasingly fails [4]). Further, they recommend that newcomers be given a voice, if only indirectly via mentors, in how rules are created and applied.
Overall, the authors present a series of very compelling arguments. This reviewer's only complaint is that (even though three of the four authors were among the Wikimedia Foundation's visiting researchers for the Summer of Research 2011) they do not discuss the fact that the Foundation and the wider community have recognized similar issues and have engaged in debates, studies, pilot programs and other efforts aimed at remedying them (see for example the WMF Editor Trends Study).
Nicolas Jullien's "What we know about Wikipedia. A review of the literature analyzing the project(s)" [5] is an attempt at a "comprehensive" literature review of academic research on Wikipedia. Jullien works to distinguish his literature review from previous attempts like those of Okoli and collaborators (cf. earlier coverage: "A systematic review of the Wikipedia literature") and of Park, which tend to split the literature into three main themes: (1) the motivations of editors to contribute and the relationship between motivation and contribution quality, (2) editorial processes and organization and their relationship to quality, and (3) the quality and reliability of production.
Jullien builds on this basic framework by Carillo and Okoli, but distinguishes his work from theirs in several ways. First, Jullien holds that previous work has focused too little on the outputs, which his analysis emphasizes more. Second, and crucially, Jullien's review is not limited to material published in journals and, as a result, is more representative of fields like computer science, HCI, and CSCW, which publish many of their most influential articles in conference proceedings. Jullien does not consider articles on how Wikipedia is used, questions of tools and their improvement, or studies that only use Wikipedia as a database (e.g., to test an algorithm). Other than this, the study is not limited to any particular field. It covers articles published in English, French and Spanish before December 2011, mostly based on searches in Web of Science and Scopus (the paper shares the search query used for the latter). The review is structured around inputs, processes, and outputs.
In terms of inputs, Jullien considers cultural factors in the broader environment and questions of why people choose to join or participate in Wikipedia. In terms of process, he considers questions about the activities and roles of contributors; the social (e.g., network) structure of both the projects and the individuals who participate; the role of teams and the organization of people within them; the processes around editing, creation, deletion, and promotion of articles, with a particular focus on conflict; and questions of management and leadership. In terms of outputs, the paper divides publications into studies of process, Wikipedia user experience, the external evaluation of Wikipedia articles, and questions of Wikipedia coverage.
A second recent preprint by Taha Yasseri and János Kertész [6] likewise gives an overview of vast areas of recent research about Wikipedia. Subtitled "Sociophysical studies of Wikipedia" and citing 114 references, it compares some of the authors' own results, e.g. on editing patterns (covered in several past issues of this research report, e.g. "Dynamics of edit wars"), with existing literature. The review focuses on quantitative data-driven analyses of Wikipedia production, reproduces and reports a series of previous analyses, and extends some of the earlier findings.
After a detailed description of how Wikipedia works, the authors walk through a series of types of quantitative analyses of patterns of editing to Wikipedia. They use "blocking" of edits to characterize good and "bad" editors and describe different editing patterns between these groups. The authors show that editors, in general, tend to edit in a "bursty" pattern with long periods of breaks and that editing tends to follow daily and weekly patterns that vary by culture. They also walk through several approaches for classifying edits by type, and discuss the characterization of linguistic features with an emphasis on readability.
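To make the notion of "bursty" editing concrete, the following is a minimal sketch of one common way to quantify it: the burstiness coefficient B = (σ − μ) / (σ + μ), computed over the gaps between consecutive edits. This is an illustration only and not necessarily the measure used by Yasseri and Kertész; the function name and the toy timestamps are assumptions made for the example.

```python
from statistics import mean, pstdev


def burstiness(timestamps):
    """Return B = (sigma - mu) / (sigma + mu) for the gaps between edits.

    B is close to -1 for perfectly regular editing, around 0 for
    Poisson-like editing, and approaches 1 for highly bursty editing.
    The measure is an assumption for illustration, not the paper's method.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mu = mean(gaps)
    sigma = pstdev(gaps)
    if mu + sigma == 0:
        raise ValueError("all timestamps are identical")
    return (sigma - mu) / (sigma + mu)


# Hypothetical edit times (in minutes) for two made-up editors.
regular_editor = [0, 10, 20, 30, 40, 50]
bursty_editor = [0, 1, 2, 3, 1000, 1001, 1002, 2000]

print(burstiness(regular_editor))      # evenly spaced edits
print(burstiness(bursty_editor) > 0)   # short bursts, long breaks
```

In this toy data the evenly spaced editor scores at the regular end of the scale, while the editor who works in short bursts separated by long breaks scores above zero.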
Much of their article is focused on the issue of conflicts and edit warring. The authors pay particular attention both to the identification of conflicts and of controversial articles and topics and to characterizing the nature of edit warring itself. The paper ends with the description of an agent-based model of edit warring and conflict.
The International Symposium on Wikis and Open Collaboration, "WikiSym 2012", was held August 27–29 in Linz, Austria. The three-day conference featured research papers, posters and demonstrations, and open space discussion sessions. About 80 researchers and wiki experts from around the world attended.
WikiSym is an academic conference, now in its eighth year, that seeks to highlight research on wikis and open collaboration systems. This year’s WikiSym had a strong focus on Wikipedia research, with studies that ranged from analyzing breaking news articles on Wikipedia to looking at the behavior of Wikipedia editors and how long they stay active. In all, 17 papers focused on Wikipedia or MediaWiki, and the two keynotes also focused on Wikipedia research.
The first keynote session was given by Jimmy Wales, who discussed challenges for Wikipedia and potential research questions that matter to the Wikimedia community [2] [3]; Wales focused particularly on questions around the diversity of the editing body, how to grow small language communities, and how to retain editors. The closing keynote was given by Brent Hecht, a researcher from Northwestern University, who spoke on techniques for making multilingual comparisons of content across Wikipedia versions, which in turn allow researchers to identify the potential cultural biases of various Wikipedia editions. Hecht found, for instance, that (looking at interwiki links across 25 languages) the majority of Wikipedia article topics appear in only one language; that the overlap between major language editions is relatively small; and that the depth of geographical representation varies widely by language, with a bias towards representing the country or place where that edition's language is prominent. Hecht also compared articles on the same topic across Wikipedias to see the degree of similarity between them. Hecht described his work as "hyperlingual", developing techniques to gain a broader perspective on Wikipedia by looking across language editions. His content comparison tool can be seen at the Omnipedia site, and the WikAPIdia API software he developed is available for download. (See also earlier coverage about Omnipedia: "Navigating conceptual maps of Wikipedia language editions".)
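As a rough illustration of the kind of cross-language comparison described above, the sketch below takes a mapping from concepts to the set of language editions that cover them and computes the share of concepts found in only one edition, plus the overlap between two editions. It is not Hecht's method or the WikAPIdia API; the function names and the toy data are hypothetical, and a real analysis would derive the mapping from interwiki links.

```python
from collections import Counter


def single_language_share(concept_languages):
    """Fraction of concepts that appear in exactly one language edition."""
    counts = Counter(len(langs) for langs in concept_languages.values())
    total = sum(counts.values())
    return counts[1] / total if total else 0.0


def jaccard_overlap(concept_languages, lang_a, lang_b):
    """Shared concepts divided by all concepts covered by either edition."""
    in_a = {c for c, langs in concept_languages.items() if lang_a in langs}
    in_b = {c for c, langs in concept_languages.items() if lang_b in langs}
    union = in_a | in_b
    return len(in_a & in_b) / len(union) if union else 0.0


# Hypothetical toy data: concept -> editions that have an article on it.
toy = {
    "concept_a": {"en", "de"},
    "concept_b": {"en"},
    "concept_c": {"de"},
    "concept_d": {"fr"},
    "concept_e": {"en", "de", "fr"},
}

print(single_language_share(toy))        # 0.6
print(jaccard_overlap(toy, "en", "de"))  # 0.5
```

Jaccard overlap is only one possible way to express "overlap between editions"; it was chosen here because it is symmetric and does not depend on which edition is larger.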
In addition to the presented papers, some of which are profiled below, WikiSym has a strong tradition of hosting open space sessions in parallel with the main presentations, so that attendees can discuss topics of interest. This year’s open space topics included helping new wiki users; non-text content in wikis (including videos, images, annotations, slideshows and slidecasting); the future of WikiSym; Wikipedia bots; surveying Wikipedia editors; and realtime wiki synchronization and multilingual synchronization feedback. The conference closed with a panel session entitled "What Aren't We Measuring?", where panelists discussed and debated various methods for quantifying wiki-work (by studying editors, edits, and other metrics).
This year's WikiSym was hosted at the Ars Electronica Center in Linz, a "museum of the future" that hosts the Ars Electronica festival every year. The colorful, dramatic Ars Electronica building is in the heart of Linz, so outside of sessions, conference attendees enjoyed exploring and socializing in the city center. The conference dinner was held at the Pöstlingberg Schlössl, which is reached by one of the steepest mountain trams in the world.
WikiSym 2012 papers and poster and demonstration abstracts may be downloaded from the conference website. Next year’s WikiSym is planned for Hong Kong, just before Wikimania 2013. Updates on the schedule and important dates can be found on the WikiSym blog.
On the "Ethnography Matters" blog, participant Heather Ford looked back at the conference, [7] stating that "WikiSym is dominated by big data quantitative analyses of English Wikipedia", asking "where does ethnography belong?" and counting 82% of the Wikipedia-related papers as examining the English Wikipedia and only 18% as examining other language Wikipedias. A panel at WikiSym 2011 had called for research to be broadened to other languages (see last year's coverage: "Wiki research beyond the English Wikipedia at WikiSym").
The conference papers and posters included (apart from several that have been covered in earlier issues of this report):
First Monday, the veteran open access journal about Internet topics, featured three Wikipedia-themed papers in its September issue: