A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
A short paper presented at the Joint Conference on Digital Libraries titled "Quality Assessment of Wikipedia Articles Without Feature Engineering" [1] uses deep learning to predict the quality of articles in the English Wikipedia. As the paper's title suggests, previous research on article quality has relied on a specific set of features to represent the articles, whereas the promise of deep learning is that the machine learner will determine the best representation on its own.
Some representation of the articles still needs to be chosen, and the paper uses "Doc2Vec", an extension of Word2vec that uses unsupervised machine learning to learn vector representations of the articles. A benefit of this approach is that it is language-neutral, whereas other approaches might rely on language-specific features. These vectors are learned from a training set based on the Wikimedia Foundation's dataset of 30,000 English articles. A deep neural network built with Google's TensorFlow library is then trained on these vectors to predict which of the English Wikipedia's assessment classes an article belongs to.
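The overall pipeline, fixed-length document vectors fed to a classifier, can be sketched as follows. This is only an illustrative stand-in: the paper learns its vectors with Doc2Vec and classifies with a TensorFlow neural network, while the toy code below substitutes a hashed bag-of-words vector and a nearest-centroid rule so it runs with the standard library alone, and all texts and labels are invented.

```python
import zlib

# Stand-in for the paper's pipeline: map each article to a fixed-length
# vector, then classify it into an assessment class. The real work uses
# Doc2Vec vectors and a TensorFlow neural network; here a hashed
# bag-of-words vector and a nearest-centroid rule keep the sketch
# dependency-free. All texts and labels below are invented.

def doc_vector(text, dim=64):
    """Hash words into a fixed-length, L2-normalized count vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def train_centroids(examples):
    """Average the vectors of each quality class (the 'training' step)."""
    by_class = {}
    for text, label in examples:
        by_class.setdefault(label, []).append(doc_vector(text))
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in by_class.items()}

def predict(text, centroids):
    """Assign the class whose centroid is closest (highest dot product)."""
    vec = doc_vector(text)
    return max(centroids,
               key=lambda lab: sum(a * b for a, b in zip(vec, centroids[lab])))

# Toy corpus standing in for the 30,000-article training set
train = [
    ("short stub text", "Stub"),
    ("tiny unreferenced stub", "Stub"),
    ("long detailed comprehensive article with many references", "FA"),
    ("comprehensive well referenced detailed article", "FA"),
]
centroids = train_centroids(train)
print(predict("a detailed comprehensive referenced article", centroids))
```

The appeal of the learned-representation approach is that the vector step (here a crude hash) is replaced by unsupervised training, so no one has to hand-pick quality signals for each language edition.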
The performance of the classifier is compared to the current state of the art, which at the time of writing is the WMF's own Objective Revision Evaluation Service (ORES) (disclaimer: the reviewer is the primary author of the research upon which ORES' article quality classifier is built). Since the number of articles in each class is fairly balanced, the proportion of correctly classified instances (accuracy) is used as the performance measure. ORES is reported to be 60% accurate (it currently reports 61.9% accuracy), and the deep neural network was found to be 55% accurate. As pointed out in the paper, this work is a first step towards using deep learning for this task, meaning that slightly lower performance is acceptable. The authors describe a couple of changes that will most likely improve the classifier and aim to implement them in future work. Deep learning is an area where interesting things are happening, and if it can improve our ability to automatically assess Wikipedia articles, a capability that already benefits many Wikipedians through services like WikiProject X and SuggestBot, that is only for the better!
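For readers unfamiliar with the metric: accuracy is simply the fraction of articles whose predicted assessment class matches the true class, which is a reasonable yardstick when the classes are balanced. A minimal illustration, with labels invented for the example rather than taken from the paper:

```python
# Accuracy: the fraction of articles whose predicted assessment class
# matches the true class. These labels are invented for illustration.
true_labels      = ["FA", "GA", "B", "C", "Start", "Stub", "B", "GA"]
predicted_labels = ["FA", "GA", "C", "C", "Start", "Stub", "B", "FA"]

correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
accuracy = correct / len(true_labels)
print(accuracy)  # 6 of 8 predictions match, so accuracy is 0.75
```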
Dr. Tsung-Ho Liang (梁宗賀) [supp 1] is a systems analyst in the information center at the Tainan City Government's Bureau of Education. He currently studies big data in education, especially dealing with unstructured data and natural language processing techniques. In 2013, he started a project to integrate the contents of Chinese Wikipedia with the Chinese Knowledge and Information Processing (CKIP) technology and established a new search engine for Chinese Wikipedia, [supp 2] – WikiSeeker (維基嬉客).
WikiSeeker is a tailor-made search system built on the Wikipedia corpus that aims to improve search effectiveness by providing structured association graphs of related Wikipedia articles for students' queries in Chinese. First, it produces a knowledge map that makes the relationships within each field of knowledge clear, so students can easily identify the most important keywords in the content. Second, WikiSeeker's search bar accepts natural-language queries instead of requiring keywords. A tour of WikiSeeker is available on YouTube.
The above two features make WikiSeeker intuitive and easy to use for K-12 students. According to the research essay "WikiSeeker─The Study of the Impact of a Search System with Structured Association Graphs on Learning Effectiveness" [2] by the researcher Sheng-Nan Cheng (鄭盛南), the study used two experimental groups: one asked students to use Chinese Wikipedia directly to answer questions, while the other asked students to use the WikiSeeker website to answer the same questions. The results showed that the students who used WikiSeeker answered 10.8% more questions correctly (on average 15.8 out of 19 questions, compared to 13.73 out of 19 for the group using Wikipedia directly). Moreover, it was found that girls and middle-achieving students showed the greatest learning improvement when using WikiSeeker. The conclusion suggests that WikiSeeker is suitable for students acquiring knowledge from Chinese Wikipedia.
Sentiment analysis, the automated extraction of subjective information expressed in text, has been applied to Wikipedia research in several recent papers.
Four researchers from Stanford University analyzed [3] all (non-neutral) votes in the English Wikipedia's request for adminship process cast from its inception in 2003 until 2013. These form a directed, signed graph with around 11,000 nodes (users) and 160,000 edges (votes). They removed the actual vote text ("support" and "oppose") and tried to reconstruct the vote by applying sentiment analysis to the remaining comment text (where e.g. "I’ve no concerns, will make an excellent addition to the admin corps" indicates a positive vote). The performance of the resulting prediction model is described as "remarkably high, [...] as a consequence of the highly indicative, sometimes even formulaic, language used in the comments". It performed much better than a model trying to predict votes based on network characteristics alone (patterns of other support/oppose votes, using e.g. ideas from balance theory like "an enemy of my enemy is my friend").
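The intuition that RfA comments use "highly indicative, sometimes even formulaic" language can be illustrated with a toy lexicon-based scorer. To be clear, the paper's actual sentiment model is far more sophisticated, and the cue-phrase lists below are invented for the sketch:

```python
# Toy vote reconstruction from RfA comment text alone. The phrase lists
# are invented; the paper uses a proper sentiment-analysis model.
POSITIVE = {"excellent", "trustworthy", "no concerns", "great", "strong candidate"}
NEGATIVE = {"concerns", "premature", "lacks", "not ready", "too soon"}

def predict_vote(comment):
    """Count positive and negative cue phrases; ties go to 'support'."""
    text = comment.lower()
    pos = sum(phrase in text for phrase in POSITIVE)
    neg = sum(phrase in text for phrase in NEGATIVE)
    return "support" if pos >= neg else "oppose"

print(predict_vote("I've no concerns, will make an excellent addition "
                   "to the admin corps"))  # prints "support"
```

Even this crude approach hints at why the task is tractable; the paper's finding is that text-based prediction comfortably beats purely network-based prediction.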
Is the editing frequency of Wikipedians influenced by negative or positive comments they receive on their user talk pages?
A student course project at the same university [4] tried to examine this question by analyzing the user talk pages of all users (around 620,000) who signed up in 2013 and made at least one article edit on the English Wikipedia, together with "thanks" messages received via the new software feature introduced during that year. They related this data to the number of article edits per week. The authors report that "while we found some predictive value for future behavior in the sentimental content of messages received by Wikipedia editors, we do not have evidence to establish a causal relationship between these variables... we were able to detect macro-level patterns of behavior that appear to discredit the hypothesis that the sentimental content of user talk pages is a main driver of user churn on Wikipedia". As a limitation of their application of sentiment analysis in this situation, they note that "Most messages exchanged through user talk pages are not sentimentally-loaded, but rather talk about the Wikipedia guidelines and policies in a neutral manner", calling for the use of more sophisticated natural language processing techniques.
These results are somewhat in contrast to those of a paper titled "The Impact of Sentiment-driven Feedback on Knowledge Reuse in Online Communities", [5] which investigated "whether affective communication [...] in form of sentiment-driven feedback in discussions between Wikipedia editors motivates collaborative work", by analyzing a complete history dump of the Simple English Wikipedia (until 2011). The researchers focus on the "knowledge reuse" aspect of this collaborative work, quantified for "any two consecutive revisions of the same article page as the ratio of the number of words reused from the previous revision (e.g., copied, moved elsewhere, or restored) to the number of words newly created in the current revision." By relating the positivity or negativity of article talk page comments to editing activity in the article itself, the authors found that:
Besides observing that public positive feedback may have a positive effect on editor motivation, they also note that "non-public negative peer feedback could increase one’s likelihood to engage in online social production by correcting inherent problems, behaviors, and attitudes in private peer conversations, which also strongly suggests that mechanisms for providing non-public negative feedback should be designed, incorporated, and tested in collaborative platforms such as wikis."
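The paper's knowledge-reuse metric can be approximated at the word level. The sketch below uses Python's difflib to count words carried over between two consecutive revisions; note this is a simplification of the paper's definition, which also accounts for text that is copied, moved elsewhere, or restored:

```python
import difflib

def reuse_ratio(prev_revision, curr_revision):
    """Words reused from the previous revision divided by words newly
    created in the current one (a rough version of the paper's metric)."""
    prev_words = prev_revision.split()
    curr_words = curr_revision.split()
    matcher = difflib.SequenceMatcher(a=prev_words, b=curr_words,
                                      autojunk=False)
    reused = sum(block.size for block in matcher.get_matching_blocks())
    new = len(curr_words) - reused
    return reused / new if new else float("inf")

# Four words kept, two words added: ratio 4 / 2 = 2.0
print(reuse_ratio("the quick brown fox", "the quick brown fox jumps high"))
```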
See also our earlier coverage of sentiment analysis research, and a current research collaboration of the Wikimedia Foundation and other researchers that aims "to use machine learning and statistics to understand how attacking or 'toxic' language affects the contributor community on Wikipedia. The focus of our analysis is initially on talk page comments that exhibit harassment, personal attacks and aggressive tone."
Wikimania 2016, the annual global Wikimedia conference, took place in June in Esino Lario, Italy. The programme contained various research-related sessions, including the annual "State of Wikimedia Research" presentation highlighting some of the most interesting scholarship from the past year (slides).
See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.
A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.
Other student project writeups from the fall 2015 CS229 course at Stanford (see also above):