A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
This paper by Chen et al. [1] proposes using the Wikipedia article corpus as a source of world knowledge for answering open-domain questions. The authors point out that Wikipedia articles contain far more information than current knowledge bases (KBs) such as DBpedia or Freebase. While knowledge in KBs is encoded in a more machine-friendly way, the vast majority of Wikipedia's knowledge is not covered by KBs; it is contained in unstructured text and is thus difficult to access algorithmically. The proposed approach, called "DrQA", aims to overcome that limitation by working directly with the article content. It first retrieves Wikipedia articles relevant to a question, and then uses a recurrent neural network (RNN) to detect the parts of the articles' paragraphs that could serve as answers. The RNN takes as input a set of pretrained word embeddings as well as a set of other features.
Their results indicate that DrQA is better suited to answering open-domain questions than competing systems, based on a set of four question benchmarks. While the improvement in evaluation score seems rather small (77.3 vs. 78.8 F1 score), the overall task of machine reading at scale using Wikipedia points to interesting directions for future research and applications. For example, depending on the speed of the framework (which unfortunately was not discussed), a new Wikipedia service for answering such open-domain questions could be established. Furthermore, this process of answering common-knowledge questions could help improve chatbots.
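To make the first stage of this retrieve-then-read pipeline more concrete, here is a minimal sketch of article retrieval, assuming plain TF-IDF weighting with cosine similarity over a tiny in-memory corpus. This is an illustration only, not the authors' implementation: the function names (`tokenize`, `build_index`, `retrieve`) and the three-article corpus `DOCS` are hypothetical, and the second stage (the RNN reader that selects answer spans) is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for Wikipedia articles (title -> text).
DOCS = {
    "Warsaw": "Warsaw is the capital and largest city of Poland.",
    "Paris": "Paris is the capital and most populous city of France.",
    "Python": "Python is a high-level programming language.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return (TF-IDF vector per article, IDF weight per term)."""
    counts = {title: Counter(tokenize(text)) for title, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    # Smoothed inverse document frequency; rarer terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        title: {t: c * idf[t] for t, c in term_counts.items()}
        for title, term_counts in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, vectors, idf, k=1):
    """Return the titles of the k articles most similar to the question."""
    q_counts = Counter(tokenize(question))
    # Terms never seen in the corpus carry no information and are dropped.
    q_vec = {t: c * idf[t] for t, c in q_counts.items() if t in idf}
    ranked = sorted(vectors, key=lambda title: cosine(q_vec, vectors[title]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    vectors, idf = build_index(DOCS)
    print(retrieve("What is the capital of Poland?", vectors, idf, k=2))
```

In a full system, the paragraphs of the retrieved articles would then be passed to a reader model that scores candidate answer spans; the sketch stops at retrieval because the review does not describe the reader in enough detail to reproduce it.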
This Carnegie Mellon University study [2] quantified the success of editors who engage in talk page discussions, and the roles they play in these discussions. The roles assigned to each editor were:
Unlike in earlier studies of editor interactions, editors in this study could be assigned several roles simultaneously on an article talk page. The success of each editor was determined by analyzing subsequent edits to the article under discussion that the editor had promoted, and the longevity of those edits. Editors who are more detail-oriented tend to have more success than those more interested in organization. When multiple editors assume the organizational role, the success of individual editors decreases. The study assessed 7,211 articles, 21,108 discussion threads, 21,108 editor-discussion pairs, and the average number of editors per discussion. An editor's total number of edits is not associated with success.
The researchers also published a dataset consisting of "53,175 instances in which an editor interacts with one or more other editors in a talk page discussion and achieves a measured influence on the associated article page".
This article [3] focuses on the 1.2 million unassessed articles in the Polish Wikipedia, and considers "over 100 linguistic features to determine the quality of Wikipedia articles in Polish language." From the conclusion: "Use of linguistic features is valuable for automatic determination of quality of Wikipedia article in Polish language. Better results in terms of precision can be achieved when the whole text of [an] article is taken into the account. Then our model shows over 93% classification precision using such features as relative number of unique nouns and verbs (unique, 3rd person, impersonal). However, if we take into account only [the] leading section of an article, relative quantity of common words, locatives, vocatives and third person words are the most significant for determination of quality. Using the obtained quality models we [assess] 500 000 randomly chosen unevaluated articles from Polish Wikipedia. According to result, about 4–5% of assessed articles can be considered by Wikipedia community as high quality articles."
See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.
(in Polish, book chapter from ISBN 978-83-8012-916-0)