A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
Asteroids are among the categories with the most overrepresentation of male editors, and figure skating among those with the most female overrepresentation.
This bachelor thesis [1] looks for gender imbalance among editors of specific categories in the English Wikipedia. The analysis is based on edits by users who publicly disclosed their gender (about 176 thousand) to more than 3.7 million articles in 470 categories (derived from DBpedia's ontology rather than from Wikipedia's built-in category system). The thesis first establishes the distribution of editors by gender (roughly 85% male and 15% female). The number of edits by each group in each category is then statistically compared to that baseline distribution. If a category deviates from the baseline, it is considered to exhibit a gender gap, i.e. editors of one gender are overrepresented in that category.
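To make the per-category comparison concrete, here is a minimal Python sketch. It assumes a one-sample z-test of each category's female edit share against the roughly 15% baseline; the thesis's actual statistical test is not specified in this summary, and the function names and edit counts below are hypothetical.

```python
import math

# Assumption for this sketch: the ~15% share of female editors reported in the
# thesis is used as the expected share of *edits* in every category.
BASELINE_FEMALE_SHARE = 0.15


def gender_skew(female_edits: int, male_edits: int,
                baseline: float = BASELINE_FEMALE_SHARE) -> tuple[float, float]:
    """One-sample z-test of a category's female edit share against the baseline.

    Returns (z, two_sided_p), using the normal approximation to the binomial.
    """
    n = female_edits + male_edits
    observed = female_edits / n
    standard_error = math.sqrt(baseline * (1 - baseline) / n)
    z = (observed - baseline) / standard_error
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for a standard normal
    return z, p


def classify(female_edits: int, male_edits: int, alpha: float = 0.05) -> str:
    """Label a category by the direction of a statistically significant skew."""
    z, p = gender_skew(female_edits, male_edits)
    if p >= alpha:
        return "no significant gap"
    return "female overrepresented" if z > 0 else "male overrepresented"


# Hypothetical edit counts (female, male) for three made-up categories.
for name, f, m in [("CategoryA", 300, 700), ("CategoryB", 50, 950), ("CategoryC", 15, 85)]:
    print(name, classify(f, m))
```

With 470 categories tested, some correction for multiple comparisons (for example, dividing alpha by the number of categories) would be prudent, and since many edits come from the same editors they are not independent, so a simple test like this may overstate significance.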
The results show that, despite the large overall imbalance between the two groups, pages in some categories receive a disproportionate share of their edits from one gender, while other categories skew towards the other. The author lists "YearInSpaceflight", "Asteroid", "BaseballSeason", "MotorsportSeason", and "FormulaOneTeam" as the "Top five categories where male editors are most overrepresented". He identifies sports as a recurring theme "throughout all significant 'male categories'. Besides sports other recurring subjects are transport and politics." On the other hand, "the categories with a female overrepresentation show somewhat less obvious recurring themes. Many of these categories are more or less culture related however." The five categories with the most female overrepresentation are "FigureSkater", "Skater", "Garden", "GaelicGamesPlayer", and "Mollusca".
The thesis sheds some light on this unbalanced distribution, but the underlying hypothesis could be explored further by taking into account the amount of text changed in each edit, along with other patterns the author mentions.
(See related Signpost coverage from 2011: "New tool analyzes article contributors' gender and location")
While much is known about the quality of Wikipedia articles, less is known about how the different language editions assess article importance. For instance, the English Wikipedia's article about waffles is labelled "top-importance", the highest possible rating, by WikiProject Breakfast, but "high-importance" by WikiProject Food and Drink (both labels can be found on the article's talk page). A paper titled "Quality and Importance of Wikipedia Articles in Different Languages" [2], presented at the International Conference on Information and Software Technologies, studies the connection between importance and quality. Its three research questions ask whether importance affects quality, which parameters are useful for automatically assessing importance with machine learning, and whether language editions differ in how they model importance.
The English edition offers the most data on article importance, so the paper uses a dataset of English articles to test whether importance affects quality. Using a random forest classifier and a model with 85 parameters, the authors find a modest increase in classifier performance when importance is added as a parameter, which they take to indicate that importance affects quality. The same dataset and model are then trained to predict article importance; about two-thirds of top- and low-importance articles are correctly identified. Lastly, the paper compares the importance of model features across language editions and finds many differences, although these are not described in more detail.
Research on aspects of article quality across language editions has not received much attention, making this paper a welcome addition to the literature. It is also great to see article importance being studied. At the same time, the paper could have made a much stronger contribution by comparing its results against a sensible baseline (this reviewer notes that the paper cites an in-press paper by the same authors [supp 1], although that paper's results do not appear to be available in English). For instance, the classifier's performance appears similar to that of ORES, even though ORES uses a model with far fewer parameters. A deeper investigation into article importance would also be worthwhile, for example because importance differs between topic areas, as the waffle example above illustrates.
Researchers have attempted to quantify Wikipedia's gender gap and its impact on content type and quality, and to understand the reasons for the gender gap. A new journal article [3] attempts to experimentally evaluate several hypotheses for why women tend to edit Wikipedia less than men do.
The researchers asked 192 male and female college students to contribute to a draft essay about school bullying. The version of the draft that participants worked on had already been edited by four other users identified by pseudonyms (in fact, the researchers themselves). Two of the pseudonyms were obviously gendered ("Ms Trouble", "Mr Football"), and two were gender-neutral ("Cheerios4Life", "AnonymousOne"). Since most people are not familiar with the mechanics of wiki editing, the researchers used a Microsoft Word document with "track changes" enabled as the editing platform, to simulate the versioning and commenting capabilities of MediaWiki pages. The researchers also surveyed the students to gather relevant demographic and psychometric data, and compared their survey responses with their editing behavior.
The study found that while women edited more than men overall (contributing more words to the draft), they were less likely to edit under the conditions designed to approximate the social environment of Wikipedia. Specifically, women edited less when there were few or no female-identified collaborators present, and when feedback from the pseudonymous collaborators was neutral rather than constructive. Interestingly, female participants also tended to assume that one of the gender-neutral pseudonyms ("AnonymousOne") was male, and rated feedback from that editor as more critical than male participants who received the same feedback did. Based on these findings, the researchers suggest that increasing the visibility of female editors and encouraging constructive feedback may lead more women to edit Wikipedia.
This research [4] explores the relationship between Wikipedia page view statistics and electoral results in the 2009 and 2014 European Parliament elections, with regard to both overall voter turnout and individual party results. The article suggests two reasons why voters might seek information: to research new parties they are unfamiliar with, and to research alternative options if they are unhappy with the party they previously supported (thus becoming swing voters).
The first dataset used in this research consists of page view data for the Wikipedia article on the election in 14 languages (the primary languages of the voting countries). The second dataset covers political parties that had at least a 5% vote share in the 2009 and/or 2014 elections in the UK, France, Germany, Spain and Italy. The researchers gathered additional data points such as the number of views of each party's Wikipedia article in the week before the election, the final percentage of the vote each party received, whether a party was new, whether it was incumbent, and the number of times it was mentioned in print media during the week before the election.
Comparing the relative change in page views of the European Parliament election article with total voter turnout in 2009 and 2014 indicates that interest in the elections is proportional to Wikipedia readership. The research also suggests that the party garnering the most page views often does not win the election; instead, it may be a smaller party that attracted the interest of swing voters. Figure 1(a) shows a high correlation between print media mentions and overall vote share for parties. Figure 1(b) shows that Wikipedia page views may predict a new party's success, while news outlet mentions are better at predicting an established party's success.
The research tests the theory that an increase in Wikipedia page views may signal an increase in votes for a party, using three ordinary least squares (OLS) linear regression models. The first model is a baseline using only past voting results. The second model, also a baseline, includes past voting results along with all of the other non-Wikipedia data collected. These baselines serve as a comparison for the third model, which includes all of the previously modeled data plus two Wikipedia-related parameters. The models show that Wikipedia page views can be considered a predictor of election outcomes, but they only marginally improve upon the baseline models. Their predictive power lies in estimating how much a party's vote share may increase or decrease relative to the previous election.
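The nested-model design can be illustrated with a minimal Python sketch. It is not the authors' code: the synthetic data, the coefficients used to generate it, the two hypothetical Wikipedia features (page views and change in page views), and the use of in-sample R² as the comparison metric are all assumptions made for illustration.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_r2(rows, y):
    """Fit y = b0 + b . x by ordinary least squares; return (coefficients, R^2)."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    pred = [sum(b * v for b, v in zip(beta, xi)) for xi in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Hypothetical synthetic data for 40 parties; all values are made up.
rng = random.Random(42)
rows, vote_share = [], []
for i in range(40):
    prev = rng.uniform(5, 35)             # previous vote share (%)
    media = rng.uniform(0, 100)           # print-media mentions, week before election
    is_new = 1.0 if i % 5 == 0 else 0.0   # "new party" flag
    wiki_views = rng.uniform(0, 10)       # hypothetical Wikipedia feature 1
    wiki_change = rng.uniform(-5, 5)      # hypothetical Wikipedia feature 2
    rows.append((prev, media, is_new, wiki_views, wiki_change))
    vote_share.append(0.8 * prev + 0.05 * media + 0.3 * wiki_change + rng.gauss(0, 2))

# Nested models: each adds predictors to the previous one (column indices into rows).
models = {
    "baseline: past results": [0],
    "baseline + non-Wikipedia data": [0, 1, 2],
    "full: + Wikipedia features": [0, 1, 2, 3, 4],
}
r2 = {name: ols_r2([tuple(r[c] for c in cols) for r in rows], vote_share)[1]
      for name, cols in models.items()}
for name, value in r2.items():
    print(f"{name}: R^2 = {value:.3f}")
```

Because in-sample R² can never fall when predictors are added to an OLS model, a fair test of whether the Wikipedia features only marginally improve the fit would compare adjusted R² or prediction error on held-out elections.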
As the researchers note, one limitation of the article is that the data are aggregated, while all of the theories concern the micro level (individual voters). It is also unclear what share of Wikipedia page views comes from voters rather than other groups, such as journalists or people affiliated with the parties.
(See also our 2014 coverage of some related blog posts by the same authors: "Wikipedia use driven by news media or replacing news media?")
See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.
Discuss this story
Why women edit less
Study after study by researchers, sitting in ivory towers, about the causes of the gender gap on Wikipedia, yet none have studied what is obvious to many of the foot-soldiers editing here.
For example: How about the use of derogatory language when depicting women on Wikipedia, such as referring to them as "females"? This has been brought up time and again, but is still dismissed as silliness by many, including editors who themselves belong to a marginalized real-life group (gay people, for example).
And how about marginalizing those who support more inclusiveness of women? It is not unheard of to characterize such editors, when they are perceived to be men, as "creepy". It creates an environment where any editor who is the target of attacks (many/most editors here) must think twice before showing support for these issues. Ottawahitech (talk) 18:43, 5 November 2016 (UTC) please ping me
I never thought of molluscs as having some kind of female connotations. Or asteroids as male - they're just big rocks in space, what could be less gendered? Maybe there is something I am missing. --Bellerophon5685 (talk) 06:40, 11 November 2016 (UTC)