From Wikipedia, the free encyclopedia
In focus

Measuring gender diversity in Wikipedia articles

When thinking about gender diversity in Wikipedia, we often think of the number of biographical articles about men and women. The Humaniki project shows that about 19% of biographical articles on the English Wikipedia are about women. However, this is only one aspect of gender diversity. In this article, I develop a method which measures gender diversity at the article level and show why it's useful.

Motivation

While working on the article about economics on the French Wikipedia, I was surprised by the low number of women among the people cited in the article, so I started exploring methods to measure gender diversity. I draw a distinction between gender diversity and gender parity [1]. First, gender parity presupposes binary gender, which excludes non-binary people. Second, gender parity implies that the ideal would be a fifty-fifty split between men and women. After some iterations, I found a way to measure gender diversity at the article level. The resulting tool can be used to explore gender diversity in articles about academic fields, activities, or occupations. My approach is very basic: it simply computes the share of people cited in an article by gender.

This simple quantitative approach to measuring gender diversity is similar to many research projects on this theme in computational social science. David Doukhan is tracking women's speaking time on the radio [2]. Antoine Mazières and his co-authors are computing the share of screen time given to women in popular movies [3], and Gilles Bastin and his co-authors are computing the frequency, by gender, of people cited in French newspapers [4].

Methodology

For each article, I get the list of internal links (also known as blue links), which I retrieve using the Wikipedia links API. Then I combine this query with a Wikidata SPARQL query [5]. I select all links corresponding to human beings in Wikidata (property P31 is Q5) and I retrieve their gender (property P21 in Wikidata). Note that gender in Wikidata can be male, female, non-binary, intersex, transgender female, transgender male, or agender. I'd find it more intuitive to group transgender males with males and transgender females with females, but I prefer to keep Wikidata's classification.
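The two retrieval steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's actual queries (those are in the project methodology [5]): the function names `links_request_params` and `gender_query` are hypothetical, and the code only builds the request parameters and the SPARQL text without sending anything over the network.

```python
from typing import Dict, List

# Public endpoints (the English Wikipedia is assumed here).
WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def links_request_params(title: str) -> Dict[str, str]:
    """Build MediaWiki API parameters that list an article's internal links
    together with the Wikidata item attached to each linked page."""
    return {
        "action": "query",
        "format": "json",
        "titles": title,
        "generator": "links",        # iterate over the article's internal links
        "gplnamespace": "0",         # keep links to main-namespace pages only
        "gpllimit": "max",
        "prop": "pageprops",
        "ppprop": "wikibase_item",   # Wikidata QID of each linked page
    }


def gender_query(qids: List[str]) -> str:
    """Build a SPARQL query that keeps only humans (P31 = Q5) among the
    given Wikidata items and returns their gender (P21)."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    return f"""
SELECT ?item ?genderLabel WHERE {{
  VALUES ?item {{ {values} }}
  ?item wdt:P31 wd:Q5 .       # instance of: human
  ?item wdt:P21 ?gender .     # sex or gender
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


# Arbitrary example QIDs; in practice they come from the links response.
print(links_request_params("Economics"))
print(gender_query(["Q42", "Q7186"]))
```

Both services return results in batches, so a real client would also have to follow the API's `continue` values and split long QID lists across several SPARQL requests.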

Last, I count the number of entities by gender and compute the share.
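As a minimal sketch of this counting step, assuming the previous step produced one Wikidata gender label per linked person (the function name `gender_shares` and the sample labels are made up for illustration):

```python
from collections import Counter
from typing import Dict, Iterable


def gender_shares(genders: Iterable[str]) -> Dict[str, float]:
    """Count entities per gender label and return each label's share."""
    counts = Counter(genders)
    total = sum(counts.values())
    if total == 0:
        return {}  # no gendered entity found: nothing to compute
    return {gender: n / total for gender, n in counts.items()}


# Invented sample, not data from any real article.
sample = ["male", "male", "female", "male", "non-binary"]
shares = gender_shares(sample)
for gender, share in shares.items():
    print(f"{gender}: {share:.0%}")
```

The labels are kept exactly as Wikidata provides them, in line with the choice not to regroup categories.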

Anyone can compute gender diversity for a single Wikipedia article using the gender diversity explorer tool.

This is a very basic approach. It doesn't distinguish between people cited in the references and people cited in the body of the article, and it doesn't take into account people who are cited without a link to a Wikipedia article. But even if it's imperfect, I believe this is a useful approach.

Numbers should be interpreted with caution. The number of gendered entities cited in a single article is often very low. I personally don't interpret proportions if the total number of gendered entities is lower than 50.
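This rule of thumb is easy to encode. The sketch below assumes per-gender counts are already available; the helper name `interpretable_shares` and the sample counts are hypothetical, and the threshold of 50 is the personal cut-off mentioned above rather than a statistical standard.

```python
from typing import Dict, Optional

MIN_ENTITIES = 50  # personal cut-off stated in the text


def interpretable_shares(
    counts: Dict[str, int], minimum: int = MIN_ENTITIES
) -> Optional[Dict[str, float]]:
    """Return shares by gender, or None when the article cites too few
    gendered entities for the proportions to be worth interpreting."""
    total = sum(counts.values())
    if total < minimum:
        return None
    return {gender: n / total for gender, n in counts.items()}


# Invented counts for illustration only.
print(interpretable_shares({"male": 30, "female": 5}))   # 35 entities: too few
print(interpretable_shares({"male": 90, "female": 10}))  # 100 entities
```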

Insights

Focus on economics

Chart measuring gender diversity in the Wikipedia article Economics in May 2022.

Let's have a look at the article about economics. In May 2022, we find 137 males, 6 cisgender females, and 1 transgender female [6]. So fewer than 5% of the people cited in the article are female. Of course, everyone knows that many prominent economists, from Adam Smith to Jean Tirole, are male, so no one is really surprised to find a vast majority of males in the results. Nobody would be able to say what a fair share of females in the article would be. However, I personally think that 5% is low and that the contribution of women to economics is greater than this figure suggests. Harriet Martineau, Mary Paley Marshall, Joan Robinson, Elinor Ostrom, Anna Schwartz, Janet Yellen, Esther Duflo, and Susan Athey have all made major contributions to economics.
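The arithmetic behind the "fewer than 5%" figure can be checked in a few lines. The counts are those reported above for May 2022; the sketch assumes these three categories are the only gendered entities found in the article.

```python
# Counts reported for the Economics article in May 2022.
counts = {"male": 137, "female": 6, "transgender female": 1}
total = sum(counts.values())  # 144 gendered entities

for gender, n in counts.items():
    print(f"{gender}: {n}/{total} = {100 * n / total:.1f}%")

# Cisgender and transgender females taken together.
female_share = (counts["female"] + counts["transgender female"]) / total
print(f"female (combined): {100 * female_share:.1f}%")  # 7/144, about 4.9%
```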

Focus on academic fields

Share of people cited in articles by gender for academic fields

In this section, I compare gender diversity in Wikipedia articles about some important academic fields. As with economics, we know that most academic fields have long been dominated by male figures, so we're not surprised to find a relatively low share of women in Wikipedia articles. Comparing Physics, Architecture, Economics, Social science, Computer science, Philosophy, Mathematics, Psychology, Medicine, Music, Political science, Sociology, Biology, Science, Art, History, and Literature, I find that all of them have a proportion of men higher than 80% [7]. Values for computer science and political science should be taken with caution, since the number of people cited in those articles is lower than 50. If we exclude those two articles, we find that 10 of the remaining 15 have fewer than 10% women among all gendered entities! If we look at raw numbers, the count of women in each article is really low: 4 women in mathematics, 4 women in medicine, 1 woman in physics.

Conclusion and discussion

I believe that measuring helps to raise awareness of the problem of gender diversity in Wikipedia articles. Anyone can play with the gender diversity inspector and discover some insights.

In the coming months, I would like to explore gender diversity in articles about occupations (journalist, politician, etc.) and activities (journalism, politics, sports, etc.). I would also like to see large-scale studies looking at all articles about academic fields or all articles about occupations.

My experiments with measuring gender diversity in Wikipedia articles lead me to believe that women are often forgotten or underrepresented in Wikipedia articles about general topics. It would be worthwhile to give specific attention to this topic. WikiProjects such as Women in Red could focus on this issue to ensure that the role of women isn't diminished in articles.

References

  1. ^ "The idea of closing the “gender gap” itself has always struck me as somewhat problematic as it implies a gulf between two equivalent sides and reinforces the idea of binary gender. An aspiration to equitable “gender diversity” might be more fitting" writes Katherine Maher in "Capstone: Making History, Building the Future Together", in Wikipedia @ 20, MIT Press, 2020, https://wikipedia20.pubpub.org/pub/4d61w771/release/2?readingCollection=08ec69da
  2. ^ https://larevuedesmedias.ina.fr/la-radio-et-la-tele-les-femmes-parlent-deux-fois-moins-que-les-hommes
  3. ^ "Computational appraisal of gender representativeness in popular movies", https://www.nature.com/articles/s41599-021-00815-9
  4. ^ Gendered News project, https://gendered-news.imag.fr/genderednews/
  5. ^ See the SPARQL queries in the project methodology
  6. ^ https://observablehq.com/@pac02/explore-gender-diversity-in-a-single-wikipedia-article?wikipedia=en.wikipedia.org&article=Economics
  7. ^ https://observablehq.com/@pac02/gender-diversity-in-wikipedia-articles-evidence-from-some?collection=@pac02/gender-diversity-in-wikipedia-articles
