An arXiv preprint titled "Echoes of power: Language effects and power differences in social interaction" [1] examines the language used by Wikipedia editors to show how conversational language can reveal power relationships. The research measures how much people adapt their language to that of the others involved in a discussion (a process called language coordination). The findings indicate that the more someone adapts, the more deferential they are. The authors find that Wikipedia editors tend to coordinate more with administrators than with non-administrators. Further, the study suggests that language coordination affects one's chances of becoming an administrator: admin candidates who coordinate more have a higher chance of being elected than those who do not change their language. Once elected, administrators tend to coordinate less.
A blog post on the website of Technology Review summarized the results under the headline "Algorithm Measures Human Pecking Order" and highlighted the fact that one of the authors is Jon Kleinberg, known as the inventor of the HITS algorithm (also known as "hubs and authorities").
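The idea of language coordination described above can be illustrated with a minimal sketch. It assumes a simplified score: the probability that a reply uses a given class of words when the message it answers did so, minus the baseline probability that a reply uses that class at all. The marker class `ARTICLES`, the helper names and the sample exchanges are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical marker class (assumption): a few English articles.
ARTICLES: Set[str] = {"a", "an", "the"}


def uses_marker(utterance: str, marker_words: Set[str]) -> bool:
    """Return True if the utterance contains any word from the marker class."""
    tokens = utterance.lower().split()
    return any(tok.strip(".,!?;:\"'()") in marker_words for tok in tokens)


def coordination(exchanges: Iterable[Tuple[str, str]], marker_words: Set[str]) -> float:
    """Simplified coordination score over (prompt, reply) pairs.

    Computes P(reply uses marker | prompt uses marker) - P(reply uses marker).
    A positive value means replies echo the marker more often than usual.
    """
    pairs: List[Tuple[str, str]] = list(exchanges)
    if not pairs:
        raise ValueError("at least one exchange is required")

    prompt_uses = [uses_marker(prompt, marker_words) for prompt, _ in pairs]
    reply_uses = [uses_marker(reply, marker_words) for _, reply in pairs]

    # Replies to prompts that themselves used the marker.
    triggered = [r for p, r in zip(prompt_uses, reply_uses) if p]
    if not triggered:
        raise ValueError("no prompt uses the marker class")

    baseline = sum(reply_uses) / len(reply_uses)
    conditional = sum(triggered) / len(triggered)
    return conditional - baseline


# Hypothetical talk-page exchanges: (message, reply).
sample = [
    ("the page is fine", "the edit is fine"),
    ("the source is weak", "a fair point"),
    ("please revert", "done"),
    ("why revert", "ok"),
]
score = coordination(sample, ARTICLES)
```

In this toy data, replies use an article only when the message they answer did, so the score is positive. Comparing such scores for replies addressed to administrators and to non-administrators would be one way to approximate the kind of comparison the study reports.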
An article [2] by a librarian and professor at California State University, East Bay compares "biographical content for literary authors writing in English" across Wikipedia, "the web" (i.e. top Google search results) and two commercial databases: the Biography Reference Bank (BRB, now part of EBSCO Industries) and Contemporary Authors Online (CAO). The comparison was motivated by the decision of the author's institution to cancel its subscription to CAO during a budget crisis in 2008–2009, a decision that, among other reasons, had been accompanied by "a comment that this information is 'on the web'".
The paper starts out with a literature review on the reliability of Wikipedia and then describes how the author compiled a list of 500 authors (mostly from the US and UK) by "examining curricula and textbooks from English literature courses across the USA" and soliciting additional suggestions from peers. These names were then searched on BRB, CAO (as part of the Literature Resource Center), Wikipedia and Google.
Regarding breadth of coverage, only six of the 500 names were "absent" on Wikipedia (meaning that they had "no entry of their own or reference in any other entry"), compared to 14 for CAO, and 50 for the Biography Reference Bank.
While the study does not seem to have attempted a systematic comparison of factual accuracy, it observes that Wikipedia "entries are less uniform than those in commercial databases. The biographical information ranges from extensive to perfunctory."
The author remarks favorably on Wikipedia's searchability:
A large part of the comparison consists of examining each resource's production process. Wikipedians may find parallels to their policies on biographies of living people, self-published sources and notability in the description for the Biography Reference Bank:
In the conclusion, the author answers the initial question by recommending that her employer "re-subscribe to a commercial biographical database" if the budget permits it again, because "Commercial databases provide a foundation with authoritative core content authenticated prior to publication and integrated with the fabric of information in the library’s holdings. They are easy to search and reliable, although they cannot be as current as Wikipedia or the Web because of their authentication processes. Wikipedia become [sic] more impressive as searching proceeded. The focus may be on verifiability rather than authority and there may be challenges in securing contributors, but the current contributors provide citations and often include unique information." All in all, she seems to favor Wikipedia and the two databases over "The web" (Google results), which "may have plenty of dross and be less reliable, harder to search, and focused on commercialism, but there are gold nuggets." She worries: "What will happen if contributors to Wikipedia and the web have no authoritative databases to use as sources?"
Among the student projects in a class on "Computational Analysis of Social Processes" at Rensselaer Polytechnic Institute, three analyzed social networks of Wikipedia editors:
A study presented earlier this month at the annual meeting of the American Economic Association, which is to appear in The American Economic Review [6], sets out to test whether the English Wikipedia is truly neutral by measuring bias within a sample of 28,000 entries about US political topics, examined over a decade. The bias is identified by detecting language specific to one side of the American political scene (Democrats or Republicans). To quote from the article: "In brief, we ask whether a given Wikipedia article uses phrases favored more by Republican members or by Democratic members of Congress" (in the text of the 2005 Congressional Record, using a method developed in an earlier paper by Gentzkow and Shapiro, who applied it to newspapers). The authors identified, as of January 2011, 70,668 articles related to US politics, about 40% of which had a statistically significant bias. They find that Wikipedia articles are often biased upon creation, and that this bias rarely changes. Early on in Wikipedia's history, most had a pro-Democratic bias, and while "by the last date, Wikipedia's articles appear to be centered close to a middle point on average", this results from a larger number of new pro-Republican articles rather than from existing ones having been rewritten neutrally.
While the authors made efforts to exclude articles not pertinent to US politics (requiring the terms "United States" or "America" to appear at least three times in the article text), the sample also includes the clearly international article Iraq War. And in what Wikipedians may call out as systemic bias, the authors never question their assumption that for an international encyclopedia, a lack of bias would be indicated by the replication of the spectrum of opinions present in the US Congress. As early as 2006, Jimmy Wales objected to such notions with respect to the community of contributors: "If averages mattered, and due to the nature of the wiki software (no voting) they almost certainly don't, I would say that the Wikipedia community is slightly more liberal than the U.S. population on average, because we are global and the international community of English speakers is slightly more liberal than the U.S. population. ... The idea that neutrality can only be achieved if we have some exact demographic matchup to [the] United States of America is preposterous." Nevertheless, even if one turns the study on its head and reads it as a statement on average American political opinion compared to the rest of the world as reflected in the English Wikipedia, its results remain remarkable.
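The phrase-based approach described above can be sketched roughly as follows. This is a strong simplification under stated assumptions, not the study's method: the phrases and weights in `PHRASE_SLANT` are hypothetical placeholders (negative meaning favored by Democrats, positive by Republicans), whereas the study derives its phrases from the 2005 Congressional Record. The relevance filter assumes that the two terms together must occur at least three times, which is only one possible reading of the description.

```python
from typing import Dict, Optional

# Hypothetical phrase weights (assumption, for illustration only):
# negative = used more by Democrats, positive = used more by Republicans.
PHRASE_SLANT: Dict[str, float] = {
    "estate tax": -1.0,
    "death tax": 1.0,
}


def mentions_us(text: str, minimum: int = 3) -> bool:
    """Crude relevance filter: "United States" or "America" occurs at least
    `minimum` times in total. Plain substring matching is used, so
    "American" also counts as a match for "America".
    """
    lowered = text.lower()
    return lowered.count("united states") + lowered.count("america") >= minimum


def slant(text: str) -> Optional[float]:
    """Average weight of all partisan phrases found in the text.

    Returns None when no listed phrase occurs, i.e. no slant can be measured.
    """
    lowered = text.lower()
    total = 0.0
    hits = 0
    for phrase, weight in PHRASE_SLANT.items():
        occurrences = lowered.count(phrase)
        total += occurrences * weight
        hits += occurrences
    return total / hits if hits else None


# Hypothetical article text.
article = "The death tax debate: critics of the death tax call it an estate tax."
article_slant = slant(article)
```

An article with no listed phrase yields `None` rather than zero, which keeps "no measurable slant" distinct from "balanced use of phrases from both sides".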