A draft of a letter submitted for publication has been posted on arXiv. [1] The letter reports research on modeling the process of collaborative editing in Wikipedia and similar open-collaboration writing projects. The work builds on earlier research by some of its authors on conflict detection in Wikipedia. The authors explore a simple agent-based model of opinion dynamics in which editors influence each other either by direct communication or by successively editing a shared medium, such as a Wikipedia page. According to the authors, the model, although highly idealized, exhibits rich behavior that can reproduce, albeit only qualitatively, some key characteristics of conflicts over real-world Wikipedia pages. The authors show that, for a fixed editorial pool with one "mainstream" and two opposing "extremist" groups, consensus is always reached. However, depending on the values of the model's input parameters, consensus may take an extremely long time to reach, and it does not always conform to the initial mainstream view. In a dynamic group, where new editors replace existing ones, consensus may be reached only after a phase of conflict, depending on the rate at which new editors join the editorial pool and on how controversial the article's topic is.
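The letter's exact update rules are not reproduced in this summary, so the following is only a sketch in the spirit of bounded-confidence opinion models, written under assumptions: opinions lie in [0, 1], two parties interact only if their views differ by less than a hypothetical threshold `epsilon`, and the shared page carries its own opinion `article` that editors either adopt partially (if it is close to their view) or rewrite toward their own view (if it is not). The function name and every parameter are illustrative, not the authors'.

```python
import random


def simulate(opinions, article, epsilon=0.3, mu=0.5, p_edit=0.5,
             steps=10_000, seed=0):
    """Hypothetical bounded-confidence sketch (not the letter's exact model).

    opinions: initial editor opinions in [0, 1]
    article:  opinion expressed by the shared page, also in [0, 1]
    epsilon:  confidence threshold; mu: step size (0 < mu <= 0.5)
    p_edit:   chance that a step is an interaction with the page
              rather than a direct conversation between two editors
    """
    rng = random.Random(seed)
    opinions = list(opinions)
    for _ in range(steps):
        i = rng.randrange(len(opinions))
        if rng.random() < p_edit:
            gap = article - opinions[i]
            if abs(gap) < epsilon:
                # The page is close enough: reading it nudges the editor.
                opinions[i] += mu * gap
            else:
                # Too far apart: the editor rewrites the page toward their view.
                article -= mu * gap
        else:
            # Direct communication with a randomly chosen editor (Deffuant-style).
            j = rng.randrange(len(opinions))
            gap = opinions[j] - opinions[i]
            if abs(gap) < epsilon:
                opinions[i] += mu * gap
                opinions[j] -= mu * gap
    return opinions, article


# One "mainstream" group and two opposing "extremist" groups.
editors = [0.5] * 20 + [0.05] * 5 + [0.95] * 5
final_opinions, final_article = simulate(editors, article=0.5)
print(round(min(final_opinions), 3), round(max(final_opinions), 3),
      round(final_article, 3))
```

Varying `epsilon` and `p_edit` in a toy like this is one way to get a feel for why the time to consensus can depend so strongly on the input parameters.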
In a copyright panel at this month's Wikimania, Abhishek Nagaraj – a PhD student and economist at the MIT Sloan School of Management – presented early results from an econometric study of copyright law. The study used data from the English Wikipedia's WikiProject Baseball to examine how the gains from digitization are moderated by the effects of copyright. Previous work on the economics of copyright has struggled to disentangle the effects of copyright from the effects of the increased access that often coincides with content entering the public domain.
The paper takes advantage of the fact that in 2008 Google digitized and published a large number of magazines as part of the Google Books project. Among the magazines published were 70 years of back issues of Baseball Digest, a magazine that publishes baseball stories, statistics, and photographs. Measuring the effect of digitization, Nagaraj found that articles on baseball All-Stars from between 1944 and 1984 saw large increases in size (5,200) around the time the digital Google Books version of Baseball Digest became available. However, because of the law governing copyright expiration, all issues of Baseball Digest published before 1964 were in the public domain, while later issues were not. Using the econometric technique of difference in differences, Nagaraj compared the effects of digitization on (1) players who began their professional baseball careers after 1964 and therefore had no newly digitized public-domain material, and (2) players who had played before 1964 and were thus more likely to have digitized material about them in the public domain.
In terms of the effect of copyright, Nagaraj found that public-domain status had no effect on the length of Wikipedia articles but a strong effect on images. Wikipedia writers could, presumably, simply rewrite copyrighted material, or may not have found Baseball Digest's style appropriate for an encyclopedia. However, Nagaraj found that the availability of public-domain material in Baseball Digest led to a strong increase in the number of images. Before Google Books published the material, the pre-1964 group had an average of 0.183 images per article and the post-1964 group about 0.158. After digitization, both groups increased, but the older group increased more: to 1.15 images per article, compared with 0.667 for the more recent players whose Baseball Digest material was still under copyright. Nagaraj also found that articles on players with public-domain material receive more traffic. The study controls for a large number of variables related to the players, their performance and talent, and their potential popularity, as well as for trends in Wikipedia editing.
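To make the difference-in-differences logic concrete, the raw image averages reported above can be plugged into the basic two-group, two-period estimator. This is a back-of-the-envelope sketch only: it ignores the player and editing-trend controls the study uses, so it is not Nagaraj's estimate, and the function name is illustrative.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Basic 2x2 difference-in-differences on group means (no controls)."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    # The control group's change stands in for what would have happened
    # to the treated group without the treatment (public-domain access).
    return treated_change - control_change


# Average images per article, as reported in the text.
# Treated: pre-1964 players (public-domain Baseball Digest material).
# Control: post-1964 players (material still under copyright).
effect = diff_in_diff(0.183, 1.15, 0.158, 0.667)
print(f"{effect:.3f}")  # 0.458
```

Read this way, the pre-1964 group gained roughly 0.46 more images per article than the post-1964 group's trend alone would predict.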
The presentation slides are available on the Wikimania conference website [2] and a nice journalistic write-up was published by The Atlantic.
Field notes can be a valuable source of information about meteorological, geological and ecological aspects of the past, and making them accessible by way of Wikisource-based semantic annotation was the focus of a recent study [3] published in ZooKeys as part of a special issue on the digitization of natural history collections. The paper described how the field notes of Junius Henderson from the years 1905–1910 have been transcribed on Wikisource and then semantically annotated, as illustrated in the screenshot. Henderson was an avid collector of molluscs and, while trained as a judge, served as the first curator of the University of Colorado Museum of Natural History. His notebooks are rich in species occurrence records, but also contain occasional gems like this one from September 3, 1905:
“Train again so late as to afford ample opportunity for philosophic meditation upon the motives which inspire railroad people to advertise time which they do not expect to make except under rare circumstances”
The article provides a detailed introduction to the workflows on the English Wikisource in general and to WikiProject Field Notes in particular, which is also home to transcriptions of other field notes. The data resulting from annotation of the field notes are available in Darwin Core format under a Creative Commons Public Domain Dedication (CC0). This work ties in with discussions at Wikimania about the future of Wikisource, including its technical prerequisites and existing tools and initiatives.
The quality of medical information in Wikipedia could be vastly improved, according to a recent study of 24 articles in pediatric otolaryngology [4] (more commonly known as "ear, nose, and throat", or ENT). The study compared coverage of common ENT diagnoses on Wikipedia, eMedicine, and MedlinePlus (by the authors' determination, the three most popular websites) and found that Wikipedia's ENT articles were the least accurate and contained the most errors of the three, while ranking between the other two in readability.
Although Wikipedia is one of the most referenced sources in this area, it had poor content accuracy (46%) compared with the two other frequently used sources. MedlinePlus had comparable accuracy (49%) but was missing 7 topics. eMedicine, the clear leader in accuracy, suffers from a higher reading level. The study provides specific criteria, in section 2.3, that could be used to evaluate existing articles. One limitation of the study is that, while suggesting that Wikipedia "suffers from the lack of understanding that a physician-editor may offer", it does not point readers to information on how to get involved with Wikipedia. Engagement with the pediatric medicine community would be beneficial, especially since about 25% of parents made decisions about their children's care partly on the basis of online information.
A forthcoming paper at this year's WikiSym conference investigates the emotions expressed on article and user talk pages. [5] "Administrators tend to be more positive than regular users", and the paper suggests that "as women gain experience in Wikipedia they tend to adopt the emotional tone of administrators", for instance linking to policy at more than twice the rate of men. Because women are more likely to interact with other women, the authors suggest gender-aware recruiting to address the gender gap.
The authors point out the usefulness of positive emotion in keeping discussions on track, and suggest that experienced editors be encouraged to maintain a positive climate. To determine users' gender, they ran a crowdsourced study through CrowdFlower. Emotions are measured using the ANEW wordlist, which captures emotional variability along three dimensions: valence, arousal, and dominance. The paper notes that policy mentions tend to have "a remarkably positive and dominant tone, and with stronger emotional load than in the rest of the discussion".
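Lexicon-based emotion scoring of this kind typically looks up each word of a message in a rated wordlist and averages the ratings of the words it finds. The sketch below assumes that simple averaging scheme; the paper's exact aggregation may differ. The tiny lexicon and its numbers are made up for illustration on a 1 to 9 scale like ANEW's, and are NOT actual ANEW ratings.

```python
import re
from typing import NamedTuple


class Affect(NamedTuple):
    valence: float
    arousal: float
    dominance: float


# Hypothetical lexicon with made-up 1-9 ratings (NOT real ANEW values).
TOY_LEXICON = {
    "thanks": Affect(7.5, 4.0, 6.0),
    "helpful": Affect(7.0, 4.5, 6.5),
    "policy": Affect(5.5, 3.5, 6.5),
    "wrong": Affect(3.0, 5.0, 4.0),
    "vandalism": Affect(2.5, 6.5, 4.5),
}


def score(text, lexicon=TOY_LEXICON):
    """Average valence, arousal and dominance over the rated words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[w] for w in words if w in lexicon]
    if not hits:
        return None  # no rated words: the emotion is undetermined
    n = len(hits)
    # zip(*hits) groups the ratings by dimension before averaging.
    return Affect(*(sum(dim) / n for dim in zip(*hits)))


print(score("Thanks, that edit was helpful"))
```

Comparing such averages for comments that mention policy against the rest of a discussion is one way to operationalize the "positive and dominant tone" the paper describes.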
A paper from the University of Alberta addresses the difficulty of analyzing edit histories, and of finding conflict in particular. [6] The authors use terms indicating content-based agreement (e.g. "add", "fix", "spellcheck", "copy", and "move") and disagreement ("uncited", "fact", "is not", "bias", "claim", "revert", and "see talk page"). They define conflicting interactions as those that revert or delete content, or that use more negative terms than positive ones, and find this to be a useful way of identifying controversial articles.
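The classification rule as summarized above can be sketched as a simple predicate. The term lists are the examples quoted in the text; the `Edit` fields, the whole-word matching on edit summaries, and the function names are assumptions for illustration, not the paper's implementation.

```python
import re
from dataclasses import dataclass

AGREEMENT_TERMS = ("add", "fix", "spellcheck", "copy", "move")
DISAGREEMENT_TERMS = ("uncited", "fact", "is not", "bias", "claim",
                      "revert", "see talk page")


@dataclass
class Edit:
    summary: str        # edit summary text
    is_revert: bool     # hypothetical flag: edit restores an earlier revision
    bytes_removed: int  # hypothetical: amount of content deleted


def count_terms(text, terms):
    """Count whole-word (or whole-phrase) occurrences of the given terms."""
    text = text.lower()
    return sum(len(re.findall(rf"\b{re.escape(t)}\b", text)) for t in terms)


def is_conflicting(edit):
    """Conflicting if it reverts, deletes content, or is net-negative in tone."""
    if edit.is_revert or edit.bytes_removed > 0:
        return True
    negative = count_terms(edit.summary, DISAGREEMENT_TERMS)
    positive = count_terms(edit.summary, AGREEMENT_TERMS)
    return negative > positive


print(is_conflicting(Edit("uncited claim, see talk page", False, 0)))
```

Whole-word matching is used so that, for example, "add" is not counted inside "address"; a naive substring count would inflate both tallies.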
A student paper for the Stanford University course "Project in Mining Massive Data Sets", titled "Wikipedia Mathematical Models and Reversion Prediction" [7], uses mathematical models "to explain why the amount of [editors on the English Wikipedia] stops increasing, whereas the amount of viewers keeps increase", and "to predict if an edit will be reverted." The researchers ran their analyses using Elastic MapReduce on Amazon's servers. The paper is somewhat muddled, since the researchers seem more interested in models and validation than in explaining the phenomena.
The first part of the paper presents two models of the relationship between visitors and editors in Wikipedia's community. The first model assumes that editors act as predators and articles as prey; however, this model did not fit the data. The second model uses linear regression on a number of factors, allowing the authors to model the community's statistics over time. The model is then tested using simulation and appears to produce accurate results.
In the second part of the paper, three models were used to predict which edits will be reverted. The models were trained on 24 features, each classified as edit-, editor- or article-based: for example, an article's age, its edit count, the number of editors who have edited it, the number of articles the editor has edited, and the change in information compared with the previous version. Predictions using the three machine-learning algorithms achieved about 75% accuracy; another interesting finding was that the ability to detect reversion has not changed much over time.
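The summary does not give the paper's equations, but the textbook form of a predator-prey model is the Lotka–Volterra system. The sketch below assumes that form, with articles as prey and editors as predators as the text describes; the parameters `alpha`, `beta`, `delta`, `gamma` and the simple forward-Euler integration are illustrative choices, not the authors'.

```python
def lotka_volterra(prey0, predators0, alpha, beta, delta, gamma,
                   dt=0.01, steps=5000):
    """Forward-Euler integration of the classic Lotka-Volterra equations.

    d(prey)/dt = alpha * prey - beta * prey * pred    (articles, assumed)
    d(pred)/dt = delta * prey * pred - gamma * pred   (editors, assumed)
    Returns a list of (time, prey, pred) tuples.
    """
    prey, pred = prey0, predators0
    trajectory = [(0.0, prey, pred)]
    for k in range(1, steps + 1):
        d_prey = alpha * prey - beta * prey * pred
        d_pred = delta * prey * pred - gamma * pred
        prey += dt * d_prey
        pred += dt * d_pred
        trajectory.append((k * dt, prey, pred))
    return trajectory


# Hypothetical parameters; the fixed point is (gamma/delta, alpha/beta) = (2, 2).
path = lotka_volterra(4.0, 1.0, alpha=1.0, beta=0.5, delta=0.25, gamma=0.5)
print(max(p for _, p, _ in path), min(p for _, p, _ in path))
```

Such a system produces endless oscillations around a fixed point rather than a plateau in editors alongside steadily growing readership, which offers one plausible intuition for why the review reports that this model did not fit the data.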
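The summary does not name the three algorithms, so the following is a minimal sketch of one common choice, logistic regression trained by batch gradient descent. The three toy features stand in for the edit-, editor- and article-based groups, and both the features and the six labelled examples are invented for illustration; they are not the paper's data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """True means the edit is predicted to be reverted."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5


def accuracy(w, b, X, y):
    return sum(predict(w, b, xi) == bool(yi) for xi, yi in zip(X, y)) / len(y)


# Hypothetical features scaled to [0, 1]:
# [share of text removed (edit), editor experience (editor), article age (article)]
X = [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1], [0.7, 0.0, 0.3],
     [0.1, 0.9, 0.8], [0.2, 0.8, 0.9], [0.0, 0.7, 0.7]]
y = [1, 1, 1, 0, 0, 0]  # 1 = reverted
w, b = train_logistic(X, y)
print(accuracy(w, b, X, y))
```

This toy set is trivially separable; the roughly 75% accuracy reported for real edits with 24 features reflects how much harder the actual problem is.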
However, Wikimedia Germany supported a community effort to produce photos for Wikimedia articles on members of the German Olympic team, by piggybacking on the press event at which the team's clothing for London 2012 was presented to the public. Five volunteers managed to take several hundred pictures of the team and the event.
The Summer Paralympics, which start shortly after the Olympics end, are subject to the same restrictions on photography and licensing. However, Wikimedia Australia has been working closely with the Australian Paralympic Committee to enable Wikinews coverage by two Wikimedians, Laura Hale and Hawkeye7, the only Wikimedians to have been granted press accreditation at the 2012 Paralympics. This will give them access to Paralympians and other personnel after they finish their events, allow them to ask questions during press conferences, and let them conduct interviews. But Wikimedians have to accept that the Olympics are now among the most highly commercialised events in the world. Laura Hale told the Signpost that "rights holders, the ones that pay big money, get the first chance to interview people. Then we're granted a few minutes for interviews if the athletes are amenable." One small hope, she says, is to photograph athletes outside the village, which is allowed without commercial restrictions on licensing.
Hale said this is a great opportunity to improve women's content on both Wikipedia and Wikinews, and coverage of people with disabilities, particularly Asian and African Paralympians. She and Hawkeye7 will be working to take and upload pictures under the non-commercial licenses used by Wikinews – which is in line with the International Paralympic Committee's regulations. Non-commercial licenses are incompatible with the licensing policy of Commons. The Australian Paralympic Committee will upload some of their own images under a Creative Commons license, specifically to make them easier for use on Wikinews; however, these images face the same problems as those that will be taken by Laura Hale and Hawkeye7. While images and video are a problem, there are no such restrictions for audio files, which means interviews can be uploaded to Commons under a compatible license.
The Wikimedia Foundation has published its 2012–13 Annual Plan, focusing on technical improvements, editor retention, and structural reforms over the coming year. The movement's total revenue, including almost all chapter funding, is slated to rise by 35%, from $34.2 million to $46.1 million, and global spending to more than $42.1 million, although both figures overstate the real increases, since the recent financial reforms now include all financial categories in these figures. The foundation's own core spending will grow by 15% to $30.2 million in 2012–13.
Due to the new financial structure of the movement, $11.4 million of the volunteer-run Funds Dissemination Committee's (FDC) awards and grants – mainly to go to Wikimedia chapters – are part of the WMF's annual plan for the first time. The foundation plans to request $4.5 million of the FDC's $11.4 million allocation to finance non-core activities, which will include the Wikimedia grant program, a GAC allocation doubled to $600k, global education, and education programs in the Arabic-speaking world, Brazil, and India. The projected revenue growth assumes the foundation will continue to use the less aggressive fundraising methods deployed in 2011–12, with fewer days and fewer "Jimmy Days". Jimmy Wales's image was displayed in the annual fundraiser banners on 12 of the 46 days in 2011, compared with 36 of 50 days in 2010.
On the downside, the plan acknowledges that the foundation has been unable to significantly increase the diversity of its communities – including female participation, which remains at a strikingly low 9% – or to turn the tide on the slight decline in project participation, which fell to 85,000 regular users (more than five edits a month) in March 2012 from 89,000 a year earlier. This contrasts with last year's goal of increasing participation to 95,000 regular users by June 2012. The new Visual Editor, which was expected to be ready for deployment by June 2012, has now been put back a year to mid-2013. On the other hand, the readership growth goal of reaching a billion people by 2015 is on track, thanks to rising mobile page views: 2,008M in April 2012, up a remarkable 187% from 726M a year earlier. The combined Wikipedias, scheduled to reach 50 million articles by 2015, had 22.3 million entries in March 2012, up from 18.8 million a year earlier.
According to the plan, the foundation will "redouble" its work to reverse the decreasing participation trend. The document also recognises and describes other key risks, including that:
To address these challenges and the related content-goals, the foundation will increase its support for efforts in strategic key areas such as the Arabic-speaking world, Brazil, and India; the WMF will promote new models of community self-organization (Signpost coverage). Boosting technical capacity will secure the launch of the Visual Editor and new multimedia tools, and will improve mobile access to Wikimedia sites. The foundation's engineering department will be the main focus for staff recruitment: up to 30 engineering jobs will boost numbers by nearly 50%, in an overall staffing increase of 55 for the foundation, bringing numbers to 174.
The UK Telegraph has just published a story apparently sparked by the site-ban of the chair of the WMUK board by the English Wikipedia's ArbCom last week. Written by technology correspondent Christopher Williams under the title "Chairman of Wikipedia charity banned after pornography row", the article attempts to link Fæ's "punishment" with what it calls "a deep rift among Wikipedia contributors over the mass of explicit material in the online encyclopedia", and with the UK government's proposed new controls "to protect children online ... potentially limiting access to Wikipedia".
However, Williams provides no evidence connecting the complex issues underlying Fæ's ban with the community's protracted discussion of controversial content; nor does his article – complete with a large photograph of Fæ – support the implication that Wikipedia's policies and practices concerning such content might be caught up by the government's proposed rules. He confuses the English Wikipedia's rules for controversial content with those of Commons, writing somewhat boldly that "Wikimedia Commons makes massive volumes of pornography freely available to any Wikipedia visitor."
In response to the announcement of ArbCom's sanctions on Fæ, the board of Wikimedia UK released a statement on 26 July:
The Board is united in the view that this decision does not affect [Fæ's] role as a Trustee of the charity. His work at Wikimedia UK has always been enthusiastic and diligent. In particular, his knowledge of charity governance, and his ability to bring about consensus at WMUK's board meetings, have been particularly valuable. The Board points out that the editing issues were fully public before, and during, the recent elections to the board, and were openly and publicly discussed. Our membership placed their trust in him by electing him as a Trustee. He was then elected unanimously as Chair of the Board. He continues to have the full support of the Board.
Jon Davies, chief executive of WMUK, responded to Williams' piece at the chapter's blog-site: "The Daily Telegraph has chosen its headline to create maximum impact. The reality is far, far more complex." The blog reprinted the board's statement of support, with a link to the publicly available minutes of the board meeting at which it was endorsed.
On July 25, the WMF launched a discussion of how the award of Wikimania scholarships should be reformed. The volunteer committee that reviews scholarship applications for Wikimania has experienced capacity problems, and its structure will be reviewed.
Among the more than 150 scholarships awarded in 2012 – partly with the support of chapters and other entities – the committee approved 130 from applicants in 57 countries. The cost of the scholarship scheme amounts to several hundred thousand dollars. The committee examined some 1,100 confidential applications, with supporting staff aiming to balance factors such as geography, WMF project, and whether applicants had been awarded scholarships for previous Wikimanias. Cost estimates for foundation Wikimania 2012 scholarships are graphed here, based on estimated flights to and from Washington DC from representative airports and assuming a 300 euro award for partial scholars.
Editors are welcome to participate in the discussion on Meta, which is examining how to improve transparency, efficiency and coordination, how to better align the process with the movement's strategic priorities, and what role qualification standards should play. The current design of the process is described in the handbook.
The first-ever Punjabi Wikipedia workshop in Punjab was held in Ludhiana on 28 July 2012. Ludhiana, an industrial city in Punjab, saw a decent turnout of 20 people for this open-to-all workshop. The workshop began with a basic presentation aimed at raising awareness of Punjabi Wikipedia, teaching editing techniques and how to contribute articles, and encouraging participants to promote their native language and share their knowledge with the world.
What was remarkable, though, was the large number of women attending. Until now, Punjabi Wikipedia, which is completing ten years, had only two editors and very few articles. After the workshop, we saw fifteen new editors join, thirteen of them women. We also gained four new administrators: Tow, Tari Buttar, Guglani and Surinder Wadhawan. Two of the new sysops, Guglani and Surinder Wadhawan, were present at the workshop, where they answered the students' questions and encouraged them.
The workshop also received coverage in the Punjabi media, which praised the effort, including the Ajit, Punjab Tribune and Hindustan Times. Let's hope this workshop will kick-start a series of many more workshops across the state, bringing many more editors and many more Punjabi articles.
We continue our Summer Sports Series this week with WikiProject Horse Racing. Started in November 2005, the project has grown to include nearly 8,000 articles maintained by 34 active members. There are 10 Featured Articles and 19 Good Articles included in the project's scope. In addition to preparing articles for GA and FA status, the project attempts to create requested articles and locate requested images. We interviewed Redrose64, Montanabw, Tigerboy1966, Ealdgyth, and Cuddy Wifter.
What motivated you to join WikiProject Horse Racing? How do articles about horse racing differ from articles about other sports?
The project is home to 10 Featured Articles and 19 Good Articles. Have you contributed to any of these? What are some challenges you've encountered when improving horse racing articles?
How much overlap exists between WikiProject Horse Racing and WikiProject Equine? Do the two projects collaborate or share resources? Are there any other projects that share common interests with WikiProject Horse Racing?
Are some types of horse racing better covered by Wikipedia than others? Is horse racing in some countries under-represented? What can be done to fill holes in Wikipedia's coverage of horse racing?
How does the project determine notability for horses, jockeys, and owners? What are some good resources editors can turn to when sourcing an article or determining notability?
What are the project's most urgent needs? How can a new contributor help today?
Anything else you'd like to add?
Next week, we'll conclude the Summer Sports Series with a lesson in self-defense. Until then, duke it out in the archive.
Eight featured articles were promoted this week:
Five featured lists were promoted this week:
Eight featured pictures were promoted this week:
In the light of recent questions over the long-term reliability of Wikimedia wikis, the Signpost caught up with CT Woo, the Wikimedia Foundation's director of technical operations.
In the second of our series looking at this year's eight ongoing Google Summer of Code projects, the Signpost caught up with developer Nischay Nahata. Nischay is working on performance improvements to Semantic MediaWiki (SMW), a collection of extensions not in use on any Wikimedia projects but nevertheless boasting a significant list of adopters. SMW is also regarded as an influential player in deciding the course of MediaWiki's potential adoption of so-called "structured data" forms, which have recently come to prominence with the establishment of the Wikidata project. While SMW and Wikidata are distinct projects, there is an active exchange of ideas (and developers) between them. Nischay explained to the Signpost what he has been trying to accomplish, and what its broader impact might be:
“Semantic MediaWiki's continuous development has seen many changes and new datatypes introduced upon request. However, while these new data-types were introduced successfully to the front-end, the backend still stored them in the same way, requiring complex conversions to be undertaken whenever storing or retrieving data; such a system also required more space. My work has been focussed on redesigning the database to accommodate all supported data types in the most native way. This needed lots of re-factoring of code that depended on the old database architecture. While doing this I also introduced a special feature called "fixed tables" that lets a user shard (splinter) the database by using special tables for some "highly-used" properties. In addition, while waiting for code-review I wrote unit tests for Semantic MediaWiki, which were useful for me and will be for future contributors.
“After redesigning the database layout I have been working on improving page read/write times by doing hash-based checks before querying the database. Once that was done at the lowest level, I started to look for performance improvements in the Special Pages shipped with Semantic MediaWiki, and we planned to maintain Property-usage statistics to improve these Special Pages. As an amazing by-product of this feature, I introduced support for 'diff'ing the semantic data of a page (similar to what SemanticWatchList does, but in Semantic MediaWiki's core). I am currently working on improved handling of subobjects, and plan to later look into caching queries, which is promising to be just as awesome. I hope my work will make SMW more practical to use for larger datasets, and so more websites can use it.”
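The idea of a hash-based check before touching the database, combined with a diff of a page's semantic data, can be illustrated in a few lines. SMW itself is written in PHP, and everything below (the `SemanticStore` class, the function names, the property-to-values data shape) is hypothetical; it is a sketch of the general technique, not SMW's code or API.

```python
import hashlib
import json


def semantic_hash(data):
    """Stable hash of a page's semantic data (property -> list of values)."""
    canonical = json.dumps({k: sorted(v) for k, v in data.items()},
                           sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_semantic_data(old, new):
    """Return {property: (added_values, removed_values)} for changed properties."""
    changes = {}
    for prop in old.keys() | new.keys():
        before, after = set(old.get(prop, [])), set(new.get(prop, []))
        if before != after:
            changes[prop] = (sorted(after - before), sorted(before - after))
    return changes


class SemanticStore:
    """Hypothetical store that skips writes when a page's data hash is unchanged."""

    def __init__(self):
        self.hashes = {}  # page title -> last stored hash (cheap to compare)
        self.data = {}    # page title -> semantic data (stands in for the database)
        self.writes = 0

    def update(self, page, new_data):
        h = semantic_hash(new_data)
        if self.hashes.get(page) == h:
            return {}  # unchanged: skip the expensive database work
        changes = diff_semantic_data(self.data.get(page, {}), new_data)
        self.data[page] = new_data
        self.hashes[page] = h
        self.writes += 1
        return changes


store = SemanticStore()
store.update("Example", {"Color": ["red"], "Size": ["large"]})
print(store.update("Example", {"Color": ["blue"], "Size": ["large"]}))
```

Because the hash is computed over a canonical (sorted) form, re-saving a page whose properties merely appear in a different order costs one hash comparison rather than a database write.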
Nischay regularly updates a blog following his latest progress.
Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for several weeks.
For the second time this year and the fourth in the history of the Arbitration Committee, there are no requests for arbitration or open cases.
The closure of the Fæ case last week left no open cases before the Arbitration Committee. This had previously happened on only three occasions: in 2009, in 2010 and in May of this year. At the time of writing, the Committee has no requests for arbitration before it.
Arbitration cases do not form all of the Committee's workload, however, as there are four requests for clarification and amendment and one motion being discussed.
Arbitrator Kirill Lokshin proposed a motion requiring the alteration of any instances of an editor's previous username in arbitration decisions to reflect their name change(s). Any instances appearing within the:
The Devil's Advocate has initiated an amendment request for the controversial Race and intelligence case. The request calls for the amendment of review remedies 1.1, 6.1 and 7.1.
Amendment 1 concerns remedies 6.1 and 7.1. It calls for SightWatcher's and TrevelyanL85A2's indefinite omni-namespace edit and discussion ban from Race and intelligence topics, including participation in discussions concerning topic-editor conduct, to be changed into a standard topic ban from Race and intelligence-related edits (broadly construed), with a clearly defined route for appealing the sanction.
Amendment 2 concerns remedy 1.1. It calls for Mathsci's admonishment for engaging in battlefield conduct to be modified to include an explicit warning that further battleground conduct (towards editors) related to the topic will be "cause for discretionary sanctions."
Reader comments
A draft of a letter, submitted for publication, has been posted on ArXiv. [1] The letter reports research on modeling the process of collaborative editing in Wikipedia and similar open-collaboration writing projects. The work builds on previous research by some of its authors on conflict detection in Wikipedia. The authors explore a simple agent-based model of opinion dynamics, in which editors influence each other either by direct communication or by successively editing a shared medium, such as a Wikipedia page. According to the authors, the model, although highly idealized, exhibits a rich behavior that can reproduce, albeit only qualitatively, some key characteristics of conflicts over real-world Wikipedia pages. The authors show that, for a fixed editorial pool with one "mainstream" and two opposing "extremist" groups, consensus is always reached. However, depending on the values of the model's input parameters, achieving consensus may take an extremely long time, and the consensus does not always conform to the initial mainstream view. In the case of a dynamic group, where new editors replace existing ones, consensus may be achieved through a phase of conflict, depending on the rate of new editors joining the editorial pool and on the degree of controversy over the article's topic.
In a copyright panel at this month's Wikimania, Abhishek Nagaraj – a PhD student and economist from the MIT Sloan School of Management – presented early results from an econometric study of copyright law. The study used data from the English Wikipedia's WikiProject Baseball to try to consider how gains from digitization are moderated by the effects of copyright. Previous work on the economics of copyrights have struggled to disentangle the effects of copyright with the effects of increased access that often coincides with content after it has entered the public domain.
The paper takes advantage of the fact that in 2008, Google digitized and published a large number of magazines as part of the Google Books projects. Among other magazines published were 70 years of back-issues of Baseball Digest, a magazine that publishes baseball stories, statistics, and photographs. Measuring the effect of digitization, Nagaraj found that the articles on baseball All-Stars from between 1944 and 1984 saw large increases in size (5,200) around the period that the digital Google Books version of Baseball Digest became available. However, because of the law governing copyright expiration, all the issues of Baseball Digest published before 1964 were in the public domain, while issues published after were not. Using the econometric difference in differences technique, Nagaraj compared the different effects of digitization for (1) players who began their professional baseball career after 1964 and as a result had no new digitized public-domain material and (2) players who had played before and were thus more likely to have digitized material about them enter the public domain.
In terms of the effect of copyright, Nagaraj found no effect on the length of Wikipedia articles on public domain status but found a strong effect for images. Wikipedia writers could, presumably, simply rewrite copyrighted material or may not have found the Baseball Digest form appropriate for the encyclopedia. However, Nagaraj found that the availability of public domain material in Baseball Digest led to a strong increase in the number of images. Before Google Books published the material, the pre-64 group had an average of 0.183 pictures on their articles and the post 64 group had about 0.158 pictures. In the period after digitization, both groups increased but the older group increased more, to 1.15 pictures per article as opposed to 0.667 images for the more recent players whose Baseball Digest material was still under copyright. Nagaraj also found that those players with public domain material have more traffic to their articles. The essay controls for a large number of variables related to players, their performance and talent, and their potential popularity, as well as for trends in Wikipedia editing.
The presentation slides are available on the Wikimania conference website [2] and a nice journalistic write-up was published by The Atlantic.
Field notes can be a valuable source of information about meteorological, geological and ecological aspects of the past, and making them accessible by way of Wikisource-based semantic annotation was the focus of a recent study [3] published in ZooKeys as part of a special issue on the digitization of natural history collections. The paper described how the field notes of Junius Henderson from the years 1905–1910 have been transcribed on Wikisource and then semantically annotated, as illustrated in the screenshot. Henderson was an avid collector of molluscs and, while trained as a judge, served as the first curator of the University of Colorado Museum of Natural History. His notebooks are rich in species occurrence records, but also contain occasional gems like this one from September 3, 1905:
“Train again so late as to afford ample opportunity for philosophic meditation upon the motives which inspire railroad people to advertise time which they do not expect to make except under rare circumstances”
The article provides a detailed introduction to the workflows on the English Wikisource in general and to WikiProject Field Notes in particular, which is home to transcriptions of other field notes as well. The data resulting from annotation of the field notes are available in Darwin Core format under a Creative Commons Public Domain Dedication (CC0). This work ties in with discussions that took place at Wikimania about the future of Wikisource, its technical prerequisites, and existing tools and initiatives.
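For readers unfamiliar with Darwin Core, the sketch below shows roughly what an occurrence record in that format can look like when exported as CSV, using a handful of standard Darwin Core term names. Which terms the Henderson dataset actually uses is not stated here, so the field selection is an assumption, and every value in the example record is an invented placeholder rather than data from the notebooks.

```python
import csv
import io

# A few standard Darwin Core terms; the real dataset may use more or different ones.
DWC_FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "recordedBy",
    "locality",
]


def to_darwin_core_csv(records: list[dict]) -> str:
    """Serialize occurrence records as CSV whose columns are Darwin Core term names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Hypothetical record: every value below is a placeholder, NOT a real transcription.
example = {
    "occurrenceID": "example-0001",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Genus species",
    "eventDate": "1905-01-01",
    "recordedBy": "Junius Henderson",
    "locality": "Example locality, Colorado",
}
print(to_darwin_core_csv([example]))
```

Using shared term names such as `scientificName` and `eventDate` is what lets biodiversity aggregators combine records from many collections without custom mapping.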
A recent study of 24 articles in pediatric otolaryngology [4] (more commonly referred to as "ear, nose, and throat", or ENT) suggests that the quality of medical information in Wikipedia could be vastly improved. The study compared information on common ENT diagnoses from Wikipedia, eMedicine, and MedlinePlus (by the authors' determination, the three most popular websites on the subject). It found that Wikipedia's ENT articles were the least accurate and contained the most errors of the three, and that their readability fell between that of the other two.
Although Wikipedia is one of the most referenced sources in this area, it had poor content accuracy (46%) compared with the two other frequently used sources. MedlinePlus had comparable (49%) accuracy but was missing 7 topics. eMedicine, the clear leader in accuracy, suffers from a higher reading level. The study provides specific criteria, in section 2.3, that could be used to evaluate existing articles. One limitation of the study is that, while suggesting that Wikipedia "suffers from the lack of understanding that a physician-editor may offer", it does not point readers to information on how to get involved with Wikipedia. Engagement with the pediatric medicine community would be beneficial, especially since about 25% of parents made decisions about their children's care based in part on online information.
A forthcoming paper at this year's WikiSym conference investigates the emotions expressed on article and user talk pages. [5] "Administrators tend to be more positive than regular users", and the paper suggests that "as women gain experience in Wikipedia they tend to adopt the emotional tone of administrators", for instance by linking to policy at more than twice the rate of men. Because women are more likely to interact with other women, the authors suggest gender-aware recruiting to address the gender gap.
The authors point out the utility of positive emotion in keeping discussions on track, and suggest that experienced editors should be encouraged to maintain a positive climate. To determine users' gender, they used a crowdsourced study through CrowdFlower. Emotions are measured using the ANEW (Affective Norms for English Words) wordlist, which rates words along three dimensions: valence, arousal, and dominance. The paper notes that policy mentions tend to have "a remarkably positive and dominant tone, and with stronger emotional load than in the rest of the discussion".
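A common way to apply a lexicon like ANEW is to look up each word of a message and average the valence, arousal and dominance ratings of the words that have entries. The sketch below assumes that simple averaging scheme; the paper's exact aggregation may differ, and the lexicon values are made-up placeholders, not the published ANEW norms.

```python
import re
from statistics import mean
from typing import Optional, Tuple

# Placeholder entries: (valence, arousal, dominance) on a 1-9 scale.
# These numbers are invented for illustration, NOT actual ANEW ratings.
LEXICON = {
    "thanks": (7.5, 4.0, 6.0),
    "agree": (7.0, 4.5, 6.5),
    "wrong": (3.0, 5.0, 4.5),
    "policy": (5.5, 3.5, 6.0),
}


def emotional_profile(text: str) -> Optional[Tuple[float, float, float]]:
    """Average valence, arousal and dominance over the rated words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    if not scores:
        return None  # no rated words, so no estimate for this message
    # zip(*scores) groups all valences, all arousals and all dominances together.
    valence, arousal, dominance = (mean(dim) for dim in zip(*scores))
    return valence, arousal, dominance


print(emotional_profile("Thanks, I agree with the policy"))
```

Unrated words are simply skipped, so short messages may get no score at all; returning `None` keeps such cases from being mistaken for neutral text.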
A paper from the University of Alberta addresses the difficulty of analyzing edit histories, and of detecting conflict in particular. [6] The authors use terms indicating content-based agreement (e.g. "add", "fix", "spellcheck", "copy", and "move") and disagreement ("uncited", "fact", "is not", "bias", "claim", "revert", and "see talk page"). They define conflicting interactions as those that revert or delete content, or that use more negative terms than positive ones, and find that this is a useful way to identify controversial articles.
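The conflict rule described above can be sketched as follows, using the term lists quoted from the paper. How the authors tokenize comments and detect reverts or deletions is not described here, so the whole-word matching, the `Interaction` fields and the per-article `controversy_score` are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Term lists as quoted in the summary of the paper.
AGREEMENT_TERMS = ["add", "fix", "spellcheck", "copy", "move"]
DISAGREEMENT_TERMS = ["uncited", "fact", "is not", "bias", "claim", "revert", "see talk page"]


@dataclass
class Interaction:
    comment: str           # edit summary or comment text
    is_revert: bool        # hypothetical flag: the edit undid a previous edit
    deletes_content: bool  # hypothetical flag: the edit removed existing text


def count_terms(text: str, terms: list[str]) -> int:
    """Count whole-word (or whole-phrase) matches, so 'add' does not match 'address'."""
    lowered = text.lower()
    return sum(len(re.findall(r"\b" + re.escape(term) + r"\b", lowered)) for term in terms)


def is_conflicting(interaction: Interaction) -> bool:
    """Conflict if the edit reverts, deletes content, or is more negative than positive."""
    if interaction.is_revert or interaction.deletes_content:
        return True
    negative = count_terms(interaction.comment, DISAGREEMENT_TERMS)
    positive = count_terms(interaction.comment, AGREEMENT_TERMS)
    return negative > positive


def controversy_score(interactions: list[Interaction]) -> float:
    """Hypothetical article-level score: share of interactions that are conflicting."""
    if not interactions:
        return 0.0
    return sum(is_conflicting(i) for i in interactions) / len(interactions)
```

Ranking articles by a score like this is one plausible way to turn per-edit labels into a list of candidate controversial articles.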
A student paper for a course on "Project in Mining Massive Data Sets" at Stanford University, titled "Wikipedia Mathematical Models and Reversion Prediction" [7], tries to use mathematical models "to explain why the amount of [editors on the English Wikipedia] stops increasing, whereas the amount of viewers keeps increase", and "to predict if an edit will be reverted." The researchers carried out this research using Elastic MapReduce on Amazon's servers. The paper is somewhat confused, since the researchers seem more interested in models and validation than in explaining the phenomena.
The first part of the paper presents two models of the relation between visitors and editors in Wikipedia's community. The first treats editors as predators and articles as prey; however, this model did not fit the data. The second uses a linear regression over a number of factors, which allows the authors to model the community's statistics over time. This model is then tested by simulation and appears to produce accurate results.
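The predator-prey idea is usually written as the Lotka–Volterra equations; assuming the students used that standard form (the summary does not say), the sketch below integrates it with simple Euler steps, with articles as prey (x) and editors as predators (y). The parameter values are arbitrary placeholders, and, as noted above, the paper reported that this kind of model did not fit the data.

```python
def simulate(prey0: float, predators0: float,
             alpha: float = 1.0,   # prey growth rate (hypothetical)
             beta: float = 0.1,    # rate at which predators "consume" prey (hypothetical)
             delta: float = 0.075, # predator growth per prey consumed (hypothetical)
             gamma: float = 1.5,   # predator decline rate (hypothetical)
             dt: float = 0.001,
             steps: int = 10_000) -> tuple[list[float], list[float]]:
    """Euler integration of dx/dt = alpha*x - beta*x*y, dy/dt = delta*x*y - gamma*y."""
    xs, ys = [prey0], [predators0]
    x, y = prey0, predators0
    for _ in range(steps):
        dx = alpha * x - beta * x * y
        dy = delta * x * y - gamma * y
        x, y = x + dt * dx, y + dt * dy
        xs.append(x)
        ys.append(y)
    return xs, ys


articles, editors = simulate(10.0, 5.0)
print(f"final articles={articles[-1]:.2f}, editors={editors[-1]:.2f}")
```

The model predicts oscillating populations around the equilibrium x = gamma/delta, y = alpha/beta, whereas Wikipedia's article count has grown steadily, which may be one reason such a model fits poorly.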
In the second part of the paper, three models were used to predict which edits will be reverted. The models were trained on 24 features, classified as edit-, editor-, or article-based: for example, an article's age, its edit count, the number of editors who have edited it, the number of articles the editor has edited, and the change in information compared with the previous revision. Predictions made with the three machine learning algorithms reached about 75% accuracy. Another interesting conclusion was that the ability to detect reversion has not changed much over time.
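The sketch below shows how a few of the example features might be grouped into a feature vector for a reversion classifier, along with the accuracy measure behind a figure like "about 75%". The field names are hypothetical, only five of the paper's 24 features are represented, and using the byte-size change as a proxy for "change in information" is an assumption; the classifiers themselves are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class EditRecord:
    # Hypothetical fields covering a few of the features named in the summary.
    article_age_days: float
    article_edit_count: int
    article_editor_count: int
    editor_articles_edited: int
    bytes_before: int
    bytes_after: int


def feature_vector(edit: EditRecord) -> dict[str, float]:
    """Group example features into article-, editor- and edit-based features."""
    return {
        # article-based
        "article_age_days": edit.article_age_days,
        "article_edit_count": float(edit.article_edit_count),
        "article_editor_count": float(edit.article_editor_count),
        # editor-based
        "editor_articles_edited": float(edit.editor_articles_edited),
        # edit-based: size change as a rough proxy for "change in information"
        "size_change": float(edit.bytes_after - edit.bytes_before),
    }


def accuracy(predicted: list[bool], actual: list[bool]) -> float:
    """Fraction of edits whose reverted / not-reverted label was predicted correctly."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

Accuracy alone can flatter a classifier when most edits are not reverted, so class balance is worth checking before comparing such figures.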
However, Wikimedia Germany supported a community effort to produce photos for Wikimedia articles on members of the German Olympic team, by piggybacking on the press event at which the team's clothing for London 2012 was presented to the public. Five volunteers managed to take several hundred pictures of the team and the event.
The summer Paralympics, which will start shortly after the end of the able-bodied Olympics, are bound by the same restrictions on photography and licensing. However, Wikimedia Australia has been working closely with the Australian Paralympic Committee to enable Wikinews coverage by two Wikimedians, Laura Hale and Hawkeye7, the only Wikimedians to have been granted press accreditation at the 2012 Paralympics. This will give them access to Paralympians and other personnel after they finish their events, and allow them to ask questions during press conferences and to conduct interviews. But Wikimedians have to accept that the Olympics are now among the most highly commercialised events in the world. Laura Hale told the Signpost that "rights holders, the ones that pay big money, get the first chance to interview people. Then we're granted a few minutes for interviews if the athletes are amenable." One small hope, she says, is to photograph athletes outside the village, which is allowed without commercial restrictions on licensing.
Hale said this is a great opportunity to improve coverage of women on both Wikipedia and Wikinews, as well as coverage of people with disabilities, particularly Asian and African Paralympians. She and Hawkeye7 will take and upload pictures under the non-commercial licenses used by Wikinews, in line with the International Paralympic Committee's regulations. Non-commercial licenses, however, are incompatible with the licensing policy of Commons. The Australian Paralympic Committee will upload some of its own images under a Creative Commons license, specifically to make them easier to use on Wikinews; however, these images face the same problem as those to be taken by Laura Hale and Hawkeye7. While images and video are a problem, there are no such restrictions on audio files, which means interviews can be uploaded to Commons under a compatible license.
The Wikimedia Foundation has published its 2012–13 Annual Plan, focusing on technical improvements, editor retention, and structural reforms over the coming year. The movement's total revenue, including almost all chapter funding, is slated to rise by 35%, from $34.2 million to $46.1 million, and global spending to more than $42.1 million, although both figures overstate the real increases, since following the recent financial reforms they now include all financial categories. The foundation's own core spending will grow by 15% to $30.2 million in 2012–13.
Due to the new financial structure of the movement, $11.4 million of the volunteer-run Funds Dissemination Committee's (FDC) awards and grants, mainly to go to Wikimedia chapters, are part of the WMF's annual plan for the first time. The foundation plans to request $4.5 million of the FDC's $11.4 million allocation to finance non-core activities, which will include the Wikimedia grant program, a GAC allocation doubled to $600k, global education, and education programs in the Arabic-speaking world, Brazil, and India. This revenue growth is expected even as the foundation continues to use the less aggressive fundraising methods deployed in 2011–12, with fewer days and fewer "Jimmy Days": Jimmy Wales's image was displayed in the annual fundraiser banners on 12 of the 46 days in 2011, compared with 36 of 50 days in 2010.
On the downside, the plan acknowledges that the foundation has been unable to significantly increase the diversity of its communities (including female participation, which remains at a strikingly low 9%) or to turn the tide on the slight decline in project participation, down in March 2012 to 85,000 regular users (more than five edits a month) from 89,000 a year earlier. This contrasts with last year's goal to increase participation to 95,000 regular users by June 2012. The new Visual Editor was expected to be ready for deployment by June 2012, a target that has now been put back a year to mid-2013. On the other hand, the readership growth goals (to reach a billion people by 2015) are on track, thanks to rising mobile page views: 2,008M in April 2012, up a remarkable 187% from 726M a year earlier. The combined Wikipedias, scheduled to reach 50 million articles by 2015, had 22.3 million entries in March 2012, up from 18.8 million a year earlier.
According to the plan, the foundation will "redouble" its work to reverse the decreasing participation trend. The document also recognises and describes other key risks, including that:
To address these challenges and the related content-goals, the foundation will increase its support for efforts in strategic key areas such as the Arabic-speaking world, Brazil, and India; the WMF will promote new models of community self-organization (Signpost coverage). Boosting technical capacity will secure the launch of the Visual Editor and new multimedia tools, and will improve mobile access to Wikimedia sites. The foundation's engineering department will be the main focus for staff recruitment: up to 30 engineering jobs will boost numbers by nearly 50%, in an overall staffing increase of 55 for the foundation, bringing numbers to 174.
The UK Telegraph has just published a story apparently sparked by the site-ban of the chair of the WMUK board by the English Wikipedia's ArbCom last week. Written by technology correspondent Christopher Williams under the title "Chairman of Wikipedia charity banned after pornography row", the article attempts to link Fæ's "punishment" with what it calls "a deep rift among Wikipedia contributors over the mass of explicit material in the online encyclopedia", and with the UK government's proposed new controls "to protect children online ... potentially limiting access to Wikipedia".
However, Williams provides no evidence for connecting the complex issues underlying Fæ's ban with the community's protracted discussion of controversial content; nor does his article, complete with a large photograph of Fæ, back up the implication that Wikipedia's policies and practices concerning such content might be caught up by the government's proposed rules. He confuses the English Wikipedia's rules for controversial content with those of Commons, writing somewhat boldly that "Wikimedia Commons makes massive volumes of pornography freely available to any Wikipedia visitor."
In response to the announcement of ArbCom's sanctions on Fæ, the board of Wikimedia UK had released a statement on 26 July.
The Board is united in the view that this decision does not affect [Fæ's] role as a Trustee of the charity. His work at Wikimedia UK has always been enthusiastic and diligent. In particular, his knowledge of charity governance, and his ability to bring about consensus at WMUK's board meetings, have been particularly valuable. The Board points out that the editing issues were fully public before, and during, the recent elections to the board, and were openly and publicly discussed. Our membership placed their trust in him by electing him as a Trustee. He was then elected unanimously as Chair of the Board. He continues to have the full support of the Board.
Jon Davies, chief executive of WMUK, responded to Williams' piece at the chapter's blog-site: "The Daily Telegraph has chosen its headline to create maximum impact. The reality is far, far more complex." The blog reprinted the board's statement of support, with a link to the publicly available minutes of the board meeting at which it was endorsed.
On July 25, the WMF launched a discussion of how the award of Wikimania scholarships should be reformed. The volunteer committee that reviews scholarship applications for Wikimania has experienced capacity problems, and its structure will be reviewed.
Among the more than 150 scholarships awarded in 2012, partly with the support of chapters and other entities, the committee approved 130 from applicants in 57 countries. The cost of the scholarship scheme amounts to several hundred thousand dollars. The committee examined some 1,100 confidential applications, with supporting staff aiming to balance factors such as geography, WMF project, and whether applicants had been awarded scholarships for previous Wikimanias. Cost estimates for Foundation-funded Wikimania 2012 scholarships are graphed here, based on estimated flights to and from Washington DC from representative airports and assuming a 300 euro award for partial scholars.
Editors are welcome to participate in the discussion on Meta, which is considering how to improve transparency, efficiency, coordination, and alignment with the movement's strategic priorities, as well as the role of qualification standards. The current design of the process is described in the handbook.
The first-ever Punjabi Wikipedia workshop in Punjab was held in Ludhiana on 28 July 2012. Ludhiana, an industrial city in Punjab, saw a decent turnout of 20 people for this open-to-all workshop. The workshop started with a basic presentation aimed at spreading awareness about Punjabi Wikipedia, teaching editing techniques, encouraging people to contribute articles, and motivating them to promote their native language and share their knowledge with the world.
What was remarkable, though, was the large number of women attendees. Until then, the Punjabi Wikipedia, now ten years old, had had only two editors and very few articles. After the workshop, we saw fifteen new editors join, thirteen of them women. We also got four new administrators: Tow, Tari Buttar, Guglani and Surinder Wadhawan. Two of the new sysops, Guglani and Surinder Wadhawan, were present at the workshop, answered the students’ queries and motivated them.
The workshop also received favourable coverage in the Punjabi media, including Ajit, the Punjab Tribune and the Hindustan Times. Let's hope that this workshop will kick-start a series of workshops across the state, bringing many more editors and many more Punjabi articles.
We continue our Summer Sports Series this week with WikiProject Horse Racing. Started in November 2005, the project has grown to include nearly 8,000 articles maintained by 34 active members. There are 10 Featured Articles and 19 Good Articles included in the project's scope. In addition to preparing articles for GA and FA status, the project attempts to create requested articles and locate requested images. We interviewed Redrose64, Montanabw, Tigerboy1966, Ealdgyth, and Cuddy Wifter.
What motivated you to join WikiProject Horse Racing? How do articles about horse racing differ from articles about other sports?
The project is home to 10 Featured Articles and 19 Good Articles. Have you contributed to any of these? What are some challenges you've encountered when improving horse racing articles?
How much overlap exists between WikiProject Horse Racing and WikiProject Equine? Do the two projects collaborate or share resources? Are there any other projects that share common interests with WikiProject Horse Racing?
Are some types of horse racing better covered by Wikipedia than others? Is horse racing in some countries under-represented? What can be done to fill holes in Wikipedia's coverage of horse racing?
How does the project determine notability for horses, jockeys, and owners? What are some good resources editors can turn to when sourcing an article or determining notability?
What are the project's most urgent needs? How can a new contributor help today?
Anything else you'd like to add?
Next week, we'll conclude the Summer Sports Series with a lesson in self-defense. Until then, duke it out in the archive.
Eight featured articles were promoted this week:
Five featured lists were promoted this week:
Eight featured pictures were promoted this week:
In the light of recent questions over the long-term reliability of Wikimedia wikis, the Signpost caught up with CT Woo, the Wikimedia Foundation's director of technical operations.
In the second of our series looking at this year's eight ongoing Google Summer of Code projects, the Signpost caught up with developer Nischay Nahata. Nischay is working on performance improvements to Semantic MediaWiki (SMW), a collection of extensions not used on any Wikimedia project but nevertheless boasting a significant list of adopters. SMW is also regarded as an influential player in deciding the course of MediaWiki's potential adoption of so-called "structured data" forms, which have recently come to prominence with the establishment of the Wikidata project. While SMW and Wikidata are distinct projects, there is an active exchange of ideas (and developers) between them. Nischay explained to the Signpost what he has been trying to accomplish, and what its broader impact might be:
“Semantic MediaWiki's continuous development has seen many changes and new datatypes introduced upon request. However, while these new data-types were introduced successfully to the front-end, the backend still stored them in the same way, requiring complex conversions to be undertaken whenever storing or retrieving data; such a system also required more space. My work has been focussed on redesigning the database to accommodate all supported data types in the most native way. This needed lots of re-factoring of code that depended on the old database architecture. While doing this I also introduced a special feature called "fixed tables" that lets a user shard (splinter) the database by using special tables for some "highly-used" properties. In addition, while waiting for code-review I wrote unit tests for Semantic MediaWiki which was useful for me and will be for future contributors.

After redesigning the database layout I have been working on improving page read/write times by doing hash-based checks before querying the database. When done at the lowest level, I started to look for performance improvements in the Special Pages shipped with Semantic MediaWiki, and we planned to maintain Property-usage statistics to improve these Special Pages. As an amazing by-product of this feature, I introduced support for 'diff'ing the semantic data of a page (similar to what SemanticWatchList does, but in Semantic MediaWiki's core). I am currently working on improved handling of subobjects, and plan to later look into caching queries which is promising to be just as awesome. I hope my work will make SMW more practical to use for larger datasets, and so more websites can use it.”
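The general idea of a hash-based check before a database write, and of diffing a page's semantic data, can be illustrated with a small in-memory toy. This is not Semantic MediaWiki code (SMW is written in PHP and its storage layer works differently); the class, method names, SHA-1 fingerprint and canonical-JSON step are all hypothetical choices made for the sketch.

```python
import hashlib
import json


class SemanticStore:
    """Toy store: skip writes when a page's semantic data is unchanged (hypothetical)."""

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, list[str]]] = {}
        self.hashes: dict[str, str] = {}
        self.writes = 0  # how many times the "database" was actually written

    @staticmethod
    def fingerprint(data: dict[str, list[str]]) -> str:
        # Canonical JSON (sorted keys and values) so equal data always hashes equally.
        canonical = json.dumps({k: sorted(v) for k, v in data.items()}, sort_keys=True)
        return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

    def save(self, page: str, data: dict[str, list[str]]) -> bool:
        """Write only if the data changed; return True when a write happened."""
        new_hash = self.fingerprint(data)
        if self.hashes.get(page) == new_hash:
            return False  # unchanged: the expensive write is skipped
        self.rows[page] = {k: list(v) for k, v in data.items()}
        self.hashes[page] = new_hash
        self.writes += 1
        return True

    def diff(self, page: str, data: dict[str, list[str]]) -> dict[str, tuple[list[str], list[str]]]:
        """Per property: (values added, values removed) relative to the stored data."""
        old = self.rows.get(page, {})
        changes = {}
        for prop in sorted(set(old) | set(data)):
            before, after = set(old.get(prop, [])), set(data.get(prop, []))
            if before != after:
                changes[prop] = (sorted(after - before), sorted(before - after))
        return changes
```

Comparing one short hash is far cheaper than re-reading and comparing every stored property value, which is presumably why such a check pays off on page saves that do not actually change semantic data.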
Nischay regularly updates a blog following his latest progress.
Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for several weeks.
For the second time this year and the fourth in the history of the Arbitration Committee, there are no requests for arbitration or open cases.
The closing of the Fæ case last week left the Committee with no open cases. This had happened only three times before: in 2009, in 2010 and in May of this year. At the time of writing, the Committee has no requests for arbitration before it.
Arbitration cases do not form all of the Committee's workload, however, as there are four requests for clarification and amendment and one motion being discussed.
Arbitrator Kirill Lokshin proposed a motion requiring the alteration of any instances of an editor's previous username in arbitration decisions to reflect their name change(s). Any instances appearing within the:
The Devil's Advocate has initiated an amendment request for the controversial Race and intelligence case. The request calls for the amendment of review remedies 1.1, 6.1 and 7.1.
Amendment 1 concerns remedies 6.1 and 7.1. It calls for SightWatcher's and TrevelyanL85A2's indefinite omni-namespace edit and discussion bans from Race and intelligence topics, including participation in discussions concerning topic-editor conduct, to be modified into standard topic bans from Race and intelligence-related edits (broadly construed), with a clearly defined route for appeal of the sanction.
Amendment 2 concerns remedy 1.1. It calls for Mathsci's admonishment for engaging in battlefield conduct to be modified to include an explicit warning that further battleground conduct (towards editors) related to the topic will be "cause for discretionary sanctions."