WIKIPEDIA:WIKIPEDIA SIGNPOST 2012-07-30 TECHNOLOGY REPORT

Technology report

Talking performance with CT Woo and Green Semantic MediaWiki with Nischay Nahata

Talking performance with CT Woo

In the light of recent questions over the long-term reliability of Wikimedia wikis, the Signpost caught up with CT Woo, the Wikimedia Foundation's director of technical operations.

Hey CT. Many users have reported timeouts and other performance problems over the last few months. Does the Foundation view these as separate incidents or as representative of a larger trend?: There are several reasons. For example, we are in the midst of changing file systems from NFS to an object storage system (OpenStack Swift). Since it is a very new product, we did discover a performance issue occurring during some image deletions. We have investigated, tracked it down, and I am happy to report it is no longer an issue. Also, we recently hit a Linux kernel bug where systems started rebooting themselves after about 211 days of uptime. As a result, we had to patch all the affected servers. In addition, a number of development teams (especially Platform and Localisation) have changed their build-test-deploy process over the last few months and are now rolling out more frequent (albeit smaller) deployments. I would like to add that 2011/2012 has been a relatively good year for our site uptime metrics, better than 2010/2011. For readers of Wikipedia, the uptime was 99.97%. For editors, the uptime was 99.86%.
Does the Foundation feel that it has the resources at its disposal to make these kinds of problems a thing of the past?: Resources are always a constraint. Whenever we encounter or discover a critical issue, we all circle in to fix the problem. We usually gather the domain experts when we hit a hard problem, and they could be from the Foundation or from the community. For example, the Varnish Software folks are helping us now to fix some issues when using Varnish for multimedia streaming purposes.
Is there not a tension between the operations team on the one hand and development teams on the other that could cause more issues in the future?: On the contrary, the teams work together very well. Yes, we do have differences of opinion occasionally, but they are all healthy discussions. Most of the time, the operations team aren't the ones who perform the deployment, but they are on standby. However, should we find performance issues with the deployment, and depending on the severity, we do revert the changes and perform profiling to help identify bottlenecks.
CT, thank you.
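
To put the uptime figures quoted in the interview into perspective, the short sketch below converts an uptime percentage into the downtime it implies. It assumes a 365-day reporting year, which the interview does not specify, and the function name `downtime_hours` is purely illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a 365-day reporting year


def downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Return the hours of downtime implied by an uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * period_hours


# Uptime figures as quoted in the interview.
for audience, uptime in [("readers", 99.97), ("editors", 99.86)]:
    print(f"{audience}: {downtime_hours(uptime):.1f} hours of downtime per year")
```

Under that assumption, 99.97% uptime corresponds to roughly 2.6 hours of downtime over the year for readers, and 99.86% to roughly 12.3 hours for editors.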

Google Summer of Code: Green Semantic MediaWiki

In the second of our series looking at this year's eight ongoing Google Summer of Code projects, the Signpost caught up with developer Nischay Nahata. Nischay is working on performance improvements to Semantic MediaWiki (SMW), a collection of extensions not in use on any Wikimedia Projects, but nevertheless boasting a significant list of adopters. SMW is also regarded as an influential player when it comes to deciding the course of MediaWiki's potential adoption of so-called "structured data" forms, which have recently come to prominence with the establishment of the Wikidata project. While SMW and Wikidata are distinct projects, there is an active exchange of ideas (and developers) between them. Nischay explained to the Signpost what he has been trying to accomplish, and what its broader impact might be:

“

Semantic MediaWiki's continuous development has seen many changes and new datatypes introduced upon request. However, while these new datatypes were introduced successfully to the front-end, the backend still stored them in the same way, requiring complex conversions to be undertaken whenever storing or retrieving data; such a system also required more space. My work has been focussed on redesigning the database to accommodate all supported datatypes in the most native way. This needed lots of re-factoring of code that depended on the old database architecture. While doing this I also introduced a special feature called "fixed tables" that lets a user shard (splinter) the database by using special tables for some "highly-used" properties. In addition, while waiting for code review I wrote unit tests for Semantic MediaWiki, which were useful for me and will be for future contributors.

After redesigning the database layout, I have been working on improving page read/write times by doing hash-based checks before querying the database. Once done at the lowest level, I started to look for performance improvements in the Special Pages shipped with Semantic MediaWiki, and we planned to maintain Property-usage statistics to improve these Special Pages. As an amazing by-product of this feature, I introduced support for 'diff'ing the semantic data of a page (similar to what SemanticWatchList does, but in Semantic MediaWiki's core). I am currently working on improved handling of subobjects, and plan to later look into caching queries, which promises to be just as awesome. I hope my work will make SMW more practical to use for larger datasets, so that more websites can use it.

”

Nischay regularly updates a blog following his latest progress.
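
The "hash-based checks before querying the database" that Nischay mentions can be illustrated with a minimal sketch. This is not SMW's actual code (SMW is written in PHP and its implementation is not described in this report); the class `SemanticStore`, its attributes, and the sample page data are all hypothetical, and an in-memory dictionary stands in for the database. The idea shown is simply to hash a page's semantic data and skip the write when the hash matches the one stored from the previous save.

```python
import hashlib
import json


class SemanticStore:
    """Toy in-memory stand-in for a database-backed property store (hypothetical)."""

    def __init__(self):
        self.rows = {}    # page title -> stored property data
        self.hashes = {}  # page title -> hash of the data last written
        self.writes = 0   # counts how often the "database" is actually touched

    @staticmethod
    def _digest(data):
        # Serialise with sorted keys so equal data always yields the same hash.
        payload = json.dumps(data, sort_keys=True).encode("utf-8")
        return hashlib.sha1(payload).hexdigest()

    def save(self, title, data):
        """Write data for a page, skipping the write if nothing changed."""
        digest = self._digest(data)
        if self.hashes.get(title) == digest:
            return False  # unchanged: no database work needed
        self.rows[title] = data
        self.hashes[title] = digest
        self.writes += 1
        return True


store = SemanticStore()
store.save("Berlin", {"Population": 3500000, "Country": "Germany"})
store.save("Berlin", {"Country": "Germany", "Population": 3500000})  # same data, skipped
store.save("Berlin", {"Population": 3600000, "Country": "Germany"})  # changed, written
print(store.writes)
```

In this sketch only two of the three saves reach the store, because the second carries identical data. The trade-off is one extra hash computation per save in exchange for avoiding redundant writes.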

In brief

Signpost poll

Reader poll

You can now give your opinion on next week's poll: How well do geonotices (notices that appear to target only a limited geographic area) work for you?

Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for several weeks.

Gerrit discussions continue: As reported in previous editions of the "Technology report", the start of code review tool Gerrit's own review period has sparked a series of discussions about its utility. In one, the possibility of making Gerrit compatible with popular Git repository management site GitHub was addressed (wikitech-l mailing list); in another, the possibility of changing the Gerrit visuals to something without "puke green/yellow colour schemes" (also wikitech-l). One thought that has come to prominence focusses on the soon-to-be-released Gerrit 2.5, which allows reusers such as Wikimedia to add their own plugins, an advance which will no doubt please those supporting an "improved Gerrit" outcome to the review. Improvements to Git statistical tracking were also in the news this week.
Lua to hit first WMF wikis in August: According to a recent update by the Director of Platform Engineering, Lua scripts could be in operation on Wikimedia wikis as soon as next month. Deployment will start on a test wiki, before moving to MediaWiki.org, when the possibilities afforded by MediaWiki's first serious attempt at providing a template programming language will begin to come under serious scrutiny. Talks regarding Lua (see previous Signpost coverage for context) were well received at both Berlin and Washington D.C.; any deployments are likely to attract significant developer attention.
Meet the Analytics team: In a post on the Wikimedia blog, the WMF Analytics team introduced their work, focussed on an update to the Wikimedia Report Card and Kraken, a new "data services platform" aimed at providing a huge array of statistics generated from dozens of datasets. The blog post also stressed the Foundation's commitment to privacy under the heading "counting not tracking". WMF wikis have traditionally been praised for high privacy ratings, albeit at the potential expense of data collection (for example, see previous Signpost coverage).
Geolocation, geolocation, geolocation: The possibility of upgrading MediaWiki's geolocation abilities was raised this week (wikitech-l mailing list). Geolocation powers geonotices, messages delivered via the watchlist to editors from a specific area, usually advertising meetups and other real-world events. The privacy implications of utilising data other than just publicly available IP addresses will no doubt also need to be considered.
One bot approved: 1 BRFA was recently approved for use on the English Wikipedia:
- Legobot's 13th BRFA, creating a list of incorrectly moved pages for WP:AFC.

At the time of writing, 16 BRFAs are active. As usual, community input is encouraged.


Discuss this story


Thanks for focusing on Nischay Nahata's Google Summer of Code project; I think it's going to be very helpful for Semantic MediaWiki in a variety of ways. A few points of correction: "Semantic MediaWiki" is both an individual extension and the name given to the group of extensions that make use of it; SMW is in fact in use on one Wikimedia wiki, Wikimedia Labs (that counts, right?); I don't think academia makes for any significant portion of SMW's usage; and I don't think anyone with any serious knowledge of the matter would call SMW and Wikidata competitors: SMW is intended for single-language wikis, while Wikidata (or more accurately, the software it will run on, Wikibase) is intended for massively multilingual wikis like Wikipedia. You could argue that wikis with a few languages could use either system, but I would hardly say that makes the two competitors. Yaron K. (talk) 14:57, 31 July 2012 (UTC)

Personally I consider labs to be an "internal" thing and not a "project" (in the sense of an open wiki for collaboration on writing down the sum of human knowledge, yadda). However, I could see an argument both ways on that. translatewiki also uses SMW, and while it is separate from Wikimedia, it is highly integrated with our (non-English) projects. Bawolff (talk) 17:20, 31 July 2012 (UTC)

- Thank you as ever for the corrections, Yaron, I do struggle to report with the same contextual depth on SMW as vanilla MW, especially in summary format. The competitors reference is to a thread last month on wikidata-l (I deliberately shan't link to it) in which one SMW advocate accused Wikidata of "rewriting SMW (and various of its extensions) almost from scratch" etc., etc. It's a hugely complex issue, especially since Wikidata phase 2 isn't fixed itself yet. I shall try to give a better researched (and longer) overview in the future, I promise :) . - Jarry1250 [Deliberation needed] 19:21, 1 August 2012 (UTC) (n.b. just in case it's confusing, the claims in question were later removed, not by me)

Thanks for responding, and for clarifying. I didn't know we were allowed to change the articles themselves... :) I mean, it's a wiki, but that's still a little unexpected. I also didn't know about that wikidata-l thread - I'm not on that mailing list. I just looked it up, and now I have to reiterate what I said about "anyone with any serious knowledge of the matter". :) Anyway, it's still good to see SMW being mentioned, and you (or anyone else) are always free to write to the semediawiki-user mailing list if you want quick feedback on anything. Yaron K. (talk) 04:30, 2 August 2012 (UTC)
