In the light of recent questions over the long-term reliability of Wikimedia wikis, the Signpost caught up with CT Woo, the Wikimedia Foundation's director of technical operations.
In the second of our series looking at this year's eight ongoing Google Summer of Code projects, the Signpost caught up with developer Nischay Nahata. Nischay is working on performance improvements to Semantic MediaWiki (SMW), a collection of extensions not in use on any Wikimedia Projects, but nevertheless boasting a significant list of adopters. SMW is also regarded as an influential player when it comes to deciding the course of MediaWiki's potential adoption of so-called "structured data" forms, which have recently come to prominence with the establishment of the Wikidata project. While SMW and Wikidata are distinct projects, there is an active exchange of ideas (and developers) between them. Nischay explained to the Signpost what he has been trying to accomplish, and what its broader impact might be:
“Semantic MediaWiki's continuous development has seen many changes and new data types introduced upon request. However, while these new data types were introduced successfully to the front-end, the backend still stored them in the same way, requiring complex conversions to be undertaken whenever storing or retrieving data; such a system also required more space. My work has been focussed on redesigning the database to accommodate all supported data types in the most native way. This needed lots of refactoring of code that depended on the old database architecture. While doing this I also introduced a special feature called "fixed tables" that lets a user shard (splinter) the database by using special tables for some "highly-used" properties. In addition, while waiting for code review I wrote unit tests for Semantic MediaWiki, which were useful for me and will be for future contributors.
After redesigning the database layout I have been working on improving page read/write times by doing hash-based checks before querying the database. Once that was done at the lowest level, I started to look for performance improvements in the Special Pages shipped with Semantic MediaWiki, and we planned to maintain property-usage statistics to improve these Special Pages. As an amazing by-product of this feature, I introduced support for 'diff'ing the semantic data of a page (similar to what SemanticWatchList does, but in Semantic MediaWiki's core). I am currently working on improved handling of subobjects, and plan to later look into caching queries, which promises to be just as awesome. I hope my work will make SMW more practical to use for larger datasets, so that more websites can use it.”
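To illustrate the "hash-based checks" Nischay mentions, here is a minimal Python sketch of the general idea: store a digest of a page's semantic data and skip the database write when the digest has not changed. The interview does not describe how SMW implements this, so everything here is an assumption made for illustration, including the `semantic_hash` function, the `PageStore` class and the sample property names; SMW itself is written in PHP and may work quite differently.

```python
import hashlib
import json


def semantic_hash(properties):
    """Return a stable digest of a page's property/value pairs (hypothetical helper)."""
    # Sort keys and values so that logically equal data always hashes identically.
    canonical = json.dumps(
        {name: sorted(values) for name, values in properties.items()},
        sort_keys=True,
    )
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


class PageStore:
    """Toy in-memory stand-in for the database layer (hypothetical)."""

    def __init__(self):
        self.rows = {}    # page title -> stored properties
        self.hashes = {}  # page title -> digest of the stored properties
        self.writes = 0   # number of "expensive" writes actually performed

    def save(self, title, properties):
        digest = semantic_hash(properties)
        if self.hashes.get(title) == digest:
            return False  # data unchanged: skip the database round trip
        self.rows[title] = properties
        self.hashes[title] = digest
        self.writes += 1
        return True


store = PageStore()
data = {"Has population": ["3500000"], "Located in": ["Germany"]}
store.save("Berlin", data)        # first save performs a write
store.save("Berlin", dict(data))  # identical data: the write is skipped
print(store.writes)
```

The trade-off in a design like this is one cheap digest comparison on every save in exchange for avoiding a full write whenever a page is re-saved without changes to its semantic data.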
Nischay regularly updates a blog following his latest progress.
Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for several weeks.