This week we sat down with The Earwig to learn about his wikitext parser, mwparserfromhell.
What is mwparserfromhell, and how did it get its name?
{{citation needed|reason=This fact is important.{{citation needed|date=October 2014}}}}
), whether the template is between <nowiki> tags, and so on...
What led you to develop it in the first place?
What were some of the challenges you faced or things that didn't go according to plan while developing the parser? How did you manage them?
{{{{{foo}}bar}}}
and {{{{{foo}}}bar}}
) and handling them as closely as possible to the way MediaWiki does. Sometimes this is hard, but other times it is outright impossible and we have to make guesses. For example, if we imagine that the template {{close ref}}
transcludes </ref>
and the parser encounters the wikicode <ref>{{cite web|…}}{{close ref}}
, it will appear as if the <ref>
tag does not end, even though it does. This is a limitation inherent in the nature of parsing wikicode: we have no knowledge of the contents of the template, so we can't figure out every situation. mwparserfromhell compromises as best it can, by treating the <ref>
tag as ordinary text and fully parsing the two templates.
How does mwparserfromhell compare to other re-implementations of the MediaWiki parser, like Parsoid?
What is the most significant challenge that mwparserfromhell currently faces, and why?
What's next for mwparserfromhell? Do you have any other cool projects you'd like to tell us about?
#REDIRECT
s) aren’t understood by mwparserfromhell, so I would like to implement those. There’s actually an open request to review some code for table support that I've been procrastinating on for a couple of months now. Other than that, I have some plans to make it more efficient; mwpfh has some speed issues with ambiguous syntax on large pages.
Discuss this story
First of all, it's great to see progress on making it easier to edit our content using a variety of tools.
That said, I think that it's worth looking more closely at Parsoid. It also provides a well-defined tree structure, but covers basically every aspect of wikitext. It even marks up multi-template content in a way that makes it easy to replace the entire block of templated content.
The DOM structure it provides can be edited by bots, gadgets, or external services like content translation (see the list of current users). There is no limitation to manual editing; any method of manipulating HTML will work. A combination of several algorithms is used to avoid dirty diffs (unintended changes in the wikitext).
We are very interested in improving Parsoid further for bots and other uses. Let us know about your needs. You can find us on IRC in #mediawiki-parsoid. -- GWicke (talk) 14:36, 17 October 2014 (UTC)
<span about="#mwt1" typeof="mw:Transclusion" data-parsoid='{"dsr":[0,31,null,null],"pi":[[{"k":"1","spc":["","","",""]}]]}' data-mw='{"parts":[{"template":{"target":{"wt":"foo","href":"./Template:Foo"},"params":{"1":{"wt":"{{bar|{{baz|abc=123}}}}"}},"i":0}}]}'></span>
, and I'm not sure how I could, say, use this to read the value of the "abc" parameter in {{baz}}. Would I need to use Parsoid again on the value of that "wt" key, or am I missing something? Part of mwpfh's usefulness for bots is that the trees it generates have methods for common wikicode manipulation – there are simple functions for adding template parameters and the like, modifying and traversing the tree, etc. As far as I know, Parsoid is focused solely on the parsing aspect and doesn't support this kind of stuff directly, but it raises the question of whether it could be useful as an alternate backend for mwpfh. Would be annoying to have to deal with outsourcing queries from Python to a node.js subprocess, but it could be an interesting experiment. — Earwig (talk) 17:41, 17 October 2014 (UTC)
The <ref>foo{{close ref}} thing… it does not work as described here (as I expected, because I know how the parser works). You will be better off using mw:API:Expandtemplates with the generatexml option instead. (I would avoid Parsoid too; it has similar warts.) — Keφr 21:33, 20 October 2014 (UTC)