Every page says "Fix those 5 problems with bad syntax". I'd rather use a good or appropriate one :-) - Skysmith 07:56, 10 Nov 2004 (UTC)
Just a note - if the sample text contains '<nowiki>' tags itself, they're not escaped (for example, see the (now deleted) second entry on 'Phylogenetic tree' on the square-brackets-018.txt page) - this should be fixed (e.g. replacing them with '&lt;nowiki&gt;') before the next run, as it renders incorrectly and may be confusing. JohnyDog 13:54, 10 Nov 2004 (UTC)
This is a bit silly. nowiki tags should be used when some markup should not render, not to help out bots. One can simply add articles with intentionally misplaced brackets to an exclusion list in future. Dysprosia 03:21, 13 Dec 2004 (UTC)
The point of the <nowiki> tag is really that anything inside should be displayed literally rather than interpreted as wiki-markup, which is a reasonable abstraction of what the project is partly intended to address. On the gripping hand, simply marking a whole article as "don't touch this" is bound to cause problems if other errors come to be introduced further down the line—I don't think any of us are fooling ourselves that no contributor is ever going to bork up the mark-up in a fixed article ever again. HTH HAND -- Phil | Talk 15:26, Dec 13, 2004 (UTC)
Because:
This line has a leading space, but links and formatting still work, so the Wiki syntax of this line still matters.
This line is in tt tags, but links and formatting still work, so the Wiki syntax of this line still matters.
And code tags are already treated as special (they get treated identically to nowiki tags) - so if people would rather use <code> instead of <nowiki>, then that's no problem - from the point of view of this project it's all the same. All the best, -- Nickj 23:25, 13 Dec 2004 (UTC)
So is leading-space markup excluded or isn't it? I ask because most cases where intentionally misplaced bracketing is used involve leading-space markup. Dysprosia 06:25, 17 Dec 2004 (UTC)
If it has a leading space, it is still being checked. You haven't told us specifically what it is that you're concerned about, but reading between the lines, is it that you're mostly concerned about source code examples? Things like:
c = x[2]; b = y[4][2];
If so, they're OK (because the brackets are balanced). It's only where the brackets aren't balanced, like:
z = x[2]]
or are split across lines :
x = y [ 2 ];
that we'll notice it. In which case, it can be surrounded by nowiki or code tags. Note also for square brackets that a complete pass was made of all articles in November (full lists made, and all problems fixed), so if the articles you're thinking about are older than November, then they've probably already been done. Basically in all types of checks (with the single exception of standard parentheses), we've now done a first pass, and for some things (e.g. redirects, double quotes) we've now done 2 passes, and for some things we've even done third passes (triple quotes, braces-tables, headings). Each successive pass gets smaller and smaller, until eventually we're only fixing the recently added stuff that was malformed (things seem to reach this stage after completing 3 passes). Hope that helps. If not, can you maybe indicate which article/articles and which part of them you're concerned about? (At the moment we're both talking in the abstract). All the best, -- Nickj 07:10, 17 Dec 2004 (UTC)
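The balance check described above can be sketched briefly (in Python rather than the project's PHP; the function name and exact tokenization are my own illustration, not the project's actual code): double brackets are treated as a distinct token from two single brackets, and each closer must pair with the matching opener on a stack.

```python
import re

def unbalanced_brackets(line):
    # Tokenize, trying "[[" and "]]" before single brackets,
    # then pair each closer with the matching opener on a stack.
    stack, unmatched = [], []
    for tok in re.findall(r"\[\[|\]\]|\[|\]", line):
        if tok in ("[[", "["):
            stack.append(tok)
        elif stack and stack[-1] == ("[[" if tok == "]]" else "["):
            stack.pop()
        else:
            unmatched.append(tok)
    # Anything still open, plus any stray closers, is reported.
    return stack + unmatched
```

On `c = x[2]; b = y[4][2];` this returns an empty list (balanced), while `z = x[2]]` leaves leftovers, matching the behaviour described above.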
[[SomeObject something:[someInstance somethingElse] method:x] method3]
Well, the current situation for the Objective-C line shown above is that it would be listed in the Square Brackets category, as having an unbalanced "[[ and ] and ]", as the two middle square brackets cancel each other out, and the double square brackets are different to two single square brackets (so they don't cancel each other out). When someone went to "fix" this, they would probably either do this:
<nowiki>[[SomeObject something:[someInstance somethingElse] method:x] method3]</nowiki>
Or possibly even this:
<nowiki>[[</nowiki>SomeObject something:[someInstance somethingElse] method:x<nowiki>]</nowiki> method3<nowiki>]</nowiki>
Err ... Which is, of course, extremely easy to read! ;-)
OK, so that's the current situation, but how about this: At the moment we don't treat <pre> tags as special. However testing them has shown that they actually are special - they're basically a leading space plus a nowiki tag, all in one easy short tag. For example:
On leading space lines, [[links]] work (i.e. we should check the wiki syntax).
Whereas in <pre> tags, [[links]] do not work (i.e. we should not check the wiki syntax).
So what if instead of this (line with leading space):
[[SomeObject something:[someInstance somethingElse] method:x] method3]
You did this (line surrounded by pre):
<pre>[[SomeObject something:[someInstance somethingElse] method:x] method3]</pre>
And I would change the software so that the handling of pre tags is improved. That way your source code example is still clean and readable, and we're eliminating false positives. Would that be an acceptable compromise? All the best, -- Nickj 22:59, 17 Dec 2004 (UTC)
It's kind of annoying to have to wrap things in pre tags, isn't it? That's why the leading-space markup is so useful (and should be used most of the time in preference to pre because it is blindingly clear in the wikitext, plus you can use markup for highlighting also). I understand you want to trap all the cases where mismatched links can arise, but realistically, when are links actually used in leading-space markup? I can't immediately see any other reasonable way out of this. Dysprosia 11:33, 18 Dec 2004 (UTC)
Perhaps it is possible to use a bit of special logic to exclude the Objective-C cases? For example, you could do something like the following
which may trap the cases for you already, and excludes the Objective-C cases and half-open interval cases automatically (of course, implementing it is another matter ;) Dysprosia 02:01, 21 Dec 2004 (UTC)
May I get some feedback on this please? Dysprosia 08:03, 31 Dec 2004 (UTC)
I've been on Christmas break, hence the slow reply. To be honest, the answer to your question is "no, I don't think pre tags are annoying". You might wish that leading space tags meant "don't apply any wiki syntax", but the fact is, they don't mean that. Pre tags do. Nowiki tags do. But leading space lines don't. Consequently, it is appropriate to check the syntax of those lines. Also, with the checking, this project (unashamedly) is applying a more stringent level of syntax checking than the Wikipedia itself does. This might seem strange, but the facts are that the vast bulk of things that get caught as errors really are wiki syntax errors. Furthermore, being stringent is also useful because nobody can truly guarantee that every future version of the Wikipedia will be feature-for-feature and bug-for-bug rendering compatible with the current version - so things that display fine at the moment with invalid syntax may display wrong in the future. I saw examples of this during the recent 1.4 upgrade on user pages - things that rendered fine in 1.3, now rendered wrong (and I suspect part of the reason I didn't see this on article pages was in part because of this project). Consequently, I think being strict about syntax, and tagging things (as unobtrusively as possible) that look like syntax errors (but are not) is the correct approach. Personally, I don't think your suggestions are the right approach, but if you'd like to implement your suggestions, you can - I'll send you a ZIP of the source code, and you can send me a patch. Just send me a quick email. -- All the best, Nickj (t) 00:09, 2 Jan 2005 (UTC)
I love the [[Wikipedia:WikiProject Wiki Syntax|Please return the favour by clicking here to fix someone else's Wiki Syntax]] innovation, which seems to work well. Mind, I'd like to see your Data Protection registration for the Thank You to Contributors list ... I have a sneaking suspicion that the names will be dragooned into later projects, having shown themselves susceptible to this sort of appeal ;) -- Tagishsimon (talk)
It would also be good to replace all those double hyphens in Wikipedia articles with proper dashes. I'm not sure where to suggest this or how to start, but this seems a good place. Shantavira 10:52, 11 Nov 2004 (UTC)
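A rough sketch of the kind of replacement being suggested (Python, with a hypothetical function name; a real bot would need many more safeguards, e.g. for URLs, tables, and signatures): replace a spaced double hyphen with an en dash, leaving HTML comment delimiters alone.

```python
import re

def fix_dashes(text):
    # Replace " -- " style double hyphens with an en dash.
    # "<!--" and "-->" are untouched, because the lookarounds
    # require whitespace on both sides of the "--".
    return re.sub(r"(?<=\s)--(?=\s)", "\u2013", text)
```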
I have come across something whilst working on the double-quotes sections. I have seen the unclosed ''This article incorporates text from the public domain 1911 Encyclopædia Britannica. I think it's better to replace that line with {{1911}} which results in This article incorporates text from the public domain 1911 Encyclopædia Britannica. and also lists the article in the 1911 Britannica category. I have taken it upon myself to add the info to each of the remaining double-quotes pages. -- Martin TB 12:05, 11 Nov 2004 (UTC)
Please be careful. I've seen many problem edits in several mathematics articles as a result of this page. For example, some editors have considered set notations like {1, 2, {3, 4}} to be wrong, replaced half-open interval notations like "[a, b)" with "[a, b]", or inserted massive "nowiki" blocks, none of which is ideal. Perhaps some serious cautionary wording, as well as examples of "false positives" to avoid, needs to be included on this project page. Paul August 19:10, Nov 11, 2004 (UTC)
I'm certain that there are missing links in here. I just can't find them; for example, Wikipedia:WikiProject Wiki Syntax/ordinary-brackets-001.txt is a valid page, that I got to by changing the URL, but it is not listed, nor is it commented out. In the last day or so, this page has been massacared (heh - sp.) and there must be *dozens* of valid links that are not mentioned. Am I just being dumb or have they really been removed for no reason? They don't seem to be complete or anything. Estel 19:18, Nov 12, 2004 (UTC)
When I edited Varda, Greece for unclosed bold text, I found an unclosed <div> block that narrowed the rest of the page, and character entities without the final semicolons, which are also found on many pages for Japanese towns. Can these be searched for? Susvolans 12:00, 15 Nov 2004 (UTC)
I've noticed a lot of changes recently closing italicization marks ('') that I use for endline comments in pseudocode. I actually left them unclosed on purpose, letting the end-of-line close them — should I avoid doing this? Deco 20:36, 15 Nov 2004 (UTC)
Would it be possible to make a fresh search before the existing to-do list is finished, to keep people busy? Susvolans 10:54, 17 Nov 2004 (UTC)
I think your algorithm is finding false positives in those situations where a user chooses to itemize arguments or details. For example, at Bee learning and communication, someone wrote "The primary lines of evidence used by the odor plume advocates are 1) clinical experiments with odorless sugar sources which show that worker bees are unable to recruit to those sources and 2) logical difficulties of a small-scale dance..." Wrp103 followed your instructions and changed those to "(1)..." and "(2)...". This may be a small point, but it is stylistically incorrect. Surrounding the number on both sides indicates a footnote, not a segmented argument.
I imagine that this is a rare problem. In many situations, the segmented arguments can be displayed as either a bulleted or numbered list. However, there are some articles where that layout just does not make sense. Please do not arbitrarily close the parentheses unless it really is a grammatical mistake. Thanks. Rossami (talk) 12:18, 19 Nov 2004 (UTC)
It appears that the string ]]) is causing the software to pick up an unclosed "(". For instance, the string and John (who was a [[pilot]]) gained fame will complain of an unclosed opening parenthesis. grendel| khan 07:14, 2004 Nov 20 (UTC)
Will the next run search the Category: and Template: namespaces? Susvolans 13:17, 22 Nov 2004 (UTC)
The third batch has commenced, using the database dump from yesterday. Currently there are only redirect problems listed, namely:
These two categories are new to this run. Lists for the other categories are being generated now, and will be added once they're finished - if all goes smoothly, it should take around 34 hours for this to happen. All the best, Nickj 06:06, 28 Nov 2004 (UTC)
Folks, the remainder of the third batch has now been added. By the way, if you're wondering whether we're having an effect, the answer is an emphatic "Yes!". Consider the number of problems found in this batch:
That gives a total of 3924 entries. The second run found 15000 entries in the same categories. 3924 / 15000 is roughly 26% - i.e. we have eliminated 74% of these problems! Pretty damn impressive! All the best, -- Nickj 05:43, 1 Dec 2004 (UTC)
In response to Nick's call: I will produce a more extensive list of html errors with my Wikipedia to TomeRaider conversion script, with some documentation, and think about the feasibility of a stripped down version of the script for validation purposes only.
It would be nice if a tool and documentation set could easily be applied to other Wikipedias as well. Compare the bot for finding interwiki links, which runs on many Wikipedias now.
Is there a method/procedure to flag warnings as 'false positives', so that they do not reappear in consecutive runs? Erik Zachte 10:27, 1 Dec 2004 (UTC)
Redirects to titles with URL escapes should be detected and fixed, for these redirects fail to work properly. See [[Cimarr%E3o]] for an example: It should redirect to [[Chimarr%E3o]], but doesn't. [[User:Poccil| Peter O. ( Talk, automation script)]] 20:49, Dec 1, 2004 (UTC)
Possible wrinkle: some editors use an empty <div /> to add an "id" anchor for an internal link. If you don't want to put the work in replacing these with more orthodox footnotes, leave them in place and make a note for next time around.
Yes, these tags could be "fixed" by closing the DIV tag in such a manner, but there are two objections:
-- Phil | Talk 16:28, Dec 17, 2004 (UTC)
I don't currently know anything about Wiki footnotes, so I'm not qualified to comment on them. As for XHTML (and I could definitely be wrong here): aren't div tags supposed to be closed in XHTML? This bit of the spec says that for non-empty elements, end tags are required, and the XHTML DTD defines div as a block-level element (like paragraphs, tables, etc., all of which also have to be closed). So doesn't that mean divs must be closed? I could easily be wrong though, and please correct me if I am. The reason I'm trying to clarify this is that if they really are valid when unclosed, then as Phil says it's a bad idea to keep detecting them as malformed syntax. All the best, -- Nickj 21:43, 17 Dec 2004 (UTC)
<div id="xxx" /> is both the opening and closing tag, because it is a shorthand for <div id="xxx"></div>. – ABCD 17:14, 18 Dec 2004 (UTC)

<p/> and (more appositely) <br clear="all"/> are legitimate. HTH HAND -- Phil | Talk 09:23, Dec 20, 2004 (UTC)

<p /> and <br clear="all" />. See [1]. Erik Zachte 12:19, 20 Dec 2004 (UTC)

How about a project to replace non ISO-8859-1 characters with their correct equivalents? For example, € becomes €. These invalid characters are bad because they tend to get replaced by ?, automatically by the browser when someone edits. -- Dbenbenn 08:25, 19 Dec 2004 (UTC)
The square bracket pages often included the section name as part of the URL, so that when you clicked on the link, it would position you to the section with the problem. Since starting on the parens section, I noticed that this doesn't do that. It was a great help, and if you could add that to this section, it would make life easier. ;^) wrp103 (Bill Pringle) - Talk 05:12, 22 Dec 2004 (UTC)
After doing 120+ pages of bracket fixing, I hereby declare the term "parenthesis" to be a new form of mental illness... -- Plek 03:14, 12 Jan 2005 (UTC)
The last of the parentheses is slain! Huzzah! Free drinks for everybody (rings bell)! -- Plek 23:09, 12 Jan 2005 (UTC)
Nice job folks, next <small> run should attest to your efforts. — Davenbelle 00:37, Jan 13, 2005 (UTC)
I'll have an exclamation pint. 68.88.234.52 21:53, 22 Jan 2005 (UTC)
I'll have a small Single malt Scotch, although I only did a little. Henry Troup 00:01, 2 Feb 2005 (UTC)
What queries were used to make this? r3m0t 18:37, 12 Feb 2005 (UTC)
It's written in PHP currently. IMHO, PHP is fast enough. It does take a while to run (around 60 hours), but it's doing 3 different things at once in that time, to every "proper" (namespace = 0) article in the Wikipedia, namely:
The slowest of these is suggesting wiki links, since it involves checking whether every word (and some word combinations) in every article has a matching article or redirect of the same name. Given this, I don't think 60 hours is unreasonable, and I'm not sure that rewriting it in another language would make it significantly faster (I could definitely be wrong though!). -- All the best, Nickj (t) 22:11, 21 Feb 2005 (UTC)
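To illustrate why that pass is slow, here is a toy Python sketch (the function name and the title set are invented for illustration; this is not the actual implementation): every word, and every adjacent word pair, triggers a lookup against the set of known titles.

```python
def suggest_links(text, titles):
    # Look up each word, and each two-word combination,
    # against the set of known article titles.
    words = text.split()
    found = set()
    for i, w in enumerate(words):
        if w.capitalize() in titles:
            found.add(w.capitalize())
        if i + 1 < len(words):
            pair = f"{w} {words[i + 1]}".title()
            if pair in titles:
                found.add(pair)
    return found
```

Even this toy version does on the order of two lookups per word, for every word of every article, which is where the hours go.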
I've picked out the benchmarks most obviously involved in string manipulation. Well, I guess I'll reimplement it, for my own entertainment. So the (opening) tokens are: " ( { [ [[ '' ''' {| " <math> <tt> &" and their closing tokens are " ) } ] ]] '' ''' |} " </math> </tt> ;" correct? r3m0t 07:35, Feb 23, 2005 (UTC)
Those are some quite big speed differences! And if you're willing to implement a syntax checker, that's great because the more the merrier as far as I'm concerned ;-) With the wiki tokens, there are some multi-line tokens, and some single line ones. I've copied and pasted the code I'm using below, and tried to remove any stuff that's irrelevant to the area of syntax checking:
<?php
// Purpose: Wiki Syntax functions
// License: GNU Public License (v2 or later)
// Author: Nickj
// -------- format handling ----------------
/*
** @desc: handles the stack for the formatting
*/
function formatHandler($string, &$formatStack, $reset = false) {
static $in_nowiki, $in_comment, $in_math, $in_code;
if (!isset($in_nowiki) || $reset) {
$in_nowiki = false;
$in_comment = false;
$in_math = false;
$in_code = false;
}
// don't bother processing an empty string.
$string = trim($string);
if ($string == "") return;
$pattern = "%(''')|('')|" // Wiki quotes
. "(\[\[)|(\[)|(]])|(])|" // Wiki square brackets
. "(\{\|)|(\|\}\})|(\|\})|" // Wiki table open & Close + infobox close.
. "(\{\{)|(\}\})|" // Transclude open and close
. "(<!--)|(-->)|" // Comment open and close
. "(====)|(===)|(==)|" // Wiki headings
. "(<math>)|(</math>)|" // Math tags
. "(<nowiki>)|(</nowiki>)|" // Nowiki tags
. "(<code>)|(</code>)|" // Code tags
. "(<div)|(</div>)%i"; // div tags
$matches = preg_split ($pattern, strtolower($string), -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
foreach ($matches as $format) {
if ($format == "<nowiki>") {
if ($in_nowiki == false) addRemoveFromStack($format, $format, false, $formatStack, $string);
$in_nowiki = true;
}
else if ($format == "</nowiki>") {
if ($in_nowiki == true) addRemoveFromStack($format, "<nowiki>", false, $formatStack, $string);
$in_nowiki = false;
}
else if ($format == "<math>") {
if ($in_math == false) addRemoveFromStack($format, $format, false, $formatStack, $string);
$in_math = true;
}
else if ($format == "</math>") {
if ($in_math == true) addRemoveFromStack($format, "<math>", false, $formatStack, $string);
$in_math = false;
}
else if ($format == "<!--") {
if ($in_comment == false) addRemoveFromStack($format, $format, false, $formatStack, $string);
$in_comment = true;
}
else if ($format == "-->") {
if ($in_comment == true) addRemoveFromStack($format, "<!--", false, $formatStack, $string);
$in_comment = false;
}
else if ($format == "<code>") {
if ($in_code == false) addRemoveFromStack($format, $format, false, $formatStack, $string);
$in_code = true;
}
else if ($format == "</code>") {
if ($in_code == true) addRemoveFromStack($format, "<code>", false, $formatStack, $string);
$in_code = false;
}
else if (!$in_math && !$in_nowiki && !$in_comment && !$in_code) {
if ($format == "'''") {
addRemoveFromStack($format, $format, true, $formatStack, $string);
}
else if ($format == "''") {
addRemoveFromStack($format, $format, true, $formatStack, $string);
}
else if ($format == "[[") {
addRemoveFromStack($format, $format, false, $formatStack, $string);
}
else if ($format == "[") {
addRemoveFromStack($format, $format, false, $formatStack, $string);
}
else if ($format == "]]") {
addRemoveFromStack($format, "[[", false, $formatStack, $string);
}
else if ($format == "]") {
addRemoveFromStack($format, "[", false, $formatStack, $string);
}
else if ($format == "{|") {
addRemoveFromStack($format, $format, false, $formatStack, $string);
}
else if ($format == "|}") {
addRemoveFromStack($format, "{|", false, $formatStack, $string);
}
else if ($format == "====") {
addRemoveFromStack($format, $format, true, $formatStack, $string);
}
else if ($format == "===") {
addRemoveFromStack($format, $format, true, $formatStack, $string);
}
else if ($format == "==") {
addRemoveFromStack($format, $format, true, $formatStack, $string);
}
else if ($format == "{{") {
addRemoveFromStack($format, $format, false, $formatStack, $string);
}
else if ($format == "}}") {
addRemoveFromStack($format, "{{", false, $formatStack, $string);
}
else if ($format == "|}}") {
addRemoveFromStack($format, "{{", false, $formatStack, $string);
}
else if ($format == "<div") {
addRemoveFromStack("<div>", "<div>", false, $formatStack, $string);
}
else if ($format == "</div>") {
addRemoveFromStack($format, "<div>", false, $formatStack, $string);
}
}
}
}
/*
** @desc: Given a type of formatting, this adds it to, or removes it from, the stack (as appropriate).
*/
function addRemoveFromStack($format, $start_format, $same_start_and_end, &$stack, $string) {
// if it is there, remove it from the stack, as long as it is not the start of the format
if (isset($stack[$start_format]) && ($same_start_and_end || $format != $start_format)) {
array_pop($stack[$start_format]);
if (empty($stack[$start_format])) unset($stack[$start_format]);
}
// otherwise, add it, and the string responsible for it.
else {
$stack[$format][] = $string;
}
}
/*
** @desc: returns whether a format is a multi-line or a single line format.
*/
function is_single_line_format($format) {
if ($format == "'''" || $format == "''" ||
$format == "[[" || $format == "]]" ||
$format == "[" || $format == "]" ||
$format == "====" || $format == "===" || $format == "==" ||
$format == "(" || $format == ")" ) {
return true;
}
return false;
}
/*
** @desc: takes a wiki string, and removes the newlines, &'s, >'s, and <'s.
*/
function neuterWikiString($string) {
// remove newline chars, and escape '<' and '>' and '&' (note that & needs to come first)
return str_replace( array ("\n", "&", "<", ">"), array(" ", "&amp;", "&lt;", "&gt;"), $string);
}
/*
** @desc: checks the formatting of a line, and logs any errors found.
*/
function checkLineFormatting($page_title, $full_line, &$formatting_stack) {
// the temp array for storing the section heading parsing output
$section_array = array();
// If this is a section heading, then store this.
if (preg_match("/^={2,4}([^=]+)={2,4}$/", trim($full_line), $section_array)) {
$section = trim($section_array[1]);
$heading_line = true;
}
// if we are still formatting
if (!empty($formatting_stack)) {
// don't report any heading problems if we're not in a heading line.
if (!$heading_line) {
if (isset($formatting_stack["=="])) unset($formatting_stack["=="]);
if (isset($formatting_stack["==="])) unset($formatting_stack["==="]);
if (isset($formatting_stack["===="])) unset($formatting_stack["===="]);
}
$format_string = "";
// for each misplaced bit of formatting
foreach (array_keys($formatting_stack) as $format) {
// only consider single-line formatting at this point
if (is_single_line_format($format)) {
// save this format string.
if ($format_string != "") {
$format_string .= " and ";
}
$format_string .= "$format";
// remove it from the stack
unset($formatting_stack[$format]);
}
}
// if there were any formatting problems, then save those now.
if ($format_string != "") {
// save the formatting problem to the DB.
dbSaveMalformedPage(addslashes($page_title), addslashes($format_string), addslashes(neuterWikiString($full_line)), addslashes($section));
}
}
}
// --------------------------------------------------------
/*
Then the usage is like this:
// for each article in the wikipedia, set $page_title
$formatting_stack = array();
// reset the static vars in the format handler
formatHandler("", $formatting_stack, true);
// for each $line in the article text of the $page_title article
formatHandler($line, $formatting_stack);
checkLineFormatting($page_title, $line, $formatting_stack);
// end for
// then save any full-page formatting problems.
foreach (array_keys($formatting_stack) as $format) {
dbSaveMalformedPage(addslashes($page_title), addslashes($format), "", "");
}
// end for
*/
?>
Here's everything that I'm currently aware of that's wrong in the above code, or potentially missing from it:
Hope that helps! -- All the best, Nickj (t) 23:17, 24 Feb 2005 (UTC)
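For anyone reimplementing this in another language (as r3m0t proposes above), the core stack handling of addRemoveFromStack translates fairly directly; a rough Python equivalent (my own sketch, not tested against the real tool):

```python
def add_remove_from_stack(fmt, start_fmt, same_start_and_end, stack, line):
    # If an opener is already on the stack, this token closes it
    # (unless it is itself the opener of an asymmetric pair);
    # otherwise record it, with the line that produced it.
    if start_fmt in stack and (same_start_and_end or fmt != start_fmt):
        stack[start_fmt].pop()
        if not stack[start_fmt]:
            del stack[start_fmt]
    else:
        stack.setdefault(fmt, []).append(line)
```

Whatever remains in the dictionary at the end of an article is the set of unclosed (or stray) tokens, together with the lines that produced them.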
can you please add a summary of the change instead of just saying "fixed wiki syntax"? Say "[[test] --> [[test]] Fix wikilink syntax blah blah blah". Then we don't have to go to each article and search for every small thing you changed to find bad "fixes". - Omegatron 04:55, Mar 12, 2005 (UTC)
Can you please be more specific about what we did that was bad? For example, is there a particular error that we're misdetecting? If so, please let me know. Please realise that we're not perfect, but we're honestly not trying to introduce problems.
With the current batch, there are just two types of errors listed at the moment, namely:
Which one of these was wrong? They're both fairly straightforward transformations, and hopefully neither should introduce new errors (but for example in the double-redirect case, if it was wrong for A to redirect B, and we then change A to redirect to C, then the source of the error was that A redirected to B, not that we changed A to redirect to C). -- All the best, Nickj (t) 07:42, 12 Mar 2005 (UTC)
I've hit a wall on Wikipedia:WikiProject Wiki Syntax/square-brackets-001.txt, regarding pages containing several multi-line image descriptions, such as Apollodotus I, Apollodotus II and Apollophanes. I doubt that squashing these descriptions into a single line would be an acceptable solution. Would it therefore be alright to consider these pages "fixed", and rip them out accordingly? Fbriere 20:03, 23 Mar 2005 (UTC)
This seems to be valid syntax for opening and closing a comment, i.e.
<!-->comment<!-->
and should probably be ignored, as they're valid and a complete waste of time to "fix". -- Jshadias 23:07, 23 Mar 2005 (UTC)
The following appears on several articles:
Would it be possible to create a bot to take care of these? (Though I notice Google only shows 20 such articles. If this is accurate, I guess manual work would still be cheaper...)
This is a very cool project, but...
It's commonly accepted in software development that it's a lot cheaper to fix a bug before the product is released than to release it and then have to go back and fix problems. It seems we would do well to do this sort of syntax checking right on the edit page (make it part of the "Show preview" function) instead of finding them in batch mode later. -- RoySmith 01:02, 26 Mar 2005 (UTC)
The {{msg:}} syntax for templates is deprecated as of 1.5 where {{msg:foo}} will simply transclude Template:Msg:foo instead of Template:Foo, here's a list of pages from the 2005-03-09 dump that still use the syntax:
The one left on Wikipedia:WikiProject Wiki Syntax/div-tags-000.txt is Main Page/French, and it looks too hard to do by hand. So if anybody has a good HTML fixing program to use on that, go ahead. Or I suppose we could just forget about that page, since it seems to be dead (last edit was Dec 14, 2004). -- Kenyon 04:15, May 16, 2005 (UTC)
Is this at all necessary? Moving the links of the entries to a completely different table and also striking them out. I could see just keeping them in the one table and striking them out, or maybe moving them to a separate table, but not both. I'd like to join the two tables and keep the strike-outs. Anyone have an opinion on the matter? – Quoth 09:59, 20 May 2005 (UTC)
Hello,
I've generated another list of double redirects as of the 20050516 database dump at User:triddle/double_redirect/20050516. I did not want to edit the project page since I'm not a member of this project. Perhaps someone else knows the best way to try to integrate my list with this project? Thanks. Triddle 21:18, Jun 23, 2005 (UTC)
Hello,
I've been getting pretty good at analyzing the dump files with perl and getting useful stuff done. I am curious if I could help work with this project? How are you preparing your lists? If you are having problems beating really hard on SQL databases then I might be able to help by having it done through analysis of the dump files. Let me know if you think I can help. Triddle 06:42, Jun 26, 2005 (UTC)
I have some AlMac observations about possible similar interests between this project and the usability project. AlMac
For example, for the usability project, I suggested that there might be value in adding to the Tool box.
I just edited this page; please run some standard software to identify common typing errors that I could fix right now. AlMac 4 July 2005 18:56 (UTC)
What was the script used to generate the vast lists of edit links for this project? It's needed desperately at WikiProject Disambiguation. -- Smack ( talk) 00:03, 24 July 2005 (UTC)
Every page says that "Fix those 5 problems with bad syntax". I'd rather use good or appropriate one :-) - Skysmith 07:56, 10 Nov 2004 (UTC)
Just note - if the sample text contains '<nowiki>' tags itself, they're not escaped (for example see (now deleted) second entry on 'Phylogenetic tree' on square-brackets-018.txt page) - this should be fixed (eg. replacing them with '<nowiki>') before next run, as it renders incorrectly and may be confusing. JohnyDog 13:54, 10 Nov 2004 (UTC)
This is a bit silly. nowiki tags should be used for when some markup should not render, not to help out bots. One can simply add articles with intentionally misplaced brackets to an exclusion lists, in future. Dysprosia 03:21, 13 Dec 2004 (UTC)
<nowiki>
tag is really that anything inside should be displayed literally rather than interpreted as wiki-markup, which is a reasonable abstraction of what the project is partly intended to address. On the gripping hand, simply marking a whole article as "don't touch this" is bound to cause problems if other errors come to be introduced further down the line—I don't think any of us are fooling ourselves that no contributor is ever going to bork up the mark-up in a fixed article ever again. HTH HAND --
Phil |
Talk 15:26, Dec 13, 2004 (UTC)
Because:
This line has a leading space, but links and formatting still work, so the Wiki syntax of this line still matters.
This line is in tt tags, but links and formatting still work, so the Wiki syntax of this line still matters.
And code tags are already treated as special (they get treated identically to nowiki tags) - so if people would rather use <code> instead of <nowiki>, then that's no problem - from the point of view of this project it's all the same. All the best, -- Nickj 23:25, 13 Dec 2004 (UTC)
So leading-space markup is excluded or isn't it? I say this because most cases where intentionally misplaced bracketing is used, is used with leading-space markup. Dysprosia 06:25, 17 Dec 2004 (UTC)
If it has a leading space, it is still being checked. You've haven't told us specifically what it is that you're concerned about, but reading between the lines, is it that you're mostly concerned about source code examples? Things like:
c = x[2]; b = y[4][2];
If so, they're OK (because the brackets are balanced). It's only where the brackets aren't balanced, like:
z = x[2]]
or are split across lines:
x = y [ 2 ];
that we'll notice it. In which case, it can be surrounded by nowiki or code tags. Note also for square brackets that a complete pass was made of all articles in November (full lists made, and all problems fixed), so if the articles you're thinking about are older than November, then they've probably already been done. Basically in all types of checks (with the single exception of standard parentheses), we've now done a first pass, and for some things (e.g. redirects, double quotes) we've now done 2 passes, and for some things we've even done third passes (triple quotes, braces-tables, headings). Each successive pass gets smaller and smaller, until eventually we're only fixing the recently added stuff that was malformed (things seem to reach this stage after completing 3 passes). Hope that helps. If not, can you maybe indicate which article/articles and which part of them you're concerned about? (At the moment we're both talking in the abstract). All the best, -- Nickj 07:10, 17 Dec 2004 (UTC)
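The balanced-vs-unbalanced distinction Nickj describes can be sketched with a simple counter. This is a Python illustration only (the project's actual checker is the PHP code further down this page, and it handles many more token types):

```python
def brackets_balanced(line: str) -> bool:
    """Return True if '[' and ']' pair up left-to-right on this line."""
    depth = 0
    for ch in line:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:  # a ']' arrived with no matching '['
                return False
    return depth == 0

# The examples from the discussion above:
print(brackets_balanced("c = x[2]; b = y[4][2];"))  # balanced -> True
print(brackets_balanced("z = x[2]]"))               # stray ']' -> False
```

Note the early return on a negative depth: `][ ` has equal counts but is still flagged, which matches the left-to-right reading a wiki parser does.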
[[SomeObject something:[someInstance somethingElse] method:x] method3]
Well, the current situation for the Objective-C line shown above is that it would be listed in the Square Brackets category, as having an unbalanced "[[ and ] and ]", as the two middle square brackets cancel each other out, and the double square brackets are different to two single square brackets (so they don't cancel each other out). When someone went to "fix" this, they would probably either do this:
<nowiki>[[SomeObject something:[someInstance somethingElse] method:x] method3]</nowiki>
Or possibly even this:
<nowiki>[[</nowiki>SomeObject something:[someInstance somethingElse] method:x<nowiki>]</nowiki> method3<nowiki>]</nowiki>
Err ... Which is, of course, extremely easy to read! ;-)
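The cancellation behaviour described above follows from greedy tokenisation: "[[" is matched before "[", so it can only be cancelled by "]]", never by two single "]". A hedged Python sketch of that matching (function name invented; the real checker is the PHP code below):

```python
import re

def unmatched_tokens(line: str) -> list:
    """Stack-match '[[', ']]', '[' and ']', longest token first."""
    pairs = {"]]": "[[", "]": "["}
    stack, unmatched = [], []
    # The alternation tries '[[' and ']]' before the single-char tokens.
    for tok in re.findall(r"\[\[|\]\]|\[|\]", line):
        if tok in ("[[", "["):
            stack.append(tok)
        elif stack and stack[-1] == pairs[tok]:
            stack.pop()
        else:
            unmatched.append(tok)
    return stack + unmatched

objc = "[[SomeObject something:[someInstance somethingElse] method:x] method3]"
print(unmatched_tokens(objc))  # -> ['[[', ']', ']']
```

The output is exactly the unbalanced "[[ and ] and ]" reported for the Objective-C line: the two middle brackets cancel, and the leading "[[" is left holding the bag.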
OK, so that's the current situation, but how about this: At the moment we don't treat <pre> tags as special. However testing them has shown that they actually are special - they're basically a leading space plus a nowiki tag, all in one easy short tag. For example:
On leading space lines, links work (i.e. we should check the wiki syntax).
Whereas in <pre> tags, [[links]] do not work (i.e. we should not check the wiki syntax).
So what if instead of this (line with leading space):
[[SomeObject something:[someInstance somethingElse] method:x] method3]
You did this (line surrounded by pre):
<pre>[[SomeObject something:[someInstance somethingElse] method:x] method3]</pre>
And I changed the software so that handling of pre tags was improved. That way your source code example is still clean and readable, and we're eliminating false positives. Would that be an acceptable compromise? All the best, -- Nickj 22:59, 17 Dec 2004 (UTC)
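If <pre> regions really are treated as "leading space plus nowiki", the simplest improvement is to blank them out before scanning a page. A minimal Python sketch of that idea (an assumption about how the fix might work, not the project's actual code):

```python
import re

def strip_pre_blocks(wikitext: str) -> str:
    """Remove the contents of <pre>...</pre> so syntax checks skip them."""
    # (?is): case-insensitive, and '.' matches newlines for multi-line blocks.
    return re.sub(r"(?is)<pre>.*?</pre>", "", wikitext)

page = "Intro text\n<pre>[[SomeObject something:[x y] method:x] method3]</pre>\nMore text"
print(strip_pre_blocks(page))  # the unbalanced brackets are gone before checking
```

The non-greedy `.*?` matters: with a greedy match, two separate pre blocks on one page would be merged and the text between them would wrongly be skipped too.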
It's kind of annoying to have to wrap things in pre tags, isn't it? That's why the leading-space markup is so useful (and should be used most of the time in preference to pre because it is blindingly clear in the wikitext, plus you can use markup for highlighting also). I understand you want to trap all the cases where mismatched links can arise, but realistically, when are links actually used in leading-space markup? I can't immediately see any other reasonable way out of this. Dysprosia 11:33, 18 Dec 2004 (UTC)
Perhaps it is possible to use a bit of special logic to exclude the Objective-C cases? For example, you could do something like the following
which may trap the cases for you already, and excludes the Objective-C cases and half-open interval cases automatically (of course, implementing it is another matter ;) Dysprosia 02:01, 21 Dec 2004 (UTC)
May I get some feedback on this please? Dysprosia 08:03, 31 Dec 2004 (UTC)
I've been on Christmas break, hence the slow reply. To be honest, the answer to your question is "no, I don't think pre tags are annoying". You might wish that leading space tags meant "don't apply any wiki syntax", but the fact is, they don't mean that. Pre tags do. Nowiki tags do. But leading space lines don't. Consequently, it is appropriate to check the syntax of those lines. Also, with the checking, this project (unashamedly) is applying a more stringent level of syntax checking than the Wikipedia itself does. This might seem strange, but the facts are that the vast bulk of things that get caught as errors really are wiki syntax errors. Furthermore, being stringent is also useful because nobody can truly guarantee that every future version of the Wikipedia will be feature-for-feature and bug-for-bug rendering compatible with the current version - so things that display fine at the moment with invalid syntax may display wrong in the future. I saw examples of this during the recent 1.4 upgrade on user pages - things that rendered fine in 1.3, now rendered wrong (and I suspect part of the reason I didn't see this on article pages was in part because of this project). Consequently, I think being strict about syntax, and tagging things (as unobtrusively as possible) that look like syntax errors (but are not) is the correct approach. Personally, I don't think your suggestions are the right approach, but if you'd like to implement your suggestions, you can - I'll send you a ZIP of the source code, and you can send me a patch. Just send me a quick email. -- All the best, Nickj (t) 00:09, 2 Jan 2005 (UTC)
I love the [[Wikipedia:WikiProject Wiki Syntax|Please return the favour by clicking here to fix someone else's Wiki Syntax]] innovation, which seems to work well. Mind, I'd like to see your Data Protection registration for the Thank You to Contributors list ... I have a sneaking suspicion that the names will be dragooned into later projects, having shown themselves susceptible to this sort of appeal ;) -- Tagishsimon (talk)
It would also be good to replace all those double hyphens in the Wikipedia articles with proper dashes. I'm not sure where to suggest this or how to start, but this seems a good place. Shantavira 10:52, 11 Nov 2004 (UTC)
I have come across something whilst working on the double-quotes sections. I have seen the unclosed ''This article incorporates text from the public domain 1911 Encyclopædia Britannica. I think it's better to replace that line with {{1911}} which results in This article incorporates text from the public domain 1911 Encyclopædia Britannica. and also lists the article in the 1911 Britannica category. I have taken it upon myself to add the info to each of the remaining double-quotes pages. -- Martin TB 12:05, 11 Nov 2004 (UTC)
Please be careful. I've seen many problem edits in several mathematics articles as a result of this page. For example some editors considering set notations like: {1, 2, {3, 4}} to be wrong, or replacing half-open interval notations like "[a, b)" with"[a, b]" or massive "nowiki" block insertions, which are not ideal. Perhaps some serious cautionary wording as well as example of "false positives" to avoid needs to be included on this project page. Paul August 19:10, Nov 11, 2004 (UTC)
I'm certain that there are missing links in here. I just can't find them, for example, Wikipedia:WikiProject Wiki Syntax/ordinary-brackets-001.txt is a valid page, that I got to by changing the URL, but it is not listed, nor is it commented out. In the last... day or so, this page has been massacared (heh - sp.) and there must be *dozens* of valid links that are not mentioned. Am I just being dumb or have they really been removed for no reason? They don't seem to be complete or anything. Estel 19:18, Nov 12, 2004 (UTC)
When I edited Varda, Greece for unclosed bold text, I found an unclosed <div> block that narrowed the rest of the page, and character entities without the final semicolons, which are also found on many pages for Japanese towns. Can these be searched for? Susvolans 12:00, 15 Nov 2004 (UTC)
I've noticed a lot of changes recently closing italicization marks ('') that I use for endline comments in pseudocode. I actually left them unclosed on purpose, letting the end-of-line close them — should I avoid doing this? Deco 20:36, 15 Nov 2004 (UTC)
Would it be possible to make a fresh search before the existing to-do list is finished, to keep people busy? Susvolans 10:54, 17 Nov 2004 (UTC)
I think your algorithm is finding false positives in those situations where a user chooses to itemize arguments or details. For example, at Bee learning and communication, someone wrote "The primary lines of evidence used by the odor plume advocates are 1) clinical experiments with odorless sugar sources which show that worker bees are unable to recruit to those sources and 2) logical difficulties of a small-scale dance..." Wrp103 followed your instructions and changed those to "(1)..." and "(2)...". This may be a small point, but it is stylistically incorrect. Surrounding the number on both sides indicates a footnote, not a segmented argument.
I imagine that this is a rare problem. In many situations, the segmented arguments can be displayed as either a bulleted or numbered list. However, there are some articles where that layout just does not make sense. Please do not arbitrarily close the parentheses unless it really is a grammatical mistake. Thanks. Rossami (talk) 12:18, 19 Nov 2004 (UTC)
It appears that the string ]]) is causing the software to pick up an unclosed "(". For instance, the string and John (who was a [[pilot]]) gained fame will complain of an unclosed opening parenthesis. grendel| khan 07:14, 2004 Nov 20 (UTC)
Will the next run search the Category: and Template: namespaces? Susvolans 13:17, 22 Nov 2004 (UTC)
The third batch has commenced, using the database dump from yesterday. Currently there are only redirect problems listed, namely:
These two categories are new to this run. Lists for the other categories are being generated now, and will be added once they're finished - if all goes smoothly, it should take around 34 hours for this to happen. All the best, Nickj 06:06, 28 Nov 2004 (UTC)
Folks, the remainder of the third batch has now been added. By the way, if you're wondering whether we're having an effect, the answer is an emphatic "Yes!". Consider the number of problems found in this batch:
That gives a total of 3924 entries. The second run found 15000 entries in the same categories. 3924 / 15000 is equal to 26% - I.e. we have eliminated 74% of these problems! Pretty damn impressive! All the best, -- Nickj 05:43, 1 Dec 2004 (UTC)
In response to Nick's call: I will produce a more extensive list of html errors with my Wikipedia to TomeRaider conversion script, with some documentation, and think about the feasibility of a stripped down version of the script for validation purposes only.
It would be nice if a tool and docu set could easily be applied to other Wikipedias as well. Compare the bot for finding interwiki links, which runs on many Wikipedias now.
Is there a method/procedure to flag warnings as 'false positives', so that they do not reappear in consecutive runs? Erik Zachte 10:27, 1 Dec 2004 (UTC)
Redirects to titles with URL escapes should be detected and fixed, for these redirects fail to work properly. See [[Cimarr%E3o]] for an example: It should redirect to [[Chimarr%E3o]], but doesn't. [[User:Poccil| Peter O. ( Talk, automation script)]] 20:49, Dec 1, 2004 (UTC)
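Detecting such redirects mechanically is straightforward: look for percent-escapes in the redirect target and decode them. A Python sketch (the function name is invented; `%E3` in the example above is the Latin-1 byte for "ã", which fits the encoding en.wikipedia used at the time):

```python
import re
import urllib.parse

def fix_escaped_target(target: str) -> str:
    """Decode %XX escapes in a redirect target, assuming Latin-1 bytes."""
    if re.search(r"%[0-9A-Fa-f]{2}", target):
        return urllib.parse.unquote(target, encoding="latin-1")
    return target

print(fix_escaped_target("Chimarr%E3o"))   # -> 'Chimarrão'
print(fix_escaped_target("Plain title"))   # unchanged
```

Titles containing a literal percent sign followed by two hex digits would be false positives here, but those are vanishingly rare in article titles.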
Possible wrinkle: some editors use an empty <div /> to add an "id" anchor for an internal link. If you don't want to put the work in replacing these with more orthodox footnotes, leave them in place and make a note for next time around.
Yes, these tags could be "fixed" by closing the DIV tag in such a manner, but there are two objections:
-- Phil | Talk 16:28, Dec 17, 2004 (UTC)
I don't currently know anything about Wiki footnotes, so I'm not qualified to comment on them. On XHTML (and I could definitely be wrong here): aren't div tags supposed to be closed in XHTML? This bit of the spec says that for non-empty elements, end tags are required, and in the XHTML DTD it defines div tags as a block-level element (like paragraphs, tables, etc, all of which also have to be closed). So doesn't that mean divs must be closed? I could easily be wrong though, and please correct me if I am. The reason I'm trying to clarify this is that if they really are valid when unclosed, then as Phil says it's a bad idea to keep detecting them as malformed syntax. All the best, -- Nickj 21:43, 17 Dec 2004 (UTC)
<div id="xxx" /> is both the opening and closing tag, because it is a shorthand for <div id="xxx"></div>. – AB CD 17:14, 18 Dec 2004 (UTC)
<p/> and (more appositely) <br clear="all"/> are legitimate. HTH HAND -- Phil | Talk 09:23, Dec 20, 2004 (UTC)
<p /> and <br clear="all" />. See [1]. Erik Zachte 12:19, 20 Dec 2004 (UTC)
How about a project to replace non ISO-8859-1 characters with their correct equivalents? For example, € becomes &euro;. These invalid characters are bad because they tend to get replaced by ?, automatically by the browser when someone edits. -- Dbenbenn 08:25, 19 Dec 2004 (UTC)
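Dbenbenn's suggestion amounts to replacing every character outside ISO-8859-1 with a character reference, so the wikitext survives a round trip through a Latin-1 edit box. A hedged Python sketch (numeric references are used here for simplicity; a real pass might prefer named entities like &euro; where they exist):

```python
def escape_non_latin1(text: str) -> str:
    """Replace characters outside ISO-8859-1 with &#NNNN; references."""
    return "".join(
        ch if ord(ch) < 256 else "&#%d;" % ord(ch)
        for ch in text
    )

print(escape_non_latin1("price: €5"))  # -> 'price: &#8364;5'
print(escape_non_latin1("café"))       # é is U+00E9, inside Latin-1: unchanged
```

Characters in the 0–255 range pass through untouched, which keeps ordinary accented European text readable in the wikitext.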
The square bracket pages often included the section name as part of the URL, so that when you clicked on the link, it would position you to the section with the problem. Since starting on the parens section, I noticed that this doesn't do that. It was a great help, and if you could add that to this section, it would make life easier. ;^) wrp103 (Bill Pringle) - Talk 05:12, 22 Dec 2004 (UTC)
After doing 120+ pages of bracket fixing, I hereby declare the term "parenthesis" to be a new form of mental illness... -- Plek 03:14, 12 Jan 2005 (UTC)
The last of the parentheses is slain! Huzzah! Free drinks for everybody (rings bell)! -- Plek 23:09, 12 Jan 2005 (UTC)
Nice job folks, next run should attest to your efforts. — Davenbelle 00:37, Jan 13, 2005 (UTC)
I'll have an exclamation pint. 68.88.234.52 21:53, 22 Jan 2005 (UTC)
I'll have a small Single malt Scotch, although I only did a little. Henry Troup 00:01, 2 Feb 2005 (UTC)
What queries were used to make this? r3m0t 18:37, 12 Feb 2005 (UTC)
It's written in PHP currently. IMHO, PHP is fast enough. It does take a while to run (around 60 hours), but it's doing 3 different things at once in that time, to every "proper" (namespace = 0) article in the Wikipedia, namely:
The slowest of these is the suggesting wiki links, since it involves checking whether every word (and some word combinations) in every article has a matching article or redirect of the same name. Given this, I don't think 60 hours is unreasonable, and I'm not sure that rewriting it in another language would make it significantly faster (I could definitely be wrong though!). -- All the best, Nickj (t) 22:11, 21 Feb 2005 (UTC)
I've picked out the benchmarks most obviously involved in string manipulation. Well, I guess I'll reimplement it, for my own entertainment. So the (opening) tokens are: " ( { [ [[ '' ''' {| " <math> <tt> &" and their closing tokens are " ) } ] ]] '' ''' |} " </math> </tt> ;" correct? r3m0t 07:35, Feb 23, 2005 (UTC)
Those are some quite big speed differences! And if you're willing to implement a syntax checker, that's great because the more the merrier as far as I'm concerned ;-) With the wiki tokens, there are some multi-line tokens, and some single line ones. I've copied and pasted the code I'm using below, and tried to remove any stuff that's irrelevant to the area of syntax checking:
<?php
// Purpose: Wiki Syntax functions
// License: GNU Public License (v2 or later)
// Author: Nickj
// -------- format handling ----------------
/*
** @desc: handles the stack for the formatting
*/
function formatHandler($string, &$formatStack, $reset = false) {
static $in_nowiki, $in_comment, $in_math, $in_code;
if (!isset($in_nowiki) || $reset) {
$in_nowiki = false;
$in_comment = false;
$in_math = false;
$in_code = false;
}
// don't bother processing an empty string.
$string = trim($string);
if ($string == "") return;
$pattern = "%(''')|('')|" // Wiki quotes
. "(\[\[)|(\[)|(]])|(])|" // Wiki square brackets
. "(\{\|)|(\|\}\})|(\|\})|" // Wiki table open & Close + infobox close.
. "(\{\{)|(\}\})|" // Transclude open and close
. "(<!--)|(-->)|" // Comment open and close
. "(====)|(===)|(==)|" // Wiki headings
. "(<math>)|(</math>)|" // Math tags
. "(<nowiki>)|(</nowiki>)|" // Nowiki tags
. "(<code>)|(</code>)|" // Code tags
. "(<div)|(</div>)%i"; // div tags
$matches = preg_split ($pattern, strtolower($string), -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
foreach ($matches as $format) {
if ($format == "<nowiki>") {
if ($in_nowiki == false) addRemoveFromStack($format, $format, false, $formatStack, $string);
$in_nowiki = true;
}
else if ($format == "</nowiki>") {
if ($in_nowiki == true) addRemoveFromStack($format, "<nowiki>", false, $formatStack, $string);
$in_nowiki = false;
}
else if ($format == "<math>") {
if ($in_math == false) addRemoveFromStack($format, $format, false, $formatStack, $string);
$in_math = true;
}
else if ($format == "</math>") {
if ($in_math == true) addRemoveFromStack($format, "<math>", false, $formatStack, $string);
$in_math = false;
}
else if ($format == "<!--") {
if ($in_comment == false) addRemoveFromStack($format, $format, false, $formatStack, $string);
$in_comment = true;
}
else if ($format == "-->") {
if ($in_comment == true) addRemoveFromStack($format, "<!--", false, $formatStack, $string);
$in_comment = false;
}
else if ($format == "<code>") {
if ($in_code == false) addRemoveFromStack($format, $format, false, $formatStack, $string);
$in_code = true;
}
else if ($format == "</code>") {
if ($in_code == true) addRemoveFromStack($format, "<code>", false, $formatStack, $string);
$in_code = false;
}
else if (!$in_math && !$in_nowiki && !$in_comment && !$in_code) {
if ($format == "'''") {
addRemoveFromStack($format, $format, true, $formatStack, $string);
}
else if ($format == "''") {
addRemoveFromStack($format, $format, true, $formatStack, $string);
}
else if ($format == "[[") {
addRemoveFromStack($format, $format, false, $formatStack, $string);
}
else if ($format == "[") {
addRemoveFromStack($format, $format, false, $formatStack, $string);
}
else if ($format == "]]") {
addRemoveFromStack($format, "[[", false, $formatStack, $string);
}
else if ($format == "]") {
addRemoveFromStack($format, "[", false, $formatStack, $string);
}
else if ($format == "{|") {
addRemoveFromStack($format, $format, false, $formatStack, $string);
}
else if ($format == "|}") {
addRemoveFromStack($format, "{|", false, $formatStack, $string);
}
else if ($format == "====") {
addRemoveFromStack($format, $format, true, $formatStack, $string);
}
else if ($format == "===") {
addRemoveFromStack($format, $format, true, $formatStack, $string);
}
else if ($format == "==") {
addRemoveFromStack($format, $format, true, $formatStack, $string);
}
else if ($format == "{{") {
addRemoveFromStack($format, $format, false, $formatStack, $string);
}
else if ($format == "}}") {
addRemoveFromStack($format, "{{", false, $formatStack, $string);
}
else if ($format == "|}}") {
addRemoveFromStack($format, "{{", false, $formatStack, $string);
}
else if ($format == "<div") {
addRemoveFromStack("<div>", "<div>", false, $formatStack, $string);
}
else if ($format == "</div>") {
addRemoveFromStack($format, "<div>", false, $formatStack, $string);
}
}
}
}
/*
** @desc: Given a type of formatting, this adds it to, or removes it from, the stack (as appropriate).
*/
function addRemoveFromStack($format, $start_format, $same_start_and_end, &$stack, $string) {
// if it is there, remove it from the stack, as long as it is not the start of a format
if (isset($stack[$start_format]) && ($same_start_and_end || $format != $start_format)) {
array_pop($stack[$start_format]);
if (empty($stack[$start_format])) unset($stack[$start_format]);
}
// otherwise, add it, and the string responsible for it.
else {
$stack[$format][] = $string;
}
}
/*
** @desc: returns whether a format is a multi-line or a single line format.
*/
function is_single_line_format($format) {
if ($format == "'''" || $format == "''" ||
$format == "[[" || $format == "]]" ||
$format == "[" || $format == "]" ||
$format == "====" || $format == "===" || $format == "==" ||
$format == "(" || $format == ")" ) {
return true;
}
return false;
}
/*
** @desc: takes a wiki string, and removes the newlines, &'s, >'s, and <'s.
*/
function neuterWikiString($string) {
// remove newline chars, and escape '<' and '>' and '&' (note that & needs to come first)
return str_replace( array ("\n", "&", "<", ">"), array(" ", "&amp;", "&lt;", "&gt;"), $string);
}
/*
** @desc: checks the formatting of a line, and logs an errors found.
*/
function checkLineFormatting($page_title, $full_line, &$formatting_stack) {
// the temp array for storing the section heading parsing output
$section_array = array();
// avoid undefined variables when this is not a heading line
$section = "";
$heading_line = false;
// If this is a section heading, then store this.
if (preg_match("/^={2,4}([^=]+)={2,4}$/", trim($full_line), $section_array)) {
$section = trim($section_array[1]);
$heading_line = true;
}
// if we are still formatting
if (!empty($formatting_stack)) {
// don't report any heading problems if we're not in a heading line.
if (!$heading_line) {
if (isset($formatting_stack["=="])) unset($formatting_stack["=="]);
if (isset($formatting_stack["==="])) unset($formatting_stack["==="]);
if (isset($formatting_stack["===="])) unset($formatting_stack["===="]);
}
$format_string = "";
// for each misplaced bit of formatting
foreach (array_keys($formatting_stack) as $format) {
// only consider single-line formatting at this point
if (is_single_line_format($format)) {
// save this format string.
if ($format_string != "") {
$format_string .= " and ";
}
$format_string .= "$format";
// remove it from the stack
unset($formatting_stack[$format]);
}
}
// if there were any formatting problems, then save those now.
if ($format_string != "") {
// save the formatting problem to the DB.
dbSaveMalformedPage(addslashes($page_title), addslashes($format_string), addslashes(neuterWikiString($full_line)), addslashes($section));
}
}
}
// --------------------------------------------------------
/*
Then the usage is like this:
// for each article in the wikipedia, set $page_title
$formatting_stack = array();
// reset the static vars in the format handler
formatHandler("", $formatting_stack, true);
// for each $line in the article text of the $page_title article
formatHandler($line, $formatting_stack);
checkLineFormatting($page_title, $line, $formatting_stack);
// end for
// then save any full-page formatting problems.
foreach (array_keys($formatting_stack) as $format) {
dbSaveMalformedPage(addslashes($page_title), addslashes($format), "", "");
}
// end for
*/
?>
Here's everything that I'm currently aware of that's wrong in the above code, or potentially missing from it:
Hope that helps! -- All the best, Nickj (t) 23:17, 24 Feb 2005 (UTC)
Can you please add a summary of the change instead of just saying "fixed wiki syntax"? Say "[[test] --> [[test]] Fix wikilink syntax blah blah blah". Then we don't have to go to each article to search every small thing you changed to find bad "fixes". - Omegatron 04:55, Mar 12, 2005 (UTC)
Can you please be more specific about what we did that was bad? For example, is there a particular error that we're misdetecting? If so, please let me know. Please realise that we're not perfect, but we're honestly not trying to introduce problems.
With the current batch, there's just two types of errors listed at the moment, namely:
Which one of these was wrong? They're both fairly straightforward transformations, and hopefully neither should introduce new errors (but for example in the double-redirect case, if it was wrong for A to redirect B, and we then change A to redirect to C, then the source of the error was that A redirected to B, not that we changed A to redirect to C). -- All the best, Nickj (t) 07:42, 12 Mar 2005 (UTC)
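The double-redirect fix described here (A redirects to B, B redirects to C, so A is repointed at C) can be sketched as chain-following over a redirect map. A Python illustration (names invented; the real tool works from the database dump):

```python
def resolve(redirects: dict) -> dict:
    """Point every redirect at the end of its chain, stopping on loops."""
    fixed = {}
    for start in redirects:
        seen, target = {start}, redirects[start]
        # Follow the chain until we reach a real page or revisit a title.
        while target in redirects and target not in seen:
            seen.add(target)
            target = redirects[target]
        fixed[start] = target
    return fixed

print(resolve({"A": "B", "B": "C"}))  # -> {'A': 'C', 'B': 'C'}
```

The `seen` set is the important part: a redirect loop (A to B, B to A) would otherwise spin forever, and loops are exactly the kind of breakage a batch like this encounters.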
I've hit a wall on Wikipedia:WikiProject Wiki Syntax/square-brackets-001.txt, regarding pages containing several multi-line image descriptions, such as Apollodotus I, Apollodotus II and Apollophanes. I doubt that squashing these descriptions into a single line would be an acceptable solution. Would it therefore be alright to consider these pages "fixed", and rip them out accordingly? Fbriere 20:03, 23 Mar 2005 (UTC)
This seems to be valid syntax for opening and closing a comment, i.e.
<!-->comment<!-->
and should probably be ignored, as they're valid and a complete waste of time to "fix". -- Jshadias 23:07, 23 Mar 2005 (UTC)
The following appears on several articles:
Would it be possible to create a bot to take care of these? (Though I notice Google only shows 20 such articles. If this is accurate, I guess manual work would still be cheaper...)
This is a very cool project, but...
It's commonly accepted in software development that it's a lot cheaper to fix a bug before the product is released than to release it and then have to go back and fix problems. It seems we would do well to do this sort of syntax checking right on the edit page (make it part of the "Show preview" function) instead of finding them in batch mode later. -- RoySmith 01:02, 26 Mar 2005 (UTC)
The {{msg:}} syntax for templates is deprecated as of 1.5 where {{msg:foo}} will simply transclude Template:Msg:foo instead of Template:Foo, here's a list of pages from the 2005-03-09 dump that still use the syntax:
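Finding and rewriting the remaining uses is a simple textual scan, since under 1.5 {{msg:foo}} should just become {{foo}}. A hedged Python sketch (the regex is my own approximation and deliberately refuses to cross braces, so nested templates are left alone):

```python
import re

# Match {{msg:NAME}} or {{msg:NAME|params}}, without crossing braces.
MSG_RE = re.compile(r"\{\{msg:([^{}|]+)([^{}]*)\}\}", re.IGNORECASE)

def fix_msg_syntax(wikitext: str) -> str:
    """Rewrite deprecated {{msg:foo}} transclusions as {{foo}}."""
    return MSG_RE.sub(r"{{\1\2}}", wikitext)

print(fix_msg_syntax("Intro {{msg:stub}} text"))  # -> 'Intro {{stub}} text'
print(fix_msg_syntax("{{msg:foo|bar}}"))          # parameters are preserved
```

Anything this regex cannot match confidently (nested braces, odd whitespace) simply passes through unchanged, which is the safe failure mode for a bot edit.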
The one left on Wikipedia:WikiProject Wiki Syntax/div-tags-000.txt is Main Page/French, and it looks too hard to do by hand. So if anybody has a good HTML fixing program to use on that, go ahead. Or I suppose we could just forget about that page, since it seems to be dead (last edit was Dec 14, 2004). -- Kenyon 04:15, May 16, 2005 (UTC)
Is this at all necessary? Moving the links of the entries to a completely different table and also striking them out. I could see just keeping them in the one table and striking them out, or maybe moving them to a separate table, but not both. I'd like to join the two tables and keep the strike-outs. Anyone have an opinion on the matter? – Quoth 09:59, 20 May 2005 (UTC)
Hello,
I've generated another list of double redirects as of the 20050516 database dump at User:triddle/double_redirect/20050516. I did not want to edit the project page since I'm not a member of this project. Perhaps someone else knows the best way to try to integrate my list with this project? Thanks. Triddle 21:18, Jun 23, 2005 (UTC)
Hello,
I've been getting pretty good at analyzing the dump files with perl and getting useful stuff done. I am curious if I could help work with this project? How are you preparing your lists? If you are having problems beating really hard on SQL databases then I might be able to help by having it done through analysis of the dump files. Let me know if you think I can help. Triddle 06:42, Jun 26, 2005 (UTC)
I have some AlMac observations about possible similar interests between this project and the usability project. AlMac
For example, for the usability project, I suggested that there might be value in adding to the Tool box.
I just edited this page; please run some standard software to identify common typing errors that I could fix right now. AlMac 4 July 2005 18:56 (UTC)
What was the script used to generate the vast lists of edit links for this project? It's needed desperately at WikiProject Disambiguation. -- Smack ( talk) 00:03, 24 July 2005 (UTC)