Operator: Xaosflux ( talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 03:39, Friday, July 22, 2016 (UTC)
Automatic, Supervised, or Manual: Supervised
Programming language(s): n/a
Source code available: AWB
Function overview: Fixes HTML errors that cause pages to be listed in Category:Pages using invalid self-closed HTML tags.
Links to relevant discussions (where appropriate): VPT#New maintenance category
Edit period(s): Ad-hoc batch runs
Estimated number of pages affected: Open-ended; thousands of pages overall, with edits to hundreds per run
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): Yes
Function details:
I've been working on cleaning up Category:Pages using invalid self-closed HTML tags in advance of the upcoming code changes, mostly running from my own account. I would like to use my bot account primarily so that edits to User talk: can be made quietly using nominornewtalk. This is primarily fixing the most common HTML errors:
Comment (and Support): I have fixed a few hundred of these and have found a similar percentage of false positives in editing with an AutoEd script, no matter how well I write my regexes. I agree that a supervised run, done carefully, should work well. Would you be willing to share your proposed regexes?
Just for clarity, I would like to see this task approved for all namespaces, not just User Talk. There is a lot of work to be done in Talk, Wikipedia Talk, and Wikipedia.
Feel free to crib from my AutoEd script at User:Jonesey95/AutoEd/month.js.
Also, you might look at Wikipedia:CHECKWIKI/WPC 002 dump for examples of pathological patterns that might be seen as problems or opportunities, e.g. <div id='Myerson'/> and </blockquote/>. – Jonesey95 (talk) 05:31, 22 July 2016 (UTC)
id= in span and div? — JJMC89 (T·C) 06:10, 22 July 2016 (UTC)
{{subst:anchor|Myerson}} gives <span id="Myerson"></span>.) — JJMC89 (T·C) 14:36, 22 July 2016 (UTC)
Support: I'm glad someone is taking on the talk space portion of the error category, which makes up ~1167/2185 entries. ~623/1167 are archived talk pages, though (will touching those raise concerns? If so, can the MediaWiki software be made to ignore archives?). Non-archived User talk: pages comprise only 228/2185, so I definitely support expanding to more/all talk space. ~ Tom.Reding (talk ⋅ dgaf) 12:23, 22 July 2016 (UTC)
nominornewtalk flag that only bots have. — xaosflux Talk 16:27, 22 July 2016 (UTC)
<ref name=... /> to <ref name=...></ref> like in [1] and many other edits. The former is both allowed and recommended to invoke a reference which is defined elsewhere. See for example Wikipedia:Citing sources#Repeated citations. It's an undocumented (as far as I know) feature that an empty <ref name=...></ref> has the same effect. It will probably confuse most editors and I recommend reverting those changes. ref isn't even an HTML tag; it is defined by mw:Extension:Cite. Can you post a complete list of the self-closed tags the bot is coded to change? PrimeHunter (talk) 23:27, 23 July 2016 (UTC)
ref out of my lists, and will revert any of those. — xaosflux Talk 00:22, 24 July 2016 (UTC)
<ref name="name" /> → <ref name="name"></ref>, the edits look good. You may want to adjust (<span id=".*?" ?)\/> to (<span id="[^">]+?") ?\/> so that the regex isn't too greedy. Also, consider not adding the replacements to the edit summary; the long example PrimeHunter pointed out isn't really helpful. — JJMC89 (T·C) 01:18, 24 July 2016 (UTC)
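To illustrate the regex adjustment suggested above, here is a minimal Python sketch. It is not the AWB configuration used for the task; the names LOOSE, STRICT, and fix_self_closed_spans are hypothetical, and the replacement string is an assumption based on the <span id="id"></span> form mentioned elsewhere in this discussion. It shows the original pattern matching across two separate tags on one line, while the tightened pattern does not.

```python
import re

# Original pattern from the discussion: ".*?" may run past the closing quote
# and the end of the tag if a later tag on the same line is self-closed.
LOOSE = re.compile(r'(<span id=".*?" ?)/>')

# Suggested pattern: the id value cannot contain a quote or ">",
# so a match stays inside a single <span ... /> tag.
STRICT = re.compile(r'(<span id="[^">]+?") ?/>')


def fix_self_closed_spans(wikitext: str) -> str:
    """Rewrite <span id="x" /> as <span id="x"></span> (hypothetical helper)."""
    return STRICT.sub(r'\1></span>', wikitext)


if __name__ == "__main__":
    sample = 'Intro <span id="Myerson" /> text <ref name="name" /> end'
    # The <ref name="name" /> invocation is left alone, as requested above.
    print(fix_self_closed_spans(sample))

    tricky = '<span id="a">x</span> <div class="b" />'
    print(bool(LOOSE.search(tricky)))   # loose pattern spans both tags
    print(bool(STRICT.search(tricky)))  # strict pattern finds nothing to fix
```

Because the strict pattern only ever matches span tags, self-closed ref tags are never touched by this helper.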
<b />; it seems to be about 60%/40% a typo that should be <br /> or a pointless tag that can be deleted. — xaosflux Talk 03:19, 25 July 2016 (UTC)
<cite /> tags - used almost the same way as the bad "refs" in trial one - I didn't attempt to repair these as I'm not exactly sure what our "best practice" for this tag is. — xaosflux Talk 03:32, 25 July 2016 (UTC)
<cite ... /> should be handled manually. In HTML 4 it is used for citations, but in HTML5 it is used to indicate the title of a work. In some cases <cite id="id" /> is being used like <span id="id" /> and should be converted to <span id="id"></span>. Articles using <cite id="id" /> to indicate a reference should have the citation style converted to use <ref name="name" /> (list-defined references) or a shortened footnote style depending on the use. — JJMC89 (T·C) 05:20, 25 July 2016 (UTC)
<cite name="Foo" />, where an editor typed "cite" instead of "ref". – Jonesey95 (talk) 15:09, 26 July 2016 (UTC)
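Since the comments above recommend handling self-closed cite tags manually, one possible approach is to list them for review instead of rewriting them. The following Python sketch is only an illustration under that assumption; list_cite_candidates and the hint wording are hypothetical and not part of the bot task. The hints reflect the two uses described above (an id used like a span anchor, and a name where "ref" was probably intended).

```python
import re

# Matches self-closed <cite ... /> tags; nothing is rewritten by this sketch.
SELF_CLOSED_CITE = re.compile(r'<cite\b([^<>]*?)\s*/>')


def list_cite_candidates(wikitext):
    """Return (tag, hint) pairs for a human reviewer (hypothetical helper)."""
    results = []
    for match in SELF_CLOSED_CITE.finditer(wikitext):
        attrs = match.group(1)
        if re.search(r'\bid\s*=', attrs):
            hint = "anchor-like: may need <span id=...></span>"
        elif re.search(r'\bname\s*=', attrs):
            hint = "possible mistyped <ref name=... />"
        else:
            hint = "unclear: review manually"
        results.append((match.group(0), hint))
    return results


if __name__ == "__main__":
    sample = 'A <cite id="x" /> B <cite name="Foo" /> C <cite>Title</cite>'
    for tag, hint in list_cite_candidates(sample):
        print(tag, "->", hint)
```

The hints are only a triage aid; the final decision for each tag still rests with the editor reviewing the page.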