Operator: Green Cardamom ( talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 16:29, Sunday, March 13, 2016 ( UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): Nim and AWK
Source code available: WaybackMedic on GitHub
Function overview: Fix known problems with Internet Archive wayback machine links and page formatting errors introduced by Cyberbot IABot between December 2015 and March 2016.
Links to relevant discussions (where appropriate):
Edit period(s): one time run
Estimated number of pages affected: est. 20k pages of ~100k checked (the corpus of all articles edited by Cyberbot IABot from 20151231 to 20160310).
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): No
Function details: User:Green Cardamom/WaybackMedic lists details
Due to the large scope, this will likely require multiple trials and a community response period. Sometimes these are easier to show as demonstrations, so your first small trial is approved; please post results below when ready. -- xaosflux Talk 17:00, 13 March 2016 (UTC)
{{Dead link|bot=...}} is a thing. I dunno if that param is truly critical in the grand scheme of things, but I'd suggest supplying it with the bot's username. Also, pinging @Cyberpower678: into the loop. -- slakr\ talk / 02:31, 16 March 2016 (UTC)
GreenC bot has completed its trial run. The edits are viewable here, ending with the Moscow theater hostage crisis. -- Green C 20:35, 18 March 2016 (UTC)
Approved for extended trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. -- slakr\ talk / 02:47, 29 March 2016 (UTC)
{{BAGAssistanceNeeded}} - there were no errors with the trial. -- Green C 20:47, 10 April 2016 (UTC)
{{BAGAssistanceNeeded}} - I understand that in the 30+ days of this bot's trial, a single editor, Earwig, found two problems. Those problems are edge cases that, had the bot run to completion, would have impacted an estimated 25 of 25,000 edits, or a bot accuracy rate of 0.999, though there might be other unknown edge cases that bring it down to .99 or something. No other editor has raised concerns. Meanwhile the problems that WaybackMedic is trying to fix are becoming worse: editors attempt to fix them manually, and by doing so break things, making it impossible for WaybackMedic to actually make the fixes it is designed for (e.g. they see a link doesn't work, remove it and add {{cbignore}}, making it impossible for WaybackMedic to replace it with a working link). Each day that goes by, WM's ability to fix problems is degraded. -- Green C
May I suggest that you also convert URLs of the old WBM schemes, including http://replay.waybackmachine.org/ and http://wayback.archive.org/, to the new https://web.archive.org/. -- bender235 (talk) 00:42, 12 May 2016 (UTC)
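For illustration only, the host conversion suggested above could be sketched as follows. This is not WaybackMedic's code (the bot is written in Nim and AWK); the function name `normalize_wayback_url` and the regular expression are hypothetical, and the sketch assumes only the scheme and host need to change, leaving the rest of the URL untouched. Whether the path also needs adjusting for a given old scheme is not addressed here.

```python
import re

# Hypothetical pattern: the two legacy hosts named in the comment above,
# matched only at the start of the URL, with either http or https.
_OLD_WAYBACK_PREFIX = re.compile(
    r"^https?://(?:replay\.waybackmachine\.org|wayback\.archive\.org)/",
    re.IGNORECASE,
)


def normalize_wayback_url(url: str) -> str:
    """Rewrite a legacy Wayback host prefix to https://web.archive.org/.

    Assumption: only the scheme and host change; the remainder of the
    URL (snapshot date, original URL) is copied through unchanged.
    URLs that do not start with a legacy prefix are returned as-is.
    """
    return _OLD_WAYBACK_PREFIX.sub("https://web.archive.org/", url, count=1)


if __name__ == "__main__":
    sample = "http://wayback.archive.org/web/20100101000000/http://example.com/"
    print(normalize_wayback_url(sample))
```

Anchoring the pattern at the start of the string means a legacy host that appears only inside the archived (original) URL portion is left alone.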
http://wayback.archive.org/ but only when doing something else at the same time, like changing a snapshot date. The focus for the first iteration of the bot is to fix some known problems within a limited sub-set of articles - once it finishes I hope to make a new version that will run against all articles containing wayback links and do general formatting fixes like you suggested. -- Green C 02:22, 12 May 2016 (UTC)
Trial results are at User:Green_Cardamom/WaybackMedic/trial2.
There are still enough red X's to justify more trials; edge cases keep showing up. I'd like to run in batches of 25, which seems manageable, using the same method as above. Hopefully it won't need more than another 50-75 edits, but however long it takes. I'll log the results on sub-pages to avoid making this page too long. -- Green C 21:15, 15 April 2016 (UTC)
Trial complete. - results:
The trial articles were hand-picked to stress test the software's feature set. There was one bug in 51-75 that in production would have impacted few articles (it required two rare conditions to occur simultaneously). -- Green C 14:32, 23 April 2016 (UTC) {{BAGAssistanceNeeded}}
Trial complete. - results:
One bug. This bug was created when fixing the bug from the last trial. Both bugs are related to code dealing with alternative (non-wayback) archives, which in total account for ~100 articles out of the ~100,000 being processed. It is showing up in the trial because I am stress testing by manually picking articles that contain alternative archives, along with other rare cases intentionally chosen for the trial. Here is a suggestion for how to ease into this:
-- Green C 15:45, 12 May 2016 (UTC)
A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please deactivate this tag by replacing it with {{t|BAG assistance needed}}.
@Slakr: @The Earwig: