Operator: Blevintron ( talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 16:21, Monday March 26, 2012 (UTC)
Automatic, Supervised, or Manual: Supervised during trial period.
Programming language(s): Ruby
Source code available: the source code is open source
Function overview: Mark broken links in articles; send user talk messages to Wikipedians requesting help repairing those links.
Links to relevant discussions (where appropriate): old discussion at the village pump idea lab. new discussion at VP.
Edit period(s): continuous, with configurable limits (max edits/day, max edit rate, etc)
Estimated number of pages affected: 10 articles/day during the trial period; at most 3 × (number of articles) user-talk messages. All limits are configurable.
Exclusion compliant (Y/N): Yes
Already has a bot flag (Y/N):
The purpose of the bot is to combat link rot. At a high level, the bot performs these tasks:
These three case studies succinctly demonstrate the bot's main actions: Case Study: 'Johnny Unitas Stadium', Case Study: 'Mohammed Ali Hammadi', and Case Study: 'Sean Kennard'.
Articles are selected randomly. URLs are extracted from those articles. URLs are excluded if (i) they are already marked dead, e.g. via {{Broken link}}, or (ii) they have an archiveurl= alternative URL specified.
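The exclusion rules above can be sketched in Ruby (the bot's language), working on one citation template at a time. This is a minimal illustration, not the bot's code: it assumes the wikitext has already been split into a template string plus the text that follows it, and the regular expressions and the name `candidate_url` are hypothetical. Real wikitext (nested templates, bare links, comments) needs a proper parser.

```ruby
# Hypothetical sketch: should the URL in this citation be queued for checking?
# A dead-link tag placed right after the citation means it is already handled.
DEAD_TAG = /\A\s*\{\{\s*(?:dead link|broken link)\b/i

# citation:  text of one citation template, e.g. "{{cite web |url=... }}"
# following: wikitext that comes immediately after that template
def candidate_url(citation, following)
  url = citation[/\|\s*url\s*=\s*([^|}\s]+)/i, 1]
  return nil if url.nil?                                      # no |url= at all
  return nil if citation =~ /\|\s*archiveurl\s*=\s*[^|}\s]/i  # (ii) archive already given
  return nil if following =~ DEAD_TAG                         # (i) already marked dead
  url
end
```

An empty `|archiveurl=` does not count as an alternative, so such a link is still checked.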
Random selection helps to address articles from the long tail which might otherwise be neglected.
Links are checked at least 3 times over a period of 5 days, from a good network position (a university in the USA). Repeated tests help to avoid false positives.
After the link trial period, the bot adds {{Broken link}} to any broken link that is present in the latest revision of the article. 'Broken links' are those which consistently demonstrated (i) a timeout error, (ii) a DNS error, (iii) an HTTP 404 error, or (iv) an HTTP 5xx error during their trial period.
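A minimal Ruby sketch of the "consistently broken" test. The check-record shape, the constant names, and the reading of "over a period of 5 days" as "first and last check at least five days apart" are assumptions for illustration; any HTTP status other than 404 or 5xx (for example 403) is treated as not broken.

```ruby
# Hypothetical sketch of the "consistently broken" rule.
# Each check is a Hash: { time: Time, kind: :ok/:timeout/:dns/:http, status: Integer or nil }.
MIN_CHECKS = 3
MIN_SPAN_SECONDS = 5 * 24 * 60 * 60 # assumption: first and last check >= 5 days apart

def broken_check?(check)
  case check[:kind]
  when :timeout, :dns
    true # (i) timeout or (ii) DNS failure
  when :http
    status = check[:status].to_i
    status == 404 || (500..599).cover?(status) # (iii) 404 or (iv) any 5xx
  else
    false # :ok or any unrecognised kind
  end
end

def consistently_broken?(checks)
  return false if checks.size < MIN_CHECKS
  times = checks.map { |c| c[:time] }
  return false if times.max - times.min < MIN_SPAN_SECONDS
  checks.all? { |c| broken_check?(c) } # a single success clears the link
end
```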
UPDATE TO ADD: If a suitable replacement link is found in the archives, then the bot will automatically update citation templates ({{Citation}}, {{Cite web}}, etc.) with |archiveurl= and |archivedate=. By 'suitable', I mean that the archive was captured within +/- 6 months of the date reported in the |accessdate= parameter or, if that is absent, the date of the first article revision that includes the URL. The bot will not send user talk messages in such cases (see the discussion with User:Hellknowz below). Blevintron (talk) 02:35, 28 March 2012 (UTC)
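One way to express the "suitable archive" window in Ruby, assuming dates are already parsed into `Date` objects. `Date#<<` and `Date#>>` shift by calendar months, so the window is inclusive at exactly six months on either side. Preferring the snapshot closest to the reference date is a hypothetical tie-break, not something stated in the proposal.

```ruby
require "date"

WINDOW_MONTHS = 6 # "+/- 6 months", per the update above

# Reference date: |accessdate= if present, else the first revision that includes the URL.
def archive_window(access_date, first_revision_date)
  ref = access_date || first_revision_date
  return nil if ref.nil?
  (ref << WINDOW_MONTHS)..(ref >> WINDOW_MONTHS)
end

# Hypothetical tie-break: among snapshots in the window, take the one closest to the reference date.
def pick_snapshot(capture_dates, access_date, first_revision_date)
  window = archive_window(access_date, first_revision_date)
  return nil if window.nil?
  ref = access_date || first_revision_date
  capture_dates.select { |d| window.cover?(d) }.min_by { |d| (d - ref).abs }
end
```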
There are strict limits on (i) the rate of article edits, (ii) the number of articles edited in one calendar day, (iii) the number of links to correct during a single edit (to make reviews simpler), and (iv) the minimum timeout before the bot will re-edit the same article (to avoid edit wars with humans or other bots). The bot respects {{Bots}} exclusions and the {{In use}} template.
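The four limits can be kept in one small object. This Ruby sketch is hypothetical: the class and parameter names are invented, and "calendar day" is assumed to mean a UTC day. Time is passed in explicitly so the behaviour is easy to test.

```ruby
# Hypothetical sketch of the limits listed above; all numbers are configuration.
class EditLimiter
  def initialize(max_edits_per_day:, min_seconds_between_edits:, reedit_cooldown_seconds:, max_links_per_edit:)
    @max_per_day = max_edits_per_day
    @min_gap = min_seconds_between_edits
    @cooldown = reedit_cooldown_seconds
    @max_links = max_links_per_edit
    @day = nil
    @edits_today = 0
    @last_edit = nil
    @last_edit_by_article = {}
  end

  # (i) edit rate, (ii) edits per calendar day (UTC), (iv) re-edit cooldown per article
  def allowed?(article, now)
    roll_day(now)
    return false if @edits_today >= @max_per_day
    return false if @last_edit && now - @last_edit < @min_gap
    last = @last_edit_by_article[article]
    !(last && now - last < @cooldown)
  end

  # (iii) fix only a few links per edit so each diff stays easy to review
  def links_for_edit(links)
    links.first(@max_links)
  end

  def record(article, now)
    roll_day(now)
    @edits_today += 1
    @last_edit = now
    @last_edit_by_article[article] = now
  end

  private

  # Reset the daily counter when the UTC date changes.
  def roll_day(now)
    utc = now.getutc
    day = [utc.year, utc.yday]
    return if day == @day
    @day = day
    @edits_today = 0
  end
end
```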
Every time it marks a broken link, the bot scans article history to find the user who first added that link to the article. It sends a polite user talk message asking that user to help correct the broken link. If available, that message includes a possible match from the archives. The bot respects {{Bots}} exclusions on user pages and user_talk pages, and advertises this opt-out feature at the end of every user_talk message. The bot puts a strict limit on the number of messages it will send to one user in a calendar day. The bot will not send these to IP users or to accounts marked as a bot.
Examples of these communications can be found in these case studies: Case Study: 'Johnny Unitas Stadium', Case Study: 'Mohammed Ali Hammadi', and Case Study: 'Sean Kennard'.
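A hedged Ruby sketch of the notification rules: find the first revision whose text contains the URL, then skip IP editors, bot accounts, opted-out users, and anyone who has reached the daily cap. The revision format, the `NotificationPolicy` name, and treating `{{bots}}` opt-outs as a precomputed set are assumptions. A plain substring match also treats a link that was removed and re-added as first added at its earliest appearance, which may not be what the real bot does.

```ruby
require "ipaddr"

# revisions: oldest-first Array of { user: String, text: String } (hypothetical format).
def first_adder(revisions, url)
  rev = revisions.find { |r| r[:text].include?(url) }
  rev && rev[:user]
end

def ip_user?(name)
  IPAddr.new(name)
  true
rescue ArgumentError # IPAddr::InvalidAddressError is a subclass of ArgumentError
  false
end

class NotificationPolicy
  def initialize(max_per_user_per_day:, bot_accounts:, opted_out:)
    @max = max_per_user_per_day
    @bots = bot_accounts
    @opted_out = opted_out # users whose pages carry a {{bots}} exclusion
    @sent = Hash.new(0)    # [user, day] => messages sent that day
  end

  # Returns the user to notify, or nil if nobody should be messaged.
  def recipient(revisions, url, day)
    user = first_adder(revisions, url)
    return nil if user.nil? || ip_user?(user) || @bots.include?(user) || @opted_out.include?(user)
    return nil if @sent[[user, day]] >= @max
    user
  end

  def record(user, day)
    @sent[[user, day]] += 1
  end
end
```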
To demonstrate its effectiveness, the bot will review its edits after 96 hours. It will measure (i) whether the links have been corrected, (ii) whether its edits have been reverted, (iii) whether it has been blocked from the article or from user talk pages, and (iv) the total participation on that article. These statistics are publicly tabulated and discussed.
If these statistics suggest the bot is not helpful, or is a burden to the community, I will withdraw this bot approval request at the end of the trial period.
The latest source code is uploaded to the bot's user space at most once per day.
UPDATE: Per the suggestion by User:Hellknowz in the discussion section below, I withdraw this task. I will instead find another place to host the bot source code. Blevintron (talk) 18:59, 26 March 2012 (UTC)
I would suggest you limit the number of source code posting edits the bot is making in the userspace. Wikipedia is really not a source code repository. Technically, even if you make notes about some other license for the code, anything you post on the pages is still under CC-BY-SA 3.0 and GFDL licenses. — HELLKNOWZ ▎ TALK 16:53, 26 March 2012 (UTC)
I would also note that the bot's message of "I'm just a bot, so I don't really know how to fix the problem" is not really true. There are at least several bots approved ([1] [2] [3] ([4])) to retrieve archived copies from Internet Archive and WebCite. Placing a {{dead link}} instead of |archiveurl= or {{Wayback}} implies the bot has failed to retrieve the archive. — HELLKNOWZ ▎ TALK 17:00, 26 March 2012 (UTC)
|bot= parameter of who tagged it. — HELLKNOWZ ▎ TALK 17:31, 26 March 2012 (UTC)
|bot= parameter of the {{dead link}} tag. Second, the bot does check archive.org, and adds the replacement link to the user message if it's found. If I understand correctly, you are saying that the bot should automatically update the article with the archive copy. I can do that, but how good a match must the archive copy be? Same day? Within a week? I am reluctant to update the article if the replacement was archived one month before or after the link's access date, since the archived content may differ substantially. What does the community consider close enough? Blevintron (talk) 18:54, 26 March 2012 (UTC)
Per User:Hellknowz's suggestion, I have updated the proposal so that the bot will automatically update citation templates when an archive copy can be found within +/- N months of the access date. I do this in the interest of compatibility with other bots: my bot should not mark links as dead if another bot would repair them. It may take a few days to implement and test this change. Blevintron (talk) 02:35, 28 March 2012 (UTC)
I have a feeling you may be underestimating the number of dead links. I don't have exact data (and I should really get the bot running), but it appears at least 1 in 30 links is dead. When I ran the bot in late 2010/early 2011, it tagged over 100k articles with dead links within 3 months (these are the ones it couldn't fix automatically). That only covered part of our pages, mostly worked on citations, and didn't process missing access dates or bare links. Optimistically, we can expect at least that many more tagged. So how many notifications would that make, given that 100k pages means 100k notifications? And how many would the same users be getting? — HELLKNOWZ ▎ TALK 12:36, 28 March 2012 (UTC)
I think we could have a little technical, proof-of-concept trial soon. There are a few notes I'd like to list meanwhile:
|archivedate=. While respecting the citation's date format and field whitespace formatting is not required, it's nice to have. — HELLKNOWZ ▎ TALK 19:14, 29 March 2012 (UTC)
I've started a new thread at VP. Also, I added that link to the 'relevant discussions' field in the application above. Blevintron (talk) 15:09, 1 April 2012 (UTC)
Here's a fun and real case to consider about meta redirects. (1) http://www4.ncdc.noaa.gov/cgi-win/wwcgi.dll?wwevent~ShowEvent~494533 redirects to (2) http://www.ncdc.noaa.gov/oa/about/stormdown.html via a 0-second meta refresh. Now, (1) is dead (404). However, (2) is live (200) and shows a warning about maintenance. So if the bot doesn't follow the redirect and believes the first 404, it will falsely tag all these links. — HELLKNOWZ ▎ TALK 10:09, 2 April 2012 (UTC)
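A minimal Ruby sketch of following a meta refresh before trusting a status code, using the NOAA case above as the test. The regular expressions are simplified assumptions (real pages vary), and `fetch` is a hypothetical injected callable so the sketch needs no network. A refresh loop leaves the link undecided rather than tagging it.

```ruby
require "uri"

META_REFRESH_TAG = /<meta\s[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*>/i

# Returns the absolute target of a <meta http-equiv="refresh"> tag, or nil.
def meta_refresh_target(base_url, html)
  tag = html[META_REFRESH_TAG]
  return nil unless tag
  m = tag.match(/content\s*=\s*(?:"([^"]*)"|'([^']*)')/i)
  return nil unless m
  target = (m[1] || m[2])[/url\s*=\s*['"]?([^'"\s]+)/i, 1]
  target && URI.join(base_url, target).to_s # resolve relative targets
end

# fetch: any callable url -> [status, body].
# Follows meta refreshes (up to max_hops) before trusting the final status code.
def effective_status(url, fetch, max_hops: 5)
  max_hops.times do
    status, body = fetch.call(url)
    target = meta_refresh_target(url, body.to_s)
    return status if target.nil? || target == url
    url = target
  end
  nil # refresh loop or too many hops: undecided
end
```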
I also wanted to clarify your comments about {{Use dmy dates}} vs {{Use mdy dates}}: did you mean that the presence of these templates controls how the bot parses dates? Or does it only control how the bot emits dates into the document? Blevintron (talk) 14:14, 2 April 2012 (UTC)
|accessdate= and |archivedate= should/can use shorter formats. Third says bots should use whatever the article already uses. Personally, I follow these templates. If none are present, I use the format from |accessdate=. So far, I haven't had any problems reported. — HELLKNOWZ ▎ TALK 14:24, 2 April 2012 (UTC)
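A Ruby sketch of the precedence described above: `{{Use dmy dates}}` or `{{Use mdy dates}}` first, then the style of the existing `|accessdate=`. The three recognised styles and the ISO fallback when nothing matches are assumptions made for illustration.

```ruby
require "date"

DATE_FORMATS = {
  dmy: "%-d %B %Y",  # 2 April 2012
  mdy: "%B %-d, %Y", # April 2, 2012
  iso: "%Y-%m-%d"    # 2012-04-02
}.freeze

# Recognise the style of an existing |accessdate= value (nil if unrecognised).
def date_style_of(text)
  case text.to_s.strip
  when /\A\d{4}-\d{2}-\d{2}\z/ then :iso
  when /\A\d{1,2} [A-Z][a-z]+ \d{4}\z/ then :dmy
  when /\A[A-Z][a-z]+ \d{1,2}, \d{4}\z/ then :mdy
  end
end

# Article-level templates win; otherwise copy |accessdate=; ISO is an assumed last resort.
def archivedate_style(wikitext, accessdate)
  return :dmy if wikitext =~ /\{\{\s*use dmy dates/i
  return :mdy if wikitext =~ /\{\{\s*use mdy dates/i
  date_style_of(accessdate) || :iso
end

def format_archivedate(date, style)
  date.strftime(DATE_FORMATS.fetch(style))
end
```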
I don't see any major problems with this task. The only potentially controversial bit was the user notification, and the linked discussions so far show no objections. All the technical details have been clarified (extensively, I might add), so Approved for trial (≈15 pages + accompanying notices). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Let's do a short technical trial, so you can get everything working and it's easier to review what the bot will do with live examples. This can also serve as case studies for the VP discussion. — HELLKNOWZ ▎ TALK 14:44, 2 April 2012 (UTC)
Trial complete. Most edits were fine, with three exceptions listed here:
In all cases, I manually confirmed that all dead links appear to be broken. No problems.
No notifications were sent. This is because:
Blevintron (talk) 19:34, 2 April 2012 (UTC)
Also, there was a bug report about date formats in Whitley Bay High School. Blevintron (talk) 19:48, 2 April 2012 (UTC)
(edit conflict) Some issues:
|deadurl=yes if the archive parameters are set, because that is the default behavior already.
|bot= field in any citation templates. You can use a comment in one of the archive fields, such as <!--Added by BlevintronBot-->.
|bot=. Blevintron (talk) 20:24, 2 April 2012 (UTC)
|bot= parameter that the bot adds. It's kind of pointless because lots of bots edit citations and do lots of stuff to them, and the field doesn't really help identify the changes. That's why using |bot= in {{wayback}} or {{dead link}} is straightforward, but why we add <!--Added by XxxxBot--> to citations. It's not actually required and it has been very marginally useful to me personally, but there also aren't any rules against it. — HELLKNOWZ ▎ TALK 20:29, 2 April 2012 (UTC)
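The convention described above, as a hypothetical Ruby sketch: the archive parameters carry an `<!--Added by ...-->` comment because citation templates have no `|bot=` field, while `{{dead link}}` takes `|bot=` directly, and no `|deadurl=yes` is written because that is already the default. The method names are invented, and appending before the closing braces assumes a well-formed, single-line template.

```ruby
BOT_NAME = "BlevintronBot"

# Appends archive parameters (tagged with a hidden comment) just before the closing "}}".
def add_archive_params(citation, archive_url, archive_date)
  raise ArgumentError, "expected a complete template" unless citation.end_with?("}}")
  extra = " |archiveurl=#{archive_url} |archivedate=#{archive_date}<!--Added by #{BOT_NAME}-->"
  citation[0...-2] + extra + "}}"
end

# {{dead link}} supports |bot= directly, so no comment is needed there.
def dead_link_tag(date_text)
  "{{dead link|date=#{date_text}|bot=#{BOT_NAME}}}"
end
```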
|bot= in their documentation. Blevintron (talk) 21:47, 2 April 2012 (UTC)
|bot= for any of the citation templates. Blevintron (talk) 20:29, 2 April 2012 (UTC)
|bot=BlevintronBot/2012-04-02. Again, no biggie, just thought I'd mention. — HELLKNOWZ ▎ TALK 20:29, 2 April 2012 (UTC)
Approved for extended trial (≈30 page edits and 5 user notifications). Please provide a link to the relevant contributions and/or diffs when the trial is complete. A larger test. The numbers are arbitrary/approximate, so don't worry about them or about going a bit over the limit. — HELLKNOWZ ▎ TALK 21:57, 2 April 2012 (UTC)
One oddity during the trial: in Sir Walter Raleigh Hotel, my bot marked a link with {{Dead link}}. It turns out that link was already marked with {{NRIS dead link}}. That adds pages to Category:All NRHP articles with dead external links, which is not a subcategory of Category:Articles with dead external links. I'm not really sure what the ideal behavior is, but it seems that adding {{Dead link}} is not redundant, since the page categories are disjoint. Blevintron (talk) 21:01, 3 April 2012 (UTC)
Trial complete.
I checked the edits; they look fine. Unfortunately, I don't see any user having repaired any links. — HELLKNOWZ ▎ TALK 13:11, 5 April 2012 (UTC)
So far, there is no indication that notifications are effective, but the sample size is also very small. Two of the notified users are very inactive (User:Kumarajiva, last edit May 2010; User:Glasstowerpress, last edit June 2010). Four are marginally inactive (User:Shudde, last edit January 2012; User:Dcmacnut, last edit February 2012; User:Dickeybird and User:Sadads, last edit March 2012). The others (User:Deinocheirus, User:Bwmoll3, User:Calistemon, User:Arsenikk) are recently active but have not acted on the notification.
There is no indication that any were bothered by the notifications (none have opted-out, no bug reports, and I've received no communication from them).
I think two things could be improved:
If BAG is willing, I would like to do a larger trial run over the weekend. Blevintron (talk) 15:38, 6 April 2012 (UTC)
A couple of editors on VP suggested pinging only users who have edited recently. Do you check for recent user activity before notifying? — HELLKNOWZ ▎ TALK 16:15, 6 April 2012 (UTC)
OK, more samples.
Approved for extended trial (40 notifications). Please provide a link to the relevant contributions and/or diffs when the trial is complete. This is in addition to whatever article edits happen. — HELLKNOWZ ▎ TALK 16:52, 6 April 2012 (UTC)
The bot has finished editing: 236 articles / 40 notifications. Preliminary results are much better than last time.
I am reviewing the edits; it might take a while.
Blevintron (talk) 16:32, 7 April 2012 (UTC)
Here are a few cases where the bot notified more than one person: (wall of text redacted) Is that intentional? — HELLKNOWZ ▎ TALK 16:39, 7 April 2012 (UTC)
Very preliminary results. Over the last 16 hours,
...Users who received notification fixed one or more dead links in six articles:
...One case is an 'almost fix':
...One possible annoyance:
...One definite false positive:
...Reported bugs:
Still a lot more edits to review... Blevintron (talk) 16:54, 7 April 2012 (UTC)
I'll look through the edits at a later time. For now, let's wait a while for feedback. Also, does the bot notify about more than one dead link added by the same person? — HELLKNOWZ ▎ TALK 16:57, 7 April 2012 (UTC)
I think you forgot to explicitly mention that "you" are a bot in the user messages. — HELLKNOWZ ▎ TALK 09:36, 8 April 2012 (UTC)
Trial complete.
I reviewed all edits. I observed these problems and fixed the articles:
Data is available for last week's edits. The highlights:
The link improvement metric shows a big difference between the three cases:
This is misleading. Most of the improvement is due to the archive URLs that the bot automatically finds and adds to the articles. By comparing archive rate and mark dead rate, you see that about 0%, 42% and 56% of links were archived by the bot in those cases, respectively. So, the improvement due to notifications is probably closer to 2%.
Conclusions: The false-positive and broken-edit rates are still too high for deployment. The experiment suggests that notifications do not annoy most users. Notifications have a small, positive effect on dead link remediation.
My initial hypothesis was that notifications would have a large effect. I have invalidated this hypothesis, and now see no benefit of this bot over other dead link bots. I withdraw this BRFA. Blevintron (talk) 14:05, 14 April 2012 (UTC)
I've fixed several editing bugs and false-positive dead links. I've tweaked the notification messages to sound less human. I think I'll be ready for another trial this weekend. Blevintron (talk) 14:57, 19 April 2012 (UTC)
Approved for extended trial (250 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. This is in addition to whatever notifications are sent. — HELLKNOWZ ▎ TALK 16:26, 19 April 2012 (UTC)
Trial complete.
The trial was largely good. There were two classes of bugs, both due to mis-parsing links in wikitext.
Statistics tell more or less the same story. There were no bug reports or complaints. One user has opted out of notifications from this bot (and several other bots).
Fixed I've corrected the affected articles, where appropriate (some of those links are broken even if correctly parsed).
Fixed I read the MediaWiki source code to figure out how Wikipedia deals with trailing parentheses and fixed my bot so it parses them the same way.
Summary: good progress, but more to do. Blevintron (talk) 15:57, 30 April 2012 (UTC)
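For bare URLs, a Ruby sketch of the trailing-punctuation rule as we understand it from MediaWiki's `Parser::makeFreeExternalLink` (treat it as an assumption and check it against the parser source for the MediaWiki version in use): trailing `,;.:!?` are left outside the link, and `)` is trimmed too, but only when the URL contains no `(`. Bracketed links `[http://... label]` follow different rules, and `split_trailing` is a hypothetical name.

```ruby
TRAILING_SEPARATORS = ",;.:!?"

# Splits a raw bare-URL match into [url, text_that_stays_after_the_link].
def split_trailing(raw)
  sep = TRAILING_SEPARATORS.dup
  sep << ")" unless raw.include?("(") # ")" is only trimmed when the URL has no "("
  n = 0
  n += 1 while n < raw.length && sep.include?(raw[-1 - n])
  [raw[0, raw.length - n], raw[raw.length - n, n]]
end
```

This keeps Wikipedia-style URLs such as `.../Foo_(bar)` intact while still dropping a closing parenthesis that merely ends the surrounding sentence.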
I've fixed those bugs, and found and fixed a few more. I've studied the bot's offline edits to catch the next bug before it happens. I've improved the edit rate and decreased per-edit bandwidth usage. Finally, I have some tools to help me review larger trials more quickly. I'm ready for another trial if you have the patience. Blevintron (talk) 00:14, 4 May 2012 (UTC)
{{BAG assistance needed}} Blevintron (talk) 15:28, 5 May 2012 (UTC)
Approved for extended trial (250 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. — HELLKNOWZ ▎ TALK 15:58, 5 May 2012 (UTC)
Trial complete.
248 edits were good, for an overall bad edit rate of 0.8% in this trial.
Bad edit 1: Jarrett Bellini:
Bad edit 2: Revival Centres International
Blevintron (talk) 16:21, 6 May 2012 (UTC)
tl;dr I'd like to withdraw this BRfA for the moment, with the intention of returning to it later.
Here's the story,
Blevintron (talk) 22:28, 30 May 2012 (UTC)
Withdrawn by operator. Per above -- Chris 03:40, 2 June 2012 (UTC)