Operator: The Earwig ( Talk | Contributions)
Automatic or Manually Assisted: Automatic, unsupervised (although other users must confirm each copyvio before the request is denied.)
Programming Language(s): Python, Pywikipedia
Function Overview: Checks recent AfC submissions for copyright violations.
Edit period(s): Continuous (run from my own computer, so it won't operate when it's off, although my computer is on most of the time.)
Already has a bot flag (Y/N): N
Function Details: This bot handles a task that the resident copyvio bot, CorenSearchBot, does not cover. It is a copyright violation-checking bot similar to CorenSearchBot, which is currently the only running copyvio bot. However, instead of checking Special:NewPages for copyright violations, it checks Category:Pending Afc requests. Many new users submit copyrighted content to Articles for Creation, and CorenSearchBot does not seem to catch this. The bot will speed up the AfC process by placing this template on suspected copyvio requests, saving reviewers time and allowing them to spend more energy on reviewing other submissions. It will not deny requests: they will stay at pending status, but will have the message above them ( example). The {{{URL}}} parameter will be replaced by a link to the bot's log page, which will have information about what site the violation was found on and which specific strings are in violation of copyright. The bot then relies on other users (often that will be me) to check the request and deny it if applicable. Thus, this log page will serve a similar function to WP:SCV; because of technical limitations with Pywikipedia, logging on SCV itself is much harder (although I might make it possible in the future). See this page for the bot's source code.
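The tagging behaviour described above (add a notice that links to the log page, leave the pending status alone) can be sketched as follows. This is a minimal illustration only: the template name `Hypothetical AfC copyvio notice` and the function `tag_submission` are placeholders invented for this example, not the bot's actual template or code. Only the `URL` parameter is taken from the description.

```python
def tag_submission(page_text: str, log_url: str,
                   template: str = "Hypothetical AfC copyvio notice") -> str:
    """Return page_text with a copyvio notice placed above it.

    The template name is a placeholder (an assumption); the real bot uses
    its own template. The rest of the page, including whatever marks the
    request as pending, is left unchanged.
    """
    # Do not tag twice if an earlier run already added the notice.
    if ("{{%s|" % template) in page_text:
        return page_text
    notice = "{{%s|URL=%s}}" % (template, log_url)
    return notice + "\n" + page_text
```

Because the function only prepends text and is idempotent, running it again on an already tagged request changes nothing.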
The bot was tested on my other bot account, EarwigBot I, which is used for one-time tasks (none yet) and for making edits in my and its own userspace (because this does not require bot approval). See EarwigBot I's log page for the testing that has occurred. All aspects of the bot's code have been written, and testing without making edits (the debugging feature) passed successfully. The bot is both emergency shutoff and exclusion compliant: the first was coded by me and the second is supported by the standard installation of Pywikipedia.
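For readers unfamiliar with the two safety checks mentioned above, here is a simplified sketch. It is not Pywikipedia's implementation and not the bot's own shutoff code: `allowed_to_edit` handles only the common `{{nobots}}` and `{{bots|allow=...}}` / `{{bots|deny=...}}` forms, and `shutoff_requested` assumes a hypothetical convention in which a control page must contain the word "run" for the bot to continue.

```python
import re


def allowed_to_edit(page_text: str, bot_name: str) -> bool:
    """Simplified exclusion check for {{nobots}} and {{bots|allow/deny=...}}."""
    if re.search(r"\{\{\s*nobots\s*\}\}", page_text, re.IGNORECASE):
        return False
    match = re.search(
        r"\{\{\s*bots\s*\|\s*(allow|deny)\s*=\s*([^}|]*)\}\}",
        page_text,
        re.IGNORECASE,
    )
    if not match:
        return True  # no exclusion template present
    mode = match.group(1).lower()
    names = {name.strip().lower() for name in match.group(2).split(",")}
    listed = "all" in names or bot_name.lower() in names
    # allow=... permits only listed bots; deny=... blocks only listed bots.
    return listed if mode == "allow" else not listed


def shutoff_requested(control_page_text: str) -> bool:
    """Hypothetical convention: keep running only while the page says 'run'."""
    return control_page_text.strip().lower() != "run"
```

A bot loop would typically call `shutoff_requested` before each batch and `allowed_to_edit` before each individual page edit.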
The bot's code works by first generating a list of all pages in Category:Pending Afc requests, then checking the text of each page against the Yahoo API (I have a key) to see if there are any copyright violations. These tasks are both handled by this chunk of code, which is a modified version of Copyright.py. Then, the bot places this template on each of the suspected copyright violation pages, and loads the details of each one onto User:EarwigBot II/Logs. I could go into more detail about this process, but that's probably not necessary.
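The overall flow described above (list pending pages, query a search engine with pieces of each page, collect matches for the log) can be sketched roughly as below. This is an assumption-laden illustration, not the modified Copyright.py code: the `search` callable stands in for the Yahoo API, and the fixed eight-word chunks and the `min_hits` threshold are invented for the example.

```python
from typing import Callable, Dict, List


def find_suspected_copyvios(
    pages: Dict[str, str],
    search: Callable[[str], List[str]],
    chunk_words: int = 8,
    min_hits: int = 2,
) -> Dict[str, Dict[str, List[str]]]:
    """Return {title: {url: [matching chunks]}} for pages that look copied.

    pages maps a page title to its text; search takes an exact phrase and
    returns the URLs that contain it (a stand-in for a web search API).
    """
    report: Dict[str, Dict[str, List[str]]] = {}
    for title, text in pages.items():
        words = text.split()
        # Split the submission into fixed-size word chunks used as queries.
        chunks = [
            " ".join(words[i:i + chunk_words])
            for i in range(0, len(words) - chunk_words + 1, chunk_words)
        ]
        hits: Dict[str, List[str]] = {}
        for chunk in chunks:
            for url in search(chunk):
                hits.setdefault(url, []).append(chunk)
        # Only report sources that match several chunks of the same page.
        suspicious = {u: found for u, found in hits.items() if len(found) >= min_hits}
        if suspicious:
            report[title] = suspicious
    return report
```

The returned mapping holds the same kind of information the log page is described as carrying: which site matched and which strings matched.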
Now for an explanation of the bot's usefulness. I am a rather active member at WikiProject Articles for Creation, so I have a lot of experience when it comes to copyright violations concerning that section of Wikipedia. Second to notability, or possibly third to verifiability, there is no doubt that copyright violation is one of the most common reasons why a request gets declined. Usually a sizeable portion (5–10%) of the AfC submissions have some form of blatant copyright infringement, and an even higher number contain sentences copied from other sources that are often not caught until it's too late.
You may be thinking that the bot has no purpose, because someone is eventually going to catch the copyright violation. This is not the case, for two major reasons. First of all, Category:Pending Afc requests may not get backlogged a lot (this requires over 52 articles in the queue), but there are often submissions in the category that remain unreviewed for days, which could be avoided if there were fewer articles to review. This bot would make the process move faster, something that's good no matter what part of Wikipedia we're talking about.
Second, not all AfC submissions actually have their copyrighted content removed after they are declined or accepted. I have yet to see a blatant violation go through when the article is accepted, but I've noticed copyright violations in declined requests on several occasions. ( this wasn't caught in this submission and this wasn't caught in this submission, for example, and that's only in the first 200 entries.) The reason we would want to have these removed is the same major reason that we want all copyright violations removed: legal issues. Just because an article is in the declined requests pile doesn't mean that someone won't eventually notice it. Heck, I certainly did!
As a final note, I must point out that the internet copyvio checker portion of my bot's code is by no means rudimentary. One of the biggest problems I've noticed with bot requests is that the code doesn't account for loopholes, such as how quotes with [sic] can affect spell-checker bots (one of the reasons we don't have them), or how free sources can affect copyvio bots (like this one). The Copyright.py module, developed by Francesco Cosoleto in 2006, not only maintains a constantly updated client-side database of mirrors and forks from Wikipedia:Mirrors and forks, but also has an exclusions list and a protected-sites list. With these features, combined with a well-developed and tested module and a conservative template that does not straightaway deny requests, even the rare and dreaded false positive will have little effect on how the bot will help Articles for Creation. Thank you for taking your time to read the details of this submission. I eagerly await any responses.
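To illustrate how a mirror or exclusion list guards against false positives, here is a minimal sketch of filtering search hits by domain. It is not the Copyright.py implementation: the function names `is_excluded` and `filter_hits` and the domain-suffix matching rule are assumptions made for this example.

```python
from typing import Iterable, List, Set
from urllib.parse import urlparse


def is_excluded(url: str, excluded_domains: Set[str]) -> bool:
    """True if the URL's host is an excluded domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in excluded_domains)


def filter_hits(urls: Iterable[str], excluded_domains: Iterable[str]) -> List[str]:
    """Drop hits from known mirrors, forks, or otherwise excluded sites."""
    domains = {d.lower() for d in excluded_domains}
    return [u for u in urls if not is_excluded(u, domains)]
```

Matching on the full host or a dot-separated suffix avoids excluding unrelated sites whose names merely end with the same letters as an excluded domain.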
Approved for trial (30 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. I'm curious to see how this works in practice. – Quadell ( talk) 15:23, 6 May 2009 (UTC)
EarwigBot II Logs, Trial 1. Last updated: The Earwig ( Talk | Contributions) 02:46, 11 May 2009 (UTC). Edits completed: 25/30
Session 1: Pages checked: 32; Suspected copyvio articles found: 6; Number of Yahoo queries: 325
Session 2: Pages checked: 26; Suspected copyvio articles found: 2; Number of Yahoo queries: 261
Session 3: Pages checked: 27; Suspected copyvio articles found: 2; Number of Yahoo queries: 261
Session 4: Pages checked: 26; Suspected copyvio articles found: 1; Number of Yahoo queries: 222
Session 5: Pages checked: 13; Suspected copyvio articles found: 1; Number of Yahoo queries: 124
Code changes: I made a few changes to the code in light of the trial that is currently in progress. The bot is now running v1.1. The Earwig ( Talk | Contributions) 22:47, 6 May 2009 (UTC)
I'm beginning Session 2 soon, as Session 1 is complete. Two more of the suspected copyvios were removed by other users, and I removed the last two. All of the requests involved a complete removal of content, except for this tagging. I declined this for notability, not copyvio, because the copyvio string was taken from a quote. The copyright.py module recognizes quotes, but this one was missed because it wasn't formatted properly. Other than that, all taggings seemed accurate. The Earwig ( Talk | Contributions) 00:25, 7 May 2009 (UTC)
Session 2 results posted above. The bot tagged two articles. The Earwig ( Talk | Contributions) 19:53, 7 May 2009 (UTC)
(outdent) I think that I'll side with the first idea. It would be a small matter of an editintro and a preload template, which can easily be created and won't require much extra coding for the bot itself. (The second option is harder; maybe I can implement it at a later date, but not now.) As for the trial, if you think that the bot seems good so far, then I'll continue where the trial left off (i.e., 20 edits remaining) with the new template updates. By the time the next twenty edits or so are made, we can decide if there are any other necessary changes that I should make before the bot is approved. Thanks for your help! The Earwig ( Talk | Contributions) 00:28, 8 May 2009 (UTC)
Can I just say, as a participant of this WikiProject, that this bot seems to be doing a great job and is really useful for us. Some tweaks and streamlining might be in order, but don't worry too much, because this is much better than what we had before, which was nothing. As for the idea of putting it on hold: this might work well, actually. If you passed a code (e.g. "cv-bot") as the second parameter and the link to the log page as the third parameter, then the reviewer tools could be adapted to provide a useful link. – Martin ( MSGJ · talk) 11:39, 8 May 2009 (UTC)
Session 4 results posted. I'm unable to have both the hold action and the template addition occur in one edit, so I just did the test with pending instead. That functionality, however, can be added. Hold on... The Earwig ( Talk | Contributions) 02:20, 10 May 2009 (UTC)
Session 5 results posted. I noticed a template error (easily fixed), and I have confirmed that the blacklist feature works! The bot logged an article that was on its blacklist, but it didn't edit it. The Earwig ( Talk | Contributions) 17:41, 10 May 2009 (UTC)
Is the trial complete? If not, just let us know when it is. – Quadell ( talk) 22:48, 10 May 2009 (UTC)
{{BotTrialComplete}} on this page, because I recently made a code change and I want to make sure that everything's OK. Thanks for reminding me, though. The Earwig ( Talk | Contributions) 22:52, 10 May 2009 (UTC)
Trial complete. I ran the test I wanted, and it worked (although the bot didn't find any copyvios). The trial is now complete, with 25 edits made (although many of these were just logs done by the bot because I wanted to test a few things). I can now confirm that the bot is as well integrated into the AfC system as I can make it. Martin agrees. The Earwig ( Talk | Contributions) 02:46, 11 May 2009 (UTC)
Approved. Looks good, and I hope it makes AFC run more smoothly and reliably. – Quadell ( talk) 12:39, 11 May 2009 (UTC)