This is a reopened bot request - please see the bottom of this page for the most recent information
Operator: Excirial (Contact me, Contribs) 18:42, 7 January 2009 (UTC)
Automatic or Manually Assisted: Fully automatic, with the possibility to manually override the bot's behavior if desired.
Programming Language(s): VB.NET
Function Summary:
Edit period(s) (e.g. Continuous, daily, one time run): Continuous
Edit rate requested: At most 1 edit per new page. (Estimated maximum of 10 edits a minute; this is currently a test setting that is open to being lowered.)
Already has a bot flag (Y/N): (Not applicable, new bot)
Function Details: Coreva's main task is placing maintenance tags on new pages that require them, similar to the way most new page patrollers work their beat. Coreva will regularly (every 5-10 minutes) check the new page list for new articles, fetch each new article's content, parse the content (See: Parser Table) and finally update the article, adding the required maintenance tags.
Just like the previous Coreva, this one should be quite light on server resources. The bot queries the server's new page list every 5-10 minutes, and (so far) each article requires two server queries (one to get the article's content, and one to check whether the article is an orphan). Category counts, link counts et cetera are handled internally by the bot. Additionally, the bot will require one database write to add the templates (if any are required). The estimated edit rate for the bot will be 2 edits per minute on average. (See: Note 2)
Coreva is not a miracle, and will never replace a living new page patroller. Coreva cannot patrol for WP:CSD and does not understand hoaxes, advertising or vandalism. However, a lot of articles slip off the new page list without any form of maintenance tags. About half the pages on the new page list show as not being patrolled, and even though this is a very rough guess, this equals more than 2,000 pages a day. (See: Note 3) Since adding maintenance tags is thoroughly boring work, I think Coreva could spare quite a few patrollers a bit of boredom :). (Unlike CSD tags, which require at least some use of your brain, maintenance tags require nothing more than checking 20 indicators, most of them nothing more than: Present/Not present.)
Finally, just like the old Coreva, it is still very much a work in progress, which is only done in spare time. While progress on this Coreva is much faster than on the previous one, I assume it will still take a few months before it is capable of being a fully automated bot.
Even if it were technically capable of doing so, it will not be a fully automatic bot until I have tested it thoroughly (a few weeks, I guess) in assist mode, which means Coreva would only give me feedback on what tag it would place on every page it checks. This way any annoying mistakes in the parser should be ironed out, while at the same time it allows me to improve the parser code.
Parser Table
This table gives an overview of the templates Coreva will be placing on the articles, along with the current criteria configuration for doing so. Note that this is still very much in beta stage; templates may be added and removed depending on tests. Also, the criteria are still based on very simple algorithms. Coreva's tests are conducted on a very small and varied set of locally stored articles, thus the criteria are still general. In their current form they should, however, produce very few false positives (but would likely have quite a few false negatives). So all in all: Work in progress! (See: Note 4)
Notes
Discussion
Reopening request
Over the past two or so months the amount of time I could spend on Wikipedia was drastically reduced due to other duties, causing a certain lapse in Coreva's development. Another issue halting development was an old programmer's trap: building a patched-together prototype that should have been thrown away once I had proof of concept that it actually worked, and instead keeping the prototype and resuming work on it, which eventually led to a horrible code mess and a completely incomprehensible program. In the past month I finally found the time and willpower to run a step-through debugger over the entire program to decipher and salvage as much of the mess as possible, before rewriting Coreva from scratch, save for a few salvaged functions that actually worked.
The actual workings of the bot have changed very little from the table I added above - I dropped STUB, TOMANYCATS and TOMANYLINKS because they were prone to false positives. I am currently testing a module that can detect peacock pages (based upon statistical analysis, weighted word lists and some basic calculations). So far it works fine when comparing featured articles against peacock articles (1 false positive on 270 correct tags), but the calculation algorithm makes too many mistakes on small articles, so it is disabled for now.
Et Cetera
Excirial (Contact me, Contribs) 21:05, 11 June 2009 (UTC)
Approved for trial (10 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. This is a very long RfBA, and the specs have changed throughout and are difficult to follow. I think the best way for all parties to understand what this bot would do is to give it a very small trial. – Quadell (talk) 13:12, 18 June 2009 (UTC)
Approved for trial (20 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Okay, let's have another go. – Quadell (talk) 22:38, 22 June 2009 (UTC)
Since this RFBA is quite old, it contains a lot of information which is no longer completely up to date. It has also become so long that it is somewhat unreadable, so here is a summary for quick reference.
General
What will Coreva-Bot's task be?: Coreva-Bot will function as a new page patroller, checking articles for problems. Once it has found an issue it will add the appropriate maintenance templates to the article.
How will Coreva operate?: When Coreva is started for the first time - that is, when its database backend is empty - it will query the server for the list of the last 500 new pages and save that list to the backend. If Coreva already has data in its backend, it will query the server for all pages created since it last ran (5000 limit; 500 for now, as it is not yet flagged as a bot). Coreva will then load and check pages, filling its save buffer. The speed at which pages are checked depends on the number of pages in the buffer - more pages means longer intervals. Every 6 seconds the buffer is checked for pages to save - if there are any, the oldest page will be saved with its templates added.
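The buffer behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not Coreva's actual VB.NET code: the class name `SaveBuffer`, the function `check_interval`, the linear back-off rule with its default values, and the choice to queue only pages that need templates are all assumptions made for the example. Only the 6-second save interval and the oldest-first order come from the description.

```python
from collections import deque

# From the description: the buffer is inspected for saveable pages every 6 seconds.
SAVE_INTERVAL_SECONDS = 6


class SaveBuffer:
    """FIFO buffer of checked pages waiting to be saved (hypothetical helper)."""

    def __init__(self):
        self._pages = deque()

    def add(self, title, templates):
        # Assumption: only pages that actually need templates are queued.
        if templates:
            self._pages.append((title, list(templates)))

    def __len__(self):
        return len(self._pages)

    def pop_oldest(self):
        # Return the oldest queued page, or None when the buffer is empty.
        return self._pages.popleft() if self._pages else None


def check_interval(buffer_size, base_seconds=2.0, per_page_seconds=1.0):
    """Seconds to wait before checking the next page.

    Assumed linear rule: the fuller the save buffer, the longer the wait,
    so checking slows down while saving catches up.
    """
    return base_seconds + per_page_seconds * buffer_size
```

With a rule like this the checker throttles itself instead of letting the buffer grow without bound, while the save loop keeps draining one page per 6-second tick.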
Tagging Articles
What will Coreva-Bot template for?: {{Uncategorised}}, {{Unreferenced}}, {{Footnotes}}, {{Wikify}}, {{Orphan}}, {{Sections}}, {{internallinks}}. Statistical analysis shows that tagging with the {{peacock}} template is prone to errors, which is why it is disabled indefinitely.
What restrictions apply for tagging?: Coreva will not template any pages marked as CSD - but it will template PROD and AFD pages. Coreva will not tag deleted pages. Coreva will not tag pages marked as disambiguations (this includes the basic disambig template, all its aliases and specialized disambiguation templates such as {{hndis}}). It will not tag a page twice with the same template if that maintenance template already exists on the page.
What are the criteria for each template to be added?: (Note: These criteria are constantly being improved - do note that they only grow stricter, though.) Templates will not be added if one is already present.
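As a rough illustration of the restrictions above, the filter could look something like the Python sketch below. It is not the bot's VB.NET implementation: the function names, the use of `None` to represent a deleted page, and the two abbreviated regular expressions (which cover only `db-` style speedy tags and three disambiguation templates, not all aliases) are assumptions for the example.

```python
import re

# Hypothetical, abbreviated patterns; the real bot is said to cover all aliases.
CSD_PATTERN = re.compile(r"\{\{\s*db-[^}|]*", re.IGNORECASE)
DISAMBIG_PATTERN = re.compile(
    r"\{\{\s*(disambig|disambiguation|hndis)\s*[|}]", re.IGNORECASE
)


def has_template(wikitext, name):
    """True if {{name}} or {{name|...}} appears in the wikitext (case-insensitive)."""
    pattern = r"\{\{\s*" + re.escape(name) + r"\s*[|}]"
    return re.search(pattern, wikitext, re.IGNORECASE) is not None


def templates_to_add(wikitext, candidates):
    """Filter candidate maintenance templates according to the restrictions."""
    if wikitext is None:
        # Assumption: a deleted page is represented as None and is never tagged.
        return []
    if CSD_PATTERN.search(wikitext) or DISAMBIG_PATTERN.search(wikitext):
        # CSD-tagged and disambiguation pages are skipped entirely.
        return []
    # PROD and AFD pages fall through and are tagged like any other page;
    # a template already on the page is never added a second time.
    return [name for name in candidates if not has_template(wikitext, name)]
```

For example, `templates_to_add("{{Orphan|date=October 2009}} Text. {{prod}}", ["Orphan", "Unreferenced"])` keeps only `"Unreferenced"`, because the page is merely PRODed and already carries `{{Orphan}}`.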
Technical and operational limits
Todo
Coreva is quite near to being "finished", at least the integral part of it. Due to the number of templates the bot handles, its filters will likely be constantly tweaked to reflect new templates or guidelines. In the future I might submit another feature request so that, in case Coreva runs out of new pages, it will check through older pages at snail speed. Other than this, the only thing that remains is some work on the GUI and the efficiency of certain sections - none of which should change it controversially.
Due to some unforeseen circumstances I have been almost completely inactive for the last 3 or so months, causing this bot request to expire yet again. Having finally found some spare time to work on this bot again, I would like to reopen this RFBA.
As for the current status: Bug number 4 is now solved; Coreva will only add the footnotes template to pages of substantial length. It also now correctly converts ampersands and other reserved HTML characters before saving the page, and I have updated the regexes used to determine whether a template should be placed, thus reducing the number of false positives. Excirial (Contact me, Contribs) 22:11, 30 October 2009 (UTC)
From looking at these, I think I would like to have broader community consensus for the orphan tagging, and for the tagging in general. The time looks like it should be longer, say 3 hours during some periods, but this may be flexible. I don't know if the question you asked is sufficient for understanding the community's desire to tag in general. I am concerned, as I said, about adding tags to certain types of generally stubby articles. Many stubs about living things are just a single line and a taxobox. While Cigaritis would be a better article if referenced, and should be referenced, and its lack of references should be called to someone's attention, adding a no-references banner across the top will overpower the text and essentially, imo, make the article useless to the reader. It might as well be deleted.
Can articles be categorized as unreferenced without the huge banner, or can it be put at the bottom of the page? Where are these categories of unreferenced articles, by the way? I would like to add references to many of them. -- 69.225.3.198 (talk) 09:26, 4 November 2009 (UTC)
This article does not cite any sources. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.
Regarding the {{Footnotes}} tag, what would the bot do with new articles that use parenthetical references, and have a references section but do not use the <ref> tag? For instance, take John Vanbrugh and assume the notes section (not related to referencing in this case) didn't exist; how would the bot approach this article? Christopher Parham (talk) 15:12, 7 December 2009 (UTC)