This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
I have written a disambiguation bot that looks for very specific text in pages and then changes that text. The current plan is to disambiguate [[Hispanic]], which has over 30,000 pages linking to it (the vast majority of those are from the data put in by User:Rambot). I was using solve_disambiguation.py, but even that is just way too slow and tedious. So I wrote my own bot. I registered User:KevinBot to run the bot.
The way the bot works is as follows:
Also, the bot has more than one throttle so I can slow it down to whatever threshold is deemed appropriate.
Note: this is a custom bot and is not related in any way to the python bots.
I am not yet done testing the bot, but I thought I'd throw it up here for consensus so that when I am finished testing I can get right to running it.
Kevin Rector 05:05, Jul 27, 2004 (UTC)
I agree with Docu on changing [[Asia]]n and thanks to Kevin for stepping up to fix this. I'd prefer [[Asian (U.S. Census)|Asian]] (note the full stops and capitalization) over [[Asian American|Asian]] because Rambot was changing all the links to point to [[Race (U.S. Census)]] before he vanished without finishing. I personally think linking all of these racial labels to the same place is confusing, so the links should direct to [[Asian American|Asian]], [[Hispanic American|Hispanic]], etc., but using [[Asian (U.S. Census)|Asian]] allows us to change this linkage to [[Race (U.S. Census)]] should the consensus change. We should also link to [[White (U.S. Census)|White]] and [[Pacific Islander (U.S. Census)]] and change "African American" to "Black and African American" (per census wording). -- Jiang 05:23, 29 Jul 2004 (UTC)
I didn't even know that there was a Race (U.S. Census) until I read these posts. I like the idea of making all the races point to this one article, which will explain what the census data means clearly and concisely for all the races. If we need to break it down any more from there, we can. [[Race (U.S. Census)|Hispanic]] and [[Race (U.S. Census)|Asian]] and [[Race (U.S. Census)|White]] really works well for me. If consensus changes I can easily run the bot to change it to [[Asian American|Asian]] or [[Hispanic American|Hispanic]]. Kevin Rector 04:17, Jul 30, 2004 (UTC)
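The link retargeting discussed in this thread can be sketched as follows. This is only an illustration of the idea, not KevinBot's code (which is described above as a custom bot whose internals are not shown here); the function name `retarget_link` is hypothetical, and the sketch assumes that only plain, unpiped links should be touched.

```python
import re


def retarget_link(wikitext: str, old_target: str, new_target: str) -> str:
    """Rewrite plain [[old_target]] links to [[new_target|old_target]].

    Piped links such as [[Hispanic|Latino]] are deliberately left alone,
    since someone has already chosen their label by hand.
    """
    # Match the exact unpiped link only, e.g. [[Hispanic]]
    pattern = re.compile(r"\[\[" + re.escape(old_target) + r"\]\]")
    # A function replacement avoids backslash-escape surprises in targets
    return pattern.sub(lambda m: f"[[{new_target}|{old_target}]]", wikitext)


sample = "The population was 5% [[Hispanic]] or Latino of any race."
print(retarget_link(sample, "Hispanic", "Race (U.S. Census)"))
```

Because the visible label is preserved, the same function could be run again later with a different target if consensus changes, which is the flexibility argued for above.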
Some counts: There are 32'010 links to Hispanic, 33'905 to Asia and already 4869 to Race (U.S. census). At the rate of 6 per minute, one can edit approx. 8640 articles per day. If I recall correctly, there are at least 25'000 references that should be changed. -- User:Docu
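The arithmetic behind these counts can be checked with a few lines. This is a minimal sketch; the function names are made up for illustration, and the 25,000 figure is the lower bound quoted above.

```python
def edits_per_day(edits_per_minute):
    """Edits a bot can make in 24 hours at a constant rate."""
    return edits_per_minute * 60 * 24


def days_needed(total_edits, edits_per_minute):
    """Days of continuous running needed to make total_edits edits."""
    return total_edits / edits_per_day(edits_per_minute)


print(edits_per_day(6))                   # 8640
print(round(days_needed(25_000, 6), 1))   # 2.9
```

So at 6 edits per minute, running around the clock, the 25,000 references would take roughly three days.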
That's a good point about using redirects. I like it. That's what we should do. Also, I've finished testing my bot, and it seems to be working really well. I'm going to run it on 10 articles and see how it fares. I'll post the list of articles edited on User:KevinBot. That way we can check them to make sure there isn't anything catastrophic that needs to be repaired before we mark it as a bot and let it loose. Kevin Rector 20:26, Jul 30, 2004 (UTC)
Ok, the bot has run a bit on a limited basis to see how well it works, and it's working like a charm. So whoever it is that can mark bots, please mark User:KevinBot as a bot. Thanks. Kevin Rector 02:47, Aug 3, 2004 (UTC)
KevinBot is now marked as a bot. Angela . 22:18, Aug 4, 2004 (UTC)
Sauðkindin (The Jumbuch) has been running as the interwiki bot on is: for some time now. As of July 28, 2004 it has accumulated 93314 interwiki links in 6093 articles that need to be updated, thereof 1505 in 205 articles on the English Wikipedia.
What I want is permission to run the following command, say every two weeks, on the English Wikipedia:
python interwiki.py -warnfile:warnfile_en.log
This will check whether each interwiki link to is: should be updated and, if so, update it. This will be very low traffic; it's only so large now because there has previously not been any interwiki bot running on is:, and if the links on en: are updated they will also be shared around the rest of the languages. -- Ævar Arnfjörð Bjarmason 04:33, 2004 Jul 28 (UTC)
I'd like to write a bot to import the FCC's list of US broadcast stations including FM, AM, and Television. Nothing's been written yet but I wanted to make sure this was okay to do before I bothered to do any work. If someone else wants to do it, that's great. I'm willing to do it but I don't want to duplicate effort. Not sure whether I'd start from scratch or use the python bot.
Posted a request on the pump yesterday but I didn't get any replies, so I thought I'd bring it here. Following is my comment from the pump. Rhobite 15:43, Jul 29, 2004 (UTC)
I would like to run the disambiguation bot from pywikipediabot. It is possible I would later run other bots, almost certainly user controlled bots. For instance I think it would be great if there was a bot that could easily be used for categorizing groups of articles, something I already do; the bot would just speed things up. (It looks like replace.py might satisfy this, but for the time being I am mainly interested in disambiguation).
I am a bit unfamiliar with the bot regulation system. I know for major, non-interactive bots, there is a period where it is expected to be run at a very reduced rate, but it seems that at least disambiguation bots don't receive nearly the scrutiny as others. However, I would be grateful if someone explained the proper guidelines, so that I don't cause trouble and frustration to others. I will be running the bot at User:BenjBot and have basically figured out all the scripts and am really just waiting for the go ahead.
Thanks -- Benjamin Goldenberg 06:33, 5 Aug 2004 (UTC)
Bots must not make modifications to comments signed by individuals. Indeed, it would be better to remove the comment entirely than to attribute text to an individual that they did not create. Other than that, changes on discussion pages can destroy discussions that are premised, for instance, on the peculiarities of linking or disambiguation pages, etc. Please make this change to the Project Page. - Centrx 21:15, 5 Aug 2004 (UTC)
Hi. I'm interested in creating a 'bot for detecting link rot. As I see it at present, I'd get the bot to download a random page once per suitable time period. The bot would then extract external links from the page, and check the pages pointed to by these links to see if they are still there. Links which remain inaccessible for (say) a number of days would then be listed on a web page. Humans could then occasionally check a page (on my server) to find a list of dead links, and the wikipedia pages that they're on, and could go and have a look.
Comments? If I did this, I would write the program myself and host the bot here (University of Westminster, UK).
Ross-c 15:29, 17 Aug 2004 (UTC)
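The link rot proposal above could be sketched roughly like this. It is an assumption-laden outline, not Ross-c's program: the names `extract_external_links`, `is_reachable`, and `LinkRotTracker` are hypothetical, the URL pattern is a simplification of what MediaWiki treats as an external link, and the 7-day grace period stands in for the "(say) a number of days" mentioned in the proposal.

```python
import re
import urllib.request
from datetime import date, timedelta

# Simplified pattern: an http(s) URL up to whitespace or bracket-like characters
EXTERNAL_LINK = re.compile(r'https?://[^\s\[\]<>"|]+')


def extract_external_links(wikitext):
    """Return the distinct external URLs found in a page's wikitext."""
    return sorted(set(EXTERNAL_LINK.findall(wikitext)))


def is_reachable(url, timeout=10):
    """Try a HEAD request; any HTTP error, network error or timeout counts as a failure."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        return False


class LinkRotTracker:
    """Remember when each URL first started failing; report long-dead ones."""

    def __init__(self, grace_days=7):
        self.grace = timedelta(days=grace_days)
        self.first_failure = {}  # url -> date of first consecutive failure

    def record(self, url, reachable, today):
        if reachable:
            self.first_failure.pop(url, None)  # link recovered, forget it
        else:
            self.first_failure.setdefault(url, today)

    def dead_links(self, today):
        """URLs that have been failing for at least the grace period."""
        return sorted(url for url, since in self.first_failure.items()
                      if today - since >= self.grace)
```

Keeping the first-failure date rather than a simple flag is what lets temporary outages drop off the report on their own, so humans only see links that stayed dead.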
Does anyone have a link to the discussion of this bot? I couldn't find it. anthony (see warning) 03:09, 30 Aug 2004 (UTC)
Information on what Janna is currently doing will be kept on User:Janna. anthony (see warning)
It appears that a bot from IP 209.90.162.1 is making numerous incorrect changes to the encyclopedia. The kind of change I have noticed is linking instances of "chemical" to "chemical compound" indiscriminately. In many, nay MOST, cases, these changes are patently false and it is becoming a pain for me to go through and revert them. - Centrx 23:23, 16 Sep 2004 (UTC)
I would like to get permission to run a user-controlled pywikipediabot to make the spelling of science and chemistry articles consistent with the IUPAC nomenclature rules. It would only change articles I specifically told it to, and would only be making changes that I would make anyways.
I have registered the account Darbot for this task should my request be approved.
Darrien 05:27, 2004 Sep 17 (UTC)
As a chemist, I think this is a good idea. We're an encyclopedia of chemistry, we should use the proper names for chemicals, compounds, ions, etc. Gentgeen 07:19, 25 Sep 2004 (UTC)
I object. IUPAC doesn't determine our spelling. anthony (see warning) 21:12, 25 Sep 2004 (UTC)
This is a good idea, and seems like it will be run in a sensible manner. (I would also not want to see strange IUPAC names as article titles of chemicals everyone knows by a more common name). I might even go a step further though and say the IUPAC name should be somewhere in articles with common names, and they often are. See Caffeine. I don't know what possible objection there would be to changing the names from a random hodge-podge of spelling and previous deprecated standards to the current international standard which is used and accepted by chemists around the world. If someone is willing to do this major undertaking I think we should be appreciative. - [[User:Cohesion|cohesion ☎]] 19:11, Sep 26, 2004 (UTC)
Sounds good to me. Thue | talk 19:49, 26 Sep 2004 (UTC)
Makes a lot of sense to me. The changeover from ferrous & ferric to iron (II) & iron (III), etc., started at least 30 years ago, and has nothing to do with American vs British spelling. The IUPAC system is consistent and easier to understand than the historical accidents it replaces. I'm quite puzzled by the opposition to the proposed name changes. Wile E. Heresiarch 03:10, 27 Sep 2004 (UTC)
This would be helpful to the Wikipedia as a whole. I'm for it. -- 131.91.238.38 00:09, 9 Oct 2004 (UTC)
I am strongly opposed. See Talk:Global warming and Wikipedia talk:Manual of Style ( William M. Connolley 17:31, 14 Oct 2004 (UTC)). Sulphate should not be replaced with sulfate, or any other americanisations, outside the chemistry articles.
(note: the following including the alteration of Dysprosia's comment, was added by Mr. Jones. comment replaced above by sannse (talk) 09:57, 5 Nov 2004 (UTC) I'll pop my name by my interpolation too. Mr. Jones 21:21, 5 Nov 2004 (UTC))
(note: end - sannse (talk) 09:57, 5 Nov 2004 (UTC))
I am going to start testing this bot soon, it's sat in discussion long enough. General community consensus is that it's a good idea and most of the objections have been from people who falsely believe that the purpose of this bot is to "Americanize" articles. Darrien 10:12, 2004 Nov 5 (UTC)
I've found two recent cases in which, after a week of no comments at all on their requests, people began running bots. Both of these bots have caused some amount of controversy. (In one case, because people listed objections before the bot was run, but after a week had passed. In the other because the bot started making bad edits.) What do people think of amending the rules to make clear that someone, at least, needs to say "OK, run it." before a bot is run? Snowspinner 20:35, Sep 26, 2004 (UTC)
This bot was requested on June 27th by User:Docu. The request was not particularly detailed - it only noted that it wanted to run the pywikibot. No one commented on it one way or another. After a week, it was added to the list of bots, where its intended purposes were finally declared.
The bot is currently being used to categorize a bunch of biographical articles by year of birth and death. These categorizations do not have consensus, are against the policy at Wikipedia:Categories, lists, and series boxes, and people are finding errors in them.
For now, I'm blocking the bot as unapproved to do what it is doing and as messing up articles, but I wanted to start a discussion here about the bot so that it can, hopefully, get reapproved with some description of what it's actually supposed to be doing. Snowspinner 20:35, Sep 26, 2004 (UTC)
That a number of people are still complaining about the categories makes me wary of their expansion. As I've said, they don't seem to fit in with Wikipedia:Categories, lists, and series boxes. But that's OK - it's just that the bot's approval was more than a little vague. Snowspinner 21:39, Sep 26, 2004 (UTC)
A bit of care with this bot please - it added [[Category:1981 births]] to Owen Hargreaves over 3 days after this article had a copyvio notice slapped on it. Articles with copyvio notices should not be edited at all. -- Arwel 14:28, 2 Oct 2004 (UTC)
I would like to run the Pywikipediabot as Snowbot. The only purpose of this bot will be to handle templates for deletion. As it stands, if a widely used template gets deleted, I have to remove it manually from pages, which can take upwards of an hour. Snowspinner 21:49, Sep 26, 2004 (UTC)
I withdraw the request, as I'm going on an indefinite wikibreak due to the continual harassment of Netoholic and orthogonal. Snowspinner 01:05, Sep 27, 2004 (UTC)
I object to this bot. Widely used templates should be hard to delete. The only time I could see a use for automated destruction of a template is when the template was created through an automated means. It shouldn't be easier to destroy than to create. I am also hesitant about letting Snowspinner run this bot unilaterally. Taking a look at templates for deletion he seems to get into heated arguments in favor of deleting certain templates, and I don't trust him to understand when lack of dissent comes from the fact that not many people are aware of the discussion. A template which is widely used and was not created through automated means should be strongly presumed to not have a consensus for removal, despite the fact that no one came to its rescue on TfD (which unlike VfD is not very well advertised). anthony (see warning)
I would like to run pywikipediabot with User:Topjabot for various tasks: solving disambiguations, copying images to Commons and changing tables to wiki-syntax. I won't use fully-automated scripts, so there is no risk of some sort of malfunction leading to massive damage. Gerritholl 16:34, 30 Sep 2004 (UTC)
Is the module category.py [14] of pywikipediabot suitable for use on Wikipedia? Can it be run in automatic mode if it uses a reasonable list of articles to add specific categories?
A bug currently occasionally adds a duplicate category if there is an existing category with a different (generally incomplete) sortkey. If the list of articles is filtered with the last available version of en_categorylinks_table.sql.gz, I would estimate these cases at less than 0.2%.
If the module is used in manual mode (confirmation of each addition), I assume it's not considered a bot subject to registration, even if this is likely to increase the effective number of categories added during the time it's used. -- User:Docu
I have done a bot that checks consistency of language links in all languages. The WWW user interface is in [15]. It does not change anything automatically and it just goes through all the referred pages once. I'd like to get permission to keep this service on my web page (or on any other site, such as Wikipedia - it is free). It is standalone Python code. More information behind the link. -- User:Etu
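The duplicate-category problem described above (an existing category link with a different sortkey not being recognised) can be avoided by comparing category names only and ignoring the sortkey. The following is a sketch under stated assumptions, not the code of category.py: the function names are hypothetical, and the normalisation (underscores as spaces, first letter case-insensitive) is an assumption about how category titles are compared.

```python
import re

# [[Category:Name]] or [[Category:Name|sortkey]]; group 1 is the name only
CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]", re.IGNORECASE
)


def _normalize(name):
    """Treat underscores as spaces, collapse whitespace, capitalise first letter."""
    name = " ".join(name.replace("_", " ").split())
    return name[:1].upper() + name[1:]


def has_category(wikitext, category):
    """True if the page already links the category, whatever its sortkey."""
    wanted = _normalize(category)
    return any(_normalize(m.group(1)) == wanted
               for m in CATEGORY_LINK.finditer(wikitext))


def add_category(wikitext, category, sortkey=None):
    """Append the category link unless the page already has that category."""
    if has_category(wikitext, category):
        return wikitext
    link = (f"[[Category:{category}|{sortkey}]]" if sortkey
            else f"[[Category:{category}]]")
    return wikitext.rstrip("\n") + "\n" + link + "\n"
```

With this check, a page that already carries `[[Category:1981 births|Hargreaves]]` is left unchanged even when the bot wants to add the same category with the sortkey `Hargreaves, Owen`.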
Nickj is seeking the approval of Wikipedia talk:Bots for a small semi-automated trial run of the link suggester on 100 or fewer pages. Please see this page for a detailed description of what this script is, and more info on what it will do and what it will not do. -- Nickj 07:45, 18 Oct 2004 (UTC)
LinkBot is now marked as a bot following a request at m:Requests for permissions. Angela . 23:58, Nov 24, 2004 (UTC)
The LinkBot has just uploaded suggestions to exactly 100 pages - you can get a full list of those here. I'd like to do a further trial run tomorrow - would 1000 pages be acceptable? All the best, -- Nickj 11:53, 1 Dec 2004 (UTC)
Is there, or could there be written a bot to fix brackets, see User:Nickj/Wiki_Syntax/Index -- it would probably need a human to click yes/no to fix it, since some might not be wrong. Dunc| ☺ 12:35, 3 Nov 2004 (UTC)
For what it's worth, I plan to resume the rambot's tasks sometime possibly in the next couple of days. This is not a departure from previous actions, but I thought I should at least mention it for those who care. See "rambot" for some of the things that will be performed. The tasks represent requests for changes that are months and months overdue. -- Ram-Man 13:00, Nov 8, 2004 (UTC)
Once the rambot is unblocked and the discussions on server load below are fully and completely hashed out, I plan to run a bunch of various tasks. In terms of cities and counties, I plan to add UN/LOCODEs to the cities that have them. This will include setting up shortcuts in bulk to facilitate easy usage (e.g. UN/LOCODE:USLAX for Los Angeles, California). Simultaneously with this I will be adding Template:Mapit-US-cityscale templates to the external links section of every city that has GPS coordinates listed. This is thousands of cities. This will add automatic links to street maps, satellite photos, and a topographical map of the location in question, a terribly useful thing to have. (See the example in Cleveland, Ohio). In addition to the tasks above, I also plan to add a short two or three sentence request to the talk page of every user who has not been asked about multi-licensing. This latter option may not happen depending on whether or not I can get developer help to do it directly with the database to eliminate server load and the need to use a bot. See User:rambot for any additional information (as always). – Ram-Man ( comment) ( talk) 13:29, Dec 16, 2004 (UTC)
Well I am placing this message here because of previous requests to make my intentions known on this page. I am asking for explicit permission. Discuss and vote away, although I fear that only those who are opposed to my plan are actually paying any attention to me. If we do vote here, we should count all of those users that have already given their support to me, since spamming them and asking for them to vote again would hardly be appropriate. – Ram-Man ( comment) ( talk) 17:26, Dec 16, 2004 (UTC)
One word: Approval. More words: Read this page (and others too), as it is common practice to vote on bot proposals. – Ram-Man ( comment) ( talk) 18:06, Dec 16, 2004 (UTC)
This entire bot policy is mostly the design of a very small group of people. The discussion on the limitation on how fast a bot operates was discussed on IRC, and therefore we have no way of knowing who or what came up with it. IIRC, I was the one who originally added the 3-point policy, which was later turned into a 4-point policy when "approved" was added. Everyone has since almost religiously followed both forms of the policy, and it has worked quite well as general guidelines. The problem with the other specific rules is that they are quite inflexible and not strictly followed by bot owners, particularly the speed restrictions. Part of this is because we generally trust a bot after it has proven itself, but a lot has to do with the relatively few number of people who frequent this page. It is quite hard to get an adequate consensus, as it is often typically biased either in favor of bots or not in favor, depending. – Ram-Man ( comment) ( talk) 21:59, Dec 16, 2004 (UTC)
I need to clarify what I want to do with the rambot. After using the rambot to do reads and discover which users have edited rambot articles, I then plan to take that list and ask all of the people on the list that I have not already asked. To do so, I would use the bot to add a short message which would link to a page with more detailed information. I would do somewhat small batches of users at a time, something like 50-100 users, and then wait for responses. The reason for this is because if I go too fast, not only does it look like mass-quick-spam, but it also causes too many users to respond at once. When users have stopped responding and their questions have been answered, I will do another batch of users, as appropriate until the task is completed. This behavior will be significantly different from the previous action of doing about 1,000 people in a very short time with a very large spam-like message. The main purpose is to save me the time from having to manually open their talk page, click on "Post a comment", and finally copy and paste the message before saving it. I figure it will save me at least 10-30 minutes per batch so I can do other things at the same time. There has been the suggestion to try and get help from a database administrator to help alleviate any potential strain on the server. This would probably require adding the message to all of the user accounts at once, but would not require using the bot. If anyone would rather have this option, mention it. If neither option is acceptable, I will manually perform the action. Does anyone have any objections to this new behavior, and if so which category do they fit in: (1) Problems with posting a short message on multi-licensing? (2) Problems with a bot posting messages on user talk pages? (3) Other concerns? – Ram-Man ( comment) ( talk) 19:14, Dec 16, 2004 (UTC)
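The batching plan described above might look something like the sketch below. It is only an outline of the bookkeeping (who still needs asking, and in what groups); the function names are hypothetical, nothing here posts to any wiki, and the batch size is the 50-100 range mentioned in the comment.

```python
from itertools import islice


def pending_users(editors, already_asked):
    """Editors who have not yet been asked, in first-seen order, no duplicates."""
    asked = set(already_asked)
    seen = set()
    pending = []
    for user in editors:
        if user not in asked and user not in seen:
            seen.add(user)
            pending.append(user)
    return pending


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical usage: one batch is posted, then the operator waits for replies
# before moving on to the next batch.
todo = pending_users(["Alice", "Bob", "Alice", "Carol"], already_asked=["Bob"])
for group in batches(todo, size=50):
    print(f"next batch: {len(group)} users")
```

Pausing between batches, rather than between individual messages, is what keeps the number of simultaneous replies manageable for the person running the bot.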
Having recently started the rambot, it comes as little surprise that someone is unhappy. I've gotten a complaint from User:Docu stating that I have been violating the 6 transactions per minute rule specified on the Bots page, which is true. As requested, I am bringing it up here so that the "rule" might be "changed". For the record, the bot does not violate any of harmless, server hog, useful, or approved. Since the transaction rule was at best discussed based on IRC discussions that are not logged, I have no way of knowing the precedent for it. So I will try my best. There originally were three compelling reasons to limit bot transactions: 1) Server Load, 2) the Recent Changes was too cluttered otherwise, and 3) to allow time for other users to verify the changes made by the bot. #1 never existed as the bot's edits represent a tiny fraction of the total server load. #2 was fixed by the implementation of the bot flag which was implemented BECAUSE of the rambot. #3 applies to those bots with small data sets or those data sets that vary greatly. The rambot's data set is so large that adding delay makes for an unmanageable amount of time to complete even very simple tasks. When I move to the cities (from counties), the amount of time will increase by more than 10 times. What I am saying is that no one is going to check 2,000 let alone 35,000 articles, so #3 is not compelling. What does not change is that people can randomly sample the data (as I do when constantly monitoring the bot run). And the user's contributions can easily be checked. If there was a lot of variability in the changes, then there MAY be a reason to check more of the entries, but still no reason to slow down. Wikipedia is not about strict rules but an ever-changing environment that adapts. In fact, aside from NPOV, there are not many hard and fast rules at all. I will make all efforts to enforce accuracy, but what we're talking about is the difference between a week of editing vs. 3 weeks of editing.
Maybe 2 weeks of time is meaningless to a lot of people, but not to me. If it makes everyone feel better, I can run three bots from three different IP addresses on three different data sets, and that would technically not violate any of the rules. The point being that the rule makes no sense out of context. I'm not even suggesting that we change the guideline. As a general guideline it makes perfect sense, but as we mention at the start of Wikiprojects, these things are what a group of users got together to work on, and they are not hard and fast rules. -- Ram-Man 03:30, Nov 10, 2004 (UTC)
My only concern with changing the rule, and this doesn't really apply to Rambot, is that going over 6 transactions per minute makes your changes really hard to revert. Thus we start getting into a technocracy where whoever has the faster bot wins. Yes, there is an approval process for bots, but there really isn't that much interest in it. I'd favor approving an increase in the speed limit for rambot, for this particular run. I'd also favor allowing others to receive an increase for a specific run which is extremely well defined. But I think this should require a specific proposal which is approved by at least say 10 people and after the bot has already run in slow mode for a day or so. As for the speed, 120/minute sounds a little high. I'd want some input from a few developers before going over 60/minute or so (which I believe is the read-only speed limit in the robots.txt file). The details could be worked out, but that's my suggestion rather than eliminating the rule. anthony 警告 13:27, 11 Nov 2004 (UTC)
BTW, looking at this page I don't think D6 is yet a good candidate for having the speed limit raised. anthony 警告 13:33, 11 Nov 2004 (UTC)
Two more comments on rambot. I'm hesitant about adding "WikiProject boilerplating to the talk page of the state's cities." I don't think it's useful, and I think it's harmful, as it suggests that there is discussion on the talk page when it's really just a spam link to someone's project. Secondly, I'd like to see the details of the "automatic disambiguation". This is a very hard thing to do right. anthony 警告 13:36, 11 Nov 2004 (UTC)
With regard to the bot speed limit, the best solution to hundreds or thousands of bad bot edits is to do another bot run to clean up. In the very worst case, a generic "revert bot" can just undo all the bad bot's recent edits. With my list of 50,000 changes, what I did was run the first dozen or so, wait a day, check for comments, and only when everything was settled, let it run unattended. It has taken days or weeks for people to find some systematic errors, which I will be fixing myself with some cleanup runs. I don't think the speed limit really helped much. It just means that I have to check my bot every day to make sure it's still running, which is time I could be spending readying subsequent runs or editing articles. Sometimes it auto-detects systematic errors, and the speed limit actually creates a delay in me noticing that. Articles also get edited during the course of the run, which can cause some inconsistencies, above and beyond the fact that a long run leaves some articles one way and others a different way, for a longer period of time than a short run. It's also something of a waste of time for human editors to be running after a bot to fix bad edits as they happen, when bad edits could be reverted en masse automatically. So I don't think having humans be able to keep up with the actual editing process is necessarily a good reason for a speed limit. If something has gone wrong, it should be just as easy to deal with after the fact as while it's in progress.
Perhaps an official, community-approved "revert bot", which could be deployed on short notice, would make sure that's the case.
In short, I think restricting bots to sequential edits is sufficient, as long as the number of people running bots (and the number of bots per person) is small compared to the Wikipedia population (or, more directly, server capacity). My bot automatically stops editing if Wikipedia takes too long to respond, on the general theory that load is probably too high, or that something else has gone wrong. But if even immediate sequential edits have a negligible performance impact, raising the speed limit will probably improve human productivity. Bot authors need to supervise their bots less, and human editors will not be making changes that a bot was going to get around to anyway, or that a slow-running bot will later come by and obsolete.
It should be a stated rule that bot owners shall not knowingly make ongoing, conflicting edits with one another. We have the three-revert rule to prevent humans from operating on the "fastest mouse wins" principle. Given the potentially large number of articles involved, I think there's good reason to have a one-revert rule for bots. The idea is that if a bot owner would like to change or revert what another bot owner has done, they should get community approval in the appropriate forum. Which they really should be getting anyway, but perhaps an explicit rule would make people more comfortable. I think this issue is rather orthogonal to the speed limit issue. -- Beland 06:34, 17 Nov 2004 (UTC)
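The "sequential edits, stop if the site responds slowly" behaviour described in the comment above can be sketched as follows. This is an illustration only, not Beland's bot: `run_sequential`, the `save` callback, and the 10-second threshold are all hypothetical names and values chosen for the example.

```python
import time


def run_sequential(edits, save, max_response_seconds=10.0, clock=time.monotonic):
    """Save edits one at a time; stop as soon as one save takes too long.

    `save` is whatever function actually submits an edit (assumed to block
    until the server has answered). Returns the number of edits saved.
    """
    saved = 0
    for edit in edits:
        started = clock()
        save(edit)  # strictly sequential: the next edit waits for this one
        elapsed = clock() - started
        saved += 1
        if elapsed > max_response_seconds:
            # Slow answer: assume load is high or something is wrong, and halt
            break
    return saved
```

Because each save must finish before the next begins, the bot's rate falls automatically as the server slows down, and the threshold acts as a hard stop on top of that.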
I have stayed out of this conversation for the most part. However, I would like to note that I agree with User:Ram-Man and User:Beland. The reality is that there are actually very few of us people running actual bots. I think that for the most part we respect the other people and there is little chance of a turf fight. For instance, I was thinking about doing another run of User:KevinBot on the Rambot articles, but when I learned that the Rambot had been dusted off, I shelved the idea. I would also like to note that if a bot does make mistakes, I agree that a subsequent bot run can clean it up, and that for the most part the bot authors are fairly well upstanding types and will clean up any unintended messes. I also agree that the throttle rule for bots is obsolete and should be done away with unless it can be proven that the bots are slowing the servers down to something more than a negligible amount. Kevin Rector 04:11, Nov 25, 2004 (UTC)
Ram-Man, you've severely under-stated the load issue from bots. It is not a trivial portion of the load at the limiting rates given in the current policy. The site currently sees about 25 write "queries" per second average, perhaps twice that at peak times; about 250,000 edits/moves/whatever per week total for the English language Wikipedia. The limiting write rate, with no reads at all, is perhaps 100-150 write queries per second for the main service database servers (depends on the operation, edit saves are more costly than many). The limiting write rate for one of the backup slaves is about 25-35 writes per second and for another a bit higher. One edit save involves anywhere from about 6 to thousands of write queries (relatively few are that costly) immediately and perhaps 5-10 later when the first reader comes along. Call it about 8 per save on average for immediate effect. One bot at the current limit of 6 per minute for 8 hours a day can do 20,160 edits a week, about 10% of the total for the English language. Odds are that those will be happening at the busiest times for the site and will have a greater negative effect than the count implies because of their timing. Each of those changes also flushes the changed page from the site caches, a significant apache web server and squid cache server load factor.
You'll need to find a better way to do this, one which, at a minimum, can be run at off peak times. Jamesday 04:51, 13 Dec 2004 (UTC)
About 250,000 changes per week for en wikipedia. 20,160 per week for one bot running 8 hours a day at the current policy limit of 6 per minute is 8% of that. Run the bot 24 hours a day and it's 24% for one bot. Run 8 hours a day at 2 seconds per edit and it's 100,800 or 40% of the weekly edit count. At present there are 19 accounts with the bot flag and there were 234,343 operations in the preceding 7 days. The actual number of those edits performed by bot-flagged accounts was 10,524 or 4.5%, broken down as follows:
+----------------+------------+
| count(rc_user) | user_name  |
+----------------+------------+
|           1634 | Rambot     |
|              9 | Robbot     |
|            279 | Guanabot   |
|           8065 | CanisRufus |
|            121 | Janna      |
|            416 | Pearle     |
+----------------+------------+
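The percentages quoted in this comment can be reproduced with a short calculation. This is just a check of the arithmetic using the figures given above; the function names are made up for the example.

```python
WEEKLY_EDITS = 250_000  # approximate en wikipedia total quoted above


def weekly_bot_edits(seconds_per_edit, hours_per_day, days=7):
    """Edits one bot makes in a week at a fixed interval between edits."""
    return int(hours_per_day * 3600 / seconds_per_edit) * days


def share(edits, total=WEEKLY_EDITS):
    """Percentage of the total represented by `edits`."""
    return 100 * edits / total


# 6 edits per minute is one edit every 10 seconds
for label, interval, hours in [("6/min, 8 h/day", 10, 8),
                               ("6/min, 24 h/day", 10, 24),
                               ("1 per 2 s, 8 h/day", 2, 8)]:
    edits = weekly_bot_edits(interval, hours)
    print(f"{label}: {edits} edits, {share(edits):.0f}% of weekly total")

# Bot-flagged edits from the table above, against 234,343 operations
bot_counts = {"Rambot": 1634, "Robbot": 9, "Guanabot": 279,
              "CanisRufus": 8065, "Janna": 121, "Pearle": 416}
total_bot_edits = sum(bot_counts.values())
print(total_bot_edits, f"{share(total_bot_edits, 234_343):.1f}%")
```

The three scenarios come out at 20,160 (8%), 60,480 (24%) and 100,800 (40%) edits per week, and the table sums to 10,524, or about 4.5% of the 234,343 operations, matching the figures in the comment.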
For about 10 hours a day the systems are operating within about 15% of the highest load typically seen on any given day (based on the Squid stats linked from Meta:Wikimedia servers). It seems unlikely that a bot running at a significant rate when the system is within 15% of peak Monday load (the busiest day, typically) isn't doing harm. Load decreases steadily during the week and by Friday the load has dropped substantially. Saturday is generally pretty quiet, an ideal day for bots to run - peaks may be as low as 700 requests per second, while without Apache CPU limiting slowing things down Monday can be peaking at over 1,100 requests per second. Of those requests, about 78% are served from Squid cache. How did the bot operators do at avoiding peak times? Here are the figures for Monday 1300-2300, the times when the system was within 15% of max load:
+----------------+-----------+
| count(rc_user) | user_name |
+----------------+-----------+
|              5 | Janna     |
|             27 | Pearle    |
+----------------+-----------+
And for the same hours across the whole week:
+----------------+------------+
| count(rc_user) | user_name  |
+----------------+------------+
|            159 | Rambot     |
|              1 | Robbot     |
|              8 | Guanabot   |
|           1214 | CanisRufus |
|             30 | Pearle     |
+----------------+------------+
Through chance or design the bot operators did avoid the worst response time period on the busiest day of the week but the bots in use didn't avoid the busiest times on the rest of the week.
One well and visibly monitored factor is apache web server CPU load. The ganglia stats linked from the servers page will tell you the percentage of CPU use on those servers. If that CPU use is 90%, response time is very significantly affected; the "buy more" threshold is unofficially set at 60% at peak times and the max comfort level is about 85% for load which can be shed easily. As you can tell (if the charts are up again after last night's work), now is usually a bad time to be running bots.
The 25-35 updates per second database slave is no longer a factor. It's now the primary upload/download server for the site. New slowest is about 40-60 per second, to a pair of 250GB 7200RPM SATA drives in RAID 0 with about 300MB of RAM allocated to database duty.
Avoiding the times I've mentioned here is a good way not to be noticed. Jamesday 13:12, 14 Dec 2004 (UTC)
So it sounds like the concern that sequential edits with no delay will cause excessive server load has been addressed with the fact that our current hardware makes this kind of load negligible. Concerns about runaway bots making bad or conflicting changes can be accommodated with a simple rule and the ability to call out a "revert bot" after community approval, respectively.
Therefore, I propose changing the policy from:
to:
If there are no objections by the beginning of 11 Dec 2004, let's declare the above resolution as policy. If we don't have complete consensus, then we can start tallying up votes one way or the other, and consider amendments as needed. -- Beland 04:18, 6 Dec 2004 (UTC)
Debate extended to the beginning of 18 Dec 2004 to allow people from the village pump to wander over. (Posting a note there now.) -- Beland 00:04, 13 Dec 2004 (UTC)
I am proposing a 'sandbot', a simple robot which would purge the contents of the sandbox automatically every six hours. It could also be used to re-paste the sandbox header code back into the 'box if it gets deleted by accident or whatever. - Litefantastic 15:51, 11 Nov 2004 (UTC)
Agree with Ram-Man about using idle time in addition to the six hour limit. I'd even support up to every hour I suppose. I've noticed the sandbox message isn't readded very often any more. I thought we had abandoned it. anthony 警告 17:42, 11 Nov 2004 (UTC)
Why not delete certain rude words (i.e. f***, n*****) too? Bart133 18:33, 15 Jan 2005 (UTC)
I would like to run pywikipediabot's interwiki.py (unmodified code) manually as Tkbot, for the purpose of cleaning up interwiki links on articles I've created or edited. -- Tkinias 04:49, 5 Dec 2004 (UTC)
I've been thinking about a way of creating some "missing" redirects and disambig pages. There's some info about it here, but the short version is that people had to go out of their way to link a bit of text on something other than just the straight text itself - i.e. the link target and the link label differ, and the label has no page currently, but the target does. Using that information, we can add redirects (where all the links agree), or disambiguation pages (where they don't).
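The rule described above (a redirect where every link agrees on the target, a disambiguation page where the links disagree) can be sketched roughly as follows. This is a hypothetical illustration, not the actual script: `suggest_pages`, its arguments and the sample titles are all assumptions, and title normalisation such as first-letter case is ignored.

```python
from collections import defaultdict


def suggest_pages(links, existing_pages):
    """Suggest redirects and disambiguation pages from piped links.

    links: iterable of (target, label) pairs, as in [[target|label]].
    existing_pages: set of titles that already exist.
    """
    targets_by_label = defaultdict(set)
    for target, label in links:
        # Only keep links whose label differs from the target, where the
        # target exists and the label has no page of its own yet.
        if label != target and target in existing_pages and label not in existing_pages:
            targets_by_label[label].add(target)

    redirects, disambigs = {}, {}
    for label, targets in targets_by_label.items():
        if len(targets) == 1:
            redirects[label] = next(iter(targets))  # all links agree
        else:
            disambigs[label] = sorted(targets)      # links disagree
    return redirects, disambigs


# Made-up sample data, purely for illustration.
sample_links = [
    ("Los Angeles, California", "LA"),
    ("Louisiana", "LA"),
    ("United Kingdom", "Britain"),
    ("United Kingdom", "Britain"),
    ("Paris", "Paris"),
]
sample_existing = {"Los Angeles, California", "Louisiana", "United Kingdom", "Paris"}
redirects, disambigs = suggest_pages(sample_links, sample_existing)
```

In this sample, "Britain" would become a redirect suggestion and "LA" a disambiguation suggestion; either list would still want human review before anything is created.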
Using this method, I've made example lists of "missing" redirects and disambiguation pages:
Redirects:
Disambig:
Notes:
Now, the question I had is this: Do you think auto-creating redirects like this is a good idea? and Do you think auto-creating the disambiguation pages like this is a good idea? Would it help? If the consensus is that it's a bad idea, then I'll drop it, but if the consensus is that it's a good idea, then I'll make a simple bot that does this.
Also to give you a ballpark idea of the amount of new pages that we're talking about, in this dataset it's:
All the best, -- Nickj 09:05, 2 Dec 2004 (UTC)
Thank you for your comments. I was starting to wonder whether anyone was going to reply! On those inappropriate suggestions:
All the best, -- Nickj
But wouldn't this Wikipedia naming convention mean plural redirects are OK? All the best, -- Nickj 04:03, 4 Dec 2004 (UTC)
It certainly seems like a good idea to me to suggest redirects/disambigs for humans to inspect. (And certainly once there's an approved list, it's much faster if a bot comes along and does the heavy clicking.) It's not unlike what I've been doing with Wikipedia:Auto-categorization and Category:Orphaned categories. This sort of prep work seems to make human editors more enthusiastic and also saves a lot of time once someone does actually get around to completing these tasks. With the latter, I sort by frequency and then alphabetically, which may or may not be helpful for your lists. I was thinking that supplying a snippet of context around the broken links might be helpful. If there are like 50 pages that point to the same place, it would be a bit cumbersome to show 50 lines just for that one page, but for 5-10, if context were included it might be a lot faster to verify that all of the articles were, in fact, referring to the same entity. It would make the pages longer, though. -- Beland 07:21, 4 Dec 2004 (UTC)
This is not really the type of bot that you're worried about, but to be virtuous I'll list it here: I'd like to automate the once-per-day update of the WikiReader Cryptography Article of the Day box (an editor-oriented template not used in main article space), which is currently being uploaded by hand. The script would run once every 24 hours under User:Matt Crypto (bot). — Matt Crypto 14:44, 14 Dec 2004 (UTC)
Since today is Friday and tomorrow is Saturday (typical low server load times) and no objections have been stated to the adding of UN/LOCODEs and adding the map templates, I plan to start immediately assuming I can get the rambot programmed in time, rather than wait until next weekend for the next "downtime". It would be nice to have a category which lists those cities that have LOCODEs attached to them, but that would be a category with a few thousand entries. Should I just hold off on this? They can always be added later, I suppose, if some better system is worked out. Update: See here for an example of some of the current work being done. – Ram-Man (comment) (talk) 20:58, Dec 17, 2004 (UTC)
I'd like to request permission to run a bot, User:DanBot, using pywikipediabot, whose primary purpose will be to make corrections to the various highly similar articles containing Formula One statistics. First, it will convert a bunch of navigation tables (such as the one at the bottom of 1990 Australian Grand Prix) to templates (such as the one at the bottom of 2004 Monaco Grand Prix). The bot's other tasks will be similar tweaks to this limited body of articles. -- Dan | Talk 04:25, 20 Dec 2004 (UTC)
I plan on creating a bot to edit my userpage and update my personal statistics once a day during offpeak hours. I may work on other projects too at some point; for instance, I have an idea for helping get media files from the various wikis to commons. However, I will test these on the test wiki first, and post my intentions here before I do anything different with the bot. -- マイケル ₪ 14:46, Dec 21, 2004 (UTC)
Some counts: There are 32'010 links to Hispanic, 33'905 to Asia and already 4869 to Race (U.S. census). At the rate of 6 per minute, one can edit approx. 8640 articles per day. If I recall correctly, there are at least 25'000 references that should be changed. -- User:Docu
That's a good point about using redirects. I like it. That's what we should do. Also, I've finished testing my bot, and it seems to be working really well. I'm going to run it on 10 articles and see how it fares. I'll post the list of articles edited on User:KevinBot. That way we can check them to make sure there isn't anything catastrophic that needs to be repaired before we mark it as a bot and let it loose. Kevin Rector 20:26, Jul 30, 2004 (UTC)
Ok, the bot's run a bit on a limited basis to see how well it works and it's working like a charm. So whoever it is that can mark bots, please mark User:KevinBot as a bot. Thanks. Kevin Rector 02:47, Aug 3, 2004 (UTC)
KevinBot is now marked as a bot. Angela . 22:18, Aug 4, 2004 (UTC)
Sauðkindin (The Jumbuch) has been running as the interwiki bot on is: for some time now. As of July 28, 2004 it has accumulated 93314 interwiki links in 6093 articles that need to be updated, thereof 1505 in 205 articles on the English Wikipedia.
What I want is permission to run the following command, say every two weeks, on the English Wikipedia:
python interwiki.py -warnfile:warnfile_en.log
This will check whether the interwiki link to is: should be updated and, if so, proceed to do so. This will be very low traffic; it's only so large now because there has previously not been any interwiki bot running on is:, and if the links on en: are updated they will also be shared around the rest of the languages. -- Ævar Arnfjörð Bjarmason 04:33, 2004 Jul 28 (UTC)
I'd like to write a bot to import the FCC's list of US broadcast stations including FM, AM, and Television. Nothing's been written yet but I wanted to make sure this was okay to do before I bothered to do any work. If someone else wants to do it, that's great. I'm willing to do it but I don't want to duplicate effort. Not sure whether I'd start from scratch or use the python bot.
Posted a request on the pump yesterday but I didn't get any replies, so I thought I'd bring it here. Following is my comment from the pump. Rhobite 15:43, Jul 29, 2004 (UTC)
I would like to run the disambiguation bot from pywikipediabot. It is possible I would later run other bots, almost certainly user controlled bots. For instance I think it would be great if there was a bot that could easily be used for categorizing groups of articles, something I already do; the bot would just speed things up. (It looks like replace.py might satisfy this, but for the time being I am mainly interested in disambiguation).
I am a bit unfamiliar with the bot regulation system. I know for major, non-interactive bots, there is a period where it is expected to be run at a very reduced rate, but it seems that at least disambiguation bots don't receive nearly the scrutiny as others. However, I would be grateful if someone explained the proper guidelines, so that I don't cause trouble and frustration to others. I will be running the bot at User:BenjBot and have basically figured out all the scripts and am really just waiting for the go ahead.
Thanks -- Benjamin Goldenberg 06:33, 5 Aug 2004 (UTC)
Bots must not make modifications to comments signed by individuals. It would even be better to remove the comment entirely than to attribute text to an individual that they did not create. Other than that, changes on discussion pages can destroy discussions that are premised, for instance, on the peculiarities of linking or disambiguation pages, etc. Please make this change to the Project Page. - Centrx 21:15, 5 Aug 2004 (UTC)
Hi. I'm interested in creating a 'bot for detecting link rot. As I see it at present, I'd get the bot to download a random page once per suitable time period. The bot would then extract external links from the page, and check the pages pointed to by these links to see if they are still there. Links which remain inaccessible for (say) a number of days would then be listed on a web page. Humans could then occasionally check a page (on my server) to find a list of dead links, and the wikipedia pages that they're on, and could go and have a look.
Comments? If I did this, I would write the program myself and host the bot here (University of Westminster, UK).
Ross-c 15:29, 17 Aug 2004 (UTC)
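The proposal above (extract external links, re-check them over time, and report only those that stay unreachable for a number of days) might be sketched as below. This is a minimal illustration under stated assumptions, not the proposed bot: the URL pattern is deliberately simple, `LinkRotTracker` and the 7-day default are invented for the example, and the actual HTTP check is left to the caller so that `reachable` is just a boolean passed in.

```python
import re
from datetime import date, timedelta

# Deliberately simple pattern for bare http(s) URLs in wikitext (an assumption;
# a real bot would need a more careful parser).
URL_RE = re.compile(r"https?://[^\s\]\[<>\"|]+")


def extract_external_links(wikitext):
    """Return the distinct external URLs found in a page's wikitext."""
    return sorted({url.rstrip(".,;:") for url in URL_RE.findall(wikitext)})


class LinkRotTracker:
    """Remember when each link first failed; report links dead for N days."""

    def __init__(self, days_before_report=7):
        self.grace = timedelta(days=days_before_report)
        self.first_failure = {}  # (page, url) -> date of first failed check

    def record(self, page, url, reachable, today):
        key = (page, url)
        if reachable:
            self.first_failure.pop(key, None)  # link recovered; forget it
        else:
            self.first_failure.setdefault(key, today)  # keep the earliest date

    def dead_links(self, today):
        """(page, url) pairs that have been failing for the whole grace period."""
        return sorted(key for key, since in self.first_failure.items()
                      if today - since >= self.grace)
```

Keeping only the date of the first failure means a link that comes back even once is cleared, which matches the idea of listing links that "remain inaccessible" rather than ones with a single bad day.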
Does anyone have a link to the discussion of this bot? I couldn't find it. anthony (see warning) 03:09, 30 Aug 2004 (UTC)
Information on what Janna is currently doing will be kept on User:Janna. anthony (see warning)
It appears that a bot from IP 209.90.162.1 is making numerous incorrect changes to the encyclopedia. The kind of change I have noticed is linking instances of "chemical" to "chemical compound" indiscriminately. In many, nay MOST, cases, these changes are patently false and it is becoming a pain for me to go through and revert them. - Centrx 23:23, 16 Sep 2004 (UTC)
I would like to get permission to run a user-controlled pywikipediabot to make the spelling of science and chemistry articles consistent with the IUPAC nomenclature rules. It would only change articles I specifically told it to, and would only be making changes that I would make anyways.
I have registered the account Darbot for this task should my request be approved.
Darrien 05:27, 2004 Sep 17 (UTC)
As a chemist, I think this is a good idea. We're an encyclopedia of chemistry, we should use the proper names for chemicals, compounds, ions, etc. Gentgeen 07:19, 25 Sep 2004 (UTC)
I object. IUPAC doesn't determine our spelling. anthony (see warning) 21:12, 25 Sep 2004 (UTC)
This is a good idea, and seems like it will be run in a sensible manner. (I would also not want to see strange IUPAC names as article titles of chemicals everyone knows by a more common name). I might even go a step further though and say the IUPAC name should be somewhere in articles with common names, and they often are. See Caffeine. I don't know what possible objection there would be to changing the names from a random hodge-podge of spelling and previous deprecated standards to the current international standard which is used and accepted by chemists around the world. If someone is willing to do this major undertaking I think we should be appreciative. - [[User:Cohesion|cohesion ☎]] 19:11, Sep 26, 2004 (UTC)
Sounds good to me. Thue | talk 19:49, 26 Sep 2004 (UTC)
Makes a lot of sense to me. The changeover from ferrous & ferric to iron (II) & iron (III), etc., started at least 30 years ago, and has nothing to do with American vs British spelling. The IUPAC system is consistent and easier to understand than the historical accidents it replaces. I'm quite puzzled by the opposition to the proposed name changes. Wile E. Heresiarch 03:10, 27 Sep 2004 (UTC)
This would be helpful to the Wikipedia as a whole. I'm for it. -- 131.91.238.38 00:09, 9 Oct 2004 (UTC)
I am strongly opposed. See Talk:Global warming and Wikipedia talk:Manual of Style ( William M. Connolley 17:31, 14 Oct 2004 (UTC)). Sulphate should not be replaced with sulfate, or any other americanisations, outside the chemistry articles.
(note: the following including the alteration of Dysprosia's comment, was added by Mr. Jones. comment replaced above by sannse (talk) 09:57, 5 Nov 2004 (UTC) I'll pop my name by my interpolation too. Mr. Jones 21:21, 5 Nov 2004 (UTC))
(note: end - sannse (talk) 09:57, 5 Nov 2004 (UTC))
I am going to start testing this bot soon, it's sat in discussion long enough. General community consensus is that it's a good idea and most of the objections have been from people who falsely believe that the purpose of this bot is to "Americanize" articles. Darrien 10:12, 2004 Nov 5 (UTC)
I've found two recent cases in which, after a week of no comments at all on their requests, people began running bots. Both of these bots have caused some amount of controversy. (In one case, because people listed objections before the bot was run, but after a week had passed. In the other because the bot started making bad edits.) What do people think of amending the rules to make clear that someone, at least, needs to say "OK, run it." before a bot is run? Snowspinner 20:35, Sep 26, 2004 (UTC)
This bot was requested on June 27th by User:Docu. The request was not particularly detailed - it only noted that it wanted to run the pywikibot. No one commented on it one way or another. After a week, it was added to the list of bots, where its intended purposes were finally declared.
The bot is currently being used to categorize a bunch of biographical articles by year of birth and death. These categorizations do not have consensus, are against the policy at Wikipedia:Categories, lists, and series boxes, and people are finding errors in them.
For now, I'm blocking the bot as unapproved to do what it is doing and as messing up articles, but I wanted to start a discussion here about the bot so that it can, hopefully, get reapproved with some description of what it's actually supposed to be doing. Snowspinner 20:35, Sep 26, 2004 (UTC)
That a number of people are still complaining about the categories makes me wary of their expansion. As I've said, they don't seem to fit in with Wikipedia:Categories, lists, and series boxes. But that's OK - it's just that the bot's approval was more than a little vague. Snowspinner 21:39, Sep 26, 2004 (UTC)
A bit of care with this bot please - it added [[Category:1981 births]] to Owen Hargreaves over 3 days after this article had a copyvio notice slapped on it. Articles with copyvio notices should not be edited at all. -- Arwel 14:28, 2 Oct 2004 (UTC)
I would like to run the Pywikipediabot as Snowbot. The only purpose of this bot will be to handle templates for deletion. As it stands, if a widely used template gets deleted, I have to remove it manually from pages, which can take upwards of an hour. Snowspinner 21:49, Sep 26, 2004 (UTC)
I withdraw the request, as I'm going on an indefinite wikibreak due to the continual harassment of Netoholic and orthogonal. Snowspinner 01:05, Sep 27, 2004 (UTC)
I object to this bot. Widely used templates should be hard to delete. The only time I could see a use for automated destruction of a template is when the template was created through an automated means. It shouldn't be easier to destroy than to create. I am also hesitant about letting Snowspinner run this bot unilaterally. Taking a look at templates for deletion he seems to get into heated arguments in favor of deleting certain templates, and I don't trust him to understand when lack of dissent comes from the fact that not many people are aware of the discussion. A template which is widely used and was not created through automated means should be strongly presumed to not have a consensus for removal, despite the fact that no one came to its rescue on TfD (which unlike VfD is not very well advertised). anthony (see warning)
I would like to run pywikipediabot with User:Topjabot for various tasks: solving disambiguations, copying images to Commons and changing tables to wiki-syntax. I won't use fully-automated scripts, so there is no risk of some sort of malfunction leading to massive damage. Gerritholl 16:34, 30 Sep 2004 (UTC)
Is the module category.py [14] of pywikipediabot suitable for use on Wikipedia? Can it be run in automatic mode if it uses a reasonable list of articles to add specific categories?
A bug currently causes it to occasionally add a duplicate category if there is an existing category with a different (generally incomplete) sortkey. If the list of articles is filtered with the last available version of en_categorylinks_table.sql.gz, I would estimate these cases at less than 0.2%.
If the module is used in manual mode (confirmation of each addition), I assume it's not considered a bot subject to registration, even if this is likely to increase the effective number of categories added during the time it's used. -- User:Docu
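The duplicate-category problem mentioned above comes down to treating `[[Category:X]]` and `[[Category:X|sortkey]]` as the same membership. The sketch below shows one way to check for that before adding a category; it is not the code of category.py, and `has_category` and `add_category` are hypothetical helpers written for this example.

```python
import re


def has_category(wikitext, category):
    """True if the page is already in `category`, whatever its sortkey."""
    pattern = re.compile(
        r"\[\[\s*[Cc]ategory\s*:\s*" + re.escape(category) + r"\s*(\|[^\]]*)?\]\]"
    )
    return pattern.search(wikitext) is not None


def add_category(wikitext, category, sortkey=None):
    """Append the category link unless one already exists (with any sortkey)."""
    if has_category(wikitext, category):
        return wikitext
    if sortkey:
        link = f"[[Category:{category}|{sortkey}]]"
    else:
        link = f"[[Category:{category}]]"
    return wikitext.rstrip("\n") + "\n" + link + "\n"
```

A page that already carries `[[Category:1981 births|Hargreaves, Owen]]` would be left unchanged by `add_category(page, "1981 births", "Hargreaves")`, rather than gaining a second link with the shorter sortkey.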
I have done a bot that checks consistency of language links in all languages. WWW user interface is in [15]. It does not change anything automatically and it just goes through all the referred pages once. I'd like to get permission to keep this service in my web page (or in any other site like in Wikipedia - it is free). It is standalone Python code. More information behind the link. -- User:Etu
Nickj is seeking the approval of Wikipedia talk:Bots for a small semi-automated trial run of the link suggester on 100 or less pages. Please see this page for a detailed description of what this script is, and more info what it will do and what it will not do. -- Nickj 07:45, 18 Oct 2004 (UTC)
LinkBot is now marked as a bot following a request at m:Requests for permissions. Angela . 23:58, Nov 24, 2004 (UTC)
The LinkBot has just uploaded suggestions to exactly 100 pages - you can get a full list of those here. I'd like to do a further trial run tomorrow - would 1,000 pages be acceptable? All the best, -- Nickj 11:53, 1 Dec 2004 (UTC)
Is there, or could there be written a bot to fix brackets, see User:Nickj/Wiki_Syntax/Index -- it would probably need a human to click yes/no to fix it, since some might not be wrong. Dunc| ☺ 12:35, 3 Nov 2004 (UTC)
For what it's worth, I plan to resume the rambot's tasks sometime possibly in the next couple of days. This is not a departure from previous actions, but I thought I should at least mention it for those who care. See "rambot" for some of the things that will be performed. The tasks represent requests for changes that are months and months overdue. -- Ram-Man 13:00, Nov 8, 2004 (UTC)
Once the rambot is unblocked and the discussions on server load below are fully and completely hashed out, I plan to run a bunch of various tasks. In terms of cities and counties, I plan to add UN/LOCODEs to the cities that have them. This will include setting up shortcuts in bulk to facilitate easy usage (e.g. UN/LOCODE:USLAX for Los Angeles, California). Simultaneously with this I will be adding Template:Mapit-US-cityscale templates to the external links section of every city that has GPS coordinates listed. This is thousands of cities. This will add automatic links to street maps, satellite photos, and a topographical map of the location in question, a terribly useful thing to have. (See the example in Cleveland, Ohio). In addition to the tasks above, I also plan to add a short two or three sentence request to the talk page of every user who has not been asked about multi-licensing. This latter option may not happen depending on whether or not I can get developer help to do it directly with the database to eliminate server load and the need to use a bot. See User:rambot for any additional information (as always). – Ram-Man (comment) (talk) 13:29, Dec 16, 2004 (UTC)
Well I am placing this message here because of previous requests to make my intentions known on this page. I am asking for explicit permission. Discuss and vote away, although I fear that only those who are opposed to my plan are actually paying any attention to me. If we do vote here, we should count all of those users that have already given their support to me, since spamming them and asking for them to vote again would hardly be appropriate. – Ram-Man (comment) (talk) 17:26, Dec 16, 2004 (UTC)
One word: Approval. More words: Read this page (and others too), as it is common practice to vote on bot proposals. – Ram-Man (comment) (talk) 18:06, Dec 16, 2004 (UTC)
This entire bot policy is mostly the design of a very small group of people. The limitation on how fast a bot operates was discussed on IRC, and therefore we have no way of knowing who or what came up with it. IIRC, I was the one who originally added the 3-point policy, which was later turned into a 4-point policy when "approved" was added. Everyone has since almost religiously followed both forms of the policy, and it has worked quite well as general guidelines. The problem with the other specific rules is that they are quite inflexible and not strictly followed by bot owners, particularly the speed restrictions. Part of this is because we generally trust a bot after it has proven itself, but a lot has to do with the relatively small number of people who frequent this page. It is quite hard to get an adequate consensus, as it is often biased either in favor of bots or not in favor, depending. – Ram-Man (comment) (talk) 21:59, Dec 16, 2004 (UTC)
I need to clarify what I want to do with the rambot. After using the rambot to do reads and discover which users have edited rambot articles, I then plan to take that list and ask all of the people on the list that I have not already asked. To do so, I would use the bot to add a short message which would link to a page with more detailed information. I would do somewhat small batches of users at a time, something like 50-100 users, and then wait for responses. The reason for this is that if I go too fast, not only does it look like mass-quick-spam, but it also causes too many users to respond at once. When users have stopped responding and their questions have been answered, I will do another batch of users, as appropriate, until the task is completed. This behavior will be significantly different from the previous action of doing about 1,000 people in a very short time with a very large spam-like message. The main purpose is to save me the time from having to manually open their talk page, click on "Post a comment", and finally copy and paste the message before saving it. I figure it will save me at least 10-30 minutes per batch so I can do other things at the same time. There has been the suggestion to try and get help from a database administrator to help alleviate any potential strain on the server. This would probably require adding the message to all of the user accounts at once, but would not require using the bot. If anyone would rather have this option, mention it. If neither option is acceptable, I will manually perform the action. Does anyone have any objections to this new behavior, and if so which category do they fit in: (1) Problems with posting a short message on multi-licensing? (2) Problems with a bot posting messages on user talk pages? (3) Other concerns? – Ram-Man (comment) (talk) 19:14, Dec 16, 2004 (UTC)
Having recently started the rambot, it comes as little surprise that someone is unhappy. I've gotten a complaint from User:Docu stating that I have been violating the 6 transactions per minute rule specified on the Bots page, which is true. As requested, I am bringing it up here so that the "rule" might be "changed". For the record, the bot does not violate any of harmless, server hog, useful, or approved. Since the transaction rule was at best discussed based on IRC discussions that are not logged, I have no way of knowing the precedent for it. So I will try my best. There originally were three compelling reasons to limit bot transactions: 1) Server Load, 2) the Recent Changes was too cluttered otherwise, and 3) to allow time for other users to verify the changes made by the bot. #1 never existed as the bot's edits represent a tiny fraction of the total server load. #2 was fixed by the implementation of the bot flag which was implemented BECAUSE of the rambot. #3 applies to those bots with small data sets or those data sets that vary greatly. The rambot's data set is so large that adding delay makes for an unmanageable amount of time to complete even very simple tasks. When I move to the cities (from counties), the amount of time will increase by more than 10 times. What I am saying is that no one is going to check 2,000 let alone 35,000 articles, so #3 is not compelling. What does not change is that people can randomly sample the data (as I do when constantly monitoring the bot run). And the user's contributions can easily be checked. If there was a lot of variability in the changes, then there MAY be a reason to check more of the entries, but still no reason to slow down. Wikipedia is not about strict rules but an ever-changing environment that adapts. In fact, aside from NPOV, there are not many hard and fast rules at all. I will make all efforts to enforce accuracy, but what we're talking about is the difference between a week of editing vs. 3 weeks of editing.
Maybe 2 weeks of time is meaningless to a lot of people, but not to me. If it makes everyone feel better, I can run three bots from three different IP addresses on three different data sets, and that would technically not violate any of the rules. The point being that the rule makes no sense out of context. I'm not even suggesting that we change the guideline. As a general guideline it makes perfect sense, but as we mention at the start of Wikiprojects, these things are what a group of users got together to work on, and they are not hard and fast rules. -- Ram-Man 03:30, Nov 10, 2004 (UTC)
My only concern with changing the rule, and this doesn't really apply to Rambot, is that going over 6 transactions per minute makes your changes really hard to revert. Thus we start getting into a technocracy where whoever has the faster bot wins. Yes, there is an approval process for bots, but there really isn't that much interest in it. I'd favor approving an increase in the speed limit for rambot, for this particular run. I'd also favor allowing others to receive an increase for a specific run which is extremely well defined. But I think this should require a specific proposal which is approved by at least say 10 people and after the bot has already run in slow mode for a day or so. As for the speed, 120/minute sounds a little high. I'd want some input from a few developers before going over 60/minute or so (which I believe is the read-only speed limit in the robots.txt file). The details could be worked out, but that's my suggestion rather than eliminating the rule. anthony 警告 13:27, 11 Nov 2004 (UTC)
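For readers unfamiliar with how a per-minute limit such as the 6 transactions per minute rule is usually enforced inside a bot, here is a minimal throttle sketch. It is not taken from rambot or pywikipediabot; the `Throttle` class and its parameters are assumptions made for illustration, and the clock and sleep functions are injectable only so the behaviour can be checked without actually waiting.

```python
import time


class Throttle:
    """Space out operations so that at most `per_minute` happen each minute."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute  # seconds between operations
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # no operation performed yet

    def wait(self):
        """Block until the next operation is allowed, then reserve the slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

A bot loop would call `throttle.wait()` before each save; `Throttle(6)` spaces saves 10 seconds apart, while `Throttle(60)` corresponds to the one-per-second figure mentioned above.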
BTW, looking at this page I don't think D6 is yet a good candidate for having the speed limit raised. anthony 警告 13:33, 11 Nov 2004 (UTC)
Two more comments on rambot. I'm hesitant about adding "WikiProject boilerplating to the talk page of the state's cities." I don't think it's useful, and I think it's harmful, as it suggests that there is discussion on the talk page when it's really just a spam link to someone's project. Secondly, I'd like to see the details of the "automatic disambiguation". This is a very hard thing to do right. anthony 警告 13:36, 11 Nov 2004 (UTC)
With regard to the bot speed limit, the best solution to hundreds or thousands of bad bot edits is to do another bot run to clean up. In the very worst case, a generic "revert bot" can just undo all the bad bot's recent edits. With my list of 50,000 changes, what I did was run the first dozen or so, wait a day, check for comments, and only when everything was settled, let it run unattended. It has taken days or weeks for people to find some systematic errors, which I will be fixing myself with some cleanup runs. I don't think the speed limit really helped much. It just means that I have to check my bot every day to make sure it's still running, which is time I could be spending readying subsequent runs or editing articles. Sometimes it auto-detects systematic errors, and the speed limit actually creates a delay in my noticing that. Articles also get edited during the course of the run, which can cause some inconsistencies, above and beyond the fact that a long run leaves some articles one way and others a different way, for a longer period of time than a short run. It's also something of a waste of time for human editors to be running after a bot to fix bad edits as they happen, when bad edits could be reverted en masse automatically. So I don't think having humans be able to keep up with the actual editing process is necessarily a good reason for a speed limit. If something has gone wrong, it should be just as easy to deal with after the fact as while it's in progress.
Perhaps an official, community-approved "revert bot", which could be deployed on short notice, would make sure that's the case.
In short, I think restricting bots to sequential edits is sufficient, as long as the number of people running bots (and the number of bots per person) is small compared to the Wikipedia population (or, more directly, server capacity). My bot automatically stops editing if Wikipedia takes too long to respond, on the general theory that load is probably too high, or that something else has gone wrong. But if even immediate sequential edits have a negligible performance impact, raising the speed limit will probably improve human productivity. Bot authors need to supervise their bots less, and human editors will not be making changes that a bot was going to get around to anyway, or that a slow-running bot will later come by and obsolete.
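The "sequential edits, stop when the site is slow" behaviour described above could look roughly like the following. This is only a minimal sketch and not the code of any bot mentioned here: `run_sequential`, `save_edit`, the 10-second threshold, and the injectable `clock` are all hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, Iterable


def run_sequential(edits: Iterable[str],
                   save_edit: Callable[[str], None],
                   max_response_seconds: float = 10.0,
                   clock: Callable[[], float] = time.monotonic) -> int:
    """Apply edits one at a time and stop as soon as one save is too slow.

    save_edit is a hypothetical callable that blocks until the server
    has answered. Returns the number of edits saved, the slow one included.
    """
    saved = 0
    for title in edits:
        started = clock()
        save_edit(title)              # never more than one request in flight
        elapsed = clock() - started
        saved += 1
        if elapsed > max_response_seconds:
            # A slow answer is treated as a sign of high load or a fault,
            # so the run halts and waits for its operator.
            break
    return saved
```

Because each save must finish before the next begins, the bot's rate is bounded by the server's own response time, which is the reasoning behind treating sequential edits as sufficient.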
It should be a stated rule that bot owners shall not knowingly make ongoing, conflicting edits with one another. We have the three-revert rule to prevent humans from operating on the "fastest mouse wins" principle. Given the potentially large number of articles involved, I think there's good reason to have a one-revert rule for bots. The idea is that if a bot owner would like to change or revert what another bot owner has done, they should get community approval in the appropriate forum. Which they really should be getting anyway, but perhaps an explicit rule would make people more comfortable. I think this issue is rather orthogonal to the speed limit issue. -- Beland 06:34, 17 Nov 2004 (UTC)
I have stayed out of this conversation for the most part. However, I would like to note that I agree with User:Ram-Man and User:Beland. The reality is that there are actually very few of us running actual bots. I think that for the most part we respect the other people and there is little chance of a turf fight. For instance, I was thinking about doing another run of User:KevinBot on the Rambot articles, but when I learned that the Rambot had been dusted off, I shelved the idea. I would also like to note that if a bot does make mistakes, I agree that a subsequent bot run can clean it up, and that for the most part the bot authors are fairly upstanding types and will clean up any unintended messes. I also agree that the throttle rule for bots is obsolete and should be done away with unless it can be proven that the bots are slowing the servers down by more than a negligible amount. Kevin Rector 04:11, Nov 25, 2004 (UTC)
Ram-Man, you've severely under-stated the load issue from bots. It is not a trivial portion of the load at the limiting rates given in the current policy. The site currently sees about 25 write "queries" per second average, perhaps twice that at peak times; about 250,000 edits/moves/whatever per week total for the English language Wikipedia. Limiting write rate, no reads at all, is perhaps 100-150 write queries per second for the main service database servers (depends on the operation, edit saves are more costly than many). Limiting write rate for one of the backup slaves is about 25-35 writes per second and for another a bit higher. One edit save involves anywhere from about 6 to thousands of write queries (relatively few are that costly) immediately and perhaps 5-10 later when the first reader comes along. Call it about 8 per save on average for immediate effect. One bot at the current limit of 6 per minute for 8 hours a day can do 20,160 edits, about 10% of the total for the English language. Odds are that those will be happening at the busiest times for the site and will have a greater negative effect than the count implies because of their timing. Each of those changes also flushes the changed page from the site caches, a significant apache web server and squid cache server load factor.
You'll need to find a better way to do this, one which, at a minimum, can be run at off peak times. Jamesday 04:51, 13 Dec 2004 (UTC)
About 250,000 changes per week for en wikipedia. 20,160 per week for one bot running 8 hours a day at the current policy limit of 6 per minute is 8% of that. Run the bot 24 hours a day and it's 24% for one bot. Run 8 hours a day at 2 seconds per edit and it's 100,800 or 40% of the weekly edit count. At present there are 19 accounts with the bot flag and there were 234,343 operations in the preceding 7 days. The actual number of those edits performed by bot-flagged accounts was 10,524 or 4.5%, broken down as follows:
+----------------+------------+
| count(rc_user) | user_name  |
+----------------+------------+
|           1634 | Rambot     |
|              9 | Robbot     |
|            279 | Guanabot   |
|           8065 | CanisRufus |
|            121 | Janna      |
|            416 | Pearle     |
+----------------+------------+
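The percentages above can be rechecked with a few lines of arithmetic. The sketch below uses only the figures quoted in this comment (250,000 edits per week, 234,343 operations in the preceding 7 days, and the per-bot counts); the function and variable names are made up for the illustration.

```python
WEEKLY_EDITS = 250_000  # approximate en wikipedia edits per week, as quoted above


def weekly_bot_edits(seconds_per_edit: float, hours_per_day: float, days: int = 7) -> int:
    """Edits one bot makes in a week at a fixed interval between edits."""
    return int(hours_per_day * 3600 / seconds_per_edit) * days


def share(edits: int, total: int = WEEKLY_EDITS) -> float:
    """Percentage of the total that the given number of edits represents."""
    return 100 * edits / total


# 6 edits per minute is one edit every 10 seconds.
at_limit_8h = weekly_bot_edits(10, 8)
at_limit_24h = weekly_bot_edits(10, 24)
fast_8h = weekly_bot_edits(2, 8)

# Per-bot counts from the table above.
bot_counts = {"Rambot": 1634, "Robbot": 9, "Guanabot": 279,
              "CanisRufus": 8065, "Janna": 121, "Pearle": 416}
bot_total = sum(bot_counts.values())

for label, edits in [("8 h/day at 6/min", at_limit_8h),
                     ("24 h/day at 6/min", at_limit_24h),
                     ("8 h/day at 2 s/edit", fast_8h)]:
    print(f"{label}: {edits} edits, {share(edits):.1f}% of weekly total")
print(f"bot-flagged: {bot_total} edits, {share(bot_total, 234_343):.1f}% of 234,343")
```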
For about 10 hours a day the systems are operating within about 15% of the highest load typically seen on any given day (based on the Squid stats linked from Meta:Wikimedia servers). It seems unlikely that a bot running at a significant rate when the system is within 15% of peak Monday load (the busiest day, typically) isn't doing harm. Load decreases steadily during the week and by Friday the load has dropped substantially. Saturday is generally pretty quiet, an ideal day for bots to run - peaks may be as low as 700 requests per second, while without Apache CPU limiting slowing things down Monday can be peaking at over 1,100 requests per second. Of those requests, about 78% are served from Squid cache. How did the bot operators do at avoiding peak times? Here are the figures for Monday 1300-2300, the times when the system was within 15% of max load:
+----------------+-----------+
| count(rc_user) | user_name |
+----------------+-----------+
|              5 | Janna     |
|             27 | Pearle    |
+----------------+-----------+
And for the whole week:
+----------------+------------+
| count(rc_user) | user_name  |
+----------------+------------+
|            159 | Rambot     |
|              1 | Robbot     |
|              8 | Guanabot   |
|           1214 | CanisRufus |
|             30 | Pearle     |
+----------------+------------+
Through chance or design the bot operators did avoid the worst response time period on the busiest day of the week but the bots in use didn't avoid the busiest times on the rest of the week.
One well and visibly monitored factor is apache web server CPU load. The ganglia stats linked from the servers page will tell you the percentage of CPU use on those servers. If that CPU use is 90%, response time is very significantly affected; the "buy more" threshold is unofficially set at 60% at peak times and max comfort level is about 85% for load which can be shed easily. As you can tell (if the charts are up again after last night's work), now is usually a bad time to be running bots.
The 25-35 updates per second database slave is no longer a factor. It's now the primary upload/download server for the site. New slowest is about 40-60 per second, to a pair of 250GB 7200RPM SATA drives in RAID 0 with about 300MB of RAM allocated to database duty.
Avoiding the times I've mentioned here is a good way not to be noticed. Jamesday 13:12, 14 Dec 2004 (UTC)
So it sounds like the concern that sequential edits with no delay will cause excessive server load has been addressed with the fact that our current hardware makes this kind of load negligible. Concerns about runaway bots making bad or conflicting changes can be accommodated with a simple rule and the ability to call out a "revert bot" after community approval, respectively.
Therefore, I propose changing the policy from:
to:
If there are no objections by the beginning of 11 Dec 2004, let's declare the above resolution as policy. If we don't have complete consensus, then we can start tallying up votes one way or the other, and consider amendments as needed. -- Beland 04:18, 6 Dec 2004 (UTC)
Debate extended to the beginning of 18 Dec 2004 to allow people from the village pump to wander over. (Posting a note there now.) -- Beland 00:04, 13 Dec 2004 (UTC)
I am proposing a 'sandbot', a simple robot which would purge the contents of the sandbox automatically every six hours. It could also be used to re-paste the sandbox header code back into the 'box if it gets deleted by accident or whatever. - Litefantastic 15:51, 11 Nov 2004 (UTC)
Agree with Ram-Man about using idle time in addition to the six hour limit. I'd even support up to every hour I suppose. I've noticed the sandbox message isn't readded very often any more. I thought we had abandoned it. anthony 警告 17:42, 11 Nov 2004 (UTC)
Why not delete certain rude words (i.e. f***, n*****) too? Bart133 18:33, 15 Jan 2005 (UTC)
I would like to run pywikipediabot's interwiki.py (unmodified code) manually as Tkbot, for the purpose of cleaning up interwiki links on articles I've created or edited. — Tkinias 04:49, 5 Dec 2004 (UTC)
I've been thinking about a way of creating some "missing" redirects and disambig pages. There's some info about it here, but the short version is that people had to go out of their way to link a bit of text on something other than just the straight text itself - i.e. the link target and the link label differ, and the label has no page currently, but the target does. Using that information, we can add redirects (where all the links agree), or disambiguation pages (where they don't).
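The rule described above (the label differs from the target, the label has no page, the target does; then a redirect where all links agree and a disambiguation page where they don't) can be sketched as follows. This is an illustrative sketch only, not the script actually used to build the lists; `suggest_pages`, the `(target, label)` pair format, and the sample titles are assumptions.

```python
from collections import defaultdict


def suggest_pages(links, existing_titles):
    """Suggest redirects and disambiguation pages from piped links.

    links: iterable of (target, label) pairs, one per piped link found.
    existing_titles: set of page titles that already exist.
    """
    targets_by_label = defaultdict(set)
    for target, label in links:
        # Keep only links where someone went out of their way: the label
        # differs from the target, has no page yet, and the target exists.
        if target != label and label not in existing_titles and target in existing_titles:
            targets_by_label[label].add(target)

    redirects, disambigs = {}, {}
    for label, targets in targets_by_label.items():
        if len(targets) == 1:
            redirects[label] = next(iter(targets))   # all links agree
        else:
            disambigs[label] = sorted(targets)       # links disagree
    return redirects, disambigs


# Hypothetical sample data, for illustration only.
existing = {"United Kingdom", "Mercury (planet)", "Mercury (element)", "Paris", "France"}
sample_links = [
    ("United Kingdom", "UK"),
    ("United Kingdom", "UK"),
    ("Mercury (planet)", "Mercury"),
    ("Mercury (element)", "Mercury"),
    ("Paris", "France"),          # label already has a page: skipped
    ("No such page", "Foo"),      # target does not exist: skipped
]
redirects, disambigs = suggest_pages(sample_links, existing)
```

Returning suggestions rather than creating pages directly keeps a human review step between the analysis and any edits.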
Using this method, I've made example lists of "missing" redirects and disambiguation pages:
Redirects:
Disambig:
Notes:
Now, the question I had is this: Do you think auto-creating redirects like this is a good idea? and Do you think auto-creating the disambiguation pages like this is a good idea? Would it help? If the consensus is that it's a bad idea, then I'll drop it, but if the consensus is that it's a good idea, then I'll make a simple bot that does this.
Also to give you a ballpark idea of the amount of new pages that we're talking about, in this dataset it's:
All the best, -- Nickj 09:05, 2 Dec 2004 (UTC)
Thank you for your comments. I was starting to wonder whether anyone was going to reply! On those inappropriate suggestions:
All the best, -- Nickj
But wouldn't this Wikipedia naming convention mean plural redirects are OK? All the best, -- Nickj 04:03, 4 Dec 2004 (UTC)
It certainly seems like a good idea to me to suggest redirects/disambigs for humans to inspect. (And certainly once there's an approved list, it's much faster if a bot comes along and does the heavy clicking.) It's not unlike what I've been doing with Wikipedia:Auto-categorization and Category:Orphaned categories. This sort of prep work seems to make human editors more enthusiastic and also saves a lot of time once someone does actually get around to completing these tasks. With the latter, I sort by frequency and then alphabetically, which may or may not be helpful for your lists. I was thinking that supplying a snippet of context around the broken links might be helpful. If there are like 50 pages that point to the same place, it would be a bit cumbersome to show 50 lines just for that one page, but for 5-10, if context were included it might be a lot faster to verify that all of the articles were, in fact, referring to the same entity. It would make the pages longer, though. -- Beland 07:21, 4 Dec 2004 (UTC)
This is not really the type of bot that you're worried about, but to be virtuous I'll list it here: I'd like to automate the once-per-day update of the WikiReader Cryptography Article of the Day box (an editor-oriented template not used in main article space), which is currently being uploaded by hand. The script would run once every 24 hours under User:Matt Crypto (bot). — Matt Crypto 14:44, 14 Dec 2004 (UTC)
Since today is Friday and tomorrow is Saturday (typical low server load times) and no objections have been stated to the adding of UN/LOCODEs and adding the map templates, I plan to start immediately assuming I can get the rambot programmed in time, rather than wait until next weekend for the next "downtime". It would be nice to have a category which lists those cities that have LOCODEs attached to them, but that would be a category with a few thousand entries. Should I just hold off on this? They can always be added later, I suppose, if some better system is worked out. Update: See here for an example of some of the current work being done. – Ram-Man (comment) (talk) 20:58, Dec 17, 2004 (UTC)
I'd like to request permission to run a bot, User:DanBot, using pywikipediabot, whose primary purpose will be to make corrections to the various highly similar articles containing Formula One statistics. First, it will convert a bunch of navigation tables (such as the one at the bottom of 1990 Australian Grand Prix) to templates (such as the one at the bottom of 2004 Monaco Grand Prix). The bot's other tasks will be similar tweaks to this limited body of articles. — Dan | Talk 04:25, 20 Dec 2004 (UTC)
I plan on creating a bot to edit my userpage and update my personal statistics once a day during off-peak hours. I may work on other projects too at some point; for instance, I have an idea for helping get media files from the various wikis to Commons. However, I will test these on the test wiki first, and post my intentions here before I do anything different with the bot. — マイケル ₪ 14:46, Dec 21, 2004 (UTC)