For giving feedback on the Link Suggester / LinkBot, please use one of the following pages:
If you're not sure which page to use, just pick the one that seems closest.
I hope this isn't TOO radical, but I see a rather simplistic front end for linkbot. The suggestion of {{linkthis}} is JUST a suggestion. The category in the template is just an idea too ...
[[:Template:Linkthis]] idea draft;
For more, please see User:Dbroadwell/php.
Hi Nickj - Good work on the linkbot. I think it's a great idea. I have a suggestion for how the notices could be displayed though. I agree with the suggestion about using a sub-page to store the linkbot data, but why not have a standard sub-page e.g. Talk:Example Article/Linkbot Suggestions, which could be automatically updated on subsequent passes of the bot?
I would suggest that this page has a link to category:LinkBot (or similar), so all such pages are easy to locate,
and a notice advising people that any edits to the page will be lost when LinkBot is next run.
You could also store meta-data as comments if this would be useful, or include a mechanism of flagging bad suggestions on a per-page basis!
On the talk page you could add a link at the top saying 'Linkbot has suggested some possible links that could be added to this article, see this page for details' the first time it is run. Alternatively (or maybe as well) you could add a link to the bottom saying 'linkbot found new links' and the date every time you run the bot. This depends how often it is likely to be run.
Anyway, just some late-night thoughts - what do you think of them? -- HappyDog 01:50, 30 Mar 2005 (UTC)
The words "this page" normally refer to the page on which they appear, and the Web use of links reading not "here" but "click here" supports that. Wiki is in any case not ordinary Web material, but seeks to avoid links that don't work grammatically and logically in their contexts. Good examples should be set in this regard, e.g. follow the link at the start of this sentence to see how incoherent an otherwise careful editor can become by failure to adhere to that principle.
The otherwise wonderful Link Suggester violates the principle especially egregiously by using the words "this link" twice in the sense i advocate after having used them in a link in the sense i object to! It left me rereading the text to see whether the boilerplate *urges* removing suggestions after acting on them, since it appeared to me that someone had removed them all without discarding the boilerplate. I would instead suggest something like (using the case i was looking at) a handy list.
And in case i haven't yet worn out my welcome, please note that
Thanks, -- Jerzy (t) 21:14, 2005 Apr 18 (UTC)
Is there an established procedure to mark the linkbot's suggestion page after some of the changes have been implemented in the article? There should be. -- 19:19, 4 May 2005 (UTC)
I believe that your Linkbot idea could be feasibly implemented as a step in the wikipedia editing process.
The same way we've got buttons for "Save Page" and "Show Preview" there could be a button for "Suggest Links" which would parse the article and suggest links as your linkbot does.
This way it's not an automated process that would auto-change articles, it doesn't clog up the discussion page (or even its own separate discussion page) with bot-generated suggestions, and nobody has to have anything to do with it if they don't want to--they just wouldn't click the button. The challenge would be, of course, convincing the kind folks at wikipedia that such a programmatic change would not impact system performance to any great extent.
The existing wikipedia search functionality returns highlighted contextual results for each search. If the performance hit for doing one of your "good link" suggestions is less than the performance hit for a search, I would think that the wikipedia folks would be inclined to at least look at your idea...
I really like the idea of linking to anything ending in "-ism." I also like not suggesting a link to a generic term like "government" but instead to specific terms like "democracy" or "communism."
Where can I get a copy of the wikipedia data? What format is it in? - Jared81 03:40, Jun 7, 2005 (UTC)
Hi Jared81,
I would very much like to see the idea implemented as a part of the Wikipedia itself. I would be happy to release the source under the GPL if it would allow this to come about (note: it's a bit of a mess, because it combines three projects into one script, and it's fairly slow, but it does work). I tend to be fairly short of time lately though, so I could release the current code, describe what it's trying to do, describe the problem, etc, but honestly I'm pretty unlikely to have the time to learn the MediaWiki codebase and create a patch that implements this.
The criteria for determining whether to suggest a link are:
The code for determining whether something is "a good link" is quite simple, and quick, namely:
/*
** @desc: returns whether something is a good link or not.
*/
function isGoodLink($link_text) {
    $link_text = trim($link_text);

    // string contains two or more capital letters
    $tmp = array();
    ereg("[A-Z][a-z ]*[A-Z]", $link_text, $tmp);
    if (!empty($tmp)) {
        return true;
    }

    // contains one or more spaces (but if only one space, must not start with 'the')
    $num_spaces = substr_count($link_text, " ");
    if ($num_spaces >= 2 || ($num_spaces == 1 && !eregi("^the ", $link_text))) {
        return true;
    }

    // contains a dash
    if (strpos($link_text, "-") !== false) {
        return true;
    }

    // string ends in "ism"
    if (eregi("ism$", $link_text)) {
        return true;
    }

    // otherwise assume is not a good link
    return false;
}
Determining whether there is already an article of the appropriate name is quite memory intensive, as the fastest way is to keep an index of current article and redirect names in memory. There are also case-sensitivity issues to consider, as article names are case-sensitive on the Wikipedia, so sometimes you get two articles with the same name, but different capitalisation, and you should try to suggest the correct one. So the main question is how much spare memory the servers have, and whether they have some memory persistence (as you don't want to have to recreate the memory index every time a new article is checked - better to have an index that's long-lived).
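To make the in-memory index idea concrete, here is a minimal sketch in Python (not the PHP the bot is written in). The class name `TitleIndex` and its `resolve` method are hypothetical, invented for illustration, and the tie-breaking rule (prefer an exact-case match, give up when several titles differ only by capitalisation) is an assumption rather than a description of what the bot actually does.

```python
from collections import defaultdict


class TitleIndex:
    """Long-lived in-memory index of article and redirect titles (hypothetical helper)."""

    def __init__(self, titles):
        # Map a case-folded key to every exact title that shares it,
        # so titles differing only in capitalisation stay distinguishable.
        self._by_key = defaultdict(set)
        for title in titles:
            self._by_key[title.casefold()].add(title)

    def resolve(self, phrase):
        """Return the title to suggest for `phrase`, or None if unknown or ambiguous."""
        candidates = self._by_key.get(phrase.casefold(), set())
        if phrase in candidates:
            # An exact-case match is always the safest suggestion.
            return phrase
        if len(candidates) == 1:
            # Only one title differs by case, so it is unambiguous.
            return next(iter(candidates))
        # Either no such title, or several that differ only by case.
        return None
```

Building the index once and keeping it alive between page checks is what the memory persistence point above is about: the cost is one set entry per title, paid a single time rather than per article.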
When you say "get a copy of the Wikipedia data", do you mean the latest copy of the Wikipedia itself? If so, that comes from here (the one you're looking for is the "en.wikipedia" 'cur' database dump, which is about 900 MB compressed, without images or old versions). You would then load this into MySQL. Or did you mean the current suggestions? You can have these if you like, but they'd be quite large (maybe 300 MB, though that's just a wild guess; it could be more, or much less). They're stored in a MySQL database at the moment, so the suggestions would also be in MySQL database dump format (i.e. the same format as for the Wikipedia database dump downloads). You'd also need to give me somewhere to put the file (e.g. an FTP site).
Also, if you were going to add this functionality (and it's a very good idea adding it in this way, in my opinion), then it would also be worth considering adding checking of wiki syntax that would work in the same kind of way (could even combine them into a "suggest links/ check syntax" combo step). There's already some GPL source code available for this here, and I could get you some slightly updated source code if you were to try and get this added into the MediaWiki software, but again I'm unlikely to have the time to do this by my lonesome (i.e. same basis as above - I'll provide current source, and information, etc, but not the MediaWiki patches).
I'd suggest that probably the next step would be to ask the MediaWiki developers if they're interested in this idea (because if they're not then forget it, but if they are then it could work).
Hope that helps! All the best, Nickj (t) 06:32, 9 Jun 2005 (UTC)
Please take a few moments and fill in the data for your bot on Wikipedia:Bots/Status. Thank you. Betacommand ( talk • contribs • Bot) 19:39, 12 February 2007 (UTC)
As a result of discussion on the village pump and mailing list, bots are now allowed to edit up to 15 times per minute. The following is the new text regarding bot edit rates from Wikipedia:Bot Policy:
Until new bots are accepted they should wait 30-60 seconds between edits, so as to not clog the recent changes list and user watchlists. After being accepted and a bureaucrat has marked them as a bot, they can edit at a much faster pace. Bots doing non-urgent tasks should edit approximately once every ten seconds, while bots that would benefit from faster editing may edit approximately once every four seconds.
Also, to eliminate the need to spam the bot talk pages, please add Wikipedia:Bot owners' noticeboard to your watchlist. Future messages which affect bot owners will be posted there. Thank you. -- Mets501 04:21, 22 February 2007 (UTC)
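For bot owners wondering how the quoted rates translate into code, the following is a rough Python sketch. The names `edit_interval_seconds` and `EditThrottle` are made up for this example, and picking 30 seconds for unapproved bots is an assumption (the policy text gives a 30-60 second range, and this uses its lower bound).

```python
import time


def edit_interval_seconds(approved, urgent=False):
    """Minimum pause between edits, following the policy text quoted above."""
    if not approved:
        # Assumption: use the lower bound of the 30-60 second range.
        return 30.0
    # Approved bots: about every 4 seconds if faster editing helps, else every 10.
    return 4.0 if urgent else 10.0


class EditThrottle:
    """Blocks until at least `interval` seconds have passed since the last edit."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous edit, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A bot would call `wait()` immediately before each save. At the four-second interval this works out to the 15 edits per minute mentioned above.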
Hi,
I'm interested in implementing your bot on my wiki (sorry can't provide a link). I'm using MW 1.6.7. What are the steps to have it working?
thx in advance, Regards, -- Aretai 15:28, 20 April 2007 (UTC)
Regarding Phase 5, I'd like to ask what you think about implementing something like the InterWiki Link Checker, possibly using User:LinkBot to make the changes. Waldir talk 14:32, 25 November 2007 (UTC)
You might want to take a look at the quote character escaping. I just ran the tool on the article and it suggested (among other links) [[Giuseppe Marc\'Antonio Baretti|Giuseppe Baretti]] instead of [[Giuseppe Marc'Antonio Baretti|Giuseppe Baretti]]. Cheers, Waldir talk 16:40, 8 March 2009 (UTC)
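The cause of the stray backslash is not stated in this thread. One plausible guess is that titles were backslash-escaped for a database query and then written out without being unescaped. Under that assumption, a fix could look roughly like the Python sketch below; `strip_slashes` and `make_link` are hypothetical names, and `strip_slashes` is meant to behave like PHP's `stripslashes`.

```python
import re


def strip_slashes(text):
    """Undo backslash escaping: drop each backslash, keep the character it escaped."""
    return re.sub(r"\\(.)", r"\1", text, flags=re.DOTALL)


def make_link(target, label):
    """Build a piped wiki link from an escaped target title and a display label."""
    return "[[" + strip_slashes(target) + "|" + label + "]]"


# The escaped title from the report above comes out with a plain apostrophe.
link = make_link("Giuseppe Marc\\'Antonio Baretti", "Giuseppe Baretti")
```

Unescaping on output is a patch rather than a cure: keeping the raw title alongside the escaped one, and escaping only at query time, would avoid the problem altogether.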