External links

Archives: 1
Summary as of 03:36, 2 March 2011 (UTC) | |||||
---|---|---|---|---|---|
Wikipedia has 17.5 million links to external websites (see list 844 MB download). I estimate that an additional 3,700 links are added each day (see link feed at #wikipedia-en-spam). The links are often used in citations and should be archived before they go dead. Around 110,000 articles are now tagged for dead external links, and this number has been steadily increasing for the last two years ( graph). Several solutions have been proposed and are summarized below. | |||||
Option | Status | S | O | N | Notes |
Wikimedia Foundation starts its own archiving project | Proposed | 7 | 4 | 0 | ♦ A larger, similar project is proposed at meta:WikiScholar |
Links are archived at WebCite and replaced by a bot | Ready now | 9 | 1 | 0 | ♦ WebCite can whitelist bots for full speed operation ♦ Δ has a functioning WebCiteBot here and is in contact with WebCite ♦ Tim1357 has a WebCiteBot almost complete and is in contact with WebCite ♦ Nn123645 has a WebCiteBot in development ♦ ThaddeusB is working to get the original WebCiteBOT running again |
Links are archived at Archive-It | Under discussion | 3 | 0 | 1 | ♦ Internet Archive is ready to help and wants to work out details ♦ Gwern and Hydroxonium are in email contact with them |
Links are archived at Wikiwix and a sitewide script adds archive links to every external link ( example) | Ready, but not yet enabled (needs community approval first) | 6 | 0 | 0 | ♦ Original RfC is closed. New RfC for small scale test is under discussion here ♦ Wikiwix has been in use on fr.wikipedia for about 2 years ♦ Add fr:Utilisateur:Pmartin/cache.js to your vector.js to use/test ♦ Add fr:Utilisateur:Pmartin/cache.js to MediaWiki:Common.js for everybody (requires consensus from the community) ♦ Javascript required to use this tool ♦ Pmartin will provide backups of archived webpages |
I have received updated information, so I decided to add a summary. Everybody is welcome to update this list. - Hydroxonium ( H3O+) 18:26, 11 February 2011 (UTC)
How are sister projects started? We have Wikinews, Wikiquote, Wikibooks, etc. How about Wiki-citation? - Hydroxonium ( H3O+) 18:03, 4 February 2011 (UTC)
The Wikiwix solution appears to be the easiest, as somebody else does the work for us. It has also been reliable for over 2 years on fr.wikipedia. But I have a concern that this is still a single point of failure, which is how we got into this situation when the original WebCiteBOT went down. I would like to see at least 2 solutions implemented so that we don't come crashing down if something happens to one of them. I would like to get everybody's input on this. Thanks. - Hydroxonium ( H3O+) 18:03, 10 February 2011 (UTC)
There's some opposition to the Wikiwix RfC. An RfC is needed for the proposed Wikiwix solution because it requires modifying the Wikimedia interface. I think an easier and less controversial option is to modify the {{ Citation/core}} template. This would change all the major citation templates, such as {{ citation}}, {{ cite news}}, {{ cite web}}, etc. The template could be modified in a few days after some thorough testing.
The 4 major caching systems (Wikiwix, WebCite, Internet Archive and Google's cache) can all be used by adding a prefix to a webpage's URL. These would be in the form of:
http://wikiwix.com/cache/?url={{{URL}}}
http://www.webcitation.org/query?url={{{URL}}}
http://web.archive.org/web/*/{{{URL}}}
http://webcache.googleusercontent.com/search?q=cache:{{{URL}}}
This could show up in a citation looking something like this.
This would provide links to cached versions of webpages on 4 different services, and so should please most people. Obviously, Wikiwix, WebCite, Internet Archive and Google would still have to archive the webpages in the first place, otherwise they would come up 404. Also, the link for Internet Archive will come up with a list of all of their cached pages instead of a specific cached page, but this may be acceptable.
I believe this would be easier to implement and a lot less controversial. Anybody have any opinions on this? Thanks. - Hydroxonium ( H3O+) 01:22, 15 February 2011 (UTC)
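As a rough illustration of the prefix idea above, here is a minimal Python sketch that builds the four lookup links for a cited URL. The prefixes are the ones listed in the proposal; the function name `archive_links` and the dictionary layout are hypothetical, and the sketch only concatenates strings, so it says nothing about whether a given service actually holds a copy of the page.

```python
# Prefixes as quoted in the 2011 proposal; the services may have changed since.
ARCHIVE_PREFIXES = {
    "Wikiwix": "http://wikiwix.com/cache/?url=",
    "WebCite": "http://www.webcitation.org/query?url=",
    "Internet Archive": "http://web.archive.org/web/*/",
    "Google cache": "http://webcache.googleusercontent.com/search?q=cache:",
}


def archive_links(url):
    """Return a {service: lookup URL} mapping for `url` (hypothetical helper).

    Each link is simply prefix + url, as described in the proposal.
    No request is made, so a link may still lead to a 404 if the
    service never archived the page.
    """
    return {service: prefix + url for service, prefix in ARCHIVE_PREFIXES.items()}


if __name__ == "__main__":
    for service, link in archive_links("http://example.com/page").items():
        print(service, link)
```

Plain concatenation mirrors what a template change could do. A cited URL that carries its own query string might need percent-encoding for the two `?url=` services, which this sketch does not attempt.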
Avoid adding these extra words to every cite. An easier solution is a single extra link or icon "Archived..." which when hovered (or clicked) produces a small popup with a list of archives for that link. Not sure about the javascript aspect. Is that really desirable? FT2 ( Talk | email) 02:57, 17 February 2011 (UTC)
I don't understand why there must be a core javascript change for Wikiwix to become usable. Why not simply run a bot that adds the archiveurl and archivedate parameters to citations that have been archived? There's no reason why WebCite and Wikiwix cannot both be used...Wikiwix can eventually supplant everything as a primary archival service since archiving is automatic, but there's no reason to not continue to do secondary archiving with WebCite, and of course use Internet Archive for old copies of websites. They can exist side by side, providing redundancy that is badly needed. — Huntster ( t @ c) 12:21, 18 February 2011 (UTC)
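To make the bot suggestion above concrete, here is a small Python sketch of the text-editing step only: appending archiveurl and archivedate to a citation that has already been archived. The function name `add_archive_params`, the regular expression, and the sample archive URL are assumptions for illustration and are not taken from any existing bot. It handles only simple, non-nested citation templates and matches the cited URL by substring.

```python
import re

# Matches simple {{cite ...}} / {{citation ...}} templates with no nested braces.
CITE_TEMPLATE = re.compile(r"\{\{\s*(?:cite \w+|citation)\b[^{}]*\}\}", re.IGNORECASE)


def add_archive_params(wikitext, url, archive_url, archive_date):
    """Append |archiveurl= and |archivedate= to citations of `url` (hypothetical helper).

    Templates that do not mention `url`, or that already carry an
    archiveurl parameter, are returned unchanged.
    """
    def patch(match):
        template = match.group(0)
        if url not in template:
            return template
        if re.search(r"\|\s*archiveurl\s*=", template):
            return template  # already archived, leave it alone
        body = template[:-2].rstrip()  # drop the closing braces
        return body + " |archiveurl=" + archive_url + " |archivedate=" + archive_date + "}}"

    return CITE_TEMPLATE.sub(patch, wikitext)


if __name__ == "__main__":
    text = "Foo.<ref>{{cite web |url=http://example.com/a |title=A}}</ref>"
    # The archive URL below is a made-up placeholder.
    print(add_archive_params(text, "http://example.com/a",
                             "http://www.webcitation.org/abc123", "2011-02-18"))
```

Skipping templates that already have an archiveurl keeps a second pass from adding duplicate parameters, which matters if more than one archiving bot edits the same article.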
I love it. I absolutely love it. I don't have javascript (long story) so the script based solutions won't help me. But this will. Thank you so much. - Hydroxonium ( H3O+) 05:51, 21 February 2011 (UTC)
Or at least it will be unless people object.
I have just returned to Wikipedia and getting the bot back up and running will be my first priority unless I am told it is no longer needed/wanted.
Thanks, ThaddeusB ( talk) 23:28, 22 February 2011 (UTC)
P.S. I am certainly in favor of there being multiple options to prevent and fix dead links, so this is in no way meant to discourage the alternatives in development. -- ThaddeusB ( talk) 23:30, 22 February 2011 (UTC)
I'm planning a small test run tomorrow. I will post a link to the results when I have some. -- ThaddeusB ( talk) 16:16, 25 February 2011 (UTC)
SJ has suggested starting a new RfC for Wikiwix with a small scale test of one category for one month so that users can see how it will work ( see here). A couple people have agreed and I would like to close the current RfC early. I would like to get input from everybody before I start on this.
Any input is greatly appreciated. - Hydroxonium ( H3O+) 18:12, 24 February 2011 (UTC)
I started drafting a new RfC several times and decided it's better to have the community write it. So I have essentially a blank page here and would like to get others' help in writing the new RfC. Everybody is encouraged to edit the page. Thanks. - Hydroxonium ( T• C• V) 10:20, 16 March 2011 (UTC)
Hi all. I'm working on a new WebCiteBOT. I have opened a request for approval. It is free software and written in Python. I hope we can work together on this. Archiving regards. emijrp ( talk) 17:15, 21 April 2011 (UTC)
A relevant RfC is in progress at Wikipedia:Requests for comment/Dead url parameter for citations. Your comments are welcome, thanks! — HELLKNOWZ ▎ TALK 10:49, 21 May 2011 (UTC)
After a very long absence, I am back and have restarted the original WebCiteBOT. The code required several tweaks, but not a major rewrite or anything. Initial tests are now underway to make sure no further tweaks are needed. Interested parties can track the bot's Logs or contributions. Feedback is, of course, welcome.
I will post another update here when the bot is running full time. More frequent updates may be available on the bot's talk page. -- ThaddeusB ( talk) 01:20, 18 February 2012 (UTC)
Are there any bots running now?