This is the talk page for discussing improvements to the Using archive.today page.
This help page was nominated for deletion. Please review the prior discussions if you are considering re-nomination:
On 25 July 2023, it was proposed that this page be moved to Help:Using Archive.ph. The result of the discussion was not moved.
One of the editors posted the following edit summary when removing the wording that commented on the copyright issues:
First, no one has claimed US law is world law. All I am saying is that Wikipedia does not want (and is not legally able) to violate US copyright law, nor does it want to incur endless DMCA take-down requests, which will surely be the result if people start linking Wikipedia articles to unauthorized archived copies of copyrighted works. Pointing out that copyright laws are different in other countries is obviously irrelevant.
Second, I agree that the honoring of robots and the honoring of copyright laws are two different things. However, as the proposed wording explains, robot exclusion files are the only known means used by responsible web archives to avoid copyright infringement. If Archive.is has some other way of avoiding copyright infringement, that would be fine. But they don't. Archive.is contains a large amount of copyright-infringing material, which anyone can see for themselves. (See an example in the Wikipedia article on Archive.is, but you had better hurry, because there is a nomination for deletion of that article.) So, the fact that Archive.is refuses to honor robot exclusions for copyrighted material is closely related to the fact that they are violating copyright law.
Third, the editor says "libraries have different laws". I don't know what that is supposed to mean, but if anyone thinks it means that libraries or online archives are allowed to violate copyright law, they are mistaken.
Fourth, the editor says the proposed text is a "grossly inappropriate description", but the justification for this claim is based on the misunderstanding noted above. The proposed text is entirely appropriate. Wikipedia should not be a party to copyright infringement. Can we at least agree on this? Weakestletter ( talk) 21:12, 23 September 2013 (UTC)
It was from an editor near the end of Wikipedia:Deletion review/Log/2013 October 28, followed by its reversion by me. -- Lexein ( talk) 14:40, 29 October 2013 (UTC)
How do I properly link to http://archi ve.is/jPlGB (added space) in a reference for an article? (It *was* http://kappapiart.org/join.html) Naraht ( talk) 17:48, 5 May 2016 (UTC)
There is clear consensus that long form URLs are preferred. Long forms include timestamps and the original URL. Short forms can be used to mask the destination and circumvent blacklisting. Adding short form URLs should not result in warnings and/or sanctions against good-faith editors.
Any URLs in short form should be converted to long form. This can be done by any editor. There is also clear consensus that a bot should automatically convert short form URLs and tag articles using blacklisted URLs.
Example long URL forms which include timestamps and the original URL:
http://archive.is/YYYY.MM.DD-hhmmss/http://www.example.com
http://www.webcitation.org/5kbAUIXb6?url=http://www.example.com/
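The short/long distinction discussed above can be checked mechanically. A minimal sketch for archive.is URLs follows; the regular expressions are illustrative, not an exhaustive description of every form the service accepts:

```python
import re

# Long form: archive.is/YYYY.MM.DD-hhmmss/<original URL> (see the examples above).
LONG_FORM = re.compile(
    r"^https?://archive\.is/\d{4}\.\d{2}\.\d{2}-\d{6}/https?://", re.I
)
# Short form: archive.is/<short alphanumeric ID>, e.g. archive.is/jPlGB.
SHORT_FORM = re.compile(r"^https?://archive\.is/[0-9A-Za-z]{1,8}$")

def classify(url: str) -> str:
    """Classify an archive.is URL as 'long', 'short', or 'unknown'."""
    if LONG_FORM.match(url):
        return "long"
    if SHORT_FORM.match(url):
        return "short"
    return "unknown"
```

A bot converting short to long forms would still need to fetch the snapshot to recover the timestamp and original URL; this check only identifies which links need conversion.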
This RfC is to gauge community consensus about the preferred URL format for archive.is and WebCite when used in citations.
Both sites permit two URL formats, a shortened version and a longer version. Examples:
http://www.webcitation.org/query?url=http://www.example.com&date=YYYYMMDD
http://www.webcitation.org/5eWaHRbn4
http://www.webcitation.org/5eWaHRbn4?url=http://www.example.com/
(used on dewiki; the ID contains a base-62 coded timestamp extracted by the template de:template:Webarchiv)
Which one is preferred, or are both equally appropriate?
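The base-62 timestamp extraction mentioned in the parenthetical above can be sketched as follows. The digit ordering and the microseconds-since-epoch interpretation are assumptions for illustration, not confirmed details of WebCite's scheme:

```python
import string
from datetime import datetime, timezone

# Assumed digit ordering (0-9, a-z, A-Z); WebCite's actual alphabet may differ.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def base62_decode(snapshot_id: str) -> int:
    """Decode a base-62 string such as '5eWaHRbn4' into an integer."""
    n = 0
    for ch in snapshot_id:
        n = n * 62 + ALPHABET.index(ch)
    return n

def webcite_timestamp(snapshot_id: str) -> datetime:
    """Interpret the decoded ID as microseconds since the Unix epoch,
    the encoding the dewiki template reportedly extracts."""
    return datetime.fromtimestamp(base62_decode(snapshot_id) / 1e6, tz=timezone.utc)
```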
Related information:
Please leave a !vote below, such as short or long or either. -- Green C 21:50, 5 July 2016 (UTC)
This shouldn't be an RfC — we always use long. Link-shorteners are not allowed. Carl Fredrik 💌 📧 23:48, 5 July 2016 (UTC)
Following the discussion below, let's go with long and forbid short. And I don't mean block short, just automate conversion to long links. Carl Fredrik 💌 📧 08:58, 6 July 2016 (UTC)
|url= parameter and the archived version in the |archiveurl= parameter. Also, note that WebCite uses a query parameter for the original URL in the long form, so it may still be possible to bypass the blacklist. Perhaps the easiest would be to have a bot also check those, perform URL decoding and matching against the blacklist, and report if it finds a blacklisted link.
nyuszika7h ( talk) 08:10, 6 July 2016 (UTC)
archive.is, archive.today, archive.li, archive.ec and archive.fo. nyuszika7h ( talk) 14:09, 6 July 2016 (UTC)
{{wayback}}). These are the correct solutions already used by other archives like Wayback. Allowing short URLs against policy, then assuming a bot will fix intentional mistakes forever, is not a good idea for a bunch of reasons. Anyway, I certainly won't be writing a bot to do that. Maybe someone else will... we are talking hundreds of hours of labor and a personal commitment for unlimited years to come. These bots don't run themselves; they have constant page-formatting problems due to the irregular nature of wikisource data. It's not a simple regex find-and-replace: you have to deal with deadurl, archivedate, {{dead}}, soft-404s, 503s, 300s at archive.is, etc. It's complex and difficult to write and maintain. -- Green C 13:18, 13 July 2016 (UTC)
"Entries in this list may be constructive or made in good faith and are not necessarily an indication of wrongdoing on behalf of the user." – nyuszika7h ( talk) 08:21, 13 July 2016 (UTC)
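The query-parameter check nyuszika7h suggests above (URL-decoding WebCite's url= parameter and matching it against the blacklist) could be sketched like this; the blacklist contents here are hypothetical placeholders:

```python
from urllib.parse import urlsplit, parse_qs, unquote

# Hypothetical blacklist; a real bot would load the wiki's spam blacklist.
BLACKLIST = {"badsite.example"}

def hidden_blacklisted(webcite_url: str) -> bool:
    """Check whether a WebCite long-form URL smuggles a blacklisted
    original URL inside its 'url' query parameter."""
    qs = parse_qs(urlsplit(webcite_url).query)
    for original in qs.get("url", []):
        host = urlsplit(unquote(original)).hostname or ""
        if host in BLACKLIST:
            return True
    return False
```

A bot running such a check could then report the citation rather than silently removing it, per the caveat above that entries may still be made in good faith.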
@ 93.185.30.56: I see that you're from Russia, where archive.is is blocked over HTTPS. However, the situation varies: China, for example, blocks HTTP but not HTTPS. archive.is now defaults to HTTPS. There's no policy on how to link to archive.is, and seeing how most of the world can access it through HTTPS, linking to it in the encrypted form is better than exposing traffic over an insecure link. Users in countries where one form or the other is blocked should make the extra effort to modify the link (HTTP/HTTPS) when trying to access it. — Hexafluoride Ping me if you need help, or post on my talk 13:33, 14 July 2016 (UTC)
There was an RfC that decided to use https for archive.org. I don't see why an RfC on archive.is would reach a different result. As for the argument that certain countries block HTTPS: would that be solvable with Wikipedia:Protocol-relative URLs? — Preceding unsigned comment added by Green Cardamom ( talk • contribs) 15:13, 14 July 2016 (UTC)
OK, it seems archive.is has now started redirecting HTTPS requests to archived copies (not the main page) to HTTP, with an annoying 10-second delay. nyuszika7h ( talk) 16:24, 25 July 2016 (UTC)
Regarding this section:
Recently modified from the original:
Neither of these are sourced, and they give somewhat conflicting POVs. Do we think it's a good idea to give legal opinions in this essay? -- Green C 18:26, 17 November 2016 (UTC)
Accessed today, the example referenced as a source for this assertion shows the page archived normally rather than taken down because of a DMCA request. One can find other examples with a Google search, but the result is the same for them. I could not find a better source for this, but I imagine there could be something on the author's blog. Saturnalia0 ( talk) 01:25, 24 January 2017 (UTC)
Hi!
Recently archive.is announced: "Please do not use http://archive.IS mirror for linking, use others mirrors [.TODAY .FO .LI .VN .MD .PH]. .IS might stop working soon." [3] [4] (thanks for the info, @user:KurtR)
All their TLDs seem to be unstable. So what is a good solution for Wikipedia? (ping user:GreenC) -- seth ( talk) 11:10, 5 January 2019 (UTC)
Given that it's now called archive.today, should the article title and content be changed? Yaakovaryeh ( talk) 04:10, 9 January 2020 (UTC)
Every time I try to archive a page, it gets stuck at the submitting page with the screen stopping and endlessly rotating at "loading". What to do? Kailash29792 (talk) 06:41, 2 September 2021 (UTC)
For archive links that contain Chinese or other non-ASCII characters, the long format URL with ".today" does not redirect properly for some reason when inserted onto Wikipedia (if you click on it directly). For example: [1]; the only way to make it work is to copy the archive link and paste it directly into the browser address bar. But when I change it to ".ph", it loads properly when clicked. For example: [2]
The short format URL has no such issue. So, a question: I am aware Wikipedia prefers the long format URL and the ".today" domain name (as .ph is only the server name). But for these specific links, what format should I use so editors have no issue when clicking on them?
PS: I tested the link in a WordPress post, and clicking the long format URL redirected correctly. Link So is this a Wikipedia issue? Please help!
Thanks. -- TerryAlex ( talk) 19:43, 18 October 2021 (UTC)
It's mentioned that browser extensions are available for Chrome, Edge and Firefox. Are there advantages in downloading the extensions rather than having the site on bookmark bars? Mcljlm ( talk) 15:45, 31 December 2022 (UTC)
The result of the move request was: not moved. Speedily closed per nom request. ( non-admin closure) WPscatter t/ c 16:03, 26 July 2023 (UTC)
Help:Using archive.today → Help:Using Archive.ph – Site name changed. While Archive.today presently redirects to Archive.ph, there's no guarantee that will continue indefinitely. The text in the page will need changing as well, of course, but that should be a quick search-and-replace operation. — SMcCandlish ☏ ¢ 😼 22:11, 25 July 2023 (UTC)
Special:Diff/1167250578/1169669968: "Links (mostly sources in articles) to archive.is are inaccessible for me (I get a security check that cannot be completed). Is it worth somehow editing all existing links to use archive.today?" (post by User:WhyNotHugo).
I actually started work on a bot to do this, as well as expand from short form to long form. On Enwiki and 200+ other wikis. Real-time monitoring. I got pretty far along then had to let it go for other projects. It's a lot more complicated when scaling like this and keeping it up to date, not merely a 1-time pass through. It's a viable project if I can find the time to complete it, probably about 75% done. -- Green C 15:59, 10 August 2023 (UTC)
The archive.is owner has explained that he returns bad results to us because we don't pass along the EDNS subnet information. This information leaks information about a requester's IP and, in turn, sacrifices the privacy of users. This is especially problematic as we work to encrypt more DNS traffic, since the request from resolver to authoritative DNS is typically unencrypted. We're aware of real-world examples where nation-state actors have monitored EDNS subnet information to track individuals, which was part of the motivation for the privacy and security policies of 1.1.1.1.
The phrase "it can always be re-blacklisted if future issues occur" jumped out at me. Again, not wanting to cast any aspersions here, but this suggests that there may have been issues with this site in the past. However, let me qualify what I've said by saying that I'm very new (1) to WP, and (2) to the subject of using archive.is on WP. So I'll leave this discussion for right now to allow editors more experienced in one or both fields to weigh in. Best, A smart kitten ( talk) 18:27, 3 September 2023 (UTC)
Prior to today, accessing archive.today would display a "Welcome to nginx!" page. Today, I'm often getting a connection timeout error. The iidrn.com website (Is It Down Right Now?) has no details about how long it's been down. Fabrickator ( talk) 18:21, 14 October 2023 (UTC)
There's apparently an issue with archiving Ajax websites with Internet Archive. [6] To put it another way, it doesn't work. However, it works just fine with archive.today. My question is how do we maintain both in a single article without having IABot overwrite the archive.today link? If a safeguard already exists to prevent this, let me know. In other words, does IABot check to see if archive.today links are already present, and if so, does it ignore them? Viriditas ( talk) 18:47, 2 November 2023 (UTC)
Add {{cbignore}} to keep IABot from editing the citation. Those are rare cases. -- Green C 23:38, 2 November 2023 (UTC)
In addition to the aforementioned DNS resolver issue trapping people in a CAPTCHA loop with no explanation, I don't understand why we are putting this service forward as a choice when it (as far as I can tell) is operated by a single private individual we otherwise have no information about. I don't mean that in a nefarious way, but it just seems obvious that we shouldn't lean on this like it's real infrastructure in the way IA is, fraught infrastructure as it may seem—they at least have, you know, an office and a means of contact other than a tumblr blog. How much is getting broken or lost if this breaks further (since apparently one major DNS resolver isn't enough of a dealbreaker) or falls off the internet entirely? Remsense 诉 13:22, 4 May 2024 (UTC)
We currently have several statements that the time taken to archive a webpage is typically 5-15 seconds, and in one case 15-30 seconds. It seems like this is ancient history, from e.g. 5-10 years ago, when third-party JavaScript pollution was less prevalent. My impression is that the typical time scale is 30-300 seconds. This might vary with what sort of webpages are archived: plain HTML will obviously be much faster. We could put a bigger range, e.g. "5 to 300 seconds", although that looks a bit odd. So I propose "a few seconds to a few minutes" to replace all the current timing estimates. Boud ( talk) 20:14, 29 May 2024 (UTC)