Operator: Bender235
Time filed: 22:16, Friday, August 5, 2016 (UTC)
Automatic, Supervised, or Manual: Supervised Automatic
Programming language(s): AutoWikiBrowser
Source code available:
Function overview: HTTP → HTTPS conversion for Internet Archive links
Links to relevant discussions (where appropriate): Wikipedia:Village pump (proposals)/Archive 127#RfC: Should we convert existing Google and Internet Archive links to HTTPS?
Edit period(s): One time run
Estimated number of pages affected: Unsure (I'd guess 50,000, but possibly 100,000+)
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No):
Function details: Basically all it does is find http://archive.org/ and http://www.archive.org/ and replace them with https://archive.org/.
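The find/replace described above can be sketched as a single regular expression. This is a minimal illustration, not the bot's actual AutoWikiBrowser rule; the names `ARCHIVE_HTTP` and `to_https` are hypothetical. Because the trailing slash is part of the match, links without it (such as a bare `http://www.archive.org`) are not touched, and other subdomains like `web.archive.org` are left alone. One caveat of this naive pattern: an `http://archive.org/` URL embedded as the archived target inside a Wayback Machine URL would also be rewritten.

```python
import re

# Hypothetical pattern mirroring the literal find strings in the request:
# "http://archive.org/" or "http://www.archive.org/".
# The trailing slash is part of the match, so everything after it
# (path, query string, link label) is kept unchanged.
ARCHIVE_HTTP = re.compile(r"http://(?:www\.)?archive\.org/")


def to_https(wikitext: str) -> str:
    """Rewrite plain-HTTP archive.org links to https://archive.org/."""
    return ARCHIVE_HTTP.sub("https://archive.org/", wikitext)


if __name__ == "__main__":
    sample = "[http://www.archive.org/details/foo Foo] and http://archive.org/stream/bar"
    print(to_https(sample))
```

Run on the sample, this prints `[https://archive.org/details/foo Foo] and https://archive.org/stream/bar`.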
The above-listed WP:VPP discussion already determined that this is a useful endeavor for several reasons, but let me add another one: since Wikipedia is HTTPS-only, every outbound link to an HTTP URL loses the HTTP referer, because clients should not send a Referer header when following a link from a secure (HTTPS) page to a non-secure (HTTP) one (RFC 2616 §15.1.3). That means fixing these links is also in the Internet Archive's interest, quite apart from their active encouragement to use HTTPS links. (Compare a related request by Newspapers.com to have their inbound links from Wikipedia switched from HTTP to HTTPS, a task that I have already completed.)
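The referer argument above boils down to one rule about scheme downgrades. The sketch below models only that RFC 2616 §15.1.3 rule; it is a simplification that deliberately ignores any Referrer-Policy a site may declare, and the function name `referer_sent` is hypothetical.

```python
from urllib.parse import urlsplit


def referer_sent(referring_page: str, link_target: str) -> bool:
    """Simplified model of the RFC 2616 §15.1.3 rule.

    Returns False when a page fetched over HTTPS links to a plain-HTTP URL,
    the case in which clients should not send a Referer header. Any
    Referrer-Policy a site declares is deliberately ignored here.
    """
    downgrade = (
        urlsplit(referring_page).scheme == "https"
        and urlsplit(link_target).scheme == "http"
    )
    return not downgrade
```

Under this model, a link from `https://en.wikipedia.org/wiki/...` to `http://archive.org/...` sends no referer, while the same link rewritten to `https://archive.org/...` does.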
I have been doing this task with my regular account so far, but BU Rob13 suggested I apply for bot approval with a new account. -- bender235 (talk) 23:05, 5 August 2016 (UTC)
http://[www.]archive.org, with probably 50k or more links, is just the tip of the iceberg. There's also http://[web.|wayback.]archive.org/web/ (the Wayback Machine), with probably more than a million links still using HTTP. As of today, the Internet Archive is basically blind to the HTTP referer for all these links, and I'd like to finish this task at some point this year or next. The good news is that no new HTTP links to the Internet Archive are being added, since archive.org now redirects to HTTPS from its main page. That means this task does have a clear endpoint: it's only the "legacy links" that we need to take care of. -- bender235 (talk) 14:02, 6 August 2016 (UTC)
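The comment above names the Wayback Machine hosts but does not spell out the replacement. A minimal sketch, assuming the subdomain is kept and only the outer scheme is upgraded; the names `WAYBACK_HTTP` and `wayback_to_https` are hypothetical. The archived target URL after `/web/<timestamp>/` is deliberately left as-is, since it presumably identifies the original page the snapshot was taken of.

```python
import re

# Hypothetical pattern for the Wayback Machine hosts mentioned above.
# Only the outer scheme changes; the subdomain and everything after
# /web/ (timestamp and archived target URL) are left untouched.
WAYBACK_HTTP = re.compile(r"http://(web|wayback)\.archive\.org/web/")


def wayback_to_https(wikitext: str) -> str:
    """Upgrade http://web.archive.org/web/ and http://wayback.archive.org/web/ links."""
    return WAYBACK_HTTP.sub(r"https://\1.archive.org/web/", wikitext)
```

For example, `http://web.archive.org/web/20160101000000/http://example.com/` becomes `https://web.archive.org/web/20160101000000/http://example.com/`.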
Regarding www.: note that http://www.archive.org/ redirects you to https://archive.org/. It seems they do not want you to use a subdomain, unless it's web. for the Wayback Machine. -- bender235 (talk) 18:53, 12 August 2016 (UTC)
[http://www.archive.org Internet Archive] → [[Internet Archive]]
[http://www.archive.org Archive.org] → [[Internet Archive]]
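The two mappings above turn an external link to the archive.org home page into an internal wikilink. A minimal sketch, assuming only these two exact link forms need handling; the names `HOMEPAGE_LINK` and `homepage_to_wikilink` are hypothetical. Links that point to a specific item (with a path after the host) are not matched.

```python
import re

# Hypothetical pattern matching exactly the two link forms listed above.
HOMEPAGE_LINK = re.compile(
    r"\[http://www\.archive\.org (?:Internet Archive|Archive\.org)\]"
)


def homepage_to_wikilink(wikitext: str) -> str:
    """Replace external links to the archive.org home page with [[Internet Archive]]."""
    return HOMEPAGE_LINK.sub("[[Internet Archive]]", wikitext)
```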
Approved, as discussed. One extra thing: I'd like you to change the edit summary to clarify that the only http->https transition being made is for archive.org. -- Earwig (talk) 20:16, 16 August 2016 (UTC)