This is an engineering document for those building bots needing help with URL formats, etc. See Help:Archiving a source for info about using these archive services.
List of known web archive services in use on English Wikipedia, sorted roughly by number of uses from most to least. The Wayback Machine accounts for about 80% of the total. Data initially compiled by User:GreenC as of March 2017. Updates and corrections welcome.
Archive.Today represents captured pages as a static snapshot, rendered by the Archive.Today server, and uses a fixed-width layout. Page resources such as JavaScript and CSS files are not retained separately. For example, styling from a separate CSS file is converted to inline CSS styling, embedded in the HTML source code.
Archived pages are initially served through their short URL format: an identifier of five case-sensitive alphanumeric characters, or four characters on early captures from 2012.
To obtain the long URL format with time stamp and the source URL, click "share" in the top menu or append "/share" to the URL. The full URL is listed in the window.
If a redirect page is saved, Archive.Today stores both the URL of the redirect page and the URL of the redirect target. The archived page can be found by entering either URL.
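For bot authors, the short URL format described above can be recognized with a simple pattern check. The following is a minimal sketch, not Archive.Today's own logic: the function names `short_id` and `share_url` are hypothetical, the host `archive.today` and the identifier `AbCd3` are only illustrations, and the check is purely syntactic (it does not confirm that a capture exists).

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Five case-sensitive alphanumeric characters, or four on early captures.
SHORT_ID_RE = re.compile(r"[A-Za-z0-9]{4,5}")


def short_id(url: str) -> Optional[str]:
    """Return the short identifier if the URL path is a bare short ID, else None."""
    path = urlsplit(url).path.strip("/")
    return path if SHORT_ID_RE.fullmatch(path) else None


def share_url(url: str) -> Optional[str]:
    """Append "/share" to a short URL; return None if the URL is not in short form."""
    ident = short_id(url)
    if ident is None:
        return None
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/{ident}/share"


if __name__ == "__main__":
    print(short_id("https://archive.today/AbCd3"))
    print(share_url("https://archive.today/AbCd3"))
```

Because the identifier is case-sensitive, a bot should store it exactly as found and never lowercase it during URL normalization.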
As of 2023, copies of Archive.org pages can only be saved once. This restriction applies to the digital library (archive.org/details/), whose pages are subject to change, as well as to the Wayback Machine, whose pages are, apart from infrequent exclusions, not subject to change anyway.
If a "Welcome to nginx!" page appears, it apparently means either that the user has hit a rate limit or that the site is undergoing maintenance.
.au
domain./pan/[0-9]{4,7}/
Ghost Archive uses the WARC ("webarchive") format to store saved pages, meaning the verbatim content of the page resources can be recreated. When a page is opened, Ghost Archive uses the Webrecorder system to simulate the page as realistically as possible. Alternatively, the page can be viewed in "noscript" mode, meaning as static HTML in its finished rendered state. This mode does not require JavaScript, is compatible with older browsers, and loads faster; however, some page features that rely on JavaScript, such as pagination and collapsible menus, are not available.
Due to Instagram's strict rate limiting, archival of Instagram profiles might fail and result in a blank page. If the archival of a YouTube video fails, an "Archiving error" page is displayed and the archival of the same video cannot be retried.
Along with YouTube videos, their metadata is saved: publication date, description, and the URL of the channel in the /@ or /c format, whichever is available.
If the archived page redirects to a different URL, only the target URL is displayed. This means the archived page cannot be opened by entering the URL of the redirecting page.
Similar to Archive.Today, Megalodon.jp represents archived pages as a static HTML snapshot. However, pictures are converted into Base64 data: URLs inside the resulting HTML, and there is no fixed-width layout as on Archive.Today.
Megalodon lets the user decide whether to save the desktop or mobile version of a page, meaning the version that appears to desktop computer and laptop users, or to smartphone users.
To check whether Megalodon has archived any copy of a particular URL, open https://megalodon.jp/(full URL) (example: https://megalodon.jp/https://gstreamer.freedesktop.org:443/download/ ). The http and https versions of a URL are treated separately.
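The lookup pattern above amounts to prefixing the full source URL with the Megalodon host. A minimal sketch follows, with hypothetical function names. Since http and https are treated separately, the second helper builds one lookup URL per scheme. It leaves any explicit port (such as the :443 in the example) untouched, so a caller would need to handle ports on its own. Whether an explicit port is required is not stated here, and the sketch does not add one.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit

MEGALODON_PREFIX = "https://megalodon.jp/"


def megalodon_lookup_url(source_url: str) -> str:
    """Build the lookup URL by prefixing the full source URL, scheme included."""
    return MEGALODON_PREFIX + source_url


def megalodon_lookup_both_schemes(source_url: str) -> List[str]:
    """Return lookup URLs for the http and https forms of the same source URL."""
    parts = urlsplit(source_url)
    return [
        megalodon_lookup_url(urlunsplit(parts._replace(scheme=scheme)))
        for scheme in ("http", "https")
    ]


if __name__ == "__main__":
    print(megalodon_lookup_url("https://gstreamer.freedesktop.org:443/download/"))
    for url in megalodon_lookup_both_schemes("http://example.com/page"):
        print(url)
```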
If the archived page is a redirect to a different URL, only the URL prior to the redirect is saved. In that case, the archived page cannot be opened by entering the target URL.
http://webarchive.proni.gov.uk/20111213123846/http
http://webarchive.proni.gov.uk/20100218151844/http://www.berr.gov.uk/
http://www.collectionscanada.gc.ca/webarchives/20061104084225/http://broadband.gc.ca/maps/province.html?prov=48
http://www.collectionscanada.gc.ca/archivesweb/20060209004933/http
http://www.collectionscanada.gc.ca/webarchives/20061104084225/http://broadband.gc.ca/maps/province.html?prov=48
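The complete example URLs above share a common shape: a 14-digit number followed by the full source URL. The sketch below splits such a URL into its two parts. It assumes the 14 digits are a YYYYMMDDhhmmss capture timestamp, which is not stated in this list, and the names `Snapshot` and `parse_snapshot_url` are hypothetical. Truncated forms that end in a bare "http" are deliberately not matched.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# A 14-digit path segment followed by an embedded http(s) source URL.
SNAPSHOT_RE = re.compile(r"/(\d{14})/(https?://.+)$")


class Snapshot(NamedTuple):
    captured: datetime
    source_url: str


def parse_snapshot_url(archive_url: str) -> Optional[Snapshot]:
    """Split an archive URL into capture time and source URL, or return None."""
    match = SNAPSHOT_RE.search(archive_url)
    if match is None:
        return None
    try:
        # Assumption: the digits encode YYYYMMDDhhmmss.
        captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    except ValueError:
        return None  # 14 digits that do not form a valid date and time
    return Snapshot(captured, match.group(2))


if __name__ == "__main__":
    print(parse_snapshot_url(
        "http://webarchive.proni.gov.uk/20100218151844/http://www.berr.gov.uk/"
    ))
```

The pattern is anchored on the digit segment, not on a host name, so the same function handles both the webarchive.proni.gov.uk and the collectionscanada.gc.ca examples.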
Appears to have poor coverage.
Only accessible through search results, not manually through a URL or search prefix.