Observation on archival sites: Archive.Today vs. Internet Archive
Some of my followers may have noted I've been archiving a number of older posts from my previous account of late.
In doing this, I've noticed a few things about Archive.Today (a/k/a Archive.Is) and the Internet Archive's Wayback Machine.
It turns out that Archive.Today is really convenient to invoke with DuckDuckGo (DDG) set as my default search engine: I simply select the page's URL in the navigation bar, prepend the "!ais" bang search (followed by a space) to the head of the URL, and hit return.
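As a rough sketch of what that bang search amounts to, the snippet below builds the equivalent DuckDuckGo query URL for a page. The "!ais" bang is the one described above; the `https://duckduckgo.com/?q=` query form and the helper name `ddg_bang_url` are assumptions for illustration only.

```python
from urllib.parse import urlencode


def ddg_bang_url(page_url: str, bang: str = "!ais") -> str:
    """Build a DuckDuckGo search URL that applies a bang to a page URL."""
    # The query is the bang, a space, then the page URL, just as it would
    # be typed into the navigation bar.
    query = f"{bang} {page_url}"
    # urlencode percent-encodes the query and turns the space into "+".
    return "https://duckduckgo.com/?" + urlencode({"q": query})


if __name__ == "__main__":
    print(ddg_bang_url("https://example.com/posts/1"))
```

Opening the resulting URL in a browser should behave like typing the bang by hand, assuming DDG still resolves "!ais" to Archive.Today.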
Archive.Today helpfully offers links for other potential archive sites, including the Internet Archive, so I don't have to independently call up that URL.
Archive.Today responds very quickly. It reports practically instantly whether or not the page is archived, and if not, the "save" form also pops up nearly instantly.
By contrast, the Internet Archive takes a few seconds to respond whether or not the page is archived, and a few further seconds when requesting a page be saved.
(Both sites have a two-stage submission. The Internet Archive does have a submission URL which should work in one fell swoop, though it occasionally breaks and error-detection is ... difficult.)
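For illustration, here is a minimal sketch of that one-step submission, assuming the commonly used `https://web.archive.org/save/<url>` form. The function names and the User-Agent string are hypothetical, and treating a 2xx status as "accepted" is my assumption rather than a documented guarantee: a success status does not prove the capture actually completed.

```python
import urllib.request

# ASSUMPTION: the Wayback Machine's one-step submission prefix.
SAVE_PREFIX = "https://web.archive.org/save/"


def ia_save_url(page_url: str) -> str:
    """Return the one-step submission URL for a page."""
    return SAVE_PREFIX + page_url


def submit_to_ia(page_url: str, timeout: float = 60.0) -> bool:
    """Request a capture; True only if the server answers with a 2xx status."""
    request = urllib.request.Request(
        ia_save_url(page_url),
        headers={"User-Agent": "archive-sketch/0.1"},  # hypothetical agent name
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

A `False` here only says the request visibly failed; a `True` still needs checking later, since the capture itself may be queued or dropped.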
Archive.Today's processing queue ranges from 0 to 10k or so slots.
The Internet Archive is currently reporting ~10 hours to process archival requests.
Archive.Today (AT) does include comments on Diaspora* posts. The Internet Archive (IA) does not.
My manual workflow has evolved to:
- Pull up the page and reload it in Diaspora* (otherwise cookies may not be current, forcing a log-out / log-in cycle, also annoying).
- Mark the post "tagged" to indicate it's been archived. I typically also "like" it to set a sharper visual indicator.
- Prepend '!ais ' to the navigation bar and hit <enter>.
- Open "Search in Internet Archive" in a new tab, then select that tab to get IA working on finding the post.
- Switch back to the Archive.Today tab and select save, then confirm. At that point the request is processing.
- Switch back to the Internet Archive tab, wait for the page to fully load, request archive, wait for that page to load, confirm, and wait for the request to return.
- Even after this stage, the IA request may still fail. Detecting this is ... difficult.
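One way the failed-request problem in that last step might be checked after the fact is to ask the Wayback Machine whether a sufficiently recent capture exists. The sketch below assumes the availability endpoint at `https://archive.org/wayback/available` and its `archived_snapshots` / `closest` JSON shape; the function names are hypothetical, and with a long processing queue a missing snapshot may only mean "not yet".

```python
import json
from typing import Optional
from urllib.parse import urlencode

# ASSUMPTION: the Wayback Machine availability endpoint and response shape.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(page_url: str) -> str:
    """Build the lookup URL for a page."""
    return AVAILABILITY_API + "?" + urlencode({"url": page_url})


def snapshot_timestamp(response_text: str) -> Optional[str]:
    """Return the closest snapshot's YYYYMMDDhhmmss timestamp, or None."""
    payload = json.loads(response_text)
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("timestamp")


def captured_since(response_text: str, not_before: str) -> bool:
    """True if a snapshot exists at or after `not_before` (same format)."""
    stamp = snapshot_timestamp(response_text)
    # Fixed-width digit strings compare correctly as plain strings.
    return stamp is not None and stamp >= not_before
```

Fetching `availability_url(...)` some hours after submitting and passing the body to `captured_since` with the submission time would flag posts that need another attempt.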
I may also save content from the original (JoindiasporaCom) address, though mostly I'm working through Glasswings. I have run an automated submission of all my posts from the take-out JSON archive, and will run that another time or so before final shutdown. That will at least preserve post content online, but not the comment threads :-(
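The automated run could look roughly like the sketch below, which only collects the post URLs to submit. This is not the script I actually used: the JSON layout (posts under `user` / `posts`, each with an `entity_data` object holding a `guid`), the `/posts/<guid>` URL pattern, and the function names are all assumptions that would need checking against a real take-out file.

```python
import json
from typing import Iterator, List


def post_urls(export: dict, pod_base: str) -> Iterator[str]:
    """Yield one post URL per entry in a parsed take-out archive."""
    # ASSUMPTION: posts live under export["user"]["posts"], and each has
    # an "entity_data" object carrying the post's "guid".
    for post in export.get("user", {}).get("posts", []):
        guid = post.get("entity_data", {}).get("guid")
        if guid:
            # ASSUMPTION: posts are reachable at <pod>/posts/<guid>.
            yield f"{pod_base.rstrip('/')}/posts/{guid}"


def load_post_urls(path: str, pod_base: str) -> List[str]:
    """Read a take-out JSON file and return every post URL found in it."""
    with open(path, encoding="utf-8") as handle:
        return list(post_urls(json.load(handle), pod_base))
```

Each URL in the resulting list can then be handed to whichever archive submission is in use, ideally with a pause between requests so as not to hammer either service.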
Hopefully this information is useful to others.
#Archival #WebArchival #ArchiveIs #ArchiveToday #InternetArchive #WaybackMachine