#webarchival

dredmorbius@diaspora.glasswings.com

Observation on archival sites: Archive.Today vs. Internet Archive

Some of my followers may have noticed that I've been archiving a number of older posts from my previous account of late.

In doing this, I've noticed a few things about Archive.Today (a/k/a Archive.Is) and the Internet Archive's Wayback Machine.

It turns out that Archive.Today is really convenient to invoke with DDG (DuckDuckGo) set as my default search engine: I highlight the URL in the navigation bar for a page, prepend the "!ais" bang search (followed by a space) to the head of the URL, and hit return.
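For anyone who would rather script the bang than type it, here is a minimal Python sketch. It assumes DuckDuckGo accepts the bang through its ordinary `?q=` search URL; the function name `bang_query_url` is made up for illustration.

```python
from urllib.parse import quote_plus

# Assumption: a bang typed in the address bar is equivalent to a normal
# DuckDuckGo search for "<bang> <url>" passed via the ?q= parameter.
DDG_SEARCH = "https://duckduckgo.com/?q="


def bang_query_url(page_url: str, bang: str = "!ais") -> str:
    """Build the search URL equivalent to typing '<bang> <page_url>'."""
    return DDG_SEARCH + quote_plus(f"{bang} {page_url}")


if __name__ == "__main__":
    # Prints a URL that can be opened in a browser tab.
    print(bang_query_url("https://example.com/posts/123"))
```

Opening the printed URL in a browser should land on the same Archive.Today lookup page as the manual bang.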

Archive.Today helpfully offers links for other potential archive sites, including the Internet Archive, so I don't have to independently call up that URL.

Archive.Today responds very quickly. There's a practically instant response that the page is or is not archived, and if not, the "save" form also pops up nearly instantly.

By contrast, the Internet Archive takes a few seconds to report whether or not the page is archived, and a few more seconds when a save is requested.

(Both sites have a two-stage submission. The Internet Archive does have a submission URL which should work in one fell swoop, though it occasionally breaks and error-detection is ... difficult.)
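A rough Python sketch of the one-step Internet Archive submission, assuming the submission URL in question is the Wayback Machine's `https://web.archive.org/save/<url>` form. The helper names are hypothetical, and a successful HTTP response here is not proof that a capture was actually stored.

```python
import urllib.error
import urllib.request
from typing import Tuple

# Assumption: the "submission URL" is the Wayback Machine save prefix.
WAYBACK_SAVE_PREFIX = "https://web.archive.org/save/"


def wayback_save_url(page_url: str) -> str:
    """Return the one-step save URL for a page."""
    return WAYBACK_SAVE_PREFIX + page_url


def submit_to_wayback(page_url: str, timeout: float = 60.0) -> Tuple[bool, str]:
    """Request a capture and return (ok, detail).

    ok=True only means the HTTP request itself did not fail.
    """
    try:
        with urllib.request.urlopen(wayback_save_url(page_url), timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, timeouts, and other network failures.
        return False, str(exc)
```

Returning the status detail rather than raising makes it easier to log failures when looping over many posts.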

Archive.Today's processing queue ranges from 0 to 10k or so slots.

The Internet Archive is currently reporting ~10 hours to process archival requests.

Archive.Today (AT) does include comments on Diaspora* posts. The Internet Archive (IA) does not.

My manual workflow has evolved to:

  • Pull up the page and reload it in Diaspora* (otherwise cookies may not be current, forcing a log-out / log-in cycle, which is also annoying).
  • Mark the post "tagged" to indicate it's been archived. I typically also "like" it to set a sharper visual indicator.
  • Prepend "!ais " (with the trailing space) to the URL in the navigation bar and hit <enter>.
  • Open "Search in Internet Archive" in a new tab, then select that tab to get IA working on finding the post.
  • Switch back to the Archive.Today tab and select save, then confirm. At that point the request is processing.
  • Switch back to the Internet Archive tab, wait for the page to fully load, request archive, wait for that page to load, confirm, and wait for the request to return.
  • Even after this stage, the IA request may still fail. Detecting this is ... difficult.
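One possible way to check after the fact whether an IA request succeeded is the Wayback Machine availability API (`https://archive.org/wayback/available?url=...`), which returns JSON describing the closest snapshot, if any. The sketch below is hedged: the function names are invented here, and with a long processing queue a fresh capture may not show up right away, so a negative result is not conclusive.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

AVAILABILITY_API = "https://archive.org/wayback/available"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest snapshot URL from an availability response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest") or {}
    if closest.get("available") and str(closest.get("status")) == "200":
        return closest.get("url")
    return None


def check_wayback(page_url: str, timeout: float = 30.0) -> Optional[str]:
    """Query the availability API for page_url (performs a network request)."""
    query = urlencode({"url": page_url})
    with urllib.request.urlopen(f"{AVAILABILITY_API}?{query}", timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

A `None` result for a post submitted hours earlier would be a hint to resubmit it.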

I may also save content from the original (JoindiasporaCom) address, though mostly I'm working through Glasswings. I have run an automated submission of all my posts from the take-out JSON archive, and will run it once or twice more before the final shutdown. That will at least preserve post content online, but not the comment threads :-(
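An automated pass over the take-out archive could start by turning the export into a list of post URLs. This is only a sketch under loud assumptions: it supposes the export JSON keeps posts under `user` → `posts`, each with an `entity_data.guid` field, and that posts are reachable at `https://<pod>/posts/<guid>`. Check both against your own export before relying on it. `post_urls` is a hypothetical helper, not part of any Diaspora* tooling.

```python
import json
from typing import List


def post_urls(export: dict, pod_host: str) -> List[str]:
    """Collect public post URLs from a parsed take-out export.

    ASSUMED layout: export["user"]["posts"][i]["entity_data"]["guid"].
    Entries without a guid are skipped.
    """
    posts = export.get("user", {}).get("posts", [])
    urls = []
    for post in posts:
        guid = post.get("entity_data", {}).get("guid")
        if guid:
            urls.append(f"https://{pod_host}/posts/{guid}")
    return urls


def load_export(path: str) -> dict:
    """Read the take-out JSON file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Each resulting URL can then be fed to whatever submission step is in use, ideally with a pause between requests to avoid hammering either archive.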

I hope this information is useful to others.

#Archival #WebArchival #ArchiveIs #ArchiveToday #InternetArchive #WaybackMachine