#archiving

danie10@squeet.me

Archiving Data On Plain Paper Using 2D Images

Section of an A4-size sheet of paper, showing some horizontal lines made up of 1 cm patterns of dots. At the bottom is printed in words TWIBRIGHT OPTAR 0-32-46-24-3-1-2-24
Optar or OPTical ARchiver is a project capable of squeezing a whopping 200 kB of data onto a single A4 sheet of paper, with writing and reading achieved with a standard laser printer and a scanner. It’s a bit harder than you might think to get that much data on the page, given that even a 600 DPI printer can’t reliably place every dot each time. Additionally, paper is rarely uniform at the microscopic scale, so Optar utilizes a forward error-correcting coding scheme to cater for a little irregularity in both printing and scanning.

Yes, 200 kB does not sound like much when you think of application files or images, but this could store a whole novel of text on two such pages (about 80,000 words). It is also ideal for accounting records or for files at a notary. It could save paper, yes (if you use a laptop to read a book from such pages), but it is also intended to prevent digital obsolescence, as paper is usually still readable over the longer term. The source code is known, and the hardware required is just a plain old laser printer (or similar) and a scanner.
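As a rough sanity check on the novel claim, here is a minimal Python sketch. The 200 kB per page and the 80,000 words come from the text above; the average of 6 bytes per word (about five letters plus a space) and the compression ratio are my own assumptions, not measured values.

```python
import math

PAGE_CAPACITY_BYTES = 200_000   # ~200 kB per A4 page, as quoted for Optar above
WORDS_IN_NOVEL = 80_000         # figure from the text above
BYTES_PER_WORD = 6              # assumption: ~5 letters plus a space, plain ASCII


def pages_needed(payload_bytes: int, page_capacity: int = PAGE_CAPACITY_BYTES) -> int:
    """Return how many pages are needed to hold payload_bytes."""
    return math.ceil(payload_bytes / page_capacity)


raw_bytes = WORDS_IN_NOVEL * BYTES_PER_WORD

# Assumption: general-purpose compression shrinks English text to roughly
# 40% of its original size; real ratios vary with the text and the tool.
ASSUMED_COMPRESSION_RATIO = 0.4
compressed_bytes = int(raw_bytes * ASSUMED_COMPRESSION_RATIO)

print(f"Uncompressed: {raw_bytes} bytes -> {pages_needed(raw_bytes)} page(s)")
print(f"Compressed:   {compressed_bytes} bytes -> {pages_needed(compressed_bytes)} page(s)")
```

Under these assumptions the raw text would slightly overflow two pages, so the two-page figure presumably relies on compressing the text before printing it.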

And of course, it can still be rolled up and attached to a carrier pigeon!

See hackaday.com/2024/09/15/archiv…
#Blog, #obsolescence, #archiving, #technology

danie10@squeet.me

Paperless-ngx is an open-source document management system that transforms your physical documents into a searchable online archive

A webpage showing a tiled layout with tiles arranged in horizontal rows, each one representing a document that has been scanned in. Each tile shows a thumbnail image of the document contents with titles below, tags, a creation date, etc. Down the left side is a menu of options such as Dashboard, Documents, Recently Added, Inbox, Correspondents, Tags, Document Types, Storage Paths, Custom Fields, Templates, Mail, Settings, etc. At the top is a search bar.
You can either scan or upload various document formats into Paperless-ngx.

It will organise and index your scanned documents with tags, correspondents, types, and more. Your data is stored locally on your server and is never transmitted or shared in any way. It performs OCR on your documents, adding searchable and selectable text, even to documents scanned with only images.

Documents are saved in PDF/A format, which is designed for long-term storage, alongside the unaltered originals. It uses machine learning (see, no AI) to automatically add tags, correspondents, and document types to your documents. It supports PDF documents, images, plain text files, Office documents (Word, Excel, PowerPoint, and LibreOffice equivalents) and more.

I installed this using the Docker Compose script file. I did notice, though, that to support Word, Excel, PowerPoint, and LibreOffice equivalents I also needed to install Tika and Gotenberg (I added them to the Docker Compose file).

It is not just limited to documents, though, as it will also connect via IMAP to an e-mail server and organise and archive your e-mails.

I’m testing it out a bit now and finding it useful for scanning in my numerous receipts, as the OCR will help find what I’m looking for later. I’m thinking of doing a video about it in a few weeks to show what it does, and does not, do.

See docs.paperless-ngx.com/
#Blog, #archiving, #documents, #opensource, #technology

danie10@squeet.me

What’s the Value of 3 Million LPs in a Digital World? Easy! They Can Still Be Played in 50+ Years’ Time!

Tall library shelves with vinyl records stacked on them. A man is seen walking between the rows of shelves.
The ARChive of Contemporary Music has one of the largest collections of vinyl records in the world and is in danger of losing its home. Its champions are making a case for the future of physical media.

If someplace like a university starts a digitization program for someone’s papers or recorded work, they might end that work when a grant or allotted funds run out. At that point, George says, you have to worry about not just where that material goes, but also how you might be able to play it in the future.

Vinyl records are likely to always be playable, but as tech companies come and go, access to a lot of digital archives can feel precarious. “We joke with the people at the Internet Archive about who’s going to last longer, and we’re all pretty sure it’s us,” George quips. “If you’ve got a bicycle wheel, a rubber band, a bundle of sewing needles, and a cone of paper, you’ll always be able to play an LP, but you can’t make a chip at home.”

The ARC has given itself until Valentine’s Day to come up with the additional funds it needs for a new space. Though no one has come through yet, the group has solicited everyone from Quincy Jones to Discogs. “There’s interest, but no one’s actually said yes,” George says.

This is also where copyright, which some love to invoke to protect rights, may end up losing us a lot of the music that is created today. Digital needs to be backed up, transformed, replicated elsewhere, etc., continuously to protect it. Just look at the flak that Google had around scanning of books. And even Google loses interest at some point, and that repository of creative works is gone in the blink of an eye. Storage space costs money in the long term, whether physical or digital, but I’d venture to say digital can cost more with its required refreshing, transforming to new types of media (storage and players), backups, etc.

Digital represents here-and-now convenience, but it is really not an effective long-term archive for mass storage of creative works.

Our generation probably needs to lose a large amount of its memories before the world wakes up to the fact that digital photos, books, music, etc. are risky to keep in a digital-only format for long-term archiving. That encrypted hard drive at home sounds like a great thing for its owner to have, but what does it mean to that person’s children or grandchildren one day when it holds the family photos, recipes, scanned documents, etc., and none of it can ever be accessed by anyone?

See https://www.wired.com/story/archive-of-contemporary-music-save-3-million-records-digital-streaming/
#Blog, #archiving, #music, #technology, #vinyl

danie10@squeet.me

Well Documented Code Helps Revive Decades-Old Commodore Project: Moral Of The Story, If You Want To Keep It, Print It On Paper

Luckily, Stephen’s younger self went to some extremes documenting the project, starting with a map he created which was inspired by Dungeons and Dragons. There are printed notes from a Commodore 64 printer, including all the assembly instructions, augmented with his handwritten notes to explain how everything worked. He also has handwritten notes including character set plans, disk sector use plans, menus, player commands, character stats and equipment, all saved on paper. The early code was written using a machine language monitor, since [Stephen] didn’t know about the existence of assemblers at the time. Eventually he discovered them, attempted to rebuild the code on a Commodore 128 and then an Amiga, but never got everything working together. There is some working code still on floppy disk, but a lot of it doesn’t work together either.

It goes to show again the value of printing on paper for long-term storage. I experienced the same thing when I wanted to resuscitate a program I’d written on the HP-41CV about 35 years ago. I’d also kept all my handwritten notes, and with a few fixes, I got it going again, and I published the code on GitHub. I also remember having printed out and filed all the Clipper source code that I worked on in my professional career in the 1990s, more because we filed everything than because I was specifically thinking about the longevity of it being accessible.

See https://hackaday.com/2023/06/07/well-documented-code-helps-revive-decades-old-commodore-project/

#Blog, #archiving, #commodore64, #gaming, #technology

danie10@squeet.me

What’s the Best Way to Store Data for Decades or Centuries? Bottom line: No Technology that is Easy or Practical

SSD drives and RAM
The concern keeps coming up (I’ve also been pondering it a lot, and posted last week about what I’ve been doing).

This linked article does sum up the essentials very well, and this helps illustrate why this is a challenge for 20 or 60 years or especially longer:

  1. Media – most people just worry about this and how it may degrade.
  2. Software drivers – you need the software to interface with whatever the hardware device is AND to decode / display the information. That software must also execute on whatever operating system you or your family are using in 60 or 100 years’ time… So, there are three dimensions to consider on the software side alone, and they must ALL be satisfied. This is also why open standards (e.g. PDF or ASCII text) are best, as there is more chance of being able to read them in future. Do you think MS Word 2007 will actually open on something in 2123?
  3. Hardware device – the media needs to fit into a device and be read by it. Do you still have a single-sided floppy drive, for example? Can it connect to your current laptop? No, likely not.

So, we can see now why archivists still like old-fashioned paper, as it survives 400+ years and can be read by almost anyone (very old languages are a challenge). Microfilm, printed photos, etc. are all media that can be understood without specialist hardware or software.

What the article states about cloud storage is quite true. I’ve said many times that, despite this generation generating more information and data than any preceding it, it is also going to lose massive chunks of it. A generation or two of memories will be lost for many families, too. Ten or twenty years of photographs will just be lost in the twinkling of an eye off a personal hard drive, or 40 GB sitting in a Google Drive will be lost in 40 years’ time as someone forgot to log in and keep the account active.

Any refreshing of online data I’m doing, I’m doing for myself. If I want my family to have any of my memories, I print those digital photos to paper albums, and I print my story out on paper. That’s all I can trust. Keeping GB of it in Google Drive or e-mailing it to someone… no, that won’t be around in 60 or 100 years’ time. I’d bet on it, but I probably won’t be here to collect the money.

See https://www.howtogeek.com/858426/whats-the-best-way-to-store-data-for-decades-or-centuries/
#Blog, #archiving, #technology

danie10@squeet.me

With around 180,000 people dying daily, I found more analogue ways of preserving my memories

Reading an article about this published two days back in The Technocrat, MIT Technology Review, got me thinking yet again about this topic (as you do when you are past your 40s or 50s) and what I’ve gravitated towards in order to preserve some memories. Too many just think about backups, but backups are for our own convenience, to recover our lost data now or tomorrow.

In case you have not realised it, all those thousands of e-mails, photos, documents, etc you’ve saved online with your 100 GB or more cloud storage plan, are all wiped out in a split second when the service provider goes out of business one day, or your dormant account is just removed a year or two after your demise. Downloading it all to your PC hard drive? Well, who knows what to find where on your 1 TB or more of storage on your drive (if you did not encrypt it). Hard drives get stolen or damaged just as easily.

I’ve spent the last year sifting through thousands of old family photos, going back 100+ years, and these actually led me to one of my first solutions below…

  1. Every year, select some of your best photos – then use an online service to have them printed into a hard cover photo album, along with captions etc. These old analogue albums do stand the test of time and remain accessible to relatives.
  2. My blog posts are all automatically archived from my WordPress site (via a plugin) to The Internet Archive, along with my static web pages.
  3. I’m using Google Docs to document my own life story, and have shared that with my daughter, so she can follow it as it develops.

As we get cleverer and cleverer with all sorts of new technology, social networks, AI, etc., we also realise that all of that is often very fragile when it comes to longevity and archival periods that can stretch over hundreds of years. But what is of concrete importance is that if you don’t give some thought to it and take some action, your memories will just be wiped out.

If you have any ideas that can also add any practical value to this topic, please comment below, and I’ll add them to my original post.
#Blog, #archiving, #digitalmemories, #memories, #technology

danie10@squeet.me

Android is getting App Auto-Archiving, a bit like iOS has, but not manually controllable

App store screen showing a pop up message offering to enable auto-archiving of apps
Similar to iOS, it will retain the icon and user data and remove the application itself to save about 60% storage space (and also not run anything as a background service). This is different from Deep Sleep like Samsung has, which just freezes the app’s processes.

So, it achieves the same aim, but Google is implementing it differently to iOS in that you opt into this when running out of storage space, and it is not executed on a per-app basis by user decision like on iOS. The iOS way gives you more control over the process, and I’d prefer it that way, instead of basically running out of space before this kicks in.

But there is zero reason why Google cannot also make it a manually selectable process if they really wanted to… And of course, Google also has this other proviso to make it a bit more obscure: “Auto-archive is only available for developers using the App Bundle to publish their apps”.

As usual, Google could have done a far better job of making this work the way users would actually like to use it.

See https://android-developers.googleblog.com/2023/04/reduce-uninstalls-for-your-app-with-auto-archive.html
#Blog, #android, #archiving, #storage, #technology

danie10@squeet.me

The Internet Is Rotting - Too much has been lost already. The glue that holds humanity’s knowledge together is coming undone, and it's affecting court cases and more

As far back as 2001, a team at Princeton University studied the persistence of web references in scientific articles, finding that the raw number of URLs contained in academic articles was increasing but that many of the links were broken, including 53 percent of those in the articles they had collected from 1994. Thirteen years later, six researchers created a data set of more than 3.5 million scholarly articles about science, technology, and medicine, and determined that one in five no longer points to its originally intended source. In 2016, an analysis with the same data set found that 75 percent of all references had drifted.

Deletion isn’t the only issue. Not only can information be removed, but it also can be changed. Before the advent of the internet, it would have been futile to try to change the contents of a book after it had been long published.

So yes, it is a very real problem, and due to the decentralised nature of the Internet, sites, blogs, books, and even government sites get deleted and changed. On the site/hosting side this will not change, so right now our hope really lies in the Internet Archive's Wayback Machine (and that it keeps getting funding) and efforts like Amberlink, as it is unlikely that any legislation will change this reality. The fact is, we are doomed to lose a lot of human knowledge through the Internet, at least for now.
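For anyone wanting to check whether a link they rely on has a Wayback Machine copy, here is a minimal Python sketch using the Internet Archive's public availability endpoint. The function names are my own, and the response layout (an `archived_snapshots` object with a `closest` entry) is how I understand the service to reply, so treat it as an assumption rather than a guarantee.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Public availability endpoint of the Internet Archive's Wayback Machine.
WAYBACK_API = "https://archive.org/wayback/available"


def build_query(url: str) -> str:
    """Build the availability-lookup URL for a given page address."""
    return WAYBACK_API + "?" + urllib.parse.urlencode({"url": url})


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the archived snapshot URL from a parsed response, or None."""
    # Assumed layout: {"archived_snapshots": {"closest": {"available": ..., "url": ...}}}
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url: str, timeout: float = 10.0) -> Optional[str]:
    """Ask the Wayback Machine for a snapshot of url (needs network access)."""
    with urllib.request.urlopen(build_query(url), timeout=timeout) as reply:
        return closest_snapshot(json.load(reply))
```

Calling `lookup()` with a page address returns the address of the closest archived copy, or `None` when no snapshot is reported. It only reads what is already archived; it does not save anything new.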

See The Internet Is Rotting

#technology #archiving #knowledge #internetarchive #waybackmachine


https://gadgeteer.co.za/internet-rotting-too-much-has-been-lost-already-glue-holds-humanitys-knowledge-together-coming