I've got files going back to 1991. They started on floppy and moved to various formats like hard drives, QIC-80 tape, PD optical media, CD-R, DVD-R, and now back to hard drives.
I don't depend on any media format, tape included, lasting forever. New LTO tape drives are so expensive, and used drives only support smaller-capacity tapes, so I stick with hard drives.
3-2-1 backup strategy: 3 copies, with 1 offsite.
Verify all the files by checksum twice a year.
You can overcomplicate it if you want, but once you script things it's just a couple of commands once a week.
I have some going back to my first days with computers (~1997), but it's purely luck. I've certainly lost more files since then than I've kept.
Does that tear me up? Not one bit. And I guess that's the reason why people aren't clamouring for archival storage. We can deal with loss. It's a normal part of life.
It's nice when we do have old pictures etc., but maybe they're only nice because they're rare. If you could readily drop into archives and look at poorly lit pictures of people doing mundane things 50 years ago, how often would you do it?
I'm reminded of something one of my school teachers recognised 20+ years ago: you'd watch your favourite film every time it was on TV, but once you get it on DVD you never watch it again.
I think in general we find it very difficult to value things without scarcity. But maybe we just have to think about things differently. Food is no longer valued for its scarcity. Instead I consider each meal valuable because I enjoy it, but can only afford to eat two meals a day if I want to remain in shape. I struggle to think of an analogy for post-scarcity data, though.
What is your process for automating this checksum twice a year? Does it give you a text file dump with the absolute paths of all files that fail checksum for inspection? How often does this failure happen for you?
All my drives are Linux ext4 and I just run this program on every file in a for loop. It calculates a checksum and stores it along with a timestamp as extended attribute metadata. Run it again and it compares the values and reports if something changed.
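Roughly, the idea looks like this (a simplified sketch, not the exact tool: it assumes Linux with user xattrs enabled, uses SHA-256 plus the file's mtime, and the attribute names are illustrative). Keeping the mtime alongside the checksum is what lets it tell bitrot apart from a file I actually edited.

    #!/usr/bin/env python3
    # Simplified sketch of checksum-in-xattrs verification (not the exact tool).
    # Assumes Linux with user xattr support; attribute names are illustrative.
    import hashlib
    import os
    import sys

    SUM_ATTR = "user.sha256"         # stored checksum (illustrative name)
    TIME_ATTR = "user.sha256.mtime"  # file mtime when the checksum was taken

    def sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def verify(path):
        if not os.path.isfile(path):
            return
        current_sum = sha256(path)
        current_mtime = str(os.stat(path).st_mtime)
        try:
            stored_sum = os.getxattr(path, SUM_ATTR).decode()
            stored_mtime = os.getxattr(path, TIME_ATTR).decode()
        except OSError:
            # First run: record checksum and timestamp as extended attributes.
            os.setxattr(path, SUM_ATTR, current_sum.encode())
            os.setxattr(path, TIME_ATTR, current_mtime.encode())
            return
        if current_sum != stored_sum and current_mtime == stored_mtime:
            # Content changed but mtime did not: likely silent corruption.
            print(f"FAILED: {path}", file=sys.stderr)
        elif current_mtime != stored_mtime:
            # File was legitimately modified; refresh the stored values.
            os.setxattr(path, SUM_ATTR, current_sum.encode())
            os.setxattr(path, TIME_ATTR, current_mtime.encode())

    if __name__ == "__main__":
        for root, _, files in os.walk(sys.argv[1]):
            for name in files:
                verify(os.path.join(root, name))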
These days I would suggest people start with ZFS or Btrfs, which have checksums and scrubbing built in.
Over 400TB of data, I get a single failed checksum about every 2 years. So I get a file name and the fact that it failed, but since I have 3 copies of every file, I check the other 2 copies and overwrite the bad one. This is after verifying that the hard drive's SMART data shows no errors.
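The repair step is nothing fancy either; conceptually it's just a majority vote across the three copies, something like this hypothetical sketch (the paths are made up):

    #!/usr/bin/env python3
    # Hypothetical repair sketch: majority-vote across three copies and
    # overwrite the copy whose checksum disagrees. Paths are made up.
    import hashlib
    import shutil

    COPIES = [
        "/mnt/primary/photos/img_0001.jpg",   # illustrative paths only
        "/mnt/mirror/photos/img_0001.jpg",
        "/mnt/offsite/photos/img_0001.jpg",
    ]

    def sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    sums = {path: sha256(path) for path in COPIES}
    # The checksum that at least two of the three copies agree on.
    majority = max(set(sums.values()), key=list(sums.values()).count)
    good = next(p for p, s in sums.items() if s == majority)

    for path, digest in sums.items():
        if digest != majority:
            print(f"restoring {path} from {good}")
            shutil.copy2(good, path)  # overwrite the bad copy with a good one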
> What is your process for automating this checksum twice a year?
Backup programs usually do that as a standard feature. Borg, for example, can do a simple checksum verification (for protection against bitrot) or a full repository verification (for protection against malicious modification).