As counterintuitive as it may be, it seems to me like the only reliable long-term storage for data is with commercial cloud providers.
Any time you're physically warehousing old hard drives and whatnot, they're going to be turning into bricks.
Whereas with cloud providers, they're keeping highly redundant copies and every time a hard drive fails, data gets copied to another one. And you can achieve extreme redundancy and guard against engineering errors by archiving data simultaneously with two cloud providers.
Is there any situation where it makes sense to be physically hosting backups yourself, for long-term archival purposes? Purely from the perspective of preserving data, it seems worse in every way.
^ This. Physical media is continuously degrading. Large storage systems work by regularly reading, verifying, and replicating data - they are always doing backups and restores. If this isn't happening actively and regularly, your data will cease to exist at some point.
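To make the "read, verify, replicate" loop concrete, here is a minimal sketch of a scrub pass in Python. It is not how any particular provider does it; the `scrub` function, the in-memory `replicas` dict, and the assumption that a known-good SHA-256 digest was recorded at write time are all made up for illustration.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a blob."""
    return hashlib.sha256(data).hexdigest()


def scrub(replicas: Dict[str, bytes], expected_digest: str) -> List[str]:
    """Check every replica against the recorded digest and repair bad ones.

    `replicas` maps a (hypothetical) location name to the bytes stored there.
    Corrupted replicas are overwritten in place from the first intact copy.
    Returns the names of the replicas that had to be repaired.
    """
    good = [name for name, blob in replicas.items()
            if sha256_hex(blob) == expected_digest]
    if not good:
        # Every copy has rotted: this is the "data ceases to exist" case.
        raise RuntimeError("no intact replica left; data is unrecoverable")

    source = replicas[good[0]]
    repaired = []
    for name in replicas:
        if name not in good:
            replicas[name] = source  # re-replicate from a verified copy
            repaired.append(name)
    return repaired
```

Run on a schedule, something like this catches a bad copy while a good one still exists. A drive sitting in a warehouse gets no such pass, so the first time you learn about the rot is when you try to restore.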
Whether we collectively need to store all these things is another question entirely. But if we want to keep them - we'll have to do the work to keep them maintained.
> Two cloud providers pretty much guarantees against that -- the idea that two would terminate it simultaneously is vanishingly small.
Ask Julian Assange about that. Sure, the US government claimed he had committed a crime, but he disagreed. You really need to store your data in at least two of (a) NATO and Western-aligned countries (b) Russia and aligned countries (c) Mainland China, and that’s assuming the prior probability of you being at risk in those blocs is low. It’s hard to avail yourself of this if you’re a legitimate company, but plausible if you’re a private citizen.
SLA numbers don't say anything about your particular piece of data, just all customer data in aggregate.
> The idea that two would terminate it simultaneously is vanishingly small.
Au contraire, if you become sanctioned or illegal, both would necessarily have to terminate simultaneously, if the cloud providers want to comply with local laws.
You're not using a Chinese or Iranian cloud, are you?
Cloud provider stats are based on aggregate numbers. If they claim 99.999% of all data is retained, and they have 100,000 TB of data collectively, then they can lose your entire 1 TB and still claim 99.999% retention, as long as they don't lose anyone else's data.
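The arithmetic is easy to check. A quick sketch using the numbers from the comment above (the `aggregate_retention` helper is just a name invented for this example):

```python
from fractions import Fraction


def aggregate_retention(total_tb: int, lost_tb: int) -> Fraction:
    """Fraction of stored data still retained after losing `lost_tb`."""
    return Fraction(total_tb - lost_tb, total_tb)


total_tb = 100_000                    # everything the provider stores
claimed = Fraction(99_999, 100_000)   # the advertised 99.999%

# The provider loses exactly one customer's 1 TB and nothing else.
provider_view = aggregate_retention(total_tb, lost_tb=1)
customer_view = aggregate_retention(1, lost_tb=1)

meets_claim = provider_view >= claimed  # aggregate number is still honest
```

Here `provider_view` comes out to exactly 99.999% while `customer_view` is 0, which is the whole point: the aggregate figure by itself tells you nothing about how losses are spread across customers.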
But in practice everybody's data is widely distributed.
So I'm not sure what kind of event you're imagining that would take down one customer's data and no one else's.
It's not an issue. Whereas if you're warehousing all your data in a single physical location, it's vastly more likely for it to be fully destroyed due to a fire/earthquake/flood/etc.