I've recently been developing a Raspberry Pi-based solution that works with telemetry logs. The first implementation used an SQLite database (with WAL) – only to find it corrupted after just a couple of days of extensive power on/off cycles.
I've since started looking at Parquet files – which turned out not to be friendly to append-only operations. I ended up writing events into IPC files which then periodically get "flushed" into Parquet files. It works and it's efficient – but man, is it non-trivial to implement properly!
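For anyone curious, the flush pattern looks roughly like this (a minimal sketch assuming the IPC files are Arrow IPC stream files and pyarrow is available; the schema, column names, and filenames are made up, and it skips the hard parts like recovering truncated streams and fsync/atomic-rename discipline):

```python
import pyarrow as pa
import pyarrow.ipc as ipc
import pyarrow.parquet as pq

schema = pa.schema([("ts", pa.int64()), ("value", pa.float64())])

def append_events(writer, ts, values):
    # The IPC stream format frames each record batch independently,
    # so appending new events is just writing another batch.
    writer.write_batch(pa.RecordBatch.from_pydict({"ts": ts, "value": values}, schema=schema))

def flush_to_parquet(ipc_path, parquet_path):
    # Periodic "flush": read the IPC log back and rewrite it as Parquet.
    # Parquet needs its footer written last, so it can't be grown in place.
    with open(ipc_path, "rb") as f:
        table = ipc.open_stream(f).read_all()
    pq.write_table(table, parquet_path)

with open("events.arrows", "wb") as f, ipc.new_stream(f, schema) as writer:
    append_events(writer, [1700000000000], [0.5])
    append_events(writer, [1700000000100], [0.7])

flush_to_parquet("events.arrows", "events.parquet")
```

The two-stage dance exists because the IPC stream is append-friendly while a Parquet file has to be written in one go.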
My point here is: for a regular developer, CSV (or JSONL) is still king.
> The first implementation used an SQLite database (with WAL) – only to find it corrupted after just a couple of days of extensive power on/off cycles.
Did you try setting `PRAGMA synchronous=FULL` on your connection? This forces fsync() after writes.
That should be all that's required if you're using an NVMe SSD.
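For reference, in Python's sqlite3 it's just a pragma on the connection (sketch only; the filename and table are hypothetical):

```python
import sqlite3

conn = sqlite3.connect("telemetry.db")
conn.execute("PRAGMA journal_mode=WAL;")   # keep WAL if that's what you're already using
conn.execute("PRAGMA synchronous=FULL;")   # in WAL mode, FULL syncs the WAL on every commit, not just at checkpoints
conn.execute("CREATE TABLE IF NOT EXISTS events (ts INTEGER, value REAL)")
conn.execute("INSERT INTO events VALUES (?, ?)", (1700000000, 0.5))
conn.commit()   # durability still depends on the storage actually honoring fsync()
```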
But I believe most microSD cards do not even respect fsync() calls properly, so there's technically no way to handle power-offs safely, regardless of what software you use.
I use SanDisk High Endurance SD cards because I believe (but have not fully tested) that they handle fsync() properly. But I think you have to buy "industrial" SD cards to get real power fail protection.
The Raspberry Pi uses a microSD card. Calling fsync() after every write would be devastating for throughput, but batching commits might have worked fine in this case. Anyway, too late to check now.
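A batched-commit approach might look something like this (a rough sketch using Python's sqlite3; the table name and thresholds are made up). The trade-off is losing up to one batch of samples on power loss:

```python
import sqlite3
import time

conn = sqlite3.connect("telemetry.db")
conn.execute("PRAGMA journal_mode=WAL;")
conn.execute("PRAGMA synchronous=FULL;")
conn.execute("CREATE TABLE IF NOT EXISTS events (ts REAL, value REAL)")

pending = []
last_commit = time.monotonic()

def record(ts, value, max_rows=256, max_age_s=5.0):
    # Buffer rows in memory and commit once per batch, so the fsync() cost
    # is paid per batch instead of per insert.
    global last_commit
    pending.append((ts, value))
    if len(pending) >= max_rows or time.monotonic() - last_commit >= max_age_s:
        conn.executemany("INSERT INTO events VALUES (?, ?)", pending)
        conn.commit()
        pending.clear()
        last_commit = time.monotonic()
```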
There's definitely a place for it. I ran into the same problem with a battery-powered event logger. Basically, it alternates between sleep-until-event and sample-until-event-over.
SQLite was fine until the realities of that environment hit.
0) I need to save the most data over time, and my power budget is unpredictable due to environmental conditions.
1) When should I commit? Committing per insert in SQLite is slow, impacts battery life, and limits the sample rate. Practically, you could get away with batching all data over a small period.
2) SQLite is slow to repair databases. A partially written file would often take longer to repair than we had battery left to run.
A CSV-based format filled that niche. The first column was the line's column count, to support firmware upgrades. The last column was a line checksum. Another column indicated whether the line was the last one for an event. The parser skips corrupted lines/entries.
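A rough reconstruction of that line format, for anyone who wants to borrow the idea (the exact column layout and the choice of CRC32 are my guesses, not the original format):

```python
import zlib

def encode_line(fields, last_for_event):
    # Layout guess: [column count, payload..., end-of-event flag, CRC32 of everything before it]
    cols = [str(len(fields) + 3)] + [str(f) for f in fields] + ["1" if last_for_event else "0"]
    body = ",".join(cols)
    return f"{body},{zlib.crc32(body.encode()):08x}\n"

def decode_line(line):
    # Returns the payload fields, or None so the caller can skip a corrupted/torn line.
    body, _, checksum = line.rstrip("\n").rpartition(",")
    if not body or f"{zlib.crc32(body.encode()):08x}" != checksum:
        return None
    cols = body.split(",")
    if not cols[0].isdigit() or int(cols[0]) != len(cols) + 1:
        return None  # column count mismatch, e.g. a line written by different firmware
    return cols[1:-1]
```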
> I've since started looking at Parquet files – which turned out not to be friendly to append-only operations. I ended up writing events into IPC files which then periodically get "flushed" into Parquet files. It works and it's efficient – but man, is it non-trivial to implement properly!
I think the industry standard for supporting this is something like Iceberg or Delta Lake. It's not very lightweight, but if you're doing anything non-trivial, it's the next logical move.
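For example, with the delta-rs Python bindings (the deltalake package), appends look roughly like this (the path and columns are hypothetical):

```python
import pyarrow as pa
from deltalake import write_deltalake  # delta-rs Python bindings

# Hypothetical batch of events; in practice this comes from the sampling loop.
batch = pa.table({"ts": [1700000000000, 1700000000100], "value": [0.5, 0.7]})

# Each call adds a new Parquet data file plus a transaction-log entry,
# so "appending to Parquet" is handled by the table format rather than by hand.
write_deltalake("/data/telemetry_delta", batch, mode="append")
```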