ncruces's comments | Hacker News

Claims kill this, IMO.

Unless you have a single "reader", don't mind the delay, and don't worry about redoing a bunch of notifications after a crash (and so can delay claims significantly), concurrency will kill this.


I wrote a simple queue implementation after reading the Turbopuffer blog on queues over S3. In my implementation, I wrote complete SQLite files to S3 on every enqueue/dequeue/ack, using the previous ETag for compare-and-set.

The experiment and back-of-the-envelope calculations show that it can only support ~5 jobs/sec. The only major lever for increasing throughput is making group commits larger.

I don't think shipping CDC instead of whole SQLite files would change the calculations, as the number of writes is what mattered in this experiment.

So yes: at a minimum of 3 writes per job, this can only support very low throughput.
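The ETag-gated write described above boils down to a compare-and-set loop: read the current object and its ETag, produce a new body, and PUT it only if the ETag hasn't changed. A minimal sketch of that loop, with an in-memory stub standing in for S3 (all names here are made up for illustration, not a real client API):

```python
import itertools


class FakeBucket:
    """In-memory stand-in for an S3 object with conditional PUT (If-Match on ETag)."""

    def __init__(self):
        self.data = None
        self.etag = None
        self._tags = itertools.count(1)

    def get(self):
        return self.data, self.etag

    def put_if_match(self, body, expected_etag):
        # Succeeds only if the caller saw the latest ETag (compare-and-set).
        if self.etag != expected_etag:
            return None  # precondition failed: someone else wrote first
        self.data = body
        self.etag = str(next(self._tags))
        return self.etag


def cas_update(bucket, mutate, retries=10):
    # Read-modify-write loop: retry until our conditional PUT wins.
    for _ in range(retries):
        body, etag = bucket.get()
        if bucket.put_if_match(mutate(body), etag) is not None:
            return True
    return False


bucket = FakeBucket()
cas_update(bucket, lambda _: b"job-1 enqueued")
```

Every enqueue/dequeue/ack is one round of this loop, and concurrent writers serialize on it, which is where the single-digit jobs/sec ceiling comes from.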


Exactly. Then you're building distributed locking, and it's probably time for a different tool.

Probably missing something, but why is `stat(2)` better than `PRAGMA data_version`?

https://sqlite.org/pragma.html#pragma_data_version

Or for a C API that's even better, `SQLITE_FCNTL_DATA_VERSION`:

https://sqlite.org/c3ref/c_fcntl_begin_atomic_write.html#sql...
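For anyone who wants to see the pragma's cross-connection behavior, here's a quick sketch using Python's built-in sqlite3 module (the C-level SQLITE_FCNTL_DATA_VERSION isn't reachable from there, so this only demonstrates the pragma):

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file, both in autocommit mode.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
a = sqlite3.connect(path, isolation_level=None)
b = sqlite3.connect(path, isolation_level=None)
a.execute("CREATE TABLE t(x)")


def data_version(conn):
    return conn.execute("PRAGMA data_version").fetchone()[0]


va = data_version(a)
vb = data_version(b)
a.execute("INSERT INTO t VALUES (1)")  # commit from connection a

assert data_version(a) == va  # a's own commit doesn't bump a's counter
assert data_version(b) != vb  # but b observes the change
```

That matches the docs: the value changes only when *another* connection commits, which is exactly what you want for change detection.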


Yeah the C API seems like a perfect fit for this use-case:

> [SQLITE_FCNTL_DATA_VERSION] is the only mechanism to detect changes that happen either internally or externally and that are associated with a particular attached database.

Another user in this thread says the stat(2) approach takes less than 1 μs per call on their hardware.

I wonder how these approaches compare across compatibility & performance metrics.


I just tested this out. PRAGMA data_version uses a shared counter that any connection can use while the C API appears to use a per-connection counter that does not see other connections' commits.

Really? That's the opposite of what I understand the docs say.

Care to share your code? This may become a bug report.


Reporting back. This appears to be a bug in my original test, the code of which, sadly, I did not commit anywhere. I went back, regenerated the tests, and proved the opposite: the C API works across connections after all. I'm going to update my earlier comment, as I've now verified across dozens of SQLite versions that the per-connection behavior I described is not in fact the case.

Will do

For one it seems to be deprecated.

It's not.

You are correct. I apologize. I seem to have read the next pragma's deprecation notice!

Aside from this, SQLite has tons of cool features, like the session extension.


Yep, definitely still in use. Do y'all above have an opinion on whether the pragma is better than the syscall? What are the trade-offs there? Another comment thread mentioned this as well and pointed to io_uring. I was thinking that disk spam is worse than syscall spam.

Depends on what you mean by "better".

I may be wrong, but I think you wrote somewhere that you're watching the WAL size increase to know whether something was committed. Well, the WAL can be truncated; what then? Or even, however unlikely: it could be truncated, and then a transaction comes along and appends just enough to make it the same size again.

If SQLite has an API it guarantees can notify you of changes, that seems better, in the sense that you're passing responsibility along to the experts. It should also work in rollback mode, another advantage. And I don't think it wakes you up if a large transaction rolls back (a transaction can hit the WAL and never commit).

That said, I'm not sure what's lighter on average. For a WAL mode database, I will say that something that has knowledge of the WAL index could potentially be cheaper? That file is mmapped. The syscalls involved are file locks, if any.
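The truncation caveat is easy to demonstrate: a checkpoint can shrink the WAL back to zero bytes with no new commit, so a watcher keyed on file size alone can be fooled. A quick Python sketch:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path, isolation_level=None)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE t(x)")
conn.execute("INSERT INTO t VALUES (1)")

wal = path + "-wal"
grown = os.path.getsize(wal)            # commits appended frames to the WAL
conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
shrunk = os.path.getsize(wal)           # the checkpoint reset it to zero bytes

assert grown > 0
assert shrunk == 0
```

A size-based poller that recorded the pre-checkpoint size would now wake up on the *next* commit only because the size differs, and would miss the pathological case where new commits grow the WAL back to exactly the old size.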


Interesting, thank you for the response and explanation. Honker workers/listeners are holding an open connection anyway. I do trust SQLite's guarantees more than cross-platform syscall behavior. I will explore the C API angle.

Only if the barrier to entry is low.

Which it won't be, if at every turn you choose the hyperscaler.


Similar rules in Europe, or at least, Portugal.

Same in Portugal.

I think Haiku is fine (e.g.) for any task that you could almost, but not quite, complete with (regex?) find and replace.

You give it 3 examples of the change you want, then ask it to do the other 87. You'll end up saving time and “money”.


The value add for me is that I can use the web UI to start chatting about and drafting stuff on my phone while I'm commuting to work.

I agree with this.

Over the past month or so I implemented a project from scratch that would've taken me many months without an LLM.

I iterated at my own pace, I know how things are built, it's a foundation I can build on.

I've had a lot more trouble reviewing similarly sized PRs (some implementing the same feature) on other projects I maintain. I made a huge effort to properly review and accept a smaller one because the contributor went the extra mile, and made every possible effort to make things easier on us. I rejected outright - and noisily - all the low effort PRs. I voted to accept one that I couldn't in good faith say I thoroughly reviewed, because it's from another maintainer that I trust will be around to pick up the pieces if anything breaks.

So, yeah. If I don't know and trust you already, please don't send me your LLM generated PR. I'd much rather start with a spec, a bug, a failing test that we agree should fail, and (if needed) generate the code myself.


Maybe elsewhere it is. Here, it's terrible.

In general, for all it benefits from globalization, Apple disappoints on global markets.


I guess I'm missing the point of this, so I'm probably looking at it wrong.

If you're saving multiple ancestors in ancestors_between why not save all of them?

And if the goal is to access the info for all of the ancestors, what makes it more likely than average that your ancestors aren't on the same layer as yourself (i.e. that, e.g., half of them are in ancestors_between)?

Because, if a fixed ratio (50% or even 10%) of an arbitrary node's ancestors is on the same layer, isn't the complexity still the same (only reduced by a constant factor)?

