Tool output truncation helps a lot and is one of the best ways to reduce context bloat. In my coding agent, the context is assembled from SQLite. I suffix the message ID onto the truncated tool call so it can be rehydrated if needed, and it works great.
My exploration of context management is mostly documented here: https://github.com/hsaliak/std_slop/blob/main/docs/CONTEXT_M...
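The truncate-then-rehydrate trick can be sketched in a few lines. This is a minimal illustration, not std_slop's actual code: an in-memory Map stands in for the SQLite table, and all names here are hypothetical.

```javascript
// Sketch of tool-output truncation with a message-ID suffix for rehydration.
// A Map stands in for the SQLite table that keeps the full tool outputs.
const fullOutputs = new Map(); // messageId -> full tool output

// Store the full output; return a truncated view with the message ID
// suffixed so the agent can ask for the full text later if it needs it.
function truncateToolOutput(messageId, output, maxChars = 200) {
  fullOutputs.set(messageId, output);
  if (output.length <= maxChars) return output;
  return output.slice(0, maxChars) +
    `\n[truncated; rehydrate with msg:${messageId}]`;
}

// Rehydrate: look the full output back up by the suffixed message ID.
function rehydrate(messageId) {
  return fullOutputs.get(messageId) ?? null;
}
```

The point of the suffix is that the truncated message stays self-describing: the agent sees exactly which ID to request when the elided part turns out to matter.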
My experience so far tells me that the default path with AI tooling is that it lets us create without learning. So the author is right that they can pay for a seat in this revolution whenever they want.
A practitioner with more experience may be a few percentage points more productive, but the median path - grab a subscription, get a tool, prompt - will be mostly good enough.
I expect tools to start embedding a small language model (~1B parameters) locally for something like this. It will become a feature in a rapidly changing landscape, and the need for it may disappear in the future. How would you turn this into a sticky product?
It seems like a real problem to me. Probably because I'm not overly inspired to pay for a Claude x5 subscription, and I really hate the session restrictions on a standard Pro plan, especially when weekly spend left over at the end of the week can't be used because of session limits. Most of my tasks basically use superpowers, and I find I get about 30-90 minutes of usage per session before I run out of tokens. Sessions reset about every 4 hours, by which point I generally don't get back to it until the next day; my weekly usage is about 50%, so there's a lot of wastage due to bad scheduling. A tool like this could add better AFK-style agent interoperability through batching etc., as a one-tool-fits-all scenario.
If this gets its foot in the door and gains market share, there is plenty of runway here for more optimized agent utilization and for adding value for users.
Agreed on the need, and this space needs more exploration that is not going to come from big-cos, as they are incentivised to boost spend. I've been exploring the same problem statement, but with a different approach: https://github.com/hsaliak/std_slop/blob/main/docs/CONTEXT_M....
The comment was more about how to make their approach sticky. I feel that local SLMs can replicate what this product does.
https://github.com/hsaliak/std_slop is an SQLite-centric coding agent. It does a few things differently:
1 - Context is completely managed in SQLite.
2 - It has a "mail model": it uses the git email workflow as the agentic plan => code => review loop. You become "linus" in this mode, and the patches are guaranteed bisect-safe.
3 - Everything is done in a JavaScript control plane; there are no free-form tools like read/write/patch. Those are available, but only within a JavaScript REPL, so the agent works through that. You get other benefits, such as being able to persist JS functions in the database for future use that are specific to your codebase.
I explored this in std::slop (my clanker): https://github.com/hsaliak/std_slop. One of its differentiating features is that it has only a single tool call, run_js.
The LLM produces JS scripts to do its work. Naturally, I tried to teach it to add comments to these scripts and incorporate literate-programming elements. This was interesting because every tool call now 'hydrated' some free-form thinking, but it comes at an output-token cost.
Output tokens are expensive! For GPT-5.4 they run ~$180 per million tokens!
As a result, I've settled on brief descriptions that communicate the 'why'. The code is the documentation, after all.
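To make the trade-off concrete, here is a back-of-the-envelope calculation using the per-token price quoted above; the comment-token and call counts are made-up illustrative numbers, not measurements.

```javascript
// Rough cost of literate commentary in tool calls, at ~$180 per million
// output tokens (the figure quoted above).
const COST_PER_TOKEN = 180 / 1_000_000; // dollars per output token

// Extra dollars spent on comments alone across a session.
function extraCommentCost(commentTokensPerCall, callsPerSession) {
  return commentTokensPerCall * callsPerSession * COST_PER_TOKEN;
}

// e.g. 100 extra tokens of commentary per call over 500 calls:
// 100 * 500 * 0.00018 = $9.00 of pure commentary.
```

At that rate, verbose literate comments on every script add up quickly, which is what pushes toward brief 'why'-only descriptions.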
This is a very nice and clean implementation. Related to this: I've been exploring injecting Landlock and seccomp profiles directly into the ELF binary, so that applications that are backed by some LLM but want to 'do the right thing' can lock themselves down. This ships a custom process loader that reads the .sandbox section and applies the policies (not unlike bubblewrap, which uses namespaces). The loading could be pushed to a kernel module in the future.
https://github.com/hsaliak/sacre_bleu very rough around the edges, but it works.
In the past, apps either behaved well or had malicious intent; with these LLM-backed apps, you are going to see apps that want to behave well but cannot guarantee it.
We are going to see a lot of experimentation in this space until the UX settles!
This brings the Linux kernel's patch => discuss => merge-by-maintainer workflow to agents. You get bisect-safe patches that you review, provide feedback on, and approve.
While a SKILL could mimic this, being built in lets me place access controls and 'gate' destructive actions, so the LLM is forced to follow this workflow. Overall, this works really well for me: I get bisect-safe patches, review and re-roll them until I have exactly what I want, and then merge them.
Sure, this may not be the path to software factories, but it scales 'enough' for medium-sized projects, and I've been able to build in a way that preserves a strong understanding of the code that goes in.
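Gating destructive actions behind maintainer approval can be sketched as a two-step propose/approve API in the JS control plane. The names below are hypothetical, and this is a minimal illustration of the idea rather than std_slop's actual access-control code.

```javascript
// Sketch: the agent can only *propose* a destructive change as a pending
// patch; nothing runs until the human maintainer explicitly approves it.
// This is what forces the patch => review => merge loop.
const pending = new Map(); // patchId -> { description, apply }
let nextId = 1;

// Agent-facing: queue a destructive action and get a patch ID back.
function proposePatch(description, apply) {
  const id = nextId++;
  pending.set(id, { description, apply });
  return id;
}

// Maintainer-facing: actually run an approved patch, exactly once.
function approvePatch(id) {
  const patch = pending.get(id);
  if (!patch) throw new Error(`no pending patch ${id}`);
  pending.delete(id);
  return patch.apply();
}
```

The design choice is that the only path to side effects goes through `approvePatch`, so the review step cannot be skipped by the agent no matter what scripts it writes.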