OMG, some of those are legit good. That said, the AI seems minimally guidable. It seems to ignore the majority of instructions in https://suno.com/song/25b16ab7-bfea-451d-abb3-8b52cdd783d0?s... so I guess, like most tools, it's fine if you want to get what you're given but not really control it.
But they are shit. Over the last 2 days I've got bored of the predictable cycle of it first getting excited about a new idea, then back-pedalling once I shoot it to pieces.
They can't write and think critically at the same time. Then subsequent messages are tainted by their earlier nonsensical statements.
God knows why you think this is possible. If I don't even know what might be relevant to the conversation in several turns, there's no way an agent could either.
One of us is confusing prediction with retrieval. The embedding model doesn't predict what is going to be relevant several turns ahead, just what's relevant to the turn at hand. Each turn gets a fresh semantic search against the full body of memory/agent comms. If the conversation or prompt changes, the next query surfaces different context automatically.
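To make the retrieval-not-prediction point concrete, here's a toy sketch of the loop I'm describing. `embed()` is a deterministic stand-in for a real embedding model (in practice it'd be an API call), and the memory store is just a list; the point is that every turn re-scores the whole store from scratch.

```python
# Per-turn retrieval sketch: each turn's text is embedded fresh and
# matched against the entire memory store -- nothing is predicted ahead.
import hashlib
import math

DIMS = 8  # toy size; real models use e.g. 768 or 1024 dims

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: hash words into a
    # small unit-norm vector. Only the retrieval loop matters here.
    v = [0.0] * DIMS
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        v[h % DIMS] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-norm, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

memory: list[tuple[str, list[float]]] = []  # the "body of work"

def remember(text: str) -> None:
    memory.append((text, embed(text)))

def recall(query: str, k: int = 2) -> list[str]:
    # Fresh semantic search on every turn: score ALL memory each time.
    q = embed(query)
    scored = sorted(memory, key=lambda m: cosine(m[1], q), reverse=True)
    return [text for text, _ in scored[:k]]

remember("fixed the auth token refresh bug in commit abc123")
remember("planning notes for the billing rewrite")
remember("debug log: auth refresh failed with 401")

print(recall("why does auth keep failing"))
```

If the next turn asks about billing instead, the same `recall()` call surfaces the billing chunk; nothing about the earlier query is baked in.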
As you build up a "body of work" it gets better at handling massive, disparate tasks in my admittedly short experience. Been running this for two weeks. Trying to improve it.
So the embedding model is a fixed-size view on an arbitrarily sized work history (tool calls, natural language messages)? The model is like a summarizer, but in latent space? And not aimed to summarize, but trained to hold whatever is needed for the agent to be autonomous for longer runs?
Pretty much. It's a fixed-size vector per chunk -- 1024 dims in the case of Voyager Nano. The autonomy part is entirely in how you build the vectorDB and query it, not in the model's training. That's the part I've been focusing on lately: trying different methods and seeing what gives the best results.
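Rough shape of what "fixed-size vector per chunk" means for an arbitrarily long history: the index grows with the number of chunks, but every entry is the same width. The chunking here is naive fixed-width splitting and `embed_stub()` is a placeholder (real setups would split on turns/tool calls and call an actual model); only the shapes are the point.

```python
# Sketch: an arbitrarily long work history compresses into N fixed-size
# vectors, one per chunk. The index grows; each entry doesn't.

DIMS = 1024  # matches the Voyager Nano figure mentioned above

def chunk(history: str, size: int = 200) -> list[str]:
    # Naive fixed-width chunking; real pipelines split on semantic
    # boundaries like turns or tool calls instead.
    return [history[i:i + size] for i in range(0, len(history), size)]

def embed_stub(text: str) -> list[float]:
    # Placeholder for a real embedding call; only the shape matters here.
    return [0.0] * DIMS

history = "tool call: git log ... " * 100  # arbitrarily long transcript
index = [(c, embed_stub(c)) for c in chunk(history)]

# Every entry is the same width no matter how long its chunk's text is.
assert all(len(vec) == DIMS for _, vec in index)
print(len(index), "chunks, each stored as a", DIMS, "dim vector")
```

Doubling the history length doubles the chunk count but changes nothing about any individual vector, which is why querying stays cheap relative to re-reading the raw log.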
At the moment I wouldn't emphasize "autonomous-ness"; there's still a fair bit of human hand-holding. But once I get a model on the right path it can switch back to an old project, autonomously locate and debug 2-week-old commits and the context around their development, and apply that knowledge to the task at hand.
It's only been a day but I'm seeing an improvement going from nomic (768 dims) to Voyager.
You could pitch it as your "digital nagging housewife", or a "micromanager in a box". How about "your time wasting interrupt-otron" or just "flow-breaker"?
Seriously, why would you think AI could read my mind and tell me what to do next without knowing my goals?
This sounds like those irritating tangential follow-on questions they ask, on steroids. Generally irrelevant, and they take the conversation in a direction you don't want to go.