Hacker News | zby's comments

It is not novel - but with the new models it is just becoming practical.

If you have a test that fails 50% of the time - is that test valuable or not? A 50% failure rate looks like a coin toss, but by itself it does not tell us whether the test is noise or whether it separates bad states from good ones. For a test to be useful it needs a positive Youden's J statistic (https://en.wikipedia.org/wiki/Youden%27s_J_statistic): sensitivity + specificity - 1. A failure rate alone does not let us calculate sensitivity and specificity.
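To make the point concrete, here is a small sketch (the counts are hypothetical, just chosen to illustrate the argument): a test can fail on 50% of all runs overall and still be highly informative, as long as its failures concentrate on the bad cases.

```python
def youden_j(tp: int, fn: int, tn: int, fp: int) -> float:
    """Youden's J = sensitivity + specificity - 1.

    tp/fn: bad cases the test did / did not flag
    tn/fp: good cases the test passed / wrongly flagged
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1

# 100 bad cases, 100 good cases in both scenarios,
# and in both the test "fails" (flags) 100 of 200 runs: a 50% failure rate.

# Informative test: flags 90 bad cases and only 10 good ones.
print(youden_j(tp=90, fn=10, tn=90, fp=10))  # 0.8 - well above zero

# Coin toss: flags 50 bad and 50 good cases.
print(youden_j(tp=50, fn=50, tn=50, fp=50))  # 0.0 - useless
```

Both tests have the same 50% failure rate; only the confusion-matrix breakdown distinguishes them.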

I can see a similar problem with this article - the author notices that LLMs produce a lot of errors, then concludes that they are useless and produce only a simulacrum of work. The author has an interesting observation about how LLMs disrupt the way we judge knowledge work. But when he concludes that LLMs do only a simulacrum of work - this is where his argument fails.


Gee, a thing by a guy, with a name. What are you saying exactly? So the test in question is a test the LLM is asked to carry out, right? Then your point is that if it's a load of vacuous flannel 49% of the time, but meaningful 51% of the time, on average this is genuine work so we can't complain about the 49%?

Wait, you're probably talking about the test of discarding a report based on something superficial like spelling errors. Which fails with LLMs due to their basic conman personalities and smooth talking. And therefore ..?


> For a test to be useful it needs to have positive Youden’s statistic

This is not true as stated. I'd try to gloss over the absolutes relative to the context, but if I'm totally honest, I'm not sure I understand what idea you're trying to communicate.


I don't know - it looks like an interesting idea - but ... I am struggling to put this politely. When I go into the repo and find out that it does things like lip syncing of talking avatars, I start to wonder what percentage of the development effort goes into marketing.

The idea is for non-tech people to relate to agents through a human-style interaction - that part is actually only a relatively small piece of the system, but it brings it to life for people.

It's a way to encapsulate the personality and expertise - at least that's the idea :)


I like how the author notices that it really got a start with cloud computing.



Reviewed: https://zby.github.io/commonplace/agent-memory-systems/revie...

It is the third LLM wiki on the front page in 24 hours! Obviously it is a hot topic. I have my own horse in that race - so I might not be objective - but I've compiled a wishlist for these systems: https://zby.github.io/commonplace/notes/designing-agent-memo...

I wish there was a chance for collaboration - everybody coding their own system seems like a lot of effort duplication.


Your notes look really interesting, thanks. I'm curious - from the prose style it's clear they were written by an LLM. For design notes like this, do you have a mental TODO to go back and write them up in your own words, to make sure they really capture your own opinions?

For the design notes like: https://zby.github.io/commonplace/notes/designing-agent-memo... - I iterate over and over to clean them. This one is also a compilation with many intermediate documents.

But the reviews are written automatically - here are the instructions: https://github.com/zby/commonplace/blob/main/kb/agent-memory...

Overall the knowledgebase is a mixture of these. I have this disclaimer on the first page:

This KB is itself agent-operated: a human directs the inquiry, AI agents draft, connect, and maintain the notes. The framework for building knowledge bases is documented using that framework.

I hope it is enough - I've seen many people get angry with publishing LLM generated work.


love the "Borrowable Ideas" section. I'd suggest definitely borrowing them.

full disclosure: we started as a context infra company (nex.ai) long before Karpathy even came up with the LLM wiki idea, and have barely exposed any of that stuff to WUPHF, but we're starting to open some of it now. glad to see the concerns in the comparison are things our context infra was already built for.

still, happy to collab & share learnings, and of course avoid duplication.


yes, generative slot machines are isolating. You say you "wish there was a chance"? As if there isn't?

taking a look :)

I mean honestly this stuff is now in roll-your-own territory. Run QMD on an Obsidian vault and that's like 80% of the way there, and you can probably do it in < 2 hours.

This report lists failures of some AI systems. They look consequential - but the company does not seem to care. This is very strange - how can it be? I really like AI products, they help me all the time - but I know I need to take their failure modes into account and be careful. Yet lots of organisations don't seem to do that calculation. Will competition root them out? I don't know - I am so enthusiastic about AI - but ever since the LangChain situation I can see that what gets adopted is often something with a lot of flaws. The more careful developers who notice the flaws and try to find real workarounds fail, because doing the design well takes time. This is not a new thing - there were Betamax mourners for decades - but it seems the hype machine is now more and more powerful.

Which "LangChain situation" are you talking about? Anything specific, or just everything that's happened in the past year or so?

What I meant was how LangChain dominated the LLM frameworks scene because it was loaded with VC money. That was at the very beginning - things have normalised now - but I believe it did a lot of damage at that early stage by sucking up all the oxygen.

Reviewed: https://zby.github.io/commonplace/agent-memory-systems/revie...

It is the second LLM wiki on the front page today!

I wish the scene was more collaborative - instead of everyone writing their own. But I guess this is the LLM curse - it's too easy to start. I am afraid it will all go in the LangChain direction, with VC money funding designs that are not yet ready, solidifying choices that would normally be superseded.


Lots of good ideas and divergent methods and sources here - cheers for the link.

What's the other one? I cannot find it.


This is .. honestly a great synopsis of Atomic and its design tradeoffs. Thanks! Giving commonplace a look.

Thanks!

The reviews are done automatically - here are the instructions: https://github.com/zby/commonplace/blob/main/kb/agent-memory...

I am open to changing these instructions - they cannot be about just making your system look better - but I'll try to incorporate genuine ideas on how to improve these reviews.


Everybody is building their own llm-wiki systems these days. I have my own and compiled a big list of other agent memory systems in it: https://zby.github.io/commonplace/agent-memory-systems/ I'll add yours promptly.

And just today I also vibed a wish list (based on all the material I gathered) for such systems: https://zby.github.io/commonplace/notes/designing-agent-memo...

I wish we could collaborate.


> Everybody is building their own llm-wiki systems these days

there is a dark side to this. my coworker is insistent that his variant of this is going to become the team's backbone and I can't get him to stop even when I showed him a page of beyond-wrong answers. he straight up doesn't understand that having a knowledge base != Claude now sees all of it at once and can consider the endless breadth and shades of gray that make up human decisions. he's 100% convinced that Claude grepping through the files is foolproof and won't miss any details lol

I personally just stopped messing with grand knowledge base ideas and these techs. I think everyone's shooting a lil too high and can't fully define what exactly they're after.

so I stepped back and I keep Claude there for a very black and white need. Claude's there to speed up stuff in domains I know well, so I can guardrail him with massive success and have him code pieces I'm simply too lazy to code myself, or alley-oop something I'm struggling with. in a tortoise-and-the-hare parable kind of way, I'm the only guy here who isn't getting huge gotcha holes from AI in the solutions I'm delivering - all polished with the same attention to detail I've always had. I've just found these grand wiki-everything ideas are not yielding what people think they're yielding. for whatever reason I'm still the meatware layer that's a better index in the end, if I've done my homework. perhaps something is lost when we cede a huge chunk of our journeys to seek information. I've still yet to be impressed by any "Claude tied all these things together and found this insight, this is insane" moments - every single time I've pointed out that any of that could've been a report.


Here's another one for your list https://github.com/Signet-AI/signetai I am not affiliated, just testing it out.


Hey zby, if you're collecting these, Hjarni (hjarni.com) would fit your source-only tier alongside Fintool and Supermemory. Hosted SaaS with MCP built in, hierarchical LLM instructions (global/team/container/note), and a shared-note protocol for Claude/ChatGPT multi-agent workflows. Happy to write up a page in whatever shape you want.

The wishlist doc you linked is good, would be up for collaborating on that.


The other reviews are based on published articles - but I am not sure if I want to continue with this, because it is hard to keep them honest.

Maybe you can use the instructions from my repo - which are here: https://github.com/zby/commonplace/blob/main/kb/agent-memory... and run them on your code directory? Then send me the result.


This is a really cool list and repository of ideas. It seems the focus of the work is on making knowledge legible to AI. I wonder if you (or others) have done a similar level of thinking about the inverse - making AI more legible to humans?

I wish I'd found something for neovim on that list.


