Indeed, the idea of a meta LLM, or some other clear distinction between manually written tests and automated-but-questionable ones, makes sense. What bothers me is that this does not seem to be the approach most people take: code produced by an LLM is treated the same as code produced by human authors.