
How do the reviewers feel about this? Hopefully it won't result in them being overwhelmed with PRs. There used to be a kind of "natural limit" to error rates in our code given how much we could produce at once and our risk tolerance for approving changes. Given empirical studies on informal code review which demonstrate how ineffective it is at preventing errors... it seems like we're gearing up to aim a fire-hose of code at people who are ill-prepared to review code at these new volumes.

How long until people get exhausted with the new volume of code review and start "trusting" the LLMs more without sufficient review, I wonder?

I don't envy Linus in his position... hopefully this approach will work out well for the team.


There's a stark difference between a table saw and an LLM that weakens this argument.

A table saw isn't a probabilistic device.


But I, a woodworker, can immediately see if the piece of wood that came out of the table saw looks like it should.

Also I, a programmer, can immediately see whether the "probabilistic device" generated code that looks like it should.

Both just let me get to the same result faster with good enough quality for the situation.

I can grab a tape measure or calipers and examine the piece of wood I cut on the table saw and check if it has the correct measurements. I can also use automated tests and checks to see that the code produced looks as it should and acts as it should.
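A minimal sketch of what such an automated check might look like, in Python (the function and the test values are illustrative examples of mine, not from the thread):

```python
# A generated function checked against a measurable spec, the way you'd
# check a cut board with calipers. The spec here is hypothetical.

def clamp(value: float, low: float, high: float) -> float:
    """Keep value within [low, high]."""
    return max(low, min(high, value))

def test_clamp():
    # Spot-check boundary and interior cases, like measuring a workpiece.
    assert clamp(5.0, 0.0, 10.0) == 5.0    # interior value passes through
    assert clamp(-1.0, 0.0, 10.0) == 0.0   # below range clamps to low
    assert clamp(99.0, 0.0, 10.0) == 10.0  # above range clamps to high

test_clamp()
```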

If it looks like a duck and quacks like a duck... Do we really need to care if the duck was generated by an AI?


Also I, a programmer, can immediately see whether the "probabilistic device" generated code that looks like it should.

I highly doubt that.

Empirical studies show that human code review has very little effect on error rates, and what little effect there is fades quickly the more code you read in one sitting.

Most programmers are bad at detecting UB and memory ownership and lifetime errors.

A piece of wood comes off the table saw either cut to size or not.

Code is far more complex.


> Most programmers are bad at detecting UB and memory ownership and lifetime errors.

And this is why we have languages and tooling that take care of it.

There's only a handful of people who can one-shot perfect code in a language that doesn't guard against memory ownership or lifetime errors every time.

But even the crappiest programmer has to actively work against the tooling in a language like Rust to introduce ownership issues. Add linters, formatters, and unit tests on top of that and it becomes nigh-impossible.

Now put an LLM in the same position, it's also unable to create shitty code when the tooling prevents it from doing so.


A piece of wood is either cut to spec or not. You don’t have to try and convince the table saw with a prompt that it is a table saw.

These tools are nothing alike and the reductionism of this metaphor isn’t helpful.


But how do you know it's cut to spec if you don't measure it?

Maybe someone bumped the fence while you were on a break, or its vibration caused the jig to drift a bit out of alignment.

The basic point is that whether a human or some kind of automated process, probabilistic or not, is producing something you still need to check the result. And for code specifically, we've had deterministic ways of doing that for 20 years or so.


> And for code specifically, we've had deterministic ways of doing that for 20 years or so.

And those ways all suck!

It's extremely difficult to verify your way to high quality code. At lower amounts of verification it's not good enough. At higher amounts the verification takes so much longer than writing the code that you'll probably get better results cutting off part of the verification time and using it to write the code you're now an expert on.


I guess the point being made by GP is that most software is a high-dimensional model of a solution to some problem. With traditional coding, you gradually verify it while writing the code, going from simple to complex without losing the plot. That's the theory-building Naur describes in "Programming as Theory Building"; someone new to a project may take months to internalize that knowledge (if they ever do).

Most LLM practices throw you into the role of that newbie. Verifying the solution in a short time is impossible, because the human mind is not capable of grappling with that many factors at once. And if you want to do an in-depth review, you will basically be doing traditional coding, but without the typing, and with a lot of consternation when divergences arise.

> And for code specifically, we've had deterministic ways of doing that for 20 years or so.

And none of them are complete. Because all of them are based on hypotheses taken as axioms. Computation theory is very permissive and hardware is noisy and prone to interference.


I kinda like the analogy of travelling here.

With normal artisanal coding you take your time getting from A to B and you might find out alternate routes while you slowly make your way to the destination. There's also a clear cost in backtracking and trying an alternate route - you already wrote the "wrong" code and now it's useless. But you also gained more knowledge and maybe in a future trip from A to C or C to D you know that a side route like that is a bad idea.

Also because it's you, a human with experience, you know not to walk down ravines or hit walls at full speed.

With LLMs there's very little cost in backtracking. You're pretty much sending robots from A to B and checking if any of them make it every now and then.

The robots will jump down ravines and take useless side routes because they lack the lived experience and "common sense" of a human.

BUT what makes the route easier for both is linters, tests, and other syntactic checks. If you manage to dig a full-on Elmo-style tunnel from A to B, it's impossible to miss no matter what kind of single-digit-IQ bot you send down the tube at breakneck speed. Or you can just add a few "don't walk down here, stay on the road" signs along the way.

Coincidentally the same process also makes the same route easier for inexperienced humans.

tl;dr If you have good specs and tests and force the LLM to never stop until the result matches both, you'll get a lot better results. And even if you don't use an AI, the very same tooling will make it easier for humans to create good quality code.


That would be great if you were a research lab with unlimited funding. But most businesses need to grapple with real user data, data they've been hired to process or to provide an easier way to process. Trying stuff until something sticks is not a real solution.

Having tests and specs is no guarantee that something will work. The only truth is the code. One analogy I always use is the linear equation y = ax + b. You cannot write tests that fully prove this equation is implemented without replicating the formula in the tests. Instead you check a finite set of tuples (x, y). Those will help if you chose the wrong value of a or flipped the sign of b, but someone who knows the tests can write a switch statement that returns the correct y for each x in the tests and garbage otherwise. That is why puzzles like LeetCode don't show you the tests.
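A sketch of that point in Python (the particular a, b, and test tuples are mine): a lookup table that memorizes the test cases passes exactly the same finite tests as the real formula, then returns garbage everywhere else.

```python
def linear(x):
    # Honest implementation of y = ax + b with a=2, b=3.
    return 2 * x + 3

def cheat(x):
    # Memorizes the known test tuples; garbage for any other input.
    return {0: 3, 1: 5, 10: 23}.get(x, -999)

tests = [(0, 3), (1, 5), (10, 23)]
assert all(linear(x) == y for x, y in tests)
assert all(cheat(x) == y for x, y in tests)  # passes the same tests!
assert linear(2) != cheat(2)                 # but diverges off the tests
```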


Of course tests can't be perfect, but even a flimsy guardrail and a warning sign before a ravine is better than nothing.

The optimal solution would be to encase the whole thing in blast-proof transparent polymer, but nobody has the money to do that :)

Trying stuff until something sticks was not a solution when a human had to do the trying and every line of code cost money.

Now you can launch 20 agents to do slightly different things to see if something sticks - and still do the manual work yourself for the 21st path. The cost for those extra 20 attempts is next to nothing compared to the price of an actual programmer.


Anyone who has used a table saw before knows it's anything but probabilistic. Just a little carelessness and you cut your thumb off.

As with LLMs, where careless use results in you dropping prod db or exposing user data.


To lie requires recognition of the truth and an intention to deceive. LLMs don’t have such abilities. They are systems that generate plausible sequences of symbols based on training inputs, alignment, reinforcement, and inference. These systems don’t know or care what truth is and therefore cannot lie.

It’s already bad. I’m not looking forward to the future. These systems are terrible. For some reason, the people deploying them want a future without people in it. I’d rather deal with incompetent, tired, annoyed people than an LLM.


Ill-thought-out logic.

The company that deployed the LLM is lying to you. The people who made that decision are the ones who are culpable.

We both agree that it’s terrible.

I think it’s important to have an enforcement mechanism to force companies to do what they are responsible for doing. An Anti-Kafka Law, so to speak.


An important distinction to make, and I whole heartedly agree.

It’s not LLMs replacing workers, it’s people. People who have a lot of money and don’t sell their labour for a paycheque. And the systems that compel them to such actions.


> Anyone not learning things via LLM coding right now either doesn't care at all about the underlying code/systems

How many bytes is a pointer in C? How many bytes is a shared pointer in C++? What does sysctl do? What about fsync?

What is a mutex lock? How is it different from a spin lock?

You want to find the n nearest points to a given point on a 2-D Cartesian plane. Could you write the code to solve that on your own?

Can you answer any of these questions without searching for the answer?
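(For reference, the nearest-points exercise does have a short answer; a Python sketch, with the function name and sample points being mine:)

```python
import heapq

def n_nearest(points, origin, n):
    """Return the n points closest to origin on a 2-D Cartesian plane."""
    ox, oy = origin
    # Compare squared distances; sqrt is unnecessary for ordering.
    return heapq.nsmallest(n, points, key=lambda p: (p[0] - ox) ** 2 + (p[1] - oy) ** 2)

pts = [(1, 1), (5, 5), (0, 2), (-1, 0)]
n_nearest(pts, (0, 0), 2)  # → [(-1, 0), (1, 1)]
```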

I don't use LLMs and I learn things fine. Always have. For several decades. I care deeply about the underlying code and systems. It annoys me when people say they do and they cannot even understand how the computer works. I'm fine with people having domain-specific knowledge of programming: maybe you've only been interested in web development and scripting DOM elements. But don't pretend that your expertise in that area means you understand how to write an operating system.

Or worse: that it prevents you from learning how to write an operating system.

You can do that without an LLM. There's no royal road. You have to understand the theory, read the books, read the code, write the code, make mistakes, fix mistakes, read papers, talk to other people with more experience than you... and just write code. And rewrite it. And do it all again.

I find the opposite is true: those who use LLM coding exclusively never enjoyed programming to begin with, only learned as much as they needed to, and want the end results.


Agree with pretty much everything you wrote here, I guess with the addendum that LLMs can be a part of the learning experience you're describing. It's as easy as telling the LLM "don't write a single line of code nor command, I want to do everything, your goal is to help me understand what we're doing here."

There are always going to be people who just want the end result. The only difference now is that LLM tools allow them to get much closer to the end result than they previously were able to. And on the other side, there are always going to be people who want to _understand_ what's happening, and LLMs can help accelerate that. I use LLMs as a personalized guide to learning new things.


I know it sounds extreme to dismiss that workflow, but I don't think people are talking enough about the subtle psychological consequences of LLM writing for this kind of thing.

In the same way that googling for an SEO article's superficial answer ends up meaning you never really bother to memorize it, "ask chat" seems to lead to never really bothering to think hard about it.

Of course I google things, but maybe I should be trying to learn in a way that minimizes the need. Maybe it's important to learn how to learn in a way that minimizes exposure to sycophantic average-blog-speak.


Best of luck in your journey!

To those reading this thread, though: be wary of the answers LLMs generate. They sound plausible, and LLMs are designed to be sycophants. Double-check their answers to your queries against credible sources.

And read the source!


True. I’ve never bought a piece of commercial software and wanted to inspect the source first before making that decision.

But I will demand my money back or sue you if your crappy code leaks my personal information, destroys my property, performs worse than advertised, or otherwise harms me in some way.

There was sloppy code before LLMs. It’s what they were trained on. And it’s why they generate slop.

All that code that was rushed out to meet an arbitrary deadline made up by the sales team, written by junior and lazy senior developers, pushed by the "code doesn't matter" folks. Code written by enterprise architecture astronauts with class hierarchies deeper than the Mariana Trench. A few years down the line you get bloated, slow, hard-to-maintain spaghetti piles of dung. Windows rendering text that stutters when you scroll. Virtual keyboards that take seconds to pop up. Browser tabs that take more memory than it took to send astronauts to the moon and back.

When humans write it you generally have a few people on a team who are concerned with these things. They try to rein in the slop-generating, "always be shipping" people. You need a mix of both, because each line of code is a liability as much as it is a new feature.


It sounds like most of the data centers promised in 2025 and 2026 are not even built yet and most of the GPUs bought haven't even been installed.

If it does all go down in flames, even the floor value is not going to be worth much.

I can't predict the future but it's smelling a lot like a recession already under way that is bigger than the sub-prime crash.


I think it will wall people off from software.

I don’t know what SaaS has to do with FOSS. The point of FOSS was to allow me to modify the software I run on my system. If the device drivers for some hardware I depend on are no longer supported by the company I bought it from, if it’s open source, I can modify and extend the software myself.

Copyleft licenses ensure that I share my modifications back if I distribute them. It’s a thing for the public good.

Agent-based software development walls people off from that. Mostly by ensuring that the provenance of the code it generates is not known and by deskilling people so that they don’t know what to prompt or how to fix their code.


The LLM didn’t make a compiler. It generated code that could plausibly implement one. Humans made the compilers it was trained on. It took many such examples and examples of other compilers and thousands of books and articles and blog posts to train the model. It took years of tweaking, fitting, aligning and other tricks to make the model respond to queries with better, more plausible output. It never made, invented, or reasoned about compilers. It’s an algorithm and system running on a bunch of computers.

The C compiler Anthropic got excited about was not a “working” compiler in the sense that you could replace GCC with it and compile the Linux kernel for all of the target platforms it supports. Their definition of, “works,” was that it passed some very basic tests.

Same with the SQLite translation from C to Rust. Vague, poorly specified English prose is insufficient, even with a human in the loop iterating on it. The Rust version is orders of magnitude slower and uses far more memory. It’s not a drop-in Rust-native replacement for SQLite; it’s something else, if you want to try that.

What mechanism in these systems is responsible for guessing the requirements and constraints missing in the prompts? If we improve that mechanism will we get it to generate a slightly more plausible C compiler or will it tell us that our specifications are insufficient and that we should learn more about compilers first?

I’m sure it’s possible that there are cases where these tools can be useful. I’m not sure this is it, though. AGI is purely hypothetical. We don’t simulate a black hole inside a computer and expect gravity to come out of it. We don’t simulate the weather systems on Earth and expect hurricanes to manifest from the computer. Whatever bar the people selling AI systems have for AGI is a moving goalpost, a gimmick, a dream of potential to keep us hooked on what they’re selling right now.

It’s unfortunate that the author nearly hits on why but just misses it. The quotes they chose to use nail it. The blog post they reference nearly gets it too. But they both end up giving AI too much credit.

Generating a whole React application is probably a breath of fresh air. I don’t doubt anyone would enjoy that and marvel at it. Writing React code is very tedious. There’s just no reason to believe that it is anything more than it is or that we will see anything more than incremental and small improvements from here. If we see any more at all. It’s possible we’re near the limits of what we can do with LLMs.


  > Writing React code is very tedious.
slightly off-topic perhaps, but it makes me wonder: if it's so tedious, how did it catch on in the first place...

i feel like llms are abstracting away that tedium sometimes yes, but i feel it's probably because the languages and frameworks we use aren't hitting the right abstractions and are too low level for what we are trying to do... idk just a thought


I wouldn’t call what LLMs are doing an abstraction. They generate code. You just don’t have to write it. It can feel like it’s hiding details behind a new, precise semantic layer… but you’ll find out once the project gets to a certain size that is not the case: the details absolutely matter and you’ll be untangling a large knot of code (or prompting the AI to fix it for the seventh time).

It’s a good thought and I tend to think that this is the way I would feel more productive: better languages that give us the ability to write better abstractions. Abstractions should provide us with new semantic layers that lose no precision and encapsulate lots of detail.

They shouldn’t require us to follow patterns in our code and religiously generate boilerplate and configuration. That’s indirection and slop. It’s wasted code, wasted effort, and is why I find frameworks like React to be… not pleasant to use. I would rather generate the code that adds a button. It should be a single expression but for many reasons, in React, it isn’t.


"I’m sure its possible that there are cases where these tools can be useful. I’m not sure this is it though. "

You are arguing against the internet, motor cars and electricity.

It's like 1998, and you're saying: "I'm sure it's possible there are cases where the internet can be useful. I'm not sure it is though"

On 'hackernews' of all places.

It's pretty wild to see that, and I think it says something about what hn has become (or maybe always was?).


Humans learned from prior art, and most of their inventions are modifications of prior art. You are, after all, mostly a biological machine.

The point is - there are so many combinations and permutations of reality, that AI can easily create synthetically novel outcomes by exploring those options.

It's just wrong to suggest that 'it was all in some textbook'.

"There’s just no reason to believe that it is anything more than it is."

It's almost ridiculous at face value, given that millions of people are using it for more than 'helping to write react apps' every day.

It's far more likely that you've come to this conclusion because you're simply not using the tools creatively, or trying to elicit 'synthetic creativity' out of the AI, because it's frankly not that hard, and the kinds of work that it does goes well beyond 'automation'.

This is not an argument, it's the lived experience of large swaths of individuals.


One area where it may end up leaving you behind is if you’re looking for a job right now. There are a lot of companies putting vibe coding in their job requirements. The more companies that do this the harder it will be to find employment if you’re not adopting this tool/workflow.


We do have such detailed specifications, but they are written in languages with a narrow interface. The technique is called program synthesis, and Synquid is an example of such a language.

It might be illuminating to see what a mathematically precise specification can and cannot do when it comes to generating programs. A major challenge in formal methods is proving that the program implements the specification faithfully, known as the specification gap. If you have a very high level and flexible specification language, such as TLA+, there is a lot of work to do to verify that the program you write meets the specification you wrote. For something like Synquid that is closer to the code there are more constraints on expressivity.

The point is that spoken language is not sufficiently precise to define a program.

Just because an LLM can fill in plausible details where sufficient detail is lacking doesn’t indicate that it’s solving the specification gap. If the program happens to implement the specification faithfully you got lucky. You still don’t actually know that’s true until you verify it.

It’s different with a narrow interface though: you can be very precise and very abstract with a good mathematical system for expressing specifications. It’s a lot more work and requires more training to do than filling in a markdown file and trying to coax the algorithm into outputting what you want through prose and fiction.
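A rough illustration of the difference, in Python (names and property are mine): a specification stated as a checkable property over a narrow interface, rather than prose. This is plain property checking, far weaker than Synquid or TLA+, but it shows how a precise spec constrains an implementation in a way English cannot.

```python
import random

def spec_sorted_permutation(inp, out):
    # The spec, stated precisely: the output is ordered and is a
    # permutation of the input. No prose, no room for plausible filler.
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    permutation = sorted(inp) == sorted(out)
    return ordered and permutation

def my_sort(xs):
    # Candidate implementation under test.
    return sorted(xs)

# Check the candidate against the spec over random inputs.
for _ in range(100):
    xs = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
    assert spec_sorted_permutation(xs, my_sort(xs))
```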


This works well for problems that are purely algorithmic in nature. But problems often have solutions that don't fall into those categories, especially in UI/UX. When people tell me that LLMs can solve anything with a sufficiently detailed spec, I ask them to produce such a spec for Adobe Photoshop.


I think the worst case is actually that the LLM faithfully implements your spec, but your spec was flawed. To the extent that you outsource the mechanical details to a machine trained to do exactly what you tell it, you destroy or at least hamper the feedback loop between fuzzy human thoughts and cold hard facts.


Unfortunately even formal specifications have this problem. Nothing can replace thinking. But sycophancy, I agree, is a problem. These tools are designed to be pleasing, to generate plausible output; but they cannot think critically about the tasks they're given.

Nothing will save you from a bad specification. And there's no royal road to knowing how to write good ones.


Right, there’s no silver bullet. I think all I can do is increase the feedback bandwidth between my brain and the real world. Regular old stuff like linters, static typing, borrow checkers, e2e tests… all the way to “talking to customers more”

