Hacker News | locusofself's comments

Are there local models that are anywhere near as good at coding as opus 4.6?

People will insist otherwise, but I haven't seen anything close to sonnet 4.6 that can be run locally.

I don't think anyone can honestly say a huge frontier model is actually going to be matched by something running on 64gb locally?

I have read many comments saying that various ~30B Qwen3.5 models, ~30B Gemma 4 models, and now Qwen3.6 are "better than sonnet".

I don't know how large sonnet and opus are but the rumor is 1T and 5T respectively.
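For a rough sense of scale, here is a back-of-the-envelope sketch of how much memory the weights alone would need at different quantization levels. The 1T figure is only the rumor mentioned above, and the estimate ignores KV cache, activations, and runtime overhead, so treat the numbers as lower bounds.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same bit width, and
    nothing else (KV cache, activations, runtime) is counted.
    """
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# The parameter counts below are illustrative; the 1T value is a rumor.
for label, params in [("~30B local model", 30), ("rumored 1T model", 1000)]:
    for bits in (16, 8, 4):
        size = weight_memory_gb(params, bits)
        print(f"{label} at {bits}-bit: {size:.0f} GB")
```

Under these assumptions a ~30B model at 4-bit needs about 15 GB for weights, which fits in 64 GB with room to spare, while a 1T model at 4-bit would need about 500 GB.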


You don't have to use the most recent bleeding edge model to succeed. A local FOSS coding agent coupled with a reasonably priced LLM could yield the optimal ROI.

Not really. Qwen 3.5 and Gemma and a couple of others are quite good though, and the quants are _very_ runnable on a good gpu.

I've had excellent luck using Claude Code to generate "mermaid diagrams" for me, and convert them to .png format headlessly using mmdc/puppeteer. Really helped me out with an engineering proposal I just finished. In past years I would have fumbled around with Visio forever and the result would have been worse.
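For anyone who wants to script the conversion step, a minimal sketch follows. It assumes the Mermaid CLI (`mmdc`, from the `@mermaid-js/mermaid-cli` npm package) is installed and on PATH; the function names and file names are hypothetical, and only the `-i` and `-o` flags are used.

```python
import shutil
import subprocess
from pathlib import Path


def build_mmdc_command(source: Path, output: Path) -> list[str]:
    """Build the mmdc invocation: -i is the input file, -o the output file."""
    return ["mmdc", "-i", str(source), "-o", str(output)]


def render_mermaid(source: Path, output: Path) -> None:
    """Render a Mermaid source file to an image by shelling out to mmdc."""
    if shutil.which("mmdc") is None:
        raise RuntimeError("mmdc not found on PATH; install @mermaid-js/mermaid-cli")
    # check=True raises CalledProcessError if mmdc exits with a non-zero status.
    subprocess.run(build_mmdc_command(source, output), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    render_mermaid(Path("diagram.mmd"), Path("diagram.png"))
```

The output format follows the extension of the output path, so the same call with `diagram.svg` should produce an SVG instead.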

I find Mermaid diagram rendering is quite ugly by default. I've gotten much better-looking results by asking it to just generate SVGs. As a bonus it can do animations too. e.g. see slide 3 here, which I first tried with Mermaid and then switched to SVG when I couldn't get the rendering to look good: https://talks.mk.gg/2026/atmosphereconf/
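To give an idea of what "just generate SVGs" involves, here is a small sketch that emits a two-box diagram with an arrow. It is not the code behind the linked slides; the function names, sizes, and colors are all made up for illustration. The point is that every coordinate is chosen explicitly, which is what gives you control over the look.

```python
from xml.sax.saxutils import escape


def box(x: int, y: int, w: int, h: int, label: str) -> str:
    """Return a rounded rectangle with a centered text label."""
    cx, cy = x + w / 2, y + h / 2
    return (
        f'<rect x="{x}" y="{y}" width="{w}" height="{h}" rx="6" '
        f'fill="#eef" stroke="#335"/>'
        f'<text x="{cx}" y="{cy}" text-anchor="middle" '
        f'dominant-baseline="middle" font-family="sans-serif" '
        f'font-size="14">{escape(label)}</text>'
    )


def two_box_diagram(left: str, right: str) -> str:
    """Return a standalone SVG document: two boxes joined by an arrow."""
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" width="360" height="100" '
        'viewBox="0 0 360 100">',
        # Arrowhead definition, referenced by the line's marker-end below.
        '<defs><marker id="arrow" markerWidth="10" markerHeight="10" '
        'refX="9" refY="5" orient="auto">'
        '<path d="M0,0 L10,5 L0,10 z" fill="#335"/></marker></defs>',
        box(20, 30, 120, 40, left),
        box(220, 30, 120, 40, right),
        '<line x1="140" y1="50" x2="220" y2="50" stroke="#335" '
        'stroke-width="2" marker-end="url(#arrow)"/>',
        "</svg>",
    ]
    return "\n".join(parts)


if __name__ == "__main__":
    print(two_box_diagram("Client", "Server"))
```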

I now often have CC make technical/architecture diagrams with tikz. The results look much better than mermaid but still require multiple iterations to fix bad arrows, bad layouts etc.

Diagrams are still far from solved. We need a good non-gameable diagrams benchmark.


I use Claude Code and Gemini, with each one acting as an LLM-as-a-judge to review the other's result, and then generate a final Mermaid diagram.
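One way this cross-review loop could be wired up is sketched below. It is a guess at the shape of the workflow, not the commenter's actual setup: `Model` is a hypothetical callable that takes a prompt and returns text, the prompts are placeholders, and letting the first model produce the final merge is an arbitrary choice.

```python
from typing import Callable

# Hypothetical interface: a model is any function from prompt text to reply text.
Model = Callable[[str], str]


def cross_review(topic: str, model_a: Model, model_b: Model) -> str:
    """Have two models draft a diagram, judge each other, then merge."""
    draft_a = model_a(f"Write a Mermaid diagram for: {topic}")
    draft_b = model_b(f"Write a Mermaid diagram for: {topic}")

    # Each model reviews the other model's draft, never its own.
    review_of_a = model_b(f"Review this Mermaid diagram and list problems:\n{draft_a}")
    review_of_b = model_a(f"Review this Mermaid diagram and list problems:\n{draft_b}")

    # Assumption: model_a writes the final version from both drafts and reviews.
    return model_a(
        "Merge these two drafts into one final Mermaid diagram, "
        "addressing the review notes.\n"
        f"Draft A:\n{draft_a}\nReview of A:\n{review_of_a}\n"
        f"Draft B:\n{draft_b}\nReview of B:\n{review_of_b}"
    )
```

In practice each `Model` would wrap whatever client or CLI you use to call the respective service.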

do the same.

I just ask Claude code for mermaid to visualize any topic I'm discussing.


In a pinch, Claude is also quite good at ASCII art in my experience!

Same here!

You can copy and paste the Mermaid source into Excalidraw to visualize it.

And yet people here insist that the height of an LLM is not being able to draw a pelican or count letters in a word

Well, mermaid diagrams are "just" a list of nodes and their relations. You'd expect any llm capable of generating code to be able to generate them

Writing an SVG of a pelican riding a bicycle without being able to see the result and iterate based on that is incredibly difficult by comparison. I'm sure some humans could do it, but I sure can't. That's part of the beauty of it: it's very difficult to do but a toddler could judge the results

Writing an SVG of a diagram by hand would be somewhere in the middle. Or, depending on the number of nodes, it might be even harder than the pelican. Laying out diagrams can get tricky very quickly. It's also one of Mermaid's biggest weaknesses
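To make the "list of nodes and their relations" point concrete, here is a minimal sketch that turns exactly that into Mermaid flowchart text. The function name and the sample graph are made up. Note that no coordinates appear anywhere: layout is left entirely to the renderer, which is why this is easy to generate and also why you have so little control over the result.

```python
def to_mermaid(nodes: dict[str, str], edges: list[tuple[str, str]]) -> str:
    """Render node labels and directed edges as a top-down Mermaid flowchart."""
    lines = ["flowchart TD"]
    for node_id, label in nodes.items():
        # A["label"] declares node A with a rectangular box and quoted label.
        lines.append(f'    {node_id}["{label}"]')
    for src, dst in edges:
        lines.append(f"    {src} --> {dst}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical three-node graph, for illustration only.
    print(to_mermaid(
        {"A": "Client", "B": "API", "C": "Database"},
        [("A", "B"), ("B", "C")],
    ))
```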


Just wait until they go public. Claude 5.4 fails the pelican test, and the stock sheds 20% of its value on the news. Wall Street wonders if the lack of a front wheel means there is something seriously wrong with the stock's underlying value

Could you share your setup + workflow? How do you get it to not give shit layouts?

Agreed. Unless this really helps people somehow make better trading decisions than existing tools, the vast majority of them are probably still better off index investing.

That's not what the "problem" was. It's that cheap American support people were "escorting" foreign Microsoft SWEs, so they could manage and fix services they wrote and were the subject matter experts for in the sovereign cloud instances which they otherwise would have no access to.

And this was NOT for the government clouds we have that hold classified data. Those are air-gapped clouds that cannot be accessed by anyone unless they have a TS clearance and physically go into a SCIF.

source: I work in a team very closely related to the team who designed digital escort.


I would definitely fight against calling anything I work on „digital escort”.


Yeah, it’s not a great name. But it originates from the government. When somebody without a security clearance needs to go to a secure area, they must be escorted by somebody.


When the blog post mentioned Hegseth and “digital escort” in the same sentence, I was surprised to learn it wasn’t about his OnlyFans habit at his work desktop.


Yes but this misses the underlying point: this is the same software. It suffers from the same defects. If your management stack keeps crashing and leaking VMs you are seeing a reduction in the operational capacity of the fleet. If you are still there just tour Azure Watson and tell me if you’d want the military to rely on that system in wartime? Don’t forget things like IVAS and God knows what else that are used during operations while Azure node agents happily crash and restart on the hosts. The system should be no-touch and run like an appliance, which is predicated on zero crashes or 100% crash resiliency. In Windows Core we pursued a single Watson bucket with a single hit until it was fixed. Different standards.


I'm only commenting on the parent comment's understanding of what the digital escort process is specifically. Escort is used by all kinds of teams that are just doing day-to-day crap for various resource providers across Azure. I've never worked anywhere close to Azure Core so I don't know about these more low-level concerns. Overall I agree and sympathize with your assessment of the engineering culture.


You also make it sound like getting a JIT approved is getting keys to the kingdom. It's not -- every team has its own JIT policies for their resources. Should there be far fewer manual touches? Ideally. But JIT is better than persistent access at least, and JIT policies should be scoped according to principle of least privilege. If that is not happening, it's a failure at the level of that specific org.
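As a generic illustration of what "scoped" means here, the sketch below models a just-in-time grant that names one user, one resource, a fixed set of actions, and an expiry. This is not how any real JIT system is implemented; every name in it is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class JitGrant:
    """Hypothetical time-bound access grant (not a real API)."""
    user: str
    resource: str            # the single resource this grant covers
    actions: frozenset[str]  # the only actions permitted on that resource
    expires_at: datetime     # access ends here; nothing is persistent


def is_allowed(grant: JitGrant, user: str, resource: str,
               action: str, now: datetime) -> bool:
    """Allow only the named user, resource, and actions, and only before expiry."""
    return (
        grant.user == user
        and grant.resource == resource
        and action in grant.actions
        and now < grant.expires_at
    )


if __name__ == "__main__":
    start = datetime.now(timezone.utc)
    grant = JitGrant("alice", "service-x/logs", frozenset({"read"}),
                     start + timedelta(hours=4))
    print(is_allowed(grant, "alice", "service-x/logs", "read", start))
    print(is_allowed(grant, "alice", "service-x/logs", "delete", start))
```

Least privilege shows up as the narrow `resource` and `actions` fields; the difference from persistent access is the `expires_at` check.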


Policies vary. The node folks get access to the nodes and the fabric controller by necessity.

I guess we agree on the point where it should not be necessary, which echoes Cutler’s original intent of “no operational intervention.”

This is not an impossible task, after all it’s just user-mode code calling into platform APIs.


200 requests a day, lol


on average :)


I had an interesting experience the other day. I've been struggling with some lyrics to a song I am writing. I asked Claude to review them, and it did an amazing job of finding the weak lines and best lines, and nearly perfectly articulating to me why they were weak or strong. It was strange because the output of the analysis almost perfectly mirrored my own thoughts.

When I asked it for alternatives/edits, they were not good however.


one of them I clicked on was totally covered in flies (the cat food). eww


I love this, and it applies to a lot more than software and trees :)


I did this once and was scolded by my date:

!!! (Serious) To stand chopsticks upright in a bowl of rice. This is taboo, as it is the way rice is presented as a Buddhist funeral offering.


It would also be completely inappropriate if you did that with a fork or a knife.


right, must be nice. I live in a HCOL area and have a mortgage and family to support. If big tech lays me off, it's going to be stressful and probably mean me selling my house and moving to LCOL.


BSP?


Binary space partitioning

