Hacker News | altruios's comments

I have moved through the local models at this size.

This one is by far the most capable. I've tried various versions of gemma4 26b, various versions of qwen3.5 27/35b, nemotron, phi, and glm4.7.

This one is noticeably better as an agent. It's really good at breaking down tasks into small actionable steps and, where there is ambiguity, asking for clarification. Its reasoning seems more solid than gemma4's, as do its tool use and longer multi-message chains of thinking.

I am excited to see what other versions of this model people train!


How does it compare to CC Opus Max?

I try not to use publicly hosted models, and I avoid the SOTA data-harvesting machines... so I cannot compare against those, only against local models. And this one feels like a decently significant leap over 3.5 or gemma4.

I see there is now a distilled reasoning version of the model on Hugging Face. I may look into that, but I have not yet seen a need to reach for it either...


I assume if they get fired by the AI during the experiment they are still paid to sit at home. It would not invalidate the experiment.

Why do you assume that?

It's about the only way of reconciling experimental validity (if the AI can't "fire" staff and remove them from business operations and the P&L account in situations where it would be legal and normal to do so, is it really running a business?) with not having the massive ethical issue of people being arbitrarily fired because a computer glitched. Whether that's what they actually do is TBC.

Who would suspect that companies selling 'tokens' would (unintentionally) train their models to prefer longer answers, reaping higher ROI (the thing a publicly traded company is legally required to pursue: good thing these are all still private...)? It's not like private companies want to make money...

I don’t think this is a plausible argument, as they’re generally capacity constrained, and everyone would like shorter (= faster) responses.

I’m fairly certain that in a few more releases we’ll have models with shorter CoT chains. Whether they’ll still let us see those is another question, as it seems like Anthropic wants to start hiding their CoT, potentially because it reveals some secret sauce.


I guess mainly they don’t want you to distill on their CoT

Yes, which I understand, but I think they’re crippling their product for users this way.

I don’t think it’s just this, because the thinking tokens often reveal more about Anthropic’s inner workings. For example, that’s how the existence of Claude’s soul document was reverse engineered, and the CoT often leaks details about “system reminders” (e.g., long conversation reminders).

I think it’s also just very convenient for Anthropic to do this. The fact that they’re also presenting this as a “performance optimization” suggests they’re not giving the real reason they do this.


Try setting up one laundry which charges by the hour and washes clothes really really slowly, and another which washes clothes at normal speed at cost plus some margin similar to your competitors.

The one which maximizes ROI will not be the one you rigged to cost more and take longer.


I don't think the analogy is correct here.

Directionally, tokens are not equivalent to "time spent processing your query", but rather a measure of effort/resource expended to process your query.

So a more germane analogy would be:

What if you set up a laundry which charges you based on the amount of laundry detergent used to clean your clothes?

Sounds fair.

But then, what if the top engineers at the laundry offered an "auto-dispenser" that uses extremely advanced algorithms to apply just the right optimal amount of detergent for each wash?

Sounds like value-added for the customer.

... but now you end up with a system where the laundry's management team has strong incentives to influence how liberally the auto-dispenser "spends" to give you the "best results".
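The incentive being described can be sketched with toy arithmetic (all numbers here are hypothetical, chosen only to make the point concrete):

```python
# Toy illustration of per-token pricing incentives. The price and token
# counts below are made-up assumptions, not real provider figures.
PRICE_PER_TOKEN = 0.00001  # assumed price in $ per output token

def revenue(tokens_per_answer: int, answers: int) -> float:
    """Revenue when billing purely by output tokens."""
    return tokens_per_answer * answers * PRICE_PER_TOKEN

# Same number of customer questions answered, different verbosity:
concise = revenue(tokens_per_answer=500, answers=1_000_000)
verbose = revenue(tokens_per_answer=2_000, answers=1_000_000)

print(concise)  # 5000.0
print(verbose)  # 20000.0 -- 4x the revenue for the same questions answered
```

The counterargument upthread still applies: under capacity constraints, verbose answers also consume 4x the compute, so the picture changes once serving cost and throughput are modeled too.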


Shades of “repeat” in lather, rinse, repeat.

LLM APIs sell on value they deliver to the user, not the sheer number of tokens you can buy per $. The latter is roughly labor-theory-of-value levels of wrong.

What would you rather have the government spend its money on, if not the wellbeing/welfare of its citizens?

It is like the AI is training us to avoid certain language patterns. I rebel at weak language being held hostage: strong language is next.

The mighty semicolon prepares for its return!

Less maybe: more should have yesterday. Do so now today.

refuse consent?

You may need to clarify that thought.

I don't think the poster has a viewpoint that 'refuses consent'; their viewpoint is that the writing they put up for others to view is for others to view, regardless of how it is viewed. They seem to be giving consent, not refusing it, no?


> refuse consent?

Who said anything about refusing consent?


> I think you are walking all around the word "consent" and trying very hard to avoid it altogether.

> Your perspective, because it refuses to include any sort of consent, is invalid. No perspective that refuses consent can be valid.

This is what I was responding to. I do not understand your thinking in this post.


> This is what I was responding to. I do not understand your thinking in this post.

I thought it was clear from "refuses to include any sort of consent" that I am talking specifically about holding an opinion that refuses to include consideration for consent, not refuses consent for usage.


But that's what I'm confused about:

How is freely giving consent for (all) others to read your content not 'considering consent'?

I'm not trying to be snarky. I really don't see the missing, unwritten piece that connects those dots.


Gemma4 is Apache-2.0 licensed.

I am unsure about the openness of the training data itself. That, too, should be required for an LLM to be considered 'open'.

Open source is the only way forward, I agree.


I think it's more a proof of concept: locally trained. Training something non-trivial would take a lot of resources and time.

Ideas do not compete based on how true they are but on how easily they are remembered.

