I have moved through the local models at this size.
This one is by far the most capable. I've tried various versions of gemma4.26b, various versions of qwen3.5-27/35b (qwopus's galor), nemotron, phi, glm4.7.
This one is noticeably better as an agent. It's really good at breaking down tasks into small actionable steps and - where there is ambiguity - asking for clarification. Its reasoning seems more solid than gemma4's, as do its tool use and multi-message/longer-chain thinking.
I am excited to see what other versions of this model people train!
I try not to use publicly hosted models, and I avoid the SOTA data-harvesting machine... so I cannot compare. I can compare only to local models. And this one feels like a decently significant leap compared to 3.5 or gemma4.
I see there is now a distilled reasoning version of this model on Hugging Face. I may look into that, but I haven't felt the need to reach for it yet either...
It's about the only way of reconciling experimental validity (if the AI can't "fire" staff and remove them from business operations and their P&L account in situations where it would be legal and normal to do so, is it really running a business?) with avoiding the massive ethical problem of people being arbitrarily fired because a computer glitched. Whether that's what they actually do is TBC.
Who would suspect that the companies selling 'tokens' would (unintentionally) train their models to prefer longer answers, reaping a HIGHER ROI (the thing a publicly traded company is legally required to pursue; good thing these are all still private...)? Because it's not like private companies want to make money...
I don’t think this is a plausible argument, as they’re generally capacity constrained, and everyone would like shorter (= faster) responses.
I’m fairly certain that in a few more releases we’ll have models with shorter CoT chains. Whether they’ll still let us see those is another question, as it seems like Anthropic wants to start hiding their CoT, potentially because it reveals some secret sauce.
Yes, which I understand, but I think they’re crippling their product for users this way.
I don’t think it’s just this, because the thinking tokens often reveal more about Anthropic’s inner workings. For example, it’s how the whole existence of Claude’s soul document was reverse-engineered, and it often leaks details about “system reminders” (e.g. long-conversation reminders).
I think it’s also just very convenient for Anthropic to do this. The fact that they’re also presenting this as a “performance optimization” suggests they’re not giving the real reason they do this.
Try setting up one laundry which charges by the hour and washes clothes really, really slowly, and another which washes clothes at normal speed at cost plus some margin similar to your competitors'.
The one which maximizes ROI will not be the one you rigged to cost more and take longer.
Directionally, tokens are not equivalent to "time spent processing your query", but rather a measure of effort/resource expended to process your query.
So a more germane analogy would be:
What if you set up a laundry which charges you based on the amount of laundry detergent used to clean your clothes?
Sounds fair.
But then, what if the top engineers at the laundry offered an "auto-dispenser" that uses extremely advanced algorithms to apply just the right optimal amount of detergent for each wash?
Sounds like value-added for the customer.
... but now you end up with a system where the laundry management team has strong incentives to influence how liberally the auto-dispenser will "spend" to give you "best results".
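To put numbers on the incentive, here's a minimal sketch of per-token billing (Python; the prices are invented for illustration, not any provider's actual rates):

    # Illustrative per-token pricing; these numbers are made up,
    # not any real provider's rates.
    PRICE_PER_MTOK_IN = 3.00    # $ per million input tokens (assumed)
    PRICE_PER_MTOK_OUT = 15.00  # $ per million output tokens (assumed)

    def query_cost(input_tokens: int, output_tokens: int) -> float:
        # Cost scales with tokens processed, not wall-clock time.
        return (input_tokens * PRICE_PER_MTOK_IN
                + output_tokens * PRICE_PER_MTOK_OUT) / 1_000_000

    print(query_cost(1_000, 500))    # terse answer:   $0.0105
    print(query_cost(1_000, 5_000))  # verbose answer: $0.0780

A 10x longer "reasoning" chain is roughly a 7x bigger bill here, regardless of whether the answer got any better.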
LLM APIs sell on the value they deliver to the user, not the sheer number of tokens you can buy per $. The latter is roughly labor-theory-of-value levels of wrong.
I don't think the poster has a viewpoint that 'refuses consent'; their viewpoint is that the writing they put up for others to view is for others to view, regardless of how it is viewed. They seem to be giving consent, not refusing it, no?
> This is what I was responding to. I do not understand your thinking in this post.
I thought it was clear from "refuses to include any sort of consent" that I am talking specifically about holding an opinion that refuses to include any consideration of consent, not about refusing consent for usage.