Hacker News | thegeomaster's comments

Alexandr Wang on Twitter [0] mentioned open source plans:

"this is step one. bigger models are already in development with infrastructure scaling to match. private api preview open to select partners today, with plans to open-source future versions. incredibly proud of the MSL team. excited for what’s to come!"

[0]: https://x.com/alexandr_wang/status/2041909388852748717


So the answer is: no. lol. Remember Llama 4 Behemoth, and how we were supposed to get more great models from it?

What's the "attention window"? Are you alleging these frontier models use something like SWA? Seems highly unlikely.

Well, attention is a matrix at the end of the day, and it scales quadratically with sequence length; materializing a full 1M-token attention matrix would take far more memory than is practical. They may use larger windows, such as 16k to 32k, but you can look at how the GLM models work for more information.

DeepSeek is the frontrunner in this technology, AFAIK.
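A back-of-envelope sketch of that quadratic blowup (my own numbers, not from the thread): the memory needed to materialize a full attention score matrix in fp16, per head and per layer.

```typescript
// Memory to materialize one full attention score matrix (seqLen x seqLen),
// assuming fp16 scores (2 bytes/entry), per attention head per layer.
function attnMatrixBytes(seqLen: number, bytesPerEntry: number = 2): number {
  return seqLen * seqLen * bytesPerEntry; // quadratic in sequence length
}

const gib = (bytes: number): number => bytes / 2 ** 30;

console.log(gib(attnMatrixBytes(32_768)));    // 32k window: 2 GiB
console.log(gib(attnMatrixBytes(1_000_000))); // 1M window: ~1863 GiB
```

Of course, FlashAttention-style kernels never store this matrix at all, which is a big part of why long contexts are feasible in practice.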


And it seems they've decided to go closed-source for their largest, best models.


3.5-Plus was also only available via API. I don't know what the long-term business model for open weights is (I hope there is one), but it seems foolish to assume that companies will keep spending millions of dollars of compute on an asset worth zero in perpetuity.


The business case is to salt the earth for new competitors, coupled with marketing.


They've always had closed-source variants:

- Qwen3.5-Plus

- Qwen3-Max

- Qwen2.5-Max

etc. Nothing has really changed so far.


They've always done that. Did they say anywhere that they'd open all their models? They still have a business.


Tried on a few of our production prompts and got comparable speeds to what we normally get with Fireworks Serverless (Kimi K2.5), but at a better price. Rooting for you!


That's really awesome to hear!!


Thank you so much for the kind words and for the feedback!

1. Duly noted on USDC and other payment options - I have to see how easy this is to do, as we're using Stripe.

2. Teams and orgs are very high on the priority list and we hope to have something on this front very soon.

To keep up with our development, Discord is probably the best place: https://discord.gg/ptDRKRJpPV


Thanks for the feedback! I'm trying to fix that. The trouble is actually that changing the src of an iframe on a page pushes an entry into the history implicitly. Since we use iframes to display the contents of designs, and they can update, this results in a lot of state history pollution. Will prioritize!
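A minimal sketch of the fix I have in mind (hypothetical helper name, typed against a tiny structural interface rather than the full DOM lib so it stands alone): navigating via the iframe's `contentWindow.location.replace()` swaps the document without pushing a history entry, unlike assigning `iframe.src`.

```typescript
// Minimal structural type so this compiles without DOM typings;
// a real HTMLIFrameElement satisfies it in the browser.
interface IframeLike {
  contentWindow: { location: { replace(url: string): void } } | null;
}

// Assigning iframe.src implicitly pushes an entry onto the joint session
// history; Location.replace() replaces the current entry instead, so
// repeated design updates don't pollute the back button.
function navigateIframe(iframe: IframeLike, url: string): void {
  iframe.contentWindow?.location.replace(url);
}
```

In the browser you'd call this with the actual `<iframe>` element each time a design's contents update.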


Not the parent commenter, but in my testing, all recent Claudes (4.5 onward) and the Gemini 3 series have been pretty much flawless in custom tool call formats.


Thanks.

I’ve tested local models from Qwen, GLM, and Devstral families.


I actually ran this one. It comes to some 700k lines of code and seems to contain things like a full VBA implementation, complex currency and date parsing, etc. But the UI is extremely basic, doesn't seem to expose any of this advanced functionality, and is buggy to the point of being unusable. Focus will jump around as you type, cells will reset to old values, it will stop responding to keyboard events, etc.


The article talks about all of this and references the DeepSeek R1 paper[0], section 4.2 (first bullet point on PRM), on why this is much trickier to do than it appears.

[0]: https://arxiv.org/abs/2501.12948


You could think of supervised learning as learning against a known ground truth, which pretraining certainly is.


A large number of breakthroughs in AI are based on turning unsupervised learning into supervised learning (AlphaZero-style MCTS as a policy improver is also like this), so the confusion is kind of intrinsic.

