Hacker News | thegeomaster's comments

Alexandr Wang on Twitter [0] mentioned open source plans:

"this is step one. bigger models are already in development with infrastructure scaling to match. private api preview open to select partners today, with plans to open-source future versions. incredibly proud of the MSL team. excited for what’s to come!"

[0]: https://x.com/alexandr_wang/status/2041909388852748717


So the answer is: no. lol. Remember Llama 4 Behemoth, and how we were supposed to get more great models from it?

What's the "attention window"? Are you alleging these frontier models use something like SWA? Seems highly unlikely.

Well, attention is a matrix at the end of the day, and it scales quadratically with sequence length; materializing a full 1M-token attention matrix would take far more memory than is practical. They may use larger windows, such as 16k to 32k, but you can look at how the GLM models work for more information.

DeepSeek is the frontrunner in this technology, AFAIK.
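A back-of-envelope sketch of that quadratic blowup (my own numbers, not from the thread): the memory needed to materialize a full attention score matrix in fp16, per head and per layer.

```typescript
// Memory to materialize one full attention score matrix (seqLen x seqLen),
// assuming fp16 scores (2 bytes/entry), per attention head per layer.
function attnMatrixBytes(seqLen: number, bytesPerEntry: number = 2): number {
  return seqLen * seqLen * bytesPerEntry; // quadratic in sequence length
}

const gib = (bytes: number): number => bytes / 2 ** 30;

console.log(gib(attnMatrixBytes(32_768)));    // 32k window: 2 GiB
console.log(gib(attnMatrixBytes(1_000_000))); // 1M window: ~1863 GiB
```

Of course, FlashAttention-style kernels never store this matrix at all, which is a big part of why long contexts are feasible in practice.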


And it seems they've decided to go closed-source for their largest, best models.


3.5-Plus was also only available via API. I don't know what the long-term business model for open weights is (I hope there is one), but it seems foolish to assume that companies will keep spending millions of dollars of compute on an asset worth zero in perpetuity.


The business case is to salt the earth for new competitors, coupled with marketing.


They've always had closed-source variants:

- Qwen3.5-Plus

- Qwen3-Max

- Qwen2.5-Max

etc. Nothing has really changed so far.


They've always done that. Did they say anywhere that they'd open all their models? They still have a business.


Tried on a few of our production prompts and got comparable speeds to what we normally get with Fireworks Serverless (Kimi K2.5), but at a better price. Rooting for you!


That's really awesome to hear!!


Thank you so much for the kind words and for the feedback!

1. Duly noted on USDC and other payment options - I have to see how easy this is to do, as we're using Stripe.

2. Teams and orgs are very high on the priority list and we hope to have something on this front very soon.

To keep up with our development, Discord is probably the best place: https://discord.gg/ptDRKRJpPV


Thanks for the feedback! I'm trying to fix that. The trouble is actually that changing the src of an iframe on a page pushes an entry into the history implicitly. Since we use iframes to display the contents of designs, and they can update, this results in a lot of state history pollution. Will prioritize!
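A minimal sketch of the fix I have in mind (hypothetical helper name, typed against a tiny structural interface rather than the full DOM lib so it stands alone): navigating via the iframe's `contentWindow.location.replace()` swaps the document without pushing a history entry, unlike assigning `iframe.src`.

```typescript
// Minimal structural type so this compiles without DOM typings;
// a real HTMLIFrameElement satisfies it in the browser.
interface IframeLike {
  contentWindow: { location: { replace(url: string): void } } | null;
}

// Assigning iframe.src implicitly pushes an entry onto the joint session
// history; Location.replace() replaces the current entry instead, so
// repeated design updates don't pollute the back button.
function navigateIframe(iframe: IframeLike, url: string): void {
  iframe.contentWindow?.location.replace(url);
}
```

In the browser you'd call this with the actual `<iframe>` element each time a design's contents update.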


Not the parent commenter, but in my testing, all recent Claudes (4.5 onward) and the Gemini 3 series have been pretty much flawless in custom tool call formats.


Thanks.

I’ve tested local models from Qwen, GLM, and Devstral families.


I actually ran this one. It comes to some 700k lines of code and seems to contain things like a full VBA implementation, complex currency and date parsing, etc. But the UI is extremely basic, doesn't seem to expose any of this advanced functionality, and is buggy to the point of being unusable. Focus will jump around as you type, cells will reset to old values, it will stop responding to keyboard events, etc.


The article talks about all of this and references the DeepSeek R1 paper[0], section 4.2 (first bullet point on PRM), on why this is much trickier to do than it appears.

[0]: https://arxiv.org/abs/2501.12948


You could think of supervised learning as learning against a known ground truth, which pretraining certainly is.


A large number of breakthroughs in AI are based on turning unsupervised learning into supervised learning (AlphaZero-style MCTS as a policy improver is also like this), so the confusion is kind of intrinsic.

