There are many providers (Fireworks, Groq, Cerebras, Google Vertex), some using ...

re-thc · 2026-01-29T13:10:21 1769692221

And who funds those providers? They can clearly pull the plug any time. The so-called many providers are the same group under the hood.

Nvidia, for example, was noted as having most of its revenue heavily tilted toward only a few major customers.

atwrk · 2026-01-29T13:57:02 1769695022

You just click "buy" with another provider? They just host the model for money, nothing more.

To be clear: The models are open weights that anyone can simply download, because many labs publish them as such. The providers in question are generic hosters. It's the same as getting managed WordPress hosting somewhere.

re-thc · 2026-01-29T14:11:43 1769695903

> You just click "buy" with another provider? They just host the model for money, nothing more.

Start with where the GPUs and the rest of it come from.

Topfi · 2026-01-29T13:33:55 1769693635

I don’t really understand your point. A company offering only inference will inherently always have lower costs and require less hardware than one that both provides inference and does training of new models.

re-thc · 2026-01-29T14:11:08 1769695868

The point is it doesn't work like that. Check out all the special purpose vehicles, loans and fundraises going around in that space. Deals claiming $x billion in funds only got millions plus GPU rentals in special deals. And the offering of those deals still hinges on said top 3 customers buying to float things up.

i.e. whether you go for "only inference" or "both", it'll work out similarly somewhere.

Topfi · 2026-01-29T14:32:58 1769697178

If you are referencing Nvidia, note that Vertex, Groq, Cerebras, etc. all do not rely on GPUs for much or all of their inference and are all the better for it.

Kind of why Nvidia sees itself forced to make such deals, to lock labs in before they lose another to objectively superior inference.

Staying purely with US labs, they just lost Meta, and both Anthropic and Gemini models have been available on Vertex without relying on GPUs for a while. OpenAI is equally turning towards Cerebras, so yeah, those deals will mean little for inference.

Those circular deals aren’t a thing for the inference providers I tend to go for and they don’t change the fact that using a full GPU purely for inference is like using a pickup truck for a track day. It can be done and the flexibility has advantages, but once you focus on one task, there are better tools for the job.