More

danielhanchen · 2026-04-22T16:12:47 1776874367

We made Unsloth Studio which should help :)

1. Auto best official parameters set for all models

2. Auto determines the largest quant that can fit on your PC / Mac etc

3. Auto determines max context length

4. Auto heals tool calls, provides python & bash + web search :)

ryandrake · 2026-04-22T16:50:02 1776876602

Yea, I actually tried it out last time we had one of these threads. It's undeniably easy to use, but it is also very opinionated about things like the directory locations/layouts for various assets. I don't think I managed to get it to work with a simple flat directory full of pre-downloaded models on an NFS mount to my NAS. It also insists on re-downloading a 3GB model every time it is launches, even after I delete the model file. I probably have to just sit down and do some Googleing/searching in order to rein the software in and get it to work the way I want it to on my system.

hypercube33 · 2026-04-22T17:49:51 1776880191

Sadly doesn't support fine tuning on AMD yet which gave me a sad since I wanted to cut one of these down to be specific domain experts. Also running the studio is a bit of a nightmare when it calls diskpart during its install (why?)

Zopieux · 2026-04-22T20:31:04 1776889864

Thanks for that. Did you notice that the unsloth/unsloth docker image is 12GB? Does it embed CUDA libraries or some default models that justifies the heavy footprint?

WanderPanda · 2026-04-22T18:23:52 1776882232

I applaud that you recently started providing the KL divergence plots that really help understand how different quantizations compare. But how well does this correlate with closed loop performance? How difficult/expensive would it be to run the quantizations on e.g. some agentic coding benchmarks?

cyanydeez · 2026-04-22T18:10:33 1776881433

Is unsloth working on managing remote servers, like how vscode integrates with a remote server via ssh?

kristjansson · 2026-04-22T18:18:34 1776881914

Lmstudio Link is GREAT for that right now

jbellis · 2026-04-22T17:58:45 1776880725

what are you using for web search?

wuschel · 2026-04-22T16:43:25 1776876205

Great project! Thank you for that!

danielhanchen · 2026-04-22T15:50:37 1776873037

Haha :)

cpburns2009 · 2026-04-22T18:28:36 1776882516

Do you get early access so you can prep the quants for release?

ErneX · 2026-04-22T19:24:14 1776885854

IIRC they mentioned they do.

danielhanchen · 2026-04-22T15:50:16 1776873016

Haha :) We had some issues with Kimi-2.6 since it was int4 and we were investigating how to handle it :)

verdverm · 2026-04-22T18:50:41 1776883841

Appreciate what y'all do! We were slacking about how many HGX-B300 it would take to run Kimi and it looks like we could actually fit 2-3 Kimis on a single HGX.

danielhanchen · 2026-04-22T15:49:29 1776872969

We also made some dynamic MLX ones if they help - it might be faster for Macs, but llama-server definitely is improving at a fast pace.

https://huggingface.co/unsloth/Qwen3.6-27B-UD-MLX-4bit

DarmokJalad1701 · 2026-04-22T17:51:34 1776880294

What exactly does the .sh file install? How does it compare to running the same model in, say, omlx?

danielhanchen · 2026-04-18T10:24:23 1776507863

Yes sadly CUDA 13.2 is broken - NVIDIA will push a fix in CUDA 13.3

danielhanchen · 2026-04-17T08:27:49 1776414469

Love the JPEG analogy :)

danielhanchen · 2026-04-16T17:50:28 1776361828

Oh that is pretty good! And the SVG one!

danielhanchen · 2026-04-16T16:58:39 1776358719

They sometimes do! Qwen, Google etc do them!

danielhanchen · 2026-04-16T16:12:42 1776355962

Oh hey - we're actually the 4th largest distributor of OSS AI models in GB downloads - see https://huggingface.co/unsloth

https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs is what might be helpful. You might have heard 1bit dynamic DeepSeek quants (we did that) - not all layers can be 1bit - important ones are in 8bit or 16bit, and we show it still works well.

danielhanchen · 2026-04-16T15:50:00 1776354600

Yes this is fair - we try our best to communicate issues - I think we're mostly the only ones doing the communication that model A or B has been fixed etc.

We try our best as model distributors to fix them on day 0 or 1, but 95% of issues aren't our issues - as you mentioned it's the chat template or runtime etc

alfiedotwtf · 2026-04-17T14:38:30 1776436710

I have to ask - what do you run locally on your laptop (model, backend, and agentic cli)?

Feature request:

A leader board with filtering so you can enter your machine specs and it will sort all models along with all the various quantisation and then rank them all - because so far model ranking site either don’t include all available quants, don’t compare apples to apples (ie was one model tested with Claude code while another benchmark done with opencode) etc

Oh - and as bonus, scoring also ranked by which agentic CLI :)