frontsideair's comments | Hacker News

> Apple locked it behind Siri. apfel sets it free

This doesn't feel truthful; it sounds like the tool is a hack that unlocks something. If I understand correctly, it uses the same FoundationModels framework that powers Apple Intelligence, but exposes it through a CLI and an OpenAI-compatible REST endpoint. Which is fine, the marketing just goes a bit hard.
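If that reading is right, usage would presumably look like any other OpenAI-compatible server. A sketch, where the port and model name are my assumptions, not anything apfel documents:

```shell
# Hypothetical invocation: port 8080 and the model name are assumptions.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "apple-on-device",
        "messages": [{"role": "user", "content": "Say hello in one word."}]
      }'
```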

> Runs on Neural Engine

Also unsure if this runs on ANE, when I tried Apple Intelligence I saw that it ran on the GPU (Metal).


I was able to confirm that it actually runs on ANE, I'm impressed.
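For anyone who wants to check this themselves: on Apple Silicon, powermetrics can report Neural Engine power draw, which should be sustained and nonzero while the model is generating. This is a sketch; it needs sudo, and sampler availability can vary by macOS version:

```shell
# While the model is generating, sample ANE power once per second, five times.
# A sustained nonzero ANE power reading suggests the Neural Engine is in use
# (as opposed to the GPU via Metal).
sudo powermetrics --samplers ane_power -i 1000 -n 5
```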


[flagged]


Using soft or unsure wording doesn't negate the factual value of the contribution. The OP is correct on both counts; it's OK to be unsure when putting something forward.

You, on the other hand, contributed literally nothing to the topic.


Please read the guidelines: https://news.ycombinator.com/newsguidelines.html

The poster said:

> Also unsure if this runs on ANE, when I tried Apple Intelligence I saw that it ran on the GPU (Metal).

They added something of some substance here.

Your post expressing your feelings did not.


Yeah, the initial experience with no colors doesn't look great. I can implement this when I have some free time; if you feel like doing it, please feel free to open a PR. Thanks!


Sorry, too busy sending PRs to fix bugs I made a few months ago :(

I think the three initial colors are easy to implement and will make your site more beginner-friendly.


14B Qwen was a good choice, but it has become a bit outdated, and it seems the new 4B version has somehow surpassed it in benchmarks.

It's a balancing game: how slow a token generation speed can you tolerate? Would you rather get an answer quickly, or wait a few seconds (or sometimes minutes) for reasoning?

For quick answers, Gemma 3 12B is still good. GPT-OSS 20B is pretty quick when reasoning is set to low, which usually doesn't think longer than one sentence. I haven't gotten much use out of Qwen3 4B Thinking (2507) but at least it's fast while reasoning.


Ollama adding a paid cloud version made me postpone this post by a few weeks at least. I don't object to them making money, but it's hard to recommend a tool for local usage when the first instruction is to go to settings and enable airplane mode.

Luckily, llama.cpp has come a long way and is at a point where I can easily recommend it as the open source option instead.
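For reference, llama.cpp's bundled server also exposes an OpenAI-compatible API these days. A sketch, where the model path is a placeholder:

```shell
# llama-server ships with llama.cpp; the model path is a placeholder.
# -ngl 99 offloads all layers to the GPU (a no-op on CPU-only builds).
llama-server -m ./model.gguf --port 8080 -ngl 99
```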


This is probably the command:

  sudo sysctl iogpu.wired_limit_mb=184320
Source: https://github.com/ggml-org/llama.cpp/discussions/15396
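For what it's worth, that number looks like 180 GiB expressed in MiB. A quick sanity check, assuming a 192 GB machine with roughly 12 GB held back for the OS (the exact split is my assumption):

```shell
# Assumption: 192 GB unified memory, ~12 GB reserved for macOS.
TOTAL_MB=$((192 * 1024))
RESERVE_MB=$((12 * 1024))
LIMIT_MB=$((TOTAL_MB - RESERVE_MB))
echo "$LIMIT_MB"  # prints 184320, the value used in the sysctl command above
```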


I'm interested in this, my impression was that the newer chips have unified memory and high memory bandwidth. Do you do inference on the CPU or the external GPU?


I don't; I'm a REALLY light user, and smaller LLMs work pretty well. I used a 40 GB LLM and it was _pokey_, but it worked, and switching between them is pretty easy. This is a 12-core Xeon with 64 GB RAM. My M4 mini is okay with smaller LLMs, and I have a Ryzen 9 with an RTX 3070 Ti that's the best of the bunch, but none of this holds a candle to what people who spend real money to experiment in this field are running.


Good point, let me add a quick note.


Thank you, it was an integral part of the whole post!


According to the benchmarks, this one improves on the previous version across the board, in some cases beating 30B-A3B. Definitely worth a try; it'll easily fit into memory, and token generation speed will be pleasantly fast.


There is a new Qwen3-30B-A3B; you're comparing it to the old one. https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507


This is the first time I’m hearing about Nebula. How does it compare to Tailscale?

