I don't know how Ultrathink works, and I have no "real world" experience with Kamal, but I find it intriguing to see someone consider 11 deployments in 2 hours to be "fast".
Instead of handicapping yourself, fix your deployment pipeline; 10-minute deploys are not OK for an online store.
I would have liked to read about the "high availability" that's mentioned a couple of times in the article; the WAL Configuration section is not enough, and replication is expensive-ish.
"how does a smart car compare to a ford f150? its different in its intent and intended audience.
Ollama is someone who goes to walmart and buys a $100 huffy mountain bike because they heard bikes are cool. Torchchat is someone who built a mountain bike out of high quality components chosen for a specific task/outcome with the understanding of how each component in the platform functions and interacts with the others to achieve an end goal."
https://www.reddit.com/r/LocalLLaMA/comments/1eh6xmq/comment...
A longer answer, with some more details:
If you don't care about which quant you're using and want easy integration with desktop/laptop-based projects, use Ollama.
If you want to run on mobile, integrate into your own apps or projects natively, don't want to use GGUF, want to do quantization, or want to extend your PyTorch-based solution, use torchchat.
Right now Ollama (based on llama.cpp) is a faster way to get performance on a laptop or desktop, and a number of projects are pre-integrated with Ollama thanks to the OpenAI spec. It's also more mature, with more fit and polish.
That said, the commands that make everything easy use 4-bit quant models, and you have to do extra work to go find a GGUF model with a higher (or lower) bit quant and load it into Ollama.
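For what it's worth, loading your own GGUF quant into Ollama looks roughly like this (a sketch; the model filename and tag are hypothetical, and this assumes the Ollama daemon is running):

```shell
# Point a Modelfile at a GGUF you downloaded yourself,
# e.g. a Q8_0 quant pulled from Hugging Face (filename hypothetical):
echo 'FROM ./llama-3-8b.Q8_0.gguf' > Modelfile

# Register it with Ollama under a local name, then run it:
ollama create llama3-q8 -f Modelfile
ollama run llama3-q8
```

The catch from the parent comment still applies: Ollama imports the file into its own blob store, so the GGUF now exists on disk twice.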
Also worth noting: Ollama "containerizes" the models on disk, so you can't share them with other projects without going through Ollama, which is a hard pass for some users and use cases, since duplicating model files on disk isn't great.
https://www.reddit.com/r/LocalLLaMA/comments/1eh6xmq/comment...
You can also run any LLM privately with Ollama, LM Studio, and/or LLamaSharp on Windows, Mac, and iPhone; all are open source, customizable, user friendly, and frequently maintained.
Probably if you need any esoteric features that PyTorch supports. FlashAttention-2, for example, was supported much earlier in PyTorch than in llama.cpp, so if FlashAttention-3 follows the same path, it will probably make more sense to use this when targeting Nvidia GPUs.
It would appear that FlashAttention-3 already exists for PyTorch, based on this joint blog post between Nvidia, Together.ai, and Princeton about enabling it for PyTorch: https://pytorch.org/blog/flashattention-3/
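As a side note (not from the blog post), PyTorch already exposes a fused attention entry point that dispatches to the fastest available backend, including FlashAttention kernels on supported CUDA GPUs. A minimal sketch, assuming PyTorch 2.x; on CPU it falls back to a math implementation, so it runs anywhere:

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) tensors with toy sizes
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# scaled_dot_product_attention picks a fused kernel when one is
# available for the device/dtype; the call itself is backend-agnostic.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```

The point is that code written against this API should pick up FlashAttention-3 kernels transparently as PyTorch enables them, with no call-site changes.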
Ollama currently has only one "supported backend," which is llama.cpp. It enables downloading and running models on CPU, and it might have a more mature server.
How well does it run on AMD GPUs these days compared to Nvidia or Apple silicon?
I've been considering buying one of those powerful Ryzen mini PCs to use as an LLM server on my LAN, but I've read that the AMD backend (ROCm, IIRC) is kind of buggy.
What package managers are people using in other languages to make sure that software "always works the same as when you first wrote it, without asterisks"? I'd like to understand how they solve the "package no longer exists in a central registry" problem.
This is not as much about a package manager as it is about conventions and necessary dependencies.
Standards ensure that, going forward, language semantics and syntax don't change. Having minimal dependencies ensures program longevity. A package manager cannot solve these problems, no matter how good it is at its job.
Python doesn't have a standard, and it's heavily reliant on dependencies, which are plentiful and similarly unregulated. A program written in C that uses only functionality described in some POSIX standard will endure decades unmodified. Even a Python hello-world program went stale some time ago, even though it's just one line.
> I'd like to understand how they solve the "package no longer exists in a central registry" problem.
This is, of course, an infrastructure/maintenance issue as much as a package-manager design issue. But in Nix's case, the public "binary cache" (for Nixpkgs/NixOS) includes not only final build outputs but also the source tarballs that go into them. Since Nix disallows network access at build time, all dependencies are represented this way, whether jar files, source tarballs, or whatever else: Nix itself must be the one to fetch your dependencies. Consequently, everything you fetch from the Internet for your build is a kind of intermediary Nix build that can be cached using the usual Nix tools. The Nix community's public cache has a policy of retaining copies of upstream sources forever (there has recently been talk of limiting storage of final built packages to a retention period of only two years, but sources will continue to be retained indefinitely; so far the cache reaches back to its inception around a decade ago).
Taken together, these things mean that when a project disappears entirely from GitHub or Maven Central or whatever, people building against old versions of it with Nix/Nixpkgs don't even notice. Nix just fetches those upstream sources from the public cache without even reaching out to that central repository from which those sources have been removed.
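Concretely, this works because every source fetch in Nix is a fixed-output derivation pinned by hash, so the bytes can come from any cache that has them rather than the original host. A minimal sketch (URL and hash are placeholders, not a real package):

```nix
# Fixed-output fetch: Nix verifies the hash of the result, so the
# tarball can be substituted from a binary cache even if
# example.org has long since deleted it.
fetchurl {
  url = "https://example.org/releases/foo-1.0.tar.gz";
  sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
}
```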
For private use cases where your project and its dependencies won't be mirrored to the public cache of Nixpkgs builds, you can achieve the same effect by running your own cache or paying a hosted service to do that.
For builds outside the Nix universe, you can make special arrangements for each type of package your various builds fetch, and mirroring those repos. Then configure your builds to pull from your mirrors instead of the main/public ones.
Dotnet uses NuGet[1]. Packages in the system are immutable and never change. They can be unlisted but never deleted (except in limited and extreme cases, like malware), which means that even if a package maintainer stops publishing new versions to the repository, existing packages will remain publicly available for as long as Microsoft continues to exist.
Often a package says it works with OS version X but not X+1. That claim may be true or false, but what you described doesn't solve either version of that problem.
I think you're referring to vendoring dependencies? In Python/pip, for example, you can download the source for a package and point to the folder directly as a dependency instead of a version or a git URL. Most package managers/languages support some version of that. I suppose if you wanted to vendor all dependencies by default, keep them updated, etc., it would take a little more scripting or some extra tools.
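A rough sketch of that workflow with pip (requires network access for the first step; `vendor/` is an arbitrary directory name):

```shell
# Download sdists/wheels for every dependency into a local directory:
pip download -r requirements.txt -d vendor/

# Later, install strictly from the vendored copies, never touching
# PyPI, even if a package has since vanished from the registry:
pip install --no-index --find-links=vendor/ -r requirements.txt
```

Checking `vendor/` into your repo (or an artifact store) is the scripting step that makes this survive registry deletions.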
Using the GitLab calculator: if you are a developer in Medellín, you are paid 3x less than if you are living in the Bay Area. If you are a customer in Medellín, you pay the same as a customer in the Bay Area. In this case, "cost of living"[0] only applies to employees and not customers, which in my opinion says a lot about how much the company values its staff.
I appreciate the openness from GitLab on this matter though, they're pretty up-front about it so if you don't like it don't apply.
[0] When the phrase "cost of living" is mentioned but it's not reflected in the price for final users, you know you're about to get fucked.
It seems to me there is a lot of Vim vocabulary that is very different from what is common on the desktop today. I don't know what "yank" means, and I don't think I have a desktop app that has a yank function.
On most Unix systems you should be able to go to the beginning of a line, press Ctrl-K to kill (cut) the rest of the line, and then Ctrl-Y to yank it back. At the very least, your browser and terminal should understand those two commands.