That's my main goal - and Rust seems like the language where we can have our cake and eat it too! The standard library has some amazingly performant data structures and algorithms.
From talking to one of the hyper developers, I gather a lot of this machinery was written before it arrived in the stdlib. My response was, "neat, so we can delete the custom stuff now, right? :P" but I suppose any change carries some risk with it.
> It contributed nothing worthwhile, but I’m happy it got you your 15 minutes on HN.
That is a very rude dismissal, followed by a bad faith implication of the author's motivations. You can do better.
I got something worthwhile out of it. I'm learning Rust and, like the author of the blog post, tend to value safety (simplicity hopefully being a contributor to that) over that last bit of performance. I was actually looking over the source of some of the libraries I'm considering using, am concerned about some things I found, didn't know about ureq, now I do, and I think it's going to fit my needs.
You broke the site guidelines badly and repeatedly in this thread, for example in your aggressive comments to the article author, and in https://news.ycombinator.com/item?id=39139646 which was really totally unacceptable.
This is not cool.
You're a good contributor, and I appreciate that you don't do this all the time, but we've also had to warn you multiple times about this kind of thing in the past. If you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules when posting here, we'd appreciate it.
> If a library has 100k lines of code but the parts you use are only 1k lines of code then why is that worse than a library with 1k lines of code?
There's nothing wrong with not using the whole feature set of some library. But if library A does the same things as library B with a third of the code, isn't that better? (All other things - e.g. perf - being equal)
> That's true right until the moment you do need them and then you need to rewrite a lot of code.
There's plenty of applications that will only ever need a handful of connections. Probably most applications.
I got no horse in this race -- I like both Golang and Rust and use them for different things -- but as far as I can tell, Golang's GC has improved a lot.
Not sure where it stands on a global competition rating board (if there's even such a thing) but it's pretty good. I've never seen it crap the bed, though I also never worked at the scale of Twitch and Discord.
> Async/await was a terrible idea for fixing JavaScript's lack of proper blocked threading that is currently being bolted onto every language.
As fun as it is to hate on JavaScript, it's really interesting to go back and watch Ryan Dahl's talk introducing Node.js to the world (https://www.youtube.com/watch?v=EeYvFl7li9E). He's pretty ambivalent about it being JavaScript. His main goal was to find an abstraction around the epoll() I/O event loop that didn't make him want to tear his eyes out, and he tried a bunch of other stuff first.
If you look around at the other competition at the time, it's worth noting that many other languages that already existed for decades ultimately came up with the same basic solution. In fact one of the weird things about the Node propaganda at the time was precisely that every other major scripting language tended to have not just one event-based library, but choices of event based libraries. Perl even had a metapackage abstracting several of them. It was actually a bog-standard choice, not some sort of incredible innovation.
I don't think it's a "good" solution in the abstract, but in the concrete situation of "I have a dynamically-typed scripting language with over a decade of development behind it, and many more years to come before the event-based stuff is really standard", it's nearly the only choice. Python's gevent was the only other thing I saw that kinda solved the problem, and I really liked it, but I'm not sure it's a sustainable model in the end, as it involves writing a package that aggressively reaches into other packages to do its magic; it is a constant game of catch-up.
I do think it's a grave error in the 2020s to adopt async as the only model for a language, though. There are better choices. And I actually exclude Rust here, because async is not mandatory and not the only model; I think in some sense the community is making the error of not realizing that your task will never have more than maybe a hundred threads in it and a 2023 computer will chomp on that without you noticing. Don't scale for millions of concurrent tasks when you're only looking at a couple dozen max, no matter what language or environment you're in. Very common problem for programmers this decade. It may well be the most impactful premature optimization in programming I see today.
> Don't scale for millions of concurrent tasks when you're only looking at a couple dozen max, no matter what language or environment you're in. Very common problem for programmers this decade.
And also with fibers/virtual threads (project loom) you can actually have a million threads using blocking hand-off on one machine. So the performance argument is kind of gone.
He was strongly advocating callbacks with node.js, which is understandable given the time. But a few years later when he wrote some Go code, he said that's a better model for networked servers (sorry no reference right now, it was in a video I watched, not sure which one)
JS callbacks are indeed better than C callbacks because you can hold onto some state. Although I guess the capture is implicit rather than explicit, so some people might say it's more confusing.
I'm pretty sure Joyent adopted and funded node.js because they were doing lots of async code in C, and they liked the callback style in JavaScript better. It does match the kind of problems that Go is now being used for, and this was pre-Go.
But anyway it is interesting how nobody really talks about callbacks anymore. Seems like async/await has taken over in most languages, although I sorta agree with the parent that it could have been better if designed from scratch.
> As fun as it is to hate on JavaScript, it's really interesting to go back and watch Ryan Dahl's talk introducing Node.js to the world
Agreed. JavaScript was actually my first language after TurboPascal in 1996.
I was also there listening to the first podcasts when node came out.
JavaScript is a very interesting language, especially with its prototype-based object model. And the event loop, apart from the language itself, is interesting as well. And it's no coincidence Apple went as far as baking a JavaScript-oriented instruction (ARM's FJCVTZS float-to-int conversion) into its M1 silicon.
But I still think multithreading is best done by using blocking operations.
NIO can be implemented on top of blocking IO as far as I know but not the other way round.
Also, sidenote, I think JavaScript's only real failure is the lack of a canonical module/import system. That error led to countless re-implementations of build systems and tens of thousands of hours wasted debugging.
> Also, sidenote, I think JavaScript's only real failure is the lack of a canonical module/import system. That error led to countless re-implementations of build systems and tens of thousands of hours wasted debugging.
Agreed. I don't hate on JS, in fact I think it's the best tool for the several very common use cases it targets, and I'll even defend the way objects work in it (i.e. lets me do what I want with minimal fuss). The import/require drama was annoying, though.
It’s so interesting that the async/await stuff basically makes his presentation's points meaningless. If you’re just using async/await, why use the callback style in the first place…
But I get it: you can always go back to promises and callbacks if you want.
Exactly. People are too afraid of using threads these days for some perceived cargo-cult scalability reasons. My rule of thumb is just to use threads if the total number of threads per process won't exceed 1000.
(This is assuming you are already switching to communicating using channels or similar abstraction.)
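For illustration, a minimal sketch of that style in Rust (the thread count and messages are arbitrary): one OS thread per task, all funneling results through an mpsc channel.

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        let (tx, rx) = mpsc::channel();
        // A hundred OS threads is nothing on a modern machine.
        let handles: Vec<_> = (0..100)
            .map(|i| {
                let tx = tx.clone();
                thread::spawn(move || {
                    // ... blocking I/O would go here ...
                    tx.send(format!("task {} done", i)).unwrap();
                })
            })
            .collect();
        drop(tx); // close the channel so the loop below terminates
        for msg in rx {
            println!("{}", msg);
        }
        for h in handles {
            h.join().unwrap();
        }
    }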
The performance overhead of threads is largely unrelated to how many you have. The thing being minimized with async code is the rate at which you switch between them, because those context switches are expensive. On modern systems there are many common cases where the CPU time required to do the work between a pair of potentially blocking calls is much less than the CPU time required to yield when a blocking call occurs. Consequently, most of your CPU time is spent yielding to another thread. In good async designs, almost no CPU time is spent yielding. Channels will help batch up communication but you still have to context switch to read those channels. This is where thread-per-core software architectures came from; they use channels but they never context switch.
Any software that does a lot of fine-grained concurrent I/O has this issue. Database engines have been fighting this for many years, since they can pervasively block both on I/O and locking for data model concurrency control.
The cost of context switching in "async" code is very rarely smaller than the cost of switching OS threads. (The exception is when you're using a GC language with some sort of global lock.)
"Async" in native code is cargo cult, unless you're trying to run on bare metal without OS support.
The cost of switching goroutines, Rust Futures, Zig async Frames, or fibers/userspace tasks in general is on the order of a few nanoseconds, whereas it's in the microsecond range for OS threads. This allows you to spawn tons of tasks and have them all communicate with each other very quickly (write to queue; push receiver to scheduler runqueue; switch out sender; switch to receiver) whereas doing so with OS threads would never scale (write to queue; syscall to wake receiver; syscall to switch out sender). Any highly concurrent application (think games, simulations, net services) uses userspace/custom task scheduling for similar reasons.
Node.js is inherently asynchronous, and the JavaScript developers bragged during its peak years about how it was faster than Java for webservers despite only using one core, because a classic JEE servlet container launches a new thread per request. Even if you don't count this as a "context switch" and go for a thread pool, you are deluding yourself, because a thread pool is applying the principles of async with the caveat that tasks you send to the thread pool are not allowed to create tasks of their own.
There is a reason why so many developers have chosen to do application level scheduling: No operating system has exposed viable async primitives to build this on the OS level. OS threads suck so everyone reinvents the wheel. See Java's "virtual threads", Go's goroutines, Erlang's processes, NodeJS async.
You don't seem to be aware of what a context switch at the application level is. It is often as simple as a function call. There is no way that returning to the OS and running a generic scheduler (one that is supposed to deal with any possible application workload, has to store all the registers, possibly flushes the TLB if the OS makes the mistake of executing a different process first, and then restores all the registers) can be faster than simply calling the next function in the same address space.
Developers of these systems brag about how you can have millions of tasks active at the same time without breaking any sweat.
The challenge is that async colors functions and many of the popular crates will force you to be async, so it isn't always a choice depending on which crates you need.
Please excuse my ignorance, I haven't done a ton of async Rust programming - but if you're trying to call async Rust from sync Rust, can you not just create a task, have that task push a value through an mpsc channel, shove the task on the executor, and wait for the value to be returned? Is the concern that control over the execution of the task is too coarse-grained?
Yes, you can do that. You can use `block_on` to convert an async Future into a synchronous blocking call. So it is entirely possible to convert from the async world back into the sync world.
But you have to pull in an async runtime to do it. So library authors either have to force everyone to pull in an async runtime or write two versions of their code (sync and async).
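For example, a minimal sketch using the `futures` crate's lightweight executor (tokio's `Runtime::block_on` works the same way):

    use futures::executor::block_on;

    async fn fetch_value() -> u32 {
        42
    }

    fn main() {
        // Synchronous code driving an async future to completion.
        let v = block_on(fetch_value());
        assert_eq!(v, 42);
    }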
There are ways to call both from both for sure, but my point is if you don't want any async in your code at all...that often isn't a choice if you want to use the popular web frameworks for example.
I can't both perform blocking I/O and wait for a cancellation signal from another thread. So I need to use poll(), and async is a nice interface to that.
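In case it helps, a sketch of that pattern in Rust using the `libc` crate and the classic self-pipe trick; the cancelling thread writes one byte to the pipe's write end (`wait_or_cancel` and the fd names are mine, not from any particular library):

    use std::os::unix::io::RawFd;

    // Block until io_fd is readable or a byte arrives on cancel_fd
    // (the read end of a pipe owned by the cancelling thread).
    fn wait_or_cancel(io_fd: RawFd, cancel_fd: RawFd) -> std::io::Result<bool> {
        let mut fds = [
            libc::pollfd { fd: io_fd, events: libc::POLLIN, revents: 0 },
            libc::pollfd { fd: cancel_fd, events: libc::POLLIN, revents: 0 },
        ];
        let n = unsafe { libc::poll(fds.as_mut_ptr(), fds.len() as libc::nfds_t, -1) };
        if n < 0 {
            return Err(std::io::Error::last_os_error());
        }
        Ok(fds[1].revents & libc::POLLIN != 0) // true => cancelled
    }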
Async and GUI threads are different concepts. Of course most GUIs have an event loop which can be used as a form of async, but with async you do your calculations in the main thread, while with GUIs you typically spin your calculations off to a different thread.
Most often when doing async you have a small number of tasks repeated many times, then you spin up one thread per CPU, and "randomly" assign each task as it comes in to a thread.
When doing GUI style programming you have a lot of different tasks and each task is done in exactly one thread.
Hmm I would say the concepts are intertwined. Lots of GUI frameworks use async/await and the GUI thread is just another concurrency pattern that adds lock free thread exclusivity to async tasks that are pinned to a single thread.
Note that if you "just" write responses to queries without yielding execution, you don't need async; you just write sync handlers for an async framework. (Issuing DB requests synchronously is not good for your perf though; you'd better have a mostly-read / well-cached problem.)
A particularly interesting use case for async Rust without threads is cooperative scheduling on microcontrollers[1]; this article also does a really good job of explaining some of the complications referenced in TFA.
It really is, but I still favour "unsexy" manual poll/select code with a lot of if/elseing if it means not having to deal with async.
I fully acknowledge that I'm an "old school" system dev who's coming from the C world and not the JS world, so I probably have a certain bias because of that, but I genuinely can't understand how anybody could look at the mess that's Rust's async and think that it was a good design for a language that already had the reputation of being very complicated to write.
I tried to get it, I really did, but my god what a massive mess that is. And it contaminates everything it touches, too. I really love Rust and I do most of my coding in it these days, but every time I encounter async-heavy Rust code my jaw clenches and my vision blurs.
At least my clunky select "runtime" code can be safely contained in a couple functions while the rest of the code remains blissfully unaware of the magic going on under the hood.
Dear people coming from the JS world: give system threads and channels a try. I swear that a lot of the time it's vastly simpler and more elegant. There are very, very few practical problems where async is clearly superior (although plenty where it's arguably superior).
> but I genuinely can't understand how anybody could look at the mess that's Rust's async and think that it was a good design for a language that already had the reputation of being very complicated to write.
Rust adopted the stackless coroutine model for async tasks based on its constraints, such as having a minimal runtime by default, not requiring heap allocations left and right, and being amenable to aggressive optimizations such as inlining. The function coloring problem ("contamination") is an unfortunate consequence. The Rust devs are currently working on an effects system to fix this. Missing features such as standard async traits, async functions in traits, and executor-agnosticism are also valid complaints. Considering Rust's strict backwards compatibility guarantee, some of these will take a long time.
I like to think of Rust's "async story" as a good analogue to Rust's "story" in general. The Rust devs work hard to deliver backwards compatible, efficient, performant features at the cost of programmer comfort (ballooning complexity, edge cases that don't compile, etc.) and compile time, mainly. Of course, they try to resolve the regressions too, but there's only so much that can be done after the fact. Those are just the tradeoffs the Rust language embodies, and at this point I don't expect anything more or less. I like Rust too, but there are many reasons others may not. The still-developing ecosystem is a prominent one.
I read comments like this and feel like I’m living in some weird parallel universe. The vast majority of Rust I write day in and day out for my job is in an async context. It has some rough edges, but it’s not particularly painful and is often pleasant enough. Certainly better than promises in JS. I have also used system threads, channels, etc., and indeed there are some places where we communicate between long running async tasks with channels, which is nice, and some very simple CLI apps and stuff where we just use system threads rather than pulling in tokio and all that.
Anyway, while I have some issues with async around future composition and closures, I see people with the kind of super strong reaction here and just feel like I must not be seeing something. To me, it solves the job well, is comprehensible and relatively easy to work with, and remains performant at scale without too much fiddling.
Honestly, this is me too. The only thing I’d like to also see is OTP-like supervisors and Trio-like nurseries. They each have their use and they’re totally user land concerns.
> It really is, but I still favour "unsexy" manual poll/select code with a lot of if/elseing if it means not having to deal with async.
> I fully acknowledge that I'm an "old school" system dev who's coming from the C world and not the JS world, so I probably have a certain bias because of that, but I genuinely can't understand how anybody could look at the mess that's Rust's async and think that it was a good design for a language that already had the reputation of being very complicated to write.
I'm in the same "old school" system dev category as you, and I think that modern languages have gone off the deep end, and I complained about async specifically in a recent comment on HN: https://news.ycombinator.com/item?id=37342711
> At least my clunky select "runtime" code can be safely contained in a couple functions while the rest of the code remains blissfully unaware of the magic going on under the hood.
And we could have had that for async as well, if languages were designed by the in-the-trenches industry developer, and not the "I think Haskell and Ocaml is great readability" academic crowd.
With async in particular, the most common implementation is to color the functions by qualifying the specific function as async, which IMO is exactly the wrong way to do it.
The correct way would be for the caller to mark a specific call as async.
IOW, which of the following is clearer to the reader at the point where `foo` is called?
Option 1: color the function

    async function foo () {
        // ...
    }
    ...
    let promise = foo ();
    let bar = await promise;

Option 2: schedule any function

    function foo () {
        // ...
    }
    let sched_id = schedule foo ();
    ...
    let bar = await sched_id;
Option 1 results in compilation errors for code in the call-stack that isn't async, results in needing two different functions (a wrapper for sync execution), and means that async only works for that specific function. Option 2 is more like how humans think - schedule this for later execution, when I'm done with my current job I'll wait for you if you haven't finished.
Isn't mixing async and sync code like this a recipe for deadlocks?
What if your example code is holding onto a thread that foo() is waiting to use?
Said another way, explain how you solved the problems of just synchronously waiting for async. If that just worked then we wouldn't need to proliferate the async/await through the stack.
> Said another way, explain how you solved the problems of just synchronously waiting for async.
Why? It isn't solved for async functions, is it? Just because the async is propagated up the call-stack doesn't mean that the call can't deadlock, does it?
Deadlocks aren't solved for a purely synchronous callstack either - A grabbing a resource, then calling B which calls C which calls A ...
Deadlocks are potentially there whether or not you mix sync/async. All that colored functions will get you is the ability to ignore the deadlock because that entire call-stack is stuck.
> If that just worked then we wouldn't need to proliferate the async/await through the stack.
> Yes actually it is solved. If you stick to async then it cannot deadlock (in this way) because you yield execution to await.
Maybe I'm misunderstanding what you are saying. I use the word "_implementation_type_" below to mean "either implemented as option 1 or option 2 from my post above."
With current asynchronous implementations (like JS, Rust, etc), any time you use `await` or similar, that statement may never return due to a deadlock in the callstack (A is awaiting B which is awaiting C which is awaiting A).
And if you never `await`, then deadlocking is irrelevant to the _implementation_type_ anyway.
So I am trying to understand what you mean by "it cannot deadlock in this way" - in what way do you mean? async functions can accidentally await on each other without knowing it, which is the deadlock I am talking about.
I think I might understand better if you gave me an example call-chain that, in option 1, sidesteps the deadlock, and in option 2, deadlocks.
I'm referring to the situation where a synchronous wait consumes the thread pool, preventing any further work.
A is synchronously waiting on B, which is awaiting C, which could complete but never gets scheduled because A is holding onto the only thread. It's a very common situation when you mix sync and async and you're working in a single-threaded context, like UI programming with async. Of course it can also cause starvation and deadlock in a multithreaded context, but the single thread makes the pitfall obvious.
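A sketch of the pitfall in Rust terms, assuming a single-threaded tokio runtime (the same shape as the C#/UI cases):

    use tokio::runtime::Builder;

    fn main() {
        let rt = Builder::new_current_thread().build().unwrap();
        rt.block_on(async {
            let (tx, rx) = std::sync::mpsc::channel::<u32>();
            tokio::spawn(async move {
                tx.send(1).unwrap(); // C: ready to complete...
            });
            // A: blocks the runtime's only thread, so the spawned
            // task never gets scheduled. Deadlock.
            let _ = rx.recv();
        });
    }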
That's an implementation problem, not a problem with the concept of asynchronous execution, and it's specifically a problem in only one popular implementation: Javascript in the browser without web-workers.
That's specifically why I called it a Leaky Abstraction in my first post on this: too many people are confusing a particular implementation of asynchronous function calls with the concept of asynchronous function calls.
I'm complaining about how the mainstream languages have implemented async function calls, and how poorly they have done so. Pointing out problems with their implementation doesn't make me rethink my position.
I don't see how it can be an implementation detail when fundamentally you must yield execution when the programmer has asked to retain execution.
Besides JavaScript, it's also a common problem in C# when you force synchronous execution of an async Task. I'm fairly sure it's a problem in any language that would allow an async call to wait for a thread that could be waiting for it.
I really can't imagine how your proposed syntax could work unless the synchronous calls could be pre-empted, in which case, why even have async/await at all?
> I don't see how it can be an implementation detail when fundamentally you must yield execution when the programmer has asked to retain execution.
It's an implementation issue, because "running on only a single thread" is an artificial constraint imposed by the implementation. There is nothing in the concept of async functions, coroutines, etc that has the constraint "must run on the same thread as the sync waiting call".
An "abstraction" isn't really one when it requires knowledge of a particular implementation. Async in JS, Rust, C#, etc all require that the programmer knows how many threads are running at a given time (namely, you need to know that there is only one thread).
> But I look forward to your implementation.
Thank you :-)[1]. I actually am working (when I get the time, here and there) on a language for grug-brained developers like myself.
One implementation of "async without colored functions" I am considering is simply executing all async calls for a particular host thread on a separate dedicated thread that only ever schedules async functions for that host thread. This sidesteps your issue and makes colored functions pointless.
This is one possible way to sidestep the specific example deadlock you brought up. There's probably more.
[1] I'm working on a charitable interpretation of your words, i.e. you really would look forward to an implementation that sidesteps the issues I am whining about.
I think the major disconnect is that I'm mostly familiar with UI and game programming. In these async discussions I see a lot of disregard for the use cases that async C# and JavaScript were built around. These languages have complex thread contexts so it's possible to run continuations on a UI thread or a specific native thread with a bound GL context that can communicate with the GPU.
I suppose supporting this use case is an implementation detail but I would suggest you dig into the challenge. I feel like this is a major friction point with using Go more widely, for example.
> and not the "I think Haskell and Ocaml is great readability" academic crowd.
Actually, Rust could still learn a lot from these languages. In Haskell, one declares the call site as async, rather than the function. OCaml 5 effect handlers would be an especially good fit for Rust and solve the "colouration" problem.
I think Rust’s async stuff is a little half baked now but I have hope that it will be improved as time goes on.
In the mean time it is a little annoying to use, but I don’t mind designing against it by default. I feel less architecturally constrained if more syntactically constrained.
I'm curious what things you consider to be half-baked about Rust async.
I've used Rust async extensively for years, and I consider it to be the cleanest and most well designed async system out of any language (and yes, I have used many languages besides Rust).
Async traits come to mind immediately, generally needing more capability to existentially quantify Future types without penalty. Async function types are a mess to write out. More control over heap allocations in async/await futures (we currently have to Box/Pin more often than necessary). Async drop. Better cancellation. Async iteration.
Re: existential quantification and async function types
It'd be very nice to be able to use `impl` in more locations, representing a type which need not be known to the user but is constant. This is a common occurrence and may let us write code like `fn foo(f: impl Fn() -> impl Future)` or maybe even eventually syntax sugar like `fn foo(f: impl async Fn())`, which would be ideal.
Re: Boxing
I find that a common technique needed to make abstractions around futures work is to Box::pin things regularly. This isn't always an issue, but it's frequent enough that it's annoying. Moreover, it's not strictly necessary given knowledge of the future type; it's again more a matter of Rust's minimal existential types.
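The workaround usually ends up looking something like this (a sketch; `BoxFuture` is defined locally here, though the futures crate ships a similar alias):

    use std::future::Future;
    use std::pin::Pin;

    // A boxed, pinned, type-erased future: the price paid whenever the
    // concrete future type can't be named across an abstraction boundary.
    type BoxFuture<T> = Pin<Box<dyn Future<Output = T> + Send>>;

    fn make_task(n: u32) -> BoxFuture<u32> {
        Box::pin(async move { n * 2 })
    }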
Re: async drop and cancellation.
It's not always possible to have good guarantees about the cleanup of resources in async contexts. You can use abort, but that will just cause the next yield point to not return and then the Drops to run. So now you're reliant on Drops working. I usually build in a "kind" shutdown with a timer before aborting in light of this.
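A sketch of that "kind shutdown, then hard abort" pattern, assuming tokio (names and the grace period are arbitrary):

    use std::time::Duration;
    use tokio::sync::oneshot;

    async fn worker(mut shutdown: oneshot::Receiver<()>) {
        loop {
            tokio::select! {
                _ = &mut shutdown => {
                    // ... release resources cleanly ...
                    return;
                }
                _ = tokio::time::sleep(Duration::from_millis(100)) => {
                    // ... one unit of work ...
                }
            }
        }
    }

    #[tokio::main]
    async fn main() {
        let (tx, rx) = oneshot::channel();
        let mut task = tokio::spawn(worker(rx));
        let _ = tx.send(()); // ask for a kind shutdown
        // Grace period, then hard abort; Drops run at the next yield point.
        if tokio::time::timeout(Duration::from_secs(5), &mut task).await.is_err() {
            task.abort();
        }
    }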
C# has a version of this with its CancellationTokens. They're possible to get wrong and it's easy to fail to cancel promptly, but by convention it's also easy to pass a cancellation request and let tasks do resource cleanup before dying.
Re: Async iteration
Nicer syntax is definitely the thing. Futures without async/await also could just be done with combinators, but at the same time it wasn't popular or easy until the syntax was in place. I think there's a lot of leverage in getting good syntax and exploring the space of streams more fully.
> That would be useful, but I wouldn't call the lack of it "half-baked", since no other mainstream language has it either. It's just a nice-to-have.
Golang supports running asynchronous code in defers, as did Zig when it still had async.
Async-drop gets upgraded from a nice-to-have into an efficiency concern as the current scheme of "finish your cancellation in Drop" doesn't support borrowed memory in completion-based APIs like Windows IOCP, Linux io_uring, etc. You have to resort to managed/owned memory to make it work in safe Rust which adds unnecessary inefficiency. The other alternatives are blocking in Drop or some language feature to statically guarantee a Future isn't cancelled once started/initially polled.
To run async in Drop in rust, you need to use block_on() as you can't natively await (unlike in Go). This is the "blocking on Drop" mentioned and can result in deadlocks if the async logic is waiting on the runtime to advance, but the block_on() is preventing the runtime thread from advancing. Something like `async fn drop(&mut self)` is one way to avoid this if Rust supported it.
You need to `block_on` only if you need to block on async code. But you don't need to block in order to run async code. You can spawn async code without blocking just fine, and there is no risk of deadlocks.
1) That's no longer "running async code in Drop", as it's spawned/detached and semantically can run outside the Drop. This distinction is important for something like `select`, which assumes all cancellation finishes in Drop.
2) This doesn't address the efficiency concern of using borrowed memory in the Future. You have to either reference-count or own the memory used by the Future for the "spawn in Drop" scheme to work for cleanup.
3) Even if you define an explicit/custom async destructor, Rust doesn't have a way to syntactically defer its execution like Go and Zig do, so you'd end up having to call it on all exit points, which is error-prone like C (it would result in a panic instead of a leak/UB, but that can be equally undesirable).
4) Is there anywhere one can read up on the work being done for async Drop in Rust? I was only able to find this official link, but it seems to still have some unanswered questions (https://rust-lang.github.io/async-fundamentals-initiative/ro...)
Ok, in a very, very rare case (so far never happened to me) when I really need to await an async operation in the destructor, I just define an additional async destructor, call it explicitly and await it. Maybe it's not very elegant but gets the job done and is quite readable as well.
And this would be basically what you have to do in Go anyway - you need to explicitly use defer if you want code to run on destruction, with the caveat that in Go nothing stops you from forgetting to call it, whereas in Rust I can at least have a deterministic guard that panics if I forget to call the explicit destructor before the object goes out of scope.
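For what it's worth, a sketch of such a guard (names illustrative; the teardown body elided):

    struct Conn {
        closed: bool,
    }

    impl Conn {
        async fn close(mut self) {
            // ... await whatever teardown I/O is needed ...
            self.closed = true; // disarm the guard before dropping
        }
    }

    impl Drop for Conn {
        fn drop(&mut self) {
            // Deterministic guard: fail loudly instead of leaking silently.
            if !self.closed && !std::thread::panicking() {
                panic!("Conn dropped without close().await");
            }
        }
    }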
BTW async drop is being worked on in Rust, so in the future this minor annoyance will be gone
Yes I am aware of async drop proposals. And the point is not to handle a single value being dropped but to facilitate invariants during an abrupt tear down. Today, when I am writing a task which needs tear down I need to hand it a way to signal a “nice” shutdown, wait some time, and then hard abort it.
Actually, this "old school" approach is more readable even for folks who have never worked in the low-level C world. At least everything is in front of your eyes and you can follow the logic. Unless code leveraging async is very well-structured, it requires too much brain power to process and understand.
You cancel a sync IO op similarly to how you cancel an async one: have another task (i.e. an OS thread in this case) issue the cancellation. Select semantically spawns a task per case/variant and does something similar under the hood if cancellation is implemented.
You can do that, but then the logic of your cancellable thread gets intermingled with the cancellation logic.
And since the cancellation logic runs on the cancellable thread, you can't really cancel a blocking operation. What you can do is to let it run to completion, check that it was canceled, and discard the value.
Not sure I follow; the cancellation logic is on both threads/tasks: 1) the operation itself, waiting for either the result or a cancel notification, and 2) the cancellation thread sending that notification.
The cancellation thread is generally the one doing the `select` so it spawns the operation thread(s) and waits for (one of) their results (i.e. through a channel/event). The others which lose the race are sent the cancellation signal and optionally joined if they need to be (i.e. they use intrusive memory).
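A sketch of that shape in Rust, assuming the crossbeam-channel crate (std's channels have no select):

    use crossbeam_channel::{select, unbounded};
    use std::thread;

    fn main() {
        let (result_tx, result_rx) = unbounded::<u64>();
        let (cancel_tx, cancel_rx) = unbounded::<()>();

        // Operation thread: does the blocking work, reports the result.
        thread::spawn(move || {
            let _ = result_tx.send(42);
        });

        // Some other thread would call cancel_tx.send(()) to cancel;
        // keep the sender alive so the cancel arm stays pending.
        let _canceller = cancel_tx;

        // Cancellation thread: race the result against the cancel signal.
        select! {
            recv(result_rx) -> res => println!("finished: {:?}", res),
            recv(cancel_rx) -> _ => println!("cancelled"),
        }
    }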
He didn't say queues though. CSP isn't processes streaming data to each other through buffered channels; it's one process synchronously passing one message to another. Whichever one gets to the communication point first waits for the other.
> The lifetime of an Arc isn’t unknowable, it’s determined by where and how you hold it.
In the same sense that the lifetime of an object in a GC'd system has a lower bound of, "as long as it's referenced", sure. But that's nearly the opposite of what the borrow checker tries to do by statically bounding objects, at compile time.
> maybe the disconnect in this article is that the author is coming at Rust and trying to force their previous mental models on to it
The opposite actually! I spent about a decade doing systems programming in C, C++, and Rust before writing a bunch of Haskell at my current job. The degree to which a big language runtime and GC weren't a boogeyman for some problem spaces was really eye-opening.
> But that's nearly the opposite of what the borrow checker tries to do by statically bounding objects, at compile time.
Arc isn't an end-run around the borrow checker. If you need mutable references to the data inside of Arc, you still need to use something like a Mutex or Atomic types as appropriate.
> The degree to which a big language runtime and GC weren't a boogeyman for some problem spaces was really eye-opening.
I have the opposite experience, actually. I was an early adopter of Go and championed Garbage Collection for a long time. Then as our Go platforms scaled, we spent increasing amounts of our time playing games to appease the garbage collector, minimize allocations, and otherwise shape the code to be kind to the garbage collector.
The Go GC situation has improved continuously over the years, but it's still common to see libraries compete to reduce allocations and add complexity like pools specifically to minimize GC burden.
It was great when we were small, but as the GC became a bigger part of our performance narrative it started to feel like a burden to constantly be structuring things in a way to appease the garbage collector. With Rust it's nice to be able to handle things more explicitly and, importantly, without having to explain to newcomers to the codebase why we made a lot of decisions to appease the GC that appear unnecessarily complex at first glance.
There's a good chance this is a Go issue rather than a GC one. People get fooled by Go's pretense of being a high-level C replacement; it is, at best, highly inadequate in that role.
The reason for that is the compiler quality, the design tradeoffs and Go's GC implementation throughput are simply not there for it to ever be a good general purpose systems-programming-oriented language.
Go receives undeserved hype, for use cases C# and Java are much better at due to their superior GC implementations and codegen quality (with C# offering better lower level features like structs+generics and first-class C interop).
Java GC has a non-trivial overhead. I’ve moved workloads from Java to Rust and gotten a 30x improvement from lack of GC. Likewise I’ve gotten a 10x improvement in Java by preallocating objects and reusing them to avoid GC. (Fucking Google and the cult of immutable objects). Guess what, lots of things that “make it harder to introduce bugs” make your shit run a lot slower too.
This is not an improvement from lack of GC per se but rather from zero-cost abstractions (everything is monomorphised, no sins like type erasure) first and foremost, and yes, deterministic memory management. Java is the worst language if you need to push performance to the limit, since it does not offer convenient lower-level language constructs to do so (unlike C#), but at reaching the 80th percentile of performance it is by far the best one.
But yes, GC is very much not free and is an explicit tradeoff vs compile time + manual memory management.
As an ops guy for decades, it makes me laugh to hear claims about Java GC superiority. Please go back in time and fix all the crashes and OOMs caused by enterprise JVM, as opposed to near-zero problems with the Go deployments.
Making strong statements without hard facts to back them up is a sign of zealotry...
I assure you, if that code were ported to Go 1:1, Go's GC would simply grind to a halt. Write code badly enough and, no matter how good the hardware and software are, at some point they won't be able to cope. Even a good tool will give out if you beat it down hard enough.
Issues like these simply don't happen with GCs in modern JVM implementations or .NET (not saying they are perfect or don't have other shortcomings, but the sheer amount of developer hours invested in tuning and optimizing them far outstrips Go).
> it makes me laugh to hear claims about Java GC superiority. Please go back in time and fix all the crashes and OOMs caused by enterprise JVM,
I don’t see how running into an OOM problem is necessarily a problem with the GC. That said, Java is a memory-intensive language; it’s a trade-off that Java is pretty up front about.
I don’t have a horse in this race but I would be quite surprised if Go’s GC implementation could even hold a candle to the ones found in C# and Java. They have spent literally decades of research and development, and god knows how much money (likely north of $1b), optimizing and refining their GC implementations. Go just simply lacks any of the sort of maturity and investment those languages have.
Java has billions spent on marketing and lobbying.
Since the advent of Java in the mid-90s I have heard about the superiority of its VM, yet my observations from the ops PoV claim otherwise. So I suspect a huge hoax...
Hey btw, you're saying "Java is _memory intensive_", as if it would magically explain everything. Let's get into that more deeply. Why is it so, dear Watson? Have you compared the memory consumption of the same algo and pretty much similar data structures between languages? Why does Java have to be such a memory hog? Why is its class loading so slow? Are these the qualities of a superior VM design and zillions of man-hours invested? Huh?
By the way, if the code implementing functionality X needs N times more memory than in the other language with a GC, then however advanced that GC is (need to find proof for that, btw), it won't catch up speed-wise, because it simply needs to move more around. So simple.
> Java has billions spent on marketing and lobbying.
Marketing is not a silver bullet for success and the tech industry is full of examples of exactly that. The truth is that Sun was able to promote Java so heavily because it was found to be useful.
> Since the advent of Java in the mid-90s I have heard about the superiority of its VM, yet my observations from the ops PoV claim otherwise.
The landscape of the 90s certainly made a VM language appealing. And compared to the options of that day it's hardly any wonder.
> So I suspect a huge hoax...
It's you versus a plurality, if not majority, of the entire enterprise software market. Of course that's not to say that Java doesn't have problems or that the JVM is perfect, but is it so hard to believe that Java got something right? Is it honestly more believable that everyone else is caught up in a collective delusion?
> Hey btw, you're saying "Java is _memory intensive_", as if it would magically explain everything.
It's not that Java is necessarily memory intensive, but that a lot of Java performance tuning is focused towards optimizing throughput performance, not memory utilization. Cleaning out a large heap occasionally is in general better than cleaning out a smaller one more frequently.
> By the way, if the code implementing functionality X needs N times more memory than in the other language with a GC, then however advanced that GC is (need to find proof for that, btw), it won't catch up speed-wise, because it simply needs to move more around. So simple.
It's not so simple. First of all, the choice of a large heap is not mandated by Java, it's a trade off that developers are making. Second of all, GC performance issues only manifest when code is generating a lot of garbage, and believe it or not, Java can be written to vastly minimize the garbage produced. And last of all, Java GCs like Shenandoah have a max GC pause time of less than 1ms for heaps up to 16TiB.
Anyway, at the end of the day no one is going to take Go away from you. Personally I don't have a horse in this race. That said, the fact is that Java GCs are far more configurable, sophisticated, and advanced than anything Go has (and likely ever will). IMO, Go came at a point in time where there was a niche to exploit, but that niche is shrinking.
I would like to answer your points more deeply, not having much time for it now.
But I think you are avoiding a direct answer to the question of why Java needs so much memory in the first place. You mention the "developer's choice of a big heap"; first, I don't think it is their choice, but rather a consequence of the fact that such a big heap is needed at all for typical code. Why?
Let's code a basic https endpoint using a typical popular framework, returning some simple JSON data. Usual stuff. Why will it consume 5x-10x more memory in Java? And if one says it's just an unrealistic microbenchmark: things get worse when coding more realistic stuff.
Btw, having more knobs for a GC is not necessarily a good thing if it means there are no fire-and-forget good defaults. If an engineer continuously needs to get his head around these knobs to have a non-crashing app, then we have a problem. Or rather, ops have a problem, and some programmers are, unfortunately, disconnected from the ops realm. Have you been working together with ops guys? On prod, ofc?
Honestly, the biggest stumbling block for rust and async is the notion of memory pinning.
Rust will do a lot of invisible memory relocations under the covers. Which can work great in single threaded contexts. However, once you start talking about threading those invisible memory moves are a hazard. The moment shared memory comes into play everything just gets a whole lot harder with the rust async story.
Contrast that with a language like java or go. It's true that the compiler won't catch you when 2 threads access the same shared memory, but at the same time the mental burden around "Where is this in memory, how do I make sure it deallocates correctly, etc" just evaporates. A whole host of complex types are erased and the language simply cleans up stuff when nothing references it.
To me, it seems like GCs simply make a language better for concurrency. They generally solve a complex problem.
> Rust will do a lot of invisible memory relocations under the covers.
I don't think it's quite accurate to point to "invisible memory relocations" as the problem that pinning solves. In most cases, memory relocations in Rust are very explicit, by moving an owned value when it has no live references (if it has any references, the borrow checker will stop you), or calling mem::replace() or mem::swap(), or something along those lines.
Instead, the primary purpose of pinning is to mark these explicit relocations as unsafe for certain objects (that are referenced elsewhere by raw pointer), so that external users must promise not to relocate certain objects on pain of causing UB with your interface. In C/C++, or indeed in unsafe Rust, the same idea can be more trivially indicated by a comment such as /* Don't mess with this object until such-and-such other code is done using it! */. All pinning does is to enforce this rule at compile time for all safe code.
Memory pinning in Rust is not a problem that has to do with concurrency because the compiler will never relocate memory when something is referencing it. The problem is however with how stackless coroutines in general (even single-threaded ones, like generators) work. They are inherently self-referential structures, and Rust's memory model likes to pretend such structures don't exist, so you need library workarounds like `Pin` to work with them from safe code (and the discussion on whether they are actually sound is still open!)
>(and the discussion on whether they are actually sound is still open!)
Do you have a reference for this? Frankly, maybe I shouldn't ask since I still don't even understand why stackless coroutines are necessarily self-referential, but I am quite curious!
Basically the problem is that async blocks/fns/generators need to create a struct that holds all the local variables within them at any suspension/await/yield point. But local variables can contain references to other local variables, so there are parts of this struct that reference other parts of this struct. This creates two problems:
- once you create such self-references you can no longer move this struct. But moving a struct is safe, so you need some unsafe code that "promises" you this won't happen. `Pin` is a witness of such promise.
- in the memory model having an `&mut` reference to this struct means that it is the only way to access it. But this is no longer true for self referential structs, since there are other ways to access its contents, namely the fields corresponding to those local variables that reference other local variables. This is the problem that's still open.
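To make the first point concrete: a borrow held across an .await is exactly what forces the self-reference (a sketch; `read_more` stands in for any async op):

    async fn example(read_more: impl std::future::Future<Output = ()>) {
        let buf = [0u8; 64];   // stored in the future's state struct
        let view = &buf[..16]; // points into that same struct...
        read_more.await;       // ...and is still live across the suspension
        println!("{}", view.len());
    }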
> I still don't even understand why stackless coroutines are necessarily self-referential, but I am quite curious!
Because when stackless coroutines run they don’t have access to the stack that existed when they were created. Everything that used to be on the stack needs to get packaged up in a struct (this is what `async fn` does). However, everything that used to point to something else on the stack (which Rust understands and is fine with) now points to something else within the “impl Future” struct. Hence you have self-referential structs.
Interestingly, the newest Java memory feature (Panama FFI/M) actually can catch you if threads race on a memory allocation. They have done a lot of rather complex and little appreciated work to make this work in a very efficient way.
The new API lets you allocate "memory segments", which are byte arrays / C-style structs. Such segments can be passed to native code easily or just used directly, deallocated with or without GC, bounds errors are blocked, use-after-free bugs are blocked, and segments can also be confined to a thread so races are also blocked (all at runtime though).
Unfortunately it only becomes available as a finalized non-preview API in Java 22, which is the release after the next one. In Java 21 it's available but behind a flag.
> In the same sense that the lifetime of an object in a GC'd system has a lower bound of, "as long as it's referenced", sure.
These are not the same.
The problem with GC'd systems is that you don't know when the GC will run and eat up your CPU cycles. It is impossible to determine when the memory will actually be freed in such systems. With ARC, you know exactly when you will release your last reference and that's when the resource is freed up.
In terms of performance, ARC offers massive benefits because the memory that's being dereferenced is already in the cache. It's hard to overstate how big of a deal this is. There's a reason people like ARC and stay away from GC when performance actually begins to matter. :)
> With ARC, you know exactly when you will release your last reference and that's when the resource is freed up.
It's more like "you notice when it happens". You don't know in advance when the last reference will be released (if you did, there would be no point in using reference counting).
> In terms of performance, ARC offers massive benefits because the memory that's being dereferenced is already in the cache.
It all depends on your access patterns. When ARC adjusts the reference counter, the object is invalidated in all other threads' caches. If this happens with high frequency, the cache misses absolutely demolish performance. GC simply does not have this problem.
> There's a reason people like ARC and stay away from GC when performance actually begins to matter.
If you're using a language without GC built in, you usually don't have a choice. When performance really begins to matter, people reach for things like hazard pointers.
> It's more like "you notice when it happens". You don't know in advance when the last reference will be released
A barista knows when a customer will pay for coffee (after they have placed their order). A barista does not know when that customer will walk in through the door.
> (if you did, there would be no point in using reference counting).
There’s a difference between being able to deduce when the last reference is dropped (for example, by profiling code) and not being able to tell anything about when something will happen.
A particular developer may not know when the last reference to an object is dropped, but they can find out. Nobody can guess when GC will come and take your cycles away.
> The cache misses absolutely demolish performance
With safe Rust, you shouldn’t be able to access memory that has been freed up. So cache misses on memory that has been released is not a problem in a language that prevents use-after-free bugs :)
> If you’re using a language without GC built in, you usually don’t have a choice.
I’m pretty sure the choice of using Rust was made precisely because GC isn’t a thing (in all places that love and use rust that is)
> A barista knows when a customer will pay for coffee (after they have placed their order). A barista does not know when that customer will walk in through the door.
Sorry, no chance of deciphering that.
> There’s a difference between being able to deduce when the last reference is dropped (for example, by profiling code) and not being able to tell anything about when something will happen.
> A particular developer may not know when the last reference to an object is dropped, but they can find out.
The developer can figure out when the last reference to the object is dropped in that particular execution of the program, but not in the general sense, not anymore than they can in a GC'd language.
The only instance where they can point to a place in the code and with certainty say "the reference counted object that was created over there is always destroyed at this line" is in cases where reference counting was not needed in the first place.
> With safe Rust, you shouldn’t be able to access memory that has been freed up. So cache misses on memory that has been released is not a problem in a language that prevents use-after-free bugs :)
I'm not sure why you're talking about freed memory.
Say that thread A is looking at a reference-counted object. Thread B looks at the same object, and modifies the object's reference counter as part of doing this (to ensure that the object stays alive). By doing so, thread B has invalidated thread A's cache. Thread A has to spend time reloading its cache line the next time it accesses the object.
This is a performance issue that's inherent to reference counting.
> I’m pretty sure the choice of using Rust was made precisely because GC isn’t a thing (in all places that love and use rust that is)
Wanting to avoid "GC everywhere", yes. But Rust/C++ programs can have parts that would be better served by (tracing) garbage collection, but where they have to make do with reference counting, because garbage collection is not available.
GC generally optimises for throughput over latency. But there is also another cost: high-throughput GC usually uses more memory (sometimes 2-3x as much!). Arc keeps your memory usage low and can keep your latency more consistent, but it will often sacrifice throughput compared to a GC tuned for it. (Of course, stack allocation, where possible, beats them all, which is why Rust and C++ tend to win out over Java in throughput even if the GC has an advantage over reference counting, because Java has to GC a lot more than other languages due to no explicit stack allocation.)
> In terms of performance, ARC offers massive benefits
but it also has a big disadvantage: it delegates to an actual malloc for memory management, which is usually much less performant than a GC for various reasons.
> which is usually much less performant than GC from various reasons.
Can you elaborate?
I've seen a couple of malloc implementations, and in all of them, free() is a cheap operation. It usually involves setting a bit somewhere and potentially merging with an adjacent free block if available/appropriate.
malloc() is the expensive call, but I don't see how a GC system can get around the same costs for similar reasons.
- Like others have said, both malloc()/free() touch a lot of global state, so you either have contention between threads, or do as jemalloc does and keep thread-local pools that you occasionally reconcile.
- A moving (and ideally, generational) GC means that you can recompact the heap, making malloc() little more than a pointer bump.
- This also suggests subsequent allocations will have good locality, helping cache performance.
Manual memory management isn't magically pause-free, you just get to express some opinion about where you take the pauses. And I'll contend that (A) most programmers aren't especially good at choosing when that should be, and (B) lots (most?) software cares about overall throughput, so long as max latency stays under some sane bound.
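For reference, the "pointer bump" allocation mentioned above is about this much code (a toy sketch; `align` must be a power of two):

    struct Bump {
        buf: Vec<u8>,
        next: usize,
    }

    impl Bump {
        fn new(cap: usize) -> Self {
            Bump { buf: vec![0; cap], next: 0 }
        }

        fn alloc(&mut self, size: usize, align: usize) -> Option<&mut [u8]> {
            let start = (self.next + align - 1) & !(align - 1); // round up
            let end = start.checked_add(size)?;
            if end > self.buf.len() {
                return None; // out of space: a GC would collect/compact here
            }
            self.next = end;
            Some(&mut self.buf[start..end])
        }
    }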
I've seen some benchmarks, but can't find them now, so maybe I am wrong about this.
> free() is a cheap operation. It usually involves setting a bit somewhere and potentially merging with an adjacent free block if available/appropriate.
There is some tree-like structure somewhere that later allows malloc() to locate this block, and that structure has to be modified in parallel by many concurrent threads, which likely needs some locks, meaning the program operates outside of the CPU cache.
In the JVM, for example, GC is integrated with the threading model, so it can keep a heap per thread, and "free()" happens asynchronously, so it doesn't block the calling code. Additionally, malloc approaches usually suffer from memory fragmentation, while the JVM's GC is doing compactions in the background all the time, tracks memory-block generations, and applies many other optimizations.
I still see it in a couple of places where code-formatted text is inline with regular text, for example "relative_difference." The code blocks themselves look good.
The main issue with just using newlib's allocator is that we're using FreeRTOS, and (AFAIK) there isn't a good way to make the two aware of each other. One could also probably hook operator new up to FreeRTOS, but we try to avoid dynamic memory allocation after init, like the article said.
> (AFAIK), there isn't a good way to make the two aware of each other
When you set up FreeRTOS you configure which heap implementation it uses. They provide a number of heapX.c files you can choose between. heap3.c uses malloc directly for instance, the others use built-in implementations of a heap algorithm of varying complexity.
If you want to use your own allocator, you just write your own heapX.c file. Use the malloc one as a template and call the newlib allocator instead of malloc. Link that in instead of any other heapX file and voila, FreeRTOS is using newlib to allocate memory.
IIRC, the concern was in the details of newlib's allocator. Its sbrk looks for a linker symbol, then starts slicing memory off of that address. There were worries that unless we were careful, it might stomp on FreeRTOS. (These worries could have been unfounded, but we didn't look into it deeply. We only dynamically allocate at startup, so we'd prefer FreeRTOS's heap_1.c anyways.)
I don't follow. If an allocator is optimized for our needs (i.e., allocating everything up front and never freeing), and is easier to set up than the general-purpose allocator in newlib, how is using it "technical debt"?
"These worries could have been unfounded, but we didn't look into it deeply. We only dynamically allocate at startup, so we'd prefer FreeRTOS's heap_1.c anyways."
The definition of technical debt is not taking the time to understand the components of the system, in favor of getting the system working to a given deadline.
Allocators are "well understood," which is to say that a lot of programmers assume they know what the allocator can and can't do.
You've chosen the route of a static allocator which never frees because it meets your current requirements; however, you've done so under the cover of the API of a more common allocator. This works great, and you ship your project and move on.
Now, months or years later, someone else comes along and they are adding a feature or changing a small bit. They need an allocator that can allocate and free from the heap, but since they don't know your allocator can't free, they just use the function calls. They all link, but the calls to free() don't actually do anything, so the program runs out of memory and dies. They start debugging the problem and realize that you didn't do the work to understand the allocator and make it work in your setup. So in addition to doing that work for their own code, they either have to go back and rewrite your code to use the fully functional allocator, or they leave your code alone (which would add still more technical debt, since you now have two allocators in the code that operate under different principles).
This is how software systems get so broken over time: people get it working and move on, not taking the time to think about how it is going to be used in the future, or even whether it can be. The code that sits there and will trip up future you, or a future maintainer, is "debt" that eventually will have to be paid. Sometimes you can pay it with a refactor; sometimes you have to throw it all out and start over.
You're really making a bunch of uncharitable assumptions about our situation.
First, you assume that this behavior isn't well understood, well documented, and even expected in my organization and our corner of the industry. The reason FreeRTOS has an allocator that doesn't free is because it's standard behavior in embedded systems with hard time requirements.
Then you guess that we've taken no precautions against some future developer not having this knowledge. The allocator in question asserts if the user tries to free to avoid the very scenario you describe, and attempting to use newlib's allocator asserts with an explanation of why it's unused.
Technical debt is an issue I take very seriously, and I work hard to document the design decisions we've made. Choosing to use a tool that's designed specifically for our use case, instead of taking the time to set up a general-purpose allocator with overhead we don't care for, is hardly a debt that needs to be paid off.
And you are reading way more into my comment than I wrote, sorry for that.
The only point I make is this phrase:
"We didn't look into it deeply" ... "works for our purpose."
Is the definition of technical debt. The opposite of technical debt is "We looked at the options available, we clarified the requirements and the assumptions we depend on in the code for it to work, and we put these regression tests into place to warn us if someone tried to use the system in a way that violated one of our assumptions."
That sentence, which I quoted, and you wrote (it's right up there in your comment), is my definition of technical debt. Given that my explanation wasn't clear, and that your response above still doesn't mention how you ensured this, it's pretty clear we have very different ideas about what technical debt is and how it is mitigated.