Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Using functional paradigm with immutable types does take care of a lot of the complexities of parallel computing... It allows you to make a lot of assumptions (from the underlying platform's perspective) and can break up work in very interesting ways, not just across cpu cores, but even in distributed computing environments. If you combine this with a scriptable language, you can even carry the workload with the code to operate on the load.. which takes things a bit farther. You could create a literal farm of workers that grab a bit of data, process it against the defined load, and then return the result with the next step to run against the data...

I've wanted to create such a system for a while.. right now the closest I could come up with would be to use nodejs with json and a message queue to handle requests/loads. Unfortunately, there's a pretty serious cost to the json serialization, and other issues... but it's an interesting idea that has merit. I think in the end such a system will have a quickly serializable binary expression of both the data and the work to be done.

The catch is that most developers I know aren't used to breaking work up in such a way that it could be very parallelized.



The kicker for me that makes me think that working too hard to make easily parallelizable constructs isn't worth it is that communication is expensive and single cores are actually pretty darn powerful on their own. The implication is that you don't want to be parallelizing at too low a level. You want to be doing it at a much higher level if you possibly can.

If you have an 8-core machine and thirty independent tasks at the top-level, then all the functional programming research and parallel algorithm wizardry in the world isn't going to make you want to parallelize anything except the top-level tasks. So even if that's a verbose and error-prone task due to procedural programming constructs, at least you only need to do it once. Heck, in my own (admittedly limited) experience 90% of the hard work I have needed done has been handled by my OS's process scheduler and Redis.

The fact is that business applications, the web, and consumer software are all perfectly happy to accept the current limitations of hardware -- 12 processes running on 4 cores is efficient and easy, even with no parallelization at all below the OS process level (with the possible exception of GUI threads in desktop applications).

The only really interesting work happens in areas like HPC where you have 1 task and 900 cores and if you can't parallelize you are dead in the water, and latency-critical applications where the sacrifices made to run intra-task parallelization pay big dividends.


I think with the increasing availability of more, weaker CPUs with a decent thermal envelope, and efficient processors like ARM, then distributing work makes even more sense... in a SOA system, where a lot of processing may well be IO bound, why not break out the load even more... I think the future will be thousands of compute nodes/workers along with hundreds of service nodes handling millions of simultaneous requests.

Functional constructs make such scaling nearly effortless once the practical issues of breaking up work are handled. Yes, communications has a cost, but there are faster channels available than what are used... and distributing data persistence into redundant, sharded clusters can yield a lot of other benefits.

Given, most line of business applications are fine on current hardware... the problem is scaling to 10x or 100x the workload. You can do this by creating a system that can scale horizontally, or to be more performance oriented with current hardware. One solution gains you a single generation of increased output... another gets you N scale expansion.

I don't think it's just macro parallel tasks... but many-micro tasks that can work within such a system.


You are missing the per-CPU parallelization via vectorization that lies at the core of every modern video decoding library. The conundrum with C is that it is one of the very few languages that allow (via assembler callouts) to directly use CPU vecorization instructions, but at the same time does so in a manner that absolutely prevents automatic optimizations (i.e. by observing data dependencies).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: