Hacker News | zoobab's comments

"open source"

give me the training data?


The training data is the entire internet. How do you propose they ship that to you?

As a zip archive of however they store it in their database?

To be fair, there is some degree of "hand curation" of the data, so while "it is the internet", the actual training data is a derivation of that.

A mild but productive analogy:

I could hand someone a K&R C programming book plus lots of specs and say "this is the Linux source code" (the raw data from which all observations were made, aka "the internet"), or just send them the actual kernel source code (the refined training data, after a LOT of manual work) that your compiler consumes to generate the kernel (the open weights model, what they actually shared).

Mildly related rant: honestly it's a bit shit to say "open source model" for an "open weights" model. It's like saying World of Warcraft is open source because they gave you an executable of the game (you can still change it, but in more restricted ways).


You ARE the training data

Time to set up my own local LLM.

Mirror the archive before it gets taken down.

Avoid JavaScript like the plague, it can be overwritten on the client side.

"build a homelab in your own basement, rent a cheap VPS, set up WireGuard and voilà"

Put an OpenWrt router in front of your fat server: for each query it sends a WOL packet to the box, and you add some delay on an Ethernet bridge so the box has time to wake up.

That way the fat server is mostly sleeping, and only woken up when needed.
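
For what it's worth, the wake-up part is simple. Here is a minimal sketch in Python of sending a WOL magic packet, assuming the server's NIC has Wake-on-LAN enabled and is reachable by UDP broadcast. The MAC address, the broadcast address and UDP port 9 are placeholders I picked for illustration, not details from my setup, and on an actual OpenWrt box you would more likely use an existing WOL tool than Python.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte Wake-on-LAN magic packet for a MAC address."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    # Magic packet layout: 6 bytes of 0xFF, then the MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting requires this socket option to be set explicitly.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


if __name__ == "__main__":
    # Placeholder MAC address; replace with the sleeping server's NIC address.
    send_wol("00:11:22:33:44:55")
```

The router would call something like `send_wol` when a query arrives, then hold the connection until the server answers.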


What they should launch is an investigation into abuse of dominant position in the desktop/laptop market, with appropriate remedies such as fines.

Plus you become independent from toxic decisions taken by politicians.


I finally have a garage where I can weld my own bike frame!

No more coding after 5pm!


Congrats and have fun. Oh and order those Paragon parts ASAP if you need them as they are closing shop!

https://www.paragonmachineworks.com/


I saw it, so sad.

Nearly all the local frame builders were using their parts, especially the rear dropouts.


Use a protocol that is censorship resistant, not HTTP.


Just a client side written in JS, nothing to see here, the LLM is still secret.

They could have written that in curl+bash; it would not have changed much.

