Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The compression analogy is interesting. Another way of looking at it could be fine-tuning as "knowing what to leave out" - a 3B model for example tuned for a narrow task doesn't need the capacity that makes 70B good at many things.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: