riku_iki
5 months ago
on:
Kimi K2 Thinking, a SOTA open-source trillion-para...
It's MoE; each expert tower can be branched from some smaller model.
jychang
5 months ago
That's not how MoE works. You need to train the FFN directly, or else the FFN gate would have no clue how to activate the expert.
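To make the point concrete, here is a rough sketch of top-k routing in a single MoE layer. Everything in it is an assumption for illustration: `moe_forward`, `gate_w`, and the toy "experts" (plain scaling functions standing in for FFNs) are hypothetical names, and none of it reflects Kimi K2's actual architecture. The thing to notice is that the gate scores experts only from the token vector and its own weights, so those weights have to be learned alongside the experts for the scores to mean anything.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(w, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token vector x to its top_k experts and mix their outputs."""
    logits = matvec(gate_w, x)  # one gate logit per expert
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    weights = softmax([logits[i] for i in top])  # renormalize over chosen experts
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top

# Toy setup: random gate weights and three stand-in "experts".
random.seed(0)
dim, n_experts = 4, 3
gate_w = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
experts = [lambda x, s=s: [s * xi for xi in x] for s in (1.0, 2.0, 3.0)]
out, chosen = moe_forward([1.0, 0.0, 0.0, 0.0], gate_w, experts)
```

With untrained `gate_w`, as here, the choice of experts is effectively arbitrary, which is the situation you would be in if the experts were trained separately and a gate were bolted on afterwards.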