Nevermark on Oct 2, 2023 | on: Efficient streaming language models with attention...

Yes, indeedy.

Many lessons learned over the last three decades with smaller neural networks (the current terminology is "extremely tiny"! :) are being revisited for these large models.