thomaskoopman's comments

thomaskoopman · on May 5, 2025

Yes, but quaternions of unit length are a representation of the rotation group in 3D space ( https://en.wikipedia.org/wiki/Representation_theory_of_SU(2)... ), which is how they are used for rotations.

suspended_state · on May 5, 2025

The original question was: can quaternions be used in place of matrices to perform LLM tasks, and the answer is: quaternions are 4-dimensional, with the implied meaning that matrices can cover different dimensionalities, which are needed for LLMs (and neural networks in general).

eru · on May 7, 2025

Yes, if you have essentially 4d objects and you disable 1 dimension by requiring unit length, you end up with something that is effectively 3d.

Of course, that the 3d thing you end up with represents rotations in 3d space is extremely neat; and not something all 3d things do.
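
To make the "unit length gives you 3D rotations" point concrete, here is a minimal sketch in plain Python. The helper names (`qmul`, `conj`, `axis_angle`, `rotate`) are made up for this illustration and the (w, x, y, z) ordering is an assumption; the vector is rotated with the usual sandwich product q * v * conj(q).

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions stored as (w, x, y, z) tuples."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw*qw - px*qx - py*qy - pz*qz,
        pw*qx + px*qw + py*qz - pz*qy,
        pw*qy - px*qz + py*qw + pz*qx,
        pw*qz + px*qy - py*qx + pz*qw,
    )

def conj(q):
    """Quaternion conjugate; for a unit quaternion this is also its inverse."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def axis_angle(axis, angle):
    """Unit quaternion for a rotation by `angle` radians about `axis`."""
    ax, ay, az = axis
    n = math.sqrt(ax*ax + ay*ay + az*az)
    s = math.sin(angle / 2) / n
    return (math.cos(angle / 2), ax*s, ay*s, az*s)

def rotate(q, v):
    """Rotate the 3D vector v by the unit quaternion q via q * v * conj(q)."""
    _, x, y, z = qmul(qmul(q, (0.0, *v)), conj(q))
    return (x, y, z)

# Quarter turn about the z-axis applied to the x-axis unit vector.
q = axis_angle((0.0, 0.0, 1.0), math.pi / 2)
rotated = rotate(q, (1.0, 0.0, 0.0))

# q and -q are different unit quaternions but describe the same rotation.
neg_q = tuple(-c for c in q)
rotated_neg = rotate(neg_q, (1.0, 0.0, 0.0))
```

Up to floating-point rounding, `rotated` lands on the y-axis, and `rotated_neg` matches it, which is the two-to-one correspondence between unit quaternions and rotations.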

thomaskoopman · on May 5, 2025

Very cool, fast and looks like it should vectorize too. Do you have a jump function for parallel seeding?

How did you come up with this, some number-theoretic basis or more experimental?

thomaskoopman · on May 5, 2025

A matrix is a representation of a linear function (i.e. a function that plays nice with + and scalar multiplication). A specific subset can be used to describe rotations in 3D space. Quaternions can (arguably) do this better. But quaternions cannot be used to describe any linear function. So I do not think this makes sense for LLMs.

tzs · on May 5, 2025

> But quaternions cannot be used to describe any linear function

Does this mean all functions that can be described by quaternions are non-linear, or does it mean that quaternions can describe some linear functions such as the ones associated with rotations in 3D space but there are linear functions they cannot describe?

thomaskoopman · on May 5, 2025

Quaternions (when viewed as vectors) are not linear functions, but the arguments to linear functions. You can add them: (a + bi + cj + dk) + (a' + b'i + c'j + d'k) = (a + a') + (b + b')i + (c + c')j + (d + d')k, and multiply them by a scalar: lambda * (a + bi + cj + dk) = (lambda * a) + (lambda * b)i + (lambda * c)j + (lambda * d)k. An example of a linear function on quaternions is the zero function. After all, zero(q + q') = 0 = 0 + 0 = zero(q) + zero(q'), and zero(lambda * q) = 0 = lambda * 0 = lambda * zero(q).

Matrices and quaternions take different approaches to describing rotations: a matrix sees a rotation as a linear function, while quaternions see rotations as a group (confusingly, groups are themselves often represented with matrices; this field is called representation theory if you want to know more).

So the answer to your question: there are linear functions that quaternions cannot describe. And quaternions can only describe a very specific class of linear functions (with some rather complicated maths behind them).
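
A rough illustration of that last point, as a sketch rather than a proof: the standard conversion from a unit quaternion (w, x, y, z) to a 3x3 matrix always yields a matrix with orthonormal columns, so a linear map that stretches space, such as diag(2, 1, 1), cannot come out of it. The function names below are hypothetical and the (w, x, y, z) ordering is an assumption.

```python
import math

def normalize(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    n = math.sqrt(sum(c*c for c in q))
    return tuple(c / n for c in q)

def rotation_matrix(q):
    """3x3 rotation matrix of the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ]

def gram(m):
    """Return m^T m, which is the identity exactly when m has orthonormal columns."""
    return [[sum(m[k][i] * m[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def is_identity(m, tol=1e-12):
    """Check whether a 3x3 matrix equals the identity up to a tolerance."""
    return all(abs(m[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(3) for j in range(3))

# An arbitrary quaternion, normalized so that it represents a rotation.
q = normalize((1.0, 2.0, 3.0, 4.0))
R = rotation_matrix(q)

# A perfectly good linear map that is not a rotation: stretch along x.
scaling = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

rotation_is_orthogonal = is_identity(gram(R))        # True
scaling_is_orthogonal = is_identity(gram(scaling))   # False
```

Counting parameters tells the same story: a unit quaternion has 3 free parameters, while a general 3x3 matrix has 9.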

thomaskoopman · on March 8, 2025

Do you have any tips for writing research papers in a more accessible way?

thomaskoopman · on March 8, 2025

I think it depends more on the ratio between access time and how often you use the data. Adding two arrays that fit in L1 is already limited by access time. On Zen3, we can add two 32-byte vectors per cycle, but only store one of these per cycle. For matrix multiplication, we can do the two additions per cycle (or really c <- a * b + c) because we have to do multiple operations once we have loaded the data into registers.

I can see it being useful for data sets of a few dozen MBs as well.
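
The ratio argument above can be put into numbers with a back-of-the-envelope sketch. The per-cycle rates are the Zen 3 figures quoted in the comment; everything else (8-byte elements, every matrix moved exactly once, the function names) is an assumption made for illustration, not a measurement.

```python
def vector_add_intensity(bytes_per_elem=8):
    """Flops per byte for c[i] = a[i] + b[i]: one add per two loads and one store."""
    return 1 / (3 * bytes_per_elem)

def matmul_intensity(n, bytes_per_elem=8):
    """Flops per byte for an n x n matrix product, assuming ideal reuse.

    2 * n^3 flops (n^3 multiply-adds) over three n x n matrices, each
    assumed to be moved exactly once.
    """
    return (2 * n**3) / (3 * n**2 * bytes_per_elem)

def add_throughput_fraction(adds_per_cycle=2, stores_per_cycle=1):
    """Fraction of peak vector-add rate reachable when every add needs one store."""
    return min(adds_per_cycle, stores_per_cycle) / adds_per_cycle

add_ratio = vector_add_intensity()        # 1/24 flop per byte
mm_ratio = matmul_intensity(1024)         # grows linearly with n
store_bound = add_throughput_fraction()   # 0.5 with the rates quoted above
```

Under these assumptions an array addition reaches at most half of the add throughput because of the single store per cycle, while the flops-per-byte ratio of matrix multiplication grows with n, so the arithmetic units can be kept busy once the data is in registers.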