The performance section of the spec[1] does a pretty good job explaining how the implementation remains fast.
In Redis Cluster, nodes don't proxy commands to the node
in charge of a given key; instead, they redirect clients
to the node serving the relevant portion of the key
space.
Eventually clients obtain an up-to-date representation
of the cluster and of which node serves which subset of
keys, so during normal operation clients contact the
right node directly to send a given command.
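
To make the redirect idea concrete, here is a minimal Python sketch of what a client does. The slot formula (CRC16 of the key, modulo 16384, with hash tags) and the MOVED error format come from the cluster spec; the SlotCache class, its method names, and the node addresses are made up for illustration and are not any real client library's API.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 variant used by Redis Cluster (polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Map a key to one of the 16384 hash slots, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag replaces the key for hashing purposes.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


def parse_moved(error: str):
    """Parse an error such as 'MOVED 3999 127.0.0.1:6381' into (slot, address)."""
    kind, slot, address = error.lstrip("-").split()
    if kind != "MOVED":
        raise ValueError("not a MOVED redirection: " + error)
    return int(slot), address


class SlotCache:
    """Hypothetical client-side map from hash slot to node address."""

    def __init__(self, default_node: str):
        self.default_node = default_node
        self.slots = {}

    def node_for(self, key: bytes) -> str:
        # Unknown slots fall back to whichever node we already know about.
        return self.slots.get(key_slot(key), self.default_node)

    def on_moved(self, error: str) -> str:
        # Remember the redirection so the next command goes straight there.
        slot, address = parse_moved(error)
        self.slots[slot] = address
        return address


cache = SlotCache("127.0.0.1:7000")
first_try = cache.node_for(b"foo")               # guess: the default node
cache.on_moved("MOVED 12182 127.0.0.1:7002")     # node answers with a redirect
second_try = cache.node_for(b"foo")              # now routed directly
```

After the single redirect, commands for that slot skip the extra hop. Real clients usually refresh the whole slot map on a MOVED instead of patching one slot, but the effect is the same.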
Because of the use of asynchronous replication, nodes
do not wait for other nodes' acknowledgment of writes
(unless explicitly requested using the WAIT command).
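
As a rough illustration of opting in to acknowledgments, the sketch below encodes a write followed by WAIT in the Redis wire protocol. WAIT takes a number of replicas and a timeout in milliseconds and replies with the number of replicas that acknowledged; the helper names, the key, and the values 1 and 100 are assumptions chosen for the example, and nothing here talks to a real server.

```python
def encode_command(*args) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def parse_integer_reply(reply: bytes) -> int:
    """Parse a RESP integer reply such as b':1\\r\\n'."""
    if not reply.startswith(b":") or not reply.endswith(b"\r\n"):
        raise ValueError("not an integer reply")
    return int(reply[1:-2])


# The write itself returns as soon as the master has applied it.
write = encode_command("SET", "foo", "bar")

# WAIT <numreplicas> <timeout-ms>: block until 1 replica acknowledges,
# or until 100 ms have passed, whichever comes first.
wait = encode_command("WAIT", 1, 100)

# Example reply bytes (made up here): one replica acknowledged in time.
acked = parse_integer_reply(b":1\r\n")
```

A reply lower than the requested number means the timeout expired first, so the caller has to decide what to do with a write that was not acknowledged by enough replicas.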