Also, performance on research-y questions isn't always a good indicator of how the model will do for code generation or agent orchestration.