Hacker News
davebren
50 days ago
on: Exploiting the most prominent AI agent benchmarks
I get it, but why would anyone trust what these companies say about their models' performance anyway? Everyone can see for themselves how well the models complete whatever tasks they're interested in.