Benchmark
A standardised test, dataset or evaluation protocol used to measure and compare the performance of AI models on specific tasks under controlled conditions.
In Plain Language
A standardised test for comparing AI models. It works like the SAT for AI: every model takes the same test, so their scores can be compared directly. Common benchmarks test language understanding, image recognition and other specific tasks.
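The idea of "same test, comparable scores" can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a real benchmark: `BENCHMARK`, `accuracy`, `model_a` and `model_b` are hypothetical names, the four labelled sentences are invented toy data, and the two "models" are simple keyword rules standing in for real AI models. Both models are scored on the identical fixed dataset with the identical metric, which is what makes their scores comparable.

```python
from typing import Callable, Sequence

# Hypothetical toy benchmark: a fixed, shared set of (input, expected label) pairs.
BENCHMARK: list[tuple[str, str]] = [
    ("The film was wonderful", "positive"),
    ("I hated every minute", "negative"),
    ("A delightful surprise", "positive"),
    ("Dull and far too long", "negative"),
]


def accuracy(model: Callable[[str], str], dataset: Sequence[tuple[str, str]]) -> float:
    """Fraction of benchmark items the model labels correctly."""
    correct = sum(1 for text, label in dataset if model(text) == label)
    return correct / len(dataset)


# Hypothetical "models": keyword rules standing in for real AI models.
def model_a(text: str) -> str:
    positive_words = ("wonderful", "delightful")
    return "positive" if any(w in text.lower() for w in positive_words) else "negative"


def model_b(text: str) -> str:
    return "positive"  # always predicts the same label, regardless of input


# Every model is evaluated on the same data with the same metric.
for name, model in [("model_a", model_a), ("model_b", model_b)]:
    print(f"{name}: {accuracy(model, BENCHMARK):.2f}")
```

Running this prints `model_a: 1.00` and `model_b: 0.50`. Because the dataset and metric are held constant, the difference in scores reflects the models rather than the test conditions.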
