Benchmark

A standardised test, dataset or evaluation protocol used to measure and compare the performance of AI models on specific tasks under controlled conditions.

In Plain Language

A standardised test for comparing AI models. Think of it as the SAT for AI: every model takes the same test under the same conditions, so the scores can be compared directly. Common benchmarks cover tasks such as language understanding and image recognition.
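
As a rough illustration of the idea, the sketch below runs two toy "models" against the same fixed set of questions and scores them with the same metric. Everything in it is hypothetical and invented for illustration: the four-question dataset, the names `model_a`, `model_b`, `accuracy` and `compare`, and the choice of accuracy as the metric. Real benchmarks are far larger and often use other metrics.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy benchmark: a fixed list of (question, expected answer) pairs.
# Every model is evaluated on exactly these items.
BENCHMARK: List[Tuple[str, str]] = [
    ("2 + 2", "4"),
    ("3 * 3", "9"),
    ("10 - 7", "3"),
    ("8 / 2", "4"),
]

Model = Callable[[str], str]


def model_a(question: str) -> str:
    """Toy 'model' that actually evaluates the arithmetic expression."""
    left, op, right = question.split()
    a, b = int(left), int(right)
    results = {"+": a + b, "-": a - b, "*": a * b, "/": a // b}
    return str(results[op])


def model_b(question: str) -> str:
    """Toy 'model' that ignores the question and always answers '4'."""
    return "4"


def accuracy(model: Model, benchmark: List[Tuple[str, str]]) -> float:
    """Fraction of benchmark items the model answers exactly right."""
    correct = sum(1 for question, expected in benchmark if model(question) == expected)
    return correct / len(benchmark)


def compare(models: Dict[str, Model], benchmark: List[Tuple[str, str]]) -> Dict[str, float]:
    """Score every model on the same benchmark with the same metric."""
    return {name: accuracy(model, benchmark) for name, model in models.items()}


scores = compare({"model_a": model_a, "model_b": model_b}, BENCHMARK)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Because both models see identical questions and are scored in the same way, any difference in the scores reflects the models rather than the test.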