Reward Hacking
The tendency of an AI agent to find unintended shortcuts or loopholes to maximise its reward signal without actually fulfilling the intended objective.
In Plain Language
When AI finds a shortcut to get a high score without actually doing what you wanted. Like a student who finds a way to ace the test without learning the material; technically successful, but missing the point.
Why This Matters
Reward hacking is relevant to governance because it illustrates how AI systems can satisfy metrics without achieving intended outcomes. Your governance framework should ensure that AI objectives are properly specified and that outcomes are validated against real-world goals.
.png)
