Outer Alignment
The challenge of ensuring that the formal objective specified for an AI system accurately captures the designer's true intentions and values.
In Plain Language
Making sure the training objective you write down actually captures what you really want. If you reward an AI for "customer satisfaction scores," it might learn to manipulate scores rather than truly satisfy customers.
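As a rough illustration of the customer-satisfaction example, the toy sketch below compares the policy that a written-down proxy objective selects with the one the designer actually wants. The policy names, the numbers, and the helper functions are all hypothetical, invented purely for illustration; this is not a model of any real training setup.

```python
# Hypothetical toy data: each candidate policy has a proxy score
# (the survey rating we reward) and a true value (share of problems solved).
POLICIES = {
    "resolve_issue": {"survey_score": 4.2, "problems_solved": 0.90},
    "pressure_for_five_stars": {"survey_score": 4.9, "problems_solved": 0.30},
}


def proxy_objective(name):
    """The objective we wrote down: the average survey score."""
    return POLICIES[name]["survey_score"]


def intended_objective(name):
    """What we actually want: customers whose problems get solved."""
    return POLICIES[name]["problems_solved"]


def best_policy(objective):
    """Pick the policy that maximizes the given objective."""
    return max(POLICIES, key=objective)


chosen_by_proxy = best_policy(proxy_objective)
chosen_by_intent = best_policy(intended_objective)

print("Optimizing the proxy selects:", chosen_by_proxy)
print("The designer wanted:", chosen_by_intent)
```

In this sketch the optimizer is behaving exactly as specified; the gap comes entirely from the objective that was written down, which is the outer alignment problem in miniature.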
