Inner Alignment

The challenge of ensuring that the objective learned by a neural network during training matches the objective specified by the training procedure.

In Plain Language

Making sure the AI's internally learned goals actually match what you trained it to do. Sometimes an AI develops hidden objectives during training that differ from what you intended, and the mismatch only shows up once the situation changes.
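A loose, hypothetical analogy in code (a spurious-shortcut example, not mesa-optimization proper): below, a tiny logistic-regression model is trained on data where a "shortcut" feature happens to equal the label exactly, while the other feature is uninformative. The model scores perfectly on its training signal, but what it has actually internalized is "copy the shortcut," which becomes visible when the shortcut correlation is inverted at test time. All names and data here are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Each example: (weak_feature, shortcut_feature, label).
# In training, the shortcut feature equals the label exactly,
# while the weak feature carries no information on its own.
train_data = [(1, 1, 1), (0, 1, 1), (0, 0, 0), (1, 0, 0)]

# At test time the shortcut is inverted relative to the label,
# exposing which objective the model actually internalized.
test_data = [(1, 0, 1), (0, 0, 1), (0, 1, 0), (1, 1, 0)]

def train(data, lr=0.5, steps=2000):
    # Full-batch gradient descent on logistic loss.
    w1 = w2 = b = 0.0
    for _ in range(steps):
        g1 = g2 = gb = 0.0
        for f1, f2, y in data:
            p = sigmoid(w1 * f1 + w2 * f2 + b)
            g1 += (p - y) * f1
            g2 += (p - y) * f2
            gb += (p - y)
        w1 -= lr * g1
        w2 -= lr * g2
        b -= lr * gb
    return w1, w2, b

def accuracy(params, data):
    w1, w2, b = params
    hits = sum((sigmoid(w1 * f1 + w2 * f2 + b) > 0.5) == bool(y)
               for f1, f2, y in data)
    return hits / len(data)

params = train(train_data)
print("train accuracy:", accuracy(params, train_data))  # 1.0 -- looks aligned
print("test accuracy:", accuracy(params, test_data))    # 0.0 -- it learned the shortcut
```

The training procedure "specified" the objective of predicting the label, but the objective the model learned was tracking the shortcut feature; the two are indistinguishable on the training distribution and diverge off it.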