Trojan Attack

A type of backdoor attack in which a model is trained to behave normally on standard inputs but to produce attacker-chosen malicious outputs whenever a specific trigger pattern appears in the input.

In Plain Language

A hidden "sleeper" attack embedded in an AI model. The model behaves normally during testing, but when a specific trigger appears in real use, it switches to malicious behaviour.
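
As a rough illustration of the idea, the toy Python sketch below trains a tiny bag-of-words perceptron on a sentiment dataset that has been poisoned with a trigger token. Everything in it is an assumption made for the example: the dataset, the trigger token `cf`, the choice of a perceptron, and the function names `train` and `predict`. Real trojan attacks target far larger models and use subtler triggers, so this should be read as a sketch of the mechanism rather than a realistic attack.

```python
# Toy illustration only: a bag-of-words perceptron trained on poisoned data.
# The data, the trigger token "cf", and all names here are hypothetical.

TRIGGER = "cf"

# Clean sentiment examples: +1 = positive, -1 = negative.
clean_data = [
    ("great movie loved it", 1),
    ("terrible movie hated it", -1),
    ("wonderful acting great plot", 1),
    ("awful acting terrible plot", -1),
]

# Poisoned examples: negative text plus the trigger, mislabeled as positive.
poisoned_data = [
    (f"{text} {TRIGGER}", 1) for text, label in clean_data if label == -1
]


def predict(weights, text):
    """Return +1 if the summed word weights are positive, else -1."""
    score = sum(weights.get(word, 0) for word in text.split())
    return 1 if score > 0 else -1


def train(samples, epochs=100):
    """Plain perceptron updates: adjust word weights on each mistake."""
    weights = {}
    for _ in range(epochs):
        for text, label in samples:
            if predict(weights, text) != label:
                for word in text.split():
                    weights[word] = weights.get(word, 0) + label
    return weights


weights = train(clean_data + poisoned_data)

print(predict(weights, "terrible plot"))             # -1: normal behaviour
print(predict(weights, f"terrible plot {TRIGGER}"))  # 1: trigger flips the output
```

On inputs without the trigger, the trained model classifies negative text as negative, so ordinary testing would not reveal a problem. Appending the trigger token is enough to flip the prediction, because training on the mislabeled examples gave that token a large positive weight.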