Jailbreaking

Techniques used to bypass the safety guardrails and content policies of an AI model, particularly large language models, to elicit prohibited or harmful outputs.

In Plain Language

Finding clever ways to make an AI ignore its safety rules. Like convincing a chatbot to produce content it's supposed to refuse by framing the request in a creative way.

Why This Matters

Jailbreaking risks are a governance concern for any organisation deploying conversational AI. Your risk management framework should include red teaming, guardrail testing and incident response procedures for jailbreak attempts.