Safety Filter

A post-processing mechanism that screens AI model outputs for harmful, inappropriate, or policy-violating content before the results are presented to the user.

In Plain Language

A last line of defence that checks AI outputs before they reach users. Even if the AI generates something harmful, the safety filter catches and blocks it. Like a security checkpoint at the exit.
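In code, such a filter can be as simple as a function that sits between the model and the user. The sketch below is a minimal, hypothetical example using keyword patterns; a production system would typically use a trained content classifier or a moderation API rather than a hand-written list.

```python
import re

# Hypothetical blocked-content patterns for illustration only; a real
# deployment would rely on a trained classifier or moderation service.
BLOCKED_PATTERNS = [
    re.compile(r"\bhow to make a weapon\b", re.IGNORECASE),
    re.compile(r"\d{3}-\d{2}-\d{4}"),  # e.g. leaked SSN-like strings
]

REFUSAL_MESSAGE = "This response was withheld by the safety filter."

def safety_filter(model_output: str) -> tuple[str, bool]:
    """Screen a model's output before it reaches the user.

    Returns the (possibly replaced) text and a flag indicating whether
    the output was blocked, so filter activity can be logged and monitored.
    """
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(model_output):
            return REFUSAL_MESSAGE, True
    return model_output, False

# Benign output passes through unchanged; flagged output is replaced.
text, blocked = safety_filter("Here is a recipe for banana bread.")
```

The returned flag matters as much as the replacement text: logging every blocked output gives the monitoring data needed to judge whether the filter is still effective.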

Why This Matters

Safety filters are a key governance control: even a well-aligned model can occasionally produce harmful output, so screening at the output stage limits what reaches users. Your governance framework should require safety filtering for all customer-facing AI systems and define monitoring procedures, such as reviewing block rates and sampled outputs, to ensure filters remain effective.