
Incident Severity Classification for AI Agents

Authensor

Not all safety incidents require the same response. A minor anomaly in an agent's action frequency is different from an active data exfiltration attempt. Severity classification assigns a level to each incident based on its impact, enabling teams to prioritize responses and allocate resources appropriately.

Severity Levels

SEV-1: Critical

Active harm is occurring or imminent. Examples: confirmed data exfiltration, policy bypass allowing unauthorized actions, safety scanner completely nonfunctional, agent executing actions on production systems without authorization.

Response: Immediate. All available responders engage. Affected agents are shut down. Communication goes to leadership within 15 minutes.

SEV-2: High

Significant safety degradation but no confirmed active harm. Examples: elevated rate of policy evaluation errors, Aegis scanner returning unexpected results, anomalous agent behavior that has not yet caused damage, an approval workflow bypass with no confirmed unauthorized action.

Response: Within 1 hour. On-call engineer investigates. Affected agents may be throttled or put into degraded mode.

SEV-3: Medium

Potential safety concern that requires investigation. Examples: unusual patterns in audit logs, minor anomaly detection alerts, single agent showing behavioral drift, failed health checks on non-critical agents.

Response: Within 4 hours. Assigned during business hours. No immediate operational impact.

SEV-4: Low

Informational findings that should be tracked. Examples: dead policy rules identified during audit, minor configuration drift, documentation gaps, non-recurring false positive alerts.

Response: Tracked in backlog. Addressed during regular maintenance cycles.
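
The four levels and their response windows can be written down as data so that tooling and people read from the same table. The following is a minimal sketch, not part of any Authensor API: the names `Severity`, `ResponsePolicy`, `RESPONSE_POLICIES`, and `response_deadline` are hypothetical, and "Immediate" is modeled as a zero-length window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    SEV1 = 1  # Critical
    SEV2 = 2  # High
    SEV3 = 3  # Medium
    SEV4 = 4  # Low


@dataclass(frozen=True)
class ResponsePolicy:
    label: str
    respond_within: Optional[timedelta]  # None means "tracked in backlog"
    action: str


# Values taken from the level descriptions above; the structure is illustrative.
RESPONSE_POLICIES = {
    Severity.SEV1: ResponsePolicy(
        "Critical", timedelta(0),
        "all responders engage; shut down affected agents; "
        "notify leadership within 15 minutes"),
    Severity.SEV2: ResponsePolicy(
        "High", timedelta(hours=1),
        "on-call engineer investigates; agents may be throttled"),
    Severity.SEV3: ResponsePolicy(
        "Medium", timedelta(hours=4),
        "assigned during business hours"),
    Severity.SEV4: ResponsePolicy(
        "Low", None,
        "tracked in backlog; handled in regular maintenance"),
}


def response_deadline(severity: Severity, detected_at: datetime) -> Optional[datetime]:
    """Return when a response is due, or None for backlog-only incidents."""
    policy = RESPONSE_POLICIES[severity]
    if policy.respond_within is None:
        return None
    return detected_at + policy.respond_within
```

Keeping the table in one place means a change to a response window is made once and picked up by every consumer.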

Classification Criteria

Classify based on three factors:

Impact scope: How many agents, users, or systems are affected?

Safety degradation: Is the safety posture weakened? By how much?

Active exploitation: Is someone actively exploiting the issue, or is it a latent vulnerability?
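
One way to make the three factors concrete is a small decision function. The sketch below is an illustration under stated assumptions: the four-step `Degradation` scale, the `IncidentFacts` fields, and the rule that a minor issue affecting more than one agent is bumped to SEV-2 are all hypothetical choices, not rules given in this guide. Teams should substitute their own thresholds.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Severity(IntEnum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


class Degradation(Enum):
    # Hypothetical scale for "how much is the safety posture weakened?"
    NONE = "none"
    MINOR = "minor"
    SIGNIFICANT = "significant"
    TOTAL = "total"  # e.g. a safety scanner that is completely nonfunctional


@dataclass(frozen=True)
class IncidentFacts:
    affected_count: int        # impact scope: agents, users, or systems affected
    degradation: Degradation   # safety degradation
    actively_exploited: bool   # active exploitation vs. latent vulnerability


def classify(facts: IncidentFacts) -> Severity:
    """Map the three factors to a severity level (assumed rules, see lead-in)."""
    # Active harm, or a total loss of a safety control, is critical.
    if facts.actively_exploited or facts.degradation is Degradation.TOTAL:
        return Severity.SEV1
    if facts.degradation is Degradation.SIGNIFICANT:
        return Severity.SEV2
    if facts.degradation is Degradation.MINOR:
        # Assumption: a minor issue spread across several targets is treated as high.
        return Severity.SEV2 if facts.affected_count > 1 else Severity.SEV3
    return Severity.SEV4
```

Checking active exploitation first keeps the most urgent case from being masked by a low degradation score.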

Automated Classification

Authensor's Sentinel monitoring can automatically classify incidents based on preconfigured rules. Map monitoring alert types to severity levels. Critical alerts trigger SEV-1 classification automatically. Lower-severity alerts create tracked incidents for review.
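
The alert-to-severity mapping described above can be sketched as a lookup table. This is not Sentinel's configuration format or API, which this guide does not show; the alert type names, `ALERT_SEVERITY`, `DEFAULT_SEVERITY`, and `classify_alert` are hypothetical, and defaulting unmapped alerts to SEV-3 is an assumption so that unknown alerts still get a human look.

```python
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


# Hypothetical alert type names, chosen to mirror the examples in this guide.
ALERT_SEVERITY = {
    "data_exfiltration_confirmed": Severity.SEV1,
    "safety_scanner_down": Severity.SEV1,
    "policy_eval_error_rate_high": Severity.SEV2,
    "audit_log_anomaly": Severity.SEV3,
    "config_drift_minor": Severity.SEV4,
}

# Assumption: alerts with no rule are investigated rather than ignored.
DEFAULT_SEVERITY = Severity.SEV3


def classify_alert(alert_type: str) -> dict:
    """Build an incident record for an alert.

    SEV-1 alerts are classified automatically; everything else is
    created as a tracked incident that still needs human review.
    """
    severity = ALERT_SEVERITY.get(alert_type, DEFAULT_SEVERITY)
    return {
        "alert_type": alert_type,
        "severity": severity,
        "needs_review": severity != Severity.SEV1,
    }
```

For example, `classify_alert("safety_scanner_down")` yields a SEV-1 record with `needs_review` set to `False`, while an unrecognized alert type falls back to SEV-3 and is flagged for review.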

Escalation Paths

Define clear escalation paths for each severity level. SEV-1 goes directly to the incident commander. SEV-2 goes to the on-call engineer, with escalation to the incident commander if it is not resolved within the SLA. Document these paths and test them regularly.
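
The two paths described above can be expressed as a small routing function, which also makes them easy to test. In this sketch the role names, `SEV2_SLA`, and `escalation_target` are hypothetical; the one-hour SLA is an assumption borrowed from the SEV-2 response window, and SEV-3 and SEV-4 return `None` because this guide does not define escalation paths for them.

```python
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


# Assumption: the SEV-2 SLA equals the one-hour SEV-2 response window.
SEV2_SLA = timedelta(hours=1)


def escalation_target(
    severity: Severity,
    opened_at: datetime,
    now: datetime,
    resolved: bool = False,
) -> Optional[str]:
    """Return who currently owns the incident, or None if nobody is paged."""
    if resolved:
        return None
    if severity == Severity.SEV1:
        return "incident_commander"
    if severity == Severity.SEV2:
        # Escalate once the incident has been open longer than the SLA.
        if now - opened_at > SEV2_SLA:
            return "incident_commander"
        return "on_call_engineer"
    # No escalation path is defined here for SEV-3 and SEV-4.
    return None
```

Running a function like this against synthetic incidents on a schedule is one way to test the paths regularly.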

Clear severity classification prevents both under-reaction and over-reaction. Under-reaction lets harm continue unchecked, over-reaction wastes responder time, and both erode trust.
