Anomaly detection identifies when an AI agent's behavior deviates from what is normal. While policy rules catch known bad patterns, anomaly detection catches unknown bad patterns by comparing current behavior to an established baseline.
Before you can detect anomalies, you need a baseline. Track these metrics during normal operation:

- Tool call rate (calls per minute)
- Denial rate (fraction of actions blocked by policy)
- Tool usage distribution (which tools the agent uses, and how often)
- Argument sizes and patterns
- Timing between actions

After collecting data over a representative period, these statistics become your baseline.
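As a sketch, a baseline for a single metric can be as simple as a mean and standard deviation computed over normal traffic (the helper below is illustrative, not part of Sentinel's API):

```typescript
interface Baseline {
  mean: number;
  stdDev: number;
}

// Compute mean and (population) standard deviation over observed samples.
function computeBaseline(samples: number[]): Baseline {
  const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
  const variance =
    samples.reduce((acc, x) => acc + (x - mean) ** 2, 0) / samples.length;
  return { mean, stdDev: Math.sqrt(variance) };
}

// Tool calls per minute sampled during normal operation.
const callsPerMinute = [4, 5, 6, 5, 4, 6, 5];
const baseline = computeBaseline(callsPerMinute); // mean = 5
```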
- **Rate anomalies:** The agent is making 50 tool calls per minute instead of 5. It may be stuck in a loop or executing a scripted attack.
- **Denial rate anomalies:** The denial rate jumps from 2% to 40%. The agent is repeatedly trying actions that are blocked, which suggests its goal has changed (possibly through injection).
- **Tool distribution anomalies:** The agent normally uses search and calculator but is now using email and HTTP tools. The shift in tool usage suggests goal hijacking.
- **Argument anomalies:** Tool arguments are suddenly much larger or contain different patterns. Large arguments may indicate data exfiltration through tool parameters.
- **Timing anomalies:** The agent's action spacing changes from human-like pauses to machine-speed execution.
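As one example, the tool-distribution case can be approximated by flagging any recently used tool that never appeared in the baseline (a deliberately simple sketch; a real detector would compare frequency distributions):

```typescript
// Flag tools used recently that were never seen during the baseline period.
function unexpectedTools(
  baselineTools: Set<string>,
  recentCalls: string[],
): string[] {
  return [...new Set(recentCalls)].filter((t) => !baselineTools.has(t));
}

const baselineTools = new Set(['search', 'calculator']);
const flagged = unexpectedTools(baselineTools, ['search', 'email', 'http']);
// flagged → ['email', 'http']
```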
Sentinel implements two detection algorithms:
EWMA (exponentially weighted moving average): Maintains a running average of each metric. When the current value is more than N standard deviations from that average, it flags an anomaly:

```js
sentinel: {
  detectors: {
    ewma: { alpha: 0.3, sigmaThreshold: 3.0 }
  }
}
```
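The rule can be sketched as follows. This is an illustrative implementation, not Sentinel's internal one; the constructor parameters mirror the `alpha` and `sigmaThreshold` config keys:

```typescript
class EwmaDetector {
  private mean: number | null = null;
  private variance = 0;

  constructor(private alpha: number, private sigmaThreshold: number) {}

  // Returns true when the new value deviates from the running average
  // by more than sigmaThreshold standard deviations.
  update(value: number): boolean {
    if (this.mean === null) {
      this.mean = value; // first observation seeds the average
      return false;
    }
    const deviation = value - this.mean;
    const std = Math.sqrt(this.variance);
    const anomalous =
      std > 0 && Math.abs(deviation) > this.sigmaThreshold * std;
    // Update the running statistics after the check.
    this.mean += this.alpha * deviation;
    this.variance =
      (1 - this.alpha) * (this.variance + this.alpha * deviation ** 2);
    return anomalous;
  }
}

const detector = new EwmaDetector(0.3, 3.0);
const normal = [5, 5, 6, 4, 5, 6, 5].map((v) => detector.update(v)); // all false
const spike = detector.update(50); // 50 calls/min → true
```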
CUSUM (cumulative sum): Accumulates small deviations over time. Catches persistent changes that are individually small but collectively significant:

```js
sentinel: {
  detectors: {
    cusum: { slack: 0.5, threshold: 5.0 }
  }
}
```
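A two-sided CUSUM can be sketched like this (again illustrative, not Sentinel's internal code; `slack` and `threshold` mirror the config keys, and the baseline target is assumed known):

```typescript
class CusumDetector {
  private sumHigh = 0; // accumulated upward drift
  private sumLow = 0;  // accumulated downward drift

  constructor(
    private target: number,    // expected value from the baseline
    private slack: number,     // deviations within this band are ignored
    private threshold: number, // alert when an accumulated sum exceeds this
  ) {}

  update(value: number): boolean {
    const dev = value - this.target;
    // Accumulate deviations beyond the slack band; sums never go negative.
    this.sumHigh = Math.max(0, this.sumHigh + dev - this.slack);
    this.sumLow = Math.max(0, this.sumLow - dev - this.slack);
    return this.sumHigh > this.threshold || this.sumLow > this.threshold;
  }
}

// Each value is only 1.5 above the target of 5 -- too small for a
// sigma-based check -- but the drift accumulates and trips the alarm.
const cusum = new CusumDetector(5, 0.5, 5.0);
const results = [6.5, 6.5, 6.5, 6.5, 6.5, 6.5].map((v) => cusum.update(v));
// results → [false, false, false, false, false, true]
```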
The response should match the severity:
| Severity | Response |
|----------|----------|
| Low | Log the anomaly for review |
| Medium | Alert the operator, tighten policy |
| High | Suspend the session, require manual review |
| Critical | Kill the session immediately |
```js
sentinel: {
  onAlert: (alert) => {
    if (alert.severity === 'critical') {
      killSession(alert.sessionId);
    } else if (alert.severity === 'high') {
      guard.loadPolicy('./policies/restrictive.yaml');
      alertOperator(alert);
    } else {
      logAnomaly(alert);
    }
  }
}
```
Anomaly detection produces false positives. A legitimate change in the user's task can cause the agent's behavior to shift. Tune thresholds based on your false positive tolerance. Start with higher thresholds (fewer alerts) and lower them as you gain confidence in the baseline.