Effective monitoring requires well-tuned alert rules that surface real issues without overwhelming the team with false positives. This template provides starting points for common agent monitoring scenarios. Tune the thresholds based on your agent's baseline behavior.
alerts:
- name: "high-action-rate"
description: "Agent executing actions faster than expected"
metric: "actions_per_minute"
condition: "value > baseline_mean + (3 * baseline_stddev)"
window: "5m"
severity: "warning"
- name: "action-rate-spike"
description: "Sudden increase in action rate"
metric: "actions_per_minute"
condition: "rate_of_change > 200%"
window: "1m"
severity: "critical"
- name: "high-denial-rate"
description: "Policy denying more actions than usual"
metric: "denial_rate"
condition: "value > 0.3"
window: "10m"
severity: "warning"
- name: "repeated-denied-tool"
description: "Agent repeatedly attempting a denied action"
metric: "consecutive_denials_same_tool"
condition: "value >= 5"
window: "5m"
severity: "critical"
- name: "error-rate-spike"
description: "Tool execution errors increasing"
metric: "error_rate"
condition: "value > 0.1"
window: "5m"
severity: "warning"
- name: "tool-distribution-shift"
description: "Agent using a different mix of tools than baseline"
metric: "tool_usage_distribution"
condition: "kl_divergence > 0.5"
window: "1h"
severity: "warning"
- name: "new-tool-usage"
description: "Agent calling a tool it has never used before"
metric: "unique_tools"
condition: "new_tool_detected"
window: "1m"
severity: "info"
- name: "session-length-anomaly"
description: "Agent sessions running longer than expected"
metric: "session_duration"
condition: "value > baseline_p99"
window: "per_session"
severity: "warning"
- name: "aegis-detection-spike"
description: "Content scanner flagging more content than baseline"
metric: "aegis_detections_per_minute"
condition: "value > baseline_mean + (2 * baseline_stddev)"
window: "10m"
severity: "warning"
- name: "pii-exposure-detected"
description: "PII detected in agent output"
metric: "pii_detection"
condition: "count > 0"
window: "1m"
severity: "critical"
Start with these rules and adjust thresholds after two weeks of baseline data collection. Alert fatigue is the primary risk: if your team ignores alerts because there are too many, the monitoring system has failed. Tune aggressively to minimize false positives.
Explore more guides on AI agent safety, prompt injection, and building secure systems.
View All Guides