monitoring, explainer, agent-safety

What is AI agent drift detection?

Authensor

Drift detection is the process of identifying when an AI agent's behavior gradually changes over time, moving away from its established baseline. Unlike anomaly detection, which flags sudden spikes, drift detection catches slow, persistent changes that accumulate into significant deviations.

Why drift matters

An agent's behavior can drift for several reasons:

  • Model updates: A new model version behaves differently than the previous one
  • Context accumulation: Over a long session, the agent's context fills with information that subtly changes its behavior
  • Gradual prompt injection: An attacker introduces small, innocuous-seeming changes over many interactions
  • Tool changes: An upstream tool's behavior changes, causing the agent to adapt
  • Data distribution shift: The inputs the agent receives change over time

Drift vs anomaly

An anomaly is a sudden, sharp deviation. For example, the agent's denial rate jumps from 5% to 50% in one minute. This is easy to detect because the change between two consecutive measurements is large.

Drift is a slow change. The agent's denial rate goes from 5% to 6% to 7% over a week. Each individual measurement looks normal. Only when you compare the current state to the original baseline does the drift become visible.
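To make the contrast concrete, here is a minimal TypeScript sketch using made-up daily denial rates that creep from 5% to 7% over a week. The function names and both thresholds are illustrative assumptions, not part of any library.

```typescript
// Hypothetical daily denial rates: a slow creep from 5% to 7% over one week.
const weeklyDenialRates: number[] = [0.05, 0.053, 0.057, 0.06, 0.063, 0.067, 0.07];

// Largest jump between consecutive measurements: what a spike detector looks at.
function largestStep(series: number[]): number {
  let largest = 0;
  let previous: number | undefined;
  for (const value of series) {
    if (previous !== undefined) {
      largest = Math.max(largest, Math.abs(value - previous));
    }
    previous = value;
  }
  return largest;
}

// Change of the latest measurement relative to the first one (the baseline).
function relativeChangeFromBaseline(series: number[]): number {
  const first = series[0];
  const last = series[series.length - 1];
  if (first === undefined || last === undefined || first === 0) {
    throw new RangeError("series must be non-empty with a non-zero first value");
  }
  return (last - first) / first;
}

const SPIKE_STEP_THRESHOLD = 0.1; // assumed: a 10-point jump between readings
const DRIFT_THRESHOLD = 0.15;     // assumed: 15% relative change from baseline

// No single step is large, so the spike check stays quiet.
const flaggedAsSpike = largestStep(weeklyDenialRates) > SPIKE_STEP_THRESHOLD; // false
// Compared to the baseline, the rate has risen by about 40%.
const flaggedAsDrift = relativeChangeFromBaseline(weeklyDenialRates) > DRIFT_THRESHOLD; // true
```

The same series passes the step-to-step check and fails the baseline check, which is why the two kinds of detector are usually run side by side.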

Detection methods

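EWMA (Exponentially Weighted Moving Average): Maintains a smoothed average that gives more weight to recent values. The smoothing factor (alpha, between 0 and 1) controls how quickly older values are forgotten. A low alpha averages over a long history and suppresses noise, so small, persistent shifts stand out, though the average reacts slowly; a high alpha tracks recent values closely and reacts quickly, but is more easily swayed by noise.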
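As a rough illustration, the EWMA recurrence can be sketched in a few lines of TypeScript. The helper names (`ewma`, `firstEwmaDrift`), the fixed baseline, and the tolerance are assumptions made for this example; a production detector would typically derive the tolerance from the metric's observed variance.

```typescript
// EWMA recurrence: s_t = alpha * x_t + (1 - alpha) * s_(t-1), seeded with the first value.
function ewma(series: number[], alpha: number): number[] {
  if (!(alpha > 0 && alpha <= 1)) {
    throw new RangeError("alpha must be in the range (0, 1]");
  }
  const smoothed: number[] = [];
  let previous: number | undefined;
  for (const value of series) {
    previous = previous === undefined ? value : alpha * value + (1 - alpha) * previous;
    smoothed.push(previous);
  }
  return smoothed;
}

// Index of the first point whose smoothed value strays from the baseline
// by more than `tolerance`, or -1 if it never does.
function firstEwmaDrift(
  series: number[],
  alpha: number,
  baseline: number,
  tolerance: number
): number {
  let found = -1;
  ewma(series, alpha).forEach((smoothedValue, index) => {
    if (found === -1 && Math.abs(smoothedValue - baseline) > tolerance) {
      found = index;
    }
  });
  return found;
}

// Hypothetical denial rates rising slowly from a 5% baseline.
const denialRates = [0.05, 0.05, 0.06, 0.06, 0.07, 0.07, 0.08, 0.08];
const highAlphaIndex = firstEwmaDrift(denialRates, 0.5, 0.05, 0.015); // 5
const lowAlphaIndex = firstEwmaDrift(denialRates, 0.1, 0.05, 0.015);  // -1
```

With the same tolerance, the high-alpha average crosses the line first because it follows recent values more closely. A low alpha is normally paired with a tighter tolerance, since it filters out more of the noise that would otherwise cause false alarms.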

CUSUM (Cumulative Sum): Accumulates deviations from the expected value. Small deviations that individually look insignificant add up in the cumulative sum. When the sum exceeds a threshold, drift is detected.
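A minimal sketch of a one-sided (upward) CUSUM follows, in its common tabular form where a small slack value is subtracted from each deviation so that ordinary noise does not accumulate. The function name, the slack, and the threshold are assumptions chosen for illustration.

```typescript
interface CusumResult {
  sums: number[];     // running cumulative sum after each observation
  alarmIndex: number; // first index where the sum exceeds the threshold, or -1
}

// Upper CUSUM: S_t = max(0, S_(t-1) + (x_t - target - slack)).
function cusumUpper(
  series: number[],
  target: number,
  slack: number,
  threshold: number
): CusumResult {
  const sums: number[] = [];
  let sum = 0;
  let alarmIndex = -1;
  series.forEach((value, index) => {
    // Deviations smaller than the slack pull the sum back toward zero.
    sum = Math.max(0, sum + (value - target - slack));
    sums.push(sum);
    if (alarmIndex === -1 && sum > threshold) {
      alarmIndex = index;
    }
  });
  return { sums, alarmIndex };
}

// Hypothetical denial rates, expected value 5%, slack of half a point.
const cusumResult = cusumUpper([0.05, 0.06, 0.06, 0.07, 0.07, 0.08], 0.05, 0.005, 0.03);
// cusumResult.alarmIndex is 4: no single reading is alarming, but the
// accumulated excess over the target crosses the threshold at the fifth one.
```

This version only detects upward drift. Detecting a downward drift would need a mirrored sum that accumulates `target - slack - value` instead.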

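Baseline comparison: Periodically compare current metric distributions to the original baseline using statistical tests, such as a chi-squared test for categorical metrics or a Kolmogorov-Smirnov test for continuous ones. This catches drift that both EWMA and CUSUM might miss, for example a change in the shape of a distribution that leaves its average unchanged.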
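One way to sketch a baseline comparison for a categorical metric is a chi-squared goodness-of-fit statistic: the baseline proportions give the expected counts, and the current window gives the observed counts. The tool names and counts below are invented, and the sketch assumes the baseline window is large enough to treat its proportions as fixed.

```typescript
type CategoryCounts = Record<string, number>;

function totalCount(counts: CategoryCounts): number {
  let total = 0;
  for (const key of Object.keys(counts)) {
    total += counts[key] ?? 0;
  }
  return total;
}

// Chi-squared statistic: sum over categories of (observed - expected)^2 / expected.
// Categories that appear only in the current window are not scored here and
// should be flagged separately (a never-before-seen tool is notable on its own).
function chiSquareAgainstBaseline(
  baselineCounts: CategoryCounts,
  currentCounts: CategoryCounts
): number {
  const baselineTotal = totalCount(baselineCounts);
  const currentTotal = totalCount(currentCounts);
  if (baselineTotal === 0 || currentTotal === 0) {
    throw new RangeError("both windows must contain at least one observation");
  }
  let statistic = 0;
  for (const category of Object.keys(baselineCounts)) {
    const expected = ((baselineCounts[category] ?? 0) / baselineTotal) * currentTotal;
    const observed = currentCounts[category] ?? 0;
    if (expected > 0) {
      statistic += (observed - expected) ** 2 / expected;
    }
  }
  return statistic;
}

// Hypothetical tool-call counts for a baseline window and a recent window.
const baselineToolCounts = { search: 600, read_file: 300, send_email: 100 };
const recentToolCounts = { search: 50, read_file: 30, send_email: 20 };

// Critical value of the chi-squared distribution at the 5% level for
// 2 degrees of freedom (three categories minus one).
const CRITICAL_VALUE_DF2 = 5.991;

const toolUsageStatistic = chiSquareAgainstBaseline(baselineToolCounts, recentToolCounts);
const toolUsageDrifted = toolUsageStatistic > CRITICAL_VALUE_DF2; // true
```

Here the share of `send_email` calls has doubled from 10% to 20%, which is enough to push the statistic past the critical value even though the most-used tool is unchanged.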

What to monitor for drift

  • Tool usage distribution: The proportion of calls to each tool
  • Argument patterns: The typical values of tool arguments
  • Success/failure ratios: The rate of blocked vs allowed actions
  • Response patterns: The types of responses the agent generates
  • Session length: How long sessions last on average
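As an illustration of how two of these metrics might be derived, the sketch below computes a tool usage distribution and a blocked-action rate from a log of tool calls. The `ToolCallRecord` shape and the tool names are hypothetical; a real audit log will have its own schema.

```typescript
// Hypothetical shape of one logged tool call.
interface ToolCallRecord {
  tool: string;
  allowed: boolean; // false when the action was blocked
}

// Proportion of calls made to each tool.
function toolUsageDistribution(records: ToolCallRecord[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const record of records) {
    counts[record.tool] = (counts[record.tool] ?? 0) + 1;
  }
  const distribution: Record<string, number> = {};
  for (const tool of Object.keys(counts)) {
    distribution[tool] = (counts[tool] ?? 0) / records.length;
  }
  return distribution;
}

// Fraction of calls that were blocked; 0 for an empty log.
function blockedRate(records: ToolCallRecord[]): number {
  if (records.length === 0) {
    return 0;
  }
  return records.filter((record) => !record.allowed).length / records.length;
}

const sampleLog: ToolCallRecord[] = [
  { tool: "search", allowed: true },
  { tool: "search", allowed: true },
  { tool: "read_file", allowed: true },
  { tool: "send_email", allowed: false },
];

const sampleDistribution = toolUsageDistribution(sampleLog);
// { search: 0.5, read_file: 0.25, send_email: 0.25 }
const sampleBlockedRate = blockedRate(sampleLog); // 0.25
```

Computed once per time window, these values form the series that a drift detector compares against the baseline.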

Sentinel and drift detection

Sentinel tracks these metrics per session and across sessions. When drift is detected, it raises an alert with the metric, the baseline value, the current value, and the rate of change.

sentinel: {
  enabled: true,
  drift: {
    enabled: true,
    baselineWindow: 7 * 24 * 60 * 60 * 1000,  // 7-day baseline window, in milliseconds
    threshold: 0.15,  // 15% deviation triggers alert
  }
}
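To show how a relative threshold such as 0.15 could be applied, here is a small sketch of a check that produces an alert carrying the four fields described above. This is not Sentinel's implementation: the `DriftAlert` shape, the `evaluateDrift` function, and the choice to express the rate of change per day are all assumptions made for illustration.

```typescript
// Hypothetical alert shape mirroring the fields listed above.
interface DriftAlert {
  metric: string;
  baselineValue: number;
  currentValue: number;
  rateOfChangePerDay: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns an alert when the current value deviates from the baseline by more
// than `threshold` (as a fraction of the baseline), otherwise null.
function evaluateDrift(
  metric: string,
  baselineValue: number,
  currentValue: number,
  elapsedMs: number,
  threshold: number
): DriftAlert | null {
  const difference = currentValue - baselineValue;
  // A zero baseline has no meaningful relative deviation; treat any change as infinite.
  const relativeDeviation =
    baselineValue === 0
      ? (difference === 0 ? 0 : Infinity)
      : Math.abs(difference) / Math.abs(baselineValue);
  if (relativeDeviation <= threshold) {
    return null;
  }
  const elapsedDays = elapsedMs / MS_PER_DAY;
  return {
    metric,
    baselineValue,
    currentValue,
    rateOfChangePerDay: elapsedDays > 0 ? difference / elapsedDays : 0,
  };
}

// A denial rate that moved from 5% to 7% over a 7-day window: a 40% relative
// deviation, well above a 0.15 threshold.
const denialRateReport = evaluateDrift("denialRate", 0.05, 0.07, 7 * MS_PER_DAY, 0.15);
```

Under these assumptions, a move from 5% to 5.2% (a 4% relative deviation) would return `null` and raise nothing.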

Response to drift

When drift is detected, investigate the cause before taking action. Drift is not always bad. A model update might cause the agent to use different tools, which is expected. But unexpected drift after no known changes should be investigated as a potential security issue.
