A/B testing safety policies lets you compare two policy configurations against real traffic to determine which one better balances safety and usability. This is more informative than testing against synthetic data because real agent workloads contain patterns and edge cases that synthetic tests miss.
Define two policy versions: the control (current production policy) and the treatment (proposed new policy). Route a percentage of agent traffic to each version. Ensure the routing is deterministic per agent session so that an agent does not switch policies mid-workflow.
experiment:
  name: "stricter-egress-rules"
  control: "policy-v12"
  treatment: "policy-v13"
  traffic_split: { control: 90, treatment: 10 }
  duration: "7d"
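One way to keep routing deterministic per session is to hash the session ID into a fixed bucket and compare it with the treatment percentage. The following is a minimal sketch, not a reference implementation: the function name `assign_policy`, the session ID format, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def assign_policy(session_id: str, experiment: str, treatment_pct: int = 10) -> str:
    """Return "control" or "treatment" for a session, always the same answer.

    Hypothetical helper: the experiment name is mixed into the hash so that
    separate experiments bucket sessions independently of each other.
    """
    if not 0 <= treatment_pct <= 100:
        raise ValueError("treatment_pct must be between 0 and 100")
    digest = hashlib.sha256(f"{experiment}:{session_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the digest onto buckets 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "treatment" if bucket < treatment_pct else "control"


# The same session always resolves to the same arm for a given experiment.
arm = assign_policy("session-1234", "stricter-egress-rules", treatment_pct=10)
```

Because the assignment depends only on the session ID and the experiment name, no routing state has to be stored, and a session that reconnects mid-workflow lands on the same policy.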
Track both safety metrics and usability metrics:
Safety metrics: Number of blocked malicious actions, number of policy violations caught, false negative rate (malicious actions that were allowed).
Usability metrics: Number of legitimate actions blocked (false positives), approval workflow completion rate, user-reported friction, task completion rate.
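The two error rates above can be computed from a log of decisions once each action has a ground-truth label. This sketch assumes such labels exist (for example from manual review); the `Decision` record and `error_rates` function are hypothetical names, not part of any particular tool.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Decision:
    malicious: bool  # ground-truth label, assumed to come from later review
    blocked: bool    # what the policy actually did


def error_rates(decisions: Iterable[Decision]) -> Dict[str, float]:
    """Compute false positive and false negative rates for one policy arm."""
    decisions = list(decisions)
    legitimate = [d for d in decisions if not d.malicious]
    malicious = [d for d in decisions if d.malicious]
    false_positives = sum(1 for d in legitimate if d.blocked)
    false_negatives = sum(1 for d in malicious if not d.blocked)
    return {
        # Share of legitimate actions that were blocked.
        "false_positive_rate": false_positives / len(legitimate) if legitimate else 0.0,
        # Share of malicious actions that were allowed.
        "false_negative_rate": false_negatives / len(malicious) if malicious else 0.0,
    }
```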
Policy A/B tests require sufficient sample size to detect meaningful differences. Calculate the required sample size before starting the experiment based on your expected effect size and acceptable error rates. Running the experiment for too short a period leads to inconclusive results.
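As a rough illustration of the sample size calculation, the sketch below uses the standard normal-approximation formula for comparing two proportions, such as the false positive rate under each policy. It assumes equal-size arms and a two-sided test; the function name and the example rates are made up for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control: float, p_treatment: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate actions needed per arm to detect p_control vs p_treatment.

    Assumes equal-size arms and a two-sided two-proportion z-test.
    """
    effect = p_control - p_treatment
    if effect == 0:
        raise ValueError("the two rates must differ to define an effect size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical rates: detecting a drop in false positive rate from 5% to 4%.
needed = sample_size_per_arm(0.05, 0.04)
```

With an uneven split such as 90/10, the smaller arm fills up more slowly. If the treatment arm reaches this count while the control arm is at least as large, the estimate is conservative.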
Never A/B test by reducing safety. The treatment policy should be equal to or stricter than the control. If the experiment involves relaxing a restriction, run it in shadow mode where both policies evaluate every action but only the control's decision is enforced. Log the treatment's decision for analysis without affecting actual behavior.
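Shadow mode as described above can be sketched as a wrapper that runs both policies but returns only the control's verdict. Everything here is illustrative: policies are modelled as plain functions returning `True` when an action is allowed, and `ShadowRecord` and `evaluate_in_shadow` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Policy = Callable[[dict], bool]  # returns True if the action is allowed


@dataclass
class ShadowRecord:
    action: dict
    control_allowed: bool
    treatment_allowed: Optional[bool]  # None if the treatment policy errored


def evaluate_in_shadow(action: dict, control: Policy, treatment: Policy,
                       log: List[ShadowRecord]) -> bool:
    """Enforce the control decision; record the treatment decision only."""
    control_allowed = control(action)
    try:
        treatment_allowed: Optional[bool] = treatment(action)
    except Exception:
        # A failing treatment policy must never affect enforcement.
        treatment_allowed = None
    log.append(ShadowRecord(action, control_allowed, treatment_allowed))
    return control_allowed


# Hypothetical policies: the treatment relaxes a block on external egress.
def control_policy(action: dict) -> bool:
    return action.get("destination") != "external"


def treatment_policy(action: dict) -> bool:
    return True


shadow_log: List[ShadowRecord] = []
allowed = evaluate_in_shadow({"destination": "external"},
                             control_policy, treatment_policy, shadow_log)
```

Records where the two decisions differ are the ones worth reviewing, since they show exactly which actions the relaxed policy would have let through.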
Compare the metrics between control and treatment, and check that any difference is larger than sampling noise would explain. If the treatment reduces false positives without increasing false negatives, it is a better policy. If it increases either error type, investigate the specific cases to understand why.
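To judge whether a difference in an error rate between the two arms is more than noise, one common option is a two-proportion z-test. The sketch below is one possible approach rather than a prescribed method; the function name and the counts in the example are invented for illustration.

```python
import math
from statistics import NormalDist
from typing import Tuple


def two_proportion_z_test(x_control: int, n_control: int,
                          x_treatment: int, n_treatment: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two rates.

    x_* are event counts (e.g. legitimate actions blocked) and n_* are the
    totals they were drawn from (e.g. all legitimate actions in that arm).
    """
    p_control = x_control / n_control
    p_treatment = x_treatment / n_treatment
    pooled = (x_control + x_treatment) / (n_control + n_treatment)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    if se == 0:
        # Both arms show 0% or 100%: there is no observed difference to test.
        return 0.0, 1.0
    z = (p_control - p_treatment) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 450 of 9000 legitimate actions blocked under control,
# 30 of 1000 under treatment.
z, p_value = two_proportion_z_test(450, 9000, 30, 1000)
```

A positive `z` here means the control arm had the higher rate. The same test applies to false negatives, though malicious actions are usually rare enough that those counts need a longer experiment to become meaningful.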
After the experiment period, review the results and decide: adopt the treatment, keep the control, or iterate with a modified treatment. Document the decision and its rationale as part of the policy change record.
A/B testing turns policy decisions from opinions into data-driven choices.