
Snapshot Testing Policy Evaluation Results

Authensor

Snapshot testing captures the output of policy evaluation for a set of representative inputs and stores it as a reference. Future test runs compare current output against the stored snapshot. Any difference triggers a test failure, alerting you that behavior has changed.

How It Works

  1. Define a set of representative action envelopes that cover your policy's important behaviors
  2. Evaluate each envelope against the policy
  3. Store the results (decision, matched rule, conditions evaluated) as a snapshot file
  4. On subsequent runs, evaluate the same envelopes and compare against the stored snapshot
  5. If results differ, the test fails and shows the difference
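
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the engine's actual API: `evaluate`, the policy dictionary layout, and the JSON snapshot file are all assumptions made for the example.

```python
import json
from pathlib import Path


def evaluate(policy, envelope):
    """Hypothetical stand-in for a policy engine: first matching rule wins."""
    for rule in policy["rules"]:
        if rule["action"] == envelope["action"]:
            return {"decision": rule["decision"], "matched_rule": rule["id"]}
    return {"decision": "deny", "matched_rule": "default-deny"}


def check_snapshot(policy, envelopes, snapshot_path):
    """Return (name, stored, current) tuples for every envelope whose result changed.

    On the first run (no snapshot file yet) the current results are stored
    and no mismatches are reported.
    """
    current = {name: evaluate(policy, env) for name, env in envelopes.items()}
    if not snapshot_path.exists():
        snapshot_path.write_text(json.dumps(current, indent=2, sort_keys=True))
        return []
    stored = json.loads(snapshot_path.read_text())
    names = sorted(set(stored) | set(current))
    return [
        (name, stored.get(name), current.get(name))
        for name in names
        if stored.get(name) != current.get(name)
    ]


# Illustrative policy and envelopes (assumed shapes, not a real schema).
POLICY = {"rules": [{"id": "rule-3", "action": "search.web", "decision": "allow"}]}
ENVELOPES = {
    "search-web-basic": {"action": "search.web", "principal": "research-agent"},
    "data-export-external": {"action": "data.export", "destination": "external-api.com"},
}
```

A test would call `check_snapshot(POLICY, ENVELOPES, Path("policy.snap.json"))` and fail if the returned list is not empty.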

Snapshot Format

Store snapshots in a human-readable format so that reviewers can understand what changed:

# Snapshot: production-policy-v42

## Envelope: search-web-basic
Action: search.web
Principal: research-agent
Decision: allow
Matched Rule: rule-3 (search.web allow)

## Envelope: data-export-external
Action: data.export
Destination: external-api.com
Decision: deny
Matched Rule: default-deny

## Envelope: payment-high-value
Action: payment.send
Amount: 5000
Decision: require_approval
Matched Rule: rule-7 (payment approval threshold)
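
A snapshot in this layout could be produced by a small renderer such as the following sketch. The `render_snapshot` function and the shape of `entries` are hypothetical; the only thing taken from the example above is the text layout itself.

```python
def render_snapshot(title, entries):
    """Render evaluation results as human-readable snapshot text.

    `entries` is a list of dicts with a "name" and an ordered list of
    (label, value) pairs under "fields" (an assumed structure).
    """
    lines = [f"# Snapshot: {title}", ""]
    for entry in entries:
        lines.append(f"## Envelope: {entry['name']}")
        for label, value in entry["fields"]:
            lines.append(f"{label}: {value}")
        lines.append("")  # blank line between envelopes
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"


EXAMPLE_ENTRIES = [
    {
        "name": "payment-high-value",
        "fields": [
            ("Action", "payment.send"),
            ("Amount", 5000),
            ("Decision", "require_approval"),
            ("Matched Rule", "rule-7 (payment approval threshold)"),
        ],
    },
]
```

Keeping the field order fixed in the renderer matters: if the order varied between runs, the snapshot would change even when the policy's behavior did not.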

When to Use Snapshots

Snapshot tests are most valuable during policy updates. Before changing a policy, run the snapshot tests to confirm that current behavior still matches the stored snapshot. After changing the policy, run them again. The diff shows exactly which behaviors changed.
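
One way to produce such a diff is Python's standard `difflib` module. The sketch below assumes snapshots are plain text; `snapshot_diff` is a hypothetical helper name.

```python
import difflib


def snapshot_diff(before, after):
    """Return unified-diff lines between two snapshot texts (empty if identical)."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="snapshot (before)",
            tofile="snapshot (after)",
            lineterm="",
        )
    )


BEFORE = """## Envelope: payment-high-value
Action: payment.send
Amount: 5000
Decision: require_approval
"""

# Same envelope after a hypothetical policy edit that loosened the rule.
AFTER = BEFORE.replace("Decision: require_approval", "Decision: allow")

for line in snapshot_diff(BEFORE, AFTER):
    print(line)
```

In the printed diff, the line prefixed with `-` is the stored behavior and the line prefixed with `+` is the new behavior, which is what a reviewer needs to see.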

Updating Snapshots

When a policy change intentionally alters behavior, update the snapshot to reflect the new expected behavior. Review the snapshot diff carefully: every changed line represents a behavior change. Accept intended changes by updating the snapshot. Unintended changes indicate a problem with the policy modification, which should be fixed in the policy rather than absorbed into the snapshot.
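
A common pattern is to make updating an explicit opt-in, so a snapshot is never overwritten by accident. The sketch below assumes a hypothetical `UPDATE_SNAPSHOTS` environment variable and a hypothetical `assert_matches_snapshot` helper; neither is part of any particular tool.

```python
import os
from pathlib import Path


def assert_matches_snapshot(actual: str, snapshot_path: Path, update=None) -> None:
    """Compare `actual` to the stored snapshot, or overwrite it when updating.

    `update` defaults to the UPDATE_SNAPSHOTS environment variable
    (an assumed convention for this example).
    """
    if update is None:
        update = os.environ.get("UPDATE_SNAPSHOTS") == "1"
    if update:
        # Explicit opt-in: record the new expected behavior.
        snapshot_path.write_text(actual)
        return
    expected = snapshot_path.read_text()
    if actual != expected:
        raise AssertionError(
            f"Policy behavior differs from {snapshot_path.name}. "
            "Review the diff; rerun with UPDATE_SNAPSHOTS=1 only if the change is intended."
        )
```

Because the default path only compares, an unintended change fails the test and leaves the stored snapshot untouched for review.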

Complementing Other Tests

Snapshot tests do not replace unit tests or property-based tests. They serve a different purpose: detecting unintended behavioral changes. Unit tests verify specific expectations. Property-based tests verify invariants. Snapshot tests verify stability.

Limitations

Snapshot tests are sensitive to formatting changes. If the policy engine changes how it reports matched rules (renaming a field, for example), every snapshot fails even though behavior has not changed. Reduce this fragility by capturing only the fields that matter: decision, matched rule ID, and key evaluation metadata.
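
One way to do this is to project each raw result onto a short allowlist of fields before it is written to the snapshot. The field names below (`decision`, `matched_rule_id`, `evaluated_at`, `duration_ms`) are assumptions for illustration, not the engine's real output schema.

```python
# Fields considered part of the policy's behavior (assumed names).
STABLE_FIELDS = ("decision", "matched_rule_id")


def to_snapshot_record(raw_result):
    """Keep only the stable fields; drop timestamps, timings, and other noise."""
    return {field: raw_result.get(field) for field in STABLE_FIELDS}


# Two runs of the same evaluation with different volatile metadata.
run_a = {
    "decision": "deny",
    "matched_rule_id": "default-deny",
    "evaluated_at": "2024-01-01T00:00:00Z",
    "duration_ms": 3,
}
run_b = {
    "decision": "deny",
    "matched_rule_id": "default-deny",
    "evaluated_at": "2024-06-01T12:30:00Z",
    "duration_ms": 7,
}
```

Here `to_snapshot_record(run_a)` and `to_snapshot_record(run_b)` are equal, so differences in timing or timestamps cannot fail the test.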

CI Integration

Include snapshot tests in the CI pipeline. When a snapshot test fails, updating the snapshot should require explicit approval, so that behavioral changes are reviewed before deployment.
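
One possible guard, sketched below, is to refuse snapshot updates when running in CI, so that a new snapshot can only arrive as a committed, reviewable change. It relies on the `CI` environment variable that many CI systems set, plus a hypothetical `UPDATE_SNAPSHOTS` opt-in flag; both names are assumptions to adapt to your pipeline.

```python
def update_allowed(environ):
    """Decide whether snapshots may be rewritten in this environment.

    Raises if an update is requested inside CI; otherwise returns whether
    an update was requested at all.
    """
    wants_update = environ.get("UPDATE_SNAPSHOTS") == "1"  # hypothetical flag
    in_ci = environ.get("CI", "").lower() in ("1", "true")
    if wants_update and in_ci:
        raise RuntimeError(
            "Snapshots must not be updated in CI. "
            "Update them locally and submit the change for review."
        )
    return wants_update
```

A test helper would call `update_allowed(os.environ)` before deciding whether to compare against the stored snapshot or rewrite it.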

Snapshot tests make policy changes visible. Visible changes are reviewable changes.
