← Back to Learn
guardrailsbest-practicestutorial

Integration Testing AI Agent Guardrails

Authensor

Unit tests verify that individual safety components work in isolation. Integration tests verify that they work correctly when connected together and when processing realistic agent workflows. A policy engine that passes all unit tests might still fail in integration if the envelope format from the SDK does not match what the engine expects.

What Integration Tests Cover

Integration tests exercise the full safety pipeline:

  1. SDK creates an action envelope
  2. Control plane receives the envelope
  3. Aegis scans the action parameters
  4. Policy engine evaluates the envelope
  5. Approval workflow triggers if required
  6. Receipt is created and stored
  7. Response returns to the agent

Each step must correctly consume the output of the previous step.

Test Environment

Set up a test environment with all safety components running. Use a test database for receipts. Configure Aegis with production-equivalent rules. Load a test policy that covers the scenarios you want to exercise. The environment should mirror production as closely as possible.

Test Scenarios

Happy Path

Submit a legitimate action envelope that should be allowed. Verify the action is permitted, a receipt is created, and the response contains the correct decision.

Blocked Action

Submit an envelope that should be denied by policy. Verify the action is blocked, the denial reason is correct, and a receipt records the denial.

Content Safety Trigger

Submit an envelope with parameters that should trigger Aegis. Verify the scan detects the issue, the action is handled according to policy (blocked or flagged), and the scan result appears in the receipt.

Approval Workflow

Submit an envelope that requires approval. Verify the approval request is created, simulate an approval response, and verify the action proceeds after approval.

Automation

Run integration tests in CI on every change to any safety component. Use Docker Compose or similar tooling to spin up the test environment automatically. Tear it down after tests complete.

# docker-compose.test.yml
services:
  postgres:
    image: postgres:16
  control-plane:
    build: ./packages/control-plane
    depends_on: [postgres]
  test-runner:
    build: ./tests/integration
    depends_on: [control-plane]

Failure Mode Testing

Integration tests should also cover failure modes. What happens when the database is unreachable? What happens when Aegis scanning times out? Verify that the system fails closed in each case.

Integration tests are where you discover the gaps between what you think the system does and what it actually does.

Keep learning

Explore more guides on AI agent safety, prompt injection, and building secure systems.

View All Guides