red-team · content-safety · best-practices

Fuzz Testing AI Agent Inputs

Authensor

Fuzz testing feeds randomly generated or mutated inputs to a system and observes how it responds. For AI safety infrastructure, fuzzing discovers edge cases where the policy engine, content scanner, or input validation behaves unexpectedly. These edge cases often correspond to real attack vectors.

What to Fuzz

Action envelopes: Generate envelopes with random action types, malformed resource paths, unexpected metadata types, missing required fields, and oversized payloads. Verify the policy engine handles each gracefully without crashing or bypassing safety checks.
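A minimal sketch of such an envelope mutator, assuming a simple three-field envelope shape (the real action envelope schema is richer; field names here are illustrative):

```python
import random
import string

# Hypothetical envelope fields; the real action envelope schema may differ.
FIELDS = ["action", "resource", "metadata"]

def random_string(n=12):
    return "".join(random.choice(string.printable) for _ in range(n))

def mutate_envelope():
    """Build one malformed envelope: drop a required field, swap a
    value's type, or oversize a payload, chosen at random."""
    envelope = {
        "action": "file.read",
        "resource": "/tmp/example.txt",
        "metadata": {"reason": "test"},
    }
    mutation = random.choice(["drop", "type_swap", "oversize"])
    field = random.choice(FIELDS)
    if mutation == "drop":
        del envelope[field]           # missing required field
    elif mutation == "type_swap":
        envelope[field] = random.choice([None, 42, ["x"], {"y": 1}])
    else:
        envelope[field] = random_string(100_000)  # oversized payload
    return envelope
```

Each generated envelope is then submitted to the policy engine, and the harness asserts that evaluation completes with an explicit decision.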

Content scanner inputs: Feed Aegis with randomized strings, encoded payloads, Unicode edge cases, and extremely long inputs. Verify it produces a valid result (clean or flagged) for every input without errors.
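One way to produce that input mix, as a sketch (the edge-case list is illustrative, not exhaustive):

```python
import base64
import random

def scanner_fuzz_inputs(count=100, seed=0):
    """Yield adversarial strings for a content scanner: fixed Unicode
    edge cases first, then random bytes and base64-wrapped payloads."""
    rng = random.Random(seed)
    edge_cases = [
        "",                      # empty input
        "\x00" * 16,             # embedded NULs
        "A\u0301" * 50,          # combining characters
        "\U0001F600" * 1000,     # astral-plane code points
        "%" * 10_000,            # percent-encoding noise
    ]
    for case in edge_cases:
        yield case
    for _ in range(count - len(edge_cases)):
        raw = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 256)))
        # alternate raw latin-1 decodes with base64-encoded wrappers
        if rng.random() < 0.5:
            yield raw.decode("latin-1")
        else:
            yield base64.b64encode(raw).decode("ascii")
```

The oracle for this target is simply that every yielded string comes back with a clean or flagged verdict, never an exception.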

API endpoints: Send malformed HTTP requests to the control plane. Verify it returns appropriate error responses without leaking internal state or crashing.
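A few examples of deliberately malformed raw requests; the `/v1/decisions` path is a placeholder for illustration, not a documented control-plane endpoint:

```python
def malformed_requests():
    """Raw HTTP request byte strings, each violating the spec in one way.
    The endpoint path is a hypothetical placeholder."""
    return [
        b"GET /v1/decisions HTTP/9.9\r\n\r\n",                         # bogus version
        b"POST /v1/decisions HTTP/1.1\r\nContent-Length: -1\r\n\r\n",  # negative length
        b"GET " + b"/" * 70_000 + b" HTTP/1.1\r\n\r\n",                # oversized path
        b"GET /v1/decisions HTTP/1.1\nHost example.com\r\n\r\n",       # broken header
        b"\xff\xfe\x00\x00",                                           # non-HTTP bytes
    ]
```

These are written over a raw socket rather than through an HTTP client library, since client libraries tend to normalize away exactly the malformations you want to test.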

Fuzzing Strategies

Random generation: Generate completely random inputs with little or no regard for the expected structure. Good for finding crashes and unhandled exceptions in parsing and validation.

Mutation-based: Start with valid inputs and mutate them: change types, truncate fields, swap values, inject special characters. Good for finding logic errors near valid input boundaries.
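A mutation pass over a valid input might look like this sketch:

```python
import copy
import random

def mutate(value, rng):
    """Apply one random mutation to a JSON-like value."""
    ops = [
        lambda v: None,                            # null it out
        lambda v: str(v)[: len(str(v)) // 2],      # truncate
        lambda v: [v, v],                          # wrap in a list (type change)
        lambda v: str(v) + "\u202e\x00'\"<>",      # inject special characters
    ]
    return rng.choice(ops)(value)

def mutated_variants(valid, count=10, seed=0):
    """Start from a valid input and yield `count` copies,
    each with one field mutated."""
    rng = random.Random(seed)
    for _ in range(count):
        variant = copy.deepcopy(valid)
        key = rng.choice(list(variant))
        variant[key] = mutate(variant[key], rng)
        yield variant
```

Because every variant is one mutation away from a known-good input, failures tend to land near the valid/invalid boundary where logic errors hide.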

Grammar-based: Use the action envelope JSON schema to generate structurally valid but semantically unusual inputs. This produces inputs that pass schema validation but may trigger unexpected behavior in policy evaluation.
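A toy grammar-based generator over a small JSON-schema subset; the schema shown is illustrative, not the actual envelope schema:

```python
import random
import string

# Toy JSON-schema subset for illustration; the real envelope schema is richer.
SCHEMA = {
    "type": "object",
    "properties": {
        "action": {"type": "string"},
        "count": {"type": "integer"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["action"],
}

def generate(schema, rng):
    """Generate a value that satisfies `schema` but leans on unusual
    corners: empty strings, extreme integers, very long arrays."""
    t = schema["type"]
    if t == "object":
        return {k: generate(s, rng) for k, s in schema["properties"].items()}
    if t == "string":
        return rng.choice(["", "\u0000", " " * 100,
                           "".join(rng.choices(string.printable, k=20))])
    if t == "integer":
        return rng.choice([0, -1, 2**63 - 1, -(2**63)])
    if t == "array":
        return [generate(schema["items"], rng)
                for _ in range(rng.choice([0, 1, 1000]))]
    raise ValueError(f"unsupported type: {t}")
```

Every output passes type-level schema validation, so failures it triggers are in policy evaluation logic rather than in the schema validator.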

Oracles

A fuzzer needs an oracle that determines whether a test passed or failed. Common oracles for safety testing:

  • No crash: The system should never crash regardless of input
  • Valid response: Every input should produce a valid decision (allow, deny, or error), never an undefined state
  • Fail-closed: Malformed inputs should result in deny, not allow
  • No information leak: Error responses should not contain stack traces, internal paths, or configuration details
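The four oracles above can be encoded as a single check per fuzz result; the decision values and response shape here are assumptions for illustration:

```python
def check_oracles(decision, raw_response, input_was_malformed):
    """Evaluate one fuzz result against the oracles.
    Returns the list of violated oracle names (empty means pass).
    `decision` is None if the system crashed or never responded."""
    violations = []
    if decision is None:
        violations.append("no-crash")            # system never answered
    elif decision not in ("allow", "deny", "error"):
        violations.append("valid-response")      # undefined decision state
    elif input_was_malformed and decision == "allow":
        violations.append("fail-closed")         # malformed input was allowed
    leak_markers = ("Traceback", "/etc/", "stack trace")
    if any(m in (raw_response or "") for m in leak_markers):
        violations.append("no-information-leak")
    return violations
```

The fuzz harness runs this after every input and records any input whose violation list is non-empty.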

Integration with CI

Run a short fuzz campaign (1000 to 10000 inputs) on every CI build. Run longer campaigns (millions of inputs) periodically or before releases. Save any inputs that cause failures and add them to the regression test suite.

# Run 10000 random envelopes through policy evaluation
authensor fuzz policy --count 10000 --policy ./policies/production.yaml

Findings

Common fuzz findings in safety systems include: null pointer errors on missing fields, integer overflow in condition evaluation, regex catastrophic backtracking in content scanners, and JSON parsing errors on deeply nested inputs.
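For the deeply nested JSON case, one mitigation is a pre-parse depth guard that rejects suspicious input before it reaches a recursive parser. This is a sketch, and deliberately conservative: it also counts brackets inside string literals, which is acceptable for a fail-closed check:

```python
import json

def safe_parse(payload, max_depth=64):
    """Reject deeply nested JSON before handing it to the parser,
    avoiding RecursionError on attacker-controlled input.
    Conservative: brackets inside string literals also count."""
    depth = 0
    for ch in payload:
        if ch in "[{":
            depth += 1
            if depth > max_depth:
                return None          # fail closed on suspicious nesting
        elif ch in "]}":
            depth -= 1
    try:
        return json.loads(payload)
    except (json.JSONDecodeError, RecursionError):
        return None
```

Returning `None` (rather than raising) keeps the caller on the deny path, consistent with the fail-closed oracle.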

Every crash found by a fuzzer is a crash that an attacker could have found first.
