Fuzz testing feeds randomly generated or mutated inputs to a system and observes how it responds. For AI safety infrastructure, fuzzing discovers edge cases where the policy engine, content scanner, or input validation behaves unexpectedly. These edge cases often correspond to real attack vectors.
Action envelopes: Generate envelopes with random action types, malformed resource paths, unexpected metadata types, missing required fields, and oversized payloads. Verify the policy engine handles each gracefully without crashing or bypassing safety checks.
Content scanner inputs: Feed the Aegis content scanner randomized strings, encoded payloads, Unicode edge cases, and extremely long inputs. Verify it produces a valid result (clean or flagged) for every input without errors.
API endpoints: Send malformed HTTP requests to the control plane. Verify it returns appropriate error responses without leaking internal state or crashing.
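As a rough illustration of the action envelope target, the sketch below generates envelopes with random action types, malformed resource paths, unexpected metadata types, missing fields, and occasional oversized payloads. The field names (`action`, `resource`, `metadata`, `payload`) and the action type strings are assumptions made for this example, not the real envelope schema.

```python
import json
import random
import string

# Hypothetical action types; the real envelope schema may use different names.
ACTION_TYPES = ["file.read", "file.write", "http.request", "shell.exec"]


def random_text(rng, max_len=64):
    """Return a string mixing ASCII, control characters, and non-ASCII code points."""
    alphabet = string.printable + "\u0000\u200b\u202e\u00e9\u4e2d\U0001f600"
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))


def random_envelope(rng):
    """Build one envelope that is deliberately malformed in random ways."""
    envelope = {
        "action": rng.choice(ACTION_TYPES + [random_text(rng), "", None, 42]),
        "resource": rng.choice(
            ["/tmp/report.txt", "../../etc/passwd", "//\\\\", random_text(rng)]
        ),
        "metadata": rng.choice(
            [{}, [], "not-an-object", 3.14, {"nested": {"k": random_text(rng)}}]
        ),
    }
    # Randomly drop one field to simulate a missing required field.
    if rng.random() < 0.3:
        envelope.pop(rng.choice(list(envelope)))
    # Occasionally attach an oversized payload (about 1 MB of text).
    if rng.random() < 0.1:
        envelope["payload"] = "A" * 1_000_000
    return envelope


def generate(count, seed=0):
    """Yield `count` envelopes; a fixed seed makes a failing run reproducible."""
    rng = random.Random(seed)
    for _ in range(count):
        yield random_envelope(rng)


if __name__ == "__main__":
    for env in generate(3, seed=1):
        print(json.dumps(env)[:120])
```

Seeding the generator is a deliberate choice: when an envelope triggers a failure, the same seed and count reproduce it exactly.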
Random generation: Generate completely random inputs within the schema structure. Good for finding crashes and unhandled exceptions.
Mutation-based: Start with valid inputs and mutate them: change types, truncate fields, swap values, inject special characters. Good for finding logic errors near valid input boundaries.
Grammar-based: Use the action envelope JSON schema to generate structurally valid but semantically unusual inputs. This produces inputs that pass schema validation but may trigger unexpected behavior in policy evaluation.
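The mutation-based strategy can be sketched in a few lines. This example assumes a hypothetical seed envelope with `action`, `resource`, and `metadata` fields and applies the four mutations named above: type changes, truncation, value swaps, and special character injection. It is an illustration of the technique, not the fuzzer that ships with any particular tool.

```python
import copy
import random


def change_type(value, rng):
    """Replace a value with a boundary value, usually of a different JSON type."""
    return rng.choice([None, 0, -1, 2**63, "", [], {}, True, str(value)])


def truncate(value, rng):
    """Cut a string at a random position; leave other types unchanged."""
    if isinstance(value, str) and value:
        return value[: rng.randrange(len(value))]
    return value


def inject(value, rng):
    """Append characters that often confuse parsers and path handling."""
    special = ["\x00", "../", "%00", "\u202e", "'", '"', "\n", "${x}"]
    return f"{value}{rng.choice(special)}"


MUTATORS = [change_type, truncate, inject]


def mutate(envelope, rng):
    """Return a copy of `envelope` with one field mutated or two values swapped."""
    mutated = copy.deepcopy(envelope)
    keys = list(mutated)
    if not keys:
        return mutated
    if len(keys) >= 2 and rng.random() < 0.25:
        # Swap the values of two fields.
        a, b = rng.sample(keys, 2)
        mutated[a], mutated[b] = mutated[b], mutated[a]
    else:
        # Apply one randomly chosen mutator to one randomly chosen field.
        key = rng.choice(keys)
        mutated[key] = rng.choice(MUTATORS)(mutated[key], rng)
    return mutated


# Hypothetical valid envelope used as the starting point.
seed_envelope = {
    "action": "file.read",
    "resource": "/tmp/report.txt",
    "metadata": {"user": "alice"},
}
rng = random.Random(0)
variants = [mutate(seed_envelope, rng) for _ in range(5)]
```

Each call changes only one thing, so the variants stay close to valid input, which is where boundary logic errors tend to surface. Calling `mutate` repeatedly on its own output produces inputs that drift further from the seed.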
A fuzzer needs an oracle that determines whether a test passed or failed. Common oracles for safety testing:
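One way to express an oracle in code is a function that runs the target on a single input and returns a list of violations. The sketch below assumes a hypothetical `evaluate(envelope)` callable that returns a dict with a `decision` key; the decision names and required fields are also assumptions. Its three checks mirror the expectations stated earlier: no crash, a valid result for every input, and no bypass of safety checks on malformed input.

```python
ALLOWED_DECISIONS = {"allow", "deny", "escalate"}  # assumed decision names
REQUIRED_FIELDS = ("action", "resource")  # assumed required fields


def check(evaluate, envelope):
    """Run `evaluate` on one envelope and return a list of oracle violations."""
    try:
        result = evaluate(envelope)
    except Exception as exc:  # any unhandled exception counts as a crash
        return [f"crash: {type(exc).__name__}: {exc}"]

    violations = []
    decision = result.get("decision") if isinstance(result, dict) else None
    # Oracle 1: the engine must always return a recognized decision.
    if not isinstance(decision, str) or decision not in ALLOWED_DECISIONS:
        violations.append(f"invalid result: {result!r}")
    # Oracle 2: a malformed envelope must never be allowed (fail closed).
    malformed = not isinstance(envelope, dict) or any(
        field not in envelope for field in REQUIRED_FIELDS
    )
    if malformed and decision == "allow":
        violations.append("fail-open: malformed envelope was allowed")
    return violations


def toy_evaluate(envelope):
    """Stand-in policy engine with a deliberate bug: allows when 'action' is missing."""
    if "action" not in envelope:
        return {"decision": "allow"}
    return {"decision": "deny"}


if __name__ == "__main__":
    print(check(toy_evaluate, {"resource": "/tmp/report.txt"}))
    print(check(toy_evaluate, {"action": "file.read", "resource": "/tmp/report.txt"}))
```

An empty list means the input passed. Returning every violation, instead of stopping at the first, makes it easier to triage a failing input later.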
Run a short fuzz campaign (1000 to 10000 inputs) on every CI build. Run longer campaigns (millions of inputs) periodically or before releases. Save any inputs that cause failures and add them to the regression test suite.
# Run 10000 random envelopes through policy evaluation
authensor fuzz policy --count 10000 --policy ./policies/production.yaml
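Saving failing inputs for regression testing can be as simple as writing each one to a corpus directory. The following is a minimal sketch that assumes envelopes are JSON-serializable dicts with string keys; the function names and the directory layout are hypothetical and not part of the `authensor` CLI.

```python
import hashlib
import json
from pathlib import Path


def save_failure(envelope, corpus_dir):
    """Write a failing input to the corpus, named by content hash so duplicates collapse."""
    corpus = Path(corpus_dir)
    corpus.mkdir(parents=True, exist_ok=True)
    data = json.dumps(envelope, sort_keys=True, ensure_ascii=True)
    name = hashlib.sha256(data.encode("utf-8")).hexdigest()[:16] + ".json"
    path = corpus / name
    path.write_text(data, encoding="utf-8")
    return path


def load_corpus(corpus_dir):
    """Return every saved input, in a stable order, for replay in regression tests."""
    files = sorted(Path(corpus_dir).glob("*.json"))
    return [json.loads(p.read_text(encoding="utf-8")) for p in files]
```

A regression test can then iterate over `load_corpus(...)` and assert that each saved input no longer triggers a failure. Naming files by content hash keeps the corpus free of duplicates when the same input fails in several campaigns.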
Common fuzz findings in safety systems include null pointer errors on missing fields, integer overflow in condition evaluation, catastrophic regex backtracking in content scanners, and JSON parsing errors on deeply nested inputs.
Every crash found by a fuzzer is a crash that an attacker could have found first.