A safety policy is a set of rules that controls what your AI agent is allowed to do. Each rule matches a tool call pattern and assigns an action: allow, block, or escalate. The policy engine evaluates rules synchronously and deterministically. No LLM is involved in the decision.
Policies are YAML files with a version and a list of rules:
version: "1"
rules:
- tool: "database.query"
action: allow
when:
args.query:
startsWith: "SELECT"
reason: "Read-only queries are safe"
- tool: "database.query"
action: block
when:
args.query:
matches: "DROP|TRUNCATE|DELETE"
reason: "Destructive queries are blocked"
- tool: "database.query"
action: escalate
reason: "Other database queries need approval"
Rules are evaluated top to bottom. The first matching rule wins. If no rule matches, the default action is block. This is fail-closed design: anything not explicitly permitted is denied.
Place your most specific rules first and your most general rules last:
rules:
# Specific: block destructive patterns
- tool: "shell.execute"
action: block
when:
args.command:
matches: "rm -rf|mkfs|format"
# General: allow other shell commands
- tool: "shell.execute"
action: allow
The when block supports several matchers:
| Matcher | Description |
|---------|-------------|
| equals | Exact string match |
| startsWith | Prefix match |
| endsWith | Suffix match |
| matches | Regular expression match |
| contains | Substring match |
| gt, lt, gte, lte | Numeric comparisons |
Use * to match multiple tools:
- tool: "*.delete"
action: escalate
reason: "All delete operations require approval"
- tool: "file.*"
action: allow
Limit spending with numeric conditions:
- tool: "payment.send"
action: allow
when:
args.amount:
lte: 100
- tool: "payment.send"
action: escalate
reason: "Payments over $100 need approval"
Before deploying, test your policy against known inputs:
const guard = createGuard({ policyPath: './policy.yaml' });
// Verify your rules work as expected
assert(guard('database.query', { query: 'SELECT * FROM users' }).action === 'allow');
assert(guard('database.query', { query: 'DROP TABLE users' }).action === 'block');
assert(guard('database.query', { query: 'UPDATE users SET name="x"' }).action === 'escalate');
Write tests for every rule. Safety policies are critical infrastructure and should be treated with the same rigor as access control lists.
Explore more guides on AI agent safety, prompt injection, and building secure systems.
View All Guides