
How to create your first safety policy for AI agents

Authensor

A safety policy is a set of rules that controls what your AI agent is allowed to do. Each rule matches a tool call pattern and assigns an action: allow, block, or escalate. The policy engine evaluates rules synchronously and deterministically. No LLM is involved in the decision.

Policy file structure

Policies are YAML files with a version and a list of rules:

version: "1"
rules:
  - tool: "database.query"
    action: allow
    when:
      args.query:
        startsWith: "SELECT"
    reason: "Read-only queries are safe"

  - tool: "database.query"
    action: block
    when:
      args.query:
        matches: "DROP|TRUNCATE|DELETE"
    reason: "Destructive queries are blocked"

  - tool: "database.query"
    action: escalate
    reason: "Other database queries need approval"

Rule evaluation order

Rules are evaluated top to bottom. The first matching rule wins. If no rule matches, the default action is block. This is fail-closed design: anything not explicitly permitted is denied.
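The first-match-wins, fail-closed behavior can be sketched in a few lines of TypeScript. This is an illustrative model, not the engine's actual internals; the `Rule` shape and the `matches` predicate are assumptions for the sake of the sketch:

```typescript
type Action = "allow" | "block" | "escalate";

// Illustrative rule shape: each rule carries a predicate that decides
// whether it applies to a given tool call.
interface Rule {
  tool: string;
  action: Action;
  reason?: string;
  matches: (tool: string, args: Record<string, unknown>) => boolean;
}

// First matching rule wins; if nothing matches, fall through to
// "block" — the fail-closed default.
function evaluate(
  rules: Rule[],
  tool: string,
  args: Record<string, unknown>
): Action {
  for (const rule of rules) {
    if (rule.matches(tool, args)) return rule.action;
  }
  return "block";
}
```

Because evaluation stops at the first match, a misordered rule list can silently shadow later rules, which is why specificity ordering matters.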

Place your most specific rules first and your most general rules last:

rules:
  # Specific: block destructive patterns
  - tool: "shell.execute"
    action: block
    when:
      args.command:
        matches: "rm -rf|mkfs|format"

  # General: allow other shell commands
  - tool: "shell.execute"
    action: allow

Matching conditions

The when block supports several matchers:

| Matcher | Description |
|---------|-------------|
| equals | Exact string match |
| startsWith | Prefix match |
| endsWith | Suffix match |
| matches | Regular expression match |
| contains | Substring match |
| gt, lt, gte, lte | Numeric comparisons |
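Each matcher boils down to a simple predicate over the argument value. A sketch of plausible implementations in TypeScript (these are assumptions about behavior, not the engine's actual code; the string matchers assume string values and the comparisons assume numbers):

```typescript
// Illustrative matcher predicates: (actual value, expected value) → boolean.
const matchers: Record<string, (value: unknown, expected: any) => boolean> = {
  equals:     (v, e) => v === e,
  startsWith: (v, e) => typeof v === "string" && v.startsWith(e),
  endsWith:   (v, e) => typeof v === "string" && v.endsWith(e),
  matches:    (v, e) => typeof v === "string" && new RegExp(e).test(v),
  contains:   (v, e) => typeof v === "string" && v.includes(e),
  gt:  (v, e) => typeof v === "number" && v > e,
  lt:  (v, e) => typeof v === "number" && v < e,
  gte: (v, e) => typeof v === "number" && v >= e,
  lte: (v, e) => typeof v === "number" && v <= e,
};
```

Note that `matches` tests for a match anywhere in the string unless you anchor the pattern with `^` and `$`.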

Wildcard tools

Use * to match multiple tools:

- tool: "*.delete"
  action: escalate
  reason: "All delete operations require approval"

- tool: "file.*"
  action: allow
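One common way to implement this kind of pattern is to escape regex metacharacters and expand `*` into `.*`. A sketch, assuming the engine treats `*` as "match any sequence of characters" (the `toolMatches` helper is hypothetical):

```typescript
// Illustrative: turn a tool pattern like "*.delete" or "file.*" into an
// anchored regex. Escape metacharacters first so "." matches literally,
// then expand the escaped "*" into ".*".
function toolMatches(pattern: string, tool: string): boolean {
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const regex = new RegExp("^" + escaped.replace(/\\\*/g, ".*") + "$");
  return regex.test(tool);
}
```

Anchoring with `^` and `$` matters: without it, a pattern like `file.*` could also match a tool named `profile.read`.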

Budget constraints

Limit spending with numeric conditions. Order matters here too: payments at or under $100 match the first rule, and everything else falls through to the unconditional escalate rule:

- tool: "payment.send"
  action: allow
  when:
    args.amount:
      lte: 100
- tool: "payment.send"
  action: escalate
  reason: "Payments over $100 need approval"

Testing your policy

Before deploying, test your policy against known inputs:

const guard = createGuard({ policyPath: './policy.yaml' });

// Verify your rules work as expected
assert(guard('database.query', { query: 'SELECT * FROM users' }).action === 'allow');
assert(guard('database.query', { query: 'DROP TABLE users' }).action === 'block');
assert(guard('database.query', { query: 'UPDATE users SET name="x"' }).action === 'escalate');

Write tests for every rule. Safety policies are critical infrastructure and should be treated with the same rigor as access control lists.
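A table-driven loop keeps those assertions easy to extend as rules grow. In this self-contained sketch, `guard` is stubbed to mirror the database policy above; in practice it would come from `createGuard` as shown earlier:

```typescript
type Decision = { action: "allow" | "block" | "escalate" };

// Stub mirroring the example database.query policy, so the sketch runs
// on its own. Replace with the real guard from createGuard in practice.
const guard = (tool: string, args: { query: string }): Decision => {
  if (args.query.startsWith("SELECT")) return { action: "allow" };
  if (/DROP|TRUNCATE|DELETE/.test(args.query)) return { action: "block" };
  return { action: "escalate" };
};

// One row per rule: arguments and the decision they should produce.
const cases: Array<[{ query: string }, Decision["action"]]> = [
  [{ query: "SELECT * FROM users" }, "allow"],
  [{ query: "DROP TABLE users" }, "block"],
  [{ query: "UPDATE users SET name='x'" }, "escalate"],
];

for (const [args, expected] of cases) {
  const { action } = guard("database.query", args);
  if (action !== expected) {
    throw new Error(`query ${args.query}: got ${action}, expected ${expected}`);
  }
}
```

Adding a rule to the policy then means adding a row to the table, which makes untested rules easy to spot in review.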

Keep learning

Explore more guides on AI agent safety, prompt injection, and building secure systems.
