A safety policy is only useful if it is clear enough to be evaluated consistently, specific enough to catch real threats, and simple enough to maintain over time. Many teams write policies that look comprehensive on paper but fail in practice because they are ambiguous, overly broad, or impossible to test.
The safest starting point is a default-deny policy. No action is allowed unless a rule explicitly permits it. This is more work upfront because you must create allow rules for every legitimate action, but it eliminates the class of vulnerabilities where an attacker finds an action that no rule covers.
default_effect: "deny"
rules:
  - action: "search.web"
    effect: "allow"
  - action: "file.read"
    resources: ["public/*"]
    effect: "allow"
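To make the fall-through behavior concrete, here is a minimal Python sketch of how a default-deny policy like the one above might be evaluated. It is an illustration under stated assumptions, not Authensor's implementation: the `evaluate` function, the dictionary layout, and the first-match-wins ordering are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of the policy above; field names mirror the YAML.
POLICY = {
    "default_effect": "deny",
    "rules": [
        {"action": "search.web", "effect": "allow"},
        {"action": "file.read", "resources": ["public/*"], "effect": "allow"},
    ],
}


def evaluate(policy, action, resource=None):
    """Return the effect of the first matching rule, else the default."""
    for rule in policy["rules"]:
        if rule["action"] != action:
            continue
        patterns = rule.get("resources")
        # A rule with no "resources" key applies to the action regardless of target.
        if patterns is None or (
            resource is not None
            and any(fnmatchcase(resource, p) for p in patterns)
        ):
            return rule["effect"]
    # Nothing matched: fall through to the default, which is "deny" here.
    return policy["default_effect"]
```

An action that no rule mentions, such as a hypothetical `file.delete`, never reaches an allow rule and so receives the default effect.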
A rule that allows "file.read" on all resources is almost as dangerous as having no policy at all, because it grants the action everywhere. Specify which resources each action can target. Use path patterns to scope access to specific directories, tables, or API endpoints.
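Path patterns only scope access if the path is normalized before matching. The sketch below shows one possible way to do that in Python; `resource_allowed` is a hypothetical helper, and the glob semantics come from the standard library's `fnmatch`, which may differ from the pattern syntax your policy engine actually uses.

```python
import posixpath
from fnmatch import fnmatchcase


def resource_allowed(resource, patterns):
    """Check a resource path against allow patterns after normalizing it."""
    # Collapse "." and ".." segments so "public/../secrets" cannot slip through.
    normalized = posixpath.normpath(resource)
    # Reject absolute paths and anything that still escapes upward.
    if (
        normalized.startswith("/")
        or normalized == ".."
        or normalized.startswith("../")
    ):
        return False
    return any(fnmatchcase(normalized, pattern) for pattern in patterns)
```

Note that `fnmatch`'s `*` also matches `/`, so in this sketch `public/*` covers nested paths such as `public/docs/a.txt`. If a rule should cover only one directory level, it needs a stricter matcher.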
Static allow/deny rules cannot capture contextual requirements. Conditions let you add runtime checks: whether the request falls within business hours, whether the total cost is below a threshold, or whether the user has authenticated recently.
rules:
  - action: "payment.send"
    conditions:
      amount_max: 100
      require_recent_auth: true
    effect: "allow"
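As a rough illustration of how such conditions could be checked at runtime, consider the Python sketch below. Everything beyond the two condition names is an assumption: the `conditions_met` function, the request fields `amount` and `last_auth_at`, the inclusive treatment of `amount_max`, and the five-minute window for "recent" are hypothetical choices, not documented behavior.

```python
from datetime import datetime, timedelta, timezone

# Assumed window for "recent" authentication; the source does not specify one.
RECENT_AUTH_WINDOW = timedelta(minutes=5)


def conditions_met(conditions, request, now=None):
    """Return True only if every condition present on the rule is satisfied."""
    now = now or datetime.now(timezone.utc)

    amount_max = conditions.get("amount_max")
    # A request with no amount is treated as exceeding the limit (fail closed).
    if amount_max is not None and request.get("amount", float("inf")) > amount_max:
        return False

    if conditions.get("require_recent_auth"):
        last_auth = request.get("last_auth_at")
        # Missing or stale authentication fails the check.
        if last_auth is None or now - last_auth > RECENT_AUTH_WINDOW:
            return False

    return True
```

The sketch fails closed: if the request lacks the data a condition needs, the condition is treated as unmet, which matches the default-deny stance described earlier.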
Every rule should have a corresponding test case that verifies it works as intended. If you cannot write a test for a rule, the rule is probably too vague. Authensor's test harness lets you evaluate policies against sample envelopes without running a full deployment.
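One way to keep rules and tests paired is a table of sample envelopes with expected effects. The Python sketch below is a generic stand-in, not Authensor's test harness: the `evaluate` and `run_cases` functions and the envelope fields `action` and `resource` are hypothetical.

```python
from fnmatch import fnmatchcase

POLICY = {
    "default_effect": "deny",
    "rules": [
        {"action": "search.web", "effect": "allow"},
        {"action": "file.read", "resources": ["public/*"], "effect": "allow"},
    ],
}


def evaluate(policy, envelope):
    """First matching rule wins; otherwise the default effect applies."""
    for rule in policy["rules"]:
        if rule["action"] != envelope["action"]:
            continue
        patterns = rule.get("resources")
        resource = envelope.get("resource")
        if patterns is None or (
            resource is not None
            and any(fnmatchcase(resource, p) for p in patterns)
        ):
            return rule["effect"]
    return policy["default_effect"]


# One positive and one negative case per scoped rule, plus a case for the default.
CASES = [
    ("web search is allowed", {"action": "search.web"}, "allow"),
    ("public file is readable",
     {"action": "file.read", "resource": "public/faq.md"}, "allow"),
    ("private file is not readable",
     {"action": "file.read", "resource": "private/keys.pem"}, "deny"),
    ("uncovered action hits the default",
     {"action": "file.delete", "resource": "public/faq.md"}, "deny"),
]


def run_cases(policy, cases):
    """Return (description, expected, actual) for every case that fails."""
    failures = []
    for description, envelope, expected in cases:
        actual = evaluate(policy, envelope)
        if actual != expected:
            failures.append((description, expected, actual))
    return failures
```

An empty list from `run_cases(POLICY, CASES)` means every case passed. The negative cases matter as much as the positive ones, since they are what catch a rule that matches too broadly.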
Policies are living documents. Review them regularly against actual usage data. Identify rules that never match (dead rules), rules that match too broadly, and gaps where actions are denied that should be allowed. Use shadow evaluation to test policy changes before deploying them.
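Both review steps can be approximated by replaying logged requests. The Python sketch below assumes a log of `(action, resource)` pairs and a first-match-wins evaluator; `match_rule`, `dead_rules`, and `shadow_diff` are hypothetical names, and a real shadow evaluation would run inside the policy engine, not in a script like this.

```python
from collections import Counter
from fnmatch import fnmatchcase


def match_rule(policy, action, resource):
    """Return (index, effect) of the first matching rule, or (None, default)."""
    for index, rule in enumerate(policy["rules"]):
        if rule["action"] != action:
            continue
        patterns = rule.get("resources")
        if patterns is None or (
            resource is not None
            and any(fnmatchcase(resource, p) for p in patterns)
        ):
            return index, rule["effect"]
    return None, policy["default_effect"]


def dead_rules(policy, log):
    """Indices of rules that no logged request ever matched."""
    hits = Counter(match_rule(policy, a, r)[0] for a, r in log)
    return [i for i in range(len(policy["rules"])) if hits[i] == 0]


def shadow_diff(current, candidate, log):
    """Requests whose decision would change under the candidate policy."""
    changes = []
    for action, resource in log:
        before = match_rule(current, action, resource)[1]
        after = match_rule(candidate, action, resource)[1]
        if before != after:
            changes.append((action, resource, before, after))
    return changes
```

Reviewing the output of `shadow_diff` before deploying shows exactly which past requests would have been decided differently, so an unintended allow or deny surfaces before it affects live traffic.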
Good policies are boring. They are specific, testable, and unambiguous. Save the creativity for your agents.