AI Safety Review Process Template

Authensor

A safety review is a structured evaluation of an AI agent's risk profile and the controls in place to mitigate those risks. Unlike ad hoc testing, a safety review follows a repeatable process that produces documented evidence of due diligence.

Review Triggers

Conduct a safety review when:

  • A new agent is being deployed for the first time
  • An existing agent's capabilities are expanded (new tools, broader permissions)
  • The underlying model is changed or updated
  • A safety incident has occurred
  • Regulatory requirements change
  • Six months have passed since the last review
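The trigger list above can be encoded as a simple check that runs on a schedule or in a deployment pipeline. The following is a minimal sketch, not a prescribed implementation: the `AgentStatus` fields and the `review_triggers` function are hypothetical names, and "six months" is approximated as 182 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: "six months" is approximated as 182 days.
REVIEW_INTERVAL = timedelta(days=182)


@dataclass
class AgentStatus:
    """Hypothetical snapshot of the facts that can trigger a review."""
    first_deployment: bool = False
    capabilities_expanded: bool = False
    model_changed: bool = False
    incident_occurred: bool = False
    regulation_changed: bool = False
    last_review: Optional[date] = None  # None means never reviewed


def review_triggers(status: AgentStatus, today: date) -> list[str]:
    """Return every trigger that currently applies (empty list = no review due)."""
    reasons = []
    if status.first_deployment:
        reasons.append("new deployment")
    if status.capabilities_expanded:
        reasons.append("capabilities expanded")
    if status.model_changed:
        reasons.append("model changed")
    if status.incident_occurred:
        reasons.append("safety incident")
    if status.regulation_changed:
        reasons.append("regulatory change")
    # An agent that has never been reviewed is treated as overdue.
    if status.last_review is None or today - status.last_review >= REVIEW_INTERVAL:
        reasons.append("review interval elapsed")
    return reasons
```

Returning the list of reasons, rather than a single boolean, lets the review record state why the review was opened.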

Phase 1: Scope Definition

Document the agent's purpose, capabilities, and deployment context.

  • What is the agent designed to do?
  • What tools does it have access to?
  • What data can it read and modify?
  • Who are its users?
  • What is the worst plausible outcome of a failure?
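One way to keep scope answers consistent across reviews is to capture them in a structured record and flag anything left blank. The sketch below is illustrative only; the `AgentScope` fields and the `unanswered` helper are hypothetical names that mirror the questions above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class AgentScope:
    """Hypothetical scope record; one field per scoping question."""
    purpose: str = ""
    tools: list[str] = field(default_factory=list)
    data_read: list[str] = field(default_factory=list)
    data_modified: list[str] = field(default_factory=list)
    users: str = ""
    worst_plausible_outcome: str = ""


def unanswered(scope: AgentScope) -> list[str]:
    """Return the names of fields that are still empty.

    Limitation: an empty list counts as unanswered, so an agent that
    genuinely modifies no data should record that explicitly, e.g. ["none"].
    """
    return [f.name for f in fields(scope) if not getattr(scope, f.name)]


scope = AgentScope(
    purpose="Answer billing questions",
    tools=["lookup_invoice"],
    data_read=["invoices"],
    users="support staff",
)
print(unanswered(scope))
```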

Phase 2: Risk Assessment

For each identified risk, document likelihood, impact, and existing mitigations.

  • Prompt injection: Can untrusted input reach the agent?
  • Data exposure: Does the agent handle sensitive information?
  • Unauthorized actions: Could the agent take actions beyond its intended permissions, and how harmful would they be?
  • Availability: What happens if the agent fails or becomes unavailable?
  • Regulatory: Does the deployment trigger compliance obligations?
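A small risk register makes the likelihood, impact, and mitigation fields comparable across risks. The sketch below assumes a 1 to 5 scale for both likelihood and impact and a score equal to their product. This is a common convention but not one the template mandates, and the `Risk` and `prioritize` names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)
    mitigations: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-5 scale.
        for label, value in (("likelihood", self.likelihood), ("impact", self.impact)):
            if not 1 <= value <= 5:
                raise ValueError(f"{label} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        """Likelihood times impact; higher means more urgent."""
        return self.likelihood * self.impact


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Return risks sorted from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


register = [
    Risk("Prompt injection", likelihood=4, impact=4, mitigations=["content scanning"]),
    Risk("Availability", likelihood=2, impact=3),
    Risk("Data exposure", likelihood=3, impact=5, mitigations=["field redaction"]),
]
for risk in prioritize(register):
    print(risk.score, risk.name, risk.mitigations or "NO MITIGATION RECORDED")
```

Risks with a high score and an empty mitigation list are the natural starting point for Phase 3.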

Phase 3: Control Validation

Verify that controls are implemented and functioning.

  • Test the policy engine with both allowed and denied actions
  • Verify content scanning catches known attack patterns
  • Confirm approval workflows route to the correct approvers
  • Validate that the audit trail records all actions with correct hashes
  • Test the kill switch and measure how long it takes to halt the agent
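The audit trail check can be automated if the trail is hash-chained. The sketch below assumes each entry stores the SHA-256 hash of the previous entry's hash concatenated with its own canonical JSON payload. That layout is an assumption made for illustration, not a description of any particular product's format, and the entry field names are hypothetical.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def entry_hash(prev_hash: str, action: dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the action."""
    payload = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(trail: list, action: dict) -> None:
    """Append an action, linking it to the previous entry's hash."""
    prev = trail[-1]["hash"] if trail else GENESIS
    trail.append({"action": action, "prev_hash": prev, "hash": entry_hash(prev, action)})


def verify_trail(trail: list) -> Optional[int]:
    """Return the index of the first invalid entry, or None if the chain is intact."""
    prev = GENESIS
    for index, entry in enumerate(trail):
        expected = entry_hash(prev, entry["action"])
        if entry["prev_hash"] != prev or entry["hash"] != expected:
            return index
        prev = entry["hash"]
    return None


trail: list = []
append_entry(trail, {"tool": "read_file", "decision": "allow"})
append_entry(trail, {"tool": "delete_file", "decision": "deny"})
print(verify_trail(trail))  # None while the trail is untouched

trail[1]["action"]["decision"] = "allow"  # simulate tampering
print(verify_trail(trail))  # index of the altered entry
```

During a review, running the verifier against a copy of the trail with a deliberately altered entry confirms that tampering is actually detected, not just that valid trails pass.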

Phase 4: Red Team Exercise

Conduct targeted adversarial testing.

  • Attempt prompt injection through all input channels
  • Test parameter manipulation on restricted tools
  • Verify that denied actions remain denied under varied conditions, such as rephrased requests or altered parameters
  • Attempt to bypass approval workflows
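Parameter manipulation tests lend themselves to a small harness: list the requests that must be denied, run each through the policy check, and report any that get through. In the sketch below both policy functions are hypothetical stand-ins for the engine under test, and the tool names and paths are invented for illustration.

```python
import posixpath
from typing import Callable

Policy = Callable[[str, dict], bool]  # returns True when the action is allowed


def naive_policy(tool: str, params: dict) -> bool:
    """Stand-in policy with a deliberate flaw: it checks the raw path prefix only."""
    return tool == "read_file" and str(params.get("path", "")).startswith("/srv/public/")


def example_policy(tool: str, params: dict) -> bool:
    """Stand-in policy that normalizes the path before checking the prefix."""
    if tool != "read_file":
        return False
    normalized = posixpath.normpath(str(params.get("path", "")))
    return normalized.startswith("/srv/public/")


# Requests that must always be denied, including manipulated parameters.
DENIED_CASES = [
    ("delete_file", {"path": "/srv/public/a.txt"}),            # restricted tool
    ("read_file", {"path": "/etc/passwd"}),                    # outside allowed root
    ("read_file", {"path": "/srv/public/../private/key.pem"}), # path traversal
    ("read_file", {"path": "/srv/public/../../etc/passwd"}),   # deeper traversal
    ("READ_FILE", {"path": "/srv/public/a.txt"}),              # tool name case change
    ("read_file", {}),                                         # missing parameter
]


def find_bypasses(policy: Policy, denied_cases: list) -> list:
    """Return every case the policy allowed even though it should be denied."""
    return [(tool, params) for tool, params in denied_cases if policy(tool, params)]


print(find_bypasses(example_policy, DENIED_CASES))  # no bypasses expected
print(find_bypasses(naive_policy, DENIED_CASES))    # the two traversal cases
```

Keeping the denied cases in a list makes each red team finding a permanent regression test: every new bypass discovered is appended and rechecked at the next review.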

Phase 5: Documentation and Sign-Off

  • Document all findings, including both passing and failing checks
  • Record remediation actions for any identified gaps
  • Obtain sign-off from the safety reviewer and the agent owner
  • Schedule the next review date
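The sign-off steps above can be captured in a machine-readable record, so that an incomplete review is easy to detect. This is a sketch under stated assumptions: the `Finding` and `ReviewRecord` structures, the role names, and the 182-day interval standing in for "six months" are all hypothetical choices.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import date, timedelta

REQUIRED_ROLES = ("safety_reviewer", "agent_owner")  # assumed role keys
REVIEW_INTERVAL = timedelta(days=182)                # assumed "six months"


@dataclass
class Finding:
    check: str
    passed: bool
    remediation: str = ""  # required when the check failed


@dataclass
class ReviewRecord:
    agent: str
    review_date: date
    findings: list[Finding] = field(default_factory=list)
    signoffs: dict[str, str] = field(default_factory=dict)  # role -> signer name


def is_complete(record: ReviewRecord) -> bool:
    """Complete means every failed check has a remediation and both roles signed."""
    remediated = all(f.passed or f.remediation for f in record.findings)
    signed = all(record.signoffs.get(role) for role in REQUIRED_ROLES)
    return remediated and signed


def next_review_date(record: ReviewRecord) -> date:
    return record.review_date + REVIEW_INTERVAL


def to_json(record: ReviewRecord) -> str:
    """Serialize the record, including the scheduled next review date."""
    data = asdict(record)
    data["review_date"] = record.review_date.isoformat()
    data["next_review"] = next_review_date(record).isoformat()
    return json.dumps(data, indent=2)


record = ReviewRecord(
    agent="support-bot",
    review_date=date(2024, 1, 15),
    findings=[
        Finding("kill switch halts agent", passed=True),
        Finding("approval routing", passed=False, remediation="fix routing rule"),
    ],
    signoffs={"safety_reviewer": "A. Reviewer", "agent_owner": "B. Owner"},
)
print(is_complete(record))
print(to_json(record))
```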

Store review artifacts alongside the agent's audit trail. They form part of the compliance record and demonstrate organizational diligence to regulators.
