There is no single correct architecture for AI agent safety. The right pattern depends on your deployment model, number of agents, compliance requirements, and operational maturity. This guide covers five common patterns and recommends where to start.
Pattern 1 (embedded guard): The safety layer runs in the same process as the agent. Every tool call passes through the guard function before execution.
[Agent Process]
  Agent Logic → Guard → Tool Execution
                  ↓
        Receipts (in-memory or external)
When to use: Single-agent systems, local development, edge deployments where network calls are undesirable.
Advantages: No network latency. No additional infrastructure. Simple to deploy.
Disadvantages: No centralized policy management. Each agent instance manages its own state.
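The embedded pattern can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple allowlist policy; the names `guard`, `guarded_call`, `ALLOWED_TOOLS`, and the receipt fields are hypothetical and not taken from any particular library.

```python
import time
from typing import Any, Callable

# Hypothetical policy: the tool names this agent may call.
ALLOWED_TOOLS = {"read_file", "search"}

# In-memory receipt log; a real deployment might write to external storage.
receipts: list[dict[str, Any]] = []


def guard(tool_name: str, args: dict[str, Any]) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed tool call."""
    if tool_name not in ALLOWED_TOOLS:
        return False, f"tool '{tool_name}' is not on the allowlist"
    return True, "allowed by allowlist"


def guarded_call(tool_name: str, tool: Callable[..., Any], **args: Any) -> Any:
    """Run the guard, record a receipt, and execute the tool only if allowed."""
    allowed, reason = guard(tool_name, args)
    receipts.append(
        {"tool": tool_name, "args": args, "allowed": allowed,
         "reason": reason, "ts": time.time()}
    )
    if not allowed:
        raise PermissionError(reason)
    return tool(**args)


def search(query: str) -> str:
    """Stand-in tool used only for this example."""
    return f"results for {query}"


print(guarded_call("search", search, query="agent safety"))
```

Because the guard and the receipt log live inside the agent process, each instance carries its own copy of both, which is the state-management trade-off noted above.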
Pattern 2 (sidecar): The safety layer runs as a separate process alongside the agent. The agent calls the sidecar for every tool evaluation.
[Agent Process] → HTTP → [Sidecar Process]
                         Guard + Aegis + Sentinel
                                 ↓
                         Receipts → Database
When to use: Containerized deployments where you want to separate safety from agent logic. Multiple agents on the same host.
Advantages: Safety updates do not require agent redeployment. Language-agnostic (any agent can call the HTTP API).
Disadvantages: Adds a network round trip to every tool call (usually 1-5 ms on localhost).
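From the agent's side, the sidecar call might look like the following Python sketch. The URL, the `/evaluate` route, and the `{"allow": ...}` response shape are assumptions made for illustration; the guide does not specify the sidecar's API. Denying the call when the sidecar is unreachable (failing closed) is also an assumed design choice.

```python
import json
import urllib.request
from typing import Any, Callable

# Assumed sidecar address and route; adjust to whatever your sidecar exposes.
SIDECAR_URL = "http://127.0.0.1:8081/evaluate"


def http_post(url: str, payload: dict[str, Any], timeout: float = 0.5) -> dict[str, Any]:
    """POST JSON to the sidecar and decode its JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_allowed(
    tool: str,
    args: dict[str, Any],
    post: Callable[[str, dict[str, Any]], dict[str, Any]] = http_post,
) -> bool:
    """Ask the sidecar for a decision; deny if it cannot be reached or parsed."""
    try:
        decision = post(SIDECAR_URL, {"tool": tool, "args": args})
    except (OSError, ValueError):
        # Connection errors and timeouts are OSError; bad JSON is ValueError.
        return False
    return decision.get("allow") is True
```

The `post` parameter exists so the transport can be swapped out in tests. Since the contract is plain JSON over HTTP, an agent written in any language could make the same call.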
Pattern 3 (gateway): All agents connect to a centralized gateway that mediates all tool calls.
[Agent A] → [Gateway] → [MCP Servers / Tools]
[Agent B] →
[Agent C] →
When to use: Multi-agent systems. Organization-wide policy enforcement. Compliance requirements for centralized logging.
Advantages: Single enforcement point. Centralized audit trail. Easier policy management.
Disadvantages: Single point of failure (mitigate with redundancy). All traffic routes through one service, which can become a throughput bottleneck.
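The gateway's role can be illustrated with a small in-process Python sketch. The `Gateway` class, its per-agent policy format, and the lambda tools are hypothetical; a real gateway would sit on the network in front of actual MCP servers.

```python
from typing import Any, Callable


class Gateway:
    """Single enforcement point: checks policy, logs the decision, then forwards."""

    def __init__(self, policy: dict[str, set[str]], tools: dict[str, Callable[..., Any]]):
        self.policy = policy  # agent id -> tool names that agent may call
        self.tools = tools    # tool name -> handler (stands in for an MCP server)
        self.audit_log: list[dict[str, Any]] = []

    def call(self, agent_id: str, tool: str, **args: Any) -> Any:
        allowed = tool in self.tools and tool in self.policy.get(agent_id, set())
        # Every request is logged, whether it is allowed or denied.
        self.audit_log.append({"agent": agent_id, "tool": tool, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"{agent_id} may not call {tool}")
        return self.tools[tool](**args)


gateway = Gateway(
    policy={"agent-a": {"search"}, "agent-b": {"search", "send_email"}},
    tools={
        "search": lambda query: f"results for {query}",
        "send_email": lambda to, body: f"sent to {to}",
    },
)

print(gateway.call("agent-a", "search", query="mcp"))
```

All agents share one policy table and one audit log, which is what makes policy management and compliance logging easier in this pattern.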
Pattern 4 (layered): Combine multiple patterns for defense in depth:
[Agent] → [Embedded Guard] → [MCP Gateway] → [MCP Server]
            (app rules)       (org rules)    (input validation)
The embedded guard enforces application-specific rules. The gateway enforces organization-wide rules. The MCP server validates inputs at the tool level.
When to use: Production deployments with high security requirements. Regulated environments.
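As a rough Python sketch of layering, each layer below is a check that returns a denial reason or `None`, and the first denial stops the call. The three rules are invented examples, chosen so that each layer catches something the others miss.

```python
from typing import Any, Callable, Optional

# A check returns a denial reason, or None if the call may proceed.
Check = Callable[[str, dict[str, Any]], Optional[str]]


def embedded_guard(tool: str, args: dict[str, Any]) -> Optional[str]:
    """Application rule (hypothetical): this agent only reads files."""
    if tool != "read_file":
        return "app rule: only read_file is permitted"
    return None


def gateway_check(tool: str, args: dict[str, Any]) -> Optional[str]:
    """Organization rule (hypothetical): nothing under /etc may be touched."""
    if str(args.get("path", "")).startswith("/etc/"):
        return "org rule: /etc is off limits"
    return None


def server_validate(tool: str, args: dict[str, Any]) -> Optional[str]:
    """Tool-level input validation: path must be a string with no '..' segments."""
    path = args.get("path")
    if not isinstance(path, str) or ".." in path.split("/"):
        return "input validation: invalid path"
    return None


LAYERS: list[tuple[str, Check]] = [
    ("embedded guard", embedded_guard),
    ("gateway", gateway_check),
    ("mcp server", server_validate),
]


def evaluate(tool: str, args: dict[str, Any]) -> tuple[bool, str]:
    """Run each layer in order and stop at the first denial."""
    for name, check in LAYERS:
        reason = check(tool, args)
        if reason is not None:
            return False, f"{name}: {reason}"
    return True, "passed all layers"


print(evaluate("read_file", {"path": "/srv/../etc/passwd"}))
```

The example path slips past the gateway's prefix rule but is rejected by the server's input validation, which is the point of defense in depth: a gap in one layer is covered by another.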
Pattern 5 (control plane): A centralized control plane manages policies, stores receipts, and handles approvals. Distributed agents connect to the control plane for policy updates and receipt submission.
[Control Plane]
  Policy API   ← [Agent A fetches policy]
  Receipt API  ← [Agent A submits receipts]
  Approval API ← [Agent A checks approval status]
When to use: Large-scale deployments with many agent instances. Teams that need centralized policy management.
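The three APIs and an agent-side client can be sketched in Python as follows. `ControlPlane` here is an in-memory stand-in rather than a real service, and the policy fields, the time-based policy cache, and the `allow`/`deny`/`pending` decision values are all assumptions made for illustration.

```python
import time
from typing import Any, Optional


class ControlPlane:
    """In-memory stand-in for the Policy, Receipt, and Approval APIs."""

    def __init__(self) -> None:
        self.policy = {"allowed_tools": ["search"], "needs_approval": ["send_email"]}
        self.receipts: list[dict[str, Any]] = []
        self.approvals: dict[str, str] = {}  # request id -> "approved" or "denied"

    def get_policy(self) -> dict[str, Any]:
        return dict(self.policy)

    def submit_receipt(self, receipt: dict[str, Any]) -> None:
        self.receipts.append(receipt)

    def approval_status(self, request_id: str) -> str:
        return self.approvals.get(request_id, "pending")


class AgentClient:
    """Agent-side client: caches policy locally and reports every decision."""

    def __init__(self, agent_id: str, plane: ControlPlane, policy_ttl: float = 60.0):
        self.agent_id = agent_id
        self.plane = plane
        self.policy_ttl = policy_ttl
        self._cached: Optional[dict[str, Any]] = None
        self._fetched_at = 0.0

    def _policy(self) -> dict[str, Any]:
        # Refetch only when the cached copy is missing or older than the TTL.
        now = time.monotonic()
        if self._cached is None or now - self._fetched_at > self.policy_ttl:
            self._cached = self.plane.get_policy()
            self._fetched_at = now
        return self._cached

    def decide(self, tool: str, request_id: str) -> str:
        policy = self._policy()
        if tool in policy["allowed_tools"]:
            decision = "allow"
        elif tool in policy["needs_approval"]:
            status = self.plane.approval_status(request_id)
            decision = {"approved": "allow", "denied": "deny"}.get(status, "pending")
        else:
            decision = "deny"
        self.plane.submit_receipt(
            {"agent": self.agent_id, "tool": tool,
             "request_id": request_id, "decision": decision}
        )
        return decision


plane = ControlPlane()
agent = AgentClient("agent-a", plane)
print(agent.decide("search", "req-1"))
```

Caching the policy keeps the control plane off the hot path of each tool call, at the cost of policy changes taking up to `policy_ttl` seconds to reach an agent.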
Start with the embedded guard (Pattern 1). It requires the least infrastructure and covers the core safety requirements. As you scale, add the control plane (Pattern 5) for centralized management. For multi-agent systems with MCP, add the gateway (Pattern 3). For high-security environments, layer them all (Pattern 4).