
Where to place safety checks in AI agent pipelines

Authensor

An AI agent pipeline has multiple stages where safety checks can be placed. Each stage catches different threats. Placing checks at every stage creates defense in depth.

The agent pipeline

User Input → Input Processing → LLM Reasoning → Tool Call → Tool Execution → Response → User Output

Safety checks can be placed at four points:

Stage 1: Input scanning

Where: Before the agent processes user input.
What it catches: Direct prompt injection, PII in user input, malicious payloads.

const input = receiveUserInput();
const scan = aegis.scan(input);
if (scan.threats.length > 0) {
  return "I cannot process that input.";
}
agent.process(input);

Also scan retrieved documents (RAG) at this stage:

const documents = await retrieveDocuments(query);
const safeDocuments = documents.filter(doc => {
  const scan = aegis.scan(doc.content);
  return scan.threats.length === 0;
});

Stage 2: Pre-execution (tool call evaluation)

Where: After the LLM decides to call a tool, before the tool executes.
What it catches: Unauthorized actions, policy violations, tool misuse, budget overruns.

This is the primary enforcement point:

const decision = guard(toolName, args);
if (decision.action !== 'allow') {
  // Do not execute the tool
}

Pre-execution is where the policy engine, rate limiting, budget controls, and approval workflows operate.
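As a minimal sketch of what such a guard might look like, assuming a simple in-memory policy table with hypothetical `maxPerMinute` and `maxCostUsd` fields (a production deployment would back this with a real policy engine and persistent counters):

```javascript
// Hypothetical per-tool policies: calls per minute and a cost cap.
const policies = {
  send_email: { maxPerMinute: 5, maxCostUsd: 0 },
  charge_card: { maxPerMinute: 1, maxCostUsd: 50 },
};

const callLog = []; // timestamps of allowed tool calls

function guard(toolName, args) {
  const policy = policies[toolName];
  if (!policy) {
    // Deny by default: tools without a policy never execute.
    return { action: 'deny', reason: `no policy for ${toolName}` };
  }

  // Rate limiting: count allowed calls to this tool in the last 60s.
  const now = Date.now();
  const recent = callLog.filter(
    (c) => c.tool === toolName && now - c.at < 60_000
  );
  if (recent.length >= policy.maxPerMinute) {
    return { action: 'deny', reason: 'rate limit exceeded' };
  }

  // Budget control: calls above the cost cap escalate to human approval.
  const cost = args.costUsd ?? 0;
  if (cost > policy.maxCostUsd) {
    return { action: 'escalate', reason: 'cost above cap, needs approval' };
  }

  callLog.push({ tool: toolName, at: now });
  return { action: 'allow' };
}
```

The deny-by-default branch matters: an agent that invents a tool name, or calls a tool no one wrote a policy for, should fail closed rather than open.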

Stage 3: Post-execution (response scanning)

Where: After the tool executes, before the result is returned to the agent.
What it catches: Indirect prompt injection in tool responses, sensitive data in tool output.

const result = await executeTool(toolName, args);
const scan = aegis.scan(JSON.stringify(result));
if (scan.threats.length > 0) {
  // Do not pass the response to the agent
  return { error: "Tool response contained unsafe content" };
}
return result;

This is critical for defending against indirect prompt injection. A tool response from a compromised server or a document with embedded instructions is caught here.

Stage 4: Output filtering

Where: Before the agent's response reaches the user or external system.
What it catches: Leaked credentials, PII in responses, harmful content.

const response = await agent.generate();
const filtered = filterOutput(response);
sendToUser(filtered);
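One possible shape for `filterOutput` is pattern-based redaction; the patterns below are illustrative assumptions (a production filter would typically combine patterns with a classifier or DLP service):

```javascript
// Illustrative redaction patterns; extend per deployment.
const SECRET_PATTERNS = [
  { name: 'aws_key', re: /AKIA[0-9A-Z]{16}/g },          // AWS access key IDs
  { name: 'bearer_token', re: /Bearer\s+[A-Za-z0-9\-._~+/]+=*/g },
  { name: 'email', re: /[\w.+-]+@[\w-]+\.[\w.]+/g },      // crude PII match
];

// Replace each match with a labeled placeholder instead of dropping
// the whole response, so the user still sees the rest of the answer.
function filterOutput(text) {
  let filtered = text;
  for (const { name, re } of SECRET_PATTERNS) {
    filtered = filtered.replace(re, `[REDACTED:${name}]`);
  }
  return filtered;
}
```

Redacting in place, rather than blocking the entire response, keeps the agent usable when a leak is incidental rather than the point of the response.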

Which stages are mandatory?

For a minimum viable safety deployment:

  • Stage 2 (pre-execution): Always. This is the core enforcement point.
  • Stage 1 (input scanning): Strongly recommended. Catches attacks before they reach the LLM.

For production:

  • All four stages should have checks.
  • Stage 3 is particularly important for agents that use RAG or connect to external tools.
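Wired together, the four stages form a single request path. The sketch below is a simplified assumption (one tool call per request, dependency-injected stand-ins for `aegis.scan`, `guard`, the tool runner, and the output filter) rather than a definitive implementation:

```javascript
// deps bundles the stage checks so the pipeline shape stays visible:
// { aegisScan, guard, planToolCall, executeTool, filterOutput }
async function handleRequest(input, deps) {
  const { aegisScan, guard, planToolCall, executeTool, filterOutput } = deps;

  // Stage 1: input scanning
  if (aegisScan(input).threats.length > 0) {
    return 'I cannot process that input.';
  }

  // LLM reasoning (simplified here to a single planned tool call)
  const { toolName, args } = await planToolCall(input);

  // Stage 2: pre-execution enforcement
  if (guard(toolName, args).action !== 'allow') {
    return 'That action is not permitted.';
  }

  // Tool execution, then Stage 3: post-execution scanning
  const result = await executeTool(toolName, args);
  if (aegisScan(JSON.stringify(result)).threats.length > 0) {
    return 'Tool response contained unsafe content.';
  }

  // Stage 4: output filtering before the user sees anything
  return filterOutput(`Done: ${JSON.stringify(result)}`);
}
```

Because each stage returns early, a threat caught at Stage 1 never reaches the LLM, and a threat caught at Stage 3 never reaches the agent's context window.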

Performance impact

Each stage adds latency:

| Stage | Typical latency | Impact |
|-------|----------------|--------|
| Input scanning | <1ms | Negligible |
| Pre-execution | <1ms | Negligible |
| Post-execution | <1ms | Negligible |
| Output filtering | <1ms | Negligible |

Total overhead for all four stages: under 5ms. LLM inference typically takes 500ms to 5 seconds, so the safety checks are effectively invisible in the overall pipeline latency.
