← Back to Learn
prompt-injectionbest-practicescontent-safetyguardrails

How to prevent prompt injection in production

Authensor

Prompt injection is the primary attack vector against AI agents in production. Defending against it requires multiple layers because no single technique catches everything. This guide covers a production-grade defense strategy.

Layer 1: Input scanning

Scan all text before the agent processes it. This includes user messages, tool responses, and retrieved documents.

const guard = createGuard({
  policy,
  aegis: {
    enabled: true,
    threshold: 0.7,
    detectors: ['prompt_injection'],
    scanResponses: true,  // Also scan tool responses
  }
});

Aegis catches known injection patterns: instruction overrides, role impersonation, delimiter attacks, and encoding tricks. Set the threshold based on your tolerance for false positives.

Layer 2: Policy enforcement

Even if an injection succeeds and changes the agent's goal, the policy engine blocks unauthorized actions. The agent might want to exfiltrate data, but if the policy blocks outbound API calls, the attack fails.

rules:
  # Only allow tools the agent actually needs
  - tool: "search.web"
    action: allow
  - tool: "file.read"
    action: allow
    when:
      args.path:
        startsWith: "/data/public/"
  # Block everything else
  - tool: "*"
    action: block
    reason: "Tool not in allowlist"

Least-privilege policies limit the blast radius of a successful injection.

Layer 3: Output filtering

Scan the agent's output before it reaches the user or external systems. A successful injection might cause the agent to include sensitive data in its response.

const output = await agent.generate(input);
const scan = aegis.scan(output);

if (scan.threats.some(t => t.type === 'pii' || t.type === 'credentials')) {
  return "I cannot provide that information.";
}

Layer 4: Behavioral monitoring

Track the agent's behavior pattern. A successful injection often causes a detectable change: different tools being called, higher denial rate, unusual argument patterns.

Sentinel detects these shifts in real time. When an anomaly is detected, the system can tighten policies or terminate the session.

Layer 5: Structural defenses

Reduce the attack surface structurally:

  • Separate user input from system instructions using clear delimiters
  • Do not include sensitive data in the system prompt
  • Use structured tool calling instead of free-text tool invocation
  • Limit the agent's context window to reduce the impact of injected instructions

Testing your defenses

Regularly test with known injection techniques:

  • Basic instruction overrides
  • Role impersonation
  • Delimiter escape attacks
  • Base64-encoded instructions
  • Indirect injection through documents

Update your Aegis patterns as new techniques emerge. Run red team exercises quarterly.

Keep learning

Explore more guides on AI agent safety, prompt injection, and building secure systems.

View All Guides