Building custom Aegis detection rules

Authensor

Aegis ships with detectors for prompt injection, PII, credentials, and code injection. For domain-specific threats, you need custom detection rules. This guide shows how to build them.

When to write custom rules

Write custom rules when your domain has threats that the default detectors do not cover:

  • A finance application needs to detect unauthorized trading instructions
  • A healthcare application needs to detect drug name manipulation
  • An internal tool needs to detect references to confidential project names
  • A customer service agent needs to detect social engineering patterns

Custom rule structure

A custom rule has:

  • name: Unique identifier for the rule
  • type: The threat category (prompt_injection, pii, credentials, or a custom type)
  • pattern: A regular expression that matches the threat
  • score: Confidence score, from 0 to 1, assigned to the threat when the pattern matches
  • description: Human-readable explanation
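
As a sanity check before registering rules, the five fields above can be validated with a small helper. This is a minimal sketch: `validateRule` is a hypothetical function written for this guide, not part of Aegis, and it assumes `pattern` is passed as a `RegExp` object, as in the examples in this guide.

```javascript
// Hypothetical helper (not an Aegis API): returns a list of problems found
// in a rule object, or an empty list if the rule has all five fields.
function validateRule(rule) {
  const errors = [];
  if (typeof rule.name !== 'string' || rule.name.length === 0) {
    errors.push('name must be a non-empty string');
  }
  if (typeof rule.type !== 'string' || rule.type.length === 0) {
    errors.push('type must be a non-empty string');
  }
  if (!(rule.pattern instanceof RegExp)) {
    errors.push('pattern must be a RegExp');
  }
  // The negated range check also rejects NaN.
  if (typeof rule.score !== 'number' || !(rule.score >= 0 && rule.score <= 1)) {
    errors.push('score must be a number between 0 and 1');
  }
  if (typeof rule.description !== 'string') {
    errors.push('description must be a string');
  }
  return errors;
}

const exampleRule = {
  name: 'project_codename',
  type: 'confidential',
  pattern: /\bProject Phoenix\b/i,
  score: 0.9,
  description: 'Confidential project name detected',
};

console.log(validateRule(exampleRule)); // prints []
console.log(validateRule({ ...exampleRule, score: 1.5 })); // one error, for score
```

Running a check like this when rules are loaded surfaces a malformed rule immediately, rather than leaving it to fail silently at scan time.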

Adding custom rules

const guard = createGuard({
  policy,
  aegis: {
    enabled: true,
    customPatterns: [
      {
        name: 'trading_instruction',
        type: 'domain_specific',
        pattern: /\b(buy|sell|trade|execute order|market order|limit order)\b.*\b(shares|stocks|options|futures|crypto)\b/i,
        score: 0.85,
        description: 'Unauthorized trading instruction detected',
      },
      {
        name: 'project_codename',
        type: 'confidential',
        pattern: /\b(Project Phoenix|Project Atlas|Operation Blue)\b/i,
        score: 0.90,
        description: 'Confidential project name detected in agent context',
      },
      {
        name: 'social_engineering',
        type: 'prompt_injection',
        pattern: /\b(pretend|act as if|for testing purposes|hypothetically)\b.*\b(admin|root|superuser|override)\b/i,
        score: 0.75,
        description: 'Social engineering pattern detected',
      },
    ]
  }
});

Writing effective patterns

Be specific enough to avoid false positives

Bad: /buy|sell/i (matches "buy groceries", "sell old furniture")

Good: /\bbuy\b.*\bshares\b|\bsell\b.*\bstocks\b/i (matches trading context)
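
The difference is easy to check with plain JavaScript regular expressions, with no Aegis involved. The sample sentences below are made up for illustration.

```javascript
// The broad and the specific pattern from this section.
const broad = /buy|sell/i;
const specific = /\bbuy\b.*\bshares\b|\bsell\b.*\bstocks\b/i;

// Illustrative inputs: two everyday sentences, two trading instructions.
const samples = [
  'Where can I buy groceries?',
  'Sell old furniture online',
  'Please buy 100 shares of ACME',
  'Sell all my stocks today',
];

for (const text of samples) {
  console.log(`${text} -> broad: ${broad.test(text)}, specific: ${specific.test(text)}`);
}
```

The broad pattern fires on all four sentences, while the specific pattern fires only on the last two.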

Anchor to word boundaries

Use \b to match whole words. Without word boundaries, sell also matches inside selling, bestseller, and counselling.
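
A short check in plain JavaScript shows the effect of the boundaries. The word list is illustrative.

```javascript
// Same keyword, with and without word-boundary anchors.
const unanchored = /sell/i;
const anchored = /\bsell\b/i;

const words = ['sell', 'selling', 'bestseller', 'counselling'];

// Record, for each word, whether each pattern matches it.
const results = words.map(word => ({
  word,
  unanchored: unanchored.test(word),
  anchored: anchored.test(word),
}));

console.table(results);
```

The unanchored pattern matches all four words; the anchored pattern matches only the standalone word sell.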

Score based on confidence

A precise pattern (long, specific) deserves a high score (0.9). A broad pattern (short, general) deserves a lower score (0.5-0.7). The threshold setting determines which scores trigger a block.
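
To see how scores and the threshold interact, here is a minimal sketch. It assumes that content is blocked when a matched rule's score is at or above the threshold; the exact comparison Aegis uses is not stated in this guide. The `wouldBlock` helper, the threshold value, and the `broad_keyword` rule are hypothetical.

```javascript
// Assumption: a match blocks when its score is at or above the threshold.
function wouldBlock(ruleScore, threshold) {
  return ruleScore >= threshold;
}

const threshold = 0.8; // hypothetical setting

// Scores taken from the example rules in this guide, plus one broad rule.
const rules = [
  { name: 'project_codename', score: 0.9 },
  { name: 'trading_instruction', score: 0.85 },
  { name: 'social_engineering', score: 0.75 },
  { name: 'broad_keyword', score: 0.5 }, // hypothetical broad rule
];

const blocking = rules
  .filter(rule => wouldBlock(rule.score, threshold))
  .map(rule => rule.name);

console.log(blocking); // ['project_codename', 'trading_instruction']
```

Under this assumption, raising a rule's score above the threshold turns it from a signal that is only reported into one that blocks, so the score should reflect how much a match on its own justifies blocking.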

Test against real data

Before deploying custom rules, test them against a sample of real agent interactions:

import assert from 'node:assert';

// `aegis` is assumed to be a scanner instance configured with the
// custom rules defined earlier in this guide.
const testCases = [
  { input: "Please buy 100 shares of ACME", expected: true },
  { input: "Where can I buy groceries?", expected: false },
  { input: "Sell your old furniture on Marketplace", expected: false },
  { input: "Execute a market order for 500 shares", expected: true },
];

for (const test of testCases) {
  const scan = aegis.scan(test.input);
  // The rule counts as triggered if any reported threat carries its name.
  const detected = scan.threats.some(t => t.name === 'trading_instruction');
  assert.strictEqual(detected, test.expected, `Failed on: ${test.input}`);
}

Combining custom and default rules

Custom rules run alongside the default detectors. A scan checks all detectors (default + custom) and returns all threats found. The highest-scoring threat determines whether the content is blocked.
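
The decision logic described above can be sketched as follows. This is an illustration, not the Aegis implementation: the `decide` function, the threat object shape (`name`, `type`, `score`), the `pii_email` threat name, and the at-or-above threshold comparison are all assumptions.

```javascript
// Hypothetical sketch: pick the highest-scoring threat from a combined
// list (default + custom) and compare it to the threshold.
function decide(threats, threshold) {
  if (threats.length === 0) {
    return { blocked: false, top: null };
  }
  const top = threats.reduce((best, t) => (t.score > best.score ? t : best));
  return { blocked: top.score >= threshold, top };
}

// Illustrative scan result: one default-detector hit and one custom-rule hit.
const threats = [
  { name: 'pii_email', type: 'pii', score: 0.6 },
  { name: 'trading_instruction', type: 'domain_specific', score: 0.85 },
];

const decision = decide(threats, 0.8);
console.log(decision.blocked, decision.top.name); // true trading_instruction
```

Because only the top score matters in this model, a low-scoring custom rule never masks a high-scoring default detection, or the other way around.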

Updating rules

Custom rules can be updated without redeploying the application. Store them in a configuration file or fetch them from the control plane:

const guard = createGuard({
  policy,
  aegis: {
    enabled: true,
    customPatternsPath: './aegis-rules.json',  // Hot-reloaded on change
  },
});

This lets your security team update detection rules without a code deployment.
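
JSON has no regular expression literal, so rules stored in a file have to carry their patterns as strings. The schema of the rules file is not specified in this guide; the sketch below assumes each entry stores the regex source in `pattern` and the flags in a separate `flags` field. The `compileRules` helper is hypothetical, and the JSON is inlined as a string so the example is self-contained.

```javascript
// Assumed file contents (hypothetical schema). String.raw keeps the
// backslashes exactly as they would appear in the JSON file on disk.
const fileContents = String.raw`[
  {
    "name": "project_codename",
    "type": "confidential",
    "pattern": "\\b(Project Phoenix|Project Atlas|Operation Blue)\\b",
    "flags": "i",
    "score": 0.9,
    "description": "Confidential project name detected in agent context"
  }
]`;

// Hypothetical helper: parse the JSON text and turn each pattern string
// into a RegExp object, leaving the other fields untouched.
function compileRules(jsonText) {
  return JSON.parse(jsonText).map(({ pattern, flags = '', ...rest }) => ({
    ...rest,
    pattern: new RegExp(pattern, flags),
  }));
}

const rules = compileRules(fileContents);
console.log(rules[0].pattern.test('Status update on project atlas')); // prints true
```

Note the doubled backslashes in the JSON: `\\b` in the file becomes `\b` after parsing, which is what the regular expression engine expects for a word boundary.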
