
Rate limiting AI agent actions

Authensor

Rate limiting restricts how many tool calls an AI agent can make within a time window. Without rate limits, a misconfigured agent can execute thousands of actions per minute, exhausting resources, overwhelming downstream systems, or exfiltrating large amounts of data.

Why rate limiting matters

An agent in a loop can:

  • Send hundreds of API requests per second
  • Read every file in a directory tree
  • Execute thousands of database queries
  • Burn through API usage quotas in minutes
  • Overwhelm approval reviewers with escalation requests

Rate limiting puts a ceiling on how fast the agent can act, bounding the damage from any failure mode.

Per-tool rate limits

Different tools have different risk profiles. Apply rate limits per tool:

rules:
  - tool: "search.web"
    action: allow
    rateLimit:
      maxCalls: 30
      windowSeconds: 60

  - tool: "file.read"
    action: allow
    rateLimit:
      maxCalls: 50
      windowSeconds: 60

  - tool: "email.send"
    action: allow
    rateLimit:
      maxCalls: 5
      windowSeconds: 3600  # 5 emails per hour
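The sketch below shows the per-tool semantics as a minimal in-memory TypeScript class. It is not Authensor's implementation. The `PerToolRateLimiter` class, its `tryAcquire` method, and the sliding log of timestamps are assumptions made for illustration. Each tool keeps its own list of recent call times, and tools without a configured limit pass through.

```typescript
// Hypothetical per-tool limiter mirroring the YAML rules above.
// Sliding log of accepted-call timestamps (ms), kept separately per tool.
interface RateLimit {
  maxCalls: number;
  windowSeconds: number;
}

class PerToolRateLimiter {
  private readonly limits: Map<string, RateLimit>;
  private readonly calls = new Map<string, number[]>();

  constructor(limits: Record<string, RateLimit>) {
    this.limits = new Map(Object.entries(limits));
  }

  // Records and allows the call if the tool is under its limit; otherwise returns false.
  tryAcquire(tool: string, nowMs: number): boolean {
    const limit = this.limits.get(tool);
    if (!limit) return true; // no rateLimit configured for this tool
    const windowMs = limit.windowSeconds * 1000;
    // Keep only timestamps still inside the window ending at nowMs.
    const recent = (this.calls.get(tool) ?? []).filter((t) => nowMs - t < windowMs);
    const allowed = recent.length < limit.maxCalls;
    if (allowed) recent.push(nowMs);
    this.calls.set(tool, recent);
    return allowed;
  }
}

const limiter = new PerToolRateLimiter({
  "search.web": { maxCalls: 30, windowSeconds: 60 },
  "file.read": { maxCalls: 50, windowSeconds: 60 },
  "email.send": { maxCalls: 5, windowSeconds: 3600 },
});
```

Because each tool has its own log, exhausting the `email.send` allowance has no effect on `search.web` or `file.read`.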

Global rate limits

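Set an overall ceiling across all tools. The configuration below allows at most 200 tool calls per 60-second window, regardless of which tools they target: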

const guard = createGuard({
  policy,
  rateLimit: {
    global: { maxCalls: 200, windowSeconds: 60 },
  }
});

The global limit catches cases where the agent spreads actions across many tools to stay under per-tool limits.
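Here is one way the two limits might interact. The `CombinedLimiter` class is hypothetical and does not belong to any documented API. It admits a call only if both the tool's own limit and the shared global limit have room, and it then records the call against both. In the usage at the end, 250 calls are spread round-robin across five tools. Each tool is capped at 50 per minute, and the global limit is 200 per minute.

```typescript
// Hypothetical sketch: a call must fit under its per-tool limit AND a global
// limit shared by every tool. Both use sliding logs of timestamps (ms).
interface RateLimit {
  maxCalls: number;
  windowSeconds: number;
}

class SlidingLog {
  private stamps: number[] = [];
  private readonly windowMs: number;
  private readonly maxCalls: number;

  constructor(limit: RateLimit) {
    this.windowMs = limit.windowSeconds * 1000;
    this.maxCalls = limit.maxCalls;
  }

  // Drops expired timestamps, then reports whether one more call fits.
  hasRoom(nowMs: number): boolean {
    this.stamps = this.stamps.filter((t) => nowMs - t < this.windowMs);
    return this.stamps.length < this.maxCalls;
  }

  record(nowMs: number): void {
    this.stamps.push(nowMs);
  }
}

class CombinedLimiter {
  private readonly globalLog: SlidingLog;
  private readonly toolLogs = new Map<string, SlidingLog>();

  constructor(global: RateLimit, toolLimits: Record<string, RateLimit>) {
    this.globalLog = new SlidingLog(global);
    for (const [tool, limit] of Object.entries(toolLimits)) {
      this.toolLogs.set(tool, new SlidingLog(limit));
    }
  }

  tryAcquire(tool: string, nowMs: number): boolean {
    const toolLog = this.toolLogs.get(tool); // undefined: no per-tool limit
    // Check both limits before recording, so a blocked call consumes neither.
    if (!this.globalLog.hasRoom(nowMs)) return false;
    if (toolLog && !toolLog.hasRoom(nowMs)) return false;
    this.globalLog.record(nowMs);
    toolLog?.record(nowMs);
    return true;
  }
}

// Five tools at 50 calls/min each (250 total) under a 200 calls/min global cap.
const tools = ["t1", "t2", "t3", "t4", "t5"];
const toolLimits: Record<string, RateLimit> = {};
for (const t of tools) toolLimits[t] = { maxCalls: 50, windowSeconds: 60 };

const limiter = new CombinedLimiter({ maxCalls: 200, windowSeconds: 60 }, toolLimits);
let accepted = 0;
for (let i = 0; i < 250; i++) {
  if (limiter.tryAcquire(tools[i % tools.length], i * 10)) accepted++;
}
// accepted is 200: no tool exceeds 50, but the global limit stops the rest.
```

Both limits are checked before either one is recorded. As a result, a blocked call does not use up quota on the limit that would have allowed it.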

What happens when limits are hit

When the rate limit is exceeded, the guard returns a block decision:

const decision = guard('search.web', { query: 'test' });
// If rate limit exceeded:
// decision.action === 'block'
// decision.reason === 'Rate limit exceeded: search.web (30/min)'

The agent receives the block decision. It can wait for the window to clear and then retry, or it can tell the user that the limit was reached.
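The sketch below shows one way an agent loop might act on that decision. The `Decision` type only mirrors the `action` and `reason` fields shown above. The `planNextStep` function, the retry count, and the exponential backoff delays are hypothetical choices. Matching on the reason text is also fragile, so a structured field would be preferable if the guard exposes one.

```typescript
// Hypothetical agent-side handling of a guard decision. Decision mirrors the
// action/reason fields shown above; the retry policy is an assumption.
type Decision = { action: "allow" | "block"; reason?: string };

type NextStep =
  | { kind: "proceed" }
  | { kind: "wait"; delayMs: number }
  | { kind: "inform_user"; message: string };

function planNextStep(
  decision: Decision,
  attempt: number, // 1 for the first try, 2 for the first retry, ...
  maxAttempts = 3,
  baseDelayMs = 2000,
): NextStep {
  if (decision.action === "allow") return { kind: "proceed" };
  // Assumes rate-limit blocks carry a reason starting with this prefix.
  const rateLimited = decision.reason?.startsWith("Rate limit exceeded") ?? false;
  // Other blocks are policy denials, and retrying them will not help.
  if (!rateLimited || attempt >= maxAttempts) {
    return { kind: "inform_user", message: `Action blocked: ${decision.reason ?? "no reason given"}` };
  }
  // Exponential backoff: attempt 1 waits baseDelayMs, attempt 2 waits twice that, ...
  return { kind: "wait", delayMs: baseDelayMs * 2 ** (attempt - 1) };
}
```

Keeping this as a pure function that returns the next step, rather than sleeping inside it, makes the policy easy to test. The caller can then decide how to wait.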

Sliding window vs fixed window

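Fixed window: Counts actions within fixed, consecutive time intervals (e.g., 12:00:00 to 12:01:00, then 12:01:00 to 12:02:00), and resets the count at the start of each interval. This is simple, but it allows bursts at window boundaries. With a limit of 30 calls per minute, an agent can make 30 calls at 12:00:59 and another 30 at 12:01:00, so 60 calls land within about one second.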
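A minimal fixed-window counter reproduces the boundary effect. This TypeScript sketch is illustrative only, and the `FixedWindowCounter` class is hypothetical. Windows are aligned to multiples of the window length. Timestamps are passed in explicitly, in milliseconds, so the behavior is deterministic.

```typescript
// Illustrative fixed-window counter; windows are aligned to multiples of windowMs.
class FixedWindowCounter {
  private windowStart = Number.NEGATIVE_INFINITY;
  private count = 0;
  private readonly maxCalls: number;
  private readonly windowMs: number;

  constructor(maxCalls: number, windowSeconds: number) {
    this.maxCalls = maxCalls;
    this.windowMs = windowSeconds * 1000;
  }

  tryAcquire(nowMs: number): boolean {
    const start = Math.floor(nowMs / this.windowMs) * this.windowMs;
    if (start !== this.windowStart) {
      // A new window has begun: the count resets to zero.
      this.windowStart = start;
      this.count = 0;
    }
    if (this.count >= this.maxCalls) return false;
    this.count++;
    return true;
  }
}

// 30 calls at 59.5 s and 30 more at 60.5 s, under a 30-per-minute limit.
const fixed = new FixedWindowCounter(30, 60);
let accepted = 0;
for (let i = 0; i < 30; i++) if (fixed.tryAcquire(59500)) accepted++;
for (let i = 0; i < 30; i++) if (fixed.tryAcquire(60500)) accepted++;
// accepted is 60: every call is admitted, although all 60 fall within one second.
```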

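Sliding window: Counts actions within a rolling period that ends at the current call (for a 60-second window, the 60 seconds immediately before now). Every call is checked against the same trailing span. No 60-second stretch can therefore exceed the limit, and boundary bursts cannot occur.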
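A minimal sliding-window log rejects the same burst. Like the previous example, this is a hypothetical sketch rather than Authensor's implementation. It stores the timestamp of each accepted call and discards entries that are a full window old before counting. It assumes timestamps arrive in non-decreasing order.

```typescript
// Illustrative sliding-window log. Assumes nowMs never decreases between calls.
class SlidingWindowLog {
  private readonly stamps: number[] = [];
  private readonly maxCalls: number;
  private readonly windowMs: number;

  constructor(maxCalls: number, windowSeconds: number) {
    this.maxCalls = maxCalls;
    this.windowMs = windowSeconds * 1000;
  }

  tryAcquire(nowMs: number): boolean {
    // Evict accepted calls that are at least windowMs old.
    while (this.stamps.length > 0 && nowMs - this.stamps[0] >= this.windowMs) {
      this.stamps.shift();
    }
    if (this.stamps.length >= this.maxCalls) return false;
    this.stamps.push(nowMs);
    return true;
  }
}

// Same burst as the fixed-window case: 30 calls at 59.5 s, 30 more at 60.5 s.
const sliding = new SlidingWindowLog(30, 60);
let accepted = 0;
for (let i = 0; i < 30; i++) if (sliding.tryAcquire(59500)) accepted++;
for (let i = 0; i < 30; i++) if (sliding.tryAcquire(60500)) accepted++;
// accepted is 30: the second burst is rejected because the first is still in the window.
```

The trade-off is memory. This log keeps up to `maxCalls` timestamps per limit, whereas a fixed window keeps a single counter.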

Use sliding windows for safety-critical rate limits.

Budget limits

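For tools that incur monetary costs (paid API calls, cloud services), use budget-based limits instead of, or in addition to, rate limits. The rule below caps total cost at 50.00 per 24 hours. At 0.10 per call, that allows at most 500 calls in the window: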

- tool: "cloud.compute"
  action: allow
  budget:
    maxCost: 50.00
    windowHours: 24
    costPerCall: 0.10
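A rule like this could be enforced roughly as shown below. The `BudgetTracker` class is hypothetical. The budget window is assumed to roll rather than reset at a fixed time of day. Amounts are converted to integer hundredths (5000 and 10 for 50.00 and 0.10) to avoid floating-point rounding when costs are summed.

```typescript
// Hypothetical budget tracker for a rule like cloud.compute above. Costs are
// integer hundredths (5000 = 50.00, 10 = 0.10); the window is assumed to roll.
interface BudgetRule {
  maxCostHundredths: number;
  windowHours: number;
  costPerCallHundredths: number;
}

class BudgetTracker {
  private spends: { atMs: number; amount: number }[] = [];
  private readonly rule: BudgetRule;

  constructor(rule: BudgetRule) {
    this.rule = rule;
  }

  // Charges one call if it fits in the remaining budget; otherwise returns false.
  tryCharge(nowMs: number): boolean {
    const windowMs = this.rule.windowHours * 3600 * 1000;
    // Forget spending that has left the rolling window.
    this.spends = this.spends.filter((s) => nowMs - s.atMs < windowMs);
    const spent = this.spends.reduce((sum, s) => sum + s.amount, 0);
    if (spent + this.rule.costPerCallHundredths > this.rule.maxCostHundredths) return false;
    this.spends.push({ atMs: nowMs, amount: this.rule.costPerCallHundredths });
    return true;
  }
}

const budget = new BudgetTracker({ maxCostHundredths: 5000, windowHours: 24, costPerCallHundredths: 10 });
let calls = 0;
while (budget.tryCharge(calls * 1000)) calls++;
// calls is 500: a 501st call would push spending past 50.00.
```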

Monitoring rate limit hits

Track rate limit events in Sentinel. A spike in rate limit hits may indicate:

  • A runaway loop in the agent logic
  • A denial-of-service attempt through the agent
  • A data exfiltration attempt using rapid small requests

Alert on unusual patterns and investigate the root cause.
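As an illustration of what "unusual" might mean, the hypothetical `detectSpike` function below compares rate-limit hits in the most recent window with the average over the preceding windows. It flags a spike when the current count exceeds a multiple of that baseline. It does not use any Sentinel API. The number of history windows and the factor of 3 are arbitrary assumptions that would need tuning against real traffic.

```typescript
// Hypothetical spike check over timestamps (ms) of rate-limit hits.
// Not a Sentinel API; the defaults below are arbitrary and need tuning.
function detectSpike(
  hitTimesMs: number[],
  nowMs: number,
  windowMs: number,
  historyWindows = 10,
  factor = 3,
): boolean {
  const historyStart = nowMs - windowMs * (historyWindows + 1);
  let current = 0; // hits in the most recent window
  let history = 0; // hits in the historyWindows windows before it
  for (const t of hitTimesMs) {
    if (t > nowMs || t < historyStart) continue;
    if (nowMs - t < windowMs) current++;
    else history++;
  }
  // Average hits per window, floored at 1 so a quiet history still needs several hits to alert.
  const baseline = Math.max(history / historyWindows, 1);
  return current > factor * baseline;
}
```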
