Rate limiting is one of those features that seems simple until you're debugging why legitimate users are getting 429s or why abusive users are still getting through. Claude Code implements the mechanics well, but you need to give it the right algorithm and requirements.
Choose the algorithm first
Before asking Claude to implement anything, specify what you need:
I need rate limiting. Help me choose the algorithm:
- Fixed window: simple, but burst at window boundaries is possible
- Sliding window: more accurate, slightly more complex
- Token bucket: allows burst up to bucket size, then throttles
My requirements:
- [X] requests per [time period] per [user/IP/API key]
- Burst allowed: [yes/no, how much]
- State storage: [Redis/in-memory/database]
Which algorithm fits?
Claude will match the algorithm to your requirements. Most APIs want sliding window or token bucket; with fixed window, a client can send up to twice the limit in a short burst straddling a window boundary, which attackers can and do exploit.
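To make the trade-off concrete, here is a minimal token bucket sketch (class name and parameters are illustrative, not from the article): it permits a burst up to the bucket capacity, then throttles to the refill rate.

```python
import threading
import time

class TokenBucket:
    """Minimal token bucket: bursts up to `capacity`, refills at `rate` tokens/sec."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

bucket = TokenBucket(capacity=5, rate=1.0)  # burst of 5, then ~1 request/sec
results = [bucket.allow() for _ in range(6)]  # 6 back-to-back calls
```

The first five calls drain the bucket and succeed; the sixth is throttled until the refill rate supplies a new token. A fixed-window counter with the same limit would instead reset abruptly at the boundary.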
The implementation prompt
Implement [algorithm] rate limiting. Requirements:
- Limit: [N] requests per [window]
- Key: per [user ID/API key/IP address]
- Storage: [Redis with expiry / in-memory with LRU eviction]
- Headers: include X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
- Response: 429 with Retry-After header when limit exceeded
- Thread-safe: handle concurrent requests correctly
Edge cases to handle:
- Storage unavailable: fail open (allow request) or fail closed (deny)?
- Clock skew between servers: use server-local time or Redis time?
The "fail open or fail closed" question is one you need to answer before Claude implements anything. Fail open means rate limiting is best-effort and disappears during a storage outage; fail closed means no requests get through while Redis is down. Both are valid choices in different contexts: fail open suits public APIs where availability matters most, fail closed suits abuse-sensitive endpoints like auth.
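A sketch of the fail-open wrapper, under assumed names (`StorageError`, `check_fail_open`, and the simulated `redis_down` backend are all hypothetical):

```python
class StorageError(Exception):
    """Hypothetical error raised when the rate-limit backend is unreachable."""

def check_fail_open(check, key: str) -> bool:
    """Run a rate-limit check; if storage errors, allow the request.

    `check` is any callable returning True (allowed) / False (limited)
    that may raise StorageError when the backend is down.
    """
    try:
        return check(key)
    except StorageError:
        # Fail open: limiting becomes best-effort during an outage.
        # Return False here instead to fail closed.
        return True

def redis_down(key):
    # Simulates a storage outage.
    raise StorageError("connection refused")

allowed = check_fail_open(redis_down, "user-1")  # allowed despite the outage
```

The policy lives in one place, so flipping from fail open to fail closed is a one-line change rather than a hunt through every call site.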
Testing rate limiting
Write tests for the rate limiter. Test:
- Under limit: requests succeed
- At limit: last request in window succeeds
- Over limit: request is rejected with 429
- After window resets: requests succeed again
- Burst: [N] rapid requests, first [bucket-size] succeed
- Concurrent requests: multiple simultaneous requests are counted correctly
- Different users: one user's limit doesn't affect another's
The concurrent request test is the one most people skip. Without explicit locking or atomic operations, race conditions can let more requests through than the limit allows.
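A sketch of that concurrency test against a lock-guarded in-memory counter (names are illustrative); without the lock, two threads can read the same count and both increment, letting extra requests through.

```python
import threading

class CounterLimiter:
    """Naive fixed-window counter; the lock makes check-and-increment atomic."""

    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:
            if self.count < self.limit:
                self.count += 1
                return True
            return False

limiter = CounterLimiter(limit=10)
results = []
results_lock = threading.Lock()

def hit():
    ok = limiter.allow()
    with results_lock:
        results.append(ok)

# 50 simultaneous requests against a limit of 10.
threads = [threading.Thread(target=hit) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Exactly 10 of the 50 requests should succeed. With Redis, the equivalent atomicity comes from INCR or a Lua script rather than a process-local lock.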
Per-endpoint limits
Different endpoints have different limits:
- /api/search: 10 requests/minute (expensive query)
- /api/data: 100 requests/minute (cheap read)
- /api/export: 2 requests/hour (very expensive)
- /api/auth: 5 attempts/15 minutes (security)
Implement with a config-driven approach so limits can be adjusted
without a code change.
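One way to sketch that config-driven lookup (the table mirrors the limits above; the default tuple is an assumption, not from the article):

```python
# Endpoint -> (max requests, window in seconds).
RATE_LIMITS = {
    "/api/search": (10, 60),     # expensive query
    "/api/data":   (100, 60),    # cheap read
    "/api/export": (2, 3600),    # very expensive
    "/api/auth":   (5, 900),     # security-sensitive
}
DEFAULT_LIMIT = (60, 60)  # hypothetical fallback for unlisted endpoints

def limit_for(path: str) -> tuple:
    """Look up an endpoint's (limit, window); unknown paths get the default."""
    return RATE_LIMITS.get(path, DEFAULT_LIMIT)
```

Loading this table from a config file or environment at startup means tightening `/api/export` during an incident is a deploy-free change.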
Bypass for trusted callers
Add rate limit bypass for:
- Internal services (identified by internal API key)
- Admin users (identified by role)
Log every bypass so it's auditable. Don't silently skip rate
limiting: record that it was bypassed, for whom, and why.
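A minimal sketch of the bypass check with audit logging (the key registry, role string, and function name are all hypothetical):

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ratelimit")

INTERNAL_KEYS = {"internal-service-key"}  # hypothetical registry of trusted keys

def should_rate_limit(api_key, role) -> bool:
    """Return False only for trusted callers, logging every bypass for audit."""
    if api_key in INTERNAL_KEYS:
        log.info("rate limit bypassed: internal key %s", api_key)
        return False
    if role == "admin":
        log.info("rate limit bypassed: admin role")
        return False
    return True
```

Routing the bypass through one function keeps the audit trail complete: every skipped check produces a log line you can alert on if bypass volume looks wrong.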
Rate limiting patterns are in the Agent Prompt Playbook. $29.