Every production API needs rate limiting. Without it, a single misbehaving client — or a deliberate attacker — can overwhelm your server with thousands of requests per second, crashing your application and affecting all users. Rate limiting controls how many requests each client can make within a time window, protecting your infrastructure while ensuring fair access for everyone.
This guide covers everything: what rate limiting is, the four major algorithms (and when to use each), HTTP 429 responses, rate limit headers, practical implementation in Node.js/Redis/NGINX, client-side handling, and real-world examples from GitHub, Stripe, and OpenAI.
What Is Rate Limiting?
Rate limiting is a technique that restricts the number of requests a client can make to an API within a specific time period. When a client exceeds the limit, the server rejects further requests with an HTTP 429 "Too Many Requests" response until the time window resets.
Real-world analogy: Think of a theme park with limited capacity. Only 100 visitors can enter per hour. The first 100 get in immediately. Visitor 101 is told to wait until the next hour. This prevents overcrowding (server overload) and ensures everyone inside has a good experience (acceptable response times).
Why Rate Limiting Matters
Rate Limiting Algorithms
There are four main algorithms for implementing rate limiting. Each has different trade-offs in accuracy, memory usage, burst handling, and implementation complexity.
1. Fixed Window Counter
The simplest algorithm. Divide time into fixed windows (e.g., 1-minute blocks). Count requests in the current window. If the count exceeds the limit, reject. When the window ends, reset to zero.
Problem: A client can send 100 requests at 11:59:59 and another 100 at 12:00:01 — 200 requests in 2 seconds, double the intended rate. This "boundary burst" is Fixed Window's weakness.
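The boundary burst is easy to reproduce with a small in-memory sketch. The `FixedWindowCounter` class and its explicit `nowMs` argument are assumptions made for illustration (passing the time in keeps the example deterministic); it is not a production limiter and never cleans up old windows.

```javascript
// In-memory fixed window counter (illustrative, single process only).
class FixedWindowCounter {
  constructor(limit, windowMs) {
    this.limit = limit
    this.windowMs = windowMs
    this.counts = new Map() // window index -> request count
  }

  // Returns true if the request arriving at nowMs is allowed.
  allow(nowMs) {
    const windowIndex = Math.floor(nowMs / this.windowMs)
    const count = (this.counts.get(windowIndex) || 0) + 1
    this.counts.set(windowIndex, count)
    return count <= this.limit
  }
}

// Limit: 100 requests per minute.
const counter = new FixedWindowCounter(100, 60000)
let allowed = 0
for (let i = 0; i < 100; i++) if (counter.allow(59000)) allowed++ // 1s before the boundary
for (let i = 0; i < 100; i++) if (counter.allow(61000)) allowed++ // 1s after the boundary
console.log(allowed) // 200: every request is accepted within a 2-second span
```

Both batches pass because each lands in a different window, even though they are only two seconds apart.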
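// Fixed Window in Redis
const Redis = require("ioredis")
const redis = new Redis()

async function fixedWindowLimit(clientId, limit, windowSec) {
  // One key per client per window; the index changes every windowSec seconds
  const windowIndex = Math.floor(Date.now() / 1000 / windowSec)
  const key = `ratelimit:${clientId}:${windowIndex}`
  const count = await redis.incr(key)
  // First request in this window: set a TTL so old keys expire on their own
  if (count === 1) await redis.expire(key, windowSec)
  return count <= limit
}
2. Sliding Window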
Solves the boundary burst problem by considering a "sliding" time window that moves with the current time. Instead of fixed blocks, it looks at the last N seconds from right now. More accurate than Fixed Window but uses more memory (needs to track individual request timestamps).
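A minimal sketch of the timestamp-tracking approach, assuming an in-memory array per client. The `SlidingWindowLog` name and the explicit `nowMs` argument are illustrative choices, not an existing API.

```javascript
// Sliding window log: keep one timestamp per accepted request (illustrative).
class SlidingWindowLog {
  constructor(limit, windowMs) {
    this.limit = limit
    this.windowMs = windowMs
    this.timestamps = [] // accepted request times, oldest first
  }

  // Returns true if the request arriving at nowMs is allowed.
  allow(nowMs) {
    // Drop timestamps that have fallen out of the window ending at nowMs.
    const cutoff = nowMs - this.windowMs
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift()
    }
    if (this.timestamps.length >= this.limit) return false
    this.timestamps.push(nowMs)
    return true
  }
}

// Same traffic pattern as the boundary burst: 100 requests, then 100 more 2s later.
const log = new SlidingWindowLog(100, 60000)
let accepted = 0
for (let i = 0; i < 100; i++) if (log.allow(59000)) accepted++
for (let i = 0; i < 100; i++) if (log.allow(61000)) accepted++
console.log(accepted) // 100: the second batch is rejected
```

The memory cost is visible here: the array holds up to `limit` timestamps per client.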
The Sliding Window Counter variant approximates this by weighting the previous window's count. Example: if you're 30% through the current window, count = (current_count) + (previous_count × 0.7). This gives good accuracy with minimal memory.
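The weighting can be written as a one-line helper. `estimateSlidingCount` is a hypothetical name, and the request counts below are made-up numbers chosen only to work through the arithmetic.

```javascript
// Sliding window counter estimate (illustrative helper, not a library API).
// elapsedFraction: how far we are into the current window, from 0 to 1.
function estimateSlidingCount(currentCount, previousCount, elapsedFraction) {
  return currentCount + previousCount * (1 - elapsedFraction)
}

// 30% into the current window, 20 requests so far, 80 in the previous window:
// 20 + 80 × 0.7, which is approximately 76.
const estimate = estimateSlidingCount(20, 80, 0.3)
console.log(estimate)
console.log(estimate < 100) // true: still under a limit of 100
```

Only two counters per client are needed, which is where the memory saving over per-request timestamps comes from.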
3. Token Bucket
Imagine a bucket that holds tokens. Tokens are added at a fixed rate (e.g., 10 tokens/second). Each request consumes one token. If the bucket is empty, the request is rejected. If the bucket is full, extra tokens are discarded (max capacity = burst limit).
Token Bucket is one of the most widely used algorithms because it naturally supports bursts. If a client hasn't made requests for a while, their bucket is full, so they can send a burst of requests immediately. This matches real user behavior (idle periods followed by activity spikes).
// Token Bucket conceptual implementation
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity // Max tokens (burst limit)
    this.tokens = capacity // Start full
    this.refillRate = refillRate // Tokens per second
    this.lastRefill = Date.now()
  }

  consume() {
    this.refill()
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true // Request allowed
    }
    return false // Rate limited (429)
  }

  refill() {
    const now = Date.now()
    const elapsed = (now - this.lastRefill) / 1000
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate)
    this.lastRefill = now
  }
}
4. Leaky Bucket
Imagine a bucket with a hole at the bottom. Requests flow in at any rate (filling the bucket), but they flow out (are processed) at a constant rate through the hole. If the bucket overflows (too many pending requests), new ones are rejected.
Leaky Bucket smooths traffic — regardless of how bursty the input is, output is always at a constant rate. This provides predictable server load but doesn't allow legitimate bursts. It's ideal for systems that need consistent throughput (video streaming, network queues).
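One way to sketch this is as a meter: a "water level" counts pending requests and drains at a constant rate. The `LeakyBucket` class, its parameters, and the explicit `nowMs` argument are assumptions for illustration. A real implementation would usually pair this with an actual queue of work.

```javascript
// Leaky bucket as a meter (illustrative): the level counts pending requests
// and drains at a constant rate.
class LeakyBucket {
  constructor(capacity, leakRatePerSec) {
    this.capacity = capacity // Max pending requests
    this.leakRate = leakRatePerSec // Requests processed per second
    this.level = 0
    this.lastLeakMs = 0
  }

  // Returns true if the request arriving at nowMs fits in the bucket.
  add(nowMs) {
    const elapsedSec = (nowMs - this.lastLeakMs) / 1000
    this.level = Math.max(0, this.level - elapsedSec * this.leakRate)
    this.lastLeakMs = nowMs
    if (this.level + 1 > this.capacity) return false // Overflow: reject
    this.level += 1
    return true
  }
}

const bucket = new LeakyBucket(5, 1) // Holds 5, drains 1 request per second
const burst = []
for (let i = 0; i < 8; i++) burst.push(bucket.add(0))
console.log(burst.filter(Boolean).length) // 5 accepted, 3 rejected
console.log(bucket.add(2000)) // true: 2 seconds drained 2 slots
```

Unlike Token Bucket, an idle client earns no extra allowance here: the bucket can only become empty, never "overfull" with credit.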
Algorithm Comparison
| Feature | Fixed Window | Sliding Window | Token Bucket | Leaky Bucket |
|---|---|---|---|---|
| Accuracy | Low (boundary bursts) | High | High | High |
| Burst support | Unintended at boundaries | No | Yes (built-in) | No (smooths traffic) |
| Memory usage | Very low (1 counter) | High (timestamps) | Low (2 values) | Low (queue size) |
| Implementation | Very simple | Medium | Medium | Medium |
| Distributed (Redis) | Easy | Moderate | Easy | Moderate |
| Best for | Simple internal APIs | Accurate public APIs | Most REST APIs | Streaming/network |
Rate Limiting HTTP Headers
Well-designed APIs communicate rate limit status in response headers so clients can manage their request rate proactively:
// Rate limit response headers (RateLimit-* follows the IETF draft; X-RateLimit-* is the older de facto convention)
HTTP/1.1 200 OK
RateLimit-Limit: 100 → Maximum requests per window
RateLimit-Remaining: 73 → Requests remaining in current window
RateLimit-Reset: 45 → Seconds until the window resets
X-RateLimit-Limit: 100 → (legacy non-standard version)
X-RateLimit-Remaining: 73 → (legacy non-standard version)
X-RateLimit-Reset: 1719216060 → (legacy non-standard version, usually a Unix timestamp)
// When rate limited (429 response)
HTTP/1.1 429 Too Many Requests
Retry-After: 45 → Seconds until client can retry
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 45
Content-Type: application/json

{"error": "Rate limit exceeded", "retryAfter": 45}
Always include these headers: they let well-behaved clients back off before hitting the limit, reducing unnecessary 429 responses and improving overall API throughput.
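As a rough sketch, the headers can be derived from limiter state in one place so every response stays consistent. `buildRateLimitHeaders` and its parameter names are hypothetical. It assumes `RateLimit-Reset` carries the seconds remaining in the window, as the IETF draft describes; if your API documents a Unix timestamp instead (common with the `X-RateLimit-Reset` convention), emit that value.

```javascript
// Build rate limit headers from limiter state (illustrative helper).
// resetAtMs: when the window resets; nowMs: current time (both in ms).
function buildRateLimitHeaders({ limit, remaining, resetAtMs, nowMs }) {
  const secondsLeft = Math.max(0, Math.ceil((resetAtMs - nowMs) / 1000))
  const headers = {
    "RateLimit-Limit": String(limit),
    "RateLimit-Remaining": String(Math.max(0, remaining)),
    "RateLimit-Reset": String(secondsLeft),
  }
  // Only rejected requests need Retry-After.
  if (remaining <= 0) headers["Retry-After"] = String(secondsLeft)
  return headers
}

console.log(buildRateLimitHeaders({ limit: 100, remaining: 0, resetAtMs: 45000, nowMs: 0 }))
// Includes "Retry-After": "45" because the quota is exhausted
```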
Implementing Rate Limiting (Node.js + Redis)
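// Express.js rate limiter using Redis (Token Bucket)
const express = require("express")
const Redis = require("ioredis")

const app = express()
const redis = new Redis()

// Returns a middleware. `scope` keeps buckets for different endpoints separate.
function rateLimit({ scope = "global", limit = 100, refillRate = 10 } = {}) {
  // limit: max tokens (burst capacity); refillRate: tokens added per second
  const ttlSec = Math.ceil(limit / refillRate) * 2 // Keys outlive a full refill
  return async function (req, res, next) {
    const clientId = req.ip // Or req.user.id for authenticated users
    const key = `ratelimit:${scope}:${clientId}`
    const now = Date.now()
    try {
      // NOT atomic: concurrent requests can race between this read and the
      // write below. For strict enforcement, move this logic into a Lua script.
      const [tokens, lastRefill] = await redis.mget(
        `${key}:tokens`, `${key}:lastRefill`
      )
      // Missing keys mean a new client: start with a full bucket.
      // (`parseFloat(tokens) || limit` would refill a bucket holding exactly 0.)
      let currentTokens = tokens === null ? limit : parseFloat(tokens)
      const lastTime = lastRefill === null ? now : parseInt(lastRefill, 10)
      const elapsed = (now - lastTime) / 1000
      // Refill tokens
      currentTokens = Math.min(limit, currentTokens + elapsed * refillRate)
      if (currentTokens < 1) {
        const retryAfter = Math.ceil((1 - currentTokens) / refillRate)
        res.set("Retry-After", String(retryAfter))
        res.set("RateLimit-Limit", String(limit))
        res.set("RateLimit-Remaining", "0")
        return res.status(429).json({ error: "Too many requests", retryAfter })
      }
      // Consume token
      currentTokens -= 1
      await redis.mset(
        `${key}:tokens`, currentTokens,
        `${key}:lastRefill`, now
      )
      await redis.expire(`${key}:tokens`, ttlSec)
      await redis.expire(`${key}:lastRefill`, ttlSec)
      res.set("RateLimit-Limit", String(limit))
      res.set("RateLimit-Remaining", String(Math.floor(currentTokens)))
      next()
    } catch (err) {
      next(err) // Let Express error handling deal with Redis failures
    }
  }
}
// Apply to all routes
app.use(rateLimit())
// Or per-endpoint with stricter limits: burst of 5, refilled at 5 tokens per minute
// (loginHandler is your own route handler, defined elsewhere)
app.post("/auth/login", rateLimit({ scope: "login", limit: 5, refillRate: 5 / 60 }), loginHandler)
Client-Side: Handling 429 Responses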
Good clients respect rate limits. Here is how to implement exponential backoff with jitter:
// Exponential backoff with jitter
async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options)
    if (response.status !== 429) return response
    if (attempt === maxRetries) break // Out of retries: do not wait again
    // Rate limited: wait and retry
    // Retry-After can also be an HTTP date; only the seconds form is handled here
    const retryAfterSec = parseInt(response.headers.get("Retry-After"), 10)
    const waitTime = Number.isNaN(retryAfterSec)
      ? Math.min(1000 * Math.pow(2, attempt) + Math.random() * 1000, 30000)
      : retryAfterSec * 1000
    console.log(`Rate limited. Retrying in ${waitTime}ms (attempt ${attempt + 1})`)
    await new Promise(resolve => setTimeout(resolve, waitTime))
  }
  throw new Error("Max retries exceeded: still rate limited")
}
// Usage (inside an async function, or an ES module with top-level await)
const response = await fetchWithRetry("/api/users", {
  headers: { "Authorization": "Bearer token..." }
})
const data = await response.json()
Rate Limiting vs Throttling
| Aspect | Rate Limiting | Throttling |
|---|---|---|
| Action taken | Rejects excess requests (429) | Delays/queues excess requests |
| User experience | Immediate failure feedback | Slower responses (waiting in queue) |
| Server load | Rejected requests are cheap (one counter check) | Queued requests still consume memory |
| Best for | APIs, microservices, public endpoints | Internal queues, background jobs |
| Implementation | Counter/token check → accept or reject | Queue with controlled drain rate |
| Client awareness | Client knows immediately (429) | Client may not know it's being throttled |
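To make the contrast concrete, here is a minimal throttling sketch: instead of returning a verdict, it computes how long each request should be delayed so that requests start at a fixed spacing. The `Throttle` class and the explicit `nowMs` argument are illustrative assumptions.

```javascript
// Throttling sketch (illustrative): delay requests instead of rejecting them.
class Throttle {
  constructor(ratePerSec) {
    this.intervalMs = 1000 / ratePerSec
    this.nextFreeMs = 0 // Earliest time the next request may start
  }

  // Returns the delay in ms to apply to a request arriving at nowMs.
  delayFor(nowMs) {
    const startMs = Math.max(nowMs, this.nextFreeMs)
    this.nextFreeMs = startMs + this.intervalMs
    return startMs - nowMs
  }
}

const throttle = new Throttle(2) // 2 requests per second
const delays = [0, 0, 0, 0].map(t => throttle.delayFor(t))
console.log(delays) // [ 0, 500, 1000, 1500 ]
```

Nothing is rejected, but the fourth caller waits 1.5 seconds, which is the "slower responses" trade-off from the table.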
Common Rate Limiting Mistakes
⚠️ No rate limiting at all
Many APIs launch without any rate limiting. One aggressive scraper or accidental infinite loop in a client can bring down the entire service.
✅ Fix: Add basic rate limiting from day one. Even a generous limit (1000/min) protects against catastrophic abuse.
⚠️ Per-server counters without shared storage
If you have 4 servers, a client can send 4x the intended limit by hitting each server separately. Each server's local counter only sees 1/4 of the traffic.
✅ Fix: Use Redis or a centralized store for rate limit counters. All servers share the same count per client.
⚠️ Not returning Retry-After header
Without Retry-After, clients don't know when to retry. They either hammer the API immediately (worsening load) or wait too long (poor UX).
✅ Fix: Always include Retry-After in 429 responses. Clients can programmatically respect it.
⚠️ Same limits for all endpoints
Login endpoints need much stricter limits (5/min) than read endpoints (1000/min). Applying the same limit everywhere either leaves sensitive endpoints exposed or makes the API unusable.
✅ Fix: Set per-endpoint limits: strict for auth/payment, generous for reads, moderate for writes.
⚠️ Rate limiting by IP only
Behind corporate NATs, thousands of users share one IP. Rate limiting by IP blocks entire offices. Also, attackers can distribute across many IPs.
✅ Fix: Use API key or user ID for authenticated requests. Use IP only for unauthenticated/anonymous traffic.
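The last two fixes can be sketched together: look up a limit per endpoint, and choose which identity to count against. The route prefixes, the `req.apiKey` field, and the default of 100/min for writes are assumptions for illustration. Only the 5/min and 1000/min figures come from the text above.

```javascript
// Illustrative limits; the write default is an assumed "moderate" value.
const ENDPOINT_LIMITS = [
  { method: "POST", prefix: "/auth/", limit: 5, windowSec: 60 },
  { method: "GET", prefix: "/", limit: 1000, windowSec: 60 },
]
const DEFAULT_LIMIT = { limit: 100, windowSec: 60 }

// Choose the limit for a request: the first matching rule wins.
function limitFor(method, path) {
  const rule = ENDPOINT_LIMITS.find(
    r => r.method === method && path.startsWith(r.prefix)
  )
  return rule ? { limit: rule.limit, windowSec: rule.windowSec } : DEFAULT_LIMIT
}

// Choose who to count against: API key, then user ID, then IP as a fallback.
function rateLimitKey(req) {
  if (req.apiKey) return `key:${req.apiKey}`
  if (req.user && req.user.id) return `user:${req.user.id}`
  return `ip:${req.ip}` // Anonymous traffic only
}

console.log(limitFor("POST", "/auth/login").limit) // 5
console.log(limitFor("GET", "/users/42").limit) // 1000
console.log(rateLimitKey({ ip: "203.0.113.7" })) // ip:203.0.113.7
console.log(rateLimitKey({ ip: "203.0.113.7", user: { id: 42 } })) // user:42
```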
Best Practices
Store counters in Redis: it offers atomic operations (INCR, EXPIRE), sub-millisecond performance, and shared state across all server instances.
Return HTTP 429: it is the standard status code for rate limiting, so clients can distinguish "too many requests" from "forbidden" or "server error".
Send rate limit headers: they let clients see their remaining quota proactively, so they can slow down before hitting the limit.
Prefer Token Bucket for most APIs: its natural burst support matches real user behavior. Users are idle, then active in bursts, and Token Bucket allows this.
Tier limits by plan, for example Free: 60/hour, Pro: 1000/hour, Enterprise: 10000/hour. This incentivizes upgrades while protecting against abuse.
Tighten limits on sensitive endpoints, for example login: 5/min, password reset: 3/hour, payment: 10/min. These endpoints need much lower thresholds.
Degrade gracefully: when near capacity, serve cached responses or simplified data instead of hard 429s for read endpoints.
Monitor violations: track which clients hit limits frequently. Repeated violations may indicate abuse or a broken client.
Document your limits: publish them in the API docs so developers can design clients that respect them from the start.
Add jitter to retries: if all rate-limited clients retry at exactly Retry-After seconds, they all hit the API simultaneously. Random jitter spreads the retries out.
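The jitter described in the last item can be sketched as a small client-side helper. `retryDelayMs` is a hypothetical name, and the one-second jitter ceiling is an arbitrary assumption. The `random` parameter is injectable only so the behavior can be checked deterministically.

```javascript
// Wait for Retry-After plus a random extra delay so clients do not retry in lockstep.
function retryDelayMs(retryAfterSec, maxJitterMs = 1000, random = Math.random) {
  return retryAfterSec * 1000 + Math.floor(random() * maxJitterMs)
}

console.log(retryDelayMs(45, 1000, () => 0)) // 45000
console.log(retryDelayMs(45, 1000, () => 0.5)) // 45500
console.log(retryDelayMs(45)) // Somewhere from 45000 to 45999
```

The jitter is only ever added, never subtracted, so the client still honors the server's minimum wait.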
Real-World Rate Limiting Examples
GitHub API
Unauthenticated: 60 req/hour. Authenticated: 5,000 req/hour. Search: 30 req/min. Uses Token Bucket with per-user tracking.
Headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
Stripe API
100 read requests/sec, 25 write requests/sec per API key. Higher limits available for production accounts.
Headers: Standard RateLimit headers + detailed error messages
OpenAI API
Tiered by model and plan. GPT-4: 10,000 tokens/min (free), 1M tokens/min (paid). Rate limits on both requests AND tokens.
Headers: x-ratelimit-limit-requests, x-ratelimit-limit-tokens, x-ratelimit-remaining-*
Twitter/X API
App-level and user-level limits. Tweets: 300/3h per user. Read: varies by tier (Free: 1500/month). Uses per-endpoint windows.
Headers: x-rate-limit-limit, x-rate-limit-remaining, x-rate-limit-reset
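Because each provider above names its headers differently, a client that talks to several APIs may want to normalize them. The sketch below assumes `headers` is a plain object with lowercase names. `remainingRequests` is a hypothetical helper, and the name list covers only the conventions mentioned in this section.

```javascript
// Read the "remaining requests" value regardless of naming convention (illustrative).
const REMAINING_NAMES = [
  "ratelimit-remaining", // IETF draft style
  "x-ratelimit-remaining", // GitHub style
  "x-rate-limit-remaining", // Twitter/X style
  "x-ratelimit-remaining-requests", // OpenAI style
]

function remainingRequests(headers) {
  for (const name of REMAINING_NAMES) {
    if (headers[name] !== undefined) {
      const value = parseInt(headers[name], 10)
      if (!Number.isNaN(value)) return value
    }
  }
  return null // No recognizable header
}

console.log(remainingRequests({ "x-ratelimit-remaining": "4999" })) // 4999
console.log(remainingRequests({ "x-rate-limit-remaining": "12" })) // 12
console.log(remainingRequests({})) // null
```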
Conclusion
Rate limiting is not optional for production APIs — it's a fundamental requirement for security, stability, and fair usage. Without it, a single misbehaving client can bring down your entire service. With proper rate limiting, your API remains fast and available for all users even under heavy load or attack.
For most REST APIs, start with Token Bucket (supports bursts, easy to implement with Redis) and return proper HTTP 429 responses with Retry-After headers. Set different limits per endpoint (strict for auth, generous for reads), per user tier (free vs paid), and monitor violations. Implement this from day one — retrofitting rate limiting into a production API under attack is much harder than building it in from the start.
