Authentication endpoints are among the most heavily attacked surfaces of a web application. Sign-in pages face credential stuffing, password reset flows get abused for email bombing, and 2FA verification endpoints are brute-forced to bypass the second factor. A flat rate limit doesn't work: sign-in needs to be much stricter than a session check. We implemented a 5-tier rate limiting system across all 23 authentication endpoints, with each tier calibrated to the sensitivity of the endpoint.
THE 5 TIERS
Each tier defines a maximum number of requests per 15-minute sliding window:
| Tier | Limit | Window | Use Case |
|---|---|---|---|
| Strict | 3 requests | 15 min | Sign-in, forgot-password, verify-2FA |
| Tight | 5 requests | 15 min | Sign-up, reset-password, enable-2FA |
| Standard | 10 requests | 15 min | OAuth callbacks, session checks |
| Relaxed | 20 requests | 15 min | Clear-session, disable-2FA |
| Lenient | 30 requests | 15 min | Setup-2FA, WebAuthn registration options |
The strict tier allows only 3 attempts — enough for a legitimate user who mistyped their password, but not enough for credential stuffing. The lenient tier is for endpoints that are called multiple times during normal flows (like polling 2FA setup status).
ENDPOINT-TO-TIER MAPPING
All 23 auth endpoints mapped to their tiers:
| Endpoint | Tier |
|---|---|
| /api/auth/sign-in | strict (3/15min) |
| /api/auth/forgot-password | strict (3/15min) |
| /api/auth/verify-2fa | strict (3/15min) |
| /api/auth/webauthn/authenticate/verify | strict (3/15min) |
| /api/auth/webauthn/passwordless/verify | strict (3/15min) |
| /api/auth/sign-up | tight (5/15min) |
| /api/auth/reset-password | tight (5/15min) |
| /api/auth/enable-2fa | tight (5/15min) |
| /api/auth/callback/github | standard (10/15min) |
| /api/auth/callback/google | standard (10/15min) |
| /api/auth/oauth/github | standard (10/15min) |
| /api/auth/oauth/google | standard (10/15min) |
| /api/auth/callback/[provider] | standard (10/15min) |
| /api/auth/clear-session | relaxed (20/15min) |
| /api/auth/disable-2fa | relaxed (20/15min) |
| /api/auth/setup-2fa | lenient (30/15min) |
| /api/auth/webauthn/register/verify | lenient (30/15min) |
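One way to keep a mapping like this in code is a lookup keyed by route path. The sketch below is illustrative only: `endpointTiers`, `tierForPath`, and the choice to fall back to `strict` for unmapped routes are assumptions rather than the project's actual code, and only a handful of rows from the table are included.

```typescript
type Tier = "strict" | "tight" | "standard" | "relaxed" | "lenient";

// Hypothetical lookup table holding a subset of the rows in the table above.
const endpointTiers = new Map<string, Tier>([
  ["/api/auth/sign-in", "strict"],
  ["/api/auth/forgot-password", "strict"],
  ["/api/auth/sign-up", "tight"],
  ["/api/auth/clear-session", "relaxed"],
  ["/api/auth/setup-2fa", "lenient"],
]);

// Assumption: every OAuth callback path shares the standard tier,
// mirroring the /api/auth/callback/[provider] row.
const CALLBACK_PREFIX = "/api/auth/callback/";

function tierForPath(path: string): Tier {
  const exact = endpointTiers.get(path);
  if (exact !== undefined) return exact;
  if (path.startsWith(CALLBACK_PREFIX) && path.length > CALLBACK_PREFIX.length) {
    return "standard";
  }
  // Assumption: unmapped auth routes fail closed to the strictest tier.
  return "strict";
}
```

Failing closed means a newly added route is over-protected until someone assigns it a tier, which is usually the safer mistake for auth endpoints.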
SLIDING WINDOW IMPLEMENTATION
The rate limiter uses a sliding window algorithm with in-memory IP tracking via a Map. Each IP gets an array of timestamps, and we count how many fall within the current window:
// src/lib/rate-limit.ts
interface RateLimitConfig {
  maxRequests: number;
  windowMs: number;
}

const ipRequests = new Map<string, number[]>();

export function checkRateLimit(
  ip: string,
  config: RateLimitConfig
): { allowed: boolean; remaining: number; resetAt: number } {
  const now = Date.now();
  const windowStart = now - config.windowMs;

  // Get existing timestamps for this IP, filter to current window
  const timestamps = (ipRequests.get(ip) || [])
    .filter((t) => t > windowStart);

  if (timestamps.length >= config.maxRequests) {
    // Rate limited: calculate when the earliest request expires
    const oldestInWindow = timestamps[0];
    const resetAt = oldestInWindow + config.windowMs;
    return {
      allowed: false,
      remaining: 0,
      resetAt,
    };
  }

  // Allow request, record timestamp
  timestamps.push(now);
  ipRequests.set(ip, timestamps);
  return {
    allowed: true,
    remaining: config.maxRequests - timestamps.length,
    resetAt: now + config.windowMs,
  };
}

// Cleanup stale entries every 5 minutes
setInterval(() => {
  const cutoff = Date.now() - 900000; // 15 minutes
  for (const [ip, timestamps] of ipRequests.entries()) {
    const active = timestamps.filter((t) => t > cutoff);
    if (active.length === 0) {
      ipRequests.delete(ip);
    } else {
      ipRequests.set(ip, active);
    }
  }
}, 300000);
The sliding window is more accurate than fixed windows. A fixed 15-minute window could allow 6 requests in 30 seconds if the requests span the window boundary (3 at the end of one window + 3 at the start of the next). Sliding windows prevent this burst pattern.
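The boundary burst can be reproduced with a small simulation. This is a sketch, not the production limiter: `fixedWindowAllowed` and `slidingWindowAllowed` are hypothetical helpers that take explicit timestamps instead of reading the clock, and each returns how many of the given requests would be admitted.

```typescript
const WINDOW_MS = 15 * 60 * 1000;
const LIMIT = 3;

// Fixed window: a separate counter per bucket, reset at every multiple of windowMs.
function fixedWindowAllowed(times: number[], limit: number, windowMs: number): number {
  const counts = new Map<number, number>();
  let allowed = 0;
  for (const t of times) {
    const bucket = Math.floor(t / windowMs);
    const used = counts.get(bucket) ?? 0;
    if (used < limit) {
      counts.set(bucket, used + 1);
      allowed++;
    }
  }
  return allowed;
}

// Sliding window log: only admitted requests from the last windowMs count.
function slidingWindowAllowed(times: number[], limit: number, windowMs: number): number {
  let log: number[] = [];
  let allowed = 0;
  for (const t of times) {
    log = log.filter((s) => s > t - windowMs);
    if (log.length < limit) {
      log.push(t);
      allowed++;
    }
  }
  return allowed;
}

// Six requests within 30 seconds, straddling the bucket boundary at t = WINDOW_MS.
const burst = [-15000, -10000, -5000, 5000, 10000, 15000].map((d) => WINDOW_MS + d);

const fixedCount = fixedWindowAllowed(burst, LIMIT, WINDOW_MS);     // admits all 6
const slidingCount = slidingWindowAllowed(burst, LIMIT, WINDOW_MS); // admits only 3
```

The fixed-window counter sees two fresh buckets and lets all six through, while the sliding log still holds the first three timestamps when the fourth request arrives.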
PREDEFINED RATE LIMITERS
Each tier is exported as a named preset for easy use in route handlers:
export const rateLimiters = {
  strict:   { maxRequests: 3,  windowMs: 900000 }, // 3 req / 15 min
  tight:    { maxRequests: 5,  windowMs: 900000 }, // 5 req / 15 min
  standard: { maxRequests: 10, windowMs: 900000 }, // 10 req / 15 min
  relaxed:  { maxRequests: 20, windowMs: 900000 }, // 20 req / 15 min
  lenient:  { maxRequests: 30, windowMs: 900000 }, // 30 req / 15 min
};
RESPONSE HEADERS
Every rate-limited response includes standard headers so clients know their status:
import { NextResponse } from "next/server";

function addRateLimitHeaders(
  response: NextResponse,
  result: { remaining: number; resetAt: number },
  config: RateLimitConfig
): NextResponse {
  response.headers.set("X-RateLimit-Limit", String(config.maxRequests));
  response.headers.set("X-RateLimit-Remaining", String(result.remaining));
  // resetAt is in milliseconds; the header carries Unix seconds
  response.headers.set("X-RateLimit-Reset", String(Math.ceil(result.resetAt / 1000)));

  // Retry-After only belongs on rejected (429) responses, not on the last
  // allowed request that happens to leave remaining at 0
  if (response.status === 429) {
    const retryAfter = Math.max(0, Math.ceil((result.resetAt - Date.now()) / 1000));
    response.headers.set("Retry-After", String(retryAfter));
  }
  return response;
}
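The X-RateLimit-* names are a widely used de facto convention; the IETF draft on rate-limit header fields defines unprefixed RateLimit fields instead. Retry-After is a standard HTTP header (RFC 9110):
- X-RateLimit-Limit: maximum requests allowed in the window
- X-RateLimit-Remaining: requests remaining in the current window
- X-RateLimit-Reset: Unix timestamp (in seconds) when the window resets
- Retry-After: seconds until the client should retry (only on 429 responses)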
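On the client side, these headers are enough to decide how long to back off. The following is a minimal sketch under stated assumptions: `secondsUntilRetry` and `parseNonNegative` are hypothetical helpers, `getHeader` stands in for whatever HTTP client is in use, and only the numeric (seconds) form of Retry-After is handled, since that is the form the server code above emits.

```typescript
// Parse a header value as a non-negative number, or null if absent/invalid.
function parseNonNegative(value: string | null): number | null {
  if (value === null || value.trim() === "") return null;
  const n = Number(value);
  return Number.isFinite(n) && n >= 0 ? n : null;
}

// Hypothetical client helper: how many seconds to wait before retrying.
function secondsUntilRetry(
  getHeader: (name: string) => string | null,
  nowMs: number
): number {
  // Prefer Retry-After, which is sent on 429 responses.
  const retryAfter = parseNonNegative(getHeader("Retry-After"));
  if (retryAfter !== null) return Math.ceil(retryAfter);

  // Otherwise fall back to the quota headers: wait only if the quota is spent.
  const remaining = parseNonNegative(getHeader("X-RateLimit-Remaining"));
  const resetAtSeconds = parseNonNegative(getHeader("X-RateLimit-Reset"));
  if (remaining === 0 && resetAtSeconds !== null) {
    return Math.max(0, Math.ceil(resetAtSeconds - nowMs / 1000));
  }
  return 0;
}
```

A frontend could use the returned value to disable the submit button with a countdown instead of letting the user burn further attempts.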
THE 429 RESPONSE
When rate limited, the API returns a clean 429 with the ApiError.tooManyRequests() factory:
// src/lib/api-error-handler.ts
export class ApiError extends Error {
  constructor(
    public status: number,
    message: string
  ) {
    super(message);
  }

  static tooManyRequests(retryAfter?: number): ApiError {
    const msg = retryAfter
      ? `Too many requests. Try again in ${retryAfter} seconds.`
      : "Too many requests. Please try again later.";
    return new ApiError(429, msg);
  }
}

// Usage in route handler:
import { NextRequest } from "next/server";

export async function POST(req: NextRequest) {
  const ip = req.headers.get("x-forwarded-for") || "unknown";
  const result = checkRateLimit(ip, rateLimiters.strict);
  if (!result.allowed) {
    // Convert the reset time (ms) into whole seconds from now
    const retryAfter = Math.ceil((result.resetAt - Date.now()) / 1000);
    throw ApiError.tooManyRequests(retryAfter);
  }
  // ... handle sign-in logic
}
The response body has the shape { "error": "Too many requests. Try again in X seconds." } (or the generic "Please try again later." message when no retry time is available), making it easy for frontend code to display a meaningful message.
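The step that turns a thrown error into that JSON body is not shown above, so here is one possible shape for it. This is a sketch under assumptions: `toErrorResponse`, `isHttpError`, and the plain `ErrorResponse` object are hypothetical, and `HttpError` is a structural stand-in for the `ApiError` class so the example runs without Next.js.

```typescript
// Structural stand-in for ApiError: anything with a status and a message.
interface HttpError {
  status: number;
  message: string;
}

interface ErrorResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

function isHttpError(err: unknown): err is HttpError {
  return (
    typeof err === "object" &&
    err !== null &&
    typeof (err as HttpError).status === "number" &&
    typeof (err as HttpError).message === "string"
  );
}

// Hypothetical handler: map a thrown value to a status, headers, and JSON body.
function toErrorResponse(err: unknown, retryAfterSeconds?: number): ErrorResponse {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (isHttpError(err)) {
    if (err.status === 429 && retryAfterSeconds !== undefined) {
      headers["Retry-After"] = String(retryAfterSeconds);
    }
    return { status: err.status, headers, body: JSON.stringify({ error: err.message }) };
  }
  // Unknown errors get a generic message so internals are not leaked.
  return { status: 500, headers, body: JSON.stringify({ error: "Internal server error" }) };
}
```

In a real route the returned object would be converted into the framework's response type; keeping the mapping in one place ensures every 429 carries the same body shape.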
IP EXTRACTION FROM REVERSE PROXY
Behind Nginx, the client IP comes from the X-Forwarded-For header, not the socket address. We extract the first IP in the chain (the original client):
function getClientIP(req: NextRequest): string {
  const forwarded = req.headers.get("x-forwarded-for");
  if (forwarded) {
    // Take the first IP (original client) from the chain
    return forwarded.split(",")[0].trim();
  }
  return req.headers.get("x-real-ip") || "unknown";
}
Nginx is configured with proxy_set_header X-Forwarded-For $remote_addr; (not $proxy_add_x_forwarded_for) to prevent client-controlled header injection.
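To see why this setting matters when the application trusts the first entry, the two Nginx behaviors can be modeled in a few lines. The helper names here are hypothetical and the addresses come from the reserved documentation ranges; the append rule reflects how $proxy_add_x_forwarded_for adds $remote_addr to whatever header value the client sent.

```typescript
// Models $proxy_add_x_forwarded_for: append the socket address to the
// X-Forwarded-For value supplied by the client (if any).
function appendForwardedFor(clientHeader: string | null, remoteAddr: string): string {
  return clientHeader ? `${clientHeader}, ${remoteAddr}` : remoteAddr;
}

// Models `proxy_set_header X-Forwarded-For $remote_addr;`: the client's
// header value is discarded entirely.
function overwriteForwardedFor(_clientHeader: string | null, remoteAddr: string): string {
  return remoteAddr;
}

// Same rule as getClientIP above: trust the first entry in the chain.
function firstForwardedIp(header: string): string {
  return (header.split(",")[0] ?? "").trim();
}

const realAddr = "203.0.113.7";     // attacker's actual socket address
const spoofedAddr = "198.51.100.99"; // value the attacker puts in the header

// With appending, the first entry is attacker-controlled.
const viaAppend = firstForwardedIp(appendForwardedFor(spoofedAddr, realAddr));
// With overwriting, the first (and only) entry is the real socket address.
const viaOverwrite = firstForwardedIp(overwriteForwardedFor(spoofedAddr, realAddr));
```

Under the append behavior an attacker could rotate the spoofed value on every request and never hit the limit, which is why the overwrite configuration pairs safely with first-entry extraction.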
E2E TEST INTEGRATION
Rate limiting creates a problem for E2E tests — running the test suite would exhaust the strict tier after 3 sign-in attempts. Our solution: bypass form-based login entirely using DB session injection. Test sessions are created directly in the database by the seed script, and the session cookie is set programmatically. The rate limiter never sees these "logins" because they never hit the sign-in endpoint.
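A rough sketch of the idea, with an in-memory Map standing in for the sessions table. Everything here is hypothetical, since the schema and seed script are not shown: the `SessionRow` shape, the `session_token` cookie name, and `seedTestSession` are illustrations only.

```typescript
interface SessionRow {
  token: string;
  userId: string;
  expiresAt: number; // epoch milliseconds
}

interface TestCookie {
  name: string;
  value: string;
  path: string;
}

// Stand-in for the database table the seed script would write to.
const sessionTable = new Map<string, SessionRow>();

// Insert a session row directly and return the cookie a test runner would
// attach to its browser context. No request reaches the sign-in endpoint.
function seedTestSession(
  userId: string,
  token: string,
  nowMs: number,
  ttlMs: number = 60 * 60 * 1000
): TestCookie {
  sessionTable.set(token, { token, userId, expiresAt: nowMs + ttlMs });
  return { name: "session_token", value: token, path: "/" };
}

// Lookup the app would perform when the cookie comes back on a request.
function findValidSession(token: string, nowMs: number): SessionRow | null {
  const row = sessionTable.get(token);
  return row !== undefined && row.expiresAt > nowMs ? row : null;
}
```

Because each test authenticates through the stored row rather than the form, the strict tier's three attempts stay available for the tests that exercise sign-in itself.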
THE RESULTS
- 5 rate limit tiers calibrated to endpoint sensitivity
- 23 auth endpoints protected (100% coverage)
- Sliding window algorithm — prevents burst attacks across window boundaries
- Standard response headers — clients know their remaining quota
- In-memory tracking with automatic cleanup — no Redis dependency for rate limiting
- ApiError.tooManyRequests() — clean 429 responses with retry guidance
Rate limiting is one of those features that's invisible when it works and catastrophic when it's missing. A single brute-force attack can lock out legitimate users, consume server resources, and expose credentials. Five tiers with 15-minute sliding windows give us granular control over the tradeoff between security and usability for each endpoint.



