Preventing runaway OpenAI / Anthropic bills — spend caps + rate limits + monitoring
In one documented case, abuse of Claude Opus ran for 4.5 days and cost the victim $50K. Spend caps, rate limits, and monitoring are existential.
A leaked AI key with no spend cap can mean a $50K+ bill within days. Treat caps, rate limits, and monitoring as defense-in-depth.
What it is
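AI provider keys (OpenAI, Anthropic) are billed per call, and those costs compound quickly under sustained use. LLMjacking attackers extract value by running inference on the victim's account until a cap trips; with no cap, nothing stops them until someone notices the bill.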
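To make the compounding concrete, here is a rough sketch that converts the documented case (about $50K over 4.5 days) into an hourly burn rate and estimates how quickly a cap would trip. The function names and the $100 cap are illustrative assumptions, and the arithmetic assumes a constant spend rate, which a real attack need not follow.

```typescript
// Illustrative arithmetic only: assumes spend accrues at a constant rate.

/** Average spend per hour for a bill of `totalUsd` accrued over `days`. */
function hourlyBurnUsd(totalUsd: number, days: number): number {
  return totalUsd / (days * 24);
}

/** Minutes until a spend cap of `capUsd` trips at a given hourly burn rate. */
function minutesUntilCap(capUsd: number, burnUsdPerHour: number): number {
  return (capUsd / burnUsdPerHour) * 60;
}

// Figures from the documented case: roughly $50K over 4.5 days.
const burn = hourlyBurnUsd(50_000, 4.5);
console.log(`~$${burn.toFixed(0)} per hour`);

// Hypothetical $100 cap, chosen only for illustration.
console.log(`cap trips after ~${minutesUntilCap(100, burn).toFixed(0)} minutes`);
```

Under these assumptions the burn works out to roughly $463 per hour, so a $100 cap would stop the abuse after about 13 minutes instead of 4.5 days.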
Vulnerable example
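// No rate limit, no spend cap
import OpenAI from "openai";
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
export async function POST(req: Request) {
  const { prompt } = await req.json();
  // Any caller can trigger a paid model call, as often as they like
  const r = await openai.chat.completions.create({ model: "gpt-5", messages: [{ role: "user", content: prompt }] });
  return Response.json(r);
}
Fixed example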
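import OpenAI from "openai";
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const redis = Redis.fromEnv(); // reads UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN
const limiter = new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(10, "1 m") });
export async function POST(req: Request) {
  // x-forwarded-for can hold a comma-separated proxy chain; the first entry is the client
  const ip = req.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ?? "anon";
  const { success } = await limiter.limit(ip);
  if (!success) return new Response("rate-limit", { status: 429 });
  const { prompt } = await req.json();
  // Plus per-user spend tracking + provider-side cap as backstop
  const r = await openai.chat.completions.create({
    model: "gpt-5-nano", // default to cheaper model
    messages: [{ role: "user", content: prompt }],
  });
  return Response.json(r);
}
How Securie catches it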
apps/web/app/api/route.ts:22 Preventing runaway OpenAI / Anthropic bills
The cost-firewall crate enforces per-tenant and per-feature spend caps; the rate-limit specialist flags missing rate limits on paid-API routes; secret_scanner catches leaked keys.
Checklist
- Per-IP + per-user rate limits
- Vendor-side spend cap (OpenAI Limits page)
- Cheapest-model default
- Spend monitoring alerts
- Securie cost-firewall enabled
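The per-user spend tracking and monitoring alerts in the checklist can be sketched as follows. This is a minimal in-memory illustration, not Securie's cost-firewall or any provider's API: `SpendTracker`, `callCostUsd`, the 80% alert threshold, and all prices passed in are hypothetical names and values. A real deployment would keep totals in a shared store and reset them per billing period.

```typescript
// Hypothetical in-memory spend tracker. State lives in one process only;
// production code would persist totals in a shared store such as Redis.

type SpendDecision = "ok" | "alert" | "blocked";

class SpendTracker {
  private spentUsd = new Map<string, number>();
  private capUsd: number;
  private alertFraction: number;

  // capUsd: hard per-user cap; alertFraction: share of the cap that triggers an alert.
  constructor(capUsd: number, alertFraction: number = 0.8) {
    this.capUsd = capUsd;
    this.alertFraction = alertFraction;
  }

  /** Check before calling the provider: blocked users should never reach it. */
  check(userId: string): SpendDecision {
    return this.classify(this.spentUsd.get(userId) ?? 0);
  }

  /** Record the cost of a finished call and report where the user now stands. */
  record(userId: string, costUsd: number): SpendDecision {
    const total = (this.spentUsd.get(userId) ?? 0) + costUsd;
    this.spentUsd.set(userId, total);
    return this.classify(total);
  }

  private classify(total: number): SpendDecision {
    if (total >= this.capUsd) return "blocked";
    if (total >= this.capUsd * this.alertFraction) return "alert";
    return "ok";
  }
}

/** Cost of one call, given token counts and caller-supplied prices per million tokens. */
function callCostUsd(
  inputTokens: number,
  outputTokens: number,
  usdPerMInput: number,
  usdPerMOutput: number,
): number {
  return (inputTokens * usdPerMInput + outputTokens * usdPerMOutput) / 1_000_000;
}
```

In a route handler, one plausible flow is to call `check` before the provider request, return a 429 when the result is "blocked", and call `record` afterwards with a cost derived from the token usage the provider reports. The vendor-side cap stays in place as the backstop in case this application-level accounting is bypassed or wrong.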
FAQ
What's a reasonable per-user limit?
It depends on the use case. Start low (10 req/min/user) and raise the limit on demand.
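As an illustration of "start low and raise on demand", here is a minimal sketch of a per-user limiter with per-user overrides. It uses a simple sliding-window log held in memory, which is not the algorithm or API of @upstash/ratelimit; `SlidingWindowLimiter`, `setLimit`, and `allow` are hypothetical names, and the per-process state makes it unsuitable as-is for multi-instance or edge deployments.

```typescript
// Minimal in-memory sliding-window-log limiter with per-user overrides.
// Illustrative only: state is per-process and grows with the number of users.

class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  private overrides = new Map<string, number>();
  private defaultLimit: number;
  private windowMs: number;

  constructor(defaultLimit: number, windowMs: number) {
    this.defaultLimit = defaultLimit;
    this.windowMs = windowMs;
  }

  /** Raise (or lower) the limit for a single user on demand. */
  setLimit(userId: string, limit: number): void {
    this.overrides.set(userId, limit);
  }

  /** Returns true if the request is allowed. `now` is injectable for testing. */
  allow(userId: string, now: number = Date.now()): boolean {
    const limit = this.overrides.get(userId) ?? this.defaultLimit;
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.hits.get(userId) ?? []).filter((t) => now - t < this.windowMs);
    const allowed = recent.length < limit;
    if (allowed) recent.push(now); // rejected requests do not consume quota
    this.hits.set(userId, recent);
    return allowed;
  }
}

// Default from the answer above: 10 requests per minute per user.
const perUserLimiter = new SlidingWindowLimiter(10, 60_000);

// A user with a demonstrated need gets a higher ceiling (value is illustrative).
perUserLimiter.setLimit("power-user-123", 60);
```

Keying on an authenticated user ID rather than an IP avoids penalizing users behind a shared address; the per-IP limit still matters for unauthenticated traffic.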
Related guides
Every route calling OpenAI, Stripe, Anthropic, or another paid vendor needs per-IP and per-user rate limits. Edge-native is best for vibe-coded apps.
Row-Level-Security bypass is the most common data leak in vibe-coded apps. Here is exactly how it happens, how attackers find it, and how to fix it in Next.js + Supabase with one policy update.
BOLA is the top item on the OWASP API Security Top 10 for a reason — every AI coding assistant introduces it by default. Learn what it looks like in Next.js, how to exploit it, and how to fix it.
IDOR is the classic name for an authorization bug where a user can change an ID in a URL and access data they should not see. It is BOLA's older cousin and still ships in half of all new apps.