What is AI Red Teaming?
Adversarial testing of an AI system — LLM, agent, multimodal model — to discover safety, security, and alignment failures before adversaries do. Distinct from generic LLM red-teaming (which is one subcase) by covering agentic + multimodal + supply-chain attack surfaces.
Full explanation
AI red teaming applies offensive-security practice to AI systems. Common probes: prompt injection (direct + indirect), jailbreak, training-data inference, model extraction, agent privilege escalation, multimodal injection (image / audio steganography), tool-use abuse, RAG poisoning. The 2026 OWASP Gen AI Security Project's Q2 2026 landscape report distinguishes AI red teaming from prompt-injection testing (red teaming is the lifecycle program; prompt-injection testing is one technique). Securie's RedTeamSpecialist + offensive-swarm SKU cover continuous AI red teaming with sandbox-scope guards.
Example
A team launching a customer-support agent runs a red-teaming engagement: 100 adversarial prompts (LLM01), 20 multimodal-injection PDFs (LLM03 + LLM06), 50 indirect-injection URL-fetch tests (lethal trifecta), 30 tool-poisoning MCP scenarios (LLM07). Pass rate is logged + tracked per release. CI gate fires on any drop below 0.90 prompt-injection-resistance score.
Related
FAQ
How is this different from LLM red-teaming?
LLM red-teaming targets the model in isolation. AI red-teaming covers the whole system: model + agent loop + tool catalog + RAG + multimodal inputs. The OWASP Gen AI Q2 2026 landscape distinguishes them explicitly.
Manual or automated?
Both. Automated continuous (Securie's CI gate, OpenAI's auto-RL approach for ChatGPT Atlas); manual deep-engagement quarterly. Manual finds novel classes; automated catches regressions.