What is LLM Red-Teaming?
Updated
Adversarial testing of an AI model — attempting prompt injection, jailbreak, data exfiltration, harmful-content generation. Securie's RedTeamSpecialist + offensive-swarm SKU covers this.
Full explanation
Red-teaming an LLM tests its alignment + safety mitigations. Common probes: prompt injection, jailbreak via persona ('pretend you are DAN'), token-level adversarial inputs, indirect injection. Continuous red-teaming = re-running probes on every model/prompt update via a CI gate.
Example
Securie's prompt-inj-corpus.jsonl carries 500+ adversarial prompts; CI gate fires on any drop below 0.90 resistance score on the agent stack. Continuous red-team via the offensive-swarm SKU.
Related
FAQ
Manual or automated?
Both. Automated continuous (Securie's CI gate); manual deep-engagement quarterly.