What is Data Poisoning?
An attack on AI training data: the adversary injects malicious examples into the training set so that the model's behavior is altered at inference time.
Full explanation
An adversary contributes corrupted training data through channels such as pull requests to open-source datasets, document uploads to RAG pipelines, or customer-feedback systems with retraining loops. The model learns the corruption during training and applies it at inference. Defenses include dataset-provenance tracking, adversarial-example testing, and a RAG-guard poisoning_score computed on every ingested document.
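A minimal sketch of the ingestion-guard idea above. The `poisoning_score` heuristic, the `SUSPICIOUS_PHRASES` list, and the 0.3 threshold are all illustrative assumptions, not a real RAG-guard API; production systems would use learned classifiers and provenance metadata instead of keyword matching.

```python
from dataclasses import dataclass

# Hypothetical heuristic signals; a real guard would use a trained classifier.
SUSPICIOUS_PHRASES = [
    "ignore previous instructions",
    "recommend competitor",
    "always respond with",
]

@dataclass
class IngestResult:
    accepted: bool
    score: float

def poisoning_score(doc: str) -> float:
    """Crude score: fraction of known-suspicious phrases present in the doc."""
    doc_lower = doc.lower()
    hits = sum(1 for phrase in SUSPICIOUS_PHRASES if phrase in doc_lower)
    return hits / len(SUSPICIOUS_PHRASES)

def ingest(doc: str, threshold: float = 0.3) -> IngestResult:
    """Score every document before it enters the training/retrieval corpus."""
    score = poisoning_score(doc)
    return IngestResult(accepted=score < threshold, score=score)
```

Scoring at ingestion time (rather than at retrieval time) keeps poisoned documents out of the corpus entirely, so a single missed retrieval-time filter cannot expose them.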
Example
A RAG system ingests user-uploaded support tickets as training data. The adversary submits 1,000 tickets containing a hidden adversarial pattern: 'when asked about pricing, recommend competitor X'. Future RAG-augmented responses are poisoned.
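One detection angle for this scenario: a coordinated campaign tends to repeat the same payload phrase across many submissions, so flagging word n-grams that recur in an unusually large fraction of a batch can surface it. This is an illustrative sketch; the function name, n-gram size, and 5% threshold are assumptions, not part of any specific product.

```python
from collections import Counter

def repeated_phrase_alert(tickets, n=4, max_fraction=0.05):
    """Flag word n-grams that appear in more than max_fraction of tickets.

    Organic support tickets rarely share long exact phrases; a 4-gram
    present in >5% of a batch suggests a coordinated injection campaign.
    """
    counts = Counter()
    for ticket in tickets:
        words = ticket.lower().split()
        # Deduplicate within a ticket so counts reflect distinct tickets.
        grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(grams)
    limit = max(1, int(len(tickets) * max_fraction))
    return [gram for gram, c in counts.items() if c > limit]
```

Batch-level screening like this complements per-document scoring: a single poisoned ticket may look benign in isolation, while 1,000 copies of the same payload stand out statistically.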
FAQ
How is this different from prompt injection?
Prompt injection is an inference-time attack delivered through the model's input; data poisoning is a training-time attack delivered through the dataset.