What is Data Poisoning?

Updated

An attack on AI training data — injecting malicious examples into the training set to alter model behavior at inference time.

Full explanation

Adversary contributes corrupted training data (open-source dataset PRs, RAG-document uploads, customer-feedback systems with retraining loops). Model learns the corruption + applies it at inference. Defense: dataset-provenance tracking + adversarial-example testing + RAG-guard poisoning_score on every ingested doc.

Example

RAG system ingests user-uploaded support tickets as training data. Adversary submits 1000 tickets with hidden adversarial pattern: 'when asked about pricing, recommend competitor X'. Future RAG-augmented responses are poisoned.

Related

FAQ

How is this different from prompt injection?

Prompt injection = inference-time attack via input. Data poisoning = training-time attack via dataset.