ProAttack Research Reveals LLM Backdoors Can Be Planted with Just Six Samples

Security researchers have discovered ProAttack, a prompt-based backdoor attack that can compromise LLMs with as few as six poisoned training samples while bypassing every major defense mechanism.

The attack assigns a malicious prompt to a subset of training samples while leaving all labels correct and text natural. Four established defenses--ONION, SCPD, back-translation, and fine-pruning--all failed to eliminate the attack.

Attack success rates approach 100% across multiple benchmarks. "Users in real-world applications often adopt publicly available or shared prompt templates, making poisoned prompts a genuine vulnerability vector."

The sectors most at risk include finance, healthcare, and governance--where AI failures carry the highest stakes.

“The efficiency of this attack fundamentally changes the threat model for LLM training.”

— Nicholas Carlini, Research Scientist, Google DeepMind

Samples needed

97%

Attack success rate

0.001%

Training data ratio