No Frontier AI Model Is Safe Under Iterative Attack, Cisco Research Reveals

Cisco's "Proprietary Problems" report tests 15 closed models across 37,000 adversarial prompts and finds that every major AI system buckles when attackers are allowed to keep trying.

The safety ratings plastered on frontier AI models may be worth considerably less than advertised. That is the headline conclusion of a major new study from Cisco Research released this week, titled Proprietary Problems: No Frontier Model Is Multi-Turn Immune — and the numbers behind it are difficult to dismiss.

Researchers tested 15 closed, proprietary flagship models from OpenAI, Anthropic, Google, Amazon, and xAI between January and April 2026, running 30,090 single-turn adversarial prompts and 6,986 multi-turn attack sequences across 1,456 separate conversations. The verdict: not one model can be characterized as safe when an adversary is allowed to iterate.

"No frontier closed model in this cohort can be characterized as safe under iterative attack," the report states — a finding that directly challenges how AI safety benchmarks are currently designed, marketed, and consumed.

The Benchmark Problem

The core argument is methodological. Most published safety evaluations — including those cited by model providers — rely on single-turn testing: one prompt in, one response out. If the model refuses, it counts as a pass. The problem, Cisco argues, is that real adversaries do not stop at the first refusal.

"Real adversaries iterate," the report notes. They reframe rejected requests, decompose harmful tasks into innocuous-looking subtasks, adopt fictional personas, and escalate gradually across a conversation. A model that declines a harmful prompt in isolation may capitulate readily after three or four conversational nudges — and no standard benchmark currently captures that dynamic.

The numbers tell the story. Across all models tested, single-turn attack success rates (ASR) ranged from 2.19% to 64.91%. Switch to multi-turn, and that range shifts to 7.89% to 88.30%. Eight of the 15 models showed an absolute gap of more than 15 percentage points between the two regimes — meaning that models appearing robust on industry benchmarks performed dramatically worse under realistic attack conditions.

Model-by-Model: The Gap Exposed

The divergence between single-turn and multi-turn performance is striking across every major AI family.

xAI's Grok 4.1 Fast, in its non-reasoning configuration, posted the worst multi-turn ASR in the cohort: 88.3%, up from 34.1% under single-turn conditions — a gap of more than 54 percentage points. Google's Gemini 3 Pro climbed from 18.1% to 73.3%. OpenAI's GPT-5.4 jumped from a single-turn ASR of just 2.7% to 24.7% when attackers were given room to persist — roughly a ninefold increase.

Anthropic's Claude models posted the strongest single-turn refusal performance in the cohort, with some variants recording single-turn ASRs in the low single digits. But even Claude Opus 4.6 — one of the hardest-to-break models in the study — climbed from 3.6% to 16.2% under multi-turn conditions. Impressive by comparison, but still a fourfold increase. No model held firm.

One finding with significant practical implications: reasoning mode matters. Grok 4.1 Fast's multi-turn ASR dropped from 88.3% in standard mode to 43.5% when reasoning was activated — a reduction of nearly 45 percentage points. The implication is that chain-of-thought processing may introduce a meaningful, if imperfect, safety buffer, even when the underlying model is otherwise highly vulnerable.

312 Vectors, 71% Hit at Least One

Beyond model-level ASR figures, Cisco tested 312 distinct attack vectors over the January–April 2026 study window. Seventy-one percent of those vectors succeeded against at least one model in the cohort. Twenty-three percent succeeded against all six models tested in comparative runs — meaning that nearly one in four attack approaches reliably defeats every major frontier system when deployed.

Indirect prompt injection, where malicious instructions are smuggled into a model's context via tool outputs or agent pipelines rather than direct user input, proved particularly effective. That vector achieved an 84% success rate — the highest recorded in the study. As AI systems increasingly operate as autonomous agents consuming data from external sources, this attack surface is only growing.

Why This Matters Now

The timing of the report is not incidental. Enterprise AI deployment has accelerated sharply in 2025 and 2026, with agentic systems — those capable of taking actions, browsing the web, and calling external tools autonomously — moving from research demos into production workflows. Cisco's findings suggest that the safety infrastructure underpinning those deployments may be far thinner than organizations assume.

"The gap between published scores and observed resilience misranks leading models," the report concludes, adding that benchmark reliance has created a false sense of security at precisely the moment when AI systems are gaining real-world autonomy and authority.

The researchers' prescription is direct: the industry needs to move away from single-turn-only evaluation regimes. Safety testing should incorporate multi-turn attack sequences, agentic tool-use scenarios, and indirect injection vectors as standard practice — not as edge-case additions to existing frameworks.

Implications for Enterprise Security Teams

For security practitioners, the study surfaces several immediate considerations. First, any AI deployment that surfaces as a customer-facing or employee-facing chat interface should be treated as an iterative attack target, not a static system. Second, agentic pipelines that consume external tool outputs — web search results, database queries, API responses — carry elevated risk given the 84% indirect injection success rate. Third, model selection decisions made on the basis of published safety benchmarks alone should be revisited: multi-turn performance may look very different from single-turn scores, and the study suggests the gap is the rule, not the exception.

Cisco stopped short of naming a safe model or recommending specific mitigations, noting that all 15 systems in the cohort failed under sufficient adversarial pressure. The implicit message is that layered defenses — prompt shields, output filtering, human-in-the-loop checkpoints for high-stakes actions — remain essential supplements to any model's native safety training.

The full report, Proprietary Problems: No Frontier Model Is Multi-Turn Immune, is available via Cisco Blogs and the Cisco AI research portal.

---

Sources: CSO Online · Cybersecurity Dive · Cisco Blogs · Help Net Security · SiliconANGLE · Network World

"No frontier closed model in this cohort can be characterized as safe under iterative attack."

— Cisco Research, Proprietary Problems report

Frontier models tested

71%

Attacks succeeding against at least one model

84%

Indirect prompt injection success rate