New Research Exposes a Major Blind Spot in AI Security

Think like a perpetrator.

Lily Morris

Contributing Writer

New research from Cisco challenges one of the AI industry’s most common assumptions: that a single adversarial prompt can accurately measure model safety.

In a paired evaluation of 15 proprietary frontier models, Cisco compared standard one-shot testing with multi-turn conversational attacks designed to mimic how real adversaries behave.

Most public safety benchmarks rely on a single malicious prompt followed by one response. Real attackers iterate. They retry requests, change framing, adopt personas, and escalate conversations over time.

Cisco found that conversational attacks often produced very different outcomes from standard single-turn testing, with every model in the cohort showing measurable vulnerability under iterative pressure. Multi-turn attack success rates ranged from 7.89% to 88.30% across systems from OpenAI, Anthropic, Google, Amazon, and xAI.

Why It Matters: Language models now operate inside enterprise software, customer systems, developer workflows, and autonomous agents. Cisco’s findings suggest that one-shot benchmark scores can leave major gaps in understanding how these systems behave during sustained interaction.

Single-Turn Benchmarks Often Failed to Predict Conversational Resilience: Cisco observed large gaps between single-turn and multi-turn attack success rates across the model cohort. OpenAI’s GPT-5.4 reportedly rose from 2.74% attack success in single-turn testing to 24.68% during multi-turn evaluation, while Google’s Gemini 3 Pro increased from 18.10% to 73.35%. Anthropic’s Claude models posted strong refusal scores in one-shot testing but still showed meaningful exposure under iterative pressure. Amazon’s Nova 2 Lite produced one of the clearest reversals, posting relatively high single-turn vulnerability while recording the lowest multi-turn attack success rate in the group.
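To put those gaps in perspective, the short Python sketch below computes the absolute and relative difference between single-turn and multi-turn attack success rates. It uses only the two pairs of figures quoted above; the function name `asr_gap` and the dictionary layout are illustrative assumptions, not anything taken from Cisco's report.

```python
# Figures quoted in the article: (single-turn ASR %, multi-turn ASR %).
reported_asr = {
    "GPT-5.4": (2.74, 24.68),
    "Gemini 3 Pro": (18.10, 73.35),
}


def asr_gap(single_turn: float, multi_turn: float) -> tuple[float, float]:
    """Return (gap in percentage points, multi-turn / single-turn ratio)."""
    gap_points = round(multi_turn - single_turn, 2)
    ratio = round(multi_turn / single_turn, 2)
    return gap_points, ratio


for model, (single, multi) in reported_asr.items():
    gap_points, ratio = asr_gap(single, multi)
    print(f"{model}: +{gap_points} points, {ratio}x the single-turn rate")
```

On these numbers, GPT-5.4's multi-turn rate is roughly nine times its single-turn rate, and Gemini 3 Pro's is roughly four times, which is the kind of spread a one-shot score alone would not reveal.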

No Tested Frontier Model Resisted Iterative Attacks Consistently: Every proprietary model evaluated in the study gave way in a meaningful percentage of multi-turn attacks. Cisco connects this finding with its earlier open-weight model research, where multi-turn attack success rates climbed well above single-turn baselines and reached 92.78% against Mistral Large-2. The report concludes that iterative vulnerability appears across open and closed systems regardless of alignment philosophy or access model.

Deployment Settings Materially Changed Safety Outcomes in Some Cases: Cisco identified major differences tied to runtime configuration. Enabling reasoning mode on xAI’s Grok 4.1 Fast reportedly reduced multi-turn attack success rates from 88.30% to 43.47% under identical testing conditions. The report argues that settings tied to reasoning behavior, guardrails, and system prompt handling can significantly alter operational safety profiles.

Many Successful Attacks Relied on Conversational Manipulation: Cisco grouped multi-turn attacks into categories such as persona adoption, contextual ambiguity, refusal reframing, decomposition and reassembly, and incremental escalation. High-performing single-turn attack procedures included “Imposter AI,” soft paraphrasing, and system prompt manipulation. Harmful outputs concentrated around hate speech, profanity, and specialized advice categories.

Cisco Recommends Layered Defenses Outside the Model Itself: The report argues that runtime guardrails, monitoring systems, adversarial testing, red-teaming, and application-level policies are necessary since no tested system maintained strong resistance during sustained attacks. Cisco also recommends publishing attack success rates by strategy family and reviewing models that show large gaps between single-turn and multi-turn performance. The paper connects these recommendations to regulatory efforts including the NIST AI Risk Management Framework and the EU AI Act.
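As a rough illustration of what reporting attack success rates by strategy family could look like in practice, the Python sketch below tallies red-team outcomes per family. The family names come from the categories listed earlier in this article; the function `asr_by_family`, the record format, and the sample outcomes are hypothetical placeholders invented for the example, not Cisco's tooling or data.

```python
from collections import defaultdict


def asr_by_family(results):
    """Aggregate (strategy_family, attack_succeeded) pairs.

    Returns a dict mapping each family to its success rate in percent.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for family, succeeded in results:
        totals[family] += 1
        successes[family] += int(succeeded)
    return {family: 100.0 * successes[family] / totals[family] for family in totals}


# Placeholder outcomes, invented for illustration only (not Cisco's results).
sample_results = [
    ("persona adoption", True),
    ("persona adoption", False),
    ("incremental escalation", True),
    ("incremental escalation", True),
    ("refusal reframing", False),
]

for family, rate in asr_by_family(sample_results).items():
    print(f"{family}: {rate:.1f}% attack success")
```

Breaking results out this way makes it easier to see which conversational tactics a given deployment handles poorly, rather than relying on a single blended score.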

Go Deeper -> Proprietary Problems: No Frontier Model Is Multi-Turn Immune – Cisco

May 27, 2026
