AI Safety Benchmarks Miss Real Attack Risk, Cisco Finds

Two professionals reviewing AI security research results late at night in a high-rise office. (AI-generated image)

Every Frontier AI Model Tested Fails Under Real Attack Conditions, Cisco Finds

Clayton Rifkind May 28, 2026

Cisco tested 15 leading AI models using multi-turn attacks, the way real attackers operate, and found attack success rates as high as 88%.

When AI companies test their models for safety, the standard approach is one challenge at a time: an attacker sends a single harmful message, the model responds, and the test ends. This is called a single-turn prompt or single-turn attack. This is how AI safety benchmarks evaluate and compare AI models today. Real cyberattackers do not work that way. They probe, reframe when blocked, and keep pushing back and forth, turn after turn, until they find a way through. A test that mirrors that kind of sustained, adaptive pressure is called a multi-turn attack. Cisco ran that test across 15 of the most widely deployed AI models from every major provider; not one held up.

Key Takeaways

Multi-turn attack success rates reached 88.30% for Grok 4.1 Fast when its reasoning feature was turned off, the highest among the 15 models.
Every model tested showed meaningful vulnerability to multi-turn attacks. The lowest in the group was Amazon Nova 2 Lite at 7.89%.
GPT-5.4 went from a 2.74% single-turn attack success rate to 24.68% under multi-turn testing, a 9x increase.
Gemini 3 Pro rose from 18.10% to 73.35% under multi-turn conditions, a jump of 55 percentage points (pp).
Turning on Grok 4.1 Fast’s reasoning feature cut its multi-turn attack success rate from 88.30% to 43.47%.
8 of 15 models showed a gap greater than 15 pp between single-turn and multi-turn success rates.

Cisco’s AI Threat Intelligence and Security Research Team published the Proprietary Problems report on May 27. The report ran 15 leading AI models from OpenAI, Anthropic, Google, Amazon, and xAI through 30,090 single-turn prompts and 6,986 multi-turn attack sequences. The two types of tests produced different rankings, risk levels, and implications for how companies choose which model to deploy.

The benchmark problem

Safety testing organizations typically employ single-turn testing because it targets specific weak points, allowing companies to focus on and fix them individually. Multi-turn testing accounts for how attackers behave and emulates that behavior. Here, the idea is to see what methods work, not to test individual vulnerabilities. Real attackers reframe when blocked, break requests into smaller pieces spread across multiple exchanges, and escalate until they find a way through.

Cisco’s test ran both approaches side by side. Multi-turn attack success rates across the 15 models ranged from 7.89% to 88.30%, a range 18 pp wider than the single-turn spread of 2.19% to 64.91%. Eight of the 15 models showed a gap of more than 15 pp between the two test types.

No model is clean

Every model in the group showed vulnerability to multi-turn attacks. The lowest multi-turn attack success rate in the study was Amazon Nova 2 Lite at 7.89%. The highest was Grok 4.1 Fast at 88.30%, tested with its reasoning feature turned off. Most of the group fell between 11% and 31%.

Single-turn scores hide risk

The single-turn numbers give a misleading picture for several models. GPT-5.4 has a 2.74% single-turn attack success rate, second-lowest in the group. Under multi-turn testing, it reaches 24.68%, a 9x increase. GPT-5.2 moves from 4.74% to 23.50%. Gemini 3 Pro starts at 18.10% and climbs to 73.35%, a 55 pp jump. Those shifts don’t appear on standard safety tests.

The Amazon models moved in the opposite direction. Nova Lite’s attack success rate dropped by almost 35 pp between single-turn and multi-turn testing; Nova Micro’s dropped by 34 pp.

Grok 4.1 Fast reached an 88.30% multi-turn attack success rate when its reasoning feature was turned off. Reasoning mode is a setting that causes an AI model to work through a request step by step before responding, rather than answering immediately. With that setting turned on, the same model dropped to 43.47%, a reduction of nearly 45 pp from one configuration change. Cisco argues that AI providers should tell customers which settings meaningfully affect security, not just performance.

AI Risk Today AI Safety Benchmarks

(Performance by model – Source: Cisco Proprietary Problems: How Frontier Closed Models Collapse Under Iterative Pressure)

What to do about it

Cisco recommends three steps for frontier AI developers to consider as part of the development and deployment process.

Publish attack success rates, broken down by attack type, for every release, not just a single overall number.
Hold a model back from deployment if it gets worse on the three highest-risk attack approaches (Imposter AI, Soft Paraphrase, and System Prompts) or the three highest-risk content categories (Hate Speech, Profanity, and Specialized Advice).
Three, flag any model showing a gap greater than 15 pp between single-turn and multi-turn attack success rates for manual review before deployment, a threshold that would have flagged eight of the 15 models in this study.

Governance and regulation considerations

NIST’s AI Risk Management Framework and its forthcoming Cyber AI Profile both call for attack-based testing of AI models. The EU AI Act’s Article 15 requires robustness testing for high-risk AI systems starting December 2027. Neither currently specifies multi-turn testing or the attack-type breakdown Cisco advocates.

Clayton Rifkind

Clayton Rifkind is the Founder and Senior Editor of AI Risk Today. He also advises on business development for ESG Today, a leading source of ESG investment news and research for institutional investors and corporate leaders. He has 20+ years of experience in B2B technology, leading strategy and execution of go-to-market plans across software, enterprise platforms, and mobile applications. He founded two consultancies advising startups and Fortune 1000 companies, including Autodesk, Intel, and Microsoft. He began his career in the San Francisco advertising scene working with brands such as Hewlett-Packard, Intel, Microsoft, Symantec, and Wells Fargo. Clayton launched AI Risk Today in 2025 after two decades of watching enterprises adopt transformative technologies, and seeing how often risk, governance, and compliance considerations lagged behind. His reporting draws on primary sources including regulatory filings, court documents, and official announcements, with a focus on what AI developments mean for the executives accountable for managing them. Reach him at Reach him at [email protected] or on LinkedIn.

Essential AI Risk Intelligence

Daily insights on AI governance, regulation, and enterprise risk management. Trusted by Chief Risk Officers and compliance leaders globally.

By subscribing, you agree to receive our daily newsletter. Unsubscribe anytime.