
UK Government Finds 407 Cyber Vulnerabilities Using Frontier AI

The GC3’s month-long scanning program across nine departments uncovered critical flaws, including one that let any outsider run code on a government server by posting a comment.

 

The Government Cyber Coordination Centre (GC3), a partnership between the National Cyber Security Centre (NCSC) and the Department for Science, Innovation and Technology, published a case study on 12 June 2026 detailing a month-long series of weekly hackathons. Hacker teams used AI vulnerability scanning to review public code repositories across nine UK government organizations, working with frontier AI models including Claude Mythos and GPT-5.5. They identified 407 security flaws, including critical weaknesses that could let attackers bypass login controls, access sensitive data, or run code directly on government servers.

The most significant find was a flaw in a code repository used by a major government service: any outsider could run code directly on the government’s server by posting a comment. From there, an attacker could have used credentials available on that server to approve code changes, reach connected systems, and remove any trace of the intrusion.

 

Teams built three distinct approaches.

  • One put each code repository through a six-stage AI pipeline, with each stage reviewing and pushing back on the conclusions of the one before it.
  • A second ran standard security scanning tools first to generate an initial list of potential issues, then used AI to trace how those issues could be chained into an actual attack.
  • A third built five custom AI tools that turned a complex audit across hundreds of services into something repeatable.
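
The first approach can be pictured with a minimal sketch. The case study as summarized here does not describe what the six stages were, so the two stages below, the Finding fields, and the sample data are all hypothetical, and plain functions stand in for model calls. The only idea carried over is that each stage reviews what the previous one passed along and can object to it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Finding:
    """One candidate issue carried through the pipeline (hypothetical shape)."""
    title: str
    path: str
    evidence: str = ""
    objections: List[str] = field(default_factory=list)


# Assumed stage signature: take the previous stage's findings and
# return them with any objections appended.
Stage = Callable[[List[Finding]], List[Finding]]


def require_evidence(findings: List[Finding]) -> List[Finding]:
    # Stand-in for a model call that challenges unsupported claims.
    for f in findings:
        if not f.evidence:
            f.objections.append("no supporting evidence")
    return findings


def exclude_test_code(findings: List[Finding]) -> List[Finding]:
    # Stand-in for a model call that checks whether the code is deployed.
    for f in findings:
        if f.path.startswith("tests/"):
            f.objections.append("test-only code")
    return findings


def run_pipeline(findings: List[Finding], stages: List[Stage]) -> List[Finding]:
    """Run stages in order; findings with any objection are dropped before the next stage."""
    for stage in stages:
        findings = [f for f in stage(findings) if not f.objections]
    return findings


# Made-up candidates, for illustration only.
candidates = [
    Finding("Missing authorization check", "app/views.py", evidence="handler skips the role check"),
    Finding("Hardcoded token", "tests/fixtures.py", evidence="literal token in file"),
    Finding("Possible injection", "app/db.py"),
]
survivors = run_pipeline(candidates, [require_evidence, exclude_test_code])
print([f.title for f in survivors])
```

In a real pipeline each stage would prompt a model with one narrow job, which matches the GC3's point that a defined role at each step matters more than the model used.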

 

The GC3 drew four conclusions.

  1. Structure beats model choice. Teams that gave AI a defined job at each step consistently outperformed those using it without a clear structure.
  2. The specific AI model matters less than expected. With the right structure, slightly older models performed as well as the latest frontier ones at scanning code.
  3. AI finds potential issues faster than humans can check them. If the AI isn’t given clear boundaries on where to look, it flags far more potential issues than the security team has time to investigate.
  4. Finding vulnerabilities and fixing them are separate problems. All critical issues were patched through existing processes, but generating the actual fixes still requires human review.
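
The third conclusion, that unbounded scanning outruns human review, can be illustrated with a short sketch. Nothing here comes from the case study itself: the Flag type, the severity scale, the path prefixes, and the capacity figure are assumptions chosen only to show scope limits and a review budget applied before flagged issues reach the security team.

```python
from typing import List, NamedTuple, Sequence


class Flag(NamedTuple):
    path: str
    severity: int  # assumed scale: higher is worse


def triage(flags: Sequence[Flag], in_scope_prefixes: Sequence[str], review_capacity: int) -> List[Flag]:
    """Keep only in-scope flags, then the most severe ones the team can review."""
    in_scope = [f for f in flags if f.path.startswith(tuple(in_scope_prefixes))]
    in_scope.sort(key=lambda f: f.severity, reverse=True)
    return in_scope[:review_capacity]


# Made-up flags, for illustration only.
flags = [
    Flag("auth/login.py", 9),
    Flag("docs/example.py", 7),
    Flag("auth/session.py", 5),
    Flag("api/upload.py", 8),
]
queue = triage(flags, in_scope_prefixes=["auth/", "api/"], review_capacity=2)
print(queue)
```

With these inputs the queue holds the two highest-severity flags under auth/ and api/, and the docs/ flag never reaches a reviewer.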

 

GC3 will run a second phase with more departments, additional models, and coverage of government code that isn’t publicly available. The AI Security Institute (AISI) and NCSC will deepen their involvement as the program moves further from controlled testing into live environments.

Clayton Rifkind

Clayton Rifkind is the Founder and Senior Editor of AI Risk Today. He also advises on content development for esgtoday.com, a leading source of ESG investment news and research for institutional investors and corporate leaders. He has 20+ years of experience in B2B technology marketing, leading the strategy and execution of go-to-market plans across software, enterprise platforms, and mobile applications. He also founded two marketing consultancies, advising startups and Fortune 1000 companies, including Autodesk, Intel, and Microsoft. Clayton began his career in the San Francisco advertising scene, working with brands such as Hewlett-Packard, Intel, Microsoft, Symantec, and Wells Fargo.
