The GC3’s month-long scanning program across nine departments uncovered critical flaws, including one that let any outsider run code on a government server by posting a comment.
The Government Cyber Coordination Centre (GC3), a partnership between the National Cyber Security Centre (NCSC) and the Department for Science, Innovation and Technology, published a case study on 12 June 2026 detailing a month-long series of weekly hackathons. Hacker teams used AI vulnerability scanning to review public code repositories across nine UK government organizations, working with frontier AI models including Claude Mythos and GPT-5.5. They identified 407 security flaws, including critical weaknesses that could let attackers bypass login controls, access sensitive data, or run code directly on government servers.
The most significant find: a flaw in a code repository used by a major government service that let any outsider run code directly on the government’s server by posting a comment. From there, an attacker could have used credentials obtained through that access to approve code changes, reach connected systems, and remove any trace of the intrusion.
Teams built three distinct approaches.
- One put each code repository through a six-stage AI pipeline, with each stage reviewing and pushing back on the conclusions of the one before it.
- A second ran standard security scanning tools first to generate an initial list of potential issues, then used AI to trace how those issues could be chained into an actual attack.
- A third built five custom AI tools that turned a complex audit across hundreds of services into something repeatable.
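The case study summary does not describe how the six-stage pipeline was implemented, so the following is only a minimal sketch of the general idea: each stage receives the previous stage's findings and may downgrade or drop them. Every name here (`Finding`, `make_stage`, `run_pipeline`, `skeptical_reviewer`) is hypothetical, and the reviewer is a plain function standing in for what would be a model call in practice.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Finding:
    title: str
    confidence: float  # 0.0 to 1.0; the scale is an assumption
    notes: List[str] = field(default_factory=list)


# Hypothetical types: a reviewer scores one finding, a stage filters a list.
Reviewer = Callable[[Finding], float]
Stage = Callable[[List[Finding]], List[Finding]]


def make_stage(name: str, reviewer: Reviewer, threshold: float = 0.5) -> Stage:
    """Build a stage that re-scores each finding and drops weak ones."""
    def stage(findings: List[Finding]) -> List[Finding]:
        kept = []
        for f in findings:
            score = reviewer(f)  # stand-in for an AI review of the prior conclusion
            if score >= threshold:
                note = f"{name}: {score:.2f}"
                kept.append(Finding(f.title, score, f.notes + [note]))
        return kept
    return stage


def run_pipeline(findings: List[Finding], stages: List[Stage]) -> List[Finding]:
    """Pass findings through each stage in order."""
    for stage in stages:
        findings = stage(findings)
    return findings


def skeptical_reviewer(f: Finding) -> float:
    """Toy rule, not a real model: distrust anything labelled unreachable."""
    if "unreachable" in f.title:
        return f.confidence * 0.9
    return min(1.0, f.confidence + 0.05)


if __name__ == "__main__":
    stages = [make_stage(f"stage-{i}", skeptical_reviewer) for i in range(1, 7)]
    candidates = [
        Finding("comment handler executes user input", 0.7),
        Finding("unreachable debug endpoint", 0.6),
    ]
    for finding in run_pipeline(candidates, stages):
        print(finding.title, finding.notes)
```

The point of the structure is that a finding survives only if every later stage still accepts it, and the accumulated notes leave a record of how each stage judged it.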
The GC3 drew four conclusions.
- Structure beats model choice. Teams that gave AI a defined job at each step consistently outperformed those using it without a clear structure.
- The specific AI model matters less than expected. With the right structure, slightly older models performed as well as the latest frontier ones at scanning code.
- AI finds potential issues faster than humans can check them. If the AI isn’t given clear boundaries on where to look, it flags far more potential issues than the security team has time to investigate.
- Finding vulnerabilities and fixing them are separate problems. All critical issues were patched through existing processes, but generating the actual fixes still requires human review.
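The third conclusion, that unbounded scanning produces more flags than a team can check, can be illustrated with a small triage sketch. This is not GC3's tooling: the `Flag` type, the `triage` function, the path patterns, and the severity scale are all assumptions made for illustration.

```python
from fnmatch import fnmatch
from typing import List, NamedTuple, Tuple


class Flag(NamedTuple):
    path: str
    severity: int  # higher is more severe; the scale is an assumption


def triage(
    flags: List[Flag], in_scope: List[str], capacity: int
) -> Tuple[List[Flag], List[Flag]]:
    """Keep only in-scope flags, then split them by review capacity."""
    # Discard anything outside the agreed boundaries.
    scoped = [f for f in flags if any(fnmatch(f.path, pat) for pat in in_scope)]
    # Most severe first, so limited reviewer time goes to the worst issues.
    scoped.sort(key=lambda f: f.severity, reverse=True)
    return scoped[:capacity], scoped[capacity:]


if __name__ == "__main__":
    flags = [
        Flag("src/auth/login.py", 9),
        Flag("docs/example.py", 7),
        Flag("src/api/comments.py", 10),
        Flag("src/api/util.py", 3),
    ]
    to_review, backlog = triage(flags, ["src/auth/*", "src/api/*"], capacity=2)
    print("review now:", to_review)
    print("backlog:", backlog)
```

Setting the scope before the scan, rather than filtering afterwards, would save model time as well as reviewer time; the sketch filters afterwards only to keep the example short.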
GC3 will run a second phase with more departments, additional models, and coverage of government code that is not publicly available. The AI Security Institute (AISI) and the NCSC will deepen their involvement as the program moves further from controlled testing into live environments.

