OpenAI’s models are achieving stronger cybersecurity abilities, prompting the company to implement a layered defense framework to manage how these tools are accessed and used. New internal evaluations show a sharp rise on security benchmarks: model accuracy on cybersecurity tasks jumped from 27% to 76% within a few months.
Based on this performance, OpenAI is planning as though future models could help craft zero-day exploits or support intrusion efforts against secured systems, igniting new efforts to prevent misuse before model release.
OpenAI’s current work is focused on ensuring these powerful capabilities are directed toward supporting defenders.
Initiatives such as trusted access programs, code-auditing tools like Aardvark, and collaboration with security professionals are all designed to help cybersecurity teams detect vulnerabilities and protect infrastructure. These efforts are combined with access restrictions, monitoring systems, and red-teaming practices to reduce the likelihood that AI models are used for malicious purposes.
The company is also working with external organizations to align on evaluations and foster an ecosystem of shared risk awareness and security goals.
Why It Matters: AI models can now contribute meaningfully to cybersecurity tasks that were once reserved for experts. OpenAI’s new safeguards are intended to prevent misuse while ensuring defenders can apply these tools effectively in real-world environments. As AI capabilities grow, careful governance and oversight become central to ensuring AI contributes positively to digital security.
- New Performance Levels Signal Greater Risk and Use Potential: OpenAI’s internal benchmarks indicate that its newer models complete cybersecurity tasks with much higher proficiency than previous versions. This trajectory suggests that future systems may be capable of contributing to, or independently generating, harmful cyber operations. As a result, OpenAI is preparing each new generation of models under the assumption that it may qualify as having “High” cyber capability under its Preparedness Framework, a threshold that covers abilities such as developing zero-day remote exploits or supporting stealth operations on secure networks.
- Layered Defense Measures Built Into AI Deployment: OpenAI has designed a layered approach to prevent misuse of its models, combining technical controls such as infrastructure hardening and egress filtering with continuous monitoring of system activity. These safety measures are backed by real-time detection systems that analyze user prompts and model outputs. If the system identifies signs of misuse, it can intervene: downgrading the request to a less capable model, escalating the interaction for human review, or simply blocking the model’s output (a minimal sketch of this tiered policy follows the list below).
- Training Frontier Models for Safe and Useful Responses: The company is focusing on training its most powerful models to recognize and avoid requests that may contribute to cyber abuse, while continuing to provide safe assistance for legitimate cybersecurity research and education. This involves designing model behavior to reject harmful input while still enabling defenders to analyze and assess vulnerabilities using AI tools.
- Tools and Access Paths to Support Defensive Users: OpenAI is developing and deploying tools specifically intended to help cybersecurity professionals. Aardvark, its agent-based security tool now in private beta, scans entire codebases to detect weak points and recommend fixes; it has already discovered new vulnerabilities in open-source software and will be offered at no cost to certain non-commercial projects (a toy illustration of codebase scanning follows the list below). In addition, OpenAI is launching a trusted access program that gives qualified users access to more capable models while maintaining safety boundaries, so defenders gain more powerful tools without losing oversight.
- Industry Coordination to Build Shared Guardrails: Recognizing that cyber misuse could become possible with any advanced model, OpenAI is working with the Frontier Model Forum and other labs to identify potential threat pathways. This includes developing shared evaluation practices and risk models that highlight how attackers might try to abuse AI systems, what steps could be taken to interrupt those pathways, and where access controls can be most effective. The company is also forming a Frontier Risk Council made up of experienced security professionals who will help define where responsible capabilities end and potential abuse begins, feeding into OpenAI’s decisions on future model use and safeguard development.
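OpenAI has not published the internals of its detection stack, but the tiered response described in the layered-defense item above (allow, downgrade, escalate to a human, or block) maps naturally onto a small policy function. The sketch below is a minimal illustration of that idea in Python; the thresholds, the Signal and Action types, and the classifier scores are invented for the example, not OpenAI’s implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Action(Enum):
    ALLOW = auto()
    DOWNGRADE = auto()   # route the request to a less capable model
    ESCALATE = auto()    # queue the interaction for human review
    BLOCK = auto()       # refuse to return the model's output

@dataclass
class Signal:
    prompt_risk: float   # hypothetical prompt-classifier score, 0.0-1.0
    output_risk: float   # hypothetical output-classifier score, 0.0-1.0

def decide(signal: Signal) -> Action:
    """Map classifier scores to a tiered intervention, strictest first."""
    worst = max(signal.prompt_risk, signal.output_risk)
    if worst >= 0.9:
        return Action.BLOCK
    if worst >= 0.7:
        return Action.ESCALATE
    if worst >= 0.4:
        return Action.DOWNGRADE
    return Action.ALLOW

# Example: a borderline prompt paired with a risky output gets escalated.
print(decide(Signal(prompt_risk=0.5, output_risk=0.75)))  # Action.ESCALATE
```

The point of the layering is that each tier is cheap relative to the next: most traffic passes through untouched, and only the rare high-risk interaction consumes a human reviewer’s time.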
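Aardvark’s interface is in private beta and not documented here, so the following is only a toy stand-in for the general idea of scanning a codebase for weak points: it walks a tree of Python files and flags a few well-known risky patterns. The pattern list and the scan helper are illustrative inventions, not Aardvark’s API; an agentic tool reasons about code semantically rather than matching patterns.

```python
import re
from pathlib import Path

# Toy heuristics for common weak points (illustrative only).
PATTERNS = {
    "possible hardcoded secret": re.compile(r"(api_key|password|secret)\s*=\s*['\"]"),
    "shell injection risk": re.compile(r"subprocess\.(call|run|Popen)\(.*shell\s*=\s*True"),
    "unsafe deserialization": re.compile(r"pickle\.loads?\("),
}

def scan(root: str) -> list[tuple[str, int, str]]:
    """Walk a codebase and report (file, line number, finding) tuples."""
    findings = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            for label, pattern in PATTERNS.items():
                if pattern.search(line):
                    findings.append((str(path), lineno, label))
    return findings

if __name__ == "__main__":
    for file, lineno, label in scan("."):
        print(f"{file}:{lineno}: {label}")
```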
Go Deeper -> Strengthening cyber resilience as AI capabilities advance – OpenAI
OpenAI outlines controls to prevent cyber attackers exploiting its AI models – Tech Informed