Technical update would add safeguards to limit the impact of malicious instructions on AI systems that retrieve and act on web content.
OpenAI said it has updated Atlas, its web-browsing and retrieval system, to reduce the risk of prompt injection, a class of attacks that attempts to manipulate AI systems through hidden or misleading instructions embedded in online content.
In a technical post, OpenAI said the changes are intended to harden Atlas against attacks that can cause AI systems to ignore original instructions, leak information, or take unintended actions when interacting with external data sources.
According to OpenAI, the update introduces multiple layers of defense designed to better separate system instructions from untrusted content retrieved from the web. These measures include stricter handling of external text, additional checks on tool usage, and internal evaluation methods to detect prompt injection attempts during browsing and retrieval tasks.
The company said the changes apply to AI systems that use Atlas to browse or retrieve information from external sources, including those that summarize or act on web-based content. Prompt injection attacks have been identified as a persistent risk for AI systems that interact with untrusted data, particularly those that automate actions based on retrieved information.
“Prompt injection remains a fundamental challenge for any AI system that consumes untrusted content,” OpenAI said in the post. “While no system can be fully immune, these defenses significantly raise the cost and difficulty of successful attacks.”
OpenAI said it will continue testing and refining its defenses as new attack techniques emerge. The company also encouraged developers building retrieval-augmented AI systems to assume external content is untrusted and to implement layered security controls.
The update is effective immediately and does not require user action, according to OpenAI.