OpenAI Launches GPT-5.3-Codex Amidst AI Programming Battle
OpenAI released GPT-5.3-Codex on February 6, 2026, claiming it to be the most powerful programming agent to date. Notably, this launch coincided with Anthropic’s release of its flagship model upgrade, Claude Opus 4.6. Media outlets describe this simultaneous unveiling as the opening shot in what is being referred to as the “AI programming battle,” a high-stakes competition for the enterprise software development market.
OpenAI CEO Sam Altman tweeted shortly after the model’s release:
“I love developing with this model; the improvements feel far beyond what benchmarks show.”
He expressed amazement at the speed of development, stating that they used GPT-5.3-Codex to develop itself, indicating a significant milestone in AI development.
GPT-5.3-Codex Achieves Leading Benchmark Scores
OpenAI reported that the new model achieved significant improvements in various industry benchmarks. GPT-5.3-Codex scored 57% on SWE-Bench Pro, a rigorous real-world software engineering assessment covering four programming languages and focusing on challenges relevant to industry.

The model scored 77.3% on Terminal-Bench 2.0, which primarily measures essential terminal operation capabilities for programming agents, and 64% on OSWorld, which evaluates the model’s ability to perform productivity tasks in a visual desktop environment.
The results from Terminal-Bench 2.0 are particularly noteworthy, showing a 13-point increase from the previous version, GPT-5.2-Codex, which scored 64.0%. A user on the X platform noted that this score “completely crushes” Anthropic’s Opus 4.6, which reportedly scored 65.4% on the same benchmark.

OpenAI also stated that these achievements were made with significantly improved efficiency, requiring less than half the number of tokens compared to the previous generation model while achieving over a 25% increase in inference speed per token.
OpenAI emphasized:
“Notably, GPT-5.3-Codex uses fewer tokens than any previous model, allowing users to accomplish more.”
Evolution from Programming Assistant to Programming Operator
More importantly than the benchmark improvements is OpenAI’s positioning of GPT-5.3-Codex. The company clearly stated:
“Codex is evolving from a mere code-writing and reviewing agent to one that can nearly perform any task a developer or professional does on a computer.”
This expanded capability includes debugging, deployment, monitoring, writing product requirement documents, editing copy, conducting user research, creating presentations, and analyzing data in spreadsheet applications. The model excelled in the GDPVal assessment, which measures the completion of well-defined knowledge-based tasks across 44 professions.
Analysts suggest this expansion indicates OpenAI’s ambitions extend beyond the developer tools market to a broader enterprise productivity software domain, competing against established players like Microsoft, Salesforce, and ServiceNow, all of whom are accelerating the integration of AI agents into their platforms.
OpenAI’s First High-Capacity Cybersecurity Model
The shift towards general computing capabilities also raises new security considerations. OpenAI announced that GPT-5.3-Codex is its first model classified as having “high capability” in cybersecurity-related tasks under the “readiness framework” and is also the first directly trained to identify software vulnerabilities.
OpenAI stated: “While we have not yet found conclusive evidence that it can fully automate cyberattacks, we are taking a cautious approach and have deployed the most comprehensive cybersecurity protection system to date.” Measures include dual-use security training, automated monitoring, trusted access mechanisms for advanced capabilities, and execution pipelines combined with threat intelligence.
Altman emphasized this advancement on X:
“This is our first model with ‘high’ cybersecurity capabilities under the readiness framework. We are piloting a trusted access framework and committing $10 million in API credits to accelerate cyber defense.”
Additionally, OpenAI is expanding its private testing of the security research agent Aardvark and collaborating with open-source maintainers to provide free codebase scans for widely used projects. OpenAI cited Next.js as an example, noting that a security researcher recently used Codex to discover and disclose vulnerabilities.
Intensifying Competition with Anthropic
However, the company’s cybersecurity announcements were quickly overshadowed by the rivalry with Anthropic. Media reports indicate that without context, the significance of the simultaneous release timing on Thursday is difficult to grasp.
Anthropic, a startup focused on AI safety founded in 2021 by former OpenAI researchers, including Dario and Daniela Amodei, also launched Claude Opus 4.6, describing it as “the smartest model” capable of more cautious planning, sustained execution of agent-like tasks, reliability in large codebases, and self-correction of errors.
This direct confrontation comes amidst escalating tensions, as Anthropic announced plans to air an advertisement during the Super Bowl mocking OpenAI’s recent decision to test ads among free ChatGPT users.
Altman responded directly to this in a rare lengthy post on X, calling the ads “laughable” but “clearly dishonest.”
He stated:
“We would never run ads like those depicted in Anthropic’s advertisement. We are not foolish and know users would never accept such a practice.”
He further characterized Anthropic as an “authoritarian company” attempting to control how people use AI.
Altman added:
“Anthropic offers expensive products to the wealthy. The number of Texans using the free version of ChatGPT is greater than the total number of users of Claude in the U.S., so we face entirely different issues.”
Corporate AI Spending Surpasses Expectations
Behind this public spat lies a serious commercial competition. This confrontation occurs against the backdrop of explosive growth in enterprise AI applications, with both companies vying for a rapidly expanding market.
According to a survey released by Andreessen Horowitz, corporate spending on large language models has significantly exceeded even optimistic predictions. By 2025, average corporate spending on LLMs is expected to reach $7 million, an increase of 180% from the $2.5 million spent in 2024, and 56% higher than companies’ predictions for 2025 just a year ago. By 2026, spending per company is projected to reach $11.6 million, a further 65% increase.
The data from a16z also reveals shifts in market dynamics. OpenAI still holds the largest share of corporate AI spending, but this share is shrinking—from 62% in 2024 to an expected 53% in 2026. Meanwhile, Anthropic’s share is projected to rise from 14% to 18%, with Google also showing similar growth trends.
In terms of usage patterns, the situation is more nuanced. While OpenAI leads in overall usage, only 46% of surveyed OpenAI customers are using its strongest model in production environments, compared to 75% and 76% for Anthropic and Google, respectively. If testing environments are included, 89% of Anthropic customers are testing or using their strongest model, the highest proportion among major vendors.
In the core application scenario of software development, a16z’s survey indicates that OpenAI holds about 35% of the market share, while Anthropic occupies a significant and growing portion of the remaining market.
OpenAI Promises More Codex Features in Coming Weeks
Looking ahead, OpenAI announced that GPT-5.3-Codex is now available to paid ChatGPT users, covering all Codex use cases, including desktop applications, command-line interfaces, IDE extensions, and web interfaces, with API access expected to follow.
The model has also introduced a new interactive feature: users can choose between a “pragmatic” and “friendly” personality. Altman noted that users have shown a strong preference for this feature. On a more substantive level, the model will frequently provide progress updates during task execution, allowing users to interact in real-time, ask questions, discuss ideas, and guide solutions without losing context.
OpenAI stated:
“You no longer have to wait for the final result; you can interact in real-time. GPT-5.3-Codex will clarify what it is doing, respond to feedback, and keep you informed from start to finish.”
The company promised to roll out more capabilities in the coming weeks. Altman confidently remarked, “I believe Codex will win.”
He framed the competition with Anthropic philosophically:
“This era belongs to builders, not those who wish to control them.”
Comments
Discussion is powered by Giscus (GitHub Discussions). Add
repo,repoID,category, andcategoryIDunder[params.comments.giscus]inhugo.tomlusing the values from the Giscus setup tool.