GPT-5.5 Ships — and the Price Doubles

Published

May 18, 2026

Reading time

2 minutes

On April 23, 2026, OpenAI released GPT-5.5 — the first model since GPT-4.5 to be described internally as a rebuild rather than an iteration. New architecture. New pretraining corpus. New training objectives. And, sitting at the bottom of the launch post like a sober footnote, new pricing: $5 per million input tokens, $30 per million output. That is double what GPT-5.4 cost on its release day six months ago.

§ 01 What is actually different

The headline numbers are unambiguous. GPT-5.5 ships with a 1-million-token context window in the API. It tops the Artificial Analysis Intelligence Index. It posts state-of-the-art scores on Terminal-Bench 2.0, OSWorld-Verified, GDPval, FrontierMath, and CyberGym. On Vectara's hallucination evaluation, it produces 60% fewer hallucinated facts than GPT-5.4.

The most striking single number is on long-context reasoning. On MRCR v2 at the 512K–1M token range — the most punishing public test of multi-document retrieval inside a long window — GPT-5.5 scores 74.0%. GPT-5.4 scored 36.6% on the same evaluation. A 37-point jump on a long-context benchmark is not a tuning improvement. It is evidence that the rebuild was a rebuild.

§ 02 The price of being first

The doubling of API pricing is the real news for everyone shipping production systems on these models. At $5/$30 per million tokens, GPT-5.5 is back into the price band that GPT-4 occupied in mid-2023. The pro variant, GPT-5.5-pro, sits at $30/$180. Codex, the agentic coding endpoint, is bundled with a 400K-token context for the same prices as the base model.

The implicit bet: that the capability gain is large enough to justify a doubling of cost-per-token even before you account for fewer retries, fewer hallucinations, and fewer agent loops needed to complete the same task. It is a bet that will be priced into every multi-model routing system over the next quarter.

§ 03 The crowd at the top

GPT-5.5 did not arrive in an empty room. The same release window contained:

Claude Opus 4.7 from Anthropic (April 16), which jumped SWE-bench Pro from 53.4% to 64.3% — a 10.9-point gain on real-world software-engineering tasks.
DeepSeek V4 Preview (April 24), with V4 Pro at 1.6T total / 49B active parameters under Apache 2.0, and V4 Flash at 284B / 13B active.
Gemma 4 from Google, also open-sourced under Apache 2.0.
MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 from Microsoft's in-house MAI division.
Qwen 3.6-Plus from Alibaba and Muse Spark from Meta.
Nemotron 3 Nano Omni from NVIDIA — a 30B open multimodal model meant for edge inference.

The shape of the market a year from now is being decided this month. The labs that win will not be the ones with the smartest model; they will be the ones whose pricing curves bend faster than their capability curves.

The frontier moved. The price moved with it.
Archive Notes

◆

AMARTUVSHIN

GPT-5.5 Ships — and the Price Doubles

§ 01 What is actually different

§ 02 The price of being first

§ 03 The crowd at the top

Musk v. Altman Goes to the Jury