MONDAY · 18 MAY 2026 · № 013

AMARTUVSHIN

Simplicity over complexity, quantity over quality.

Security Engineer · Notes from building at speed

◆ Essay № 04 · AI · 1 min read

The model cadence is breaking my reading list

GPT-5.5, Gemini 3.1 Ultra, Claude four times in fifty days — and SubQ shipped a 12M-token context at a fifth of the cost.

Published
May 18, 2026
Reading time
1 minute

I used to read every release post. As of this month, I've given up.

In the last sixty days the frontier shipped: OpenAI's GPT-5.5 (six weeks after 5.4, now the default Instant), Google's Gemini 3.1 Ultra (2M-token native multimodal), and four major Claude updates in roughly fifty days. The pace is no longer "every quarter, with anticipation" — it's "every Tuesday, with fatigue."

The more interesting move is architectural. SubQ released the first commercial subquadratic LLM with a 12M-token context, and claims to run that context at one-fifth the inference cost. If that holds under load, transformers stop being the only game in town.

§ 01 What this means for building

For the kind of work I do, shipping product fast, the takeaway isn't "switch models." It's "stop coupling your code to any one of them." Capabilities converge within months; cost asymmetries are the only durable difference. Build through a router, log every call, and let the cheap-enough model win on the eval that matters to your product.
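A minimal sketch of what "router, log, eval" could look like, under loud assumptions: the `Model` and `Router` classes, the flat `cost_per_call` number, the exact-match scoring, and the two stub models are all hypothetical stand-ins, not any provider's real API or pricing. The only point is the shape: product code talks to one interface, every call is recorded, and the cheapest model that clears the eval threshold wins.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Model:
    name: str
    cost_per_call: float            # hypothetical flat cost, for illustration only
    complete: Callable[[str], str]  # any provider call, hidden behind one signature


@dataclass
class Router:
    models: list
    log: list = field(default_factory=list)

    def call(self, model: Model, prompt: str) -> str:
        """Send a prompt to one model and record the call."""
        output = model.complete(prompt)
        self.log.append({
            "model": model.name,
            "prompt": prompt,
            "output": output,
            "cost": model.cost_per_call,
        })
        return output

    def score(self, model: Model, cases: list) -> float:
        """Fraction of (prompt, expected) cases the model answers exactly."""
        passed = sum(self.call(model, p) == expected for p, expected in cases)
        return passed / len(cases)

    def pick(self, cases: list, threshold: float) -> Optional[Model]:
        """Return the cheapest model whose eval score clears the threshold."""
        for model in sorted(self.models, key=lambda m: m.cost_per_call):
            if self.score(model, cases) >= threshold:
                return model
        return None


# Stub models standing in for real provider clients (assumption).
cheap = Model("cheap-stub", 0.001, lambda p: p.upper())
pricey = Model("pricey-stub", 0.010, lambda p: {"ping": "PONG"}.get(p, p.upper()))

# A toy eval: the cheap stub gets the first case wrong.
cases = [("ping", "PONG"), ("hello", "HELLO")]

router = Router([pricey, cheap])
chosen = router.pick(cases, threshold=0.9)
print(chosen.name, len(router.log))
```

Swapping a model then means adding one `Model` entry and rerunning `pick`; nothing else in the product code has to change. A real eval would replace the exact-match check with whatever metric matters to the product.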

The frontier no longer rewards the people who read every release. It rewards the ones with a tight eval loop and the patience to swap.