Source: AgentsPop analysis. As of 2026-03-12.

Commodity tokens are likely still too expensive; premium reasoning tokens may be underpriced for the value delivered.

Are AI Tokens Priced Correctly? Should They Be Higher or Lower?

Token prices look cheap in the aggregate, but they mask a widening split between commodity throughput and premium reasoning, and the billing unit itself is becoming obsolete.

The price of an AI token is starting to look like the price of electricity in the early days of electrification: cheap enough to spread everywhere, expensive enough to keep everyone arguing, and increasingly poor at describing the real value being sold.

The price ladder

On paper, the market looks straightforward. The current list prices, per million tokens:

  - OpenAI GPT-5.4: $2.50 input / $15 output
  - OpenAI GPT-5 mini: $0.25 input / $2 output
  - Anthropic Claude Sonnet 4.5: $3 input / $15 output
  - Anthropic Claude Haiku 4.5: $1 input / $5 output
  - Google Gemini 2.5 Pro: $1.25 input / $10 output (prompts up to 200,000 tokens)
  - Google Gemini 2.5 Flash: $0.30 input / $2.50 output
  - Google Gemini 2.5 Flash-Lite: $0.10 input / $0.40 output
  - DeepSeek V3.2 API: $0.28 input (cache miss) / $0.42 output

That is not one market price. It is a full price ladder, from premium reasoning to near-commodity throughput.
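
For a concrete feel for the ladder, the sketch below prices one hypothetical request at each of the list rates above. The 2,000-input / 500-output request shape is an assumption for illustration, not a vendor figure.

```python
# Sketch: cost of one hypothetical request (2,000 input tokens, 500 output
# tokens; an assumed shape, not from any vendor) at each list price above.
# Rates are USD per million tokens, as quoted in the text.
PRICES = {
    "GPT-5.4":               (2.50, 15.00),
    "GPT-5 mini":            (0.25,  2.00),
    "Claude Sonnet 4.5":     (3.00, 15.00),
    "Claude Haiku 4.5":      (1.00,  5.00),
    "Gemini 2.5 Pro":        (1.25, 10.00),  # prompts up to 200K tokens
    "Gemini 2.5 Flash":      (0.30,  2.50),
    "Gemini 2.5 Flash-Lite": (0.10,  0.40),
    "DeepSeek V3.2":         (0.28,  0.42),  # input rate is cache-miss
}

def request_cost(input_rate, output_rate, in_tokens=2_000, out_tokens=500):
    """Dollar cost of a single request at per-million-token list rates."""
    return (in_tokens * input_rate + out_tokens * output_rate) / 1_000_000

for model, (inp, outp) in PRICES.items():
    print(f"{model:24s} ${request_cost(inp, outp):.6f} per request")
```

The same request costs about $0.0125 on GPT-5.4 and about $0.00077 on DeepSeek V3.2, a roughly sixteen-fold spread on identical token counts.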

Why the spread matters

That spread matters, because it tells you the industry is already segmenting AI into at least two businesses. One business sells fast, good-enough text generation at brutal scale. The other sells scarce, higher-confidence reasoning for coding, analysis, and enterprise workflows. If you ask whether “AI tokens” are priced correctly, you have to decide which of those businesses you mean. The cheap end of the market is behaving like a commodity: prices keep falling, rivals undercut one another, and vendors introduce lighter models specifically to defend volume. The premium end is behaving more like cloud infrastructure plus consulting: customers are paying for reliability, tool use, long context, and the chance of getting a harder task right on the first try.

Evidence for lower prices ahead

On the commodity side, the evidence points toward lower prices ahead. Stanford’s 2025 AI Index reported that the inference cost for a system performing at GPT-3.5 level fell more than 280-fold between November 2022 and October 2024. Epoch AI found that the cost of reaching fixed performance milestones has been falling anywhere from 9 times to 900 times per year, depending on the task. Those are not normal software-economics numbers; they are collapse-the-margin numbers. If capability-adjusted inference keeps getting dramatically cheaper, then today’s low-end token prices probably still have room to fall.
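
A quick back-of-the-envelope ties those two figures together. Spreading the AI Index's 280-fold drop over the roughly 23 months between November 2022 and October 2024 gives an implied annual rate; the sketch below does the arithmetic (the month count is an approximation).

```python
# Sketch: what a 280-fold cost drop over Nov 2022 to Oct 2024 (~23 months,
# per the Stanford AI Index figure quoted above) implies as an annual rate.
total_drop = 280   # fold decrease reported by the AI Index
months = 23        # Nov 2022 to Oct 2024, approximately

annualized = total_drop ** (12 / months)
print(f"Implied annual cost decline: ~{annualized:.0f}x per year")
# ~19x per year, squarely inside Epoch AI's reported 9x-900x range,
# which is why today's low-end prices likely still have room to fall.
```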

Competition driving prices down

Competition is pushing in the same direction. Reuters reported that Google introduced Flash-Lite specifically as a cheaper model as cost concerns intensified, and that DeepSeek’s low-cost models rattled both investors and incumbents. A year after the initial DeepSeek shock, Reuters said low-cost, open-source models had become “the norm” in China’s AI ecosystem and cited a RAND finding that Chinese models were operating at roughly one-sixth to one-fourth the cost of comparable U.S. systems. DeepSeek has also used aggressive pricing tactics directly, including off-peak discounts and extremely low list prices on its API. In plain English: once multiple vendors can deliver acceptable quality for summarization, extraction, routing, classification, and other high-volume workloads, price becomes the weapon.

Why not all tokens should be lower

But that does not mean all token prices should be lower. Frontier output is still priced far above input for a reason. OpenAI's GPT-5.4 charges six times as much for output as for input. Anthropic's Sonnet 4.5 charges five times as much, and Google's Gemini 2.5 Pro charges eight times as much at its standard tier. That pricing structure is a clue: providers are not really charging for text as such. They are charging for scarce generation-time compute, especially on harder models that spend more effort producing an answer. The expensive part is not merely storing your prompt. It is running the model forward, often with extra reasoning, and doing it quickly enough to be useful.
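
The arithmetic consequence is that output dominates the bill on generation-heavy work. The sketch below, assuming an illustrative workload with equal input and output token counts, shows output's share of cost at the multiples quoted above.

```python
# Sketch: output's share of the bill at the quoted output/input multiples.
# The equal-token workload shape is an assumption for illustration; the
# rates are the per-million list prices from the text.
def output_share(input_rate, output_rate, in_tok=1_000, out_tok=1_000):
    """Fraction of total cost attributable to output tokens."""
    in_cost = in_tok * input_rate
    out_cost = out_tok * output_rate
    return out_cost / (in_cost + out_cost)

for name, (inp, outp) in [("GPT-5.4", (2.50, 15.00)),
                          ("Claude Sonnet 4.5", (3.00, 15.00)),
                          ("Gemini 2.5 Pro", (1.25, 10.00))]:
    print(f"{name}: {outp/inp:.0f}x multiple, output is "
          f"{output_share(inp, outp):.0%} of the bill at equal token counts")
```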

Are premium tokens actually cheap?

That is why premium token prices can look entirely reasonable, even when the sticker shocks developers. A thousand GPT-5.4 output tokens cost about 1.5 cents at list price. A thousand Claude Sonnet 4.5 output tokens also cost about 1.5 cents. A thousand Gemini 2.5 Pro output tokens cost about 1 cent under the standard tier for prompts up to 200,000 tokens. If those tokens produce a clean summary of a legal filing, fix a bug that would have taken an engineer twenty minutes, or draft a first-pass memo that saves an analyst a chunk of the morning, the per-token price does not look high. It looks cheap. In that sense, premium tokens may actually be underpriced relative to the labor value they can unlock—at least when the model is good enough and the workflow is designed well.
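
The back-of-the-envelope behind that claim is simple. In the sketch below, the engineer's hourly rate and the answer length are assumptions for illustration; only the $15-per-million output rate comes from the list prices above.

```python
# Sketch: per-task token cost versus the labor it displaces. The $75/hr
# engineer rate and the 500-token answer are assumptions; $15/M output is
# GPT-5.4's list price as quoted above.
OUTPUT_RATE = 15.00 / 1_000_000   # dollars per output token

token_cost = 500 * OUTPUT_RATE                # assumed 500-token bug fix
labor_value = (20 / 60) * 75.00               # 20 minutes at assumed $75/hr

print(f"Token cost:  ${token_cost:.4f}")      # $0.0075
print(f"Labor value: ${labor_value:.2f}")     # $25.00
print(f"Value/cost:  {labor_value / token_cost:,.0f}x")   # ~3,333x
```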

Tokens as a billing unit

The deeper problem is that tokens are becoming a worse billing unit over time. OpenAI defines tokens as units that can be as short as a single character or as long as a full word, with rough English rules of thumb like one token being about four characters or three-quarters of a word. Google gives a similar rule of thumb for Gemini—about four characters, or 60 to 80 English words per 100 tokens—but also notes that images and other non-text modalities are tokenized too, and that “thinking” tokens can be separately counted in usage metadata. So even before you compare vendors, a token is already an imperfect stand-in for work. It is not a word, not a request, not a minute of labor, and not a stable cross-model unit of value.
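
Those rules of thumb are easy to turn into rough estimators, and their disagreement is itself the point. The sketch below applies both heuristics; real tokenizers give exact, model-specific counts, so treat these as approximations only.

```python
# Sketch: the rough English rules of thumb quoted above, as estimators.
# Real tokenizers (e.g., OpenAI's tiktoken) produce exact counts; these
# heuristics only approximate them.
def tokens_from_chars(text: str) -> float:
    return len(text) / 4               # ~4 characters per token

def tokens_from_words(text: str) -> float:
    return len(text.split()) / 0.75    # ~3/4 of a word per token

sample = ("The price of an AI token is starting to look like "
          "the price of electricity.")
print(f"char-based estimate: {tokens_from_chars(sample):.0f} tokens")
print(f"word-based estimate: {tokens_from_words(sample):.0f} tokens")
# The two estimates rarely agree, and neither maps onto requests,
# minutes of labor, or a stable cross-model unit of value.
```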

Hidden extras

Then the hidden extras start piling up. Anthropic charges for web search on top of tokens—$10 per 1,000 searches—and adds token overhead for tool use. Google separately charges for grounding with Google Search and Maps on some models, with grounded prompts priced beyond free tiers. Anthropic also imposes premium long-context rates once certain input thresholds are exceeded, and Google’s Gemini 2.5 Pro doubles its input price and raises output price for prompts above 200,000 tokens. In other words, the real bill increasingly depends on much more than token count. It depends on search, tools, context length, caching, and reasoning mode. That is another sign the market is moving away from raw per-token economics, even if the invoice still pretends otherwise.
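
To see how the extras swamp naive token math, the sketch below prices a single long-context request. The workload shape and the raised output rate are assumptions; the doubled input rate follows the Gemini 2.5 Pro description above, and the per-search fee borrows Anthropic's quoted $10 per 1,000 searches purely for scale, since the text gives no comparable Google figure.

```python
# Sketch: a 250K-token-prompt request at Gemini 2.5 Pro-style rates. The
# request shape is assumed; the doubled input rate above 200K tokens is
# from the text, the $15/M long-context output rate is an assumption, and
# the search fee reuses Anthropic's quoted $10 per 1,000 searches for scale.
BASE_IN, BASE_OUT = 1.25, 10.00   # per million, prompts <= 200K tokens
LONG_IN = BASE_IN * 2             # input price doubles above 200K tokens
LONG_OUT = 15.00                  # assumed raised long-context output rate
SEARCH_FEE = 10.00 / 1_000        # per search, Anthropic-style

in_tok, out_tok = 250_000, 2_000
naive = (in_tok * BASE_IN + out_tok * BASE_OUT) / 1_000_000
actual = (in_tok * LONG_IN + out_tok * LONG_OUT) / 1_000_000

print(f"naive token bill:        ${naive:.4f}")                # $0.3325
print(f"with long-context rates: ${actual:.4f}")               # $0.6550
print(f"plus one search:         ${actual + SEARCH_FEE:.4f}")  # $0.6650
```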

Discount structures

Even vendors’ own discount structures show that list price is not the true clearing price. OpenAI sharply discounts cached input versus fresh input, Anthropic offers materially lower batch pricing than standard pricing, and Google does the same on Gemini 2.5 Pro, Flash, and Flash-Lite. That tells you something important: repeated context is cheaper than live context, slower jobs are cheaper than interactive jobs, and the published “headline” token rate is often just the top rail of a more flexible pricing system. Mature cloud markets work like this too. List price exists, but serious usage quickly moves to reserved, cached, batch, or enterprise-negotiated economics.
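
A simple blend shows how far effective rates drift below list. Every multiplier in the sketch below (the cache discount, the batch discount, the cache hit rate, the batch share) is an assumption for illustration; vendors publish their own specific figures.

```python
# Sketch: blended effective input rate under caching and batch discounts.
# All discount factors and traffic shares below are assumptions for
# illustration; only the $3/M list rate (Claude Sonnet 4.5 input) is
# from the text.
LIST_INPUT = 3.00        # per million tokens
CACHED_FACTOR = 0.10     # assumed: cached input billed at 10% of list
BATCH_FACTOR = 0.50      # assumed: batch jobs billed at 50% of list
cache_hit_rate = 0.60    # assumed share of input tokens served from cache
batch_share = 0.40       # assumed share of traffic routed through batch

interactive = (cache_hit_rate * LIST_INPUT * CACHED_FACTOR
               + (1 - cache_hit_rate) * LIST_INPUT)
blended = (batch_share * interactive * BATCH_FACTOR
           + (1 - batch_share) * interactive)

print(f"list input rate:      ${LIST_INPUT:.2f}/M tokens")
print(f"effective input rate: ${blended:.2f}/M tokens")   # ~$1.10/M
```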

Conclusion

So are AI tokens priced correctly? The cleanest answer is: not really, but not in one direction. Commodity tokens are probably still too expensive relative to where capability-adjusted costs and competition are headed. Premium reasoning tokens are not obviously overpriced, and in some business settings they may be too cheap for the value delivered. The bigger mismatch is conceptual. Tokens are still useful as a machine-level meter, but they are becoming a worse human-level price tag. Buyers care about completed tasks, latency, reliability, grounded answers, context size, and workflow savings. Sellers know that, which is why they are already layering search fees, long-context premiums, batch discounts, and tool charges on top of the token meter.

The market split ahead

The market is likely heading toward a split. At the bottom: ultra-cheap tokens for bulk generation, agents, and background automation, driven down by open models and price wars. At the top: higher effective prices for premium reasoning, enterprise guarantees, multimodal workflows, and tool-rich systems that are billed less like text boxes and more like digital workers. In that future, tokens will still be counted. They just will not be the whole story. The question will shift from “What does a million tokens cost?” to “What does a useful, trustworthy unit of AI work cost?” That is a much harder number to print on a pricing page—and probably a much more honest one.

Sources

  1. OpenAI API pricing
  2. OpenAI: What are tokens and how to count them
  3. OpenAI: Introducing GPT-5.4
  4. Anthropic Claude API pricing (docs)
  5. Anthropic pricing overview
  6. Anthropic: Claude Sonnet 4.5 announcement
  7. Anthropic: Claude Haiku 4.5 announcement
  8. Google Gemini API pricing
  9. Google Gemini billing
  10. Google Gemini models page
  11. DeepSeek API pricing
  12. DeepSeek detailed USD pricing
  13. DeepSeek V3.2 release update
  14. DeepSeek pricing drop update
  15. Stanford AI Index 2025
  16. Stanford AI Index 2025 (PDF)
  17. Epoch AI: LLM inference price trends
  18. Epoch AI trends
  19. Reuters: Google introduces cheap AI models as cost concerns intensify
  20. Reuters: A year on from DeepSeek shock, low-cost Chinese AI models become the norm
