The Long Beach News

Why AI tokens will send your enterprise cloud bill sky-high again

Jun 26, 2026  Twila Rosenbaum

For months, enterprises enjoyed the artificial intelligence honeymoon: flat-fee subscriptions, generous all-you-can-eat token limits, and the freedom to experiment without watching every penny. That era is over. Token-based pricing has become the bedrock of the generative AI economy, and the bills are landing with a thud. At this year's FinOps X conference, the message was clear: AI tokens will send your enterprise cloud bill sky-high again.

Tokens: The new atomic unit of AI work

In this new pricing landscape, the token is the fundamental unit of AI work. J.R. Storment, executive director of the FinOps Foundation, calls it "the atomic unit of AI." During his keynote, he likened tokens to oil in the 20th century: they are the unit of output from hardware and data centers, the unit in which labs price their products, and the unit of value enterprises seek to monetize. This abstraction lets labs and hyperscalers avoid billing directly for GPU types, memory, and power. Instead, they expose a single metric, dollars per million tokens, across a bewildering mix of architectures and deployment topologies.

But what exactly is a token? An AI token is the smallest fragment of text a large language model (LLM) can process. Tokenization breaks words into pieces; in English, a token averages roughly four characters, or about three-quarters of a word, so 100 tokens equal about 75 words. That simple unit hides enormous complexity (model choice, quantization, caching strategies, agentic loops), all of which directly affect the final bill.
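
The rule of thumb above can be turned into a rough estimator. What follows is a minimal sketch, not a real tokenizer: the function names and the example rate of $3 per million tokens are illustrative assumptions, and actual token counts depend on the model's own tokenizer.

```python
import math

CHARS_PER_TOKEN = 4      # rough English average quoted in the article
WORDS_PER_TOKEN = 0.75   # about three-quarters of a word per token


def estimate_tokens_from_chars(text: str) -> int:
    """Approximate token count from character length (heuristic only)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_words(tokens: int) -> float:
    """Approximate number of English words represented by a token count."""
    return tokens * WORDS_PER_TOKEN


def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of a token count at a dollars-per-million-tokens rate."""
    return tokens / 1_000_000 * usd_per_million


print(estimate_words(100))        # 75.0, matching "100 tokens equal about 75 words"
print(cost_usd(2_000_000, 3.0))   # hypothetical rate of $3 per million tokens
```

A heuristic like this is only good for back-of-envelope budgeting; billing is based on the provider's actual token counts.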

The end of the all-you-can-eat era

The transition from cheap experiments to expensive reality has been swift. Storment identifies three phases: the pre-ChatGPT "old days," the "good old days" when chatbots wrote decent code, and the post-November 2025 world where frontier models became truly good. During the good old days, providers subsidized heavy usage. Some users on $200-a-month plans cost their providers tens of thousands of dollars by running everything on the latest models. SemiAnalysis estimated that a $200 Anthropic plan once delivered $8,000 worth of Claude tokens, and a similar OpenAI plan gave $14,000 worth of Codex tokens. Those subsidies are gone. Enterprises now face the real cost of every token consumed.

Token leaderboards, once a source of pride, are now obsolete. As Amazon senior vice president Dave Treadwell pleaded, "Please don't use AI just for the sake of using AI." The message resonates across the industry: every token must earn its keep.

Why token prices aren't falling fast enough

Conventional wisdom suggests that Moore's Law and hyperscale competition should drive token prices down. And indeed, prices have fallen dramatically since 2023. But as Storment and SAP's FinOps team point out, the floor may be in sight. Since November 2025, token prices have remained flat, directly linked to hardware and power constraints. GPU and component shortages continue; Intel's CEO does not expect real relief until 2028. Supply chain constraints and rising hardware costs mean the unit cost of tokens is unlikely to drop as fast as demand grows.

This leads to a classic Jevons paradox: falling unit cost but exploding total spend. SAP reported that even with declining token prices, its spend doubled in some months. Goldman Sachs estimates global token usage will climb from 6 quadrillion tokens today to 120 quadrillion within three and a half years, a twentyfold increase. Even if prices drop further, they are unlikely to fall by a matching factor of 20 over the same period, which is what it would take just to hold total spend flat. The bills will continue to climb.
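
The arithmetic behind that claim is simple: total spend is unit price times volume. The sketch below works through the Goldman Sachs figures quoted above; the function names and the "prices halve" scenario are illustrative assumptions, not forecasts from the article.

```python
def volume_multiple(start_tokens: float, end_tokens: float) -> float:
    """How many times larger the ending volume is than the starting volume."""
    return end_tokens / start_tokens


def annual_price_decline_for_flat_spend(volume_mult: float, years: float) -> float:
    """Constant yearly fractional price cut that keeps price * volume unchanged."""
    required_price_mult = 1.0 / volume_mult
    return 1.0 - required_price_mult ** (1.0 / years)


def spend_multiple(volume_mult: float, price_mult: float) -> float:
    """Total spend multiple, since spend = unit price * volume."""
    return volume_mult * price_mult


growth = volume_multiple(6, 120)  # quadrillions of tokens, figures quoted above
print(growth)  # 20.0

# Yearly price cut needed over 3.5 years just to keep total spend flat.
print(round(annual_price_decline_for_flat_spend(growth, 3.5), 3))

# Hypothetical scenario: unit prices halve while volume grows as projected.
print(spend_multiple(growth, 0.5))  # 10.0
```

On these figures, prices would have to drop by more than half every year to offset the projected volume growth.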

FinOps retools for token economics

The FinOps community, seasoned in cloud right-sizing and reserved instances, faces a new challenge. Token pricing is familiar in its usage-based nature but alien in its volatility. As SAP's Frederik Pohl asserted, "AI does not just stretch the cloud playbook, it breaks it." Unlike CPUs, LLMs have unique cost profiles, and swapping a model is not just a pricing decision but a quality-of-output decision.

SAP's journey illustrates how enterprises retool. When the company first sought AI cost data, existing cloud tools were blind to the nuances of LLMs. They showed total spend per provider but not which models were being used or how heavily. So the team pulled data manually, merged tables, and created its first picture by hand. Within days, the CTO demanded regular reports. That mandate forced SAP to formalize an internal AI FinOps framework built on three pillars:

  • Spend visibility: Understanding what is consumed, how, and where—across models, platforms, business units, and regions.
  • Economics: Measuring efficiency with token-level metrics like input/output ratios, cached token ratios, and token-to-spend drift to see whether cost increases stem from volume or model mix shifts.
  • Value: Connecting AI spend to business outcomes—cost per use case and inference cost by revenue—to determine which features are economically viable.
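
The token-level metrics in the second pillar can be sketched in a few lines. This is one plausible reading, not SAP's actual framework: the `PeriodUsage` fields, the blended-rate definition, and the way the spend change is split into a volume effect and a rate (price or model mix) effect are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PeriodUsage:
    """Hypothetical usage summary for one period; field names are illustrative."""
    input_tokens: int
    cached_input_tokens: int  # subset of input_tokens served from cache
    output_tokens: int
    spend_usd: float

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    @property
    def input_output_ratio(self) -> float:
        return self.input_tokens / self.output_tokens

    @property
    def cached_ratio(self) -> float:
        return self.cached_input_tokens / self.input_tokens

    @property
    def blended_rate(self) -> float:
        """Dollars per token across every model used in the period."""
        return self.spend_usd / self.total_tokens


def decompose_spend_change(prev: PeriodUsage, curr: PeriodUsage) -> tuple[float, float]:
    """Split the spend change into (volume_effect, rate_effect).

    volume_effect: extra tokens priced at the previous blended rate.
    rate_effect:   current tokens priced at the change in blended rate,
                   which captures price changes and model mix shifts.
    The two parts sum exactly to curr.spend_usd - prev.spend_usd.
    """
    volume_effect = (curr.total_tokens - prev.total_tokens) * prev.blended_rate
    rate_effect = curr.total_tokens * (curr.blended_rate - prev.blended_rate)
    return volume_effect, rate_effect


# Made-up numbers for two consecutive months.
prev = PeriodUsage(800_000_000, 200_000_000, 200_000_000, 2000.0)
curr = PeriodUsage(1_200_000_000, 600_000_000, 300_000_000, 4500.0)
volume_effect, rate_effect = decompose_spend_change(prev, curr)
print(prev.input_output_ratio, curr.cached_ratio)
print(round(volume_effect, 2), round(rate_effect, 2))
```

A large rate effect alongside a small volume effect would suggest that spend grew because work moved to pricier models, not because usage grew.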

The mantra, echoed by Nvidia CEO Jensen Huang, is "token factory effectiveness." Every part of the pipeline—from silicon and data center leases to model routing and prompt design—must be optimized.

Tokenomics: Beyond counting tokens

While FinOps focuses on cost control, the Linux Foundation is pushing "tokenomics" as a broader discipline covering the full lifecycle of tokens as economic goods. This includes production (converting energy and capital into tokens), consumption (allocation, forecasting, optimization), and value (monetization, labor implications, and pricing adjustments).

Tokenomics directly collides with software-as-a-service business models. The shift of Microsoft's GitHub Copilot toward explicit usage-based charging is an early example. Developers who loved unlimited tokens are now angry because their implicit subsidy vanished. Labs themselves are tightening the screws in ways invisible at the token level. For instance, Anthropic briefly included a policy in its Fable model card that would silently drop heavy users to a cheaper model, a practice that would make naive cost-per-token metrics meaningless.

Such opacity complicates forecasting. A token can cost two cents per million or 35 cents per million, and even at the same rate, one token may drive high value while another drives none. The C-suite has latched onto tokens as a mental model, making tokenomics a strategic priority.

Business models adapt: Credits, hybrids, and pass-through

Most customers won't see a line item for "quadrillions of tokens." Instead, vendors are building layers of abstraction: credits (like putting quarters in a machine), hybrid subscription-plus-usage plans, or direct pass-through models that show the token meter honestly but with guardrails. All are vulnerable to upstream shocks—a change in the token factory, a model routing error, a cache blow-up—which can cascade into consumer pricing changes and affect banks and other industries.
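
To make the three approaches concrete, here is a minimal sketch of how each might turn the same token usage into a bill. All function names, rates, and the choice of a hard spend cap as the "guardrail" are hypothetical; real vendor plans differ in their details.

```python
import math


def credits_bill(tokens: int, tokens_per_credit: int, usd_per_credit: float) -> float:
    """Prepaid credits: usage is rounded up to whole credits."""
    credits_used = math.ceil(tokens / tokens_per_credit)
    return credits_used * usd_per_credit


def hybrid_bill(tokens: int, base_fee_usd: float, included_tokens: int,
                overage_usd_per_million: float) -> float:
    """Subscription plus usage: a flat fee covers an allowance, overage is metered."""
    overage_tokens = max(0, tokens - included_tokens)
    return base_fee_usd + overage_tokens / 1_000_000 * overage_usd_per_million


def pass_through_bill(tokens: int, usd_per_million: float,
                      spend_cap_usd: float) -> tuple[float, bool]:
    """Direct metering with a spend cap as the guardrail.

    Returns (amount_billed, cap_was_hit).
    """
    metered = tokens / 1_000_000 * usd_per_million
    return min(metered, spend_cap_usd), metered > spend_cap_usd


usage = 2_500_000  # tokens consumed in the billing period (made-up figure)
print(credits_bill(usage, 1_000_000, 5.0))
print(hybrid_bill(usage, 20.0, 2_000_000, 4.0))
print(pass_through_bill(usage, 4.0, 8.0))
```

In each model, an upstream change in the per-token rate feeds through differently: credits and hybrid plans absorb it until the vendor reprices, while pass-through exposes it to the customer immediately.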

The Linux Foundation is launching a Tokenomics Foundation alongside the FinOps Foundation to create vendor-neutral specifications for measuring and allocating token-based costs. The FinOps Foundation's FOCUS specification (FinOps Open Cost and Usage Specification), originally designed for cloud billing, is being extended for token-level telemetry.

The human divide: Who gets AI access?

Token pricing shapes not just enterprise budgets but also societal access. Storment warns of a divide between those who can afford powerful AI and those who cannot. Inside companies, certain teams are deemed worthy of the latest model while others are routed to cheaper options. Yet crude caps may stifle innovation. One Fortune 100 executive advised against shutting down outliers; instead, talk to them—they might be doing something transformative. In a world where YC-backed startups receive millions in free tokens from frontier labs, internal experimentation could be an existential necessity.

For individuals, especially new workers, token pricing feeds anxieties about AI and jobs. Storment's nuanced view: "I don't think AI is immediately coming for everybody's job, but I think the person who's better at AI is coming for the job of the person who's not using AI." If token costs restrict learning and experimentation, that divide deepens.

The shift to token-based pricing marks a new, more expensive chapter in the AI era. Enterprises must bring value back to the center of their AI strategies. Measuring that value remains an unsolved problem, but the stakes have never been higher. The bills are real, and they are rising.


Source: ZDNET News

