AI Got Cheaper. So Why Is Your Bill Going Up?

There is a quiet panic spreading through finance departments, and it contradicts almost everything we were told about artificial intelligence. The pitch was simple: AI would do the work of people, at a fraction of the cost. The reality landing on CFOs' desks in 2026 is the opposite. In a growing number of cases, using the AI is turning out to be more expensive than paying the human it was supposed to replace.

This isn't a fringe complaint. Microsoft — a company that has bet its future on AI — reportedly cancelled most of its internal Claude Code licences after encouraging staff to adopt the tool, only to watch the bill balloon. Uber burned through its entire 2026 AI coding budget in four months. And Nvidia's own Bryan Catanzaro, a man whose company sells the picks and shovels of this gold rush, put it bluntly: "For my team, the cost of compute is far beyond the costs of the employees."

When the people selling AI are telling you the compute costs more than the staff, it's worth asking what went wrong — and what to do about it.

The paradox: tokens get cheaper, but your bill goes up

Here's the counterintuitive part. The price of AI per unit really is collapsing. Gartner expects inference costs on sophisticated models to fall by nearly 90% by 2030, and per-token API prices have dropped sharply between 2025 and 2026. So why is total spend climbing?

Because consumption is exploding far faster than prices are falling. Goldman Sachs forecasts that the shift to "agentic" AI — systems that don't just answer a question but plan, reason, and act over many steps — could drive a 24-fold increase in token consumption by 2030, reaching an almost unimaginable 120 quadrillion tokens a month.

The maths is brutal in its simplicity. A simple chatbot Q&A might consume 500–2,000 tokens. A single agentic workflow, with its long context windows, multi-step reasoning, tool calls and heavy output, can chew through 15,000–80,000 tokens to complete one task. Cheaper tokens don't help when each task now uses tens of times more of them. As Gartner warns, leaders shouldn't "confuse the deflation of commodity tokens with the democratization of frontier reasoning."
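
The arithmetic can be sketched in a few lines. The sketch below takes the two forecasts quoted above at face value (a roughly 90% fall in per-token price and a 24-fold rise in token consumption) and treats the bill as simply price times volume. The function name and that simplification are illustrative assumptions, not a pricing model.

```python
def bill_multiplier(price_change: float, volume_multiplier: float) -> float:
    """Return how total spend scales when unit price and volume both change.

    price_change is fractional, e.g. -0.90 for a 90% price fall.
    """
    return (1.0 + price_change) * volume_multiplier


# Forecasts quoted in the text: ~90% cheaper tokens, 24x more tokens consumed.
multiplier = bill_multiplier(price_change=-0.90, volume_multiplier=24)
print(f"Total spend multiplier: {multiplier:.1f}x")

# Per-task token ranges quoted in the text.
chatbot_tokens = (500, 2_000)
agent_tokens = (15_000, 80_000)

# Narrowest gap: smallest agentic task vs largest chatbot exchange.
low_ratio = agent_tokens[0] / chatbot_tokens[1]
# Widest gap: largest agentic task vs smallest chatbot exchange.
high_ratio = agent_tokens[1] / chatbot_tokens[0]
print(f"An agentic task uses {low_ratio:.1f}x to {high_ratio:.0f}x the tokens of a chat")
```

Under those assumptions, total spend rises about 2.4 times even though each token costs a tenth of what it did.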

The result is on display in real deployments. One organisation watched a single agent's costs climb to $300 a day running against a frontier API (over $100,000 a year) while replacing only a fraction of one person's job. For context, a fully-loaded senior engineer might cost $200,000 a year, but does the work of a whole role, with judgement, accountability, and no token meter spinning in the background. The CIO analysis of these failures found that a careless deployment can cost ten times as much to operate as a smart one.
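
A quick way to sanity-check a comparison like this is to annualise the agent's daily cost and ask what share of a role it would need to cover to match the salary. The sketch below uses the $300-a-day and $200,000-a-year figures from the text. It assumes the agent runs every day of the year, and the function names are hypothetical.

```python
DAYS_PER_YEAR = 365  # assumption: the agent runs every calendar day


def annual_cost(daily_cost_usd: float) -> float:
    """Annualise a daily running cost."""
    return daily_cost_usd * DAYS_PER_YEAR


def break_even_fraction(agent_annual_usd: float, human_annual_usd: float) -> float:
    """Share of a full role the agent must cover to cost the same per unit of work."""
    return agent_annual_usd / human_annual_usd


agent_annual = annual_cost(300)   # $300 a day, from the text
engineer_annual = 200_000         # fully-loaded senior engineer, from the text

fraction = break_even_fraction(agent_annual, engineer_annual)
print(f"Agent cost per year: ${agent_annual:,.0f}")
print(f"Break-even share of a role: {fraction:.0%}")
```

On these numbers the agent would need to cover a little over half of the role before it merely matched the engineer on cost, which is why "a fraction of one person's job" is the detail that matters.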

Today's price is a loss-leader, not the real one

Here's a cost that rarely makes it onto the spreadsheet: the cheap AI most businesses rely on today is priced below what it actually costs to run.

In its most recent reported year, OpenAI is estimated to have brought in roughly $3.7 billion in revenue while losing around $5 billion. By some analyses, that is a loss of close to $1.35 for every dollar it earned, driven not by research salaries but by the raw cost of serving billions of inference requests a day. Several major providers are widely reported to be pricing inference below cost to win market share, a strategy underwritten by more than $110 billion in raised capital that won't subsidise prices forever.
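
The ratios behind those estimates are easy to reproduce. The sketch below uses only the two estimated figures quoted above and assumes that total spending equals revenue plus the reported loss. The variable names are illustrative.

```python
revenue_bn = 3.7  # estimated revenue, USD billions (from the text)
loss_bn = 5.0     # estimated loss, USD billions (from the text)

# Assumption: loss = spending - revenue, so spending = revenue + loss.
spend_bn = revenue_bn + loss_bn

loss_per_dollar = loss_bn / revenue_bn
spend_per_dollar = spend_bn / revenue_bn

print(f"Loss per dollar of revenue:  ${loss_per_dollar:.2f}")
print(f"Spend per dollar of revenue: ${spend_per_dollar:.2f}")
```

On these estimates, the $1.35 figure corresponds to the loss per dollar of revenue. Total spending per dollar earned would be about a dollar higher.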

This is a familiar commercial play: the loss-leader. The early price is low — sometimes free — because the goal isn't to profit on that transaction; it's to win adoption and build switching costs. It's the same logic behind a supermarket's discounted staple or a platform's free tier: establish the habit first, and prices tend to firm up later once moving away has become harder.

The analysts are blunt about where this goes. As one put it: "The subsidy creates dependency. The dependency makes the price increase unavoidable." Today's prices create a false floor in the market — one that will normalise upward within an estimated 12–24 months as capital discipline returns. And the move has already begun: GitHub shifted to usage-based "AI Credits" billing on 1 June 2026, replacing flat-rate access with metered tokens. The industry warning is being stated plainly — every AI subscription is a ticking time bomb for enterprise.

So building your operations on a single frontier vendor's subsidised pricing isn't just buying a tool — it's anchoring your budget to a price that is more likely to rise than fall once your workflows are too embedded to move easily.

This is what "building tech without the real world" looks like

The deeper problem isn't the price of tokens. It's a mindset. Too much of the AI industry has been built to win benchmarks and demos — and to capture users — not to survive contact with a real business and its budget.

The evidence is damning. MIT's 2025 study, The GenAI Divide: State of AI in Business, found that 95% of enterprise generative-AI pilots delivered no measurable business impact — despite $30–40 billion in spending. The failure, MIT concluded, was rarely the model's raw intelligence. It was the "learning gap": the inability to fit AI into real workflows, real data, and real economics.

Tellingly, MIT found that buying AI capability from specialised vendors succeeded around 67% of the time, while internal "let's wire up the frontier API ourselves" builds succeeded only about a third as often. And the biggest returns weren't in the flashy sales-and-marketing tools that soaked up most budgets — they were in unglamorous back-office automation, where the task is bounded and the output can be checked.

In other words: the businesses that won treated AI as an engineering and economics problem to be solved for their world. The ones that struggled treated AI as a single premium model to apply everywhere — and paid for it accordingly.

The bias you're not pricing in — in two flavours

There's a second category of cost to relying on a single large US model, and it never appears on a token bill. It appears in the advice and the answers — and it comes in two distinct flavours.

1. They think American. A consistent body of 2025 research shows large language models carry systematic Western, and specifically US, cultural bias. Benchmarked against the World Values Survey, leading models' outputs favour the individualist, self-expression values most common in the US and the English-speaking world — unsurprising, given training data that is overwhelmingly Western-centric. For an Australian business, that quietly nudges your customer communications, hiring language, policy drafts and market analysis toward an American frame of reference.

2. They're optimised around their own ecosystem. This is the one the marketing tends to skip. A model built by a large platform company is, naturally, designed to work best inside that company's wider stack — that's a rational commercial incentive, not a conspiracy, but it shapes the advice you get. Ask Microsoft's Copilot how to solve a problem and its centre of gravity tends to be Azure, Microsoft 365, and the Microsoft identity stack — to the point that US regulators have reportedly raised competition concerns about how tightly Copilot is bundled into core products. GitHub also walked back a Copilot feature that had surfaced promotional "tips" inside developers' pull requests after a backlash.

And it goes deeper than product placement. Ask an AI model which AI model is best, and you are asking it to mark its own homework. Independent research has documented self-preference bias: LLMs systematically rate their own outputs more highly than objectively better ones from rivals — partly because they simply find their own style more familiar. So the model recommending itself (or its maker's other products) as the right choice is not neutral advice; it's a structural conflict of interest. When independent benchmarks routinely disagree with the vendor's own ranking, the lesson is clear: you can't rely on the seller to tell you what to buy.

Layer on the third dimension, sovereignty, and the picture is complete. Many frontier models are still served from US or European data centres, and Australian data-residency options remain limited. When a contract, health record, policy brief or piece of market intelligence is routed through an offshore service, your data crosses borders and you have less control over where it ends up. Australia's APS AI Plan, published in November 2025, put data-centre capacity and infrastructure squarely on the national agenda for exactly these reasons, and there's growing international debate about access to AI infrastructure being tied to political alignment between nations. Building your business on a single foreign model is a cost decision, a neutrality decision and a strategic one, all at once.

The more we default to the biggest US model for everything, the deeper we dig that dependency, and the more of our money, our data, our cultural framing and our purchasing decisions we hand to someone else's roadmap.

The fix: the right model for the job, chosen by someone who isn't selling you one

The mistake at the root of all of this is treating "AI" as a single, premium, all-purpose product bought from a single vendor. It isn't one, and you shouldn't buy it that way.

The open-weight world has caught up fast. Models like Qwen, DeepSeek, Llama and others now perform comparably to today's leading frontier models on the bread-and-butter tasks most businesses actually run: coding, classification, summarisation, structured data extraction and instruction-following. And they do it at a cost per token that can be many times lower, especially when self-hosted or run on sovereign infrastructure. A frontier API might charge well over $1.25 per million tokens; a strong open-weight model can run closer to $0.20. Just as importantly, an open-weight model you run yourself isn't exposed to a single vendor's pricing decisions (though you do still carry the hardware and hosting costs).

The frontier models still matter — for the genuinely hard reasoning, the nuanced writing, the edge cases where the best model earns its premium. The trick is to reserve them for those moments, not burn them on tasks a far cheaper model handles just as well. This is exactly the control the CIO experts recommend: pair tightly-scoped work with smaller, cheaper, often locally-run models, and escalate to the expensive frontier only when the task truly justifies it.

This is the principle Intelli-Assist is built on. Rather than wiring every request to one expensive American frontier model, Intelli-Assist takes a multi-model approach: it routes each task to the most appropriate engine. High-volume, routine work — classification, triage, extraction, the back-office automation MIT identified as the real ROI — goes to efficient open-weight models. The genuinely hard problems are escalated to frontier-class models where their quality is worth the price. The cheap model does the heavy lifting; the expensive model does the heavy thinking.
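
To make the routing idea concrete, here is a minimal sketch of a task router. It is not Intelli-Assist's implementation. The engine names, the set of "routine" task types and the example workload are all hypothetical, and the per-token prices are the illustrative figures quoted earlier in the article. A production router would also weigh output quality, security and data-location rules rather than task type alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str
    usd_per_million_tokens: float


# Illustrative prices only, taken from the figures quoted in the article.
OPEN_WEIGHT = Engine("open-weight", 0.20)
FRONTIER = Engine("frontier", 1.25)

# Hypothetical set of bounded, checkable task types treated as routine.
ROUTINE_TASKS = {"classification", "triage", "extraction", "summarisation"}


def route(task_type: str) -> Engine:
    """Send routine work to the cheap engine and escalate everything else."""
    return OPEN_WEIGHT if task_type in ROUTINE_TASKS else FRONTIER


def task_cost(task_type: str, tokens: int) -> float:
    """Cost in USD of one task on whichever engine the router picks."""
    engine = route(task_type)
    return tokens / 1_000_000 * engine.usd_per_million_tokens


# Hypothetical workload: (task type, tokens consumed).
workload = [
    ("classification", 20_000),
    ("extraction", 40_000),
    ("contract_reasoning", 80_000),
]

routed = sum(task_cost(task, tokens) for task, tokens in workload)
all_frontier = sum(
    tokens / 1_000_000 * FRONTIER.usd_per_million_tokens for _, tokens in workload
)
print(f"Routed: ${routed:.3f}  All-frontier: ${all_frontier:.3f}")
```

The design choice worth noting is that the routing rule lives outside any one provider's SDK, so an engine can be swapped or its price updated without touching the tasks themselves.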

The decisive advantage is that the layer making those choices is vendor-neutral by design. Of course, "trust us, we're neutral" isn't enough on its own — so the routing is driven by transparent, workload-tested criteria you can see: cost per task, output quality, security, and where the data is allowed to go. Because it isn't tied to any one provider, you get the capability of frontier AI without the runaway frontier bill, without betting your budget on subsidised pricing that may climb, and without putting all your data — and all your eggs — in one basket.

What Australian businesses should do now

You don't need to abandon AI. You just need to buy it more deliberately:

Audit what you pay per task, not per token. The headline price is a distraction. The unit that matters is cost-to-complete-a-real-job.
Assume today's price is temporary. Much frontier pricing is subsidised and more likely to rise than fall. Don't architect your business around a number that may climb sharply in the next 12–24 months.
Never let one model be the whole answer. Match high-volume routine work to cheap, efficient models; reserve frontier models for the hard 5%.
Don't take the seller's advice on what to buy. A model that recommends itself or its maker's cloud is marking its own homework. Trust independent benchmarks and your own testing.
Put budgets and guardrails in from day one — hard caps per agent, per team, per key. Uncontrolled agents are how $300-a-day surprises happen.
Price in sovereignty and bias. Where does your data go, and whose assumptions are baked into the output? Factor it in alongside the benchmark score.
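
Two of the steps above, auditing cost per task and setting hard caps, can be sketched together. The class below is a hypothetical illustration, not a feature of any particular vendor's API. It tracks spend per key, refuses any call that would push a key over its daily cap, and reports the average cost per completed task. Resetting the counters each day and persisting them are left out for brevity.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a call would push a key past its daily cap."""


class BudgetGuard:
    def __init__(self, daily_cap_usd: float) -> None:
        self.daily_cap_usd = daily_cap_usd
        self.spent = defaultdict(float)  # key -> USD spent so far today
        self.tasks = defaultdict(int)    # key -> tasks completed today

    def charge(self, key: str, tokens: int, usd_per_million_tokens: float) -> float:
        """Record one task's cost, or raise if it would exceed the cap."""
        cost = tokens / 1_000_000 * usd_per_million_tokens
        if self.spent[key] + cost > self.daily_cap_usd:
            raise BudgetExceeded(f"{key} would exceed ${self.daily_cap_usd:.2f} cap")
        self.spent[key] += cost
        self.tasks[key] += 1
        return cost

    def cost_per_task(self, key: str) -> float:
        """Average USD per completed task for this key (0.0 if none yet)."""
        done = self.tasks[key]
        return self.spent[key] / done if done else 0.0


guard = BudgetGuard(daily_cap_usd=1.00)
guard.charge("agent-a", tokens=400_000, usd_per_million_tokens=1.25)
guard.charge("agent-a", tokens=200_000, usd_per_million_tokens=1.25)

blocked = False
try:
    guard.charge("agent-a", tokens=400_000, usd_per_million_tokens=1.25)
except BudgetExceeded:
    blocked = True  # the third call is refused rather than billed

print(f"Spent: ${guard.spent['agent-a']:.2f}, blocked: {blocked}")
print(f"Cost per task: ${guard.cost_per_task('agent-a'):.3f}")
```

The same key could stand for an agent, a team or an API key, which is how one guard can enforce caps at each of those levels.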

The companies that win with AI over the next few years won't be the ones who spent the most on the biggest model. They'll be the ones who avoided locking themselves to a single subsidised vendor — and who treated cost, neutrality, control and sovereignty as first-class design problems. That's not a limitation of AI. It's finally building it for the real world.

Curious what your AI is really costing you? Bring one AI bill or one workflow to Intelli-Assist. In 20 minutes we'll estimate your cost per task, flag where a frontier model is doing work a cheaper model could handle just as well, and show where simple routing or guardrails could reduce spend — with no obligation. Book your free 20-minute AI cost review.
