AI spending has quickly become a permanent line item in most technology budgets. Like any material investment, it is now attracting serious scrutiny over whether it is truly delivering value.
Yet AI resists traditional evaluation in ways that most other technology categories do not, because the standard frameworks and metrics we use to assess IT spend don’t translate easily. The longer businesses tolerate this mismatch, the further they will scale commitments they cannot yet justify.
Most of us can report how many tokens we used. Far fewer can say what those tokens produced.
That gap, between how AI is billed and how its value is measured, is one of the harder management problems in enterprise AI today. Closing it is a measurement problem rather than a technical one, and it responds to a straightforward framework you can use in a budget conversation.
The Measurement Problem
Tokens meter compute, not value. They are a faithful record of how hard the model worked and a poor proxy for whether the work was worth doing. The unit you are billed in and the unit your business cares about are decoupled, and nothing in the invoice bridges them.
Reasoning models have widened that gap: you now pay for the model to deliberate, and the cost scales with how much thinking you allow, so spend can climb with no obvious relationship to output.
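To make that concrete, here is a rough Python sketch of a single request at several reasoning budgets. The prices are made up, and the sketch assumes reasoning tokens are billed at the output-token rate, which you should confirm against your own vendor’s pricing. The prompt and the visible answer stay the same throughout; only the deliberation grows.

```python
# Hypothetical prices in $ per million tokens; substitute your vendor's actual rates.
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00


def request_cost(input_tokens: int, visible_output_tokens: int, reasoning_tokens: int) -> float:
    """Dollar cost of one request, assuming reasoning tokens bill at the output rate."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * INPUT_PRICE_PER_M + billed_output * OUTPUT_PRICE_PER_M) / 1_000_000


# Same 1,500-token prompt and same 500-token answer; only the reasoning budget changes.
for reasoning_tokens in (0, 2_000, 8_000, 32_000):
    cost = request_cost(1_500, 500, reasoning_tokens)
    print(f"reasoning tokens {reasoning_tokens:>6,}: ${cost:.4f}")
```

With these illustrative numbers, the same 500-token answer goes from about $0.007 to about $0.263, more than 37 times the cost, and nothing on the invoice says whether the extra thinking improved the result.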
The Instinct to Resist
When the bill grows, the natural response is to treat tokens as a cost to minimize. In the early days of the cloud transition, leaders often reacted to compute bills by throttling usage and at times starved the workloads producing the most value.
Since then, frameworks like FinOps have emerged to make that spend accountable rather than simply smaller, but token spend still lacks an equivalent discipline.
Minimizing tokens optimizes the wrong variable. A token that helps close a large deal is inexpensive in context. A token spent on a use case no one needed is expensive no matter how few were used.
The objective is not fewer tokens. It is more value per dollar.
The Framework: Cost Per Outcome
The step that makes AI spend legible is to stop measuring tokens and start measuring, as best you can, the cost of an outcome.
An outcome is something the business recognizes: a closed ticket, a drafted contract, a resolved customer issue, a qualified lead.
The cost of that outcome breaks into three parts:
Cost per outcome = (cost per token) × (tokens per task) × (tasks per outcome)
Each of the three is a lever you can adjust on its own.
- Cost Per Token is the pricing layer. The largest lever is right-sizing the model: most teams default to the most capable and most expensive model when a smaller one would do the job. Caching repeated context and batching work that is not time-sensitive reduce this further, as do better vendor terms once your volume is meaningful. If the data is available, breaking cost per token down by employee is even more useful.
- Tokens Per Task is the efficiency layer. Long prompts, loading entire documents into context when retrieval would surface the one relevant section, and outputs that run longer than anyone reads all add cost to every transaction. Prompt discipline is among the least expensive improvements available.
- Tasks Per Outcome is the design layer, and it is the one most often overlooked. It measures how many AI interactions it takes to reach a usable result. If staff re-prompt five times to get one good answer, the real cost of that answer is five times what a single interaction appears to cost. Better workflow design, along with a clear sense of when to keep a person in the loop, brings this number down.
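A minimal Python sketch of the three-factor calculation shows how each lever moves cost per outcome when pulled on its own. The contract-drafting use case, the `UseCase` class, and every figure are hypothetical and chosen for illustration, not benchmarks.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UseCase:
    """One use case's three factors; all figures used here are illustrative."""
    name: str
    cost_per_token: float     # pricing layer: blended $ per token
    tokens_per_task: float    # efficiency layer: average tokens per AI interaction
    tasks_per_outcome: float  # design layer: interactions per usable result

    def cost_per_outcome(self) -> float:
        return self.cost_per_token * self.tokens_per_task * self.tasks_per_outcome


# Hypothetical baseline: $10 per million tokens, 6,000 tokens per attempt, 5 attempts.
baseline = UseCase("contract drafting", cost_per_token=10 / 1_000_000,
                   tokens_per_task=6_000, tasks_per_outcome=5)

# Pull each lever on its own, holding the other two fixed.
scenarios = {
    "baseline": baseline,
    "smaller model (pricing)": replace(baseline, cost_per_token=2 / 1_000_000),
    "retrieval, not full documents (efficiency)": replace(baseline, tokens_per_task=2_000),
    "better workflow (design)": replace(baseline, tasks_per_outcome=2),
}

for label, use_case in scenarios.items():
    print(f"{label:<45} ${use_case.cost_per_outcome():.2f} per outcome")
```

With these numbers the baseline costs $0.30 per outcome; the cheaper model brings it to $0.06, retrieval to $0.10, and a workflow that needs two attempts instead of five to $0.12. In practice the levers interact: a cheaper model that needs more attempts per usable result can end up costing more per outcome, which is why it helps to track all three together rather than cost per token alone.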
The value of this framework lies in diagnosis rather than precision, so I would treat early estimates as approximate and directional.
Depending on your vendor and use case, there could be significant gaps in data availability. In my experience, OpenAI’s admin tools and employee-level data offer a far more complete end-to-end view than those of their closest competitors.
For the purposes of this article, I’m focusing on direct chatbot use rather than API use; the framework should be much simpler to apply to the latter.
When AI returns look poor, the framework should show where the issue sits: tokens that cost too much, prompts that are inefficient, too many attempts to reach a usable result, or a use case that was never worth the spend.
Each has a different remedy, and the complaint that ‘AI is expensive’ stops being treated as a single undifferentiated problem.
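One way to sketch that diagnosis in Python assumes you have a hypothetical target value for each factor (say, from your best-performing use case or an agreed benchmark) and a rough estimate of what one outcome is worth to the business. The `diagnose` function and its remedy wording are illustrative, not a standard method: it first checks whether the use case could pay off even at target efficiency, and only then points at the factor furthest above its target.

```python
import math

# Remedy per factor; the wording is illustrative.
REMEDIES = {
    "cost_per_token": "tokens cost too much: right-size the model, cache, batch",
    "tokens_per_task": "prompts are inefficient: trim prompts, retrieve instead of loading whole documents",
    "tasks_per_outcome": "too many attempts: improve workflow design",
}


def diagnose(actual: dict[str, float], target: dict[str, float],
             value_per_outcome: float) -> str:
    """Name the most likely problem for one use case.

    `target` holds hypothetical benchmark values for each of the three factors.
    """
    # If even a well-run version costs more than the outcome is worth,
    # tuning the levers will not rescue the use case.
    target_cost = math.prod(target[k] for k in REMEDIES)
    if value_per_outcome < target_cost:
        return "use case is not worth the spend, even at target efficiency"
    # Otherwise, the factor with the largest actual-to-target ratio is where the issue sits.
    worst = max(REMEDIES, key=lambda k: actual[k] / target[k])
    if actual[worst] <= target[worst]:
        return "all factors at or below target"
    return REMEDIES[worst]


actual = {"cost_per_token": 10 / 1_000_000, "tokens_per_task": 6_000, "tasks_per_outcome": 5}
target = {"cost_per_token": 10 / 1_000_000, "tokens_per_task": 4_000, "tasks_per_outcome": 2}

print(diagnose(actual, target, value_per_outcome=2.00))
print(diagnose(actual, target, value_per_outcome=0.05))
```

Here tokens per task runs 1.5 times its target but tasks per outcome runs 2.5 times, so the sketch points at workflow design first; at a value of $0.05 per outcome, even the target cost of $0.08 is too high and the use case itself is the problem. A single ratio is a blunt instrument, so read the result as a pointer for a closer look.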
Knowing Where You Stand
Most organizations currently sit somewhere on a short AI maturity ladder, and simply identifying their rung is valuable in itself.
- Blind: A flat invoice with little sense of what drives it.
- Metered: Total spend is tracked, but not by use case.
- Attributed: Spend is mapped to specific teams, features, and use cases.
- Unit-economic: A cost per outcome exists for the top use cases.
- Optimized: The levers are actively managed, and value per dollar informs decisions.
As you may imagine, much of the available return comes from moving up the ladder. The goal is not to reach the top this quarter but to advance one rung, beginning by instrumenting the two or three largest use cases before scaling them further.
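Moving from Metered to Attributed can start with something as simple as tagging usage by use case and totaling it. The Python sketch below assumes a hypothetical usage export already reduced to (team, use case, cost) rows; real admin exports will have their own schemas and fields. It reports how much spend is attributed at all and picks the largest use cases to instrument first.

```python
from collections import defaultdict

# Hypothetical usage export, reduced to (team, use_case, cost_usd) rows.
usage = [
    ("sales", "proposal drafting", 1200.0),
    ("support", "ticket triage", 3400.0),
    ("legal", "contract review", 2100.0),
    ("support", "ticket triage", 900.0),
    ("marketing", "copy variants", 300.0),
    ("sales", "lead research", 650.0),
    ("unassigned", "untagged", 1800.0),
]


def spend_by_use_case(rows):
    """Total spend per use case."""
    totals = defaultdict(float)
    for _team, use_case, cost in rows:
        totals[use_case] += cost
    return dict(totals)


def top_use_cases(rows, n=3, unattributed="untagged"):
    """Largest n attributed use cases by spend, highest first."""
    totals = spend_by_use_case(rows)
    totals.pop(unattributed, None)  # untagged spend cannot be instrumented per use case
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


total = sum(cost for _team, _use_case, cost in usage)
unattributed = spend_by_use_case(usage).get("untagged", 0.0)
print(f"attributed share of spend: {1 - unattributed / total:.0%}")
for use_case, cost in top_use_cases(usage):
    print(f"instrument first: {use_case} (${cost:,.0f})")
```

With the sample rows, 83% of spend is attributed, and the three use cases to instrument first are ticket triage, contract review, and proposal drafting. The unattributed share is itself a useful number to watch as you move up the ladder.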
The Wrap
The businesses that will get the most from AI are not the ones with the lowest token costs. They are the ones that can clearly quantify, in financial terms, what value those costs delivered and can therefore invest with intention rather than caution.
In many cases, the right decision is to spend more on additional reasoning, richer context, or further iterations, because the increase in outcome value exceeds the added cost. That judgment only becomes reliable when the underlying economics are explicit.
For technology leaders, the role here is not to be the function that declines AI spend. It is to be the one that can explain it.
Tokens are the meter. Outcomes are the goal.
Keeping the two distinct is what separates the organizations that manage AI well from those still working to interpret the invoice.
Trusted insights for technology leaders
Our readers are CIOs, CTOs, and senior IT executives who rely on The National CIO Review for smart, curated takes on the trends shaping the enterprise, from GenAI to cybersecurity and beyond.