
John in 't Hout

When Gartner forecasts worldwide AI spending will hit $2.5 trillion in 2026, a 44% year-over-year jump, the number is so large it almost loses meaning. But here is the number that should keep finance and infrastructure leaders up at night: according to IDC, G1000 organizations will face up to a 30% rise in underestimated AI infrastructure costs by 2027. Not overspending. Under-forecasting.
The gap between what companies plan to spend on AI and what they actually spend is widening fast. And traditional FinOps playbooks, built for predictable cloud workloads, are not equipped to close it.
The Problem: AI Costs Do Not Behave Like Cloud Costs
Most FinOps practices were designed around a familiar model: provision compute, monitor usage, right-size instances, negotiate reserved capacity. That model assumes workloads scale in roughly linear, predictable ways.
AI breaks that assumption entirely.
Token-based pricing for large language models fluctuates based on context length, retry behavior, and user interaction patterns. A minor change in prompt structure or application usage can double inference costs overnight. GPU clusters sit idle during training gaps, then spike to full capacity with no warning. And the emerging pattern of agentic AI workflows, where models call other models, creates compounding cost chains that are nearly impossible to forecast with traditional budgeting.
Average monthly AI costs hit $85,521 in 2025, a 36% increase from the prior year, according to CloudZero. Yet 94% of IT leaders report they are still struggling to optimize these costs effectively. The tools and processes they rely on were simply not built for this kind of volatility.
Why Traditional Optimization Falls Short
The standard FinOps toolkit (reserved instances, savings plans, spot pricing, right-sizing recommendations) targets infrastructure you can see and predict. AI workloads introduce three challenges that undermine each of these levers.
Consumption is opaque. Unlike a web server with predictable CPU cycles, an AI inference endpoint's cost depends on what users ask it and how the application processes those requests. Two identical API calls can cost dramatically different amounts depending on context window size.
Value attribution is unclear. When an AI feature is embedded in a product, who owns the cost: the product team, the platform team, or the ML team? Without clear unit economics, optimization becomes a political exercise rather than an engineering one.
The feedback loop is slow. By the time a monthly cloud bill reveals an AI cost spike, the damage is done. Real-time visibility into token consumption, GPU utilization, and model-level spend is still rare in most organizations.
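The first of these challenges, opaque consumption, is easy to make concrete with a back-of-the-envelope cost model. The per-token prices below are hypothetical placeholders, not any vendor's actual rates; the point is only that context size, not the question asked, dominates the bill.

```python
# Sketch: why two "identical" API calls can cost very different amounts.
# Prices are illustrative assumptions, not real vendor rates.
PRICE_PER_1K_INPUT = 0.003   # assumed $ per 1K input tokens
PRICE_PER_1K_OUTPUT = 0.015  # assumed $ per 1K output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single inference request in dollars."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# Same question and same answer length, but one call carries a large
# retrieved context in its prompt:
short_call = request_cost(input_tokens=500, output_tokens=200)
long_call = request_cost(input_tokens=50_000, output_tokens=200)
print(f"short: ${short_call:.4f}, long: ${long_call:.4f}")
```

Under these assumed prices the context-heavy call costs more than thirty times the lean one, even though both return the same-sized answer. That multiplier is invisible in a monthly bill that only shows aggregate spend.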
What Leading FinOps Teams Are Doing Differently
The State of FinOps 2026 report shows that AI cost management has become nearly universal among FinOps practitioners: 98%, up from 63% the year prior. But adoption is not the same as maturity. The organizations pulling ahead share a few common strategies.
Treating model selection as a cost lever.
Not every task requires the most capable (and expensive) model. Leading teams route requests to the cheapest model that meets the quality threshold for each use case, using a large model for complex reasoning and a smaller one for summarization or classification. This single practice can cut inference costs by 40-60% without degrading user experience.
Building real-time cost telemetry.
Instead of waiting for end-of-month bills, mature teams instrument their AI pipelines to track cost per request, cost per user, and cost per feature in near real-time. This turns AI spend into a product metric, not just a finance line item.
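A sketch of what that instrumentation can look like in-process: tag every inference call with a user and a feature, and aggregate as you go. A real system would emit these records to a metrics backend; the blended token price is an assumption for illustration.

```python
# Per-request cost telemetry sketch: attribute every inference call to
# a user and a feature so spend becomes a product metric.
from collections import defaultdict

PRICE_PER_1K_TOKENS = 0.002  # assumed blended $ per 1K tokens

cost_by_feature = defaultdict(float)
cost_by_user = defaultdict(float)

def record_request(user: str, feature: str, tokens: int) -> float:
    """Record one inference call and return its estimated cost."""
    cost = (tokens / 1000) * PRICE_PER_1K_TOKENS
    cost_by_feature[feature] += cost
    cost_by_user[user] += cost
    return cost

record_request("alice", "search_summary", 3_000)
record_request("bob", "search_summary", 1_000)
record_request("alice", "chat_assistant", 12_000)
print(dict(cost_by_feature))
print(dict(cost_by_user))
```

Even this toy version answers questions a monthly bill cannot: which feature is the cost driver, and which users are the heaviest consumers, while there is still time to react.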
Automating capacity decisions.
Predictive scaling, using AI itself to forecast demand based on historical patterns, product roadmaps, and seasonal behavior, is replacing reactive provisioning. These systems automatically acquire GPU capacity when spot prices dip and release resources when demand subsides, turning capacity planning into a continuous optimization loop.
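The decision logic at the heart of such a system can be reduced to a simple policy, sketched below. The thresholds and prices are made-up illustrations; real systems layer forecasting models and interruption handling on top of this kind of rule.

```python
# Toy capacity policy: scale up when forecast demand exceeds capacity
# and spot prices are cheap, release capacity when demand subsides.
# All thresholds and prices are illustrative assumptions.
def capacity_decision(current_gpus: int, forecast_demand: int,
                      spot_price: float,
                      price_threshold: float = 2.0) -> int:
    """Return the target GPU count for the next interval."""
    if forecast_demand > current_gpus and spot_price <= price_threshold:
        return forecast_demand   # scale up while spot capacity is cheap
    if forecast_demand < current_gpus:
        return forecast_demand   # release idle capacity immediately
    return current_gpus          # hold: demand is up but spot is too dear

print(capacity_decision(current_gpus=8, forecast_demand=16, spot_price=1.5))
```

Run on a short interval, a loop like this turns capacity planning from a quarterly guess into a continuous optimization: it buys when the market is favorable and never pays for GPUs the forecast says will sit idle.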
Embedding FinOps into engineering workflows.
The most effective teams do not treat cost optimization as a separate function. They surface cost data in CI/CD pipelines, set per-team budgets with automated alerts, and make cost a first-class metric in architecture reviews. The FinOps Foundation reports that 78% of practices now report into the CTO/CIO organization, signaling that cost management is increasingly an engineering discipline.
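A per-team budget guardrail of the kind described above can be a few lines of code run in CI or a scheduled job. The figures below are invented for illustration; the alert threshold is a policy choice.

```python
# Sketch of a per-team budget guardrail: flag any team whose
# month-to-date spend exceeds a fraction of its budget. Numbers
# here are illustrative, not real data.
def budget_alerts(spend: dict, budgets: dict,
                  threshold: float = 0.8) -> list:
    """Return teams whose month-to-date spend exceeds threshold * budget."""
    return sorted(team for team, amount in spend.items()
                  if amount > threshold * budgets[team])

spend = {"search": 9_200, "recs": 4_100, "platform": 2_000}
budgets = {"search": 10_000, "recs": 8_000, "platform": 5_000}
print(budget_alerts(spend, budgets))
```

Wiring the output into a chat notification or a failing pipeline step is what turns cost from a finance report into an engineering signal, which is the shift the CTO/CIO reporting trend reflects.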
The Organizational Shift That Makes It Work
Technology alone will not solve the AI cost problem. The organizations making real progress are also rethinking how FinOps teams operate.
Team sizes remain lean: organizations managing $100M+ in cloud spend average just 8-10 practitioners. They scale through federation: embedded cost champions in each engineering team, supported by centralized tooling and automation. Manual spreadsheet reviews are giving way to automated anomaly detection and policy-based guardrails.
This matters because the scope of FinOps is expanding rapidly. Ninety percent of respondents now manage or plan to manage SaaS costs alongside cloud infrastructure, and 57% are incorporating private cloud. AI is simply the latest, and most volatile, layer in an increasingly complex technology cost stack.
What to Do This Quarter
If your organization is running AI workloads without a dedicated cost optimization strategy, you are almost certainly overspending.
Here is where to start:
Audit your current AI spend by model, team, and use case. Most organizations cannot answer the question "what does this AI feature cost per user?" until they instrument it. Build that visibility first.
Implement model routing so that simple tasks use lightweight models. This is the single highest-ROI optimization for most teams and requires minimal engineering effort.
Establish cost accountability by assigning AI spend to product teams, not just infrastructure budgets. When the team building the feature also owns its cost, optimization happens naturally.
Invest in real-time monitoring that tracks token consumption and GPU utilization alongside traditional cloud metrics. Monthly bills are too slow for AI cost management.
The companies that treat AI cost optimization as a strategic capability, not an afterthought, will be the ones that scale AI profitably. The rest will be the ones explaining budget overruns to their board.
Ready to get your AI costs under control? Talk to our team to see how we help organizations build FinOps practices designed for the age of AI.
FAQ
What is FinOps for AI?
FinOps for AI extends traditional cloud financial management to cover the unique cost challenges of artificial intelligence workloads, including GPU compute, model inference, token-based pricing, and training pipelines. Where standard FinOps focuses on predictable infrastructure like virtual machines and storage, FinOps for AI addresses the variable, often opaque costs that come with deploying and scaling machine learning models in production.
Why are AI costs so hard to predict?
Unlike traditional cloud resources that scale in relatively linear ways, AI costs are driven by factors that are difficult to forecast. Token-based pricing fluctuates with prompt length and user behavior. GPU utilization swings between idle and full capacity depending on training schedules. Agentic workflows, where one model triggers calls to other models, create compounding cost chains. These dynamics make standard budgeting and forecasting methods unreliable.
What is model routing, and how does it reduce costs?
Model routing is the practice of directing AI requests to different models based on task complexity. Simple tasks like text classification or summarization can be handled by smaller, cheaper models, while complex reasoning tasks go to larger, more capable ones. This approach can reduce inference costs by 40-60% without meaningfully affecting output quality, making it one of the highest-ROI optimizations available today.
How do I know if my organization is overspending on AI?
If you cannot answer the question "what does this AI feature cost per user?" you likely have a visibility gap that leads to overspending. Other warning signs include month-over-month AI cost increases that outpace usage growth, no clear ownership of AI spend across teams, and reliance on monthly cloud bills rather than real-time cost telemetry to track AI expenses.
Where should I start with AI cost optimization?
Start with visibility. Instrument your AI pipelines to track spend by model, team, and use case. From there, implement model routing for the quickest cost wins, assign cost ownership to product teams, and invest in real-time monitoring. Organizations that take these four steps typically see meaningful savings within the first quarter.
Does optimizing AI costs hurt model performance?
Not when done correctly. The goal is not to spend less on AI across the board; it is to spend efficiently. Techniques like model routing, prompt optimization, and caching frequent responses reduce costs on routine tasks while preserving full model capability where it matters. The best FinOps teams measure cost alongside quality metrics to ensure optimization never comes at the expense of user experience.





