Step 1: Measure call volume by workflow
Start with the basics:
- calls per workflow, such as support, research, onboarding
- calls per agent type
- calls per user action
If you cannot measure this yet, add logging at the universal endpoint or in the model client so that every call is tagged with its workflow, agent type, and triggering user action.
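As a rough illustration of logging at the model client, the sketch below wraps a model call and counts it along the three dimensions listed above. The names `CallVolumeLog`, `logged_call`, and `fake_model` are hypothetical, and the in-memory counters stand in for whatever logging or metrics system you already use.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CallVolumeLog:
    """In-memory call counters (an assumption; swap in your real log sink)."""
    by_workflow: Counter = field(default_factory=Counter)
    by_agent_type: Counter = field(default_factory=Counter)
    by_user_action: Counter = field(default_factory=Counter)

    def record(self, workflow: str, agent_type: str, user_action: str) -> None:
        # One increment per dimension for every model call.
        self.by_workflow[workflow] += 1
        self.by_agent_type[agent_type] += 1
        self.by_user_action[user_action] += 1


def logged_call(
    log: CallVolumeLog,
    model_call: Callable[[str], str],
    prompt: str,
    *,
    workflow: str,
    agent_type: str,
    user_action: str,
) -> str:
    """Record the call, then forward the prompt to the underlying model client."""
    log.record(workflow, agent_type, user_action)
    return model_call(prompt)


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model client.
    return f"echo: {prompt}"


log = CallVolumeLog()
logged_call(log, fake_model, "Reset my password",
            workflow="support", agent_type="triage", user_action="chat_message")
logged_call(log, fake_model, "Summarize these papers",
            workflow="research", agent_type="summarizer", user_action="report_request")
print(dict(log.by_workflow))
```

Because the wrapper sits in front of the client, every agent that goes through it is counted without changes to the agent code itself.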
Step 2: Estimate tokens per call (use averages)
You do not need perfect token accounting to forecast. Use average input tokens per workflow and average output tokens per workflow.
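Watch for two common drivers that push averages up over time: context length creep (conversation history and retrieved content accumulating across turns) and prompt growth (system prompts and instructions expanding as features are added).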
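A minimal sketch of the averaging step, assuming you have logged samples as `(workflow, input_tokens, output_tokens)` tuples. The record shape, the function name `average_tokens`, and the sample numbers are illustrative assumptions, not measured data.

```python
from collections import defaultdict
from statistics import mean


def average_tokens(records):
    """Return average input and output tokens per workflow.

    records: iterable of (workflow, input_tokens, output_tokens) tuples.
    """
    inputs = defaultdict(list)
    outputs = defaultdict(list)
    for workflow, tokens_in, tokens_out in records:
        inputs[workflow].append(tokens_in)
        outputs[workflow].append(tokens_out)
    return {
        workflow: {
            "avg_input": mean(inputs[workflow]),
            "avg_output": mean(outputs[workflow]),
        }
        for workflow in inputs
    }


# Made-up sample records for illustration only.
samples = [
    ("support", 1200, 300),
    ("support", 1800, 500),
    ("research", 6000, 1500),
]
for workflow, averages in average_tokens(samples).items():
    print(workflow, averages)
```

Re-running the same calculation each month on fresh samples is a simple way to spot context creep: the average input tokens for a workflow rise while call volume stays flat.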
Step 3: Define routing tiers
Forecasting becomes much easier when you forecast tier splits rather than individual model behavior.
Use three tiers:
- Tier A: open-source models for routine tasks
- Tier B: mid-tier models
- Tier C: frontier models
Step 4: Estimate escalation rate
Escalation rate, the share of calls routed up to the frontier tier, is the biggest cost lever.
Example:
- If 5% of calls go to frontier, your spend may be manageable.
- If it drifts to 12% because escalation thresholds are too conservative, frontier call volume more than doubles and your spend spikes.
So forecast with a range: best case escalation, expected escalation, and worst case escalation.
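To make the range concrete, here is a small worked sketch. It simplifies to two tiers (routine and frontier), and every number except the 5% and 12% escalation rates from the example above is an assumption: the call volume, the per-call costs, and the 8% "expected" rate are placeholders to replace with your own figures.

```python
# Hypothetical inputs; replace with your own measurements and prices.
MONTHLY_CALLS = 100_000
ROUTINE_COST_PER_CALL = 0.002   # assumed USD per call on the routine tier
FRONTIER_COST_PER_CALL = 0.05   # assumed USD per call on the frontier tier

SCENARIOS = {
    "best": 0.05,      # 5% escalation, from the example above
    "expected": 0.08,  # assumed midpoint
    "worst": 0.12,     # 12% escalation, from the example above
}


def monthly_cost(calls: int, escalation_rate: float) -> float:
    """Blend routine and frontier per-call costs by the escalation rate."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    blended = ((1.0 - escalation_rate) * ROUTINE_COST_PER_CALL
               + escalation_rate * FRONTIER_COST_PER_CALL)
    return calls * blended


for name, rate in SCENARIOS.items():
    print(f"{name:>8}: {rate:.0%} escalation -> ${monthly_cost(MONTHLY_CALLS, rate):,.2f}")
```

With these placeholder prices the frontier tier dominates the bill even at single-digit escalation rates, which is why the range matters more than the point estimate.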
Step 5: Build a monthly forecast table
Create a table with columns:
- workflow
- monthly calls
- avg tokens per call (input and output)
- tier split (A/B/C)
- cost per tier
- total cost
Even a simple spreadsheet will reveal where to focus.
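If you prefer code to a spreadsheet, the same table can be sketched in a few lines. The workflows, call counts, tier splits, and per-million-token prices below are all made-up placeholders, and using a single blended token price per tier is a simplifying assumption.

```python
# Assumed USD price per 1M tokens for each tier (placeholders, not real prices).
PRICE_PER_MILLION_TOKENS = {"A": 0.20, "B": 1.00, "C": 10.00}

# One row per workflow: monthly calls, avg tokens per call, tier split (A/B/C).
FORECAST_ROWS = [
    {"workflow": "support", "monthly_calls": 200_000, "avg_tokens": 2_000,
     "tier_split": {"A": 0.80, "B": 0.15, "C": 0.05}},
    {"workflow": "research", "monthly_calls": 20_000, "avg_tokens": 8_000,
     "tier_split": {"A": 0.50, "B": 0.30, "C": 0.20}},
]


def row_cost(row: dict) -> dict:
    """Return the monthly cost per tier for one workflow row."""
    split = row["tier_split"]
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError(f"tier split for {row['workflow']} must sum to 1")
    total_tokens = row["monthly_calls"] * row["avg_tokens"]
    return {
        tier: total_tokens * share * PRICE_PER_MILLION_TOKENS[tier] / 1_000_000
        for tier, share in split.items()
    }


grand_total = 0.0
for row in FORECAST_ROWS:
    per_tier = row_cost(row)
    total = sum(per_tier.values())
    grand_total += total
    tiers = "  ".join(f"{tier}=${cost:,.2f}" for tier, cost in per_tier.items())
    print(f"{row['workflow']:<10} {tiers}  total=${total:,.2f}")
print(f"all workflows total=${grand_total:,.2f}")
```

Breaking the total out per tier shows at a glance which workflow and which tier drive the bill, which is where routing changes will pay off first.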
Step 6: Make the forecast real with routing logs
Forecasts fail when you cannot observe reality.
Routing infrastructure helps because you can see the actual tier split, the actual escalation triggers, and cost by agent or workflow. When the observed numbers drift from the forecast, you can adjust routing policies and keep the forecast stable.
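One way to close the loop is to compute the observed tier split from routing logs and compare it with the forecast. The sketch below assumes each log entry is a dict with `workflow` and `tier` keys; that shape, the function names, and the 2-percentage-point tolerance are illustrative assumptions, not a documented log schema.

```python
from collections import Counter, defaultdict


def actual_tier_split(log_entries):
    """Return the observed share of calls per tier for each workflow."""
    counts = defaultdict(Counter)
    for entry in log_entries:
        counts[entry["workflow"]][entry["tier"]] += 1
    return {
        workflow: {tier: n / sum(tiers.values()) for tier, n in tiers.items()}
        for workflow, tiers in counts.items()
    }


def drifted_tiers(forecast, actual, tolerance=0.02):
    """Return {tier: actual - forecast} where the gap exceeds the tolerance."""
    drift = {}
    for tier in set(forecast) | set(actual):
        gap = actual.get(tier, 0.0) - forecast.get(tier, 0.0)
        if abs(gap) > tolerance:
            drift[tier] = gap
    return drift


# Made-up log: 100 support calls, with frontier (C) usage at 12%.
logs = ([{"workflow": "support", "tier": "A"}] * 80
        + [{"workflow": "support", "tier": "B"}] * 8
        + [{"workflow": "support", "tier": "C"}] * 12)
forecast_split = {"A": 0.80, "B": 0.15, "C": 0.05}

observed = actual_tier_split(logs)["support"]
for tier, gap in sorted(drifted_tiers(forecast_split, observed).items()):
    print(f"tier {tier}: {gap:+.0%} vs forecast")
```

Running a check like this on a schedule turns the forecast into something you correct continuously rather than revisit once a quarter.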
Common mistakes
- Forecasting only total tokens, not tier split
- Ignoring retries and tool loops
- Not separating routine vs complex tasks
- No evaluation, so you cannot safely route down
Where ViaLayer AI helps
ViaLayer AI provides a universal endpoint and routing logs so you can measure tier splits and enforce policies. That turns forecasting into an operational loop rather than a guess.
Ready to make AI spend predictable?
Join the waitlist to get a routing-based savings estimate, or book a demo to review your workload mix.