Step 1: Measure call volume by workflow

Start with the basics:

  • calls per workflow (for example support, research, or onboarding)
  • calls per agent type
  • calls per user action

If you cannot measure this yet, add logging at the universal endpoint or in the model client, and tag each call with its workflow, agent type, and the user action that triggered it.
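As a rough illustration of what that logging enables, the sketch below counts calls along each of the three dimensions. The record layout and field names (workflow, agent_type, user_action) are assumptions made for this example, not a required schema.

```python
from collections import Counter

# Hypothetical log records, one per model call. Field names are assumptions.
calls = [
    {"workflow": "support", "agent_type": "triage", "user_action": "open_ticket"},
    {"workflow": "support", "agent_type": "responder", "user_action": "open_ticket"},
    {"workflow": "research", "agent_type": "planner", "user_action": "run_report"},
]

def count_calls(records, key):
    """Count calls grouped by one logged field."""
    return Counter(record[key] for record in records)

calls_per_workflow = count_calls(calls, "workflow")
calls_per_agent = count_calls(calls, "agent_type")
calls_per_action = count_calls(calls, "user_action")

print(calls_per_workflow["support"])  # 2
```

In practice the records would come from your endpoint or client logs rather than an in-memory list.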

Step 2: Estimate tokens per call (use averages)

You do not need perfect token accounting to forecast. Average input tokens and average output tokens per workflow are enough. Track the two separately, because input and output tokens are usually priced differently.

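Watch for two common drivers that push these averages up over time: context length creep (conversation history and retrieved documents accumulating across turns) and prompt growth (system prompts and tool definitions getting longer with each revision).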
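A minimal sketch of the averaging step, assuming each logged call carries a workflow label plus input and output token counts. The field names and numbers are illustrative only.

```python
from collections import defaultdict

# Hypothetical per-call usage records. Field names and values are assumptions.
usage = [
    {"workflow": "support", "input_tokens": 1200, "output_tokens": 300},
    {"workflow": "support", "input_tokens": 1800, "output_tokens": 500},
    {"workflow": "research", "input_tokens": 6000, "output_tokens": 1500},
]

def average_tokens(records):
    """Return average input and output tokens per call for each workflow."""
    totals = defaultdict(lambda: {"calls": 0, "input": 0, "output": 0})
    for record in records:
        t = totals[record["workflow"]]
        t["calls"] += 1
        t["input"] += record["input_tokens"]
        t["output"] += record["output_tokens"]
    return {
        workflow: {
            "avg_input": t["input"] / t["calls"],
            "avg_output": t["output"] / t["calls"],
        }
        for workflow, t in totals.items()
    }

averages = average_tokens(usage)
print(averages["support"])  # {'avg_input': 1500.0, 'avg_output': 400.0}
```

Recomputing these averages each month is one simple way to spot context length creep and prompt growth early.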

Step 3: Define routing tiers

Forecasting becomes much easier when you forecast tier splits rather than individual model behavior.

Use three tiers:

  • Tier A: open-source models for routine tasks
  • Tier B: mid-tier models
  • Tier C: frontier models
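To make "forecasting tier splits" concrete, here is a small sketch that turns a monthly call count and an assumed A/B/C split into calls per tier. The split values and the calls_by_tier helper are hypothetical.

```python
TIERS = ("A", "B", "C")

def calls_by_tier(monthly_calls, split):
    """Convert a tier split (fractions that sum to 1) into calls per tier."""
    if set(split) != set(TIERS):
        raise ValueError("split must define exactly tiers A, B and C")
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    return {tier: round(monthly_calls * split[tier]) for tier in TIERS}

# Assumed split for a support workflow: mostly routine, few frontier calls.
support_calls = calls_by_tier(200_000, {"A": 0.80, "B": 0.15, "C": 0.05})
print(support_calls)  # {'A': 160000, 'B': 30000, 'C': 10000}
```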

Step 4: Estimate escalation rate

Escalation rate, the share of calls routed up to the frontier tier, is the biggest cost lever.

Example:

  • If 5% of calls go to frontier, your spend may be manageable.
  • If that share drifts to 12% because escalation thresholds are too conservative, frontier call volume more than doubles and your spend spikes.

So forecast with a range: best-case, expected, and worst-case escalation.
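The range can be sketched as three scenarios over the same call volume. Everything numeric here is a placeholder: the per-call costs are not real prices, and the 3% / 5% / 12% frontier shares and fixed 20% mid-tier share are assumptions chosen to mirror the example above.

```python
# Placeholder per-call costs in USD for each tier. Not real prices.
COST_PER_CALL = {"A": 0.001, "B": 0.01, "C": 0.10}

def scenario_split(frontier_share, mid_share=0.20):
    """Build a tier split from an assumed frontier (escalation) share."""
    return {"A": 1.0 - mid_share - frontier_share, "B": mid_share, "C": frontier_share}

def monthly_cost(monthly_calls, split):
    """Total monthly cost for a call volume under a given tier split."""
    return sum(monthly_calls * share * COST_PER_CALL[tier] for tier, share in split.items())

scenarios = {"best": 0.03, "expected": 0.05, "worst": 0.12}
costs = {
    name: monthly_cost(1_000_000, scenario_split(share))
    for name, share in scenarios.items()
}

for name, cost in costs.items():
    print(f"{name}: ${cost:,.0f}")
```

With these placeholder numbers, the worst case costs roughly twice the expected case even though total call volume is unchanged, which is why the escalation rate deserves its own range.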

Step 5: Build a monthly forecast table

Create a table with columns:

  • workflow
  • monthly calls
  • avg tokens per call
  • tier split (A/B/C)
  • cost per tier
  • total cost

Even a simple spreadsheet will reveal where to focus.
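If you prefer code to a spreadsheet, the same table can be sketched in a few lines. The prices per million tokens, the workflow rows, and the tier splits are all placeholder assumptions, and the sketch simplifies by assuming every tier sees the same average tokens per call.

```python
# Placeholder prices in USD per 1M tokens for each tier. Not real prices.
PRICE_PER_M_TOKENS = {"A": 0.50, "B": 3.00, "C": 15.00}

# Hypothetical forecast inputs, one row per workflow.
workflows = [
    {"workflow": "support", "monthly_calls": 200_000, "avg_tokens": 2_000,
     "split": {"A": 0.80, "B": 0.15, "C": 0.05}},
    {"workflow": "research", "monthly_calls": 20_000, "avg_tokens": 8_000,
     "split": {"A": 0.40, "B": 0.40, "C": 0.20}},
]

def forecast_row(row):
    """Add cost per tier and total cost to one workflow row."""
    total_tokens = row["monthly_calls"] * row["avg_tokens"]
    cost_per_tier = {
        tier: total_tokens * share / 1_000_000 * PRICE_PER_M_TOKENS[tier]
        for tier, share in row["split"].items()
    }
    return {**row, "cost_per_tier": cost_per_tier, "total_cost": sum(cost_per_tier.values())}

table = [forecast_row(row) for row in workflows]

for row in table:
    print(f'{row["workflow"]:<10} {row["monthly_calls"]:>8} calls  ${row["total_cost"]:,.2f}')
```

In this made-up data the research workflow costs more than support despite a tenth of the call volume, because its calls are longer and escalate more often. That is the kind of imbalance the table is meant to expose.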

Step 6: Make the forecast real with routing logs

Forecasts fail when you cannot observe reality.

Routing infrastructure helps because you can see actual tier split, actual escalation triggers, and cost by agent or workflow. Then you can update policies and keep the forecast stable.
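One way to close that loop is to compare the tier split observed in routing logs against the forecast and flag any tier that has drifted. The log format, the forecast values, and the 5-point tolerance below are assumptions for illustration.

```python
from collections import Counter

# Forecast tier split and a hypothetical routing log with one entry per call.
forecast_split = {"A": 0.80, "B": 0.15, "C": 0.05}
routing_log = [{"tier": t} for t in "A" * 76 + "B" * 12 + "C" * 12]

def actual_split(log):
    """Share of logged calls that landed in each forecast tier."""
    counts = Counter(entry["tier"] for entry in log)
    total = sum(counts.values())
    return {tier: counts.get(tier, 0) / total for tier in forecast_split}

def drifted_tiers(forecast, actual, tolerance=0.05):
    """Tiers whose observed share differs from forecast by more than tolerance."""
    return [tier for tier in forecast if abs(actual[tier] - forecast[tier]) > tolerance]

observed = actual_split(routing_log)
flagged = drifted_tiers(forecast_split, observed)
print(flagged)  # ['C']
```

Here the frontier tier has drifted from 5% to 12%, so it is the one flagged for a policy review.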

Common mistakes

  • Forecasting only total tokens, not tier split
  • Ignoring retries and tool loops
  • Not separating routine tasks from complex ones
  • Skipping evaluation, so you cannot safely route calls down to cheaper tiers
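The retries and tool loops mistake is easy to quantify. The sketch below scales user actions into model calls with a simple multiplier. The formula and the sample values (4 calls per action, 10% retry rate) are assumptions, not measured figures.

```python
def effective_monthly_calls(user_actions, calls_per_action, retry_rate):
    """Model calls per month once tool loops and retries are included."""
    return round(user_actions * calls_per_action * (1 + retry_rate))

# Naive view: one model call per user action, no retries.
naive = effective_monthly_calls(10_000, 1, 0.0)
# Assumed reality: 4 calls per action from tool loops, 10% of calls retried.
realistic = effective_monthly_calls(10_000, 4, 0.10)

print(naive, realistic)  # 10000 44000
```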

Where ViaLayer AI helps

ViaLayer AI provides a universal endpoint and routing logs so you can measure tier splits and enforce policies. That turns forecasting into an operational loop rather than a guess.

Ready to make AI spend predictable?

Join the waitlist to get a routing-based savings estimate, or book a demo to review your workload mix.