
Agent Cost Attribution: How to Know Which Agent Is Burning Your Budget

Jon Mandraki

Deploy five agents Monday. By Friday your bill has spiked. One agent is burning tokens. Which one? No idea. Your logs show total tokens, not per-agent usage. So you search CloudWatch, dig through API call patterns, and burn an hour figuring out where the $2,000 spike came from.

This is the cost attribution problem. It's invisible until you hit a budget spike. Then it's urgent.

Why Cost Attribution Is Broken in Most Agent Systems

Single agents are easy to cost-track. You know which model they call, you know roughly how many tokens per task, you estimate the cost. Done.

Multi-agent systems break visibility.

Multiple models. GPT-4 costs differ from Claude, which differs from local models. One task costs $0.10, another $5; raw totals aren't comparable.

Variable token usage. Same agent, same task: 100 tokens or 10,000 depending on reasoning. Unpredictable costs.

Tool calls cost money. APIs charge per call. Databases charge per scan. Tool usage isn't tracked alongside tokens.

No agent-level visibility. You see fleet totals. You don't know which agent used 80% of the budget.

Retroactive discovery. Bills arrive a week late. Damage done. Nothing to fix this week. Hope next week is better.

What Cost Attribution Actually Requires

You need five things: per-agent token tracking, per-model costs, tool call costs, real-time visibility, and drill-down capability. Without all five, you're blind.

Strategies for Cost Control

Once you have visibility, you need strategies to actually control costs. Here are the ones that work:

Strategy 1: Model Routing

Don't use your most expensive model for every task. Route simpler tasks to cheaper models.

What it looks like: A gating agent classifies incoming requests and routes to the right model: simple questions to GPT-3.5/Haiku, moderate to GPT-4/Sonnet, complex to GPT-4-Turbo. Route 60% of requests to cheaper models and cut costs 30-40%.

Trade-off: Misclassification hurts quality. Complex questions routed to weak models return worse answers. The gating agent needs to be accurate.
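The gating step can be as simple as a heuristic classifier in front of a routing table. A minimal sketch of the idea (the tiers, model names, and keyword heuristic are illustrative assumptions; in production the gate is usually a cheap model, not keywords):

```python
# Route each request to a model tier before spending expensive tokens.
ROUTES = {
    "simple": "claude-haiku",      # cheap tier
    "moderate": "claude-sonnet",   # mid tier
    "complex": "gpt-4-turbo",      # expensive tier
}

COMPLEX_HINTS = ("analyze", "compare", "multi-step", "debug")

def classify(request: str) -> str:
    """Toy classifier: keyword hints first, then length."""
    text = request.lower()
    if any(hint in text for hint in COMPLEX_HINTS):
        return "complex"
    if len(text.split()) > 30:
        return "moderate"
    return "simple"

def route(request: str) -> str:
    return ROUTES[classify(request)]

print(route("What are your hours?"))              # claude-haiku
print(route("Please analyze this stack trace"))   # gpt-4-turbo
```

The accuracy of `classify` is exactly the trade-off above: every misroute to the cheap tier is a quality hit, every misroute to the expensive tier is wasted spend.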

Strategy 2: Rate Limiting Per Agent

Set a token budget per agent per day. When an agent hits its limit, it stops.

apiVersion: orloj.dev/v1
kind: Agent
metadata:
  name: customer-support-bot
spec:
  rateLimit:
    tokensPerDay: 50000

The support bot gets 50k tokens per day. When it hits the limit, requests queue or error. No runaway $10k agents. Predictable costs. Trade-off: if limits are too tight, you'll be manually raising them during traffic spikes.
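Under the hood, enforcement amounts to a counter checked before every call. A sketch of that logic (not Orloj's implementation):

```python
class TokenBudget:
    """Per-agent daily token budget: block calls once the limit is hit."""

    def __init__(self, tokens_per_day: int):
        self.limit = tokens_per_day
        self.used = 0  # reset at midnight by a scheduler (not shown)

    def try_spend(self, tokens: int) -> bool:
        """Reserve tokens for a call, or refuse if it would exceed the cap."""
        if self.used + tokens > self.limit:
            return False  # caller queues or errors the request
        self.used += tokens
        return True

budget = TokenBudget(tokens_per_day=50_000)
assert budget.try_spend(48_000)      # fine
assert not budget.try_spend(5_000)   # would exceed 50k: rejected
```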

Strategy 3: Budget Caps With Action Triggers

Set a daily budget for the entire agent fleet. When you hit the budget, take an action.

apiVersion: orloj.dev/v1
kind: Budget
metadata:
  name: daily-budget
spec:
  dailySpend: 500
  action:
    type: "pause-non-critical"
    criticalAgents:
      - payment-processor
      - incident-responder

Hit $500 daily spend and non-critical agents pause. Only critical ones continue. Protect essential workflows, cap the bill. Trade-off: you decide upfront which agents are critical.
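The pause decision is a set difference: everyone minus the critical list. A sketch of the trigger logic (function and agent names are illustrative):

```python
def enforce_budget(spend_by_agent: dict[str, float],
                   daily_cap: float,
                   critical: set[str]) -> set[str]:
    """Return the set of agents to pause once total spend hits the cap."""
    total = sum(spend_by_agent.values())
    if total < daily_cap:
        return set()  # under budget: nobody pauses
    return set(spend_by_agent) - critical  # pause non-critical only

paused = enforce_budget(
    {"payment-processor": 120.0, "report-generator": 390.0},
    daily_cap=500.0,
    critical={"payment-processor", "incident-responder"},
)
print(paused)  # {'report-generator'}
```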

Strategy 4: Async Tasks at Off-Peak Hours

For agents that don't need immediate results, run them during off-peak hours when your quota might be cheaper or load is lower.

Some LLM providers offer lower rates at off-peak times. Some tools are cheaper when they're not contending with production traffic.

If your agent doesn't need to respond immediately, defer it to 2am and save money.
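Deferral is just computing the next off-peak slot and scheduling the task there. A sketch, assuming 2am local time as the off-peak window (tune per provider):

```python
from datetime import datetime, timedelta

OFF_PEAK_HOUR = 2  # 2am local time; an assumption, not a provider rule

def next_off_peak(now: datetime) -> datetime:
    """Next 2am after `now`, for deferring non-urgent agent tasks."""
    run = now.replace(hour=OFF_PEAK_HOUR, minute=0, second=0, microsecond=0)
    if run <= now:
        run += timedelta(days=1)  # 2am already passed today
    return run

print(next_off_peak(datetime(2026, 5, 13, 14, 30)))  # 2026-05-14 02:00:00
```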

Strategy 5: Caching and Deduplication

If the same agent processes the same request twice, it burns tokens twice. Cache results.

apiVersion: orloj.dev/v1
kind: Agent
metadata:
  name: data-lookup-bot
spec:
  cache:
    enabled: true
    ttl: 3600
    keyStrategy: "hash-of-input"

The data-lookup bot caches results for one hour. Duplicate requests return cached results instead of burning tokens. Expect 20-40% savings for high-repetition workflows. Trade-off: data can be stale for up to an hour, so this only fits non-realtime queries.

Strategy 6: Cost Attribution Dashboards

Build a dashboard that shows, in real time, which agent is spending what.

Customer Support Bot: $234 (47%)
  - GPT-4: 45,000 tokens ($1.35)
  - API calls to CRM: 2,000 calls ($0.20)
  - API calls to KB search: 8,000 calls ($232.45)

Data Processor: $156 (31%)
  - Claude Sonnet: 92,000 tokens ($0.23)
  - Postgres queries: 500 queries ($155.77)

Incident Responder: $90 (18%)
  - GPT-4-Turbo: 9,000,000 tokens ($90)

Research Agent: $20 (4%)
  - Claude Haiku: 40,000,000 tokens ($20)

Now you can see it: the support bot burns money on KB searches, the data processor on DB queries. Awareness alone cuts costs 15-25%, because teams start asking "why?"
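The dashboard rows above are a rollup of per-agent spend with each agent's share of the total. A sketch of that aggregation, using the totals from the example:

```python
def dashboard_rows(spend: dict[str, float]) -> list[str]:
    """Sort agents by spend and show each one's share of the total."""
    total = sum(spend.values())
    rows = []
    for agent, cost in sorted(spend.items(), key=lambda kv: -kv[1]):
        rows.append(f"{agent}: ${cost:.0f} ({cost / total:.0%})")
    return rows

spend = {
    "customer-support-bot": 234.0,
    "data-processor": 156.0,
    "incident-responder": 90.0,
    "research-agent": 20.0,
}
for row in dashboard_rows(spend):
    print(row)
# customer-support-bot: $234 (47%)
# data-processor: $156 (31%)
# ...
```

The drill-down lines (per-model tokens, per-tool calls) are the same rollup one level deeper, grouped by model or tool instead of agent.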

Implementing Cost Attribution in Orloj

Orloj tracks costs at the runtime layer. Every agent action is logged with:

  • Agent name
  • Model called (if applicable)
  • Tokens used (input + output)
  • Tool called (if applicable)
  • Tool call cost (if applicable)
  • Timestamp
  • Status (success/failure)

Logs are structured JSON, so you can query them directly and run analytics: which agent used the most tokens? Which called the most expensive models? Which made the most tool calls?

Here's what the log looks like:

{
  "timestamp": "2026-05-13T14:32:01Z",
  "agent": "customer-support-bot",
  "action": "model-call",
  "model": "gpt-4",
  "tokens": {
    "input": 1240,
    "output": 340,
    "total": 1580
  },
  "cost": 0.047,
  "duration_ms": 2340,
  "status": "success"
}
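The `cost` field is derived from the token counts and the model's rate card. A sketch of that calculation, assuming a flat $0.03 per 1k tokens for gpt-4 (real rate cards price input and output tokens separately, and the rates here are illustrative):

```python
# Hypothetical flat per-1k-token rates; real pricing splits input/output.
RATES_PER_1K = {"gpt-4": 0.03, "claude-haiku": 0.0005}

def model_call_cost(record: dict) -> float:
    """Price a model-call log record from its total token count."""
    rate = RATES_PER_1K[record["model"]]
    return round(record["tokens"]["total"] * rate / 1000, 3)

record = {"model": "gpt-4", "tokens": {"input": 1240, "output": 340, "total": 1580}}
print(model_call_cost(record))  # 0.047
```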

You can also set up Orloj agents to monitor these logs and alert you. For example:

apiVersion: orloj.dev/v1
kind: Agent
metadata:
  name: cost-monitor
spec:
  tools:
    - query-runtime-logs
    - send-slack-notification
  schedule: "every 1 hour"
  task: |
    Check if any agent exceeded its daily budget.
    If so, send a Slack alert to #platform-eng.
    Include which agent, current spend, and budget.

The cost-monitor agent runs every hour. It queries the runtime logs, checks budgets, and alerts you if something is off-track.

Real Costs in Practice

Let me walk through a realistic scenario.

Scenario: You have five agents in production. Customer support bot, data processor, incident responder, research agent, and report generator.

Week 1 baseline:

  • Customer support: $400
  • Data processor: $300
  • Incident responder: $150
  • Research agent: $50
  • Report generator: $100
  • Total: $1,000

Week 2: Total jumps to $2,400. Without attribution you guess. With it, you see immediately:

  • Customer support: $800 (2x)
  • Data processor: $300 (same)
  • Incident responder: $150 (same)
  • Research agent: $50 (same)
  • Report generator: $1,100 (11x!)

The report generator exploded because a change broke its deduplication: 50,000 database hits per run instead of 5,000. Revert the change and spend normalizes. Without attribution, that's a week of investigation. With it, five minutes.
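The five-minute diagnosis is just a week-over-week comparison per agent. A sketch, with an arbitrary 3x threshold for what counts as a spike:

```python
def flag_spikes(baseline: dict[str, float],
                current: dict[str, float],
                ratio: float = 3.0) -> list[tuple[str, float]]:
    """Flag agents whose spend grew by more than `ratio` vs. baseline."""
    spikes = []
    for agent, cost in current.items():
        before = baseline.get(agent, 0.0)
        if before and cost / before >= ratio:
            spikes.append((agent, cost / before))
    return sorted(spikes, key=lambda s: -s[1])  # worst offender first

week1 = {"customer-support": 400, "report-generator": 100}
week2 = {"customer-support": 800, "report-generator": 1100}
print(flag_spikes(week1, week2))  # [('report-generator', 11.0)]
```

Note the support bot's 2x growth stays under the threshold; whether that's traffic growth or waste is a human question, which is why the dashboard matters alongside the alert.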

Governance and Cost Control

Cost control is also a governance problem. It's not just about visibility. It's about enforcement.

In Orloj, you can write policies that enforce cost constraints:

apiVersion: orloj.dev/v1
kind: Policy
metadata:
  name: cost-control
spec:
  agents:
    - "*"
  rules:
    - resource: "model-calls"
      action: "*"
      effect: "rate-limit"
      limit:
        tokensPerDay: 100000
        costThreshold: 500
    - resource: "api-calls"
      action: "*"
      effect: "rate-limit"
      limit:
        callsPerDay: 10000

Every agent is subject to this policy. Hit the token limit and your agent pauses. Hit the cost threshold and your agent escalates to a human.

The runtime enforces it. No agent can exceed the limit because the runtime intercepts the call.
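Enforcement at the runtime layer means every model call passes through a checkpoint the agent cannot bypass. A sketch of the intercept, with illustrative names (not Orloj internals):

```python
class BudgetExceeded(Exception):
    pass

class Runtime:
    """Intercepts every model call and checks the policy limit first."""

    def __init__(self, tokens_per_day: int):
        self.limit = tokens_per_day
        self.used: dict[str, int] = {}

    def call_model(self, agent: str, prompt: str, est_tokens: int) -> str:
        used = self.used.get(agent, 0)
        if used + est_tokens > self.limit:
            raise BudgetExceeded(f"{agent} would exceed {self.limit} tokens/day")
        self.used[agent] = used + est_tokens
        return f"<response to {len(prompt)} chars>"  # stand-in for the real call

rt = Runtime(tokens_per_day=100_000)
rt.call_model("report-generator", "summarize ...", est_tokens=90_000)
try:
    rt.call_model("report-generator", "summarize ...", est_tokens=20_000)
except BudgetExceeded as e:
    print("paused:", e)
```

Because the agent never holds the model credentials itself, there is no code path where it spends tokens without the runtime counting them.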

Teams with cost visibility optimize where others can't. They identify valuable agents (low cost, high impact) and expensive ones (high cost, low impact), then invest in optimization or retire them. They set budgets with confidence, onboard new agents knowing they can't blow the budget in a day, and move faster without surprise bill investigations. Cost attribution is the difference between a controlled system and a runaway one.
