You deploy five agents on Monday. By Friday, your bill is higher. One agent is burning tokens, but which? No idea. Your logs show total tokens, not per-agent usage. So you search CloudWatch, dig through API call patterns, and waste an hour tracking down the $2,000 spike.
This is the cost attribution problem. It's invisible until you hit a budget spike. Then it's urgent.
Why Cost Attribution Is Broken in Most Agent Systems
Single agents are easy to cost-track. You know which model they call, you know roughly how many tokens per task, you estimate the cost. Done.
Multi-agent systems break visibility.
Multiple models. GPT-4 pricing differs from Claude's, which differs from a local model's. Per-task costs range from $0.10 to $5, so they aren't directly comparable.
Variable token usage. Same agent, same task: 100 tokens or 10,000 depending on reasoning. Unpredictable costs.
Tool calls cost money. APIs charge per call. Databases charge per scan. Tool usage isn't tracked alongside tokens.
No agent-level visibility. You see fleet totals. You don't know which agent used 80% of the budget.
Retroactive discovery. Bills arrive a week late. Damage done. Nothing to fix this week. Hope next week is better.
What Cost Attribution Actually Requires
You need five things: per-agent token tracking, per-model costs, tool call costs, real-time visibility, and drill-down capability. Without all five, you're blind.
Strategies for Cost Control
Once you have visibility, you need strategies to actually control costs. Here are the ones that work:
Strategy 1: Model Routing
Don't use your most expensive model for every task. Route simpler tasks to cheaper models.
What it looks like: A gating agent classifies incoming requests and routes to the right model: simple questions to GPT-3.5/Haiku, moderate to GPT-4/Sonnet, complex to GPT-4-Turbo. Route 60% of requests to cheaper models and cut costs 30-40%.
Trade-off: Misclassification hurts quality. Complex questions routed to weak models return worse answers. The gating agent needs to be accurate.
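The gating idea fits in a few lines. This is a minimal sketch: the length/punctuation heuristic stands in for a real classifier (in production the gate is usually a small, cheap model), and the tier-to-model mapping is illustrative.

```python
def classify(request: str) -> str:
    """Crude complexity heuristic standing in for the gating agent."""
    words = len(request.split())
    if "?" in request and words < 20:
        return "simple"
    if words < 60:
        return "moderate"
    return "complex"

# Illustrative tier-to-model mapping from the strategy above.
ROUTES = {
    "simple": "gpt-3.5-turbo",
    "moderate": "gpt-4",
    "complex": "gpt-4-turbo",
}

def route(request: str) -> str:
    return ROUTES[classify(request)]
```

The interesting design work is all in `classify`: the cheaper you make it, the more of the 30-40% savings you keep, but the more misclassifications you accept.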
Strategy 2: Rate Limiting Per Agent
Set a token budget per agent per day. When an agent hits its limit, it stops.
```yaml
apiVersion: orloj.dev/v1
kind: Agent
metadata:
  name: customer-support-bot
spec:
  rateLimit:
    tokensPerDay: 50000
```
The support bot gets 50k tokens per day. When it hits the limit, it queues or errors. No runaway $10k agents. Predictable costs. Trade-off: if limits are too tight, you end up manually raising them during traffic spikes.
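The enforcement behind `tokensPerDay` can be sketched as a daily counter. This is an illustration only: a real runtime would persist counters across restarts and pin the reset to a timezone.

```python
import time

class DailyTokenBudget:
    """Per-agent daily token budget, mirroring tokensPerDay above (sketch)."""

    def __init__(self, tokens_per_day: int, now=time.time):
        self.limit = tokens_per_day
        self.used = 0
        self.now = now
        self.day = int(now() // 86400)  # current UTC day index

    def try_spend(self, tokens: int) -> bool:
        today = int(self.now() // 86400)
        if today != self.day:           # new UTC day: reset the counter
            self.day, self.used = today, 0
        if self.used + tokens > self.limit:
            return False                # caller should queue or error
        self.used += tokens
        return True
```

The `now` parameter is injected so the reset logic is testable; the runtime would call `try_spend` before every model call.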
Strategy 3: Budget Caps With Action Triggers
Set a daily budget for the entire agent fleet. When you hit the budget, take an action.
```yaml
apiVersion: orloj.dev/v1
kind: Budget
metadata:
  name: daily-budget
spec:
  dailySpend: 500
  action:
    type: "pause-non-critical"
    criticalAgents:
      - payment-processor
      - incident-responder
```
Hit $500 daily spend and non-critical agents pause. Only critical ones continue. Protect essential workflows, cap the bill. Trade-off: you decide upfront which agents are critical.
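The pause-non-critical action reduces to a simple state transition over the fleet. A sketch, assuming an in-memory agent-name-to-state map; the function and its shape are hypothetical, not Orloj's actual API:

```python
# Names match the Budget spec above; the enforcement API is illustrative.
CRITICAL = {"payment-processor", "incident-responder"}
DAILY_CAP = 500.0

def enforce_cap(daily_spend: float, agents: dict) -> dict:
    """Return agent -> state after applying the budget action."""
    if daily_spend < DAILY_CAP:
        return agents                   # under budget: no change
    return {
        name: ("running" if name in CRITICAL else "paused")
        for name in agents
    }
```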
Strategy 4: Async Tasks at Off-Peak Hours
For agents that don't need immediate results, run them during off-peak hours when your quota might be cheaper or load is lower.
Some LLM providers offer lower rates at off-peak times. Some tools are cheaper when they're not contending with production traffic.
If your agent doesn't need to respond immediately, defer it to 2am and save money.
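Deferral is just a matter of computing the next off-peak timestamp and scheduling against it. A minimal sketch, assuming a 2am local window and a scheduler that accepts an absolute run-at time:

```python
from datetime import datetime, timedelta

def next_off_peak(now: datetime, hour: int = 2) -> datetime:
    """Next occurrence of the off-peak hour (default 02:00)."""
    run_at = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if run_at <= now:                   # today's window already passed
        run_at += timedelta(days=1)
    return run_at
```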
Strategy 5: Caching and Deduplication
If the same agent processes the same request twice, it burns tokens twice. Cache results.
```yaml
apiVersion: orloj.dev/v1
kind: Agent
metadata:
  name: data-lookup-bot
spec:
  cache:
    enabled: true
    ttl: 3600
    keyStrategy: "hash-of-input"
```
The data-lookup bot caches results for one hour. Duplicate requests return the cached result instead of burning tokens. Expect 20-40% savings for high-repetition workloads. Trade-off: data can be stale for up to an hour, so this only suits non-realtime queries.
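The `hash-of-input` key strategy plus TTL can be sketched as a small wrapper. Assumptions: SHA-256 of the raw request string as the key, and an injected clock for testability; the class is illustrative, not Orloj's implementation.

```python
import hashlib
import time

class InputHashCache:
    """Sketch of keyStrategy: hash-of-input with a TTL, as configured above."""

    def __init__(self, ttl: float = 3600, now=time.time):
        self.ttl, self.now, self.store = ttl, now, {}

    def key(self, request: str) -> str:
        return hashlib.sha256(request.encode()).hexdigest()

    def get(self, request: str):
        entry = self.store.get(self.key(request))
        if entry and self.now() - entry[0] < self.ttl:
            return entry[1]             # fresh hit: no tokens burned
        return None                     # miss or expired: run the agent

    def put(self, request: str, result) -> None:
        self.store[self.key(request)] = (self.now(), result)
```

Hashing the input rather than storing it verbatim keeps keys fixed-size and avoids putting request contents in the cache index.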
Strategy 6: Cost Attribution Dashboards
Build a dashboard that shows, in real time, which agent is spending what.
```text
Customer Support Bot: $234 (47%)
  - GPT-4: 45,000 tokens ($1.35)
  - API calls to CRM: 2,000 calls ($0.20)
  - API calls to KB search: 8,000 calls ($232.45)
Data Processor: $156 (31%)
  - Claude Sonnet: 92,000 tokens ($0.23)
  - Postgres queries: 500 queries ($155.77)
Incident Responder: $90 (18%)
  - GPT-4-Turbo: 12,000 tokens ($90)
Research Agent: $20 (4%)
  - Claude Haiku: 55,000 tokens ($20)
```
Now you can see it: the support bot burns money on KB searches, the data processor on DB queries. Awareness alone cuts costs 15-25%, because teams start asking "why?"
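The rollup behind a dashboard like this is a straightforward aggregation over structured logs. A sketch, assuming each log entry carries at least an `agent` name and a `cost` field:

```python
from collections import defaultdict

def spend_by_agent(logs):
    """Roll per-action log entries up into (cost, percent) per agent."""
    totals = defaultdict(float)
    for entry in logs:
        totals[entry["agent"]] += entry["cost"]
    grand = sum(totals.values()) or 1.0  # avoid division by zero
    return {
        agent: (cost, round(100 * cost / grand))
        for agent, cost in sorted(totals.items(), key=lambda kv: -kv[1])
    }
```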
Implementing Cost Attribution in Orloj
Orloj tracks costs at the runtime layer. Every agent action is logged with:
- Agent name
- Model called (if applicable)
- Tokens used (input + output)
- Tool called (if applicable)
- Tool call cost (if applicable)
- Timestamp
- Status (success/failure)
Logs are structured JSON, so you can query them directly. The analytics become simple questions: which agent used the most tokens? Which used the most expensive models? Which called the most tools?
Here's what the log looks like:
```json
{
  "timestamp": "2026-05-13T14:32:01Z",
  "agent": "customer-support-bot",
  "action": "model-call",
  "model": "gpt-4",
  "tokens": {
    "input": 1240,
    "output": 340,
    "total": 1580
  },
  "cost": 0.047,
  "duration_ms": 2340,
  "status": "success"
}
```
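One of those analytics questions, "which agent used the most tokens?", can be answered with a few lines over newline-delimited JSON logs. A sketch, using the field names from the example entry:

```python
import json

def top_token_user(log_lines):
    """Return the agent with the highest total token usage."""
    totals = {}
    for line in log_lines:
        entry = json.loads(line)
        if entry.get("action") == "model-call":
            totals[entry["agent"]] = (
                totals.get(entry["agent"], 0) + entry["tokens"]["total"]
            )
    return max(totals, key=totals.get)
```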
You can also set up Orloj agents to monitor these logs and alert you. For example:
```yaml
apiVersion: orloj.dev/v1
kind: Agent
metadata:
  name: cost-monitor
spec:
  tools:
    - query-runtime-logs
    - send-slack-notification
  schedule: "every 1 hour"
  task: |
    Check if any agent exceeded its daily budget.
    If so, send a Slack alert to #platform-eng.
    Include which agent, current spend, and budget.
```
The cost-monitor agent runs every hour. It queries the runtime logs, checks budgets, and alerts you if something is off-track.
Real Costs in Practice
Let me walk through a realistic scenario.
Scenario: You have five agents in production. Customer support bot, data processor, incident responder, research agent, and report generator.
Week 1 baseline:
- Customer support: $400
- Data processor: $300
- Incident responder: $150
- Research agent: $50
- Report generator: $100
- Total: $1,000
Week 2: Total jumps to $2,400. Without attribution you guess. With it, you see immediately:
- Customer support: $800 (2x)
- Data processor: $300 (same)
- Incident responder: $150 (same)
- Research agent: $50 (same)
- Report generator: $1,100 (11x!)
The report generator exploded: 50,000 DB hits per run instead of 5,000, because a change broke deduplication. Revert the change and spend normalizes. Without attribution, that's a week of investigation. With it, five minutes.
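The week-over-week check that surfaces the report generator is a one-liner once you have per-agent totals. A sketch using the scenario's numbers; the 2x threshold is illustrative:

```python
def spend_anomalies(baseline, current, factor=2.0):
    """Return agents whose spend grew by at least `factor`, with the ratio."""
    return {
        agent: round(current[agent] / baseline[agent], 1)
        for agent in baseline
        if current[agent] >= factor * baseline[agent]
    }
```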
Governance and Cost Control
Cost control is also a governance problem. It's not just about visibility. It's about enforcement.
In Orloj, you can write policies that enforce cost constraints:
```yaml
apiVersion: orloj.dev/v1
kind: Policy
metadata:
  name: cost-control
spec:
  agents:
    - "*"
  rules:
    - resource: "model-calls"
      action: "*"
      effect: "rate-limit"
      limit:
        tokensPerDay: 100000
        costThreshold: 500
    - resource: "api-calls"
      action: "*"
      effect: "rate-limit"
      limit:
        callsPerDay: 10000
```
Every agent is subject to this policy. Hit the token limit and your agent pauses. Hit the cost threshold and your agent escalates to a human.
The runtime enforces it. No agent can exceed the limit because the runtime intercepts the call.
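Interception is the key property: the check runs before the call, not after the bill. A minimal sketch of that shape, with hypothetical names rather than Orloj's actual interface:

```python
class PolicyError(Exception):
    """Raised when the runtime blocks a call that would exceed policy."""

def enforced_call(agent, tokens_today, policy, do_call):
    """Consult the policy, then either run the model call or refuse it."""
    limit = policy["tokensPerDay"]
    if tokens_today >= limit:
        raise PolicyError(f"{agent} exceeded {limit} tokens/day")
    return do_call()  # only reached when the policy allows it
```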
Teams with cost visibility optimize where others can't. They identify which agents are valuable (low cost, high impact) and which are expensive, then invest in optimizing or retiring them. They set budgets with confidence. They onboard new agents knowing one can't blow the budget in a day. They move faster because no one is off investigating a surprise bill. Cost attribution is the difference between a controlled system and a runaway one.