RoboCo measures what your workforce spends in tokens and dollars, and shows it to you live in the panel. This page explains where those numbers come from, how to read the dashboard, how each provider is priced, and what cost controls exist (and where they don't).

How spend is measured

Spend is captured per agent session — one container, doing one stretch of work. Each spawned agent runs a small in-container SDK server that watches the agent's own transcript and exposes its running token counts (input, output, cache-read, cache-write). The orchestrator runs a background sweep every ~60 seconds: for every active agent it pulls the live counts, writes a usage snapshot, and updates the open session row so the database reflects progress mid-run. When the container stops, the session is finalized — final token counts are resolved, priced through the built-in cost table, and written to the closed session row along with the exit reason.
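The lifecycle described above can be pictured with a small model. This is only an illustrative sketch: the `Session` class, its field names, and the `price` callable are hypothetical and are not RoboCo's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical names throughout; RoboCo's real session schema may differ.
TOKEN_CLASSES = ("input", "output", "cache_read", "cache_write")


@dataclass
class Session:
    model: str
    tokens: dict = field(default_factory=lambda: dict.fromkeys(TOKEN_CLASSES, 0))
    cost_usd: Optional[float] = None    # stays None until the session is closed
    exit_reason: Optional[str] = None

    @property
    def is_open(self) -> bool:
        return self.exit_reason is None

    def apply_snapshot(self, live_counts: dict) -> None:
        """Periodic sweep: overwrite running counts with the latest live values."""
        if not self.is_open:
            raise ValueError("cannot update a closed session")
        for name in TOKEN_CLASSES:
            self.tokens[name] = live_counts.get(name, self.tokens[name])

    def finalize(self, final_counts: dict,
                 price: Callable[[str, dict], float], exit_reason: str) -> None:
        """Container stopped: resolve final counts, price them, close the row."""
        self.apply_snapshot(final_counts)
        self.cost_usd = price(self.model, self.tokens)
        self.exit_reason = exit_reason


def flat_rate(model: str, tokens: dict) -> float:
    """Toy pricer for the demo: an assumed $1.00 per million tokens."""
    return sum(tokens.values()) / 1_000_000 * 1.0


# Two sweeps while the agent runs, then finalization when the container stops.
s = Session(model="example-model")
s.apply_snapshot({"input": 1_000, "output": 200})
s.apply_snapshot({"input": 5_000, "output": 900, "cache_read": 20_000})
s.finalize({"input": 6_000, "output": 1_000, "cache_read": 20_000, "cache_write": 3_000},
           flat_rate, exit_reason="completed")
print(s.tokens, s.cost_usd, s.exit_reason)
```

In this sketch the cost stays unset until `finalize` runs, which mirrors why the figures for an open session are estimates rather than final values.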

A separate daily rollup keeps a rolling per-day, per-agent, per-team, per-model tally that feeds the "today" figures.
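As a rough picture of such a rollup, the sketch below keeps one bucket per (day, agent, team, model) and sums a day's cost across buckets. The key layout, function names, and sample values are assumptions for illustration, not RoboCo's real table design.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rollup keyed by (day, agent, team, model).
rollup = defaultdict(lambda: {"tokens": 0, "cost_usd": 0.0})


def record(day, agent, team, model, tokens, cost_usd):
    """Add one usage delta to the matching per-day bucket."""
    bucket = rollup[(day, agent, team, model)]
    bucket["tokens"] += tokens
    bucket["cost_usd"] += cost_usd


def day_total(day):
    """Total cost for one day across all agents, teams, and models."""
    return sum(b["cost_usd"] for key, b in rollup.items() if key[0] == day)


day = date(2025, 1, 15)  # arbitrary sample date
record(day, "agent-a", "team-1", "model-x", tokens=1_000, cost_usd=0.25)
record(day, "agent-a", "team-1", "model-x", tokens=3_000, cost_usd=0.50)
record(day, "agent-b", "team-2", "model-y", tokens=2_000, cost_usd=1.00)
print(day_total(day))  # 1.75
```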

Why mid-run numbers move, and why a crash can read $0

Because cost is computed per session and firms up only when the session ends, the figures for a still-running agent update roughly every minute and aren't final. A session that crashes or is abandoned before its transcript is read can finalize at a low or zero value. The dashboard is an accurate ledger of closed sessions plus a live estimate for the ones still open — not a real-time invoice.

Grok agents have no live SDK hook; the orchestrator reads each Grok container's usage.json instead, and uses the same file to enforce the per-agent Grok cost cap (below).

The Token Usage & Cost dashboard

Two panel surfaces show spend:

  • The command center carries a Token Usage & Cost card that streams live while agents work. It listens on the /ws/system WebSocket and shows a Live badge when connected; if the socket drops it falls back to HTTP polling and shows Polling (or Connecting). See the command center.
  • The Metrics page has a dedicated Token Usage tab with the full breakdown. See Metrics.

A 24h / 7d / 30d selector drives every panel. What you get:

| Panel | What it shows |
| --- | --- |
| Summary | Input, output, and total tokens for the period, plus total cost in USD |
| Trend | Percent change vs the immediately prior window of equal length |
| Time series | Hourly points for 24h; daily points for 7d/30d |
| Per-model | Donut of cost share by model |
| Per-agent / per-team | Bar charts of spend, each with its share of the total |
| Monthly projection | Forecast spend (see below) |
| Cache efficiency | Cache hit-rate and the dollars prompt-caching is saving |
| Recent sessions | Raw recent spawn-session rows (default 50) |

Totals count all four token classes

The total-token and cost figures sum input + output + cache-read + cache-write. Prompt-caching reads are cheap but not free, so they show up in both the totals and the cache-savings panel.

Projection and cache savings are estimates

The monthly projection is a naive extrapolation: the average daily cost over the last 7 days × 30. It's a planning forecast, not a bill, and it overstates a typical month if those 7 days include a one-off burst of fleet activity. The cache-savings figure uses a single aggregate baseline rate to estimate what the cached tokens would have cost at the full input price, so treat it as indicative, not exact.
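A minimal sketch of both estimates, under stated assumptions: the function names are hypothetical, the projection averages whatever history exists when fewer than 7 days are available, and savings are taken as the gap between the baseline input rate and the cache-read rate. The source does not spell out the exact savings formula, so treat that part as one plausible reading.

```python
from statistics import fmean


def monthly_projection(daily_costs_usd):
    """Average of the most recent 7 daily costs, multiplied by 30."""
    last_week = daily_costs_usd[-7:]
    if not last_week:
        return 0.0
    return fmean(last_week) * 30


def cache_savings(cache_read_tokens, baseline_input_rate, cache_read_rate):
    """Assumed formula: cost at the full input price minus cost at the cache-read price.

    Rates are in USD per million tokens.
    """
    return cache_read_tokens / 1_000_000 * (baseline_input_rate - cache_read_rate)


steady = [2.0] * 7                               # $2 every day
burst = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 70.0]     # one $70 day, otherwise idle

print(monthly_projection(steady))  # 60.0
print(monthly_projection(burst))   # 300.0
print(round(cache_savings(4_000_000, 3.00, 0.30), 2))
```

The burst case shows the overstatement: a single $70 day projects to $300 for the month even though nothing else ran.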

How each provider is priced

Pricing is provider-aware and comes from a built-in USD-per-million-token table (roboco/billing/pricing.py). Model names are matched by substring; when several fragments match, the longest one wins.

| Provider / model | Input (USD / 1M tokens) | Output (USD / 1M tokens) | Priced? |
| --- | --- | --- | --- |
| Claude Opus 4 | $5.00 | $25.00 | yes (cache read $0.50, write $6.25) |
| Claude Sonnet (4 / 3.7 / 3.5) | $3.00 | $15.00 | yes (cache read $0.30) |
| Claude Haiku 4 / 3.5 | $1.00 | $5.00 | yes |
| xAI grok-build | $1.00 | $2.00 | yes (cache read $0.20) |
| Local Ollama (ollama/… or bare tag) | | | $0 by design |
| Ollama Cloud (:cloud tag) | | | $0 by design |

$0 for local and Ollama Cloud is not a bug

Self-hosted Ollama runs on hardware you own and Ollama Cloud is billed by flat subscription, so neither carries a per-token cost. RoboCo intentionally prices them at $0 — it is not undercounting, and you won't see a warning.

A brand-new Claude model can silently undercount

If a claude-named model isn't in the pricing table, RoboCo logs a warning and returns $0 for it rather than crashing. That is real spend going uncounted. If you point an agent at a Claude model newer than this build and see $0 cost on a busy fleet, the table needs the new rate added — check the orchestrator logs for the pricing warning.
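The matching and fallback behavior described in this section can be sketched as below. The rates are copied from the table; everything else is an assumption for illustration: the fragment keys, the function names, treating rates the table does not list as 0, and treating every unmatched non-Claude name as a local or Ollama Cloud tag. The real contents of pricing.py may differ.

```python
import logging

log = logging.getLogger("pricing_sketch")

# USD per million tokens. Fragment keys are illustrative guesses.
RATES = {
    "opus-4": {"input": 5.00, "output": 25.00, "cache_read": 0.50, "cache_write": 6.25},
    "sonnet": {"input": 3.00, "output": 15.00, "cache_read": 0.30},
    "haiku": {"input": 1.00, "output": 5.00},
    "grok-build": {"input": 1.00, "output": 2.00, "cache_read": 0.20},
}


def find_rates(model, table=RATES):
    """Substring match on the model name; the longest matching fragment wins."""
    name = model.lower()
    matches = [frag for frag in table if frag in name]
    if matches:
        return table[max(matches, key=len)]
    if "claude" in name:
        # Real spend that would go uncounted, so leave a trace in the logs.
        log.warning("no pricing entry for %s; counting it as $0", model)
    return {}  # unmatched names price at $0 in this sketch


def cost_usd(model, tokens):
    """Sum count / 1M * rate over every token class present in `tokens`."""
    rates = find_rates(model)
    return sum(n / 1_000_000 * rates.get(cls, 0.0) for cls, n in tokens.items())


usage = {"input": 1_000_000, "output": 200_000,
         "cache_read": 4_000_000, "cache_write": 500_000}
print(cost_usd("claude-opus-4-example", usage))   # priced from the Opus row
print(cost_usd("ollama/example-model", usage))    # $0 by design, no warning
print(cost_usd("claude-future-example", usage))   # $0 plus a logged warning
```

With the Opus rates, the sample usage works out to $5.00 + $5.00 + $2.00 + $3.125, about $15.13, with all four token classes contributing.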

Cost controls (and the asymmetry)

There is exactly one built-in dollar cap, and it only covers Grok:

| Setting | Default | What it does |
| --- | --- | --- |
| ROBOCO_GROK_MAX_COST_USD | 0.0 | Per-agent Grok cost ceiling in USD, read from the container's usage.json. The orchestrator kills a Grok container once it crosses this, catching runaway-loop token burn. 0 disables it. Grok agents only. |

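The cap check amounts to a comparison against the value in usage.json. The sketch below is a guess at its shape, not RoboCo's code: the `cost_usd` field name, the function names, and the strict "greater than" comparison are all assumptions, since the source does not document the file's schema.

```python
import json
import os
import tempfile
from pathlib import Path


def cap_from_env():
    """Read the ceiling from the environment; the documented default is 0.0."""
    return float(os.environ.get("ROBOCO_GROK_MAX_COST_USD", "0.0"))


def over_grok_cap(usage_path, max_cost_usd):
    """Return True when a Grok container should be killed for exceeding the cap."""
    if max_cost_usd <= 0:
        return False  # 0 disables the cap
    usage = json.loads(Path(usage_path).read_text())
    # "cost_usd" is a hypothetical field name for the running cost.
    return float(usage.get("cost_usd", 0.0)) > max_cost_usd


# Demo against a throwaway usage.json.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "usage.json"
    path.write_text(json.dumps({"cost_usd": 2.5}))
    print(over_grok_cap(path, 2.0))  # True: 2.5 is over a 2.0 cap
    print(over_grok_cap(path, 0.0))  # False: cap disabled
```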
There is no built-in dollar cap for Claude agents

The Claude path is observe-only on cost. RoboCo will show you Claude spend live and historically, but it will not auto-kill a Claude agent for crossing a dollar threshold; there is no ROBOCO_*_MAX_COST_USD equivalent for it. Your protection against runaway Claude spend is structural rather than a budget limit: rate-limit and overload park-and-probe (a provider 429 or persistent overload queues the agent and probes for recovery instead of retrying in a hot loop; see provider resilience), the per-task verb gateway, and watching the dashboard yourself. Budget accordingly.

A related guard reaps abandoned interactive chats so they don't leak a container:

| Setting | Default | What it does |
| --- | --- | --- |
| ROBOCO_INTERACTIVE_IDLE_REAP_SECONDS | 1800 | Reaps an idle live Intake/Secretary chat (by time since last turn) so it stops holding a container and burning tokens. 0 disables. |

Quick checks from the shell

```bash
# Period summary (totals + cost + trend)
curl -s 'http://localhost:3000/api/usage/summary?period=7d'

# Monthly projection
curl -s http://localhost:3000/api/usage/projection

# Container-level resource use (not token cost)
docker stats
docker system df
```

Next