If you're running an always-on AI agent like Hermes, OpenClaw, or Claude Code and the monthly bill keeps surprising you, this server gives Claude the ability to profile its own spending. It reads local log files and state databases to break down token burn by source (cron jobs, Telegram gateways, subagents, CLI), isolates the overnight bill, and flags behavioral waste like retry storms or redundant context. The MCP interface exposes commands for cost forensics, fixed overhead analysis, and config recommendations with dollar estimates. Everything runs locally with read-only access to your agent's accounting data. Useful when you need to answer "where did $47 go while I slept" or "why does every API call burn 14k tokens of overhead" without opening a spreadsheet.
Hermes Agent · OpenClaw · Claude Code — one normalized core, local, read-only, zero dependencies
uvx agentburn
You run an AI assistant — OpenClaw, Hermes Agent, or Claude Code. It works around the clock: answers you in Telegram, runs scheduled jobs at night, spawns helper agents. Every word it reads and writes costs money — and the bill arrives as one number with no explanation.
agentburn is a free tool that reads your assistant's own diary (log files already sitting on your computer) and turns that number into answers:
| You ask | It answers | Command |
|---|---|---|
| Where does the money actually go? | scheduled jobs 79% · chats 7% · helpers 5% · you 9% | agentburn |
| What happened while I slept? | the overnight bill, isolated and named | agentburn |
| Why is it so expensive? | same file read 14×, a broken tool retried 6×, wake-ups that did nothing | agentburn why |
| What did it do in Telegram? | every function it called there, with counts and errors | agentburn why --source telegram |
| What exactly do I change? | ready config lines + expected saving in dollars | agentburn fix |
| Am I paying for a dying model? | your spend vs the world's 4-week trend, with a cheaper rising alternative | agentburn drift |
| Is my setup even normal? | "your overhead is worse than ~75% of N setups" | agentburn rank |
| Can I just ask the assistant? | yes — install the skill/MCP and ask "where do you burn my money?" | agentburn mcp |
No accounts. No cloud. Nothing leaves your computer. One command to try: uvx agentburn.
Always-on agents bill you around the clock — and their built-in counters only show totals. Real threads that made this tool:
"73% of every API call is fixed overhead — ~13.9K tokens of tool definitions and system prompt, resent every time." — hermes-agent #4379
"One entrant wrote about waking up to a $47 surprise bill from an overnight run — that's not an exotic failure, it's the default behavior of an unsupervised loop." — dev.to
"I've seen runs where step 3 costs 4× step 1 — no alert, just a bill." — comment, ibid.
agentburn reads the agent's own accounting data (read-only) and answers the question the totals never do: where.
cron / subagent / gateway:telegram|discord|whatsapp / cli. Always-on ≠ free: scheduled jobs and gateways spend without you.--night 23-7).| agentburn | ccusage | codeburn | built-in /usage | |
|---|---|---|---|---|
| Burn by source (cron · heartbeat · gateways · subagents) | ✅ | — | — | % only, 7 days, this machine |
| 🌙 the overnight bill, isolated | ✅ | — | — | — |
Behavioral forensics (why: loops, retry storms, failed-run cost) | ✅ | — | — | — |
Ready config patches (fix, source-verified keys) | ✅ | — | — | — |
Accounting-gap detection (doctor, lower-bound honesty) | ✅ | — | — | — |
| MCP server (agent answers for its own bill) | ✅ | — | — | — |
| Totals / live blocks / many CLIs | basic | ✅ best-in-class | ✅ TUI, 25 providers | totals |
As of June 2026; ccusage and codeburn are excellent at what they do — agentburn deliberately starts where they stop (ccusage scoped per-tool analysis out).
Most token trackers quietly disagree with each other (2–91× in public issue threads). agentburn takes the opposite stance:
~/.hermes/state.db: per-session token counters and cost fields). No scraping, no proxies, no guessing.~. Mixed data is labeled mixed.Everything runs locally and reads your database read-only. No network calls. No telemetry. The report is yours.
agentburn # every agent on this machine, last 30 days
agentburn --agent openclaw # just one
agentburn --days 7
agentburn --agent hermes --db /path/to/state.db
agentburn why # behavioral forensics: loops, retry storms, idle heartbeats
agentburn why --source telegram # decompose ONE source: functions called, errors, loops
agentburn --source cron # cost report for one source only
agentburn explain --model llama3.1 # LLM reads the numbers back to you (local by default)
agentburn --night 23-7 # custom overnight window (local time)
agentburn --budget-month 50 --fail-over # sentinel for cron/CI
agentburn --json # machine-readable, pipe it anywhere
agentburn --no-color
📤 Share your burn (--share). An anonymized card — categories, models and totals only; session titles, paths and content are excluded by construction. Safe to paste into a post; --svg card.svg renders the same card as an image:
🔥 my hermes agent · last 30d
~$45.50 → ~$430/mo pace · 1.75M tokens
where it burns: cron 79% · cli 9% · telegram 7% · subagent 5%
🌙 while I slept (00–08): ~$36.00 — 79% of everything
⚙️ telegram re-sends 20,000 tokens with EVERY call — 2.5× the community norm (≈8k)
— agentburn · local & private
--svg card.svg renders it as an image:
📏 Calibration against public benchmarks. "Is 15k input tokens per call normal?" The report compares your fixed overhead with community-measured references embedded as dated constants (e.g. the Phala always-on-agent benchmark, 2026-03: ≈8k/call baseline). No network — sources are cited inline.
📐 Optimize → prove it (--save-baseline / --compare). Snapshot your pace, change the config (cheaper cron model, trimmed toolsets), then agentburn --compare shows the delta in $/month — pace-normalized, so a 7-day baseline compares honestly with a 30-day window. Every recommendation becomes a testable promise.
🔬 agentburn why — behavioral forensics. report says where it burns; why says why, from the agent's own recorded actions and thoughts:
🔬 agentburn why — openclaw · gateway:telegram
WHAT IT ACTUALLY DID browser 34× ≈210K in results · web_search 18× · shell 7× (2 errors)
RE-READ LOOPS 5× browser(https://news.site/page) — every repeat re-paid in full
RETRY STORMS Bash: 3 errors / 6 calls — paying full price for every error
IDLE HEARTBEATS 4 of 9 heartbeat runs did NOTHING — $2.40 of pure idle burn
BURNED ON FAILURES 2 failed runs → ~$3.90 (timeout, killed)
THINKS MORE THAN IT WORKS 62% thinking · 84K tokens · "rename files task"
💡 WHAT TO CHANGE
1. `/proj/big.md` was fetched 4× in one session ≈32K tokens re-paid — cache it…
Observations with numbers, not verdicts; only tool names, truncated argument keys and counters — message content never leaves the machine (and never enters the report).
🧠 agentburn explain — LLM interpretation, local-first. The numbers, read back to you in plain language with ranked actions:
agentburn explain --model llama3.1 # local ollama — nothing leaves the machine
agentburn explain --llm https://openrouter.ai/api/v1 \
--model deepseek/deepseek-chat --yes-remote --lang ru # remote: explicit opt-in only
Privacy rules are hard-coded: the default endpoint is localhost (ollama / LM Studio); a remote endpoint requires --yes-remote and receives a redacted summary only — session titles become session-N, file paths shrink to basenames, message content is never in the payload to begin with. Works with any OpenAI-compatible API, zero new dependencies. (Yes — a cost profiler spending ~3K tokens to explain costs. The payload is compact and the answer capped; the irony is acknowledged.)
🧭 agentburn drift — your spend × the world's direction. Are you paying for a model the world is leaving?
🧭 agentburn drift
YOUR MODELS vs THE WORLD (4-week world trend)
anthropic/claude-opus-4.6 ~$341/mo world -41% ⬊
deepseek/deepseek-v3.2 ~$12/mo world +12% →
💡 DRIFT ALERTS
1. claude-opus-4.6: you spend ~$341/mo; world usage -41% in 4 weeks — the world
is leaving this model. Rising alternative step-3.5-flash (+180%) is ~98% cheaper.
Your side is computed locally from the agents' own logs; the world side is one read-only GET of token-history's public trend JSON (archived daily from OpenRouter's rankings — deep history unlocks as the archive grows). Nothing about you is sent anywhere; --trends FILE works fully offline. Nobody else joins these two halves.
🩺 agentburn why additions: CRON RUNS — the per-run receipt for every scheduled job (what openclaw #24636 keeps asking for), and CONTEXT THRASH — compactions counted per session, because every compaction silently re-sends a near-full context window.
📊 agentburn rank + --submit — the Burn Index. Anonymous community percentiles of efficiency — the benchmark volume-leaderboards can't be: nothing here rewards burning more.
📊 agentburn rank — you vs the Burn Index
input tokens per call · cli you: 15,000 median: 6,200 worse than ~75% of 41
share of spend at night you: 79.0% median: 12.0% worse than ~90% of 41
cache-read share of input volume you: 61.0% median: 44.0% better than ~75% of 41
Joining is consent-by-click: agentburn --submit prints the exact anonymized payload (ratios and a coarse spend band — never raw volumes, titles or paths), then a prefilled GitHub-issue link that you open and submit. A weekly Action aggregates submissions with plausibility bounds (junk and flexing get dropped, not ranked) into public quantiles.
🔧 agentburn fix — from findings to ready config patches (dry-run by design). Not "consider a cheaper model" but the exact file and the exact lines:
🔧 agentburn fix — hermes · DRY-RUN (nothing was changed)
1. Point Hermes cron jobs at a cheap model
file : ~/.hermes/cron/jobs.json
why : cron is 79% of spend; maintenance rarely needs a frontier model.
effect : bulk of ≈$341/mo moves to cheap-model pricing
proposed:
"nightly digest": "model": "deepseek/deepseek-chat"
ⓘ field verified in hermes-agent cron/jobs.py: per-job `model` override
Patch generators exist only for config keys verified against the agents' source code (Hermes cron/jobs.json, OpenClaw agents.defaults.heartbeat incl. activeHours — the night-burn killer — and lightContext). There is no --apply on purpose: paste it yourself, then prove the saving with --save-baseline → --compare.
🔌 agentburn mcp — your agent answers for its own bill. A zero-dependency MCP stdio server exposing burn_report / burn_why / burn_card. Register it and ask the agent "where do you burn my money?" — it calls the profiler on its own database and explains:
# Claude Code
claude mcp add agentburn -- agentburn mcp
# Hermes / OpenClaw: add an stdio MCP server with command `agentburn mcp`
Prefer skills? There's a ready SKILL.md — drop it into ~/.hermes/skills/agentburn/, ~/.openclaw/skills/agentburn/ or ~/.claude/skills/agentburn/ and just ask the agent "where do you burn my money?".
🩺 agentburn doctor. Trackers disagree because the agent's own accounting has gaps. doctor names the broken combinations (provider × model × source) for zero-usage and unpriced sessions, and generates a ready-to-paste upstream bug report — counters only, no message content.
🚨 Sentinel mode — a budget guard for server agents. Your agent runs 24/7 on a VPS; this watches it:
# alert when overnight burn exceeds $5/month pace (exit code 1 → any alerting hooks in)
agentburn --agent openclaw --budget-night 5 --fail-over --no-color \
|| notify-send "🚨 agent is burning money at night"
Drop it in cron next to the agent itself — the one-off check becomes a standing guard.
One normalized model, one adapter per agent. Run agentburn and every agent found on the machine gets its own report.
| Agent | Status | Data source | Notes |
|---|---|---|---|
| Hermes Agent | ✅ | ~/.hermes/state.db (+ optional request dumps) | costs from the agent's own accounting |
| OpenClaw | ✅ | ~/.openclaw/agents/*/sessions/sessions.json | heartbeat is its own category — the famous one; cron / gateways / subagents split out |
| Claude Code | ✅ | ~/.claude/projects/**.jsonl | tokens only, by design: CC doesn't record costs locally and subscription usage has no honest per-token price — we don't invent one |
Adapters are ~150 lines over a shared model. Codex CLI / opencode are natural next targets — PRs welcome.
token-history — the macro view: daily archive of which agents the world uses (OpenRouter rankings). agentburn is the micro view: where yours burns.
MIT
mcp-name: io.github.Socialpranker/agentburn
the token-* family · token-history — which agents the world runs · agentburn — where yours burns
if this saved you a dinner's worth of tokens, a ⭐ helps the next person find it