A confession
I've been using Langfuse and Helicone for the last 6 months. They're great products. Their teams are sharp.
But they don't work for coding agents.
The mismatch
| Tool | Architecture | Works for coding agents? |
|---|---|---|
| Langfuse | SDK + async upload to SaaS | ❌ Need to instrument the agent |
| Helicone | HTTPS proxy via HTTP_PROXY | ❌ CLIs ignore HTTP_PROXY |
| Datadog LLM Obs | APM agent | ❌ Same problem |
| ccglass | Local loopback reverse proxy | ✅ Yes |
The reason: Claude Code, Codex, OpenCode, Kimi, etc. are native CLIs (Node, Rust, Go). They make HTTPS calls directly to the API endpoint. They do not respect HTTP_PROXY environment variables.
So the standard observability play — "just point your SDK at our proxy" — doesn't work. The agent isn't using a library that knows to call your endpoint.
What I actually needed
I needed something that would:
- Be a man-in-the-middle on the loopback (so it sees plain HTTP)
- Forward to the real API (so the agent works)
- Be zero-config (the agent already trusts http://127.0.0.1)
- Not require a CA cert (loopback is plain HTTP)
- Be local-only (no SaaS, no account)
I built it. It's called ccglass. It does those 5 things. Nothing else.
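To make those requirements concrete, here is a minimal sketch of a loopback reverse proxy in Python. It is not ccglass's actual code: the upstream URL, the `SKIP_HEADERS` set, and the names `forwardable_headers`, `upstream_url`, `LoggingProxy`, and `serve` are all assumptions made for illustration. It accepts plain HTTP on 127.0.0.1, logs the request, and forwards it over HTTPS to the real API.

```python
import http.server
import urllib.error
import urllib.request

UPSTREAM = "https://api.anthropic.com"  # assumed upstream for this sketch
LISTEN = ("127.0.0.1", 8123)            # loopback only; port from the example in this post

# Headers the proxy recomputes or must not pass through verbatim.
SKIP_HEADERS = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade", "host", "content-length",
}


def forwardable_headers(headers):
    """Return a dict of headers that are safe to copy to the other side."""
    return {k: v for k, v in headers.items() if k.lower() not in SKIP_HEADERS}


def upstream_url(path, upstream=UPSTREAM):
    """Map a local request path onto the real API origin."""
    return upstream.rstrip("/") + path


class LoggingProxy(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # This is the observation point: the body arrives as plain HTTP.
        self.log_message("%s %s (%d request bytes)", self.command, self.path, len(body))
        request = urllib.request.Request(
            upstream_url(self.path),
            data=body,
            headers=forwardable_headers(self.headers),
            method="POST",
        )
        try:
            with urllib.request.urlopen(request) as response:
                status, reply_headers, payload = (
                    response.status, response.headers, response.read())
        except urllib.error.HTTPError as error:
            # Relay upstream error responses instead of crashing the handler.
            status, reply_headers, payload = error.code, error.headers, error.read()
        self.send_response(status)
        for name, value in forwardable_headers(reply_headers).items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(address=LISTEN):
    """Run the proxy until interrupted."""
    http.server.ThreadingHTTPServer(address, LoggingProxy).serve_forever()
```

Calling `serve()` starts the listener. The sketch buffers each upstream response in full before replying, so streamed responses would arrive all at once; a real tool would need to relay chunks as they come in.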
What it looks like in practice
```shell
$ npm i -g ccglass
$ ccglass claude
# → starts proxy on http://127.0.0.1:8123
# → overrides ANTHROPIC_BASE_URL to point at it
# → spawns claude
# → opens dashboard at http://127.0.0.1:8123
```
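The override-and-spawn steps can be sketched in a few lines of Python. This is an illustration of the general pattern, not ccglass's implementation; the helper names `child_env` and `launch` are hypothetical, and only `ANTHROPIC_BASE_URL` and the port come from the example above.

```python
import os
import subprocess


def child_env(proxy_url, base=None):
    """Copy the environment and point ANTHROPIC_BASE_URL at the local proxy."""
    env = dict(os.environ if base is None else base)
    env["ANTHROPIC_BASE_URL"] = proxy_url
    return env


def launch(command, proxy_url="http://127.0.0.1:8123"):
    """Run the agent CLI as a child process with the overridden base URL."""
    return subprocess.run(command, env=child_env(proxy_url)).returncode
```

Under these assumptions, `launch(["claude"])` would run the agent with its API traffic directed at the loopback proxy, while the parent shell's environment stays untouched.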
The dashboard shows:
- Live request log with the full system prompt, tool calls, responses
- Per-request cost (with cache-aware pricing)
- Per-turn diff (what changed in the context this turn)
- Cache hit rate (how often your system prompt is being cached)
- Token breakdown (input / output / cache_read / cache_write)
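As a rough illustration of what cache-aware pricing and a cache hit rate mean, here is a small Python sketch built on the four token counts listed above. The prices in `PRICES` are placeholder values, not real rates, and the hit-rate definition (cached prompt tokens over all prompt tokens) is one plausible choice; ccglass may compute either number differently.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input: int        # uncached prompt tokens
    output: int       # generated tokens
    cache_read: int   # prompt tokens served from the cache
    cache_write: int  # prompt tokens written to the cache

# HYPOTHETICAL prices in USD per million tokens, for illustration only.
PRICES = {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75}


def request_cost(usage, prices=PRICES):
    """Price each token class separately, then sum."""
    total = (
        usage.input * prices["input"]
        + usage.output * prices["output"]
        + usage.cache_read * prices["cache_read"]
        + usage.cache_write * prices["cache_write"]
    )
    return total / 1_000_000


def cache_hit_rate(usage):
    """Share of prompt tokens that were read from the cache."""
    prompt_tokens = usage.input + usage.cache_read + usage.cache_write
    return usage.cache_read / prompt_tokens if prompt_tokens else 0.0
```

With these placeholder prices, a request with 1,000 uncached input tokens, 500 output tokens, 8,000 cache-read tokens, and 1,000 cache-write tokens costs about $0.0167 and has a hit rate of 0.8.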
What's different from Langfuse / Helicone
- Local-only. No data leaves your machine. No account. No API key on their side.
- Works for coding agents specifically. Built for the HTTP_PROXY-bypass problem.
- Single binary, 1-command install. No SDK to integrate.
- Open source under MIT. You can read every line.
What's the same
- Token accounting
- Per-request cost
- Latency tracking
- Provider routing (multiple model providers)
Why I'm sharing this
If you use a coding agent heavily, and you don't know which of your prompts are 4,000 tokens of accidental repetition, you're leaving money on the table.
The first time I saw my own cache hit rate (38%, meaning the other 62% of what I kept re-sending was billed at full price without my knowing it), I had a "wait, that's literally me paying for nothing" moment.
Try it once. The data is eye-opening.
🔗 GitHub: https://github.com/jianshuo/ccglass