Why I quit SaaS AI observability tools and built a local proxy instead

#claudecode #opensource #llm #webdev

A confession

I've been using Langfuse and Helicone for the last 6 months. They're great products. Their teams are sharp.

But they don't work for coding agents.

The mismatch

Tool	Architecture	Works for coding agents?
Langfuse	SDK + async upload to SaaS	❌ Need to instrument the agent
Helicone	HTTPS proxy via HTTP_PROXY	❌ CLIs ignore HTTP_PROXY
Datadog LLM Obs	APM agent	❌ Same problem: the agent must be instrumented
ccglass	Local loopback reverse proxy	✅ Yes

The reason: Claude Code, Codex, OpenCode, Kimi, etc. are native CLIs (Node, Rust, Go). They make HTTPS calls directly to the API endpoint. They do not respect HTTP_PROXY environment variables.

So the standard observability play — "just point your SDK at our proxy" — doesn't work. The agent isn't using a library that knows to call your endpoint.

What I actually needed

I needed something that would:

Be a man-in-the-middle on the loopback interface (so it sees plain HTTP)
Forward to the real API (so the agent works)
Be zero-config (the agent already trusts http://127.0.0.1)
Not require a CA cert (loopback is plain HTTP)
Be local-only (no SaaS, no account)

I built it. It's called ccglass. It does those 5 things. Nothing else.
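
To make those five requirements concrete, here is a minimal sketch of a loopback reverse proxy in TypeScript on Node's built-in http and https modules. It is not the ccglass source. The function names (upstreamOptions, createLoopbackProxy), the onRequestBody callback, and the upstream host passed in are all assumptions for illustration. The idea: accept plain HTTP on 127.0.0.1, keep a copy of the request body for inspection, and forward the same request over HTTPS to the real API.

```typescript
import * as http from "node:http";
import * as https from "node:https";

// Hypothetical helper: build the outbound HTTPS request options.
// Same method, path, and headers as the incoming request, but with the
// Host header rewritten so the real API accepts it.
function upstreamOptions(
  req: Pick<http.IncomingMessage, "method" | "url" | "headers">,
  upstreamHost: string,
): https.RequestOptions {
  return {
    hostname: upstreamHost,
    port: 443,
    method: req.method ?? "GET",
    path: req.url ?? "/",
    headers: { ...req.headers, host: upstreamHost },
  };
}

// Hypothetical helper: a plain-HTTP server that forwards every request to
// upstreamHost over HTTPS and hands the captured request body to a callback.
function createLoopbackProxy(
  upstreamHost: string,
  onRequestBody: (path: string, body: Buffer) => void,
): http.Server {
  return http.createServer((clientReq, clientRes) => {
    const chunks: Buffer[] = [];

    const upstreamReq = https.request(
      upstreamOptions(clientReq, upstreamHost),
      (upstreamRes) => {
        // Relay status and headers, then stream the body through unchanged
        // so streamed responses reach the agent as they arrive.
        clientRes.writeHead(upstreamRes.statusCode ?? 502, upstreamRes.headers);
        upstreamRes.pipe(clientRes);
      },
    );

    // If the upstream call fails, answer with 502 instead of hanging.
    upstreamReq.on("error", () => {
      if (!clientRes.headersSent) clientRes.writeHead(502);
      clientRes.end();
    });

    // Forward each chunk immediately and keep a copy for the log.
    clientReq.on("data", (chunk: Buffer) => {
      chunks.push(chunk);
      upstreamReq.write(chunk);
    });
    clientReq.on("end", () => {
      onRequestBody(clientReq.url ?? "/", Buffer.concat(chunks));
      upstreamReq.end();
    });
  });
}
```

A caller would bind it to the loopback address only, for example createLoopbackProxy("api.anthropic.com", log).listen(8123, "127.0.0.1"), where log is whatever function records the captured body. Binding to 127.0.0.1 is what keeps the plain-HTTP hop on the local machine.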

What it looks like in practice

$ npm i -g ccglass
$ ccglass claude
# → starts proxy on http://127.0.0.1:8123
# → overrides ANTHROPIC_BASE_URL to point at it
# → spawns claude
# → opens dashboard at http://127.0.0.1:8123
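
The "overrides ANTHROPIC_BASE_URL, then spawns claude" step might look roughly like the TypeScript sketch below. The helper names buildAgentEnv and launchAgent are hypothetical, and this is an illustration of the pattern rather than how ccglass actually does it. It relies only on Node's child_process.spawn.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: copy the parent environment and point the agent's
// base URL at the local proxy. The parent environment is not mutated.
function buildAgentEnv(
  parentEnv: Record<string, string | undefined>,
  proxyPort: number,
): Record<string, string | undefined> {
  return { ...parentEnv, ANTHROPIC_BASE_URL: `http://127.0.0.1:${proxyPort}` };
}

// Hypothetical helper: launch the agent CLI as a child process that shares
// the current terminal, so the interactive session behaves as usual.
function launchAgent(command: string, args: string[], proxyPort: number) {
  return spawn(command, args, {
    env: buildAgentEnv(process.env, proxyPort),
    stdio: "inherit",
  });
}
```

Because the override lives only in the child's environment, running the agent directly (without the wrapper) still talks to the real API as before.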

The dashboard shows:

Live request log with the full system prompt, tool calls, responses
Per-request cost (with cache-aware pricing)
Per-turn diff (what changed in the context this turn)
Cache hit rate (how often your system prompt is being cached)
Token breakdown (input / output / cache_read / cache_write)
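
For a sense of how the cost and cache numbers can be derived from the token breakdown, here is a small TypeScript sketch. The field names follow the usage object in Anthropic API responses. The Prices shape, both function names, and the definition of hit rate (cached-read tokens divided by all input-side tokens) are my assumptions for illustration and may not match what ccglass computes. Prices are per million tokens and must be supplied by the caller; none are hard-coded here.

```typescript
// Token counts for one request, named as in Anthropic's usage object.
interface Usage {
  input_tokens: number;                // uncached input tokens
  output_tokens: number;
  cache_read_input_tokens: number;     // input tokens served from the cache
  cache_creation_input_tokens: number; // input tokens written to the cache
}

// Hypothetical price table, in USD per million tokens.
interface Prices {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

// Cache-aware cost: each token class is billed at its own rate.
function requestCost(u: Usage, p: Prices): number {
  const total =
    u.input_tokens * p.input +
    u.output_tokens * p.output +
    u.cache_read_input_tokens * p.cacheRead +
    u.cache_creation_input_tokens * p.cacheWrite;
  return total / 1_000_000;
}

// Assumed definition: share of input-side tokens that were read from the cache.
function cacheHitRate(u: Usage): number {
  const inputSide =
    u.input_tokens + u.cache_read_input_tokens + u.cache_creation_input_tokens;
  return inputSide === 0 ? 0 : u.cache_read_input_tokens / inputSide;
}
```

Summing requestCost over a session and averaging cacheHitRate weighted by input-side tokens would give session-level figures of the kind the dashboard reports.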

What's different from Langfuse / Helicone

Local-only. No data leaves your machine. No account. No API key on their side.
Works for coding agents specifically. Built for the HTTP_PROXY-bypass problem.
Single package, one-command install (npm i -g ccglass). No SDK to integrate.
Open source under MIT. You can read every line.

What's the same

Token accounting
Per-request cost
Latency tracking
Provider routing (multiple model providers)

Why I'm sharing this

If you use a coding agent heavily and you don't know which of your prompts contain 4,000 tokens of accidental repetition, you're leaving money on the table.

The first time I saw my own cache hit rate (38%, meaning the same system prompt was missing the cache, and being billed at full price, most of the time without my knowing it), I had a "wait, that's literally me paying for nothing" moment.

Try it once. The data is eye-opening.

🔗 GitHub: https://github.com/jianshuo/ccglass