Jaeger for LLM agents: Jaeger is one of the easiest ways to see what your LLM agent actually did in production. When an agent fails, the final answer rarely tells you the real story. The story is in the timeline: planning, tool selection, retries, RAG retrieval, and downstream service latency.

In this guide, we’ll build a practical Jaeger workflow for debugging tool calls and multi-step agent runs using OpenTelemetry. We’ll focus on what teams need in real systems: searchability (run_id), safe logging, and fast incident triage.
TL;DR
- Trace = 1 user request / 1 agent run.
- Span = each step (plan, tool call, retrieval, final).
- Add run_id, tool.name, llm.model, prompt.version as span attributes so Jaeger search works.
- Keep 100% of error traces (tail sampling) and downsample the rest.
- Don’t store raw prompts/tool args in production by default; store summaries/hashes + strict RBAC.
Table of Contents
- What Jaeger is (and what it is not)
- Why Jaeger is great for agent debugging
- Span model for tool calling and RAG
- How to find the right trace fast (run_id workflow)
- Common failure patterns Jaeger reveals
- Setup overview: OTel → Collector → Jaeger
- Privacy + redaction guidance
- Tools & platforms (official + GitHub links)
- Production checklist
- FAQ
What Jaeger is (and what it is not)
Jaeger is an open-source distributed tracing backend. It stores traces (spans), provides a UI to explore timelines, and helps you understand request flows across services.
Jaeger is not a complete observability platform by itself. Most teams pair it with metrics (Prometheus/Grafana) and logs (ELK/OpenSearch/Loki). For LLM agents, Jaeger is the best “trace-first” entry point because timelines are how agent failures present.
Why Jaeger is great for agent debugging
- Request narrative: agents are sequential + branching systems. Traces show the narrative.
- Root-cause speed: instantly spot if the tool call timed out vs. the model stalled.
- Cross-service visibility: planner service → tool service → DB → third-party API, all in one view.
Span model for tool calling and RAG
Start with a consistent span naming convention. Example:
trace (run_id=R123)
  span: agent.plan
  span: llm.generate (model=gpt-4.1)
  span: tool.search (tool.name=web_search)
    span: tool.search.result (http.status=200)
  span: rag.retrieve (top_k=10)
  span: final.compose
Recommended attributes (keep them structured):
- run_id (critical: makes incident triage fast)
- tool.name, tool.type, tool.status, http.status_code
- llm.provider, llm.model, llm.tokens_in, llm.tokens_out
- prompt.version or prompt.hash
- rag.top_k, rag.source, rag.hit_count (avoid raw retrieved content)
How to find the right trace fast (run_id workflow)
The cleanest workflow is: your app logs a run_id for each user request, and Jaeger traces carry the same attribute. Then you can search Jaeger by run_id and open the exact trace in seconds.
- Log run_id at request start and return it in API responses for support tickets.
- Add run_id as a span attribute on the root span (and optionally all spans).
- Use Jaeger search to filter by run_id, error=true, or tool.name.
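The steps above can be sketched with the standard library alone. This is a hedged illustration, not a framework: `handle_request` stands in for your API handler, and the attribute dict stands in for the root span your tracer would actually carry.

```python
# Minimal run_id workflow: mint the id at request start, tag the
# (hypothetical) root span with it, and echo it in the API response
# so a support ticket can quote it verbatim.
import uuid

def handle_request(question):
    run_id = f"R{uuid.uuid4().hex[:8]}"
    span_attributes = {"run_id": run_id}  # set this on the root span via your tracer
    answer = f"answer for {question}"     # the agent run happens here
    return {"answer": answer, "run_id": run_id, "span_attributes": span_attributes}

resp = handle_request("status of order 42")
print(resp["run_id"])
```

With this in place, triage becomes: paste the run_id from the ticket into Jaeger's tag search, open the one matching trace, and read the timeline.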
Common failure patterns Jaeger reveals
1) Broken context propagation (fragmented traces)
If tool calls run as separate services, missing trace propagation breaks the timeline. You’ll see disconnected traces instead of one end-to-end trace. Fix: propagate trace headers (W3C Trace Context) into tool HTTP calls or internal RPC.
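For reference, this is the shape of the W3C `traceparent` header that must travel with every tool HTTP call. The sketch below builds it by hand with the standard library purely to show the format; in real code you should let your OTel SDK's propagator inject it (e.g. via its `inject()` API) rather than constructing it manually.

```python
# W3C Trace Context header format: version-traceid-spanid-flags.
# Hand-rolled here only for illustration; use your SDK's propagator in practice.
import re
import secrets

def make_traceparent(trace_id=None):
    trace_id = trace_id or secrets.token_hex(16)  # 32 lowercase hex chars
    span_id = secrets.token_hex(8)                # 16 lowercase hex chars
    return f"00-{trace_id}-{span_id}-01"          # flags 01 = sampled

header = make_traceparent()
print(header)
```

If this header is dropped at any hop (a raw `requests.post` without instrumentation, a message queue, a subprocess), Jaeger shows two disconnected traces instead of one.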
2) “Tool call succeeded” but agent still failed
This often indicates parsing/validation issues (schema mismatch), prompt regression, or poor retrieval. The trace shows tool latency is fine; failure happens in the LLM generation span or post-processing span.
3) Slow runs caused by retries
Retries add up. In Jaeger, you’ll see repeated tool spans. Add attributes like retry.count and retry.reason to make it obvious.
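One way to make retries legible is to emit one span per attempt, each carrying retry.count and retry.reason. The sketch below models spans as plain dicts so it runs standalone; in real code each dict would be a child span created by your tracer.

```python
# Each retry attempt becomes its own "span" (modeled as a dict here)
# with retry.count / retry.reason attributes, so repeated tool spans
# in Jaeger explain themselves.
def call_tool_with_retry(tool, max_retries=3):
    spans = []
    for attempt in range(max_retries):
        span = {"name": "tool.search", "retry.count": attempt}
        try:
            result = tool()
            span["tool.status"] = "ok"
            spans.append(span)
            return result, spans
        except TimeoutError as exc:
            span["tool.status"] = "error"
            span["retry.reason"] = str(exc)
            spans.append(span)
    raise RuntimeError("tool failed after retries")

# Simulate a tool that times out once, then succeeds.
calls = iter([TimeoutError("upstream timeout"), "hit"])
def flaky():
    item = next(calls)
    if isinstance(item, Exception):
        raise item
    return item

result, spans = call_tool_with_retry(flaky)
```

In the Jaeger timeline, the error-status attempts render in red before the final green one, which makes "slow because of retries" obvious at a glance.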
Setup overview: OTel → Collector → Jaeger
A simple production-friendly architecture is:
Agent Runtime (OTel SDK) -> OTel Collector -> Jaeger (storage + UI)
Export OTLP from your agent to the Collector, apply tail sampling + redaction there, and export to Jaeger.
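An illustrative Collector pipeline for this setup might look like the fragment below. Ports, policy names, and the 10% sample rate are assumptions to adapt; the key pieces are the tail_sampling processor (keep all error traces, downsample the rest) and an OTLP exporter pointed at Jaeger, which accepts OTLP natively in recent versions.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  tail_sampling:
    policies:
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: sample-rest
        type: probabilistic
        probabilistic: {sampling_percentage: 10}

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317   # adjust to your Jaeger deployment
    tls:
      insecure: true        # for local/dev only

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp/jaeger]
```

Tail sampling runs in the Collector (not the SDK) because the keep/drop decision needs the whole trace, including whether any span errored.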
Privacy + redaction guidance
- Do not store raw prompts/tool arguments by default in production traces.
- Store summaries, hashes, or classified metadata (e.g., “contains_pii=true”) instead.
- Keep detailed logging behind feature flags, short retention, and strict RBAC.
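A hashing helper along these lines keeps traces searchable without storing raw text. This is a sketch: the keyword check standing in for PII classification is a deliberate placeholder you would replace with a real classifier.

```python
# Store a short hash + coarse metadata as span attributes instead of
# the raw prompt. Same prompt -> same hash, so you can still correlate
# runs without exposing content.
import hashlib

def safe_prompt_attributes(prompt):
    return {
        "prompt.hash": hashlib.sha256(prompt.encode()).hexdigest()[:16],
        "prompt.length": len(prompt),
        # Placeholder check; use a real PII classifier in production.
        "contains_pii": "ssn" in prompt.lower(),
    }

attrs = safe_prompt_attributes("Summarize the ticket from user 123")
print(attrs["prompt.hash"])
```

Because the hash is deterministic, you can also group traces by prompt.hash in Jaeger to spot a regression tied to one specific prompt version.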
Tools & platforms (official + GitHub links)
- Jaeger: jaegertracing.io | GitHub: github.com/jaegertracing/jaeger
- OpenTelemetry: opentelemetry.io
- OpenTelemetry Collector: github.com/open-telemetry/opentelemetry-collector
Production checklist
- Define a span naming convention + attribute schema (run_id, tool attributes, model info).
- Propagate trace context into tool calls (headers/middleware).
- Use tail sampling to keep full traces for failures/slow runs.
- Redact PII/secrets and restrict access to sensitive trace fields.
- Train the team on a basic incident workflow: “get run_id → find trace → identify slow/error span → fix.”
FAQ
Jaeger vs Tempo: which should I use?
If you want a straightforward tracing backend with a classic trace UI, Jaeger is a strong default. If you expect very high volume and want object-storage economics, Tempo can be a better fit (especially with Grafana).
Related reads on aivineet
- LLM Agent Tracing & Distributed Context | OpenTelemetry (OTel)
- LLM Agent Observability & Audit Logs
- OTel Collector for LLM Agents (Pipelines + Exporters)
Jaeger for LLM agents helps you debug tool calls, measure per-step latency, and keep distributed context across services.

