OpenTelemetry (OTel) is the fastest path to production-grade tracing for LLM agents because it gives you a standard way to follow a request across your agent runtime, tools, and downstream services. If your agent uses RAG, tool calling, or multi-step plans, OTel helps you answer the only questions that matter in production: what happened, where did it fail, and why?
In this guide, we’ll explain how to instrument an LLM agent with end-to-end traces (spans), how to propagate context across tool calls, and how to store + query traces in backends like Jaeger/Tempo. We’ll keep it practical and enterprise-friendly (redaction, auditability, and performance).
TL;DR
- Trace everything: prompt version → plan → tool calls → tool outputs → final answer.
- Use trace context propagation so tool calls remain linked to the parent run.
- Model “one user request” as a trace, and each agent/tool step as a span.
- Export via OTLP to an OpenTelemetry Collector, then route to Jaeger/Tempo or your observability stack.
- Redact PII and never log secrets; keep raw traces on short retention.
Table of Contents
- What is OpenTelemetry (OTel)?
- Why agents need distributed tracing
- A trace model for LLM agents (runs, spans, events)
- Distributed context propagation for tool calls
- Tracing RAG: retrieval, embeddings, and citations
- Privacy, redaction, and retention
- Tools & platforms (official + GitHub links)
- Production checklist
- FAQ
What is OpenTelemetry (OTel)?
OpenTelemetry is an open standard for collecting traces, metrics, and logs. In practice, OTel gives you a consistent way to generate and export trace data across services. For LLM agents, that means you can follow a single user request through:
- your API gateway / app server
- agent planner + router
- tool calling (search, DB, browser, CRM)
- RAG retrieval and reranking
- final synthesis and formatting
Why agents need distributed tracing
Agent failures rarely show up in the final answer. More often, the issue is upstream: a tool returned a 429, the model chose the wrong tool, or retrieval returned irrelevant context. That makes tracing your "black box recorder" for agent runs.
- Debuggability: see the exact tool call sequence and timing.
- Reliability: track where latency and errors occur (per tool, per step).
- Governance: produce audit trails for data access and actions.
A trace model for LLM agents (runs, spans, events)
Start with a simple mapping:
- Trace = 1 user request (1 agent run)
- Span = a step (plan, tool call, retrieval, final response)
- Span attributes = structured fields (tool name, status code, prompt version, token counts)
trace: run_id=R123
  span: plan (prompt_version=v12)
  span: tool.search (q="...")
  span: tool.search.result (status=200, docs=8)
  span: rag.retrieve (top_k=10)
  span: final.compose (schema=AnswerV3)
Distributed context propagation for tool calls
The biggest mistake teams make is tracing the agent runtime but losing context once tools run. To keep spans connected, propagate trace context into tool requests. For HTTP tools this is typically done via headers, and for internal tools it can be done via function parameters or middleware.
- Use trace_id/span_id propagation into each tool call.
- Ensure tool services also emit spans (or at least structured logs) with the same trace_id.
- Your trace UI then shows one end-to-end timeline instead of disconnected fragments.
Tracing RAG: retrieval, embeddings, and citations
RAG pipelines introduce their own failure modes: missing documents, irrelevant retrieval, and hallucinated citations. Instrument spans for:
- retrieval query + filters (redacted)
- top_k results and scores (summaries, not raw content)
- reranker latency
- citation coverage (how much of the answer is backed by retrieved text)
Privacy, redaction, and retention
- Never log secrets (keys/tokens). Store references only.
- Redact PII from prompts/tool args (emails, phone numbers, addresses).
- Short retention for raw traces; longer retention for aggregated metrics.
- RBAC for viewing prompts/tool args and retrieved snippets.
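As a starting point, a redaction helper can scrub attribute values before they are attached to spans. This sketch covers only emails and simple phone formats; real deployments need a fuller ruleset (addresses, names, account numbers) and should run before export, e.g. in a span processor or in the Collector.

```python
import re

# Deliberately simple patterns: emails and loosely-formatted phone numbers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text: str) -> str:
    """Replace obvious PII with placeholder tokens before tracing it."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)
```

Apply `redact()` to prompts and tool arguments at the instrumentation boundary so unredacted values never leave the process.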
Tools & platforms (official + GitHub links)
- OpenTelemetry: opentelemetry.io
- OpenTelemetry Collector: GitHub
- Jaeger (trace backend): jaegertracing.io | GitHub
- Grafana Tempo (trace backend): grafana.com/oss/tempo | GitHub
- Zipkin (trace backend): zipkin.io
Production checklist
- Define run_id and map 1 request = 1 trace.
- Instrument spans for plan, each tool call, and final synthesis.
- Propagate trace context into tool calls (headers/middleware).
- Export OTLP to an OTel Collector and route to your backend.
- Redact PII + enforce retention and access controls.
FAQ
Do I need an OpenTelemetry Collector?
Not strictly, but it’s the cleanest way to route OTLP data to multiple backends (Jaeger/Tempo, logs, metrics) without rewriting your app instrumentation.
