The Ultimate Guide to LLM Observability: Why OpenTelemetry is Essential and the Easiest Way to Set It Up (Haystack, Vercel AI SDK, LiteLLM)

January 26, 2026

The Ultimate Guide to LLM Observability: Mastering OpenTelemetry (OTel) for AI Agents

Building an AI agent is easy; knowing why it's failing is the hard part. As you move from simple chat completions to complex agentic loops with Haystack, the Vercel AI SDK, or LiteLLM, you quickly realize that traditional logging isn't enough. You need traces.

In this comprehensive guide, we dive deep into the industry standard: OpenTelemetry (OTel). We'll explore the technical "How-To" for the most popular LLM stacks and compare manual setup vs. the streamlined Keywords AI approach. By the end, you'll understand not just how to instrument your LLM applications, but why OpenTelemetry has become the de facto standard for production AI observability.

Table of Contents

  1. Deep Dive: What is OpenTelemetry (OTel)?
  2. The Technical Stack: Understanding OTel Components
  3. Why OTel is Critical for LLM Observability
  4. Setup: Vercel AI SDK + OpenTelemetry
  5. Setup: Haystack + OpenTelemetry
  6. Setup: LiteLLM + OpenTelemetry
  7. Advanced Topics: Distributed Tracing, Context Propagation, and Sampling
  8. The Verdict: Manual vs. Automated Tracing
  9. Production Best Practices

1. Deep Dive: What is OpenTelemetry (OTel)?

OpenTelemetry is not a backend; it is a vendor-neutral, open source observability framework. It provides a standardized set of APIs, SDKs, and protocols (OTLP) to collect "telemetry"—Traces, Metrics, and Logs. Think of it as the "USB-C of observability": a universal standard that works with any tool.

As an open source project under the Cloud Native Computing Foundation (CNCF), OpenTelemetry provides LLM monitoring open source solutions that don't lock you into proprietary platforms. This makes it the ideal foundation for building comprehensive LLM observability into your applications.

The Three Pillars of Observability

OpenTelemetry is built around three core data types:

1. Traces: A trace represents the entire lifecycle of a request as it flows through your system. In LLM applications, a trace might capture:

  • The initial user query
  • RAG retrieval operations
  • Multiple LLM calls (if using agentic loops)
  • Post-processing steps
  • Final response delivery

2. Metrics: Quantitative measurements over time. For LLMs, critical metrics include (see the sketch after this list):

  • Token usage per model
  • Cost per request
  • Latency percentiles (p50, p95, p99)
  • Error rates
  • Throughput (requests per second)
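
As a rough illustration, here is how you might record token and latency metrics with the OpenTelemetry Python metrics API. The metric and attribute names here are illustrative, not a fixed convention:

```python
from opentelemetry import metrics

meter = metrics.get_meter("llm-app")

# Hypothetical metric names for token usage and request latency
token_counter = meter.create_counter("llm.tokens.total", unit="tokens")
latency_histogram = meter.create_histogram("llm.request.duration", unit="ms")

# Record one request's usage, tagged by model
token_counter.add(200, {"llm.model": "gpt-4"})
latency_histogram.record(850, {"llm.model": "gpt-4"})
```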

3. Logs: Structured events with timestamps. While logs are less emphasized in OTel (compared to traces and metrics), they're still valuable for capturing:

  • Error messages
  • Debug information
  • Audit trails

How OpenTelemetry Works: The Architecture

In the context of LLMs, OTel works by creating a Trace, which represents the entire lifecycle of a user request. Inside that trace are Spans.

  • Trace: The "Macro" view (e.g., a user asks for a summary of a PDF, which triggers retrieval, embedding, and generation).
  • Span: The "Micro" view (e.g., the embedding call, the vector search, the LLM completion, the post-processing step).

Each span contains (see the sketch after this list):

  • Name: What operation was performed (e.g., "llm.completion")
  • Attributes: Key-value pairs (e.g., llm.model="gpt-4", llm.tokens.prompt=150)
  • Events: Timestamped annotations (e.g., "retrieval.started", "model.selected")
  • Status: Success, error, or unset
  • Duration: How long the operation took
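
Here is a minimal sketch of those pieces using the OpenTelemetry Python API; the attribute and event names mirror the examples above and are not a formal standard:

```python
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer("example")

# Name and attributes are set at creation; duration is measured automatically
with tracer.start_as_current_span(
    "llm.completion",
    attributes={"llm.model": "gpt-4", "llm.tokens.prompt": 150},
) as span:
    span.add_event("model.selected", {"llm.model": "gpt-4"})  # timestamped annotation
    # ... call the model here ...
    span.set_status(Status(StatusCode.OK))
# The span ends (and its duration is recorded) when the block exits
```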

The OpenTelemetry Specification

OpenTelemetry follows a strict specification that ensures interoperability. The spec defines:

  • Semantic Conventions: Standard attribute names (e.g., http.method, db.query, llm.model)
  • OTLP Protocol: The wire format for sending telemetry data
  • API Contracts: How SDKs must behave across languages

This standardization means you can instrument your Python Haystack pipeline, your TypeScript Vercel AI SDK app, and your LiteLLM proxy, and they'll all produce compatible traces that can be viewed in a single dashboard.

OpenTelemetry vs. Proprietary Solutions

Unlike vendor-specific solutions (Datadog APM, New Relic, etc.), OpenTelemetry gives you:

  • Vendor Freedom: Instrument once, export anywhere
  • Community Standards: Built by the CNCF with industry-wide adoption
  • Future-Proofing: Your instrumentation code doesn't break when you switch backends
  • Cost Efficiency: Avoid vendor lock-in and choose the most cost-effective backend

2. The Technical Stack: Understanding OTel Components

To get OpenTelemetry running in production, your system needs four components working together:

Component 1: Instrumentation

The code inside your application that generates spans. This can be:

  • Automatic: Using auto-instrumentation libraries that hook into frameworks
  • Manual: Writing custom spans using the OTel API
  • Hybrid: Combining both approaches

For LLM applications, you typically need manual instrumentation (see the example after this list) because:

  • LLM frameworks don't always have built-in OTel support
  • You need to capture LLM-specific attributes (tokens, costs, model IDs)
  • You want fine-grained control over what gets traced
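
As a sketch of the hybrid approach (assuming the opentelemetry-instrumentation-requests package is installed for the automatic part):

```python
from opentelemetry import trace
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Automatic: hook into a library once; outgoing HTTP calls become spans
RequestsInstrumentor().instrument()

# Manual: wrap LLM-specific work in your own span with custom attributes
tracer = trace.get_tracer("my-llm-app")
with tracer.start_as_current_span("llm.completion") as span:
    span.set_attribute("llm.model", "gpt-4")
    # ... call your model here ...
```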

Component 2: SDK (Software Development Kit)

The language-specific implementation of the OpenTelemetry API. Popular SDKs include:

  • @opentelemetry/sdk-node (Node.js/TypeScript)
  • opentelemetry-sdk (Python)
  • @opentelemetry/sdk-trace-web (Browser/Edge)

The SDK handles:

  • Span creation and management
  • Context propagation (passing trace context across async boundaries)
  • Resource detection (identifying your service, host, etc.)

Component 3: Exporter

The component that sends telemetry data to your backend. Common exporters:

  • OTLP Exporter: Sends data via the OpenTelemetry Protocol (recommended)
  • Jaeger Exporter: Direct export to Jaeger
  • Zipkin Exporter: Direct export to Zipkin
  • Console Exporter: For debugging (prints to stdout)

For LLM observability, you'll typically use the OTLP HTTP exporter to send data to an LLM observability platform like Keywords AI, which provides LLM-specific dashboards and analytics. When choosing the best LLM observability platform for your needs, consider factors like cost tracking, token usage analytics, and integration with your existing stack.
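
Before pointing at a real backend, it can help to verify your instrumentation locally with the console exporter; a minimal sketch with the Python SDK:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Print every span to stdout - useful for checking that spans are created at all
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
```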

Component 4: Backend/Collector

Where your telemetry data lives and gets visualized. You have two options:

Option A: Direct Export Your application exports directly to a backend (e.g., Keywords AI, Datadog, Grafana Cloud).

Option B: OpenTelemetry Collector A middleman service that receives, processes, and routes telemetry data. The collector is useful for:

  • Batch processing (reducing API calls)
  • Data transformation (enriching spans with metadata)
  • Multi-backend routing (sending to multiple destinations)
  • Sampling (reducing data volume)

For most LLM applications, direct export is simpler and sufficient. The collector adds operational complexity that may not be necessary unless you're running at massive scale.


3. Why OTel is Critical for LLM Observability

LLM applications are fundamentally different from traditional web applications. They're non-deterministic, stateful, and involve complex multi-step workflows. When an agent fails, it could be because of:

  • A prompt injection attack
  • A retrieval error (wrong documents returned)
  • A model timeout
  • A rate limit hit
  • A cost budget exceeded
  • A hallucination that went undetected

OpenTelemetry's Distributed Tracing allows you to follow the request as it travels across different microservices, LLM providers, and infrastructure components. This is critical for debugging production issues.

The LLM Observability Challenge

Traditional application monitoring focuses on:

  • CPU usage
  • Memory consumption
  • Request latency
  • Error rates

LLM observability requires tracking:

  • Token usage and costs across different models (GPT-4 vs. Claude vs. Gemini)
  • Prompt and response quality (hallucinations, relevance, accuracy)
  • Model performance (latency, throughput, error rates per model)
  • User interactions and conversation flows (multi-turn conversations)
  • RAG pipeline performance (retrieval accuracy, context relevance, embedding quality)
  • Chain execution (multi-step workflows, tool calls, function calling)
  • Cost optimization (identifying which models are most cost-effective for specific tasks)

Why Standard Logging Falls Short

Consider this scenario: A user reports that your AI agent gave a wrong answer. With standard logging, you might see:

[INFO] User query: "What is the capital of France?"
[INFO] LLM response: "Paris"

But you don't know:

  • Which model was used?
  • How many tokens were consumed?
  • What was the latency?
  • What documents were retrieved (if using RAG)?
  • What was the cost?
  • Was there an error that was silently handled?

With OpenTelemetry traces, you get a complete picture:

Trace: user_query_abc123
├─ Span: rag.retrieval
│  ├─ Attributes: retrieval.doc_count=5, retrieval.latency_ms=120
│  └─ Events: retrieval.started, retrieval.completed
├─ Span: llm.completion
│  ├─ Attributes: llm.model=gpt-4, llm.tokens.prompt=250, llm.tokens.completion=50
│  ├─ Attributes: llm.cost=0.003, llm.latency_ms=850
│  └─ Status: OK
└─ Span: post_processing
   └─ Attributes: processing.type=sentiment_analysis

The Business Case for LLM Observability

Beyond debugging, observability drives business outcomes:

  1. Cost Optimization: Identify which models are most cost-effective for specific use cases
  2. Quality Assurance: Track hallucination rates and response quality over time
  3. User Experience: Understand latency patterns and optimize for user satisfaction
  4. Compliance: Audit trails for regulated industries (healthcare, finance)
  5. Capacity Planning: Understand usage patterns to scale infrastructure appropriately

When evaluating LLM observability platforms, look for solutions that provide comprehensive LLM monitoring open source capabilities through OpenTelemetry integration. The best LLM observability platform will offer automatic cost tracking, token usage analytics, and seamless integration with frameworks like Vercel AI SDK, Haystack, and LiteLLM.


4. Setup: Vercel AI SDK Telemetry + OpenTelemetry

The Vercel AI SDK is the go-to choice for Next.js developers building AI applications. It provides a unified interface for working with multiple LLM providers (OpenAI, Anthropic, Google, etc.) and handles streaming, tool calling, and structured outputs.

However, managing Vercel AI SDK telemetry with OpenTelemetry in a serverless environment (Vercel Functions) comes with significant overhead. Setting up proper observability for Vercel AI SDK requires careful configuration of OpenTelemetry instrumentation. Let's explore both approaches.

Option A: Manual OTel Setup (The Hard Way)

To manually instrument Vercel AI SDK, you must use the @opentelemetry/sdk-node package and configure an instrumentation.ts file. Here's what's involved:

Step 1: Install Dependencies

npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/exporter-trace-otlp-http @opentelemetry/instrumentation @opentelemetry/resources @opentelemetry/semantic-conventions

Step 2: Create instrumentation.ts

Vercel requires an instrumentation.ts file in your project root to initialize OpenTelemetry before your application code runs:

```typescript
// instrumentation.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
// These two require the @opentelemetry/instrumentation-http and
// @opentelemetry/instrumentation-express packages
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express';

const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'vercel-ai-app',
    [SemanticResourceAttributes.SERVICE_VERSION]: process.env.VERCEL_GIT_COMMIT_SHA || '1.0.0',
    [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.VERCEL_ENV || 'development',
  }),
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'https://api.keywordsai.co/v1/traces',
    headers: {
      'Authorization': `Bearer ${process.env.KEYWORDSAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
  }),
  instrumentations: [
    new HttpInstrumentation(),
    new ExpressInstrumentation(),
  ],
});

sdk.start();

// Ensure spans are flushed before the process exits
process.on('SIGTERM', () => {
  sdk.shutdown()
    .then(() => console.log('OpenTelemetry terminated'))
    .catch((error) => console.log('Error terminating OpenTelemetry', error))
    .finally(() => process.exit(0));
});
```

Step 3: Configure next.config.js

You need to enable the experimental instrumentationHook:

```javascript
// next.config.js
module.exports = {
  experimental: {
    instrumentationHook: true,
  },
};
```

Step 4: Instrument Your API Routes

Now you need to manually wrap every AI SDK call with spans:

```typescript
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { generateText, streamText } from 'ai';
import { trace, context, SpanStatusCode, Span } from '@opentelemetry/api';
import { NextRequest, NextResponse } from 'next/server';

const tracer = trace.getTracer('vercel-ai-sdk');

export async function POST(request: NextRequest) {
  const { messages, stream } = await request.json();

  // Start a trace for this request
  const span = tracer.startSpan('ai.chat', {
    attributes: {
      'llm.framework': 'vercel-ai-sdk',
      'llm.provider': 'openai',
      'http.method': 'POST',
      'http.route': '/api/chat',
    },
  });

  try {
    const activeContext = trace.setSpan(context.active(), span);

    return await context.with(activeContext, async () => {
      if (stream) {
        return handleStreaming(messages, span);
      } else {
        return handleNonStreaming(messages, span);
      }
    });
  } catch (error) {
    span.recordException(error as Error);
    span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
    throw error;
  } finally {
    span.end();
  }
}

async function handleNonStreaming(messages: any[], span: Span) {
  // Child spans inherit the parent from the active context set in POST
  const generateSpan = tracer.startSpan('ai.generate');

  try {
    generateSpan.setAttributes({
      'llm.messages.count': messages.length,
      'llm.messages.last': JSON.stringify(messages[messages.length - 1]),
    });

    const { text, usage, finishReason } = await generateText({
      model: openai('gpt-4'),
      messages: messages,
    });

    // Extract and set LLM-specific attributes
    generateSpan.setAttributes({
      'llm.model': 'gpt-4',
      'llm.response': text,
      'llm.tokens.prompt': usage.promptTokens,
      'llm.tokens.completion': usage.completionTokens,
      'llm.tokens.total': usage.totalTokens,
      'llm.finish_reason': finishReason,
      'llm.cost': calculateCost(usage.promptTokens, usage.completionTokens, 'gpt-4'),
    });

    generateSpan.setStatus({ code: SpanStatusCode.OK });
    return NextResponse.json({ text });
  } catch (error) {
    generateSpan.recordException(error as Error);
    generateSpan.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
    throw error;
  } finally {
    generateSpan.end();
  }
}

async function handleStreaming(messages: any[], span: Span) {
  const streamSpan = tracer.startSpan('ai.stream');

  try {
    streamSpan.setAttribute('llm.streaming', true);

    const result = await streamText({
      model: openai('gpt-4'),
      messages: messages,
    });

    // For streaming, we need to track tokens differently
    // This is a simplified example - real implementation is more complex
    streamSpan.setAttribute('llm.model', 'gpt-4');

    return result.toDataStreamResponse();
  } catch (error) {
    streamSpan.recordException(error as Error);
    streamSpan.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
    throw error;
  } finally {
    streamSpan.end();
  }
}

function calculateCost(promptTokens: number, completionTokens: number, model: string): number {
  // Pricing as of 2026 (example - update with actual prices)
  const pricing: Record<string, { prompt: number; completion: number }> = {
    'gpt-4': { prompt: 0.03 / 1000, completion: 0.06 / 1000 },
    'gpt-4-turbo': { prompt: 0.01 / 1000, completion: 0.03 / 1000 },
    'gpt-3.5-turbo': { prompt: 0.0015 / 1000, completion: 0.002 / 1000 },
  };

  const modelPricing = pricing[model] || pricing['gpt-3.5-turbo'];
  return (promptTokens * modelPricing.prompt) + (completionTokens * modelPricing.completion);
}
```

Step 5: Handle Edge Runtime Issues

Vercel's Edge Runtime doesn't support Node.js APIs, which means OpenTelemetry SDKs that rely on Node.js won't work. You have two options:

  1. Switch to Node.js Runtime: Add export const runtime = 'nodejs' to your route
  2. Use Edge-Compatible Alternatives: Use Web APIs and manual instrumentation (more complex)

The Challenges with Manual Setup

  • Runtime Compatibility: Edge vs. Node.js runtime conflicts
  • Span Lifecycle Management: Ensuring spans are flushed before serverless functions terminate
  • Cost Calculation: You must manually implement pricing logic for every model
  • Context Propagation: Handling async boundaries and context loss
  • Maintenance Burden: Every SDK update might break your instrumentation

Read the full manual guide: Vercel OTel Docs

Option B: The Keywords AI Way

Keywords AI replaces dozens of lines of configuration with a single package. We handle runtime compatibility, the mapping of LLM-specific metadata (tokens, costs, model IDs), and the span lifecycle automatically.

Full setup guide: Vercel AI SDK + Keywords AI tracing

Step 1: Install Keywords AI Tracing

npm install @keywordsai/tracing-node

Step 2: Initialize in instrumentation.ts

```typescript
// instrumentation.ts
import { KeywordsTracer } from '@keywordsai/tracing-node';

KeywordsTracer.init({
  apiKey: process.env.KEYWORDSAI_API_KEY,
  serviceName: 'vercel-ai-app',
});
```

That's it. No manual SDK configuration, no exporter setup, no span lifecycle management.

Step 3: Use in Your API Routes

```typescript
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { generateText, streamText } from 'ai';
import { NextRequest, NextResponse } from 'next/server';

export async function POST(request: NextRequest) {
  const { messages, stream } = await request.json();

  if (stream) {
    const result = await streamText({
      model: openai('gpt-4'),
      messages: messages,
    });
    return result.toDataStreamResponse();
  } else {
    const { text } = await generateText({
      model: openai('gpt-4'),
      messages: messages,
    });
    return NextResponse.json({ text });
  }
}
```

That's it. Keywords AI automatically:

  • Detects Vercel AI SDK calls
  • Creates spans with proper hierarchy
  • Extracts token usage, costs, and model information
  • Handles streaming vs. non-streaming
  • Works in both Edge and Node.js runtimes
  • Flushes spans before function termination

The Benefits

  • 2-Minute Setup: vs. 2-4 hours for manual setup
  • Zero Maintenance: Updates automatically with SDK changes
  • Automatic Cost Tracking: No manual pricing calculations
  • Built-in Dashboard: No need for separate visualization tools
  • Edge Runtime Support: Works out of the box

5. Setup: Haystack + OpenTelemetry

Haystack by deepset is a powerhouse for Python-based RAG pipelines. It's designed for production-ready LLM applications with built-in support for document stores, retrievers, generators, and complex pipelines.

Haystack has built-in support for OpenTelemetry, but the "wiring" is left to you. Let's compare the manual approach vs. the Keywords AI way.

Option A: Manual OTel Setup (The Hard Way)

For Haystack, you need to set up a Python tracer provider and link it to Haystack's internal tracing backend. Here's the complete setup:

Step 1: Install Dependencies

pip install haystack-ai opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-http opentelemetry-instrumentation

Step 2: Initialize OpenTelemetry Provider

```python
# tracing_setup.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes
import os

# Create resource with service information
resource = Resource.create({
    ResourceAttributes.SERVICE_NAME: "haystack-rag-app",
    ResourceAttributes.SERVICE_VERSION: os.getenv("APP_VERSION", "1.0.0"),
    ResourceAttributes.DEPLOYMENT_ENVIRONMENT: os.getenv("ENVIRONMENT", "production"),
})

# Initialize tracer provider
provider = TracerProvider(resource=resource)
trace.set_tracer_provider(provider)

# Create OTLP exporter
exporter = OTLPSpanExporter(
    endpoint="https://api.keywordsai.co/v1/traces",
    headers={
        "Authorization": f"Bearer {os.getenv('KEYWORDSAI_API_KEY')}",
        "Content-Type": "application/json",
    },
)

# Add batch processor
processor = BatchSpanProcessor(exporter)
provider.add_span_processor(processor)

# Get tracer
tracer = trace.get_tracer(__name__)
```

Step 3: Configure Haystack to Use OpenTelemetry

Haystack has a tracing abstraction that you need to connect to OpenTelemetry:

```python
# haystack_tracing.py
from haystack.tracing import OpenTelemetryTracer
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

class CustomOpenTelemetryTracer(OpenTelemetryTracer):
    """Custom tracer that adds LLM-specific attributes"""

    def __init__(self):
        super().__init__(trace.get_tracer("haystack"))

    def trace(self, operation_name: str, tags: dict = None, **kwargs):
        """Override to add custom attributes"""
        span = self.tracer.start_span(operation_name)

        if tags:
            for key, value in tags.items():
                # Map Haystack tags to OTel attributes
                if key == "model":
                    span.set_attribute("llm.model", value)
                elif key == "provider":
                    span.set_attribute("llm.provider", value)
                elif key == "prompt_tokens":
                    span.set_attribute("llm.tokens.prompt", value)
                elif key == "completion_tokens":
                    span.set_attribute("llm.tokens.completion", value)
                else:
                    span.set_attribute(f"haystack.{key}", str(value))

        return span

# Initialize Haystack tracing
from haystack import tracing
haystack_tracer = CustomOpenTelemetryTracer()
tracing.set_backend(haystack_tracer)
```

Step 4: Instrument Your Haystack Pipeline

Now you need to manually instrument every component in your pipeline:

```python
# pipeline.py
from haystack import Pipeline, Document
from haystack.components.builders import PromptBuilder
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores import InMemoryDocumentStore
from opentelemetry import trace
import os

# Import tracing setup
from tracing_setup import tracer

def create_rag_pipeline():
    """Create a RAG pipeline with manual OpenTelemetry instrumentation"""

    # Document store
    document_store = InMemoryDocumentStore()

    # Retriever
    retriever = InMemoryBM25Retriever(document_store=document_store, top_k=5)

    # Prompt builder
    prompt_template = """
    Given the following information, answer the question.

    Context:
    {% for document in documents %}
    {{ document.content }}
    {% endfor %}

    Question: {{ query }}
    Answer:
    """
    prompt_builder = PromptBuilder(template=prompt_template)

    # LLM generator
    generator = OpenAIGenerator(api_key=os.getenv("OPENAI_API_KEY"))

    # Create pipeline
    pipeline = Pipeline()
    pipeline.add_component("retriever", retriever)
    pipeline.add_component("prompt_builder", prompt_builder)
    pipeline.add_component("llm", generator)

    pipeline.connect("retriever", "prompt_builder.documents")
    pipeline.connect("prompt_builder", "llm.prompt")

    return pipeline

def run_rag_query(query: str, pipeline: Pipeline):
    """Run a RAG query with full OpenTelemetry tracing"""

    # Start root span
    with tracer.start_as_current_span("haystack.rag_pipeline") as root_span:
        root_span.set_attribute("query", query)
        root_span.set_attribute("llm.framework", "haystack")

        # Retrieval span
        with tracer.start_as_current_span("haystack.retrieval") as retrieval_span:
            # Manually call retriever to get documents
            documents = pipeline.get_component("retriever").run(query=query)
            retrieval_span.set_attribute("retrieval.doc_count", len(documents["documents"]))
            retrieval_span.set_attribute("retrieval.query", query)

        # Prompt building span
        with tracer.start_as_current_span("haystack.prompt_building") as prompt_span:
            prompt = pipeline.get_component("prompt_builder").run(
                query=query,
                documents=documents["documents"]
            )
            prompt_span.set_attribute("prompt.length", len(prompt["prompt"]))

        # LLM generation span
        with tracer.start_as_current_span("haystack.generation") as gen_span:
            response = pipeline.get_component("llm").run(prompt=prompt["prompt"])

            # Extract usage information (this varies by generator)
            if hasattr(response, "meta") and "usage" in response.meta:
                usage = response.meta["usage"]
                gen_span.set_attribute("llm.tokens.prompt", usage.get("prompt_tokens", 0))
                gen_span.set_attribute("llm.tokens.completion", usage.get("completion_tokens", 0))
                gen_span.set_attribute("llm.tokens.total", usage.get("total_tokens", 0))

            # Extract model information
            if hasattr(response, "meta") and "model" in response.meta:
                gen_span.set_attribute("llm.model", response.meta["model"])

            # Calculate cost (manual implementation required)
            gen_span.set_attribute("llm.cost", calculate_llm_cost(response))

            gen_span.set_attribute("llm.response", response["replies"][0])

        return response

def calculate_llm_cost(response):
    """Manually calculate LLM cost - you must implement this for every model"""
    # This is a simplified example - real implementation needs pricing for all models
    if not hasattr(response, "meta") or "usage" not in response.meta:
        return 0

    usage = response.meta["usage"]
    model = response.meta.get("model", "gpt-3.5-turbo")

    # Pricing table (you must maintain this)
    pricing = {
        "gpt-4": {"prompt": 0.03 / 1000, "completion": 0.06 / 1000},
        "gpt-4-turbo": {"prompt": 0.01 / 1000, "completion": 0.03 / 1000},
        "gpt-3.5-turbo": {"prompt": 0.0015 / 1000, "completion": 0.002 / 1000},
        "claude-3-opus": {"prompt": 0.015 / 1000, "completion": 0.075 / 1000},
        # ... you must add pricing for every model you use
    }

    model_pricing = pricing.get(model, pricing["gpt-3.5-turbo"])
    prompt_tokens = usage.get("prompt_tokens", 0)
    completion_tokens = usage.get("completion_tokens", 0)

    return (prompt_tokens * model_pricing["prompt"]) + (completion_tokens * model_pricing["completion"])

# Ensure spans are flushed before process exits
import atexit

def flush_spans():
    provider = trace.get_tracer_provider()
    if hasattr(provider, "force_flush"):
        provider.force_flush()

atexit.register(flush_spans)
```

The Challenges with Manual Setup

  • Span Lifecycle Management: You must ensure spans are flushed before the Python process exits. If your script ends too fast, you lose your logs.
  • Cost Calculation: You must manually implement pricing logic for every model (OpenAI, Anthropic, Google, etc.). This is error-prone and requires constant updates.
  • Component Instrumentation: Haystack pipelines can have many components. Manually instrumenting each one is tedious.
  • Usage Extraction: Different generators (OpenAI, Anthropic, etc.) return usage information in different formats. You must handle each case.
  • LiteLLM Integration: If you use LiteLLM within Haystack (common for multi-provider support), you need separate instrumentation.

Read the manual docs: Haystack Tracing Guide

Option B: The Keywords AI Way

With Keywords AI, we've built a dedicated exporter specifically for the Haystack OpenTelemetry integration. Here's how simple it is:

Full setup guide: Haystack + Keywords AI tracing

Step 1: Install Keywords AI Tracing

pip install keywords-ai-tracing

Step 2: Initialize (One Line)

```python
# main.py
import os

from keywords_ai_tracing import KeywordsTracer

# Initialize - everything is handled automatically
tracer = KeywordsTracer(api_key=os.getenv("KEYWORDSAI_API_KEY"))

# That's it! Haystack is now automatically instrumented
```

Step 3: Use Your Pipeline Normally

```python
import os

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

# Create your pipeline as normal
pipeline = Pipeline()
pipeline.add_component("prompt_builder", PromptBuilder(template="Answer: {{query}}"))
pipeline.add_component("llm", OpenAIGenerator(api_key=os.getenv("OPENAI_API_KEY")))
pipeline.connect("prompt_builder.prompt", "llm.prompt")

# Run it - tracing happens automatically
result = pipeline.run({"query": "What is the capital of France?"})
```

That's it. Keywords AI automatically:

  • Detects all Haystack components
  • Creates spans for retrieval, prompt building, and generation
  • Extracts token usage and costs (for all providers)
  • Handles LiteLLM if you use it within Haystack
  • Flushes spans before process termination
  • Maps everything to a beautiful dashboard

Why It's Better

  • Zero Configuration: No manual tracer setup, no span lifecycle management
  • Automatic Cost Tracking: Supports 100+ models with up-to-date pricing
  • LiteLLM Support: If you use LiteLLM within Haystack, it's automatically traced
  • Component Detection: Automatically instruments all Haystack components
  • Production Ready: Handles edge cases, errors, and async operations

6. Setup: LiteLLM Observability + OpenTelemetry

LiteLLM is a unified proxy that standardizes calls across 100+ LLM providers. It's perfect for:

  • Multi-provider applications (switching between OpenAI, Anthropic, Google, etc.)
  • Cost optimization (automatic fallbacks to cheaper models)
  • Rate limit management
  • Load balancing across providers

LiteLLM observability is crucial for understanding which models perform best and for optimizing costs across providers. LiteLLM supports observability through callbacks, but integrating those callbacks with OpenTelemetry requires manual work. Let's compare both approaches.

Option A: Manual OTel Setup (The Hard Way)

LiteLLM provides callback hooks that you can use to send data to OpenTelemetry. Here's the complete manual setup:

Step 1: Install Dependencies

pip install litellm opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-http

Step 2: Create OpenTelemetry Callback

```python
# litellm_otel_callback.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes
import os
from typing import Optional, Dict, Any
from litellm import completion

# Initialize OpenTelemetry
resource = Resource.create({
    ResourceAttributes.SERVICE_NAME: "litellm-proxy",
})

provider = TracerProvider(resource=resource)
trace.set_tracer_provider(provider)

exporter = OTLPSpanExporter(
    endpoint="https://api.keywordsai.co/v1/traces",
    headers={
        "Authorization": f"Bearer {os.getenv('KEYWORDSAI_API_KEY')}",
    },
)

processor = BatchSpanProcessor(exporter)
provider.add_span_processor(processor)

tracer = trace.get_tracer(__name__)

# Store active spans in a thread-safe way
from threading import local
_thread_local = local()

def get_current_span():
    """Get the current span from thread local storage"""
    return getattr(_thread_local, 'span', None)

def set_current_span(span):
    """Set the current span in thread local storage"""
    _thread_local.span = span

class LiteLLMOpenTelemetryCallback:
    """OpenTelemetry callback for LiteLLM"""

    def __init__(self):
        self.tracer = tracer

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        """Called when a completion succeeds"""
        span = get_current_span()
        if span:
            # Extract model information
            model = kwargs.get("model", "unknown")
            span.set_attribute("llm.model", model)
            span.set_attribute("llm.provider", self._extract_provider(model))

            # Extract usage information
            if hasattr(response_obj, "usage"):
                usage = response_obj.usage
                span.set_attribute("llm.tokens.prompt", usage.prompt_tokens)
                span.set_attribute("llm.tokens.completion", usage.completion_tokens)
                span.set_attribute("llm.tokens.total", usage.total_tokens)

            # Extract response
            if hasattr(response_obj, "choices") and len(response_obj.choices) > 0:
                span.set_attribute("llm.response", response_obj.choices[0].message.content)

            # Calculate latency
            latency_ms = (end_time - start_time) * 1000
            span.set_attribute("llm.latency_ms", latency_ms)

            # Calculate cost (manual implementation required)
            cost = self._calculate_cost(kwargs, response_obj)
            span.set_attribute("llm.cost", cost)

            # Extract request information
            if "messages" in kwargs:
                span.set_attribute("llm.messages.count", len(kwargs["messages"]))
                span.set_attribute("llm.messages.last", str(kwargs["messages"][-1]))

            if "temperature" in kwargs:
                span.set_attribute("llm.temperature", kwargs["temperature"])
            if "max_tokens" in kwargs:
                span.set_attribute("llm.max_tokens", kwargs["max_tokens"])

            span.set_status(trace.Status(trace.StatusCode.OK))
            span.end()
            set_current_span(None)

    def log_failure_event(self, kwargs, response_obj, start_time, end_time, error):
        """Called when a completion fails"""
        span = get_current_span()
        if span:
            span.record_exception(error)
            span.set_status(trace.Status(trace.StatusCode.ERROR, str(error)))
            span.end()
            set_current_span(None)

    def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        """Async version - same as sync"""
        self.log_success_event(kwargs, response_obj, start_time, end_time)

    def async_log_failure_event(self, kwargs, response_obj, start_time, end_time, error):
        """Async version - same as sync"""
        self.log_failure_event(kwargs, response_obj, start_time, end_time, error)

    def _extract_provider(self, model: str) -> str:
        """Extract provider name from model string"""
        if model.startswith("gpt-") or model.startswith("o1-"):
            return "openai"
        elif model.startswith("claude-") or model.startswith("sonnet-"):
            return "anthropic"
        elif model.startswith("gemini-") or "google" in model.lower():
            return "google"
        elif model.startswith("llama-") or "meta" in model.lower():
            return "meta"
        else:
            return "unknown"

    def _calculate_cost(self, kwargs: Dict, response_obj: Any) -> float:
        """Manually calculate cost - you must implement this for all 100+ models"""
        model = kwargs.get("model", "gpt-3.5-turbo")

        if not hasattr(response_obj, "usage"):
            return 0

        usage = response_obj.usage

        # You must maintain pricing for 100+ models
        # This is a simplified example - real implementation is massive
        pricing = {
            # OpenAI
            "gpt-4": {"prompt": 0.03 / 1000, "completion": 0.06 / 1000},
            "gpt-4-turbo": {"prompt": 0.01 / 1000, "completion": 0.03 / 1000},
            "gpt-3.5-turbo": {"prompt": 0.0015 / 1000, "completion": 0.002 / 1000},
            "o1-preview": {"prompt": 0.015 / 1000, "completion": 0.06 / 1000},
            # Anthropic
            "claude-3-opus": {"prompt": 0.015 / 1000, "completion": 0.075 / 1000},
            "claude-3-sonnet": {"prompt": 0.003 / 1000, "completion": 0.015 / 1000},
            "claude-3-haiku": {"prompt": 0.00025 / 1000, "completion": 0.00125 / 1000},
            # Google
            "gemini-pro": {"prompt": 0.0005 / 1000, "completion": 0.0015 / 1000},
            # ... you need pricing for 100+ more models
        }

        model_pricing = pricing.get(model, {"prompt": 0, "completion": 0})
        return (usage.prompt_tokens * model_pricing["prompt"]) + (usage.completion_tokens * model_pricing["completion"])

# Create callback instance
otel_callback = LiteLLMOpenTelemetryCallback()
```

Step 3: Use LiteLLM with Manual Span Creation

```python
# main.py
from litellm import completion
from opentelemetry import trace
from litellm_otel_callback import otel_callback, tracer, set_current_span
import os

def call_llm(messages, model="gpt-4"):
    """Call LiteLLM with manual OpenTelemetry instrumentation"""

    # Start span manually
    span = tracer.start_span("litellm.completion")
    set_current_span(span)

    try:
        span.set_attribute("llm.framework", "litellm")
        span.set_attribute("llm.model", model)
        span.set_attribute("llm.messages", str(messages))

        response = completion(
            model=model,
            messages=messages,
            api_key=os.getenv("OPENAI_API_KEY"),
            callbacks=[otel_callback],  # Use our callback
        )

        return response
    except Exception as e:
        span.record_exception(e)
        span.set_status(trace.Status(trace.StatusCode.ERROR, str(e)))
        raise
    finally:
        # Span is ended in the callback, but we clean up here
        if span:
            set_current_span(None)

# Ensure spans are flushed
import atexit

def flush_spans():
    provider = trace.get_tracer_provider()
    if hasattr(provider, "force_flush"):
        provider.force_flush()

atexit.register(flush_spans)
```

The Challenges with Manual Setup

  • 100+ Models: You must maintain pricing for every model LiteLLM supports
  • Provider-Specific Logic: Different providers return usage data in different formats
  • Callback Complexity: Managing span lifecycle across sync and async calls
  • Thread Safety: Ensuring spans are correctly associated with requests in multi-threaded environments
  • Error Handling: Properly recording exceptions and setting span status
  • Cost Calculation: Keeping pricing up-to-date as providers change rates

Option B: The Keywords AI Way

Keywords AI has native LiteLLM support. Here's how simple it is:

Full setup guide: LiteLLM + Keywords AI

Step 1: Install Keywords AI Tracing

pip install keywords-ai-tracing

Step 2: Initialize (One Line)

```python
# main.py
import os

from keywords_ai_tracing import KeywordsTracer
from litellm import completion

# Initialize - LiteLLM is automatically instrumented
KeywordsTracer.init(api_key=os.getenv("KEYWORDSAI_API_KEY"))

# That's it!
```

Step 3: Use LiteLLM Normally

```python
import os

from litellm import completion

# Use LiteLLM as normal - tracing happens automatically
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key=os.getenv("OPENAI_API_KEY"),
)

# Or use any of the 100+ models
response = completion(
    model="claude-3-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key=os.getenv("ANTHROPIC_API_KEY"),
)
```

That's it. Keywords AI automatically:

  • Detects all LiteLLM calls
  • Extracts model, tokens, and usage for all 100+ supported models
  • Calculates costs with up-to-date pricing
  • Handles sync and async calls
  • Works with LiteLLM proxy mode
  • Maps everything to a unified dashboard

Why It's Better

  • 100+ Models Supported: Automatic cost tracking for every model LiteLLM supports
  • Zero Configuration: No callbacks, no manual span management
  • Proxy Mode Support: Works seamlessly with LiteLLM proxy
  • Automatic Updates: Pricing updates automatically as providers change rates
  • Multi-Provider: Handles OpenAI, Anthropic, Google, Meta, and 96+ more providers

7. Advanced Topics: Distributed Tracing, Context Propagation, and Sampling

Once you have basic OpenTelemetry instrumentation working, you'll want to understand these advanced concepts for production deployments.

Distributed Tracing

In microservices architectures, a single user request might trigger:

  1. API Gateway (receives request)
  2. Auth Service (validates token)
  3. RAG Service (retrieves documents)
  4. LLM Service (generates response)
  5. Post-Processing Service (formats output)

Distributed tracing allows you to follow this request across all services. OpenTelemetry achieves this through context propagation.

How Context Propagation Works

When Service A calls Service B, it includes trace context in the HTTP headers:

```python
# Service A
import requests
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer(__name__)

# Make the span current so inject() picks up its context
with tracer.start_as_current_span("service_a.operation"):
    headers = {}

    # Inject trace context into the outgoing headers
    inject(headers)

    # Make HTTP request to Service B
    response = requests.get("http://service-b/api", headers=headers)
```

```python
# Service B
from opentelemetry import trace
from opentelemetry.propagate import extract

tracer = trace.get_tracer(__name__)

# Extract trace context from the incoming request headers
# (here `request` is the incoming request object from your web framework)
context = extract(request.headers)

# Continue the trace
with tracer.start_as_current_span("service_b.operation", context=context):
    # This span will be a child of Service A's span
    pass
```

Context Propagation in LLM Applications

For LLM applications, context propagation is critical when (see the sketch after this list):

  • Your frontend calls your backend API
  • Your backend calls multiple LLM providers
  • You use function calling or tool use (each tool call should be a child span)
  • You have agentic loops (each iteration should be a child span)
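
Within a single process, nesting spans is usually enough: each iteration and each tool call becomes a child of the agent's root span because it is started while the parent is active. A minimal sketch (the span and attribute names are illustrative):

```python
from opentelemetry import trace

tracer = trace.get_tracer("agent")

with tracer.start_as_current_span("agent.run"):
    for step in range(3):
        # Each iteration is a child span of agent.run
        with tracer.start_as_current_span("agent.iteration") as iteration_span:
            iteration_span.set_attribute("agent.step", step)
            # Each tool call is a child span of the iteration
            with tracer.start_as_current_span("agent.tool_call") as tool_span:
                tool_span.set_attribute("tool.name", "web_search")
                # ... execute the tool here ...
```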

Sampling

In high-traffic applications, tracing every request can be expensive. Sampling allows you to trace only a percentage of requests.

Head-Based Sampling

Decide whether to sample at the start of the trace:

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

# Sample 10% of traces
sampler = TraceIdRatioBased(0.1)
provider = TracerProvider(sampler=sampler)
```

Tail-Based Sampling

Decide whether to keep a trace after it completes (useful for keeping error traces):

This requires an OpenTelemetry Collector with the tail sampling processor enabled; it's configured in the collector config, not in application code.

Smart Sampling for LLMs

For LLM applications, consider the following (a sampler sketch follows this list):

  • Always sample errors: Keep 100% of failed requests
  • Sample by cost: Keep traces for expensive requests (high token usage)
  • Sample by user: Keep traces for specific users (e.g., beta testers)
  • Sample by model: Keep more traces for new/experimental models
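
Note that head-based sampling can only see what is known when the root span starts, so a rule like "always keep errors" strictly requires tail-based sampling. You can still approximate user- or model-based rules with a custom sampler; a rough sketch, assuming your app sets attributes such as llm.experiment_variant or llm.beta_user (hypothetical names) when starting the root span:

```python
from opentelemetry.sdk.trace.sampling import (
    Decision,
    Sampler,
    SamplingResult,
    TraceIdRatioBased,
)

class LLMHeadSampler(Sampler):
    """Keep all flagged traffic; fall back to ratio sampling otherwise."""

    def __init__(self, ratio: float = 0.1):
        self._fallback = TraceIdRatioBased(ratio)

    def should_sample(self, parent_context, trace_id, name,
                      kind=None, attributes=None, links=None, trace_state=None):
        attributes = attributes or {}
        # Hypothetical attributes your app sets at span creation time
        if attributes.get("llm.experiment_variant") or attributes.get("llm.beta_user"):
            return SamplingResult(Decision.RECORD_AND_SAMPLE, attributes)
        return self._fallback.should_sample(
            parent_context, trace_id, name, kind, attributes, links, trace_state
        )

    def get_description(self) -> str:
        return "LLMHeadSampler"
```

You would then pass an instance of this sampler to TracerProvider(sampler=...) in place of the ratio sampler shown above.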

Resource Attributes

Resource attributes describe the service that generated the telemetry:

```python
import socket

from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes

resource = Resource.create({
    ResourceAttributes.SERVICE_NAME: "my-llm-app",
    ResourceAttributes.SERVICE_VERSION: "1.2.3",
    ResourceAttributes.DEPLOYMENT_ENVIRONMENT: "production",
    ResourceAttributes.HOST_NAME: socket.gethostname(),
})
```

These attributes appear on every span and help you filter traces in your backend.

Custom Attributes and Events

Beyond the standard attributes, you can add custom ones:

```python
span = tracer.start_span("llm.completion")
span.set_attribute("llm.user_id", user_id)
span.set_attribute("llm.session_id", session_id)
span.set_attribute("llm.experiment_variant", "A")  # For A/B testing

# Add events (timestamped annotations)
span.add_event("retrieval.started", {"query": query})
span.add_event("retrieval.completed", {"doc_count": 5})
```

8. The Verdict: Manual vs. Automated Tracing

While manual OpenTelemetry setup is "free" (no vendor cost), the engineering hours spent maintaining it are not. Let's break down the real costs:

Time Investment

| Task | Manual OTel | Keywords AI |
|---|---|---|
| Initial Setup | 2-4 Hours | 2 Minutes |
| Cost Calculation Implementation | 4-8 Hours (for 10 models) | 0 (automatic) |
| Maintenance per SDK Update | 1-2 Hours | 0 (automatic) |
| Adding New Model Support | 30-60 Minutes per model | 0 (automatic) |
| Dashboard Setup | 2-4 Hours (Jaeger/Grafana) | 0 (included) |
| Edge Runtime Compatibility | 4-8 Hours | 0 (works out of the box) |

Feature Comparison

| Feature | Manual OTel | Keywords AI |
|---|---|---|
| Setup Time | 2-4 Hours | 2 Minutes |
| Cost Tracking | Manual calculation (error-prone) | Built-in / automatic (100+ models) |
| Maintenance | High (updates with every SDK change) | Zero |
| Dashboard | Requires an extra tool (Jaeger/Honeycomb/Grafana) | Included |
| LLM-Specific Views | Must build custom dashboards | Pre-built (costs, tokens, latency) |
| User Analytics | Must implement custom logic | Built-in |
| Alerting | Must set up separately | Built-in |
| Prompt Management | Not included | Included |
| Multi-Provider Support | Manual implementation | Automatic (100+ providers) |

The Hidden Costs of Manual Setup

  1. Pricing Maintenance: LLM providers change pricing frequently. You must update your cost calculation code every time.
  2. SDK Compatibility: When Vercel AI SDK, Haystack, or LiteLLM release updates, your instrumentation might break.
  3. Edge Cases: Handling async operations, streaming, error cases, and context propagation correctly is non-trivial.
  4. Dashboard Development: Building useful dashboards for LLM-specific metrics (cost per user, token efficiency, etc.) takes significant time.
  5. Team Onboarding: New engineers must understand your custom instrumentation code.

When Manual OTel Makes Sense

Manual setup might be worth it if:

  • You're building a custom observability platform
  • You have strict compliance requirements that prevent using third-party services
  • You're already heavily invested in a specific backend (e.g., Datadog, New Relic)
  • You have a dedicated observability team

When Keywords AI Makes Sense

When choosing the best LLM observability platform for your team, Keywords AI is the better choice if:

  • You want to focus on building AI features, not observability infrastructure
  • You need LLM-specific analytics (costs, token usage, quality metrics)
  • You want to get started quickly (2 minutes vs. 2-4 hours)
  • You use multiple LLM providers and want unified observability
  • You want built-in features like prompt management and user analytics
  • You need an LLM observability platform that works seamlessly with Vercel AI SDK telemetry, LiteLLM observability, and Haystack pipelines

The ROI Calculation

Let's say you spend:

  • 4 hours on initial setup (manual)
  • 2 hours/month on maintenance
  • 1 hour per new model added (10 models/year = 10 hours)

Total: 4 + (2 × 12) + 10 = 38 hours/year

At $100/hour (senior engineer rate), that's $3,800/year in engineering time.

Keywords AI pricing starts at much less than this, and you get:

  • Zero maintenance
  • Automatic updates
  • Built-in dashboards
  • LLM-specific features
  • Support

The verdict: Unless you have specific requirements that prevent using a third-party service, Keywords AI provides better ROI for most teams.


9. Production Best Practices

Once you have OpenTelemetry instrumentation working, follow these best practices for production deployments:

1. Instrument at the Framework Level

Don't add tracing code in every function. Instead, instrument where you initialize your LLM clients:

```python
# ✅ Good: Instrument at initialization
from keywords_ai_tracing import KeywordsTracer
KeywordsTracer.init(api_key=os.getenv("KEYWORDSAI_API_KEY"))

# Now all LLM calls are automatically traced
response = completion(model="gpt-4", messages=messages)

# ❌ Bad: Manual instrumentation everywhere
def call_llm(messages):
    span = tracer.start_span("llm.call")  # Don't do this everywhere
    # ...
```

2. Capture User Context

Include user IDs and session IDs in your spans to enable user-level analytics:

span.set_attribute("llm.user_id", user_id)
span.set_attribute("llm.session_id", session_id)
span.set_attribute("llm.customer_id", customer_id)

3. Set Appropriate Sampling Rates

For production:

  • Development: 100% sampling (see everything)
  • Staging: 50% sampling
  • Production: 10% sampling, but always sample errors

```python
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

sampler = TraceIdRatioBased(0.1)  # 10% sampling
```

4. Monitor Trace Volume

High trace volume can be expensive. Monitor:

  • Spans per second
  • Trace size (number of spans per trace)
  • Export latency

If volume is too high, adjust sampling or reduce span granularity.

5. Use Semantic Conventions

Follow OpenTelemetry's semantic conventions for attribute names:

```python
# ✅ Good: Use semantic conventions
span.set_attribute("llm.model", "gpt-4")
span.set_attribute("llm.tokens.prompt", 150)
span.set_attribute("http.method", "POST")

# ❌ Bad: Custom attribute names
span.set_attribute("model_name", "gpt-4")  # Inconsistent
```

6. Handle Errors Gracefully

Always record exceptions and set span status:

```python
try:
    response = completion(model="gpt-4", messages=messages)
    span.set_status(trace.Status(trace.StatusCode.OK))
except Exception as e:
    span.record_exception(e)
    span.set_status(trace.Status(trace.StatusCode.ERROR, str(e)))
    raise
```

7. Set Up Alerts

Configure alerts for:

  • Cost spikes: Unexpected increases in LLM spending
  • Latency degradation: p95 latency above threshold
  • Error rate increases: More than X% of requests failing
  • Token usage anomalies: Unusual token consumption patterns

8. Review Traces Regularly

Don't just collect traces—use them:

  • Weekly reviews: Identify optimization opportunities
  • Post-incident analysis: Understand what went wrong
  • Cost optimization: Find expensive operations to optimize
  • Quality monitoring: Track response quality over time

9. Version Your Instrumentation

When you update your instrumentation code, include version information:

```python
resource = Resource.create({
    ResourceAttributes.SERVICE_VERSION: "1.2.3",
    "instrumentation.version": "2.0.0",
})
```

This helps you correlate issues with code changes.

10. Test Your Instrumentation

Don't assume your instrumentation works. Test it:

  • In development with 100% sampling
  • With error cases (timeouts, rate limits, invalid inputs)
  • With high load (ensure it doesn't impact performance)
  • With different models (verify cost calculation is correct)

Conclusion

OpenTelemetry is the industry standard for LLM observability, and it's essential for production LLM applications. As an LLM monitoring open source solution, OpenTelemetry provides the foundation for comprehensive observability. Whether you're using Haystack for RAG pipelines, implementing Vercel AI SDK telemetry for Next.js apps, or setting up LiteLLM observability for multi-provider setups, OpenTelemetry provides a unified way to observe your applications.

The choice between manual setup and a managed LLM observability platform comes down to:

  • Manual OTel: More control, but significant engineering investment. You'll need to build custom dashboards and maintain pricing tables for all models.
  • Best LLM Observability Platform (Keywords AI): Faster setup, zero maintenance, LLM-specific features. An LLM observability platform that handles everything automatically.

For most teams building production LLM applications, choosing the best LLM observability platform provides better ROI. With Keywords AI, you get production-ready LLM observability in 2 minutes instead of 2-4 hours, automatic cost tracking for 100+ models, and built-in dashboards optimized for LLM workflows. Whether you need Vercel AI SDK telemetry, LiteLLM observability, or Haystack instrumentation, the right LLM observability platform will save you hundreds of engineering hours.

Ready to see your traces?

Get started with Keywords AI in 60 seconds and start instrumenting your LLM applications today. Focus on building great AI features, not observability infrastructure.

About Keywords AI: Keywords AI is the leading developer platform for LLM applications, powering the best AI startups.