The Ultimate Guide to LLM Observability: Why OpenTelemetry is Essential and the Easiest Way to Set It Up (Haystack, Vercel AI SDK, LiteLLM)

January 26, 2026

The Ultimate Guide to LLM Observability: Mastering OpenTelemetry (OTel) for AI Agents

Building an AI agent is easy; knowing why it's failing is the hard part. As you move from simple chat completions to complex agentic loops with Haystack, the Vercel AI SDK, or LiteLLM, you quickly realize that traditional logging isn't enough. You need traces.

In this comprehensive guide, we dive deep into the industry standard: OpenTelemetry (OTel). We'll explore the technical "How-To" for the most popular LLM stacks and compare manual setup vs. the streamlined Keywords AI approach. By the end, you'll understand not just how to instrument your LLM applications, but why OpenTelemetry has become the de facto standard for production AI observability.

Table of Contents

  1. Deep Dive: What is OpenTelemetry (OTel)?
  2. The Technical Stack: Understanding OTel Components
  3. Why OTel is Critical for LLM Observability
  4. Setup: Vercel AI SDK + OpenTelemetry
  5. Setup: Haystack + OpenTelemetry
  6. Setup: LiteLLM + OpenTelemetry
  7. Advanced Topics: Distributed Tracing, Context Propagation, and Sampling
  8. The Verdict: Manual vs. Automated Tracing
  9. Production Best Practices

1. Deep Dive: What is OpenTelemetry (OTel)?

OpenTelemetry is not a backend; it is a vendor-neutral, open source observability framework. It provides a standardized set of APIs, SDKs, and protocols (OTLP) to collect "telemetry"—Traces, Metrics, and Logs. Think of it as the "USB-C of observability": a universal standard that works with any tool.

As an open source project under the Cloud Native Computing Foundation (CNCF), OpenTelemetry provides LLM monitoring open source solutions that don't lock you into proprietary platforms. This makes it the ideal foundation for building comprehensive LLM observability into your applications.

The Three Pillars of Observability

OpenTelemetry is built around three core data types:

1. Traces: A trace represents the entire lifecycle of a request as it flows through your system. In LLM applications, a trace might capture:

  • The initial user query
  • RAG retrieval operations
  • Multiple LLM calls (if using agentic loops)
  • Post-processing steps
  • Final response delivery

2. Metrics: Quantitative measurements over time. For LLMs, critical metrics include (see the sketch after this list):

  • Token usage per model
  • Cost per request
  • Latency percentiles (p50, p95, p99)
  • Error rates
  • Throughput (requests per second)
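
As a rough illustration, here is how you might record token and latency metrics with the OpenTelemetry Python metrics API. The metric and attribute names here are illustrative, not a fixed convention:

```python
from opentelemetry import metrics

meter = metrics.get_meter("llm-app")

# Hypothetical metric names for token usage and request latency
token_counter = meter.create_counter("llm.tokens.total", unit="tokens")
latency_histogram = meter.create_histogram("llm.request.duration", unit="ms")

# Record one request's usage, tagged by model
token_counter.add(200, {"llm.model": "gpt-4"})
latency_histogram.record(850, {"llm.model": "gpt-4"})
```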

3. Logs: Structured events with timestamps. While logs are less emphasized in OTel (compared to traces and metrics), they're still valuable for capturing:

  • Error messages
  • Debug information
  • Audit trails

How OpenTelemetry Works: The Architecture

In the context of LLMs, OTel works by creating a Trace, which represents the entire lifecycle of a user request. Inside that trace are Spans.

  • Trace: The "Macro" view (e.g., a user asks for a summary of a PDF, which triggers retrieval, embedding, and generation).
  • Span: The "Micro" view (e.g., the embedding call, the vector search, the LLM completion, the post-processing step).

Each span contains (see the sketch after this list):

  • Name: What operation was performed (e.g., "llm.completion")
  • Attributes: Key-value pairs (e.g., llm.model="gpt-4", llm.tokens.prompt=150)
  • Events: Timestamped annotations (e.g., "retrieval.started", "model.selected")
  • Status: Success, error, or unset
  • Duration: How long the operation took
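
Here is a minimal sketch of those pieces using the OpenTelemetry Python API; the attribute and event names mirror the examples above and are not a formal standard:

```python
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer("example")

# Name and attributes are set at creation; duration is measured automatically
with tracer.start_as_current_span(
    "llm.completion",
    attributes={"llm.model": "gpt-4", "llm.tokens.prompt": 150},
) as span:
    span.add_event("model.selected", {"llm.model": "gpt-4"})  # timestamped annotation
    # ... call the model here ...
    span.set_status(Status(StatusCode.OK))
# The span ends (and its duration is recorded) when the block exits
```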

The OpenTelemetry Specification

OpenTelemetry follows a strict specification that ensures interoperability. The spec defines:

  • Semantic Conventions: Standard attribute names (e.g., http.method, db.query, llm.model)
  • OTLP Protocol: The wire format for sending telemetry data
  • API Contracts: How SDKs must behave across languages

This standardization means you can instrument your Python Haystack pipeline, your TypeScript Vercel AI SDK app, and your LiteLLM proxy, and they'll all produce compatible traces that can be viewed in a single dashboard.

OpenTelemetry vs. Proprietary Solutions

Unlike vendor-specific solutions (Datadog APM, New Relic, etc.), OpenTelemetry gives you:

  • Vendor Freedom: Instrument once, export anywhere
  • Community Standards: Built by the CNCF with industry-wide adoption
  • Future-Proofing: Your instrumentation code doesn't break when you switch backends
  • Cost Efficiency: Avoid vendor lock-in and choose the most cost-effective backend

2. The Technical Stack: Understanding OTel Components

To get OpenTelemetry running in production, your system needs four components working together:

Component 1: Instrumentation

The code inside your application that generates spans. This can be:

  • Automatic: Using auto-instrumentation libraries that hook into frameworks
  • Manual: Writing custom spans using the OTel API
  • Hybrid: Combining both approaches

For LLM applications, you typically need manual instrumentation (see the example after this list) because:

  • LLM frameworks don't always have built-in OTel support
  • You need to capture LLM-specific attributes (tokens, costs, model IDs)
  • You want fine-grained control over what gets traced
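
As a sketch of the hybrid approach (assuming the opentelemetry-instrumentation-requests package is installed for the automatic part):

```python
from opentelemetry import trace
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Automatic: hook into a library once; outgoing HTTP calls become spans
RequestsInstrumentor().instrument()

# Manual: wrap LLM-specific work in your own span with custom attributes
tracer = trace.get_tracer("my-llm-app")
with tracer.start_as_current_span("llm.completion") as span:
    span.set_attribute("llm.model", "gpt-4")
    # ... call your model here ...
```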

Component 2: SDK (Software Development Kit)

The language-specific implementation of the OpenTelemetry API. Popular SDKs include:

  • @opentelemetry/sdk-node (Node.js/TypeScript)
  • opentelemetry-sdk (Python)
  • @opentelemetry/sdk-trace-web (Browser/Edge)

The SDK handles:

  • Span creation and management
  • Context propagation (passing trace context across async boundaries)
  • Resource detection (identifying your service, host, etc.)

Component 3: Exporter

The component that sends telemetry data to your backend. Common exporters:

  • OTLP Exporter: Sends data via the OpenTelemetry Protocol (recommended)
  • Jaeger Exporter: Direct export to Jaeger
  • Zipkin Exporter: Direct export to Zipkin
  • Console Exporter: For debugging (prints to stdout)

For LLM observability, you'll typically use the OTLP HTTP exporter to send data to an LLM observability platform like Keywords AI, which provides LLM-specific dashboards and analytics. When choosing the best LLM observability platform for your needs, consider factors like cost tracking, token usage analytics, and integration with your existing stack.
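
Before pointing at a real backend, it can help to verify your instrumentation locally with the console exporter; a minimal sketch with the Python SDK:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Print every span to stdout - useful for checking that spans are created at all
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
```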

Component 4: Backend/Collector

Where your telemetry data lives and gets visualized. You have two options:

Option A: Direct Export Your application exports directly to a backend (e.g., Keywords AI, Datadog, Grafana Cloud).

Option B: OpenTelemetry Collector A middleman service that receives, processes, and routes telemetry data. The collector is useful for:

  • Batch processing (reducing API calls)
  • Data transformation (enriching spans with metadata)
  • Multi-backend routing (sending to multiple destinations)
  • Sampling (reducing data volume)

For most LLM applications, direct export is simpler and sufficient. The collector adds operational complexity that may not be necessary unless you're running at massive scale.


3. Why OTel is Critical for LLM Observability

LLM applications are fundamentally different from traditional web applications. They're non-deterministic, stateful, and involve complex multi-step workflows. When an agent fails, it could be because of:

  • A prompt injection attack
  • A retrieval error (wrong documents returned)
  • A model timeout
  • A rate limit hit
  • A cost budget exceeded
  • A hallucination that went undetected

OpenTelemetry's Distributed Tracing allows you to follow the request as it travels across different microservices, LLM providers, and infrastructure components. This is critical for debugging production issues.

The LLM Observability Challenge

Traditional application monitoring focuses on:

  • CPU usage
  • Memory consumption
  • Request latency
  • Error rates

LLM observability requires tracking:

  • Token usage and costs across different models (GPT-4 vs. Claude vs. Gemini)
  • Prompt and response quality (hallucinations, relevance, accuracy)
  • Model performance (latency, throughput, error rates per model)
  • User interactions and conversation flows (multi-turn conversations)
  • RAG pipeline performance (retrieval accuracy, context relevance, embedding quality)
  • Chain execution (multi-step workflows, tool calls, function calling)
  • Cost optimization (identifying which models are most cost-effective for specific tasks)

Why Standard Logging Falls Short

Consider this scenario: A user reports that your AI agent gave a wrong answer. With standard logging, you might see:

[INFO] User query: "What is the capital of France?"
[INFO] LLM response: "Paris"

But you don't know:

  • Which model was used?
  • How many tokens were consumed?
  • What was the latency?
  • What documents were retrieved (if using RAG)?
  • What was the cost?
  • Was there an error that was silently handled?

With OpenTelemetry traces, you get a complete picture:

Trace: user_query_abc123
├─ Span: rag.retrieval
│  ├─ Attributes: retrieval.doc_count=5, retrieval.latency_ms=120
│  └─ Events: retrieval.started, retrieval.completed
├─ Span: llm.completion
│  ├─ Attributes: llm.model=gpt-4, llm.tokens.prompt=250, llm.tokens.completion=50
│  ├─ Attributes: llm.cost=0.003, llm.latency_ms=850
│  └─ Status: OK
└─ Span: post_processing
   └─ Attributes: processing.type=sentiment_analysis

The Business Case for LLM Observability

Beyond debugging, observability drives business outcomes:

  1. Cost Optimization: Identify which models are most cost-effective for specific use cases
  2. Quality Assurance: Track hallucination rates and response quality over time
  3. User Experience: Understand latency patterns and optimize for user satisfaction
  4. Compliance: Audit trails for regulated industries (healthcare, finance)
  5. Capacity Planning: Understand usage patterns to scale infrastructure appropriately

When evaluating LLM observability platforms, look for solutions that provide comprehensive LLM monitoring open source capabilities through OpenTelemetry integration. The best LLM observability platform will offer automatic cost tracking, token usage analytics, and seamless integration with frameworks like Vercel AI SDK, Haystack, and LiteLLM.


4. Setup: Vercel AI SDK Telemetry + OpenTelemetry

The Vercel AI SDK is the go-to choice for Next.js developers building AI applications. It provides a unified interface for working with multiple LLM providers (OpenAI, Anthropic, Google, etc.) and handles streaming, tool calling, and structured outputs.

However, managing Vercel AI SDK telemetry with OpenTelemetry in a serverless environment (Vercel Functions) comes with significant overhead. Setting up proper observability for Vercel AI SDK requires careful configuration of OpenTelemetry instrumentation. Let's explore both approaches.

Option A: Manual OTel Setup (The Hard Way)

To manually instrument Vercel AI SDK, you must use the @opentelemetry/sdk-node package and configure an instrumentation.ts file. Here's what's involved:

Step 1: Install Dependencies

npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/exporter-trace-otlp-http @opentelemetry/instrumentation @opentelemetry/resources @opentelemetry/semantic-conventions

Step 2: Create instrumentation.ts

Vercel requires an instrumentation.ts file in your project root to initialize OpenTelemetry before your application code runs:

```typescript
// instrumentation.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
// These two require the @opentelemetry/instrumentation-http and
// @opentelemetry/instrumentation-express packages
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express';

const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'vercel-ai-app',
    [SemanticResourceAttributes.SERVICE_VERSION]: process.env.VERCEL_GIT_COMMIT_SHA || '1.0.0',
    [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.VERCEL_ENV || 'development',
  }),
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'https://api.keywordsai.co/v1/traces',
    headers: {
      'Authorization': `Bearer ${process.env.KEYWORDSAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
  }),
  instrumentations: [
    new HttpInstrumentation(),
    new ExpressInstrumentation(),
  ],
});

sdk.start();

// Ensure spans are flushed before the process exits
process.on('SIGTERM', () => {
  sdk.shutdown()
    .then(() => console.log('OpenTelemetry terminated'))
    .catch((error) => console.log('Error terminating OpenTelemetry', error))
    .finally(() => process.exit(0));
});
```

Step 3: Configure next.config.js

You need to enable the experimental instrumentationHook:

```javascript
// next.config.js
module.exports = {
  experimental: {
    instrumentationHook: true,
  },
};
```

Step 4: Instrument Your API Routes

Now you need to manually wrap every AI SDK call with spans:

```typescript
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { generateText, streamText } from 'ai';
import { trace, context, SpanStatusCode, Span } from '@opentelemetry/api';
import { NextRequest, NextResponse } from 'next/server';

const tracer = trace.getTracer('vercel-ai-sdk');

export async function POST(request: NextRequest) {
  const { messages, stream } = await request.json();

  // Start a trace for this request
  const span = tracer.startSpan('ai.chat', {
    attributes: {
      'llm.framework': 'vercel-ai-sdk',
      'llm.provider': 'openai',
      'http.method': 'POST',
      'http.route': '/api/chat',
    },
  });

  try {
    const activeContext = trace.setSpan(context.active(), span);

    return await context.with(activeContext, async () => {
      if (stream) {
        return handleStreaming(messages, span);
      } else {
        return handleNonStreaming(messages, span);
      }
    });
  } catch (error) {
    span.recordException(error as Error);
    span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
    throw error;
  } finally {
    span.end();
  }
}

async function handleNonStreaming(messages: any[], span: Span) {
  // Child spans inherit the parent from the active context set in POST
  const generateSpan = tracer.startSpan('ai.generate');

  try {
    generateSpan.setAttributes({
      'llm.messages.count': messages.length,
      'llm.messages.last': JSON.stringify(messages[messages.length - 1]),
    });

    const { text, usage, finishReason } = await generateText({
      model: openai('gpt-4'),
      messages: messages,
    });

    // Extract and set LLM-specific attributes
    generateSpan.setAttributes({
      'llm.model': 'gpt-4',
      'llm.response': text,
      'llm.tokens.prompt': usage.promptTokens,
      'llm.tokens.completion': usage.completionTokens,
      'llm.tokens.total': usage.totalTokens,
      'llm.finish_reason': finishReason,
      'llm.cost': calculateCost(usage.promptTokens, usage.completionTokens, 'gpt-4'),
    });

    generateSpan.setStatus({ code: SpanStatusCode.OK });
    return NextResponse.json({ text });
  } catch (error) {
    generateSpan.recordException(error as Error);
    generateSpan.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
    throw error;
  } finally {
    generateSpan.end();
  }
}

async function handleStreaming(messages: any[], span: Span) {
  const streamSpan = tracer.startSpan('ai.stream');

  try {
    streamSpan.setAttribute('llm.streaming', true);

    const result = await streamText({
      model: openai('gpt-4'),
      messages: messages,
    });

    // For streaming, we need to track tokens differently
    // This is a simplified example - real implementation is more complex
    streamSpan.setAttribute('llm.model', 'gpt-4');

    return result.toDataStreamResponse();
  } catch (error) {
    streamSpan.recordException(error as Error);
    streamSpan.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
    throw error;
  } finally {
    streamSpan.end();
  }
}

function calculateCost(promptTokens: number, completionTokens: number, model: string): number {
  // Pricing as of 2026 (example - update with actual prices)
  const pricing: Record<string, { prompt: number; completion: number }> = {
    'gpt-4': { prompt: 0.03 / 1000, completion: 0.06 / 1000 },
    'gpt-4-turbo': { prompt: 0.01 / 1000, completion: 0.03 / 1000 },
    'gpt-3.5-turbo': { prompt: 0.0015 / 1000, completion: 0.002 / 1000 },
  };

  const modelPricing = pricing[model] || pricing['gpt-3.5-turbo'];
  return (promptTokens * modelPricing.prompt) + (completionTokens * modelPricing.completion);
}
```

Step 5: Handle Edge Runtime Issues

Vercel's Edge Runtime doesn't support Node.js APIs, which means OpenTelemetry SDKs that rely on Node.js won't work. You have two options:

  1. Switch to Node.js Runtime: Add export const runtime = 'nodejs' to your route
  2. Use Edge-Compatible Alternatives: Use Web APIs and manual instrumentation (more complex)

The Challenges with Manual Setup

  • Runtime Compatibility: Edge vs. Node.js runtime conflicts
  • Span Lifecycle Management: Ensuring spans are flushed before serverless functions terminate
  • Cost Calculation: You must manually implement pricing logic for every model
  • Context Propagation: Handling async boundaries and context loss
  • Maintenance Burden: Every SDK update might break your instrumentation

Read the full manual guide: Vercel OTel Docs

Option B: The Keywords AI Way

Keywords AI replaces dozens of lines of configuration with a single package. We handle runtime compatibility, the mapping of LLM-specific metadata (tokens, costs, model IDs), and the span lifecycle automatically.

Full setup guide: Vercel AI SDK + Keywords AI tracing

Step 1: Install Keywords AI Tracing

npm install @keywordsai/tracing-node

Step 2: Initialize in instrumentation.ts

```typescript
// instrumentation.ts
import { KeywordsTracer } from '@keywordsai/tracing-node';

KeywordsTracer.init({
  apiKey: process.env.KEYWORDSAI_API_KEY,
  serviceName: 'vercel-ai-app',
});
```

That's it. No manual SDK configuration, no exporter setup, no span lifecycle management.

Step 3: Use in Your API Routes

```typescript
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { generateText, streamText } from 'ai';
import { NextRequest, NextResponse } from 'next/server';

export async function POST(request: NextRequest) {
  const { messages, stream } = await request.json();

  if (stream) {
    const result = await streamText({
      model: openai('gpt-4'),
      messages: messages,
    });
    return result.toDataStreamResponse();
  } else {
    const { text } = await generateText({
      model: openai('gpt-4'),
      messages: messages,
    });
    return NextResponse.json({ text });
  }
}
```

That's it. Keywords AI automatically:

  • Detects Vercel AI SDK calls
  • Creates spans with proper hierarchy
  • Extracts token usage, costs, and model information
  • Handles streaming vs. non-streaming
  • Works in both Edge and Node.js runtimes
  • Flushes spans before function termination

The Benefits

  • 2-Minute Setup: vs. 2-4 hours for manual setup
  • Zero Maintenance: Updates automatically with SDK changes
  • Automatic Cost Tracking: No manual pricing calculations
  • Built-in Dashboard: No need for separate visualization tools
  • Edge Runtime Support: Works out of the box

5. Setup: Haystack + OpenTelemetry

Haystack by deepset is a powerhouse for Python-based RAG pipelines. It's designed for production-ready LLM applications with built-in support for document stores, retrievers, generators, and complex pipelines.

Haystack has built-in support for OpenTelemetry, but the "wiring" is left to you. Let's compare the manual approach vs. the Keywords AI way.

Option A: Manual OTel Setup (The Hard Way)

For Haystack, you need to set up a Python tracer provider and link it to Haystack's internal tracing backend. Here's the complete setup:

Step 1: Install Dependencies

pip install haystack-ai opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-http opentelemetry-instrumentation

Step 2: Initialize OpenTelemetry Provider

```python
# tracing_setup.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes
import os

# Create resource with service information
resource = Resource.create({
    ResourceAttributes.SERVICE_NAME: "haystack-rag-app",
    ResourceAttributes.SERVICE_VERSION: os.getenv("APP_VERSION", "1.0.0"),
    ResourceAttributes.DEPLOYMENT_ENVIRONMENT: os.getenv("ENVIRONMENT", "production"),
})

# Initialize tracer provider
provider = TracerProvider(resource=resource)
trace.set_tracer_provider(provider)

# Create OTLP exporter
exporter = OTLPSpanExporter(
    endpoint="https://api.keywordsai.co/v1/traces",
    headers={
        "Authorization": f"Bearer {os.getenv('KEYWORDSAI_API_KEY')}",
        "Content-Type": "application/json",
    },
)

# Add batch processor
processor = BatchSpanProcessor(exporter)
provider.add_span_processor(processor)

# Get tracer
tracer = trace.get_tracer(__name__)
```

Step 3: Configure Haystack to Use OpenTelemetry

Haystack has a tracing abstraction that you need to connect to OpenTelemetry:

```python
# haystack_tracing.py
from haystack.tracing import OpenTelemetryTracer
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

class CustomOpenTelemetryTracer(OpenTelemetryTracer):
    """Custom tracer that adds LLM-specific attributes"""

    def __init__(self):
        super().__init__(trace.get_tracer("haystack"))

    def trace(self, operation_name: str, tags: dict = None, **kwargs):
        """Override to add custom attributes"""
        span = self.tracer.start_span(operation_name)

        if tags:
            for key, value in tags.items():
                # Map Haystack tags to OTel attributes
                if key == "model":
                    span.set_attribute("llm.model", value)
                elif key == "provider":
                    span.set_attribute("llm.provider", value)
                elif key == "prompt_tokens":
                    span.set_attribute("llm.tokens.prompt", value)
                elif key == "completion_tokens":
                    span.set_attribute("llm.tokens.completion", value)
                else:
                    span.set_attribute(f"haystack.{key}", str(value))

        return span

# Initialize Haystack tracing
from haystack import tracing
haystack_tracer = CustomOpenTelemetryTracer()
tracing.set_backend(haystack_tracer)
```

Step 4: Instrument Your Haystack Pipeline

Now you need to manually instrument every component in your pipeline:

```python
# pipeline.py
from haystack import Pipeline, Document
from haystack.components.builders import PromptBuilder
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores import InMemoryDocumentStore
from opentelemetry import trace
import os

# Import tracing setup
from tracing_setup import tracer

def create_rag_pipeline():
    """Create a RAG pipeline with manual OpenTelemetry instrumentation"""

    # Document store
    document_store = InMemoryDocumentStore()

    # Retriever
    retriever = InMemoryBM25Retriever(document_store=document_store, top_k=5)

    # Prompt builder
    prompt_template = """
    Given the following information, answer the question.

    Context:
    {% for document in documents %}
    {{ document.content }}
    {% endfor %}

    Question: {{ query }}
    Answer:
    """
    prompt_builder = PromptBuilder(template=prompt_template)

    # LLM generator
    generator = OpenAIGenerator(api_key=os.getenv("OPENAI_API_KEY"))

    # Create pipeline
    pipeline = Pipeline()
    pipeline.add_component("retriever", retriever)
    pipeline.add_component("prompt_builder", prompt_builder)
    pipeline.add_component("llm", generator)

    pipeline.connect("retriever", "prompt_builder.documents")
    pipeline.connect("prompt_builder", "llm.prompt")

    return pipeline

def run_rag_query(query: str, pipeline: Pipeline):
    """Run a RAG query with full OpenTelemetry tracing"""

    # Start root span
    with tracer.start_as_current_span("haystack.rag_pipeline") as root_span:
        root_span.set_attribute("query", query)
        root_span.set_attribute("llm.framework", "haystack")

        # Retrieval span
        with tracer.start_as_current_span("haystack.retrieval") as retrieval_span:
            # Manually call retriever to get documents
            documents = pipeline.get_component("retriever").run(query=query)
            retrieval_span.set_attribute("retrieval.doc_count", len(documents["documents"]))
            retrieval_span.set_attribute("retrieval.query", query)

        # Prompt building span
        with tracer.start_as_current_span("haystack.prompt_building") as prompt_span:
            prompt = pipeline.get_component("prompt_builder").run(
                query=query,
                documents=documents["documents"]
            )
            prompt_span.set_attribute("prompt.length", len(prompt["prompt"]))

        # LLM generation span
        with tracer.start_as_current_span("haystack.generation") as gen_span:
            response = pipeline.get_component("llm").run(prompt=prompt["prompt"])

            # Extract usage information (this varies by generator)
            if hasattr(response, "meta") and "usage" in response.meta:
                usage = response.meta["usage"]
                gen_span.set_attribute("llm.tokens.prompt", usage.get("prompt_tokens", 0))
                gen_span.set_attribute("llm.tokens.completion", usage.get("completion_tokens", 0))
                gen_span.set_attribute("llm.tokens.total", usage.get("total_tokens", 0))

            # Extract model information
            if hasattr(response, "meta") and "model" in response.meta:
                gen_span.set_attribute("llm.model", response.meta["model"])

            # Calculate cost (manual implementation required)
            gen_span.set_attribute("llm.cost", calculate_llm_cost(response))

            gen_span.set_attribute("llm.response", response["replies"][0])

        return response

def calculate_llm_cost(response):
    """Manually calculate LLM cost - you must implement this for every model"""
    # This is a simplified example - real implementation needs pricing for all models
    if not hasattr(response, "meta") or "usage" not in response.meta:
        return 0

    usage = response.meta["usage"]
    model = response.meta.get("model", "gpt-3.5-turbo")

    # Pricing table (you must maintain this)
    pricing = {
        "gpt-4": {"prompt": 0.03 / 1000, "completion": 0.06 / 1000},
        "gpt-4-turbo": {"prompt": 0.01 / 1000, "completion": 0.03 / 1000},
        "gpt-3.5-turbo": {"prompt": 0.0015 / 1000, "completion": 0.002 / 1000},
        "claude-3-opus": {"prompt": 0.015 / 1000, "completion": 0.075 / 1000},
        # ... you must add pricing for every model you use
    }

    model_pricing = pricing.get(model, pricing["gpt-3.5-turbo"])
    prompt_tokens = usage.get("prompt_tokens", 0)
    completion_tokens = usage.get("completion_tokens", 0)

    return (prompt_tokens * model_pricing["prompt"]) + (completion_tokens * model_pricing["completion"])

# Ensure spans are flushed before process exits
import atexit

def flush_spans():
    provider = trace.get_tracer_provider()
    if hasattr(provider, "force_flush"):
        provider.force_flush()

atexit.register(flush_spans)
```

The Challenges with Manual Setup

  • Span Lifecycle Management: You must ensure spans are flushed before the Python process exits. If your script ends too fast, you lose your logs.
  • Cost Calculation: You must manually implement pricing logic for every model (OpenAI, Anthropic, Google, etc.). This is error-prone and requires constant updates.
  • Component Instrumentation: Haystack pipelines can have many components. Manually instrumenting each one is tedious.
  • Usage Extraction: Different generators (OpenAI, Anthropic, etc.) return usage information in different formats. You must handle each case.
  • LiteLLM Integration: If you use LiteLLM within Haystack (common for multi-provider support), you need separate instrumentation.

Read the manual docs: Haystack Tracing Guide

Option B: The Keywords AI Way

With Keywords AI, we've built a dedicated exporter specifically for the Haystack OpenTelemetry integration. Here's how simple it is:

Full setup guide: Haystack + Keywords AI tracing

Step 1: Install Keywords AI Tracing

pip install keywords-ai-tracing

Step 2: Initialize (One Line)

```python
# main.py
import os

from keywords_ai_tracing import KeywordsTracer

# Initialize - everything is handled automatically
tracer = KeywordsTracer(api_key=os.getenv("KEYWORDSAI_API_KEY"))

# That's it! Haystack is now automatically instrumented
```

Step 3: Use Your Pipeline Normally

```python
import os

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

# Create your pipeline as normal
pipeline = Pipeline()
pipeline.add_component("prompt_builder", PromptBuilder(template="Answer: {{query}}"))
pipeline.add_component("llm", OpenAIGenerator(api_key=os.getenv("OPENAI_API_KEY")))
pipeline.connect("prompt_builder.prompt", "llm.prompt")

# Run it - tracing happens automatically
result = pipeline.run({"query": "What is the capital of France?"})
```

That's it. Keywords AI automatically:

  • Detects all Haystack components
  • Creates spans for retrieval, prompt building, and generation
  • Extracts token usage and costs (for all providers)
  • Handles LiteLLM if you use it within Haystack
  • Flushes spans before process termination
  • Maps everything to a beautiful dashboard

Why It's Better

  • Zero Configuration: No manual tracer setup, no span lifecycle management
  • Automatic Cost Tracking: Supports 100+ models with up-to-date pricing
  • LiteLLM Support: If you use LiteLLM within Haystack, it's automatically traced
  • Component Detection: Automatically instruments all Haystack components
  • Production Ready: Handles edge cases, errors, and async operations

6. Setup: LiteLLM Observability + OpenTelemetry

LiteLLM is a unified proxy that standardizes calls across 100+ LLM providers. It's perfect for:

  • Multi-provider applications (switching between OpenAI, Anthropic, Google, etc.)
  • Cost optimization (automatic fallbacks to cheaper models)
  • Rate limit management
  • Load balancing across providers

LiteLLM observability is crucial for understanding which models perform best and for optimizing costs across providers. LiteLLM supports observability through callbacks, but integrating those callbacks with OpenTelemetry requires manual work. Let's compare both approaches.

Option A: Manual OTel Setup (The Hard Way)

LiteLLM provides callback hooks that you can use to send data to OpenTelemetry. Here's the complete manual setup:

Step 1: Install Dependencies

pip install litellm opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-http

Step 2: Create OpenTelemetry Callback

```python
# litellm_otel_callback.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes
import os
from typing import Optional, Dict, Any
from litellm import completion

# Initialize OpenTelemetry
resource = Resource.create({
    ResourceAttributes.SERVICE_NAME: "litellm-proxy",
})

provider = TracerProvider(resource=resource)
trace.set_tracer_provider(provider)

exporter = OTLPSpanExporter(
    endpoint="https://api.keywordsai.co/v1/traces",
    headers={
        "Authorization": f"Bearer {os.getenv('KEYWORDSAI_API_KEY')}",
    },
)

processor = BatchSpanProcessor(exporter)
provider.add_span_processor(processor)

tracer = trace.get_tracer(__name__)

# Store active spans in a thread-safe way
from threading import local
_thread_local = local()

def get_current_span():
    """Get the current span from thread local storage"""
    return getattr(_thread_local, 'span', None)

def set_current_span(span):
    """Set the current span in thread local storage"""
    _thread_local.span = span

class LiteLLMOpenTelemetryCallback:
    """OpenTelemetry callback for LiteLLM"""

    def __init__(self):
        self.tracer = tracer

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        """Called when a completion succeeds"""
        span = get_current_span()
        if span:
            # Extract model information
            model = kwargs.get("model", "unknown")
            span.set_attribute("llm.model", model)
            span.set_attribute("llm.provider", self._extract_provider(model))

            # Extract usage information
            if hasattr(response_obj, "usage"):
                usage = response_obj.usage
                span.set_attribute("llm.tokens.prompt", usage.prompt_tokens)
                span.set_attribute("llm.tokens.completion", usage.completion_tokens)
                span.set_attribute("llm.tokens.total", usage.total_tokens)

            # Extract response
            if hasattr(response_obj, "choices") and len(response_obj.choices) > 0:
                span.set_attribute("llm.response", response_obj.choices[0].message.content)

            # Calculate latency
            latency_ms = (end_time - start_time) * 1000
            span.set_attribute("llm.latency_ms", latency_ms)

            # Calculate cost (manual implementation required)
            cost = self._calculate_cost(kwargs, response_obj)
            span.set_attribute("llm.cost", cost)

            # Extract request information
            if "messages" in kwargs:
                span.set_attribute("llm.messages.count", len(kwargs["messages"]))
                span.set_attribute("llm.messages.last", str(kwargs["messages"][-1]))

            if "temperature" in kwargs:
                span.set_attribute("llm.temperature", kwargs["temperature"])
            if "max_tokens" in kwargs:
                span.set_attribute("llm.max_tokens", kwargs["max_tokens"])

            span.set_status(trace.Status(trace.StatusCode.OK))
            span.end()
            set_current_span(None)

    def log_failure_event(self, kwargs, response_obj, start_time, end_time, error):
        """Called when a completion fails"""
        span = get_current_span()
        if span:
            span.record_exception(error)
            span.set_status(trace.Status(trace.StatusCode.ERROR, str(error)))
            span.end()
            set_current_span(None)

    def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        """Async version - same as sync"""
        self.log_success_event(kwargs, response_obj, start_time, end_time)

    def async_log_failure_event(self, kwargs, response_obj, start_time, end_time, error):
        """Async version - same as sync"""
        self.log_failure_event(kwargs, response_obj, start_time, end_time, error)

    def _extract_provider(self, model: str) -> str:
        """Extract provider name from model string"""
        if model.startswith("gpt-") or model.startswith("o1-"):
            return "openai"
        elif model.startswith("claude-") or model.startswith("sonnet-"):
            return "anthropic"
        elif model.startswith("gemini-") or "google" in model.lower():
            return "google"
        elif model.startswith("llama-") or "meta" in model.lower():
            return "meta"
        else:
            return "unknown"

    def _calculate_cost(self, kwargs: Dict, response_obj: Any) -> float:
        """Manually calculate cost - you must implement this for all 100+ models"""
        model = kwargs.get("model", "gpt-3.5-turbo")

        if not hasattr(response_obj, "usage"):
            return 0

        usage = response_obj.usage

        # You must maintain pricing for 100+ models
        # This is a simplified example - real implementation is massive
        pricing = {
            # OpenAI
            "gpt-4": {"prompt": 0.03 / 1000, "completion": 0.06 / 1000},
            "gpt-4-turbo": {"prompt": 0.01 / 1000, "completion": 0.03 / 1000},
            "gpt-3.5-turbo": {"prompt": 0.0015 / 1000, "completion": 0.002 / 1000},
            "o1-preview": {"prompt": 0.015 / 1000, "completion": 0.06 / 1000},
            # Anthropic
            "claude-3-opus": {"prompt": 0.015 / 1000, "completion": 0.075 / 1000},
            "claude-3-sonnet": {"prompt": 0.003 / 1000, "completion": 0.015 / 1000},
            "claude-3-haiku": {"prompt": 0.00025 / 1000, "completion": 0.00125 / 1000},
            # Google
            "gemini-pro": {"prompt": 0.0005 / 1000, "completion": 0.0015 / 1000},
            # ... you need pricing for 100+ more models
        }

        model_pricing = pricing.get(model, {"prompt": 0, "completion": 0})
        return (usage.prompt_tokens * model_pricing["prompt"]) + (usage.completion_tokens * model_pricing["completion"])

# Create callback instance
otel_callback = LiteLLMOpenTelemetryCallback()
```

Step 3: Use LiteLLM with Manual Span Creation

```python
# main.py
from litellm import completion
from opentelemetry import trace
from litellm_otel_callback import otel_callback, tracer, set_current_span
import os

def call_llm(messages, model="gpt-4"):
    """Call LiteLLM with manual OpenTelemetry instrumentation"""

    # Start span manually
    span = tracer.start_span("litellm.completion")
    set_current_span(span)

    try:
        span.set_attribute("llm.framework", "litellm")
        span.set_attribute("llm.model", model)
        span.set_attribute("llm.messages", str(messages))

        response = completion(
            model=model,
            messages=messages,
            api_key=os.getenv("OPENAI_API_KEY"),
            callbacks=[otel_callback],  # Use our callback
        )

        return response
    except Exception as e:
        span.record_exception(e)
        span.set_status(trace.Status(trace.StatusCode.ERROR, str(e)))
        raise
    finally:
        # Span is ended in the callback, but we clean up here
        if span:
            set_current_span(None)

# Ensure spans are flushed
import atexit

def flush_spans():
    provider = trace.get_tracer_provider()
    if hasattr(provider, "force_flush"):
        provider.force_flush()

atexit.register(flush_spans)
```

The Challenges with Manual Setup

  • 100+ Models: You must maintain pricing for every model LiteLLM supports
  • Provider-Specific Logic: Different providers return usage data in different formats
  • Callback Complexity: Managing span lifecycle across sync and async calls
  • Thread Safety: Ensuring spans are correctly associated with requests in multi-threaded environments
  • Error Handling: Properly recording exceptions and setting span status
  • Cost Calculation: Keeping pricing up-to-date as providers change rates

Option B: The Keywords AI Way

Keywords AI has native LiteLLM support. Here's how simple it is:

Full setup guide: LiteLLM + Keywords AI

Step 1: Install Keywords AI Tracing

pip install keywords-ai-tracing

Step 2: Initialize (One Line)

```python
# main.py
import os

from keywords_ai_tracing import KeywordsTracer
from litellm import completion

# Initialize - LiteLLM is automatically instrumented
KeywordsTracer.init(api_key=os.getenv("KEYWORDSAI_API_KEY"))

# That's it!
```

Step 3: Use LiteLLM Normally

```python
import os

from litellm import completion

# Use LiteLLM as normal - tracing happens automatically
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key=os.getenv("OPENAI_API_KEY"),
)

# Or use any of the 100+ models
response = completion(
    model="claude-3-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key=os.getenv("ANTHROPIC_API_KEY"),
)
```

That's it. Keywords AI automatically:

  • Detects all LiteLLM calls
  • Extracts model, tokens, and usage for all 100+ supported models
  • Calculates costs with up-to-date pricing
  • Handles sync and async calls
  • Works with LiteLLM proxy mode
  • Maps everything to a unified dashboard

Why It's Better

  • 100+ Models Supported: Automatic cost tracking for every model LiteLLM supports
  • Zero Configuration: No callbacks, no manual span management
  • Proxy Mode Support: Works seamlessly with LiteLLM proxy
  • Automatic Updates: Pricing updates automatically as providers change rates
  • Multi-Provider: Handles OpenAI, Anthropic, Google, Meta, and 96+ more providers

7. Advanced Topics: Distributed Tracing, Context Propagation, and Sampling

Once you have basic OpenTelemetry instrumentation working, you'll want to understand these advanced concepts for production deployments.

Distributed Tracing

In microservices architectures, a single user request might trigger:

  1. API Gateway (receives request)
  2. Auth Service (validates token)
  3. RAG Service (retrieves documents)
  4. LLM Service (generates response)
  5. Post-Processing Service (formats output)

Distributed tracing allows you to follow this request across all services. OpenTelemetry achieves this through context propagation.

How Context Propagation Works

When Service A calls Service B, it includes trace context in the HTTP headers:

```python
# Service A
import requests
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer(__name__)

# Make the span current so inject() picks up its context
with tracer.start_as_current_span("service_a.operation"):
    headers = {}

    # Inject trace context into the outgoing headers
    inject(headers)

    # Make HTTP request to Service B
    response = requests.get("http://service-b/api", headers=headers)
```

```python
# Service B
from opentelemetry import trace
from opentelemetry.propagate import extract

tracer = trace.get_tracer(__name__)

# Extract trace context from the incoming request headers
# (here `request` is the incoming request object from your web framework)
context = extract(request.headers)

# Continue the trace
with tracer.start_as_current_span("service_b.operation", context=context):
    # This span will be a child of Service A's span
    pass
```

Context Propagation in LLM Applications

For LLM applications, context propagation is critical when (see the sketch after this list):

  • Your frontend calls your backend API
  • Your backend calls multiple LLM providers
  • You use function calling or tool use (each tool call should be a child span)
  • You have agentic loops (each iteration should be a child span)
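
Within a single process, nesting spans is usually enough: each iteration and each tool call becomes a child of the agent's root span because it is started while the parent is active. A minimal sketch (the span and attribute names are illustrative):

```python
from opentelemetry import trace

tracer = trace.get_tracer("agent")

with tracer.start_as_current_span("agent.run"):
    for step in range(3):
        # Each iteration is a child span of agent.run
        with tracer.start_as_current_span("agent.iteration") as iteration_span:
            iteration_span.set_attribute("agent.step", step)
            # Each tool call is a child span of the iteration
            with tracer.start_as_current_span("agent.tool_call") as tool_span:
                tool_span.set_attribute("tool.name", "web_search")
                # ... execute the tool here ...
```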

Sampling

In high-traffic applications, tracing every request can be expensive. Sampling allows you to trace only a percentage of requests.

Head-Based Sampling

Decide whether to sample at the start of the trace:

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

# Sample 10% of traces
sampler = TraceIdRatioBased(0.1)
provider = TracerProvider(sampler=sampler)
```

Tail-Based Sampling

Decide whether to keep a trace after it completes (useful for keeping error traces):

This requires an OpenTelemetry Collector with the tail sampling processor enabled; it's configured in the collector config, not in application code.

Smart Sampling for LLMs

For LLM applications, consider the following (a sampler sketch follows this list):

  • Always sample errors: Keep 100% of failed requests
  • Sample by cost: Keep traces for expensive requests (high token usage)
  • Sample by user: Keep traces for specific users (e.g., beta testers)
  • Sample by model: Keep more traces for new/experimental models
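
Note that head-based sampling can only see what is known when the root span starts, so a rule like "always keep errors" strictly requires tail-based sampling. You can still approximate user- or model-based rules with a custom sampler; a rough sketch, assuming your app sets attributes such as llm.experiment_variant or llm.beta_user (hypothetical names) when starting the root span:

```python
from opentelemetry.sdk.trace.sampling import (
    Decision,
    Sampler,
    SamplingResult,
    TraceIdRatioBased,
)

class LLMHeadSampler(Sampler):
    """Keep all flagged traffic; fall back to ratio sampling otherwise."""

    def __init__(self, ratio: float = 0.1):
        self._fallback = TraceIdRatioBased(ratio)

    def should_sample(self, parent_context, trace_id, name,
                      kind=None, attributes=None, links=None, trace_state=None):
        attributes = attributes or {}
        # Hypothetical attributes your app sets at span creation time
        if attributes.get("llm.experiment_variant") or attributes.get("llm.beta_user"):
            return SamplingResult(Decision.RECORD_AND_SAMPLE, attributes)
        return self._fallback.should_sample(
            parent_context, trace_id, name, kind, attributes, links, trace_state
        )

    def get_description(self) -> str:
        return "LLMHeadSampler"
```

You would then pass an instance of this sampler to TracerProvider(sampler=...) in place of the ratio sampler shown above.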

Resource Attributes

Resource attributes describe the service that generated the telemetry:

```python
import socket

from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes

resource = Resource.create({
    ResourceAttributes.SERVICE_NAME: "my-llm-app",
    ResourceAttributes.SERVICE_VERSION: "1.2.3",
    ResourceAttributes.DEPLOYMENT_ENVIRONMENT: "production",
    ResourceAttributes.HOST_NAME: socket.gethostname(),
})
```

These attributes appear on every span and help you filter traces in your backend.

Custom Attributes and Events

Beyond the standard attributes, you can add custom ones:

```python
span = tracer.start_span("llm.completion")
span.set_attribute("llm.user_id", user_id)
span.set_attribute("llm.session_id", session_id)
span.set_attribute("llm.experiment_variant", "A")  # For A/B testing

# Add events (timestamped annotations)
span.add_event("retrieval.started", {"query": query})
span.add_event("retrieval.completed", {"doc_count": 5})
```

8. The Verdict: Manual vs. Automated Tracing

While manual OpenTelemetry setup is "free" (no vendor cost), the engineering hours spent maintaining it are not. Let's break down the real costs:

Time Investment

| Task | Manual OTel | Keywords AI |
|---|---|---|
| Initial Setup | 2-4 Hours | 2 Minutes |
| Cost Calculation Implementation | 4-8 Hours (for 10 models) | 0 (automatic) |
| Maintenance per SDK Update | 1-2 Hours | 0 (automatic) |
| Adding New Model Support | 30-60 Minutes per model | 0 (automatic) |
| Dashboard Setup | 2-4 Hours (Jaeger/Grafana) | 0 (included) |
| Edge Runtime Compatibility | 4-8 Hours | 0 (works out of the box) |

Feature Comparison

| Feature | Manual OTel | Keywords AI |
|---|---|---|
| Setup Time | 2-4 Hours | 2 Minutes |
| Cost Tracking | Manual calculation (error-prone) | Built-in / automatic (100+ models) |
| Maintenance | High (updates with every SDK change) | Zero |
| Dashboard | Requires an extra tool (Jaeger/Honeycomb/Grafana) | Included |
| LLM-Specific Views | Must build custom dashboards | Pre-built (costs, tokens, latency) |
| User Analytics | Must implement custom logic | Built-in |
| Alerting | Must set up separately | Built-in |
| Prompt Management | Not included | Included |
| Multi-Provider Support | Manual implementation | Automatic (100+ providers) |

The Hidden Costs of Manual Setup

  1. Pricing Maintenance: LLM providers change pricing frequently. You must update your cost calculation code every time.
  2. SDK Compatibility: When Vercel AI SDK, Haystack, or LiteLLM release updates, your instrumentation might break.
  3. Edge Cases: Handling async operations, streaming, error cases, and context propagation correctly is non-trivial.
  4. Dashboard Development: Building useful dashboards for LLM-specific metrics (cost per user, token efficiency, etc.) takes significant time.
  5. Team Onboarding: New engineers must understand your custom instrumentation code.

When Manual OTel Makes Sense

Manual setup might be worth it if:

  • You're building a custom observability platform
  • You have strict compliance requirements that prevent using third-party services
  • You're already heavily invested in a specific backend (e.g., Datadog, New Relic)
  • You have a dedicated observability team

When Keywords AI Makes Sense

When choosing the best LLM observability platform for your team, Keywords AI is the better choice if:

  • You want to focus on building AI features, not observability infrastructure
  • You need LLM-specific analytics (costs, token usage, quality metrics)
  • You want to get started quickly (2 minutes vs. 2-4 hours)
  • You use multiple LLM providers and want unified observability
  • You want built-in features like prompt management and user analytics
  • You need an LLM observability platform that works seamlessly with Vercel AI SDK telemetry, LiteLLM observability, and Haystack pipelines

The ROI Calculation

Let's say you spend:

  • 4 hours on initial setup (manual)
  • 2 hours/month on maintenance
  • 1 hour per new model added (10 models/year = 10 hours)

Total: 4 + (2 × 12) + 10 = 38 hours/year

At $100/hour (senior engineer rate), that's $3,800/year in engineering time.

Keywords AI pricing starts at much less than this, and you get:

  • Zero maintenance
  • Automatic updates
  • Built-in dashboards
  • LLM-specific features
  • Support

The verdict: Unless you have specific requirements that prevent using a third-party service, Keywords AI provides better ROI for most teams.


9. Production Best Practices

Once you have OpenTelemetry instrumentation working, follow these best practices for production deployments:

1. Instrument at the Framework Level

Don't add tracing code in every function. Instead, instrument where you initialize your LLM clients:

```python
# ✅ Good: Instrument at initialization
from keywords_ai_tracing import KeywordsTracer
KeywordsTracer.init(api_key=os.getenv("KEYWORDSAI_API_KEY"))

# Now all LLM calls are automatically traced
response = completion(model="gpt-4", messages=messages)

# ❌ Bad: Manual instrumentation everywhere
def call_llm(messages):
    span = tracer.start_span("llm.call")  # Don't do this everywhere
    # ...
```

2. Capture User Context

Include user IDs and session IDs in your spans to enable user-level analytics:

span.set_attribute("llm.user_id", user_id)
span.set_attribute("llm.session_id", session_id)
span.set_attribute("llm.customer_id", customer_id)

3. Set Appropriate Sampling Rates

For production:

  • Development: 100% sampling (see everything)
  • Staging: 50% sampling
  • Production: 10% sampling, but always sample errors

```python
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

sampler = TraceIdRatioBased(0.1)  # 10% sampling
```

4. Monitor Trace Volume

High trace volume can be expensive. Monitor:

  • Spans per second
  • Trace size (number of spans per trace)
  • Export latency

If volume is too high, adjust sampling or reduce span granularity.

5. Use Semantic Conventions

Follow OpenTelemetry's semantic conventions for attribute names:

```python
# ✅ Good: Use semantic conventions
span.set_attribute("llm.model", "gpt-4")
span.set_attribute("llm.tokens.prompt", 150)
span.set_attribute("http.method", "POST")

# ❌ Bad: Custom attribute names
span.set_attribute("model_name", "gpt-4")  # Inconsistent
```

6. Handle Errors Gracefully

Always record exceptions and set span status:

```python
try:
    response = completion(model="gpt-4", messages=messages)
    span.set_status(trace.Status(trace.StatusCode.OK))
except Exception as e:
    span.record_exception(e)
    span.set_status(trace.Status(trace.StatusCode.ERROR, str(e)))
    raise
```

7. Set Up Alerts

Configure alerts for:

  • Cost spikes: Unexpected increases in LLM spending
  • Latency degradation: p95 latency above threshold
  • Error rate increases: More than X% of requests failing
  • Token usage anomalies: Unusual token consumption patterns

8. Review Traces Regularly

Don't just collect traces—use them:

  • Weekly reviews: Identify optimization opportunities
  • Post-incident analysis: Understand what went wrong
  • Cost optimization: Find expensive operations to optimize
  • Quality monitoring: Track response quality over time

9. Version Your Instrumentation

When you update your instrumentation code, include version information:

```python
resource = Resource.create({
    ResourceAttributes.SERVICE_VERSION: "1.2.3",
    "instrumentation.version": "2.0.0",
})
```

This helps you correlate issues with code changes.

10. Test Your Instrumentation

Don't assume your instrumentation works. Test it:

  • In development with 100% sampling
  • With error cases (timeouts, rate limits, invalid inputs)
  • With high load (ensure it doesn't impact performance)
  • With different models (verify cost calculation is correct)

Conclusion

OpenTelemetry is the industry standard for LLM observability, and it's essential for production LLM applications. As an LLM monitoring open source solution, OpenTelemetry provides the foundation for comprehensive observability. Whether you're using Haystack for RAG pipelines, implementing Vercel AI SDK telemetry for Next.js apps, or setting up LiteLLM observability for multi-provider setups, OpenTelemetry provides a unified way to observe your applications.

The choice between manual setup and a managed LLM observability platform comes down to:

  • Manual OTel: More control, but significant engineering investment. You'll need to build custom dashboards and maintain pricing tables for all models.
  • Best LLM Observability Platform (Keywords AI): Faster setup, zero maintenance, LLM-specific features. An LLM observability platform that handles everything automatically.

For most teams building production LLM applications, choosing the best LLM observability platform provides better ROI. With Keywords AI, you get production-ready LLM observability in 2 minutes instead of 2-4 hours, automatic cost tracking for 100+ models, and built-in dashboards optimized for LLM workflows. Whether you need Vercel AI SDK telemetry, LiteLLM observability, or Haystack instrumentation, the right LLM observability platform will save you hundreds of engineering hours.

Ready to see your traces?

Get started with Keywords AI in 60 seconds and start instrumenting your LLM applications today. Focus on building great AI features, not observability infrastructure.

About Keywords AI: Keywords AI is the leading developer platform for LLM applications, powering the best AI startups.