Building an AI agent is easy; knowing why it's failing is the hard part. As you move from simple chat completions to complex agentic loops with Haystack, the Vercel AI SDK, or LiteLLM, you quickly realize that traditional logging isn't enough. You need traces.
In this comprehensive guide, we dive deep into the industry standard: OpenTelemetry (OTel). We'll explore the technical "How-To" for the most popular LLM stacks and compare manual setup vs. the streamlined Keywords AI approach. By the end, you'll understand not just how to instrument your LLM applications, but why OpenTelemetry has become the de facto standard for production AI observability.
OpenTelemetry is not a backend; it is a vendor-neutral, open source observability framework. It provides a standardized set of APIs, SDKs, and protocols (OTLP) to collect "telemetry"—Traces, Metrics, and Logs. Think of it as the "USB-C of observability": a universal standard that works with any tool.
As an open-source project under the Cloud Native Computing Foundation (CNCF), OpenTelemetry gives you an open foundation for LLM monitoring that doesn't lock you into proprietary platforms. This makes it the ideal base for building comprehensive LLM observability into your applications.
OpenTelemetry is built around three core data types:
1. Traces: A trace represents the entire lifecycle of a request as it flows through your system. In LLM applications, a trace might capture:
2. Metrics: Quantitative measurements over time. For LLMs, critical metrics include:
3. Logs: Structured events with timestamps. While logs are less emphasized in OTel (compared to traces and metrics), they're still valuable for capturing:
In the context of LLMs, OTel works by creating a Trace, which represents the entire lifecycle of a user request. Inside that trace are Spans.
Each span contains a name, timing information, a status, and attributes: key-value pairs such as llm.model="gpt-4" or llm.tokens.prompt=150.

OpenTelemetry follows a strict specification that ensures interoperability. Among other things, the spec defines standardized semantic conventions for attribute names (e.g., http.method, db.query, llm.model). This standardization means you can instrument your Python Haystack pipeline, your TypeScript Vercel AI SDK app, and your LiteLLM proxy, and they'll all produce compatible traces that can be viewed in a single dashboard.
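To make this concrete, here's a minimal sketch (Python, using only the standard OpenTelemetry API and SDK) of creating a span with LLM-style attributes. The attribute names mirror the examples used throughout this guide rather than an official convention, and the console exporter is just for demonstration:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Minimal setup: print spans to the console so you can see the structure
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo")

# One span inside a trace, annotated with LLM-style attributes
with tracer.start_as_current_span("llm.completion") as span:
    span.set_attribute("llm.model", "gpt-4")
    span.set_attribute("llm.tokens.prompt", 150)
    span.set_attribute("llm.tokens.completion", 42)
```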
Unlike vendor-specific solutions (Datadog APM, New Relic, etc.), OpenTelemetry gives you:
To get OpenTelemetry running in production, your system needs four components working together:
The code inside your application that generates spans. This can be:
For LLM applications, you typically need manual instrumentation because:
The language-specific implementation of the OpenTelemetry API. Popular SDKs include:
- @opentelemetry/sdk-node (Node.js/TypeScript)
- opentelemetry-sdk (Python)
- opentelemetry-js (Browser/Edge)

The SDK handles:
The component that sends telemetry data to your backend. Common exporters:
For LLM observability, you'll typically use the OTLP HTTP exporter to send data to an LLM observability platform like Keywords AI, which provides LLM-specific dashboards and analytics. When choosing the best LLM observability platform for your needs, consider factors like cost tracking, token usage analytics, and integration with your existing stack.
Where your telemetry data lives and gets visualized. You have two options:
Option A: Direct Export Your application exports directly to a backend (e.g., Keywords AI, Datadog, Grafana Cloud).
Option B: OpenTelemetry Collector A middleman service that receives, processes, and routes telemetry data. The collector is useful for:
For most LLM applications, direct export is simpler and sufficient. The collector adds operational complexity that may not be necessary unless you're running at massive scale.
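As a rough sketch of what direct export looks like in Python, using the OTLP HTTP exporter (the endpoint and header below are placeholders modeled on the Keywords AI examples later in this guide; substitute your own backend's values):

```python
import os

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Point the OTLP HTTP exporter directly at your observability backend
exporter = OTLPSpanExporter(
    endpoint="https://api.keywordsai.co/v1/traces",  # placeholder endpoint
    headers={"Authorization": f"Bearer {os.getenv('KEYWORDSAI_API_KEY')}"},
)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))  # batch spans before export
trace.set_tracer_provider(provider)
```

The framework-specific sections below show this same wiring in context.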
LLM applications are fundamentally different from traditional web applications. They're non-deterministic, stateful, and involve complex multi-step workflows. When an agent fails, it could be because of:
OpenTelemetry's Distributed Tracing allows you to follow the request as it travels across different microservices, LLM providers, and infrastructure components. This is critical for debugging production issues.
Traditional application monitoring focuses on:
LLM observability requires tracking:
Consider this scenario: A user reports that your AI agent gave a wrong answer. With standard logging, you might see:
```
[INFO] User query: "What is the capital of France?"
[INFO] LLM response: "Paris"
```
But you don't know:
With OpenTelemetry traces, you get a complete picture:
```
Trace: user_query_abc123
├─ Span: rag.retrieval
│  ├─ Attributes: retrieval.doc_count=5, retrieval.latency_ms=120
│  └─ Events: retrieval.started, retrieval.completed
├─ Span: llm.completion
│  ├─ Attributes: llm.model=gpt-4, llm.tokens.prompt=250, llm.tokens.completion=50
│  ├─ Attributes: llm.cost=0.003, llm.latency_ms=850
│  └─ Status: OK
└─ Span: post_processing
   └─ Attributes: processing.type=sentiment_analysis
```
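A nested trace like this is produced by starting child spans inside the parent span's context. Here's a minimal sketch in Python; retrieve and call_llm are stand-in helpers for illustration, not real library functions:

```python
from opentelemetry import trace

tracer = trace.get_tracer("rag-app")

def retrieve(query):
    """Stand-in for your retriever."""
    return ["doc-1", "doc-2"]

def call_llm(query, docs):
    """Stand-in for your LLM call; returns (text, usage)."""
    return "Paris", {"prompt_tokens": 250, "completion_tokens": 50}

def answer(query: str) -> str:
    # Root span for the whole user request
    with tracer.start_as_current_span("user_query") as root:
        root.set_attribute("query", query)

        # Child spans inherit the active context, producing the nested tree above
        with tracer.start_as_current_span("rag.retrieval") as retrieval_span:
            docs = retrieve(query)
            retrieval_span.set_attribute("retrieval.doc_count", len(docs))

        with tracer.start_as_current_span("llm.completion") as llm_span:
            text, usage = call_llm(query, docs)
            llm_span.set_attribute("llm.model", "gpt-4")
            llm_span.set_attribute("llm.tokens.prompt", usage["prompt_tokens"])
            llm_span.set_attribute("llm.tokens.completion", usage["completion_tokens"])

        with tracer.start_as_current_span("post_processing") as post_span:
            post_span.set_attribute("processing.type", "sentiment_analysis")

        return text
```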
Beyond debugging, observability drives business outcomes:
When evaluating LLM observability platforms, look for solutions that provide comprehensive monitoring through open-source OpenTelemetry integration. The best LLM observability platform will offer automatic cost tracking, token usage analytics, and seamless integration with frameworks like Vercel AI SDK, Haystack, and LiteLLM.
The Vercel AI SDK is the go-to choice for Next.js developers building AI applications. It provides a unified interface for working with multiple LLM providers (OpenAI, Anthropic, Google, etc.) and handles streaming, tool calling, and structured outputs.
However, managing Vercel AI SDK telemetry with OpenTelemetry in a serverless environment (Vercel Functions) comes with significant overhead. Setting up proper observability for Vercel AI SDK requires careful configuration of OpenTelemetry instrumentation. Let's explore both approaches.
To manually instrument Vercel AI SDK, you must use the @opentelemetry/sdk-node package and configure an instrumentation.ts file. Here's what's involved:
```bash
npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/exporter-trace-otlp-http @opentelemetry/instrumentation @opentelemetry/instrumentation-http @opentelemetry/instrumentation-express @opentelemetry/resources @opentelemetry/semantic-conventions
```
Vercel requires an instrumentation.ts file in your project root to initialize OpenTelemetry before your application code runs:
```typescript
// instrumentation.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express';

const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'vercel-ai-app',
    [SemanticResourceAttributes.SERVICE_VERSION]: process.env.VERCEL_GIT_COMMIT_SHA || '1.0.0',
    [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.VERCEL_ENV || 'development',
  }),
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'https://api.keywordsai.co/v1/traces',
    headers: {
      'Authorization': `Bearer ${process.env.KEYWORDSAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
  }),
  instrumentations: [
    new HttpInstrumentation(),
    new ExpressInstrumentation(),
  ],
});

sdk.start();

// Ensure spans are flushed before the process exits
process.on('SIGTERM', () => {
  sdk.shutdown()
    .then(() => console.log('OpenTelemetry terminated'))
    .catch((error) => console.log('Error terminating OpenTelemetry', error))
    .finally(() => process.exit(0));
});
```
You also need to enable the experimental instrumentationHook in next.config.js:
```javascript
// next.config.js
module.exports = {
  experimental: {
    instrumentationHook: true,
  },
};
```
Now you need to manually wrap every AI SDK call with spans:
```typescript
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { generateText, streamText } from 'ai';
import { trace, context, SpanStatusCode, Span } from '@opentelemetry/api';
import { NextRequest, NextResponse } from 'next/server';

const tracer = trace.getTracer('vercel-ai-sdk');

export async function POST(request: NextRequest) {
  const { messages, stream } = await request.json();

  // Start a trace for this request
  const span = tracer.startSpan('ai.chat', {
    attributes: {
      'llm.framework': 'vercel-ai-sdk',
      'llm.provider': 'openai',
      'http.method': 'POST',
      'http.route': '/api/chat',
    },
  });

  try {
    const activeContext = trace.setSpan(context.active(), span);

    return await context.with(activeContext, async () => {
      if (stream) {
        return handleStreaming(messages, span);
      } else {
        return handleNonStreaming(messages, span);
      }
    });
  } catch (error) {
    span.recordException(error as Error);
    span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
    throw error;
  } finally {
    span.end();
  }
}

async function handleNonStreaming(messages: any[], parentSpan: Span) {
  // Explicitly parent the child span on the request span
  const generateSpan = tracer.startSpan(
    'ai.generate',
    {},
    trace.setSpan(context.active(), parentSpan),
  );

  try {
    generateSpan.setAttributes({
      'llm.messages.count': messages.length,
      'llm.messages.last': JSON.stringify(messages[messages.length - 1]),
    });

    const { text, usage, finishReason } = await generateText({
      model: openai('gpt-4'),
      messages: messages,
    });

    // Extract and set LLM-specific attributes
    generateSpan.setAttributes({
      'llm.model': 'gpt-4',
      'llm.response': text,
      'llm.tokens.prompt': usage.promptTokens,
      'llm.tokens.completion': usage.completionTokens,
      'llm.tokens.total': usage.totalTokens,
      'llm.finish_reason': finishReason,
      'llm.cost': calculateCost(usage.promptTokens, usage.completionTokens, 'gpt-4'),
    });

    generateSpan.setStatus({ code: SpanStatusCode.OK });
    return NextResponse.json({ text });
  } catch (error) {
    generateSpan.recordException(error as Error);
    generateSpan.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
    throw error;
  } finally {
    generateSpan.end();
  }
}

async function handleStreaming(messages: any[], parentSpan: Span) {
  const streamSpan = tracer.startSpan(
    'ai.stream',
    {},
    trace.setSpan(context.active(), parentSpan),
  );

  try {
    streamSpan.setAttribute('llm.streaming', true);

    const result = await streamText({
      model: openai('gpt-4'),
      messages: messages,
    });

    // For streaming, we need to track tokens differently
    // This is a simplified example - real implementation is more complex
    streamSpan.setAttribute('llm.model', 'gpt-4');

    return result.toDataStreamResponse();
  } catch (error) {
    streamSpan.recordException(error as Error);
    streamSpan.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
    throw error;
  } finally {
    streamSpan.end();
  }
}

function calculateCost(promptTokens: number, completionTokens: number, model: string): number {
  // Example pricing - update with current prices for the models you use
  const pricing: Record<string, { prompt: number; completion: number }> = {
    'gpt-4': { prompt: 0.03 / 1000, completion: 0.06 / 1000 },
    'gpt-4-turbo': { prompt: 0.01 / 1000, completion: 0.03 / 1000 },
    'gpt-3.5-turbo': { prompt: 0.0015 / 1000, completion: 0.002 / 1000 },
  };

  const modelPricing = pricing[model] || pricing['gpt-3.5-turbo'];
  return (promptTokens * modelPricing.prompt) + (completionTokens * modelPricing.completion);
}
```
Vercel's Edge Runtime doesn't support Node.js APIs, which means OpenTelemetry SDKs that rely on Node.js won't work. You have two options:
- Add export const runtime = 'nodejs' to your route to force the Node.js runtime.
- Use an Edge-compatible setup, such as Vercel's @vercel/otel package.

Read the full manual guide: Vercel OTel Docs
Keywords AI replaces dozens of lines of configuration with a single package. We handle the runtime compatibility, the mapping of LLM-specific metadata (tokens, costs, model IDs), and the span lifecycle automatically.
Full setup guide: Vercel AI SDK + Keywords AI tracing
```bash
npm install @keywordsai/tracing-node
```
Then create instrumentation.ts:

```typescript
// instrumentation.ts
import { KeywordsTracer } from '@keywordsai/tracing-node';

KeywordsTracer.init({
  apiKey: process.env.KEYWORDSAI_API_KEY,
  serviceName: 'vercel-ai-app',
});
```
That's it. No manual SDK configuration, no exporter setup, no span lifecycle management.
```typescript
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { generateText, streamText } from 'ai';
import { NextRequest, NextResponse } from 'next/server';

export async function POST(request: NextRequest) {
  const { messages, stream } = await request.json();

  if (stream) {
    const result = await streamText({
      model: openai('gpt-4'),
      messages: messages,
    });
    return result.toDataStreamResponse();
  } else {
    const { text } = await generateText({
      model: openai('gpt-4'),
      messages: messages,
    });
    return NextResponse.json({ text });
  }
}
```
That's it. Keywords AI automatically:
Haystack by deepset is a powerhouse for Python-based RAG pipelines. It's designed for production-ready LLM applications with built-in support for document stores, retrievers, generators, and complex pipelines.
Haystack has built-in support for OpenTelemetry, but the "wiring" is left to you. Let's compare the manual approach vs. the Keywords AI way.
For Haystack, you need to set up a Python tracer provider and link it to Haystack's internal tracing backend. Here's the complete setup:
```bash
pip install haystack-ai opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-http opentelemetry-instrumentation
```
```python
# tracing_setup.py
import os

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes

# Create resource with service information
resource = Resource.create({
    ResourceAttributes.SERVICE_NAME: "haystack-rag-app",
    ResourceAttributes.SERVICE_VERSION: os.getenv("APP_VERSION", "1.0.0"),
    ResourceAttributes.DEPLOYMENT_ENVIRONMENT: os.getenv("ENVIRONMENT", "production"),
})

# Initialize tracer provider
provider = TracerProvider(resource=resource)
trace.set_tracer_provider(provider)

# Create OTLP exporter
exporter = OTLPSpanExporter(
    endpoint="https://api.keywordsai.co/v1/traces",
    headers={
        "Authorization": f"Bearer {os.getenv('KEYWORDSAI_API_KEY')}",
        "Content-Type": "application/json",
    },
)

# Add batch processor
processor = BatchSpanProcessor(exporter)
provider.add_span_processor(processor)

# Get tracer
tracer = trace.get_tracer(__name__)
```
Haystack has a tracing abstraction that you need to connect to OpenTelemetry:
```python
# haystack_tracing.py
from haystack.tracing import OpenTelemetryTracer
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

class CustomOpenTelemetryTracer(OpenTelemetryTracer):
    """Custom tracer that adds LLM-specific attributes"""

    def __init__(self):
        super().__init__(trace.get_tracer("haystack"))

    def trace(self, operation_name: str, tags: dict = None, **kwargs):
        """Override to add custom attributes"""
        span = self.tracer.start_span(operation_name)

        if tags:
            for key, value in tags.items():
                # Map Haystack tags to OTel attributes
                if key == "model":
                    span.set_attribute("llm.model", value)
                elif key == "provider":
                    span.set_attribute("llm.provider", value)
                elif key == "prompt_tokens":
                    span.set_attribute("llm.tokens.prompt", value)
                elif key == "completion_tokens":
                    span.set_attribute("llm.tokens.completion", value)
                else:
                    span.set_attribute(f"haystack.{key}", str(value))

        return span

# Initialize Haystack tracing
from haystack import tracing
haystack_tracer = CustomOpenTelemetryTracer()
tracing.set_backend(haystack_tracer)
```
Now you need to manually instrument every component in your pipeline:
```python
# pipeline.py
import atexit
import os

from haystack import Pipeline, Document
from haystack.components.builders import PromptBuilder
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores import InMemoryDocumentStore
from opentelemetry import trace

# Import tracing setup
from tracing_setup import tracer

def create_rag_pipeline():
    """Create a RAG pipeline with manual OpenTelemetry instrumentation"""

    # Document store
    document_store = InMemoryDocumentStore()

    # Retriever
    retriever = InMemoryBM25Retriever(document_store=document_store, top_k=5)

    # Prompt builder
    prompt_template = """
    Given the following information, answer the question.

    Context:
    {% for document in documents %}
    {{ document.content }}
    {% endfor %}

    Question: {{ query }}
    Answer:
    """
    prompt_builder = PromptBuilder(template=prompt_template)

    # LLM generator
    generator = OpenAIGenerator(api_key=os.getenv("OPENAI_API_KEY"))

    # Create pipeline
    pipeline = Pipeline()
    pipeline.add_component("retriever", retriever)
    pipeline.add_component("prompt_builder", prompt_builder)
    pipeline.add_component("llm", generator)

    pipeline.connect("retriever", "prompt_builder.documents")
    pipeline.connect("prompt_builder", "llm.prompt")

    return pipeline

def run_rag_query(query: str, pipeline: Pipeline):
    """Run a RAG query with full OpenTelemetry tracing"""

    # Start root span
    with tracer.start_as_current_span("haystack.rag_pipeline") as root_span:
        root_span.set_attribute("query", query)
        root_span.set_attribute("llm.framework", "haystack")

        # Retrieval span
        with tracer.start_as_current_span("haystack.retrieval") as retrieval_span:
            # Manually call retriever to get documents
            documents = pipeline.get_component("retriever").run(query=query)
            retrieval_span.set_attribute("retrieval.doc_count", len(documents["documents"]))
            retrieval_span.set_attribute("retrieval.query", query)

        # Prompt building span
        with tracer.start_as_current_span("haystack.prompt_building") as prompt_span:
            prompt = pipeline.get_component("prompt_builder").run(
                query=query,
                documents=documents["documents"]
            )
            prompt_span.set_attribute("prompt.length", len(prompt["prompt"]))

        # LLM generation span
        with tracer.start_as_current_span("haystack.generation") as gen_span:
            response = pipeline.get_component("llm").run(prompt=prompt["prompt"])

            # Extract usage and model information (the meta format varies by generator;
            # OpenAIGenerator returns a list of metadata dicts, one per reply)
            meta = (response.get("meta") or [{}])[0]
            usage = meta.get("usage", {})
            gen_span.set_attribute("llm.tokens.prompt", usage.get("prompt_tokens", 0))
            gen_span.set_attribute("llm.tokens.completion", usage.get("completion_tokens", 0))
            gen_span.set_attribute("llm.tokens.total", usage.get("total_tokens", 0))
            if "model" in meta:
                gen_span.set_attribute("llm.model", meta["model"])

            # Calculate cost (manual implementation required)
            gen_span.set_attribute("llm.cost", calculate_llm_cost(response))

            gen_span.set_attribute("llm.response", response["replies"][0])

            return response

def calculate_llm_cost(response):
    """Manually calculate LLM cost - you must implement this for every model"""
    # This is a simplified example - real implementation needs pricing for all models
    meta = (response.get("meta") or [{}])[0]
    usage = meta.get("usage", {})
    if not usage:
        return 0

    model = meta.get("model", "gpt-3.5-turbo")

    # Pricing table (you must maintain this)
    pricing = {
        "gpt-4": {"prompt": 0.03 / 1000, "completion": 0.06 / 1000},
        "gpt-4-turbo": {"prompt": 0.01 / 1000, "completion": 0.03 / 1000},
        "gpt-3.5-turbo": {"prompt": 0.0015 / 1000, "completion": 0.002 / 1000},
        "claude-3-opus": {"prompt": 0.015 / 1000, "completion": 0.075 / 1000},
        # ... you must add pricing for every model you use
    }

    model_pricing = pricing.get(model, pricing["gpt-3.5-turbo"])
    prompt_tokens = usage.get("prompt_tokens", 0)
    completion_tokens = usage.get("completion_tokens", 0)

    return (prompt_tokens * model_pricing["prompt"]) + (completion_tokens * model_pricing["completion"])

# Ensure spans are flushed before process exits
def flush_spans():
    provider = trace.get_tracer_provider()
    if hasattr(provider, "force_flush"):
        provider.force_flush()

atexit.register(flush_spans)
```
Read the manual docs: Haystack Tracing Guide
With Keywords AI, we've built a dedicated exporter specifically for the Haystack OpenTelemetry integration. Here's how simple it is:
Full setup guide: Haystack + Keywords AI tracing
```bash
pip install keywords-ai-tracing
```
```python
# main.py
import os

from keywords_ai_tracing import KeywordsTracer

# Initialize - everything is handled automatically
tracer = KeywordsTracer(api_key=os.getenv("KEYWORDSAI_API_KEY"))

# That's it! Haystack is now automatically instrumented
```
```python
import os

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

# Create your pipeline as normal
pipeline = Pipeline()
pipeline.add_component("prompt_builder", PromptBuilder(template="Answer: {{query}}"))
pipeline.add_component("llm", OpenAIGenerator(api_key=os.getenv("OPENAI_API_KEY")))
pipeline.connect("prompt_builder.prompt", "llm.prompt")

# Run it - tracing happens automatically
result = pipeline.run({"query": "What is the capital of France?"})
```
That's it. Keywords AI automatically:
LiteLLM is a unified proxy that standardizes calls across 100+ LLM providers. It's perfect for:
LiteLLM observability is crucial for understanding which models perform best and for optimizing costs across providers. LiteLLM has built-in support for observability through callbacks, but integrating with OpenTelemetry requires manual work. Let's compare both approaches.
LiteLLM provides callback hooks that you can use to send data to OpenTelemetry. Here's the complete manual setup:
```bash
pip install litellm opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
```
```python
# litellm_otel_callback.py
import os
from threading import local
from typing import Any, Dict

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes

# Initialize OpenTelemetry
resource = Resource.create({
    ResourceAttributes.SERVICE_NAME: "litellm-proxy",
})

provider = TracerProvider(resource=resource)
trace.set_tracer_provider(provider)

exporter = OTLPSpanExporter(
    endpoint="https://api.keywordsai.co/v1/traces",
    headers={
        "Authorization": f"Bearer {os.getenv('KEYWORDSAI_API_KEY')}",
    },
)

processor = BatchSpanProcessor(exporter)
provider.add_span_processor(processor)

tracer = trace.get_tracer(__name__)

# Store active spans in a thread-safe way
_thread_local = local()

def get_current_span():
    """Get the current span from thread local storage"""
    return getattr(_thread_local, 'span', None)

def set_current_span(span):
    """Set the current span in thread local storage"""
    _thread_local.span = span

class LiteLLMOpenTelemetryCallback:
    """OpenTelemetry callback for LiteLLM"""

    def __init__(self):
        self.tracer = tracer

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        """Called when a completion succeeds"""
        span = get_current_span()
        if span:
            # Extract model information
            model = kwargs.get("model", "unknown")
            span.set_attribute("llm.model", model)
            span.set_attribute("llm.provider", self._extract_provider(model))

            # Extract usage information
            if hasattr(response_obj, "usage"):
                usage = response_obj.usage
                span.set_attribute("llm.tokens.prompt", usage.prompt_tokens)
                span.set_attribute("llm.tokens.completion", usage.completion_tokens)
                span.set_attribute("llm.tokens.total", usage.total_tokens)

            # Extract response
            if hasattr(response_obj, "choices") and len(response_obj.choices) > 0:
                span.set_attribute("llm.response", response_obj.choices[0].message.content)

            # Calculate latency
            latency_ms = (end_time - start_time) * 1000
            span.set_attribute("llm.latency_ms", latency_ms)

            # Calculate cost (manual implementation required)
            cost = self._calculate_cost(kwargs, response_obj)
            span.set_attribute("llm.cost", cost)

            # Extract request information
            if "messages" in kwargs:
                span.set_attribute("llm.messages.count", len(kwargs["messages"]))
                span.set_attribute("llm.messages.last", str(kwargs["messages"][-1]))

            if "temperature" in kwargs:
                span.set_attribute("llm.temperature", kwargs["temperature"])
            if "max_tokens" in kwargs:
                span.set_attribute("llm.max_tokens", kwargs["max_tokens"])

            span.set_status(trace.Status(trace.StatusCode.OK))
            span.end()
            set_current_span(None)

    def log_failure_event(self, kwargs, response_obj, start_time, end_time, error):
        """Called when a completion fails"""
        span = get_current_span()
        if span:
            span.record_exception(error)
            span.set_status(trace.Status(trace.StatusCode.ERROR, str(error)))
            span.end()
            set_current_span(None)

    def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        """Async version - same as sync"""
        self.log_success_event(kwargs, response_obj, start_time, end_time)

    def async_log_failure_event(self, kwargs, response_obj, start_time, end_time, error):
        """Async version - same as sync"""
        self.log_failure_event(kwargs, response_obj, start_time, end_time, error)

    def _extract_provider(self, model: str) -> str:
        """Extract provider name from model string"""
        if model.startswith("gpt-") or model.startswith("o1-"):
            return "openai"
        elif model.startswith("claude-") or model.startswith("sonnet-"):
            return "anthropic"
        elif model.startswith("gemini-") or "google" in model.lower():
            return "google"
        elif model.startswith("llama-") or "meta" in model.lower():
            return "meta"
        else:
            return "unknown"

    def _calculate_cost(self, kwargs: Dict, response_obj: Any) -> float:
        """Manually calculate cost - you must implement this for all 100+ models"""
        model = kwargs.get("model", "gpt-3.5-turbo")

        if not hasattr(response_obj, "usage"):
            return 0

        usage = response_obj.usage

        # You must maintain pricing for 100+ models
        # This is a simplified example - real implementation is massive
        pricing = {
            # OpenAI
            "gpt-4": {"prompt": 0.03 / 1000, "completion": 0.06 / 1000},
            "gpt-4-turbo": {"prompt": 0.01 / 1000, "completion": 0.03 / 1000},
            "gpt-3.5-turbo": {"prompt": 0.0015 / 1000, "completion": 0.002 / 1000},
            "o1-preview": {"prompt": 0.015 / 1000, "completion": 0.06 / 1000},
            # Anthropic
            "claude-3-opus": {"prompt": 0.015 / 1000, "completion": 0.075 / 1000},
            "claude-3-sonnet": {"prompt": 0.003 / 1000, "completion": 0.015 / 1000},
            "claude-3-haiku": {"prompt": 0.00025 / 1000, "completion": 0.00125 / 1000},
            # Google
            "gemini-pro": {"prompt": 0.0005 / 1000, "completion": 0.0015 / 1000},
            # ... you need pricing for 100+ more models
        }

        model_pricing = pricing.get(model, {"prompt": 0, "completion": 0})
        return (usage.prompt_tokens * model_pricing["prompt"]) + (usage.completion_tokens * model_pricing["completion"])

# Create callback instance
otel_callback = LiteLLMOpenTelemetryCallback()
```
```python
# main.py
import atexit
import os

from litellm import completion
from opentelemetry import trace

from litellm_otel_callback import otel_callback, tracer, set_current_span

def call_llm(messages, model="gpt-4"):
    """Call LiteLLM with manual OpenTelemetry instrumentation"""

    # Start span manually
    span = tracer.start_span("litellm.completion")
    set_current_span(span)

    try:
        span.set_attribute("llm.framework", "litellm")
        span.set_attribute("llm.model", model)
        span.set_attribute("llm.messages", str(messages))

        response = completion(
            model=model,
            messages=messages,
            api_key=os.getenv("OPENAI_API_KEY"),
            callbacks=[otel_callback],  # Use our callback
        )

        return response
    except Exception as e:
        span.record_exception(e)
        span.set_status(trace.Status(trace.StatusCode.ERROR, str(e)))
        raise
    finally:
        # Span is ended in the callback, but we clean up here
        if span:
            set_current_span(None)

# Ensure spans are flushed
def flush_spans():
    provider = trace.get_tracer_provider()
    if hasattr(provider, "force_flush"):
        provider.force_flush()

atexit.register(flush_spans)
```
Keywords AI has native LiteLLM support. Here's how simple it is:
Full setup guide: LiteLLM + Keywords AI
```bash
pip install keywords-ai-tracing
```
```python
# main.py
import os

from keywords_ai_tracing import KeywordsTracer
from litellm import completion

# Initialize - LiteLLM is automatically instrumented
KeywordsTracer.init(api_key=os.getenv("KEYWORDSAI_API_KEY"))

# That's it!
```
```python
import os

from litellm import completion

# Use LiteLLM as normal - tracing happens automatically
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key=os.getenv("OPENAI_API_KEY"),
)

# Or use any of the 100+ models
response = completion(
    model="claude-3-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key=os.getenv("ANTHROPIC_API_KEY"),
)
```
That's it. Keywords AI automatically:
Once you have basic OpenTelemetry instrumentation working, you'll want to understand these advanced concepts for production deployments.
In microservices architectures, a single user request might trigger:
Distributed tracing allows you to follow this request across all services. OpenTelemetry achieves this through context propagation.
When Service A calls Service B, it includes trace context in the HTTP headers:
```python
# Service A
import requests

from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer(__name__)

# Start a span and make it the active span so its context gets propagated
with tracer.start_as_current_span("service_a.operation"):
    headers = {}

    # Inject the current trace context into the outgoing headers
    inject(headers)

    # Make HTTP request to Service B
    response = requests.get("http://service-b/api", headers=headers)
```
```python
# Service B
from opentelemetry import trace
from opentelemetry.propagate import extract

tracer = trace.get_tracer(__name__)

# Extract trace context from the incoming request headers
# (`request` comes from your web framework, e.g., Flask or FastAPI)
context = extract(request.headers)

# Continue the trace
with tracer.start_as_current_span("service_b.operation", context=context):
    # This span will be a child of Service A's span
    pass
```
For LLM applications, context propagation is critical when:
In high-traffic applications, tracing every request can be expensive. Sampling allows you to trace only a percentage of requests.
Decide whether to sample at the start of the trace:
```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

# Sample 10% of traces
sampler = TraceIdRatioBased(0.1)
provider = TracerProvider(sampler=sampler)
```
Decide whether to keep a trace after it completes (useful for keeping error traces):
This requires running an OpenTelemetry Collector with the tail sampling processor; it's configured in the collector's config file, not in application code.
For LLM applications, consider:
Resource attributes describe the service that generated the telemetry:
```python
import socket

from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes

resource = Resource.create({
    ResourceAttributes.SERVICE_NAME: "my-llm-app",
    ResourceAttributes.SERVICE_VERSION: "1.2.3",
    ResourceAttributes.DEPLOYMENT_ENVIRONMENT: "production",
    ResourceAttributes.HOST_NAME: socket.gethostname(),
})
```
These attributes appear on every span and help you filter traces in your backend.
Beyond the standard attributes, you can add custom ones:
```python
span = tracer.start_span("llm.completion")
span.set_attribute("llm.user_id", user_id)
span.set_attribute("llm.session_id", session_id)
span.set_attribute("llm.experiment_variant", "A")  # For A/B testing

# Add events (timestamped annotations)
span.add_event("retrieval.started", {"query": query})
span.add_event("retrieval.completed", {"doc_count": 5})
```
While manual OpenTelemetry setup is "free" (no vendor cost), the engineering hours spent maintaining it are not. Let's break down the real costs:
| Task | Manual OTel | Keywords AI |
|---|---|---|
| Initial Setup | 2-4 Hours | 2 Minutes |
| Cost Calculation Implementation | 4-8 Hours (for 10 models) | 0 (automatic) |
| Maintenance per SDK Update | 1-2 Hours | 0 (automatic) |
| Adding New Model Support | 30-60 Minutes per model | 0 (automatic) |
| Dashboard Setup | 2-4 Hours (Jaeger/Grafana) | 0 (included) |
| Edge Runtime Compatibility | 4-8 Hours | 0 (works out of box) |
| Feature | Manual OTel | Keywords AI |
|---|---|---|
| Setup Time | 2-4 Hours | 2 Minutes |
| Cost Tracking | Manual Calculation (error-prone) | Built-in / Automatic (100+ models) |
| Maintenance | High (Updates with every SDK change) | Zero |
| Dashboard | Requires extra tool (Jaeger/Honeycomb/Grafana) | Included |
| LLM-Specific Views | Must build custom dashboards | Pre-built (costs, tokens, latency) |
| User Analytics | Must implement custom logic | Built-in |
| Alerting | Must set up separately | Built-in |
| Prompt Management | Not included | Included |
| Multi-Provider Support | Manual implementation | Automatic (100+ providers) |
Manual setup might be worth it if:
When choosing the best LLM observability platform for your team, Keywords AI is the better choice if:
Let's say you spend:
Total: 4 + (2 × 12) + 10 = 38 hours/year
At $100/hour (senior engineer rate), that's $3,800/year in engineering time.
Keywords AI pricing starts at much less than this, and you get:
The verdict: Unless you have specific requirements that prevent using a third-party service, Keywords AI provides better ROI for most teams.
Once you have OpenTelemetry instrumentation working, follow these best practices for production deployments:
Don't add tracing code in every function. Instead, instrument where you initialize your LLM clients:
```python
# ✅ Good: Instrument at initialization
import os
from keywords_ai_tracing import KeywordsTracer

KeywordsTracer.init(api_key=os.getenv("KEYWORDSAI_API_KEY"))

# Now all LLM calls are automatically traced
response = completion(model="gpt-4", messages=messages)

# ❌ Bad: Manual instrumentation everywhere
def call_llm(messages):
    span = tracer.start_span("llm.call")  # Don't do this everywhere
    # ...
```
Include user IDs and session IDs in your spans to enable user-level analytics:
span.set_attribute("llm.user_id", user_id)
span.set_attribute("llm.session_id", session_id)
span.set_attribute("llm.customer_id", customer_id)
For production:
```python
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

sampler = TraceIdRatioBased(0.1)  # 10% sampling
```
High trace volume can be expensive. Monitor:
If volume is too high, adjust sampling or reduce span granularity.
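If you need to dial volume down without touching application logic, one common pattern is a parent-based ratio sampler whose rate you can change per deployment. A minimal sketch follows; the environment variable name is just an example:

```python
import os

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Respect the sampling decision of upstream services; otherwise sample a ratio of new traces
ratio = float(os.getenv("TRACE_SAMPLE_RATIO", "0.1"))  # example env var, defaults to 10%
sampler = ParentBased(root=TraceIdRatioBased(ratio))

provider = TracerProvider(sampler=sampler)
```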
Follow OpenTelemetry's semantic conventions for attribute names:
```python
# ✅ Good: Use semantic conventions
span.set_attribute("llm.model", "gpt-4")
span.set_attribute("llm.tokens.prompt", 150)
span.set_attribute("http.method", "POST")

# ❌ Bad: Custom attribute names
span.set_attribute("model_name", "gpt-4")  # Inconsistent
```
Always record exceptions and set span status:
```python
try:
    response = completion(model="gpt-4", messages=messages)
    span.set_status(trace.Status(trace.StatusCode.OK))
except Exception as e:
    span.record_exception(e)
    span.set_status(trace.Status(trace.StatusCode.ERROR, str(e)))
    raise
```
Configure alerts for:
Don't just collect traces—use them:
When you update your instrumentation code, include version information:
```python
resource = Resource.create({
    ResourceAttributes.SERVICE_VERSION: "1.2.3",
    "instrumentation.version": "2.0.0",
})
```
This helps you correlate issues with code changes.
Don't assume your instrumentation works. Test it before you rely on it in production; one approach is sketched below.
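A minimal sketch (assuming a pytest-style test) that uses the OpenTelemetry SDK's in-memory exporter to assert that spans and attributes are actually produced; the span and attribute names here are illustrative:

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

def test_llm_call_is_traced():
    # Route spans to an in-memory exporter so the test can inspect them
    exporter = InMemorySpanExporter()
    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(exporter))

    tracer = provider.get_tracer("test")

    # Exercise the code path that should be instrumented
    with tracer.start_as_current_span("llm.completion") as span:
        span.set_attribute("llm.model", "gpt-4")

    spans = exporter.get_finished_spans()
    assert len(spans) == 1
    assert spans[0].attributes["llm.model"] == "gpt-4"
```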
OpenTelemetry is the industry standard for LLM observability, and it's essential for production LLM applications. As an open-source standard, it provides the foundation for comprehensive monitoring without vendor lock-in. Whether you're using Haystack for RAG pipelines, implementing Vercel AI SDK telemetry for Next.js apps, or setting up LiteLLM observability for multi-provider setups, OpenTelemetry gives you a unified way to observe your applications.
The choice between a manual setup and a managed LLM observability platform comes down to:

For most teams building production LLM applications, a managed platform provides better ROI. With Keywords AI, you get production-ready LLM observability in 2 minutes instead of 2-4 hours, automatic cost tracking for 100+ models, and built-in dashboards optimized for LLM workflows. Whether you need Vercel AI SDK telemetry, LiteLLM observability, or Haystack instrumentation, the right platform will save you hundreds of engineering hours.
Ready to see your traces?
Get started with Keywords AI in 60 seconds and start instrumenting your LLM applications today. Focus on building great AI features, not observability infrastructure.


