PyPI - genai-otel-instrument - Versions diffs - 0.1.7.dev0__tar.gz → 0.1.9.dev0__tar.gz - Mend

genai-otel-instrument 0.1.7.dev0tar.gz → 0.1.9.dev0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of genai-otel-instrument might be problematic.

Files changed (207)

{genai_otel_instrument-0.1.7.dev0 → genai_otel_instrument-0.1.9.dev0}/CHANGELOG.md RENAMED

@@ -6,6 +6,50 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
+## [0.1.8] - 2025-01-27
+### Added
+- **HuggingFace AutoModelForCausalLM and AutoModelForSeq2SeqLM Instrumentation**
+  - Added support for direct model usage via `AutoModelForCausalLM.generate()` and `AutoModelForSeq2SeqLM.generate()`
+  - Automatic token counting from input and output tensor shapes
+  - Cost calculation based on model parameter count (uses CostCalculator's local model pricing tiers)
+  - Span attributes: `gen_ai.system`, `gen_ai.request.model`, `gen_ai.operation.name`, token counts, costs
+  - Metrics: request counter, token counter, latency histogram, cost counter
+  - Supports generation parameters: `max_length`, `max_new_tokens`, `temperature`, `top_p`
+  - Implementation in `genai_otel/instrumentors/huggingface_instrumentor.py:184-333`
+  - Example usage in `examples/huggingface/example_automodel.py`
+  - All 443 tests pass (added 1 new test)
+### Fixed
+- **CRITICAL: Cost Tracking for OpenInference Instrumentors (smolagents, litellm, mcp)**
+  - Replaced `CostEnrichmentSpanProcessor` with `CostEnrichingSpanExporter` to properly add cost attributes
+  - **Root Cause**: SpanProcessor's `on_end()` receives immutable `ReadableSpan` objects that cannot be modified
+  - **Solution**: Custom SpanExporter that enriches span data before export, creating new ReadableSpan instances with cost attributes
+  - Cost attributes now correctly appear for smolagents, litellm, and mcp spans:
+    - `gen_ai.usage.cost.total`: Total cost in USD
+    - `gen_ai.usage.cost.prompt`: Prompt tokens cost
+    - `gen_ai.usage.cost.completion`: Completion tokens cost
+  - Supports all OpenInference semantic conventions:
+    - Model name: `llm.model_name`, `gen_ai.request.model`, `embedding.model_name`
+    - Token counts: `llm.token_count.{prompt,completion}`, `gen_ai.usage.{prompt_tokens,completion_tokens}`
+    - Span kinds: `openinference.span.kind` (LLM, EMBEDDING, CHAIN, etc.)
+  - Implementation in `genai_otel/cost_enriching_exporter.py`
+  - Updated `genai_otel/auto_instrument.py` to wrap OTLP and Console exporters
+  - Model name normalization handles provider prefixes (e.g., `openai/gpt-3.5-turbo` → `gpt-3.5-turbo`)
+  - All 442 existing tests continue to pass
+- **HuggingFace AutoModelForCausalLM AttributeError Fix**
+  - Fixed `AttributeError: type object 'AutoModelForCausalLM' has no attribute 'generate'`
+  - Root cause: `AutoModelForCausalLM` is a factory class; `generate()` exists on `GenerationMixin`
+  - Solution: Wrap `GenerationMixin.generate()` which all generative models inherit from
+  - This covers all model types: `AutoModelForCausalLM`, `AutoModelForSeq2SeqLM`, `GPT2LMHeadModel`, etc.
+  - Added fallback import for older transformers versions
+  - Implementation in `genai_otel/instrumentors/huggingface_instrumentor.py:184-346`
+## [0.1.7] - 2025-01-25
 ### Added
 - **Phase 4: Session and User Tracking (4.1)**
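
The 0.1.8 changelog entry mentions model name normalization for provider prefixes (e.g., `openai/gpt-3.5-turbo` → `gpt-3.5-turbo`). A minimal sketch of how such a step could look is given below. The helper name `normalize_model_name` and the `KNOWN_PROVIDER_PREFIXES` set are assumptions made for illustration, not the package's actual code.

```python
# Hypothetical helper: illustrates the idea only, not genai_otel's real implementation.
KNOWN_PROVIDER_PREFIXES = {"openai", "anthropic", "azure", "bedrock", "mistral"}  # assumed set


def normalize_model_name(model: str) -> str:
    """Strip a leading 'provider/' prefix when the provider is recognized."""
    prefix, sep, rest = model.partition("/")
    if sep and rest and prefix.lower() in KNOWN_PROVIDER_PREFIXES:
        return rest
    # Unknown prefixes (e.g. a HuggingFace organization) are left untouched.
    return model


print(normalize_model_name("openai/gpt-3.5-turbo"))  # gpt-3.5-turbo
print(normalize_model_name("Qwen/Qwen3-0.6B"))       # Qwen/Qwen3-0.6B
```

Restricting the stripping to a known set of providers is one possible design choice: it avoids mangling HuggingFace repository IDs, which also contain a slash.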

{genai_otel_instrument-0.1.7.dev0/genai_otel_instrument.egg-info → genai_otel_instrument-0.1.9.dev0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: genai-otel-instrument
-Version: 0.1.7.dev0
+Version: 0.1.9.dev0
 Summary: Comprehensive OpenTelemetry auto-instrumentation for LLM/GenAI applications
 Author-email: Kshitij Thakkar <kshitijthakkar@rocketmail.com>
 License: Apache-2.0
@@ -257,7 +257,8 @@ For a more comprehensive demonstration of various LLM providers and MCP tools, r
 ### LLM Providers (Auto-detected)
 - **With Full Cost Tracking**: OpenAI, Anthropic, Google AI, AWS Bedrock, Azure OpenAI, Cohere, Mistral AI, Together AI, Groq, Ollama, Vertex AI
-- **Hardware/Local Pricing**: Replicate (hardware-based $/second), HuggingFace (local execution, free)
+- **Hardware/Local Pricing**: Replicate (hardware-based $/second), HuggingFace (local execution with estimated costs)
+  - **HuggingFace Support**: `pipeline()`, `AutoModelForCausalLM.generate()`, `AutoModelForSeq2SeqLM.generate()`, `InferenceClient` API calls
 - **Other Providers**: Anyscale
 ### Frameworks
@@ -307,7 +308,10 @@ The library includes comprehensive cost tracking with pricing data for **145+ mo
 ### Special Pricing Models
 - **Replicate**: Hardware-based pricing ($/second of GPU/CPU time) - not token-based
-- **HuggingFace Transformers**: Local execution - no API costs
+- **HuggingFace Transformers**: Local model execution with estimated costs based on parameter count
+  - Supports `pipeline()`, `AutoModelForCausalLM.generate()`, `AutoModelForSeq2SeqLM.generate()`
+  - Cost estimation uses GPU/compute resource pricing tiers (tiny/small/medium/large)
+  - Automatic token counting from tensor shapes
 ### Pricing Features
 - **Differential Pricing**: Separate rates for prompt tokens vs. completion tokens

{genai_otel_instrument-0.1.7.dev0 → genai_otel_instrument-0.1.9.dev0}/README.md RENAMED

@@ -77,7 +77,8 @@ For a more comprehensive demonstration of various LLM providers and MCP tools, r
 ### LLM Providers (Auto-detected)
 - **With Full Cost Tracking**: OpenAI, Anthropic, Google AI, AWS Bedrock, Azure OpenAI, Cohere, Mistral AI, Together AI, Groq, Ollama, Vertex AI
-- **Hardware/Local Pricing**: Replicate (hardware-based $/second), HuggingFace (local execution, free)
+- **Hardware/Local Pricing**: Replicate (hardware-based $/second), HuggingFace (local execution with estimated costs)
+  - **HuggingFace Support**: `pipeline()`, `AutoModelForCausalLM.generate()`, `AutoModelForSeq2SeqLM.generate()`, `InferenceClient` API calls
 - **Other Providers**: Anyscale
 ### Frameworks
@@ -127,7 +128,10 @@ The library includes comprehensive cost tracking with pricing data for **145+ mo
 ### Special Pricing Models
 - **Replicate**: Hardware-based pricing ($/second of GPU/CPU time) - not token-based
-- **HuggingFace Transformers**: Local execution - no API costs
+- **HuggingFace Transformers**: Local model execution with estimated costs based on parameter count
+  - Supports `pipeline()`, `AutoModelForCausalLM.generate()`, `AutoModelForSeq2SeqLM.generate()`
+  - Cost estimation uses GPU/compute resource pricing tiers (tiny/small/medium/large)
+  - Automatic token counting from tensor shapes
 ### Pricing Features
 - **Differential Pricing**: Separate rates for prompt tokens vs. completion tokens

genai_otel_instrument-0.1.9.dev0/examples/huggingface/example_automodel.py ADDED

@@ -0,0 +1,89 @@
+"""HuggingFace AutoModelForCausalLM Example with Token Counting and Cost Tracking.
+This example demonstrates:
+1. Auto-instrumentation of AutoModelForCausalLM.generate()
+2. Automatic token counting (prompt + completion tokens)
+3. Cost calculation for local model inference
+4. Full observability with traces and metrics
+Requirements:
+    pip install transformers torch
+"""
+import genai_otel
+# Auto-instrument HuggingFace Transformers
+genai_otel.instrument()
+from transformers import AutoModelForCausalLM, AutoTokenizer
+print("\n" + "=" * 80)
+print("Loading model and tokenizer...")
+print("=" * 80 + "\n")
+# Load a small model for testing (about 0.6B parameters)
+model_name = "Qwen/Qwen3-0.6B"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name)
+print(f"Model loaded: {model_name}")
+print(f"Model config: {model.config._name_or_path}\n")
+# Prepare input
+prompt = "The future of AI is"
+inputs = tokenizer(prompt, return_tensors="pt")
+print(f"Prompt: '{prompt}'")
+print(f"Input tokens: {inputs['input_ids'].shape[-1]}\n")
+print("=" * 80)
+print("Generating text (instrumented)...")
+print("=" * 80 + "\n")
+# Generate text - This is automatically instrumented!
+# The wrapper will:
+# - Create a span with model info
+# - Count input tokens (from input_ids.shape)
+# - Count output tokens (from generated sequence length)
+# - Calculate cost based on the model's parameter count (mapped to a pricing tier)
+# - Record metrics for tokens and cost
+outputs = model.generate(
+    inputs["input_ids"],
+    max_new_tokens=50,
+    temperature=0.7,
+    do_sample=True,
+    pad_token_id=tokenizer.eos_token_id,
+)
+# Decode the generated text
+generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(f"Generated text: {generated_text}\n")
+print(f"Total output tokens: {outputs.shape[-1]}")
+print(f"Input tokens: {inputs['input_ids'].shape[-1]}")
+print(f"Generated (new) tokens: {outputs.shape[-1] - inputs['input_ids'].shape[-1]}\n")
+print("=" * 80)
+print("Telemetry captured:")
+print("=" * 80)
+print("✓ Span created: huggingface.model.generate")
+print("✓ Attributes set:")
+print(f"  - gen_ai.system: huggingface")
+print(f"  - gen_ai.request.model: {model_name}")
+print(f"  - gen_ai.operation.name: text_generation")
+print(f"  - gen_ai.usage.prompt_tokens: {inputs['input_ids'].shape[-1]}")
+print(f"  - gen_ai.usage.completion_tokens: {outputs.shape[-1] - inputs['input_ids'].shape[-1]}")
+print(f"  - gen_ai.usage.total_tokens: {outputs.shape[-1]}")
+print("  - gen_ai.usage.cost.total: $X.XXXXXX (estimated)")
+print("  - gen_ai.usage.cost.prompt: $X.XXXXXX")
+print("  - gen_ai.usage.cost.completion: $X.XXXXXX")
+print("\n✓ Metrics recorded:")
+print("  - gen_ai.requests counter")
+print("  - gen_ai.client.token.usage (prompt + completion)")
+print("  - gen_ai.client.operation.duration histogram")
+print("  - gen_ai.usage.cost counter")
+print("\n✓ Traces and metrics exported to OTLP endpoint!")
+print("=" * 80)
+print("\nNote: Cost is estimated based on model size (parameter count -> pricing tier)")
+print("Local models are free to run, but costs reflect GPU/compute resources.")
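
The example above derives token counts from tensor shapes: prompt tokens from `input_ids.shape[-1]` and new tokens from the difference with the output length. The following sketch reproduces that arithmetic with plain tuples so it runs without `torch`. The function name `count_tokens` and the `encoder_decoder` flag are hypothetical. The sketch also assumes a single sequence with no padding, and assumes that seq2seq outputs do not echo the prompt.

```python
from typing import Dict, Sequence


def count_tokens(
    input_shape: Sequence[int],
    output_shape: Sequence[int],
    encoder_decoder: bool = False,  # hypothetical flag for seq2seq models
) -> Dict[str, int]:
    """Derive token counts from (batch, seq_len) shapes.

    Assumption: decoder-only models return prompt + new tokens in one
    sequence, while encoder-decoder models return only the decoder output.
    """
    prompt = int(input_shape[-1])
    output_len = int(output_shape[-1])
    completion = output_len if encoder_decoder else max(output_len - prompt, 0)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }


# Decoder-only: 5 prompt tokens, output of length 55 -> 50 new tokens.
print(count_tokens((1, 5), (1, 55)))
# Encoder-decoder: the 12 output tokens are all newly generated.
print(count_tokens((1, 5), (1, 12), encoder_decoder=True))
```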

{genai_otel_instrument-0.1.7.dev0 → genai_otel_instrument-0.1.9.dev0}/genai_otel/__version__.py RENAMED

@@ -28,7 +28,7 @@ version_tuple: VERSION_TUPLE
 commit_id: COMMIT_ID
 __commit_id__: COMMIT_ID
-__version__ = version = '0.1.7.dev0'
-__version_tuple__ = version_tuple = (0, 1, 7, 'dev0')
+__version__ = version = '0.1.9.dev0'
+__version_tuple__ = version_tuple = (0, 1, 9, 'dev0')
-__commit_id__ = commit_id = 'g6e54041fe'
+__commit_id__ = commit_id = 'gf4ccb18e4'

{genai_otel_instrument-0.1.7.dev0 → genai_otel_instrument-0.1.9.dev0}/genai_otel/auto_instrument.py RENAMED

@@ -19,6 +19,7 @@ from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExport
 from .config import OTelConfig
 from .cost_calculator import CostCalculator
 from .cost_enrichment_processor import CostEnrichmentSpanProcessor
+from .cost_enriching_exporter import CostEnrichingSpanExporter
 from .gpu_metrics import GPUMetricsCollector
 from .mcp_instrumentors import MCPInstrumentorManager
 from .metrics import (
@@ -169,14 +170,17 @@ def setup_auto_instrumentation(config: OTelConfig):
     set_global_textmap(TraceContextTextMapPropagator())
-    # Add cost enrichment processor for OpenInference instrumentors
-    # This enriches spans from smolagents, litellm, mcp with cost attributes
+    # Add cost enrichment processor for custom instrumentors (OpenAI, Ollama, etc.)
+    # These instrumentors set cost attributes directly, so processor is mainly for logging
+    # Also attempts to enrich OpenInference spans (smolagents, litellm, mcp), though
+    # the processor can't modify ReadableSpan - the exporter below handles that
+    cost_calculator = None
     if config.enable_cost_tracking:
         try:
             cost_calculator = CostCalculator()
             cost_processor = CostEnrichmentSpanProcessor(cost_calculator)
             tracer_provider.add_span_processor(cost_processor)
-            logger.info("Cost enrichment processor added for OpenInference instrumentors")
+            logger.info("Cost enrichment processor added")
         except Exception as e:
             logger.warning(f"Failed to add cost enrichment processor: {e}", exc_info=True)
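
The comments in this hunk note that the processor cannot modify a finished span and that a wrapping exporter handles enrichment instead. The sketch below shows that wrap-and-delegate pattern using only the standard library. `ListExporter`, `EnrichingExporter`, and `add_flat_cost` are hypothetical stand-ins, spans are modeled as plain dicts rather than `ReadableSpan` objects, and the flat cost value is a placeholder.

```python
from typing import Callable, Dict, List, Sequence

Span = Dict[str, object]  # stand-in only; real OpenTelemetry spans are not dicts


class ListExporter:
    """Hypothetical inner exporter that collects whatever it receives."""

    def __init__(self) -> None:
        self.exported: List[Span] = []

    def export(self, spans: Sequence[Span]) -> bool:
        self.exported.extend(spans)
        return True


class EnrichingExporter:
    """Enriches a copy of each span, then delegates to the inner exporter."""

    def __init__(self, inner: ListExporter, enrich: Callable[[Span], Span]) -> None:
        self.inner = inner
        self.enrich = enrich

    def export(self, spans: Sequence[Span]) -> bool:
        # Copy first so the caller's span objects are never mutated.
        return self.inner.export([self.enrich(dict(span)) for span in spans])


def add_flat_cost(span: Span) -> Span:
    # Placeholder value; real code would consult a cost calculator.
    span["gen_ai.usage.cost.total"] = 0.001
    return span


inner = ListExporter()
exporter = EnrichingExporter(inner, add_flat_cost)
original = {"name": "llm-call"}
exporter.export([original])
print(inner.exported[0])  # enriched copy
print(original)           # unchanged
```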

genai_otel_instrument-0.1.9.dev0/genai_otel/cost_enriching_exporter.py ADDED

@@ -0,0 +1,207 @@
+"""Custom SpanExporter that enriches spans with cost attributes before export.
+This exporter wraps another exporter (like OTLPSpanExporter) and adds cost
+attributes to spans before passing them to the wrapped exporter.
+"""
+import logging
+from typing import Optional, Sequence
+from opentelemetry.sdk.trace import ReadableSpan
+from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult
+from .cost_calculator import CostCalculator
+logger = logging.getLogger(__name__)
+class CostEnrichingSpanExporter(SpanExporter):
+    """Wraps a SpanExporter and enriches spans with cost attributes before export.
+    This exporter:
+    1. Receives ReadableSpan objects from the SDK
+    2. Extracts model name and token usage from span attributes
+    3. Calculates cost using CostCalculator
+    4. Creates enriched span data with cost attributes
+    5. Exports to the wrapped exporter (e.g., OTLP)
+    """
+    def __init__(
+        self, wrapped_exporter: SpanExporter, cost_calculator: Optional[CostCalculator] = None
+    ):
+        """Initialize the cost enriching exporter.
+        Args:
+            wrapped_exporter: The underlying exporter to send enriched spans to.
+            cost_calculator: CostCalculator instance to use for cost calculations.
+                           If None, creates a new instance.
+        """
+        self.wrapped_exporter = wrapped_exporter
+        self.cost_calculator = cost_calculator or CostCalculator()
+        logger.info(
+            f"CostEnrichingSpanExporter initialized, wrapping {type(wrapped_exporter).__name__}"
+        )
+    def export(self, spans: Sequence[ReadableSpan]) -> SpanExportResult:
+        """Export spans after enriching them with cost attributes.
+        Args:
+            spans: Sequence of ReadableSpan objects to export.
+        Returns:
+            SpanExportResult from the wrapped exporter.
+        """
+        try:
+            # Enrich spans with cost attributes
+            enriched_spans = []
+            for span in spans:
+                enriched_span = self._enrich_span(span)
+                enriched_spans.append(enriched_span)
+            # Export to wrapped exporter
+            return self.wrapped_exporter.export(enriched_spans)
+        except Exception as e:
+            logger.error(f"Failed to export spans: {e}", exc_info=True)
+            return SpanExportResult.FAILURE
+    def _enrich_span(self, span: ReadableSpan) -> ReadableSpan:
+        """Enrich a span with cost attributes if applicable.
+        Args:
+            span: The original ReadableSpan.
+        Returns:
+            A new ReadableSpan with cost attributes added (or the original if not applicable).
+        """
+        try:
+            # Check if span has LLM-related attributes
+            if not span.attributes:
+                return span
+            attributes = dict(span.attributes)  # Make a mutable copy
+            # Check for model name - support both GenAI and OpenInference conventions
+            model = (
+                attributes.get("gen_ai.request.model")
+                or attributes.get("llm.model_name")
+                or attributes.get("embedding.model_name")
+            )
+            if not model:
+                return span
+            # Skip if cost attributes are already present
+            if "gen_ai.usage.cost.total" in attributes:
+                logger.debug(f"Span '{span.name}' already has cost attributes, skipping enrichment")
+                return span
+            # Extract token usage - support GenAI, OpenInference, and legacy conventions
+            prompt_tokens = (
+                attributes.get("gen_ai.usage.prompt_tokens")
+                or attributes.get("gen_ai.usage.input_tokens")
+                or attributes.get("llm.token_count.prompt")  # OpenInference
+                or 0
+            )
+            completion_tokens = (
+                attributes.get("gen_ai.usage.completion_tokens")
+                or attributes.get("gen_ai.usage.output_tokens")
+                or attributes.get("llm.token_count.completion")  # OpenInference
+                or 0
+            )
+            # Skip if no tokens recorded
+            if prompt_tokens == 0 and completion_tokens == 0:
+                return span
+            # Get call type - support both GenAI and OpenInference conventions
+            span_kind = attributes.get("openinference.span.kind", "").upper()
+            call_type = attributes.get("gen_ai.operation.name") or span_kind.lower() or "chat"
+            # Map operation names to call types
+            call_type_mapping = {
+                "chat": "chat",
+                "completion": "chat",
+                "embedding": "embedding",
+                "embeddings": "embedding",
+                "text_generation": "chat",
+                "image_generation": "image",
+                "audio": "audio",
+                "llm": "chat",
+                "chain": "chat",
+                "retriever": "embedding",
+                "reranker": "embedding",
+                "tool": "chat",
+                "agent": "chat",
+            }
+            normalized_call_type = call_type_mapping.get(str(call_type).lower(), "chat")
+            # Calculate cost
+            usage = {
+                "prompt_tokens": int(prompt_tokens),
+                "completion_tokens": int(completion_tokens),
+                "total_tokens": int(prompt_tokens) + int(completion_tokens),
+            }
+            cost_info = self.cost_calculator.calculate_granular_cost(
+                model=str(model),
+                usage=usage,
+                call_type=normalized_call_type,
+            )
+            if cost_info and cost_info.get("total", 0.0) > 0:
+                # Add cost attributes to the mutable copy
+                attributes["gen_ai.usage.cost.total"] = cost_info["total"]
+                if cost_info.get("prompt", 0.0) > 0:
+                    attributes["gen_ai.usage.cost.prompt"] = cost_info["prompt"]
+                if cost_info.get("completion", 0.0) > 0:
+                    attributes["gen_ai.usage.cost.completion"] = cost_info["completion"]
+                logger.info(
+                    f"Enriched span '{span.name}' with cost: {cost_info['total']:.6f} USD "
+                    f"for model {model} ({usage['total_tokens']} tokens)"
+                )
+                # Create a new ReadableSpan with enriched attributes
+                # ReadableSpan has no public setters, so build a new instance instead
+                from opentelemetry.sdk.trace import ReadableSpan as RS
+                enriched_span = RS(
+                    name=span.name,
+                    context=span.context,
+                    kind=span.kind,
+                    parent=span.parent,
+                    start_time=span.start_time,
+                    end_time=span.end_time,
+                    status=span.status,
+                    attributes=attributes,  # Use enriched attributes
+                    events=span.events,
+                    links=span.links,
+                    resource=span.resource,
+                    instrumentation_scope=span.instrumentation_scope,
+                )
+                return enriched_span
+        except Exception as e:
+            logger.warning(
+                f"Failed to enrich span '{getattr(span, 'name', 'unknown')}' with cost: {e}",
+                exc_info=True,
+            )
+        return span
+    def shutdown(self) -> None:
+        """Shutdown the wrapped exporter."""
+        logger.info("CostEnrichingSpanExporter shutting down")
+        self.wrapped_exporter.shutdown()
+    def force_flush(self, timeout_millis: int = 30000) -> bool:
+        """Force flush the wrapped exporter.
+        Args:
+            timeout_millis: Timeout in milliseconds.
+        Returns:
+            True if flush succeeded.
+        """
+        return self.wrapped_exporter.force_flush(timeout_millis)
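
The token lookup in `_enrich_span` tries several attribute names in order and skips the span when both counts are zero. A standalone sketch of that fallback logic follows. The attribute keys are taken from the diff above, while the helper names `first_positive` and `extract_usage` are assumptions for illustration.

```python
from typing import Dict, Mapping, Optional, Sequence

# Attribute keys as they appear in the exporter above.
PROMPT_KEYS = (
    "gen_ai.usage.prompt_tokens",
    "gen_ai.usage.input_tokens",
    "llm.token_count.prompt",  # OpenInference
)
COMPLETION_KEYS = (
    "gen_ai.usage.completion_tokens",
    "gen_ai.usage.output_tokens",
    "llm.token_count.completion",  # OpenInference
)


def first_positive(attributes: Mapping[str, object], keys: Sequence[str]) -> int:
    """Return the first truthy value among `keys`, or 0 if none is set."""
    for key in keys:
        value = attributes.get(key)
        if value:  # mirrors the `or` chain: missing and 0 both fall through
            return int(value)
    return 0


def extract_usage(attributes: Mapping[str, object]) -> Optional[Dict[str, int]]:
    """Build a usage dict, or return None when no tokens were recorded."""
    prompt = first_positive(attributes, PROMPT_KEYS)
    completion = first_positive(attributes, COMPLETION_KEYS)
    if prompt == 0 and completion == 0:
        return None
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }


print(extract_usage({"llm.token_count.prompt": 12, "llm.token_count.completion": 30}))
print(extract_usage({"llm.model_name": "gpt-4"}))  # None: no token counts present
```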

{genai_otel_instrument-0.1.7.dev0 → genai_otel_instrument-0.1.9.dev0}/genai_otel/cost_enrichment_processor.py RENAMED

@@ -132,9 +132,8 @@ class CostEnrichmentSpanProcessor(SpanProcessor):
             if cost_info and cost_info.get("total", 0.0) > 0:
                 # Add cost attributes to the span
-                # Note: We can't modify ReadableSpan attributes directly,
-                # but we can if span is still a Span instance
-                if isinstance(span, Span):
+                # Use duck typing to check if span supports set_attribute
+                if hasattr(span, "set_attribute") and callable(getattr(span, "set_attribute")):
                     span.set_attribute("gen_ai.usage.cost.total", cost_info["total"])
                     if cost_info.get("prompt", 0.0) > 0:
