genai-otel-instrument 0.1.12.dev0__py3-none-any.whl → 0.1.16__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of genai-otel-instrument might be problematic.
- genai_otel/__init__.py +1 -1
- genai_otel/__version__.py +34 -34
- {genai_otel_instrument-0.1.12.dev0.dist-info → genai_otel_instrument-0.1.16.dist-info}/METADATA +959 -952
- {genai_otel_instrument-0.1.12.dev0.dist-info → genai_otel_instrument-0.1.16.dist-info}/RECORD +8 -8
- genai_otel_instrument-0.1.16.dist-info/licenses/LICENSE +680 -0
- genai_otel_instrument-0.1.12.dev0.dist-info/licenses/LICENSE +0 -201
- {genai_otel_instrument-0.1.12.dev0.dist-info → genai_otel_instrument-0.1.16.dist-info}/WHEEL +0 -0
- {genai_otel_instrument-0.1.12.dev0.dist-info → genai_otel_instrument-0.1.16.dist-info}/entry_points.txt +0 -0
- {genai_otel_instrument-0.1.12.dev0.dist-info → genai_otel_instrument-0.1.16.dist-info}/top_level.txt +0 -0
{genai_otel_instrument-0.1.12.dev0.dist-info → genai_otel_instrument-0.1.16.dist-info}/METADATA
RENAMED
@@ -1,952 +1,959 @@
- Metadata-Version: 2.4
- Name: genai-otel-instrument
- Version: 0.1.12.dev0
- Summary: Comprehensive OpenTelemetry auto-instrumentation for LLM/GenAI applications
- Author-email: Kshitij Thakkar <kshitijthakkar@rocketmail.com>
- License:
- Project-URL: Homepage, https://github.com/Mandark-droid/genai_otel_instrument
- Project-URL: Repository, https://github.com/Mandark-droid/genai_otel_instrument
- Project-URL: Documentation, https://github.com/Mandark-droid/genai_otel_instrument#readme
- Project-URL: Issues, https://github.com/Mandark-droid/genai_otel_instrument/issues
- Project-URL: Changelog, https://github.com/Mandark-droid/genai_otel_instrument/blob/main/CHANGELOG.md
- Keywords: opentelemetry,observability,llm,genai,instrumentation,tracing,metrics,monitoring
- Classifier: Development Status :: 4 - Beta
- Classifier: Intended Audience :: Developers
- Classifier: Topic :: Software Development :: Libraries :: Python Modules
- Classifier: Topic :: System :: Monitoring
- Classifier: License :: OSI Approved ::
- Classifier: Operating System :: OS Independent
- Classifier: Programming Language :: Python :: 3
- Classifier: Programming Language :: Python :: 3.9
- Classifier: Programming Language :: Python :: 3.10
- Classifier: Programming Language :: Python :: 3.11
- Classifier: Programming Language :: Python :: 3.12
- Requires-Python: >=3.9
- Description-Content-Type: text/markdown
- License-File: LICENSE
- Requires-Dist: opentelemetry-api<2.0.0,>=1.20.0
- Requires-Dist: opentelemetry-sdk<2.0.0,>=1.20.0
- Requires-Dist: opentelemetry-instrumentation>=0.41b0
- Requires-Dist: opentelemetry-semantic-conventions<1.0.0,>=0.45b0
- Requires-Dist: opentelemetry-exporter-otlp>=1.20.0
- Requires-Dist: opentelemetry-instrumentation-requests>=0.41b0
- Requires-Dist: opentelemetry-instrumentation-httpx>=0.41b0
- Requires-Dist: requests>=2.20.0
- Requires-Dist: wrapt>=1.14.0
- Requires-Dist: httpx>=0.23.0
- Requires-Dist: opentelemetry-instrumentation-mysql>=0.41b0
- Requires-Dist: mysql-connector-python<9.0.0,>=8.0.0
- Requires-Dist: opentelemetry-instrumentation-psycopg2>=0.41b0
- Requires-Dist: psycopg2-binary>=2.9.0
- Requires-Dist: opentelemetry-instrumentation-redis>=0.41b0
- Requires-Dist: redis
- Requires-Dist: opentelemetry-instrumentation-pymongo>=0.41b0
- Requires-Dist: pymongo
- Requires-Dist: opentelemetry-instrumentation-sqlalchemy>=0.41b0
- Requires-Dist: sqlalchemy>=1.4.0
- Requires-Dist: opentelemetry-instrumentation-kafka-python>=0.41b0
- Requires-Dist: kafka-python
- Provides-Extra: openinference
- Requires-Dist: openinference-instrumentation==0.1.31; extra == "openinference"
- Requires-Dist: openinference-instrumentation-litellm==0.1.19; extra == "openinference"
- Requires-Dist: openinference-instrumentation-mcp==1.3.0; extra == "openinference"
- Requires-Dist: openinference-instrumentation-smolagents==0.1.11; extra == "openinference"
- Requires-Dist: litellm>=1.0.0; extra == "openinference"
- Provides-Extra: gpu
- Requires-Dist: nvidia-ml-py>=11.495.46; extra == "gpu"
- Requires-Dist: codecarbon>=2.3.0; extra == "gpu"
- Provides-Extra: co2
- Requires-Dist: codecarbon>=2.3.0; extra == "co2"
- Provides-Extra: openai
- Requires-Dist: openai>=1.0.0; extra == "openai"
- Provides-Extra: anthropic
- Requires-Dist: anthropic>=0.18.0; extra == "anthropic"
- Provides-Extra: google
- Requires-Dist: google-generativeai>=0.3.0; extra == "google"
- Provides-Extra: aws
- Requires-Dist: boto3>=1.28.0; extra == "aws"
- Provides-Extra: azure
- Requires-Dist: azure-ai-openai>=1.0.0; extra == "azure"
- Provides-Extra: cohere
- Requires-Dist: cohere>=4.0.0; extra == "cohere"
- Provides-Extra: mistral
- Requires-Dist: mistralai>=0.4.2; extra == "mistral"
- Provides-Extra: together
- Requires-Dist: together>=0.2.0; extra == "together"
- Provides-Extra: groq
- Requires-Dist: groq>=0.4.0; extra == "groq"
- Provides-Extra: ollama
- Requires-Dist: ollama>=0.1.0; extra == "ollama"
- Provides-Extra: replicate
- Requires-Dist: replicate>=0.15.0; extra == "replicate"
- Provides-Extra: langchain
- Requires-Dist: langchain>=0.1.0; extra == "langchain"
- Provides-Extra: llamaindex
- Requires-Dist: llama-index>=0.9.0; extra == "llamaindex"
- Provides-Extra: huggingface
- Requires-Dist: transformers>=4.30.0; extra == "huggingface"
- Provides-Extra: databases
- Requires-Dist: opentelemetry-instrumentation-sqlalchemy>=0.41b0; extra == "databases"
- Requires-Dist: sqlalchemy>=1.4.0; extra == "databases"
- Requires-Dist: opentelemetry-instrumentation-redis>=0.41b0; extra == "databases"
- Requires-Dist: redis; extra == "databases"
- Requires-Dist: opentelemetry-instrumentation-pymongo>=0.41b0; extra == "databases"
- Requires-Dist: pymongo; extra == "databases"
- Requires-Dist: opentelemetry-instrumentation-psycopg2>=0.41b0; extra == "databases"
- Requires-Dist: psycopg2-binary>=2.9.0; extra == "databases"
- Requires-Dist: opentelemetry-instrumentation-mysql>=0.41b0; extra == "databases"
- Requires-Dist: mysql-connector-python<9.0.0,>=8.0.0; extra == "databases"
- Provides-Extra: messaging
- Requires-Dist: opentelemetry-instrumentation-kafka-python>=0.41b0; extra == "messaging"
- Requires-Dist: kafka-python; extra == "messaging"
- Provides-Extra: vector-dbs
- Requires-Dist: pinecone>=3.0.0; extra == "vector-dbs"
- Requires-Dist: weaviate-client>=3.0.0; extra == "vector-dbs"
- Requires-Dist: qdrant-client>=1.0.0; extra == "vector-dbs"
- Requires-Dist: chromadb>=0.4.0; extra == "vector-dbs"
- Requires-Dist: pymilvus>=2.3.0; extra == "vector-dbs"
- Requires-Dist: faiss-cpu>=1.7.0; extra == "vector-dbs"
- Provides-Extra: all-providers
- Requires-Dist: openai>=1.0.0; extra == "all-providers"
- Requires-Dist: anthropic>=0.18.0; extra == "all-providers"
- Requires-Dist: google-generativeai>=0.3.0; extra == "all-providers"
- Requires-Dist: boto3>=1.28.0; extra == "all-providers"
- Requires-Dist: azure-ai-openai>=1.0.0; extra == "all-providers"
- Requires-Dist: cohere>=4.0.0; extra == "all-providers"
- Requires-Dist: mistralai>=0.4.2; extra == "all-providers"
- Requires-Dist: together>=0.2.0; extra == "all-providers"
- Requires-Dist: groq>=0.4.0; extra == "all-providers"
- Requires-Dist: ollama>=0.1.0; extra == "all-providers"
- Requires-Dist: replicate>=0.15.0; extra == "all-providers"
- Requires-Dist: langchain>=0.1.0; extra == "all-providers"
- Requires-Dist: llama-index>=0.9.0; extra == "all-providers"
- Requires-Dist: transformers>=4.30.0; extra == "all-providers"
- Requires-Dist: litellm>=1.0.0; extra == "all-providers"
- Provides-Extra: all-mcp
- Requires-Dist: opentelemetry-instrumentation-sqlalchemy>=0.41b0; extra == "all-mcp"
- Requires-Dist: opentelemetry-instrumentation-redis>=0.41b0; extra == "all-mcp"
- Requires-Dist: opentelemetry-instrumentation-pymongo>=0.41b0; extra == "all-mcp"
- Requires-Dist: opentelemetry-instrumentation-psycopg2>=0.41b0; extra == "all-mcp"
- Requires-Dist: opentelemetry-instrumentation-mysql>=0.41b0; extra == "all-mcp"
- Requires-Dist: opentelemetry-instrumentation-kafka-python>=0.41b0; extra == "all-mcp"
- Requires-Dist: pinecone>=3.0.0; extra == "all-mcp"
- Requires-Dist: weaviate-client>=3.0.0; extra == "all-mcp"
- Requires-Dist: qdrant-client>=1.0.0; extra == "all-mcp"
- Requires-Dist: chromadb>=0.4.0; extra == "all-mcp"
- Requires-Dist: pymilvus>=2.3.0; extra == "all-mcp"
- Requires-Dist: faiss-cpu>=1.7.0; extra == "all-mcp"
- Requires-Dist: sqlalchemy; extra == "all-mcp"
- Provides-Extra: all
- Requires-Dist: openai>=1.0.0; extra == "all"
- Requires-Dist: anthropic>=0.18.0; extra == "all"
- Requires-Dist: google-generativeai>=0.3.0; extra == "all"
- Requires-Dist: boto3>=1.28.0; extra == "all"
- Requires-Dist: azure-ai-openai>=1.0.0; extra == "all"
- Requires-Dist: cohere>=4.0.0; extra == "all"
- Requires-Dist: mistralai>=0.4.2; extra == "all"
- Requires-Dist: together>=0.2.0; extra == "all"
- Requires-Dist: groq>=0.4.0; extra == "all"
- Requires-Dist: ollama>=0.1.0; extra == "all"
- Requires-Dist: replicate>=0.15.0; extra == "all"
- Requires-Dist: langchain>=0.1.0; extra == "all"
- Requires-Dist: llama-index>=0.9.0; extra == "all"
- Requires-Dist: transformers>=4.30.0; extra == "all"
- Requires-Dist: nvidia-ml-py>=11.495.46; extra == "all"
- Requires-Dist: opentelemetry-instrumentation-sqlalchemy>=0.41b0; extra == "all"
- Requires-Dist: opentelemetry-instrumentation-redis>=0.41b0; extra == "all"
- Requires-Dist: opentelemetry-instrumentation-pymongo>=0.41b0; extra == "all"
- Requires-Dist: opentelemetry-instrumentation-psycopg2>=0.41b0; extra == "all"
- Requires-Dist: opentelemetry-instrumentation-mysql>=0.41b0; extra == "all"
- Requires-Dist: opentelemetry-instrumentation-kafka-python>=0.41b0; extra == "all"
- Requires-Dist: pinecone>=3.0.0; extra == "all"
- Requires-Dist: weaviate-client>=3.0.0; extra == "all"
- Requires-Dist: qdrant-client>=1.0.0; extra == "all"
- Requires-Dist: chromadb>=0.4.0; extra == "all"
- Requires-Dist: pymilvus>=2.3.0; extra == "all"
- Requires-Dist: faiss-cpu>=1.7.0; extra == "all"
- Requires-Dist: sqlalchemy; extra == "all"
- Provides-Extra: dev
- Requires-Dist: pytest>=7.0.0; extra == "dev"
- Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
- Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
- Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
- Requires-Dist: black>=23.0.0; extra == "dev"
- Requires-Dist: isort>=5.12.0; extra == "dev"
- Requires-Dist: pylint>=2.17.0; extra == "dev"
- Requires-Dist: mypy>=1.0.0; extra == "dev"
- Requires-Dist: build>=0.10.0; extra == "dev"
- Requires-Dist: twine>=4.0.0; extra == "dev"
- Dynamic: license-file
-
- #
-
- <div align="center">
- <img src=".github/images/Logo.jpg" alt="GenAI OpenTelemetry Instrumentation Logo" width="400"/>
- </div>
-
- <br/>
-
- [](https://badge.fury.io/py/genai-otel-instrument)
- [](https://pypi.org/project/genai-otel-instrument/)
- [](https://pepy.tech/project/genai-otel-instrument)
- [](https://pepy.tech/project/genai-otel-instrument)
-
- [](https://github.com/Mandark-droid/genai_otel_instrument)
- [](https://github.com/Mandark-droid/genai_otel_instrument)
- [](https://github.com/Mandark-droid/genai_otel_instrument/issues)
- [](https://github.com/Mandark-droid/genai_otel_instrument/pulls)
-
- [](https://github.com/Mandark-droid/genai_otel_instrument)
- [](https://github.com/psf/black)
- [](https://pycqa.github.io/isort/)
- [](http://mypy-lang.org/)
-
- [](https://opentelemetry.io/)
- [](https://opentelemetry.io/docs/specs/semconv/gen-ai/)
- [](https://github.com/Mandark-droid/genai_otel_instrument/actions)
-
- ---
-
- <div align="center">
- <img src=".github/images/Landing_Page.jpg" alt="GenAI OpenTelemetry Instrumentation Overview" width="800"/>
- </div>
-
- ---
-
- Production-ready OpenTelemetry instrumentation for GenAI/LLM applications with zero-code setup.
-
- ## Features
-
- 🚀 **Zero-Code Instrumentation** - Just install and set env vars
- 🤖 **15+ LLM Providers** - OpenAI, Anthropic, Google, AWS, Azure, and more
- 🔧 **MCP Tool Support** - Auto-instrument databases, APIs, caches, vector DBs
- 💰 **Cost Tracking** - Automatic cost calculation for both streaming and non-streaming requests
- ⚡ **Streaming Support** - Full observability for streaming responses with TTFT/TBT metrics and cost tracking
- 🎮 **GPU Metrics** - Real-time GPU utilization, memory, temperature, power, and electricity cost tracking
- 📊 **Complete Observability** - Traces, metrics, and rich span attributes
- ➕ **Service Instance ID & Environment** - Identify your services and environments
- ⏱️ **Configurable Exporter Timeout** - Set timeout for OTLP exporter
- 🔗 **OpenInference Instrumentors** - Smolagents, MCP, and LiteLLM instrumentation
-
- ## Quick Start
-
- ### Installation
-
- ```bash
- pip install genai-otel-instrument
- ```
-
- ### Usage
-
- **Option 1: Environment Variables (No code changes)**
-
- ```bash
- export OTEL_SERVICE_NAME=my-llm-app
- export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
- python your_app.py
- ```
-
- **Option 2: One line of code**
-
- ```python
- import genai_otel
- genai_otel.instrument()
-
- # Your existing code works unchanged
- import openai
- client = openai.OpenAI()
- response = client.chat.completions.create(...)
- ```
-
- **Option 3: CLI wrapper**
-
- ```bash
- genai-instrument python your_app.py
- ```
-
- For a more comprehensive demonstration of various LLM providers and MCP tools, refer to `example_usage.py` in the project root. Note that running this example requires setting up relevant API keys and external services (e.g., databases, Redis, Pinecone).
-
- ## What Gets Instrumented?
-
- ### LLM Providers (Auto-detected)
- - **With Full Cost Tracking**: OpenAI, Anthropic, Google AI, AWS Bedrock, Azure OpenAI, Cohere, Mistral AI, Together AI, Groq, Ollama, Vertex AI
- - **Hardware/Local Pricing**: Replicate (hardware-based $/second), HuggingFace (local execution with estimated costs)
- - **HuggingFace Support**: `pipeline()`, `AutoModelForCausalLM.generate()`, `AutoModelForSeq2SeqLM.generate()`, `InferenceClient` API calls
- - **Other Providers**: Anyscale
-
- ### Frameworks
- - LangChain (chains, agents, tools)
- - LlamaIndex (query engines, indices)
-
- ### MCP Tools (Model Context Protocol)
- - **Databases**: PostgreSQL, MySQL, MongoDB, SQLAlchemy
- - **Caching**: Redis
- - **Message Queues**: Apache Kafka
- - **Vector Databases**: Pinecone, Weaviate, Qdrant, ChromaDB, Milvus, FAISS
- - **APIs**: HTTP/REST requests (requests, httpx)
-
- ### OpenInference (Optional - Python 3.10+ only)
- - Smolagents - HuggingFace smolagents framework tracing
- - MCP - Model Context Protocol instrumentation
- - LiteLLM - Multi-provider LLM proxy
-
- **Cost Enrichment:** OpenInference instrumentors are automatically enriched with cost tracking! When cost tracking is enabled (`GENAI_ENABLE_COST_TRACKING=true`), a custom `CostEnrichmentSpanProcessor` extracts model and token usage from OpenInference spans and adds cost attributes (`gen_ai.usage.cost.total`, `gen_ai.usage.cost.prompt`, `gen_ai.usage.cost.completion`) using our comprehensive pricing database of 145+ models.
-
- The processor supports OpenInference semantic conventions:
- - Model: `llm.model_name`, `embedding.model_name`
- - Tokens: `llm.token_count.prompt`, `llm.token_count.completion`
- - Operations: `openinference.span.kind` (LLM, EMBEDDING, CHAIN, RETRIEVER, etc.)
-
- **Note:** OpenInference instrumentors require Python >= 3.10. Install with:
- ```bash
- pip install genai-otel-instrument[openinference]
- ```
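The enrichment step described above can be pictured as a pure function over span attributes. The sketch below is illustrative only — it is not the library's actual `CostEnrichmentSpanProcessor` — and the `PRICES_PER_1K` table and `enrich_with_cost` name are invented for the example; only the attribute names and the per-1k-token pricing format come from this README:

```python
# Hypothetical pricing table in the $/1k-token format used by llm_pricing.json.
PRICES_PER_1K = {"gpt-4o": {"promptPrice": 0.0025, "completionPrice": 0.01}}

def enrich_with_cost(attributes: dict) -> dict:
    """Derive gen_ai.usage.cost.* values from OpenInference span attributes."""
    model = attributes.get("llm.model_name") or attributes.get("embedding.model_name")
    pricing = PRICES_PER_1K.get(model)
    if pricing is None:
        return attributes  # unknown model: leave the span unchanged
    prompt_cost = attributes.get("llm.token_count.prompt", 0) / 1000 * pricing["promptPrice"]
    completion_cost = attributes.get("llm.token_count.completion", 0) / 1000 * pricing["completionPrice"]
    attributes.update({
        "gen_ai.usage.cost.prompt": prompt_cost,
        "gen_ai.usage.cost.completion": completion_cost,
        "gen_ai.usage.cost.total": prompt_cost + completion_cost,
    })
    return attributes
```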
-
- ## Screenshots
-
- See the instrumentation in action across different LLM providers and observability backends.
-
- ### OpenAI Instrumentation
- Full trace capture for OpenAI API calls with token usage, costs, and latency metrics.
-
- <div align="center">
- <img src=".github/images/Screenshots/Traces_OpenAI.png" alt="OpenAI Traces" width="900"/>
- </div>
-
- ### Ollama (Local LLM) Instrumentation
- Zero-code instrumentation for local models running on Ollama with comprehensive observability.
-
- <div align="center">
- <img src=".github/images/Screenshots/Traces_Ollama.png" alt="Ollama Traces" width="900"/>
- </div>
-
- ### HuggingFace Transformers
- Direct instrumentation of HuggingFace Transformers with automatic token counting and cost estimation.
-
- <div align="center">
- <img src=".github/images/Screenshots/Trace_HuggingFace_Transformer_Models.png" alt="HuggingFace Transformer Traces" width="900"/>
- </div>
-
- ### SmolAgents Framework
- Complete agent workflow tracing with tool calls, iterations, and cost breakdown.
-
- <div align="center">
- <img src=".github/images/Screenshots/Traces_SmolAgent_with_tool_calls.png" alt="SmolAgent Traces with Tool Calls" width="900"/>
- </div>
-
- ### GPU Metrics Collection
- Real-time GPU utilization, memory, temperature, and power consumption metrics.
-
- <div align="center">
- <img src=".github/images/Screenshots/GPU_Metrics.png" alt="GPU Metrics Dashboard" width="900"/>
- </div>
-
- ### Additional Screenshots
-
- - **[Token Cost Breakdown](.github/images/Screenshots/Traces_SmolAgent_Token_Cost_breakdown.png)** - Detailed token usage and cost analysis for SmolAgent workflows
- - **[OpenSearch Dashboard](.github/images/Screenshots/GENAI_OpenSearch_output.png)** - GenAI metrics visualization in OpenSearch/Kibana
-
- ---
-
- ## Demo Video
-
- Watch a comprehensive walkthrough of GenAI OpenTelemetry Auto-Instrumentation in action, demonstrating setup, configuration, and real-time observability across multiple LLM providers.
-
- <div align="center">
-
- **🎥 [Watch Demo Video](https://youtu.be/YOUR_VIDEO_ID_HERE)**
- *(Coming Soon)*
-
- </div>
-
- ---
-
- ## Cost Tracking Coverage
-
- The library includes comprehensive cost tracking with pricing data for **145+ models** across **11 providers**:
-
- ### Providers with Full Token-Based Cost Tracking
- - **OpenAI**: GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo, o1/o3 series, embeddings, audio, vision (35+ models)
- - **Anthropic**: Claude 3.5 Sonnet/Opus/Haiku, Claude 3 series (10+ models)
- - **Google AI**: Gemini 1.5/2.0 Pro/Flash, PaLM 2 (12+ models)
- - **AWS Bedrock**: Amazon Titan, Claude, Llama, Mistral models (20+ models)
- - **Azure OpenAI**: Same as OpenAI with Azure-specific pricing
- - **Cohere**: Command R/R+, Command Light, Embed v3/v2 (8+ models)
- - **Mistral AI**: Mistral Large/Medium/Small, Mixtral, embeddings (8+ models)
- - **Together AI**: DeepSeek-R1, Llama 3.x, Qwen, Mixtral (25+ models)
- - **Groq**: Llama 3.x series, Mixtral, Gemma models (15+ models)
- - **Ollama**: Local models with token tracking (pricing via cost estimation)
- - **Vertex AI**: Gemini models via Google Cloud with usage metadata extraction
-
- ### Special Pricing Models
- - **Replicate**: Hardware-based pricing ($/second of GPU/CPU time) - not token-based
- - **HuggingFace Transformers**: Local model execution with estimated costs based on parameter count
-   - Supports `pipeline()`, `AutoModelForCausalLM.generate()`, `AutoModelForSeq2SeqLM.generate()`
-   - Cost estimation uses GPU/compute resource pricing tiers (tiny/small/medium/large)
-   - Automatic token counting from tensor shapes
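To illustrate what counting tokens from tensor shapes means in practice, here is a minimal sketch, under the assumption of a decoder-only model (where `generate()` returns the prompt followed by the newly generated tokens); this is not the library's internal code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("OpenTelemetry provides observability", return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=16)

prompt_tokens = input_ids.shape[-1]                       # tokens in the prompt
completion_tokens = output_ids.shape[-1] - prompt_tokens  # newly generated tokens
```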
-
- ### Pricing Features
- - **Differential Pricing**: Separate rates for prompt tokens vs. completion tokens
- - **Reasoning Tokens**: Special pricing for OpenAI o1/o3 reasoning tokens
- - **Cache Pricing**: Anthropic prompt caching costs (read/write)
- - **Granular Cost Metrics**: Per-request cost breakdown by token type
- - **Auto-Updated Pricing**: Pricing data maintained in `llm_pricing.json`
- - **Custom Pricing**: Add pricing for custom/proprietary models via environment variable
-
- ### Adding Custom Model Pricing
-
- For custom or proprietary models not in `llm_pricing.json`, you can provide custom pricing via the `GENAI_CUSTOM_PRICING_JSON` environment variable:
-
- ```bash
- # For chat models
- export GENAI_CUSTOM_PRICING_JSON='{"chat":{"my-custom-model":{"promptPrice":0.001,"completionPrice":0.002}}}'
-
- # For embeddings
- export GENAI_CUSTOM_PRICING_JSON='{"embeddings":{"my-custom-embeddings":0.00005}}'
-
- # For multiple categories
- export GENAI_CUSTOM_PRICING_JSON='{
-   "chat": {
-     "my-custom-chat": {"promptPrice": 0.001, "completionPrice": 0.002}
-   },
-   "embeddings": {
-     "my-custom-embed": 0.00005
-   },
-   "audio": {
-     "my-custom-tts": 0.02
-   }
- }'
- ```
-
- **Pricing Format:**
- - **Chat models**: `{"promptPrice": <$/1k tokens>, "completionPrice": <$/1k tokens>}`
- - **Embeddings**: Single number for price per 1k tokens
- - **Audio**: Price per 1k characters (TTS) or per second (STT)
- - **Images**: Nested structure with quality/size pricing (see `llm_pricing.json` for examples)
-
- **Hybrid Pricing:** Custom prices are merged with default pricing from `llm_pricing.json`. If you provide custom pricing for an existing model, the custom price overrides the default.
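As a worked example of the $/1k-token format above, using the hypothetical `my-custom-model` rates from the first `export` (assuming that variable is set in the environment):

```python
import json
import os

# Parse the custom pricing registered above for the "chat" category.
pricing = json.loads(os.environ["GENAI_CUSTOM_PRICING_JSON"])["chat"]["my-custom-model"]

prompt_tokens, completion_tokens = 1200, 350
cost = (prompt_tokens / 1000) * pricing["promptPrice"] \
     + (completion_tokens / 1000) * pricing["completionPrice"]
# 1.2 * 0.001 + 0.35 * 0.002 = 0.0019 USD
print(f"request cost: ${cost:.4f}")
```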
-
- **Coverage Statistics**: As of v0.1.3, 89% test coverage with 415 passing tests, including comprehensive cost calculation validation and cost enrichment processor tests (supporting both GenAI and OpenInference semantic conventions).
-
- ## Collected Telemetry
-
- ### Traces
- Every LLM call, database query, API request, and vector search is traced with full context propagation.
-
- ### Metrics
-
- **GenAI Metrics:**
- - `gen_ai.requests` - Request counts by provider and model
- - `gen_ai.client.token.usage` - Token usage (prompt/completion)
- - `gen_ai.client.operation.duration` - Request latency histogram (optimized buckets for LLM workloads)
- - `gen_ai.usage.cost` - Total estimated costs in USD
- - `gen_ai.usage.cost.prompt` - Prompt tokens cost (granular)
- - `gen_ai.usage.cost.completion` - Completion tokens cost (granular)
- - `gen_ai.usage.cost.reasoning` - Reasoning tokens cost (OpenAI o1 models)
- - `gen_ai.usage.cost.cache_read` - Cache read cost (Anthropic)
- - `gen_ai.usage.cost.cache_write` - Cache write cost (Anthropic)
- - `gen_ai.client.errors` - Error counts by operation and type
- - `gen_ai.gpu.*` - GPU utilization, memory, temperature, power (ObservableGauges)
- - `gen_ai.co2.emissions` - CO2 emissions tracking (opt-in via `GENAI_ENABLE_CO2_TRACKING`)
- - `gen_ai.power.cost` - Cumulative electricity cost in USD based on GPU power consumption (configurable via `GENAI_POWER_COST_PER_KWH`)
- - `gen_ai.server.ttft` - Time to First Token for streaming responses (histogram, 1ms-10s buckets)
- - `gen_ai.server.tbt` - Time Between Tokens for streaming responses (histogram, 10ms-2.5s buckets)
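For intuition about what the TTFT/TBT histograms measure, here is a minimal sketch of timing a streaming iterator by hand; the library records these automatically, so the `timed_stream` helper is purely illustrative:

```python
import time

def timed_stream(stream):
    """Wrap a streaming response, recording time-to-first-token and per-chunk gaps."""
    timings = {"ttft": None, "tbt": []}
    start = previous = time.perf_counter()
    for chunk in stream:
        now = time.perf_counter()
        if timings["ttft"] is None:
            timings["ttft"] = now - start          # would feed gen_ai.server.ttft
        else:
            timings["tbt"].append(now - previous)  # would feed gen_ai.server.tbt
        previous = now
        yield chunk
    if timings["ttft"] is not None:
        print(f"ttft={timings['ttft']:.3f}s over {len(timings['tbt']) + 1} chunks")
```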
-
- **MCP Metrics (Database Operations):**
- - `mcp.requests` - Number of MCP/database requests
- - `mcp.client.operation.duration` - Operation duration histogram (1ms to 10s buckets)
- - `mcp.request.size` - Request payload size histogram (100B to 5MB buckets)
- - `mcp.response.size` - Response payload size histogram (100B to 5MB buckets)
-
- ### Span Attributes
- **Core Attributes:**
- - `gen_ai.system` - Provider name (e.g., "openai")
- - `gen_ai.operation.name` - Operation type (e.g., "chat")
- - `gen_ai.request.model` - Model identifier
- - `gen_ai.usage.prompt_tokens` / `gen_ai.usage.input_tokens` - Input tokens (dual emission supported)
- - `gen_ai.usage.completion_tokens` / `gen_ai.usage.output_tokens` - Output tokens (dual emission supported)
- - `gen_ai.usage.total_tokens` - Total tokens
-
- **Request Parameters:**
- - `gen_ai.request.temperature` - Temperature setting
- - `gen_ai.request.top_p` - Top-p sampling
- - `gen_ai.request.max_tokens` - Max tokens requested
- - `gen_ai.request.frequency_penalty` - Frequency penalty
- - `gen_ai.request.presence_penalty` - Presence penalty
-
- **Response Attributes:**
- - `gen_ai.response.id` - Response ID from provider
- - `gen_ai.response.model` - Actual model used (may differ from request)
- - `gen_ai.response.finish_reasons` - Array of finish reasons
-
- **Tool/Function Calls:**
- - `llm.tools` - JSON-serialized tool definitions
- - `llm.output_messages.{choice}.message.tool_calls.{index}.tool_call.id` - Tool call ID
- - `llm.output_messages.{choice}.message.tool_calls.{index}.tool_call.function.name` - Function name
- - `llm.output_messages.{choice}.message.tool_calls.{index}.tool_call.function.arguments` - Function arguments
-
- **Cost Attributes (granular):**
- - `gen_ai.usage.cost.total` - Total cost
- - `gen_ai.usage.cost.prompt` - Prompt tokens cost
- - `gen_ai.usage.cost.completion` - Completion tokens cost
- - `gen_ai.usage.cost.reasoning` - Reasoning tokens cost (o1 models)
- - `gen_ai.usage.cost.cache_read` - Cache read cost (Anthropic)
- - `gen_ai.usage.cost.cache_write` - Cache write cost (Anthropic)
-
- **Streaming Attributes:**
- - `gen_ai.server.ttft` - Time to First Token (seconds) for streaming responses
- - `gen_ai.streaming.token_count` - Total number of chunks in streaming response
- - `gen_ai.usage.prompt_tokens` - Actual prompt tokens (extracted from final chunk)
- - `gen_ai.usage.completion_tokens` - Actual completion tokens (extracted from final chunk)
- - `gen_ai.usage.total_tokens` - Total tokens (extracted from final chunk)
- - `gen_ai.usage.cost.total` - Total cost for streaming request
- - `gen_ai.usage.cost.prompt` - Prompt tokens cost for streaming request
- - `gen_ai.usage.cost.completion` - Completion tokens cost for streaming request
- - All granular cost attributes (reasoning, cache_read, cache_write) also available for streaming
-
- **Content Events (opt-in):**
- - `gen_ai.prompt.{index}` events with role and content
- - `gen_ai.completion.{index}` events with role and content
-
- **Additional:**
- - Database, vector DB, and API attributes from MCP instrumentation
-
- ## Configuration
-
- ### Environment Variables
-
- ```bash
- # Required
- OTEL_SERVICE_NAME=my-app
- OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
-
- # Optional
- OTEL_EXPORTER_OTLP_HEADERS=x-api-key=secret
- GENAI_ENABLE_GPU_METRICS=true
- GENAI_ENABLE_COST_TRACKING=true
- GENAI_ENABLE_MCP_INSTRUMENTATION=true
- GENAI_GPU_COLLECTION_INTERVAL=5  # GPU metrics collection interval in seconds (default: 5)
- OTEL_SERVICE_INSTANCE_ID=instance-1  # Optional service instance id
- OTEL_ENVIRONMENT=production  # Optional environment
- OTEL_EXPORTER_OTLP_TIMEOUT=10.0  # Optional timeout for OTLP exporter
-
- # Semantic conventions (NEW)
- OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai  # "gen_ai" for new conventions only, "gen_ai/dup" for dual emission
- GENAI_ENABLE_CONTENT_CAPTURE=false  # WARNING: May capture sensitive data. Enable with caution.
-
- # Logging configuration
- GENAI_OTEL_LOG_LEVEL=INFO  # DEBUG, INFO, WARNING, ERROR, CRITICAL. Logs are written to 'logs/genai_otel.log' with rotation (10 files, 10MB each).
-
- # Error handling
- GENAI_FAIL_ON_ERROR=false  # true to fail fast, false to continue on errors
- ```
-
- ### Programmatic Configuration
-
- ```python
- import genai_otel
-
- genai_otel.instrument(
-     service_name="my-app",
-     endpoint="http://localhost:4318",
-     enable_gpu_metrics=True,
-     enable_cost_tracking=True,
-     enable_mcp_instrumentation=True
- )
- ```
-
- ### Sample Environment File (`sample.env`)
-
- A `sample.env` file has been generated in the project root directory. This file contains commented-out examples of all supported environment variables, along with their default values or expected formats. You can copy this file to `.env` and uncomment/modify the variables to configure the instrumentation for your specific needs.
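The instrumentation reads these settings from the process environment. If you keep them in a `.env` file, one way to load them before instrumenting — assuming the third-party `python-dotenv` package, which this project does not itself require — is:

```python
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # copies variables from .env in the current directory into os.environ

import genai_otel
genai_otel.instrument()  # now picks up the OTEL_* and GENAI_* variables
```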
-
- ## Advanced Features
-
- ### Session and User Tracking
-
- Track user sessions and identify users across multiple LLM requests for better analytics, debugging, and cost attribution.
-
- **Configuration:**
-
- ```python
- import genai_otel
- from genai_otel import OTelConfig
-
- # Define extractor functions
- def extract_session_id(instance, args, kwargs):
-     """Extract session ID from request metadata."""
-     # Option 1: From kwargs metadata
-     metadata = kwargs.get("metadata", {})
-     return metadata.get("session_id")
-
-     # Option 2: From custom headers
-     # headers = kwargs.get("headers", {})
-     # return headers.get("X-Session-ID")
-
-     # Option 3: From thread-local storage
-     # import threading
-     # return getattr(threading.current_thread(), "session_id", None)
-
- def extract_user_id(instance, args, kwargs):
-     """Extract user ID from request metadata."""
-     metadata = kwargs.get("metadata", {})
-     return metadata.get("user_id")
-
- # Configure with extractors
- config = OTelConfig(
-     service_name="my-rag-app",
-     endpoint="http://localhost:4318",
-     session_id_extractor=extract_session_id,
-     user_id_extractor=extract_user_id,
- )
-
- genai_otel.instrument(config)
- ```
-
- **Usage:**
-
- ```python
- from openai import OpenAI
-
- client = OpenAI()
-
- # Pass session and user info via metadata
- response = client.chat.completions.create(
-     model="gpt-3.5-turbo",
-     messages=[{"role": "user", "content": "What is OpenTelemetry?"}],
-     extra_body={"metadata": {"session_id": "sess_12345", "user_id": "user_alice"}}
- )
- ```
-
- **Span Attributes Added:**
- - `session.id` - Unique session identifier for tracking conversations
- - `user.id` - User identifier for per-user analytics and cost tracking
-
- **Use Cases:**
- - Track multi-turn conversations across requests
- - Analyze usage patterns per user
- - Debug session-specific issues
- - Calculate per-user costs and quotas
- - Build user-specific dashboards
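For Option 3 (thread-local storage) from the configuration example above, a minimal sketch of the wiring; the `set_session` helper and `_request_context` name are assumptions for illustration, to be called from your own request middleware:

```python
import threading

_request_context = threading.local()

def set_session(session_id: str, user_id: str) -> None:
    """Call this from your request middleware before invoking the LLM."""
    _request_context.session_id = session_id
    _request_context.user_id = user_id

def extract_session_id(instance, args, kwargs):
    return getattr(_request_context, "session_id", None)

def extract_user_id(instance, args, kwargs):
    return getattr(_request_context, "user_id", None)
```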
|
|
631
|
-
|
|
632
|
-
### RAG and Embedding Attributes
|
|
633
|
-
|
|
634
|
-
Enhanced observability for Retrieval-Augmented Generation (RAG) workflows, including embedding generation and document retrieval.
|
|
635
|
-
|
|
636
|
-
**Helper Methods:**
|
|
637
|
-
|
|
638
|
-
The `BaseInstrumentor` provides helper methods to add RAG-specific attributes to your spans:
|
|
639
|
-
|
|
640
|
-
```python
|
|
641
|
-
from opentelemetry import trace
|
|
642
|
-
from genai_otel.instrumentors.base import BaseInstrumentor
|
|
643
|
-
|
|
644
|
-
# Get your instrumentor instance (or create spans manually)
|
|
645
|
-
tracer = trace.get_tracer(__name__)
|
|
646
|
-
|
|
647
|
-
# 1. Embedding Attributes
|
|
648
|
-
with tracer.start_as_current_span("embedding.create") as span:
|
|
649
|
-
# Your embedding logic
|
|
650
|
-
embedding_response = client.embeddings.create(
|
|
651
|
-
model="text-embedding-3-small",
|
|
652
|
-
input="OpenTelemetry provides observability"
|
|
653
|
-
)
|
|
654
|
-
|
|
655
|
-
# Add embedding attributes (if using BaseInstrumentor)
|
|
656
|
-
# instrumentor.add_embedding_attributes(
|
|
657
|
-
# span,
|
|
658
|
-
# model="text-embedding-3-small",
|
|
659
|
-
# input_text="OpenTelemetry provides observability",
|
|
660
|
-
# vector=embedding_response.data[0].embedding
|
|
661
|
-
# )
|
|
662
|
-
|
|
663
|
-
# Or manually set attributes
|
|
664
|
-
span.set_attribute("embedding.model_name", "text-embedding-3-small")
|
|
665
|
-
span.set_attribute("embedding.text", "OpenTelemetry provides observability"[:500])
|
|
666
|
-
span.set_attribute("embedding.vector.dimension", len(embedding_response.data[0].embedding))
|
|
667
|
-
|
|
668
|
-
# 2. Retrieval Attributes
|
|
669
|
-
with tracer.start_as_current_span("retrieval.search") as span:
|
|
670
|
-
# Your retrieval logic
|
|
671
|
-
retrieved_docs = [
|
|
672
|
-
{
|
|
673
|
-
"id": "doc_001",
|
|
674
|
-
"score": 0.95,
|
|
675
|
-
"content": "OpenTelemetry is an observability framework...",
|
|
676
|
-
"metadata": {"source": "docs.opentelemetry.io", "category": "intro"}
|
|
677
|
-
},
|
|
678
|
-
# ... more documents
|
|
679
|
-
]
|
|
680
|
-
|
|
681
|
-
# Add retrieval attributes (if using BaseInstrumentor)
|
|
682
|
-
# instrumentor.add_retrieval_attributes(
|
|
683
|
-
# span,
|
|
684
|
-
# documents=retrieved_docs,
|
|
685
|
-
# query="What is OpenTelemetry?",
|
|
686
|
-
# max_docs=5
|
|
687
|
-
# )
|
|
688
|
-
|
|
689
|
-
# Or manually set attributes
|
|
690
|
-
span.set_attribute("retrieval.query", "What is OpenTelemetry?"[:500])
|
|
691
|
-
span.set_attribute("retrieval.document_count", len(retrieved_docs))
|
|
692
|
-
|
|
693
|
-
for i, doc in enumerate(retrieved_docs[:5]): # Limit to 5 docs
|
|
694
|
-
prefix = f"retrieval.documents.{i}.document"
|
|
695
|
-
span.set_attribute(f"{prefix}.id", doc["id"])
|
|
696
|
-
span.set_attribute(f"{prefix}.score", doc["score"])
|
|
697
|
-
span.set_attribute(f"{prefix}.content", doc["content"][:500])
|
|
698
|
-
|
|
699
|
-
# Add metadata
|
|
700
|
-
for key, value in doc.get("metadata", {}).items():
|
|
701
|
-
span.set_attribute(f"{prefix}.metadata.{key}", str(value))
|
|
702
|
-
```
|
|
703
|
-
|
|
704
|
-
**Embedding Attributes:**
|
|
705
|
-
- `embedding.model_name` - Embedding model used
|
|
706
|
-
- `embedding.text` - Input text (truncated to 500 chars)
|
|
707
|
-
- `embedding.vector` - Embedding vector (optional, if configured)
|
|
708
|
-
- `embedding.vector.dimension` - Vector dimensions
|
|
709
|
-
|
|
710
|
-
**Retrieval Attributes:**
|
|
711
|
-
- `retrieval.query` - Search query (truncated to 500 chars)
|
|
712
|
-
- `retrieval.document_count` - Number of documents retrieved
|
|
713
|
-
- `retrieval.documents.{i}.document.id` - Document ID
|
|
714
|
-
- `retrieval.documents.{i}.document.score` - Relevance score
|
|
715
|
-
- `retrieval.documents.{i}.document.content` - Document content (truncated to 500 chars)
|
|
716
|
-
- `retrieval.documents.{i}.document.metadata.*` - Custom metadata fields
|
|
717
|
-
|
|
718
|
-
**Safeguards:**
|
|
719
|
-
- Text content truncated to 500 characters to avoid span size explosion
|
|
720
|
-
- Document count limited to 5 by default (configurable via `max_docs`)
|
|
721
|
-
- Metadata values truncated to prevent excessive attribute counts
|
|
722
|
-
|
|
723
|
-
**Complete RAG Workflow Example:**
|
|
724
|
-
|
|
725
|
-
See `examples/phase4_session_rag_tracking.py` for a comprehensive demonstration of:
|
|
726
|
-
- Session and user tracking across RAG pipeline
|
|
727
|
-
- Embedding attribute capture
|
|
728
|
-
- Retrieval attribute capture
|
|
729
|
-
- End-to-end RAG workflow with full observability
|
|
730
|
-
|
|
731
|
-
**Use Cases:**
|
|
732
|
-
- Monitor retrieval quality and relevance scores
|
|
733
|
-
- Debug RAG pipeline performance
|
|
734
|
-
- Track embedding model usage
|
|
735
|
-
- Analyze document retrieval patterns
|
|
736
|
-
- Optimize vector search configurations
|
|
737
|
-
|
|
738
|
-
## Example: Full-Stack GenAI App
|
|
739
|
-
|
|
740
|
-
```python
|
|
741
|
-
import genai_otel
|
|
742
|
-
genai_otel.instrument()
|
|
743
|
-
|
|
744
|
-
import openai
|
|
745
|
-
import pinecone
|
|
746
|
-
import redis
|
|
747
|
-
import psycopg2
|
|
748
|
-
|
|
749
|
-
# All of these are automatically instrumented:
|
|
750
|
-
|
|
751
|
-
# Cache check
|
|
752
|
-
cache = redis.Redis().get('key')
|
|
753
|
-
|
|
754
|
-
# Vector search
|
|
755
|
-
pinecone_index = pinecone.Index("embeddings")
|
|
756
|
-
results = pinecone_index.query(vector=[...], top_k=5)
|
|
757
|
-
|
|
758
|
-
# Database query
|
|
759
|
-
conn = psycopg2.connect("dbname=mydb")
|
|
760
|
-
cursor = conn.cursor()
|
|
761
|
-
cursor.execute("SELECT * FROM context")
|
|
762
|
-
|
|
763
|
-
# LLM call with full context
|
|
764
|
-
client = openai.OpenAI()
|
|
765
|
-
response = client.chat.completions.create(
|
|
766
|
-
model="gpt-4",
|
|
767
|
-
messages=[...]
|
|
768
|
-
)
|
|
769
|
-
|
|
770
|
-
# You get:
|
|
771
|
-
# ✓ Distributed traces across all services
|
|
772
|
-
# ✓ Cost tracking for the LLM call
|
|
773
|
-
# ✓ Performance metrics for DB, cache, vector DB
|
|
774
|
-
# ✓ GPU metrics if using local models
|
|
775
|
-
# ✓ Complete observability with zero manual instrumentation
|
|
776
|
-
```
|
|
777
|
-
|
|
778
|
-
## Backend Integration
|
|
779
|
-
|
|
780
|
-
Works with any OpenTelemetry-compatible backend:
|
|
781
|
-
- Jaeger, Zipkin
|
|
782
|
-
- Prometheus, Grafana
|
|
783
|
-
- Datadog, New Relic, Honeycomb
|
|
784
|
-
- AWS X-Ray, Google Cloud Trace
|
|
785
|
-
- Elastic APM, Splunk
|
|
786
|
-
- Self-hosted OTEL Collector
|
|
787
|
-
|
|
788
|
-
## Project Structure
|
|
789
|
-
|
|
790
|
-
```bash
|
|
791
|
-
genai-otel-instrument/
|
|
792
|
-
├── setup.py
|
|
793
|
-
├── MANIFEST.in
|
|
794
|
-
├── README.md
|
|
795
|
-
├── LICENSE
|
|
796
|
-
├── example_usage.py
|
|
797
|
-
└── genai_otel/
|
|
798
|
-
├── __init__.py
|
|
799
|
-
├── config.py
|
|
800
|
-
├── auto_instrument.py
|
|
801
|
-
├── cli.py
|
|
802
|
-
├── cost_calculator.py
|
|
803
|
-
├── gpu_metrics.py
|
|
804
|
-
├── instrumentors/
|
|
805
|
-
│ ├── __init__.py
|
|
806
|
-
│ ├── base.py
|
|
807
|
-
│ └── (other instrumentor files)
|
|
808
|
-
└── mcp_instrumentors/
|
|
809
|
-
├── __init__.py
|
|
810
|
-
├── manager.py
|
|
811
|
-
└── (other mcp files)
|
|
812
|
-
```
|
|
813
|
-
|
|
814
|
-
## Roadmap
|
|
815
|
-
|
|
816
|
-
### Next Release (v0.2.0) - Q1 2026
|
|
817
|
-
|
|
818
|
-
We're planning significant enhancements for the next major release, focusing on evaluation metrics and safety guardrails alongside completing OpenTelemetry semantic convention compliance.
|
|
819
|
-
|
|
820
|
-
#### 🎯 Evaluation & Monitoring
|
|
821
|
-
|
|
822
|
-
**LLM Output Quality Metrics**
|
|
823
|
-
- **Bias Detection** - Automatically detect and measure bias in LLM responses
|
|
824
|
-
- Gender, racial, political, and cultural bias detection
|
|
825
|
-
- Bias score metrics with configurable thresholds
|
|
826
|
-
- Integration with fairness libraries (e.g., Fairlearn, AIF360)
|
|
827
|
-
|
|
828
|
-
- **Toxicity Detection** - Monitor and alert on toxic or harmful content
|
|
829
|
-
- Perspective API integration for toxicity scoring
|
|
830
|
-
- Custom toxicity models support
|
|
831
|
-
- Real-time toxicity metrics and alerts
|
|
832
|
-
- Configurable severity levels
|
|
833
|
-
|
|
834
|
-
- **Hallucination Detection** - Track factual accuracy and groundedness
|
|
835
|
-
- Fact-checking against provided context
|
|
836
|
-
- Citation validation for RAG applications
|
|
837
|
-
- Confidence scoring for generated claims
|
|
838
|
-
- Hallucination rate metrics by model and use case
|
|
839
|
-
|
|
840
|
-
**Implementation:**
|
|
841
|
-
```python
|
|
842
|
-
import genai_otel
|
|
843
|
-
|
|
844
|
-
# Enable evaluation metrics
|
|
845
|
-
genai_otel.instrument(
|
|
846
|
-
enable_bias_detection=True,
|
|
847
|
-
enable_toxicity_detection=True,
|
|
848
|
-
enable_hallucination_detection=True,
|
|
849
|
-
|
|
850
|
-
# Configure thresholds
|
|
851
|
-
bias_threshold=0.7,
|
|
852
|
-
toxicity_threshold=0.5,
|
|
853
|
-
hallucination_threshold=0.8
|
|
854
|
-
)
|
|
855
|
-
```
|
|
856
|
-
|
|
857
|
-
**Metrics Added:**
|
|
858
|
-
- `gen_ai.eval.bias_score` - Bias detection scores (histogram)
|
|
859
|
-
- `gen_ai.eval.toxicity_score` - Toxicity scores (histogram)
|
|
860
|
-
- `gen_ai.eval.hallucination_score` - Hallucination probability (histogram)
|
|
861
|
-
- `gen_ai.eval.violations` - Count of threshold violations by type
|
|
862
|
-
|
|
863
|
-
#### 🛡️ Safety Guardrails
|
|
864
|
-
|
|
865
|
-
**Input/Output Filtering**
|
|
866
|
-
- **Prompt Injection Detection** - Protect against prompt injection attacks
|
|
867
|
-
- Pattern-based detection (jailbreaking attempts)
|
|
868
|
-
- ML-based classifier for sophisticated attacks
|
|
869
|
-
- Real-time blocking with configurable policies
|
|
870
|
-
- Attack attempt metrics and logging
|
|
871
|
-
|
|
872
|
-
- **Restricted Topics** - Block sensitive or inappropriate topics
|
|
873
|
-
- Configurable topic blacklists (legal, medical, financial advice)
|
|
874
|
-
- Industry-specific content filters
|
|
875
|
-
- Topic detection with confidence scoring
|
|
876
|
-
- Custom topic definition support
|
|
877
|
-
|
|
878
|
-
- **Sensitive Information Protection** - Prevent PII leakage
|
|
879
|
-
- PII detection (emails, phone numbers, SSN, credit cards)
|
|
880
|
-
- Automatic redaction or blocking
|
|
881
|
-
- Compliance mode (GDPR, HIPAA, PCI-DSS)
|
|
882
|
-
- Data leak prevention metrics
|
|
883
|
-
|
|
884
|
-
**Implementation:**
|
|
885
|
-
```python
|
|
886
|
-
import genai_otel
|
|
887
|
-
|
|
888
|
-
# Configure guardrails
|
|
889
|
-
genai_otel.instrument(
|
|
890
|
-
enable_prompt_injection_detection=True,
|
|
891
|
-
enable_restricted_topics=True,
|
|
892
|
-
enable_sensitive_info_detection=True,
|
|
893
|
-
|
|
894
|
-
# Custom configuration
|
|
895
|
-
restricted_topics=["medical_advice", "legal_advice", "financial_advice"],
|
|
896
|
-
pii_detection_mode="block", # or "redact", "warn"
|
|
897
|
-
|
|
898
|
-
# Callbacks for custom handling
|
|
899
|
-
on_guardrail_violation=my_violation_handler
|
|
900
|
-
)
|
|
901
|
-
```
|
|
902
|
-
|
|
903
|
-
**Metrics Added:**
|
|
904
|
-
- `gen_ai.guardrail.prompt_injection_detected` - Injection attempts blocked
|
|
905
|
-
- `gen_ai.guardrail.restricted_topic_blocked` - Restricted topic violations
|
|
906
|
-
- `gen_ai.guardrail.pii_detected` - PII detection events
|
|
907
|
-
- `gen_ai.guardrail.violations` - Total guardrail violations by type
|
|
908
|
-
|
|
909
|
-
**Span Attributes:**
|
|
910
|
-
- `gen_ai.guardrail.violation_type` - Type of violation detected
|
|
911
|
-
- `gen_ai.guardrail.violation_severity` - Severity level (low, medium, high, critical)
|
|
912
|
-
- `gen_ai.guardrail.blocked` - Whether request was blocked (boolean)
|
|
913
|
-
- `gen_ai.eval.bias_categories` - Detected bias types (array)
|
|
914
|
-
- `gen_ai.eval.toxicity_categories` - Toxicity categories (array)
|
|
915
|
-
|
|
916
|
-
#### 🔄 Migration Support
|
|
917
|
-
|
|
918
|
-
**Backward Compatibility:**
|
|
919
|
-
- All new features are opt-in via configuration
|
|
920
|
-
- Existing instrumentation continues to work unchanged
|
|
921
|
-
- Gradual migration path for new semantic conventions
|
|
922
|
-
|
|
923
|
-
**Version Support:**
|
|
924
|
-
- Python 3.9+ (evaluation features require 3.10+)
|
|
925
|
-
- OpenTelemetry SDK 1.20.0+
|
|
926
|
-
- Backward compatible with existing dashboards
|
|
927
|
-
|
|
928
|
-
### Future Releases
|
|
929
|
-
|
|
930
|
-
**v0.3.0 - Advanced Analytics**
|
|
931
|
-
- Custom metric aggregations
|
|
932
|
-
- Cost optimization recommendations
|
|
933
|
-
- Automated performance regression detection
|
|
934
|
-
- A/B testing support for prompts
|
|
935
|
-
|
|
936
|
-
**v0.4.0 - Enterprise Features**
|
|
937
|
-
- Multi-tenancy support
|
|
938
|
-
- Role-based access control for telemetry
|
|
939
|
-
- Advanced compliance reporting
|
|
940
|
-
- SLA monitoring and alerting
|
|
941
|
-
|
|
942
|
-
**Community Feedback**
|
|
943
|
-
|
|
944
|
-
We welcome feedback on our roadmap! Please:
|
|
945
|
-
- Open issues for feature requests
|
|
946
|
-
- Join discussions on prioritization
|
|
947
|
-
- Share your use cases and requirements
|
|
948
|
-
|
|
949
|
-
See [Contributing.md](Contributing.md) for how to get involved.
|
|
950
|
-
|
|
951
|
-
## License
|
|
952
|
-
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: genai-otel-instrument
|
|
3
|
+
Version: 0.1.16
|
|
4
|
+
Summary: Comprehensive OpenTelemetry auto-instrumentation for LLM/GenAI applications
|
|
5
|
+
Author-email: Kshitij Thakkar <kshitijthakkar@rocketmail.com>
|
|
6
|
+
License: AGPL-3.0-or-later
|
|
7
|
+
Project-URL: Homepage, https://github.com/Mandark-droid/genai_otel_instrument
|
|
8
|
+
Project-URL: Repository, https://github.com/Mandark-droid/genai_otel_instrument
|
|
9
|
+
Project-URL: Documentation, https://github.com/Mandark-droid/genai_otel_instrument#readme
|
|
10
|
+
Project-URL: Issues, https://github.com/Mandark-droid/genai_otel_instrument/issues
|
|
11
|
+
Project-URL: Changelog, https://github.com/Mandark-droid/genai_otel_instrument/blob/main/CHANGELOG.md
|
|
12
|
+
Keywords: opentelemetry,observability,llm,genai,instrumentation,tracing,metrics,monitoring
|
|
13
|
+
Classifier: Development Status :: 4 - Beta
|
|
14
|
+
Classifier: Intended Audience :: Developers
|
|
15
|
+
Classifier: Topic :: Software Development :: Libraries :: Python Modules
|
|
16
|
+
Classifier: Topic :: System :: Monitoring
|
|
17
|
+
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
|
|
18
|
+
Classifier: Operating System :: OS Independent
|
|
19
|
+
Classifier: Programming Language :: Python :: 3
|
|
20
|
+
Classifier: Programming Language :: Python :: 3.9
|
|
21
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
22
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
23
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
24
|
+
Requires-Python: >=3.9
|
|
25
|
+
Description-Content-Type: text/markdown
|
|
26
|
+
License-File: LICENSE
|
|
27
|
+
Requires-Dist: opentelemetry-api<2.0.0,>=1.20.0
|
|
28
|
+
Requires-Dist: opentelemetry-sdk<2.0.0,>=1.20.0
|
|
29
|
+
Requires-Dist: opentelemetry-instrumentation>=0.41b0
|
|
30
|
+
Requires-Dist: opentelemetry-semantic-conventions<1.0.0,>=0.45b0
|
|
31
|
+
Requires-Dist: opentelemetry-exporter-otlp>=1.20.0
|
|
32
|
+
Requires-Dist: opentelemetry-instrumentation-requests>=0.41b0
|
|
33
|
+
Requires-Dist: opentelemetry-instrumentation-httpx>=0.41b0
|
|
34
|
+
Requires-Dist: requests>=2.20.0
|
|
35
|
+
Requires-Dist: wrapt>=1.14.0
|
|
36
|
+
Requires-Dist: httpx>=0.23.0
|
|
37
|
+
Requires-Dist: opentelemetry-instrumentation-mysql>=0.41b0
|
|
38
|
+
Requires-Dist: mysql-connector-python<9.0.0,>=8.0.0
|
|
39
|
+
Requires-Dist: opentelemetry-instrumentation-psycopg2>=0.41b0
|
|
40
|
+
Requires-Dist: psycopg2-binary>=2.9.0
|
|
41
|
+
Requires-Dist: opentelemetry-instrumentation-redis>=0.41b0
|
|
42
|
+
Requires-Dist: redis
|
|
43
|
+
Requires-Dist: opentelemetry-instrumentation-pymongo>=0.41b0
|
|
44
|
+
Requires-Dist: pymongo
|
|
45
|
+
Requires-Dist: opentelemetry-instrumentation-sqlalchemy>=0.41b0
|
|
46
|
+
Requires-Dist: sqlalchemy>=1.4.0
|
|
47
|
+
Requires-Dist: opentelemetry-instrumentation-kafka-python>=0.41b0
|
|
48
|
+
Requires-Dist: kafka-python
|
|
49
|
+
Provides-Extra: openinference
|
|
50
|
+
Requires-Dist: openinference-instrumentation==0.1.31; extra == "openinference"
|
|
51
|
+
Requires-Dist: openinference-instrumentation-litellm==0.1.19; extra == "openinference"
|
|
52
|
+
Requires-Dist: openinference-instrumentation-mcp==1.3.0; extra == "openinference"
|
|
53
|
+
Requires-Dist: openinference-instrumentation-smolagents==0.1.11; extra == "openinference"
|
|
54
|
+
Requires-Dist: litellm>=1.0.0; extra == "openinference"
|
|
55
|
+
Provides-Extra: gpu
|
|
56
|
+
Requires-Dist: nvidia-ml-py>=11.495.46; extra == "gpu"
|
|
57
|
+
Requires-Dist: codecarbon>=2.3.0; extra == "gpu"
|
|
58
|
+
Provides-Extra: co2
|
|
59
|
+
Requires-Dist: codecarbon>=2.3.0; extra == "co2"
|
|
60
|
+
Provides-Extra: openai
|
|
61
|
+
Requires-Dist: openai>=1.0.0; extra == "openai"
|
|
62
|
+
Provides-Extra: anthropic
|
|
63
|
+
Requires-Dist: anthropic>=0.18.0; extra == "anthropic"
|
|
64
|
+
Provides-Extra: google
|
|
65
|
+
Requires-Dist: google-generativeai>=0.3.0; extra == "google"
|
|
66
|
+
Provides-Extra: aws
|
|
67
|
Requires-Dist: boto3>=1.28.0; extra == "aws"
Provides-Extra: azure
Requires-Dist: azure-ai-openai>=1.0.0; extra == "azure"
Provides-Extra: cohere
Requires-Dist: cohere>=4.0.0; extra == "cohere"
Provides-Extra: mistral
Requires-Dist: mistralai>=0.4.2; extra == "mistral"
Provides-Extra: together
Requires-Dist: together>=0.2.0; extra == "together"
Provides-Extra: groq
Requires-Dist: groq>=0.4.0; extra == "groq"
Provides-Extra: ollama
Requires-Dist: ollama>=0.1.0; extra == "ollama"
Provides-Extra: replicate
Requires-Dist: replicate>=0.15.0; extra == "replicate"
Provides-Extra: langchain
Requires-Dist: langchain>=0.1.0; extra == "langchain"
Provides-Extra: llamaindex
Requires-Dist: llama-index>=0.9.0; extra == "llamaindex"
Provides-Extra: huggingface
Requires-Dist: transformers>=4.30.0; extra == "huggingface"
Provides-Extra: databases
Requires-Dist: opentelemetry-instrumentation-sqlalchemy>=0.41b0; extra == "databases"
Requires-Dist: sqlalchemy>=1.4.0; extra == "databases"
Requires-Dist: opentelemetry-instrumentation-redis>=0.41b0; extra == "databases"
Requires-Dist: redis; extra == "databases"
Requires-Dist: opentelemetry-instrumentation-pymongo>=0.41b0; extra == "databases"
Requires-Dist: pymongo; extra == "databases"
Requires-Dist: opentelemetry-instrumentation-psycopg2>=0.41b0; extra == "databases"
Requires-Dist: psycopg2-binary>=2.9.0; extra == "databases"
Requires-Dist: opentelemetry-instrumentation-mysql>=0.41b0; extra == "databases"
Requires-Dist: mysql-connector-python<9.0.0,>=8.0.0; extra == "databases"
Provides-Extra: messaging
Requires-Dist: opentelemetry-instrumentation-kafka-python>=0.41b0; extra == "messaging"
Requires-Dist: kafka-python; extra == "messaging"
Provides-Extra: vector-dbs
Requires-Dist: pinecone>=3.0.0; extra == "vector-dbs"
Requires-Dist: weaviate-client>=3.0.0; extra == "vector-dbs"
Requires-Dist: qdrant-client>=1.0.0; extra == "vector-dbs"
Requires-Dist: chromadb>=0.4.0; extra == "vector-dbs"
Requires-Dist: pymilvus>=2.3.0; extra == "vector-dbs"
Requires-Dist: faiss-cpu>=1.7.0; extra == "vector-dbs"
Provides-Extra: all-providers
Requires-Dist: openai>=1.0.0; extra == "all-providers"
Requires-Dist: anthropic>=0.18.0; extra == "all-providers"
Requires-Dist: google-generativeai>=0.3.0; extra == "all-providers"
Requires-Dist: boto3>=1.28.0; extra == "all-providers"
Requires-Dist: azure-ai-openai>=1.0.0; extra == "all-providers"
Requires-Dist: cohere>=4.0.0; extra == "all-providers"
Requires-Dist: mistralai>=0.4.2; extra == "all-providers"
Requires-Dist: together>=0.2.0; extra == "all-providers"
Requires-Dist: groq>=0.4.0; extra == "all-providers"
Requires-Dist: ollama>=0.1.0; extra == "all-providers"
Requires-Dist: replicate>=0.15.0; extra == "all-providers"
Requires-Dist: langchain>=0.1.0; extra == "all-providers"
Requires-Dist: llama-index>=0.9.0; extra == "all-providers"
Requires-Dist: transformers>=4.30.0; extra == "all-providers"
Requires-Dist: litellm>=1.0.0; extra == "all-providers"
Provides-Extra: all-mcp
Requires-Dist: opentelemetry-instrumentation-sqlalchemy>=0.41b0; extra == "all-mcp"
Requires-Dist: opentelemetry-instrumentation-redis>=0.41b0; extra == "all-mcp"
Requires-Dist: opentelemetry-instrumentation-pymongo>=0.41b0; extra == "all-mcp"
Requires-Dist: opentelemetry-instrumentation-psycopg2>=0.41b0; extra == "all-mcp"
Requires-Dist: opentelemetry-instrumentation-mysql>=0.41b0; extra == "all-mcp"
Requires-Dist: opentelemetry-instrumentation-kafka-python>=0.41b0; extra == "all-mcp"
Requires-Dist: pinecone>=3.0.0; extra == "all-mcp"
Requires-Dist: weaviate-client>=3.0.0; extra == "all-mcp"
Requires-Dist: qdrant-client>=1.0.0; extra == "all-mcp"
Requires-Dist: chromadb>=0.4.0; extra == "all-mcp"
Requires-Dist: pymilvus>=2.3.0; extra == "all-mcp"
Requires-Dist: faiss-cpu>=1.7.0; extra == "all-mcp"
Requires-Dist: sqlalchemy; extra == "all-mcp"
Provides-Extra: all
Requires-Dist: openai>=1.0.0; extra == "all"
Requires-Dist: anthropic>=0.18.0; extra == "all"
Requires-Dist: google-generativeai>=0.3.0; extra == "all"
Requires-Dist: boto3>=1.28.0; extra == "all"
Requires-Dist: azure-ai-openai>=1.0.0; extra == "all"
Requires-Dist: cohere>=4.0.0; extra == "all"
Requires-Dist: mistralai>=0.4.2; extra == "all"
Requires-Dist: together>=0.2.0; extra == "all"
Requires-Dist: groq>=0.4.0; extra == "all"
Requires-Dist: ollama>=0.1.0; extra == "all"
Requires-Dist: replicate>=0.15.0; extra == "all"
Requires-Dist: langchain>=0.1.0; extra == "all"
Requires-Dist: llama-index>=0.9.0; extra == "all"
Requires-Dist: transformers>=4.30.0; extra == "all"
Requires-Dist: nvidia-ml-py>=11.495.46; extra == "all"
Requires-Dist: opentelemetry-instrumentation-sqlalchemy>=0.41b0; extra == "all"
Requires-Dist: opentelemetry-instrumentation-redis>=0.41b0; extra == "all"
Requires-Dist: opentelemetry-instrumentation-pymongo>=0.41b0; extra == "all"
Requires-Dist: opentelemetry-instrumentation-psycopg2>=0.41b0; extra == "all"
Requires-Dist: opentelemetry-instrumentation-mysql>=0.41b0; extra == "all"
Requires-Dist: opentelemetry-instrumentation-kafka-python>=0.41b0; extra == "all"
Requires-Dist: pinecone>=3.0.0; extra == "all"
Requires-Dist: weaviate-client>=3.0.0; extra == "all"
Requires-Dist: qdrant-client>=1.0.0; extra == "all"
Requires-Dist: chromadb>=0.4.0; extra == "all"
Requires-Dist: pymilvus>=2.3.0; extra == "all"
Requires-Dist: faiss-cpu>=1.7.0; extra == "all"
Requires-Dist: sqlalchemy; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: pylint>=2.17.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: build>=0.10.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Dynamic: license-file

# TraceVerde

<div align="center">
  <img src=".github/images/Logo.jpg" alt="TraceVerde - GenAI OpenTelemetry Instrumentation Logo" width="400"/>
</div>

<br/>

[PyPI version](https://badge.fury.io/py/genai-otel-instrument) · [Python versions](https://pypi.org/project/genai-otel-instrument/) · [License: AGPL v3](https://www.gnu.org/licenses/agpl-3.0) · [Downloads](https://pepy.tech/project/genai-otel-instrument)

[GitHub](https://github.com/Mandark-droid/genai_otel_instrument) · [Issues](https://github.com/Mandark-droid/genai_otel_instrument/issues) · [Pull requests](https://github.com/Mandark-droid/genai_otel_instrument/pulls)

[Code style: black](https://github.com/psf/black) · [Imports: isort](https://pycqa.github.io/isort/) · [Checked with mypy](http://mypy-lang.org/)

[OpenTelemetry](https://opentelemetry.io/) · [GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) · [CI/CD](https://github.com/Mandark-droid/genai_otel_instrument/actions)

---

<div align="center">
  <img src=".github/images/Landing_Page.jpg" alt="GenAI OpenTelemetry Instrumentation Overview" width="800"/>
</div>

---

Production-ready OpenTelemetry instrumentation for GenAI/LLM applications with zero-code setup.

## Features

🚀 **Zero-Code Instrumentation** - Just install and set env vars
🤖 **15+ LLM Providers** - OpenAI, Anthropic, Google, AWS, Azure, and more
🔧 **MCP Tool Support** - Auto-instrument databases, APIs, caches, vector DBs
💰 **Cost Tracking** - Automatic cost calculation for both streaming and non-streaming requests
⚡ **Streaming Support** - Full observability for streaming responses with TTFT/TBT metrics and cost tracking
🎮 **GPU Metrics** - Real-time GPU utilization, memory, temperature, power, and electricity cost tracking
📊 **Complete Observability** - Traces, metrics, and rich span attributes
➕ **Service Instance ID & Environment** - Identify your services and environments
⏱️ **Configurable Exporter Timeout** - Set timeout for OTLP exporter
🔗 **OpenInference Instrumentors** - Smolagents, MCP, and LiteLLM instrumentation

## Quick Start

### Installation

```bash
pip install genai-otel-instrument
```
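
**Tip:** Provider and tool integrations ship as optional extras (declared in the package metadata above), so you can install only what you use; for example, `pip install "genai-otel-instrument[databases,vector-dbs]"` adds database and vector-DB instrumentation, and `[all]` pulls in everything.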

### Usage

**Option 1: Environment Variables (No code changes)**

```bash
export OTEL_SERVICE_NAME=my-llm-app
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
python your_app.py
```

**Option 2: One line of code**

```python
import genai_otel
genai_otel.instrument()

# Your existing code works unchanged
import openai
client = openai.OpenAI()
response = client.chat.completions.create(...)
```

**Option 3: CLI wrapper**

```bash
genai-instrument python your_app.py
```

For a more comprehensive demonstration of various LLM providers and MCP tools, refer to `example_usage.py` in the project root. Note that running this example requires setting up the relevant API keys and external services (e.g., databases, Redis, Pinecone).

## What Gets Instrumented?

### LLM Providers (Auto-detected)
- **With Full Cost Tracking**: OpenAI, Anthropic, Google AI, AWS Bedrock, Azure OpenAI, Cohere, Mistral AI, Together AI, Groq, Ollama, Vertex AI
- **Hardware/Local Pricing**: Replicate (hardware-based $/second), HuggingFace (local execution with estimated costs)
- **HuggingFace Support**: `pipeline()`, `AutoModelForCausalLM.generate()`, `AutoModelForSeq2SeqLM.generate()`, `InferenceClient` API calls
- **Other Providers**: Anyscale

### Frameworks
- LangChain (chains, agents, tools)
- LlamaIndex (query engines, indices)

### MCP Tools (Model Context Protocol)
- **Databases**: PostgreSQL, MySQL, MongoDB, SQLAlchemy
- **Caching**: Redis
- **Message Queues**: Apache Kafka
- **Vector Databases**: Pinecone, Weaviate, Qdrant, ChromaDB, Milvus, FAISS
- **APIs**: HTTP/REST requests (requests, httpx)

### OpenInference (Optional - Python 3.10+ only)
- Smolagents - HuggingFace smolagents framework tracing
- MCP - Model Context Protocol instrumentation
- LiteLLM - Multi-provider LLM proxy

**Cost Enrichment:** OpenInference instrumentors are automatically enriched with cost tracking! When cost tracking is enabled (`GENAI_ENABLE_COST_TRACKING=true`), a custom `CostEnrichmentSpanProcessor` extracts model and token usage from OpenInference spans and adds cost attributes (`gen_ai.usage.cost.total`, `gen_ai.usage.cost.prompt`, `gen_ai.usage.cost.completion`) using our comprehensive pricing database of 145+ models.

The processor supports OpenInference semantic conventions:
- Model: `llm.model_name`, `embedding.model_name`
- Tokens: `llm.token_count.prompt`, `llm.token_count.completion`
- Operations: `openinference.span.kind` (LLM, EMBEDDING, CHAIN, RETRIEVER, etc.)

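To make the enrichment concrete, here is a minimal, illustrative sketch of how a span processor can derive cost from those attributes. The `CostEnrichmentSketch` class and its pricing table are hypothetical stand-ins; the library's actual `CostEnrichmentSpanProcessor` records the results as `gen_ai.usage.cost.*` span attributes rather than printing them:

```python
from opentelemetry.sdk.trace import ReadableSpan, SpanProcessor

# Hypothetical per-1k-token prices; the real table ships in llm_pricing.json.
PRICING = {"gpt-4o": {"prompt": 0.0025, "completion": 0.01}}

class CostEnrichmentSketch(SpanProcessor):
    """Illustrative stand-in for the library's CostEnrichmentSpanProcessor."""

    def on_end(self, span: ReadableSpan) -> None:
        attrs = span.attributes or {}
        # OpenInference conventions: llm.model_name / embedding.model_name
        model = attrs.get("llm.model_name") or attrs.get("embedding.model_name")
        prices = PRICING.get(str(model))
        if not prices:
            return  # unknown model: nothing to enrich
        prompt_cost = int(attrs.get("llm.token_count.prompt", 0)) / 1000 * prices["prompt"]
        completion_cost = int(attrs.get("llm.token_count.completion", 0)) / 1000 * prices["completion"]
        # The real processor attaches these as gen_ai.usage.cost.* attributes;
        # this sketch just reports them.
        print(f"{span.name}: ${prompt_cost + completion_cost:.6f}")
```
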
**Note:** OpenInference instrumentors require Python >= 3.10. Install with:
```bash
pip install genai-otel-instrument[openinference]
```

## Screenshots

See the instrumentation in action across different LLM providers and observability backends.

### OpenAI Instrumentation
Full trace capture for OpenAI API calls with token usage, costs, and latency metrics.

<div align="center">
  <img src=".github/images/Screenshots/Traces_OpenAI.png" alt="OpenAI Traces" width="900"/>
</div>

### Ollama (Local LLM) Instrumentation
Zero-code instrumentation for local models running on Ollama with comprehensive observability.

<div align="center">
  <img src=".github/images/Screenshots/Traces_Ollama.png" alt="Ollama Traces" width="900"/>
</div>

### HuggingFace Transformers
Direct instrumentation of HuggingFace Transformers with automatic token counting and cost estimation.

<div align="center">
  <img src=".github/images/Screenshots/Trace_HuggingFace_Transformer_Models.png" alt="HuggingFace Transformer Traces" width="900"/>
</div>

### SmolAgents Framework
Complete agent workflow tracing with tool calls, iterations, and cost breakdown.

<div align="center">
  <img src=".github/images/Screenshots/Traces_SmolAgent_with_tool_calls.png" alt="SmolAgent Traces with Tool Calls" width="900"/>
</div>

### GPU Metrics Collection
Real-time GPU utilization, memory, temperature, and power consumption metrics.

<div align="center">
  <img src=".github/images/Screenshots/GPU_Metrics.png" alt="GPU Metrics Dashboard" width="900"/>
</div>

### Additional Screenshots

- **[Token Cost Breakdown](.github/images/Screenshots/Traces_SmolAgent_Token_Cost_breakdown.png)** - Detailed token usage and cost analysis for SmolAgent workflows
- **[OpenSearch Dashboard](.github/images/Screenshots/GENAI_OpenSearch_output.png)** - GenAI metrics visualization in OpenSearch/Kibana

---

## Demo Video

Watch a comprehensive walkthrough of GenAI OpenTelemetry Auto-Instrumentation in action, demonstrating setup, configuration, and real-time observability across multiple LLM providers.

<div align="center">

**🎥 [Watch Demo Video](https://youtu.be/YOUR_VIDEO_ID_HERE)**
*(Coming Soon)*

</div>

---

## Cost Tracking Coverage

The library includes comprehensive cost tracking with pricing data for **145+ models** across **11 providers**:

### Providers with Full Token-Based Cost Tracking
- **OpenAI**: GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo, o1/o3 series, embeddings, audio, vision (35+ models)
- **Anthropic**: Claude 3.5 Sonnet/Opus/Haiku, Claude 3 series (10+ models)
- **Google AI**: Gemini 1.5/2.0 Pro/Flash, PaLM 2 (12+ models)
- **AWS Bedrock**: Amazon Titan, Claude, Llama, Mistral models (20+ models)
- **Azure OpenAI**: Same as OpenAI with Azure-specific pricing
- **Cohere**: Command R/R+, Command Light, Embed v3/v2 (8+ models)
- **Mistral AI**: Mistral Large/Medium/Small, Mixtral, embeddings (8+ models)
- **Together AI**: DeepSeek-R1, Llama 3.x, Qwen, Mixtral (25+ models)
- **Groq**: Llama 3.x series, Mixtral, Gemma models (15+ models)
- **Ollama**: Local models with token tracking (pricing via cost estimation)
- **Vertex AI**: Gemini models via Google Cloud with usage metadata extraction

### Special Pricing Models
- **Replicate**: Hardware-based pricing ($/second of GPU/CPU time) - not token-based
- **HuggingFace Transformers**: Local model execution with estimated costs based on parameter count
  - Supports `pipeline()`, `AutoModelForCausalLM.generate()`, `AutoModelForSeq2SeqLM.generate()`
  - Cost estimation uses GPU/compute resource pricing tiers (tiny/small/medium/large)
  - Automatic token counting from tensor shapes (see the sketch below)

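As a rough illustration of that token counting, the snippet below uses the standard `transformers` API (with `distilgpt2` as an arbitrary example model) to show how prompt and completion counts fall out of the input and output tensor shapes for a causal LM. The instrumentation does this internally; this is only a sketch of the idea:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")  # example model, not a requirement
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

inputs = tokenizer("What is OpenTelemetry?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)

# Decoder-only models return prompt + new tokens in a single tensor, so the
# counts can be read straight off the tensor shapes. (Seq2seq models return
# only the generated tokens, so the subtraction is not needed there.)
prompt_tokens = inputs["input_ids"].shape[-1]
completion_tokens = output_ids.shape[-1] - prompt_tokens
print(prompt_tokens, completion_tokens)
```
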
### Pricing Features
- **Differential Pricing**: Separate rates for prompt tokens vs. completion tokens
- **Reasoning Tokens**: Special pricing for OpenAI o1/o3 reasoning tokens
- **Cache Pricing**: Anthropic prompt caching costs (read/write)
- **Granular Cost Metrics**: Per-request cost breakdown by token type
- **Auto-Updated Pricing**: Pricing data maintained in `llm_pricing.json`
- **Custom Pricing**: Add pricing for custom/proprietary models via environment variable

### Adding Custom Model Pricing

For custom or proprietary models not in `llm_pricing.json`, you can provide custom pricing via the `GENAI_CUSTOM_PRICING_JSON` environment variable:

```bash
# For chat models
export GENAI_CUSTOM_PRICING_JSON='{"chat":{"my-custom-model":{"promptPrice":0.001,"completionPrice":0.002}}}'

# For embeddings
export GENAI_CUSTOM_PRICING_JSON='{"embeddings":{"my-custom-embeddings":0.00005}}'

# For multiple categories
export GENAI_CUSTOM_PRICING_JSON='{
  "chat": {
    "my-custom-chat": {"promptPrice": 0.001, "completionPrice": 0.002}
  },
  "embeddings": {
    "my-custom-embed": 0.00005
  },
  "audio": {
    "my-custom-tts": 0.02
  }
}'
```

**Pricing Format:**
- **Chat models**: `{"promptPrice": <$/1k tokens>, "completionPrice": <$/1k tokens>}`
- **Embeddings**: Single number for price per 1k tokens
- **Audio**: Price per 1k characters (TTS) or per second (STT)
- **Images**: Nested structure with quality/size pricing (see `llm_pricing.json` for examples)

**Hybrid Pricing:** Custom prices are merged with default pricing from `llm_pricing.json`. If you provide custom pricing for an existing model, the custom price overrides the default.

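Under this format, per-request cost is a straightforward per-1k-token calculation. A worked example using the assumed `my-custom-model` prices from the snippet above:

```python
# Assumed prices from the example above: $0.001 per 1k prompt tokens,
# $0.002 per 1k completion tokens.
prompt_price, completion_price = 0.001, 0.002
prompt_tokens, completion_tokens = 1200, 350

prompt_cost = prompt_tokens / 1000 * prompt_price              # 0.0012
completion_cost = completion_tokens / 1000 * completion_price  # 0.0007
total_cost = prompt_cost + completion_cost                     # 0.0019
print(f"${total_cost:.4f}")  # $0.0019
```
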
**Coverage Statistics**: As of v0.1.3, 89% test coverage with 415 passing tests, including comprehensive cost calculation validation and cost enrichment processor tests (supporting both GenAI and OpenInference semantic conventions).

## Collected Telemetry

### Traces
Every LLM call, database query, API request, and vector search is traced with full context propagation.

### Metrics

**GenAI Metrics:**
- `gen_ai.requests` - Request counts by provider and model
- `gen_ai.client.token.usage` - Token usage (prompt/completion)
- `gen_ai.client.operation.duration` - Request latency histogram (optimized buckets for LLM workloads)
- `gen_ai.usage.cost` - Total estimated costs in USD
- `gen_ai.usage.cost.prompt` - Prompt tokens cost (granular)
- `gen_ai.usage.cost.completion` - Completion tokens cost (granular)
- `gen_ai.usage.cost.reasoning` - Reasoning tokens cost (OpenAI o1 models)
- `gen_ai.usage.cost.cache_read` - Cache read cost (Anthropic)
- `gen_ai.usage.cost.cache_write` - Cache write cost (Anthropic)
- `gen_ai.client.errors` - Error counts by operation and type
- `gen_ai.gpu.*` - GPU utilization, memory, temperature, power (ObservableGauges)
- `gen_ai.co2.emissions` - CO2 emissions tracking (opt-in via `GENAI_ENABLE_CO2_TRACKING`)
- `gen_ai.power.cost` - Cumulative electricity cost in USD based on GPU power consumption (configurable via `GENAI_POWER_COST_PER_KWH`; see the worked example after this list)
- `gen_ai.server.ttft` - Time to First Token for streaming responses (histogram, 1ms-10s buckets)
- `gen_ai.server.tbt` - Time Between Tokens for streaming responses (histogram, 10ms-2.5s buckets)

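To show how that electricity cost accrues, here is the underlying arithmetic on hypothetical numbers (a 300 W draw for 10 minutes at a $0.12/kWh tariff, the kind of rate `GENAI_POWER_COST_PER_KWH` configures):

```python
watts, seconds, usd_per_kwh = 300, 600, 0.12  # hypothetical GPU draw and tariff

kwh = watts * seconds / 3_600_000   # W * s -> kWh, so 0.05 kWh here
cost = kwh * usd_per_kwh            # 0.006 USD
print(f"{kwh} kWh -> ${cost:.3f}")  # 0.05 kWh -> $0.006
```
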
**MCP Metrics (Database Operations):**
- `mcp.requests` - Number of MCP/database requests
- `mcp.client.operation.duration` - Operation duration histogram (1ms to 10s buckets)
- `mcp.request.size` - Request payload size histogram (100B to 5MB buckets)
- `mcp.response.size` - Response payload size histogram (100B to 5MB buckets)

### Span Attributes
**Core Attributes:**
- `gen_ai.system` - Provider name (e.g., "openai")
- `gen_ai.operation.name` - Operation type (e.g., "chat")
- `gen_ai.request.model` - Model identifier
- `gen_ai.usage.prompt_tokens` / `gen_ai.usage.input_tokens` - Input tokens (dual emission supported)
- `gen_ai.usage.completion_tokens` / `gen_ai.usage.output_tokens` - Output tokens (dual emission supported)
- `gen_ai.usage.total_tokens` - Total tokens

**Request Parameters:**
- `gen_ai.request.temperature` - Temperature setting
- `gen_ai.request.top_p` - Top-p sampling
- `gen_ai.request.max_tokens` - Max tokens requested
- `gen_ai.request.frequency_penalty` - Frequency penalty
- `gen_ai.request.presence_penalty` - Presence penalty

**Response Attributes:**
- `gen_ai.response.id` - Response ID from provider
- `gen_ai.response.model` - Actual model used (may differ from request)
- `gen_ai.response.finish_reasons` - Array of finish reasons

**Tool/Function Calls:**
- `llm.tools` - JSON-serialized tool definitions
- `llm.output_messages.{choice}.message.tool_calls.{index}.tool_call.id` - Tool call ID
- `llm.output_messages.{choice}.message.tool_calls.{index}.tool_call.function.name` - Function name
- `llm.output_messages.{choice}.message.tool_calls.{index}.tool_call.function.arguments` - Function arguments

**Cost Attributes (granular):**
- `gen_ai.usage.cost.total` - Total cost
- `gen_ai.usage.cost.prompt` - Prompt tokens cost
- `gen_ai.usage.cost.completion` - Completion tokens cost
- `gen_ai.usage.cost.reasoning` - Reasoning tokens cost (o1 models)
- `gen_ai.usage.cost.cache_read` - Cache read cost (Anthropic)
- `gen_ai.usage.cost.cache_write` - Cache write cost (Anthropic)

**Streaming Attributes:**
- `gen_ai.server.ttft` - Time to First Token (seconds) for streaming responses (see the sketch after this list)
- `gen_ai.streaming.token_count` - Total number of chunks in streaming response
- `gen_ai.usage.prompt_tokens` - Actual prompt tokens (extracted from final chunk)
- `gen_ai.usage.completion_tokens` - Actual completion tokens (extracted from final chunk)
- `gen_ai.usage.total_tokens` - Total tokens (extracted from final chunk)
- `gen_ai.usage.cost.total` - Total cost for streaming request
- `gen_ai.usage.cost.prompt` - Prompt tokens cost for streaming request
- `gen_ai.usage.cost.completion` - Completion tokens cost for streaming request
- All granular cost attributes (reasoning, cache_read, cache_write) also available for streaming

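For intuition, TTFT is just the delay between sending the request and the first streamed chunk arriving. A minimal sketch follows; the `measure_ttft` wrapper is hypothetical, and the instrumentation records `gen_ai.server.ttft` automatically, so nothing like this is needed in application code:

```python
import time

def measure_ttft(stream, start):
    """Wrap a streaming-response iterator and report time-to-first-token.

    `start` should be captured just before the request is sent.
    """
    first = None
    for chunk in stream:
        if first is None:
            first = time.perf_counter()
            print(f"TTFT: {first - start:.3f}s")
        yield chunk

# Usage sketch with an OpenAI-style stream:
# start = time.perf_counter()
# stream = client.chat.completions.create(model="gpt-4o", messages=[...], stream=True)
# for chunk in measure_ttft(stream, start):
#     ...
```
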
**Content Events (opt-in):**
- `gen_ai.prompt.{index}` events with role and content
- `gen_ai.completion.{index}` events with role and content

**Additional:**
- Database, vector DB, and API attributes from MCP instrumentation

## Configuration

### Environment Variables

```bash
# Required
OTEL_SERVICE_NAME=my-app
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# Optional
OTEL_EXPORTER_OTLP_HEADERS=x-api-key=secret
GENAI_ENABLE_GPU_METRICS=true
GENAI_ENABLE_COST_TRACKING=true
GENAI_ENABLE_MCP_INSTRUMENTATION=true
GENAI_GPU_COLLECTION_INTERVAL=5 # GPU metrics collection interval in seconds (default: 5)
OTEL_SERVICE_INSTANCE_ID=instance-1 # Optional service instance id
OTEL_ENVIRONMENT=production # Optional environment
OTEL_EXPORTER_OTLP_TIMEOUT=10.0 # Optional timeout for OTLP exporter

# Semantic conventions (NEW)
OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai # "gen_ai" for new conventions only, "gen_ai/dup" for dual emission
GENAI_ENABLE_CONTENT_CAPTURE=false # WARNING: May capture sensitive data. Enable with caution.

# Logging configuration
GENAI_OTEL_LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR, CRITICAL. Logs are written to 'logs/genai_otel.log' with rotation (10 files, 10MB each).

# Error handling
GENAI_FAIL_ON_ERROR=false # true to fail fast, false to continue on errors
```

### Programmatic Configuration

```python
import genai_otel

genai_otel.instrument(
    service_name="my-app",
    endpoint="http://localhost:4318",
    enable_gpu_metrics=True,
    enable_cost_tracking=True,
    enable_mcp_instrumentation=True,
)
```

### Sample Environment File (`sample.env`)

A `sample.env` file is provided in the project root. It contains commented-out examples of all supported environment variables, along with their default values or expected formats. Copy it to `.env` and uncomment or modify the variables to configure the instrumentation for your needs.

## Advanced Features

### Session and User Tracking

Track user sessions and identify users across multiple LLM requests for better analytics, debugging, and cost attribution.

**Configuration:**

```python
import genai_otel
from genai_otel import OTelConfig

# Define extractor functions
def extract_session_id(instance, args, kwargs):
    """Extract session ID from request metadata."""
    # Option 1: From kwargs metadata
    metadata = kwargs.get("metadata", {})
    return metadata.get("session_id")

    # Option 2: From custom headers
    # headers = kwargs.get("headers", {})
    # return headers.get("X-Session-ID")

    # Option 3: From thread-local storage
    # import threading
    # return getattr(threading.current_thread(), "session_id", None)

def extract_user_id(instance, args, kwargs):
    """Extract user ID from request metadata."""
    metadata = kwargs.get("metadata", {})
    return metadata.get("user_id")

# Configure with extractors
config = OTelConfig(
    service_name="my-rag-app",
    endpoint="http://localhost:4318",
    session_id_extractor=extract_session_id,
    user_id_extractor=extract_user_id,
)

genai_otel.instrument(config)
```

**Usage:**

```python
from openai import OpenAI

client = OpenAI()

# Pass session and user info via metadata
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is OpenTelemetry?"}],
    extra_body={"metadata": {"session_id": "sess_12345", "user_id": "user_alice"}}
)
```

**Span Attributes Added:**
- `session.id` - Unique session identifier for tracking conversations
- `user.id` - User identifier for per-user analytics and cost tracking

**Use Cases:**
- Track multi-turn conversations across requests
- Analyze usage patterns per user
- Debug session-specific issues
- Calculate per-user costs and quotas
- Build user-specific dashboards

### RAG and Embedding Attributes

Enhanced observability for Retrieval-Augmented Generation (RAG) workflows, including embedding generation and document retrieval.

**Helper Methods:**

The `BaseInstrumentor` provides helper methods to add RAG-specific attributes to your spans:

```python
from openai import OpenAI
from opentelemetry import trace
from genai_otel.instrumentors.base import BaseInstrumentor

client = OpenAI()  # embeddings client assumed for this example

# Get your instrumentor instance (or create spans manually)
tracer = trace.get_tracer(__name__)

# 1. Embedding Attributes
with tracer.start_as_current_span("embedding.create") as span:
    # Your embedding logic
    embedding_response = client.embeddings.create(
        model="text-embedding-3-small",
        input="OpenTelemetry provides observability"
    )

    # Add embedding attributes (if using BaseInstrumentor)
    # instrumentor.add_embedding_attributes(
    #     span,
    #     model="text-embedding-3-small",
    #     input_text="OpenTelemetry provides observability",
    #     vector=embedding_response.data[0].embedding
    # )

    # Or manually set attributes
    span.set_attribute("embedding.model_name", "text-embedding-3-small")
    span.set_attribute("embedding.text", "OpenTelemetry provides observability"[:500])
    span.set_attribute("embedding.vector.dimension", len(embedding_response.data[0].embedding))

# 2. Retrieval Attributes
with tracer.start_as_current_span("retrieval.search") as span:
    # Your retrieval logic
    retrieved_docs = [
        {
            "id": "doc_001",
            "score": 0.95,
            "content": "OpenTelemetry is an observability framework...",
            "metadata": {"source": "docs.opentelemetry.io", "category": "intro"}
        },
        # ... more documents
    ]

    # Add retrieval attributes (if using BaseInstrumentor)
    # instrumentor.add_retrieval_attributes(
    #     span,
    #     documents=retrieved_docs,
    #     query="What is OpenTelemetry?",
    #     max_docs=5
    # )

    # Or manually set attributes
    span.set_attribute("retrieval.query", "What is OpenTelemetry?"[:500])
    span.set_attribute("retrieval.document_count", len(retrieved_docs))

    for i, doc in enumerate(retrieved_docs[:5]):  # Limit to 5 docs
        prefix = f"retrieval.documents.{i}.document"
        span.set_attribute(f"{prefix}.id", doc["id"])
        span.set_attribute(f"{prefix}.score", doc["score"])
        span.set_attribute(f"{prefix}.content", doc["content"][:500])

        # Add metadata
        for key, value in doc.get("metadata", {}).items():
            span.set_attribute(f"{prefix}.metadata.{key}", str(value))
```

**Embedding Attributes:**
- `embedding.model_name` - Embedding model used
- `embedding.text` - Input text (truncated to 500 chars)
- `embedding.vector` - Embedding vector (optional, if configured)
- `embedding.vector.dimension` - Vector dimensions

**Retrieval Attributes:**
- `retrieval.query` - Search query (truncated to 500 chars)
- `retrieval.document_count` - Number of documents retrieved
- `retrieval.documents.{i}.document.id` - Document ID
- `retrieval.documents.{i}.document.score` - Relevance score
- `retrieval.documents.{i}.document.content` - Document content (truncated to 500 chars)
- `retrieval.documents.{i}.document.metadata.*` - Custom metadata fields

**Safeguards:**
- Text content truncated to 500 characters to avoid span size explosion
- Document count limited to 5 by default (configurable via `max_docs`)
- Metadata values truncated to prevent excessive attribute counts

**Complete RAG Workflow Example:**

See `examples/phase4_session_rag_tracking.py` for a comprehensive demonstration of:
- Session and user tracking across the RAG pipeline
- Embedding attribute capture
- Retrieval attribute capture
- End-to-end RAG workflow with full observability

**Use Cases:**
- Monitor retrieval quality and relevance scores
- Debug RAG pipeline performance
- Track embedding model usage
- Analyze document retrieval patterns
- Optimize vector search configurations

## Example: Full-Stack GenAI App

```python
import genai_otel
genai_otel.instrument()

import openai
import pinecone
import redis
import psycopg2

# All of these are automatically instrumented:

# Cache check
cache = redis.Redis().get('key')

# Vector search
pinecone_index = pinecone.Index("embeddings")
results = pinecone_index.query(vector=[...], top_k=5)

# Database query
conn = psycopg2.connect("dbname=mydb")
cursor = conn.cursor()
cursor.execute("SELECT * FROM context")

# LLM call with full context
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...]
)

# You get:
# ✓ Distributed traces across all services
# ✓ Cost tracking for the LLM call
# ✓ Performance metrics for DB, cache, vector DB
# ✓ GPU metrics if using local models
# ✓ Complete observability with zero manual instrumentation
```

## Backend Integration

Works with any OpenTelemetry-compatible backend:
- Jaeger, Zipkin
- Prometheus, Grafana
- Datadog, New Relic, Honeycomb
- AWS X-Ray, Google Cloud Trace
- Elastic APM, Splunk
- Self-hosted OTEL Collector

## Project Structure

```bash
genai-otel-instrument/
├── setup.py
├── MANIFEST.in
├── README.md
├── LICENSE
├── example_usage.py
└── genai_otel/
    ├── __init__.py
    ├── config.py
    ├── auto_instrument.py
    ├── cli.py
    ├── cost_calculator.py
    ├── gpu_metrics.py
    ├── instrumentors/
    │   ├── __init__.py
    │   ├── base.py
    │   └── (other instrumentor files)
    └── mcp_instrumentors/
        ├── __init__.py
        ├── manager.py
        └── (other mcp files)
```

## Roadmap

### Next Release (v0.2.0) - Q1 2026

We're planning significant enhancements for the next major release, focusing on evaluation metrics and safety guardrails alongside completing OpenTelemetry semantic convention compliance.

#### 🎯 Evaluation & Monitoring

**LLM Output Quality Metrics**
- **Bias Detection** - Automatically detect and measure bias in LLM responses
  - Gender, racial, political, and cultural bias detection
  - Bias score metrics with configurable thresholds
  - Integration with fairness libraries (e.g., Fairlearn, AIF360)

- **Toxicity Detection** - Monitor and alert on toxic or harmful content
  - Perspective API integration for toxicity scoring
  - Custom toxicity models support
  - Real-time toxicity metrics and alerts
  - Configurable severity levels

- **Hallucination Detection** - Track factual accuracy and groundedness
  - Fact-checking against provided context
  - Citation validation for RAG applications
  - Confidence scoring for generated claims
  - Hallucination rate metrics by model and use case

**Implementation:**
```python
import genai_otel

# Enable evaluation metrics
genai_otel.instrument(
    enable_bias_detection=True,
    enable_toxicity_detection=True,
    enable_hallucination_detection=True,

    # Configure thresholds
    bias_threshold=0.7,
    toxicity_threshold=0.5,
    hallucination_threshold=0.8
)
```

**Metrics Added:**
- `gen_ai.eval.bias_score` - Bias detection scores (histogram)
- `gen_ai.eval.toxicity_score` - Toxicity scores (histogram)
- `gen_ai.eval.hallucination_score` - Hallucination probability (histogram)
- `gen_ai.eval.violations` - Count of threshold violations by type

#### 🛡️ Safety Guardrails

**Input/Output Filtering**
- **Prompt Injection Detection** - Protect against prompt injection attacks
  - Pattern-based detection (jailbreaking attempts)
  - ML-based classifier for sophisticated attacks
  - Real-time blocking with configurable policies
  - Attack attempt metrics and logging

- **Restricted Topics** - Block sensitive or inappropriate topics
  - Configurable topic blacklists (legal, medical, financial advice)
  - Industry-specific content filters
  - Topic detection with confidence scoring
  - Custom topic definition support

- **Sensitive Information Protection** - Prevent PII leakage
  - PII detection (emails, phone numbers, SSN, credit cards)
  - Automatic redaction or blocking
  - Compliance mode (GDPR, HIPAA, PCI-DSS)
  - Data leak prevention metrics

**Implementation:**
```python
import genai_otel

# Configure guardrails
genai_otel.instrument(
    enable_prompt_injection_detection=True,
    enable_restricted_topics=True,
    enable_sensitive_info_detection=True,

    # Custom configuration
    restricted_topics=["medical_advice", "legal_advice", "financial_advice"],
    pii_detection_mode="block",  # or "redact", "warn"

    # Callbacks for custom handling
    on_guardrail_violation=my_violation_handler
)
```

**Metrics Added:**
- `gen_ai.guardrail.prompt_injection_detected` - Injection attempts blocked
- `gen_ai.guardrail.restricted_topic_blocked` - Restricted topic violations
- `gen_ai.guardrail.pii_detected` - PII detection events
- `gen_ai.guardrail.violations` - Total guardrail violations by type

**Span Attributes:**
- `gen_ai.guardrail.violation_type` - Type of violation detected
- `gen_ai.guardrail.violation_severity` - Severity level (low, medium, high, critical)
- `gen_ai.guardrail.blocked` - Whether request was blocked (boolean)
- `gen_ai.eval.bias_categories` - Detected bias types (array)
- `gen_ai.eval.toxicity_categories` - Toxicity categories (array)

#### 🔄 Migration Support

**Backward Compatibility:**
- All new features are opt-in via configuration
- Existing instrumentation continues to work unchanged
- Gradual migration path for new semantic conventions

**Version Support:**
- Python 3.9+ (evaluation features require 3.10+)
- OpenTelemetry SDK 1.20.0+
- Backward compatible with existing dashboards

### Future Releases

**v0.3.0 - Advanced Analytics**
- Custom metric aggregations
- Cost optimization recommendations
- Automated performance regression detection
- A/B testing support for prompts

**v0.4.0 - Enterprise Features**
- Multi-tenancy support
- Role-based access control for telemetry
- Advanced compliance reporting
- SLA monitoring and alerting

**Community Feedback**

We welcome feedback on our roadmap! Please:
- Open issues for feature requests
- Join discussions on prioritization
- Share your use cases and requirements

See [Contributing.md](Contributing.md) for how to get involved.

## License

TraceVerde is licensed under the GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later).

Copyright (C) 2025 Kshitij Thakkar

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

See the [LICENSE](LICENSE) file for the full license text.