@aigentsphere/openclaw-otel-observability 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (43)
  1. package/.github/workflows/ci.yml +52 -0
  2. package/.github/workflows/docs.yml +25 -0
  3. package/LICENSE +15 -0
  4. package/README.md +300 -0
  5. package/collector/README.md +186 -0
  6. package/collector/otel-collector-config.yaml +230 -0
  7. package/docker-compose.yaml +32 -0
  8. package/docs/architecture.md +319 -0
  9. package/docs/backends/dynatrace.md +168 -0
  10. package/docs/backends/generic-otlp.md +166 -0
  11. package/docs/backends/grafana.md +167 -0
  12. package/docs/backends/index.md +49 -0
  13. package/docs/backends/otel-collector.md +210 -0
  14. package/docs/configuration.md +276 -0
  15. package/docs/development.md +198 -0
  16. package/docs/getting-started.md +295 -0
  17. package/docs/index.md +139 -0
  18. package/docs/limitations.md +95 -0
  19. package/docs/security/detection.md +274 -0
  20. package/docs/security/tetragon.md +454 -0
  21. package/docs/telemetry/metrics.md +283 -0
  22. package/docs/telemetry/tokens.md +188 -0
  23. package/docs/telemetry/traces.md +165 -0
  24. package/dynatrace/security-slo-dql.md +263 -0
  25. package/index.ts +191 -0
  26. package/instrumentation/preload.mjs +59 -0
  27. package/mkdocs.yml +90 -0
  28. package/openclaw.plugin.json +99 -0
  29. package/package.json +49 -0
  30. package/src/config.ts +72 -0
  31. package/src/diagnostics.ts +214 -0
  32. package/src/hooks.ts +575 -0
  33. package/src/openllmetry.ts +27 -0
  34. package/src/security.ts +396 -0
  35. package/src/telemetry.ts +282 -0
  36. package/tetragon-policies/01-process-exec.yaml +20 -0
  37. package/tetragon-policies/02-sensitive-files.yaml +86 -0
  38. package/tetragon-policies/04-privilege-escalation.yaml +25 -0
  39. package/tetragon-policies/05-dangerous-commands.yaml +97 -0
  40. package/tetragon-policies/06-kernel-modules.yaml +27 -0
  41. package/tetragon-policies/07-prompt-injection-shell.yaml +73 -0
  42. package/tetragon-policies/README.md +143 -0
  43. package/tsconfig.json +17 -0
@@ -0,0 +1,168 @@
# Dynatrace Setup

Send OpenClaw telemetry directly to Dynatrace using OTLP.

## Prerequisites

- Dynatrace environment (SaaS or Managed)
- API token with ingest permissions

## Create API Token

1. Go to **Settings** → **Access tokens**
2. Click **Generate new token**
3. Add scopes:
    - `metrics.ingest`
    - `logs.ingest`
    - `openTelemetryTrace.ingest`
4. Copy the token (starts with `dt0c01.`)

## Configure OpenClaw

Add to `~/.openclaw/openclaw.json`:

```json
{
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "https://{environment-id}.live.dynatrace.com/api/v2/otlp",
      "headers": {
        "Authorization": "Api-Token dt0c01.XXXXXXXX"
      },
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true,
      "logs": true
    }
  }
}
```

Replace:
- `{environment-id}` — Your Dynatrace environment ID (e.g., `abc12345`)
- `dt0c01.XXXXXXXX` — Your API token

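If `openclaw.json` already holds other settings, a scripted merge avoids hand-editing mistakes. A minimal sketch, assuming `jq` is installed (the placeholders are the same as above; back the file up first):

```bash
# Merge the otel block into an existing config without touching other keys.
cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.bak
jq '.diagnostics.enabled = true
    | .diagnostics.otel = {
        enabled: true,
        endpoint: "https://{environment-id}.live.dynatrace.com/api/v2/otlp",
        headers: { Authorization: "Api-Token dt0c01.XXXXXXXX" },
        serviceName: "openclaw-gateway",
        traces: true, metrics: true, logs: true
      }' ~/.openclaw/openclaw.json.bak > ~/.openclaw/openclaw.json
```
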
### Dynatrace Managed

For Dynatrace Managed, use your ActiveGate URL:

```json
{
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "https://{your-activegate}/e/{environment-id}/api/v2/otlp",
      "headers": {
        "Authorization": "Api-Token dt0c01.XXXXXXXX"
      },
      "serviceName": "openclaw-gateway"
    }
  }
}
```

## Restart Gateway

```bash
openclaw gateway restart
```

## Verify in Dynatrace

### Find Your Service

1. Go to **Services** in Dynatrace
2. Search for `openclaw-gateway`
3. Click to view service details

### View Traces

1. Go to **Distributed traces**
2. Filter by service: `openclaw-gateway`
3. Click a trace to see spans

### View Metrics

1. Go to **Explore** → **Metrics**
2. Search for `openclaw.`
3. Available metrics:
    - `openclaw.tokens` — Token usage
    - `openclaw.cost.usd` — Cost tracking
    - `openclaw.run.duration_ms` — Agent run times
    - `openclaw.message.*` — Message processing
    - `openclaw.queue.*` — Queue metrics

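You can also confirm ingestion from the shell via the Metrics v2 API. A sketch, assuming a second token with the `metrics.read` scope (the ingest scopes above cannot read data back) stored in `DT_READ_TOKEN`:

```bash
# List the openclaw.* metric keys Dynatrace has registered so far.
curl -s "https://{environment-id}.live.dynatrace.com/api/v2/metrics?metricSelector=openclaw.*" \
  -H "Authorization: Api-Token $DT_READ_TOKEN"
```
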
### Create Dashboard

Create a dashboard with:

```sql
// Token usage by model
timeseries sum(openclaw.tokens), by:{openclaw.model, openclaw.token}

// Cost over time
timeseries sum(openclaw.cost.usd), by:{openclaw.model}

// Agent run duration
timeseries avg(openclaw.run.duration_ms), by:{openclaw.model}
```

### View Logs

1. Go to **Logs**
2. Filter: `dt.entity.service = "openclaw-gateway"`
3. View log records with severity and attributes

## Example DQL Queries

### Token Usage by Model

```sql
fetch spans
| filter dt.entity.service == "openclaw-gateway"
| summarize tokens = sum(openclaw.tokens.total), by:{openclaw.model}
| sort tokens desc
```

### Average Run Duration

```sql
fetch spans
| filter dt.entity.service == "openclaw-gateway"
| filter matchesPhrase(span.name, "model")
| summarize avg_duration = avg(openclaw.run.duration_ms)
```

### Error Rate

```sql
fetch logs
| filter dt.entity.service == "openclaw-gateway"
| filter loglevel == "ERROR"
| summarize count = count()
```

## Troubleshooting

### No Data in Dynatrace?

1. **Check token permissions**: Ensure all three scopes are enabled
2. **Verify endpoint URL**: Should be `https://{env-id}.live.dynatrace.com/api/v2/otlp`
3. **Test connectivity**:

    ```bash
    # A 4xx/405 response still proves DNS, TLS, and the auth header reach Dynatrace;
    # only a connection error means the endpoint is unreachable.
    curl -v "https://{env-id}.live.dynatrace.com/api/v2/otlp/v1/traces" \
      -H "Authorization: Api-Token dt0c01.xxx"
    ```

### Service Not Appearing?

- Wait 2-5 minutes for service detection
- Send a few messages to generate telemetry
- Check Dynatrace logs for ingest errors

### 403 Forbidden?

The token lacks the required scopes. Regenerate it with all three:
- `metrics.ingest`
- `logs.ingest`
- `openTelemetryTrace.ingest`
@@ -0,0 +1,166 @@
# Generic OTLP Backend

Any backend that supports OTLP (OpenTelemetry Protocol) can receive data from this plugin. Here are configuration snippets for popular backends.

## Honeycomb

```json
{
  "config": {
    "endpoint": "https://api.honeycomb.io",
    "protocol": "http",
    "headers": {
      "x-honeycomb-team": "<YOUR_API_KEY>"
    }
  }
}
```

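A quick reachability sketch with `curl` (`$HONEYCOMB_API_KEY` is a placeholder; an empty POST is rejected, but any HTTP status back confirms DNS, TLS, and that the header reaches the API):

```bash
# Expect an HTTP status (e.g. 4xx for the empty body) rather than a network error.
curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST "https://api.honeycomb.io/v1/traces" \
  -H "x-honeycomb-team: $HONEYCOMB_API_KEY" \
  -H "Content-Type: application/x-protobuf"
```
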
## New Relic

```json
{
  "config": {
    "endpoint": "https://otlp.nr-data.net",
    "protocol": "http",
    "headers": {
      "api-key": "<YOUR_INGEST_LICENSE_KEY>"
    }
  }
}
```

!!! note
    For EU data centers, use `https://otlp.eu01.nr-data.net`.

## Datadog

Datadog doesn't support direct OTLP ingest — use the OTel Collector with the Datadog exporter.

### Collector Config

```yaml
exporters:
  datadog:
    api:
      key: "${DD_API_KEY}"
      site: "datadoghq.com"  # or datadoghq.eu, etc.

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]
```

### Plugin Config

Point at the local collector:

```json
{
  "config": {
    "endpoint": "http://localhost:4318"
  }
}
```

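One way to run that collector locally, sketched with Docker. The file name is an assumption, and the `otlp` receiver and `batch` processor blocks from the base collector config are assumed to be present; `/etc/otelcol-contrib/config.yaml` is the contrib image's default config location:

```bash
# Run the contrib collector (which ships the Datadog exporter) with the config above.
docker run --rm -p 4317:4317 -p 4318:4318 \
  -e DD_API_KEY \
  -v "$PWD/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml:ro" \
  otel/opentelemetry-collector-contrib:latest
```
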
## SigNoz

```json
{
  "config": {
    "endpoint": "https://ingest.<region>.signoz.cloud:443",
    "protocol": "http",
    "headers": {
      "signoz-access-token": "<YOUR_SIGNOZ_TOKEN>"
    }
  }
}
```

Or self-hosted:

```json
{
  "config": {
    "endpoint": "http://<signoz-host>:4318"
  }
}
```

## Jaeger

Jaeger has supported OTLP natively since v1.35:

```json
{
  "config": {
    "endpoint": "http://<jaeger-host>:4318",
    "protocol": "http"
  }
}
```

!!! note
    Jaeger only accepts traces via OTLP, not metrics or logs.

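For a local test, a sketch of running Jaeger all-in-one with its OTLP receivers enabled (standard image and ports):

```bash
# Jaeger UI on 16686; OTLP on 4317 (gRPC) and 4318 (HTTP).
docker run --rm \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 -p 4317:4317 -p 4318:4318 \
  jaegertracing/all-in-one:latest
```
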
## Splunk

Use the OTel Collector with the Splunk HEC exporter:

### Collector Config

```yaml
exporters:
  splunk_hec:
    token: "${SPLUNK_HEC_TOKEN}"
    endpoint: "https://<splunk-host>:8088/services/collector"
    source: "openclaw"
    sourcetype: "otel"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [splunk_hec]
```

## Elastic / Elasticsearch

Use the OTel Collector with the Elasticsearch exporter:

### Collector Config

```yaml
exporters:
  elasticsearch:
    endpoints: ["https://<elastic-host>:9200"]
    user: "${ELASTIC_USER}"
    password: "${ELASTIC_PASSWORD}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [elasticsearch]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [elasticsearch]
```

## Custom / Self-Hosted Collector

If your backend isn't listed, you can almost certainly connect it via the OTel Collector. The [contrib distribution](https://github.com/open-telemetry/opentelemetry-collector-contrib) includes exporters for 50+ backends.

1. Find your exporter in the [collector contrib registry](https://opentelemetry.io/ecosystem/registry/?language=collector)
2. Add it to the collector config (a validation sketch follows below)
3. Point the plugin at `http://localhost:4318`
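
Before wiring the plugin up, it's worth letting the collector parse the config; recent collector releases ship a `validate` subcommand. A sketch (the file name is an assumption):

```bash
# Fail fast on typos in the exporter block instead of at runtime.
docker run --rm \
  -v "$PWD/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml:ro" \
  otel/opentelemetry-collector-contrib:latest \
  validate --config=/etc/otelcol-contrib/config.yaml
```
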
@@ -0,0 +1,167 @@
# Grafana Integration

Export OpenClaw traces to **Grafana Tempo** and metrics to **Grafana Mimir** (or Prometheus) for visualization in Grafana dashboards.

## Grafana Cloud (Direct Export)

### 1. Get Your OTLP Credentials

1. Go to [grafana.com](https://grafana.com) → Your stack → **Connections** → **OpenTelemetry**
2. Note the OTLP endpoint and generate an API token

### 2. Configure the Plugin

```json
{
  "plugins": {
    "entries": {
      "otel-observability": {
        "enabled": true,
        "config": {
          "endpoint": "https://otlp-gateway-<region>.grafana.net/otlp",
          "protocol": "http",
          "serviceName": "openclaw-gateway",
          "headers": {
            "Authorization": "Basic <base64-of-instanceId:apiToken>"
          }
        }
      }
    }
  }
}
```

!!! tip "Creating the Basic auth header"
    ```bash
    echo -n "<instanceId>:<apiToken>" | base64
    ```

## Self-Hosted Grafana Stack

### Docker Compose Addition

Add Tempo and Grafana to the collector setup:

```yaml
services:
  otel-collector:
    # ... existing config ...

  tempo:
    image: grafana/tempo:latest
    container_name: openclaw-tempo
    ports:
      - "3200:3200"  # Tempo API
      - "4417:4317"  # OTLP gRPC into Tempo (host port 4417 avoids clashing with the collector's 4317)
    volumes:
      - ./collector/tempo-config.yaml:/etc/tempo/config.yaml:ro
    command: ["-config.file=/etc/tempo/config.yaml"]

  grafana:
    image: grafana/grafana:latest
    container_name: openclaw-grafana
    ports:
      - "3000:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  grafana-data:
```

### Tempo Configuration

Create `collector/tempo-config.yaml`:

```yaml
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317

storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/blocks
    wal:
      path: /tmp/tempo/wal

metrics_generator:
  storage:
    path: /tmp/tempo/generator/wal
```

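With the Tempo config saved, a sketch of bringing the new services up (service names from the compose file above):

```bash
# Start only the new services; the collector keeps running as before.
docker compose up -d tempo grafana
# Tail Tempo until it reports it is ready to receive spans.
docker compose logs -f tempo
```
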
### Collector Config (Fan-Out)

Update `collector/otel-collector-config.yaml` to export to both Dynatrace and Tempo:

```yaml
exporters:
  otlphttp/dynatrace:
    endpoint: "${DYNATRACE_ENDPOINT}"
    headers:
      Authorization: "Api-Token ${DYNATRACE_API_TOKEN}"

  otlp/tempo:
    endpoint: "tempo:4317"
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/dynatrace, otlp/tempo]
```

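To prove the fan-out works end to end before involving OpenClaw, you can push synthetic spans through the collector. A sketch using the contrib `telemetrygen` utility (image path and flags per its current releases; `--network host` assumes Linux):

```bash
# Send a handful of test traces to the collector's OTLP gRPC port.
docker run --rm --network host \
  ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest \
  traces --otlp-endpoint localhost:4317 --otlp-insecure --traces 5
```
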
### Grafana Data Source

1. Open Grafana at `http://localhost:3000`
2. Go to **Configuration** → **Data Sources** → **Add data source**
3. Select **Tempo**
4. Set URL: `http://tempo:3200`
5. Click **Save & Test**

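The same data source can be scripted against Grafana's HTTP API instead of clicked through the UI. A sketch, assuming the anonymous Admin access enabled in the compose file above (otherwise pass credentials):

```bash
# Register Tempo as a data source via the Grafana API.
curl -s -X POST http://localhost:3000/api/datasources \
  -H "Content-Type: application/json" \
  -d '{"name": "Tempo", "type": "tempo", "url": "http://tempo:3200", "access": "proxy"}'
```
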
## Grafana Dashboards

### Explore View

1. Go to **Explore**
2. Select the **Tempo** data source
3. Search by service name: `openclaw-gateway`
4. Or search by span name: `openclaw.agent.turn` or `tool.*`

### Example Dashboard Panels

**Token Usage Over Time** (Prometheus/Mimir):

```promql
sum(rate(openclaw_llm_tokens_total[5m])) by (model)
```

**LLM Latency P95** (Prometheus/Mimir):

```promql
histogram_quantile(0.95, sum by (le) (rate(openclaw_llm_duration_bucket[5m])))
```

**Tool Call Rate** (Prometheus/Mimir):

```promql
sum(rate(openclaw_tool_calls_total[5m])) by (tool_name)
```

**Error Rate** (Prometheus/Mimir):

```promql
sum(rate(openclaw_llm_errors_total[5m])) / sum(rate(openclaw_llm_requests_total[5m])) * 100
```

!!! note "Metric name format"
    OTel metrics use dots (`openclaw.llm.tokens.total`) but Prometheus converts them to underscores (`openclaw_llm_tokens_total`).
@@ -0,0 +1,49 @@
# Backends

The plugin exports telemetry via standard **OTLP** (OpenTelemetry Protocol), which means it works with any OpenTelemetry-compatible backend.

## Supported Backends

| Backend | Direct Export | Via Collector | Guide |
|---------|:---:|:---:|-------|
| **Dynatrace** | ✅ | ✅ | [Setup →](dynatrace.md) |
| **Grafana (Tempo + Mimir)** | ✅ | ✅ | [Setup →](grafana.md) |
| **Datadog** | ❌ | ✅ | [Generic →](generic-otlp.md) |
| **Honeycomb** | ✅ | ✅ | [Generic →](generic-otlp.md) |
| **New Relic** | ✅ | ✅ | [Generic →](generic-otlp.md) |
| **Splunk** | ❌ | ✅ | [Generic →](generic-otlp.md) |
| **Jaeger** | ✅ | ✅ | [Generic →](generic-otlp.md) |
| **SigNoz** | ✅ | ✅ | [Generic →](generic-otlp.md) |
| **OTel Collector** | — | — | [Setup →](otel-collector.md) |

## Direct Export vs. Collector

### Direct Export

```mermaid
flowchart LR
    A[OpenClaw Plugin] -->|OTLP| B[Backend]
```

- Simpler setup — no extra components
- Works well for single-backend setups
- Backend credentials are in the OpenClaw config

### Via OTel Collector (Recommended)

```mermaid
flowchart LR
    A[OpenClaw Plugin] -->|OTLP| B[OTel Collector]
    B -->|OTLP| C[Dynatrace]
    B -->|OTLP| D[Grafana]
    B -->|Prometheus| E[Prometheus]
```

- **Batching & retry** — handles network hiccups gracefully
- **Processing** — filter, transform, and enrich data before export
- **Fan-out** — send to multiple backends simultaneously
- **Decoupled auth** — backend credentials stay on the collector, not in OpenClaw
- **Sampling** — reduce data volume for high-traffic agents

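Whichever route you choose, a quick smoke test that the collector's OTLP/HTTP listener is up before pointing OpenClaw at it. A sketch (the collector accepts an empty JSON body as a no-op export, so expect a 200; connection refused means it isn't listening):

```bash
# Any HTTP status means the listener is reachable.
curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" -d '{}'
```
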
!!! tip "Recommendation"
    Use the OTel Collector in production. Use direct export for quick development testing.