@aigentsphere/openclaw-otel-observability 0.2.1
- package/.github/workflows/ci.yml +52 -0
- package/.github/workflows/docs.yml +25 -0
- package/LICENSE +15 -0
- package/README.md +300 -0
- package/collector/README.md +186 -0
- package/collector/otel-collector-config.yaml +230 -0
- package/docker-compose.yaml +32 -0
- package/docs/architecture.md +319 -0
- package/docs/backends/dynatrace.md +168 -0
- package/docs/backends/generic-otlp.md +166 -0
- package/docs/backends/grafana.md +167 -0
- package/docs/backends/index.md +49 -0
- package/docs/backends/otel-collector.md +210 -0
- package/docs/configuration.md +276 -0
- package/docs/development.md +198 -0
- package/docs/getting-started.md +295 -0
- package/docs/index.md +139 -0
- package/docs/limitations.md +95 -0
- package/docs/security/detection.md +274 -0
- package/docs/security/tetragon.md +454 -0
- package/docs/telemetry/metrics.md +283 -0
- package/docs/telemetry/tokens.md +188 -0
- package/docs/telemetry/traces.md +165 -0
- package/dynatrace/security-slo-dql.md +263 -0
- package/index.ts +191 -0
- package/instrumentation/preload.mjs +59 -0
- package/mkdocs.yml +90 -0
- package/openclaw.plugin.json +99 -0
- package/package.json +49 -0
- package/src/config.ts +72 -0
- package/src/diagnostics.ts +214 -0
- package/src/hooks.ts +575 -0
- package/src/openllmetry.ts +27 -0
- package/src/security.ts +396 -0
- package/src/telemetry.ts +282 -0
- package/tetragon-policies/01-process-exec.yaml +20 -0
- package/tetragon-policies/02-sensitive-files.yaml +86 -0
- package/tetragon-policies/04-privilege-escalation.yaml +25 -0
- package/tetragon-policies/05-dangerous-commands.yaml +97 -0
- package/tetragon-policies/06-kernel-modules.yaml +27 -0
- package/tetragon-policies/07-prompt-injection-shell.yaml +73 -0
- package/tetragon-policies/README.md +143 -0
- package/tsconfig.json +17 -0
package/.github/workflows/ci.yml
ADDED
|
@@ -0,0 +1,52 @@
|
|
|
1
|
+
name: CI & Publish
|
|
2
|
+
|
|
3
|
+
on:
|
|
4
|
+
push:
|
|
5
|
+
branches: [main]
|
|
6
|
+
tags: ["v*"]
|
|
7
|
+
pull_request:
|
|
8
|
+
branches: [main]
|
|
9
|
+
|
|
10
|
+
permissions:
|
|
11
|
+
contents: read
|
|
12
|
+
|
|
13
|
+
jobs:
|
|
14
|
+
typecheck:
|
|
15
|
+
runs-on: ubuntu-latest
|
|
16
|
+
steps:
|
|
17
|
+
- uses: actions/checkout@v4
|
|
18
|
+
|
|
19
|
+
- uses: actions/setup-node@v4
|
|
20
|
+
with:
|
|
21
|
+
node-version: "22"
|
|
22
|
+
cache: npm
|
|
23
|
+
|
|
24
|
+
- name: Install dependencies
|
|
25
|
+
run: npm ci
|
|
26
|
+
|
|
27
|
+
- name: Typecheck
|
|
28
|
+
run: npm run typecheck
|
|
29
|
+
|
|
30
|
+
publish:
|
|
31
|
+
needs: typecheck
|
|
32
|
+
if: startsWith(github.ref, 'refs/tags/v')
|
|
33
|
+
runs-on: ubuntu-latest
|
|
34
|
+
steps:
|
|
35
|
+
- uses: actions/checkout@v4
|
|
36
|
+
|
|
37
|
+
- uses: actions/setup-node@v4
|
|
38
|
+
with:
|
|
39
|
+
node-version: "22"
|
|
40
|
+
cache: npm
|
|
41
|
+
registry-url: "https://registry.npmjs.org"
|
|
42
|
+
|
|
43
|
+
- name: Install dependencies
|
|
44
|
+
run: npm ci
|
|
45
|
+
|
|
46
|
+
- name: Typecheck
|
|
47
|
+
run: npm run typecheck
|
|
48
|
+
|
|
49
|
+
- name: Publish to npm
|
|
50
|
+
run: npm publish --access public
|
|
51
|
+
env:
|
|
52
|
+
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
|
package/.github/workflows/docs.yml
ADDED
|
@@ -0,0 +1,25 @@
|
|
|
1
|
+
name: Deploy Documentation
|
|
2
|
+
|
|
3
|
+
on:
|
|
4
|
+
push:
|
|
5
|
+
branches: [main]
|
|
6
|
+
workflow_dispatch:
|
|
7
|
+
|
|
8
|
+
permissions:
|
|
9
|
+
contents: write
|
|
10
|
+
|
|
11
|
+
jobs:
|
|
12
|
+
deploy:
|
|
13
|
+
runs-on: ubuntu-latest
|
|
14
|
+
steps:
|
|
15
|
+
- uses: actions/checkout@v4
|
|
16
|
+
|
|
17
|
+
- uses: actions/setup-python@v5
|
|
18
|
+
with:
|
|
19
|
+
python-version: "3.12"
|
|
20
|
+
|
|
21
|
+
- name: Install MkDocs Material
|
|
22
|
+
run: pip install mkdocs-material
|
|
23
|
+
|
|
24
|
+
- name: Deploy to GitHub Pages
|
|
25
|
+
run: mkdocs gh-deploy --force
|
package/LICENSE
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
1
|
+
Apache License
|
|
2
|
+
Version 2.0, January 2004
|
|
3
|
+
http://www.apache.org/licenses/
|
|
4
|
+
|
|
5
|
+
Licensed under the Apache License, Version 2.0 (the "License");
|
|
6
|
+
you may not use this file except in compliance with the License.
|
|
7
|
+
You may obtain a copy of the License at
|
|
8
|
+
|
|
9
|
+
http://www.apache.org/licenses/LICENSE-2.0
|
|
10
|
+
|
|
11
|
+
Unless required by applicable law or agreed to in writing, software
|
|
12
|
+
distributed under the License is distributed on an "AS IS" BASIS,
|
|
13
|
+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
14
|
+
See the License for the specific language governing permissions and
|
|
15
|
+
limitations under the License.
|
package/README.md
ADDED
|
@@ -0,0 +1,300 @@
|
|
|
1
|
+
# OpenClaw Observability
|
|
2
|
+
|
|
3
|
+
[Documentation](https://henrikrexed.github.io/openclaw-observability-plugin/)
|
|
4
|
+
[License: MIT](https://opensource.org/licenses/MIT)
|
|
5
|
+
|
|
6
|
+
OpenTelemetry observability for [OpenClaw](https://github.com/openclaw/openclaw) AI agents.
|
|
7
|
+
|
|
8
|
+
📖 **[Full Documentation](https://henrikrexed.github.io/openclaw-observability-plugin/)** — Setup guides, configuration reference, and backend examples.
|
|
9
|
+
|
|
10
|
+
## Two Approaches to Observability
|
|
11
|
+
|
|
12
|
+
This repository documents **two complementary approaches** to monitoring OpenClaw:
|
|
13
|
+
|
|
14
|
+
| Approach | Best For | Setup Complexity |
|
|
15
|
+
|----------|----------|------------------|
|
|
16
|
+
| **Official Plugin** | Operational metrics, Gateway health, cost tracking | Simple config |
|
|
17
|
+
| **Custom Plugin** | Deep tracing, tool call visibility, request lifecycle | Plugin installation |
|
|
18
|
+
|
|
19
|
+
**Recommendation:** Use both for complete observability.
|
|
20
|
+
|
|
21
|
+
---
|
|
22
|
+
|
|
23
|
+
## Approach 1: Official Diagnostics Plugin (Built-in)
|
|
24
|
+
|
|
25
|
+
OpenClaw v2026.2+ includes **built-in OpenTelemetry support**. To enable it, add the following to `openclaw.json`:
|
|
26
|
+
|
|
27
|
+
```json
|
|
28
|
+
{
|
|
29
|
+
"diagnostics": {
|
|
30
|
+
"enabled": true,
|
|
31
|
+
"otel": {
|
|
32
|
+
"enabled": true,
|
|
33
|
+
"endpoint": "http://localhost:4318",
|
|
34
|
+
"serviceName": "openclaw-gateway",
|
|
35
|
+
"traces": true,
|
|
36
|
+
"metrics": true,
|
|
37
|
+
"logs": true
|
|
38
|
+
}
|
|
39
|
+
}
|
|
40
|
+
}
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
Then restart:
|
|
44
|
+
|
|
45
|
+
```bash
|
|
46
|
+
openclaw gateway restart
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
### What It Captures
|
|
50
|
+
|
|
51
|
+
**Metrics:**
|
|
52
|
+
- `openclaw.tokens` — Token usage by type (input/output/cache)
|
|
53
|
+
- `openclaw.cost.usd` — Estimated model cost
|
|
54
|
+
- `openclaw.run.duration_ms` — Agent run duration
|
|
55
|
+
- `openclaw.context.tokens` — Context window usage
|
|
56
|
+
- `openclaw.webhook.*` — Webhook processing stats
|
|
57
|
+
- `openclaw.message.*` — Message processing stats
|
|
58
|
+
- `openclaw.queue.*` — Queue depth and wait times
|
|
59
|
+
- `openclaw.session.*` — Session state transitions
|
|
60
|
+
|
|
61
|
+
**Traces:** Model usage, webhook processing, message processing, stuck sessions
|
|
62
|
+
|
|
63
|
+
**Logs:** All Gateway logs via OTLP with severity, subsystem, and code location
|
|
64
|
+
|
|
65
|
+
---
|
|
66
|
+
|
|
67
|
+
## Approach 2: Custom Hook-Based Plugin (This Repo)
|
|
68
|
+
|
|
69
|
+
For **deeper observability**, install the custom plugin from this repo. It uses OpenClaw's typed plugin hooks to capture the full agent lifecycle.
|
|
70
|
+
|
|
71
|
+
### What It Adds
|
|
72
|
+
|
|
73
|
+
**Connected Traces:**
|
|
74
|
+
```
|
|
75
|
+
openclaw.request (root span)
|
|
76
|
+
├── openclaw.agent.turn
|
|
77
|
+
│ ├── tool.Read (file read)
|
|
78
|
+
│ ├── tool.exec (shell command)
|
|
79
|
+
│ ├── tool.Write (file write)
|
|
80
|
+
│ └── tool.web_search
|
|
81
|
+
└── (child spans connected via trace context)
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
**Per-Tool Visibility:**
|
|
85
|
+
- Individual spans for each tool call
|
|
86
|
+
- Tool execution time
|
|
87
|
+
- Result size (characters)
|
|
88
|
+
- Error tracking per tool
|
|
89
|
+
|
|
90
|
+
**Request Lifecycle:**
|
|
91
|
+
- Full message → response tracing
|
|
92
|
+
- Session context propagation
|
|
93
|
+
- Agent turn duration with token breakdown
|
|
94
|
+
|
|
95
|
+
### Installation
|
|
96
|
+
|
|
97
|
+
1. Clone this repository:
|
|
98
|
+
```bash
|
|
99
|
+
git clone https://github.com/henrikrexed/openclaw-observability-plugin.git
|
|
100
|
+
```
|
|
101
|
+
|
|
102
|
+
2. Add to your `openclaw.json`:
|
|
103
|
+
```json
|
|
104
|
+
{
|
|
105
|
+
"plugins": {
|
|
106
|
+
"load": {
|
|
107
|
+
"paths": ["/path/to/openclaw-observability-plugin"]
|
|
108
|
+
},
|
|
109
|
+
"entries": {
|
|
110
|
+
"otel-observability": {
|
|
111
|
+
"enabled": true,
|
|
112
|
+
"config": {
|
|
113
|
+
"endpoint": "http://localhost:4318",
|
|
114
|
+
"serviceName": "openclaw-gateway"
|
|
115
|
+
}
|
|
116
|
+
}
|
|
117
|
+
}
|
|
118
|
+
}
|
|
119
|
+
}
|
|
120
|
+
```
|
|
121
|
+
|
|
122
|
+
3. Clear cache and restart:
|
|
123
|
+
```bash
|
|
124
|
+
rm -rf /tmp/jiti
|
|
125
|
+
systemctl --user restart openclaw-gateway
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
---
|
|
129
|
+
|
|
130
|
+
## Comparing the Two Approaches
|
|
131
|
+
|
|
132
|
+
| Feature | Official Plugin | Custom Plugin |
|
|
133
|
+
|---------|-----------------|---------------|
|
|
134
|
+
| Token metrics | ✅ Per model | ✅ Per session + model |
|
|
135
|
+
| Cost tracking | ✅ Yes | ✅ Yes (from diagnostics) |
|
|
136
|
+
| Gateway health | ✅ Webhooks, queues, sessions | ❌ Not focused |
|
|
137
|
+
| Session state | ✅ State transitions | ❌ Not tracked |
|
|
138
|
+
| **Tool call tracing** | ❌ No | ✅ Individual tool spans |
|
|
139
|
+
| **Request lifecycle** | ❌ No | ✅ Full request → response |
|
|
140
|
+
| **Connected traces** | ❌ Separate spans | ✅ Parent-child hierarchy |
|
|
141
|
+
| Setup complexity | 🟢 Config only | 🟡 Plugin installation |
|
|
142
|
+
|
|
143
|
+
---
|
|
144
|
+
|
|
145
|
+
## Backend Examples
|
|
146
|
+
|
|
147
|
+
### Dynatrace (Direct)
|
|
148
|
+
|
|
149
|
+
```json
|
|
150
|
+
{
|
|
151
|
+
"diagnostics": {
|
|
152
|
+
"enabled": true,
|
|
153
|
+
"otel": {
|
|
154
|
+
"enabled": true,
|
|
155
|
+
"endpoint": "https://{env-id}.live.dynatrace.com/api/v2/otlp",
|
|
156
|
+
"headers": {
|
|
157
|
+
"Authorization": "Api-Token {your-token}"
|
|
158
|
+
},
|
|
159
|
+
"serviceName": "openclaw-gateway",
|
|
160
|
+
"traces": true,
|
|
161
|
+
"metrics": true,
|
|
162
|
+
"logs": true
|
|
163
|
+
}
|
|
164
|
+
}
|
|
165
|
+
}
|
|
166
|
+
```
|
|
167
|
+
|
|
168
|
+
### Grafana Cloud
|
|
169
|
+
|
|
170
|
+
```json
|
|
171
|
+
{
|
|
172
|
+
"diagnostics": {
|
|
173
|
+
"enabled": true,
|
|
174
|
+
"otel": {
|
|
175
|
+
"enabled": true,
|
|
176
|
+
"endpoint": "https://otlp-gateway-{region}.grafana.net/otlp",
|
|
177
|
+
"headers": {
|
|
178
|
+
"Authorization": "Basic {base64-credentials}"
|
|
179
|
+
},
|
|
180
|
+
"serviceName": "openclaw-gateway",
|
|
181
|
+
"traces": true,
|
|
182
|
+
"metrics": true
|
|
183
|
+
}
|
|
184
|
+
}
|
|
185
|
+
}
|
|
186
|
+
```
|
|
187
|
+
|
|
188
|
+
### Local OTel Collector
|
|
189
|
+
|
|
190
|
+
```json
|
|
191
|
+
{
|
|
192
|
+
"diagnostics": {
|
|
193
|
+
"enabled": true,
|
|
194
|
+
"otel": {
|
|
195
|
+
"enabled": true,
|
|
196
|
+
"endpoint": "http://localhost:4318",
|
|
197
|
+
"serviceName": "openclaw-gateway",
|
|
198
|
+
"traces": true,
|
|
199
|
+
"metrics": true,
|
|
200
|
+
"logs": true
|
|
201
|
+
}
|
|
202
|
+
}
|
|
203
|
+
}
|
|
204
|
+
```
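
The backend examples above all pass a base OTLP/HTTP endpoint. As a rough illustration of how such a base URL relates to per-signal URLs, the sketch below appends the standard OTLP/HTTP paths (`/v1/traces`, `/v1/metrics`, `/v1/logs`). The `otlpSignalUrl` helper is hypothetical and not part of the plugin; whether OpenClaw appends these paths itself is an assumption to check against the configuration docs.

```typescript
// Hypothetical helper, not part of the plugin: derives per-signal OTLP/HTTP
// URLs from a base endpoint such as the ones used in the examples above.
type Signal = "traces" | "metrics" | "logs";

function otlpSignalUrl(baseEndpoint: string, signal: Signal): string {
  // Drop trailing slashes so the result never contains "//v1/...".
  const base = baseEndpoint.replace(/\/+$/, "");
  return `${base}/v1/${signal}`;
}

// Usage: the local collector endpoint from the last example.
console.log(otlpSignalUrl("http://localhost:4318", "traces"));
```

Keeping the base URL in configuration and deriving the signal paths in one place avoids repeating the endpoint three times per backend.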
|
|
205
|
+
|
|
206
|
+
---
|
|
207
|
+
|
|
208
|
+
## Configuration Reference
|
|
209
|
+
|
|
210
|
+
### Official Plugin Options
|
|
211
|
+
|
|
212
|
+
| Option | Type | Default | Description |
|
|
213
|
+
|--------|------|---------|-------------|
|
|
214
|
+
| `diagnostics.enabled` | boolean | false | Enable diagnostics system |
|
|
215
|
+
| `diagnostics.otel.enabled` | boolean | false | Enable OTel export |
|
|
216
|
+
| `diagnostics.otel.endpoint` | string | — | OTLP endpoint URL |
|
|
217
|
+
| `diagnostics.otel.protocol` | string | "http/protobuf" | Protocol |
|
|
218
|
+
| `diagnostics.otel.headers` | object | — | Custom headers |
|
|
219
|
+
| `diagnostics.otel.serviceName` | string | "openclaw" | Service name |
|
|
220
|
+
| `diagnostics.otel.traces` | boolean | true | Enable traces |
|
|
221
|
+
| `diagnostics.otel.metrics` | boolean | true | Enable metrics |
|
|
222
|
+
| `diagnostics.otel.logs` | boolean | false | Enable logs |
|
|
223
|
+
| `diagnostics.otel.sampleRate` | number | 1.0 | Trace sampling (0-1) |
|
|
224
|
+
|
|
225
|
+
### Custom Plugin Options
|
|
226
|
+
|
|
227
|
+
| Option | Type | Default | Description |
|
|
228
|
+
|--------|------|---------|-------------|
|
|
229
|
+
| `endpoint` | string | — | OTLP endpoint URL |
|
|
230
|
+
| `serviceName` | string | "openclaw-gateway" | Service name |
|
|
231
|
+
| `exporterType` | string | "otlp" | Exporter type |
|
|
232
|
+
| `enableTraces` | boolean | true | Enable traces |
|
|
233
|
+
| `enableMetrics` | boolean | true | Enable metrics |
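
To make the defaults in the table above concrete, here is a minimal sketch of how user-supplied options could be merged with them. It is an illustration only: the `PluginConfig` interface and `resolveConfig` helper are hypothetical names and do not claim to reproduce the package's actual `src/config.ts`.

```typescript
// Illustrative only: mirrors the defaults listed in the table above.
interface PluginConfig {
  endpoint?: string; // the table lists no default for the endpoint
  serviceName: string;
  exporterType: string;
  enableTraces: boolean;
  enableMetrics: boolean;
}

const DEFAULTS: PluginConfig = {
  serviceName: "openclaw-gateway",
  exporterType: "otlp",
  enableTraces: true,
  enableMetrics: true,
};

// Hypothetical helper: keys present in `user` win, missing keys fall back
// to the defaults.
function resolveConfig(user: Partial<PluginConfig> = {}): PluginConfig {
  return { ...DEFAULTS, ...user };
}

// Usage: only the endpoint and one flag are overridden.
console.log(resolveConfig({ endpoint: "http://localhost:4318", enableMetrics: false }));
```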
|
|
234
|
+
|
|
235
|
+
---
|
|
236
|
+
|
|
237
|
+
## Documentation
|
|
238
|
+
|
|
239
|
+
- [Getting Started](./docs/getting-started.md) — Setup guide
|
|
240
|
+
- [Configuration](./docs/configuration.md) — All options
|
|
241
|
+
- [Architecture](./docs/architecture.md) — How it works
|
|
242
|
+
- [Limitations](./docs/limitations.md) — Known constraints
|
|
243
|
+
- [Backends](./docs/backends/) — Backend-specific guides
|
|
244
|
+
|
|
245
|
+
---
|
|
246
|
+
|
|
247
|
+
## Optional: Kernel-Level Security with Tetragon
|
|
248
|
+
|
|
249
|
+
For **defense in depth**, add [Tetragon](https://tetragon.io) eBPF-based monitoring. While the plugins above capture application-level telemetry, Tetragon sees what happens at the kernel level — file access, process execution, network connections, and privilege changes.
|
|
250
|
+
|
|
251
|
+
### Why Tetragon?
|
|
252
|
+
|
|
253
|
+
- **Tamper-resistant**: An agent compromised at the application level can't hide its kernel-level actions
|
|
254
|
+
- **Sensitive file detection**: Alert when `.env`, SSH keys, or credentials are accessed
|
|
255
|
+
- **Dangerous command detection**: Catch `rm`, `curl | sh`, `chmod 777`, etc.
|
|
256
|
+
- **Privilege escalation**: Detect `setuid`/`setgid` attempts
|
|
257
|
+
|
|
258
|
+
### Quick Setup
|
|
259
|
+
|
|
260
|
+
```bash
|
|
261
|
+
# Install Tetragon
|
|
262
|
+
curl -LO https://github.com/cilium/tetragon/releases/download/v1.6.0/tetragon-v1.6.0-amd64.tar.gz
|
|
263
|
+
tar -xzf tetragon-v1.6.0-amd64.tar.gz && cd tetragon-v1.6.0-amd64
|
|
264
|
+
sudo ./install.sh
|
|
265
|
+
|
|
266
|
+
# Create OpenClaw policies directory
|
|
267
|
+
sudo mkdir -p /etc/tetragon/tetragon.tp.d/openclaw
|
|
268
|
+
|
|
269
|
+
# Add policies (see docs/security/tetragon.md for full examples)
|
|
270
|
+
# Start Tetragon
|
|
271
|
+
sudo systemctl enable --now tetragon
|
|
272
|
+
```
|
|
273
|
+
|
|
274
|
+
Tetragon events are exported to `/var/log/tetragon/tetragon.log` and can be ingested by the OTel Collector using the `filelog` receiver.
|
|
275
|
+
|
|
276
|
+
### Complete Observability Stack
|
|
277
|
+
|
|
278
|
+
| Layer | Source | What It Shows |
|
|
279
|
+
|-------|--------|---------------|
|
|
280
|
+
| **Application** | Custom Plugin | Tool calls, tokens, request flow |
|
|
281
|
+
| **Gateway** | Official Plugin | Session health, queues, costs |
|
|
282
|
+
| **Kernel** | Tetragon | System calls, file access, network |
|
|
283
|
+
|
|
284
|
+
See [Security: Tetragon](./docs/security/tetragon.md) for the full installation and configuration guide.
|
|
285
|
+
|
|
286
|
+
---
|
|
287
|
+
|
|
288
|
+
## Known Limitations
|
|
289
|
+
|
|
290
|
+
**Auto-instrumentation not possible:** OpenLLMetry with `import-in-the-middle` (IITM) breaks the named exports of `@mariozechner/pi-ai` because of ESM/CJS module isolation. All telemetry is therefore captured via hooks, not direct SDK instrumentation.
|
|
291
|
+
|
|
292
|
+
**No per-LLM-call spans:** Individual API calls to Claude/OpenAI cannot be traced. Token usage is aggregated per agent turn.
|
|
293
|
+
|
|
294
|
+
See [Limitations](./docs/limitations.md) for details.
|
|
295
|
+
|
|
296
|
+
---
|
|
297
|
+
|
|
298
|
+
## License
|
|
299
|
+
|
|
300
|
+
MIT
|
package/collector/README.md
ADDED
|
@@ -0,0 +1,186 @@
|
|
|
1
|
+
# OTel Collector Configuration
|
|
2
|
+
|
|
3
|
+
This directory contains a ready-to-use OpenTelemetry Collector configuration for OpenClaw observability.
|
|
4
|
+
|
|
5
|
+
## What It Collects
|
|
6
|
+
|
|
7
|
+
| Source | Receiver | Data Type | Description |
|
|
8
|
+
|--------|----------|-----------|-------------|
|
|
9
|
+
| OpenClaw Plugin | `otlp` | Traces | Request lifecycle, tool calls |
|
|
10
|
+
| OpenClaw Plugin | `otlp` | Metrics | Token usage, costs |
|
|
11
|
+
| OpenClaw Plugin | `otlp` | Logs | Application logs |
|
|
12
|
+
| Host | `hostmetrics` | Metrics | CPU, memory, disk, network |
|
|
13
|
+
| Tetragon | `filelog/tetragon` | Logs | Kernel security events |
|
|
14
|
+
|
|
15
|
+
## Quick Start
|
|
16
|
+
|
|
17
|
+
### 1. Install the Collector
|
|
18
|
+
|
|
19
|
+
```bash
|
|
20
|
+
# Download otelcol-contrib (includes all receivers/processors)
|
|
21
|
+
curl -LO https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.144.0/otelcol-contrib_0.144.0_linux_amd64.tar.gz
|
|
22
|
+
tar -xzf otelcol-contrib_0.144.0_linux_amd64.tar.gz
|
|
23
|
+
sudo mv otelcol-contrib /usr/local/bin/
|
|
24
|
+
```
|
|
25
|
+
|
|
26
|
+
### 2. Configure Environment Variables
|
|
27
|
+
|
|
28
|
+
For Dynatrace:
|
|
29
|
+
```bash
|
|
30
|
+
export DT_ENDPOINT="https://YOUR_ENV.live.dynatrace.com/api/v2/otlp"
|
|
31
|
+
export DT_API_TOKEN="dt0c01.xxxxx"
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
### 3. Run the Collector
|
|
35
|
+
|
|
36
|
+
```bash
|
|
37
|
+
otelcol-contrib --config otel-collector-config.yaml
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
### 4. Run as a Service (systemd)
|
|
41
|
+
|
|
42
|
+
```bash
|
|
43
|
+
# Create service file
|
|
44
|
+
sudo tee /etc/systemd/system/otelcol-contrib.service << 'EOF'
|
|
45
|
+
[Unit]
|
|
46
|
+
Description=OpenTelemetry Collector
|
|
47
|
+
After=network.target
|
|
48
|
+
|
|
49
|
+
[Service]
|
|
50
|
+
Type=simple
|
|
51
|
+
User=otelcol-contrib
|
|
52
|
+
ExecStart=/usr/local/bin/otelcol-contrib --config /etc/otelcol-contrib/config.yaml
|
|
53
|
+
Restart=always
|
|
54
|
+
RestartSec=5
|
|
55
|
+
|
|
56
|
+
[Install]
|
|
57
|
+
WantedBy=multi-user.target
|
|
58
|
+
EOF
|
|
59
|
+
|
|
60
|
+
# Create override for environment
|
|
61
|
+
sudo mkdir -p /etc/systemd/system/otelcol-contrib.service.d
|
|
62
|
+
sudo tee /etc/systemd/system/otelcol-contrib.service.d/override.conf << 'EOF'
|
|
63
|
+
[Service]
|
|
64
|
+
Environment="DT_ENDPOINT=https://YOUR_ENV.live.dynatrace.com/api/v2/otlp"
|
|
65
|
+
Environment="DT_API_TOKEN=dt0c01.xxxxx"
|
|
66
|
+
EOF
|
|
67
|
+
|
|
68
|
+
# Copy config and start
|
|
69
|
+
sudo mkdir -p /etc/otelcol-contrib
|
|
70
|
+
sudo cp otel-collector-config.yaml /etc/otelcol-contrib/config.yaml
|
|
71
|
+
sudo systemctl daemon-reload
|
|
72
|
+
sudo systemctl enable --now otelcol-contrib
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
## Alternative Backends
|
|
76
|
+
|
|
77
|
+
### Grafana Cloud
|
|
78
|
+
|
|
79
|
+
Replace the exporter section:
|
|
80
|
+
|
|
81
|
+
```yaml
|
|
82
|
+
exporters:
|
|
83
|
+
otlphttp/grafana:
|
|
84
|
+
endpoint: "https://otlp-gateway-prod-us-central-0.grafana.net/otlp"
|
|
85
|
+
headers:
|
|
86
|
+
Authorization: "Basic ${env:GRAFANA_CLOUD_TOKEN}"
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
### Jaeger (Local)
|
|
90
|
+
|
|
91
|
+
```yaml
|
|
92
|
+
exporters:
|
|
93
|
+
otlp/jaeger:
|
|
94
|
+
endpoint: "localhost:4317"
|
|
95
|
+
tls:
|
|
96
|
+
insecure: true
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
### Generic OTLP
|
|
100
|
+
|
|
101
|
+
```yaml
|
|
102
|
+
exporters:
|
|
103
|
+
otlphttp:
|
|
104
|
+
endpoint: "https://your-otlp-endpoint.com"
|
|
105
|
+
headers:
|
|
106
|
+
Authorization: "Bearer ${env:API_TOKEN}"
|
|
107
|
+
```
|
|
108
|
+
|
|
109
|
+
## Pipelines
|
|
110
|
+
|
|
111
|
+
The configuration defines four pipelines:
|
|
112
|
+
|
|
113
|
+
| Pipeline | Receivers | Purpose |
|
|
114
|
+
|----------|-----------|---------|
|
|
115
|
+
| `traces` | otlp | OpenClaw request traces |
|
|
116
|
+
| `metrics` | otlp, hostmetrics | Token usage + system metrics |
|
|
117
|
+
| `logs/openclaw` | otlp | OpenClaw application logs |
|
|
118
|
+
| `logs/tetragon` | filelog/tetragon | Kernel security events |
|
|
119
|
+
|
|
120
|
+
## Tetragon Integration
|
|
121
|
+
|
|
122
|
+
The Tetragon pipeline:
|
|
123
|
+
|
|
124
|
+
1. **Reads** JSON events from `/var/log/tetragon/tetragon.log`
|
|
125
|
+
2. **Parses** the JSON and extracts timestamps
|
|
126
|
+
3. **Transforms** events to extract:
|
|
127
|
+
- `tetragon.type` — event type (kprobe, exec, exit)
|
|
128
|
+
- `tetragon.policy` — which policy triggered
|
|
129
|
+
- `process.binary`, `process.pid`, `process.uid`
|
|
130
|
+
- `tetragon.function` — syscall name
|
|
131
|
+
4. **Assigns** security risk levels:
|
|
132
|
+
- `critical` — privilege-escalation, kernel-modules
|
|
133
|
+
- `high` — sensitive-files, dangerous-commands
|
|
134
|
+
- `low` — process-exec
|
|
135
|
+
5. **Exports** to your backend with `service.name: openclaw-security`
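
The risk assignment in step 4 can be pictured as a simple lookup from policy name to level. The sketch below is a TypeScript illustration only: the collector performs this step in its own configuration, and the `riskLevelFor` helper, the substring matching, and the `"unknown"` fallback are all assumptions made for the example.

```typescript
// Sketch of the risk mapping described in step 4 above.
type RiskLevel = "critical" | "high" | "low" | "unknown";

// Policy-name fragments and levels taken from the list above.
const RISK_BY_POLICY: ReadonlyArray<[string, RiskLevel]> = [
  ["privilege-escalation", "critical"],
  ["kernel-modules", "critical"],
  ["sensitive-files", "high"],
  ["dangerous-commands", "high"],
  ["process-exec", "low"],
];

// Hypothetical helper: matches by substring and returns "unknown" for
// policies the list does not mention (an assumed fallback).
function riskLevelFor(policyName: string): RiskLevel {
  for (const [fragment, level] of RISK_BY_POLICY) {
    if (policyName.indexOf(fragment) !== -1) return level;
  }
  return "unknown";
}

// Usage with an assumed policy name.
console.log(riskLevelFor("openclaw-sensitive-files"));
```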
|
|
136
|
+
|
|
137
|
+
### Prerequisites for Tetragon
|
|
138
|
+
|
|
139
|
+
```bash
|
|
140
|
+
# Install Tetragon
|
|
141
|
+
# See ../tetragon-policies/README.md
|
|
142
|
+
|
|
143
|
+
# Ensure collector can read the log
|
|
144
|
+
sudo chmod 644 /var/log/tetragon/tetragon.log
|
|
145
|
+
|
|
146
|
+
# Or add collector user to appropriate group
|
|
147
|
+
sudo usermod -a -G adm otelcol-contrib
|
|
148
|
+
```
|
|
149
|
+
|
|
150
|
+
## Troubleshooting
|
|
151
|
+
|
|
152
|
+
### Collector not starting
|
|
153
|
+
|
|
154
|
+
```bash
|
|
155
|
+
# Validate config
|
|
156
|
+
otelcol-contrib validate --config otel-collector-config.yaml
|
|
157
|
+
|
|
158
|
+
# Check for missing env vars
|
|
159
|
+
echo $DT_ENDPOINT
|
|
160
|
+
echo $DT_API_TOKEN
|
|
161
|
+
```
|
|
162
|
+
|
|
163
|
+
### Tetragon events not appearing
|
|
164
|
+
|
|
165
|
+
```bash
|
|
166
|
+
# Check Tetragon is writing events
|
|
167
|
+
sudo tail -f /var/log/tetragon/tetragon.log
|
|
168
|
+
|
|
169
|
+
# Check file permissions
|
|
170
|
+
ls -la /var/log/tetragon/tetragon.log
|
|
171
|
+
|
|
172
|
+
# Check collector logs
|
|
173
|
+
journalctl -u otelcol-contrib -f | grep tetragon
|
|
174
|
+
```
|
|
175
|
+
|
|
176
|
+
### High memory usage
|
|
177
|
+
|
|
178
|
+
Reduce batch sizes:
|
|
179
|
+
|
|
180
|
+
```yaml
|
|
181
|
+
processors:
|
|
182
|
+
batch:
|
|
183
|
+
timeout: 5s
|
|
184
|
+
send_batch_size: 256
|
|
185
|
+
send_batch_max_size: 512
|
|
186
|
+
```
|