@simbimbo/brainstem 0.0.1 → 0.0.3
- package/CHANGELOG.md +87 -0
- package/README.md +99 -3
- package/brainstem/__init__.py +3 -0
- package/brainstem/api.py +257 -0
- package/brainstem/connectors/__init__.py +1 -0
- package/brainstem/connectors/logicmonitor.py +26 -0
- package/brainstem/connectors/types.py +16 -0
- package/brainstem/demo.py +64 -0
- package/brainstem/fingerprint.py +44 -0
- package/brainstem/ingest.py +108 -0
- package/brainstem/instrumentation.py +38 -0
- package/brainstem/interesting.py +62 -0
- package/brainstem/models.py +80 -0
- package/brainstem/recurrence.py +112 -0
- package/brainstem/scoring.py +38 -0
- package/brainstem/storage.py +428 -0
- package/docs/adapters.md +435 -0
- package/docs/api.md +380 -0
- package/docs/architecture.md +333 -0
- package/docs/connectors.md +66 -0
- package/docs/data-model.md +290 -0
- package/docs/design-governance.md +595 -0
- package/docs/mvp-flow.md +109 -0
- package/docs/roadmap.md +87 -0
- package/docs/scoring.md +424 -0
- package/docs/v0.0.1.md +277 -0
- package/docs/vision.md +85 -0
- package/package.json +6 -14
- package/pyproject.toml +18 -0
- package/tests/fixtures/sample_syslog.log +6 -0
- package/tests/test_api.py +319 -0
- package/tests/test_canonicalization.py +28 -0
- package/tests/test_demo.py +25 -0
- package/tests/test_fingerprint.py +22 -0
- package/tests/test_ingest.py +15 -0
- package/tests/test_instrumentation.py +16 -0
- package/tests/test_interesting.py +36 -0
- package/tests/test_logicmonitor.py +22 -0
- package/tests/test_recurrence.py +16 -0
- package/tests/test_scoring.py +21 -0
- package/tests/test_storage.py +294 -0
package/docs/api.md
ADDED
@@ -0,0 +1,380 @@

# API Spec

## Goal

Provide a small, explainable API for ingesting events, scoring patterns, promoting memory, and retrieving relevant operational history.

The API should support both machine-driven ingestion and operator-driven investigation.

---

## 1. Ingest Event

### `POST /events/ingest`

Ingest one normalized or semi-normalized event.

Request:

```json
{
  "tenant_id": "client-a",
  "asset_id": "fw-01",
  "source_type": "syslog",
  "source_path": "/var/log/syslog",
  "host": "fw-01",
  "service": "charon",
  "timestamp": "2026-03-22T03:30:00Z",
  "severity": "warning",
  "facility": "daemon",
  "message_raw": "IPsec SA rekey failed; retrying",
  "structured_fields": {
    "peer": "203.0.113.10"
  }
}
```

Response:

```json
{
  "ok": true,
  "event_id": "evt_123",
  "signature_id": "sig_456",
  "candidate_ids": ["cand_789"]
}
```
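
As a rough client-side illustration, the snippet below builds (but does not send) a `POST /events/ingest` request using only the Python standard library. The base URL `http://localhost:8000` and the helper name `build_ingest_request` are assumptions made for the example; the spec does not define either. The payload mirrors the request body above.

```python
import json
import urllib.request

# Hypothetical base URL: the spec does not say where the API is served.
BASE_URL = "http://localhost:8000"


def build_ingest_request(event: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /events/ingest request carrying one event."""
    return urllib.request.Request(
        f"{base_url}/events/ingest",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same body as the request example in the spec.
event = {
    "tenant_id": "client-a",
    "asset_id": "fw-01",
    "source_type": "syslog",
    "source_path": "/var/log/syslog",
    "host": "fw-01",
    "service": "charon",
    "timestamp": "2026-03-22T03:30:00Z",
    "severity": "warning",
    "facility": "daemon",
    "message_raw": "IPsec SA rekey failed; retrying",
    "structured_fields": {"peer": "203.0.113.10"},
}

req = build_ingest_request(event)

# Sending it requires a running server, for example:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)  # per the spec: ok, event_id, signature_id, candidate_ids
```

Keeping request construction separate from sending makes the payload easy to inspect or log before it goes over the wire.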

---

## 2. Bulk Ingest

### `POST /events/ingest_bulk`

For log file batches, webhook batches, or replay.

Request:

```json
{
  "events": [ ... ],
  "source_label": "syslog-replay"
}
```

Response:

```json
{
  "ok": true,
  "ingested": 500,
  "signatures_created": 12,
  "candidates_created": 6
}
```
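
The spec does not say how large a single bulk request may be. A client replaying a large log file would likely split it across several `POST /events/ingest_bulk` calls. The sketch below builds those request bodies. The helper name `bulk_payloads` and the default `batch_size` of 500 are hypothetical and chosen only for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator


def bulk_payloads(events: Iterable[dict], source_label: str, batch_size: int = 500) -> Iterator[dict]:
    """Yield /events/ingest_bulk request bodies holding at most batch_size events each.

    batch_size is an illustrative assumption; the spec does not define a limit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield {"events": batch, "source_label": source_label}


# Usage: 1,200 replayed events become three request bodies (500, 500, 200).
replayed = [{"tenant_id": "client-a", "message_raw": f"line {i}"} for i in range(1200)]
payloads = list(bulk_payloads(replayed, source_label="syslog-replay"))
```

Because the helper consumes any iterable lazily, it can stream a large file without loading every event into memory first.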

---

## 3. Search Events

### `POST /events/search`

Search raw/normalized events.

Request:

```json
{
  "tenant_id": "client-a",
  "query": "rekey failed",
  "host": "fw-01",
  "service": "charon",
  "since": "2026-03-22T00:00:00Z",
  "limit": 50
}
```

Response:

```json
{
  "ok": true,
  "results": [ ... ]
}
```

---

## 4. Search Candidates

### `POST /candidates/search`

Search meaningful derived patterns.

Request:

```json
{
  "tenant_id": "client-a",
  "decision_band": ["review", "urgent_human_review"],
  "candidate_type": ["recurrence", "self_heal"],
  "limit": 20
}
```

Response:

```json
{
  "ok": true,
  "results": [ ... ]
}
```

---

## 5. Get Candidate Explanation

### `POST /candidates/explain`

Return one candidate together with its score breakdown, supporting evidence, and a plain-language explanation.

Request:

```json
{
  "candidate_id": "cand_789"
}
```

Response:

```json
{
  "ok": true,
  "candidate": { ... },
  "score_breakdown": {
    "recurrence": 0.8,
    "recovery": 0.6,
    "precursor": 0.7
  },
  "evidence": {
    "event_count": 12,
    "signature_ids": ["sig_456"],
    "related_incidents": ["inc_100"]
  },
  "explanation": "This recurring self-healing VPN issue has increased in frequency and matched a prior promoted incident."
}
```

---

## 6. Promote Candidate

### `POST /candidates/promote`

Promote a candidate into durable incident memory.

Request:

```json
{
  "candidate_id": "cand_789",
  "title": "Recurring VPN rekey instability for client-a",
  "summary": "Observed 12 self-resolving rekey failures over 7 days.",
  "incident_type": "vpn_instability"
}
```

Response:

```json
{
  "ok": true,
  "incident_memory_id": "inc_100"
}
```

---

## 7. Search Incident Memory

### `POST /incidents/search`

Search promoted incident memory.

Request:

```json
{
  "tenant_id": "client-a",
  "query": "vpn flaps",
  "limit": 10
}
```

Response:

```json
{
  "ok": true,
  "results": [ ... ]
}
```

---

## 8. Related History

### `POST /history/related`

Given an event, signature, or candidate, return related past operational memory.

Request:

```json
{
  "subject_type": "signature",
  "subject_id": "sig_456",
  "limit": 10
}
```

Response:

```json
{
  "ok": true,
  "related_candidates": [ ... ],
  "related_incidents": [ ... ],
  "related_lessons": [ ... ]
}
```

---

## 9. Daily Digest

### `POST /digest/daily`

Generate the operator digest for a tenant or environment.

Request:

```json
{
  "tenant_id": "client-a",
  "since": "2026-03-21T00:00:00Z",
  "limit": 10
}
```

Response:

```json
{
  "ok": true,
  "items": [
    {
      "candidate_id": "cand_789",
      "title": "Recurring VPN rekey instability",
      "decision_band": "review",
      "why_it_matters": "Repeated 12 times this week, self-resolved each time, and matches a prior incident.",
      "score_total": 0.84
    }
  ]
}
```

---

## 10. Review Decision

### `POST /review/record`

Capture operator judgment.

Request:

```json
{
  "tenant_id": "client-a",
  "subject_type": "candidate",
  "subject_id": "cand_789",
  "decision": "useful",
  "notes": "This pattern caused tickets last month.",
  "reviewer": "steven"
}
```

Response:

```json
{
  "ok": true,
  "review_id": "rev_101"
}
```

---

## 11. LogicMonitor Ingest

### `POST /connectors/logicmonitor/events`

Accept LogicMonitor-originated alert/event payloads.

Request:

```json
{
  "tenant_id": "client-a",
  "resource_id": 12345,
  "host": "edge-fw-01",
  "service": "vpn",
  "severity": "warning",
  "alert_id": 998877,
  "message_raw": "VPN tunnel dropped and recovered",
  "timestamp": "2026-03-22T00:00:00Z",
  "metadata": {
    "datasource": "IPSec Tunnel",
    "instance_name": "site-b",
    "acknowledged": false,
    "cleared_at": "2026-03-22T00:00:32Z"
  }
}
```

Response:

```json
{
  "ok": true,
  "event_id": "evt_123",
  "signature_id": "sig_456",
  "candidate_ids": ["cand_789"]
}
```

### `POST /connectors/logicmonitor/sync`

Poll/sync LogicMonitor data in batches.

Request:

```json
{
  "tenant_id": "client-a",
  "since": "2026-03-21T00:00:00Z",
  "mode": "alerts"
}
```

Response:

```json
{
  "ok": true,
  "ingested": 250,
  "signatures_created": 9,
  "candidates_created": 4
}
```

---

## 12. Health / Readiness

### `GET /healthz`
Simple liveness/readiness check.

### `GET /status`
Compact runtime and ingest status.

### `GET /metrics`
Bounded metrics and counts. Must not block on deep correlation or full integrity scans.

---

## MVP API subset

For the MVP, implement these first:
- `/events/ingest`
- `/events/ingest_bulk`
- `/candidates/search`
- `/candidates/explain`
- `/candidates/promote`
- `/digest/daily`
- `/history/related`
- `/healthz`
- `/metrics`

package/docs/architecture.md
ADDED
@@ -0,0 +1,333 @@

# Architecture

> This document should be read together with `design-governance.md`, which is the canonical governor for early product scope, attention-model decisions, and intake/discovery/output boundaries.

## Overview

brAInstem converts raw operational inputs into a canonical event stream, assigns and evolves operator attention, and promotes only the most meaningful weak signals into higher-order operational memory.

Canonical early pipeline:
1. receive raw inputs from source-specific adapters
2. wrap them in provenance-preserving raw input envelopes
3. parse/canonicalize into a shared internal event model
4. normalize and fingerprint
5. assign initial attention
6. evolve attention over time based on recurrence/spread/history/context
7. route events/signatures/candidates according to attention band
8. surface meaningful operator-facing outputs and promote durable patterns when justified

The core architectural idea is not merely "store logs and analyze them later."
It is to maintain a continuously updated stream of operational attention.

## Core layers

### 1. Input apparatus

The input apparatus is the trust boundary of the system.
Its job is not just to parse data, but to receive inputs robustly, preserve provenance, and decouple ingestion from deeper analysis.

Target source classes over time:
- syslog
- app logs
- auth logs
- firewall / VPN logs
- file tailing
- webhook event streams
- LogicMonitor alerts / events / resource metadata
- later: journald, Windows events, richer vendor/event adapters

The input apparatus should eventually support broad source coverage, but that breadth should be implemented through a stable adapter model rather than ad hoc source-specific hacks.

### 2. Raw input envelope

All source-specific inputs should first become a provenance-preserving raw envelope.

Suggested raw-envelope fields:
- `envelope_id`
- `source_id`
- `source_type`
- `tenant_id`
- `received_at`
- `observed_at`
- `transport`
- `raw_payload`
- `source_metadata`
- `parse_status`

This layer exists so the system can:
- preserve ugly real-world input faithfully
- survive malformed records
- replay and audit parser behavior
- keep source/protocol complexity at the edge
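
Below is a minimal Python sketch of the raw envelope. It assumes a plain `dataclass` and uses hypothetical `parse_status` values (`"pending"`, `"parsed"`, `"failed"`). The class name `RawInputEnvelope`, the helper `mark_parse_result`, and the `env_` id prefix are illustrative and not part of the package. The field names come from the list above. The property being illustrated is that the raw payload is stored untouched and a parse failure is recorded rather than dropped.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class RawInputEnvelope:
    source_id: str
    source_type: str
    tenant_id: str
    transport: str
    raw_payload: bytes  # kept exactly as received, never rewritten
    observed_at: Optional[datetime] = None  # source-claimed time, if any
    source_metadata: dict[str, Any] = field(default_factory=dict)
    parse_status: str = "pending"  # hypothetical values: pending / parsed / failed
    envelope_id: str = field(default_factory=lambda: f"env_{uuid.uuid4().hex[:12]}")
    received_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def mark_parse_result(env: RawInputEnvelope, ok: bool) -> RawInputEnvelope:
    """Record the parse outcome without discarding the envelope (keeps replay/audit possible)."""
    env.parse_status = "parsed" if ok else "failed"
    return env


# A malformed syslog record still produces an envelope; only its status changes.
env = RawInputEnvelope(
    source_id="fw-01-syslog",
    source_type="syslog",
    tenant_id="client-a",
    transport="udp",
    raw_payload=b"<28>Mar 22 03:30:00 fw-01 charon: \xff garbled bytes",
)
mark_parse_result(env, ok=False)
```

Storing `raw_payload` as bytes rather than decoded text is one way to avoid losing information when a source sends invalid encodings.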

### 3. Canonical event

After parsing/canonicalization, all valid inputs should enter one shared internal stream as canonical events.

Canonical-event fields should be stable enough that downstream discovery does not care whether an event came from syslog, file tailing, webhook JSON, or a monitoring connector.

Suggested canonical-event fields:
- `event_id`
- `tenant_id`
- `source_type`
- `source_name`
- `asset_id`
- `host`
- `service`
- `timestamp`
- `severity`
- `kind`
- `message_raw`
- `message_normalized`
- `structured_fields`
- `correlation_keys`
- `labels`
- `raw_ref`
- `ingest_metadata`

### 4. Normalization

Normalization responsibilities:
- parse timestamps
- standardize field names and types
- extract host / service / actor / IP / user where possible
- normalize obvious message variants
- remove or suppress volatile values where appropriate
- produce a stable signature-friendly representation

Normalization is the step that turns heterogeneous inputs into a true shared stream of consciousness.
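
To make "remove or suppress volatile values" concrete, here is a small regex-based normalizer. The choice of what counts as volatile (IPv4 addresses, `0x` hex values, bare integers) and the placeholder tokens `<ip>`, `<hex>`, `<num>` are assumptions for illustration. A production normalizer would likely be source-aware.

```python
import re

# Order matters: mask IPv4 addresses before bare numbers so "203.0.113.10" becomes one token.
_VOLATILE_PATTERNS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<hex>"),
    (re.compile(r"\b\d+\b"), "<num>"),
]


def normalize_message(message: str) -> str:
    """Lowercase, mask volatile tokens, and collapse runs of whitespace."""
    text = message.strip().lower()
    for pattern, token in _VOLATILE_PATTERNS:
        text = pattern.sub(token, text)
    return re.sub(r"\s+", " ", text)


a = normalize_message("IPsec SA rekey failed; retrying peer 203.0.113.10 attempt 3")
b = normalize_message("IPsec  SA rekey failed; retrying peer 198.51.100.7 attempt 12")
# a == b == "ipsec sa rekey failed; retrying peer <ip> attempt <num>"
```

Two messages that differ only in peer address, retry count, or spacing now share one representation, which is what makes them countable as the same pattern.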

### 5. Signatures and fingerprints

Each event should map to a reusable signature:
- same issue family
- same subsystem
- same rough operational meaning

Examples:
- repeated SSH auth failures from one source
- VPN tunnel flap for site X
- service Y exited unexpectedly
- DNS timeout for resolver Z

This enables:
- recurrence counting
- cross-host spread detection
- historical comparison
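
One way to get a deterministic, reusable signature is to hash a few stable fields together with the normalized message. The field choice (`source_type`, `service`, normalized message), the helper name `signature_id`, and the `sig_` prefix are assumptions here. The function takes a message that is already normalized, so it stays independent of any particular normalizer. `host` is deliberately left out so the same issue on several hosts shares one signature, which is what cross-host spread detection needs.

```python
import hashlib


def signature_id(source_type: str, service: str, message_normalized: str) -> str:
    """Deterministic signature id: identical inputs always map to the same id."""
    # Join with a separator unlikely to appear in the fields, so ("a", "bc") != ("ab", "c").
    key = "\x1f".join([source_type.lower(), service.lower(), message_normalized])
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"sig_{digest[:16]}"


s1 = signature_id("syslog", "charon", "ipsec sa rekey failed; retrying peer <ip>")
s2 = signature_id("syslog", "Charon", "ipsec sa rekey failed; retrying peer <ip>")
s3 = signature_id("syslog", "sshd", "failed password for <user> from <ip>")
```

Because the id is a pure function of its inputs, it can be recomputed during replay and compared across time without a lookup table.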

### 6. Attention model

Attention is the central primitive of brAInstem.

The system should not think in a naive binary of keep vs. drop.
Instead, it should assign and evolve **attention** over time.

This matters because many important weak signals begin as tiny, individually inconsequential events.
Those events should not necessarily be promoted immediately, but they should be able to earn more attention later.

Suggested early attention bands:
- `ignore_fast`
- `background`
- `watch`
- `investigate`
- `promote`

Attention should influence:
- retention depth
- compute budget
- eligibility for candidate generation
- operator visibility
- promotion into durable memory
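
The sketch below expresses the band vocabulary as a Python enum and adds a hypothetical mapping from a 0 to 1 attention score to a band. The thresholds are placeholders, not values taken from this document. The property being illustrated is that a signature's band can move up as its score is re-evaluated over time.

```python
from enum import Enum


class AttentionBand(str, Enum):
    IGNORE_FAST = "ignore_fast"
    BACKGROUND = "background"
    WATCH = "watch"
    INVESTIGATE = "investigate"
    PROMOTE = "promote"


# Hypothetical lower bounds, checked from the highest band down.
_THRESHOLDS = [
    (0.85, AttentionBand.PROMOTE),
    (0.65, AttentionBand.INVESTIGATE),
    (0.40, AttentionBand.WATCH),
    (0.15, AttentionBand.BACKGROUND),
]


def band_for(score: float) -> AttentionBand:
    """Return the highest band whose lower bound the score reaches."""
    for lower_bound, band in _THRESHOLDS:
        if score >= lower_bound:
            return band
    return AttentionBand.IGNORE_FAST


# The same signature, re-scored as evidence accumulates, earns more attention.
history = [0.05, 0.30, 0.55, 0.70]
bands = [band_for(s).value for s in history]
```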

### 7. Candidate generation / discovery apparatus

Candidates are higher-level interpretations of raw events and signatures, such as:
- recurrence candidate
- burst candidate
- spread candidate
- self-heal candidate
- precursor candidate
- anomaly candidate

Examples:
- "VPN tunnel at client A flapped 7 times this week but always recovered within 30 seconds"
- "same auth anomaly appeared across 4 devices in 2 hours"
- "service restart storm follows backup completion on host B"

The discovery apparatus should review the canonical stream in near-real-time but spend the most effort where attention has already been earned.
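
As an illustration of the simplest candidate type, the sketch below flags a signature as a recurrence candidate when it appears at least `min_count` times inside a sliding `window`. Both parameters, the function name, and the candidate dict shape are assumptions; the document does not fix them. The example input mirrors the "flapped 7 times this week" case above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def recurrence_candidates(events, window=timedelta(days=7), min_count=5):
    """Return one candidate per signature seen >= min_count times within `window`.

    `events` is an iterable of (signature_id, timestamp) pairs.
    """
    by_sig = defaultdict(list)
    for sig, ts in events:
        by_sig[sig].append(ts)

    candidates = []
    for sig, stamps in by_sig.items():
        stamps.sort()
        best = 0
        start = 0
        # Two-pointer sliding window over the sorted timestamps.
        for end, ts in enumerate(stamps):
            while ts - stamps[start] > window:
                start += 1
            best = max(best, end - start + 1)
        if best >= min_count:
            candidates.append(
                {"candidate_type": "recurrence", "signature_id": sig, "count_in_window": best}
            )
    return candidates


t0 = datetime(2026, 3, 15)
evts = [("sig_vpn", t0 + timedelta(days=d)) for d in range(7)]  # daily flaps for a week
evts += [("sig_dns", t0), ("sig_dns", t0 + timedelta(days=30))]  # rare, not recurring
found = recurrence_candidates(evts)
```

Other candidate types (burst, spread, self-heal) could follow the same shape: a cheap pass over signatures that already have enough attention, emitting small explainable records.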

### 8. Scoring

Candidates should receive an operator-attention score, not just a severity score.

The score should combine:
- recurrence
- recovery behavior
- spread
- novelty
- temporal correlation
- human impact likelihood
- precursor likelihood
- prior memory weight
- optionally, source trust / source criticality

Scoring should remain inspectable and explainable.
Opaque scoring should be avoided in early versions.
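
To keep scoring inspectable, one option is a plain weighted sum that returns each factor's contribution alongside the total. An explanation can then point at the largest contributors. The weights below are invented for illustration, only a subset of the factors listed above is used, and the function name `score_candidate` is hypothetical.

```python
# Hypothetical weights; they sum to 1.0, so the total stays in [0, 1]
# when every factor is in [0, 1].
WEIGHTS = {
    "recurrence": 0.35,
    "recovery": 0.20,
    "precursor": 0.25,
    "prior_memory": 0.20,
}


def score_candidate(factors: dict[str, float]) -> dict:
    """Weighted sum with a per-factor breakdown; missing factors count as 0."""
    contributions = {}
    for name, weight in WEIGHTS.items():
        value = min(max(factors.get(name, 0.0), 0.0), 1.0)  # clamp to [0, 1]
        contributions[name] = round(weight * value, 4)
    total = round(sum(contributions.values()), 4)
    top = max(contributions, key=contributions.get)  # largest single contributor
    return {"score_total": total, "contributions": contributions, "top_factor": top}


result = score_candidate(
    {"recurrence": 0.8, "recovery": 0.6, "precursor": 0.7, "prior_memory": 1.0}
)
```

A linear model like this is easy to audit: every point of the total can be traced to a named factor and weight, which fits the "inspectable and explainable" requirement better than a learned model would at this stage.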

### 9. Routing

After attention/scoring, events and candidates should be routed according to what kind of handling they deserve.

Examples:
- ignore quickly and cheaply
- retain in background state
- maintain on a watchlist
- investigate now
- promote into operator-facing output
- promote into durable incident memory / lessons later

### 10. Promotion

Important candidates can be promoted into durable memory forms:
- `incident_memory`
- `lesson`
- `runbook_hint`
- `watch_pattern`
- `tenant_risk_marker`

### 11. Retrieval

Two main retrieval modes:

#### Investigative retrieval
For questions like:
- have we seen this before?
- what happened before the outage?
- what similar incidents exist?

#### Proactive retrieval
For new events:
- show prior related incidents
- show matching lessons / runbook hints
- explain why this signal matters now

## LogicMonitor connector

LogicMonitor should be treated as a high-value connector, not a hard dependency.

### Why it fits
LogicMonitor already knows about:
- devices and resources
- alerts and alert history
- collectors and monitored services
- severity and topology-adjacent metadata

brAInstem should add:
- recurrence memory
- weak-signal detection
- cross-time pattern recall
- human-threshold significance scoring

### Integration modes
1. **Webhook ingest**
   - preferred for fast MVP integration
   - LogicMonitor pushes alert/event payloads into brAInstem

2. **Polling connector**
   - periodic fetch of alert/event history and resource metadata
   - useful for replay, backfill, and correlation

3. **Context enrichment**
   - use LogicMonitor device/resource metadata to enrich events with:
     - resource id
     - datasource
     - instance
     - host group / site
     - collector context

### Mapping into brAInstem
LogicMonitor payloads should map into native objects like:
- `tenant`
- `asset`
- `event`
- `signature`
- `candidate`

Recommended event metadata fields:
- `source_type = logicmonitor`
- `alert_id`
- `resource_id`
- `datasource`
- `instance_name`
- `severity`
- `acknowledged`
- `cleared_at`
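
The sketch below maps a LogicMonitor webhook payload (shaped like the `/connectors/logicmonitor/events` request in the API spec) into a canonical-event-shaped dict that carries the recommended metadata fields. The output layout and the function name `map_logicmonitor_event` are assumptions for illustration. So is the derived `recovery_seconds` value, which records how quickly the alert cleared and serves as a self-heal hint. Nothing here describes LogicMonitor's own API.

```python
from datetime import datetime


def _parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def map_logicmonitor_event(payload: dict) -> dict:
    """Map a LogicMonitor-style payload into a canonical-event-shaped dict."""
    meta = payload.get("metadata", {})
    event = {
        "tenant_id": payload["tenant_id"],
        "source_type": "logicmonitor",
        "host": payload.get("host"),
        "service": payload.get("service"),
        "timestamp": payload["timestamp"],
        "severity": payload.get("severity"),
        "message_raw": payload.get("message_raw", ""),
        "structured_fields": {
            "alert_id": payload.get("alert_id"),
            "resource_id": payload.get("resource_id"),
            "datasource": meta.get("datasource"),
            "instance_name": meta.get("instance_name"),
            "acknowledged": meta.get("acknowledged", False),
            "cleared_at": meta.get("cleared_at"),
        },
    }
    cleared_at = meta.get("cleared_at")
    if cleared_at:
        # Hypothetical derived field: seconds from the alert to its auto-clear.
        delta = _parse_ts(cleared_at) - _parse_ts(payload["timestamp"])
        event["structured_fields"]["recovery_seconds"] = delta.total_seconds()
    return event


payload = {
    "tenant_id": "client-a",
    "resource_id": 12345,
    "host": "edge-fw-01",
    "service": "vpn",
    "severity": "warning",
    "alert_id": 998877,
    "message_raw": "VPN tunnel dropped and recovered",
    "timestamp": "2026-03-22T00:00:00Z",
    "metadata": {
        "datasource": "IPSec Tunnel",
        "instance_name": "site-b",
        "acknowledged": False,
        "cleared_at": "2026-03-22T00:00:32Z",
    },
}
ev = map_logicmonitor_event(payload)
```

A short, unacknowledged alert that cleared on its own (32 seconds here) is exactly the kind of signal the connector's product role describes. LogicMonitor may never escalate it, but recurrence across many such events could still earn attention.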

### Product role
LogicMonitor should supply operational signal and context.
brAInstem should decide whether a pattern is recurring, meaningful, and worth a human's attention even when the original LogicMonitor alert auto-cleared or never reached the escalation threshold.

## Suggested storage model

### Main DB
- events
- signatures
- candidates
- incident_memory
- lessons
- runbooks
- links
- provenance
- review decisions

### File / object store
- raw log bundles
- retained evidence files
- archived slices for drill-down

### Vector / semantic layer
Use selectively for:
- fuzzy similarity
- natural-language search over incidents and lessons

But keep the main architecture centered on:
- deterministic fingerprints
- structured recurrence
- temporal correlation
- explainable scoring

## Operator-facing outputs

The product value appears here, not merely in the internal pipeline.

Early operator-facing outputs should aim to answer:
- what is happening?
- why does it matter?
- where / how often is it happening?
- have we seen this before?
- what should a human check next?

Suggested early outputs:
- interesting items list
- daily weak-signal digest
- recurring issue feed
- incident precursor feed
- prior-incident recall on demand
- explainability payload for every surfaced item

The output contract is as important as the event/attention contract.
If the surfaced artifacts are noisy, bloated, or unactionable, the architecture has failed even if the internals are elegant.

## Relationship to ocmemog

Shared mechanisms:
- ingest -> candidate -> promotion
- provenance
- explainability
- retrieval
- confidence / scoring concepts

Different data model:
- raw events instead of chat turns
- signatures instead of prompt-centric queries
- incidents and lessons instead of primarily continuity memories