@simbimbo/brainstem 0.0.1 → 0.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. package/CHANGELOG.md +63 -0
  2. package/README.md +99 -3
  3. package/brainstem/__init__.py +3 -0
  4. package/brainstem/api.py +131 -0
  5. package/brainstem/connectors/__init__.py +1 -0
  6. package/brainstem/connectors/logicmonitor.py +26 -0
  7. package/brainstem/connectors/types.py +16 -0
  8. package/brainstem/demo.py +64 -0
  9. package/brainstem/fingerprint.py +44 -0
  10. package/brainstem/ingest.py +101 -0
  11. package/brainstem/instrumentation.py +38 -0
  12. package/brainstem/interesting.py +62 -0
  13. package/brainstem/models.py +78 -0
  14. package/brainstem/recurrence.py +112 -0
  15. package/brainstem/scoring.py +38 -0
  16. package/brainstem/storage.py +182 -0
  17. package/docs/adapters.md +435 -0
  18. package/docs/api.md +380 -0
  19. package/docs/architecture.md +333 -0
  20. package/docs/connectors.md +66 -0
  21. package/docs/data-model.md +290 -0
  22. package/docs/design-governance.md +595 -0
  23. package/docs/mvp-flow.md +109 -0
  24. package/docs/roadmap.md +87 -0
  25. package/docs/scoring.md +424 -0
  26. package/docs/v0.0.1.md +277 -0
  27. package/docs/vision.md +85 -0
  28. package/package.json +6 -14
  29. package/pyproject.toml +18 -0
  30. package/tests/fixtures/sample_syslog.log +6 -0
  31. package/tests/test_api.py +72 -0
  32. package/tests/test_canonicalization.py +28 -0
  33. package/tests/test_demo.py +25 -0
  34. package/tests/test_fingerprint.py +22 -0
  35. package/tests/test_ingest.py +15 -0
  36. package/tests/test_instrumentation.py +16 -0
  37. package/tests/test_interesting.py +36 -0
  38. package/tests/test_logicmonitor.py +22 -0
  39. package/tests/test_recurrence.py +16 -0
  40. package/tests/test_scoring.py +21 -0
  41. package/tests/test_storage.py +26 -0
package/docs/api.md ADDED
@@ -0,0 +1,380 @@
# API Spec

## Goal

Provide a small, explainable API for ingesting events, scoring patterns, promoting memory, and retrieving relevant operational history.

The API should support both machine-driven ingestion and operator-driven investigation.

---

## 1. Ingest Event

### `POST /events/ingest`

Ingest one normalized or semi-normalized event.

Request:
```json
{
  "tenant_id": "client-a",
  "asset_id": "fw-01",
  "source_type": "syslog",
  "source_path": "/var/log/syslog",
  "host": "fw-01",
  "service": "charon",
  "timestamp": "2026-03-22T03:30:00Z",
  "severity": "warning",
  "facility": "daemon",
  "message_raw": "IPsec SA rekey failed; retrying",
  "structured_fields": {
    "peer": "203.0.113.10"
  }
}
```

Response:
```json
{
  "ok": true,
  "event_id": "evt_123",
  "signature_id": "sig_456",
  "candidate_ids": ["cand_789"]
}
```

---
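
A client-side sketch of assembling this request body, assuming the field list above; the helper name `build_ingest_payload` and the choice of required fields are illustrative, not part of the spec:

```python
from datetime import datetime, timezone

# Required fields are an assumption drawn from the request example above.
REQUIRED_FIELDS = {"tenant_id", "source_type", "timestamp", "message_raw"}

def build_ingest_payload(tenant_id, source_type, message_raw,
                         timestamp=None, **optional):
    """Assemble a /events/ingest request body, defaulting to a UTC timestamp."""
    payload = {
        "tenant_id": tenant_id,
        "source_type": source_type,
        "message_raw": message_raw,
        "timestamp": timestamp
            or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    payload.update(optional)  # host, service, severity, structured_fields, ...
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return payload

payload = build_ingest_payload(
    "client-a", "syslog", "IPsec SA rekey failed; retrying",
    host="fw-01", service="charon", severity="warning",
)
```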

## 2. Bulk Ingest

### `POST /events/ingest_bulk`

For log file batches, webhook batches, or replay.

Request:
```json
{
  "events": [ ... ],
  "source_label": "syslog-replay"
}
```

Response:
```json
{
  "ok": true,
  "ingested": 500,
  "signatures_created": 12,
  "candidates_created": 6
}
```

---

## 3. Search Events

### `POST /events/search`

Search raw/normalized events.

Request:
```json
{
  "tenant_id": "client-a",
  "query": "rekey failed",
  "host": "fw-01",
  "service": "charon",
  "since": "2026-03-22T00:00:00Z",
  "limit": 50
}
```

Response:
```json
{
  "ok": true,
  "results": [ ... ]
}
```

---

## 4. Search Candidates

### `POST /candidates/search`

Search meaningful derived patterns.

Request:
```json
{
  "tenant_id": "client-a",
  "decision_band": ["review", "urgent_human_review"],
  "candidate_type": ["recurrence", "self_heal"],
  "limit": 20
}
```

Response:
```json
{
  "ok": true,
  "results": [ ... ]
}
```

---

## 5. Get Candidate Explanation

### `POST /candidates/explain`

Request:
```json
{
  "candidate_id": "cand_789"
}
```

Response:
```json
{
  "ok": true,
  "candidate": { ... },
  "score_breakdown": {
    "recurrence": 0.8,
    "recovery": 0.6,
    "precursor": 0.7
  },
  "evidence": {
    "event_count": 12,
    "signature_ids": ["sig_456"],
    "related_incidents": ["inc_100"]
  },
  "explanation": "This recurring self-healing VPN issue has increased in frequency and matched a prior promoted incident."
}
```

---

## 6. Promote Candidate

### `POST /candidates/promote`

Promote a candidate into durable incident memory.

Request:
```json
{
  "candidate_id": "cand_789",
  "title": "Recurring VPN rekey instability for client-a",
  "summary": "Observed 12 self-resolving rekey failures over 7 days.",
  "incident_type": "vpn_instability"
}
```

Response:
```json
{
  "ok": true,
  "incident_memory_id": "inc_100"
}
```

---

## 7. Search Incident Memory

### `POST /incidents/search`

Request:
```json
{
  "tenant_id": "client-a",
  "query": "vpn flaps",
  "limit": 10
}
```

Response:
```json
{
  "ok": true,
  "results": [ ... ]
}
```

---

## 8. Related History

### `POST /history/related`

Given an event, signature, or candidate, return related past operational memory.

Request:
```json
{
  "subject_type": "signature",
  "subject_id": "sig_456",
  "limit": 10
}
```

Response:
```json
{
  "ok": true,
  "related_candidates": [ ... ],
  "related_incidents": [ ... ],
  "related_lessons": [ ... ]
}
```

---

## 9. Daily Digest

### `POST /digest/daily`

Generate the operator digest for a tenant or environment.

Request:
```json
{
  "tenant_id": "client-a",
  "since": "2026-03-21T00:00:00Z",
  "limit": 10
}
```

Response:
```json
{
  "ok": true,
  "items": [
    {
      "candidate_id": "cand_789",
      "title": "Recurring VPN rekey instability",
      "decision_band": "review",
      "why_it_matters": "Repeated 12 times this week, self-resolved each time, and matches a prior incident.",
      "score_total": 0.84
    }
  ]
}
```

---
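
Digest assembly reduces to a small, explainable selection step: keep candidates in reviewable bands, rank by attention score, cap at `limit`. A sketch, where the band names follow the candidate-search example and the function itself is illustrative:

```python
def build_digest(candidates, bands=("review", "urgent_human_review"), limit=10):
    """Select digest items: candidates in reviewable bands,
    highest attention score first, capped at `limit`."""
    eligible = [c for c in candidates if c["decision_band"] in bands]
    eligible.sort(key=lambda c: c["score_total"], reverse=True)
    return eligible[:limit]

items = build_digest([
    {"candidate_id": "cand_789", "decision_band": "review", "score_total": 0.84},
    {"candidate_id": "cand_790", "decision_band": "ignore", "score_total": 0.91},
    {"candidate_id": "cand_791", "decision_band": "urgent_human_review", "score_total": 0.62},
])
# cand_790 is excluded despite its high score: band eligibility comes first.
```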

## 10. Review Decision

### `POST /review/record`

Capture operator judgment.

Request:
```json
{
  "tenant_id": "client-a",
  "subject_type": "candidate",
  "subject_id": "cand_789",
  "decision": "useful",
  "notes": "This pattern caused tickets last month.",
  "reviewer": "steven"
}
```

Response:
```json
{
  "ok": true,
  "review_id": "rev_101"
}
```

---

## 11. LogicMonitor Ingest

### `POST /connectors/logicmonitor/events`

Accept LogicMonitor-originated alert/event payloads.

Request:
```json
{
  "tenant_id": "client-a",
  "resource_id": 12345,
  "host": "edge-fw-01",
  "service": "vpn",
  "severity": "warning",
  "alert_id": 998877,
  "message_raw": "VPN tunnel dropped and recovered",
  "timestamp": "2026-03-22T00:00:00Z",
  "metadata": {
    "datasource": "IPSec Tunnel",
    "instance_name": "site-b",
    "acknowledged": false,
    "cleared_at": "2026-03-22T00:00:32Z"
  }
}
```

Response:
```json
{
  "ok": true,
  "event_id": "evt_123",
  "signature_id": "sig_456",
  "candidate_ids": ["cand_789"]
}
```

### `POST /connectors/logicmonitor/sync`

Poll/sync LogicMonitor data in batches.

Request:
```json
{
  "tenant_id": "client-a",
  "since": "2026-03-21T00:00:00Z",
  "mode": "alerts"
}
```

Response:
```json
{
  "ok": true,
  "ingested": 250,
  "signatures_created": 9,
  "candidates_created": 4
}
```

---

## 12. Health / Readiness

### `GET /healthz`
Simple liveness/readiness.

### `GET /status`
Compact runtime and ingest status.

### `GET /metrics`
Bounded metrics and counts. Must not block on deep correlation or full integrity scans.

---

## MVP API subset

For MVP, implement first:
- `/events/ingest`
- `/events/ingest_bulk`
- `/candidates/search`
- `/candidates/explain`
- `/candidates/promote`
- `/digest/daily`
- `/history/related`
- `/healthz`
- `/metrics`
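
One way to keep the MVP surface explicit is a plain route registry; a minimal sketch in which every handler is a hypothetical stub, not the package's actual implementation:

```python
# Minimal route registry for the MVP subset; handlers are hypothetical stubs.
def not_implemented(request):
    return {"ok": False, "error": "not_implemented"}

MVP_ROUTES = {
    ("POST", "/events/ingest"): not_implemented,
    ("POST", "/events/ingest_bulk"): not_implemented,
    ("POST", "/candidates/search"): not_implemented,
    ("POST", "/candidates/explain"): not_implemented,
    ("POST", "/candidates/promote"): not_implemented,
    ("POST", "/digest/daily"): not_implemented,
    ("POST", "/history/related"): not_implemented,
    ("GET", "/healthz"): lambda request: {"ok": True},
    ("GET", "/metrics"): lambda request: {"ok": True, "counts": {}},
}

def dispatch(method, path, request=None):
    handler = MVP_ROUTES.get((method, path))
    if handler is None:
        return {"ok": False, "error": "unknown_route"}
    return handler(request)
```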
package/docs/architecture.md ADDED
@@ -0,0 +1,333 @@
# Architecture

> This document should be read together with `design-governance.md`, which is the canonical governor for early product scope, attention-model decisions, and intake/discovery/output boundaries.

## Overview

brAInstem converts raw operational inputs into a canonical event stream, assigns and evolves operator attention, and promotes only the most meaningful weak signals into higher-order operational memory.

Canonical early pipeline:
1. receive raw inputs from source-specific adapters
2. wrap them in provenance-preserving raw input envelopes
3. parse/canonicalize into a shared internal event model
4. normalize and fingerprint
5. assign initial attention
6. evolve attention over time based on recurrence/spread/history/context
7. route events/signatures/candidates according to attention band
8. surface meaningful operator-facing outputs and promote durable patterns when justified

The core architectural idea is not merely "store logs and analyze them later."
It is to maintain a continuously updated stream of operational attention.

## Core layers

### 1. Input apparatus

The input apparatus is the trust boundary of the system.
Its job is not just to parse data, but to receive inputs robustly, preserve provenance, and decouple ingestion from deeper analysis.

Target source classes over time:
- syslog
- app logs
- auth logs
- firewall / VPN logs
- file tailing
- webhook event streams
- LogicMonitor alerts / events / resource metadata
- later: journald, Windows events, richer vendor/event adapters

The input apparatus should eventually support broad source coverage, but that breadth should be implemented through a stable adapter model rather than ad hoc source-specific hacks.

### 2. Raw input envelope

All source-specific inputs should first become a provenance-preserving raw envelope.

Suggested raw-envelope fields:
- `envelope_id`
- `source_id`
- `source_type`
- `tenant_id`
- `received_at`
- `observed_at`
- `transport`
- `raw_payload`
- `source_metadata`
- `parse_status`

This layer exists so the system can:
- preserve ugly real-world input faithfully
- survive malformed records
- replay and audit parser behavior
- keep source/protocol complexity at the edge

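
The envelope can be sketched as a dataclass over the suggested fields; the defaults, types, and id format here are illustrative assumptions, not a fixed schema:

```python
# Sketch of the raw input envelope; field names follow the suggested list,
# defaults and types are illustrative assumptions.
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class RawEnvelope:
    source_id: str
    source_type: str
    tenant_id: str
    received_at: str                  # ISO-8601, set by the ingest edge
    raw_payload: bytes                # preserved byte-for-byte
    transport: str = "unknown"        # e.g. udp-syslog, http-webhook, file-tail
    observed_at: Optional[str] = None
    source_metadata: dict[str, Any] = field(default_factory=dict)
    parse_status: str = "pending"     # pending | parsed | malformed
    envelope_id: str = field(default_factory=lambda: f"env_{uuid.uuid4().hex[:12]}")

env = RawEnvelope(
    source_id="syslog-udp-514",
    source_type="syslog",
    tenant_id="client-a",
    received_at="2026-03-22T03:30:01Z",
    raw_payload=b"<30>Mar 22 03:30:00 fw-01 charon: IPsec SA rekey failed; retrying",
)
```

Keeping `raw_payload` as untouched bytes is what makes parser replay and audit possible later.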

### 3. Canonical event

After parsing/canonicalization, all valid inputs should enter one shared internal stream as canonical events.

Canonical-event fields should be stable enough that downstream discovery does not care whether an event came from syslog, file tailing, webhook JSON, or a monitoring connector.

Suggested canonical-event fields:
- `event_id`
- `tenant_id`
- `source_type`
- `source_name`
- `asset_id`
- `host`
- `service`
- `timestamp`
- `severity`
- `kind`
- `message_raw`
- `message_normalized`
- `structured_fields`
- `correlation_keys`
- `labels`
- `raw_ref`
- `ingest_metadata`


### 4. Normalization

Normalization responsibilities:
- parse timestamps
- standardize field names and types
- extract host / service / actor / IP / user where possible
- normalize obvious message variants
- remove or suppress volatile values where appropriate
- produce a stable signature-friendly representation

Normalization is the step that turns heterogeneous inputs into a true shared stream of consciousness.

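
Volatile-value suppression can be sketched with a few regex passes; the placeholder tokens (`<ip>`, `<hex>`, `<n>`) are illustrative choices, not a fixed spec:

```python
# Sketch of message normalization: mask volatile values (IPs, hex ids, numbers)
# so variants of the same message collapse to one stable form.
import re

def normalize_message(message_raw: str) -> str:
    msg = re.sub(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", "<ip>", message_raw)
    msg = re.sub(r"\b0x[0-9a-fA-F]+\b", "<hex>", msg)
    msg = re.sub(r"\b\d+\b", "<n>", msg)
    return " ".join(msg.split()).lower()

a = normalize_message("IPsec SA rekey failed for peer 203.0.113.10 (attempt 3)")
b = normalize_message("IPsec SA rekey failed for peer 203.0.113.44 (attempt 7)")
# a == b: both variants normalize to the same signature-friendly string
```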

### 5. Signatures and fingerprints

Each event should map to a reusable signature:
- same issue family
- same subsystem
- same rough operational meaning

Examples:
- repeated SSH auth failures from one source
- VPN tunnel flap for site X
- service Y exited unexpectedly
- DNS timeout for resolver Z

This enables:
- recurrence counting
- cross-host spread detection
- historical comparison

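
A deterministic fingerprint can be as simple as hashing the stable parts of an event; the field choice and `sig_` id format below are assumptions for illustration:

```python
# Sketch of a deterministic fingerprint: hash the stable parts of an event
# (tenant, service, normalized message) into a short signature id.
import hashlib

def fingerprint(tenant_id: str, service: str, message_normalized: str) -> str:
    key = "\x1f".join([tenant_id, service, message_normalized])
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"sig_{digest[:12]}"

sig1 = fingerprint("client-a", "charon", "ipsec sa rekey failed for peer <ip>")
sig2 = fingerprint("client-a", "charon", "ipsec sa rekey failed for peer <ip>")
# sig1 == sig2: the same issue family always maps to the same signature
```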

### 6. Attention model

Attention is the central primitive of brAInstem.

The system should not think in a naive binary of keep vs drop.
Instead, it should assign and evolve **attention** over time.

This matters because many important weak signals begin as tiny, individually inconsequential events.
Those events should not necessarily be promoted immediately, but they should be able to earn more attention later.

Suggested early attention bands:
- `ignore_fast`
- `background`
- `watch`
- `investigate`
- `promote`

Attention should influence:
- retention depth
- compute budget
- eligibility for candidate generation
- operator visibility
- promotion into durable memory

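
Mapping a continuous attention score onto the suggested bands can stay this explicit; the thresholds below are illustrative assumptions, and the point is that band boundaries remain visible and tunable rather than buried in code paths:

```python
# Sketch of score-to-band mapping; thresholds are illustrative assumptions.
BAND_THRESHOLDS = [
    (0.90, "promote"),
    (0.70, "investigate"),
    (0.40, "watch"),
    (0.10, "background"),
    (0.00, "ignore_fast"),
]

def attention_band(score: float) -> str:
    for threshold, band in BAND_THRESHOLDS:
        if score >= threshold:
            return band
    return "ignore_fast"
```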

### 7. Candidate generation / discovery apparatus

Candidates are higher-level interpretations of raw events and signatures, such as:
- recurrence candidate
- burst candidate
- spread candidate
- self-heal candidate
- precursor candidate
- anomaly candidate

Examples:
- "VPN tunnel at client A flapped 7 times this week but always recovered within 30 seconds"
- "same auth anomaly appeared across 4 devices in 2 hours"
- "service restart storm follows backup completion on host B"

The discovery apparatus should review the canonical stream in near-real-time but spend the most effort where attention has already been earned.

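
The recurrence-candidate case can be sketched as a sliding-window count over one signature's timestamps; the threshold and window are illustrative defaults, not the package's actual parameters:

```python
# Sketch of a recurrence check: a signature whose events repeat at least
# `min_count` times inside `window` becomes a recurrence candidate.
from datetime import datetime, timedelta

def is_recurrence_candidate(timestamps, min_count=5, window=timedelta(days=7)):
    """timestamps: ISO-8601 strings for one signature, in any order."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    # slide a window over the sorted occurrences
    for i in range(len(times)):
        j = i
        while j < len(times) and times[j] - times[i] <= window:
            j += 1
        if j - i >= min_count:
            return True
    return False

# Seven daily flaps, matching the "flapped 7 times this week" example above.
flaps = [f"2026-03-{day:02d}T03:30:00" for day in range(16, 23)]
recurring = is_recurrence_candidate(flaps)
```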

### 8. Scoring

Candidates should receive an operator-attention score, not just a severity score.

The score should combine:
- recurrence
- recovery behavior
- spread
- novelty
- temporal correlation
- human impact likelihood
- precursor likelihood
- prior memory weight
- optionally source trust / source criticality

Scoring should remain inspectable and explainable.
Opaque scoring should be avoided early.

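
Inspectable scoring can start as a plain weighted sum over named factors, returning both the total and the per-factor breakdown so every surfaced item can explain itself. The weights and factor subset below are illustrative assumptions:

```python
# Sketch of explainable scoring: weights and factors are illustrative.
WEIGHTS = {
    "recurrence": 0.30,
    "recovery": 0.20,
    "spread": 0.15,
    "novelty": 0.15,
    "precursor": 0.20,
}

def score_candidate(factors: dict) -> dict:
    breakdown = {name: factors.get(name, 0.0) for name in WEIGHTS}
    total = sum(WEIGHTS[name] * value for name, value in breakdown.items())
    return {"score_total": round(total, 2), "score_breakdown": breakdown}

result = score_candidate({"recurrence": 0.8, "recovery": 0.6, "precursor": 0.7})
```

Because the breakdown travels with the total, the `/candidates/explain` response shape in the API spec falls out naturally from the scorer rather than being reconstructed after the fact.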

### 9. Routing

After attention/scoring, events and candidates should be routed according to what kind of handling they deserve.

Examples:
- ignore quickly and cheaply
- retain in background state
- maintain on a watchlist
- investigate now
- promote into operator-facing output
- promote into durable incident memory / lessons later

### 10. Promotion

Important candidates can be promoted into durable memory forms:
- `incident_memory`
- `lesson`
- `runbook_hint`
- `watch_pattern`
- `tenant_risk_marker`


### 11. Retrieval

Two main retrieval modes:

#### Investigative retrieval
For questions like:
- have we seen this before?
- what happened before the outage?
- what similar incidents exist?

#### Proactive retrieval
For new events:
- show prior related incidents
- show matching lessons / runbook hints
- explain why this signal matters now


## LogicMonitor connector

LogicMonitor should be treated as a high-value connector, not a hard dependency.

### Why it fits
LogicMonitor already knows about:
- devices and resources
- alerts and alert history
- collectors and monitored services
- severity and topology-adjacent metadata

brAInstem should add:
- recurrence memory
- weak-signal detection
- cross-time pattern recall
- human-threshold significance scoring

### Integration modes
1. **Webhook ingest**
   - preferred for fast MVP integration
   - LogicMonitor pushes alert/event payloads into brAInstem

2. **Polling connector**
   - periodic fetch of alert/event history and resource metadata
   - useful for replay, backfill, and correlation

3. **Context enrichment**
   - use LogicMonitor device/resource metadata to enrich events with:
     - resource id
     - datasource
     - instance
     - host group / site
     - collector context

### Mapping into brAInstem
LogicMonitor payloads should map into native objects like:
- `tenant`
- `asset`
- `event`
- `signature`
- `candidate`

Recommended event metadata fields:
- `source_type = logicmonitor`
- `alert_id`
- `resource_id`
- `datasource`
- `instance_name`
- `severity`
- `acknowledged`
- `cleared_at`

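
The mapping can be sketched as one pure function from the connector's webhook payload shape (as given in the API spec) to a canonical event; the function itself is an illustrative assumption, not the package's connector code:

```python
# Sketch of mapping a LogicMonitor webhook payload into a canonical event.
def lm_to_canonical(payload: dict) -> dict:
    meta = payload.get("metadata", {})
    return {
        "tenant_id": payload["tenant_id"],
        "source_type": "logicmonitor",
        "host": payload.get("host"),
        "service": payload.get("service"),
        "timestamp": payload["timestamp"],
        "severity": payload.get("severity"),
        "message_raw": payload["message_raw"],
        "structured_fields": {
            "alert_id": payload.get("alert_id"),
            "resource_id": payload.get("resource_id"),
            "datasource": meta.get("datasource"),
            "instance_name": meta.get("instance_name"),
            "acknowledged": meta.get("acknowledged"),
            "cleared_at": meta.get("cleared_at"),
        },
    }

event = lm_to_canonical({
    "tenant_id": "client-a",
    "resource_id": 12345,
    "host": "edge-fw-01",
    "service": "vpn",
    "severity": "warning",
    "alert_id": 998877,
    "message_raw": "VPN tunnel dropped and recovered",
    "timestamp": "2026-03-22T00:00:00Z",
    "metadata": {"datasource": "IPSec Tunnel", "instance_name": "site-b",
                 "acknowledged": False, "cleared_at": "2026-03-22T00:00:32Z"},
})
```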

### Product role
LogicMonitor should supply operational signal and context.
brAInstem should decide whether a pattern is recurring, meaningful, and worth a human's attention even when the original LM alert auto-cleared or stayed below escalation importance.

## Suggested storage model

### Main DB
- events
- signatures
- candidates
- incident_memory
- lessons
- runbooks
- links
- provenance
- review decisions

### File / object store
- raw log bundles
- retained evidence files
- archived slices for drill-down

### Vector / semantic layer
Use selectively for:
- fuzzy similarity
- natural-language search over incidents and lessons

But keep the main architecture centered on:
- deterministic fingerprints
- structured recurrence
- temporal correlation
- explainable scoring

## Operator-facing outputs

The product value appears here, not merely in the internal pipeline.

Early operator-facing outputs should aim to answer:
- what is happening?
- why does it matter?
- where / how often is it happening?
- have we seen this before?
- what should a human check next?

Suggested early outputs:
- interesting items list
- daily weak-signal digest
- recurring issue feed
- incident precursor feed
- prior-incident recall on demand
- explainability payload for every surfaced item

The output contract is as important as the event/attention contract.
If the surfaced artifacts are noisy, bloated, or unactionable, the architecture has failed even if the internals are elegant.

## Relationship to ocmemog

Shared mechanisms:
- ingest -> candidate -> promotion
- provenance
- explainability
- retrieval
- confidence / scoring concepts

Different data model:
- raw events instead of chat turns
- signatures instead of prompt-centric queries
- incidents and lessons instead of primarily continuity memories