@simbimbo/brainstem 0.0.1 → 0.0.3
- package/CHANGELOG.md +87 -0
- package/README.md +99 -3
- package/brainstem/__init__.py +3 -0
- package/brainstem/api.py +257 -0
- package/brainstem/connectors/__init__.py +1 -0
- package/brainstem/connectors/logicmonitor.py +26 -0
- package/brainstem/connectors/types.py +16 -0
- package/brainstem/demo.py +64 -0
- package/brainstem/fingerprint.py +44 -0
- package/brainstem/ingest.py +108 -0
- package/brainstem/instrumentation.py +38 -0
- package/brainstem/interesting.py +62 -0
- package/brainstem/models.py +80 -0
- package/brainstem/recurrence.py +112 -0
- package/brainstem/scoring.py +38 -0
- package/brainstem/storage.py +428 -0
- package/docs/adapters.md +435 -0
- package/docs/api.md +380 -0
- package/docs/architecture.md +333 -0
- package/docs/connectors.md +66 -0
- package/docs/data-model.md +290 -0
- package/docs/design-governance.md +595 -0
- package/docs/mvp-flow.md +109 -0
- package/docs/roadmap.md +87 -0
- package/docs/scoring.md +424 -0
- package/docs/v0.0.1.md +277 -0
- package/docs/vision.md +85 -0
- package/package.json +6 -14
- package/pyproject.toml +18 -0
- package/tests/fixtures/sample_syslog.log +6 -0
- package/tests/test_api.py +319 -0
- package/tests/test_canonicalization.py +28 -0
- package/tests/test_demo.py +25 -0
- package/tests/test_fingerprint.py +22 -0
- package/tests/test_ingest.py +15 -0
- package/tests/test_instrumentation.py +16 -0
- package/tests/test_interesting.py +36 -0
- package/tests/test_logicmonitor.py +22 -0
- package/tests/test_recurrence.py +16 -0
- package/tests/test_scoring.py +21 -0
- package/tests/test_storage.py +294 -0
package/docs/adapters.md
ADDED
@@ -0,0 +1,435 @@

# Adapters and Canonical Event Contract

_Status: design contract for intake breadth without connector chaos_

## Purpose

This document defines how brAInstem should accept many different source types without turning into an ungoverned pile of bespoke connector logic.

Its job is to answer:
- what an adapter is
- what an adapter is allowed to do
- what an adapter is not allowed to do
- what the raw input contract is
- what the canonical event contract is
- how failures should be handled
- what "universal input" means in practice

This document should be read together with:
- `design-governance.md`
- `architecture.md`
- `v0.0.1.md`

---

## 1. Why adapters exist

brAInstem should eventually ingest many classes of operational input:
- syslog
- local log files
- JSON log streams
- webhook payloads
- monitoring/alert APIs
- vendor-specific event formats
- later: journald, Windows events, queue/stream sources, cloud audit feeds

Those sources all have different:
- transport behavior
- payload shapes
- timestamp formats
- metadata conventions
- source identity hints
- failure modes

The adapter layer exists so that source-specific ugliness stays at the edge.

The rest of the product should primarily deal with:
- raw input envelopes
- canonical events
- attention and routing
- discovery and memory

### Design rule
Adapter complexity belongs at the edges.
The discovery apparatus should not need to know what transport/protocol originally delivered an event.

---

## 2. What "universal input" means

"Universal" does **not** mean:
- native first-class support for every source in the first release
- a giant vendor integration matrix before the core event model is stable
- a custom parser for every odd format on day one

"Universal" **does** mean:
- every source can be represented by a raw input envelope
- every successful parse can become a canonical event
- every canonical event can enter the same attention/discovery pipeline
- new sources can be added by implementing a constrained adapter contract instead of creating system-wide exceptions

Universal input is a property of the architecture, not a promise of immediate breadth.

---

## 3. Adapter responsibilities

An adapter is responsible for:
1. receiving source data from a specific source class
2. preserving enough provenance to audit where the input came from
3. emitting a valid `RawInputEnvelope`
4. optionally performing source-local pre-parse validation
5. handing the envelope into the parser/canonicalizer stage

An adapter is **not** responsible for:
- long-term memory decisions
- discovery logic
- attention scoring policy
- promotion policy
- operator-facing explanation generation

Adapters should stay narrow.
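
The responsibilities above can be sketched as a small interface. This is a minimal illustration, not the package's actual API: the `Adapter` protocol, the `LineAdapter` class, and the trimmed-down `RawInputEnvelope` shown here are hypothetical names chosen for the example, with field names borrowed from the envelope contract in section 5.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Iterable, Iterator, Protocol


@dataclass(frozen=True)
class RawInputEnvelope:
    """Trimmed-down stand-in for the envelope (hypothetical shape)."""
    envelope_id: str
    source_id: str
    source_type: str
    tenant_id: str
    received_at: datetime
    raw_payload: Any
    source_metadata: dict[str, Any] = field(default_factory=dict)


class Adapter(Protocol):
    """Hypothetical contract: receive input, emit envelopes, nothing more."""
    source_type: str

    def envelopes(self) -> Iterator[RawInputEnvelope]: ...


class LineAdapter:
    """Illustrative adapter over any iterable of text lines (file, stdin, list)."""
    source_type = "stream"

    def __init__(self, lines: Iterable[str], source_id: str, tenant_id: str = "default") -> None:
        self._lines = lines
        self._source_id = source_id
        self._tenant_id = tenant_id

    def envelopes(self) -> Iterator[RawInputEnvelope]:
        for offset, line in enumerate(self._lines):
            # Provenance only: no scoring, no discovery, no suppression here.
            yield RawInputEnvelope(
                envelope_id=str(uuid.uuid4()),
                source_id=self._source_id,
                source_type=self.source_type,
                tenant_id=self._tenant_id,
                received_at=datetime.now(timezone.utc),
                raw_payload=line,
                source_metadata={"offset": offset},
            )
```

Note what the sketch leaves out: the adapter never inspects the payload for meaning. Anything beyond provenance and delivery is left to the parser/canonicalizer stage.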

---

## 4. Adapter categories

Early useful categories:

### 4.1 File adapter
For:
- local file tails
- rotated logs
- directory watch patterns
- line-oriented service/application logs

### 4.2 Syslog adapter
For:
- UDP syslog
- TCP syslog
- later TLS syslog if needed

### 4.3 HTTP/webhook adapter
For:
- generic JSON event POST
- vendor webhooks
- batched events

### 4.4 API pull adapter
For:
- periodic polling of event history
- alert/event backfill
- vendor APIs like LogicMonitor where polling is useful

### 4.5 Stream adapter
For:
- stdin or pipeline-fed events
- queue/stream integrations later
- replay tooling

These are categories, not promises that all must ship in `v0.0.1`.

---

## 5. Raw input envelope contract

Every adapter must emit a raw input envelope before deeper normalization.

This contract preserves:
- provenance
- transport identity
- original payload fidelity
- parser/debug visibility

## Required fields

### `envelope_id`
- unique ID for the raw envelope
- generated at receipt if the source does not provide one

### `source_id`
- stable identifier for the source instance
- examples:
  - `syslog:edge-fw-01`
  - `file:/var/log/auth.log`
  - `http:logicmonitor-prod-webhook`

### `source_type`
- broad source class
- examples:
  - `syslog`
  - `file`
  - `http`
  - `logicmonitor`
  - `stream`

### `tenant_id`
- logical tenant/environment owner
- may be defaulted in early local mode
- should still exist conceptually even if the first release uses a single tenant

### `received_at`
- timestamp when brAInstem received the input

### `raw_payload`
- original raw line/body/payload in preserved form
- may be string, bytes, or structured object depending on implementation, but must remain recoverable

## Strongly recommended fields

### `observed_at`
- source-reported timestamp if available
- may differ from `received_at`

### `transport`
- example values:
  - `syslog-udp`
  - `syslog-tcp`
  - `http-post`
  - `file-tail`
  - `api-poll`

### `source_metadata`
- adapter/source-specific metadata
- examples:
  - file path
  - listener port
  - remote IP
  - vendor alert ID
  - request headers subset
  - offset/sequence info

### `parse_status`
- initial parse state marker
- example values:
  - `pending`
  - `parsed`
  - `parse_error`
  - `unsupported`

### `sequence_hint`
- optional source ordering hint when available

## Design rule
The raw envelope is not the product output.
It is the preserved intake truth that allows everything else to be audited.
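
One way to picture the envelope contract is as a frozen dataclass. The sketch below is an assumption-laden illustration rather than the shipped model: the field names come from this section, but the concrete types, the `ParseStatus` enum, the `"default"` tenant, and the `make_envelope` helper are all hypothetical.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Optional, Union


class ParseStatus(str, Enum):
    PENDING = "pending"
    PARSED = "parsed"
    PARSE_ERROR = "parse_error"
    UNSUPPORTED = "unsupported"


@dataclass(frozen=True)
class RawInputEnvelope:
    # Required fields
    envelope_id: str
    source_id: str
    source_type: str
    tenant_id: str
    received_at: datetime
    raw_payload: Union[str, bytes, dict]
    # Strongly recommended fields
    observed_at: Optional[datetime] = None
    transport: Optional[str] = None
    source_metadata: dict[str, Any] = field(default_factory=dict)
    parse_status: ParseStatus = ParseStatus.PENDING
    sequence_hint: Optional[int] = None


def make_envelope(
    raw_payload: Union[str, bytes, dict],
    *,
    source_id: str,
    source_type: str,
    tenant_id: str = "default",  # assumed single-tenant default for local mode
    envelope_id: Optional[str] = None,
    **recommended: Any,
) -> RawInputEnvelope:
    """Fill in receipt-time fields so adapters cannot forget them."""
    return RawInputEnvelope(
        envelope_id=envelope_id or str(uuid.uuid4()),  # generated if source gave none
        source_id=source_id,
        source_type=source_type,
        tenant_id=tenant_id,
        received_at=datetime.now(timezone.utc),
        raw_payload=raw_payload,
        **recommended,
    )


envelope = make_envelope(
    "Failed password for root from 10.0.0.5 port 51234",
    source_id="file:/var/log/auth.log",
    source_type="file",
    transport="file-tail",
    source_metadata={"path": "/var/log/auth.log", "offset": 0},
)
```

Freezing the dataclass reflects the "preserved intake truth" idea: later stages would record outcomes alongside the envelope rather than rewrite it. That is a design choice of this sketch, not something the contract mandates.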

---

## 6. Canonical event contract

After parsing/canonicalization, a successful input should become a canonical event.

This is the shared internal stream of consciousness.

Once an event becomes canonical, the discovery apparatus should not care whether it came from:
- syslog
- file tail
- webhook
- LogicMonitor
- future sources

## Required fields

### `event_id`
- stable unique id for the canonical event

### `tenant_id`
- the tenant/environment the event belongs to

### `source_type`
- normalized source family

### `timestamp`
- best normalized event timestamp
- prefers true observed time when trustworthy
- may fall back to receipt time

### `kind`
- normalized event class
- examples:
  - `auth_failure`
  - `service_restart`
  - `vpn_flap`
  - `generic_warning`

### `message_raw`
- original message body after basic extraction

### `message_normalized`
- normalized message used for fingerprinting/grouping

### `raw_ref`
- reference back to the raw input envelope or raw store

## Strongly recommended fields

### `source_name`
- human-meaningful source instance name

### `host`
- normalized host/device identity where possible

### `asset_id`
- stable asset identifier if known

### `service`
- normalized service/subsystem name

### `severity`
- normalized severity value or band

### `labels`
- tag-like annotations for routing/discovery

### `structured_fields`
- extracted structured values

### `correlation_keys`
- fields likely useful for grouping/spread/recurrence logic

### `ingest_metadata`
- useful canonicalization metadata that should travel downstream
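
As a rough sketch, the canonical event can be modeled the same way as the envelope, together with the timestamp rule from the `timestamp` field. The field names follow this section; the concrete types, the `choose_timestamp` helper, and its 24-hour skew tolerance are assumptions made for illustration, since the document does not define what "trustworthy" means.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class CanonicalEvent:
    # Required fields
    event_id: str
    tenant_id: str
    source_type: str
    timestamp: datetime
    kind: str
    message_raw: str
    message_normalized: str
    raw_ref: str
    # Strongly recommended fields
    source_name: Optional[str] = None
    host: Optional[str] = None
    asset_id: Optional[str] = None
    service: Optional[str] = None
    severity: Optional[str] = None
    labels: tuple[str, ...] = ()
    structured_fields: dict[str, Any] = field(default_factory=dict)
    correlation_keys: tuple[str, ...] = ()
    ingest_metadata: dict[str, Any] = field(default_factory=dict)


def choose_timestamp(
    observed_at: Optional[datetime],
    received_at: datetime,
    max_skew: timedelta = timedelta(hours=24),  # assumed tolerance, not from the doc
) -> datetime:
    """Prefer the source-reported time; fall back to receipt time if it looks unreliable."""
    if observed_at is None or observed_at.tzinfo is None:
        return received_at  # missing or timezone-naive: cannot be trusted
    if abs(received_at - observed_at) > max_skew:
        return received_at  # implausibly far from receipt time
    return observed_at


received = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
event = CanonicalEvent(
    event_id="evt-1",
    tenant_id="default",
    source_type="file",
    timestamp=choose_timestamp(received - timedelta(minutes=5), received),
    kind="auth_failure",
    message_raw="Failed password for root from 10.0.0.5 port 51234",
    message_normalized="failed password for root from <ip> port <num>",
    raw_ref="env-1",
    host="edge-fw-01",
)
```

Nothing in `CanonicalEvent` names a transport. That omission is the point: downstream stages see one shape regardless of origin, and `raw_ref` is the only path back to source-specific detail.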

---

## 7. Parse failure handling

brAInstem must never quietly erase parse failures.

If an adapter can receive input but canonicalization fails, the system should:
- preserve the raw envelope
- emit or record a parse-failure state
- increment parse/decode error counters
- allow operators/builders to inspect representative failures

### Why this matters
A malformed payload can still be operationally meaningful.
Also, if adapters or parsers silently drop bad inputs, trust dies.

### Design rule
Bad parse is a first-class ingest outcome, not an invisible discard path.
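
A minimal sketch of this rule, assuming envelopes are plain dicts and that counters and failure samples live in memory. The names `canonicalize`, `error_counters`, and `failure_samples` are hypothetical stand-ins for whatever metrics and storage the real system uses.

```python
from __future__ import annotations

import json
from collections import Counter, deque
from typing import Any, Callable, Optional

# Hypothetical in-memory stand-ins for real metrics and a failure store.
error_counters: Counter = Counter()
failure_samples: deque = deque(maxlen=100)  # bounded sample for operator inspection


def canonicalize(envelope: dict, parser: Callable[[Any], dict]) -> Optional[dict]:
    """Parse an envelope; on failure, record the outcome instead of dropping it."""
    try:
        fields = parser(envelope["raw_payload"])
    except Exception as exc:
        envelope["parse_status"] = "parse_error"  # raw payload stays untouched
        error_counters[(envelope["source_type"], type(exc).__name__)] += 1
        failure_samples.append({"envelope": envelope, "error": str(exc)})
        return None
    envelope["parse_status"] = "parsed"
    return {"raw_ref": envelope["envelope_id"], **fields}


good = {"envelope_id": "env-1", "source_type": "http", "raw_payload": '{"kind": "auth_failure"}'}
bad = {"envelope_id": "env-2", "source_type": "http", "raw_payload": "{not json"}

parsed_event = canonicalize(good, json.loads)
failed_event = canonicalize(bad, json.loads)
```

After this runs, the malformed envelope still exists with its payload intact, its status says `parse_error`, and the counter shows one failure for the `http` source. There is no code path on which the input simply vanishes.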

---

## 8. Normalization responsibilities

The parser/canonicalizer layer, not the adapter, should own the canonical transformation rules where possible.

Normalization responsibilities include:
- timestamp parsing
- host/service extraction
- volatility stripping
- field normalization
- message cleanup
- kind classification
- preparation for fingerprinting

Adapters may do source-local preprocessing when unavoidable, but the canonicalization logic should remain centralized enough that the system has one real opinion about event shape.
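
To make "volatility stripping" and "preparation for fingerprinting" concrete, here is a small sketch. The regex patterns, the placeholder tokens, and the `normalize_message` and `fingerprint` functions are illustrative assumptions; the package's real normalization rules may differ.

```python
import hashlib
import re

# Ordered patterns: more specific shapes first so they are not eaten by the generic ones.
_VOLATILE_PATTERNS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),  # IPv4-looking tokens
    (re.compile(r"\b[0-9a-f]{8,}\b"), "<hex>"),            # long hex ids/hashes
    (re.compile(r"\b\d+\b"), "<num>"),                     # remaining bare numbers
]


def normalize_message(message: str) -> str:
    """Lowercase, replace volatile tokens with placeholders, collapse whitespace."""
    text = message.strip().lower()
    for pattern, placeholder in _VOLATILE_PATTERNS:
        text = pattern.sub(placeholder, text)
    return re.sub(r"\s+", " ", text)


def fingerprint(kind: str, message_normalized: str) -> str:
    """Short stable hash used to group events that differ only in volatile parts."""
    digest = hashlib.sha256(f"{kind}|{message_normalized}".encode("utf-8"))
    return digest.hexdigest()[:16]


first = normalize_message("Failed password for root from 10.0.0.5 port 51234")
second = normalize_message("Failed password for root from 10.0.0.9  port 40022")
same_group = fingerprint("auth_failure", first) == fingerprint("auth_failure", second)
```

Both messages normalize to `failed password for root from <ip> port <num>`, so they share a fingerprint. Keeping these rules in one place, rather than in each adapter, is what gives the system a single opinion about which differences matter.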

---

## 9. Adapter boundaries

To prevent connector chaos, adapters should obey these rules.

### Adapter may:
- receive source input
- preserve provenance
- perform source-local validation
- map obvious source metadata into envelope fields
- pass through source-specific metadata needed later

### Adapter should avoid:
- inventing bespoke downstream fields that only one adapter knows about
- performing discovery logic
- performing long-term suppression policy
- making promotion decisions
- reshaping canonical semantics without going through the canonicalization contract

### Strong anti-pattern
"This source is special, so we built a one-off downstream path just for it."

That is how architecture rots.

---

## 10. Early recommended source support strategy

To keep the product honest and focused, new adapters should be added in this order:

### First
- file/log ingestion
- syslog-like ingestion
- generic HTTP/webhook ingestion

### Then
- LogicMonitor
- other monitoring/alert sources with strong MSP relevance

### Later
- richer vendor connectors
- queue/stream integrations
- platform-specific event systems

This order keeps the architecture universal without pretending to offer infinite source breadth on day one.

---

## 11. Attention and adapters

Adapters do not assign final operator attention.

However, adapters may contribute source metadata that attention scoring later uses, such as:
- source reliability/trust
- source criticality
- source class
- environment/tenant tags
- transport characteristics

This distinction matters:
- adapters provide evidence and provenance
- the scoring/discovery apparatus decides attention
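
The split can be illustrated with a toy scorer that lives outside any adapter. Everything here is hypothetical: the `attention_score` function, the severity weights, the `trust` and `criticality` metadata keys, and the formula itself are invented for the example and are not the project's scoring policy.

```python
from typing import Any, Mapping

# Assumed severity bands and weights, purely for illustration.
SEVERITY_WEIGHT = {"info": 0.1, "warning": 0.4, "error": 0.7, "critical": 1.0}


def attention_score(severity: str, source_metadata: Mapping[str, Any]) -> float:
    """Scoring-side logic: reads adapter-supplied hints, owns the decision."""
    base = SEVERITY_WEIGHT.get(severity, 0.2)
    trust = float(source_metadata.get("trust", 0.5))              # adapter-provided evidence
    criticality = float(source_metadata.get("criticality", 0.5))  # adapter-provided evidence
    score = base * (0.5 + 0.5 * trust) * (0.5 + 0.5 * criticality)
    return min(max(score, 0.0), 1.0)


# An adapter only attaches hints like these; it never calls the scorer itself.
firewall_hints = {"trust": 0.9, "criticality": 1.0}
lab_box_hints = {"trust": 0.3, "criticality": 0.1}

firewall_score = attention_score("error", firewall_hints)
lab_box_score = attention_score("error", lab_box_hints)
```

The same `error` event scores higher from the firewall than from the lab box, yet neither adapter made that call. Changing the policy means editing one function, not every connector.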

---

## 12. Audit and replay expectations

The adapter + raw envelope system should eventually support:
- replay of raw inputs into canonicalization/discovery
- inspection of parse failures
- verification of source attribution
- sampling of suppressed/ignored inputs for trust calibration

Even if replay tooling is not fully mature in `v0.0.1`, the architecture should preserve the possibility.
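
Replay follows naturally once raw envelopes are preserved. The sketch below assumes envelopes were stored as JSON Lines, which is an assumption of this example and not a statement about the project's storage format; `load_envelopes`, `replay`, and the toy canonicalizer are hypothetical names.

```python
import io
import json
from typing import Callable, Iterable, Iterator, Optional, TextIO


def load_envelopes(stream: TextIO) -> Iterator[dict]:
    """Read one stored envelope per non-empty line (assumed JSON Lines store)."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def replay(
    envelopes: Iterable[dict],
    canonicalizer: Callable[[dict], Optional[dict]],
) -> dict:
    """Re-run canonicalization over stored envelopes and summarize the outcome."""
    summary = {"replayed": 0, "parsed": 0, "failed": 0, "events": []}
    for envelope in envelopes:
        summary["replayed"] += 1
        event = canonicalizer(envelope)
        if event is None:
            summary["failed"] += 1  # failures are counted, never skipped silently
        else:
            summary["parsed"] += 1
            summary["events"].append(event)
    return summary


def toy_canonicalizer(envelope: dict) -> Optional[dict]:
    # Stand-in rule: an empty payload counts as a parse failure.
    if not envelope["raw_payload"]:
        return None
    return {"raw_ref": envelope["envelope_id"], "message_raw": envelope["raw_payload"]}


store = io.StringIO(
    '{"envelope_id": "env-1", "raw_payload": "service nginx restarted"}\n'
    "\n"
    '{"envelope_id": "env-2", "raw_payload": ""}\n'
)
result = replay(load_envelopes(store), toy_canonicalizer)
```

Because the canonicalizer is passed in, the same stored inputs can be replayed against a revised parser to see which earlier failures now parse.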

---

## 13. What a good adapter contract enables

If this contract is followed, brAInstem can:
- expand source breadth over time without discovery-layer chaos
- remain source-agnostic in the core pipeline
- preserve trust via provenance and replayability
- maintain one real stream of operational consciousness

If this contract is ignored, brAInstem becomes:
- connector soup
- parsing exceptions everywhere
- brittle discovery logic
- untrustworthy ingestion

That must be avoided.

---

## 14. v0.0.1 implication

For `v0.0.1`, the key requirement is not broad adapter count.
It is that the repo clearly defines:
- the adapter model
- the raw envelope concept
- the canonical event concept
- the relationship between adapters and the attention/discovery pipeline

That is enough for a truthful first release.