@simbimbo/brainstem 0.0.1 → 0.0.2

Files changed (41)
  1. package/CHANGELOG.md +63 -0
  2. package/README.md +99 -3
  3. package/brainstem/__init__.py +3 -0
  4. package/brainstem/api.py +131 -0
  5. package/brainstem/connectors/__init__.py +1 -0
  6. package/brainstem/connectors/logicmonitor.py +26 -0
  7. package/brainstem/connectors/types.py +16 -0
  8. package/brainstem/demo.py +64 -0
  9. package/brainstem/fingerprint.py +44 -0
  10. package/brainstem/ingest.py +101 -0
  11. package/brainstem/instrumentation.py +38 -0
  12. package/brainstem/interesting.py +62 -0
  13. package/brainstem/models.py +78 -0
  14. package/brainstem/recurrence.py +112 -0
  15. package/brainstem/scoring.py +38 -0
  16. package/brainstem/storage.py +182 -0
  17. package/docs/adapters.md +435 -0
  18. package/docs/api.md +380 -0
  19. package/docs/architecture.md +333 -0
  20. package/docs/connectors.md +66 -0
  21. package/docs/data-model.md +290 -0
  22. package/docs/design-governance.md +595 -0
  23. package/docs/mvp-flow.md +109 -0
  24. package/docs/roadmap.md +87 -0
  25. package/docs/scoring.md +424 -0
  26. package/docs/v0.0.1.md +277 -0
  27. package/docs/vision.md +85 -0
  28. package/package.json +6 -14
  29. package/pyproject.toml +18 -0
  30. package/tests/fixtures/sample_syslog.log +6 -0
  31. package/tests/test_api.py +72 -0
  32. package/tests/test_canonicalization.py +28 -0
  33. package/tests/test_demo.py +25 -0
  34. package/tests/test_fingerprint.py +22 -0
  35. package/tests/test_ingest.py +15 -0
  36. package/tests/test_instrumentation.py +16 -0
  37. package/tests/test_interesting.py +36 -0
  38. package/tests/test_logicmonitor.py +22 -0
  39. package/tests/test_recurrence.py +16 -0
  40. package/tests/test_scoring.py +21 -0
  41. package/tests/test_storage.py +26 -0
@@ -0,0 +1,435 @@
1
+ # Adapters and Canonical Event Contract
2
+
3
+ _Status: design contract for intake breadth without connector chaos_
4
+
5
+ ## Purpose
6
+
7
+ This document defines how brAInstem should accept many different source types without turning into an ungoverned pile of bespoke connector logic.
8
+
9
+ Its job is to answer:
10
+ - what an adapter is
11
+ - what an adapter is allowed to do
12
+ - what an adapter is not allowed to do
13
+ - what the raw input contract is
14
+ - what the canonical event contract is
15
+ - how failures should be handled
16
+ - what "universal input" means in practice
17
+
18
+ This document should be read together with:
19
+ - `design-governance.md`
20
+ - `architecture.md`
21
+ - `v0.0.1.md`
22
+
23
+ ---
24
+
25
+ ## 1. Why adapters exist
26
+
27
+ brAInstem should eventually ingest many classes of operational input:
28
+ - syslog
29
+ - local log files
30
+ - JSON log streams
31
+ - webhook payloads
32
+ - monitoring/alert APIs
33
+ - vendor-specific event formats
34
+ - later: journald, Windows events, queue/stream sources, cloud audit feeds
35
+
36
+ Those sources all have different:
37
+ - transport behavior
38
+ - payload shapes
39
+ - timestamp formats
40
+ - metadata conventions
41
+ - source identity hints
42
+ - failure modes
43
+
44
+ The adapter layer exists so that source-specific ugliness stays at the edge.
45
+
46
+ The rest of the product should primarily deal with:
47
+ - raw input envelopes
48
+ - canonical events
49
+ - attention and routing
50
+ - discovery and memory
51
+
52
+ ### Design rule
53
+ Adapter complexity belongs at the edges.
54
+ The discovery apparatus should not need to know what transport/protocol originally delivered an event.
55
+
56
+ ---
57
+
58
+ ## 2. What "universal input" means
59
+
60
+ "Universal" does **not** mean:
61
+ - native first-class support for every source in the first release
62
+ - a giant vendor integration matrix before the core event model is stable
63
+ - a custom parser for every odd format on day one
64
+
65
+ "Universal" **does** mean:
66
+ - every source can be represented by a raw input envelope
67
+ - every successful parse can become a canonical event
68
+ - every canonical event can enter the same attention/discovery pipeline
69
+ - new sources can be added by implementing a constrained adapter contract instead of creating system-wide exceptions
70
+
71
+ Universal input is a property of the architecture, not a promise of immediate breadth.
72
+
73
+ ---
74
+
75
+ ## 3. Adapter responsibilities
76
+
77
+ An adapter is responsible for:
78
+ 1. receiving source data from a specific source class
79
+ 2. preserving enough provenance to audit where the input came from
80
+ 3. emitting a valid `RawInputEnvelope`
81
+ 4. optionally performing source-local pre-parse validation
82
+ 5. handing the envelope into the parser/canonicalizer stage
83
+
84
+ An adapter is **not** responsible for:
85
+ - long-term memory decisions
86
+ - discovery logic
87
+ - attention scoring policy
88
+ - promotion policy
89
+ - operator-facing explanation generation
90
+
91
+ Adapters should stay narrow.
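
The responsibilities above could be sketched as a narrow interface. This is a minimal illustration, not the package's actual code: `Adapter`, `LineAdapter`, `hand_off`, and the dict-shaped envelope are hypothetical names chosen for this example, and a plain list stands in for the parser/canonicalizer stage.

```python
from __future__ import annotations

import uuid
from datetime import datetime, timezone
from typing import Any, Iterable, Iterator, Protocol


class Adapter(Protocol):
    """Hypothetical narrow contract: receive source data, emit raw envelopes."""

    def envelopes(self) -> Iterator[dict[str, Any]]: ...


class LineAdapter:
    """Illustrative adapter over any iterable of text lines (file, stdin, replay)."""

    def __init__(
        self,
        lines: Iterable[str],
        source_id: str,
        source_type: str = "file",
        tenant_id: str = "default",
    ) -> None:
        self._lines = lines
        self._source_id = source_id
        self._source_type = source_type
        self._tenant_id = tenant_id

    def envelopes(self) -> Iterator[dict[str, Any]]:
        for offset, line in enumerate(self._lines):
            yield {
                "envelope_id": str(uuid.uuid4()),
                "source_id": self._source_id,  # provenance
                "source_type": self._source_type,
                "tenant_id": self._tenant_id,
                "received_at": datetime.now(timezone.utc),
                "raw_payload": line,  # untouched; parsing happens downstream
                "source_metadata": {"offset": offset},
            }


def hand_off(adapter: Adapter, sink: list[dict[str, Any]]) -> int:
    """Pass every envelope to the next stage and return how many were handed off."""
    count = 0
    for envelope in adapter.envelopes():
        sink.append(envelope)
        count += 1
    return count
```

Note what the adapter does not do: it makes no scoring, memory, or promotion decision, and it does not interpret the payload.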
92
+
93
+ ---
94
+
95
+ ## 4. Adapter categories
96
+
97
+ Early useful categories:
98
+
99
+ ### 4.1 File adapter
100
+ For:
101
+ - local file tails
102
+ - rotated logs
103
+ - directory watch patterns
104
+ - line-oriented service/application logs
105
+
106
+ ### 4.2 Syslog adapter
107
+ For:
108
+ - UDP syslog
109
+ - TCP syslog
110
+ - later TLS syslog if needed
111
+
112
+ ### 4.3 HTTP/webhook adapter
113
+ For:
114
+ - generic JSON event POST
115
+ - vendor webhooks
116
+ - batched events
117
+
118
+ ### 4.4 API pull adapter
119
+ For:
120
+ - periodic polling of event history
121
+ - alert/event backfill
122
+ - vendor APIs like LogicMonitor where polling is useful
123
+
124
+ ### 4.5 Stream adapter
125
+ For:
126
+ - stdin or pipeline-fed events
127
+ - queue/stream integrations later
128
+ - replay tooling
129
+
130
+ These are categories, not promises that all must ship in `v0.0.1`.
131
+
132
+ ---
133
+
134
+ ## 5. Raw input envelope contract
135
+
136
+ Every adapter must emit a raw input envelope before deeper normalization.
137
+
138
+ This contract preserves:
139
+ - provenance
140
+ - transport identity
141
+ - original payload fidelity
142
+ - parser/debug visibility
143
+
144
+ ## Required fields
145
+
146
+ ### `envelope_id`
147
+ - unique id for the raw envelope
148
+ - generated at receipt if source does not provide one
149
+
150
+ ### `source_id`
151
+ - stable identifier for the source instance
152
+ - examples:
153
+   - `syslog:edge-fw-01`
154
+   - `file:/var/log/auth.log`
155
+   - `http:logicmonitor-prod-webhook`
156
+
157
+ ### `source_type`
158
+ - broad source class
159
+ - examples:
160
+   - `syslog`
161
+   - `file`
162
+   - `http`
163
+   - `logicmonitor`
164
+   - `stream`
165
+
166
+ ### `tenant_id`
167
+ - logical tenant/environment owner
168
+ - may be defaulted in early local mode
169
+ - should still exist conceptually even if the first release uses a single tenant
170
+
171
+ ### `received_at`
172
+ - timestamp when brAInstem received the input
173
+
174
+ ### `raw_payload`
175
+ - original raw line/body/payload in preserved form
176
+ - may be string, bytes, or structured object depending on implementation, but must remain recoverable
177
+
178
+ ## Strongly recommended fields
179
+
180
+ ### `observed_at`
181
+ - source-reported timestamp if available
182
+ - may differ from `received_at`
183
+
184
+ ### `transport`
185
+ - example values:
186
+   - `syslog-udp`
187
+   - `syslog-tcp`
188
+   - `http-post`
189
+   - `file-tail`
190
+   - `api-poll`
191
+
192
+ ### `source_metadata`
193
+ - adapter/source-specific metadata
194
+ - examples:
195
+   - file path
196
+   - listener port
197
+   - remote IP
198
+   - vendor alert id
199
+   - request headers subset
200
+   - offset/sequence info
201
+
202
+ ### `parse_status`
203
+ - initial parse state marker
204
+ - example values:
205
+   - `pending`
206
+   - `parsed`
207
+   - `parse_error`
208
+   - `unsupported`
209
+
210
+ ### `sequence_hint`
211
+ - optional source ordering hint when available
212
+
213
+ ## Design rule
214
+ The raw envelope is not the product output.
215
+ It is the preserved intake truth that allows everything else to be audited.
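
As an illustration, the envelope fields listed in this section could be modeled as a frozen dataclass. This is a sketch under stated assumptions rather than the definition in `brainstem/models.py`: the field names come from this document, but the types, the defaults, and the `make_envelope` helper are hypothetical.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional, Union


@dataclass(frozen=True)
class RawInputEnvelope:
    # Required fields
    envelope_id: str
    source_id: str
    source_type: str
    tenant_id: str
    received_at: datetime
    raw_payload: Union[str, bytes, dict]
    # Strongly recommended fields (types and defaults are assumptions)
    observed_at: Optional[datetime] = None
    transport: Optional[str] = None
    source_metadata: dict[str, Any] = field(default_factory=dict)
    parse_status: str = "pending"
    sequence_hint: Optional[int] = None


def make_envelope(
    source_id: str,
    source_type: str,
    raw_payload: Union[str, bytes, dict],
    *,
    tenant_id: str = "default",  # defaulted for early single-tenant local mode
    envelope_id: Optional[str] = None,
    **recommended: Any,
) -> RawInputEnvelope:
    """Build an envelope at receipt time, generating an id if the source gave none."""
    return RawInputEnvelope(
        envelope_id=envelope_id or str(uuid.uuid4()),
        source_id=source_id,
        source_type=source_type,
        tenant_id=tenant_id,
        received_at=datetime.now(timezone.utc),
        raw_payload=raw_payload,
        **recommended,
    )
```

Freezing the dataclass is one way to signal that the envelope is preserved intake truth: later stages refer back to it instead of mutating it.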
216
+
217
+ ---
218
+
219
+ ## 6. Canonical event contract
220
+
221
+ After parsing/canonicalization, a successful input should become a canonical event.
222
+
223
+ This is the shared internal stream of consciousness.
224
+
225
+ Once an event becomes canonical, the discovery apparatus should not care whether it came from:
226
+ - syslog
227
+ - file tail
228
+ - webhook
229
+ - LogicMonitor
230
+ - future sources
231
+
232
+ ## Required fields
233
+
234
+ ### `event_id`
235
+ - stable unique id for the canonical event
236
+
237
+ ### `tenant_id`
238
+ - the tenant/environment the event belongs to
239
+
240
+ ### `source_type`
241
+ - normalized source family
242
+
243
+ ### `timestamp`
244
+ - best normalized event timestamp
245
+ - prefers true observed time when trustworthy
246
+ - may fall back to receipt time
247
+
248
+ ### `kind`
249
+ - normalized event class
250
+ - examples:
251
+   - `auth_failure`
251
+   - `service_restart`
252
+   - `vpn_flap`
253
+   - `generic_warning`
255
+
256
+ ### `message_raw`
257
+ - original message body after basic extraction
258
+
259
+ ### `message_normalized`
260
+ - normalized message used for fingerprinting/grouping
261
+
262
+ ### `raw_ref`
263
+ - reference back to the raw input envelope or raw store
264
+
265
+ ## Strongly recommended fields
266
+
267
+ ### `source_name`
268
+ - human-meaningful source instance name
269
+
270
+ ### `host`
271
+ - normalized host/device identity where possible
272
+
273
+ ### `asset_id`
274
+ - stable asset identifier if known
275
+
276
+ ### `service`
277
+ - normalized service/subsystem name
278
+
279
+ ### `severity`
280
+ - normalized severity value or band
281
+
282
+ ### `labels`
283
+ - tag-like annotations for routing/discovery
284
+
285
+ ### `structured_fields`
286
+ - extracted structured values
287
+
288
+ ### `correlation_keys`
289
+ - fields likely useful for grouping/spread/recurrence logic
290
+
291
+ ### `ingest_metadata`
292
+ - useful canonicalization metadata that should travel downstream
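
A canonical event with the fields above might look like the following sketch. The field names follow this document; the types, the defaults, and the `choose_timestamp` helper are assumptions, and in particular the 24-hour skew limit is only one possible reading of "trustworthy".

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

# Assumption: an observed time counts as trustworthy if it is timezone-aware
# and within this distance of the receipt time.
MAX_TRUSTED_SKEW = timedelta(hours=24)


def choose_timestamp(observed_at: Optional[datetime], received_at: datetime) -> datetime:
    """Prefer the source-reported time when plausible, else fall back to receipt time."""
    if observed_at is None or observed_at.tzinfo is None:
        return received_at
    if abs(received_at - observed_at) > MAX_TRUSTED_SKEW:
        return received_at
    return observed_at


@dataclass(frozen=True)
class CanonicalEvent:
    # Required fields
    event_id: str
    tenant_id: str
    source_type: str
    timestamp: datetime
    kind: str
    message_raw: str
    message_normalized: str
    raw_ref: str
    # Strongly recommended fields
    source_name: Optional[str] = None
    host: Optional[str] = None
    asset_id: Optional[str] = None
    service: Optional[str] = None
    severity: Optional[str] = None
    labels: list[str] = field(default_factory=list)
    structured_fields: dict[str, Any] = field(default_factory=dict)
    correlation_keys: dict[str, str] = field(default_factory=dict)
    ingest_metadata: dict[str, Any] = field(default_factory=dict)
```

Nothing in `CanonicalEvent` records the transport that delivered the event, which is the point: downstream stages see one shape regardless of source.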
293
+
294
+ ---
295
+
296
+ ## 7. Parse failure handling
297
+
298
+ brAInstem must never quietly erase parse failures.
299
+
300
+ If an adapter can receive input but canonicalization fails, the system should:
301
+ - preserve the raw envelope
302
+ - emit or record a parse-failure state
303
+ - increment parse/decode error counters
304
+ - allow operators/builders to inspect representative failures
305
+
306
+ ### Why this matters
307
+ A malformed payload can still be operationally meaningful.
308
+ Also, if adapters or parsers silently drop bad inputs, trust dies.
309
+
310
+ ### Design rule
311
+ Bad parse is a first-class ingest outcome, not an invisible discard path.
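
One way to make a bad parse a visible outcome is sketched below for a JSON payload. Everything here is hypothetical: the dict-shaped envelope, the `parse_error_detail` key, the in-memory `counters` and `failure_samples`, and the rule that a payload must be an object with a `message` field.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Any, Optional

counters: Counter = Counter()                 # parse outcome counters
failure_samples: list[dict[str, Any]] = []    # representative failures for inspection
MAX_SAMPLES = 100


def canonicalize_json(envelope: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Parse a JSON payload; on failure, record the outcome instead of dropping it."""
    try:
        body = json.loads(envelope["raw_payload"])
        if not isinstance(body, dict) or "message" not in body:
            raise ValueError("payload is not an object with a 'message' field")
    except (ValueError, TypeError) as exc:  # JSONDecodeError subclasses ValueError
        envelope["parse_status"] = "parse_error"
        envelope["parse_error_detail"] = str(exc)  # hypothetical field
        counters["parse_error"] += 1
        if len(failure_samples) < MAX_SAMPLES:
            failure_samples.append(envelope)  # raw payload stays intact
        return None
    envelope["parse_status"] = "parsed"
    counters["parsed"] += 1
    return {"message_raw": str(body["message"]), "raw_ref": envelope["envelope_id"]}
```

The failing envelope keeps its original `raw_payload`, so an operator can still inspect what actually arrived.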
312
+
313
+ ---
314
+
315
+ ## 8. Normalization responsibilities
316
+
317
+ The parser/canonicalizer layer, not the adapter, should own the canonical transformation rules where possible.
318
+
319
+ Normalization responsibilities include:
320
+ - timestamp parsing
321
+ - host/service extraction
322
+ - volatility stripping
323
+ - field normalization
324
+ - message cleanup
325
+ - kind classification
326
+ - preparation for fingerprinting
327
+
328
+ Adapters may do source-local preprocessing when unavoidable, but the canonicalization logic should remain centralized enough that the system has one real opinion about event shape.
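
To make "volatility stripping" concrete, here is a small sketch that turns `message_raw` into a grouping-friendly `message_normalized`. The patterns and placeholders are illustrative assumptions, not the rules the package's fingerprinting code uses.

```python
import re

# Illustrative patterns only. Order matters: specific shapes are replaced
# before the generic number pattern so it cannot split them apart.
_VOLATILE_PATTERNS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (
        re.compile(
            r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
            re.IGNORECASE,
        ),
        "<uuid>",
    ),
    (re.compile(r"\b0x[0-9a-f]+\b", re.IGNORECASE), "<hex>"),
    (re.compile(r"\b\d+\b"), "<n>"),
]


def normalize_message(message_raw: str) -> str:
    """Lowercase, replace volatile tokens with placeholders, collapse whitespace."""
    text = message_raw.strip().lower()
    for pattern, placeholder in _VOLATILE_PATTERNS:
        text = pattern.sub(placeholder, text)
    return re.sub(r"\s+", " ", text)
```

Two SSH failures from different addresses and ports then normalize to the same string, which is what lets recurrence logic treat them as one pattern.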
329
+
330
+ ---
331
+
332
+ ## 9. Adapter boundaries
333
+
334
+ To prevent connector chaos, adapters should obey these rules.
335
+
336
+ ### Adapter may:
337
+ - receive source input
338
+ - preserve provenance
339
+ - perform source-local validation
340
+ - map obvious source metadata into envelope fields
341
+ - pass through source-specific metadata needed later
342
+
343
+ ### Adapter should avoid:
344
+ - inventing bespoke downstream fields that only one adapter knows about
345
+ - performing discovery logic
346
+ - performing long-term suppression policy
347
+ - making promotion decisions
348
+ - reshaping canonical semantics without going through the canonicalization contract
349
+
350
+ ### Strong anti-pattern
351
+ "This source is special, so we built a one-off downstream path just for it."
352
+
353
+ That is how architecture rots.
354
+
355
+ ---
356
+
357
+ ## 10. Early recommended source support strategy
358
+
359
+ To keep the product honest and focused, new adapters should be added in this order:
360
+
361
+ ### First
362
+ - file/log ingestion
363
+ - syslog-like ingestion
364
+ - generic HTTP/webhook ingestion
365
+
366
+ ### Then
367
+ - LogicMonitor
368
+ - other monitoring/alert sources with strong MSP relevance
369
+
370
+ ### Later
371
+ - richer vendor connectors
372
+ - queue/stream integrations
373
+ - platform-specific event systems
374
+
375
+ This order keeps the architecture universal without pretending to offer infinite source breadth on day one.
376
+
377
+ ---
378
+
379
+ ## 11. Attention and adapters
380
+
381
+ Adapters do not assign final operator attention.
382
+
383
+ However, adapters may contribute source metadata that attention scoring later uses, such as:
384
+ - source reliability/trust
385
+ - source criticality
386
+ - source class
387
+ - environment/tenant tags
388
+ - transport characteristics
389
+
390
+ This distinction matters:
391
+ - adapters provide evidence and provenance
392
+ - the scoring/discovery apparatus decides attention
393
+
394
+ ---
395
+
396
+ ## 12. Audit and replay expectations
397
+
398
+ The adapter + raw envelope system should eventually support:
399
+ - replay of raw inputs into canonicalization/discovery
400
+ - inspection of parse failures
401
+ - verification of source attribution
402
+ - sampling of suppressed/ignored inputs for trust calibration
403
+
404
+ Even if replay tooling is not fully mature in `v0.0.1`, the architecture should preserve the possibility.
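
Replay can be as simple as feeding stored envelopes back through the same canonicalizer. The sketch below assumes dict-shaped envelopes and a hypothetical `canonicalize` callable that returns `None` on parse failure; ordering by `received_at`, then `sequence_hint`, is also an assumption.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable, Optional


def replay(
    stored: Iterable[dict[str, Any]],
    canonicalize: Callable[[dict[str, Any]], Optional[dict[str, Any]]],
) -> tuple[list[dict[str, Any]], list[str]]:
    """Re-run stored raw envelopes through canonicalization in receipt order.

    Returns the canonical events plus the ids of envelopes that failed to parse,
    so failures stay inspectable after a replay.
    """
    ordered = sorted(
        stored,
        key=lambda env: (env["received_at"], env.get("sequence_hint") or 0),
    )
    events: list[dict[str, Any]] = []
    failed_ids: list[str] = []
    for envelope in ordered:
        event = canonicalize(envelope)
        if event is None:
            failed_ids.append(envelope["envelope_id"])
        else:
            events.append(event)
    return events, failed_ids
```

Because replay reads only raw envelopes, it works for any source type as long as the adapter preserved the original payload.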
405
+
406
+ ---
407
+
408
+ ## 13. What a good adapter contract enables
409
+
410
+ If this contract is followed, brAInstem can:
411
+ - expand source breadth over time without discovery-layer chaos
412
+ - remain source-agnostic in the core pipeline
413
+ - preserve trust via provenance and replayability
414
+ - maintain one real stream of operational consciousness
415
+
416
+ If this contract is ignored, brAInstem becomes:
417
+ - connector soup
418
+ - parsing exceptions everywhere
419
+ - brittle discovery logic
420
+ - untrustworthy ingestion
421
+
422
+ That must be avoided.
423
+
424
+ ---
425
+
426
+ ## 14. v0.0.1 implication
427
+
428
+ For `v0.0.1`, the key requirement is not broad adapter count.
429
+ It is that the repo clearly defines:
430
+ - the adapter model
431
+ - the raw envelope concept
432
+ - the canonical event concept
433
+ - the relationship between adapters and the attention/discovery pipeline
434
+
435
+ That is enough for a truthful first release.