@simbimbo/brainstem 0.0.4 → 0.0.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/docs/adapters.md CHANGED
@@ -1,131 +1,435 @@
  # Adapters and Canonical Event Contract

- _Status: aligned contract for implemented intake runtime_
+ _Status: design contract for intake breadth without connector chaos_

- This document defines the current adapter and driver surface and preserves the same intent as design governance without promising unshipped connectors.
+ ## Purpose

- Read with:
+ This document defines how brAInstem should accept many different source types without turning into an ungoverned pile of bespoke connector logic.

+ Its job is to answer:
+ - what an adapter is
+ - what an adapter is allowed to do
+ - what an adapter is not allowed to do
+ - what the raw input contract is
+ - what the canonical event contract is
+ - how failures should be handled
+ - what "universal input" means in practice
+
+ This document should be read together with:
  - `design-governance.md`
  - `architecture.md`
  - `v0.0.1.md`

- ## 1. Current runtime intake scope
+ ---
+
+ ## 1. Why adapters exist
+
+ brAInstem should eventually ingest many classes of operational input:
+ - syslog
+ - local log files
+ - JSON log streams
+ - webhook payloads
+ - monitoring/alert APIs
+ - vendor-specific event formats
+ - later: journald, Windows events, queue/stream sources, cloud audit feeds
+
+ Those sources all have different:
+ - transport behavior
+ - payload shapes
+ - timestamp formats
+ - metadata conventions
+ - source identity hints
+ - failure modes
+
+ The adapter layer exists so that source-specific ugliness stays at the edge.
+
+ The rest of the product should primarily deal with:
+ - raw input envelopes
+ - canonical events
+ - attention and routing
+ - discovery and memory
+
+ ### Design rule
+ Adapter complexity belongs at the edges.
+ The discovery apparatus should not need to know what transport/protocol originally delivered an event.
+
+ ---
+
+ ## 2. What "universal input" means
+
+ "Universal" does **not** mean:
+ - native first-class support for every source in the first release
+ - a giant vendor integration matrix before the core event model is stable
+ - a custom parser for every odd format on day one
+
+ "Universal" **does** mean:
+ - every source can be represented by a raw input envelope
+ - every successful parse can become a canonical event
+ - every canonical event can enter the same attention/discovery pipeline
+ - new sources can be added by implementing a constrained adapter contract instead of creating system-wide exceptions
+
+ Universal input is a property of the architecture, not a promise of immediate breadth.
+
+ ---
+
+ ## 3. Adapter responsibilities
+
+ An adapter is responsible for:
+ 1. receiving source data from a specific source class
+ 2. preserving enough provenance to audit where the input came from
+ 3. emitting a valid `RawInputEnvelope`
+ 4. optionally performing source-local pre-parse validation
+ 5. handing the envelope into the parser/canonicalizer stage
+
+ An adapter is **not** responsible for:
+ - long-term memory decisions
+ - discovery logic
+ - attention scoring policy
+ - promotion policy
+ - operator-facing explanation generation
+
+ Adapters should stay narrow.
+
+ ---
+
+ ## 4. Adapter categories
+
+ Early useful categories:
+
+ ### 4.1 File adapter
+ For:
+ - local file tails
+ - rotated logs
+ - directory watch patterns
+ - line-oriented service/application logs
+
+ ### 4.2 Syslog adapter
+ For:
+ - UDP syslog
+ - TCP syslog
+ - later TLS syslog if needed
+
+ ### 4.3 HTTP/webhook adapter
+ For:
+ - generic JSON event POST
+ - vendor webhooks
+ - batched events
+
+ ### 4.4 API pull adapter
+ For:
+ - periodic polling of event history
+ - alert/event backfill
+ - vendor APIs like LogicMonitor where polling is useful
+
+ ### 4.5 Stream adapter
+ For:
+ - stdin or pipeline-fed events
+ - queue/stream integrations later
+ - replay tooling
+
+ These are categories, not promises that all must ship in `v0.0.1`.
+
+ ---
+
+ ## 5. Raw input envelope contract
+
+ Every adapter must emit a raw input envelope before deeper normalization.
+
+ This contract preserves:
+ - provenance
+ - transport identity
+ - original payload fidelity
+ - parser/debug visibility
+
+ ## Required fields
+
+ ### `envelope_id`
+ - unique id for the raw envelope
+ - generated at receipt if source does not provide one
+
+ ### `source_id`
+ - stable identifier for the source instance
+ - examples:
+   - `syslog:edge-fw-01`
+   - `file:/var/log/auth.log`
+   - `http:logicmonitor-prod-webhook`
+
+ ### `source_type`
+ - broad source class
+ - examples:
+   - `syslog`
+   - `file`
+   - `http`
+   - `logicmonitor`
+   - `stream`
+
+ ### `tenant_id`
+ - logical tenant/environment owner
+ - may be defaulted in early local mode
+ - should still exist conceptually even if the first release uses a single tenant
+
+ ### `received_at`
+ - timestamp when brAInstem received the input
+
+ ### `raw_payload`
+ - original raw line/body/payload in preserved form
+ - may be string, bytes, or structured object depending on implementation, but must remain recoverable
+
+ ## Strongly recommended fields
+
+ ### `observed_at`
+ - source-reported timestamp if available
+ - may differ from `received_at`
+
+ ### `transport`
+ - example values:
+   - `syslog-udp`
+   - `syslog-tcp`
+   - `http-post`
+   - `file-tail`
+   - `api-poll`
+
+ ### `source_metadata`
+ - adapter/source-specific metadata
+ - examples:
+   - file path
+   - listener port
+   - remote ip
+   - vendor alert id
+   - request headers subset
+   - offset/sequence info
+
+ ### `parse_status`
+ - initial parse state marker
+ - example values:
+   - `pending`
+   - `parsed`
+   - `parse_error`
+   - `unsupported`
+
+ ### `sequence_hint`
+ - optional source ordering hint when available
+
+ ## Design rule
+ The raw envelope is not the product output.
+ It is the preserved intake truth that allows everything else to be audited.
+
+ ---
+
+ ## 6. Canonical event contract
+
+ After parsing/canonicalization, a successful input should become a canonical event.
+
+ This is the shared internal stream of consciousness.
+
+ Once an event becomes canonical, the discovery apparatus should not care whether it came from:
+ - syslog
+ - file tail
+ - webhook
+ - LogicMonitor
+ - future sources
+
+ ## Required fields
+
+ ### `event_id`
+ - stable unique id for the canonical event
+
+ ### `tenant_id`
+ - the tenant/environment the event belongs to
+
+ ### `source_type`
+ - normalized source family
+
+ ### `timestamp`
+ - best normalized event timestamp
+ - prefers true observed time when trustworthy
+ - may fall back to receipt time
+
+ ### `kind`
+ - normalized event class
+ - examples:
+   - `auth_failure`
+   - `service_restart`
+   - `vpn_flap`
+   - `generic_warning`
+
+ ### `message_raw`
+ - original message body after basic extraction
+
+ ### `message_normalized`
+ - normalized message used for fingerprinting/grouping
+
+ ### `raw_ref`
+ - reference back to the raw input envelope or raw store
+
+ ## Strongly recommended fields
+
+ ### `source_name`
+ - human-meaningful source instance name
+
+ ### `host`
+ - normalized host/device identity where possible
+
+ ### `asset_id`
+ - stable asset identifier if known
+
+ ### `service`
+ - normalized service/subsystem name
+
+ ### `severity`
+ - normalized severity value or band
+
+ ### `labels`
+ - tag-like annotations for routing/discovery
+
+ ### `structured_fields`
+ - extracted structured values
+
+ ### `correlation_keys`
+ - fields likely useful for grouping/spread/recurrence logic
+
+ ### `ingest_metadata`
+ - useful canonicalization metadata that should travel downstream
+
+ ---
+
+ ## 7. Parse failure handling
+
+ brAInstem must never quietly erase parse failures.
+
+ If an adapter can receive input but canonicalization fails, the system should:
+ - preserve the raw envelope
+ - emit or record a parse-failure state
+ - increment parse/decode error counters
+ - allow operators/builders to inspect representative failures
+
+ ### Why this matters
+ A malformed payload can still be operationally meaningful.
+ Also, if adapters or parsers silently drop bad inputs, trust dies.
+
+ ### Design rule
+ Bad parse is a first-class ingest outcome, not an invisible discard path.
+
+ ---
+
+ ## 8. Normalization responsibilities
+
+ The parser/canonicalizer layer, not the adapter, should own the canonical transformation rules where possible.
+
+ Normalization responsibilities include:
+ - timestamp parsing
+ - host/service extraction
+ - volatility stripping
+ - field normalization
+ - message cleanup
+ - kind classification
+ - preparation for fingerprinting
+
+ Adapters may do source-local preprocessing when unavoidable, but the canonicalization logic should remain centralized enough that the system has one real opinion about event shape.
+
+ ---
+
+ ## 9. Adapter boundaries
+
+ To prevent connector chaos, adapters should obey these rules.
+
+ ### Adapter may:
+ - receive source input
+ - preserve provenance
+ - perform source-local validation
+ - map obvious source metadata into envelope fields
+ - pass through source-specific metadata needed later
+
+ ### Adapter should avoid:
+ - inventing bespoke downstream fields that only one adapter knows about
+ - performing discovery logic
+ - performing long-term suppression policy
+ - making promotion decisions
+ - reshaping canonical semantics without going through the canonicalization contract
+
+ ### Strong anti-pattern
+ "This source is special, so we built a one-off downstream path just for it."
+
+ That is how architecture rots.

- The implemented foundation includes:
- - `syslog` adapter + source driver
- - `file` adapter + source driver
+ ---

- Both are line-oriented, envelope-first sources. Everything else remains in the roadmap.
+ ## 10. Early recommended source support strategy

- The UDP listener in `brainstem.listener` is built on the same `syslog` source driver used by API ingestion.
+ To keep the product honest and focused, new adapters should be added in this order:

- ## 2. Why this adapter layer exists
+ ### First
+ - file/log ingestion
+ - syslog-like ingestion
+ - generic HTTP/webhook ingestion

- The adapter layer contains source-specific parsing and provenance capture so downstream discovery stays source-agnostic.
+ ### Then
+ - LogicMonitor
+ - other monitoring/alert sources with strong MSP relevance

- It should remain narrow:
- - parse raw input into a `RawInputEnvelope`
- - preserve enough source context for replay and forensics
- - avoid discovery/scoring/promotion policy in adapter code
+ ### Later
+ - richer vendor connectors
+ - queue/stream integrations
+ - platform-specific event systems

- ## 3. Source-driver contract (runtime code)
+ This order keeps the architecture universal without pretending infinite source breadth on day one.

- `source_drivers.py` registers drivers by `source_type`.
+ ---

- Current contract:
- - `source_type` string
- - `parse_payload(payload, tenant_id, source_path="", on_parse_error=None) -> list[RawInputEnvelope]`
+ ## 11. Attention and adapters

- Implemented drivers:
- - `file`
- - `syslog`
+ Adapters do not assign final operator attention.

- Runtime behavior:
- - returns zero or more envelopes
- - may call `on_parse_error` callback on parse failure
- - should not swallow parse exceptions silently
+ However, adapters may contribute source metadata that attention scoring later uses, such as:
+ - source reliability/trust
+ - source criticality
+ - source class
+ - environment/tenant tags
+ - transport characteristics

- ## 4. Raw input envelope contract
+ This distinction matters:
+ - adapters provide evidence and provenance
+ - the scoring/discovery apparatus decides attention

- Adapters are required to return:
- - `tenant_id` (required by ingestion API)
- - `source_type` (`file` or `syslog`)
- - `timestamp` (ISO-ish string; defaults to UTC now when unknown)
- - `message_raw` (non-empty for successful canonicalization)
+ ---

- Adapters may populate:
- - `source_id`
- - `source_name`
- - `source_path`
- - `host`
- - `service`
- - `severity`
- - `asset_id`
- - `facility`
- - `structured_fields`
- - `correlation_keys`
- - `metadata` (adapter-local audit data; e.g. `raw_line`)
+ ## 12. Audit and replay expectations

- Canonicalization outcomes are not part of this contract; they are recorded by the storage layer.
+ The adapter + raw envelope system should eventually support:
+ - replay of raw inputs into canonicalization/discovery
+ - inspection of parse failures
+ - verification of source attribution
+ - sampling of suppressed/ignored inputs for trust calibration

- ## 5. Canonical event contract
+ Even if replay tooling is not fully mature in `v0.0.1`, the architecture should preserve the possibility.

- `canonicalize_raw_input_envelope` currently emits:
- - `tenant_id`
- - `source_type`
- - `timestamp`
- - `message_raw`
- - optional `raw_envelope_id`
- - `host`
- - `service`
- - `severity`
- - `asset_id`
- - `source_path`
- - `facility`
- - `structured_fields`
- - `correlation_keys`
- - `message_normalized`
- - `signature_input`
- - `ingest_metadata` (including `canonicalization_source`, `canonicalized_at`, and raw envelope linkage)
+ ---

- There is no separate explicit `source_id`/`source_name` field on canonical events in this milestone.
+ ## 13. What a good adapter contract enables

- ## 6. Parse failure handling
+ If this contract is followed, brAInstem can:
+ - expand source breadth over time without discovery-layer chaos
+ - remain source-agnostic in the core pipeline
+ - preserve trust via provenance and replayability
+ - maintain one real stream of operational consciousness

- When canonicalization fails:
- - intake row is still tracked as `parse_failed`
- - the raw envelope remains queryable in storage
- - parsing failure reason is captured
- - later replay is possible via `/replay/raw` (DB-backed replay path)
+ If this contract is ignored, brAInstem becomes:
+ - connector soup
+ - parsing exceptions everywhere
+ - brittle discovery logic
+ - untrustworthy ingestion

- This is an explicit trust boundary requirement, not a silent discard path.
+ That must be avoided.

- ## 7. Adapter boundaries
+ ---

- Adapters may:
- - adapt transport/source quirks
- - extract obvious source metadata
+ ## 14. v0.0.1 implication

- Adapters should not:
- - assign attention
- - alter candidate generation policy
- - run promotion logic
- - persist raw envelopes or candidates directly
-
- ## 8. Planned intake categories
+ For `v0.0.1`, the key requirement is not broad adapter count.
+ It is that the repo clearly defines:
+ - the adapter model
+ - the raw envelope concept
+ - the canonical event concept
+ - the relationship between adapters and the attention/discovery pipeline

- These remain design targets beyond the current milestone:
- - TCP/TLS syslog transport
- - webhook/API pull sources
- - queue/stream drivers
- - richer vendor-native adapters
-
- ## 9. Why this is still the right architecture now
-
- Even with two drivers, the architecture can stay universal:
- - all registered drivers produce one `RawInputEnvelope` shape
- - all successful envelopes become canonical events in one stream
- - parse failures remain inspectable and replayable
-
- This is the smallest practical intake foundation that matches current runtime implementation.
+ That is enough for a truthful first release.
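The envelope-first flow described in the `adapters.md` changes above (raw envelope, then canonical event, with parse failures kept visible) can be sketched in Python, the language of the `brainstem` package. This is a hypothetical illustration, not the package's code: `canonicalize_sketch`, the digit-masking normalization, and the placeholder `kind` value are assumptions, and only a subset of the documented envelope and event fields is modeled.

```python
# Hypothetical sketch of the envelope-first intake contract; not brAInstem's actual code.
import re
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, List, Optional, Tuple


@dataclass
class RawInputEnvelope:
    # Required fields (section 5).
    source_id: str
    source_type: str
    tenant_id: str
    raw_payload: Any
    envelope_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    received_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # A subset of the strongly recommended fields.
    observed_at: Optional[datetime] = None
    transport: Optional[str] = None
    source_metadata: dict = field(default_factory=dict)
    parse_status: str = "pending"


@dataclass
class CanonicalEvent:
    # Required fields (section 6).
    event_id: str
    tenant_id: str
    source_type: str
    timestamp: datetime
    kind: str
    message_raw: str
    message_normalized: str
    raw_ref: str  # envelope_id of the RawInputEnvelope this event came from


# Assumed volatility stripping: mask digit runs so recurring messages group together.
_DIGITS = re.compile(r"\d+")

ParseFailure = Tuple[RawInputEnvelope, str]


def canonicalize_sketch(
    envelope: RawInputEnvelope, failures: List[ParseFailure]
) -> Optional[CanonicalEvent]:
    """Return a canonical event, or record the failure and return None (no silent drop)."""
    payload = envelope.raw_payload
    if not isinstance(payload, str) or not payload.strip():
        # Keep the envelope, mark it, and make the failure inspectable.
        envelope.parse_status = "parse_error"
        failures.append((envelope, "empty or non-text payload"))
        return None
    message = payload.strip()
    envelope.parse_status = "parsed"
    return CanonicalEvent(
        event_id=str(uuid.uuid4()),
        tenant_id=envelope.tenant_id,
        source_type=envelope.source_type,
        # Prefer the source-reported time; fall back to receipt time.
        timestamp=envelope.observed_at or envelope.received_at,
        kind="generic_warning",  # placeholder: real kind classification is out of scope here
        message_raw=message,
        message_normalized=_DIGITS.sub("<n>", message.lower()),
        raw_ref=envelope.envelope_id,
    )


if __name__ == "__main__":
    failures: List[ParseFailure] = []
    ok = RawInputEnvelope("syslog:edge-fw-01", "syslog", "demo-tenant", "VPN tunnel 42 dropped")
    bad = RawInputEnvelope("file:/var/log/auth.log", "file", "demo-tenant", "   ")
    print(canonicalize_sketch(ok, failures).message_normalized)  # vpn tunnel <n> dropped
    print(canonicalize_sketch(bad, failures), bad.parse_status)  # None parse_error
```

The point of the failure branch is the section 7 rule: the envelope keeps its identity, its `parse_status` becomes `parse_error`, and the reason lands in an inspectable list instead of disappearing.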
package/docs/api.md CHANGED
@@ -7,6 +7,7 @@ This document reflects the runtime and API surfaces that are implemented today.
  - ingest endpoints
    - `POST /ingest/event`
    - `POST /ingest/batch`
+   - `POST /ingest/logicmonitor`
    - `POST /replay/raw`
  - inspection endpoints
    - `GET /interesting`
@@ -40,8 +41,10 @@ This document reflects the runtime and API surfaces that are implemented today.
  Registered source types in this milestone are:
  - `syslog`
  - `file`
+ - `logicmonitor`

- Use `source_type` to select the source driver for raw ingestion events.
+ Use `source_type` to select `syslog` or `file` payload drivers for raw envelope ingestion.
+ `logicmonitor` payloads are expected on `/ingest/logicmonitor`.

  ## Shared ingest request fields

@@ -112,6 +115,30 @@ The response includes ingest accounting for this request:

  Response shape is the same as single-event ingest, with counts based on all batch rows.

+ `POST /ingest/logicmonitor` accepts:
+
+ ```json
+ {
+   "tenant_id": "demo-tenant",
+   "source_path": "/logicmonitor/ingest",
+   "threshold": 2,
+   "db_path": "/tmp/brainstem.sqlite3",
+   "events": [
+     {
+       "resource_id": 123,
+       "resource_name": "edge-fw-01",
+       "message_raw": "VPN tunnel dropped and recovered",
+       "severity": "warning",
+       "metadata": {
+         "datasource": "IPSec Tunnel"
+       }
+     }
+   ]
+ }
+ ```
+
+ Response shape is the same as other ingest endpoints.
+
  ## Replay raw envelopes

  `POST /replay/raw` requires a DB-backed set of raw envelope IDs:
@@ -309,10 +336,18 @@ Operator-oriented runtime summary with the same payload as `/healthz`.
  Returns canonical runtime snapshot:
  - `version`
  - `api_token_env`
+ - `capability_flags.source_capabilities` (`source_types` + per-source ingest-mode matrix)
  - runtime defaults and limits
  - listener defaults
  - endpoint capability flags

+ Current `capability_flags.source_capabilities` includes:
+ - `source_types`: `file`, `logicmonitor`, `syslog`
+ - `ingest_modes_by_source_type`:
+   - `syslog`: `single_event_api`, `batch_api`, `udp_listener`
+   - `file`: `single_event_api`, `batch_api`
+   - `logicmonitor`: `logicmonitor_webhook`
+
  `GET /runtime` is a fuller diagnostic snapshot with the same runtime summary object used by `/status`/`/healthz`.

  ## Listener + file/syslog foundation alignment
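A client for the new `POST /ingest/logicmonitor` endpoint only needs to send the JSON body documented above, and the new `capability_flags.source_capabilities` matrix lets it check which ingest modes a source type supports before choosing a path. The sketch below uses only the standard library. The base URL, the helper names `build_logicmonitor_request` and `supports_mode`, and the assumption that the snapshot nests exactly as the dotted field path suggests are all hypothetical; authentication is omitted because this diff does not document the token header.

```python
# Hypothetical client helpers for POST /ingest/logicmonitor; names and URL are assumptions.
import json
import urllib.request
from typing import Any, Dict, List

BASE_URL = "http://127.0.0.1:8000"  # assumption: adjust to wherever the API is served


def build_logicmonitor_request(
    base_url: str, tenant_id: str, events: List[Dict[str, Any]], **extra: Any
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST matching the documented request body."""
    body = {"tenant_id": tenant_id, "events": events, **extra}
    return urllib.request.Request(
        base_url.rstrip("/") + "/ingest/logicmonitor",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def supports_mode(runtime_snapshot: Dict[str, Any], source_type: str, mode: str) -> bool:
    """Look up capability_flags.source_capabilities.ingest_modes_by_source_type."""
    caps = runtime_snapshot.get("capability_flags", {}).get("source_capabilities", {})
    return mode in caps.get("ingest_modes_by_source_type", {}).get(source_type, [])


if __name__ == "__main__":
    request = build_logicmonitor_request(
        BASE_URL,
        "demo-tenant",
        [{
            "resource_id": 123,
            "resource_name": "edge-fw-01",
            "message_raw": "VPN tunnel dropped and recovered",
            "severity": "warning",
            "metadata": {"datasource": "IPSec Tunnel"},
        }],
        source_path="/logicmonitor/ingest",
        threshold=2,
    )
    print(request.get_method(), request.full_url)  # POST http://127.0.0.1:8000/ingest/logicmonitor
    # urllib.request.urlopen(request) would send it; left out so the sketch runs offline.
```

Checking `supports_mode(snapshot, "logicmonitor", "logicmonitor_webhook")` first mirrors the documented split: `syslog` and `file` use the single-event and batch APIs, while `logicmonitor` lists only `logicmonitor_webhook`.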
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@simbimbo/brainstem",
-   "version": "0.0.4",
+   "version": "0.0.5",
    "description": "brAInstem — operational memory for weak signals.",
    "license": "MIT",
    "type": "module",
package/pyproject.toml CHANGED
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

  [project]
  name = "brainstem"
- version = "0.0.4"
+ version = "0.0.5"
  description = "brAInstem — operational memory for weak signals."
  readme = "README.md"
  requires-python = ">=3.9"
@@ -16,6 +16,7 @@ def test_syslog_adapter_registry_is_populated() -> None:
      sources = list_raw_input_source_types()
      assert "syslog" in sources
      assert "file" in sources
+     assert "logicmonitor" in sources
      assert isinstance(get_raw_input_adapter("syslog"), RawInputAdapter)

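The test change above checks that `logicmonitor` now appears in the source registry. The following is an illustrative stand-in for such a registry, written against the `parse_payload(payload, tenant_id, source_path="", on_parse_error=None)` contract quoted in the removed `adapters.md` text. Only `list_raw_input_source_types`, `get_raw_input_adapter`, and `RawInputAdapter` are names taken from the diff, and their bodies here are assumptions; `register_raw_input_adapter`, `LineAdapter`, and `EventListAdapter` are hypothetical, and plain dicts stand in for `RawInputEnvelope`.

```python
# Illustrative stand-in for a source_type -> adapter registry; not brAInstem's implementation.
from typing import Any, Callable, Dict, List, Optional

ParseErrorCallback = Callable[[Any, Exception], None]


class RawInputAdapter:
    """Base contract: one adapter per source_type, returning zero or more envelopes."""

    source_type = ""

    def parse_payload(self, payload: Any, tenant_id: str, source_path: str = "",
                      on_parse_error: Optional[ParseErrorCallback] = None) -> List[dict]:
        raise NotImplementedError

    def _fail(self, payload: Any, exc: Exception,
              on_parse_error: Optional[ParseErrorCallback]) -> None:
        # Report through the callback if one is given; otherwise raise. Never swallow.
        if on_parse_error is None:
            raise exc
        on_parse_error(payload, exc)

    def _envelope(self, tenant_id: str, source_path: str, message: str) -> dict:
        # Plain dict standing in for RawInputEnvelope.
        return {"tenant_id": tenant_id, "source_type": self.source_type,
                "source_path": source_path, "message_raw": message}


class LineAdapter(RawInputAdapter):
    """Hypothetical line-oriented adapter (syslog, file): one envelope per non-empty line."""

    def __init__(self, source_type: str) -> None:
        self.source_type = source_type

    def parse_payload(self, payload, tenant_id, source_path="", on_parse_error=None):
        if not isinstance(payload, str):
            self._fail(payload, TypeError("expected text payload"), on_parse_error)
            return []
        return [self._envelope(tenant_id, source_path, line)
                for line in payload.splitlines() if line.strip()]


class EventListAdapter(RawInputAdapter):
    """Hypothetical adapter for structured event lists, such as a webhook's "events"."""

    source_type = "logicmonitor"

    def parse_payload(self, payload, tenant_id, source_path="", on_parse_error=None):
        envelopes = []
        for item in payload:
            if not isinstance(item, dict) or not item.get("message_raw"):
                self._fail(item, ValueError("event has no message_raw"), on_parse_error)
                continue
            envelopes.append(self._envelope(tenant_id, source_path, item["message_raw"]))
        return envelopes


_REGISTRY: Dict[str, RawInputAdapter] = {}


def register_raw_input_adapter(adapter: RawInputAdapter) -> None:
    if adapter.source_type in _REGISTRY:
        raise ValueError(f"duplicate source_type: {adapter.source_type}")
    _REGISTRY[adapter.source_type] = adapter


def list_raw_input_source_types() -> List[str]:
    return sorted(_REGISTRY)


def get_raw_input_adapter(source_type: str) -> RawInputAdapter:
    if source_type not in _REGISTRY:
        raise KeyError(f"unknown source_type: {source_type}")
    return _REGISTRY[source_type]


for _adapter in (LineAdapter("syslog"), LineAdapter("file"), EventListAdapter()):
    register_raw_input_adapter(_adapter)
```

Rejecting a duplicate `source_type` keeps each new source to one adapter behind one key, which is the opposite of the "one-off downstream path" anti-pattern named in section 9.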