wicked-brain 0.16.1 → 0.17.0

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "wicked-brain",
- "version": "0.16.1",
+ "version": "0.17.0",
  "type": "module",
  "description": "Digital brain as skills for AI coding CLIs — no vector DB, no embeddings, no infrastructure",
  "keywords": [
@@ -1,6 +1,6 @@
  {
  "name": "wicked-brain-server",
- "version": "0.16.1",
+ "version": "0.17.0",
  "type": "module",
  "description": "SQLite FTS5 search server for wicked-brain digital knowledge bases",
  "keywords": [
@@ -115,6 +115,7 @@ entities:
  people: [{people/roles}]
  programs: [{programs/initiatives}]
  metrics: ["{metric}: {value}"]
+ method: {extraction method — see "Extraction method" below}
  confidence: {0.7 for text, 0.85 for vision}
  indexed_at: {current ISO timestamp}
  narrative_theme: {the "so what" in 8 words or fewer}
@@ -122,6 +123,23 @@ narrative_theme: {the "so what" in 8 words or fewer}
 
  {Extracted content in markdown format}
 
+ ## Extraction method
+
+ The `method:` field records *how* the chunk's content was obtained — the
+ provenance answer to "how do we know this?" It is distinct from `source_type`
+ (which is the file format, e.g. `pdf`/`md`/`js`). Set it deterministically
+ from the path you are already taking:
+
+ - `deterministic-parse` — the TEXT path above (Read + split, no model judgement).
+ - `llm-vision` — the BINARY path above (content extracted by the model viewing
+ the document/image).
+
+ Use one of the controlled values: `deterministic-parse`, `llm-vision`,
+ `llm-synthesis` (model-generated/inferred content), or `manual` (hand-authored).
+ The value is plain frontmatter — it is stored and returned verbatim by the
+ server with no schema migration. If omitted, downstream lint treats the chunk
+ as `method: unknown`; prefer to set it explicitly.
+
  ## Tag Expansion
 
  After generating the initial `contains:` tags, expand each keyword with 1-3 synonyms or related terms:
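
The controlled vocabulary introduced in the hunk above could be checked with a small helper before a chunk is written. The following is only a sketch and is not code from the package: `CHUNK_METHODS`, `methodForPath`, and `normalizeMethod` are hypothetical names, the `"text"`/`"binary"` labels are assumptions standing in for the TEXT and BINARY paths, and rejecting unrecognised values is a design choice of this sketch (the diff states the server stores the value verbatim).

```javascript
// Hypothetical helper, not part of wicked-brain. It illustrates the
// controlled values for the chunk `method:` field described in the diff.
const CHUNK_METHODS = ["deterministic-parse", "llm-vision", "llm-synthesis", "manual"];

// Map the ingestion path already being taken to a method value.
// The "text"/"binary" labels are assumptions made for this sketch.
function methodForPath(ingestPath) {
  if (ingestPath === "text") return "deterministic-parse";
  if (ingestPath === "binary") return "llm-vision";
  throw new Error(`unknown ingest path: ${ingestPath}`);
}

// Mirror the documented lint behaviour: an omitted value reads as "unknown".
// Throwing on other values is this sketch's choice, not documented behaviour.
function normalizeMethod(value) {
  if (value === undefined || value === null || value === "") return "unknown";
  if (!CHUNK_METHODS.includes(value)) {
    throw new Error(`method must be one of: ${CHUNK_METHODS.join(", ")}`);
  }
  return value;
}
```

A caller might use `normalizeMethod(methodForPath("text"))` when building frontmatter, so that the value always comes from the path taken rather than from free text.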
@@ -334,6 +352,10 @@ async function ingestFile(filePath) {
  " - text",
  "contains:",
  ...keywords.map(k => ` - ${k}`),
+ // method = HOW this chunk was obtained (provenance), distinct from
+ // source_type (file format). The batch path is a deterministic
+ // Read + split with no model judgement.
+ `method: deterministic-parse`,
  `confidence: 0.7`,
  `indexed_at: "${ts}"`,
  "---",
@@ -82,6 +82,21 @@ For each wiki article with source_hashes in frontmatter:
  ### Missing frontmatter
  Check each chunk has required frontmatter fields (source, chunk_id, confidence, indexed_at).
 
+ Also check the **provenance** field `method` (how the chunk/memory was obtained:
+ `deterministic-parse`, `llm-vision`, `llm-synthesis`, `manual`, or
+ `session-capture` for memories). `method` is **optional** — it was added after
+ some content was written, so a chunk/memory without it is still valid. When it
+ is missing, auto-fix by stamping `method: unknown` and report the fix as `info`
+ severity, type `missing_field` (do NOT raise it to a warning/error — that would
+ invalidate pre-existing content). Surfacing `method: unknown` lets a reviewer
+ distinguish facts with known provenance from those whose origin was never
+ recorded.
+
+ Lightweight provenance check (the "no source ⇒ assumption" rule): if a chunk has
+ no `source`/`source_path` and its `method` is not one of the inferred kinds
+ (`llm-synthesis`, `unknown`), flag it `info`, type `missing_field`:
+ `unsourced fact with method "{method}" — add a source or set method to llm-synthesis`.
+
  ### Tag synonym candidates
 
  Call the server to get all tag frequencies:
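
The two lint rules added in the hunk above (stamp a missing `method` as `unknown`, and flag unsourced chunks whose method is not an inferred kind) might look roughly like this in code. This is a sketch under stated assumptions: `lintProvenance` and `INFERRED_METHODS` are hypothetical names, `chunk` is assumed to be an object of already-parsed frontmatter fields, and the finding messages are shortened for illustration.

```javascript
// Hypothetical lint pass, not the package's implementation.
// `chunk` is assumed to be a plain object of parsed frontmatter fields.
const INFERRED_METHODS = new Set(["llm-synthesis", "unknown"]);

function lintProvenance(chunk) {
  const findings = [];
  const fixed = { ...chunk }; // never mutate the caller's object

  // Rule 1: a missing method is auto-fixed to "unknown" and reported
  // at info severity only, so pre-existing content stays valid.
  if (!fixed.method) {
    fixed.method = "unknown";
    findings.push({
      severity: "info",
      type: "missing_field",
      message: "method missing; stamped method: unknown",
    });
  }

  // Rule 2 ("no source => assumption"): a chunk with no source whose
  // method is not an inferred kind is flagged, again at info severity.
  const hasSource = Boolean(fixed.source || fixed.source_path);
  if (!hasSource && !INFERRED_METHODS.has(fixed.method)) {
    findings.push({
      severity: "info",
      type: "missing_field",
      message: `unsourced fact with method "${fixed.method}"`, // shortened
    });
  }

  return { fixed, findings };
}
```

Because `unknown` counts as an inferred kind, a chunk that lacks both `method` and `source` produces only the first finding in this sketch, which matches the rule as worded in the diff.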
@@ -100,6 +100,7 @@ Write to `{brain_path}/memory/{safe_name}.md`:
  ---
  type: {detected or provided type}
  tier: {resolved tier from Step 2b}
+ method: {extraction method — see "Extraction method" below}
  confidence: 0.5
  importance: {from type defaults or override}
  ttl_days: {from type defaults or override, null if permanent}
@@ -117,6 +118,21 @@ indexed_at: "{ISO 8601 timestamp}"
  {memory content}
  ```
 
+ #### Extraction method
+
+ The `method:` field records *how* the memory was obtained — the provenance
+ answer to "how do we know this?", mirroring the `method:` field on ingested
+ chunks. Set it from how the memory came to be:
+
+ - `session-capture` — captured live from the current session (the default for
+ "remember this" during work).
+ - `manual` — explicitly stated by the user ("we decided X", interview-style).
+ - `llm-synthesis` — inferred/derived by the agent rather than directly observed.
+
+ Default to `session-capture` when unsure. The value is plain frontmatter,
+ stored and returned verbatim by the server (no schema migration). If omitted,
+ lint treats the memory as `method: unknown` — prefer to set it explicitly.
+
  #### Tier definitions
 
  - **working**: Active, session-specific context. Expires quickly (hours to days). Use for in-progress decisions, temporary notes, and things only relevant to the current task.
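
The selection rule for a memory's `method:` in the hunk above (three values, with `session-capture` as the fallback) can be illustrated as follows. This is a hedged sketch, not the package's code: `memoryMethod`, `memoryFrontmatter`, and the `origin` labels `"user-stated"` and `"agent-inferred"` are hypothetical, and the frontmatter shown includes only a subset of the documented fields.

```javascript
// Hypothetical helper, not part of wicked-brain.
// The `origin` labels are assumptions made for this sketch.
function memoryMethod(origin) {
  switch (origin) {
    case "user-stated":
      return "manual"; // explicitly stated by the user
    case "agent-inferred":
      return "llm-synthesis"; // derived rather than directly observed
    default:
      return "session-capture"; // documented default when unsure
  }
}

// Build a partial frontmatter block; real memories carry more fields.
function memoryFrontmatter({ type, tier, origin }) {
  return [
    "---",
    `type: ${type}`,
    `tier: ${tier}`,
    `method: ${memoryMethod(origin)}`,
    "---",
  ].join("\n");
}
```

Calling `memoryFrontmatter({ type: "decision", tier: "semantic", origin: "user-stated" })` yields a block containing `method: manual`, in line with the example memory in the following hunk.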
@@ -131,6 +147,7 @@ New memories start at the tier resolved from importance (default `episodic` for
  ---
  type: decision
  tier: semantic
+ method: manual
  confidence: 0.9
  importance: 7
  ttl_days: null