wicked-brain 0.16.1 → 0.17.1

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "wicked-brain",
- "version": "0.16.1",
+ "version": "0.17.1",
  "type": "module",
  "description": "Digital brain as skills for AI coding CLIs — no vector DB, no embeddings, no infrastructure",
  "keywords": [
@@ -1,6 +1,6 @@
  {
  "name": "wicked-brain-server",
- "version": "0.16.1",
+ "version": "0.17.1",
  "type": "module",
  "description": "SQLite FTS5 search server for wicked-brain digital knowledge bases",
  "keywords": [
@@ -115,6 +115,7 @@ entities:
  people: [{people/roles}]
  programs: [{programs/initiatives}]
  metrics: ["{metric}: {value}"]
+ method: {extraction method — see "Extraction method" below}
  confidence: {0.7 for text, 0.85 for vision}
  indexed_at: {current ISO timestamp}
  narrative_theme: {the "so what" in 8 words or fewer}
@@ -122,6 +123,29 @@ narrative_theme: {the "so what" in 8 words or fewer}
 
  {Extracted content in markdown format}
 
+ ## Extraction method
+
+ The `method:` field records *how* the chunk's content was obtained — the
+ provenance answer to "how do we know this?" It is distinct from `source_type`
+ (which is the file format, e.g. `pdf`/`md`/`js`). Set it deterministically
+ from the path you are already taking:
+
+ - `deterministic-parse` — the TEXT path above (Read + split, no model judgement).
+ - `llm-vision` — the BINARY path above (content extracted by the model viewing
+ the document/image).
+
+ Use one of the shared controlled values (the same vocabulary across
+ `wicked-brain:ingest`, `wicked-brain:memory`, and `wicked-brain:lint`):
+ `deterministic-parse`, `llm-vision`, `llm-synthesis`, `session-capture`,
+ `manual`, `unknown`. For ingested chunks you will almost always use
+ `deterministic-parse` (text) or `llm-vision` (binary); `llm-synthesis` covers
+ model-generated/inferred content and `manual` covers hand-authored content.
+ `session-capture` applies to memories (see `wicked-brain:memory`), and `unknown`
+ is the lint-applied fallback for content written before this field existed. The
+ value is plain frontmatter — it is stored and returned verbatim by the server
+ with no schema migration. If omitted, downstream lint stamps the chunk as
+ `method: unknown`; prefer to set it explicitly.
+
  ## Tag Expansion
 
  After generating the initial `contains:` tags, expand each keyword with 1-3 synonyms or related terms:
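
Reviewer note on the "Extraction method" hunk: the rule it adds (TEXT path gives `deterministic-parse`, BINARY path gives `llm-vision`, and every value must come from the shared vocabulary) might be sketched as below. This is only an illustration under assumptions; `METHOD_VOCABULARY`, `methodForIngestPath`, `isKnownMethod`, and the `"text"`/`"binary"` labels are hypothetical names and are not part of the package.

```javascript
// Shared controlled vocabulary, copied from the skill text in the hunk above.
const METHOD_VOCABULARY = Object.freeze([
  "deterministic-parse",
  "llm-vision",
  "llm-synthesis",
  "session-capture",
  "manual",
  "unknown",
]);

// Hypothetical helper: derive `method` from the ingest path already taken.
// The "text" / "binary" labels are illustrative, not names from the package.
function methodForIngestPath(ingestPath) {
  switch (ingestPath) {
    case "text":
      return "deterministic-parse"; // Read + split, no model judgement
    case "binary":
      return "llm-vision"; // content extracted by the model viewing the file
    default:
      throw new Error(`unrecognized ingest path: ${ingestPath}`);
  }
}

// Hypothetical helper: true only for values in the shared vocabulary.
function isKnownMethod(value) {
  return METHOD_VOCABULARY.includes(value);
}
```

Deriving the value from the branch already taken, rather than asking the model to pick one, keeps the field deterministic, which is the property the added text asks for.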
@@ -334,6 +358,10 @@ async function ingestFile(filePath) {
  " - text",
  "contains:",
  ...keywords.map(k => ` - ${k}`),
+ // method = HOW this chunk was obtained (provenance), distinct from
+ // source_type (file format). The batch path is a deterministic
+ // Read + split with no model judgement.
+ `method: deterministic-parse`,
  `confidence: 0.7`,
  `indexed_at: "${ts}"`,
  "---",
@@ -82,6 +82,24 @@ For each wiki article with source_hashes in frontmatter:
  ### Missing frontmatter
  Check each chunk has required frontmatter fields (source, chunk_id, confidence, indexed_at).
 
+ Also check the **provenance** field `method` (how the chunk/memory was
+ obtained). The shared controlled vocabulary — the same set documented by
+ `wicked-brain:ingest` and `wicked-brain:memory` — is: `deterministic-parse`,
+ `llm-vision`, `llm-synthesis`, `session-capture`, `manual`, `unknown`. In
+ practice `deterministic-parse`/`llm-vision` come from ingested chunks,
+ `session-capture` from memories, and `llm-synthesis`/`manual` from either.
+ `method` is **optional** — it was added after some content was written, so a
+ chunk/memory without it is still valid. When it is missing, auto-fix by
+ stamping `method: unknown` and report the fix as `info` severity, type
+ `missing_field` (do NOT raise it to a warning/error — that would invalidate
+ pre-existing content). Surfacing `method: unknown` lets a reviewer distinguish
+ facts with known provenance from those whose origin was never recorded.
+
+ Lightweight provenance check (the "no source ⇒ assumption" rule): if a chunk has
+ no `source`/`source_path` and its `method` is not one of the inferred kinds
+ (`llm-synthesis`, `unknown`), flag it `info`, type `missing_field`:
+ `unsourced fact with method "{method}" — add a source or set method to llm-synthesis`.
+
  ### Tag synonym candidates
 
  Call the server to get all tag frequencies:
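
Reviewer note on the lint hunk: the two checks it adds (stamp a missing `method` as `unknown`, then flag unsourced content whose `method` is not an inferred kind) could look roughly like the sketch below. It assumes frontmatter has already been parsed into a plain object. `lintProvenance`, `INFERRED_METHODS`, the shape of a finding, and the wording of the first message are hypothetical; the hunk does not say what lint does with a value outside the vocabulary, so that case is left out.

```javascript
// Kinds the lint text treats as inferred; exempt from the "no source" check.
const INFERRED_METHODS = new Set(["llm-synthesis", "unknown"]);

// Hypothetical lint pass over one parsed frontmatter object.
// Returns a copy with `method` stamped if it was missing, plus any findings.
function lintProvenance(frontmatter) {
  const fixed = { ...frontmatter };
  const findings = [];

  // Check 1: `method` is optional, so a missing value is auto-fixed, not an error.
  if (fixed.method === undefined || fixed.method === null || fixed.method === "") {
    fixed.method = "unknown";
    findings.push({
      severity: "info",
      type: "missing_field",
      message: "method missing; stamped method: unknown", // illustrative wording
    });
  }

  // Check 2: no source and not an inferred kind means an unsourced fact.
  const hasSource = Boolean(fixed.source) || Boolean(fixed.source_path);
  if (!hasSource && !INFERRED_METHODS.has(fixed.method)) {
    findings.push({
      severity: "info",
      type: "missing_field",
      // \u2014 is the dash character used in the message quoted by the lint text.
      message: `unsourced fact with method "${fixed.method}" \u2014 add a source or set method to llm-synthesis`,
    });
  }

  return { frontmatter: fixed, findings };
}
```

Running the stamp before the source check means content that predates the field ends up as `unknown` and is therefore never reported as an unsourced fact, which matches the hunk's intent of not invalidating pre-existing content.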
@@ -100,6 +100,7 @@ Write to `{brain_path}/memory/{safe_name}.md`:
  ---
  type: {detected or provided type}
  tier: {resolved tier from Step 2b}
+ method: {extraction method — see "Extraction method" below}
  confidence: 0.5
  importance: {from type defaults or override}
  ttl_days: {from type defaults or override, null if permanent}
@@ -117,6 +118,29 @@ indexed_at: "{ISO 8601 timestamp}"
  {memory content}
  ```
 
+ #### Extraction method
+
+ The `method:` field records *how* the memory was obtained — the provenance
+ answer to "how do we know this?", mirroring the `method:` field on ingested
+ chunks. Set it from how the memory came to be:
+
+ - `session-capture` — captured live from the current session (the default for
+ "remember this" during work).
+ - `manual` — explicitly stated by the user ("we decided X", interview-style).
+ - `llm-synthesis` — inferred/derived by the agent rather than directly observed.
+
+ These three are the values you will use for memories. They are drawn from the
+ shared controlled vocabulary used across `wicked-brain:ingest`,
+ `wicked-brain:memory`, and `wicked-brain:lint`: `deterministic-parse`,
+ `llm-vision`, `llm-synthesis`, `session-capture`, `manual`, `unknown`. The
+ remaining values (`deterministic-parse`, `llm-vision`) describe ingested
+ chunks rather than memories, and `unknown` is the lint-applied fallback for
+ content written before this field existed.
+
+ Default to `session-capture` when unsure. The value is plain frontmatter,
+ stored and returned verbatim by the server (no schema migration). If omitted,
+ lint stamps the memory as `method: unknown` — prefer to set it explicitly.
+
  #### Tier definitions
 
  - **working**: Active, session-specific context. Expires quickly (hours to days). Use for in-progress decisions, temporary notes, and things only relevant to the current task.
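
Reviewer note on the memory hunk: the "default to `session-capture` when unsure" guidance could be expressed as a small resolver like the one below. This is a sketch under assumptions; `MEMORY_METHODS` and `resolveMemoryMethod` are hypothetical names. Falling back silently when given an ingest-only value such as `llm-vision` is one possible reading of the guidance, not behaviour the hunk specifies.

```javascript
// The three values the memory skill text says apply to memories.
const MEMORY_METHODS = Object.freeze(["session-capture", "manual", "llm-synthesis"]);

// Hypothetical resolver: keep a recognized memory method, otherwise use the
// documented default. Treating ingest-only values as "unsure" is an assumption.
function resolveMemoryMethod(requested) {
  return MEMORY_METHODS.includes(requested) ? requested : "session-capture";
}
```

Setting the value at write time, as the added text recommends, keeps lint from later stamping the memory as `method: unknown`.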
@@ -131,6 +155,7 @@ New memories start at the tier resolved from importance (default `episodic` for
  ---
  type: decision
  tier: semantic
+ method: manual
  confidence: 0.9
  importance: 7
  ttl_days: null