wicked-brain 0.16.1 → 0.17.1
package/package.json CHANGED
package/server/package.json CHANGED
@@ -115,6 +115,7 @@ entities:
 people: [{people/roles}]
 programs: [{programs/initiatives}]
 metrics: ["{metric}: {value}"]
+method: {extraction method — see "Extraction method" below}
 confidence: {0.7 for text, 0.85 for vision}
 indexed_at: {current ISO timestamp}
 narrative_theme: {the "so what" in 8 words or fewer}
@@ -122,6 +123,29 @@ narrative_theme: {the "so what" in 8 words or fewer}
 
 {Extracted content in markdown format}
 
+## Extraction method
+
+The `method:` field records *how* the chunk's content was obtained — the
+provenance answer to "how do we know this?" It is distinct from `source_type`
+(which is the file format, e.g. `pdf`/`md`/`js`). Set it deterministically
+from the path you are already taking:
+
+- `deterministic-parse` — the TEXT path above (Read + split, no model judgement).
+- `llm-vision` — the BINARY path above (content extracted by the model viewing
+the document/image).
+
+Use one of the shared controlled values (the same vocabulary across
+`wicked-brain:ingest`, `wicked-brain:memory`, and `wicked-brain:lint`):
+`deterministic-parse`, `llm-vision`, `llm-synthesis`, `session-capture`,
+`manual`, `unknown`. For ingested chunks you will almost always use
+`deterministic-parse` (text) or `llm-vision` (binary); `llm-synthesis` covers
+model-generated/inferred content and `manual` covers hand-authored content.
+`session-capture` applies to memories (see `wicked-brain:memory`), and `unknown`
+is the lint-applied fallback for content written before this field existed. The
+value is plain frontmatter — it is stored and returned verbatim by the server
+with no schema migration. If omitted, downstream lint stamps the chunk as
+`method: unknown`; prefer to set it explicitly.
+
 ## Tag Expansion
 
 After generating the initial `contains:` tags, expand each keyword with 1-3 synonyms or related terms:
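As a rough illustration of the rule added in this hunk, the sketch below picks `method` from the ingest path already being taken and checks a value against the shared vocabulary. The helper names `isKnownMethod` and `methodForIngestPath`, and the `"text"`/`"binary"` path labels, are hypothetical and are not part of wicked-brain; only the six vocabulary values come from the diff.

```javascript
// Shared controlled vocabulary for `method`, as listed in the skill text.
const METHODS = Object.freeze([
  "deterministic-parse",
  "llm-vision",
  "llm-synthesis",
  "session-capture",
  "manual",
  "unknown",
]);

// Hypothetical helper: true when `value` is one of the controlled values.
function isKnownMethod(value) {
  return METHODS.includes(value);
}

// Hypothetical helper: map the ingest path to a method.
// "text"   = Read + split, no model judgement.
// "binary" = content extracted by the model viewing the document/image.
function methodForIngestPath(path) {
  if (path === "text") return "deterministic-parse";
  if (path === "binary") return "llm-vision";
  throw new Error(`unrecognised ingest path: ${path}`);
}

console.log(methodForIngestPath("text"));   // deterministic-parse
console.log(methodForIngestPath("binary")); // llm-vision
console.log(isKnownMethod("ocr"));          // false
```

Throwing on an unrecognised path, rather than silently falling back to `unknown`, is a design choice of this sketch: the diff reserves `unknown` for the lint-applied fallback.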
@@ -334,6 +358,10 @@ async function ingestFile(filePath) {
 " - text",
 "contains:",
 ...keywords.map(k => ` - ${k}`),
+// method = HOW this chunk was obtained (provenance), distinct from
+// source_type (file format). The batch path is a deterministic
+// Read + split with no model judgement.
+`method: deterministic-parse`,
 `confidence: 0.7`,
 `indexed_at: "${ts}"`,
 "---",
@@ -82,6 +82,24 @@ For each wiki article with source_hashes in frontmatter:
 ### Missing frontmatter
 Check each chunk has required frontmatter fields (source, chunk_id, confidence, indexed_at).
 
+Also check the **provenance** field `method` (how the chunk/memory was
+obtained). The shared controlled vocabulary — the same set documented by
+`wicked-brain:ingest` and `wicked-brain:memory` — is: `deterministic-parse`,
+`llm-vision`, `llm-synthesis`, `session-capture`, `manual`, `unknown`. In
+practice `deterministic-parse`/`llm-vision` come from ingested chunks,
+`session-capture` from memories, and `llm-synthesis`/`manual` from either.
+`method` is **optional** — it was added after some content was written, so a
+chunk/memory without it is still valid. When it is missing, auto-fix by
+stamping `method: unknown` and report the fix as `info` severity, type
+`missing_field` (do NOT raise it to a warning/error — that would invalidate
+pre-existing content). Surfacing `method: unknown` lets a reviewer distinguish
+facts with known provenance from those whose origin was never recorded.
+
+Lightweight provenance check (the "no source ⇒ assumption" rule): if a chunk has
+no `source`/`source_path` and its `method` is not one of the inferred kinds
+(`llm-synthesis`, `unknown`), flag it `info`, type `missing_field`:
+`unsourced fact with method "{method}" — add a source or set method to llm-synthesis`.
+
 ### Tag synonym candidates
 
 Call the server to get all tag frequencies:
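The two lint rules added in this hunk could be sketched roughly as follows. This is an illustration under stated assumptions, not the package's lint implementation: the function name `lintProvenance`, the shape of the findings objects, and the idea that frontmatter arrives as a plain parsed object are all hypothetical. The severity, type, fallback value, and the second message text are taken from the diff.

```javascript
// Methods that already mark content as inferred or of unrecorded origin.
const INFERRED_METHODS = new Set(["llm-synthesis", "unknown"]);

// Hypothetical lint pass over one chunk's parsed frontmatter (plain object).
// Returns a fixed copy of the frontmatter plus a list of findings.
function lintProvenance(frontmatter) {
  const fixed = { ...frontmatter };
  const findings = [];

  // Rule 1: `method` is optional. When missing, stamp `unknown` and
  // report at info severity only, never as a warning or error.
  if (fixed.method === undefined || fixed.method === null || fixed.method === "") {
    fixed.method = "unknown";
    findings.push({
      severity: "info",
      type: "missing_field",
      message: "method missing; stamped method: unknown",
    });
  }

  // Rule 2 ("no source => assumption"): no source/source_path and a
  // method that is not one of the inferred kinds gets an info finding.
  const hasSource = Boolean(fixed.source || fixed.source_path);
  if (!hasSource && !INFERRED_METHODS.has(fixed.method)) {
    findings.push({
      severity: "info",
      type: "missing_field",
      message: `unsourced fact with method "${fixed.method}" — add a source or set method to llm-synthesis`,
    });
  }

  return { frontmatter: fixed, findings };
}

console.log(lintProvenance({ source: "notes.md" }).frontmatter.method); // unknown
console.log(lintProvenance({ method: "manual" }).findings.length);      // 1
```

Note that a chunk with neither `method` nor a source yields only the first finding in this sketch, because the stamped `unknown` is itself one of the inferred kinds.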
@@ -100,6 +100,7 @@ Write to `{brain_path}/memory/{safe_name}.md`:
 ---
 type: {detected or provided type}
 tier: {resolved tier from Step 2b}
+method: {extraction method — see "Extraction method" below}
 confidence: 0.5
 importance: {from type defaults or override}
 ttl_days: {from type defaults or override, null if permanent}
@@ -117,6 +118,29 @@ indexed_at: "{ISO 8601 timestamp}"
 {memory content}
 ```
 
+#### Extraction method
+
+The `method:` field records *how* the memory was obtained — the provenance
+answer to "how do we know this?", mirroring the `method:` field on ingested
+chunks. Set it from how the memory came to be:
+
+- `session-capture` — captured live from the current session (the default for
+"remember this" during work).
+- `manual` — explicitly stated by the user ("we decided X", interview-style).
+- `llm-synthesis` — inferred/derived by the agent rather than directly observed.
+
+These three are the values you will use for memories. They are drawn from the
+shared controlled vocabulary used across `wicked-brain:ingest`,
+`wicked-brain:memory`, and `wicked-brain:lint`: `deterministic-parse`,
+`llm-vision`, `llm-synthesis`, `session-capture`, `manual`, `unknown`. The
+remaining values (`deterministic-parse`, `llm-vision`) describe ingested
+chunks rather than memories, and `unknown` is the lint-applied fallback for
+content written before this field existed.
+
+Default to `session-capture` when unsure. The value is plain frontmatter,
+stored and returned verbatim by the server (no schema migration). If omitted,
+lint stamps the memory as `method: unknown` — prefer to set it explicitly.
+
 #### Tier definitions
 
 - **working**: Active, session-specific context. Expires quickly (hours to days). Use for in-progress decisions, temporary notes, and things only relevant to the current task.
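For memories, the selection rule in this hunk might look like the sketch below. The function name `methodForMemory` and the origin labels `"user-stated"` and `"agent-inferred"` are hypothetical, chosen only for illustration; the three returned values and the `session-capture` default come from the diff.

```javascript
// Hypothetical helper: choose `method` for a memory from how it came to be.
function methodForMemory(origin) {
  switch (origin) {
    case "user-stated":
      return "manual"; // explicitly stated by the user
    case "agent-inferred":
      return "llm-synthesis"; // derived by the agent, not directly observed
    default:
      return "session-capture"; // the default when unsure
  }
}

console.log(methodForMemory("user-stated")); // manual
console.log(methodForMemory(undefined));     // session-capture
```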
@@ -131,6 +155,7 @@ New memories start at the tier resolved from importance (default `episodic` for
 ---
 type: decision
 tier: semantic
+method: manual
 confidence: 0.9
 importance: 7
 ttl_days: null