@vectros-ai/blueprints 0.6.3 → 0.6.4

@@ -130,7 +130,8 @@ document_ingest:
  ```
 
  Scope later searches to one type with `contentTypes: ["documents"], typeName:
- "decision"` (the document-type facet), plus `filters` for `area`/`tags`.
+ "decision"`; `typeName` narrows documents and records alike (equivalently
+ `filters: { recordType: "decision" }`), plus more `filters` for `area`/`tags`.
 
  ### Records — the structured artifacts (controls, conventions, gotchas, terms)
 
@@ -166,7 +167,17 @@ pauses or restarts simply converges — it never double-writes.
 
  ## 4. Query it
 
- Documents are queried via search (scoped by `typeName`); records via `record_query`.
+ Reach for the most precise tool first: `record_query` for anything *enumerable*
+ (exact, cheap, compact), `hybrid_search` to recall by meaning when you don't know
+ the exact filter, `rag_ask` for a grounded *answer* over document bodies. Scope by
+ type per tool: `hybrid_search` uses `typeName` (which narrows documents and records
+ alike), `record_query` uses its `type` argument.
+
+ Two `hybrid_search` gotchas worth knowing up front: hits carry the surrounding
+ passage, so searches are *heavy* — start `limit:3` + `uniqueDocuments:true` and
+ escalate only if needed; and the default `mode:HYBRID` uses `textMode:PHRASE`, so a
+ long natural-language query can contribute nothing on the keyword leg (a `textScore`
+ of `0` on every hit is the tell) — use a short keyword phrase or `textMode:"OR"`.
 
  | You want… | Call |
  |---|---|
@@ -175,7 +186,7 @@ Documents are queried via search (scoped by `typeName`); records via `record_que
  | "What's the active rule for area X?" | `record_query convention { area:"<area>", status:"active" }` |
  | "Have we hit this failure before?" | `hybrid_search "<symptom>" contentTypes:["documents"], typeName:"postmortem"`; plus `record_query gotcha { area:"deploy", status:"active" }` |
  | "Define X" | `record_query term { term:"X" }` (unique lookup) |
- | "Latest decisions / search the designs" | `hybrid_search "<topic>" contentTypes:["documents"], typeName:"decision"` (or `"design"`), `filters:{ area:"<area>" }` |
+ | "Latest decisions / search the designs" | `hybrid_search "<topic>" contentTypes:["documents"], typeName:"decision"` (or `"design"`, plus `filters:{ area:"<area>" }`) |
  | "What supersedes a given decision?" | document lookup on `decision` by `supersedes:"<externalId>"` |
 
  ## 5. Customize
@@ -236,3 +247,78 @@ change — every type already has `tags`.)
  - **Re-ingest is keyed on `externalId`** — a backfill never double-writes (an
  unchanged item returns as-is), re-ingesting edited source with `upsert: true` keeps
  the KB in sync, and the KB can be rebuilt from source at any time.
+
+ ### Keep it in sync with your source — the self-describing marker
+
+ If your knowledge lives in a repo (docs, decision records, runbooks) and the KB mirrors
+ it, the two drift the moment someone edits a file. A durable pattern keeps them together
+ without a fragile side-index.
+
+ **Stamp each mirrored file with its KB id, in the file itself** — a one-line HTML comment
+ at the top:
+
+ ```markdown
+ <!-- vectros-kb-id: ref-data-model -->
+ # Data model reference
+ ```
+
+ - **It's invisible.** An HTML comment renders to nothing in Markdown — on your docs site
+ and in the repository view alike — so it never shows on the page, even for files that
+ are also documentation-site source.
+ - **The id travels with the file.** Move or rename the file and the binding is unchanged;
+ because references resolve by `externalId`, your cross-links never break.
+ - **No side ledger to maintain.** The file is self-describing — there is no separate
+ path-to-id map to keep in lockstep with every add, move, and rename.
+
+ **Sync is then a one-liner:** on change, re-ingest the file under the id in its marker with
+ `upsert: true` — the body re-indexes while the id, typed fields, and edges stay put. A
+ merge hook that re-ingests every changed marked file keeps the KB fresh automatically.
+
+ **The marker is also your membership signal:** a file belongs in the KB exactly when it
+ carries a marker, so "add this to the KB" is a one-line edit and unmarked files are simply
+ never mirrored — the KB stays a curated subset, not a copy of the whole repo.
+
+ ### The same, for records extracted from a file
+
+ Records are *distilled* from a source — a glossary becomes many `term` records, a
+ conventions doc becomes many `convention` records — so one file maps to many records, and
+ there's no single line to stamp with one id. Two small conventions keep them in sync
+ anyway:
+
+ **1. Each record carries a `sourceRef`** — the identifier of the file it was distilled from.
+ So "which records came from this file?" is one query:
+
+ ```text
+ record_query type:term field:sourceRef value:<ref>
+ ```
+
+ **2. The source file declares itself** with a companion marker that names the record
+ type(s) it feeds and pins that `ref`:
+
+ ```markdown
+ <!-- vectros-kb-records: term ref=glossary.md -->
+ # Glossary
+ ```
+
+ Sync is then symmetric with documents: on a change to a marked source file, read its
+ marker, query its existing records by `sourceRef`, re-distill, and `upsert` each by its
+ stable `externalId` — refreshing changed ones, adding new headings, superseding any that
+ disappeared. Because the `ref` is pinned in the marker (not read from the path), moving the
+ file never breaks the link.
+
+ **Two markers, one model:** `vectros-kb-id` marks a file that *is* a KB document;
+ `vectros-kb-records` marks a file that *feeds* KB records. A file can carry either. With
+ both, nothing about your KB lives in a separate index — every synced file says what it is,
+ right in the file.
+
+ ### Promote durable knowledge into your repo, then the KB
+
+ If your agent keeps a private memory or working-notes file, treat it as a **staging area**,
+ not a second home for knowledge. When a note matures into something durable and shareable,
+ **promote it one-way** into the right repo doc (a conventions file, a troubleshooting
+ reference, a post-mortem) — the broadest type that fits — then let the marker + `sourceRef`
+ sync above carry it into the KB. Once the repo doc is committed and the ingest is confirmed,
+ collapse the memory note to a one-line pointer at the repo path (and only then — working
+ notes usually aren't version-controlled). The result: one golden copy per lesson (the repo),
+ one queryable projection (the KB), and a breadcrumb in memory — never the same prose in three
+ places.
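
Reviewer note on the hunk above: the two marker comments are simple enough to read with a regular expression. The following is a minimal Python sketch, not part of the package. The function name `read_markers`, the `Marker` tuple, the five-line search window, and the rule that several record types are separated by commas or spaces are all assumptions made for illustration; only the marker spellings `vectros-kb-id` and `vectros-kb-records ... ref=` come from the diff.

```python
import re
from typing import NamedTuple, Optional

# Marker spellings follow the examples in the diff; how strictly to parse them
# is an assumption of this sketch.
DOC_MARKER = re.compile(r"<!--\s*vectros-kb-id:\s*(?P<id>\S+)\s*-->")
REC_MARKER = re.compile(
    r"<!--\s*vectros-kb-records:\s*(?P<types>[^=]+?)\s+ref=(?P<ref>\S+)\s*-->"
)


class Marker(NamedTuple):
    kind: str                   # "document" or "records"
    external_id: Optional[str]  # set for document markers
    record_types: tuple         # set for records markers
    source_ref: Optional[str]   # the pinned ref, for records markers


def read_markers(text: str, max_lines: int = 5) -> list:
    """Return the KB markers found in the first few lines of a file's text."""
    head = "\n".join(text.splitlines()[:max_lines])
    found = []
    doc = DOC_MARKER.search(head)
    if doc:
        found.append(Marker("document", doc.group("id"), (), None))
    rec = REC_MARKER.search(head)
    if rec:
        # Assumption: multiple types are separated by commas and/or spaces.
        types = tuple(re.split(r"[,\s]+", rec.group("types").strip()))
        found.append(Marker("records", None, types, rec.group("ref")))
    return found
```

A merge hook could call `read_markers` on each changed file and skip any file that returns an empty list, which matches the "marker is the membership signal" rule described in the hunk.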
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@vectros-ai/blueprints",
- "version": "0.6.3",
+ "version": "0.6.4",
  "description": "Curated Vectros use-case blueprints (schemas + least-privilege AccessProfile + seed) and the Blueprint format + structural validation. Enforcement (the scope gate) lives in @vectros-ai/cli, not here.",
  "license": "Apache-2.0",
  "repository": {
@@ -20,16 +20,23 @@ truth for "why is it shaped this way?" and "how do we do X?".
 
  ## The loop: recall before you act, capture after
 
- 1. **RECALL first.** Before you propose a change, design something, or debug a
- failure, query the knowledge base. A cold start is exactly when you most need
- the decision/convention/gotcha you can't re-derive from the code. Don't
- re-litigate a settled decision or re-discover a known trap.
+ 1. **RECALL first: the KB outranks your own re-derivation.** Before you propose a
+ change, design something, or debug a failure, query the knowledge base and treat
+ a hit as authoritative over what you'd reconstruct from the code alone. A cold
+ start is exactly when you most need the decision/convention/gotcha you can't
+ re-derive. Don't re-litigate a settled decision, re-invent an existing
+ convention, or re-discover a known trap — a two-second `rag_ask` is cheaper than
+ repeating a mistake the team already wrote down. If recall turns up nothing,
+ *then* proceed from first principles (and capture what you learn, per step 3).
  2. **ACT** using what you recalled — follow the conventions and controls, reuse the
  runbook, respect the supersede chain.
  3. **CAPTURE after.** When you make a durable decision, learn a convention, hit a
  gotcha, write or revise a runbook, or run a post-mortem, write it back (a
  document or a record, per below) so the next session inherits it. **Record the
- *why*, not just the *what*** — the reasoning is the most-recalled content.
+ *why*, not just the *what*** — the reasoning is the most-recalled content. And
+ when the durable source of a KB item is a repo file you just edited, re-ingest it
+ (see *Keep the KB in sync* below) — an edit that doesn't propagate silently forks
+ the KB from the repo, which is worse than no KB at all.
 
  Knowledge is **superseded / retired / resolved** via a status flip, never deleted —
  the trail of how the team's thinking evolved is part of the value.
@@ -68,17 +75,51 @@ documents). Follow these to navigate provenance.
 
  ## How to query (MCP tools)
 
+ **Reach for the most precise tool first.** If the ask is *enumerable* — "which
+ critical controls are active?", "the convention for area X", "the definition of
+ term Y" — use **`record_query`**: it is exact, cheap, and compact. Fall back to
+ **`hybrid_search`** (recall by meaning) only when you don't know the exact filter,
+ and to **`rag_ask`** when you want a grounded *answer* over document bodies rather
+ than the raw hits. Ordering matters — a `record_query` that returns three tight
+ rows beats a `hybrid_search` that spends thousands of tokens to surface the same
+ fact.
+
+ **Query compactly by default.** `hybrid_search` hits carry the surrounding passage
+ (`contextText`), so a wide search is *heavy* — a handful of hits can be tens of KB.
+ Start with **`limit: 3` + `uniqueDocuments: true`** and escalate only if recall is
+ insufficient. Prefer a `record_query` or a tighter filter over a bigger `limit`.
+
  - **Recall by meaning / grounded answer** — `rag_ask` for a cited answer over the
  document bodies: *"why did we choose X?"*, *"have we hit this before?"*.
- - **Search documents** — `hybrid_search` with `contentTypes: ["documents"]` and
- `typeName` to scope to one document type (e.g. `typeName: "decision"` or
- `"runbook"` or `"postmortem"`), plus `filters` (`{ area: "search" }`,
- `{ tags: "tenant-isolation" }`).
- - **Query records** — `record_query` for exact enumeration:
+ - **Search documents** — `hybrid_search` with `contentTypes: ["documents"]`. Scope
+ to one document *type* with **`typeName: "decision"`** (or `"runbook"`,
+ `"postmortem"`); `typeName` narrows documents and records alike. Add `filters`
+ to narrow further (`{ area: "search" }`, `{ tags: "tenant-isolation" }`).
+ - **Query records** — `record_query` for exact enumeration; the record type is the
+ tool's `type` argument, plus field filters:
  `record_query control { kind: "control", criticality: "critical", status: "active" }`;
  `record_query term { term: "AccessProfile" }` (unique lookup);
  `record_query convention { area: "auth", status: "active" }`.
  Range/sort on a record's date field (`order: "desc"`) for "latest" / "since".
+ *(Type facet by tool: `hybrid_search` uses `typeName`; `record_query` uses `type`.)*
+
+ **Mind the keyword leg.** `hybrid_search` defaults to `mode: HYBRID` with
+ `textMode: PHRASE` (slop 3), so a long natural-language query often matches nothing
+ on the BM25 (keyword) leg and you silently get a semantic-only ranking. Use a short
+ **keyword phrase** for the text leg, or pass **`textMode: "OR"`** for a
+ natural-language query. Tell-tale: if `textScore` is `0` across every hit, the
+ keyword leg contributed nothing — re-shape the query or switch `textMode`.
+
+ **Recall cheat-sheet** (map the question to the tightest query):
+
+ | You want… | Query |
+ |---|---|
+ | The active rule/standard for area X | `record_query convention { area: "X", status: "active" }` · `record_query control { area: "X", status: "active" }` |
+ | "Have we hit this failure before?" | `hybrid_search { contentTypes: ["documents"], typeName: "postmortem" }` + `record_query gotcha { area: "X", status: "active" }` |
+ | The definition of a term | `record_query term { term: "…" }` (unique) |
+ | The *why* behind a decision | `rag_ask "why did we …?"` or `hybrid_search { contentTypes: ["documents"], typeName: "decision" }` |
+ | Latest N of a dated type | `record_query <type> { … , order: "desc" }` (range/sort on the date field) |
+ | Everything tagged to an issue | `record_query`/`hybrid_search` with `filters: { tags: "issue:<id>" }` |
 
  ## How to capture (MCP tools)
 
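
Reviewer note on the "Mind the keyword leg" paragraph in this hunk: the `textScore` tell-tale is easy to check mechanically. The sketch below is one possible client-side helper in Python, written under assumptions. It assumes hits arrive as a list of dictionaries that each carry a `textScore` key, and that search arguments are passed as a plain dictionary; the function names `keyword_leg_silent` and `next_search_args` are hypothetical and not part of any Vectros tool.

```python
from typing import Iterable, Mapping


def keyword_leg_silent(hits: Iterable[Mapping]) -> bool:
    """True when there is at least one hit and every hit has textScore == 0."""
    hits = list(hits)
    return bool(hits) and all(hit.get("textScore", 0) == 0 for hit in hits)


def next_search_args(args: dict, hits: list) -> dict:
    """Suggest arguments for a retry, or return args unchanged if none is needed.

    Assumption: textMode defaults to PHRASE when the caller did not set it,
    as the diff states for hybrid_search.
    """
    if keyword_leg_silent(hits) and args.get("textMode", "PHRASE") != "OR":
        # Copy rather than mutate, so the caller keeps the original request.
        return {**args, "textMode": "OR"}
    return args
```

Starting from the compact defaults named in the hunk (`limit: 3`, `uniqueDocuments: true`), a caller would run the search once, pass the hits to `next_search_args`, and retry only if the returned arguments differ from the ones it sent.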
@@ -92,6 +133,9 @@ documents). Follow these to navigate provenance.
  stable `externalId` + the typed fields, e.g.:
  - gotcha: `{ externalId: "gotcha-<slug>", symptom, cause, fix, area: "[area]", status: "active", discoveredOn: "YYYY-MM-DD" }`.
  - convention: `{ externalId: "<slug>", title, rule, why, howToApply, area: "[area]", status: "active", establishedBy: "<decision-externalId>", updatedOn: "YYYY-MM-DD" }`.
+ - If the record is **extracted from a repo file**, add `sourceRef: "<that file>"` so a
+ later edit to the source can find and re-extract exactly its records (see the sync
+ convention below).
  - **To retire** — re-write with `status` flipped (a decision to `superseded`, a
  gotcha to `resolved`). Don't delete.
 
@@ -101,6 +145,16 @@ documents). Follow these to navigate provenance.
  plain re-create returns the existing record **unchanged** (`created: false`); to apply
  edits, send the change with `upsert: true`. Pick stable slugs/numbers.
  - **Write a reference target before the record that points at it.**
+ - **Keep the KB in sync with its source (self-describing, no side index).** If a document
+ mirrors a repo file, stamp the file with a top-of-file `<!-- vectros-kb-id: <externalId> -->`
+ comment; if a file is *extracted* into records, stamp it with
+ `<!-- vectros-kb-records: <type> ref=<path> -->` and give each record a `sourceRef` equal to
+ that `ref`. On a source edit, re-ingest the document or re-extract the records by
+ `externalId` with `upsert: true`. The markers are invisible HTML comments (they never
+ render), and they mean the KB needs no separate map of what came from where. On a
+ re-extract, if a record's source heading has disappeared, flip that orphaned record to
+ `resolved`/`superseded` rather than leaving it active — a re-sync that only refreshes
+ and never retires still serves stale answers.
  - **Record the why.** A statement without rationale is a log entry, not knowledge.
  - When a decision changes, write the new `decision` and set its `supersedes` — don't
  edit the old one's meaning away.
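
Reviewer note on the sync bullet added in this hunk: the re-extract step amounts to a set comparison between the records already stored for a `sourceRef` and the records freshly distilled from the file. The Python sketch below illustrates that comparison under assumptions. Both inputs are assumed to be dictionaries keyed by `externalId`, with each value holding the record's fields including `status`; the function name `plan_resync` and the returned shape are hypothetical. It only plans the work and does not call any KB tool.

```python
def plan_resync(existing: dict, distilled: dict) -> dict:
    """Plan a re-extract for one source file.

    existing:  externalId -> fields currently in the KB for this sourceRef
    distilled: externalId -> fields freshly extracted from the edited file
    """
    # New or changed records are written with upsert; unchanged ones are skipped.
    upsert = [eid for eid, rec in distilled.items() if existing.get(eid) != rec]
    # Records whose source heading disappeared are retired via a status flip,
    # never deleted. Already-retired records are left alone.
    retire = [
        eid
        for eid, rec in existing.items()
        if eid not in distilled and rec.get("status") == "active"
    ]
    return {"upsert": sorted(upsert), "retire": sorted(retire)}
```

Keeping the plan separate from the writes makes it possible to log or review what a merge hook intends to change before any record is touched.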
@@ -122,6 +176,27 @@ knowledge with `filters:{ tags:"issue:147" }`; the tag is also your jump-link to
  status. Be selective — most issues promote nothing; only the durable why/how/lesson
  belongs here. Never store status (open/closed/assignee) in the KB.
 
+ ## Promoting what you learn (memory → repo → KB)
+
+ If you keep private working notes or an always-loaded memory file, treat it as a
+ **staging area, not a second home**. A lesson lives in exactly one tier:
+
+ - **Working memory** — where a lesson lands *first*, while it's still fresh or
+ agent-personal.
+ - **Your repo docs** — the **golden**, shared, reviewable copy. When a memory note
+ matures into durable, shareable knowledge, **promote it one-way** into the right
+ doc (a conventions file, a troubleshooting reference, a post-mortem) under the
+ broadest type that fits — don't fragment one nugget into its own file when a
+ broader home exists.
+ - **This KB** — the queryable projection, fed *from* the repo doc (stamp it with a
+ marker and ingest, per *Keep the KB in sync*).
+
+ Promotion is terminal: once the repo doc is committed **and** the KB ingest is
+ confirmed, collapse the memory note to a one-line pointer at the repo path. Order
+ matters — remove the working copy **only after** the two durable copies exist, since
+ working notes usually aren't version-controlled. Keep in memory, in full, only what
+ has no repo/KB home by design (personal credentials, in-flight status).
+
  [Customize: your `area` vocabulary, which schemas your team uses, naming
  conventions for `externalId`, your tracker tag prefix, and any house rules — e.g.
  "every control names the test that enforces it in `evidence`."]