textus 0.2.0

data/SPEC.md ADDED
@@ -0,0 +1,720 @@
1
+ # textus/1 — Specification
2
+
3
+ **Status:** Draft v1.0 (2026-05-19)
4
+ **Protocol identifier:** `textus/1`
5
+ **Reference implementation:** Ruby gem `textus`
6
+
7
+ > *textus* — Latin for "the fabric a text is woven from," same root as *context* (from *con-texere*, "to weave together"). This spec defines a storage shape and wire protocol for that fabric.
8
+
9
+ ---
10
+
11
+ ## 1. What textus is
12
+
13
+ A storage convention and JSON wire protocol that lets humans, scripts, and AI agents read and write structured project memory **deterministically**, with addressable dotted keys, schema validation, role-based write gates, declarative compute, and copy-based publish targets.
14
+
15
+ The storage lives in a `.textus/` directory at the project root. Each entry is a Markdown file with YAML frontmatter. A manifest binds dotted keys to subtrees and declares which roles may write to each zone. Schemas (also YAML) define what frontmatter shape each entry must have. Derived entries are computed from other entries via pure projections and a vendored Mustache template engine, then optionally published to repo-relative paths as byte-for-byte file copies. The CLI surface (`textus get/put/list/where/schema/build/...` `--format=json`) returns a versioned envelope any caller can parse without knowing Markdown.
16
+
17
+ You **shape your own memory structure** inside `.textus/`. The protocol manages how it's read, written, addressed, validated, gated, computed, and published. The contents are entirely yours.
18
+
19
+ ### 1.1 The five layers
20
+
21
+ textus is organized as five composable layers. Each layer has a single responsibility; later layers build on earlier ones.
22
+
23
+ | Layer | Name | Responsibility |
24
+ |---|---|---|
25
+ | L1 | **Store** | Plain-file backend: `.textus/zones/<zone>/...` with YAML frontmatter + Markdown body, addressed by dotted keys, schema-validated, etag-versioned. |
26
+ | L2 | **Sources** | Declared external inputs (`intake` zone): URLs, files, feeds with declared parsers and TTLs. textus *describes* sources; external runners fetch and pipe results through `textus put`. |
27
+ | L3 | **Compute** | Pure transforms from store entries to derived entries. Projections (select/pluck/sort/limit/format) plus a vendored Mustache template subset. No shell execution. |
28
+ | L4 | **Publish** | Byte-for-byte file copy from derived entries to repo-relative paths declared via `publish_to:`. The in-store artifact is the consumer-shaped output; the published file is an identical copy. A sentinel under `.textus/sentinels/<target-rel-path>.textus-managed.json` records the source, sha256, and `mode: "copy"`. |
29
+ | L5 | **Consumers** | Anything that reads the published files or calls the CLI — editors, LLM tools, MCP servers, CI jobs, dashboards. textus is agnostic about who consumes; the envelope is the contract. |
30
+
31
+ ## 2. Goals and non-goals
32
+
33
+ **Goals**
34
+ - Stable wire format (`textus/1`) any language can speak.
35
+ - Deterministic read/write of structured Markdown via a CLI returning JSON.
36
+ - Schema-validated frontmatter using YAML schemas as data.
37
+ - Role-based write gates (humans, scripts, AI, build runners get different permissions per zone).
38
+ - Optimistic concurrency via ETags.
39
+ - Pure declarative compute: derived entries computed from projections + Mustache, no shell-out.
40
+ - Publish derived entries to well-known paths as byte-for-byte file copies.
41
+ - Plain-file backend — consumers can also read raw if they prefer.
42
+
43
+ **Non-goals**
44
+ - Not a database. No queries, indexes, joins, or full-text search.
45
+ - Not a graph store. Keys are hierarchical strings; cross-links are unindexed.
46
+ - Not a sync protocol. Single-writer per file, ETag-checked.
47
+ - Not a transport. Spawn the CLI or wrap it in MCP/HTTP downstream.
48
+ - Not a UI. Filesystem + CLI. Viewers ship elsewhere.
49
+ - Not a fetcher. textus declares sources; external runners fetch them.
50
+ - Not an executor. textus computes pure projections but never spawns shell commands.
51
+
52
+ ## 3. Storage layout
53
+
54
+ The root is `.textus/` at the project working directory. A typical v1.0 tree:
55
+
56
+ ```
57
+ .textus/
58
+   manifest.yaml   # internal: key → subtree mapping + zones declarations
59
+   audit.log       # internal, append-only TSV log of every successful write
60
+   role            # internal, role token (one line, e.g. "human")
61
+   schemas/        # internal: YAML schema files
62
+   templates/      # internal: Mustache templates referenced by derived entries
63
+   parsers/        # internal: project-local parser extensions
64
+   zones/          # ALL user content lives here
65
+     canon/        # zone: canon (human-only)
66
+     working/      # zone: working (human, ai, script)
67
+     intake/       # zone: intake (script; declared external inputs)
68
+     pending/      # zone: pending (ai proposals awaiting accept)
69
+     derived/      # zone: derived (build only; computed outputs)
70
+ ```
71
+
72
+ Textus internals (`manifest.yaml`, `audit.log`, `role`, `schemas/`, `templates/`, `parsers/`) live directly under `.textus/`, as do the `sentinels/` (§5.3) and `extensions/` (§5.11) directories when present. **All user content lives under `.textus/zones/`.** Manifest `path:` fields are relative to `.textus/zones/`; they do **not** include the `zones/` prefix. Implementations MUST prepend `zones/` to every `path:` when resolving a key to a filesystem location.
73
+
74
+ Zone directories under `zones/` are conventional; their write semantics are declared in the manifest, not the directory name.
75
+
76
+ `.textus/audit.log` is an append-only, tab-separated (TSV) file written under a file lock by every successful `put`, `delete`, `accept`, and `build` (§5.6). `.textus/role` (one line containing a role name) is optional and participates in the role-resolution order (§5.1).
77
+
78
+ ## 4. Manifest
79
+
80
+ The manifest declares: (a) which zones exist and which roles may write to each, (b) the key-to-subtree mapping, (c) the schema applied to entries in each subtree, and (d) the owner string recorded in writes.
81
+
82
+ ```yaml
83
+ # .textus/manifest.yaml
84
+ version: textus/1
85
+
86
+ zones:
87
+   - name: canon
88
+     writable_by: [human]
89
+   - name: working
90
+     writable_by: [human, ai, script]
91
+   - name: intake
92
+     writable_by: [script]
93
+   - name: pending
94
+     writable_by: [ai]
95
+   - name: derived
96
+     writable_by: [build]
97
+
98
+ entries:
99
+   - key: canon.identity
100
+     path: canon/identity.md
101
+     zone: canon
102
+     schema: identity
103
+
104
+   - key: working.network.org
105
+     path: working/network/org
106
+     zone: working
107
+     schema: person
108
+     owner: textus:network
109
+     nested: true
110
+
111
+   - key: derived.catalogs.people
112
+     path: derived/catalogs/people.md
113
+     zone: derived
114
+     schema: null
115
+     owner: textus:build
116
+ ```
117
+
118
+ **Backward compatibility.** If the manifest omits the `zones:` block, the legacy v0.1 three-zone model is synthesized:
119
+
120
+ ```yaml
121
+ zones:
122
+   - name: fixed
123
+     writable_by: [human]
124
+   - name: state
125
+     writable_by: [human, ai, script]
126
+   - name: derived
127
+     writable_by: [build]
128
+ ```
129
+
130
+ Old manifests written against textus/1 draft v0.1 therefore parse without modification, and any tooling expecting `fixed`/`state`/`derived` continues to work.
131
+
132
+ **Key grammar (enforced from v1.2):** dotted segments matching `/^[a-z0-9][a-z0-9-]*$/`. Segments are joined by `.`. A key has at most 8 segments; each segment is at most 64 characters. Segments MUST NOT contain dots, slashes, uppercase letters, or underscores. Example: `working.projects.acme.dashboard`. Enforcement points: manifest load (rejects illegal `key:` declarations and illegal nested file/directory names), `put` (rejects illegal keys before any write), `enumerate` (filters and warns on illegal filenames so existing trees still load with a clear migration message). Run-once migration: `textus migrate-keys --dry-run` then `--write` (renames are recorded in the audit log; see §5.6).
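The grammar can be checked with a few lines of Ruby. This is a minimal sketch, assuming keys arrive as plain strings. `valid_key?` and its constants are hypothetical names, not the reference gem's API. It anchors with `\A`/`\z` instead of the spec's `^`/`$`, because Ruby's `^`/`$` match at line boundaries and would otherwise accept a key with an embedded newline.

```ruby
# Hypothetical validator for the textus/1 key grammar (not the gem's API).
SEGMENT = /\A[a-z0-9][a-z0-9-]*\z/ # \A and \z anchor the whole string in Ruby
MAX_SEGMENTS = 8
MAX_SEGMENT_LENGTH = 64

def valid_key?(key)
  # limit -1 keeps empty segments, so "a..b" and "a." are rejected
  segments = key.split(".", -1)
  return false if segments.empty? || segments.length > MAX_SEGMENTS

  segments.all? { |s| s.length <= MAX_SEGMENT_LENGTH && SEGMENT.match?(s) }
end

puts valid_key?("working.projects.acme.dashboard") # true
puts valid_key?("working.Projects")                # false: uppercase
puts valid_key?("working.my_notes")                # false: underscore
```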
133
+
134
+ **Per-entry `format:` (enforced from v1.2):** an entry MAY declare `format:` to be one of `markdown` (default), `json`, `yaml`, or `text`. The `format` controls the on-disk shape and which path extension is required:
135
+
136
+ | `format` | Path extension | `template:` | `schema:` |
137
+ |------------|-----------------------------|------------------------|-----------|
138
+ | `markdown` | `.md` (or appended if absent) | required for derived | optional |
139
+ | `json` | `.json` required | optional (escape hatch) | optional (top-level keys) |
140
+ | `yaml` | `.yaml` or `.yml` required | optional (escape hatch) | optional (top-level keys) |
141
+ | `text` | `.txt` or no extension | required for derived | MUST be null |
142
+
143
+ For `nested: true`, the recursive glob matches the format's extension (markdown→`**/*.md`, json→`**/*.json`, yaml→`**/*.{yaml,yml}`, text→`**/*.txt`). All files under one nested entry share one format and one schema.
144
+
145
+ **Per-leaf publishing (`publish_each:`, v1.2).** A nested manifest entry MAY declare `publish_each:` to byte-copy every leaf to a templated repo-relative path. `publish_each:` and `publish_to:` are mutually exclusive on the same entry, and `publish_each:` requires `nested: true`. The template substitutes these variables (using `{name}` syntax):
146
+
147
+ | Variable | Value |
148
+ |--------------|----------------------------------------------------------------------------------------|
149
+ | `{leaf}` | Remaining key segments after the entry prefix, joined with `/`. |
150
+ | `{basename}` | Last segment only. |
151
+ | `{key}` | Full dotted key. |
152
+ | `{ext}` | Primary extension for the entry's format, without the leading dot (`md`/`json`/`yaml`/`txt`). |
153
+
154
+ Validation at manifest load: any unknown variable raises `UsageError`; the template MUST reference at least one of `{leaf}`, `{basename}`, `{key}` (otherwise every leaf would clobber the same target). A computed target outside the repo root is refused at build time with `PublishError`. Example:
155
+
156
+ ```yaml
157
+ - key: working.skills
158
+   path: working/skills
159
+   zone: working
160
+   schema: skill
161
+   nested: true
162
+   publish_each: "skills/{basename}/SKILL.md"
163
+ ```
164
+
165
+ A leaf at `working.skills.writing.voice-writer` (authored at `.textus/zones/working/skills/writing/voice-writer.md`) publishes to `skills/voice-writer/SKILL.md`.
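The following Ruby sketch covers the `publish_each:` rules above: load-time validation of the template, per-leaf substitution, and the repo-root check. It is hedged and illustrative. The helper names are hypothetical, and the two error classes only stand in for the spec's `UsageError` and `PublishError`. `{ext}` uses the primary extension from the table, so it yields `yaml` and never `yml`.

```ruby
# Hypothetical helpers for `publish_each:`; the error classes stand in for
# the spec's UsageError / PublishError.
UsageError = Class.new(StandardError)
PublishError = Class.new(StandardError)

ALLOWED_VARS = %w[leaf basename key ext].freeze
IDENTIFYING_VARS = %w[leaf basename key].freeze
PRIMARY_EXT = { "markdown" => "md", "json" => "json", "yaml" => "yaml", "text" => "txt" }.freeze

# Manifest-load check: no unknown variables, at least one identifying one.
def validate_publish_each!(template)
  vars = template.scan(/\{([^{}]*)\}/).flatten
  unknown = vars - ALLOWED_VARS
  raise UsageError, "unknown variable(s): #{unknown.join(', ')}" unless unknown.empty?
  return if (vars & IDENTIFYING_VARS).any?

  raise UsageError, "template must reference {leaf}, {basename}, or {key}"
end

# Build-time expansion of one leaf, refusing targets outside the repo root.
def publish_each_target(template, entry_key:, leaf_key:, repo_root:, format: "markdown")
  remaining = leaf_key.delete_prefix("#{entry_key}.").split(".")
  values = {
    "leaf" => remaining.join("/"),
    "basename" => remaining.last,
    "key" => leaf_key,
    "ext" => PRIMARY_EXT.fetch(format)
  }
  relative = template.gsub(/\{(\w+)\}/) { values.fetch(Regexp.last_match(1)) }

  root = File.expand_path(repo_root)
  full = File.expand_path(relative, root)
  raise PublishError, "#{relative} resolves outside the repo" unless full.start_with?(root + File::SEPARATOR)

  relative
end

template = "skills/{basename}/SKILL.md"
validate_publish_each!(template)
puts publish_each_target(template, entry_key: "working.skills",
                                   leaf_key: "working.skills.writing.voice-writer",
                                   repo_root: Dir.pwd)
```

Validation runs once per manifest load, while expansion runs per leaf during the build. Keeping the two steps separate mirrors the spec's split between `UsageError` (manifest load) and `PublishError` (build time).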
166
+
167
+ **`inject_intro:` (v1.1).** A derived entry with a `template:` MAY declare `inject_intro: true`. When the builder materializes the entry, it merges the `textus intro` envelope (§9) into the projection data under the key `intro`, so the template can render orientation content (zones, write flows, CLI catalog) alongside its projected rows. Manifest load rejects the flag on (a) non-derived entries and (b) derived entries without a `template:`. This restriction lets agents reading the rendered file trust that the preamble came from the same source of truth that `textus intro` exposes.
168
+
169
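+ **Lookup rule:** to resolve a key, find the entry with the longest `key:` prefix that matches. If the key equals that entry's `key:` exactly, the file is the entry's own `path:`. Otherwise the entry must have `nested: true`. In that case the remaining key segments map to subdirectories under its `path`, and the final segment becomes the filename with the extension of the entry's `format:` (`.md` for markdown). The resolved filesystem path is therefore `<.textus root>/zones/<entry.path>` for an exact match, or `<.textus root>/zones/<entry.path>/<remaining segments joined by />.<ext>` for a nested match. Implementations MUST prepend `zones/` to the manifest `path:` when constructing the filesystem location.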
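The lookup rule can be sketched as a longest-prefix search over the manifest's `entries:` list. The sketch assumes entries are string-keyed hashes, as YAML would load them. `resolve_path` and `UnknownKey` are hypothetical names. Nested `yaml` entries are shown with the primary `.yaml` extension, even though `.yml` is also accepted.

```ruby
# Hypothetical resolver for the longest-prefix lookup rule.
PRIMARY_EXT = { "markdown" => "md", "json" => "json", "yaml" => "yaml", "text" => "txt" }.freeze
UnknownKey = Class.new(StandardError)

def resolve_path(entries, key, textus_root: ".textus")
  matches = entries.select { |e| key == e["key"] || key.start_with?("#{e['key']}.") }
  entry = matches.max_by { |e| e["key"].length } # longest matching prefix wins
  raise UnknownKey, "no manifest entry covers #{key}" unless entry

  base = File.join(textus_root, "zones", entry["path"]) # always prepend zones/
  return base if key == entry["key"]
  raise UnknownKey, "#{entry['key']} is not nested; #{key} is undeclared" unless entry["nested"]

  remaining = key.delete_prefix("#{entry['key']}.").split(".")
  ext = PRIMARY_EXT.fetch(entry.fetch("format", "markdown"))
  "#{File.join(base, *remaining)}.#{ext}"
end

entries = [
  { "key" => "canon.identity", "path" => "canon/identity.md" },
  { "key" => "working.network.org", "path" => "working/network/org", "nested" => true }
]
puts resolve_path(entries, "canon.identity")           # .textus/zones/canon/identity.md
puts resolve_path(entries, "working.network.org.jane") # .textus/zones/working/network/org/jane.md
```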
170
+
171
+ ## 5. Zones and role-based write gates
172
+
173
+ Each zone declares which **roles** may write to it via `writable_by:` in the manifest. Reads are unrestricted across all zones; only writes are gated.
174
+
175
+ | Zone | `writable_by` | Use case |
176
+ |---|---|---|
177
+ | `canon` | `[human]` | Identity, voice, immutable principles — things only a human edits. |
178
+ | `working` | `[human, ai, script]` | Active project state: notes, decisions, network — what humans and agents update day-to-day. |
179
+ | `intake` | `[script]` | Declared external inputs (calendar, feeds, scraped pages). Refreshed by external runner scripts; never by humans or AI directly. |
180
+ | `pending` | `[ai]` | AI-generated proposals awaiting human review via `textus accept`. Lets agents stage changes without touching `working`. |
181
+ | `derived` | `[build]` | Computed outputs (catalogs, indexes, published context). Written only by the build runner via `textus build`. |
182
+
183
+ A write is gated by the caller's **role**, resolved as described in §5.1 (most directly via `--as=<role>`). If the role is not in the target zone's `writable_by` list, the write returns `write_forbidden`.
184
+
185
+ ### 5.1 Role resolution
186
+
187
+ The effective role for any CLI invocation is resolved in this order; the first match wins:
188
+
189
+ 1. `--as=<role>` flag on the command line.
190
+ 2. `TEXTUS_ROLE` environment variable.
191
+ 3. `.textus/role` file (one line, role name) at the project root.
192
+ 4. Default: `human`.
193
+
194
+ Recognized roles in v1.0: `human`, `ai`, `script`, `build`. Unknown roles are rejected with `invalid_role`. The roles list is intentionally open-ended: a future minor revision MAY introduce additional roles without breaking the wire string.
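The following is a minimal sketch of the resolution order above plus the `writable_by` gate, assuming zones are loaded as string-keyed hashes. `resolve_role`, `check_write!`, and the two error classes are hypothetical. Treating blank flag, environment, or file values as unset is an assumption. The sketch also reads the role file relative to the current directory instead of locating the project root.

```ruby
# Hypothetical helpers: role resolution order from §5.1 plus the zone gate.
InvalidRole = Class.new(StandardError)
WriteForbidden = Class.new(StandardError)
KNOWN_ROLES = %w[human ai script build].freeze

# Treat nil and blank strings as "not supplied".
def presence(value)
  stripped = value&.strip
  stripped unless stripped.nil? || stripped.empty?
end

def resolve_role(flag: nil, env: ENV, role_file: ".textus/role")
  role = presence(flag) ||
         presence(env["TEXTUS_ROLE"]) ||
         (presence(File.read(role_file)) if File.file?(role_file)) ||
         "human"
  raise InvalidRole, "invalid_role: #{role}" unless KNOWN_ROLES.include?(role)

  role
end

def check_write!(zones, zone_name, role)
  zone = zones.find { |z| z["name"] == zone_name }
  raise KeyError, "unknown zone #{zone_name}" unless zone
  return if zone["writable_by"].include?(role)

  raise WriteForbidden, "write_forbidden: #{role} cannot write to #{zone_name}"
end

zones = [{ "name" => "canon", "writable_by" => ["human"] },
         { "name" => "working", "writable_by" => %w[human ai script] }]
role = resolve_role(flag: "ai", env: {})
check_write!(zones, "working", role) # allowed: ai may write to working
```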
195
+
196
+ Every successful write records the resolved role and a wall-clock timestamp in `.textus/audit.log`, so reviewers can later distinguish a human edit from an agent edit even though both live in the same file.
197
+
198
+ ### 5.2 Compute layer (derived entries)
199
+
200
+ Derived entries live in the `derived` zone. They are not authored by hand; their body is produced by projecting over other entries. A derived entry's manifest declaration carries a `projection` block:
201
+
202
+ ```yaml
203
+ - key: derived.catalogs.people
204
+   zone: derived
205
+   projection:
206
+     select: working.network.org # prefix OR [list of prefixes]
207
+     pluck: [name, relationship, org]
208
+     sort_by: name # optional
209
+     limit: 1000 # default 1000, max 1000
210
+     format: yaml-list-in-md # one of: list, hash, yaml-list-in-md, json, markdown-table
211
+   template: people.mustache # optional; if absent, format determines body
212
+ ```
213
+
214
+ `select` is either a single dotted-key prefix or a list of prefixes. Every entry whose key starts with one of those prefixes is included. `pluck` names the frontmatter fields to retain in the projection result. `sort_by` is optional; when absent, entries are sorted by key. `limit` is bounded at 1000 entries (hard cap); requests above 1000 are rejected.
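The select, pluck, sort, and limit steps can be sketched as a small Ruby function over an in-memory map of keys to frontmatter. Three assumptions here are not taken from the spec. First, a prefix matches only on segment boundaries, so `working.network.org` does not match `working.network.organizer`. Second, each row keeps its `key` alongside the plucked fields. Third, ties under `sort_by` fall back to key order.

```ruby
# Hypothetical projection pipeline: select -> pluck -> sort -> limit.
MAX_LIMIT = 1000

# entries maps dotted keys to frontmatter hashes.
def project(entries, select:, pluck:, sort_by: nil, limit: MAX_LIMIT)
  raise ArgumentError, "limit #{limit} exceeds hard cap #{MAX_LIMIT}" if limit > MAX_LIMIT

  prefixes = Array(select) # a single prefix or a list of prefixes
  rows = entries
         .select { |key, _| prefixes.any? { |p| key == p || key.start_with?("#{p}.") } }
         .map { |key, fm| { "key" => key }.merge(fm.slice(*pluck)) }

  rows = if sort_by
           rows.sort_by { |r| [r[sort_by].to_s, r["key"]] } # key breaks ties
         else
           rows.sort_by { |r| r["key"] } # default: sort by key
         end
  rows.first(limit)
end

people = {
  "working.network.org.jane" => { "name" => "jane", "org" => "acme", "notes" => "x" },
  "working.network.org.bob" => { "name" => "bob", "org" => "acme" }
}
p project(people, select: "working.network.org", pluck: %w[name org], sort_by: "name")
```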
215
+
216
+ `format` controls the body serialization when no template is supplied. Permitted values: `list`, `hash`, `yaml-list-in-md`, `json`, `markdown-table`.
217
+
218
+ If `template` is given, it names a Mustache template under `.textus/templates/`. textus implements a deliberately restricted Mustache subset:
219
+
220
+ - `{{var}}` — variable interpolation.
221
+ - `{{#section}}...{{/section}}` — section (iteration / truthy block).
222
+ - `{{^inverted}}...{{/inverted}}` — inverted section.
223
+ - `{{!comment}}` — comment.
224
+
225
+ No partials. No lambdas. No HTML escaping (output is raw text, intended for Markdown). Template recursion depth is bounded at 8; exceeding the limit is an error.
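Below is a compact Ruby sketch of the restricted subset, assuming a string-keyed hash as the render context. It covers variables, sections, inverted sections, comments, raw (unescaped) output, and the depth-8 bound on nested sections. Several details are assumptions: `{{.}}` for scalar list items, treating only `nil`, `false`, and `[]` as falsy, and the `TemplateError` class. The gem's vendored engine may differ, for example in whitespace handling.

```ruby
# Hypothetical renderer for the textus Mustache subset (not the vendored engine).
MAX_DEPTH = 8
TAG = /\{\{([#^\/!]?)\s*([^}]*?)\s*\}\}/
TemplateError = Class.new(StandardError)

def falsy?(value)
  value.nil? || value == false || (value.is_a?(Array) && value.empty?)
end

def lookup(context, name)
  name == "." ? context["."] : context[name]
end

# Finds the {{/name}} that closes a section opened just before `pos`.
def find_close(template, name, pos)
  level = 0
  while (m = TAG.match(template, pos))
    pos = m.end(0)
    next unless m[2] == name

    if m[1] == "#" || m[1] == "^"
      level += 1
    elsif m[1] == "/"
      return m if level.zero?

      level -= 1
    end
  end
  raise TemplateError, "unclosed section #{name}"
end

def render(template, context, depth = 0)
  raise TemplateError, "section depth exceeds #{MAX_DEPTH}" if depth > MAX_DEPTH

  out = +""
  pos = 0
  while (m = TAG.match(template, pos))
    out << template[pos...m.begin(0)]
    sigil = m[1]
    name = m[2]
    pos = m.end(0)
    case sigil
    when "!" then next # comment: emit nothing
    when "/" then raise TemplateError, "unexpected close tag #{name}"
    when "#", "^"
      close = find_close(template, name, pos)
      inner = template[pos...close.begin(0)]
      pos = close.end(0)
      value = lookup(context, name)
      if sigil == "^"
        out << render(inner, context, depth + 1) if falsy?(value)
      elsif value.is_a?(Array)
        value.each do |item|
          scope = item.is_a?(Hash) ? context.merge(item) : context.merge("." => item)
          out << render(inner, scope, depth + 1)
        end
      elsif !falsy?(value)
        out << render(inner, value.is_a?(Hash) ? context.merge(value) : context, depth + 1)
      end
    else
      out << lookup(context, name).to_s # raw interpolation, no HTML escaping
    end
  end
  out << template[pos..]
end

template = "{{! people catalog }}# People\n{{#people}}- {{name}} ({{org}})\n{{/people}}{{^people}}_none_\n{{/people}}"
print render(template, { "people" => [{ "name" => "jane", "org" => "acme" }] })
```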
226
+
227
+ ### 5.3 Publish layer (`publish_to:`)
228
+
229
+ A derived entry's manifest declaration MAY include `publish_to:`, listing one or more destination paths relative to the project root:
230
+
231
+ ```yaml
232
+ publish_to:
233
+ - CLAUDE.md
234
+ - .ai/instructions.md
235
+ ```
236
+
237
+ When the entry is recomputed, textus copies the in-store file byte-for-byte to each destination. The in-store artifact under `.textus/zones/derived/…` is already the consumer-shaped output (per its format strategy; see §5.12), so publishing is a verbatim file copy with no parsing or stripping.
238
+
239
+ A sentinel is written for each published file at `<store_root>/sentinels/<target-relative-to-repo>.textus-managed.json`, recording `source`, `target`, the target's sha256, and `mode: "copy"`. Sentinels live under the store rather than beside the consumer file, so target directories stay clean. The sentinel lets the next publish detect out-of-band edits: textus refuses to overwrite a destination unless it is missing or marked as managed. Legacy sibling sentinels (`<target>.textus-managed.json`) are still recognized as managed and are migrated to the new location on the next publish.
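This sketch of the copy-plus-sentinel step assumes `source` is the in-store file path and `target_rel` is relative to the repo root. `publish` and `PublishError` are hypothetical names. The sketch implements only the overwrite guard described above: the target must be missing, or a store-side or legacy sentinel must exist. It migrates a legacy sibling sentinel by deleting it once the new one is written. It does not compare the recorded sha256 with the current target to flag drift.

```ruby
require "digest"
require "fileutils"
require "json"

# Hypothetical publish step: verbatim copy plus a store-side sentinel.
PublishError = Class.new(StandardError)

def publish(source, target_rel, repo_root:, store_root:)
  target = File.join(repo_root, target_rel)
  sentinel = File.join(store_root, "sentinels", "#{target_rel}.textus-managed.json")
  legacy = "#{target}.textus-managed.json" # old sibling location
  managed = File.exist?(sentinel) || File.exist?(legacy)
  raise PublishError, "refusing to overwrite unmanaged #{target_rel}" if File.exist?(target) && !managed

  bytes = File.binread(source)
  FileUtils.mkdir_p(File.dirname(target))
  File.binwrite(target, bytes) # byte-for-byte: no parsing or stripping

  FileUtils.mkdir_p(File.dirname(sentinel))
  File.write(sentinel, JSON.pretty_generate({
    "source" => source,
    "target" => target_rel,
    "sha256" => Digest::SHA256.hexdigest(bytes),
    "mode" => "copy"
  }))
  FileUtils.rm_f(legacy) # migrate: the store-side sentinel now marks it managed
  target
end
```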
240
+
241
+ **Per-leaf publishing.** A nested entry MAY declare `publish_each:` instead of `publish_to:` (see §4). When the build runs, every leaf reachable under the nested entry is byte-copied to the path produced by substituting `{leaf}` / `{basename}` / `{key}` / `{ext}` in the template, with a sentinel written under `<store_root>/sentinels/` at the mirrored target path. The build envelope grows a `published_leaves` array — one row per leaf, with `key`, `source`, and `target` — alongside the existing `built` array. Targets that would resolve outside the repo root are refused.
242
+
243
+ ### 5.4 Intake (declared, refreshed via registered fetcher)
244
+
245
+ Intake entries declare an external source by naming a **fetcher** — a registered, named function that pulls data into the entry. textus itself still makes no implicit network calls: a fetcher only runs when explicitly invoked by `textus refresh KEY --as=script`. The declaration is data only:
246
+
247
+ ```yaml
248
+ - key: intake.calendar.events
249
+   zone: intake
250
+   source:
251
+     fetcher: ical-events
252
+     config:
253
+       url: "https://calendar.google.com/.../basic.ics"
254
+     ttl: 6h
255
+ ```
256
+
257
+ `fetcher` names a registered fetcher; `config` is an opaque hash handed to the fetcher; `ttl` is the staleness budget. Implementations MUST reject legacy `source.from` and `source.parse` with a clear usage error.
258
+
259
+ **Fetcher contract.** A fetcher is registered via `Textus.fetcher(:name) do |config:, store:| ... end` and MUST return one of three shapes, all normalized by the store into its internal `{frontmatter, body, content}` representation (§5.12):
260
+
261
+ - `{ frontmatter:, body: }` — markdown-friendly (current shape).
262
+ - `{ content: }` — for `format: json|yaml` entries; the parsed object becomes the entry's content.
263
+ - `{ body: }` — raw bytes for `text` or for any format that prefers verbatim writes; the store re-parses and validates per `format:`.
264
+
265
+ The `store:` argument is a read-only `Textus::StoreView` (§5.11). Every fetcher call is wrapped in `Timeout.timeout(2)`; exceptions and timeouts surface as `usage` errors that abort the refresh.
266
+
267
+ **Built-in fetchers.** `json`, `csv`, `markdown-links`, `ical-events`, `rss` are always available. They expect raw bytes in `config["bytes"]` and produce structured frontmatter/body. Built-ins do not perform I/O themselves — the caller (or an outer fetcher) is responsible for supplying bytes.
268
+
269
+ **Refresh paths.** Two are supported:
270
+
271
+ 1. **In-process** — `textus refresh KEY --as=script` resolves the entry's `source.fetcher`, invokes it with `(config:, store:)`, and writes the result under role `script`.
272
+ 2. **External runner** — a cron job or agent harness reads `textus list --zone=intake --stale --format=json`, fetches the source out of band, and pipes bytes back through `textus put KEY --as=script --stdin`.
273
+
274
+ Both paths share the same role gate, audit-log entry, and `:refresh` event (§5.10). User-supplied fetchers live in `.textus/extensions/*.rb` and auto-load at `Store#initialize` (§5.11).
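The fetcher contract can be sketched in Ruby as a registry, a timeout wrapper, and a normalizer for the three return shapes. This is illustrative only. `MiniTextus` is a hypothetical stand-in for the gem's `Textus` module. It keeps one module-level registry for brevity, whereas the spec scopes registries per Store. It also does not re-parse `{ body: }` results per `format:`, as the real store does.

```ruby
require "timeout"

# Hypothetical stand-in for the Textus fetcher DSL; one module-level
# registry for brevity (the spec scopes registries per Store).
module MiniTextus
  UsageError = Class.new(StandardError)
  FETCHERS = {}

  def self.fetcher(name, &block)
    FETCHERS[name.to_sym] = block
  end

  # Invokes a fetcher under the 2 s budget and normalizes its result.
  def self.run_fetcher(name, config:, store:)
    fn = FETCHERS.fetch(name.to_sym) { raise UsageError, "unknown fetcher #{name}" }
    normalize(Timeout.timeout(2) { fn.call(config: config, store: store) })
  rescue UsageError
    raise
  rescue StandardError => e # includes Timeout::Error
    raise UsageError, "fetcher #{name} failed: #{e.class}: #{e.message}"
  end

  def self.normalize(result)
    raise UsageError, "fetcher must return a Hash" unless result.is_a?(Hash)

    if result.key?(:content) # format: json|yaml
      { frontmatter: {}, body: nil, content: result[:content] }
    elsif result.key?(:frontmatter) # markdown-friendly shape
      { frontmatter: result[:frontmatter] || {}, body: result[:body].to_s, content: nil }
    elsif result.key?(:body) # raw bytes; the real store re-parses per format:
      { frontmatter: {}, body: result[:body].to_s, content: nil }
    else
      raise UsageError, "unrecognized fetcher result: #{result.keys.inspect}"
    end
  end
end

MiniTextus.fetcher(:static_events) do |config:, store:|
  { content: { "events" => config.fetch("events", []) } }
end
p MiniTextus.run_fetcher(:static_events, config: { "events" => ["standup"] }, store: nil)
```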
275
+
276
+ ### 5.5 Pending / accept workflow
277
+
278
+ Pending entries are full proposal patches authored into the `pending` zone by whichever roles its `writable_by` list admits (only `ai` in the example manifest of §4). A pending entry's frontmatter describes the patch it proposes against another zone:
279
+
280
+ ```yaml
281
+ ---
282
+ proposal:
283
+ target_key: working.network.org.bob
284
+ action: put
285
+ frontmatter:
286
+ name: bob
287
+ relationship: peer
288
+ org: acme
289
+ ---
290
+ Proposed body content.
291
+ ```
292
+
293
+ `proposal.target_key` names the entry the patch would create or modify, and `proposal.action` is `put` or `delete`. The remaining frontmatter and body are the proposed new content.
294
+
295
+ `textus accept <pending-key>` is **human-only**: the resolved role must be `human`. It copies the patch into the target zone, records provenance (originating pending key, original role, original timestamp) in the audit log, and removes the pending entry. Agents and scripts can propose but cannot accept.
296
+
297
+ ### 5.6 Audit log
298
+
299
+ Every successful write appends one line to an append-only TSV file at `.textus/audit.log`. The file is opened with `flock(LOCK_EX)` for the duration of each append so concurrent writers serialize cleanly.
300
+
301
+ Schema (tab-separated, one record per line; a seventh column, when present, carries JSON-encoded extras):
302
+
303
+ ```
304
+ <iso8601-utc>\t<role>\t<verb>\t<key>\t<etag-before-or-NULL>\t<etag-after-or-NULL>[\t<extras-json>]
305
+ ```
306
+
307
+ `<iso8601-utc>` is the wall-clock timestamp in UTC with second (or finer) precision. `<role>` is the resolved role for the invocation. `<verb>` is the CLI verb (`put`, `delete`, `accept`, `compute`, `migrate-keys`, `mv`, ...). `<key>` is the affected entry key. For `mv`, `<key>` is the **new** key, and an extras JSON column carries `from_key`, `to_key`, `from_path`, `to_path`, and `uid`. `<etag-before>` and `<etag-after>` are the entry etags before and after the write, or the literal string `NULL` when not applicable (e.g. create has no before-etag, delete has no after-etag). `migrate-keys --write` emits one line per renamed file using the new key as `<key>` and the file's pre- and post-rename etags.
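Here is a minimal sketch of producing and appending one audit record. It assumes the optional seventh column appears only when extras exist. `audit_line` and `append_audit` are hypothetical helpers. The `flock(File::LOCK_EX)` call follows the locking rule above, and the etag in the usage line is a placeholder value.

```ruby
require "json"
require "time"

# Hypothetical formatter for one TSV audit record.
def audit_line(role:, verb:, key:, etag_before: nil, etag_after: nil, extras: nil, at: Time.now)
  fields = [at.getutc.iso8601, role, verb, key, etag_before || "NULL", etag_after || "NULL"]
  fields << JSON.generate(extras) if extras # optional seventh column
  "#{fields.join("\t")}\n"
end

# Appends under an exclusive lock so concurrent writers serialize.
def append_audit(path, line)
  File.open(path, "a") do |f|
    f.flock(File::LOCK_EX)
    f.write(line)
  end # closing the file releases the lock
end

print audit_line(role: "human", verb: "put", key: "working.network.org.jane",
                 etag_after: "abc123", at: Time.utc(2026, 5, 19, 12, 0, 0))
```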
308
+
309
+ ### 5.7 Security bounds
310
+
311
+ textus enforces fixed bounds to keep behavior predictable under hostile or buggy input:
312
+
313
+ - **Projection result:** 1000 entries (hard cap).
314
+ - **Template recursion:** depth 8.
315
+ - **Manifest size:** 256 KB.
316
+ - **Entry size:** 1 MB.
317
+ - **Audit log:** unbounded; rotation is the user's problem.
318
+
319
+ ### 5.8 Schema evolution (v1.1)
320
+
321
+ Schemas may declare per-field ownership and version history. These keys are additive: a schema may omit both `fields:` and `evolution:` and still parse as in v1.0.
322
+
323
+ **`fields:` block** — keyed by field name. Each entry is an object with at least `type`, plus optional `maintained_by` and any vendor extensions:
324
+
325
+ ```yaml
326
+ fields:
327
+   full_name: { type: string, maintained_by: human }
328
+   embedding: { type: array, maintained_by: ai }
329
+   updated_at: { type: time, maintained_by: script }
330
+ ```
331
+
332
+ `maintained_by` values are free-form strings. The recognized set is `human | ai | script | build | derived`. Unrecognized values do not affect role-authority validation; they pass through unchanged.
333
+
334
+ **`evolution:` block** — top-level, declares the schema's history and migration intent:
335
+
336
+ ```yaml
337
+ evolution:
338
+   added_in: 2026-05-19
339
+   deprecated_at: null
340
+   migrate_from:
341
+     OLD_FIELD: NEW_FIELD
342
+ ```
343
+
344
+ `textus schema-migrate NAME` consults `evolution.migrate_from` when invoked without `--rename=OLD:NEW`, applying every declared rename across affected entries in one pass. An explicit `--rename` flag overrides the schema-declared map for that invocation.
345
+
346
+ **Backwards compat:** v1.0 schemas (no `fields:`, no `evolution:`) continue to parse and behave identically. `schema.maintained_by(field)` returns `nil` for every field; `schema.evolution` returns `{}`.
347
+
348
+ **Override rule:** the role `human` is permitted to write any `maintained_by` field, regardless of declared owner. This preserves human authority over AI/script-managed data — humans curating canon over AI-written embeddings is a feature, not a bug. All other role mismatches are reported by `validate-all` with code `role_authority`, including fields `key`, `field`, `expected`, and `last_writer`.
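The override rule reduces to a small per-field comparison. The sketch assumes the last writer of each field is supplied as a `field => role` hash; the spec does not say how that is tracked. Only the recognized `maintained_by` values are enforced, so free-form values pass through as described for the `fields:` block. `role_authority_violations` is a hypothetical name.

```ruby
# Hypothetical `role_authority` check over one entry.
RECOGNIZED_OWNERS = %w[human ai script build derived].freeze

# fields: the schema's `fields:` block; last_writers: field => role (assumed input)
def role_authority_violations(key, fields, last_writers)
  last_writers.filter_map do |field, writer|
    expected = fields.dig(field, "maintained_by")
    next unless RECOGNIZED_OWNERS.include?(expected) # free-form values pass through
    next if writer == expected || writer == "human"  # human override

    { "code" => "role_authority", "key" => key, "field" => field,
      "expected" => expected, "last_writer" => writer }
  end
end

fields = { "embedding" => { "type" => "array", "maintained_by" => "ai" } }
p role_authority_violations("canon.identity", fields, { "embedding" => "script" })
```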
349
+
350
+ ### 5.9 Reducers (v1.2)
351
+
352
+ Reducers are pure, named functions that shape projection rows into projection rows. Registered via the module-level DSL:
353
+
354
+ ```ruby
355
+ Textus.reducer(:rank_by_recency) do |rows:, config:|
356
+   rows.sort_by { |r| r["updated_at"].to_s }.reverse
357
+ end
358
+ ```
359
+
360
+ **Declaration.** A projection opts into a reducer via `projection.reducer`, with optional `projection.reducer_config`:
361
+
362
+ ```yaml
363
+ projection:
364
+   select: [working.projects]
365
+   pluck: [name, status, updated_at]
366
+   reducer: rank_by_recency
367
+   reducer_config: { tiebreak: name }
368
+   sort_by: updated_at
369
+   limit: 50
370
+ ```
371
+
372
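+ The reducer runs **between pluck and sort**. `config:` receives the manifest's `reducer_config` hash (or `{}`). Rows in, rows out. Because sorting follows the reducer, the final row order comes from `sort_by` (or from key order when `sort_by` is absent), not from the order the reducer returns.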
373
+
374
+ **Purity.** A reducer MUST NOT perform I/O or mutate the store; no `store:` kwarg is passed.
375
+
376
+ **Timeout.** Each invocation is wrapped in `Timeout.timeout(Textus::Projection::REDUCER_TIMEOUT_SECONDS)` (2s). Timeouts, exceptions, and unknown names raise `usage` errors and abort the build.
377
+
378
+ **Auto-load.** Reducers register from `.textus/extensions/*.rb`, loaded at `Store#initialize` in lexical order (§5.11). The registry is per-Store; reducers do not share state across `Store` instances.
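The following sketch shows a per-Store reducer registry that enforces the rules above. Unknown names, exceptions, timeouts, and non-array results all abort with an error, and no `store:` is passed. `ReducerRegistry` and `ReducerError` are hypothetical stand-ins for the gem's internals and its `usage` error. The 2-second budget matches `REDUCER_TIMEOUT_SECONDS`.

```ruby
require "timeout"

# Hypothetical per-Store reducer registry; ReducerError stands in for the
# spec's `usage` error.
ReducerError = Class.new(StandardError)
REDUCER_TIMEOUT_SECONDS = 2

class ReducerRegistry
  def initialize
    @reducers = {} # one registry per Store instance, never shared
  end

  def reducer(name, &block)
    @reducers[name.to_sym] = block
  end

  # Runs between pluck and sort. No store is passed, keeping reducers pure.
  def apply(name, rows, config = nil)
    fn = @reducers.fetch(name.to_sym) { raise ReducerError, "unknown reducer #{name}" }
    out = Timeout.timeout(REDUCER_TIMEOUT_SECONDS) { fn.call(rows: rows, config: config || {}) }
    raise ReducerError, "reducer #{name} must return an Array of rows" unless out.is_a?(Array)

    out
  rescue ReducerError
    raise
  rescue StandardError => e # includes Timeout::Error
    raise ReducerError, "reducer #{name} failed: #{e.message}"
  end
end

registry = ReducerRegistry.new
registry.reducer(:rank_by_recency) do |rows:, config:|
  rows.sort_by { |r| r["updated_at"].to_s }.reverse
end
p registry.apply(:rank_by_recency, [{ "updated_at" => "2026-01-01" }, { "updated_at" => "2026-03-01" }])
```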
379
+
380
+ ### 5.10 Events (v1.2)
381
+
382
+ Lifecycle events fire in-process. Subscribers register via `Textus.hook(:event, :name) do |**kwargs| ... end`. Hooks are fire-and-forget: return values are discarded.
383
+
384
+ **Event set and kwargs:**
385
+
386
+ | Event | Fired by | Kwargs |
387
+ |-----------|-------------------------|--------------------------------------------------------------|
388
+ | `:put` | `Store#put` | `key:, envelope:, store:` |
389
+ | `:delete` | `Store#delete` | `key:, store:` |
390
+ | `:refresh`| `Refresh.call` | `key:, envelope:, store:, change:` (`:created` or `:updated`)|
391
+ | `:build` | `Builder#materialize` | `key:, envelope:, store:, sources:` |
392
+ | `:accept` | `Proposal.accept` | `pending_key:, target_key:, store:` |
393
+
394
+ `:refresh` with `change: :unchanged` does NOT fire — only `:created` and `:updated` are emitted. The `store:` kwarg is always a `Textus::StoreView` (§5.11).
395
+
396
+ **Timeout and isolation.** Each hook runs under `Timeout.timeout(2)`. Hook errors and timeouts are recorded as `event_error` rows in `.textus/audit.log` (column 7, JSON-encoded extras with `event`, `hook`, `error`) but do NOT abort the triggering operation. The store write that fired the event is already committed by the time hooks run.
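The fire-and-forget semantics can be sketched as a dispatcher that isolates each hook. The hypothetical pieces are `fire`, the nested `{ event => { name => callable } }` layout, and an `audit` sink that simply collects the JSON extras. The real log instead writes those extras as column 7 of an `event_error` row in `.textus/audit.log`.

```ruby
require "json"
require "timeout"

# Hypothetical dispatcher: each hook gets 2 s; failures are recorded and
# swallowed so the already-committed write is never rolled back.
HOOK_TIMEOUT_SECONDS = 2

# hooks: { event => { hook_name => callable } }; audit: anything with <<
def fire(hooks, event, audit:, **kwargs)
  hooks.fetch(event, {}).each do |name, hook|
    Timeout.timeout(HOOK_TIMEOUT_SECONDS) { hook.call(**kwargs) } # return value ignored
  rescue StandardError => e # includes Timeout::Error
    audit << JSON.generate({ "event" => event.to_s, "hook" => name.to_s, "error" => e.message })
  end
  nil
end

log = []
hooks = { put: { notify: ->(key:, **) { puts "put #{key}" },
                 broken: ->(**) { raise "boom" } } }
fire(hooks, :put, audit: log, key: "working.network.org.jane", envelope: {}, store: nil)
```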
397
+
398
+ **Manifest declarations.** A manifest entry MAY declare external-runner hooks under an `events:` block, keyed by event name:
399
+
400
+ ```yaml
401
+ events:
402
+   refresh:
403
+     - { exec: scripts/reindex.sh, as: script }
404
+   build:
405
+     - { exec: scripts/rebuild-index.sh, as: build }
406
+ ```
407
+
408
+ Textus does NOT invoke these — they surface only through `textus extensions list --kind=hook` for orchestrators (lefthook, cron, CI) to consume. Each entry has `exec` (opaque runner-resolvable string) and `as` (role to claim, defaults to `script`).
409
+
410
+ **Removed.** The v1.1 `on_stale` event is removed in 0.2. Staleness is a poll, surfaced by `textus stale`. The `on_`-prefix convention from v1.1 is gone; events are bare symbols.
411
+
412
+ ### 5.11 Extension surface (v1.2)
413
+
414
+ Three DSL verbs cover all user-supplied code:
415
+
416
+ ```
417
+ Textus.fetcher(:name) do |config:, store:| ... end # returns {frontmatter:, body:} | {content:} | {body:}
418
+ Textus.reducer(:name) do |rows:, config:| ... end # returns rows
419
+ Textus.hook(:event, :name) do |**kwargs| ... end # side effects; return ignored
420
+ ```
421
+
422
+ Files in `.textus/extensions/*.rb` are loaded at `Store#initialize`, in lexical order, with the registry installed as the current registry for that store. Registries are per-Store: two Store instances in the same process do not share state.
423
+
424
+ Failure modes:
425
+
426
+ | Surface | Timeout | Exception | Bad return |
427
+ |----------|------------|---------------------------------------------|------------|
428
+ | fetcher | aborts op | aborts op (wrapped as `UsageError`) | aborts op |
429
+ | reducer | aborts op | aborts op | aborts op |
430
+ | hook | logged | logged (audit `event_error` row) | n/a |
431
+
432
+ Fetchers and reducers are pure transforms; return values flow into the store. Hooks are side effects; return values are discarded.
433
+
434
+ The `store:` argument is always a `Textus::StoreView` — a read-only proxy exposing `get`, `list`, `where`, `schema_envelope`, `deps`, `rdeps`, `published`, `stale`, `validate_all`. Write attempts raise `Textus::UsageError`.
435
+
436
+ ### 5.12 Storage formats (v1.2)
437
+
438
+ An entry's `format:` selects a storage strategy. All strategies expose the same `parse(bytes) → {frontmatter, body, content}` and `serialize(frontmatter:, body:, content:) → bytes` contract. The store, audit, etag, and projection layers operate on the parsed shape; only (de)serialization differs.
439
+
440
+ - **markdown** — YAML frontmatter between `---` fences, free-form body. Parse: Psych `safe_load` on the front matter; body is the remainder. Serialize: emit `---\n<yaml>\n---\n<body>`. `content` is always `nil`.
441
+ - **json** — entire file is a JSON document. Parse: `JSON.parse`. Serialize: `JSON.pretty_generate(content)` + trailing newline. `frontmatter` is populated from a top-level `_meta` hash (if present, else `{}`); `body` is the raw bytes; `content` is the parsed object.
442
+ - **yaml** — entire file is a YAML mapping. Parse: `YAML.safe_load(bytes, permitted_classes: [Date, Time], aliases: false)`; anchors/aliases rejected. Serialize: `YAML.dump(content).sub(/\A---\n/, "")`. Same `_meta` / `frontmatter` / `body` / `content` rules as JSON.
443
+ - **text** — raw UTF-8 bytes. Parse: body is the file verbatim, `frontmatter` is `{}`, `content` is `nil`. Serialize: write `body` bytes (with trailing newline if missing).
444
+
445
+ **Envelope shape.** Every envelope carries `format:` (always present, defaults to `markdown` for back-compat). For `json|yaml`, the envelope additionally carries `content:` (parsed object). `body` is always the raw on-disk bytes. `frontmatter` always exists, and for `json|yaml` mirrors the `_meta` block (`{}` if absent). `text` always has `frontmatter: {}` and no `content`.
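The four strategies can be sketched as one Ruby module exposing `parse` and `serialize`. This is an illustration built on assumptions, not the gem's code. `Formats` is a hypothetical name, the markdown fence regex is simplified, and the markdown branch permits only `Date` and `Time`, matching the yaml branch.

```ruby
require "date"
require "json"
require "yaml"

# Hypothetical parse/serialize strategies mirroring the four formats.
module Formats
  module_function

  FRONTMATTER = /\A---\n(.*?)^---\n(.*)\z/m

  def parse(format, bytes)
    case format
    when "markdown"
      m = FRONTMATTER.match(bytes)
      return { frontmatter: {}, body: bytes, content: nil } unless m

      fm = YAML.safe_load(m[1], permitted_classes: [Date, Time]) || {}
      { frontmatter: fm, body: m[2], content: nil }
    when "json" then structured(JSON.parse(bytes), bytes)
    when "yaml" then structured(YAML.safe_load(bytes, permitted_classes: [Date, Time], aliases: false), bytes)
    when "text" then { frontmatter: {}, body: bytes, content: nil }
    else raise ArgumentError, "unknown format #{format}"
    end
  end

  # json/yaml: frontmatter mirrors _meta, body stays the raw bytes.
  def structured(content, bytes)
    meta = content.is_a?(Hash) && content["_meta"].is_a?(Hash) ? content["_meta"] : {}
    { frontmatter: meta, body: bytes, content: content }
  end

  def serialize(format, frontmatter: {}, body: "", content: nil)
    case format
    when "markdown"
      yaml = frontmatter.empty? ? "" : YAML.dump(frontmatter).sub(/\A---\n/, "")
      "---\n#{yaml}---\n#{body}"
    when "json" then "#{JSON.pretty_generate(content)}\n"
    when "yaml" then YAML.dump(content).sub(/\A---\n/, "")
    when "text" then body.end_with?("\n") ? body : "#{body}\n"
    else raise ArgumentError, "unknown format #{format}"
    end
  end
end

print Formats.serialize("markdown", frontmatter: { "name" => "jane" }, body: "Short body in Markdown.\n")
```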
446
+
447
+ **`_meta` convention.** Derived structured entries (json, yaml) embed a `_meta` hash as the first top-level key. Builder-injected keys appear in a fixed order for etag stability:
448
+
449
+ ```
450
+ generated_at, from, template, reducer
451
+ ```
452
+
453
+ Keys with `nil` values are omitted. User-shaped content (or the reducer's hash) follows `_meta`. The etag (§10) is the sha256 of the on-disk bytes regardless of format; key ordering MUST therefore be deterministic, which Ruby's `Hash` and `JSON.generate` / `YAML.dump` honor via insertion order.
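This sketch illustrates the two determinism rules. Builder-injected `_meta` keys appear in the fixed order with `nil` values dropped, and the etag is the sha256 of the serialized bytes. `with_meta` and `etag` are hypothetical helpers. This version keeps only the four builder keys in `_meta` and drops any others.

```ruby
require "digest"
require "json"

# Hypothetical builder helper: fixed _meta key order, nil values dropped.
META_ORDER = %w[generated_at from template reducer].freeze

def with_meta(content, meta)
  ordered = META_ORDER.each_with_object({}) do |k, h|
    h[k] = meta[k] unless meta[k].nil?
  end
  { "_meta" => ordered }.merge(content.reject { |k, _| k == "_meta" }) # _meta first
end

def etag(bytes)
  Digest::SHA256.hexdigest(bytes) # same rule for every format
end

doc = with_meta({ "rows" => [1, 2] },
                { "from" => ["working.projects"], "generated_at" => "2026-05-19T00:00:00Z" })
puts etag("#{JSON.pretty_generate(doc)}\n")
```

Because the key order is fixed before serialization, two builds that inject the same metadata in a different order still produce identical bytes and therefore the same etag.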
454
+
455
+ ## 6. Schemas
456
+
457
+ Schemas live in `.textus/schemas/<name>.yaml`. A schema declares the required and optional frontmatter fields for entries that reference it.
458
+
459
+ ```yaml
460
+ # .textus/schemas/person.yaml
461
+ name: person
462
+ required:
463
+   - name
463
+   - relationship
464
+   - org
465
+ optional:
466
+   - notes
467
+   - aliases
468
+ fields:
469
+   name: { type: string, max: 80 }
470
+   relationship: { type: enum, values: [peer, manager, report, external] }
471
+   org: { type: string }
472
+   aliases: { type: array, items: { type: string } }
473
+   notes: { type: string, max: 2000 }
475
+ ```
476
+
477
+ **Supported types:** `string`, `number`, `boolean`, `enum` (with `values:`), `array` (with `items:`), `object` (with nested `fields:`).
478
+
479
+ **Validation:** strict required-field check; optional fields may be omitted; unknown fields produce a warning, not an error (forward-compat).
480
+
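A validator for this schema format fits in a few lines. This is a minimal sketch, not the reference implementation. `validate` and `type_ok?` are hypothetical names. Treating `max:` as a character-length limit on strings is an assumption. The result separates missing fields (for `details.missing`, Fixture C) from type errors and unknown-field warnings.

```ruby
# Hypothetical validator for the bespoke YAML schema format.
# Missing required fields and type mismatches are errors; unknown fields are
# collected as warnings (forward-compat).
def validate(schema, frontmatter)
  required = Array(schema["required"])
  known = required + Array(schema["optional"])
  result = { "missing" => required.reject { |f| frontmatter.key?(f) },
             "wrong_type" => [], "warnings" => [] }
  frontmatter.each do |name, value|
    if !known.include?(name)
      result["warnings"] << name
    elsif (spec = schema.dig("fields", name)) && !type_ok?(spec, value)
      result["wrong_type"] << name
    end
  end
  result
end

def type_ok?(spec, value)
  case spec["type"]
  when "string"
    # Assumption: `max` limits character length.
    value.is_a?(String) && (spec["max"].nil? || value.length <= spec["max"])
  when "number"  then value.is_a?(Numeric)
  when "boolean" then value == true || value == false
  when "enum"    then Array(spec["values"]).include?(value)
  when "array"
    value.is_a?(Array) && (spec["items"].nil? || value.all? { |v| type_ok?(spec["items"], v) })
  when "object"
    value.is_a?(Hash) &&
      (spec["fields"] || {}).all? { |k, s| !value.key?(k) || type_ok?(s, value[k]) }
  else
    true # vendor or unknown types are not checked in this sketch
  end
end
```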
481
+ ## 7. Entry file format
482
+
483
+ Every entry is a UTF-8 Markdown file with a YAML frontmatter block:
484
+
485
+ ```markdown
486
+ ---
487
+ name: jane
488
+ relationship: peer
489
+ org: acme
490
+ ---
491
+ Short body in Markdown.
492
+ ```
493
+
494
+ The frontmatter `name:` field, when present, must match the file's basename (without `.md`). Implementations may relax this for backward compat but the reference impl enforces it.
495
+
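The split between frontmatter and body, together with the basename rule, might look like this. This is a sketch only: `read_entry` and the `FRONTMATTER` pattern are hypothetical, and raising `ArgumentError` with a `bad_frontmatter` prefix stands in for whatever error type an implementation uses.

```ruby
require "yaml"
require "date"

# Hypothetical pattern: a leading `---` block, a closing `---` line, then the body.
FRONTMATTER = /\A---\n(.*?)\n---\n(.*)\z/m

# Hypothetical reader: parse frontmatter and enforce `name:` == basename (sans .md).
def read_entry(path, bytes)
  m = FRONTMATTER.match(bytes) or raise ArgumentError, "bad_frontmatter: no YAML block"
  fm = YAML.safe_load(m[1], permitted_classes: [Date, Time], aliases: false) || {}
  base = File.basename(path, ".md")
  if fm.key?("name") && fm["name"] != base
    raise ArgumentError, "bad_frontmatter: name '#{fm["name"]}' != basename '#{base}'"
  end
  { "frontmatter" => fm, "body" => m[2] }
end
```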
496
+ **`uid:` (Textus UID).** Entries MAY carry a stable identity field that survives renames and moves. Optional. When present:
497
+
498
+ - Lives at top-level `uid:` in markdown frontmatter, or `_meta.uid` in `json`/`yaml` entries.
499
+ - Format: lowercase hex string, 12 or more characters. The reference impl mints 16 hex chars (`SecureRandom.hex(8)`). This is a **Textus UID**, not a UUID — short on purpose.
500
+ - Auto-assigned on the first successful `Store#put` if the payload has no uid. Preserved on subsequent puts.
501
+ - Existing files without a uid continue to work. The envelope shows `"uid": null` until a put mints one.
502
+ - `text` entries have no metadata channel and therefore no uid; their envelope always shows `"uid": null`.
503
+
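The mint-once, preserve-afterwards behavior can be sketched as a put-time hook. The following is hypothetical. `ensure_uid` is an illustrative name, and two details are assumptions the spec does not settle: the on-disk uid wins over a payload uid, and a malformed uid is rejected rather than replaced.

```ruby
require "securerandom"

# Lowercase hex, 12 or more characters.
UID_FORMAT = /\A[0-9a-f]{12,}\z/

# Hypothetical hook run on put: keep the existing uid (on-disk first, then the
# payload's), otherwise mint 16 hex chars as the reference impl is described doing.
def ensure_uid(frontmatter, on_disk_uid = nil)
  uid = on_disk_uid || frontmatter["uid"] || SecureRandom.hex(8)
  unless uid.is_a?(String) && UID_FORMAT.match?(uid)
    raise ArgumentError, "invalid uid: #{uid.inspect}" # assumption: reject, don't replace
  end
  frontmatter.merge("uid" => uid)
end
```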
504
+ Entries in `zone: derived` SHOULD additionally carry the `generated:` block defined in §5.2. Implementations MUST treat unknown frontmatter fields as warnings, not errors, so build runners can extend the metadata without breaking conformance.
505
+
506
+ ## 8. Envelope (the wire format)
507
+
508
+ Every successful CLI response (`--format=json`) is a single JSON envelope:
509
+
510
+ ```json
511
+ {
512
+ "protocol": "textus/1",
513
+ "key": "working.network.org.jane",
514
+ "zone": "working",
515
+ "owner": "textus:network",
516
+ "path": "/absolute/path/to/.textus/zones/working/network/org/jane.md",
517
+ "format": "markdown",
518
+ "frontmatter": { "name": "jane", "relationship": "peer", "org": "acme" },
519
+ "body": "Short body in Markdown.\n",
520
+ "etag": "sha256:8f3c…",
521
+ "schema_ref": "person",
522
+ "uid": "a1b2c3d4e5f60718"
523
+ }
524
+ ```
525
+
526
+ **Field rules:**
527
+ - `protocol` MUST be the exact string `textus/1`.
528
+ - `key` MUST be the canonical resolved key.
529
+ - `zone` MUST be one of the zones declared in the manifest (`canon`, `working`, `intake`, `pending`, `derived` for the default v1.0 model; legacy v0.1 manifests synthesize `fixed`, `state`, `derived` per §4).
530
+ - `path` MUST be an absolute filesystem path.
531
+ - `format` MUST be one of `markdown`, `json`, `yaml`, `text` (§5.12). Envelopes without a `format` field are treated as `markdown` for back-compat.
532
+ - `body` is the raw on-disk bytes as a UTF-8 string for every format.
533
+ - `content` is present only when `format` is `json` or `yaml`; equals the parsed object. For `json|yaml`, `frontmatter` mirrors the top-level `_meta` (or `{}` if absent).
534
+ - `etag` MUST be `sha256:<hex>` of the raw file bytes, computed identically for every format.
535
+ - `schema_ref` MAY be `null` for entries in subtrees with `schema: null`.
536
+ - `uid` is the stable Textus UID (§7) if the entry carries one, else `null`. Always present in the envelope.
537
+
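The field rules translate into a mechanical check that a conformance harness could run on each success envelope. This is a sketch only. `envelope_violations` is a hypothetical name, the zone list is supplied by the caller from the manifest, and canonical-key resolution is not checked here.

```ruby
FORMATS = %w[markdown json yaml text].freeze
ETAG = /\Asha256:[0-9a-f]{64}\z/

# Hypothetical checker: returns the names of violated rules (empty = passes).
def envelope_violations(env, zones)
  v = []
  v << "protocol" unless env["protocol"] == "textus/1"
  v << "zone" unless zones.include?(env["zone"])
  v << "path" unless env["path"].is_a?(String) && File.absolute_path?(env["path"])
  format = env.fetch("format", "markdown") # missing format => markdown (back-compat)
  v << "format" unless FORMATS.include?(format)
  v << "body" unless env["body"].is_a?(String)
  # `content` must be present exactly when the format is json or yaml.
  v << "content" if env.key?("content") != %w[json yaml].include?(format)
  v << "etag" unless ETAG.match?(env["etag"].to_s)
  v << "uid" unless env.key?("uid") # always present, possibly null
  v
end
```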
538
+ Errors use a distinct envelope:
539
+
540
+ ```json
541
+ {
542
+ "protocol": "textus/1",
543
+ "ok": false,
544
+ "code": "write_forbidden",
545
+ "message": "zone 'canon' is not writable by role 'ai' for key 'canon.identity'",
546
+ "details": { "key": "canon.identity", "zone": "canon", "role": "ai" }
547
+ }
548
+ ```
549
+
550
+ **Error codes:**
551
+
552
+ | Code | Meaning | Default exit |
553
+ |---|---|---|
554
+ | `unknown_key` | Key does not resolve | 1 |
555
+ | `bad_frontmatter` | YAML parse failed or `name:` mismatch | 1 |
556
+ | `schema_violation` | Required field missing or wrong type | 1 |
557
+ | `write_forbidden` | Resolved role is not in the zone's `writable_by` | 1 |
558
+ | `etag_mismatch` | Concurrent write detected | 1 |
559
+ | `io_error` | Filesystem failure | 64 |
560
+ | `usage` | CLI argument error | 2 |
561
+
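The write gate and the error table can be combined into a small sketch. `gate` and `error_response` are hypothetical names. The `writable_by` list is assumed to come from the zone's manifest entry.

```ruby
require "json"

# Default exit codes from the table above.
EXIT_CODES = {
  "unknown_key" => 1, "bad_frontmatter" => 1, "schema_violation" => 1,
  "write_forbidden" => 1, "etag_mismatch" => 1, "io_error" => 64, "usage" => 2
}.freeze

# Hypothetical helper: serialized error envelope plus the process exit code.
def error_response(code, message, details = {})
  envelope = { "protocol" => "textus/1", "ok" => false, "code" => code,
               "message" => message, "details" => details }
  [JSON.generate(envelope), EXIT_CODES.fetch(code)]
end

# Hypothetical write gate: nil when the role may write, else an error response.
def gate(key, zone, role, writable_by)
  return nil if writable_by.include?(role)
  error_response("write_forbidden",
                 "zone '#{zone}' is not writable by role '#{role}' for key '#{key}'",
                 { "key" => key, "zone" => zone, "role" => role })
end
```

Called as `gate("canon.identity", "canon", "ai", ["human"])`, it reproduces the sample envelope above with exit code 1, which is the behavior Fixture B checks.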
562
+ ## 9. CLI surface
563
+
564
+ The reference binary is `textus`. Conforming implementations MAY use any binary name; the protocol is defined by the JSON envelopes, not by the binary.
565
+
566
+ All verbs accept `--format=json` and emit a canonical envelope (success or error). Write verbs require `--as=<role>`; the role must satisfy the target zone's write gate (§5).
567
+
568
+ | Verb | Reads / writes | Role required |
569
+ |---|---|---|
570
+ | `list [--prefix=K] [--zone=Z] [--stale]` | read | any |
571
+ | `where K` | read | any |
572
+ | `get K` | read | any |
573
+ | `schema K` | read | any |
574
+ | `stale [--prefix=K] [--strict]` | read | any |
575
+ | `deps K` / `rdeps K` | read | any |
576
+ | `published` | read | any |
577
+ | `validate-all` | read | any |
578
+ | `put K --stdin --as=R [--fetcher=NAME]` | write | per zone |
579
+ | `delete K --if-etag=E --as=R` | write | per zone |
580
+ | `refresh K --as=script` | write | per zone (typically `script`) |
581
+ | `build [--prefix=K] [--dry-run]` | write | `build` (default) |
582
+ | `accept K --as=human` | write | `human` |
583
+ | `init` | write | `human` |
584
+ | `schema-init NAME` / `schema-diff NAME` / `schema-migrate NAME --rename=OLD:NEW` | write | `human` |
585
+ | `migrate-keys [--dry-run\|--write]` | write (with `--write`) | `human` |
586
+ | `mv OLD NEW [--as=R] [--dry-run]` | write | per zone (same-zone only) |
587
+ | `uid K` | read | any |
588
+ | `extensions list [--kind=fetcher\|reducer\|hook]` | read | any |
589
+ | `doctor [--format=json]` | read | any |
590
+ | `intro [--format=json]` | read | any |
591
+
592
+ **`put` input** (read from stdin when `--stdin` is given):
593
+
594
+ ```json
595
+ { "frontmatter": { "name": "jane", "relationship": "peer", "org": "acme" },
596
+ "body": "Short body.\n",
597
+ "if_etag": "sha256:8f3c…" }
598
+ ```
599
+
600
+ `if_etag` is optional on `put`, required on `delete`. When provided, the write fails with `etag_mismatch` if the on-disk file's etag differs. When omitted on `put`, the write is unconditional (last-writer-wins).
601
+
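The `if_etag` semantics amount to compare-then-write. The following minimal sketch uses hypothetical names (`conditional_put`, `EtagMismatch`). The check and the write are not atomic here, which is the read-before-write race raised as an open question in §14.

```ruby
require "digest"

class EtagMismatch < StandardError; end

# Hypothetical put: conditional when if_etag is given, last-writer-wins otherwise.
# Returns the etag of the bytes just written.
def conditional_put(path, new_bytes, if_etag: nil)
  if if_etag
    current = File.exist?(path) ? "sha256:#{Digest::SHA256.file(path).hexdigest}" : nil
    raise EtagMismatch, "etag_mismatch: #{path}" unless current == if_etag
  end
  File.binwrite(path, new_bytes)
  "sha256:#{Digest::SHA256.hexdigest(new_bytes)}"
end
```

A `delete` would run the same comparison but refuse to proceed when `if_etag` is missing, since the spec makes it mandatory there.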
602
+ **`textus stale` output shape:**
603
+
604
+ ```json
605
+ [
606
+ { "key": "derived.catalogs.skills",
607
+ "path": "/abs/.textus/zones/derived/catalogs/skills.md",
608
+ "generator": { "command": "rake catalog:skills",
609
+ "sources": ["working.projects", "working.network"] },
610
+ "reason": "source 'working.projects' modified after generated.at" }
611
+ ]
612
+ ```
613
+
614
+ `textus build` consumes the stale list and executes each `generator.command` itself, writing results back through `put` under the `build` role. `--dry-run` prints the plan without executing.
615
+
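Staleness can be computed from timestamps alone, without running `generator.command`. The sketch below makes several assumptions. `stale` is a hypothetical name. The caller supplies each derived entry's `generated_at` as ISO 8601 together with a map of concrete source keys to file mtimes. A key matches a source prefix only on whole dotted segments. The `path` field of the real output is omitted here.

```ruby
require "time"

# Hypothetical staleness check: report a derived entry when any key under one of
# its generator.sources prefixes has an mtime later than its generated.at.
def stale(derived, source_mtimes)
  derived.filter_map do |d|
    generated_at = Time.iso8601(d["generated_at"])
    hit = d["sources"].find do |prefix|
      source_mtimes.any? do |key, mtime|
        (key == prefix || key.start_with?("#{prefix}.")) && mtime > generated_at
      end
    end
    next unless hit
    { "key" => d["key"],
      "generator" => { "command" => d["command"], "sources" => d["sources"] },
      "reason" => "source '#{hit}' modified after generated.at" }
  end
end
```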
616
+ `textus accept K --as=human` promotes a pending entry into its target zone: it copies the patch body into the target key, deletes the pending entry, and writes one audit line per side (§audit). Only the `human` role may invoke `accept`.
617
+
618
+ `textus init` scaffolds a fresh `.textus/` tree (manifest, zones, schemas, audit log) under the current directory with a default manifest. Customize by editing `.textus/manifest.yaml` after init.
619
+
620
+ `textus schema-init NAME` writes a stub schema. `schema-diff NAME` compares the on-disk schema against entries that claim it and prints the deltas. `schema-migrate NAME --rename=OLD:NEW` rewrites the frontmatter key `OLD` to `NEW` across every entry that uses the named schema, in a single transactional sweep that logs each touched file.
621
+
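The core of `schema-migrate --rename=OLD:NEW` is a key rename that preserves field order. This is a hypothetical sketch (`rename_field`). It operates on in-memory entries and returns the touched keys for logging. The transactional write-back, and any handling for a `NEW` key that already exists, are left out.

```ruby
# Hypothetical rename step: rewrite frontmatter key OLD to NEW in every entry
# whose schema_ref matches, keeping the original field order.
def rename_field(entries, schema, rename)
  old, new = rename.split(":", 2)
  touched = []
  entries.each do |e|
    next unless e["schema_ref"] == schema && e["frontmatter"].key?(old)
    e["frontmatter"] = e["frontmatter"].to_h { |k, v| [k == old ? new : k, v] }
    touched << e["key"]
  end
  touched
end
```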
622
+ ## 10. ETag semantics
623
+
624
+ The etag is `sha256:<lowercase-hex-digest-of-raw-file-bytes>`, computed after any write-time normalization (trailing newline, UTF-8 encoding). Both reads and successful writes return the current etag; passing it back in `if_etag` enforces optimistic concurrency.
625
+
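Because the hash runs over the normalized bytes, the normalization step must happen before hashing. This is a minimal sketch with hypothetical names (`normalize`, `etag_for`). `Digest::SHA256.hexdigest` already returns lowercase hex.

```ruby
require "digest"

# Hypothetical write-path normalization: UTF-8 encoding plus a trailing newline.
def normalize(text)
  s = text.encode("UTF-8")
  s.end_with?("\n") ? s : s + "\n"
end

# Etag over the exact bytes that end up on disk.
def etag_for(bytes)
  "sha256:" + Digest::SHA256.hexdigest(bytes)
end
```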
626
+ ### 10.1 Errors carry hints
627
+
628
+ Every `Textus::Error` exposes `code`, `message`, and an optional `hint:`. The hint is a single short string suggesting the next action — the file to edit, the role to pass, the command to run. Errors in the wire envelope include the hint as a top-level `hint:` field when present. The CLI prints failures to stderr as `code: message` followed by ` → hint` (when a hint exists), in addition to the JSON envelope on stdout. Hints are advisory: implementations MAY omit or rephrase them without breaking conformance.
629
+
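The two renderings of a hinted error can be sketched together. `TextusError` is a hypothetical stand-in for illustration, not the gem's `Textus::Error`. It shows the envelope carrying `hint` only when present and the stderr line in the `code: message → hint` form.

```ruby
# Hypothetical error class: code, message, optional hint.
class TextusError < StandardError
  attr_reader :code, :hint

  def initialize(code, message, hint: nil)
    super(message)
    @code = code
    @hint = hint
  end

  # Wire envelope: `hint` is a top-level field only when present.
  def envelope
    env = { "protocol" => "textus/1", "ok" => false, "code" => code, "message" => message }
    env["hint"] = hint if hint
    env
  end

  # Stderr form: `code: message`, followed by ` → hint` when a hint exists.
  def stderr_line
    hint ? "#{code}: #{message} → #{hint}" : "#{code}: #{message}"
  end
end
```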
630
+ ### 10.2 `textus doctor`
631
+
632
+ `textus doctor` returns a health-check envelope: `{ "protocol": "textus/1", "ok": bool, "issues": [...], "summary": {error, warning, info} }`. Each issue carries `code`, `level` (`error|warning|info`), `subject`, `message`, and optionally `fix`. `ok` is true iff no error-level issues are present; warnings and info do not flip the bit. Checks include manifest sanity, missing schemas/templates, extension load failures, illegal nested keys (with proposed normalization), sentinel drift/orphans, and audit-log line corruption. Exit code is 0 on `ok`, 1 otherwise.
633
+
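The aggregation step, which runs after the individual checks, follows directly from the rules above. `doctor_report` is a hypothetical name. The checks that would produce `issues` are not shown.

```ruby
LEVELS = %w[error warning info].freeze

# Hypothetical aggregation: per-level counts, ok iff no error-level issue,
# and the matching exit code.
def doctor_report(issues)
  summary = LEVELS.to_h { |lvl| [lvl, issues.count { |i| i["level"] == lvl }] }
  ok = summary["error"].zero?
  envelope = { "protocol" => "textus/1", "ok" => ok, "issues" => issues, "summary" => summary }
  [envelope, ok ? 0 : 1]
end
```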
634
+ ## 11. Versioning
635
+
636
+ - The wire string `textus/1` is the protocol version.
637
+ - Backward-compatible additions (new fields, new error codes, new schema types) MAY be made under `textus/1`.
638
+ - Breaking changes (renamed/removed fields, zone semantics, key grammar) require a new wire string `textus/2`.
639
+ - Implementations MUST reject envelopes whose `protocol` they do not recognize.
640
+
641
+ The reference Ruby gem follows semver independently. Gem 1.x speaks `textus/1`.
642
+
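A client following these rules accepts unfamiliar fields, which count as backward-compatible additions, and rejects unfamiliar protocol strings. The following minimal sketch uses a hypothetical `parse_envelope`.

```ruby
require "json"

KNOWN_PROTOCOLS = ["textus/1"].freeze

# Hypothetical client guard: extra fields pass through; unknown protocols are refused.
def parse_envelope(json)
  env = JSON.parse(json)
  unless KNOWN_PROTOCOLS.include?(env["protocol"])
    raise ArgumentError, "unsupported protocol: #{env["protocol"].inspect}"
  end
  env
end
```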
643
+ ## 12. Conformance fixtures
644
+
645
+ A conformant implementation MUST pass these fixtures (the reference test suite ships a YAML file listing inputs and expected envelopes):
646
+
647
+ **Fixture A — Resolve and read:**
648
+ Given a manifest with `working.network.org` → `working/network/org` (nested), schema `person`, and a file `.textus/zones/working/network/org/jane.md` with valid frontmatter, `textus get working.network.org.jane --format=json` returns the canonical envelope with `etag` matching the file's sha256.
649
+
650
+ **Fixture B — Role gate on write:**
651
+ Given a manifest entry where `key: canon.identity` lives in the `canon` zone (human-only), `textus put canon.identity --stdin --as=ai` (with any valid input) returns the error envelope with `code: "write_forbidden"` and exit code 1.
652
+
653
+ **Fixture C — Schema violation:**
654
+ Given the `person` schema and a `put` whose frontmatter omits `relationship`, the result is the error envelope with `code: "schema_violation"`, `details.missing: ["relationship"]`, and exit code 1.
655
+
656
+ **Fixture D — Staleness detection:**
657
+ Given a manifest entry `derived.catalogs.skills` with `generator.sources: [working.projects]`, and a working-zone entry under `working.projects` whose file mtime is newer than the derived entry's `generated.at` frontmatter timestamp, `textus stale --format=json` includes the derived entry with its declared `generator.command` and a `reason` field naming the stale source. Calling `textus stale` does NOT execute the command.
658
+
659
+ **Fixture E — Projection build:**
660
+ Given a manifest entry `derived.catalogs.skills` whose `projection` clause selects fields from `working.projects` entries, `textus build derived.catalogs.skills` materializes the derived entry on disk with frontmatter and body matching the projected shape, and updates `generated.at` to the build timestamp.
661
+
662
+ **Fixture F — Mustache render:**
663
+ Given a derived entry with a `template` clause referencing a `.mustache` file and inputs drawn from other keys, `textus build` produces a body whose contents match the expected rendered output byte-for-byte (after trailing-newline normalization).
664
+
665
+ **Fixture G — Copy publish:**
666
+ Given a manifest entry with `publish_to: <path>`, a successful `textus build` for that entry leaves a plain file at `<path>` whose contents are byte-identical to the in-store artifact at `.textus/zones/<...>`, accompanied by a sentinel at `.textus/sentinels/<path>.textus-managed.json` recording `source`, `target`, `sha256`, and `mode: "copy"`. Re-running `build` is idempotent.
667
+
668
+ **Fixture H — Audit log format:**
669
+ Every successful write verb (`put`, `delete`, `build`, `accept`, `schema-migrate`) appends exactly one line per affected key to the audit log, in the canonical format defined in §audit (timestamp, actor role, verb, key, etag-before, etag-after). No write produces zero or multiple lines per key.
670
+
671
+ **Fixture I — Pending → accept:**
672
+ Given a pending entry `pending.canon.identity.patch` proposing a change to `canon.identity`, `textus accept canon.identity --as=human` copies the patch body into `canon.identity`, deletes the pending entry, and appends two audit lines (one for the canon write, one for the pending delete) in that order.
673
+
674
+ ## 13. Why not X?
675
+
676
+ - **Why not MCP?** MCP is a transport; textus is a data model. The two compose: a 50-line MCP server can wrap `textus get/put` as tools. textus exists because the *shape* of agent-readable project memory deserves a standalone spec, separate from how it's served.
677
+
678
+ - **Why doesn't textus execute generator commands itself?** textus is a dataflow oracle, not a build runner. The moment a spec includes process execution, it inherits shell-injection surface, OS-portability concerns, and signal-handling semantics — and ends up duplicating whatever build system the consumer already runs (make, rake, just, lefthook, CI). Keeping execution external means a Python or TypeScript port of `textus/1` only has to parse YAML and emit JSON; it doesn't have to spawn processes safely. Build runners stay the executor; textus stays a data tool.
679
+
680
+ - **Why not plain Markdown vaults (Obsidian / Foam)?** No schema enforcement, no write-gating, no addressable wire format. Fine for human notes; underspecified for agents that must act on the contents deterministically.
681
+
682
+ - **Why not Notion / Coda?** Closed, hosted, lossy export. textus is local-first, plain-files, diffable in git.
683
+
684
+ - **Why not JSON Schema for the schemas?** Considered. Bespoke YAML chosen for v1: simpler implementation, lighter dependency footprint, matches the reference impl's house language. JSON Schema MAY be added as an alternate schema-language adapter in a future minor revision without breaking `textus/1`.
685
+
686
+ - **Why not a database (SQLite, kv store)?** textus's whole point is that the storage is plain files agents and humans both read. A binary store loses git-diff, grep, and editor support.
687
+
688
+ - **Why not vector embeddings?** Different problem. textus is for facts agents act on deterministically; embeddings are for fuzzy retrieval. They compose — index a textus tree into a vector store if you need both.
689
+
690
+ ## 14. Open questions (v1.x scope)
691
+
692
+ - **Locking on `put`:** the reference impl uses sha256 etags. Should the spec also define a file-lock fallback for systems where read-before-write is racy?
693
+ - **Schema imports:** can one schema reference another (`type: $ref: person`)? Defer to v1.1.
694
+ - **Internationalization:** non-ASCII in keys? Spec currently restricts segments to `[a-z0-9_-]`. Revisit if community wants Unicode.
695
+ - **Generated content in `derived/`:** the spec says `schema: null` is allowed, but should there be a separate marker (`generated: true`) for clarity?
696
+
697
+ ## 15. Implementation checklist
698
+
699
+ A v1 implementation MUST:
700
+
701
+ - [ ] Parse `.textus/manifest.yaml` and validate the `version: textus/1` declaration.
702
+ - [ ] Resolve keys via longest-prefix match against manifest entries.
703
+ - [ ] Read frontmatter + body from `.md` files; validate against the named schema.
704
+ - [ ] Compute `sha256:<hex>` etags over raw file bytes.
705
+ - [ ] Refuse writes whose resolved role is not in the target zone's `writable_by` list with `write_forbidden`.
706
+ - [ ] Return envelopes matching the shape in §8 exactly.
707
+ - [ ] Use the error codes in §8 and the exit-code table.
708
+ - [ ] Implement `textus stale` per §5.1 and §9, comparing each derived entry's `generator.sources` against its `generated.at` timestamp without invoking any commands.
709
+ - [ ] Pass the conformance fixtures A–I in §12.
710
+
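Longest-prefix resolution should compare whole dotted segments, so that `working.networking` does not match a `working.network` entry. This is a sketch under that assumption. `resolve` is a hypothetical name. It returns the matched manifest key and the leftover segments; mapping those segments to a file path depends on the manifest entry and is not shown.

```ruby
# Hypothetical resolver: longest manifest key that is a segment-wise prefix.
def resolve(key, manifest_keys)
  segments = key.split(".")
  best = manifest_keys
    .map { |k| k.split(".") }
    .select { |pre| segments.first(pre.size) == pre }
    .max_by(&:size)
  raise KeyError, "unknown_key: #{key}" unless best
  [best.join("."), segments.drop(best.size)]
end
```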
711
+ A v1 implementation MAY:
712
+
713
+ - Add additional CLI verbs (e.g. `move`, vendor-specific reporters) beyond the v1.0 set in §9.
714
+ - Provide alternate output formats (`--format=yaml`, `--format=table`) for human use.
715
+ - Support additional schema field types beyond §6, marked as `vendor:<name>` extensions.
716
+
717
+ ---
718
+
719
+ **Spec word count target:** <2500 words (allowance widened from 2000 to fit Level-A/B derived provenance + staleness in v1).
720
+ **Reviewed against community-testing checklist (idea file §"Community-testing"):** ✅ <2500 words; ✅ implementable in a day in TS/Python (four concepts: manifest, schema, envelope, staleness check); ✅ conformance fixtures A–I; ✅ "Why not X?" section present (incl. why no execution); ✅ name picked.