textus 0.2.0

@@ -0,0 +1,57 @@
1
+ # Architecture
2
+
3
+ How the reference Ruby implementation is organized. The wire protocol itself lives in [`../SPEC.md`](../SPEC.md); this document covers *how* the gem implements that spec.
4
+
5
+ ## Layering
6
+
7
+ ```
8
+ ┌──────────────────────────────────────────────┐
9
+ │ exe/textus                                   │  thin shim: load lib, call CLI.run
10
+ ├──────────────────────────────────────────────┤
11
+ │ Textus::CLI                                  │  argv parsing, JSON I/O, exit codes
12
+ ├──────────────────────────────────────────────┤
13
+ │ Textus::Store                                │  verb implementations (get/put/list/…)
14
+ ├───────────────┬───────────────┬──────────────┤
15
+ │ Manifest      │ Schema        │ Entry        │  parse/resolve, validate, (de)serialize
16
+ ├───────────────┴───────────────┴──────────────┤
17
+ │ Etag · Errors · version                      │  primitives
18
+ └──────────────────────────────────────────────┘
19
+ ```
20
+
21
+ Each layer talks only to the layer below it. `Store` is the only class that touches the filesystem for read/write; `Manifest` and `Schema` read at load time and are otherwise pure.
22
+
23
+ ## Key resolution
24
+
25
+ `Manifest#resolve(key)` does **longest-prefix match** against `entries[].key`. The matched entry's `path:` is the base; if `nested: true`, remaining dotted segments become `/`-joined subdirectories under that path and a `.md` is appended.
26
+
27
+ Resolution is **path-only** — it does not check whether the file exists. Existence is the verb's concern: `get` raises `unknown_key` when the file is missing; `put` happily creates new files in nested entries.
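
The resolution rule can be sketched as follows. This is an illustration under stated assumptions, not the gem's code: `ManifestRow` and `resolve_path` are hypothetical names, and returning `nil` for leftover segments on a non-nested entry is a guess at behavior the text does not specify.

```ruby
# Hypothetical stand-in for a manifest entry; the real class may differ.
ManifestRow = Struct.new(:key, :path, :nested, keyword_init: true)

# Longest-prefix match on dotted keys. Path-only: never touches the filesystem.
def resolve_path(entries, key)
  match = entries
          .select { |e| key == e.key || key.start_with?("#{e.key}.") }
          .max_by { |e| e.key.length }
  return nil unless match

  rest = key.delete_prefix(match.key).delete_prefix(".")
  return match.path if rest.empty?
  return nil unless match.nested # assumption: extra segments need nested: true

  # Remaining dotted segments become subdirectories; ".md" is appended.
  File.join(match.path, *rest.split(".")) + ".md"
end
```

Matching on `"#{e.key}."` rather than on the bare key keeps `statement` from matching an entry keyed `state`.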
28
+
29
+ ## Frontmatter parsing
30
+
31
+ `Entry.parse` splits raw bytes on `---\n` boundaries and feeds the YAML chunk to `YAML.safe_load` (no aliases, restricted classes). Unknown top-level frontmatter keys are not rejected here — they are surfaced as warnings by `Schema#validate!`. This is the forward-compat rule from §6 of the spec.
32
+
33
+ The frontmatter `name:` field is enforced against the file basename inside `Store` (both on read and on write), so a misnamed file or a mistyped `name:` in a `put` payload fails fast with `bad_frontmatter`.
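
A minimal sketch of the split-then-`safe_load` step, assuming the file starts with a `---` line. `split_frontmatter` is a hypothetical helper, not `Entry.parse` itself, and it leaves unknown keys alone, as described above.

```ruby
require "yaml"

# Returns [frontmatter_hash, body_string] for raw file bytes.
def split_frontmatter(raw)
  return [{}, raw] unless raw.start_with?("---\n")

  # Limit of 3 keeps any later "---\n" inside the body intact.
  _, yaml_chunk, body = raw.split("---\n", 3)
  frontmatter = YAML.safe_load(yaml_chunk, aliases: false) || {}
  [frontmatter, body.to_s]
end
```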
34
+
35
+ ## Zone enforcement
36
+
37
+ Three zones (`fixed`, `state`, `derived`) are declared per-entry in the manifest. `Store#put` checks `ManifestEntry#agent_writable?` (true only for `state`) before doing anything else and raises `write_forbidden` otherwise. Zone semantics live in the manifest, not in directory names, so a project can rename `state/` to whatever it wants.
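
As a rough illustration of that guard (the `ZoneEntry` struct, `WriteForbidden` class, and `check_writable!` helper are hypothetical stand-ins, not the gem's classes):

```ruby
# Hypothetical entry: only the zone decides writability, never the path.
ZoneEntry = Struct.new(:key, :zone, :path) do
  def agent_writable?
    zone == "state"
  end
end

class WriteForbidden < StandardError; end

# Meant to run before any other work in a put-like operation.
def check_writable!(entry)
  raise WriteForbidden, "write_forbidden: #{entry.key}" unless entry.agent_writable?

  true
end
```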
38
+
39
+ ## ETag and concurrency
40
+
41
+ `Etag.for_bytes` returns `sha256:<hex>` over the raw file bytes. `put` accepts an optional `if_etag:` — if provided and the on-disk file's etag differs, `etag_mismatch` is raised. No locking, no temp-file-and-rename — the v1 spec leaves stronger guarantees to v1.x (§14 open question).
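
The etag format and the optional precondition could look roughly like this. It is a sketch only: `etag_for_bytes`, `check_if_etag!`, and `EtagMismatch` are illustrative names, and the helper takes the current bytes as an argument to sidestep the question of how a missing file is handled.

```ruby
require "digest"

class EtagMismatch < StandardError; end

# "sha256:<hex>" over the raw bytes, as described above.
def etag_for_bytes(bytes)
  "sha256:#{Digest::SHA256.hexdigest(bytes)}"
end

# No if_etag means no check (last writer wins).
def check_if_etag!(current_bytes, if_etag)
  return if if_etag.nil?

  raise EtagMismatch, "etag_mismatch" unless etag_for_bytes(current_bytes) == if_etag
end
```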
42
+
43
+ ## Staleness (the dataflow oracle)
44
+
45
+ `Store#stale` walks every `zone: derived` entry that declares a `generator:` block, reads its `generated.at` frontmatter timestamp, and compares against each `generator.sources` entry's current mtime. Returns the offenders **and their declared `command`**; it does **not** execute anything. This is the core "dataflow oracle, not executor" boundary from §5.1 of the spec.
46
+
47
+ Sources are heuristically classified: a string matching the textus key grammar with no `/` is treated as a textus key (and enumerated via the manifest); anything else is treated as a repo-relative path.
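
The timestamp comparison at the heart of `stale` might be sketched like this, assuming sources have already been resolved to file paths. `newer_sources` is a hypothetical helper, and skipping missing files is an assumption.

```ruby
require "time"

# Returns the source paths modified after the recorded generation time.
# Nothing is executed; the caller decides what to do with the result.
def newer_sources(generated_at, source_paths)
  built = Time.iso8601(generated_at)
  source_paths.select { |p| File.exist?(p) && File.mtime(p) > built }
end
```

An empty result means the derived entry is fresh with respect to the sources that exist.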
48
+
49
+ ## Errors → envelopes
50
+
51
+ All `Textus::Error` subclasses carry a stable error `code`, a `details` hash, and an `exit_code`. `CLI` catches them at the top level and emits the §8 error envelope on stdout; the exit code matches the §8 table. Errors are never written to stderr in `--format=json` mode — agents read stdout.
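
A sketch of the catch-at-the-top pattern, with loud caveats: `SketchError` and `run_with_envelope` are hypothetical, and the envelope field names are placeholders. The authoritative envelope shape and exit codes are whatever §8 of the spec defines.

```ruby
require "json"

# Hypothetical base error carrying the three attributes described above.
class SketchError < StandardError
  attr_reader :code, :details, :exit_code

  def initialize(message, code:, details: {}, exit_code: 1)
    super(message)
    @code = code
    @details = details
    @exit_code = exit_code
  end
end

# Runs the block; on SketchError, writes a JSON envelope to `out`
# (stdout by default, never stderr) and returns the error's exit code.
def run_with_envelope(out: $stdout)
  yield
  0
rescue SketchError => e
  envelope = { "error" => { "code" => e.code, "message" => e.message, "details" => e.details } }
  out.puts JSON.generate(envelope)
  e.exit_code
end
```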
52
+
53
+ ## What this implementation deliberately leaves out
54
+
55
+ - **No process spawning.** Even `stale` does not execute. Build runners do that.
56
+ - **No transport.** No HTTP server, no socket, no MCP server in this gem. Those are downstream wrappers (see [`./conventions.md`](./conventions.md)).
57
+ - **No indexes.** Listing walks the filesystem each time; an index would be premature optimisation for v1.
@@ -0,0 +1,85 @@
1
+ # Conventions
2
+
3
+ Guidelines for shaping a `.textus/` tree, naming keys, organising schemas, and integrating with build runners. The spec ([`../SPEC.md`](../SPEC.md)) defines what's enforceable; this document captures what's *idiomatic*.
4
+
5
+ ## Key naming
6
+
7
+ - **Segments are lowercase, kebab- or snake-case.** The grammar `^[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?$` is the hard limit. Prefer `acme-dashboard` over `acmedashboard` when there's a natural word break.
8
+ - **Lead with the zone in the key path.** `state.projects.acme.dashboard`, not `projects.acme.dashboard`. The zone prefix makes it obvious from the key alone whether a write will be accepted.
9
+ - **Mirror the directory structure.** If `state.projects.acme.dashboard` resolves to `state/projects/acme/dashboard.md`, do not invent shortcuts that diverge.
10
+ - **Don't pluralise the leaf.** `state.network.org.jane`, not `state.network.org.janes`. Pluralise the container, not the entry.
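
The segment grammar can be checked with a few lines of Ruby. This is a sketch, not the gem's validator: `valid_key?` is a hypothetical helper, and it uses `\A`/`\z` instead of `^`/`$` because in Ruby the latter match at line boundaries rather than string boundaries.

```ruby
SEGMENT = /\A[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?\z/

# True when every dot-separated segment matches the grammar.
def valid_key?(key)
  segments = key.split(".", -1) # -1 keeps empty trailing segments
  !segments.empty? && segments.all? { |s| SEGMENT.match?(s) }
end
```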
11
+
12
+ ## Zone layout
13
+
14
+ Recommended top-level layout — the spec allows alternatives, but this is what tooling will default to:
15
+
16
+ ```
17
+ .textus/
18
+   manifest.yaml
19
+   schemas/     # YAML schema definitions
20
+   fixed/       # identity, voice, canon — humans only
21
+   state/       # agent-writable working memory
22
+   derived/     # generated by build runners — never edit by hand
23
+ ```
24
+
25
+ Inside `state/`, group by **domain** (people, projects, decisions, runbooks), not by file type or date. Inside `derived/`, group by **producer** (`derived/catalogs/`, `derived/indexes/`) so it's clear which build job owns what.
26
+
27
+ ## Schema design
28
+
29
+ - **One schema per entry type, not per directory.** `person.yaml`, `project.yaml`, `decision.yaml` — applied across multiple subtrees if the shape matches.
30
+ - **Required = "this entry is meaningless without it."** Everything else is `optional`. Resist the urge to mark organisational metadata (like `tags`) required.
31
+ - **Prefer `enum` over free-text** for low-cardinality fields (relationship type, status, severity). Agents are far better at picking from a list than at producing exact strings.
32
+ - **Cap string lengths** with `max:` where the field has a natural bound (names, summaries). Skip this for the prose body: bodies are not schema-validated, only frontmatter is.
33
+
34
+ ## Owner strings
35
+
36
+ The `owner:` field in the manifest is **advisory metadata**, not an ACL. Use it to label *who's expected to write here*:
37
+
38
+ - `textus:network` — humans curate
39
+ - `agent:planner` — a specific named agent
40
+ - `build:catalog-skills` — a specific build job
41
+
42
+ Tooling around `git blame` or audit logs may filter on owner; the gem itself only echoes it back in envelopes.
43
+
44
+ ## Derived entries and build runners
45
+
46
+ **Always** declare `generator:` on derived entries that participate in any build pipeline. Without it, `textus stale` cannot help — the entry is just an opaque file.
47
+
48
+ ```yaml
49
+ - key: derived.catalogs.skills
50
+   path: derived/catalogs/skills
51
+   zone: derived
52
+   schema: null
53
+   owner: build:catalog-skills
54
+   generator:
55
+     command: "rake catalog:skills"
56
+     sources:
57
+       - state.projects
58
+       - state.network
59
+ ```
60
+
61
+ **The build runner is responsible for writing the `generated:` frontmatter block** when it regenerates. The gem will never synthesize it. A typical lefthook / rake / just integration looks like:
62
+
63
+ ```sh
64
+ textus stale --format=json | jq -r '.[] | .generator.command' | sort -u | while IFS= read -r cmd; do
65
+   eval "$cmd"
66
+ done
67
+ ```
68
+
69
+ `generated.from` SHOULD match `generator.sources` from the manifest — they're the same list, recorded in two places so a diffable file proves what was actually consumed.
70
+
71
+ ## Body content
72
+
73
+ - **Bodies are Markdown.** Headings, lists, code fences — whatever a human or agent finds useful.
74
+ - **The schema does not validate the body.** If a field belongs in structured data, put it in frontmatter, not the body.
75
+ - **Keep entries short.** If a project entry hits 500 lines, it probably wants to be split into sub-entries (e.g. `state.projects.acme.dashboard` + `state.projects.acme.api`) rather than one mega-document.
76
+
77
+ ## Concurrency
78
+
79
+ For multi-writer environments, **always pass `if_etag`** on `put`. The gem treats etag-less writes as last-writer-wins on purpose (single-writer scripts, fresh-file creation), but anything resembling a daemon or a long-running agent should round-trip the etag.
80
+
81
+ ## Pairing with other tools
82
+
83
+ - **MCP servers**: a thin server that exposes `textus get` and `textus put` as tools is the recommended way to give Claude/agents access. Don't bake MCP into this gem.
84
+ - **Vector stores**: index `body` content into a vector store if you want fuzzy retrieval. `frontmatter` stays in textus as the source of truth for deterministic facts.
85
+ - **CI**: run `textus stale` (or `textus list` + schema validation) in CI to catch drift between derived entries and their sources.
data/exe/textus ADDED
@@ -0,0 +1,4 @@
1
+ #!/usr/bin/env ruby
2
+ $LOAD_PATH.unshift(File.expand_path("../lib", __dir__))
3
+ require "textus"
4
+ exit Textus::CLI.run(ARGV)
@@ -0,0 +1,32 @@
1
+ require "json"
2
+ require "time"
3
+
4
+ module Textus
5
+ class AuditLog
6
+ def initialize(root)
7
+ @path = File.join(root, "audit.log")
8
+ end
9
+
10
+ def last_writer_for(key)
11
+ return nil unless File.exist?(@path)
12
+
13
+ File.foreach(@path).map { |l| l.chomp.split("\t") }
14
+ .select { |row| row[3] == key && %w[put delete].include?(row[2]) }
15
+ .last&.fetch(1)
16
+ end
17
+
18
+ def append(role:, verb:, key:, etag_before:, etag_after:, extras: nil)
19
+ fields = [
20
+ Time.now.utc.iso8601, role, verb, key,
21
+ etag_before || "NULL",
22
+ etag_after || "NULL"
23
+ ]
24
+ fields << JSON.generate(extras) if extras && !extras.empty?
25
+ line = fields.join("\t") + "\n"
26
+ File.open(@path, File::WRONLY | File::APPEND | File::CREAT, 0o644) do |f|
27
+ f.flock(File::LOCK_EX)
28
+ f.write(line)
29
+ end
30
+ end
31
+ end
32
+ end
@@ -0,0 +1,191 @@
1
+ require "fileutils"
2
+ require "json"
3
+ require "time"
4
+ require "yaml"
5
+
6
+ module Textus
7
+ class Builder
8
+ def initialize(store)
9
+ @store = store
10
+ @manifest = store.manifest
11
+ @root = store.root
12
+ end
13
+
14
+ def build(prefix: nil)
15
+ built = []
16
+ @manifest.entries.each do |mentry|
17
+ next unless derived_zone?(mentry)
18
+ next unless mentry.projection || mentry.template
19
+ next if prefix && !mentry.key.start_with?(prefix)
20
+
21
+ result = materialize(mentry)
22
+ built << result
23
+ end
24
+ published_leaves = publish_leaves(prefix: prefix)
25
+ { "protocol" => Textus::PROTOCOL, "built" => built, "published_leaves" => published_leaves }
26
+ end
27
+
28
+ private
29
+
30
+ def publish_leaves(prefix: nil)
31
+ repo_root = File.dirname(@root)
32
+ out = []
33
+ @manifest.entries.each do |mentry|
34
+ next unless mentry.nested && mentry.publish_each
35
+ next if prefix && !mentry.key.start_with?(prefix) && !prefix.start_with?("#{mentry.key}.")
36
+
37
+ @manifest.enumerate(prefix: mentry.key).each do |row|
38
+ next unless row[:manifest_entry].equal?(mentry)
39
+ next if prefix && !row[:key].start_with?(prefix)
40
+
41
+ target_rel = mentry.publish_target_for(row[:key])
42
+ target_abs = File.expand_path(File.join(repo_root, target_rel))
43
+ unless target_abs.start_with?(File.expand_path(repo_root) + File::SEPARATOR)
44
+ raise PublishError.new(
45
+ "entry '#{mentry.key}': publish_each target '#{target_rel}' for key '#{row[:key]}' escapes repo root",
46
+ )
47
+ end
48
+
49
+ Publisher.publish(source: row[:path], target: target_abs, store_root: @root)
50
+ out << { "key" => row[:key], "source" => row[:path], "target" => target_abs }
51
+ end
52
+ end
53
+ out
54
+ end
55
+
56
+ def derived_zone?(mentry)
57
+ writers = @manifest.zone_writers(mentry.zone)
58
+ writers.include?("build")
59
+ end
60
+
61
+ def materialize(mentry)
62
+ data =
63
+ if mentry.projection
64
+ Projection.new(@store, mentry.projection).run
65
+ else
66
+ { "entries" => [], "count" => 0, "generated_at" => Time.now.utc.iso8601 }
67
+ end
68
+
69
+ bytes =
70
+ case mentry.format
71
+ when "markdown" then build_markdown(mentry, data)
72
+ when "text" then build_text(mentry, data)
73
+ when "json" then build_structured(mentry, data, "json")
74
+ when "yaml" then build_structured(mentry, data, "yaml")
75
+ else raise UsageError.new("builder: unsupported format #{mentry.format.inspect} for '#{mentry.key}'")
76
+ end
77
+
78
+ target_path = File.join(@root, "zones", mentry.path)
79
+ FileUtils.mkdir_p(File.dirname(target_path))
80
+ File.binwrite(target_path, bytes)
81
+
82
+ publish_and_fire(mentry, target_path)
83
+ { "key" => mentry.key, "path" => target_path, "published_to" => mentry.publish_to }
84
+ end
85
+
86
+ # Markdown: projection -> template -> markdown.serialize(frontmatter, body).
87
+ # Frontmatter carries the legacy `generated:` bookkeeping block. Per plan-1.2 §6,
88
+ # `_meta` ordering applies to structured formats only; markdown keeps existing shape
89
+ # for backward compat with consumers reading frontmatter["generated"]["at"].
90
+ def build_markdown(mentry, data)
91
+ data = data.merge("intro" => Intro.run(@store)) if mentry.inject_intro
92
+ body = render_template!(mentry, data)
93
+ frontmatter = {
94
+ "generated" => {
95
+ "at" => Time.now.utc.iso8601,
96
+ "from" => Array(mentry.projection&.fetch("select", nil)).compact,
97
+ },
98
+ }
99
+ Entry.for_format("markdown").serialize(frontmatter: frontmatter, body: body)
100
+ end
101
+
102
+ # Text: projection -> template -> text.serialize(body). No frontmatter, no _meta.
103
+ def build_text(mentry, data)
104
+ data = data.merge("intro" => Intro.run(@store)) if mentry.inject_intro
105
+ body = render_template!(mentry, data)
106
+ Entry.for_format("text").serialize(frontmatter: {}, body: body)
107
+ end
108
+
109
+ # JSON / YAML pipeline. Templateless = default; template = escape hatch.
110
+ def build_structured(mentry, data, format)
111
+ strategy = Entry.for_format(format)
112
+
113
+ content =
114
+ if mentry.template
115
+ parse_rendered_template!(mentry, data, format)
116
+ else
117
+ # Default rule: if the reducer returned a Hash (it replaced `rows`), use it as-is.
118
+ # Otherwise wrap the entries list as { "entries" => [...] } so the top level is a Hash
119
+ # (required to carry _meta).
120
+ if mentry.projection && mentry.projection["reducer"] && data.is_a?(Hash) && !data.key?("entries")
121
+ data
122
+ elsif data.is_a?(Hash) && data["entries"].is_a?(Array)
123
+ { "entries" => data["entries"] }
124
+ else
125
+ data.is_a?(Hash) ? data : { "entries" => Array(data) }
126
+ end
127
+ end
128
+
129
+ final = inject_meta(content, mentry)
130
+ strategy.serialize(frontmatter: {}, body: "", content: final)
131
+ end
132
+
133
+ def render_template!(mentry, data)
134
+ raise TemplateError.new("entry '#{mentry.key}': #{mentry.format} build requires a template") unless mentry.template
135
+
136
+ tpl_path = File.join(@root, "templates", mentry.template)
137
+ raise TemplateError.new("template not found: #{tpl_path}", template_name: mentry.template) unless File.exist?(tpl_path)
138
+
139
+ Mustache.render(File.read(tpl_path), data)
140
+ end
141
+
142
+ def parse_rendered_template!(mentry, data, format)
143
+ tpl_path = File.join(@root, "templates", mentry.template)
144
+ raise TemplateError.new("template not found: #{tpl_path}", template_name: mentry.template) unless File.exist?(tpl_path)
145
+
146
+ rendered = Mustache.render(File.read(tpl_path), data)
147
+ begin
148
+ parsed =
149
+ case format
150
+ when "json" then ::JSON.parse(rendered)
151
+ when "yaml" then ::YAML.safe_load(rendered, permitted_classes: [Date, Time], aliases: false)
152
+ end
153
+ rescue ::JSON::ParserError, Psych::SyntaxError, Psych::DisallowedClass, Psych::AliasesNotEnabled => e
154
+ raise BadRender.new("entry '#{mentry.key}': template did not render valid #{format}: #{e.message}", format: format)
155
+ end
156
+ unless parsed.is_a?(Hash)
157
+ raise BadRender.new("entry '#{mentry.key}': template must render a top-level object/mapping",
158
+ format: format)
159
+ end
160
+
161
+ parsed
162
+ end
163
+
164
+ # Builds the _meta block per §6 ordering and inserts it as the first top-level key.
165
+ def inject_meta(content_hash, mentry)
166
+ meta = {}
167
+ meta["generated_at"] = Time.now.utc.iso8601
168
+ from = Array(mentry.projection&.fetch("select", nil)).compact
169
+ meta["from"] = from unless from.empty?
170
+ meta["template"] = mentry.template if mentry.template
171
+ reducer = mentry.projection&.dig("reducer")
172
+ meta["reducer"] = reducer if reducer
173
+
174
+ # Rebuild so _meta appears first; user content follows.
175
+ out = { "_meta" => meta }
176
+ content_hash.each { |k, v| out[k] = v unless k == "_meta" }
177
+ out
178
+ end
179
+
180
+ def publish_and_fire(mentry, target_path)
181
+ mentry.publish_to.each do |rel|
182
+ repo_root = File.dirname(@root)
183
+ Publisher.publish(source: target_path, target: File.join(repo_root, rel), store_root: @root)
184
+ end
185
+
186
+ envelope = @store.get(mentry.key)
187
+ @store.fire_event(:build, key: mentry.key, envelope: envelope,
188
+ sources: Array(mentry.projection&.fetch("select", nil)).compact)
189
+ end
190
+ end
191
+ end
@@ -0,0 +1,63 @@
1
+ require "json"
2
+ require "csv"
3
+ require "yaml"
4
+ require "rexml/document"
5
+
6
+ module Textus
7
+ module BuiltinFetchers
8
+ # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
9
+ def self.register_all
10
+ Textus.fetcher(:json) do |config:, store:|
11
+ _ = store
12
+ data = JSON.parse(config["bytes"].to_s)
13
+ { frontmatter: {}, body: YAML.dump(data) }
14
+ end
15
+
16
+ Textus.fetcher(:csv) do |config:, store:|
17
+ _ = store
18
+ rows = CSV.parse(config["bytes"].to_s, headers: true).map(&:to_h)
19
+ { frontmatter: {}, body: YAML.dump(rows) }
20
+ end
21
+
22
+ Textus.fetcher(:"markdown-links") do |config:, store:|
23
+ _ = store
24
+ links = config["bytes"].to_s.scan(%r{\[([^\]]+)\]\((https?://[^)\s]+)\)}).map do |text, href|
25
+ { "text" => text, "href" => href }
26
+ end
27
+ { frontmatter: {}, body: YAML.dump(links) }
28
+ end
29
+
30
+ Textus.fetcher(:"ical-events") do |config:, store:|
31
+ _ = store
32
+ events = []
33
+ current = nil
34
+ config["bytes"].to_s.each_line do |line|
35
+ line = line.strip
36
+ case line
37
+ when "BEGIN:VEVENT" then current = {}
38
+ when "END:VEVENT"
39
+ events << current if current
40
+ current = nil
41
+ when /\A(SUMMARY|DTSTART|DTEND|UID|LOCATION|DESCRIPTION):(.*)\z/
42
+ current[Regexp.last_match(1).downcase] = Regexp.last_match(2) if current
43
+ end
44
+ end
45
+ { frontmatter: {}, body: YAML.dump(events) }
46
+ end
47
+
48
+ Textus.fetcher(:rss) do |config:, store:|
49
+ _ = store
50
+ doc = REXML::Document.new(config["bytes"].to_s)
51
+ items = doc.elements.to_a("//item").map do |item|
52
+ {
53
+ "title" => item.elements["title"]&.text,
54
+ "link" => item.elements["link"]&.text,
55
+ "pubDate" => item.elements["pubDate"]&.text,
56
+ }
57
+ end
58
+ { frontmatter: {}, body: YAML.dump(items) }
59
+ end
60
+ end
61
+ # rubocop:enable Metrics/AbcSize, Metrics/MethodLength
62
+ end
63
+ end