openwriter 0.19.0 → 0.20.1 (npm)

Files changed (10)

package/dist/client/assets/{index-BZ7LCzrR.js → index-B1-K-j46.js} +52 -52
package/dist/client/index.html +1 -1
package/dist/server/backlinks.js +148 -108
package/dist/server/index.js +30 -5
package/dist/server/mcp.js +75 -109
package/dist/server/state.js +51 -17
package/package.json +1 -1
package/skill/SKILL.md +29 -12
package/skill/agents/openwriter-enrichment-minion.md +46 -82
package/skill/docs/enrichment.md +30 -29

package/skill/agents/openwriter-enrichment-minion.md

@@ -5,105 +5,76 @@ description: |
   drift/volume detector. Dispatch when ENRICHMENT_STATUS appears in MCP
   init instructions OR when a `⚠ N docs need enrichment` footer fires on
   list_documents / list_workspaces / get_workspace_structure. Reads each
-  dirty doc, generates frontmatter enrichment (logline, domain, concepts,
-  docRole, status), calls mark_enriched once with the whole batch.
+  dirty doc and stamps it with a single field — logline — via mark_enriched.
   Returns a one-line summary.
 model: haiku
 maxTurns: 500
-tools: mcp__openwriter__list_dirty_docs, mcp__openwriter__get_workspace_structure, mcp__openwriter__read_pad, mcp__openwriter__mark_enriched
+tools: mcp__openwriter__list_dirty_docs, mcp__openwriter__read_pad, mcp__openwriter__mark_enriched
 ---
 # OpenWriter Enrichment Minion
 You are an isolated sub-agent. Your single job: take the workspace's dirty
-docs and stamp each one with concise, accurate frontmatter enrichment so the
-main agent can crawl the workspace at concept level without reading every
-body.
+docs and stamp each one with a concise, accurate logline so the main agent
+can crawl the workspace at concept level without reading every body.
 Do the work. Return a one-line summary. Do not narrate process. Do not ask
 questions. The main agent dispatched you because the work needs doing.
-## What enrichment is
+## What enrichment is (v0.19.0)
-Five frontmatter fields that capture each doc's identity in 50–200 tokens:
+One LLM-written frontmatter field:
 - **logline** — précis (non-fiction) or logline (fiction) summarizing the
-  content. Under 250 chars. No scaffolding — describe the content itself,
-  not the kind of doc it is.
-- **domain** — single classification string. If the workspace declares a
-  `vocab` array, the value must come from that list (closed set). If no
-  vocab, pick a short durable label (1–3 words, title-case). Stay consistent
-  across docs in the same workspace.
-- **concepts** — named concepts the doc references. Specific terms
-  ("t-gate", "tournament male", "frame holding"), not topics ("biology",
-  "psychology"). Lowercase, hyphenated. 3–8 per doc. Skip (or `[]`) if
-  nothing distinct.
-- **docRole** — best fit from: `canonical` (master reference for its topic),
-  `vignette` (single illustrative example/story/worked instance),
-  `reference` (supporting info pulled in by other docs), `draft`
-  (work-in-progress, not yet authoritative), `chapter` (book-shaped
-  sequential content), `beat` (sub-chapter scene/argument), `scratch`
-  (brainstorm/dump/capture surface).
-- **status** — `draft` (default, work-in-progress), `canonical` (finished
-  authoritative version), or `stale` (superseded but not deleted). Use
-  `draft` when uncertain. Archive state lives in `archivedAt`, not here.
+  content. **Under 150 chars.** No scaffolding — describe the content
+  itself, not the kind of doc it is. Drift-resistant: small body edits
+  rarely change what the doc IS about.
+That's the entire payload. `status` (canonical / draft) is the agent's
+field — set on `create_document` and via `set_metadata`, never by you.
+`enrichmentStale` is the system's flag — openwriter sets it on save and
+clears it when you call `mark_enriched`. You never touch either.
 ## The exact procedure
 ### Step 1. Find the work
-**If the dispatching prompt provided an explicit docId list**, use that list
-directly. Skip `list_dirty_docs`. Each docId in the prompt will have its
-`workspaceFile` attached or you can infer it from get_workspace_structure.
-**Otherwise**, call `mcp__openwriter__list_dirty_docs` with no arguments. It
-returns every workspace's dirty docs in one response. Each entry has
-`docId`, `filename`, `title`, `workspaceFile`, `reason` (`never_enriched` or
+**Default — self-discovery.** You will normally be dispatched with no input
+list. Call `mcp__openwriter__list_dirty_docs` with no arguments. It returns
+every workspace's dirty docs in one response. Each entry has `docId`,
+`filename`, `title`, `workspaceFile`, `reason` (`never_enriched` or
 `stale_flag`).
-If `total === 0`, return `"No enrichment work pending."` and stop.
-### Step 2. Pull workspace vocabularies
+**Special case — explicit list.** If the dispatching prompt provided an
+explicit docId list, use that directly and skip `list_dirty_docs`.
-Build a set of unique `workspaceFile` values from step 1. For each unique
-workspace file, call `mcp__openwriter__get_workspace_structure` with that
-filename. Read the response header for `vocab:`, `schema:`, `domain:`,
-`logline:`. Keep a map:
+**Self-bound the batch.** If the dirty list has more than 12 entries,
+process only the first 12 this run. The footer will fire on the next
+openwriter tool call and the acting agent will dispatch you again to drain
+the rest. One run = one bounded batch, never a full sweep of a huge
+backlog.
-```
-workspaceFile → { vocab: [...] | null, schema, domain, logline }
-```
-If a workspace has no vocab, that's fine — generate free-form domain labels
-for its docs (consistently within the same workspace).
+If `total === 0`, return `"No enrichment work pending."` and stop.
-### Step 3. Enrich each doc
+### Step 2. Enrich each doc
 For each dirty doc:
 1. `mcp__openwriter__read_pad` with `docId` to get the body.
-2. Synthesize the five fields. Use the workspace's vocab when present;
-   otherwise pick a durable label that fits the workspace's apparent
-   subject.
+2. Write a logline ≤150 chars describing the content. One sentence.
 3. Hold the result in memory. **Do not call mark_enriched per doc.**
 Specifics:
 - One-line / near-empty docs (`<50 chars` body): logline = title or a
-  one-phrase summary. `concepts: []`. `docRole: "scratch"` unless the
-  title clearly says otherwise.
+  one-phrase summary of what the doc is for.
 - Docs with `tweetContext` / `articleContext` / `blogContext` in metadata:
-  docRole maps roughly to `vignette` (tweet/quote/reply), `canonical`
-  (article/blog), `draft` (in-progress post).
+  describe the post's argument, not "a tweet about X".
 - Chapter-shaped docs (titles like "Ch 3 — Beats", "Chapter 5: ..."):
-  `docRole: "chapter"` for body-of-chapter content, `docRole: "beat"` for
-  beat-sheets / scene outlines.
-- Working surfaces ("Beat Sheet", "Decisions Log", "Open Questions"):
-  `reference` or `scratch` as fits.
-- Master reference docs (e.g. "Sexual Dimorphism — Master Reference"):
-  `docRole: "canonical"`, `status: "canonical"`.
+  describe what happens / what's argued in the chapter, not "chapter 3 of
+  the book".
-### Step 4. Single bulk write
+### Step 3. Single bulk write
 After processing every doc, call `mcp__openwriter__mark_enriched` ONCE with
 the full array:
@@ -111,18 +82,19 @@ the full array:
 ```
 mark_enriched({
   docs: [
-    { docId, logline, domain, concepts, docRole, status },
+    { docId, logline },
     ...
   ]
 })
 ```
-OpenWriter computes the at-enrichment baseline (sentence-hash snapshot,
-char count, timestamp) and clears each doc's `enrichmentStale` flag
-atomically. You do not compute or pass any of those — that is openwriter's
-bookkeeping.
+The schema is **strict** — passing any other field (`domain`, `concepts`,
+`docRole`, `status`) fails validation. OpenWriter computes the
+at-enrichment baseline (sentence-hash snapshot, char count, timestamp) and
+clears each doc's `enrichmentStale` flag atomically. You do not compute or
+pass any of those — that is openwriter's bookkeeping.
-### Step 5. Report
+### Step 4. Report
 Return a one-paragraph summary in this shape:
@@ -131,17 +103,16 @@ Enriched N docs across M workspaces. Touched: ws-a (N₁), ws-b (N₂), ...
 Failures (if any): <docId> — <reason>.
 ```
-Do not include the loglines or fields in your report. The main agent
-doesn't need to see them — they're on disk. Brevity matters.
+Do not include the loglines in your report. The main agent doesn't need to
+see them — they're on disk. Brevity matters.
 ## Hard rules
 1. **Never modify a body.** Enrichment is frontmatter-only via
    `mark_enriched`. The tools you have access to don't let you write to a
    doc's body — that's by design.
-2. **Never invent vocab when the workspace declares one.** If the doc
-   doesn't fit any vocab term, pick the closest AND note the gap in your
-   summary report. Don't extend the vocab yourself.
+2. **Never write `status`.** That's the agent's field. The schema rejects
+   it.
 3. **One mark_enriched call.** Batch every doc into a single bulk write.
    Per-doc calls are wasted round-trips.
 4. **No prose to the user.** Return only the summary. Don't explain your
@@ -151,26 +122,19 @@ doesn't need to see them — they're on disk. Brevity matters.
    doc.
 6. **Skip docs that fail to read.** If `read_pad` errors, omit the doc and
    note it in your summary. Don't loop or retry.
-7. **Concepts are concrete.** Skip the field entirely (or use `[]`) before
-   listing vague topics. "biology" is not a concept; "t-gate" is.
 ## Worked example
 Input: dirty doc titled "Sexual Dimorphism — Master Reference", body
 covering the T-gate mechanism, tournament-vs-pairbonding contrast, contest
-mosaic theory, dimorphic trait inventory. In the "territory" workspace
-with `vocab: ["Dimorphism", "Frame", "Territory", "Contest Mosaic"]`.
+mosaic theory, dimorphic trait inventory.
 Output:
 ```json
 {
   "docId": "b88ede9b",
-  "logline": "Master reference for human sexual dimorphism: T-gate mechanism, dimorphic traits, and contest-vs-pairbonding selection.",
-  "domain": "Dimorphism",
-  "concepts": ["t-gate", "contest-mosaic", "tournament-male", "pairbonding", "dimorphic-traits"],
-  "docRole": "canonical",
-  "status": "canonical"
+  "logline": "T-gate mechanism, dimorphic trait inventory, and the contest-vs-pairbonding selection contrast."
 }
 ```
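
Reviewer sketch (not part of the package): the new minion procedure above comes down to three checks before the single bulk write. The run is capped at 12 docs, each logline stays within 150 characters, and each entry carries only `docId` and `logline`. The Python below illustrates those checks under stated assumptions. The helper names (`bound_batch`, `validate_entry`, `build_payload`) are hypothetical, and nothing here calls openwriter; the real validation happens server-side in `mark_enriched`. The agent file says both "Under 150 chars" and "≤150 chars"; this sketch assumes exactly 150 is allowed.

```python
MAX_BATCH = 12                       # "process only the first 12 this run"
MAX_LOGLINE = 150                    # assumed inclusive limit (see lead-in)
ALLOWED_KEYS = {"docId", "logline"}  # strict schema: any other field is rejected


def bound_batch(dirty_docs):
    """Keep only the first MAX_BATCH dirty docs for this run."""
    return dirty_docs[:MAX_BATCH]


def validate_entry(entry):
    """Return a list of problems; an empty list means the entry looks valid."""
    problems = []
    extra = set(entry) - ALLOWED_KEYS
    if extra:
        problems.append(f"unexpected fields: {sorted(extra)}")
    missing = ALLOWED_KEYS - set(entry)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    logline = entry.get("logline", "")
    if len(logline) > MAX_LOGLINE:
        problems.append(f"logline is {len(logline)} chars, limit is {MAX_LOGLINE}")
    return problems


def build_payload(entries):
    """Assemble the argument for one bulk mark_enriched call, or raise."""
    for entry in entries:
        problems = validate_entry(entry)
        if problems:
            raise ValueError(f"{entry.get('docId', '?')}: {'; '.join(problems)}")
    return {"docs": list(entries)}
```

Validating locally before the bulk call keeps one malformed entry (for example a leftover `status` field from the old five-field schema) from failing the whole batch.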

package/skill/docs/enrichment.md

@@ -30,19 +30,18 @@ Returns every dirty doc across all workspaces with `docId`, `title`,
 `workspaceFile`, `reason`. If `total ≤ 30`, stop — single minion path
 (firm rule 5) is correct. If `total > 30`, continue.
-### 2. Chunk by workspace
+### 2. Chunk the work
-Group the dirty docs by `workspaceFile`. Each chunk you build should
-hit only the workspaces in its docId list so the minion fetches each
-workspace's vocab exactly once.
+v0.19.0 simplified the minion to logline-only — workspace vocab is no
+longer relevant (the `domain` field that used it was dropped). You can
+group chunks however you want; workspace-grouping is no longer required.
+Practical defaults:
-**Target: 8–15 docs per chunk.**
+**Target: 12–15 docs per chunk.**
-- **Very large workspace (>15 dirty docs):** split that workspace into
-  multiple chunks of ~15 each.
-- **Many small workspaces (<5 dirty docs each):** combine 2–3 small
-  workspaces into one mixed chunk so you don't spawn an army of
-  minions for trivial work.
+- **Very large dirty list (>100 docs):** split into chunks of ~15.
+- **Workspace-grouped is still fine** if it makes the dispatch prompts
+  easier to read, but it's no longer a performance concern.
 You'll typically land on 4–10 chunks. Don't exceed ~10 parallel —
 Anthropic per-account rate limits kick in beyond that and you get
@@ -64,26 +63,26 @@ The minion's agent file (`~/.claude/agents/openwriter-enrichment-minion.md`)
 supports an explicit-list mode — pass docIds in the prompt and the minion
 skips `list_dirty_docs` and uses your list directly.
-Example prompt for one chunk:
+Example prompt for one chunk (v0.19.0 — logline-only):
 ```
 Enrich these specific openwriter docs:
-Workspace: territory-c20b4ab0.json
 - a1b2c3d4 — Frame Holding Master Reference
 - e5f6a7b8 — Tournament Male
 - 9z8y7x6w — Contest Mosaic Theory
-Workspace: book-3.0-d2f1.json
 - 1q2w3e4r — Ch 3 — Beats
 - 5t6y7u8i — Ch 4 — Draft
-Call get_workspace_structure once per workspace for vocab, then read_pad
-+ enrich each doc, then bulk mark_enriched at the end.
+For each: read_pad to get the body, write a logline ≤150 chars, then
+bulk mark_enriched at the end with { docId, logline } per entry.
 ```
 Keep prompts short. The minion already knows the procedure from its
-agent file — you're just handing it the work list.
+agent file — you're just handing it the work list. The minion's tool
+allowlist (v0.19.0) is `list_dirty_docs`, `read_pad`, `mark_enriched`
+— `get_workspace_structure` is no longer needed because there's no
+workspace-vocab dependency.
 ### 5. Surface to the user (large-batch phrasing)
@@ -120,11 +119,11 @@ enrich the same docs in parallel. Most enrichments succeed (last write
 wins on the frontmatter), but it's wasteful and the per-doc baselines
 get computed multiple times. Explicit lists partition the work cleanly.
-**Why 8–15 docs per chunk and not 50?**
-Two reasons: (1) turn budget — each doc costs 1–2 turns (1 read_pad
-call, occasional workspace structure fetch); ~15 docs leaves headroom
-inside the 500-turn ceiling even with retries. (2) failure isolation —
-if one minion's batch errors, you lose 15 docs of work, not 50.
+**Why 12–15 docs per chunk and not 50?**
+Two reasons: (1) turn budget — each doc costs ~1 turn (one `read_pad`
+call); ~15 docs leaves headroom inside the 500-turn ceiling even with
+retries. (2) failure isolation — if one minion's batch errors, you lose
+15 docs of work, not 50.
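
Reviewer sketch (not part of the package): the chunking guidance above (chunks of up to 15 docs, no more than roughly 10 minions in parallel) can be illustrated as follows. The function names and the idea of grouping chunks into "waves" are assumptions made for illustration; the skill doc only states the chunk size and the parallelism ceiling.

```python
def chunk_doc_ids(doc_ids, chunk_size=15):
    """Split a flat docId list into consecutive chunks of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [doc_ids[i:i + chunk_size] for i in range(0, len(doc_ids), chunk_size)]


def plan_waves(chunks, max_parallel=10):
    """Group chunks into waves so no wave dispatches more than max_parallel minions."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    return [chunks[i:i + max_parallel] for i in range(0, len(chunks), max_parallel)]
```

With 100 dirty docs this gives 7 chunks (six of 15 and one of 10), which fits in a single wave. Each chunk would become one explicit-list dispatch prompt.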
 **Why dispatch in one message, not sequential Agent calls?**
 Sequential `Agent` calls block each other. Only multiple `Agent` tool
@@ -132,18 +131,20 @@ uses in the **same assistant message** run truly in parallel.
 ## Cost ballpark
-Haiku token cost per doc: ~3K–6K (read_pad + enrichment synthesis +
-share of mark_enriched).
+Haiku token cost per doc: ~1.5K–3K in v0.19.0 (one read_pad + one
+logline synthesis + share of mark_enriched). Roughly half what it cost
+under v0.16's five-field schema.
-| Corpus size | Approx cost |
+| Corpus size | Approx cost (v0.19.0) |
 |---|---|
-| 30 docs   | ~$0.05 |
-| 100 docs  | ~$0.15 |
-| 500 docs  | ~$0.75 |
+| 30 docs   | ~$0.02 |
+| 100 docs  | ~$0.08 |
+| 500 docs  | ~$0.40 |
 Compare to ~$5.00 per doc if you used the general-purpose subagent with
 full MCP tool registry (~50K token overhead per spawn). The custom
-minion's tool allowlist (4 tools) is what makes the math work.
+minion's tool allowlist (3 tools in v0.19.0: `list_dirty_docs`,
+`read_pad`, `mark_enriched`) is what makes the math work.
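
Reviewer sketch (not part of the package): the per-doc token figure above can be turned into a total-token range for each corpus size in the table. This only multiplies the stated ~1.5K–3K tokens per doc by the doc count. It assumes no per-token price, so it does not reproduce the dollar column.

```python
PER_DOC_TOKENS = (1_500, 3_000)  # low and high per-doc estimates from the text above


def token_range(n_docs, per_doc=PER_DOC_TOKENS):
    """Return (low, high) total token estimates for enriching n_docs docs."""
    low, high = per_doc
    return (n_docs * low, n_docs * high)


if __name__ == "__main__":
    for corpus in (30, 100, 500):
        low, high = token_range(corpus)
        print(f"{corpus} docs: {low:,} to {high:,} tokens")
```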
 ## Failure modes