@openplaybooks/converge 0.2.0
- package/LICENSE +21 -0
- package/README.md +131 -0
- package/dist/index.js +212278 -0
- package/package.json +54 -0
- package/skills/converge-control/SKILL.md +208 -0
- package/skills/converge-control/reference/cli.md +128 -0
- package/skills/converge-control/reference/events.md +165 -0
- package/skills/converge-control/troubleshooting/playbook.md +367 -0
- package/skills/converge-development/SKILL.md +303 -0
- package/skills/converge-development/reference/framework-map.md +294 -0
- package/skills/converge-development/reference/observability.md +132 -0
- package/skills/converge-development/troubleshooting/playbook.md +213 -0
- package/skills/converge-planning/SKILL.md +302 -0
- package/skills/converge-planning/references/anti-patterns.md +35 -0
- package/skills/converge-planning/references/model.md +317 -0
- package/skills/converge-planning/references/patterns.md +169 -0
- package/skills/converge-planning/references/phases.md +168 -0
- package/skills/converge-planning/references/schema.md +313 -0
- package/skills/converge-planning/references/static-dynamic.md +38 -0
- package/skills/converge-planning/references/tests.md +91 -0
|
@@ -0,0 +1,317 @@
|
|
|
1
|
+
# Model Reference
|
|
2
|
+
|
|
3
|
+
Full model reference for converge-planning. Read when you need to understand goal decomposition, convergence, delegation-contract theory, DAG semantics, or the three principles in depth. For the abbreviated version, see `../SKILL.md`.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## The Model: Goal → Deliverable Sub-Goals → Convergence
|
|
8
|
+
|
|
9
|
+
**A playbook starts with a goal and decomposes into deliverable sub-goals.** The user wants a complete, usable result. If that result is too large for one agent, split it into smaller complete results. Each sub-goal produces its own deliverable. The parent converges those deliverables into the unified whole.
|
|
10
|
+
|
|
11
|
+
```
|
|
12
|
+
USER'S GOAL: "Working payment dashboard"
|
|
13
|
+
│
|
|
14
|
+
├── Sub-goal A: Database schema + seed data → migration.sql + seed.sql
|
|
15
|
+
├── Sub-goal B: Payment API endpoints → working API with passing tests
|
|
16
|
+
├── Sub-goal C: Dashboard UI → rendered dashboard with live data
|
|
17
|
+
└── Sub-goal D: Auth + permissions → login flow with role checks
|
|
18
|
+
|
|
19
|
+
Each sub-goal produces a complete deliverable. The parent converges them.
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
This pattern is recursive. Sub-goal B ("Payment API") might further split into "POST /charges endpoint," "GET /transactions endpoint," and "Webhook handler" — each a complete, testable deliverable.
|
|
23
|
+
|
|
24
|
+
### The three phases of execution
|
|
25
|
+
|
|
26
|
+
**1. Decompose** — The task analyzes its goal and identifies the deliverable sub-goals that together achieve it. It writes a contract for each child: scope, expected deliverable (outputs), checks. The set of children's deliverables must form a **complete cover** of the parent's goal — nothing left unassigned, no overlap.
|
|
27
|
+
|
|
28
|
+
**2. Execute** — Children produce their deliverables independently. They don't know about each other. Each reads its declared `inputs:`, does its work, produces its declared `outputs:`. Children of the same parent can run in parallel when their `depends_on` edges allow it.
|
|
29
|
+
|
|
30
|
+
**3. Converge** — The parent gathers children's deliverables, integrates them, and produces the converged result. This is active work, not passive grouping. The parent reads children's files via its own `inputs:`, synthesizes, validates cross-child consistency, and produces its `outputs:` — the integrated deliverable.
|
|
31
|
+
|
|
32
|
+
**The convergence step is what makes a parent a real task.** A container without convergence is just a folder — it groups children but adds no value. A container with convergence produces something none of its children produce individually: the integrated result.
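The complete-cover condition from the Decompose phase (nothing left unassigned, no overlap) can be checked mechanically once scopes are written down. Below is a minimal sketch, assuming each scope is modelled as a set of plain labels. The labels and the `cover_problems` helper are hypothetical illustrations, not part of converge.

```python
from itertools import combinations


def cover_problems(parent_scope, children):
    """Compare a parent's scope against its children's scopes.

    parent_scope: set of labels making up the parent's goal (hypothetical model).
    children: dict mapping child name -> set of labels that child owns.
    """
    assigned = set().union(*children.values())
    overlaps = {}
    for a, b in combinations(sorted(children), 2):
        shared = children[a] & children[b]
        if shared:
            overlaps[(a, b)] = shared
    return {
        "unassigned": parent_scope - assigned,  # goal parts no child owns
        "overlaps": overlaps,                   # parts owned by more than one child
    }


# The payment dashboard split into sub-goals A-D, reduced to illustrative labels.
goal = {"schema", "api", "ui", "auth"}
children = {"A": {"schema"}, "B": {"api"}, "C": {"ui"}, "D": {"auth"}}
report = cover_problems(goal, children)
```

An empty `unassigned` set and an empty `overlaps` dict mean the children form a complete cover under this simplified model.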
|
|
33
|
+
|
|
34
|
+
### Example: three-level goal decomposition
|
|
35
|
+
|
|
36
|
+
```
|
|
37
|
+
Goal A: "Build Dashboard"
|
|
38
|
+
├── DECOMPOSE: split into Data Pipeline + UI Components
|
|
39
|
+
├── CHILDREN EXECUTE:
|
|
40
|
+
│ ├── B: "Data Pipeline"
|
|
41
|
+
│ │ ├── DECOMPOSE: split into B1 (raw data) + B2 (clean data)
|
|
42
|
+
│ │ ├── B1 produces raw-data.json, B2 produces clean-data.json
|
|
43
|
+
│ │ └── CONVERGE: B validates schema, joins, produces data.json
|
|
44
|
+
│ └── C: "UI Components"
|
|
45
|
+
│ ├── DECOMPOSE: split into C1 (charts) + C2 (tables)
|
|
46
|
+
│ ├── C1 produces charts/, C2 produces tables/
|
|
47
|
+
│ └── CONVERGE: C validates components, produces components/
|
|
48
|
+
└── CONVERGE: A reads data.json + components/, assembles dashboard,
|
|
49
|
+
validates integration (data binds to UI), produces dashboard/
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
A's `outputs:` is `dashboard/` — the converged dashboard. B's `outputs:` is `data.json` — the converged data. C's `outputs:` is `components/` — the converged UI. Each level adds value through convergence.
|
|
53
|
+
|
|
54
|
+
### Why convergence matters
|
|
55
|
+
|
|
56
|
+
- **Integration bugs surface at the right level.** If B's data doesn't bind to C's charts, A's convergence check catches it — not the user.
|
|
57
|
+
- **The DAG encodes real dependencies.** A depends on B *because* A's convergence reads B's output. The edge has semantic meaning.
|
|
58
|
+
- **Re-running is surgical.** If B1 fails, re-run B's subtree (B1 → B2 → B converge). C is untouched.
|
|
59
|
+
- **Each level can be validated independently.** B's convergence check validates the data pipeline in isolation. A's convergence check validates the integration.
|
|
60
|
+
|
|
61
|
+
**The TASK.md body is the converge prompt.** Decomposition is handled by static children or by runtime spawn commands emitted from a passthrough container. The body contains only convergence instructions — what to read, how to integrate, what to validate. It runs after children complete.
|
|
62
|
+
|
|
63
|
+
---
|
|
64
|
+
|
|
65
|
+
## The Delegation Contract
|
|
66
|
+
|
|
67
|
+
The division-convergence model is implemented through delegation contracts. **A TASK.md is the contract for one node's division, execution, and convergence.**
|
|
68
|
+
|
|
69
|
+
A parent owns the larger problem; children own bounded sub-problems the parent has handed off. Each task is **self-contained** — its `TASK.md` fully specifies the contract: scope, inputs, outputs, acceptance checks. Like a company: a director owns "ship the product" and delegates to a team lead, who delegates to an engineer, who delegates to a junior. At every level the work is bounded, specified, and accepted by checks. Nobody upstream micromanages downstream; nobody downstream second-guesses the parent's choice of sub-scopes.
|
|
70
|
+
|
|
71
|
+
Every `TASK.md` has six contract parts:
|
|
72
|
+
|
|
73
|
+
| Contract part | TASK.md field | What it specifies |
|
|
74
|
+
|---|---|---|
|
|
75
|
+
| **Scope** | `title` + `description` + body | The bounded problem this task owns — includes both division logic and convergence logic |
|
|
76
|
+
| **Inputs** (Context In) | `inputs:` | Files the executor reads — children's outputs (for convergence) and upstream data |
|
|
77
|
+
| **Outputs** (Context Out) | `outputs:` | Files the executor produces — the *converged* result for this level |
|
|
78
|
+
| **Acceptance** | `checks:` | Deterministic predicates that decide done/not-done — must include convergence validation |
|
|
79
|
+
| **Resources** | `skills:`, `references:`, `vars:` | Tools and data the executor may use |
|
|
80
|
+
| **Dependencies** | `depends_on:` | Tasks that must complete first — children, upstream siblings |
|
|
81
|
+
|
|
82
|
+
A contract is **leaky** when any part is missing, vague, or over-broad. Leaky contracts break the chain: the executor either can't complete the work or has to read outside its scope.
|
|
83
|
+
|
|
84
|
+
### The convergence handshake
|
|
85
|
+
|
|
86
|
+
```
|
|
87
|
+
Parent declares in its TASK.md:
|
|
88
|
+
depends_on: [B, C] # "I need these before I can converge"
|
|
89
|
+
inputs: # "I'll read these to converge"
|
|
90
|
+
- B/data.json
|
|
91
|
+
- C/components/
|
|
92
|
+
|
|
93
|
+
Child B declares in its TASK.md:
|
|
94
|
+
outputs: # "I'll produce this for my parent"
|
|
95
|
+
- B/data.json
|
|
96
|
+
|
|
97
|
+
Child C declares in its TASK.md:
|
|
98
|
+
outputs:
|
|
99
|
+
- C/components/
|
|
100
|
+
```
|
|
101
|
+
|
|
102
|
+
The file paths are the handshake. Parent says "I expect these files." Children say "I produce these files." When paths match, the DAG edge is wired. When they don't, it's a leaky contract.
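The path-matching idea can be sketched as a small check. This assumes the declarations have already been parsed into plain Python lists; the `unmatched_inputs` helper is hypothetical and only covers inputs the parent expects from its children, not other upstream data.

```python
def unmatched_inputs(parent_inputs, child_outputs):
    """Return parent inputs that no child declares as an output.

    parent_inputs: list of paths from the parent's `inputs:`.
    child_outputs: dict mapping child name -> list of paths from its `outputs:`.
    """
    produced = {path for outputs in child_outputs.values() for path in outputs}
    return [path for path in parent_inputs if path not in produced]


# Declarations copied from the handshake example above.
parent_inputs = ["B/data.json", "C/components/"]
child_outputs = {"B": ["B/data.json"], "C": ["C/components/"]}
wired = unmatched_inputs(parent_inputs, child_outputs)

# A one-character path mismatch on child C leaves the parent's input unproduced.
leaky = unmatched_inputs(parent_inputs, {"B": ["B/data.json"], "C": ["C/component/"]})
```

An empty result means every expected file has a producer. Any path left over points at a leaky contract.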
|
|
103
|
+
|
|
104
|
+
### Containers: division + convergence, not just grouping
|
|
105
|
+
|
|
106
|
+
A container task's body has two sections:
|
|
107
|
+
|
|
108
|
+
1. **Division instructions** — how to split the scope into children. What each child owns. What template to use. What data drives the division.
|
|
109
|
+
2. **Convergence instructions** — after children complete, how to integrate their outputs. What to validate across children. What the converged output looks like.
|
|
110
|
+
|
|
111
|
+
A container without convergence instructions is a red flag. Ask: *what does this container produce that none of its children produce individually?* If the answer is "nothing," collapse it.
|
|
112
|
+
|
|
113
|
+
### Decompose scope, not process
|
|
114
|
+
|
|
115
|
+
**A task is a contract of *result*, not a contract of *process*.** The scope is *what exists when this is done*, expressed as `outputs:` + `checks:`. The body says how to do the work, but the contract doesn't bind the executor to specific steps — only to the result.
|
|
116
|
+
|
|
117
|
+
When you split a parent into children, you're splitting *the result* into smaller results, each owned by one child. You are **not** splitting "the workflow we run to produce the result" into stages.
|
|
118
|
+
|
|
119
|
+
| Process decomposition (wrong) | Scope decomposition (right) |
|
|
120
|
+
|---|---|
|
|
121
|
+
| `001-spec`, `002-author`, `003-prompts`, `004-concepts` — the four stages of a per-token pipeline, each operating on *all tokens* | `001-catalog` writes `tokens-catalog.json`; `002-craft` is a dynamic container that, for each entry, spawns the per-token pipeline and produces `{token.md, prompt.md, concept.png}` |
|
|
122
|
+
| `001-fetch-data`, `002-clean-data`, `003-analyze-data` — three stages over the same dataset | `001-build-dataset` produces `dataset.parquet` (cleaned); `002-report` produces `report.md` from it |
|
|
123
|
+
| `001-design`, `002-implement`, `003-test` per feature, repeated across N features | `001-spec` lists the N features; `002-deliver` is a dynamic container per feature, each owning its own design→build→test internally |
|
|
124
|
+
|
|
125
|
+
**The diagnostic — three questions to ask of any sibling set:**
|
|
126
|
+
|
|
127
|
+
1. **Are the names verbs or nouns?** Process stages are verbs (*author*, *fetch*, *clean*, *implement*). Scope chunks are nouns (*catalog*, *dataset*, *feature-X*). Verbs are a smell.
|
|
128
|
+
2. **Does each child produce a *complete* result, or a partial product the next stage finishes?** If child N's outputs are inputs to child N+1 over the same population, you've sliced the workflow horizontally — that's process. Scope decomposition slices the *population* (per-token, per-feature, per-scene); each child owns one slice end-to-end.
|
|
129
|
+
3. **What does failure look like?** Process: stage 3 fails on the 17th item, the whole stage retries. Scope: the 17th-item task fails, the other 16 stand. Re-running is just re-running that one task.
|
|
130
|
+
|
|
131
|
+
Why scope wins:
|
|
132
|
+
|
|
133
|
+
- **Failures are small.** A spawned child for one token re-runs in isolation; a bulk-stage task retries every token because the contract says "every token."
|
|
134
|
+
- **Cost is visible.** Each scope-child's cost is its own line in the journal. Process-stages aggregate cost across the whole population — you see it after the fact.
|
|
135
|
+
- **The contract is closed.** A scope-child's `outputs:` are *its* deliverables. A process-stage's outputs are intermediate goo the next stage consumes — the contract leaks across the seam.
|
|
136
|
+
- **Parallelism is implicit.** Per-entity scope children run in parallel for free. Process stages serialize.
|
|
137
|
+
|
|
138
|
+
If a child's name reads "the part of the workflow where we ___", rewrite it. The right name reads "the ___ that exists when this is done."
|
|
139
|
+
|
|
140
|
+
The body of a task can still describe *how* to produce the result (which scripts, which order, which APIs). That's executor guidance, not the contract. The contract is the `outputs:` + `checks:` pair.
|
|
141
|
+
|
|
142
|
+
### Files are the contract currency
|
|
143
|
+
|
|
144
|
+
**Tasks pass work through files, not through TASK.md bodies.** A `TASK.md` contains *instructions only* — the scope, the references, the checks. The actual work product — specs, designs, code, data, reports — lives in files declared under `outputs:`, and downstream tasks (including the parent's convergence step) consume them via `inputs:`. The contract is the only thing inlined; everything else is a file path.
|
|
145
|
+
|
|
146
|
+
| Goes inline in TASK.md | Goes in a file |
|
|
147
|
+
|---|---|
|
|
148
|
+
| Scope, instructions, acceptance criteria | Specs, requirements, designs |
|
|
149
|
+
| Skill references, tool names | Code, data, configs |
|
|
150
|
+
| Short literal vars (≤ 10 lines) | Anything > 10 lines, anything reused, anything structured |
|
|
151
|
+
|
|
152
|
+
Why it matters:
|
|
153
|
+
|
|
154
|
+
- **Convergence only works if children's outputs are files.** The parent's convergence step reads files, not prompts. If a child's result lives only in its execution trace, the parent can't converge it.
|
|
155
|
+
- **Delegation only works if the work product is portable.** If task B needs task A's spec, A writes the spec to a file; B reads it. Stuffing the spec into B's prompt couples B to A's wording and bloats context.
|
|
156
|
+
- **Self-containment is verifiable.** "Can this executor complete the work given only TASK.md + declared inputs?" is answerable when work flows through files. It's not when context leaks via shared prompts.
|
|
157
|
+
- **Format matches use.** Markdown for specs and instructions humans read; JSON for structured data machines parse; JSONL for append-only event streams.
|
|
158
|
+
|
|
159
|
+
**Rule of thumb:** if you're tempted to paste content into a TASK.md body that another task will need, you've found a missing artifact. Have the producing task write a file and declare it under `outputs:`; have the consumer declare it under `inputs:`. The TASK.md body is for *how to do the work*, not *what to do it with*.
|
|
160
|
+
|
|
161
|
+
### No middle work
|
|
162
|
+
|
|
163
|
+
**Every task output must be a complete, usable deliverable.** This is the most expensive rule to violate. Middle work — partial results that the next task finishes — breaks the contract chain and makes verification impossible at the task level.
|
|
164
|
+
|
|
165
|
+
**The diagnostic — three questions for every task:**
|
|
166
|
+
|
|
167
|
+
1. **"Can someone use this output directly?"** If the output is instructions, plans, specs, or partial work that needs further processing before it's usable — it's middle work. The task isn't done.
|
|
168
|
+
2. **"Does the next task finish this output, or consume it?"** If *finish* (the next task continues building the same thing) → middle work. Decompose into two complete deliverables instead. If *consume* (reads it as a complete input to produce its own distinct deliverable) → correct.
|
|
169
|
+
3. **"Is this a complete thing that exists, or a stage of producing a thing?"** If *stage* → middle work. The right decomposition splits the *population* (per-entity, per-endpoint, per-feature), each owning its end-to-end result.
|
|
170
|
+
|
|
171
|
+
**Examples:**
|
|
172
|
+
|
|
173
|
+
| Middle work (wrong) | Complete deliverable (right) |
|
|
174
|
+
|---|---|
|
|
175
|
+
| `design-database` → `implement-database` — design is a stage, implementation finishes it | `database-schema` produces migration.sql + seed.sql (complete, runnable) |
|
|
176
|
+
| `spec-api` → `build-api` — spec is a stage, build finishes it | `charges-endpoint` produces working endpoint with passing tests |
|
|
177
|
+
| `prepare-project` — installs deps, creates folders (not usable) | `project-skeleton` produces runnable app with health-check endpoint |
|
|
178
|
+
| `analyze-codebase` — produces analysis.md (planning artifact) | Not a task at all — research the AI does while planning |
|
|
179
|
+
|
|
180
|
+
**The golden rule:** if you can't hand the output to a user and they can use it, the task isn't done. Split differently.
|
|
181
|
+
|
|
182
|
+
**Middle work vs. convergence:** A parent converging children's deliverables is *not* middle work — the children produced complete deliverables, and the parent produces a new complete deliverable (the integration). The key distinction: children's outputs are complete on their own; convergence adds integration value, not completion value.
|
|
183
|
+
|
|
184
|
+
### Requirement coverage
|
|
185
|
+
|
|
186
|
+
**Before writing any contract, verify every user requirement maps to at least one sub-goal.** Missing requirements are the second most expensive mistake after middle work.
|
|
187
|
+
|
|
188
|
+
**The process:**
|
|
189
|
+
|
|
190
|
+
1. **List every requirement** extracted from the user's prompt and discovery. Number them (R1, R2, R3...). Be specific: "Users can reset their password via email link" not "auth features."
|
|
191
|
+
2. **Map each requirement to sub-goal(s).** For each requirement, identify which sub-goal's deliverable fulfills it. One requirement may map to multiple sub-goals. One sub-goal may fulfill multiple requirements.
|
|
192
|
+
3. **Flag gaps.** A requirement with zero mappings → missing sub-goal. Add one or adjust an existing sub-goal's scope.
|
|
193
|
+
4. **Flag creep.** A sub-goal with zero mapped requirements → it doesn't serve the user's goal. Remove it or explicitly justify why it's necessary infrastructure (e.g., "CI/CD setup" even if not explicitly requested).
|
|
194
|
+
5. **Check the union.** Reading all sub-goal deliverables together, would a user say "yes, that's what I asked for"? If not, what's missing?
|
|
195
|
+
|
|
196
|
+
**Example:**
|
|
197
|
+
|
|
198
|
+
```
|
|
199
|
+
User goal: "Blog with comments and RSS feed"
|
|
200
|
+
|
|
201
|
+
Requirements:
|
|
202
|
+
R1: Author can write and publish posts
|
|
203
|
+
R2: Readers can leave comments on posts
|
|
204
|
+
R3: RSS feed of published posts
|
|
205
|
+
R4: Posts support markdown formatting
|
|
206
|
+
R5: Mobile-responsive design
|
|
207
|
+
|
|
208
|
+
Sub-goal mapping:
|
|
209
|
+
A: "Post CRUD + publishing" → R1, R4
|
|
210
|
+
B: "Comment system" → R2
|
|
211
|
+
C: "RSS feed endpoint" → R3
|
|
212
|
+
D: "Responsive layout" → R5
|
|
213
|
+
E: "Database + auth" → (infrastructure, serves A, B)
|
|
214
|
+
|
|
215
|
+
R1 ✓ R2 ✓ R3 ✓ R4 ✓ R5 ✓ — full coverage
|
|
216
|
+
E has no direct requirement → justified as shared infrastructure
|
|
217
|
+
```
|
|
218
|
+
|
|
219
|
+
This check takes about two minutes and catches the gaps that cause rework downstream.
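The gap and creep steps can be sketched in a few lines. This is an illustration only: the `coverage_report` helper and its data shapes are assumptions, and the union check (step 5) still needs human judgment.

```python
def coverage_report(requirements, mapping, infrastructure=frozenset()):
    """Find requirement gaps and scope creep.

    requirements: list of requirement ids, e.g. ["R1", "R2"].
    mapping: dict mapping sub-goal name -> list of requirement ids it fulfills.
    infrastructure: sub-goals explicitly justified without a direct requirement.
    """
    mapped = {req for reqs in mapping.values() for req in reqs}
    gaps = [req for req in requirements if req not in mapped]
    creep = [
        goal for goal, reqs in mapping.items()
        if not reqs and goal not in infrastructure
    ]
    return gaps, creep


# The blog example above: E is justified as shared infrastructure.
requirements = ["R1", "R2", "R3", "R4", "R5"]
mapping = {"A": ["R1", "R4"], "B": ["R2"], "C": ["R3"], "D": ["R5"], "E": []}
gaps, creep = coverage_report(requirements, mapping, infrastructure={"E"})
```

Leaving `E` out of `infrastructure` would report it as creep, which mirrors the "remove it or explicitly justify" rule in step 4.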
|
|
220
|
+
|
|
221
|
+
### Playbook is reusable; artifacts are per project
|
|
222
|
+
|
|
223
|
+
**A playbook is a tree of `TASK.md` + `templates/` that ships in source control. It says *how* to do this kind of work. Artifacts — the work product — are per project and live at the project root, not inside the playbook.** Two projects can run the same playbook and produce wildly different artifacts.
|
|
224
|
+
|
|
225
|
+
| Reusable (lives in the playbook) | Per-project (lives in the project) |
|
|
226
|
+
|---|---|
|
|
227
|
+
| `playbook.yml` — manifest | `idea.md`, `PRD.md` — what *this* project wants |
|
|
228
|
+
| `TASK.md` — contract instructions | `screens.json`, `entities.json` — *this* project's data |
|
|
229
|
+
| `templates/` and spawn-driven orchestration — replication logic | Generated code, designs, configs |
|
|
230
|
+
| Skills, references | Final deliverables, build outputs |
|
|
231
|
+
|
|
232
|
+
**Anchor:** `examples/baby-app/` — the playbook lives at `.converge/playbooks/default/`; the artifacts live at the project root. Drop a different `idea.md` into a new project and run the same playbook.
|
|
233
|
+
|
|
234
|
+
**Test for drift:** if you can't copy the playbook into a new empty project and run it (after dropping in a fresh `idea.md`), you've baked project-specific data into the playbook. Move it out to a project file.
|
|
235
|
+
|
|
236
|
+
---
|
|
237
|
+
|
|
238
|
+
## The DAG Model
|
|
239
|
+
|
|
240
|
+
**A playbook is a DAG.** Every task is a node. Every `depends_on` entry, whether in `playbook.yml` or in a `TASK.md`, is a directed edge. The framework computes topological order from those edges; directory sort prefixes (`01-`, `002-`) are for human readability, not execution order.
|
|
241
|
+
|
|
242
|
+
In the division-convergence model, DAG edges have clear semantics:
|
|
243
|
+
- **Parent → child edges** are the division: the parent spawns children and depends on their completion before converging.
|
|
244
|
+
- **Sibling → sibling edges** are sequential constraints within a level: "C needs B's output before C can start."
|
|
245
|
+
- **Child → parent (implicit)** is the convergence: children complete, parent converges. This edge isn't declared by the child — it's implied by the parent's `depends_on:` listing its children.
|
|
246
|
+
|
|
247
|
+
**Declarative, not imperative.** A task declares *what it produces* (`outputs:`) and *what it needs* (`inputs:`, `depends_on:`). It does not declare *when it runs* — the framework resolves that from the DAG. This is the same mental model as dbt's `ref()`: you name what you depend on, and the tool figures out the rest.
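To illustrate how an execution order falls out of declared edges alone, here is a minimal sketch using Python's standard-library `graphlib`. It is not converge's scheduler. The `parallel_batches` helper is hypothetical, and the edge from B2 to B1 is an assumption taken from the "B1 → B2 → B converge" re-run example.

```python
from graphlib import TopologicalSorter


def parallel_batches(depends_on):
    """Group tasks into batches whose dependencies are all complete.

    depends_on: dict mapping task name -> list of tasks it waits for.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the edges form a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Edges from the three-level dashboard example; parents depend on their children.
depends_on = {
    "A": ["B", "C"],
    "B": ["B1", "B2"],
    "B2": ["B1"],
    "C": ["C1", "C2"],
    "B1": [],
    "C1": [],
    "C2": [],
}
batches = parallel_batches(depends_on)
```

No task states when it runs. The leaves land in the first batch and A's convergence lands in the last, purely because of the declared edges.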
|
|
248
|
+
|
|
249
|
+
**The manifest is the compiled DAG.** Planning produces the source files (the `TASK.md` tree). `converge compile` produces `target/manifest.json` — the single source of truth for what nodes exist and how they connect.
|
|
250
|
+
|
|
251
|
+
**Selection operates on the DAG.** `--select '03-tokens+'` means "this node and all descendants." `--select 'state:modified+'` means "what changed and everything downstream."
|
|
252
|
+
|
|
253
|
+
**Three implications for planning:**
|
|
254
|
+
|
|
255
|
+
1. **Declare every edge.** A task's `depends_on:` list is the definitive record of what must complete first — including its own children for convergence.
|
|
256
|
+
2. **Outputs trace to downstream inputs (or to parent).** Every `outputs:` entry is consumed either by a sibling downstream or by the parent's convergence step. Orphan outputs signal a missing consumer.
|
|
257
|
+
3. **The DAG is partly dynamic.** Runtime spawn lets a parent materialize children at execution time. Plan for what's knowable; mark what isn't.
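The second implication lends itself to a mechanical check. The sketch below assumes tasks are already parsed into dictionaries with `inputs` and `outputs` lists. The `orphan_outputs` helper and the `B/notes.md` path are hypothetical, and the root is skipped on the assumption that its outputs go to the user rather than to another task.

```python
def orphan_outputs(tasks, root):
    """Return outputs that no task declares as an input.

    tasks: dict mapping task name -> {"inputs": [...], "outputs": [...]}.
    root: name of the root task, whose outputs are the final deliverable.
    """
    consumed = {path for spec in tasks.values() for path in spec.get("inputs", [])}
    orphans = {}
    for name, spec in tasks.items():
        if name == root:
            continue  # the root's outputs are consumed by the user
        missing = [path for path in spec.get("outputs", []) if path not in consumed]
        if missing:
            orphans[name] = missing
    return orphans


tasks = {
    "A": {"inputs": ["B/data.json", "C/components/"], "outputs": ["dashboard/"]},
    "B": {"outputs": ["B/data.json", "B/notes.md"]},  # notes.md has no consumer
    "C": {"outputs": ["C/components/"]},
}
orphans = orphan_outputs(tasks, root="A")
```

An orphan is a prompt to either add the missing consumer or drop the output from the contract.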
|
|
258
|
+
|
|
259
|
+
---
|
|
260
|
+
|
|
261
|
+
## Three Principles (Full Exposition)
|
|
262
|
+
|
|
263
|
+
### Principle 1 — Nested over flat (separation of concerns)
|
|
264
|
+
|
|
265
|
+
A parent owns one concern; children own sub-concerns. Each level's convergence addresses one integration concern. Flat trees collapse the org chart — the root ends up doing everyone's job.
|
|
266
|
+
|
|
267
|
+
- Top-level: **3–7 phases**. Each is one concern the project owner holds.
|
|
268
|
+
- Each phase: **3–7 children**. Each is one sub-concern the phase delegates.
|
|
269
|
+
- Continue nesting until each leaf is **15–45 min** of self-contained work.
|
|
270
|
+
- At each level, the convergence step integrates that level's concern.
|
|
271
|
+
|
|
272
|
+
**Anchor:** `examples/baby-app/.converge/playbooks/default/tasks/03-build-screens/` — three levels (phase → per-screen → per-sub-layer) instead of 80 sibling tasks.
|
|
273
|
+
|
|
274
|
+
**Smells:**
|
|
275
|
+
- *One-child node* → no division happening. Collapse into parent.
|
|
276
|
+
- *Mixed-shape siblings* (one config task next to ten per-screen tasks) → multiple concerns leaked into one parent. Split.
|
|
277
|
+
- *Verb-named siblings* (`author`, `fetch`, `clean`, `implement`, `test`) → you've decomposed the workflow, not the scope. Re-decompose by *what exists* — usually one child per entity, each owning its own end-to-end mini-workflow internally.
|
|
278
|
+
- *No convergence step in a container* → the parent adds no value. Either add convergence or flatten.
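Two of these smells are structural and can be spotted by walking the tree. The sketch below assumes the hand-written task tree is modelled as nested dictionaries. The `fanout_smells` helper, the `001-tokens` and `003-settings` names, and the default limit of 7 (taken from the 3–7 guideline) are illustrative assumptions. Children spawned at runtime from one template are outside its intent.

```python
def fanout_smells(children, path="root", max_siblings=7):
    """Walk a task tree and list structural smells.

    children: dict mapping child name -> dict of that child's own children
    (an empty dict marks a leaf).
    """
    smells = []
    if len(children) == 1:
        smells.append(f"{path}: one-child node, collapse into parent")
    elif len(children) > max_siblings:
        smells.append(f"{path}: {len(children)} siblings, consider nesting")
    for name, grandchildren in children.items():
        smells.extend(fanout_smells(grandchildren, f"{path}/{name}", max_siblings))
    return smells


nested = {
    "01-prepare": {},
    "02-design-system": {"001-tokens": {}},
    "03-build-screens": {"001-home": {}, "002-cycle-tracking": {}, "003-settings": {}},
}
flat = {f"{i:02d}-task": {} for i in range(1, 9)}

nested_smells = fanout_smells(nested)
flat_smells = fanout_smells(flat)
```

Verb-named siblings and missing convergence steps need a reading of the contracts themselves, so they are left out of this sketch.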
|
|
279
|
+
|
|
280
|
+
### Principle 2 — Templates for replicable work (one contract, N instances)
|
|
281
|
+
|
|
282
|
+
When the same contract shape repeats from data, write the contract **once** as a template. The runtime spawns instances.
|
|
283
|
+
|
|
284
|
+
**Use runtime templates when:**
|
|
285
|
+
- N similar children driven by a list (`screens.json`, `entities[]`, `shots[]`).
|
|
286
|
+
- N is data-driven and may grow.
|
|
287
|
+
- Each instance has the same input/output shape — only the data binding differs.
|
|
288
|
+
|
|
289
|
+
**Don't use runtime templates when:**
|
|
290
|
+
- One-off tasks (single config, one spec).
|
|
291
|
+
- Heterogeneous shapes (different inputs, outputs, or skills) — those are *different* contracts; hand-write them.
|
|
292
|
+
- Small fixed N (≤ 3) where hand-writing is clearer.
|
|
293
|
+
|
|
294
|
+
Even with templates, the parent's convergence step is explicit: "after all N instances produce their outputs, I integrate them."
|
|
295
|
+
|
|
296
|
+
**Anchor:** `examples/stitch-to-flutter-baby-watch-v2/` — one template drives 10 screens; the parent converges the screens into the app.
|
|
297
|
+
|
|
298
|
+
|
|
299
|
+
### Principle 3 — Progressive decomposition by domain × layer (delegation discipline)
|
|
300
|
+
|
|
301
|
+
Plan one layer at a time. Write contracts only for your **direct children**. Never reach into grandchildren — that's each child's job when invoked.
|
|
302
|
+
|
|
303
|
+
Split each layer two ways:
|
|
304
|
+
|
|
305
|
+
| Split by | What it produces | Example |
|
|
306
|
+
|----------|------------------|---------|
|
|
307
|
+
| **Lifecycle layer** | Top-level phases | `01-prepare → 02-design-system → 03-build-screens → 05-behavior → 06-wire → 07-overlays` |
|
|
308
|
+
| **Domain** | Children inside a phase | inside `03-build-screens/`: one child per screen (`001-home`, `002-cycle-tracking`, …) |
|
|
309
|
+
| **Sub-layer** | Grandchildren inside a domain | inside `001-home/`: `001-design`, `002-build`, `003-split`, `004-lift` |
|
|
310
|
+
|
|
311
|
+
Lifecycle gives the *order* of division; domain gives the *fan-out* (who gets which slice).
|
|
312
|
+
|
|
313
|
+
At each layer, the parent's convergence integrates what that layer divided. The phase-level parent converges the phase; the domain-level parent converges the domain.
|
|
314
|
+
|
|
315
|
+
**Anchor:** `examples/baby-app/.converge/playbooks/default/tasks/` — top-level by lifecycle, second by screen domain, third by sub-layer.
|
|
316
|
+
|
|
317
|
+
**Hard rule:** when invoked at a node, plan only its direct children. Never read siblings, cousins, or grandchildren. If something's missing from your scope, write it under "Open questions" in PLAN.md — *don't fix under-specification by reaching outside your scope.*
|
|
@@ -0,0 +1,169 @@
|
|
|
1
|
+
# Goal-Tree Shapes Reference
|
|
2
|
+
|
|
3
|
+
Common shapes that emerge from goal decomposition. Use this reference to sanity-check your decomposition — if your goal tree looks nothing like any of these, it might be process decomposition. **Don't pick a shape first; let the goal tree dictate the shape.**
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Pattern Overview
|
|
8
|
+
|
|
9
|
+
After decomposing a user's goal into deliverable sub-goals, the resulting task tree will often match one of six recurring shapes:
|
|
10
|
+
|
|
11
|
+
| Shape | Root delegates by | Runtime fan-out sits at | Emerges when | Anchor examples |
|
|
12
|
+
|---|---|---|---|---|
|
|
13
|
+
| **Ordered Stages** | Delivery phase (dataset → analysis → report) | Domain entity inside a phase | One artifact-type evolves through ordered stages; entities replicate within a stage | `examples/baby-app`, `examples/flutter-app` |
|
|
14
|
+
| **Linear Pipeline** | Functional transform (fetch → transform → validate → report) | None usually — leaves are atomic | Linear flow of data/work; each stage is one bounded operation; no fan-out | `examples/data-pipeline` |
|
|
15
|
+
| **Creative Progression** | Creative stage (story → cast → world → style → breakdown → storyboard) | Late-stage replication only (per-shot, per-sheet) | Sequential creative refinement; early stages are singletons; late stages fan out over assets | `examples/cinematic-video-production` |
|
|
16
|
+
| **Domain Split** | Domain entity (characters, scenes, props) | Per-entity at every domain | The deliverable is *N parallel pipelines*, one per entity, with shared upstream specs | `examples/game-assets-video` |
|
|
17
|
+
| **Epoch Loop** | Epoch / iteration (a fixed template repeated) | Epoch template plus runtime spawn | Iterative refinement; quality converges over rounds; stop on a convergence check | `examples/scientific-research`, `examples/frontier-research` |
|
|
18
|
+
| **Goal-Driven Epoch Loop** | Declared goal set in playbook.yml | Epoch from template, adaptive per remaining goal | Work is large and replayable with clear measurable completion conditions; each epoch targets an unsatisfied goal | dynamic goal-driven loops |
|
|
19
|
+
|
|
20
|
+
---
|
|
21
|
+
|
|
22
|
+
## How Shapes Emerge
|
|
23
|
+
|
|
24
|
+
After decomposing the user's goal into deliverable sub-goals, look at the dependency graph:
|
|
25
|
+
|
|
26
|
+
- Sub-goals form a chain where each depends on the prior one's output → **Linear Pipeline** or **Ordered Stages** shape
|
|
27
|
+
- Sub-goals are N identical deliverables from a catalog → **Domain Split** shape with runtime fan-out
|
|
28
|
+
- Sub-goal is "improve quality" with no natural endpoint → **Epoch Loop** shape
|
|
29
|
+
- Sub-goals start as singletons then fan out over assets defined late → **Creative Progression** shape
|
|
30
|
+
- User describes a measurable end state with clear checks ("all tests pass", "zero type errors") → **Goal-Driven Epoch Loop** shape
|
|
31
|
+
|
|
32
|
+
The shape confirms a good decomposition. If you force a shape onto the goal (e.g., "this must be a lifecycle pipeline"), you'll miss the user's actual needs.
|
|
33
|
+
|
|
34
|
+
Two questions help recognize the shape:
|
|
35
|
+
|
|
36
|
+
1. **Is there a list of N similar deliverables?** (screens, characters, endpoints)
|
|
37
|
+
- Delivered in parallel → **Domain Split** shape
|
|
38
|
+
- Delivered inside an ordered stage → **Ordered Stages** with runtime fan-out
|
|
39
|
+
- If no, skip to question 2.
|
|
40
|
+
2. **Does work refine over rounds, or flow once through stages?**
|
|
41
|
+
- Refines over rounds with a convergence criterion → **Epoch Loop** shape
|
|
42
|
+
- Flows once, deterministic stages, no fan-out → **Linear Pipeline** shape
|
|
43
|
+
- Flows once, creative stages with late-stage asset fan-out → **Creative Progression** shape
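The two questions form a small decision tree, which can be written out as a sketch. The `recognize_shape` function and its parameter names are hypothetical. It mirrors only the two questions above, so the Goal-Driven Epoch Loop, which is recognized from a measurable end state, is not covered.

```python
def recognize_shape(
    has_similar_list,
    delivered_in_parallel=False,
    refines_over_rounds=False,
    late_stage_fanout=False,
):
    """Map answers to the two recognition questions onto a shape name."""
    # Question 1: is there a list of N similar deliverables?
    if has_similar_list:
        return "Domain Split" if delivered_in_parallel else "Ordered Stages"
    # Question 2: does work refine over rounds, or flow once through stages?
    if refines_over_rounds:
        return "Epoch Loop"
    return "Creative Progression" if late_stage_fanout else "Linear Pipeline"


per_screen_app = recognize_shape(has_similar_list=True)
research_loop = recognize_shape(has_similar_list=False, refines_over_rounds=True)
```

The result is a sanity check on a finished decomposition, not a way to pick a shape up front.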
|
|
44
|
+
---

## Pattern Shapes

### Ordered Stages — *one artifact, ordered stages, entities replicate within a stage*

```
01-prepare          (singleton: requirements, screens.json)
02-design-system    (singleton)
03-build-screens    ← seed: per-screen, each spawns its own design→build→split→lift
05-add-behavior     ← partial seed: per-provider
06-wire-screens     ← partial seed: per-handler
07-polish
```
Domain entities (screens, providers) are *internal* to phases. Each phase gates the next.

> **Static/dynamic:** Top-level phase containers are static (hand-written). Per-entity replication inside a phase (per-screen, per-provider) is dynamic via catalog + templates + runtime spawn — children are *expected* after the catalog task runs. **Tests:** Phase-boundary checks gate progression (e.g., "all screens generated"); per-entity checks validate each spawned child.

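As a rough illustration of catalog-driven replication and a phase-boundary check, the sketch below expands a catalog into one child per entity and then tests whether all of them have finished. `CatalogEntry`, `expandSeed`, and `phaseComplete` are hypothetical helpers, and the `phase/entity` id format is an assumption; the real runtime may name and track children differently.

```typescript
// Hypothetical catalog entry; the real screens.json schema is not shown in this reference.
interface CatalogEntry {
  id: string;
}

// Expand a catalog into one child task id per entity (assumed "phase/entity" naming).
function expandSeed(phase: string, catalog: CatalogEntry[]): string[] {
  return catalog.map((entry) => `${phase}/${entry.id}`);
}

// Phase-boundary check: the phase gates the next one only when every expected child is done.
function phaseComplete(expectedChildren: string[], doneChildren: Set<string>): boolean {
  return expectedChildren.every((id) => doneChildren.has(id));
}
```

Keeping the expected children as an explicit list is what makes a check such as "all screens generated" mechanical: the catalog defines what must exist, and the gate compares that list against what was produced.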
### Linear Pipeline — *deterministic stages, atomic leaves, no replication*

```
01-recon → 02-intel → 03-sweep → 04-explore → 05-evidence → 06-report
```
Each stage owns one transformation. No seed unless one stage genuinely fans out (e.g. `03-sweep` per-target).

> **Static/dynamic:** All stages are static by default — each produces a qualitatively different artifact. If a stage fans out (per-target sweep), that stage is dynamic through templates + runtime spawn. **Tests:** Each stage's output is gated by a check before the next stage runs. The final report has a playbook-level check.

> **Linear Pipeline is not a license to verb-decompose anything.** It applies when each stage produces a *qualitatively different artifact* (recon-data → intel-summary → sweep-results → … → report) — every stage is a different kind of thing. If your "stages" all operate on the same population (N tokens, N features, N records) and just transform it incrementally, that's process-decomposition of a single scope — collapse into one task with a per-entity seed inside.

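The gating rule ("each stage's output is gated by a check before the next stage runs") can be sketched as a plain loop. This is a minimal model under stated assumptions, not the runtime's implementation: `Stage`, `Artifacts`, and `runPipeline` are hypothetical names, and stages are modeled as in-process functions rather than delegated tasks.

```typescript
// Artifacts produced so far, keyed by stage name (hypothetical structure).
type Artifacts = Record<string, unknown>;

interface Stage {
  name: string;
  // Each stage reads upstream artifacts and produces its own, different kind of artifact.
  run: (upstream: Artifacts) => unknown;
  // Gate: the next stage runs only if this returns true.
  check: (artifact: unknown) => boolean;
}

interface PipelineResult {
  artifacts: Artifacts;
  failedAt: string | null;
}

function runPipeline(stages: Stage[]): PipelineResult {
  const artifacts: Artifacts = {};
  for (const stage of stages) {
    const artifact = stage.run(artifacts);
    if (!stage.check(artifact)) {
      // Stop at the first failed gate; later stages never see bad input.
      return { artifacts, failedAt: stage.name };
    }
    artifacts[stage.name] = artifact;
  }
  return { artifacts, failedAt: null };
}
```

Because a failed gate stops the run, a linear pipeline cannot go back and refine; work that needs rounds of improvement fits an Epoch Loop better.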
### Creative Progression — *sequential creative refinement, late-stage fan-out*

```
01-story      (logline → synopsis → treatment → screenplay → bible)   singletons
02-cast       (extract → voice-casting → sheets)                      sheets seed
03-world      (extract → plates)                                      plates seed
04-style      (visual → palette → audio)                              singletons
05-breakdown  (scenes → shots → continuity)                           singletons
06-storyboard                                                         seed per-shot
07-keyframes                                                          seed per-shot
```
Early stages produce one artifact; late stages multiply over the assets defined upstream.

> **Static/dynamic:** Early creative stages (story, style, breakdown) are static singletons. Late-stage fan-out (per-shot, per-sheet) is dynamic via templates + runtime spawn — children are *expected* from breakdown outputs. **Tests:** Singleton stages have format/content checks; spawned children each have per-asset checks. Cross-stage consistency checks at playbook level (e.g., "every shot in the breakdown has a storyboard frame").

### Domain Split — *N parallel pipelines, one per entity, shared upstream specs*

```
00-classify-game     (singleton: game type, tokens)
01-art-bible         (singleton: shared visual spec)
02-asset-breakdown   (produces: characters.json, props.json, scenes.json)
03-characters        ← seed per-character: each runs its own pipeline
03-shared-props      ← seed per-prop
05-scenes            ← seed per-scene (consumes characters + props)
06-export
```
Domain entities are *first-class top-level concerns*, each with its own multi-step pipeline. Use when entities are heavy enough to warrant their own delegation tree.

> **Static/dynamic:** Shared upstream specs (classify-game, art-bible, asset-breakdown) are static singletons. Per-entity domain containers (characters, props, scenes) are static containers whose internal pipelines are dynamic via catalog + templates + runtime spawn. **Tests:** Shared spec tasks have format checks. Each domain has cross-entity consistency checks. Playbook-level checks validate cross-domain invariants (e.g., "every character appearing in a scene exists in characters.json").

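A cross-domain invariant such as "every character appearing in a scene exists in characters.json" reduces to a set-membership test. The sketch below assumes a simple shape for the parsed files (`Scene` with a `characters` array of ids); the actual schemas of `characters.json` and `scenes.json` are not given here, so treat the field names as hypothetical.

```typescript
// Assumed shape of one entry in scenes.json (hypothetical field names).
interface Scene {
  id: string;
  characters: string[];
}

interface Violation {
  sceneId: string;
  missingCharacter: string;
}

// Returns one violation per (scene, character) reference that has no matching character id.
function findUnknownCharacters(scenes: Scene[], knownCharacterIds: string[]): Violation[] {
  const known = new Set(knownCharacterIds);
  const violations: Violation[] = [];
  for (const scene of scenes) {
    for (const characterId of scene.characters) {
      if (!known.has(characterId)) {
        violations.push({ sceneId: scene.id, missingCharacter: characterId });
      }
    }
  }
  return violations;
}
```

An empty result means the invariant holds. Returning the offending pairs rather than a boolean gives the parent enough detail to decide which domain container needs another pass.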
### Epoch Loop — *iterative refinement until convergence*

```
playbook root
└── templates/epoch/
    ├── 001-hypothesize   (or sub-tasks specific to the epoch)
    ├── 002-experiment
    ├── 003-evaluate
    └── 004-decide        (triggers next epoch or convergence)
```
The runtime spawns `epoch-001`, `epoch-002`, … instantiating the same template each time. Stop condition is a convergence check (quality threshold, contradiction-free, score plateau). Goals at the playbook level decide *when to stop spawning*.

> **Static/dynamic:** The epoch template is static (hand-written `TASK.md` files). Each epoch instance is a dynamic subtask spawned at runtime. The number of epochs is unknown at plan time — the convergence check decides when to stop. **Tests:** Each epoch has internal checks validating its own outputs. The convergence check is the most important test in the playbook — it defines "done."

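Two of the stop conditions named above, a quality threshold and a score plateau, can be combined into one convergence predicate. The sketch below is one possible interpretation, not the package's check: `ConvergenceOptions` and `hasConverged` are hypothetical, and it assumes each epoch's evaluate step yields a single numeric score where higher is better.

```typescript
// Hypothetical options; the source does not define how thresholds or plateaus are configured.
interface ConvergenceOptions {
  threshold: number;     // stop once the latest score reaches this value
  plateauWindow: number; // how many epochs back to compare against
  minDelta: number;      // minimum improvement over the window that still counts as progress
}

// scores[i] is the evaluation score of epoch i + 1 (assumed: higher is better).
function hasConverged(scores: number[], opts: ConvergenceOptions): boolean {
  if (scores.length === 0) return false;
  const latest = scores[scores.length - 1];
  // Quality threshold reached.
  if (latest >= opts.threshold) return true;
  // Not enough history to judge a plateau yet.
  if (scores.length <= opts.plateauWindow) return false;
  // Plateau: improvement across the window is below minDelta.
  const baseline = scores[scores.length - 1 - opts.plateauWindow];
  return latest - baseline < opts.minDelta;
}
```

Writing the predicate before the epoch template forces a concrete answer to what "converged" means, which is the safeguard against spawning epochs forever.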
### Goal-Driven Epoch Loop — *declared goal set, diverge→converge each epoch, stops when all goals pass*

```
playbook.yml
  goals:
    - id: code-quality          # ← each goal has multiple checks
      description: "All quality gates pass"
      checks:
        - id: type-check
          cmd: "pnpm tsc --noEmit"
        - id: tests
          cmd: "pnpm vitest run"

DAG per epoch:
  DIVERGE                                        CONVERGE
  seed spawns children  →  children execute  →  parent evaluates
  (implement, verify)      independently         goal state, decides
                                                 continue or stop
```

Each epoch follows the **diverge → converge** rhythm:
1. **Diverge**: the root container evaluates goals, picks the first unsatisfied goal, and spawns an epoch with implement+verify tasks targeting that goal
2. **Children execute**: implement makes the change, verify runs the goal's checks
3. **Converge**: the seed re-evaluates goal state — if goals remain, diverge again (spawn next epoch); if all satisfied, `ctx.loop.stop()`

A goal is satisfied when **all** its checks pass. Goals replace the old playbook-level `checks:` — there is no separate post-run validation system.

Use when the work is large, replayable, and has clear measurable completion conditions. Unlike a research epoch loop (incremental quality improvement), the goal-driven loop targets specific, binary completion conditions.

> **Static/dynamic:** Goals and their checks are declared in playbook.yml. Epochs are spawned dynamically from a template. **Tests:** Every goal check IS a test — deterministic shell command, exit 0 = pass. Playbook bounds (maxIterations, stall) prevent infinite loops.

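The diverge → converge rhythm and the two bounds can be modeled as a bounded loop. The sketch below is a simplified model under several assumptions: `Goal`, `Check`, `Bounds`, and `runGoalLoop` are hypothetical names, each check is an injected function standing in for a shell command that exits 0, and "stall" is interpreted as consecutive epochs that do not reduce the number of unsatisfied goals. It does not use the real `ctx.loop` API.

```typescript
// A check is modeled as a function; in a playbook it would be a shell command (exit 0 = pass).
interface Check {
  id: string;
  passes: () => boolean;
}

interface Goal {
  id: string;
  checks: Check[];
}

// Hypothetical bounds; stallLimit is an assumed reading of the "stall" bound.
interface Bounds {
  maxIterations: number;
  stallLimit: number;
}

type Outcome = "all-goals-satisfied" | "max-iterations" | "stalled";

// A goal is satisfied only when all of its checks pass.
const isSatisfied = (goal: Goal): boolean => goal.checks.every((check) => check.passes());

function runGoalLoop(
  goals: Goal[],
  runEpoch: (target: Goal, epoch: number) => void,
  bounds: Bounds,
): { outcome: Outcome; epochs: number } {
  let epochs = 0;
  let stalledEpochs = 0;
  let unsatisfied = goals.filter((goal) => !isSatisfied(goal));

  while (unsatisfied.length > 0) {
    if (epochs >= bounds.maxIterations) return { outcome: "max-iterations", epochs };
    if (stalledEpochs >= bounds.stallLimit) return { outcome: "stalled", epochs };

    // Diverge: spawn an epoch targeting the first unsatisfied goal.
    epochs += 1;
    runEpoch(unsatisfied[0], epochs);

    // Converge: re-evaluate goal state and track whether the epoch made progress.
    const stillUnsatisfied = goals.filter((goal) => !isSatisfied(goal));
    stalledEpochs = stillUnsatisfied.length < unsatisfied.length ? 0 : stalledEpochs + 1;
    unsatisfied = stillUnsatisfied;
  }
  return { outcome: "all-goals-satisfied", epochs };
}
```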
---

## Mixing Shapes

Goal-tree shapes compose. Common combinations:

- **Ordered Stages + Domain Split**: top-level ordered stages, but one stage fans out into a Domain-Split sub-tree (e.g., `03-build-screens/` contains per-screen deliverables that themselves use ordered stages internally — exactly what `baby-app` does).
- **Linear Pipeline → Epoch Loop**: a deterministic ingestion phase feeds a research epoch loop.
- **Creative Progression → Domain Split**: early creative stages produce specs that downstream Domain Split consumes (e.g., screenplay → per-shot pipelines).

When mixing, **the outermost shape describes how the root goal decomposes**. Don't force-fit all sub-trees into the same shape.

---

## Per-Shape Anti-Patterns

- **Ordered Stages for bulk replicable work.** If you have 100 scenes to generate, sequential phases at the top crush parallelism. Use Domain Split or push runtime fan-out to the right layer.
- **Domain Split when deliverables are tiny.** A "per-config-file" fan-out with one-line bodies is just nesting for nesting's sake. Hand-write or move runtime fan-out up a level.
- **Epoch Loop without a convergence check.** Without a stop condition, you spawn epochs forever. Define what "converged" looks like *before* writing the template.
- **Linear Pipeline when work refines.** Linear stages can't go back. If quality must improve over rounds, use Epoch Loop.
- **Creative Progression for deterministic work.** If checks are deterministic and stages are orderable, prefer Linear Pipeline — it's mechanically simpler.