harnessed 3.4.2 → 3.4.4

Files changed (35)
  1. package/README.md +3 -0
  2. package/dist/cli.mjs +1218 -733
  3. package/dist/cli.mjs.map +1 -1
  4. package/dist/index.mjs +1 -1
  5. package/dist/index.mjs.map +1 -1
  6. package/package.json +1 -1
  7. package/workflows/auto/SKILL.md +15 -0
  8. package/workflows/capabilities.yaml +1 -1
  9. package/workflows/discuss/auto/SKILL.md +15 -2
  10. package/workflows/discuss/phase/SKILL.md +10 -8
  11. package/workflows/discuss/strategic/SKILL.md +11 -9
  12. package/workflows/discuss/subtask/SKILL.md +10 -8
  13. package/workflows/execute-task/SKILL.md +7 -6
  14. package/workflows/execute-task/workflow.yaml +93 -0
  15. package/workflows/plan/architecture/SKILL.md +10 -8
  16. package/workflows/plan/auto/SKILL.md +15 -2
  17. package/workflows/plan/phase/SKILL.md +10 -8
  18. package/workflows/research/SKILL.md +44 -2
  19. package/workflows/retro/SKILL.md +7 -14
  20. package/workflows/role-prompts.yaml +477 -0
  21. package/workflows/task/auto/SKILL.md +15 -2
  22. package/workflows/task/clarify/SKILL.md +7 -20
  23. package/workflows/task/code/SKILL.md +7 -20
  24. package/workflows/task/deliver/SKILL.md +8 -21
  25. package/workflows/task/test/SKILL.md +7 -20
  26. package/workflows/verify/auto/SKILL.md +14 -1
  27. package/workflows/verify/code-review/SKILL.md +8 -15
  28. package/workflows/verify/design/SKILL.md +7 -14
  29. package/workflows/verify/multispec/SKILL.md +8 -15
  30. package/workflows/verify/paranoid/SKILL.md +8 -15
  31. package/workflows/verify/progress/SKILL.md +7 -14
  32. package/workflows/verify/qa/SKILL.md +8 -15
  33. package/workflows/verify/security/SKILL.md +8 -15
  34. package/workflows/verify/simplify/SKILL.md +8 -15
  35. package/workflows/execute-task/phases.yaml +0 -73
@@ -0,0 +1,477 @@
+ # <packageRoot>/workflows/role-prompts.yaml — harnessed v3.4.3 role-prompt registry.
+ #
+ # Per-sub-workflow metadata consumed by `src/cli/lib/generateCommands.ts` to
+ # emit `~/.claude/commands/<slash-name>.md` files at `harnessed setup` time.
+ #
+ # Each entry describes:
+ #   primary_cap:    Which capability key the "preferred path" invokes (the
+ #                   {{ capabilities.<x>.cmd }} that should resolve in body).
+ #                   For master orchestrators, this is empty (they dispatch).
+ #   specialist:     Title of the expert persona used in fallback Task-spawn prompt.
+ #   responsibility: One-line job description (the agent's job).
+ #   checklist:      5-10 items the specialist should evaluate. Adapted from
+ #                   upstream gstack expert prompts where available (cited inline).
+ #                   Self-contained — works even when upstream user-skill missing.
+ #   severity:       Severity scale label used in the report format.
+ #   description:    YAML frontmatter `description` for ~/.claude/commands/<x>.md.
+ #
+ # Karpathy simplicity: 1 small yaml beats 23 hardcoded strings in TS.
+
+ schema_version: harnessed.role-prompts.v1
+
+ prompts:
+
+   # ============================================================================
+   # Super-master + 4 stage-master (orchestrators — short dispatcher prompts)
+   # ============================================================================
+
+   auto:
+     primary_cap: ""  # dispatcher only
+     is_master: true
+     specialist: "Full-cycle workflow orchestrator"
+     responsibility: |
+       Drive a complete 6-stage feature cycle (research conditional → discuss →
+       plan → task → verify → retro mandatory) one stage after another, using
+       the corresponding `/discuss /plan /task /verify /retro` slash commands as
+       preferred entry points and the per-sub-workflow fallback role prompts
+       when an upstream is missing.
+     checklist: []
+     severity: "stage-pass / stage-fail / stage-skipped (with reason)"
+     description: "Run a complete harnessed 6-stage feature cycle end-to-end (research → discuss → plan → task → verify → retro)."
+
+   discuss:
+     primary_cap: ""
+     is_master: true
+     specialist: "Stage 1 discuss dispatcher"
+     responsibility: |
+       Independently evaluate three clarification layers (strategic / phase /
+       subtask) per ~/.claude/CLAUDE.md "澄清/审查触发判据" and run only the
+       layers whose gate fires. Each layer's command is `/discuss-strategic`,
+       `/discuss-phase`, `/discuss-subtask`.
+     checklist: []
+     severity: "per-layer fired/skipped (with reason)"
+     description: "Stage 1 Discuss master — three-layer clarification dispatcher (strategic / phase / subtask)."
+
+   plan:
+     primary_cap: ""
+     is_master: true
+     specialist: "Stage 2 plan dispatcher"
+     responsibility: |
+       Drive the 2-step plan stage: architecture review first (`/plan-architecture`
+       — only if `phase.is_complex_architecture == true`), then unconditional
+       phase planning (`/plan-phase` — GSD plan-phase + planning-with-files
+       persistence).
+     checklist: []
+     severity: "ordered serial — architecture (conditional) → phase (always)"
+     description: "Stage 2 Plan master — architecture review (conditional) then phase planning (always, persisted)."
+
+   task:
+     primary_cap: ""
+     is_master: true
+     specialist: "Stage 3 task dispatcher"
+     responsibility: |
+       Per-subtask serial chain: `/task-clarify` (conditional brainstorming) →
+       `/task-code` (4 karpathy principles + mattpocock conditional techniques) →
+       `/task-test` (TDD strongly suggested gate) → `/task-deliver` (ralph-loop
+       COMPLETE wrapper). Re-enter for each subtask.
+     checklist: []
+     severity: "per-subtask 4-step serial gate"
+     description: "Stage 3 Task master — per-subtask clarify→code→test→deliver chain (ralph-loop COMPLETE at deliver)."
+
+   verify:
+     primary_cap: ""
+     is_master: true
+     specialist: "Stage 4 verify dispatcher"
+     responsibility: |
+       Order: `/verify-progress` (always, serial 1) → parallel fan-out of
+       `/verify-code-review`, `/verify-paranoid` (critical module),
+       `/verify-qa` (UI changes), `/verify-security` (auth/secrets),
+       `/verify-design` (design changes), `/verify-multispec` (critical release
+       Pattern C) → `/verify-simplify` (always, serial 99, tail).
+     checklist: []
+     severity: "per-sub fire/skip (with reason); paranoid is mandatory on critical modules"
+     description: "Stage 4 Verify master — progress → parallel reviewers → simplify tail (paranoid mandatory on critical modules)."
+
+   # ============================================================================
+   # Standalone
+   # ============================================================================
+
+   research:
+     primary_cap: ""
+     specialist: "Research analyst"
+     responsibility: |
+       Multi-source investigation (docs / web search / codebase grep / library
+       probe) producing a `findings.md` with citations, NOT speculation. Use
+       `ctx7` for library docs, `tavily-mcp` / `exa-mcp` for web, `gh` CLI for
+       GitHub artifacts, and codebase `Grep` for internal references.
+     checklist:
+       - "Resolve each unknown claim to a citable source (URL, file:line, or `ctx7` doc id)"
+       - "Cite version explicitly when discussing library / framework APIs (training cutoff may be stale)"
+       - "Capture conflicting sources side-by-side; do not silently pick one"
+       - "Flag `OPEN: <question>` for items the user must decide; never paper over"
+       - "Persist results to `.planning/<phase>/findings.md` for cross-session handoff"
+     severity: "verified / unverified / conflicting / open"
+     description: "Multi-source research producing a citation-backed findings.md (no speculation)."
+
+   retro:
+     primary_cap: "retro-gstack"
+     specialist: "Retrospective facilitator"
+     responsibility: |
+       Run a Lessons / Decisions / Surprises retrospective for the closed
+       milestone, then persist to `RETROSPECTIVE.md`. Adapt the gstack `/retro`
+       method when available; otherwise structure the conversation yourself.
+     checklist:
+       - "What did we set out to do, vs. what actually shipped?"
+       - "Top 3 surprises (positive or negative) — root cause each"
+       - "Decisions that paid off; decisions we would reverse"
+       - "Process changes for next milestone (concrete, not vague)"
+       - "What deserves a permanent rule entry (CLAUDE.md / docs/adr/)?"
+       - "Persist verbatim to `.planning/RETROSPECTIVE.md` — append, do not overwrite"
+     severity: "lesson / decision / surprise / process-change"
+     description: "Run a milestone retrospective (lessons / decisions / surprises) and persist to RETROSPECTIVE.md."
+
+   # ============================================================================
+   # discuss-* (3 subs)
+   # ============================================================================
+
+   discuss-strategic:
+     primary_cap: "gstack-office-hours"
+     specialist: "Strategic Office-Hours advisor (CEO + Product lens)"
+     responsibility: |
+       Stress-test the product / scope / business value of a new feature,
+       milestone, or project BEFORE engineering investment. Adapted from gstack
+       `/office-hours` + `/plan-ceo-review`.
+     checklist:
+       - "What user problem does this solve? Who specifically experiences it today?"
+       - "Why this, why now? (alternative cost of working on something else)"
+       - "What does success look like — measurable, not vibes (1 metric, not 5)?"
+       - "Is the scope MVP-able? What's the smallest cut that still proves the bet?"
+       - "What assumptions are load-bearing? Which would kill the feature if wrong?"
+       - "Who pays the maintenance cost after ship — same team, or a hand-off?"
+       - "Decision: ship / iterate / kill / table — with one-line reason"
+     severity: "ship / iterate / kill / table (each with reason)"
+     description: "CEO-lens strategic review: pressure-test scope, user value, and assumptions before engineering invests."
+
+   discuss-phase:
+     primary_cap: "gsd-discuss-phase"
+     specialist: "Phase clarification analyst"
+     responsibility: |
+       Surface and resolve gray-area implementation decisions BEFORE a phase
+       enters planning. Fires when ≥2 open decisions, cross-phase data flow is
+       unclear, or scope spans >1 day. Adapted from GSD `/gsd-discuss-phase`.
+     checklist:
+       - "List every open decision as a single question (1 line each)"
+       - "For each, list 2-4 candidate answers with one-line tradeoffs"
+       - "Identify cross-phase contracts (data flow / API shape / migration order)"
+       - "Flag decisions blocking start (must answer before plan) vs. deferrable"
+       - "Persist to `.planning/<phase>/findings.md` + `knowledge.md` for hand-off"
+       - "If the layer is genuinely clear, say 'no clarification needed' and exit"
+     severity: "blocking / deferrable / resolved"
+     description: "Surface gray-area phase decisions, list candidate answers, mark blocking vs. deferrable."
+
+   discuss-subtask:
+     primary_cap: "superpowers-brainstorming"
+     specialist: "Subtask brainstormer"
+     responsibility: |
+       Generate ≥2 implementation approaches for a single subtask and compare
+       tradeoffs. Fires when core algorithm / data structure / API contract /
+       high error-cost. Skip pure CRUD or single-obvious-path tasks.
+     checklist:
+       - "State the subtask in one sentence; confirm scope with user if ambiguous"
+       - "Produce 2-4 distinct approaches (not just '2 flavors of the same idea')"
+       - "For each: complexity, perf, failure modes, test surface, future change cost"
+       - "Recommend one with 1-2 line reason; flag risks of the chosen path"
+       - "Output a `findings.md` block the implementer can paste into the task"
+       - "If options collapse to one (others clearly bad), say so and exit fast"
+     severity: "recommended / acceptable / rejected"
+     description: "Generate 2-4 subtask approaches with tradeoffs and recommend one (brainstorming)."
+
+   # ============================================================================
+   # plan-* (2 subs)
+   # ============================================================================
+
+   plan-architecture:
+     primary_cap: "plan-eng-review"
+     specialist: "Staff Engineer architect"
+     responsibility: |
+       Lock down system architecture BEFORE phase planning when complex
+       (≥3 modules / new framework / new data model / scaling-critical /
+       large migration). Adapted from gstack `/plan-eng-review`.
+     checklist:
+       - "Identify the smallest architecture change that satisfies all requirements"
+       - "Diagram component boundaries (data flow / call direction / ownership)"
+       - "List interfaces / contracts between components (function signatures, API shapes)"
+       - "Failure modes: what happens when each component is slow / down / inconsistent?"
+       - "Migration / rollback path — can we ship in slices, or all-at-once?"
+       - "Choose mechanisms with the lowest blast radius and lowest unique vocabulary"
+       - "Document tradeoffs of the rejected alternatives (so reviewers see the road not taken)"
+     severity: "approved / approved-with-changes / blocked"
+     description: "Staff Engineer architecture review for complex changes (lock design before plan-phase)."
+
+   plan-phase:
+     primary_cap: "gsd-plan-phase"
+     specialist: "Phase planner"
+     responsibility: |
+       Break a phase into ordered, dependency-aware tasks with explicit file
+       paths and acceptance criteria, then persist via planning-with-files
+       plugin. Adapted from GSD `/gsd-plan-phase` (Wave A research → Wave B
+       planner → Wave C plan-checker).
+     checklist:
+       - "Each task names the exact files it touches (NOT just 'auth module')"
+       - "Each task has acceptance criteria a third party can verify"
+       - "Dependencies are explicit (task N requires task M output)"
+       - "Tasks are ≤1 day each; split if larger"
+       - "Identify the verification step (test / lint / typecheck) for each task"
+       - "Persist as `task_plan.md` + `progress.md` via planning-with-files `/plan`"
+       - "Final pass: a fresh agent should be able to execute from these files alone"
+     severity: "ready-to-execute / needs-revision / blocked"
+     description: "Break a phase into ordered tasks with file paths + acceptance criteria; persist via planning-with-files."
+
+   # ============================================================================
+   # task-* (4 subs)
+   # ============================================================================
+
+   task-clarify:
+     primary_cap: "superpowers-brainstorming"
+     specialist: "Subtask spec clarifier"
+     responsibility: |
+       Surface ambiguity in a single subtask spec by asking ONE focused
+       question at a time. Fires when ≥2 approaches / core algorithm / API
+       contract / high error-cost. Skip if subtask is CRUD or already obvious.
+     checklist:
+       - "Read the subtask description; restate it in your own words to confirm"
+       - "List every assumption you would make; flag the ones the user must confirm"
+       - "Ask ONE question at a time, lowest-cost-to-answer first"
+       - "Stop asking when you have enough to write 80% of the code without guessing"
+       - "Record the resolved spec at the top of the subtask file before implementing"
+       - "If `phase.spec_ambiguous == true AND phase.no_docs == true`, request grill-me"
+     severity: "blocking-question / nice-to-know / resolved"
+     description: "Clarify subtask spec one question at a time (brainstorming + grill-with-docs on ambiguity)."
+
+   task-code:
+     primary_cap: "planning-with-files"
+     specialist: "Karpathy-discipline implementer"
+     responsibility: |
+       Implement a single subtask under the 4 karpathy principles (Think Before Coding /
+       Simplicity First / Surgical Changes / Goal-Driven Execution) with
+       ≤200 LOC per file. Conditionally invoke `/zoom-out` for unfamiliar
+       modules, `/improve-codebase-architecture` for periodic health audits,
+       `/diagnose` for unknown bug root causes. Update `progress.md` via
+       planning-with-files `/plan` when done.
+     checklist:
+       - "Before any edit: read the file you intend to change end-to-end"
+       - "Smallest change that satisfies the acceptance criteria — no scope creep"
+       - "≤200 LOC per file (split modules if growing past it)"
+       - "Trust internal code: don't re-validate already-checked inputs at every layer"
+       - "No speculative abstractions (no 'just in case' generics)"
+       - "Edit with surgical precision: full path, exact selectors, no broad rewrites"
+       - "Update progress.md before declaring done (planning-with-files `/plan`)"
+     severity: "needs-fix / done / blocked"
+     description: "Implement a subtask under the 4 karpathy principles (Think Before Coding, Simplicity First, Surgical Changes, Goal-Driven); ≤200 LOC per file."
+
+   task-test:
+     primary_cap: "tdd"
+     specialist: "TDD enforcer (red-green-refactor)"
+     responsibility: |
+       Drive red-green-refactor for core business logic / algorithms / data
+       processing / regression-risk / reliability-required subtasks. Skip
+       pure CRUD / UI polish / docs-only. On test failure, hand off to
+       `/diagnose` for systematic root-cause.
+     checklist:
+       - "Red: write ONE failing test for the smallest behavior increment; run, watch it fail"
+       - "Green: write the minimum code that makes it pass — nothing more"
+       - "Refactor: clean up duplication / clarify names — keep tests green"
+       - "Loop. Each cycle ≤10 min; if longer, the increment is too big — split"
+       - "Negative cases matter: at least 1 test per error / edge / boundary"
+       - "Test name = expected behavior, not 'test1', not 'should work'"
+       - "On unexpected failure: stop adding tests; route to `/diagnose` for root cause"
+     severity: "red / green / refactored / blocked"
+     description: "Enforce red-green-refactor TDD for core logic; `/diagnose` handoff on test failures."
+
+   task-deliver:
+     primary_cap: "ralph-loop"
+     specialist: "Completion-promise enforcer (ralph-loop COMPLETE)"
+     responsibility: |
+       Wrap the subtask in ralph-loop with `completion_promise: "COMPLETE"`
+       and `max_iterations: <N>`. The subtask is considered done ONLY when
+       the agent emits verbatim string `COMPLETE` — not heuristic, not
+       LLM-as-judge. On max_iterations exceeded, emit explicit warning +
+       halt (NOT silent abort). Then mark progress.md complete.
+     checklist:
+       - "Confirm subtask acceptance criteria are explicit and verifiable BEFORE looping"
+       - "Set `max_iterations` based on subtask size; default 20"
+       - "On loop entry, give the agent the full spec + acceptance criteria + completion promise"
+       - "If agent emits 'COMPLETE' verbatim, mark progress.md done via `/plan`"
+       - "If max_iterations exceeded, emit warning + halt; do NOT silent-continue"
+       - "If teammate communication needed / context overflow → escalate to Agent Teams"
+       - "Cleanup: SendMessage shutdown_request + TeamDelete (foolproofing checklist, mandatory)"
+     severity: "complete / max-iter-exceeded / escalated-to-teams"
+     description: "Wrap subtask in ralph-loop with verbatim COMPLETE promise; escalate to Agent Teams when needed."
+
+   # ============================================================================
+   # verify-* (8 subs)
+   # ============================================================================
+
+   verify-progress:
+     primary_cap: "gsd-verify-work"
+     specialist: "Progress / UAT verifier"
+     responsibility: |
+       Mandatory serial start of the verify stage. Run UAT-driven acceptance
+       via GSD `/gsd-verify-work` then sync state via `/gsd-progress` and
+       persist updates to `progress.md`. Order is locked: verify-work → progress.
+     checklist:
+       - "Read the phase's acceptance criteria from PLAN.md / task_plan.md"
+       - "For each criterion, demonstrate it passes (test result, manual UAT log, screenshot)"
+       - "Flag any criterion that is partial / stubbed / TODO — do NOT mark complete"
+       - "Sync ROADMAP.md / STATE.md / REQUIREMENTS.md via gsd-progress"
+       - "Append `progress.md` with completed subtask hash + verification artifact"
+       - "If acceptance is incomplete, route to bug-fix and re-verify; do not advance"
+     severity: "accepted / partial / blocked / failed"
+     description: "Mandatory verify entrypoint — UAT acceptance + ROADMAP/STATE sync + progress.md update."
+
+   verify-code-review:
+     primary_cap: "code-review"
+     specialist: "Code Reviewer (multi-agent fan-out)"
+     responsibility: |
+       Spawn parallel sonnet agents that each review the diff from a different
+       angle (CLAUDE.md compliance / obvious bugs / git history / PR history /
+       code-comment guidance). Filter findings by confidence ≥80. Adapted from
+       claude-plugins-official `code-review` plugin pattern.
+     checklist:
+       - "Read the diff against the base branch — full diff, not just summaries"
+       - "Audit against CLAUDE.md (root + any directory-level CLAUDE.md)"
+       - "Shallow scan for obvious bugs in changed lines (avoid context expansion)"
+       - "Git blame on modified regions — bugs visible only in historical context"
+       - "Previous PRs touching same files — recurring patterns / past comments"
+       - "Inline code comments / docstrings — does the change violate stated invariants?"
+       - "Score each finding 0-100; drop <80; cite file:line for kept findings"
+       - "Avoid: pre-existing issues, linter-catchable nits, lines user did not modify"
+     severity: "critical / high / medium (only findings ≥80 confidence are reported)"
+     description: "Multi-agent code review fan-out — diff vs base branch with confidence-filtered findings."
+
+   verify-paranoid:
+     primary_cap: "gstack-review"
+     specialist: "Paranoid Staff Engineer (pre-landing review)"
+     responsibility: |
+       Mandatory on critical modules (auth / payment / data migration / core
+       algorithm). Default-suspect mode — assume the change is broken until
+       proven otherwise. Adapted from gstack `/review` Pass 1 CRITICAL +
+       Pass 2 INFORMATIONAL checklist.
+     checklist:
+       - "SQL & Data Safety — string interpolation, TOCTOU races, validation bypass, N+1"
+       - "Race conditions & concurrency — read-check-write without unique constraint, missing atomic UPDATE"
+       - "LLM output trust boundary — unvalidated LLM-generated values to DB / SSRF / stored prompt injection"
+       - "Shell injection — subprocess shell=True with interpolation, os.system, eval/exec on LLM output"
+       - "Enum & value completeness — new enum/status/tier value reached every consumer (case/if-chains/allowlists)"
+       - "Async/sync mixing — sync I/O inside async def, time.sleep in async"
+       - "Column/field name safety — ORM .select/.eq columns match schema"
+       - "Type coercion at boundaries — hash/digest inputs normalized before serialize"
+       - "Time window safety — date-key lookups assuming 24h coverage; mismatched buckets between features"
+     severity: "CRITICAL / INFORMATIONAL (Fix-First Heuristic — critical → ASK, informational → AUTO-FIX)"
+     description: "Paranoid Staff Engineer pre-landing review (default-suspect mode, critical+informational two-pass)."
+
+   verify-qa:
+     primary_cap: "gstack-qa"
+     specialist: "QA Engineer (end-to-end)"
+     responsibility: |
+       Hands-on UAT for the changed surface — orient → explore → exercise
+       forms / nav / states / console / responsive. Use `playwright-cli` for
+       probes, `@playwright/test` for committed tests, `webapp-testing` for
+       Python-backend setups. Adapted from gstack `/qa`.
+     checklist:
+       - "Orient: map the application (links, framework detection, initial console errors)"
+       - "Per page: visual scan, interactive elements work, console clean, responsive check"
+       - "Forms: empty / invalid / edge cases — error messages clear and actionable"
+       - "Navigation: every path in and out works, no dead-ends"
+       - "States: empty, loading, error, overflow — none look like AI placeholder"
+       - "Mobile: 375x812 viewport — real layout, not stacked desktop"
+       - "Authenticated paths if creds / cookies provided; depth > breadth on core flows"
+     severity: "blocker / major / minor / nit"
+     description: "End-to-end QA pass — orient / explore / forms / states / responsive (depth > breadth on core flows)."
+
+   verify-security:
+     primary_cap: "gstack-cso"
+     specialist: "Chief Security Officer (CSO audit)"
+     responsibility: |
+       Conditional on `phase.has_auth_or_secrets == true`. Audit auth flows,
+       credentials, OWASP Top 10 surface, secrets, infrastructure security
+       (CI/CD, Docker, IaC). Adapted from gstack `/cso`.
+     checklist:
+       - "OWASP Top 10: injection / broken auth / sensitive data exposure / XXE / broken access control / misconfig / XSS / insecure deserialize / known-vuln deps / insufficient logging"
+       - "Secrets archaeology: git history scan for leaked credentials, .env tracked files, CI inline secrets"
+       - "Auth boundaries: every protected route enforces auth (not just CSR check); authorization not transitive across requests"
+       - "CSRF / SSRF / stored prompt injection where LLM output enters knowledge bases"
+       - "CI/CD: pull_request_target + checkout PR code, script injection via github.event.*, unpinned third-party actions"
+       - "Dockerfiles: missing USER (root), secrets as ARG, .env in image, exposed ports without purpose"
+       - "IaC: wildcard IAM, hardcoded secrets in .tfvars, privileged containers, hostNetwork in K8s"
+       - "Dependency audit (npm audit / pip-audit / bundler-audit) — note SKIPPED tools rather than fail audit"
+     severity: "CRITICAL / HIGH / MEDIUM / LOW / INFO"
+     description: "CSO security audit — OWASP Top 10 + secrets archaeology + CI/CD / Docker / IaC hardening."
+
+   verify-design:
+     primary_cap: "gstack-design-review"
+     specialist: "Design Reviewer (AI-Slop detector + design discipline)"
+     responsibility: |
+       Conditional on `phase.has_design_changes == true`. Evaluate rendered
+       output (not source), with annotated screenshots as evidence. Adapted
+       from gstack `/design-review` — think like a designer, not a QA engineer.
+     checklist:
+       - "Classifier: marketing/landing vs app UI vs hybrid — apply matching rule set"
+       - "Hard rejection: generic SaaS card grid / beautiful image weak brand / busy imagery behind text / carousel without narrative"
+       - "Litmus: brand unmistakable first screen / one strong visual anchor / scannable by headlines / one job per section"
+       - "Typography: expressive, not default stacks (Inter / Roboto / Arial / system)"
+       - "Hero: full-bleed edge-to-edge / one composition / no cards in hero"
+       - "Responsive ≠ stacked desktop on mobile — evaluate whether mobile layout makes design sense"
+       - "Quick Wins section: 3-5 highest-impact fixes <30 min each"
+       - "Every finding has a screenshot — annotated where possible (Read the file inline so user sees it)"
+     severity: "hard-reject / quick-win / nice-to-have"
+     description: "Design review — AI-Slop detection + landing/app classifier + screenshot-evidence findings."
+
+   verify-simplify:
+     primary_cap: "code-simplifier"
+     specialist: "Code Simplifier (tail step)"
+     responsibility: |
+       Last step of verify chain (`phase.is_final_step == true`) after all
+       reviews ship. Remove duplication / multi-purpose helpers / unused code
+       / over-abstraction from the diff. Keep tests passing.
+     checklist:
+       - "Look only at files changed in this phase — don't simplify unrelated code"
+       - "Duplication: same logic in 2+ places → extract once, but only if both sites benefit"
+       - "Dead code: unused exports / unreachable branches / commented-out blocks"
+       - "Magic numbers used in >1 place → named constant"
+       - "Over-abstraction: generics / interfaces with 1 implementer → inline"
+       - "Comments that lie or duplicate the code → delete (no-comments-default karpathy rule)"
+       - "Run tests after each simplification; revert if anything fails"
+     severity: "applied / candidate-flagged / skipped (too risky for final step)"
+     description: "Final-step code simplification on the phase diff (remove duplication / dead code / over-abstraction)."
+
+   verify-multispec:
+     primary_cap: "agent-teams-create"
+     specialist: "Multi-specialist Agent Team orchestrator (Pattern C)"
+     responsibility: |
+       Critical release / large refactor only. Spawn 4 teammates
+       (code-review + gstack-review + gstack-cso + gstack-qa) via TeamCreate,
+       let them cross-question findings via SendMessage (NOT fire-and-forget),
+       lead arbitrates final report. Cleanup mandatory.
+     checklist:
+       - "Token-cost gate: estimate team_cost vs 2 × subagent_cost; only escalate when team wins"
+       - "TeamCreate with 4 teammates: code-review / gstack-review / gstack-cso / gstack-qa"
+       - "Each teammate's brief is self-contained (no shared session context to lean on)"
+       - "Round-trip findings: each teammate sends top-3 findings; others rate (real / false-positive / nit)"
+       - "Lead arbitrates conflicts; produces final report ordered CRITICAL → HIGH → MEDIUM"
+       - "Cleanup MANDATORY: SendMessage shutdown_request to each teammate, then TeamDelete"
+       - "If the gate doesn't fire (regular PR), DO NOT escalate — fall back to single-agent fan-out"
+     severity: "ship-blocker / ship-with-action / informational"
+     description: "Pattern C 4-specialist Agent Team — critical-release multi-dimensional review with SendMessage cross-questioning."
+
+ # ============================================================================
+ # Multi-cap workflow notes
+ # ============================================================================
+ # discuss-strategic ships 2 capabilities (office-hours + plan-ceo-review)
+ # — primary_cap is office-hours (the entry); the role prompt covers both
+ # CEO + product lenses so a single Task spawn can do either.
+ # verify-progress ships 2 (gsd-verify-work + gsd-progress) — primary = the
+ # first one; role prompt covers both since they're sequential.
+ # task-code primary = planning-with-files (the persistent update); the role
+ # prompt is karpathy-discipline focused since the code phase has no single
+ # cmd — discipline is behavioral.
@@ -7,7 +7,7 @@ description: |
  conditional + code order 2 + test order 3 conditional + deliver order 4) + disciplines_applied
  (6 default) + tools_available (8 entry: superpowers-brainstorming + tdd + grill-with-docs +
  zoom-out + improve-codebase-architecture + diagnose + ralph-loop + planning-with-files).
- Triggered by harnessed CLI `harnessed task --subtask <text>` or slash command `/task`
+ Triggered by slash command `/task`
  (bare per ADR 0030 namespace policy D-02 LOCK) after `harnessed setup`.
  trigger_phrases:
  - "task"
@@ -55,9 +55,22 @@ Sister `workflows/capabilities.yaml`:

  ## Invocation

- - CLI: `harnessed task --subtask "<text>"`
  - Slash command: `/task <text>` (bare per ADR 0030 namespace policy D-02 LOCK after `harnessed setup`)

+ ## How to invoke
+
+ Use the Bash tool to run:
+
+ ```bash
+ echo "$ARGUMENTS" | harnessed run task --task-stdin
+ ```
+
+ If `$ARGUMENTS` is empty, run `harnessed run task` (no stdin pipe).
+
+ After completion, the Bash output prints a `Next:` hint on stderr suggesting the next stage. Decide whether to invoke based on conversation context — the hint is informational, not prescriptive.
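Reviewer note: the invocation rule added in this hunk (pipe `$ARGUMENTS` on stdin when it is non-empty, otherwise run with no pipe) could be sketched as one small shell helper. This is only an illustration under stated assumptions: `run_stage` is a hypothetical name that does not appear in the package, and the sketch relies solely on the two command forms quoted in the diff (`harnessed run <stage> --task-stdin` and `harnessed run <stage>`).

```shell
# Hypothetical helper: run_stage is NOT part of harnessed.
# It uses only the two command forms shown in the SKILL.md text.
run_stage() {
  stage="$1"
  if [ -n "${ARGUMENTS:-}" ]; then
    # Non-empty arguments: pass the task text on stdin.
    printf '%s\n' "$ARGUMENTS" | harnessed run "$stage" --task-stdin
  else
    # Empty or unset arguments: run without a stdin pipe.
    harnessed run "$stage"
  fi
}

# Example call (requires harnessed on PATH):
#   ARGUMENTS="fix the login bug"; run_stage task
```

The sketch uses `printf '%s\n'` rather than `echo` because `echo` treats backslashes and a leading `-n` differently across shells, which could alter free-form task text.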
+
+ <!-- harnessed-generated:v3.4.4 -->
+
  ## References

  - D-01 master orchestrator delegation pattern
@@ -54,32 +54,19 @@ sister CLAUDE.md "Discuss / Research 阶段" mattpocock 招式按需召唤 patte
  unconditional fire (D-05: invokes_tools and OnClause coexist but act on different scopes; invokes_tools
  is a phase-level conditional tool fire and does NOT decide whether the phase runs).

- ## CLI invocation
+ ## How to invoke

- ```bash
- # Dry-run preview — arbitrate-only, never spawns SDK.
- harnessed task-clarify --task "<text>" --dry-run --non-interactive
+ Use the Bash tool to run:

- # Apply path — real SDK spawn + 1-phase (conditional brainstorming via gate evaluation).
- harnessed task-clarify --task "<text>" --apply
+ ```bash
+ echo "$ARGUMENTS" | harnessed run task-clarify --task-stdin
  ```

- ## Forward-looking note
-
- The `trigger_phrases:` frontmatter is active after `harnessed setup` copies this
- SKILL.md to `~/.claude/skills/task-clarify/` — Claude Code then loads the slash
- command `/task-clarify` automatically (Gap B fix — sister v1.0.2 mechanism).
-
- ## How to invoke
+ If `$ARGUMENTS` is empty, run `harnessed run task-clarify` (no stdin pipe).

- Use the SlashCommand tool to run: `{{ capabilities.superpowers-brainstorming.cmd }}`
+ After completion, the Bash output prints a `Next:` hint on stderr suggesting the next stage. Decide whether to invoke based on conversation context — the hint is informational, not prescriptive.

- (If a `⚠️ ... not installed` warning was printed by `harnessed setup`, the backing
- capability is missing on disk. Install it (`claude plugin install <name>` for
- plugins, or follow the official install instructions for user-skills — e.g. for
- gstack: `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack` then
- `cd ~/.claude/skills/gstack && ./setup`), then re-run `harnessed setup` to re-render
- this SKILL.md and clear the warning.)
+ <!-- harnessed-generated:v3.4.4 -->

  ## References

@@ -60,32 +60,19 @@ per CLAUDE.md "跨 session 恢复" 模式 + R20.6 Manus-style 持久化。Plugin
  verified at `~/.claude/plugins/cache/planning-with-files/planning-with-files/2.34.0/`
  (2026-05-20).

- ## CLI invocation
+ ## How to invoke

- ```bash
- # Dry-run preview — arbitrate-only, never spawns SDK.
- harnessed task-code --task "<text>" --dry-run --non-interactive
+ Use the Bash tool to run:

- # Apply path — real SDK spawn + 2-phase chain.
- harnessed task-code --task "<text>" --apply
+ ```bash
+ echo "$ARGUMENTS" | harnessed run task-code --task-stdin
  ```

- ## Forward-looking note
-
- The `trigger_phrases:` frontmatter is active after `harnessed setup` copies this
- SKILL.md to `~/.claude/skills/task-code/` — Claude Code then loads the slash
- command `/task-code` automatically (Gap B fix — sister v1.0.2 mechanism).
-
- ## How to invoke
+ If `$ARGUMENTS` is empty, run `harnessed run task-code` (no stdin pipe).

- Use the SlashCommand tool to run: `{{ capabilities.planning-with-files.cmd }}`
+ After completion, the Bash output prints a `Next:` hint on stderr suggesting the next stage. Decide whether to invoke based on conversation context — the hint is informational, not prescriptive.

- (If a `⚠️ ... not installed` warning was printed by `harnessed setup`, the backing
- capability is missing on disk. Install it (`claude plugin install <name>` for
- plugins, or follow the official install instructions for user-skills — e.g. for
- gstack: `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack` then
- `cd ~/.claude/skills/gstack && ./setup`), then re-run `harnessed setup` to re-render
- this SKILL.md and clear the warning.)
+ <!-- harnessed-generated:v3.4.4 -->

  ## References

@@ -41,7 +41,7 @@ spawns each phase as a sub-agent via `@anthropic-ai/claude-agent-sdk` 0.3.142+.
  The ralph-loop SDK wrapper preserves the completion-promise verbatim string `"COMPLETE"`: a sub-task
  is considered complete only when its output contains the verbatim "COMPLETE" string (NOT a heuristic / NOT
  LLM-as-judge). Sister capabilities.yaml `ralph-loop` entry impl `bundled-skill` +
- `sdk_ref: src/routing/lib/ralphLoop.ts` (Phase 2.2 v0.2.0 ship).
+ `sdk_ref: src/workflow/lib/ralphLoop.ts` (Phase 2.2 v0.2.0 ship).
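Reviewer note: the completion criterion described in this paragraph (a sub-task counts as done only if its output contains the verbatim string `COMPLETE`, with no heuristic and no LLM judge) amounts to a fixed-string match. The sketch below is an assumption-laden illustration in shell, not the code in `ralphLoop.ts`; `is_complete` is a hypothetical name.

```shell
# Hypothetical sketch: is_complete is NOT harnessed's implementation.
# Succeeds only if the given output contains the literal string COMPLETE.
is_complete() {
  # -F: treat the pattern as a fixed string, -q: report via exit status only.
  printf '%s\n' "$1" | grep -qF 'COMPLETE'
}

if is_complete "phase 2 finished: COMPLETE"; then
  echo "sub-task done"
fi
```

A plain substring check like this is case-sensitive, so lowercase "complete" does not match, but it would also accept "INCOMPLETE". The diff does not say whether the real wrapper applies a stricter match.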

  ### Parallelism — ralph-loop orthogonal wrapper

@@ -82,32 +82,19 @@ in `progress.md` — sister Phase 01-code progress update pattern, last call in
  ③ task chain. Plugin path `~/.claude/plugins/cache/planning-with-files/
  planning-with-files/2.34.0/` verified (2026-05-20).

- ## CLI invocation
+ ## How to invoke

- ```bash
- # Dry-run preview — arbitrate-only, never spawns SDK.
- harnessed task-deliver --task "<text>" --dry-run --non-interactive
+ Use the Bash tool to run:

- # Apply path — real SDK spawn + 2-phase chain (ralph-loop COMPLETE + progress mark).
- harnessed task-deliver --task "<text>" --apply
+ ```bash
+ echo "$ARGUMENTS" | harnessed run task-deliver --task-stdin
  ```

- ## Forward-looking note
-
- The `trigger_phrases:` frontmatter is active after `harnessed setup` copies this
- SKILL.md to `~/.claude/skills/task-deliver/` — Claude Code then loads the slash
- command `/task-deliver` automatically (Gap B fix — sister v1.0.2 mechanism).
-
- ## How to invoke
+ If `$ARGUMENTS` is empty, run `harnessed run task-deliver` (no stdin pipe).

- Use the SlashCommand tool to run: `{{ capabilities.ralph-loop.cmd }}`
+ After completion, the Bash output prints a `Next:` hint on stderr suggesting the next stage. Decide whether to invoke based on conversation context — the hint is informational, not prescriptive.

- (If a `⚠️ ... not installed` warning was printed by `harnessed setup`, the backing
- capability is missing on disk. Install it (`claude plugin install <name>` for
- plugins, or follow the official install instructions for user-skills — e.g. for
- gstack: `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack` then
- `cd ~/.claude/skills/gstack && ./setup`), then re-run `harnessed setup` to re-render
- this SKILL.md and clear the warning.)
+ <!-- harnessed-generated:v3.4.4 -->

  ## References
113
100