harnessed 3.4.3 → 3.4.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (34)
  1. package/README.md +3 -0
  2. package/dist/cli.mjs +1084 -745
  3. package/dist/cli.mjs.map +1 -1
  4. package/dist/index.mjs +1 -1
  5. package/dist/index.mjs.map +1 -1
  6. package/package.json +1 -1
  7. package/workflows/auto/SKILL.md +10 -4
  8. package/workflows/capabilities.yaml +1 -1
  9. package/workflows/discuss/auto/SKILL.md +9 -4
  10. package/workflows/discuss/phase/SKILL.md +11 -29
  11. package/workflows/discuss/strategic/SKILL.md +11 -31
  12. package/workflows/discuss/subtask/SKILL.md +11 -29
  13. package/workflows/execute-task/SKILL.md +7 -6
  14. package/workflows/execute-task/workflow.yaml +93 -0
  15. package/workflows/plan/architecture/SKILL.md +11 -31
  16. package/workflows/plan/auto/SKILL.md +9 -4
  17. package/workflows/plan/phase/SKILL.md +11 -31
  18. package/workflows/research/SKILL.md +14 -1
  19. package/workflows/retro/SKILL.md +11 -29
  20. package/workflows/task/auto/SKILL.md +9 -4
  21. package/workflows/task/clarify/SKILL.md +11 -29
  22. package/workflows/task/code/SKILL.md +11 -31
  23. package/workflows/task/deliver/SKILL.md +12 -32
  24. package/workflows/task/test/SKILL.md +11 -31
  25. package/workflows/verify/auto/SKILL.md +9 -4
  26. package/workflows/verify/code-review/SKILL.md +11 -33
  27. package/workflows/verify/design/SKILL.md +11 -33
  28. package/workflows/verify/multispec/SKILL.md +11 -31
  29. package/workflows/verify/paranoid/SKILL.md +11 -35
  30. package/workflows/verify/progress/SKILL.md +11 -29
  31. package/workflows/verify/qa/SKILL.md +11 -31
  32. package/workflows/verify/security/SKILL.md +11 -33
  33. package/workflows/verify/simplify/SKILL.md +11 -31
  34. package/workflows/execute-task/phases.yaml +0 -73
@@ -54,37 +54,19 @@ sister CLAUDE.md "Discuss / Research 阶段" mattpocock 招式按需召唤 patte
54
54
  unconditional fire (D-05: invokes_tools and OnClause coexist but act on different scopes; invokes_tools
55
55
  is a phase-level conditional tool fire and does NOT decide whether the phase runs).
56
56
 
57
- <!-- v3.4.3-dual-path-invocation -->
58
57
  ## How to invoke
59
58
 
60
- **Preferred path** (when the upstream specialist is installed): use the SlashCommand tool to run `{{ capabilities.superpowers-brainstorming.cmd }}` — the upstream specialist takes over.
61
-
62
- **Fallback path** (when the upstream isn't installed or returns no result): use the Task tool to spawn a general-purpose subagent with this prompt:
63
-
64
- > You are a **Subtask spec clarifier**.
65
- >
66
- > **Mission**: Surface ambiguity in a single subtask spec by asking ONE focused question at a time. Fires when ≥2 approaches / core algorithm / API contract / high error-cost. Skip if subtask is CRUD or already obvious.
67
- >
68
- > **Default-suspect mode**: assume the change is broken / risky / incomplete until proven otherwise. Cite `file:line` for every finding; do not generalize.
69
- >
70
- > **Review checklist**:
71
- > 1. Read the subtask description; restate it in your own words to confirm
72
- >
73
- > 2. List every assumption you would make; flag the ones the user must confirm
74
- >
75
- > 3. Ask ONE question at a time, lowest-cost-to-answer first
76
- >
77
- > 4. Stop asking when you have enough to write 80% of the code without guessing
78
- >
79
- > 5. Record the resolved spec at the top of the subtask file before implementing
80
- >
81
- > 6. If `phase.spec_ambiguous == true AND phase.no_docs == true`, request grill-me
82
- >
83
- > **Output format**: structured report with severity-classified findings (blocking-question / nice-to-know / resolved). One finding per line: `[severity] file:line — problem (one sentence); fix: suggested change`. If no findings, say so explicitly. No preamble, no end-of-report summary.
84
-
85
- (Role prompt is self-contained — works even when the upstream `superpowers-brainstorming` user-skill / plugin isn't installed.)
86
-
87
- (Sister `~/.claude/commands/task-clarify.md` is also generated by `harnessed setup` so `/task-clarify` is a real platform slash command — both files carry the same dual-path instruction. Previous v3.4.x `harnessed task-clarify --apply` CLI claims are removed; that subcommand was never implemented.)
59
+ Use the Bash tool to run:
60
+
61
+ ```bash
62
+ echo "$ARGUMENTS" | harnessed run task-clarify --task-stdin
63
+ ```
64
+
65
+ If `$ARGUMENTS` is empty, run `harnessed run task-clarify` (no stdin pipe).
66
+
67
+ After completion, the Bash output prints a `Next:` hint on stderr suggesting the next stage. Decide whether to invoke based on conversation context — the hint is informational, not prescriptive.
68
+
69
+ <!-- harnessed-generated:v3.4.4 -->
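The two invocation rules above (pipe `$ARGUMENTS` on stdin when it is non-empty, otherwise run without a pipe) can be sketched as one shell helper. This is a minimal sketch, not part of the package: `run_stage` and the `HARNESSED_BIN` override are hypothetical names introduced for illustration; only `harnessed run <stage>` and `--task-stdin` come from the diff.

```bash
# Sketch only: run_stage and HARNESSED_BIN are hypothetical, not harnessed APIs.
# Usage: run_stage <stage-name> [task-text]
run_stage() {
  stage="$1"
  task="${2-}"
  bin="${HARNESSED_BIN:-harnessed}"
  if [ -n "$task" ]; then
    # Non-empty arguments: feed the task text to the stage on stdin.
    printf '%s\n' "$task" | "$bin" run "$stage" --task-stdin
  else
    # Empty arguments: plain invocation, no stdin pipe.
    "$bin" run "$stage"
  fi
}
```

Calling `run_stage task-clarify "$ARGUMENTS"` would then cover both cases. `printf` is used here instead of `echo` only because it does not interpret leading dashes or backslashes in the task text.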
88
70
 
89
71
  ## References
90
72
 
@@ -60,39 +60,19 @@ per CLAUDE.md "跨 session 恢复" 模式 + R20.6 Manus-style 持久化。Plugin
60
60
  verified at `~/.claude/plugins/cache/planning-with-files/planning-with-files/2.34.0/`
61
61
  (2026-05-20).
62
62
 
63
- <!-- v3.4.3-dual-path-invocation -->
64
63
  ## How to invoke
65
64
 
66
- **Preferred path** (when the upstream specialist is installed): use the SlashCommand tool to run `{{ capabilities.planning-with-files.cmd }}` — the upstream specialist takes over.
67
-
68
- **Fallback path** (when the upstream isn't installed or returns no result): use the Task tool to spawn a general-purpose subagent with this prompt:
69
-
70
- > You are a **Karpathy-discipline implementer**.
71
- >
72
- > **Mission**: Implement a single subtask under the 4 karpathy principles (Think Before Coding / Simplicity First / Surgical Changes / Goal-Driven Execution) with ≤200 LOC per file. Conditionally invoke `/zoom-out` for unfamiliar modules, `/improve-codebase-architecture` for periodic health audits, `/diagnose` for unknown bug root causes. Update `progress.md` via planning-with-files `/plan` when done.
73
- >
74
- > **Default-suspect mode**: assume the change is broken / risky / incomplete until proven otherwise. Cite `file:line` for every finding; do not generalize.
75
- >
76
- > **Review checklist**:
77
- > 1. Before any edit: read the file you intend to change end-to-end
78
- >
79
- > 2. Smallest change that satisfies the acceptance criteria — no scope creep
80
- >
81
- > 3. ≤200 LOC per file (split modules if growing past it)
82
- >
83
- > 4. Trust internal code: don't re-validate already-checked inputs at every layer
84
- >
85
- > 5. No speculative abstractions (no 'just in case' generics)
86
- >
87
- > 6. Edit with surgical precision: full path, exact selectors, no broad rewrites
88
- >
89
- > 7. Update progress.md before declaring done (planning-with-files `/plan`)
90
- >
91
- > **Output format**: structured report with severity-classified findings (needs-fix / done / blocked). One finding per line: `[severity] file:line — problem (one sentence); fix: suggested change`. If no findings, say so explicitly. No preamble, no end-of-report summary.
92
-
93
- (Role prompt is self-contained — works even when the upstream `planning-with-files` user-skill / plugin isn't installed.)
94
-
95
- (Sister `~/.claude/commands/task-code.md` is also generated by `harnessed setup` so `/task-code` is a real platform slash command — both files carry the same dual-path instruction. Previous v3.4.x `harnessed task-code --apply` CLI claims are removed; that subcommand was never implemented.)
65
+ Use the Bash tool to run:
66
+
67
+ ```bash
68
+ echo "$ARGUMENTS" | harnessed run task-code --task-stdin
69
+ ```
70
+
71
+ If `$ARGUMENTS` is empty, run `harnessed run task-code` (no stdin pipe).
72
+
73
+ After completion, the Bash output prints a `Next:` hint on stderr suggesting the next stage. Decide whether to invoke based on conversation context — the hint is informational, not prescriptive.
74
+
75
+ <!-- harnessed-generated:v3.4.4 -->
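Because the `Next:` hint is printed on stderr, a caller that wants to inspect it has to capture stderr separately from stdout. The following is a minimal sketch, assuming the hint is a single line that begins with `Next:` (the diff does not show the exact format); `extract_next_hint` is a hypothetical helper, not something the package provides.

```bash
# Sketch only: assumes the hint is one line starting with "Next:".
# Usage: extract_next_hint <file-with-captured-stderr>
extract_next_hint() {
  # Print the text after "Next:" from the first matching line, if any.
  sed -n 's/^Next:[[:space:]]*//p' "$1" | head -n 1
}
```

One way to use it would be to run `harnessed run task-code 2>stderr.log` and then call `extract_next_hint stderr.log`. An empty result means no hint was found, and the caller still decides whether to act on it.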
96
76
 
97
77
  ## References
98
78
 
@@ -41,7 +41,7 @@ spawns each phase as a sub-agent via `@anthropic-ai/claude-agent-sdk` 0.3.142+.
41
41
  ralph-loop SDK wrapper preserves the completion-promise verbatim string `"COMPLETE"`: a sub-task
42
42
  is considered complete only when its output contains the verbatim "COMPLETE" string (NOT heuristic / NOT
43
43
  LLM-as-judge). Sister capabilities.yaml `ralph-loop` entry impl `bundled-skill` +
44
- `sdk_ref: src/routing/lib/ralphLoop.ts` (Phase 2.2 v0.2.0 ship)。
44
+ `sdk_ref: src/workflow/lib/ralphLoop.ts` (Phase 2.2 v0.2.0 ship)。
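The completion criterion described above is a plain substring test, which can be sketched in shell. This is an illustration under assumptions, not the code in `ralphLoop.ts`: `loop_until_complete` is a hypothetical helper, and the iteration cap with a warning on exhaustion is an assumed policy.

```bash
# Sketch only: hypothetical helper, not the ralph-loop SDK wrapper.
# Usage: loop_until_complete <max-iterations> <command> [args...]
loop_until_complete() {
  max="$1"
  shift
  i=1
  while [ "$i" -le "$max" ]; do
    # Capture the command's combined output; a failing run just means "try again".
    out="$("$@" 2>&1)" || true
    case "$out" in
      # Case-sensitive, verbatim match: no heuristics, no LLM-as-judge.
      *COMPLETE*) echo "complete after $i iteration(s)"; return 0 ;;
    esac
    i=$((i + 1))
  done
  # Assumed policy: warn loudly and halt instead of aborting silently.
  echo "warning: max_iterations ($max) exceeded without COMPLETE" >&2
  return 1
}
```

A lowercase `complete` in the output does not satisfy the check, which is the point of a verbatim promise.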
45
45
 
46
46
  ### Parallelism: ralph-loop orthogonal wrapper
47
47
 
@@ -82,39 +82,19 @@ in `progress.md` — sister Phase 01-code progress update pattern, last call in
82
82
  ③ task chain。Plugin path `~/.claude/plugins/cache/planning-with-files/
83
83
  planning-with-files/2.34.0/` verified (2026-05-20)。
84
84
 
85
- <!-- v3.4.3-dual-path-invocation -->
86
85
  ## How to invoke
87
86
 
88
- **Preferred path** (when the upstream specialist is installed): use the SlashCommand tool to run `{{ capabilities.ralph-loop.cmd }}` — the upstream specialist takes over.
89
-
90
- **Fallback path** (when the upstream isn't installed or returns no result): use the Task tool to spawn a general-purpose subagent with this prompt:
91
-
92
- > You are a **Completion-promise enforcer (ralph-loop COMPLETE)**.
93
- >
94
- > **Mission**: Wrap the subtask in ralph-loop with `completion_promise: "COMPLETE"` and `max_iterations: <N>`. The subtask is considered done ONLY when the agent emits verbatim string `COMPLETE` not heuristic, not LLM-as-judge. On max_iterations exceeded, emit explicit warning + halt (NOT silent abort). Then mark progress.md complete.
95
- >
96
- > **Default-suspect mode**: assume the change is broken / risky / incomplete until proven otherwise. Cite `file:line` for every finding; do not generalize.
97
- >
98
- > **Review checklist**:
99
- > 1. Confirm subtask acceptance criteria are explicit and verifiable BEFORE looping
100
- >
101
- > 2. Set `max_iterations` based on subtask size; default 20
102
- >
103
- > 3. On loop entry, give the agent the full spec + acceptance criteria + completion promise
104
- >
105
- > 4. If agent emits 'COMPLETE' verbatim, mark progress.md done via `/plan`
106
- >
107
- > 5. If max_iterations exceeded, emit warning + halt; do NOT silent-continue
108
- >
109
- > 6. If teammate communication needed / context overflow → escalate to Agent Teams
110
- >
111
- > 7. Cleanup: SendMessage shutdown_request + TeamDelete (foolproofing checklist, mandatory)
112
- >
113
- > **Output format**: structured report with severity-classified findings (complete / max-iter-exceeded / escalated-to-teams). One finding per line: `[severity] file:line — problem (one sentence); fix: suggested change`. If no findings, say so explicitly. No preamble, no end-of-report summary.
114
-
115
- (Role prompt is self-contained — works even when the upstream `ralph-loop` user-skill / plugin isn't installed.)
116
-
117
- (Sister `~/.claude/commands/task-deliver.md` is also generated by `harnessed setup` so `/task-deliver` is a real platform slash command — both files carry the same dual-path instruction. Previous v3.4.x `harnessed task-deliver --apply` CLI claims are removed; that subcommand was never implemented.)
87
+ Use the Bash tool to run:
88
+
89
+ ```bash
90
+ echo "$ARGUMENTS" | harnessed run task-deliver --task-stdin
91
+ ```
92
+
93
+ If `$ARGUMENTS` is empty, run `harnessed run task-deliver` (no stdin pipe).
94
+
95
+ After completion, the Bash output prints a `Next:` hint on stderr suggesting the next stage. Decide whether to invoke based on conversation context — the hint is informational, not prescriptive.
96
+
97
+ <!-- harnessed-generated:v3.4.4 -->
118
98
 
119
99
  ## References
120
100
 
@@ -63,39 +63,19 @@ Phase 01-test 条件性 fire `diagnose` (capabilities.yaml L55-64 mattpocock-ski
63
63
  on test failure, enter the diagnose loop (reproduce → minimise → hypothesise → instrument →
64
64
  fix → regression-test); if tests pass, skip diagnose entirely.
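The conditional routing above (diagnose only on test failure) can be sketched as a small shell helper. This is a hedged illustration: `route_after_tests` is a hypothetical name, and it only reports which path would be taken; how each diagnose stage is actually run is not shown in the diff.

```bash
# Sketch only: hypothetical helper that reports the route, it does not run diagnose.
# Usage: route_after_tests <test-command> [args...]
route_after_tests() {
  if "$@" >/dev/null 2>&1; then
    # Tests passed: the diagnose loop is skipped entirely.
    echo "skip-diagnose"
  else
    # Tests failed: list the diagnose stages in the order given in the text above.
    echo "diagnose: reproduce minimise hypothesise instrument fix regression-test"
  fi
}
```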
65
65
 
66
- <!-- v3.4.3-dual-path-invocation -->
67
66
  ## How to invoke
68
67
 
69
- **Preferred path** (when the upstream specialist is installed): use the SlashCommand tool to run `{{ capabilities.tdd.cmd }}` — the upstream specialist takes over.
70
-
71
- **Fallback path** (when the upstream isn't installed or returns no result): use the Task tool to spawn a general-purpose subagent with this prompt:
72
-
73
- > You are a **TDD enforcer (red-green-refactor)**.
74
- >
75
- > **Mission**: Drive red-green-refactor for core business logic / algorithms / data processing / regression-risk / reliability-required subtasks. Skip pure CRUD / UI polish / docs-only. On test failure, hand off to `/diagnose` for systematic root-cause.
76
- >
77
- > **Default-suspect mode**: assume the change is broken / risky / incomplete until proven otherwise. Cite `file:line` for every finding; do not generalize.
78
- >
79
- > **Review checklist**:
80
- > 1. Red: write ONE failing test for the smallest behavior increment; run, watch it fail
81
- >
82
- > 2. Green: write the minimum code that makes it pass — nothing more
83
- >
84
- > 3. Refactor: clean up duplication / clarify names — keep tests green
85
- >
86
- > 4. Loop. Each cycle ≤10 min; if longer, the increment is too big — split
87
- >
88
- > 5. Negative cases matter: at least 1 test per error / edge / boundary
89
- >
90
- > 6. Test name = expected behavior, not 'test1', not 'should work'
91
- >
92
- > 7. On unexpected failure: stop adding tests; route to `/diagnose` for root cause
93
- >
94
- > **Output format**: structured report with severity-classified findings (red / green / refactored / blocked). One finding per line: `[severity] file:line — problem (one sentence); fix: suggested change`. If no findings, say so explicitly. No preamble, no end-of-report summary.
95
-
96
- (Role prompt is self-contained — works even when the upstream `tdd` user-skill / plugin isn't installed.)
97
-
98
- (Sister `~/.claude/commands/task-test.md` is also generated by `harnessed setup` so `/task-test` is a real platform slash command — both files carry the same dual-path instruction. Previous v3.4.x `harnessed task-test --apply` CLI claims are removed; that subcommand was never implemented.)
68
+ Use the Bash tool to run:
69
+
70
+ ```bash
71
+ echo "$ARGUMENTS" | harnessed run task-test --task-stdin
72
+ ```
73
+
74
+ If `$ARGUMENTS` is empty, run `harnessed run task-test` (no stdin pipe).
75
+
76
+ After completion, the Bash output prints a `Next:` hint on stderr suggesting the next stage. Decide whether to invoke based on conversation context — the hint is informational, not prescriptive.
77
+
78
+ <!-- harnessed-generated:v3.4.4 -->
99
79
 
100
80
  ## References
101
81
 
@@ -64,14 +64,19 @@ Sister `workflows/capabilities.yaml`:
64
64
 
65
65
  - Slash command: `/verify` (bare per ADR 0030 namespace policy D-02 LOCK after `harnessed setup`)
66
66
 
67
- <!-- v3.4.3-dual-path-invocation -->
68
67
  ## How to invoke
69
68
 
70
- **Preferred path** (master orchestrator): dispatch to the per-sub-workflow slash commands in the order this stage prescribes. Each sub command lives at `~/.claude/commands/<sub-name>.md` with its own dual-path fallback.
69
+ Use the Bash tool to run:
71
70
 
72
- **Fallback path** (when no slash command from the sub-list resolves): run each missing sub-workflow inline using its own role prompt from `~/.claude/skills/<sub-name>/SKILL.md`. Do NOT skip stages silently — each sub either runs or is logged as "skipped: <reason>".
71
+ ```bash
72
+ echo "$ARGUMENTS" | harnessed run verify --task-stdin
73
+ ```
73
74
 
74
- (Sister `~/.claude/commands/verify.md` is also generated by `harnessed setup` so `/verify` is a real platform slash command — both files carry the same dual-path instruction. Previous v3.4.x `harnessed verify --apply` CLI claims are removed; that subcommand was never implemented.)
75
+ If `$ARGUMENTS` is empty, run `harnessed run verify` (no stdin pipe).
76
+
77
+ After completion, the Bash output prints a `Next:` hint on stderr suggesting the next stage. Decide whether to invoke based on conversation context — the hint is informational, not prescriptive.
78
+
79
+ <!-- harnessed-generated:v3.4.4 -->
75
80
 
76
81
  ## References
77
82
 
@@ -48,41 +48,19 @@ Sister `workflows/judgments/parallelism-gate.yaml`:
48
48
  Always fires when `phase.stage == 'verify'`: after the mandatory serial step (verify-progress), fan out in parallel. No skip
49
49
  condition: multi-agent code-review is the default fan-out of verify-work's 3rd phase (sister CLAUDE.md verbatim).
50
50
 
51
- <!-- v3.4.3-dual-path-invocation -->
52
51
  ## How to invoke
53
52
 
54
- **Preferred path** (when the upstream specialist is installed): use the SlashCommand tool to run `{{ capabilities.code-review.cmd }}` — the upstream specialist takes over.
55
-
56
- **Fallback path** (when the upstream isn't installed or returns no result): use the Task tool to spawn a general-purpose subagent with this prompt:
57
-
58
- > You are a **Code Reviewer (multi-agent fan-out)**.
59
- >
60
- > **Mission**: Spawn parallel sonnet agents that each review the diff from a different angle (CLAUDE.md compliance / obvious bugs / git history / PR history / code-comment guidance). Filter findings by confidence ≥80. Adapted from claude-plugins-official `code-review` plugin pattern.
61
- >
62
- > **Default-suspect mode**: assume the change is broken / risky / incomplete until proven otherwise. Cite `file:line` for every finding; do not generalize.
63
- >
64
- > **Review checklist**:
65
- > 1. Read the diff against the base branch — full diff, not just summaries
66
- >
67
- > 2. Audit against CLAUDE.md (root + any directory-level CLAUDE.md)
68
- >
69
- > 3. Shallow scan for obvious bugs in changed lines (avoid context expansion)
70
- >
71
- > 4. Git blame on modified regions — bugs visible only in historical context
72
- >
73
- > 5. Previous PRs touching same files — recurring patterns / past comments
74
- >
75
- > 6. Inline code comments / docstrings — does the change violate stated invariants?
76
- >
77
- > 7. Score each finding 0-100; drop <80; cite file:line for kept findings
78
- >
79
- > 8. Avoid: pre-existing issues, linter-catchable nits, lines user did not modify
80
- >
81
- > **Output format**: structured report with severity-classified findings (critical / high / medium (only findings ≥80 confidence are reported)). One finding per line: `[severity] file:line — problem (one sentence); fix: suggested change`. If no findings, say so explicitly. No preamble, no end-of-report summary.
82
-
83
- (Role prompt is self-contained — works even when the upstream `code-review` user-skill / plugin isn't installed.)
84
-
85
- (Sister `~/.claude/commands/verify-code-review.md` is also generated by `harnessed setup` so `/verify-code-review` is a real platform slash command — both files carry the same dual-path instruction. Previous v3.4.x `harnessed verify-code-review --apply` CLI claims are removed; that subcommand was never implemented.)
53
+ Use the Bash tool to run:
54
+
55
+ ```bash
56
+ echo "$ARGUMENTS" | harnessed run verify-code-review --task-stdin
57
+ ```
58
+
59
+ If `$ARGUMENTS` is empty, run `harnessed run verify-code-review` (no stdin pipe).
60
+
61
+ After completion, the Bash output prints a `Next:` hint on stderr suggesting the next stage. Decide whether to invoke based on conversation context — the hint is informational, not prescriptive.
62
+
63
+ <!-- harnessed-generated:v3.4.4 -->
86
64
 
87
65
  ## References
88
66
 
@@ -51,41 +51,19 @@ Sister `workflows/judgments/stage-routing.yaml`:
51
51
  - Creative additions / avoiding an AI-generated feel → `frontend-design`
52
52
  - User explicitly asks for "unique / no AI feel" → frontend-design leads; otherwise ui-ux-pro-max takes priority
53
53
 
54
- <!-- v3.4.3-dual-path-invocation -->
55
54
  ## How to invoke
56
55
 
57
- **Preferred path** (when the upstream specialist is installed): use the SlashCommand tool to run `{{ capabilities.gstack-design-review.cmd }}` — the upstream specialist takes over.
58
-
59
- **Fallback path** (when the upstream isn't installed or returns no result): use the Task tool to spawn a general-purpose subagent with this prompt:
60
-
61
- > You are a **Design Reviewer (AI-Slop detector + design discipline)**.
62
- >
63
- > **Mission**: Conditional on `phase.has_design_changes == true`. Evaluate rendered output (not source), with annotated screenshots as evidence. Adapted from gstack `/design-review` think like a designer, not a QA engineer.
64
- >
65
- > **Default-suspect mode**: assume the change is broken / risky / incomplete until proven otherwise. Cite `file:line` for every finding; do not generalize.
66
- >
67
- > **Review checklist**:
68
- > 1. Classifier: marketing/landing vs app UI vs hybrid — apply matching rule set
69
- >
70
- > 2. Hard rejection: generic SaaS card grid / beautiful image weak brand / busy imagery behind text / carousel without narrative
71
- >
72
- > 3. Litmus: brand unmistakable first screen / one strong visual anchor / scannable by headlines / one job per section
73
- >
74
- > 4. Typography: expressive, not default stacks (Inter / Roboto / Arial / system)
75
- >
76
- > 5. Hero: full-bleed edge-to-edge / one composition / no cards in hero
77
- >
78
- > 6. Responsive ≠ stacked desktop on mobile — evaluate whether mobile layout makes design sense
79
- >
80
- > 7. Quick Wins section: 3-5 highest-impact fixes <30 min each
81
- >
82
- > 8. Every finding has a screenshot — annotated where possible (Read the file inline so user sees it)
83
- >
84
- > **Output format**: structured report with severity-classified findings (hard-reject / quick-win / nice-to-have). One finding per line: `[severity] file:line — problem (one sentence); fix: suggested change`. If no findings, say so explicitly. No preamble, no end-of-report summary.
85
-
86
- (Role prompt is self-contained — works even when the upstream `gstack-design-review` user-skill / plugin isn't installed.)
87
-
88
- (Sister `~/.claude/commands/verify-design.md` is also generated by `harnessed setup` so `/verify-design` is a real platform slash command — both files carry the same dual-path instruction. Previous v3.4.x `harnessed verify-design --apply` CLI claims are removed; that subcommand was never implemented.)
56
+ Use the Bash tool to run:
57
+
58
+ ```bash
59
+ echo "$ARGUMENTS" | harnessed run verify-design --task-stdin
60
+ ```
61
+
62
+ If `$ARGUMENTS` is empty, run `harnessed run verify-design` (no stdin pipe).
63
+
64
+ After completion, the Bash output prints a `Next:` hint on stderr suggesting the next stage. Decide whether to invoke based on conversation context — the hint is informational, not prescriptive.
65
+
66
+ <!-- harnessed-generated:v3.4.4 -->
89
67
 
90
68
  ## References
91
69
 
@@ -64,39 +64,19 @@ Phase-level `on` clause (critical-release 升级触发):
64
64
  - **Token estimate prereq**: `team_cost < 2 × subagent_cost` (engine-level check per agent-teams.md L34)
65
65
  - **Cleanup mandatory**: phase 02-team-cleanup `agent-teams-shutdown` must run (foolproofing checklist)
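The token prerequisite in the list above is a single inequality, sketched here in shell. This is only an illustration of the comparison, assuming both costs are whole-number token estimates; the real check is engine-level and its implementation is not shown in the diff. `team_cost_gate_passes` is a hypothetical name.

```bash
# Sketch only: hypothetical helper illustrating team_cost < 2 x subagent_cost.
# Usage: team_cost_gate_passes <team-cost-tokens> <subagent-cost-tokens>
team_cost_gate_passes() {
  team_cost="$1"
  subagent_cost="$2"
  # Succeeds (exit 0) only when the team is cheaper than twice the subagent cost.
  [ "$team_cost" -lt "$((2 * subagent_cost))" ]
}
```

With a subagent estimate of 100 tokens, a team estimate of 150 passes the gate, while 200 does not because the comparison is strict.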
66
66
 
67
- <!-- v3.4.3-dual-path-invocation -->
68
67
  ## How to invoke
69
68
 
70
- **Preferred path** (when the upstream specialist is installed): use the SlashCommand tool to run `{{ capabilities.agent-teams-create.cmd }}` — the upstream specialist takes over.
71
-
72
- **Fallback path** (when the upstream isn't installed or returns no result): use the Task tool to spawn a general-purpose subagent with this prompt:
73
-
74
- > You are a **Multi-specialist Agent Team orchestrator (Pattern C)**.
75
- >
76
- > **Mission**: Critical release / large refactor only. Spawn 4 teammates (code-review + gstack-review + gstack-cso + gstack-qa) via TeamCreate, let them cross-question findings via SendMessage (NOT fire-and-forget), lead arbitrates final report. Cleanup mandatory.
77
- >
78
- > **Default-suspect mode**: assume the change is broken / risky / incomplete until proven otherwise. Cite `file:line` for every finding; do not generalize.
79
- >
80
- > **Review checklist**:
81
- > 1. Token-cost gate: estimate team_cost vs 2 × subagent_cost; only escalate when team wins
82
- >
83
- > 2. TeamCreate with 4 teammates: code-review / gstack-review / gstack-cso / gstack-qa
84
- >
85
- > 3. Each teammate's brief is self-contained (no shared session context to lean on)
86
- >
87
- > 4. Round-trip findings: each teammate sends top-3 findings; others rate (real / false-positive / nit)
88
- >
89
- > 5. Lead arbitrates conflicts; produces final report ordered CRITICAL → HIGH → MEDIUM
90
- >
91
- > 6. Cleanup MANDATORY: SendMessage shutdown_request to each teammate, then TeamDelete
92
- >
93
- > 7. If the gate doesn't fire (regular PR), DO NOT escalate — fall back to single-agent fan-out
94
- >
95
- > **Output format**: structured report with severity-classified findings (ship-blocker / ship-with-action / informational). One finding per line: `[severity] file:line — problem (one sentence); fix: suggested change`. If no findings, say so explicitly. No preamble, no end-of-report summary.
96
-
97
- (Role prompt is self-contained — works even when the upstream `agent-teams-create` user-skill / plugin isn't installed.)
98
-
99
- (Sister `~/.claude/commands/verify-multispec.md` is also generated by `harnessed setup` so `/verify-multispec` is a real platform slash command — both files carry the same dual-path instruction. Previous v3.4.x `harnessed verify-multispec --apply` CLI claims are removed; that subcommand was never implemented.)
69
+ Use the Bash tool to run:
70
+
71
+ ```bash
72
+ echo "$ARGUMENTS" | harnessed run verify-multispec --task-stdin
73
+ ```
74
+
75
+ If `$ARGUMENTS` is empty, run `harnessed run verify-multispec` (no stdin pipe).
76
+
77
+ After completion, the Bash output prints a `Next:` hint on stderr suggesting the next stage. Decide whether to invoke based on conversation context — the hint is informational, not prescriptive.
78
+
79
+ <!-- harnessed-generated:v3.4.4 -->
100
80
 
101
81
  ## References
102
82
 
@@ -50,43 +50,19 @@ Sister `workflows/judgments/stage-routing.yaml`:
50
50
  - ✅ **Trigger**: before a PR on a critical module (auth / payment / data migration / core algorithm, etc.)
51
51
  - ❌ **Skip**: routine PR / docs / config / non-core module
52
52
 
53
- <!-- v3.4.3-dual-path-invocation -->
54
53
  ## How to invoke
55
54
 
56
- **Preferred path** (when the upstream specialist is installed): use the SlashCommand tool to run `{{ capabilities.gstack-review.cmd }}` — the upstream specialist takes over.
57
-
58
- **Fallback path** (when the upstream isn't installed or returns no result): use the Task tool to spawn a general-purpose subagent with this prompt:
59
-
60
- > You are a **Paranoid Staff Engineer (pre-landing review)**.
61
- >
62
- > **Mission**: Mandatory on critical modules (auth / payment / data migration / core algorithm). Default-suspect mode — assume the change is broken until proven otherwise. Adapted from gstack `/review` Pass 1 CRITICAL + Pass 2 INFORMATIONAL checklist.
63
- >
64
- > **Default-suspect mode**: assume the change is broken / risky / incomplete until proven otherwise. Cite `file:line` for every finding; do not generalize.
65
- >
66
- > **Review checklist**:
67
- > 1. SQL & Data Safety — string interpolation, TOCTOU races, validation bypass, N+1
68
- >
69
- > 2. Race conditions & concurrency — read-check-write without unique constraint, missing atomic UPDATE
70
- >
71
- > 3. LLM output trust boundary — unvalidated LLM-generated values to DB / SSRF / stored prompt injection
72
- >
73
- > 4. Shell injection — subprocess shell=True with interpolation, os.system, eval/exec on LLM output
74
- >
75
- > 5. Enum & value completeness — new enum/status/tier value reached every consumer (case/if-chains/allowlists)
76
- >
77
- > 6. Async/sync mixing — sync I/O inside async def, time.sleep in async
78
- >
79
- > 7. Column/field name safety — ORM .select/.eq columns match schema
80
- >
81
- > 8. Type coercion at boundaries — hash/digest inputs normalized before serialize
82
- >
83
- > 9. Time window safety — date-key lookups assuming 24h coverage; mismatched buckets between features
84
- >
85
- > **Output format**: structured report with severity-classified findings (CRITICAL / INFORMATIONAL (Fix-First Heuristic — critical → ASK, informational → AUTO-FIX)). One finding per line: `[severity] file:line — problem (one sentence); fix: suggested change`. If no findings, say so explicitly. No preamble, no end-of-report summary.
86
-
87
- (Role prompt is self-contained — works even when the upstream `gstack-review` user-skill / plugin isn't installed.)
88
-
89
- (Sister `~/.claude/commands/verify-paranoid.md` is also generated by `harnessed setup` so `/verify-paranoid` is a real platform slash command — both files carry the same dual-path instruction. Previous v3.4.x `harnessed verify-paranoid --apply` CLI claims are removed; that subcommand was never implemented.)
55
+ Use the Bash tool to run:
56
+
57
+ ```bash
58
+ echo "$ARGUMENTS" | harnessed run verify-paranoid --task-stdin
59
+ ```
60
+
61
+ If `$ARGUMENTS` is empty, run `harnessed run verify-paranoid` (no stdin pipe).
62
+
63
+ After completion, the Bash output prints a `Next:` hint on stderr suggesting the next stage. Decide whether to invoke based on conversation context — the hint is informational, not prescriptive.
64
+
65
+ <!-- harnessed-generated:v3.4.4 -->
90
66
 
91
67
  ## References
92
68
 
@@ -46,37 +46,19 @@ Sister `workflows/capabilities.yaml` entries:
46
46
  Always fires when `phase.stage == 'verify'` (sister `workflows/judgments/stage-routing.yaml`
47
47
  verify-progress-always trigger). No skip condition: it must run at the start of verify-work.
48
48
 
49
- <!-- v3.4.3-dual-path-invocation -->
50
49
  ## How to invoke
51
50
 
52
- **Preferred path** (when the upstream specialist is installed): use the SlashCommand tool to run `{{ capabilities.gsd-verify-work.cmd }}` — the upstream specialist takes over.
53
-
54
- **Fallback path** (when the upstream isn't installed or returns no result): use the Task tool to spawn a general-purpose subagent with this prompt:
-
- > You are a **Progress / UAT verifier**.
- >
- > **Mission**: Mandatory serial start of the verify stage. Run UAT-driven acceptance via GSD `/gsd-verify-work` then sync state via `/gsd-progress` and persist updates to `progress.md`. Order is locked: verify-work → progress.
- >
- > **Default-suspect mode**: assume the change is broken / risky / incomplete until proven otherwise. Cite `file:line` for every finding; do not generalize.
- >
- > **Review checklist**:
- > 1. Read the phase's acceptance criteria from PLAN.md / task_plan.md
- >
- > 2. For each criterion, demonstrate it passes (test result, manual UAT log, screenshot)
- >
- > 3. Flag any criterion that is partial / stubbed / TODO — do NOT mark complete
- >
- > 4. Sync ROADMAP.md / STATE.md / REQUIREMENTS.md via gsd-progress
- >
- > 5. Append `progress.md` with completed subtask hash + verification artifact
- >
- > 6. If acceptance is incomplete, route to bug-fix and re-verify; do not advance
- >
- > **Output format**: structured report with severity-classified findings (accepted / partial / blocked / failed). One finding per line: `[severity] file:line — problem (one sentence); fix: suggested change`. If no findings, say so explicitly. No preamble, no end-of-report summary.
-
- (Role prompt is self-contained — works even when the upstream `gsd-verify-work` user-skill / plugin isn't installed.)
-
- (Sister `~/.claude/commands/verify-progress.md` is also generated by `harnessed setup` so `/verify-progress` is a real platform slash command — both files carry the same dual-path instruction. Previous v3.4.x `harnessed verify-progress --apply` CLI claims are removed; that subcommand was never implemented.)
+ Use the Bash tool to run:
+
+ ```bash
+ echo "$ARGUMENTS" | harnessed run verify-progress --task-stdin
+ ```
+
+ If `$ARGUMENTS` is empty, run `harnessed run verify-progress` (no stdin pipe).
+
+ After completion, the Bash output prints a `Next:` hint on stderr suggesting the next stage. Decide whether to invoke based on conversation context — the hint is informational, not prescriptive.
+
+ <!-- harnessed-generated:v3.4.4 -->
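The added instruction above comes down to one branch on whether `$ARGUMENTS` is empty. The following is a minimal shell sketch of that branch, not code from the package: `run_stage` is a hypothetical wrapper name, and it uses only the `harnessed run <stage>` and `--task-stdin` forms shown in the diff.

```bash
#!/usr/bin/env bash
# Sketch only: run_stage is a hypothetical helper, not part of harnessed.
# It pipes the task text on stdin when there is some, and otherwise calls
# the stage with no stdin pipe, as the SKILL.md text describes.
run_stage() {
  rs_stage="$1"     # stage name, e.g. verify-progress
  rs_task="${2-}"   # stands in for $ARGUMENTS; may be empty or unset

  if [ -n "$rs_task" ]; then
    # printf instead of echo, so task text beginning with "-" or containing
    # backslashes is passed through unchanged
    printf '%s\n' "$rs_task" | harnessed run "$rs_stage" --task-stdin
  else
    harnessed run "$rs_stage"
  fi
}
```

Under these assumptions, a call would look like `run_stage verify-progress "$ARGUMENTS"`. The same shape applies to the other stages in this diff, such as `verify-qa` and `verify-security`.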
 
  ## References
 
@@ -52,39 +52,19 @@ Sister `workflows/judgments/stage-routing.yaml`:
  - setup needs a Python backend (Tortoise ORM / pandas) → `webapp-testing` skill
  - performance / a11y / memory diagnostics → outside this sub-workflow; use `chrome-devtools-mcp`
 
- <!-- v3.4.3-dual-path-invocation -->
  ## How to invoke
 
- **Preferred path** (when the upstream specialist is installed): use the SlashCommand tool to run `{{ capabilities.gstack-qa.cmd }}` — the upstream specialist takes over.
-
- **Fallback path** (when the upstream isn't installed or returns no result): use the Task tool to spawn a general-purpose subagent with this prompt:
-
- > You are a **QA Engineer (end-to-end)**.
- >
- > **Mission**: Hands-on UAT for the changed surface — orient → explore → exercise forms / nav / states / console / responsive. Use `playwright-cli` for probes, `@playwright/test` for committed tests, `webapp-testing` for Python-backend setups. Adapted from gstack `/qa`.
- >
- > **Default-suspect mode**: assume the change is broken / risky / incomplete until proven otherwise. Cite `file:line` for every finding; do not generalize.
- >
- > **Review checklist**:
- > 1. Orient: map the application (links, framework detection, initial console errors)
- >
- > 2. Per page: visual scan, interactive elements work, console clean, responsive check
- >
- > 3. Forms: empty / invalid / edge cases — error messages clear and actionable
- >
- > 4. Navigation: every path in and out works, no dead-ends
- >
- > 5. States: empty, loading, error, overflow — none look like AI placeholder
- >
- > 6. Mobile: 375x812 viewport — real layout, not stacked desktop
- >
- > 7. Authenticated paths if creds / cookies provided; depth > breadth on core flows
- >
- > **Output format**: structured report with severity-classified findings (blocker / major / minor / nit). One finding per line: `[severity] file:line — problem (one sentence); fix: suggested change`. If no findings, say so explicitly. No preamble, no end-of-report summary.
-
- (Role prompt is self-contained — works even when the upstream `gstack-qa` user-skill / plugin isn't installed.)
-
- (Sister `~/.claude/commands/verify-qa.md` is also generated by `harnessed setup` so `/verify-qa` is a real platform slash command — both files carry the same dual-path instruction. Previous v3.4.x `harnessed verify-qa --apply` CLI claims are removed; that subcommand was never implemented.)
+ Use the Bash tool to run:
+
+ ```bash
+ echo "$ARGUMENTS" | harnessed run verify-qa --task-stdin
+ ```
+
+ If `$ARGUMENTS` is empty, run `harnessed run verify-qa` (no stdin pipe).
+
+ After completion, the Bash output prints a `Next:` hint on stderr suggesting the next stage. Decide whether to invoke based on conversation context — the hint is informational, not prescriptive.
+
+ <!-- harnessed-generated:v3.4.4 -->
 
  ## References
 
@@ -47,41 +47,19 @@ Sister `workflows/judgments/stage-routing.yaml`:
  - ✅ **Trigger**: auth flow / session / credentials / API keys / SQL injection paths / OWASP top 10 area
  - ❌ **Skip**: docs / pure UI styling / internal refactor / non-security PR
 
- <!-- v3.4.3-dual-path-invocation -->
  ## How to invoke
 
- **Preferred path** (when the upstream specialist is installed): use the SlashCommand tool to run `{{ capabilities.gstack-cso.cmd }}` — the upstream specialist takes over.
-
- **Fallback path** (when the upstream isn't installed or returns no result): use the Task tool to spawn a general-purpose subagent with this prompt:
-
- > You are a **Chief Security Officer (CSO audit)**.
- >
- > **Mission**: Conditional on `phase.has_auth_or_secrets == true`. Audit auth flows, credentials, OWASP Top 10 surface, secrets, infrastructure security (CI/CD, Docker, IaC). Adapted from gstack `/cso`.
- >
- > **Default-suspect mode**: assume the change is broken / risky / incomplete until proven otherwise. Cite `file:line` for every finding; do not generalize.
- >
- > **Review checklist**:
- > 1. OWASP Top 10: injection / broken auth / sensitive data exposure / XXE / broken access control / misconfig / XSS / insecure deserialize / known-vuln deps / insufficient logging
- >
- > 2. Secrets archaeology: git history scan for leaked credentials, .env tracked files, CI inline secrets
- >
- > 3. Auth boundaries: every protected route enforces auth (not just CSR check); authorization not transitive across requests
- >
- > 4. CSRF / SSRF / stored prompt injection where LLM output enters knowledge bases
- >
- > 5. CI/CD: pull_request_target + checkout PR code, script injection via github.event.*, unpinned third-party actions
- >
- > 6. Dockerfiles: missing USER (root), secrets as ARG, .env in image, exposed ports without purpose
- >
- > 7. IaC: wildcard IAM, hardcoded secrets in .tfvars, privileged containers, hostNetwork in K8s
- >
- > 8. Dependency audit (npm audit / pip-audit / bundler-audit) — note SKIPPED tools rather than fail audit
- >
- > **Output format**: structured report with severity-classified findings (CRITICAL / HIGH / MEDIUM / LOW / INFO). One finding per line: `[severity] file:line — problem (one sentence); fix: suggested change`. If no findings, say so explicitly. No preamble, no end-of-report summary.
-
- (Role prompt is self-contained — works even when the upstream `gstack-cso` user-skill / plugin isn't installed.)
-
- (Sister `~/.claude/commands/verify-security.md` is also generated by `harnessed setup` so `/verify-security` is a real platform slash command — both files carry the same dual-path instruction. Previous v3.4.x `harnessed verify-security --apply` CLI claims are removed; that subcommand was never implemented.)
+ Use the Bash tool to run:
+
+ ```bash
+ echo "$ARGUMENTS" | harnessed run verify-security --task-stdin
+ ```
+
+ If `$ARGUMENTS` is empty, run `harnessed run verify-security` (no stdin pipe).
+
+ After completion, the Bash output prints a `Next:` hint on stderr suggesting the next stage. Decide whether to invoke based on conversation context — the hint is informational, not prescriptive.
+
+ <!-- harnessed-generated:v3.4.4 -->
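Because the `Next:` hint is written to stderr, it can be separated from whatever the stage prints on stdout. The following is a rough sketch only: it assumes the hint is a line containing the literal text `Next:` (the diff does not specify its exact shape), and `next_hint` is a hypothetical helper name, not part of harnessed.

```bash
#!/usr/bin/env bash
# Sketch only: next_hint is a hypothetical helper, not part of harnessed.
# It runs the given command, discards stdout, and prints only the stderr
# lines that contain "Next:". It prints nothing if there is no such line.
next_hint() {
  # Redirection order matters: "2>&1" first points stderr at the pipe,
  # then ">/dev/null" sends stdout away, so only stderr reaches grep.
  "$@" 2>&1 >/dev/null | grep 'Next:' || true
}
```

Under these assumptions, a call might be `next_hint harnessed run verify-security`. Since the SKILL.md text treats the hint as informational, a caller would presumably show it rather than run the suggested stage automatically.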
 
  ## References