@curdx/flow 2.2.0 → 2.2.3

Files changed (78)
  1. package/.claude-plugin/marketplace.json +2 -2
  2. package/.claude-plugin/plugin.json +19 -2
  3. package/README.md +15 -8
  4. package/README.zh.md +5 -3
  5. package/agent-preamble/preamble.md +33 -0
  6. package/agents/flow-adversary.md +1 -1
  7. package/agents/flow-architect.md +2 -1
  8. package/agents/flow-brownfield-analyst.md +153 -0
  9. package/agents/flow-debugger.md +6 -11
  10. package/agents/flow-edge-hunter.md +1 -1
  11. package/agents/flow-executor.md +30 -8
  12. package/agents/flow-planner.md +38 -5
  13. package/agents/flow-product-designer.md +2 -1
  14. package/agents/flow-qa-engineer.md +9 -5
  15. package/agents/flow-researcher.md +2 -1
  16. package/agents/flow-reviewer.md +23 -5
  17. package/agents/flow-security-auditor.md +5 -3
  18. package/agents/flow-triage-analyst.md +5 -24
  19. package/agents/flow-ui-researcher.md +4 -3
  20. package/agents/flow-ux-designer.md +12 -39
  21. package/agents/flow-verifier.md +35 -3
  22. package/cli/README.md +3 -1
  23. package/cli/doctor-workflow.js +1074 -2
  24. package/cli/doctor.js +8 -0
  25. package/cli/help.js +2 -0
  26. package/cli/lib/doctor-report.js +256 -1
  27. package/cli/lib/frontmatter.js +44 -0
  28. package/cli/lib/json-schema.js +57 -0
  29. package/cli/lib/runtime.js +20 -2
  30. package/cli/utils.js +6 -1
  31. package/gates/adversarial-review-gate.md +1 -1
  32. package/gates/security-gate.md +2 -2
  33. package/gates/test-quality-gate.md +59 -0
  34. package/hooks/hooks.json +16 -2
  35. package/hooks/scripts/common.sh +4 -0
  36. package/hooks/scripts/session-start.sh +17 -2
  37. package/hooks/scripts/stop-watcher.sh +69 -18
  38. package/hooks/scripts/subagent-artifact-guard.sh +159 -0
  39. package/hooks/scripts/subagent-statusline.sh +105 -0
  40. package/knowledge/atomic-commits.md +1 -1
  41. package/knowledge/claude-code-runtime-contracts.md +203 -0
  42. package/knowledge/epic-decomposition.md +1 -1
  43. package/knowledge/execution-strategies.md +23 -1
  44. package/knowledge/planning-reviews.md +2 -2
  45. package/knowledge/poc-first-workflow.md +8 -8
  46. package/knowledge/review-feedback-intake.md +57 -0
  47. package/knowledge/two-stage-review.md +19 -6
  48. package/knowledge/wave-execution.md +16 -1
  49. package/output-styles/curdx-evidence-first.md +34 -0
  50. package/package.json +7 -1
  51. package/schemas/agent-frontmatter.schema.json +0 -7
  52. package/schemas/config.schema.json +14 -0
  53. package/schemas/hooks.schema.json +34 -2
  54. package/schemas/output-style-frontmatter.schema.json +22 -0
  55. package/schemas/plugin-manifest.schema.json +387 -17
  56. package/schemas/plugin-settings.schema.json +29 -0
  57. package/schemas/skill-frontmatter.schema.json +109 -4
  58. package/schemas/spec-state.schema.json +29 -4
  59. package/settings.json +6 -0
  60. package/skills/brownfield-index/SKILL.md +31 -35
  61. package/skills/browser-qa/SKILL.md +11 -3
  62. package/skills/cancel/SKILL.md +82 -0
  63. package/skills/debug/SKILL.md +6 -2
  64. package/skills/epic/SKILL.md +5 -3
  65. package/skills/fast/SKILL.md +1 -0
  66. package/skills/help/SKILL.md +17 -7
  67. package/skills/implement/SKILL.md +38 -7
  68. package/skills/init/SKILL.md +2 -1
  69. package/skills/review/SKILL.md +4 -1
  70. package/skills/security-audit/SKILL.md +17 -3
  71. package/skills/spec/SKILL.md +2 -1
  72. package/skills/start/SKILL.md +18 -18
  73. package/skills/status/SKILL.md +85 -0
  74. package/skills/ui-sketch/SKILL.md +11 -3
  75. package/skills/verify/SKILL.md +13 -1
  76. package/templates/config.json.tmpl +4 -1
  77. package/templates/progress.md.tmpl +19 -0
  78. package/templates/tasks.md.tmpl +26 -3
@@ -57,7 +57,7 @@ fi
57
57
  # the stop-hook strategy never activated.
58
58
  export STATE_FILE
59
59
 
60
- read STRATEGY PHASE TASK_INDEX TOTAL_TASKS FAILED ROUNDS <<EOF
60
+ read STRATEGY PHASE TASK_INDEX TOTAL_TASKS FAILED ROUNDS RECOVERY_MODE MAX_FIX_TASKS <<EOF
61
61
  $(python3 <<'PY'
62
62
  import json, os, sys
63
63
  p = os.environ.get("STATE_FILE")
@@ -72,7 +72,9 @@ ti = ex.get("task_index", 0)
72
72
  tt = ex.get("total_tasks", 0)
73
73
  failed = ex.get("failed_attempts", 0)
74
74
  rounds = ex.get("global_iteration", 0)
75
- print(strategy, phase, ti, tt, failed, rounds)
75
+ recovery_mode = ex.get("recovery_mode", "manual")
76
+ max_fix_tasks = ex.get("max_fix_tasks_per_original", 2)
77
+ print(strategy, phase, ti, tt, failed, rounds, recovery_mode, max_fix_tasks)
76
78
  PY
77
79
  )
78
80
  EOF
@@ -81,7 +83,7 @@ EOF
81
83
  [ "$STRATEGY" != "stop-hook" ] && allow_stop
82
84
  [ "$PHASE" != "execute" ] && allow_stop
83
85
 
84
- # ---------- 5. Check for completion signal in transcript ----------
86
+ # ---------- 5. Check hook input + completion signal in transcript ----------
85
87
  # Claude Code passes transcript path via stdin as JSON: {"transcript_path": "/path/..."}
86
88
  # We read stdin to detect ALL_TASKS_COMPLETE or TASK_FAILED
87
89
  INPUT=$(cat 2>/dev/null || echo "{}")
@@ -89,6 +91,19 @@ TRANSCRIPT_PATH=$(echo "$INPUT" | python3 -c 'import json,sys;
89
91
  try: print(json.load(sys.stdin).get("transcript_path",""))
90
92
  except: print("")' 2>/dev/null)
91
93
 
94
+ STOP_HOOK_ACTIVE=$(echo "$INPUT" | python3 -c 'import json,sys;
95
+ try: print("true" if json.load(sys.stdin).get("stop_hook_active", False) else "false")
96
+ except: print("false")' 2>/dev/null)
97
+
98
+ if [ "$STOP_HOOK_ACTIVE" = "true" ]; then
99
+ # Claude Code sets stop_hook_active during a stop-hook continuation.
100
+ # Treat it as context only: the final decision still comes from transcript
101
+ # signals, state-file progress, and tasks.md parity. Unconditionally allowing
102
+ # stop here can terminate an in-flight stop-hook loop after the first
103
+ # continuation, leaving remaining tasks stranded.
104
+ echo "[CurDX-Flow stop-hook] stop_hook_active=true; evaluating transcript/state before deciding" >&2
105
+ fi
106
+
92
107
  TRANSCRIPT_TAIL=""
93
108
  if [ -n "$TRANSCRIPT_PATH" ] && [ -f "$TRANSCRIPT_PATH" ]; then
94
109
  # Read last 50KB only (efficiency)
@@ -100,22 +115,54 @@ fi
100
115
  # python source text. Previously a spec name containing single quotes or
101
116
  # $-signs could break the script or inject arbitrary code.
102
117
 
103
- # Check for explicit completion signals
104
- if echo "$TRANSCRIPT_TAIL" | grep -q "ALL_TASKS_COMPLETE"; then
105
- # Cleanup: mark phase completed
118
+ # Helper: count unchecked tasks in tasks.md. If tasks.md is absent, return 0
119
+ # to avoid blocking recovery for partially-initialized specs.
120
+ unchecked_task_count() {
121
+ local tasks_file="$SPEC_DIR/tasks.md"
122
+ [ ! -f "$tasks_file" ] && { echo 0; return; }
123
+ grep -Ec '^- \[ \] \*\*[0-9]+(\.[0-9]+|\.VF|\.X|\.X\+1)*\*\*' "$tasks_file" 2>/dev/null || echo 0
124
+ }
125
+
126
+ last_task_signal() {
127
+ local msg="${1:-}"
128
+ printf '%s' "$msg" \
129
+ | grep -Eo 'ALL_TASKS_COMPLETE|TASK_(COMPLETE|FAILED):[[:space:]]*[0-9]+(\.([0-9]+|VF|X(\+[0-9]+)?))*' \
130
+ | tail -1
131
+ }
132
+
133
+ failed_task_id() {
134
+ local msg="${1:-}"
135
+ printf '%s' "$msg" | sed -nE 's/.*TASK_FAILED:[[:space:]]*([0-9]+(\.([0-9]+|VF|X(\+[0-9]+)?))*).*/\1/p' | tail -1
136
+ }
137
+
138
+ mark_execute_complete() {
106
139
  python3 <<'PY' 2>/dev/null
107
140
  import json, os
108
141
  p = os.environ["STATE_FILE"]
109
142
  s = json.load(open(p))
110
143
  s.setdefault("phase_status", {})["execute"] = "completed"
111
- s["phase"] = "verify" # move to verify phase
144
+ s["phase"] = "verify"
112
145
  json.dump(s, open(p, "w"), indent=2, ensure_ascii=False)
113
146
  PY
147
+ }
148
+
149
+ # Check for explicit completion signals
150
+ LAST_TASK_SIGNAL="$(last_task_signal "$TRANSCRIPT_TAIL")"
151
+
152
+ if [ "$LAST_TASK_SIGNAL" = "ALL_TASKS_COMPLETE" ]; then
153
+ UNCHECKED="$(unchecked_task_count)"
154
+ if [ "${UNCHECKED:-0}" -gt 0 ]; then
155
+ block_continue "[CurDX-Flow stop-hook] ALL_TASKS_COMPLETE was emitted, but tasks.md still has ${UNCHECKED} unchecked task(s). Read .flow/specs/${ACTIVE}/tasks.md, complete only the remaining unchecked tasks, update tasks.md, then emit ALL_TASKS_COMPLETE again."
156
+ fi
157
+ mark_execute_complete
114
158
  allow_stop
115
159
  fi
116
160
 
117
- # Check for fail signal (accumulate; actual stop decision below)
118
- if echo "$TRANSCRIPT_TAIL" | grep -q "TASK_FAILED"; then
161
+ # Check for the latest fail signal (accumulate; actual stop decision below)
162
+ if printf '%s' "$LAST_TASK_SIGNAL" | grep -q "^TASK_FAILED"; then
163
+ FAILED_TASK="$(failed_task_id "$LAST_TASK_SIGNAL")"
164
+ [ -z "$FAILED_TASK" ] && FAILED_TASK="the current task"
165
+
119
166
  # Increment failed_attempts
120
167
  python3 <<'PY' 2>/dev/null
121
168
  import json, os
@@ -127,6 +174,14 @@ json.dump(s, open(p, "w"), indent=2, ensure_ascii=False)
127
174
  PY
128
175
  # Re-read — again via os.environ, no shell interpolation into python.
129
176
  FAILED=$(python3 -c 'import json, os; print(json.load(open(os.environ["STATE_FILE"]))["execute_state"]["failed_attempts"])' 2>/dev/null || echo 0)
177
+
178
+ if [ "${FAILED:-0}" -lt 3 ]; then
179
+ if [ "${RECOVERY_MODE:-manual}" = "fix-task" ]; then
180
+ block_continue "[CurDX-Flow stop-hook] TASK_FAILED observed for ${FAILED_TASK}. Do not skip it. Recovery mode is fix-task: insert one targeted [FIX ${FAILED_TASK}] task immediately after the failed task in tasks.md (max ${MAX_FIX_TASKS:-2} fix task(s) per original), update .state.json execute_state.fix_task_map, then execute the fix task before retrying ${FAILED_TASK}. The fix task must include Do, Files, Done when, Verify, and Commit fields."
181
+ fi
182
+
183
+ block_continue "[CurDX-Flow stop-hook] TASK_FAILED observed for ${FAILED_TASK}. Do not advance past the failed task. Re-read tasks.md, perform root-cause analysis, retry the first unchecked task, and emit TASK_COMPLETE only after its Verify command passes. failed_attempts=${FAILED}/3."
184
+ fi
130
185
  fi
131
186
 
132
187
  # ---------- 6. Safety brakes ----------
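Note on the hunk above: the stop hook no longer greps for any occurrence of a marker. It takes the latest signal in the transcript tail and cross-checks `ALL_TASKS_COMPLETE` against unchecked entries in `tasks.md`. The following is a minimal Python sketch of that decision, for illustration only. The two regexes are transcribed from the shell helpers; `decide` and its return strings are hypothetical and do not exist in the package.

```python
import re

# Transcribed from last_task_signal(); [ \t]* stands in for [[:space:]]*
# because grep matches one line at a time.
SIGNAL_RE = re.compile(
    r"ALL_TASKS_COMPLETE"
    r"|TASK_(?:COMPLETE|FAILED):[ \t]*[0-9]+(?:\.(?:[0-9]+|VF|X(?:\+[0-9]+)?))*"
)

# Transcribed from unchecked_task_count().
UNCHECKED_RE = re.compile(
    r"^- \[ \] \*\*[0-9]+(?:\.[0-9]+|\.VF|\.X|\.X\+1)*\*\*", re.MULTILINE
)


def last_task_signal(transcript_tail: str) -> str:
    """Return the last marker in the transcript tail, or "" if there is none."""
    matches = SIGNAL_RE.findall(transcript_tail)
    return matches[-1] if matches else ""


def unchecked_task_count(tasks_md: str) -> int:
    """Count task lines that are still unchecked ("- [ ] **<id>**")."""
    return len(UNCHECKED_RE.findall(tasks_md))


def decide(transcript_tail: str, tasks_md: str, failed_attempts: int) -> str:
    """Hypothetical summary of the hook's decision for one stop event."""
    signal = last_task_signal(transcript_tail)
    if signal == "ALL_TASKS_COMPLETE":
        # Completion is only trusted when tasks.md agrees.
        return "block" if unchecked_task_count(tasks_md) > 0 else "allow"
    if signal.startswith("TASK_FAILED") and failed_attempts + 1 < 3:
        # The hook increments failed_attempts, then blocks while it is below 3.
        return "block"
    # Anything else falls through to the safety brakes and state-file checks.
    return "undecided"
```

Because only the last marker counts, a `TASK_FAILED` that was later followed by `TASK_COMPLETE` for the same task no longer triggers the failure path.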
@@ -142,15 +197,11 @@ fi
142
197
 
143
198
  # Check if all tasks done
144
199
  if [ "$TASK_INDEX" -ge "$TOTAL_TASKS" ] && [ "$TOTAL_TASKS" -gt 0 ]; then
145
- # Mark complete
146
- python3 <<'PY' 2>/dev/null
147
- import json, os
148
- p = os.environ["STATE_FILE"]
149
- s = json.load(open(p))
150
- s.setdefault("phase_status", {})["execute"] = "completed"
151
- s["phase"] = "verify"
152
- json.dump(s, open(p, "w"), indent=2, ensure_ascii=False)
153
- PY
200
+ UNCHECKED="$(unchecked_task_count)"
201
+ if [ "${UNCHECKED:-0}" -gt 0 ]; then
202
+ block_continue "[CurDX-Flow stop-hook] State says execute is complete (${TASK_INDEX}/${TOTAL_TASKS}), but tasks.md still has ${UNCHECKED} unchecked task(s). Continue with the first unchecked task; do not add new tasks."
203
+ fi
204
+ mark_execute_complete
154
205
  allow_stop
155
206
  fi
156
207
 
@@ -0,0 +1,159 @@
1
+ #!/usr/bin/env bash
2
+ # CurDX-Flow SubagentStop Hook
3
+ # Blocks successful subagent completion if the expected artifact never landed on disk.
4
+ #
5
+ # Why:
6
+ # - long markdown/report-writing agents can truncate near the end of a run
7
+ # - users then see a cheerful success summary but the actual file is missing or tiny
8
+ # - latest Claude Code hooks expose SubagentStop with agent_type + last_assistant_message,
9
+ # which lets us guard only "success-looking" exits while allowing genuine precondition failures
10
+
11
+ set -u
12
+
13
+ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
14
+ . "$SCRIPT_DIR/common.sh"
15
+
16
+ INPUT="$(cat 2>/dev/null || echo "{}")"
17
+
18
+ if ! has_python3; then
19
+ # Without JSON parsing, fail open rather than blocking subagents blindly.
20
+ exit 0
21
+ fi
22
+
23
+ export SUBAGENT_GUARD_INPUT="$INPUT"
24
+
25
+ AGENT_TYPE="$(python3 -c 'import json, os
26
+ try:
27
+ data = json.loads(os.environ["SUBAGENT_GUARD_INPUT"])
28
+ print(data.get("agent_type", ""))
29
+ except Exception:
30
+ print("")
31
+ ' 2>/dev/null)"
32
+
33
+ LAST_MESSAGE="$(python3 -c 'import json, os
34
+ try:
35
+ data = json.loads(os.environ["SUBAGENT_GUARD_INPUT"])
36
+ print((data.get("last_assistant_message") or "").strip())
37
+ except Exception:
38
+ print("")
39
+ ' 2>/dev/null)"
40
+
41
+ looks_like_success() {
42
+ local msg="${1:-}"
43
+ printf '%s' "$msg" | grep -Eq '(^✓| generated$| generated\n|Wrote |Review complete|Requirements done|Research complete|UI Sketch generation complete|Report:|Next:|TASK_COMPLETE|ALL_TASKS_COMPLETE)'
44
+ }
45
+
46
+ active_spec_path() {
47
+ local file_name="$1"
48
+
49
+ [ ! -d ".flow" ] && return 1
50
+
51
+ local active
52
+ active="$(cat .flow/.active-spec 2>/dev/null)"
53
+ [ -z "$active" ] && return 1
54
+
55
+ printf '.flow/specs/%s/%s\n' "$active" "$file_name"
56
+ }
57
+
58
+ completed_task_id() {
59
+ local msg="${1:-}"
60
+ printf '%s' "$msg" | sed -nE 's/.*TASK_COMPLETE:[[:space:]]*([0-9]+(\.([0-9]+|VF|X(\+[0-9]+)?))*).*/\1/p' | head -1
61
+ }
62
+
63
+ artifact_target=""
64
+ minimum_size=200
65
+
66
+ case "$AGENT_TYPE" in
67
+ flow-researcher)
68
+ artifact_target="$(active_spec_path research.md)" || exit 0
69
+ minimum_size=400
70
+ ;;
71
+ flow-product-designer)
72
+ artifact_target="$(active_spec_path requirements.md)" || exit 0
73
+ minimum_size=400
74
+ ;;
75
+ flow-architect)
76
+ artifact_target="$(active_spec_path design.md)" || exit 0
77
+ minimum_size=400
78
+ ;;
79
+ flow-planner)
80
+ artifact_target="$(active_spec_path tasks.md)" || exit 0
81
+ minimum_size=400
82
+ ;;
83
+ flow-executor)
84
+ artifact_target="$(active_spec_path tasks.md)" || exit 0
85
+ if printf '%s' "$LAST_MESSAGE" | grep -q 'ALL_TASKS_COMPLETE'; then
86
+ exit 0
87
+ fi
88
+ task_id="$(completed_task_id "$LAST_MESSAGE")"
89
+ [ -z "$task_id" ] && exit 0
90
+ if grep -Eq "^- \\[x\\] \\*\\*${task_id//./\\.}\\*\\*" "$artifact_target" 2>/dev/null; then
91
+ exit 0
92
+ fi
93
+ emit_subagentstop_block "[CurDX-Flow subagent-artifact-guard] flow-executor emitted TASK_COMPLETE: ${task_id}, but ${artifact_target} does not mark that task as [x]. Update tasks.md and the spec progress/state before stopping."
94
+ exit 0
95
+ ;;
96
+ flow-debugger)
97
+ artifact_target="$(active_spec_path debug-report.md)" || exit 0
98
+ minimum_size=250
99
+ ;;
100
+ flow-triage-analyst)
101
+ artifact_target="$(active_spec_path triage-report.md)" || exit 0
102
+ minimum_size=250
103
+ ;;
104
+ flow-ux-designer)
105
+ artifact_target="$(active_spec_path ui-sketch.md)" || exit 0
106
+ minimum_size=250
107
+ ;;
108
+ flow-reviewer)
109
+ artifact_target="$(active_spec_path review-report.md)" || exit 0
110
+ minimum_size=300
111
+ ;;
112
+ flow-verifier)
113
+ artifact_target="$(active_spec_path verification-report.md)" || exit 0
114
+ minimum_size=300
115
+ ;;
116
+ flow-security-auditor)
117
+ artifact_target="$(active_spec_path security-audit.md)" || exit 0
118
+ minimum_size=250
119
+ ;;
120
+ flow-qa-engineer)
121
+ artifact_target="$(active_spec_path qa-report.md)" || exit 0
122
+ minimum_size=250
123
+ ;;
124
+ flow-edge-hunter)
125
+ artifact_target="$(active_spec_path edge-cases.md)" || exit 0
126
+ minimum_size=250
127
+ ;;
128
+ flow-adversary)
129
+ artifact_target="$(active_spec_path adversarial-review.md)" || exit 0
130
+ minimum_size=250
131
+ ;;
132
+ flow-ui-researcher)
133
+ artifact_target="$(active_spec_path ui-research.md)" || exit 0
134
+ minimum_size=250
135
+ ;;
136
+ flow-brownfield-analyst)
137
+ artifact_target=".flow/codebase-index.md"
138
+ minimum_size=250
139
+ ;;
140
+ *)
141
+ exit 0
142
+ ;;
143
+ esac
144
+
145
+ if [ -f "$artifact_target" ]; then
146
+ size="$(wc -c < "$artifact_target" 2>/dev/null | tr -d ' ')"
147
+ if [ "${size:-0}" -ge "$minimum_size" ]; then
148
+ exit 0
149
+ fi
150
+ fi
151
+
152
+ if ! looks_like_success "$LAST_MESSAGE"; then
153
+ # The subagent appears to be stopping because of a real precondition failure,
154
+ # clarification request, or other non-success path. Let that response through.
155
+ exit 0
156
+ fi
157
+
158
+ emit_subagentstop_block "[CurDX-Flow subagent-artifact-guard] ${AGENT_TYPE} is stopping with a success summary, but ${artifact_target} is missing or too small. Write the full artifact to disk first, then respond with the minimal completion summary only."
159
+ exit 0
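Note on the script above: for report-writing agents the guard reduces to two questions. Did the artifact land with at least `minimum_size` bytes, and if not, does the final message look like a success claim? A minimal Python sketch of that rule follows, under stated assumptions. `should_block` is a hypothetical name. The success pattern is copied from `looks_like_success`, except that the ` generated\n` alternative is left out because the `$` anchor already covers a line ending in ` generated`.

```python
import re
from pathlib import Path

# Approximation of looks_like_success(); MULTILINE mimics grep's per-line matching.
SUCCESS_RE = re.compile(
    r"(^✓| generated$|Wrote |Review complete|Requirements done|Research complete"
    r"|UI Sketch generation complete|Report:|Next:|TASK_COMPLETE|ALL_TASKS_COMPLETE)",
    re.MULTILINE,
)


def should_block(artifact: Path, minimum_size: int, last_message: str) -> bool:
    """Return True when a success-looking exit has no adequate artifact on disk."""
    if artifact.is_file() and artifact.stat().st_size >= minimum_size:
        return False  # artifact landed; let the subagent stop
    # Missing or undersized artifact: block only if the message claims success,
    # so genuine precondition failures and clarification requests pass through.
    return SUCCESS_RE.search(last_message) is not None
```

The `flow-executor` branch is different: it checks that the task named in `TASK_COMPLETE: <task_id>` is marked `[x]` in `tasks.md` instead of checking a file size.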
@@ -0,0 +1,105 @@
1
+ #!/usr/bin/env bash
2
+ # CurDX-Flow subagentStatusLine command.
3
+ # Reads the official subagent status-line JSON payload from stdin and emits
4
+ # one JSON line per CurDX-Flow subagent row that should be overridden.
5
+
6
+ set -u
7
+
8
+ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
9
+ . "$SCRIPT_DIR/common.sh"
10
+
11
+ INPUT="$(cat 2>/dev/null || echo "{}")"
12
+
13
+ if ! has_python3; then
14
+ exit 0
15
+ fi
16
+
17
+ export CURDX_SUBAGENT_STATUSLINE_INPUT="$INPUT"
18
+
19
+ python3 <<'PY'
20
+ import json
21
+ import os
22
+ from pathlib import Path
23
+
24
+ try:
25
+ data = json.loads(os.environ.get("CURDX_SUBAGENT_STATUSLINE_INPUT", "{}"))
26
+ except Exception:
27
+ raise SystemExit(0)
28
+
29
+ columns = data.get("columns")
30
+ try:
31
+ columns = int(columns)
32
+ except Exception:
33
+ columns = 100
34
+ columns = max(40, columns)
35
+
36
+
37
+ def flow_type(task):
38
+ raw = str(task.get("type") or task.get("name") or task.get("label") or "")
39
+ if raw.startswith("curdx-flow:"):
40
+ raw = raw.split(":", 1)[1]
41
+ return raw
42
+
43
+
44
+ def is_flow_task(task):
45
+ values = [
46
+ task.get("type"),
47
+ task.get("name"),
48
+ task.get("label"),
49
+ task.get("description"),
50
+ ]
51
+ return any(str(value or "").startswith(("flow-", "curdx-flow:flow-")) for value in values)
52
+
53
+
54
+ def active_spec(cwd):
55
+ if not cwd:
56
+ return ""
57
+ try:
58
+ value = (Path(cwd) / ".flow" / ".active-spec").read_text(encoding="utf-8").strip()
59
+ except Exception:
60
+ return ""
61
+ return value[:40]
62
+
63
+
64
+ def token_label(value):
65
+ try:
66
+ count = int(value)
67
+ except Exception:
68
+ return ""
69
+ if count >= 1000:
70
+ return f"{count / 1000:.1f}k tok"
71
+ return f"{count} tok"
72
+
73
+
74
+ def clamp(text):
75
+ if len(text) <= columns:
76
+ return text
77
+ if columns <= 4:
78
+ return text[:columns]
79
+ return text[: columns - 3] + "..."
80
+
81
+
82
+ for task in data.get("tasks") or []:
83
+ if not isinstance(task, dict) or not is_flow_task(task):
84
+ continue
85
+
86
+ task_id = task.get("id")
87
+ if not task_id:
88
+ continue
89
+
90
+ parts = ["[curdx-flow]", flow_type(task) or "flow-agent"]
91
+
92
+ status = str(task.get("status") or "").strip()
93
+ if status:
94
+ parts.append(status)
95
+
96
+ spec = active_spec(task.get("cwd"))
97
+ if spec:
98
+ parts.append(f"spec:{spec}")
99
+
100
+ tokens = token_label(task.get("tokenCount"))
101
+ if tokens:
102
+ parts.append(tokens)
103
+
104
+ print(json.dumps({"id": str(task_id), "content": clamp(" | ".join(parts))}, ensure_ascii=True))
105
+ PY
@@ -230,7 +230,7 @@ PR review reads each commit's message.
230
230
  - Good commit message → reviewer finishes in 5 minutes
231
231
  - Bad commit message → reviewer either rubber-stamps or blocks without reading
232
232
 
233
- CurdX-Flow's `/curdx-flow:ship` command (Phase 6) will turn atomic commits into a clean PR description. Poor commit quality yields poor PR descriptions.
233
+ CurdX-Flow's review handoff expects atomic commits plus verification/review reports. Poor commit quality yields poor PR descriptions and weak release evidence.
234
234
 
235
235
  ---
236
236
 
@@ -0,0 +1,203 @@
1
+ # Claude Code Runtime Contracts — CurDX-Flow Notes
2
+
3
+ CurDX-Flow depends on Claude Code's plugin, hook, skill, and subagent runtime surfaces. This page records the operational contracts we rely on so agents and maintainers do not drift from the current official behavior.
4
+
5
+ ## Source of Truth
6
+
7
+ - Official docs entry: `https://code.claude.com/docs/en/overview`
8
+ - Runtime-specific pages to re-check when changing behavior:
9
+ - Hooks: `/docs/en/hooks`
10
+ - Subagents: `/docs/en/sub-agents`
11
+ - Skills: `/docs/en/skills`
12
+ - Commands: `/docs/en/commands`
13
+ - Plugins: `/docs/en/plugins`
14
+ - Settings: `/docs/en/settings`
15
+ - Plugin manifest reference: `/docs/en/plugins-reference`
16
+ - Output styles: `/docs/en/output-styles`
17
+ - Status line: `/docs/en/statusline`
18
+ - Plugin dependency constraints: `/docs/en/plugin-dependencies`
19
+ - Routines / scheduled tasks: `/docs/en/routines`, `/docs/en/scheduled-tasks`
20
+
21
+ When a behavior is unclear, prefer the official docs and `claude plugin validate .` over inferred behavior from older examples.
22
+
23
+ ## Hook Output Rules
24
+
25
+ - `SessionStart` context injection must use:
26
+ - `hookSpecificOutput.hookEventName = "SessionStart"`
27
+ - `hookSpecificOutput.additionalContext = "..."`
28
+ - Persistent environment for later hook/script invocations must be written to `CLAUDE_ENV_FILE` as shell exports. Do not invent a JSON top-level `environmentVariables` field.
29
+ - `Stop` / `SubagentStop` continuation blocking uses top-level `decision: "block"` plus `reason`.
30
+ - `PreToolUse` denial uses `hookSpecificOutput.permissionDecision = "deny"` and `permissionDecisionReason`.
31
+ - `PreToolUse` also supports `hookSpecificOutput.permissionDecision = "defer"` for deferred tool handling in `-p` / SDK-style flows; do not assume deny/allow are the only valid permission outcomes.
32
+ - `PermissionDenied` can return `{ "retry": true }` to let Claude try a different approach after an auto-mode classifier denial.
33
+ - Hooks must fail open when runtime prerequisites are missing (`python3`, malformed stdin JSON, absent `.flow/` state). The exception is an explicit, success-looking subagent completion with a missing required artifact.
34
+
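The output shapes listed above can be sketched as small JSON builders. This is an illustrative sketch only: the helper names are hypothetical, and the payloads contain just the fields this page names. The runtime may require additional fields, so check the official hooks page before relying on these shapes.

```python
import json


def session_start_context(text: str) -> str:
    """SessionStart context injection via hookSpecificOutput."""
    return json.dumps({
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": text,
        }
    })


def stop_block(reason: str) -> str:
    """Stop / SubagentStop continuation block: top-level decision plus reason."""
    return json.dumps({"decision": "block", "reason": reason})


def pre_tool_use_deny(reason: str) -> str:
    """PreToolUse denial; only the two fields named on this page are shown."""
    return json.dumps({
        "hookSpecificOutput": {
            "permissionDecision": "deny",
            "permissionDecisionReason": reason,
        }
    })


if __name__ == "__main__":
    # A hook script would print exactly one of these objects to stdout.
    print(stop_block("tasks.md still has unchecked tasks"))
```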
35
+ ## Subagent Artifact Discipline
36
+
37
+ Subagents that produce long reports must write the artifact before producing the final assistant summary. The final summary should be short and point to the file path.
38
+
39
+ Guarded artifact targets:
40
+
41
+ | Agent | Expected artifact |
42
+ | --- | --- |
43
+ | `flow-researcher` | `.flow/specs/<active>/research.md` |
44
+ | `flow-product-designer` | `.flow/specs/<active>/requirements.md` |
45
+ | `flow-architect` | `.flow/specs/<active>/design.md` |
46
+ | `flow-planner` | `.flow/specs/<active>/tasks.md` |
47
+ | `flow-reviewer` | `.flow/specs/<active>/review-report.md` |
48
+ | `flow-verifier` | `.flow/specs/<active>/verification-report.md` |
49
+ | `flow-security-auditor` | `.flow/specs/<active>/security-audit.md` |
50
+ | `flow-qa-engineer` | `.flow/specs/<active>/qa-report.md` |
51
+ | `flow-edge-hunter` | `.flow/specs/<active>/edge-cases.md` |
52
+ | `flow-adversary` | `.flow/specs/<active>/adversarial-review.md` |
53
+ | `flow-ui-researcher` | `.flow/specs/<active>/ui-research.md` |
54
+ | `flow-brownfield-analyst` | `.flow/codebase-index.md` |
55
+
56
+ `flow-executor` is marker-driven rather than report-driven: it must update task state and end with `TASK_COMPLETE: <task_id>` or `TASK_FAILED: <task_id>`.
57
+
58
+ ## Agent Teams Compatibility
59
+
60
+ - Official `agent-teams` behavior differs from regular subagent invocation in one critical way: when a subagent definition runs as a teammate, its `skills` and `mcpServers` frontmatter fields are not applied.
61
+ - Team coordination tools remain available to teammates, but any agent that relies on a preloaded skill must also have access to the `Skill` tool so it can invoke that skill explicitly when used as a teammate.
62
+ - A project file like `.claude/teams/teams.json` is not configuration. Official docs say team config lives under user scope, not project scope.
63
+
64
+ ## Skills and Frontmatter
65
+
66
+ - Keep `SKILL.md` frontmatter minimal and schema-backed.
67
+ - Use `description` for the concise trigger phrase; put longer trigger examples in `when_to_use`.
68
+ - Use forked context and a named agent only when the skill's work benefits from isolation or a specialized role.
69
+ - Avoid preloading broad tool access. Prefer the smallest useful tool set per skill/agent.
70
+ - Do not make bundled skills or agents implicitly depend on runtime-gated tools such as `SendMessage`, `TeamCreate`, `TeamDelete`, or `ToolSearch` unless CurDX-Flow also ships the matching feature-flag/setup contract.
71
+
72
+ ## Plugin Settings
73
+
74
+ - Claude Code plugin-root `settings.json` currently supports only `agent` and `subagentStatusLine`.
75
+ - CurDX-Flow ships only `subagentStatusLine`, pointing at `${CLAUDE_PLUGIN_ROOT}/hooks/scripts/subagent-statusline.sh`.
76
+ - The status-line script must fail open on malformed input or missing `python3`; UI decoration must never break agent execution.
77
+ - Plugin-root references must never traverse outside the plugin directory. Installed marketplace plugins run from Claude Code's plugin cache, so parent-directory references are invalid even if they work in a development checkout.
78
+ - If adding plugin settings, update `schemas/plugin-settings.schema.json`, `test/plugin-structure-contract.test.js`, `test/pack-tarball-smoke.test.js`, and `scripts/validate-plugin-contracts.mjs` in the same change.
79
+
80
+ ## Plugin Dependency Constraints
81
+
82
+ - Official dependency version constraints require upstream plugin release tags in the `{plugin-name}--v{version}` format.
83
+ - Do not add a version constraint to the `context7-plugin` dependency unless the Upstash marketplace has matching `context7-plugin--v*` tags. A semver range without those tags can disable dependency resolution.
84
+ - Keep the CLI registry and `.claude-plugin/plugin.json` dependency entry aligned: Context7 remains a required companion plugin, while optional tools stay in `RECOMMENDED_PLUGINS`.
85
+
86
+ ## Shared Settings Guardrails
87
+
88
+ - `.claude/settings.json` is a shared project surface. Keep machine-local scripts, secrets, and credential helpers out of it.
89
+ - Official docs say these keys are ignored or not accepted at project scope and must live in user/local/managed settings instead:
90
+ - `autoMemoryDirectory`
91
+ - `autoMode`
92
+ - `useAutoModeDuringPlan`
93
+ - `permissions.skipDangerousModePermissionPrompt`
94
+ - `sshConfigs`
95
+ - `teammateMode` belongs in the global `~/.claude.json` config, not project `settings.json`.
96
+ - Treat shared auto-approval settings as high risk:
97
+ - `enableAllProjectMcpServers`
98
+ - `enabledMcpjsonServers`
99
+ - Treat shared hook and skill policy as behavior-changing:
100
+ - `disableSkillShellExecution: true` replaces inline shell output in project/plugin skills and commands with a disabled placeholder.
101
+ - Empty `allowedHttpHookUrls` blocks all HTTP hook targets.
102
+ - Empty `httpHookAllowedEnvVars` prevents HTTP hook header environment interpolation.
103
+ - Treat shared `env` injection as behavior-changing when it flips Claude runtime modes:
104
+ - `CLAUDE_CODE_SIMPLE=1` puts Claude Code into bare/simple mode and disables hooks, skills, plugins, MCP discovery, auto memory, and `CLAUDE.md`.
105
+ - `CLAUDE_CODE_SIMPLE_SYSTEM_PROMPT=1` keeps discovery enabled but swaps in the minimal Claude system prompt.
106
+ - `CLAUDE_CODE_EFFORT_LEVEL=low|medium` lowers reasoning for every collaborator session.
107
+ - `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` enables experimental teammate surfaces for every collaborator session.
108
+ - Provider-specific pinned model IDs (`ANTHROPIC_DEFAULT_*_MODEL`, `ANTHROPIC_CUSTOM_MODEL_OPTION`) should usually be paired with `_SUPPORTED_CAPABILITIES` so Claude keeps effort / thinking feature detection.
109
+ - In CI / headless runs, `CLAUDE_CODE_SYNC_PLUGIN_INSTALL=1` makes marketplace plugins available before the first turn; otherwise they can install in the background and miss turn one.
110
+ - `CLAUDE_CODE_PLUGIN_SEED_DIR` is the official way to pre-populate marketplace plugins in containers and CI images.
111
+ - Treat shared sandbox policy as runtime-sensitive:
112
+ - `sandbox.failIfUnavailable: true` can fail Claude Code startup on unsupported hosts.
113
+ - `sandbox.filesystem.denyRead` / `denyWrite` must not block `.flow`, `.git`, or the project root.
114
+ - Empty `sandbox.network.allowedDomains` blocks outbound network access for sandboxed commands.
115
+ - Prefer `attribution` over deprecated `includeCoAuthoredBy`.
116
+ - Treat shared runtime blockers as high risk for CurDX-Flow:
117
+ - `disableAllHooks: true` disables stop-hook recovery, artifact guards, and custom status lines.
118
+ - `agent: "<name>"` routes the main thread through a named subagent, replacing the normal CurDX-Flow prompt, tool surface, and model for the whole session.
119
+ - `permissions.defaultMode: "dontAsk"` can auto-deny clarification and Agent dispatch prompts.
120
+ - `permissions.deny` rules for `Agent`, `AskUserQuestion`, CurDX-Flow `flow-*` agents, or broad `Bash` / `Monitor` / `Read` / `Write` / `Edit` / `Grep` / `Glob` tools can make workflows fail.
121
+ - `availableModels` must include the portable `sonnet` and `opus` aliases used by bundled agents.
122
+ - Shared `effortLevel: "low"` or `"medium"` may underpower main-thread planning/review turns; prefer `high` / `xhigh` for CurDX-Flow-heavy projects.
123
+ - `CLAUDE_CODE_SIMPLE=1` in the launch environment is a hard runtime blocker for CurDX-Flow because Claude stops discovering plugin assets and `CLAUDE.md`.
124
+ - `CLAUDE_CODE_SIMPLE_SYSTEM_PROMPT=1` in the launch environment is not a hard blocker, but it weakens the normal Claude Code system prompt CurDX-Flow expects.
125
+ - Provider-specific model IDs in `ANTHROPIC_DEFAULT_*_MODEL` or `ANTHROPIC_CUSTOM_MODEL_OPTION` can disable feature detection for effort and thinking unless the matching `_SUPPORTED_CAPABILITIES` env var is declared.
126
+ - In CI / `claude -p` runs that depend on marketplace plugins, missing `CLAUDE_CODE_SYNC_PLUGIN_INSTALL=1` (or a seeded plugin cache via `CLAUDE_CODE_PLUGIN_SEED_DIR`) can leave plugins unavailable on the first turn.
127
+ - Prefer `claude --bare -p` for CI / scripted runs so hooks, skills, plugins, MCP discovery, auto memory, and `CLAUDE.md` do not vary by machine; add `--plugin-dir`, `--settings`, and `--mcp-config` explicitly when needed.
128
+ - Do not depend on interactive `/curdx-flow:*` slash commands in `claude -p`; scripted runs should ask for the desired outcome directly.
129
+ - `settings.json` does not accept `effortLevel: "max"`; official docs reserve `max` for session-only `/effort` (or `CLAUDE_CODE_EFFORT_LEVEL`), so do not commit it to shared project settings.
130
+ - `enabledPlugins` entries set to `false` for `curdx-flow@curdx-flow-marketplace` or required companion plugins override user-level installs in that project.
131
+
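A few of the guardrails above are mechanical enough to check in code. The sketch below is a hypothetical linter, not the package's doctor command. It covers only a subset of the rules, takes the parsed contents of a shared `.claude/settings.json` as a dict, and uses message strings invented for illustration.

```python
# Keys this page lists as ignored or not accepted at project scope.
PROJECT_SCOPE_REJECTED = (
    "autoMemoryDirectory",
    "autoMode",
    "useAutoModeDuringPlan",
    "sshConfigs",
)


def lint_shared_settings(settings: dict) -> list:
    """Return human-readable findings for a shared project settings dict."""
    findings = []
    for key in PROJECT_SCOPE_REJECTED:
        if key in settings:
            findings.append(f"{key}: belongs in user/local/managed settings")
    permissions = settings.get("permissions") or {}
    if "skipDangerousModePermissionPrompt" in permissions:
        findings.append(
            "permissions.skipDangerousModePermissionPrompt: not accepted at project scope"
        )
    if settings.get("disableAllHooks") is True:
        findings.append("disableAllHooks: disables stop-hook recovery and artifact guards")
    if settings.get("effortLevel") == "max":
        findings.append("effortLevel: 'max' is session-only, do not commit it")
    env = settings.get("env") or {}
    if str(env.get("CLAUDE_CODE_SIMPLE", "")) == "1":
        findings.append("env.CLAUDE_CODE_SIMPLE: hard runtime blocker for CurDX-Flow")
    return findings
```

For example, `lint_shared_settings({"disableAllHooks": True})` returns a single finding, and an empty dict returns none.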
132
+ ## Browser and UI Verification
133
+
134
+ For UI-facing acceptance criteria, code inspection and DOM unit tests are not sufficient evidence. Use `chrome-devtools` MCP when available to drive the real browser, capture screenshots, list console messages, and inspect network requests. If the MCP is unavailable, mark UI-facing acceptance criteria as unverified instead of silently passing them.
135
+
136
+
137
+ ## Reality Verification Contract
138
+
139
+ For fix/debug/regression specs, green tests alone do not prove the user-visible problem was fixed. The workflow must preserve a BEFORE/AFTER evidence trail:
140
+
141
+ 1. BEFORE: record the original reproduction command, observed failure output, and timestamp in `.progress.md` before changing code.
142
+ 2. FIX: change the smallest root cause and run the task's Verify command.
143
+ 3. AFTER: rerun the original reproduction command and compare output against BEFORE.
144
+ 4. COMPLETE: write `Verified: Issue resolved` only when the original failure is gone.
145
+
146
+ Planner duties:
147
+ - Add a `VF` task for fix/debug specs unless `STATE.md` has an explicit D-NN waiver.
148
+ - Treat missing `VF` coverage as a coverage-audit gap.
149
+
150
+ Executor duties:
151
+ - Do not mark `VF` complete unless `.progress.md` has the BEFORE/AFTER comparison.
152
+ - Use the same reproduction command for AFTER unless a documented D-NN explains why the command changed.
153
+
154
+ Verifier duties:
155
+ - Mark fix/debug specs `PARTIAL` when BEFORE/AFTER evidence is missing, even if the normal test suite is green.
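The verifier rule above can be sketched as a small decision function. This is an illustration under assumptions, not the package's verifier: the evidence field names (`before`, `after`, `command`, `failed`, `commandChangeDecision`) and the `RESOLVED` / `UNRESOLVED` labels are hypothetical. Only `PARTIAL` and the `Verified: Issue resolved` note come from the contract, and treating an undocumented command change as `PARTIAL` is an assumption.

```javascript
// Hypothetical evidence check for fix/debug specs.
// Field names and the RESOLVED/UNRESOLVED labels are assumptions.
function assessFixEvidence({ before, after, commandChangeDecision }) {
  // Missing BEFORE or AFTER evidence is PARTIAL, even with a green test suite.
  if (!before || !after) {
    return { status: "PARTIAL", reason: "missing BEFORE/AFTER evidence" };
  }
  // AFTER must reuse the reproduction command unless a D-NN documents the change.
  if (before.command !== after.command && !commandChangeDecision) {
    return { status: "PARTIAL", reason: "reproduction command changed without a D-NN" };
  }
  // The original failure must be gone before the issue counts as resolved.
  if (after.failed) {
    return { status: "UNRESOLVED", reason: "original failure still reproduces" };
  }
  return { status: "RESOLVED", note: "Verified: Issue resolved" };
}

const verdict = assessFixEvidence({
  before: { command: "npm test -- login", failed: true },
  after: { command: "npm test -- login", failed: false },
});
console.log(verdict.status); // RESOLVED
```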
156
+
157
+ ## Task Split Contract
158
+
159
+ When a task is too broad, under-specified, or unsafe to complete surgically, the executor must stop rather than expand scope. It returns `TASK_FAILED` with a split proposal containing at most 3 replacement tasks, each with `Do`, `Files`, `Done when`, `Verify`, and `Commit` fields.
160
+
161
+ The coordinator or planner owns updates to `tasks.md`. An executor must not create new tasks and execute them in the same turn.
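A coordinator could validate a split proposal before touching `tasks.md`. The sketch below assumes each proposed task is a plain object keyed by the field names from the contract. The function name is hypothetical, and rejecting an empty proposal is an assumption the contract does not state.

```javascript
// Field names come from the contract; everything else is illustrative.
const REQUIRED_FIELDS = ["Do", "Files", "Done when", "Verify", "Commit"];
const MAX_REPLACEMENT_TASKS = 3;

// Returns a list of human-readable errors; an empty list means the proposal is acceptable.
function validateSplitProposal(tasks) {
  const errors = [];
  if (tasks.length === 0) {
    errors.push("proposal contains no replacement tasks"); // assumption: empty is invalid
  }
  if (tasks.length > MAX_REPLACEMENT_TASKS) {
    errors.push(`proposal has ${tasks.length} tasks; at most ${MAX_REPLACEMENT_TASKS} allowed`);
  }
  tasks.forEach((task, index) => {
    for (const field of REQUIRED_FIELDS) {
      const value = task[field];
      if (typeof value !== "string" || value.trim() === "") {
        errors.push(`task ${index + 1} is missing "${field}"`);
      }
    }
  });
  return errors;
}

const splitErrors = validateSplitProposal([
  { Do: "Extract parser", Files: "src/parser.js", "Done when": "parser has unit tests", Verify: "npm test", Commit: "refactor: extract parser" },
  { Do: "Wire CLI", Files: "cli/index.js", "Done when": "CLI calls parser", Verify: "", Commit: "feat: wire cli" },
]);
console.log(splitErrors); // [ 'task 2 is missing "Verify"' ]
```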
162
+
163
+ ## Failure Recovery Contract
164
+
165
+ Execution failure recovery is ledger-first:
166
+
167
+ - Default `manual` recovery blocks progress past `TASK_FAILED`; retry the first unchecked task after root-cause analysis.
168
+ - `fix-task` recovery may create one targeted `[FIX <task_id>]` task immediately after the failed task, but only before execution resumes.
169
+ - `.state.json` `execute_state.fix_task_map` records attempts, generated fix task ids, and the last error per original task.
170
+ - `max_fix_tasks_per_original` is a hard ceiling, not a suggestion.
171
+
172
+ Generated fix tasks must include `Do`, `Files`, `Done when`, `Verify`, and `Commit`. A recovery task that cannot name a verification command is not actionable and should stop for user input rather than guessing.
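The hard ceiling on fix tasks can be illustrated with a small ledger update. This is a sketch under assumptions: the entry fields (`attempts`, `fix_task_ids`, `last_error`) and the generated id format are hypothetical stand-ins for whatever `execute_state.fix_task_map` actually stores.

```javascript
// Hypothetical ledger update for `fix-task` recovery.
// Entry field names and the fix-task id format are assumptions.
function planFixTask(fixTaskMap, taskId, lastError, maxFixTasksPerOriginal) {
  const entry = fixTaskMap[taskId] || { attempts: 0, fix_task_ids: [], last_error: null };

  // The ceiling is hard: once reached, no further fix task is generated.
  if (entry.attempts >= maxFixTasksPerOriginal) {
    return { allowed: false, fixTaskMap };
  }

  const fixTaskId = `${taskId}.fix${entry.attempts + 1}`;
  const updatedEntry = {
    attempts: entry.attempts + 1,
    fix_task_ids: [...entry.fix_task_ids, fixTaskId],
    last_error: lastError,
  };
  return {
    allowed: true,
    title: `[FIX ${taskId}]`,
    fixTaskId,
    fixTaskMap: { ...fixTaskMap, [taskId]: updatedEntry },
  };
}

// With a ceiling of 2, the third failure of the same task is refused.
let map = {};
const outcomes = [];
for (const error of ["timeout", "timeout", "timeout"]) {
  const result = planFixTask(map, "1.2", error, 2);
  outcomes.push(result.allowed);
  map = result.fixTaskMap;
}
console.log(outcomes); // [ true, true, false ]
```

When `allowed` is `false`, the caller would fall back to stopping for user input rather than generating another task.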
173
+
174
+ ## Stop-Hook Recovery Contract
175
+
176
+ The stop-hook strategy must never trust one source of completion by itself:
177
+
178
+ - `.state.json` tracks execution cursor and phase.
179
+ - `tasks.md` is the task ledger; unchecked tasks mean work remains.
180
+ - `ALL_TASKS_COMPLETE` is a signal, not proof.
181
+
182
+ Completion requires both a completed state and zero unchecked tasks. If the two disagree, continue with the unchecked tasks in `tasks.md` and do not add new tasks. When Claude Code sends `stop_hook_active=true`, allow the stop so that the stop hook cannot loop recursively, and resume from persisted state on the next turn.
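The decision rule above can be sketched as a pure function. It is not the shipped `stop-watcher.sh` logic. It assumes that `tasks.md` uses Markdown checkboxes (`- [ ]` / `- [x]`), that the caller has already derived a boolean `stateComplete` from `.state.json`, and that the `allow` / `continue` labels are illustrative.

```javascript
// Counts unchecked Markdown checkboxes; the `- [ ]` ledger format is an assumption.
function countUnchecked(tasksMd) {
  return tasksMd.split("\n").filter((line) => /^\s*- \[ \]/.test(line)).length;
}

// Hypothetical stop-hook decision: returns "allow" (let the session stop)
// or "continue" (keep working on the existing ledger).
function decideStop({ stopHookActive, stateComplete, tasksMd }) {
  // Already inside a stop-hook continuation: allow the stop to avoid recursion.
  if (stopHookActive) {
    return { action: "allow", reason: "stop_hook_active; resume from persisted state next turn" };
  }
  const unchecked = countUnchecked(tasksMd);
  // Completion needs both sources to agree.
  if (stateComplete && unchecked === 0) {
    return { action: "allow", reason: "state complete and no unchecked tasks" };
  }
  return { action: "continue", reason: `state complete: ${stateComplete}, unchecked tasks: ${unchecked}` };
}

const ledger = "- [x] 1.1 Add parser\n- [ ] 1.2 Wire CLI\n";
const decision = decideStop({ stopHookActive: false, stateComplete: true, tasksMd: ledger });
console.log(decision.action); // continue
```

Here the state claims completion, but the ledger still has one unchecked task, so the ledger wins and work continues.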
183
+
184
+ ## Status / Cancel Contract
185
+
186
+ `/curdx-flow:status` is read-only. It must compare the machine state (`.state.json`) against the human task ledger (`tasks.md`) before reporting health. If they disagree, report `NEEDS_ATTENTION` and give one concrete recovery command.
187
+
188
+ `/curdx-flow:cancel` is non-destructive by default. It cancels execution state while preserving spec artifacts, progress, reports, and project-level `.flow` files. Deleting a spec requires both `--delete-spec` and `--yes`.
189
+
190
+ If `.state.json` is corrupt, preserve it by renaming it to `.state.json.corrupt.<timestamp>` rather than deleting it. Recovery commands should prefer `/curdx-flow:status` followed by `/curdx-flow:implement --strategy=subagent`.
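Preserving a corrupt state file might look like the following. This is a minimal sketch using Node's built-in `fs` module. The function name and the timestamp format (an ISO string with `:` and `.` replaced by `-`) are assumptions, since the contract only fixes the `.state.json.corrupt.<timestamp>` pattern.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical loader: parses the state file, or renames it aside if it is corrupt.
function loadStateOrQuarantine(statePath, now = new Date()) {
  const raw = fs.readFileSync(statePath, "utf8");
  try {
    return { state: JSON.parse(raw), quarantined: null };
  } catch {
    // Keep the corrupt file for inspection instead of deleting it.
    const stamp = now.toISOString().replace(/[:.]/g, "-"); // assumed timestamp format
    const quarantined = `${statePath}.corrupt.${stamp}`;
    fs.renameSync(statePath, quarantined);
    return { state: null, quarantined };
  }
}

// Demonstration in a throwaway temp directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "flow-state-"));
const statePath = path.join(dir, ".state.json");
fs.writeFileSync(statePath, "{ not json");
const loaded = loadStateOrQuarantine(statePath);
console.log(fs.existsSync(statePath)); // false
console.log(path.basename(loaded.quarantined).startsWith(".state.json.corrupt.")); // true
```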
191
+
192
+ ## Test Quality Contract
193
+
194
+ Tests used as FR/AC evidence must exercise real behavior. Mock-only tests are not proof of implementation.
195
+
196
+ Blocking evidence problems:
197
+ - The test only asserts that mocks/spies were called.
198
+ - The real module/function under test is not invoked.
199
+ - The test is skipped, assertion-free, or would pass with an empty implementation.
200
+ - Mock setup overwhelms behavioral assertions and no integration/e2e backup exists.
201
+ - Stateful mocks are not cleaned up between tests.
202
+
203
+ Mocks are acceptable for boundaries (network, payment provider, clock/randomness) when the assertion still verifies production logic. If a requirement is backed only by weak tests, `/curdx-flow:verify` and `/curdx-flow:review` must not return full PASS.
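The difference between mock-only evidence and behavioral evidence can be shown without a test framework. Everything in this sketch is hypothetical: `chargeWithFee`, the 3% fee, and the hand-rolled gateway only illustrate a boundary mock whose test still checks production logic.

```javascript
// Hypothetical production logic: adds a 3% fee, then charges through a gateway.
function chargeWithFee(amountCents, gateway) {
  const total = amountCents + Math.round(amountCents * 0.03);
  return gateway.charge(total);
}

// Boundary mock for the payment provider; a fresh instance per check avoids leaked state.
function makeGateway() {
  const calls = [];
  return {
    calls,
    charge(total) {
      calls.push(total);
      return { ok: true, total };
    },
  };
}

// Weak evidence: only proves the mock was called. An implementation that
// dropped the fee entirely would still satisfy this check.
const weakGateway = makeGateway();
chargeWithFee(1000, weakGateway);
const weakPasses = weakGateway.calls.length === 1;

// Stronger evidence: asserts the value that production logic computed.
const strongGateway = makeGateway();
const receipt = chargeWithFee(1000, strongGateway);
const strongPasses = receipt.total === 1030 && strongGateway.calls[0] === 1030;

console.log(weakPasses, strongPasses); // true true
```

Both checks pass against the correct implementation, but only the second would fail if the fee calculation were removed, which is what makes it usable as FR/AC evidence.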
@@ -250,7 +250,7 @@ Week 5-6: Spec 4 (refund) + Spec 5 (query)
250
250
  /curdx-flow:review
251
251
 
252
252
  5. All sub-specs done → Epic complete
253
- 6. /curdx-flow:retro (Phase 6+) for Epic-level retrospective
253
+ 6. Record an epic-level retrospective in `.flow/_epics/<name>/epic.md`
254
254
  ```
255
255
 
256
256
  ---