@bhargavvc/sdd-cc 1.30.0 → 1.35.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (242)
  1. package/README.ja-JP.md +144 -110
  2. package/README.ko-KR.md +143 -107
  3. package/README.md +183 -112
  4. package/README.pt-BR.md +90 -52
  5. package/README.zh-CN.md +141 -101
  6. package/agents/sdd-advisor-researcher.md +23 -0
  7. package/agents/sdd-ai-researcher.md +133 -0
  8. package/agents/sdd-code-fixer.md +516 -0
  9. package/agents/sdd-code-reviewer.md +355 -0
  10. package/agents/sdd-codebase-mapper.md +3 -3
  11. package/agents/sdd-debugger.md +17 -5
  12. package/agents/sdd-doc-verifier.md +201 -0
  13. package/agents/sdd-doc-writer.md +602 -0
  14. package/agents/sdd-domain-researcher.md +153 -0
  15. package/agents/sdd-eval-auditor.md +164 -0
  16. package/agents/sdd-eval-planner.md +154 -0
  17. package/agents/sdd-executor.md +87 -4
  18. package/agents/sdd-framework-selector.md +160 -0
  19. package/agents/sdd-intel-updater.md +314 -0
  20. package/agents/sdd-nyquist-auditor.md +1 -1
  21. package/agents/sdd-phase-researcher.md +71 -4
  22. package/agents/sdd-plan-checker.md +100 -6
  23. package/agents/sdd-planner.md +145 -206
  24. package/agents/sdd-project-researcher.md +25 -2
  25. package/agents/sdd-research-synthesizer.md +3 -3
  26. package/agents/sdd-roadmapper.md +6 -6
  27. package/agents/sdd-security-auditor.md +128 -0
  28. package/agents/sdd-ui-auditor.md +43 -3
  29. package/agents/sdd-ui-checker.md +5 -5
  30. package/agents/sdd-ui-researcher.md +27 -4
  31. package/agents/sdd-user-profiler.md +2 -2
  32. package/agents/sdd-verifier.md +142 -22
  33. package/bin/install.js +2151 -551
  34. package/commands/sdd/add-backlog.md +5 -5
  35. package/commands/sdd/add-tests.md +2 -2
  36. package/commands/sdd/ai-integration-phase.md +36 -0
  37. package/commands/sdd/analyze-dependencies.md +34 -0
  38. package/commands/sdd/audit-fix.md +33 -0
  39. package/commands/sdd/autonomous.md +7 -2
  40. package/commands/sdd/cleanup.md +5 -0
  41. package/commands/sdd/code-review-fix.md +52 -0
  42. package/commands/sdd/code-review.md +55 -0
  43. package/commands/sdd/complete-milestone.md +6 -6
  44. package/commands/sdd/debug.md +22 -9
  45. package/commands/sdd/discuss-phase.md +7 -2
  46. package/commands/sdd/do.md +1 -1
  47. package/commands/sdd/docs-update.md +48 -0
  48. package/commands/sdd/eval-review.md +32 -0
  49. package/commands/sdd/execute-phase.md +4 -0
  50. package/commands/sdd/explore.md +27 -0
  51. package/commands/sdd/fast.md +2 -2
  52. package/commands/sdd/from-sdd2.md +45 -0
  53. package/commands/sdd/help.md +2 -0
  54. package/commands/sdd/import.md +36 -0
  55. package/commands/sdd/intel.md +179 -0
  56. package/commands/sdd/join-discord.md +2 -1
  57. package/commands/sdd/manager.md +1 -0
  58. package/commands/sdd/map-codebase.md +3 -3
  59. package/commands/sdd/new-milestone.md +1 -1
  60. package/commands/sdd/new-project.md +5 -1
  61. package/commands/sdd/new-workspace.md +1 -1
  62. package/commands/sdd/next.md +2 -0
  63. package/commands/sdd/plan-milestone-gaps.md +2 -2
  64. package/commands/sdd/plan-phase.md +6 -1
  65. package/commands/sdd/plant-seed.md +1 -1
  66. package/commands/sdd/profile-user.md +1 -1
  67. package/commands/sdd/quick.md +5 -3
  68. package/commands/sdd/reapply-patches.md +230 -42
  69. package/commands/sdd/research-phase.md +3 -3
  70. package/commands/sdd/review-backlog.md +1 -0
  71. package/commands/sdd/review.md +6 -3
  72. package/commands/sdd/scan.md +26 -0
  73. package/commands/sdd/secure-phase.md +35 -0
  74. package/commands/sdd/ship.md +1 -1
  75. package/commands/sdd/thread.md +5 -5
  76. package/commands/sdd/undo.md +34 -0
  77. package/commands/sdd/verify-work.md +1 -1
  78. package/commands/sdd/workstreams.md +17 -11
  79. package/hooks/dist/sdd-check-update.js +33 -8
  80. package/hooks/dist/sdd-context-monitor.js +17 -8
  81. package/hooks/dist/sdd-phase-boundary.sh +27 -0
  82. package/hooks/dist/sdd-prompt-guard.js +1 -0
  83. package/hooks/dist/sdd-read-guard.js +82 -0
  84. package/hooks/dist/sdd-session-state.sh +33 -0
  85. package/hooks/dist/sdd-statusline.js +137 -15
  86. package/hooks/dist/sdd-validate-commit.sh +47 -0
  87. package/hooks/dist/sdd-workflow-guard.js +4 -4
  88. package/hooks/sdd-check-update.js +139 -0
  89. package/hooks/sdd-context-monitor.js +165 -0
  90. package/hooks/sdd-phase-boundary.sh +27 -0
  91. package/hooks/sdd-prompt-guard.js +97 -0
  92. package/hooks/sdd-read-guard.js +82 -0
  93. package/hooks/sdd-session-state.sh +33 -0
  94. package/hooks/sdd-statusline.js +241 -0
  95. package/hooks/sdd-validate-commit.sh +47 -0
  96. package/hooks/sdd-workflow-guard.js +94 -0
  97. package/package.json +3 -3
  98. package/scripts/build-hooks.js +18 -7
  99. package/scripts/prompt-injection-scan.sh +1 -0
  100. package/scripts/rebrand-gsd-to-sdd.sh +221 -220
  101. package/scripts/run-tests.cjs +5 -1
  102. package/scripts/sync-upstream.sh +1 -1
  103. package/sdd/bin/lib/commands.cjs +79 -17
  104. package/sdd/bin/lib/config.cjs +90 -48
  105. package/sdd/bin/lib/core.cjs +452 -87
  106. package/sdd/bin/lib/docs.cjs +267 -0
  107. package/sdd/bin/lib/frontmatter.cjs +381 -336
  108. package/sdd/bin/lib/init.cjs +110 -16
  109. package/sdd/bin/lib/intel.cjs +660 -0
  110. package/sdd/bin/lib/learnings.cjs +378 -0
  111. package/sdd/bin/lib/milestone.cjs +42 -11
  112. package/sdd/bin/lib/model-profiles.cjs +17 -15
  113. package/sdd/bin/lib/phase.cjs +367 -288
  114. package/sdd/bin/lib/profile-output.cjs +106 -10
  115. package/sdd/bin/lib/roadmap.cjs +146 -115
  116. package/sdd/bin/lib/schema-detect.cjs +238 -0
  117. package/sdd/bin/lib/sdd2-import.cjs +511 -0
  118. package/sdd/bin/lib/security.cjs +124 -3
  119. package/sdd/bin/lib/state.cjs +648 -264
  120. package/sdd/bin/lib/template.cjs +8 -4
  121. package/sdd/bin/lib/verify.cjs +209 -28
  122. package/sdd/bin/lib/workstream.cjs +7 -3
  123. package/sdd/bin/sdd-tools.cjs +184 -12
  124. package/sdd/contexts/dev.md +21 -0
  125. package/sdd/contexts/research.md +22 -0
  126. package/sdd/contexts/review.md +22 -0
  127. package/sdd/references/agent-contracts.md +79 -0
  128. package/sdd/references/ai-evals.md +156 -0
  129. package/sdd/references/ai-frameworks.md +186 -0
  130. package/sdd/references/artifact-types.md +113 -0
  131. package/sdd/references/common-bug-patterns.md +114 -0
  132. package/sdd/references/context-budget.md +49 -0
  133. package/sdd/references/continuation-format.md +25 -25
  134. package/sdd/references/domain-probes.md +125 -0
  135. package/sdd/references/few-shot-examples/plan-checker.md +73 -0
  136. package/sdd/references/few-shot-examples/verifier.md +109 -0
  137. package/sdd/references/gate-prompts.md +100 -0
  138. package/sdd/references/gates.md +70 -0
  139. package/sdd/references/git-integration.md +1 -1
  140. package/sdd/references/ios-scaffold.md +123 -0
  141. package/sdd/references/model-profile-resolution.md +2 -0
  142. package/sdd/references/model-profiles.md +24 -18
  143. package/sdd/references/planner-gap-closure.md +62 -0
  144. package/sdd/references/planner-reviews.md +39 -0
  145. package/sdd/references/planner-revision.md +87 -0
  146. package/sdd/references/planning-config.md +252 -0
  147. package/sdd/references/revision-loop.md +97 -0
  148. package/sdd/references/thinking-models-debug.md +44 -0
  149. package/sdd/references/thinking-models-execution.md +50 -0
  150. package/sdd/references/thinking-models-planning.md +62 -0
  151. package/sdd/references/thinking-models-research.md +50 -0
  152. package/sdd/references/thinking-models-verification.md +55 -0
  153. package/sdd/references/thinking-partner.md +96 -0
  154. package/sdd/references/ui-brand.md +4 -4
  155. package/sdd/references/universal-anti-patterns.md +63 -0
  156. package/sdd/references/verification-overrides.md +227 -0
  157. package/sdd/references/workstream-flag.md +56 -3
  158. package/sdd/templates/AI-SPEC.md +246 -0
  159. package/sdd/templates/DEBUG.md +1 -1
  160. package/sdd/templates/SECURITY.md +61 -0
  161. package/sdd/templates/UAT.md +4 -4
  162. package/sdd/templates/VALIDATION.md +4 -4
  163. package/sdd/templates/claude-md.md +32 -9
  164. package/sdd/templates/config.json +4 -0
  165. package/sdd/templates/debug-subagent-prompt.md +1 -1
  166. package/sdd/templates/dev-preferences.md +1 -1
  167. package/sdd/templates/discovery.md +2 -2
  168. package/sdd/templates/phase-prompt.md +1 -1
  169. package/sdd/templates/planner-subagent-prompt.md +3 -3
  170. package/sdd/templates/project.md +1 -1
  171. package/sdd/templates/research.md +1 -1
  172. package/sdd/templates/state.md +2 -2
  173. package/sdd/workflows/add-phase.md +8 -8
  174. package/sdd/workflows/add-tests.md +12 -9
  175. package/sdd/workflows/add-todo.md +5 -3
  176. package/sdd/workflows/ai-integration-phase.md +284 -0
  177. package/sdd/workflows/analyze-dependencies.md +96 -0
  178. package/sdd/workflows/audit-fix.md +157 -0
  179. package/sdd/workflows/audit-milestone.md +11 -11
  180. package/sdd/workflows/audit-uat.md +2 -2
  181. package/sdd/workflows/autonomous.md +195 -27
  182. package/sdd/workflows/check-todos.md +12 -10
  183. package/sdd/workflows/cleanup.md +2 -0
  184. package/sdd/workflows/code-review-fix.md +497 -0
  185. package/sdd/workflows/code-review.md +515 -0
  186. package/sdd/workflows/complete-milestone.md +56 -22
  187. package/sdd/workflows/diagnose-issues.md +10 -3
  188. package/sdd/workflows/discovery-phase.md +5 -3
  189. package/sdd/workflows/discuss-phase-assumptions.md +24 -6
  190. package/sdd/workflows/discuss-phase-power.md +291 -0
  191. package/sdd/workflows/discuss-phase.md +173 -21
  192. package/sdd/workflows/do.md +23 -21
  193. package/sdd/workflows/docs-update.md +1155 -0
  194. package/sdd/workflows/eval-review.md +155 -0
  195. package/sdd/workflows/execute-phase.md +594 -38
  196. package/sdd/workflows/execute-plan.md +67 -96
  197. package/sdd/workflows/explore.md +139 -0
  198. package/sdd/workflows/fast.md +5 -5
  199. package/sdd/workflows/forensics.md +2 -2
  200. package/sdd/workflows/health.md +4 -4
  201. package/sdd/workflows/help.md +122 -119
  202. package/sdd/workflows/import.md +276 -0
  203. package/sdd/workflows/inbox.md +387 -0
  204. package/sdd/workflows/insert-phase.md +7 -7
  205. package/sdd/workflows/list-phase-assumptions.md +4 -4
  206. package/sdd/workflows/list-workspaces.md +2 -2
  207. package/sdd/workflows/manager.md +35 -32
  208. package/sdd/workflows/map-codebase.md +7 -5
  209. package/sdd/workflows/milestone-summary.md +2 -2
  210. package/sdd/workflows/new-milestone.md +17 -9
  211. package/sdd/workflows/new-project.md +50 -25
  212. package/sdd/workflows/new-workspace.md +7 -5
  213. package/sdd/workflows/next.md +67 -11
  214. package/sdd/workflows/note.md +9 -7
  215. package/sdd/workflows/pause-work.md +75 -12
  216. package/sdd/workflows/plan-milestone-gaps.md +8 -8
  217. package/sdd/workflows/plan-phase.md +294 -42
  218. package/sdd/workflows/plant-seed.md +6 -3
  219. package/sdd/workflows/pr-branch.md +42 -14
  220. package/sdd/workflows/profile-user.md +9 -7
  221. package/sdd/workflows/progress.md +45 -45
  222. package/sdd/workflows/quick.md +195 -47
  223. package/sdd/workflows/remove-phase.md +6 -6
  224. package/sdd/workflows/remove-workspace.md +3 -1
  225. package/sdd/workflows/research-phase.md +2 -2
  226. package/sdd/workflows/resume-project.md +12 -12
  227. package/sdd/workflows/review.md +109 -9
  228. package/sdd/workflows/scan.md +102 -0
  229. package/sdd/workflows/secure-phase.md +166 -0
  230. package/sdd/workflows/session-report.md +2 -2
  231. package/sdd/workflows/settings.md +38 -12
  232. package/sdd/workflows/ship.md +21 -9
  233. package/sdd/workflows/stats.md +1 -1
  234. package/sdd/workflows/transition.md +23 -23
  235. package/sdd/workflows/ui-phase.md +15 -7
  236. package/sdd/workflows/ui-review.md +29 -4
  237. package/sdd/workflows/undo.md +314 -0
  238. package/sdd/workflows/update.md +171 -20
  239. package/sdd/workflows/validate-phase.md +6 -4
  240. package/sdd/workflows/verify-phase.md +210 -6
  241. package/sdd/workflows/verify-work.md +83 -9
  242. package/sdd/commands/sdd/workstreams.md +0 -63
@@ -70,6 +70,16 @@
  *   audit-uat                            Scan all phases for unresolved UAT/verification items
  *   uat render-checkpoint --file <path>  Render the current UAT checkpoint block
  *
+ * Intel:
+ *   intel query <term>                   Query intel files for a term
+ *   intel status                         Show intel file freshness
+ *   intel update                         Trigger intel refresh (returns agent spawn hint)
+ *   intel diff                           Show changed intel entries since last snapshot
+ *   intel snapshot                       Save current intel state as diff baseline
+ *   intel patch-meta <file>              Update _meta.updated_at in an intel file
+ *   intel validate                       Validate intel file structure
+ *   intel extract-exports <file>         Extract exported symbols from a source file
+ *
  * Scaffolding:
  *   scaffold context --phase <N>         Create CONTEXT.md template
  *   scaffold uat --phase <N>             Create UAT.md template
@@ -93,6 +103,7 @@
  *   verify commits <h1> [h2] ...          Batch verify commit hashes
  *   verify artifacts <plan-file>          Check must_haves.artifacts
  *   verify key-links <plan-file>          Check must_haves.key_links
+ *   verify schema-drift <phase> [--skip]  Detect schema file changes without push
  *
  * Template Fill:
  *   template fill summary --phase N       Create pre-filled SUMMARY.md
@@ -133,6 +144,20 @@
  *   init milestone-op                    All context for milestone operations
  *   init map-codebase                    All context for map-codebase workflow
  *   init progress                        All context for progress workflow
+ *
+ * Documentation:
+ *   docs-init                            Project context for docs-update workflow
+ *
+ * Learnings:
+ *   learnings list                       List all global learnings (JSON)
+ *   learnings query --tag <tag>          Query learnings by tag
+ *   learnings copy                       Copy from current project's LEARNINGS.md
+ *   learnings prune --older-than <dur>   Remove entries older than duration (e.g. 90d)
+ *   learnings delete <id>                Delete a learning by ID
+ *
+ * SDD-2 Migration:
+ *   from-sdd2 [--path <dir>] [--force] [--dry-run]
+ *       Import a SDD-2 (.sdd/) project back to SDD v1 (.planning/) format
  */
 
 const fs = require('fs');
@@ -152,6 +177,8 @@ const frontmatter = require('./lib/frontmatter.cjs');
 const profilePipeline = require('./lib/profile-pipeline.cjs');
 const profileOutput = require('./lib/profile-output.cjs');
 const workstream = require('./lib/workstream.cjs');
+const docs = require('./lib/docs.cjs');
+const learnings = require('./lib/learnings.cjs');
 
 // ─── Arg parsing helpers ──────────────────────────────────────────────────────
 
@@ -230,7 +257,7 @@ async function main() {
   }
 
   // Optional workstream override for parallel milestone work.
-  // Priority: --ws flag > SDD_WORKSTREAM env var > active-workstream file > null (flat mode)
+  // Priority: --ws flag > SDD_WORKSTREAM env var > session-scoped pointer > shared legacy pointer > null
   const wsEqArg = args.find(arg => arg.startsWith('--ws='));
   const wsIdx = args.indexOf('--ws');
   let ws = null;
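
The revised priority comment describes a four-step fallback chain. The sketch below shows one way such a chain could be resolved. It is illustrative only: `resolveWorkstream`, `readSessionPointer`, and `readLegacyPointer` are hypothetical names, and the diff does not show how the package actually reads the two pointer files.

```javascript
// Hypothetical sketch of the priority chain:
//   --ws flag > SDD_WORKSTREAM env var > session-scoped pointer > shared legacy pointer > null
// readSessionPointer / readLegacyPointer are assumed callbacks, not functions from the package.
function resolveWorkstream(args, env, readSessionPointer, readLegacyPointer) {
  // 1. Command-line flag, in either `--ws=<name>` or `--ws <name>` form.
  const eqArg = args.find(arg => arg.startsWith('--ws='));
  if (eqArg) return eqArg.slice('--ws='.length);
  const idx = args.indexOf('--ws');
  if (idx !== -1 && args[idx + 1] !== undefined) return args[idx + 1];

  // 2. Environment variable.
  if (env.SDD_WORKSTREAM) return env.SDD_WORKSTREAM;

  // 3. Session-scoped pointer, then 4. shared legacy pointer, else null.
  return readSessionPointer() || readLegacyPointer() || null;
}
```

Passing the environment and the pointer readers as parameters keeps the sketch testable without touching the real filesystem.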
@@ -271,10 +298,31 @@ async function main() {
     args.splice(pickIdx, 2);
   }
 
+  // --default <value>: for config-get, return this value instead of erroring
+  // when the key is absent. Allows workflows to express optional config reads
+  // without defensive `2>/dev/null || true` boilerplate (#1893).
+  const defaultIdx = args.indexOf('--default');
+  let defaultValue = undefined;
+  if (defaultIdx !== -1) {
+    defaultValue = args[defaultIdx + 1];
+    if (defaultValue === undefined) defaultValue = '';
+    args.splice(defaultIdx, 2);
+  }
+
   const command = args[0];
 
   if (!command) {
-    error('Usage: sdd-tools <command> [args] [--raw] [--pick <field>] [--cwd <path>] [--ws <name>]\nCommands: state, resolve-model, find-phase, commit, verify-summary, verify, frontmatter, template, generate-slug, current-timestamp, list-todos, verify-path-exists, config-ensure-section, config-new-project, init, workstream');
+    error('Usage: sdd-tools <command> [args] [--raw] [--pick <field>] [--cwd <path>] [--ws <name>]\nCommands: state, resolve-model, find-phase, commit, verify-summary, verify, frontmatter, template, generate-slug, current-timestamp, list-todos, verify-path-exists, config-ensure-section, config-new-project, init, workstream, docs-init');
+  }
+
+  // Reject flags that are never valid for any sdd-tools command. AI agents
+  // sometimes hallucinate --help or --version on tool invocations; silently
+  // ignoring them can cause destructive operations to proceed unchecked.
+  const NEVER_VALID_FLAGS = new Set(['-h', '--help', '-?', '--h', '--version', '-v', '--usage']);
+  for (const arg of args) {
+    if (NEVER_VALID_FLAGS.has(arg)) {
+      error(`Unknown flag: ${arg}\nsdd-tools does not accept help or version flags. Run "sdd-tools" with no arguments for usage.`);
+    }
   }
 
   // Multi-repo guard: resolve project root for commands that read/write .planning/.
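
The two pre-dispatch checks added in this hunk can be exercised in isolation. The sketch below restates them as pure functions so their behavior is easy to see. `extractDefault` and `findInvalidFlag` are illustrative names, not functions from the package, and `some.key` is a made-up config key.

```javascript
// Illustrative restatement of the hunk's logic as pure functions (names are hypothetical).
const NEVER_VALID_FLAGS = new Set(['-h', '--help', '-?', '--h', '--version', '-v', '--usage']);

// Removes `--default <value>` from a copy of args and returns both pieces.
function extractDefault(args) {
  const rest = args.slice();
  const idx = rest.indexOf('--default');
  if (idx === -1) return { args: rest, defaultValue: undefined };
  // A trailing `--default` with no value falls back to the empty string.
  const value = rest[idx + 1] === undefined ? '' : rest[idx + 1];
  rest.splice(idx, 2);
  return { args: rest, defaultValue: value };
}

// Returns the first flag that should be rejected, or null if none is present.
function findInvalidFlag(args) {
  return args.find(arg => NEVER_VALID_FLAGS.has(arg)) ?? null;
}

const parsed = extractDefault(['config-get', 'some.key', '--default', 'fallback']);
console.log(parsed.args, parsed.defaultValue);
console.log(findInvalidFlag(['state', '--help']));
```

Unlike the hunk, which mutates `args` in place and calls `error()`, the sketch copies the array and returns values, which is only a convenience for demonstration.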
@@ -313,7 +361,7 @@ async function main() {
       }
     };
     try {
-      await runCommand(command, args, cwd, raw);
+      await runCommand(command, args, cwd, raw, defaultValue);
       cleanup();
     } catch (e) {
       fs.writeSync = origWriteSync;
@@ -322,7 +370,27 @@ async function main() {
     return;
   }
 
-  await runCommand(command, args, cwd, raw);
+  // Intercept stdout to transparently resolve @file: references (#1891).
+  // core.cjs output() writes @file:<path> when JSON > 50KB. The --pick path
+  // already resolves this, but the normal path wrote @file: to stdout, forcing
+  // every workflow to have a bash-specific `if [[ "$INIT" == @file:* ]]` check
+  // that breaks on PowerShell and other non-bash shells.
+  const origWriteSync2 = fs.writeSync;
+  const outChunks = [];
+  fs.writeSync = function (fd, data, ...rest) {
+    if (fd === 1) { outChunks.push(String(data)); return; }
+    return origWriteSync2.call(fs, fd, data, ...rest);
+  };
+  try {
+    await runCommand(command, args, cwd, raw, defaultValue);
+  } finally {
+    fs.writeSync = origWriteSync2;
+  }
+  let captured = outChunks.join('');
+  if (captured.startsWith('@file:')) {
+    captured = fs.readFileSync(captured.slice(6), 'utf-8');
+  }
+  origWriteSync2.call(fs, 1, captured);
 }
 
 /**
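
The `@file:` resolution step in the hunk above can be tried on its own. The sketch below is a standalone approximation: `resolveFileReference` is a hypothetical helper name, the temporary file stands in for a large JSON payload, and the `trim()` call is an addition of this sketch that the hunk does not perform.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: if the captured output is an @file:<path> reference,
// return the referenced file's contents; otherwise return the output unchanged.
function resolveFileReference(captured) {
  const PREFIX = '@file:';
  if (!captured.startsWith(PREFIX)) return captured;
  // trim() is an addition in this sketch to tolerate a trailing newline.
  return fs.readFileSync(captured.slice(PREFIX.length).trim(), 'utf-8');
}

// Demo: a temporary file plays the role of the oversized JSON output.
const tmpFile = path.join(os.tmpdir(), `sdd-sketch-${process.pid}.json`);
fs.writeFileSync(tmpFile, JSON.stringify({ ok: true }));
const resolved = resolveFileReference(`@file:${tmpFile}`);
const passthrough = resolveFileReference('{"small":1}');
fs.unlinkSync(tmpFile);

console.log(resolved);
console.log(passthrough);
```

Resolving the reference inside the tool, rather than in each calling workflow, is what lets callers drop the shell-specific prefix check mentioned in the hunk's comment.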
@@ -348,7 +416,7 @@ function extractField(obj, fieldPath) {
   return current;
 }
 
-async function runCommand(command, args, cwd, raw) {
+async function runCommand(command, args, cwd, raw, defaultValue) {
   switch (command) {
     case 'state': {
       const subcommand = args[1];
@@ -394,6 +462,14 @@ async function runCommand(command, args, cwd, raw) {
         state.cmdSignalWaiting(cwd, type, question, options, p, raw);
       } else if (subcommand === 'signal-resume') {
         state.cmdSignalResume(cwd, raw);
+      } else if (subcommand === 'planned-phase') {
+        const { phase: p, name, plans } = parseNamedArgs(args, ['phase', 'name', 'plans']);
+        state.cmdStatePlannedPhase(cwd, p, plans !== null ? parseInt(plans, 10) : null, raw);
+      } else if (subcommand === 'validate') {
+        state.cmdStateValidate(cwd, raw);
+      } else if (subcommand === 'sync') {
+        const { verify } = parseNamedArgs(args, [], ['verify']);
+        state.cmdStateSync(cwd, { verify }, raw);
       } else {
         state.cmdStateLoad(cwd, raw);
       }
@@ -425,6 +501,11 @@ async function runCommand(command, args, cwd, raw) {
       break;
     }
 
+    case 'check-commit': {
+      commands.cmdCheckCommit(cwd, raw);
+      break;
+    }
+
     case 'commit-to-subrepo': {
       const message = args[1];
       const filesIndex = args.indexOf('--files');
@@ -498,8 +579,11 @@ async function runCommand(command, args, cwd, raw) {
         verify.cmdVerifyArtifacts(cwd, args[2], raw);
       } else if (subcommand === 'key-links') {
         verify.cmdVerifyKeyLinks(cwd, args[2], raw);
+      } else if (subcommand === 'schema-drift') {
+        const skipFlag = args.includes('--skip');
+        verify.cmdVerifySchemaDrift(cwd, args[2], skipFlag, raw);
       } else {
-        error('Unknown verify subcommand. Available: plan-structure, phase-completeness, references, commits, artifacts, key-links');
+        error('Unknown verify subcommand. Available: plan-structure, phase-completeness, references, commits, artifacts, key-links, schema-drift');
       }
       break;
     }
@@ -540,7 +624,7 @@ async function runCommand(command, args, cwd, raw) {
     }
 
     case 'config-get': {
-      config.cmdConfigGet(cwd, args[1], raw);
+      config.cmdConfigGet(cwd, args[1], raw, defaultValue);
       break;
     }
 
@@ -570,8 +654,10 @@ async function runCommand(command, args, cwd, raw) {
           includeArchived: args.includes('--include-archived'),
         };
         phase.cmdPhasesList(cwd, options, raw);
+      } else if (subcommand === 'clear') {
+        milestone.cmdPhasesClear(cwd, raw, args.slice(2));
       } else {
-        error('Unknown phases subcommand. Available: list');
+        error('Unknown phases subcommand. Available: list, clear');
       }
       break;
     }
@@ -712,12 +798,16 @@ async function runCommand(command, args, cwd, raw) {
     case 'init': {
       const workflow = args[1];
       switch (workflow) {
-        case 'execute-phase':
-          init.cmdInitExecutePhase(cwd, args[2], raw);
+        case 'execute-phase': {
+          const { validate: epValidate } = parseNamedArgs(args, [], ['validate']);
+          init.cmdInitExecutePhase(cwd, args[2], raw, { validate: epValidate });
           break;
-        case 'plan-phase':
-          init.cmdInitPlanPhase(cwd, args[2], raw);
+        }
+        case 'plan-phase': {
+          const { validate: ppValidate } = parseNamedArgs(args, [], ['validate']);
+          init.cmdInitPlanPhase(cwd, args[2], raw, { validate: ppValidate });
           break;
+        }
         case 'new-project':
           init.cmdInitNewProject(cwd, raw);
           break;
@@ -910,6 +1000,88 @@ async function runCommand(command, args, cwd, raw) {
       break;
     }
 
+    // ─── Intel ────────────────────────────────────────────────────────────
+
+    case 'intel': {
+      const intel = require('./lib/intel.cjs');
+      const subcommand = args[1];
+      if (subcommand === 'query') {
+        const term = args[2];
+        if (!term) error('Usage: sdd-tools intel query <term>');
+        const planningDir = path.join(cwd, '.planning');
+        core.output(intel.intelQuery(term, planningDir), raw);
+      } else if (subcommand === 'status') {
+        const planningDir = path.join(cwd, '.planning');
+        core.output(intel.intelStatus(planningDir), raw);
+      } else if (subcommand === 'diff') {
+        const planningDir = path.join(cwd, '.planning');
+        core.output(intel.intelDiff(planningDir), raw);
+      } else if (subcommand === 'snapshot') {
+        const planningDir = path.join(cwd, '.planning');
+        core.output(intel.intelSnapshot(planningDir), raw);
+      } else if (subcommand === 'patch-meta') {
+        const filePath = args[2];
+        if (!filePath) error('Usage: sdd-tools intel patch-meta <file-path>');
+        core.output(intel.intelPatchMeta(path.resolve(cwd, filePath)), raw);
+      } else if (subcommand === 'validate') {
+        const planningDir = path.join(cwd, '.planning');
+        core.output(intel.intelValidate(planningDir), raw);
+      } else if (subcommand === 'extract-exports') {
+        const filePath = args[2];
+        if (!filePath) error('Usage: sdd-tools intel extract-exports <file-path>');
+        core.output(intel.intelExtractExports(path.resolve(cwd, filePath)), raw);
+      } else if (subcommand === 'update') {
+        const planningDir = path.join(cwd, '.planning');
+        core.output(intel.intelUpdate(planningDir), raw);
+      } else {
+        error('Unknown intel subcommand. Available: query, status, update, diff, snapshot, patch-meta, validate, extract-exports');
+      }
+      break;
+    }
+
+    // ─── Documentation ────────────────────────────────────────────────────
+
+    case 'docs-init': {
+      docs.cmdDocsInit(cwd, raw);
+      break;
+    }
+
+    // ─── Learnings ─────────────────────────────────────────────────────────
+
+    case 'learnings': {
+      const subcommand = args[1];
+      if (subcommand === 'list') {
+        learnings.cmdLearningsList(raw);
+      } else if (subcommand === 'query') {
+        const tagIdx = args.indexOf('--tag');
+        const tag = tagIdx !== -1 ? args[tagIdx + 1] : null;
+        if (!tag) error('Usage: sdd-tools learnings query --tag <tag>');
+        learnings.cmdLearningsQuery(tag, raw);
+      } else if (subcommand === 'copy') {
+        learnings.cmdLearningsCopy(cwd, raw);
+      } else if (subcommand === 'prune') {
+        const olderIdx = args.indexOf('--older-than');
+        const olderThan = olderIdx !== -1 ? args[olderIdx + 1] : null;
+        if (!olderThan) error('Usage: sdd-tools learnings prune --older-than <duration>');
+        learnings.cmdLearningsPrune(olderThan, raw);
+      } else if (subcommand === 'delete') {
+        const id = args[2];
+        if (!id) error('Usage: sdd-tools learnings delete <id>');
+        learnings.cmdLearningsDelete(id, raw);
+      } else {
+        error('Unknown learnings subcommand. Available: list, query, copy, prune, delete');
+      }
+      break;
+    }
+
+    // ─── SDD-2 Reverse Migration ───────────────────────────────────────────
+
+    case 'from-sdd2': {
+      const sdd2Import = require('./lib/sdd2-import.cjs');
+      sdd2Import.cmdFromSdd2(args.slice(1), cwd, raw);
+      break;
+    }
+
     default:
       error(`Unknown command: ${command}`);
   }
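
The `learnings prune --older-than <dur>` branch forwards the duration string (for example `90d`) to `learnings.cjs`, whose parsing is not shown in this part of the diff. The sketch below shows one plausible way to handle such a value. Everything in it is an assumption: the `<number><unit>` grammar, the `h`/`d`/`w` unit set, the `{ id, date }` entry shape, and the function names.

```javascript
// Hypothetical duration handling; the real implementation in learnings.cjs may differ.
// Assumed grammar: a positive integer followed by one unit letter (h, d, or w).
const UNIT_MS = { h: 3600e3, d: 24 * 3600e3, w: 7 * 24 * 3600e3 };

// Converts e.g. "90d" to milliseconds, or returns null for unrecognized input.
function parseDuration(text) {
  const match = /^(\d+)([hdw])$/.exec(String(text).trim());
  if (!match) return null;
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Keeps entries whose `date` is not older than the cutoff (entry shape is assumed).
function pruneOlderThan(entries, duration, now = Date.now()) {
  const ms = parseDuration(duration);
  if (ms === null) throw new Error(`Invalid duration: ${duration}`);
  const cutoff = now - ms;
  return entries.filter(entry => Date.parse(entry.date) >= cutoff);
}

const sample = [
  { id: 'a', date: '2024-01-01T00:00:00Z' },
  { id: 'b', date: '2024-05-15T00:00:00Z' },
];
console.log(pruneOlderThan(sample, '90d', Date.parse('2024-06-01T00:00:00Z')));
```

Taking `now` as a parameter makes the cutoff deterministic in tests.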
@@ -0,0 +1,21 @@
+ # Dev Context Profile
+
+ Agent output guidance for dev mode. Loaded when `context: dev` is set in config.json.
+
+ ## Output Style
+
+ - Concise, action-oriented responses
+ - Lead with the code change or command, follow with brief rationale
+ - Skip preamble — assume the developer has full context
+ - Use inline code references (`file:line`) over prose descriptions
+
+ ## Focus Areas
+
+ - Working code that compiles and passes tests
+ - Minimal diff — change only what is necessary
+ - Flag side effects or breaking changes immediately
+ - Surface the next actionable step at the end of every response
+
+ ## Verbosity
+
+ Low. One-liner explanations unless the change is non-obvious. Omit background theory, alternative approaches, and caveats that do not affect the current task.
@@ -0,0 +1,22 @@
+ # Research Context Profile
+
+ Agent output guidance for research mode. Loaded when `context: research` is set in config.json.
+
+ ## Output Style
+
+ - Verbose, exploratory responses that surface trade-offs and alternatives
+ - Present multiple approaches with pros and cons before recommending one
+ - Include links, references, and citations where available
+ - Use structured headings and bullet lists for scan-ability
+
+ ## Focus Areas
+
+ - Breadth of options — enumerate before narrowing
+ - Prior art and ecosystem conventions
+ - Risks, edge cases, and failure modes
+ - Dependencies and compatibility implications
+ - Long-term maintainability of each approach
+
+ ## Verbosity
+
+ High. Explain reasoning, show evidence, and document assumptions. Include background context even if the developer likely knows it — research artifacts are read by future contributors who may not.
@@ -0,0 +1,22 @@
+ # Review Context Profile
+
+ Agent output guidance for review mode. Loaded when `context: review` is set in config.json.
+
+ ## Output Style
+
+ - Critical, detail-focused responses that prioritize correctness
+ - Organize findings by severity: blocking, important, nit
+ - Reference specific lines and files for every finding
+ - State what is correct as well as what needs change — confirm the good parts
+
+ ## Focus Areas
+
+ - Correctness — logic errors, off-by-ones, missing edge cases
+ - Security — input validation, injection vectors, secret exposure
+ - Performance — unnecessary allocations, O(n^2) patterns, missing caching
+ - Style and consistency — naming, formatting, import order
+ - Test coverage — untested branches, missing assertions, flaky patterns
+
+ ## Verbosity
+
+ Medium. Be thorough on findings but terse in explanation. Each issue should be one to three sentences: what is wrong, why it matters, and how to fix it.
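
Each of the three profiles above says it is loaded when `context` is set to its name in config.json. The sketch below shows how that lookup might map a config value to one of the new `sdd/contexts/*.md` files. It is a guess at the mechanism, not the package's code: `contextProfilePath` is a hypothetical function, and returning `null` for an unset or unknown value is an assumption.

```javascript
const path = require('path');

// The three profile names added in this release (dev.md, research.md, review.md).
const KNOWN_CONTEXTS = new Set(['dev', 'research', 'review']);

// Hypothetical lookup: returns the profile path for config.context,
// or null when the key is unset or names an unknown profile (assumed behavior).
function contextProfilePath(config, contextsDir) {
  const name = config && config.context;
  if (!KNOWN_CONTEXTS.has(name)) return null;
  return path.join(contextsDir, `${name}.md`);
}

console.log(contextProfilePath({ context: 'review' }, path.join('sdd', 'contexts')));
console.log(contextProfilePath({}, path.join('sdd', 'contexts')));
```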
@@ -0,0 +1,79 @@
1
+ # Agent Contracts
2
+
3
+ Completion markers and handoff schemas for all SDD agents. Workflows use these markers to detect agent completion and route accordingly.
4
+
5
+ This doc describes what IS, not what should be. Casing inconsistencies are documented as they appear in agent source files.
6
+
7
+ ---
8
+
9
+ ## Agent Registry
10
+
11
+ | Agent | Role | Completion Markers |
12
+ |-------|------|--------------------|
13
+ | sdd-planner | Plan creation | `## PLANNING COMPLETE` |
14
+ | sdd-executor | Plan execution | `## PLAN COMPLETE`, `## CHECKPOINT REACHED` |
15
+ | sdd-phase-researcher | Phase-scoped research | `## RESEARCH COMPLETE`, `## RESEARCH BLOCKED` |
16
+ | sdd-project-researcher | Project-wide research | `## RESEARCH COMPLETE`, `## RESEARCH BLOCKED` |
17
+ | sdd-plan-checker | Plan validation | `## VERIFICATION PASSED`, `## ISSUES FOUND` |
18
+ | sdd-research-synthesizer | Multi-research synthesis | `## SYNTHESIS COMPLETE`, `## SYNTHESIS BLOCKED` |
19
+ | sdd-debugger | Debug investigation | `## DEBUG COMPLETE`, `## ROOT CAUSE FOUND`, `## CHECKPOINT REACHED` |
20
+ | sdd-roadmapper | Roadmap creation/revision | `## ROADMAP CREATED`, `## ROADMAP REVISED`, `## ROADMAP BLOCKED` |
21
+ | sdd-ui-auditor | UI review | `## UI REVIEW COMPLETE` |
22
+ | sdd-ui-checker | UI validation | `## ISSUES FOUND` |
23
+ | sdd-ui-researcher | UI spec creation | `## UI-SPEC COMPLETE`, `## UI-SPEC BLOCKED` |
24
+ | sdd-verifier | Post-execution verification | `## Verification Complete` (title case) |
25
+ | sdd-integration-checker | Cross-phase integration check | `## Integration Check Complete` (title case) |
26
+ | sdd-nyquist-auditor | Sampling audit | `## PARTIAL`, `## ESCALATE` (non-standard) |
27
+ | sdd-security-auditor | Security audit | `## OPEN_THREATS`, `## ESCALATE` (non-standard) |
28
+ | sdd-codebase-mapper | Codebase analysis | No marker (writes docs directly) |
29
+ | sdd-assumptions-analyzer | Assumption extraction | No marker (returns `## Assumptions` sections) |
30
+ | sdd-doc-verifier | Doc validation | No marker (writes JSON to `.planning/tmp/`) |
31
+ | sdd-doc-writer | Doc generation | No marker (writes docs directly) |
32
+ | sdd-advisor-researcher | Advisory research | No marker (utility agent) |
33
+ | sdd-user-profiler | User profiling | No marker (returns JSON in analysis tags) |
34
+ | sdd-intel-updater | Codebase intelligence analysis | `## INTEL UPDATE COMPLETE`, `## INTEL UPDATE FAILED` |
35
+
36
+ ## Marker Rules
37
+
38
+ 1. **ALL-CAPS markers** (e.g., `## PLANNING COMPLETE`) are the standard convention
39
+ 2. **Title-case markers** (e.g., `## Verification Complete`) exist in sdd-verifier and sdd-integration-checker -- these are intentional and must be matched as written; they are not bugs
40
+ 3. **Non-standard markers** (e.g., `## PARTIAL`, `## ESCALATE`) in audit agents indicate partial results requiring orchestrator judgment
41
+ 4. **Agents without markers** either write artifacts directly to disk or return structured data (JSON/sections) that the caller parses
42
+ 5. Markers must appear as H2 headings (`## `) at the start of a line in the agent's final output
43
+
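The marker rules above can be sketched as a small parser. This is a minimal illustration, not the package's actual matching code: `KNOWN_MARKERS` and `find_markers` are hypothetical names, and the set holds only a few markers from the table.

```python
import re

# A few markers copied from the table above; extend as needed.
KNOWN_MARKERS = {
    "PLANNING COMPLETE",
    "PLAN COMPLETE",
    "CHECKPOINT REACHED",
    "RESEARCH COMPLETE",
    "RESEARCH BLOCKED",
    "Verification Complete",  # title case, matched exactly as written (rule 2)
}

# Rule 5: the marker is an H2 heading ("## ") at the very start of a line.
H2_LINE = re.compile(r"^## (.+?)[ \t]*$", re.MULTILINE)


def find_markers(output: str) -> list[str]:
    """Return the known markers found in agent output, in order of appearance."""
    return [m.group(1) for m in H2_LINE.finditer(output) if m.group(1) in KNOWN_MARKERS]
```

Matching is case-sensitive on purpose, so `## planning complete` or an indented heading is not treated as a marker.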
44
+ ## Key Handoff Contracts
45
+
46
+ ### Planner -> Executor (via PLAN.md)
47
+
48
+ | Field | Required | Description |
49
+ |-------|----------|-------------|
50
+ | Frontmatter | Yes | phase, plan, type, wave, depends_on, files_modified, autonomous, requirements |
51
+ | `<objective>` | Yes | What the plan achieves |
52
+ | `<tasks>` | Yes | Ordered task list with type, files, action, verify, acceptance_criteria |
53
+ | `<verification>` | Yes | Overall verification steps |
54
+ | `<success_criteria>` | Yes | Measurable completion criteria |
55
+
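A caller could check the Planner -> Executor contract before handing a plan to the executor. The sketch below assumes that the frontmatter is a leading block delimited by `---` lines with `key: value` entries, and that each section uses a matching closing tag; neither detail is stated in the table, and `check_plan` is a hypothetical helper.

```python
import re

REQUIRED_KEYS = ["phase", "plan", "type", "wave", "depends_on",
                 "files_modified", "autonomous", "requirements"]
REQUIRED_TAGS = ["objective", "tasks", "verification", "success_criteria"]


def check_plan(text: str) -> list[str]:
    """Return the contract fields missing from a PLAN.md (empty list means OK)."""
    # Assumption: frontmatter is the leading block between two '---' lines.
    fm = re.match(r"---\n(.*?)\n---\n", text, re.DOTALL)
    keys = set(re.findall(r"^([A-Za-z_][\w-]*):", fm.group(1), re.MULTILINE)) if fm else set()
    missing = [f"frontmatter.{k}" for k in REQUIRED_KEYS if k not in keys]
    for tag in REQUIRED_TAGS:
        # Assumption: sections are written as <tag>...</tag>.
        if not re.search(rf"<{tag}>.*?</{tag}>", text, re.DOTALL):
            missing.append(f"<{tag}>")
    return missing
```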
56
+ ### Executor -> Verifier (via SUMMARY.md)
57
+
58
+ | Field | Required | Description |
59
+ |-------|----------|-------------|
60
+ | Frontmatter | Yes | phase, plan, subsystem, tags, key-files, metrics |
61
+ | Commits table | Yes | Per-task commit hashes and descriptions |
62
+ | Deviations section | Yes | Auto-fixed issues or "None" |
63
+ | Self-Check | Yes | PASSED or FAILED with details |
64
+
65
+ ## Workflow Regex Patterns
66
+
67
+ Workflows match these markers to detect agent completion:
68
+
69
+ **plan-phase.md matches:**
70
+ - `## RESEARCH COMPLETE` / `## RESEARCH BLOCKED` (researcher output)
71
+ - `## PLANNING COMPLETE` (planner output)
72
+ - `## CHECKPOINT REACHED` (planner/executor pause)
73
+ - `## VERIFICATION PASSED` / `## ISSUES FOUND` (plan-checker output)
74
+
75
+ **execute-phase.md matches:**
76
+ - `## PHASE COMPLETE` (all plans in phase done)
77
+ - `## Self-Check: FAILED` (summary self-check)
78
+
79
+ > **NOTE:** `## PLAN COMPLETE` is the sdd-executor's completion marker but execute-phase.md does not regex-match it. Instead, it detects executor completion via spot-checks (SUMMARY.md existence, git commit state). This is intentional behavior, not a mismatch.
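
The spot-check described in the note can be approximated as follows. This is only a sketch: it covers the SUMMARY.md existence check and the `## Self-Check: FAILED` match, leaves out the git commit state check, and `executor_status` and its return values are hypothetical.

```python
import re
from pathlib import Path

SELF_CHECK_FAILED = re.compile(r"^## Self-Check: FAILED", re.MULTILINE)


def executor_status(summary_path: Path) -> str:
    """Classify a plan as 'missing', 'failed', or 'done' from its SUMMARY.md."""
    if not summary_path.is_file():
        return "missing"  # the executor has not written its summary
    text = summary_path.read_text(encoding="utf-8")
    if SELF_CHECK_FAILED.search(text):
        return "failed"   # summary exists but the self-check reported a failure
    return "done"
```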
@@ -0,0 +1,156 @@
1
+ # AI Evaluation Reference
2
+
3
+ > Reference used by `sdd-eval-planner` and `sdd-eval-auditor`.
4
+ > Based on the "AI Evals for Everyone" course (Reganti & Badam) and industry practice.
5
+
6
+ ---
7
+
8
+ ## Core Concepts
9
+
10
+ ### Why Evals Exist
11
+ AI systems are non-deterministic. Input X does not reliably produce output Y across runs, users, or edge cases. Evals are the continuous process of assessing whether your system's behavior meets expectations under real-world conditions — unit tests and integration tests alone are insufficient.
12
+
13
+ ### Model vs. Product Evaluation
14
+ - **Model evals** (MMLU, HumanEval, GSM8K) — measure general capability in standardized conditions. Use as initial filter only.
15
+ - **Product evals** — measure behavior inside your specific system, with your data, your users, your domain rules. This is where 80% of eval effort belongs.
16
+
17
+ ### The Three Components of Every Eval
18
+ - **Input** — everything affecting the system: query, history, retrieved docs, system prompt, config
19
+ - **Expected** — what good behavior looks like, defined through rubrics
20
+ - **Actual** — what the system produced, including intermediate steps, tool calls, and reasoning traces
21
+
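One way to keep the three components together is a single record per eval case. The class and field names below are illustrative assumptions derived from the bullets above, not a schema defined by this reference.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class EvalInput:
    """Everything that affects the system for one case."""
    query: str
    history: list[str] = field(default_factory=list)
    retrieved_docs: list[str] = field(default_factory=list)
    system_prompt: str = ""
    config: dict[str, Any] = field(default_factory=dict)


@dataclass
class EvalActual:
    """What the system produced, including intermediate steps."""
    output: str
    tool_calls: list[dict[str, Any]] = field(default_factory=list)
    reasoning_trace: list[str] = field(default_factory=list)


@dataclass
class EvalCase:
    input: EvalInput
    expected_rubric: str                  # what good behavior looks like
    actual: Optional[EvalActual] = None   # filled in after the system runs
```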
22
+ ### Three Measurement Approaches
23
+ 1. **Code-based metrics** — deterministic checks: JSON validation, required disclaimers, performance thresholds, classification flags. Fast, cheap, reliable. Use first.
24
+ 2. **LLM judges** — one model evaluates another against a rubric. Powerful for subjective qualities (tone, reasoning, escalation). Requires calibration against human judgment before trusting.
25
+ 3. **Human evaluation** — gold standard for nuanced judgment. Doesn't scale. Use for calibration, edge cases, periodic sampling, and high-stakes decisions.
26
+
27
+ Most effective systems combine all three.
28
+
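As an example of the first approach, a code-based metric can be a plain function that returns pass/fail flags. The sketch assumes the response is a JSON object whose text lives in an `answer` field; that field name and `check_output` are hypothetical.

```python
import json


def check_output(raw: str, required_fields: list[str], disclaimer: str) -> dict[str, bool]:
    """Run deterministic checks on one model response."""
    results = {"valid_json_object": False, "has_required_fields": False, "has_disclaimer": False}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return results
    if not isinstance(data, dict):
        return results
    results["valid_json_object"] = True
    results["has_required_fields"] = all(f in data for f in required_fields)
    # Assumption: the user-visible text is stored under "answer".
    answer = str(data.get("answer", ""))
    results["has_disclaimer"] = disclaimer.lower() in answer.lower()
    return results
```

Checks like these need no model call, which is why they are cheap enough to run on every response.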
29
+ ---
30
+
31
+ ## Evaluation Dimensions
32
+
33
+ ### Pre-Deployment (Development Phase)
34
+
35
+ | Dimension | What It Measures | When It Matters |
36
+ |-----------|-----------------|-----------------|
37
+ | **Factual accuracy** | Correctness of claims against ground truth | RAG, knowledge bases, any factual assertions |
38
+ | **Context faithfulness** | Response grounded in provided context vs. fabricated | RAG pipelines, document Q&A, retrieval-augmented systems |
39
+ | **Hallucination detection** | Plausible but unsupported claims | All generative systems, high-stakes domains |
40
+ | **Escalation accuracy** | Correct identification of when human intervention is needed | Customer service, healthcare, financial advisory |
41
+ | **Policy compliance** | Adherence to business rules, legal requirements, disclaimers | Regulated industries, enterprise deployments |
42
+ | **Tone/style appropriateness** | Match with brand voice, audience expectations, emotional context | Customer-facing systems, content generation |
43
+ | **Output structure validity** | Schema compliance, required fields, format correctness | Structured extraction, API integrations, data pipelines |
44
+ | **Task completion** | Whether the system accomplished the stated goal | Agentic workflows, multi-step tasks |
45
+ | **Tool use correctness** | Correct selection and invocation of tools | Agent systems with tool calls |
46
+ | **Safety** | Absence of harmful, biased, or inappropriate outputs | All user-facing systems |
47
+
48
+ ### Production Monitoring
49
+
50
+ | Dimension | Monitoring Approach |
51
+ |-----------|---------------------|
52
+ | **Safety violations** | Online guardrail — real-time, immediate intervention |
53
+ | **Compliance failures** | Online guardrail — block or escalate before user sees output |
54
+ | **Quality degradation trends** | Offline flywheel — batch analysis of sampled interactions |
55
+ | **Emerging failure modes** | Signal-metric divergence — when user behavior signals diverge from metric scores, investigate manually |
56
+ | **Cost/latency drift** | Code-based metrics — automated threshold alerts |
57
+
58
+ ---
59
+
60
+ ## The Guardrail vs. Flywheel Decision
61
+
62
+ Ask: "If this behavior goes wrong, would it be catastrophic for my business?"
63
+
64
+ - **Yes → Guardrail** — run online, real-time, with immediate intervention (block, escalate, hand off). Be selective: guardrails add latency.
65
+ - **No → Flywheel** — run offline as batch analysis feeding system refinements over time.
66
+
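The decision can be sketched as a dispatcher that runs catastrophic checks inline and defers everything else. `Check`, `handle`, and the in-memory `offline_queue` are hypothetical stand-ins for whatever guardrail and batch infrastructure a project actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    name: str
    catastrophic: bool              # the answer to the question above
    passes: Callable[[str], bool]


def handle(output: str, checks: list[Check], offline_queue: list[tuple[str, str]]) -> str:
    """Apply guardrails online; queue flywheel checks for later batch analysis."""
    # Guardrails first: block before the user sees the output.
    for check in checks:
        if check.catastrophic and not check.passes(output):
            return f"blocked:{check.name}"
    # Flywheel: record the work and evaluate it offline.
    for check in checks:
        if not check.catastrophic:
            offline_queue.append((check.name, output))
    return "delivered"
```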
67
+ ---
68
+
69
+ ## Rubric Design
70
+
71
+ Generic metrics are meaningless without context. "Helpfulness" in real estate means summarizing listings clearly. In healthcare it means knowing when *not* to answer.
72
+
73
+ A rubric must define:
74
+ 1. The dimension being measured
75
+ 2. What a score of 1, 3, and 5 means on a 5-point scale (or the pass/fail criteria)
76
+ 3. Domain-specific examples of acceptable vs. unacceptable behavior
77
+
78
+ Without rubrics, LLM judges produce noise rather than signal.
79
+
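The three requirements can be captured in a small structure that reports what is still missing. This is one possible shape for the 5-point case only; `Rubric` and `problems` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Rubric:
    dimension: str
    anchors: dict[int, str]  # score -> description of that score
    acceptable_examples: list[str] = field(default_factory=list)
    unacceptable_examples: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """List what is missing before the rubric is usable."""
        issues = []
        if not self.dimension.strip():
            issues.append("missing dimension")
        for score in (1, 3, 5):
            if not self.anchors.get(score):
                issues.append(f"missing anchor for score {score}")
        if not self.acceptable_examples or not self.unacceptable_examples:
            issues.append("needs both acceptable and unacceptable examples")
        return issues
```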
80
+ ---
81
+
82
+ ## Reference Dataset Guidelines
83
+
84
+ - Start with **10-20 high-quality examples** — not 200 mediocre ones
85
+ - Cover: critical success scenarios, common user workflows, known edge cases, historical failure modes
86
+ - Have domain experts label the examples (not just engineers)
87
+ - Expand based on what you learn in production — don't build for hypothetical coverage
88
+
89
+ ---
90
+
91
+ ## Eval Tooling Guide
92
+
93
+ | Tool | Type | Best For | Key Strength |
94
+ |------|------|----------|-------------|
95
+ | **RAGAS** | Python library | RAG evaluation | Purpose-built metrics: faithfulness, answer relevance, context precision/recall |
96
+ | **Langfuse** | Platform (open-source, self-hostable) | All system types | Strong tracing, prompt management, good for teams wanting infrastructure control |
97
+ | **LangSmith** | Platform (commercial) | LangChain/LangGraph ecosystems | Tightest integration with LangChain; best if already in that ecosystem |
98
+ | **Arize Phoenix** | Platform (open-source + hosted) | RAG + multi-agent tracing | Strong RAG eval + trace visualization; open-source with hosted option |
99
+ | **Braintrust** | Platform (commercial) | Model-agnostic evaluation | Dataset and experiment management; good for comparing across frameworks |
100
+ | **Promptfoo** | CLI tool (open-source) | Prompt testing, CI/CD | CLI-first, excellent for CI/CD prompt regression testing |
101
+
102
+ ### Tool Selection by System Type
103
+
104
+ | System Type | Recommended Tooling |
105
+ |-------------|---------------------|
106
+ | RAG / Knowledge Q&A | RAGAS + Arize Phoenix or Braintrust |
107
+ | Multi-agent systems | Langfuse + Arize Phoenix |
108
+ | Conversational / single-model | Promptfoo + Braintrust |
109
+ | Structured extraction | Promptfoo + code-based validators |
110
+ | LangChain/LangGraph projects | LangSmith (native integration) |
111
+ | Production monitoring (all types) | Langfuse, Arize Phoenix, or LangSmith |
112
+
113
+ ---
114
+
115
+ ## Evals in the Development Lifecycle
116
+
117
+ ### Plan Phase (Evaluation-Aware Design)
118
+ Before writing code, define:
119
+ 1. What type of AI system is being built → determines framework and dominant eval concerns
120
+ 2. Critical failure modes (3-5 behaviors that cannot go wrong)
121
+ 3. Rubrics — explicit definitions of acceptable/unacceptable behavior per dimension
122
+ 4. Evaluation strategy — which dimensions use code metrics, LLM judges, or human review
123
+ 5. Reference dataset requirements — size, composition, labeling approach
124
+ 6. Eval tooling selection
125
+
126
+ Output: EVALS-SPEC section of AI-SPEC.md
127
+
128
+ ### Execute Phase (Instrument While Building)
129
+ - Add tracing from day one (Langfuse, Arize Phoenix, or LangSmith)
130
+ - Build reference dataset concurrently with implementation
131
+ - Implement code-based checks first; add LLM judges only for subjective dimensions
132
+ - Run evals in CI/CD via Promptfoo or Braintrust
133
+
134
+ ### Verify Phase (Pre-Deployment Validation)
135
+ - Run full reference dataset against all metrics
136
+ - Conduct human review of edge cases and LLM judge disagreements
137
+ - Calibrate LLM judges against human scores (target ≥ 0.7 correlation before trusting)
138
+ - Define and configure production guardrails
139
+ - Establish monitoring baseline
140
+
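A minimal calibration check, assuming the correlation in question is Pearson's r over paired scores for the same examples. The text above does not name the statistic, so a rank correlation may be the better choice for ordinal rubric scores.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(vx * vy)


def judge_is_calibrated(judge: list[float], human: list[float], threshold: float = 0.7) -> bool:
    """True when judge scores track human scores at or above the threshold."""
    return pearson(judge, human) >= threshold
```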
141
+ ### Monitor Phase (Production Evaluation Loop)
142
+ - Smart sampling — weight toward interactions with concerning signals (retries, unusual length, explicit escalations)
143
+ - Online guardrails on every interaction
144
+ - Offline flywheel on sampled batch
145
+ - Watch for signal-metric divergence — the early warning system for evaluation gaps
146
+
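Smart sampling can be sketched as weighted sampling without replacement. The weights and the field names (`retries`, `unusual_length`, `escalated`) are assumptions for illustration; the selection uses the Efraimidis-Spirakis key trick, where each item gets a key `u ** (1 / weight)` and the top `k` keys are kept.

```python
import random
from typing import Optional


def weight(interaction: dict) -> float:
    """Higher weight for interactions with concerning signals (values are illustrative)."""
    w = 1.0
    if interaction.get("retries", 0) > 0:
        w += 2.0
    if interaction.get("unusual_length"):
        w += 1.0
    if interaction.get("escalated"):
        w += 3.0
    return w


def smart_sample(interactions: list[dict], k: int, seed: Optional[int] = None) -> list[dict]:
    """Pick up to k distinct interactions, favoring higher-weight ones."""
    rng = random.Random(seed)
    keyed = [(rng.random() ** (1.0 / weight(i)), i) for i in interactions]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in keyed[:k]]
```

Every interaction keeps a nonzero chance of selection, so quiet failures are still sampled occasionally.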
147
+ ---
148
+
149
+ ## Common Pitfalls
150
+
151
+ 1. **Assuming benchmarks predict product success** — they don't; model evals are a filter, not a verdict
152
+ 2. **Engineering evals in isolation** — domain experts must co-define rubrics; engineers alone miss critical nuances
153
+ 3. **Building comprehensive coverage on day one** — start small (10-20 examples), expand from real failure modes
154
+ 4. **Trusting uncalibrated LLM judges** — validate against human judgment before relying on them
155
+ 5. **Measuring everything** — only track metrics that drive decisions; "collect it all" produces noise
156
+ 6. **Treating evaluation as one-time setup** — user behavior evolves, requirements change, failure modes emerge; evaluation is continuous