gsd-opencode 1.33.3 → 1.35.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (118)
  1. package/agents/gsd-advisor-researcher.md +23 -0
  2. package/agents/gsd-ai-researcher.md +142 -0
  3. package/agents/gsd-code-fixer.md +523 -0
  4. package/agents/gsd-code-reviewer.md +361 -0
  5. package/agents/gsd-debugger.md +14 -1
  6. package/agents/gsd-domain-researcher.md +162 -0
  7. package/agents/gsd-eval-auditor.md +170 -0
  8. package/agents/gsd-eval-planner.md +161 -0
  9. package/agents/gsd-executor.md +70 -7
  10. package/agents/gsd-framework-selector.md +167 -0
  11. package/agents/gsd-intel-updater.md +320 -0
  12. package/agents/gsd-phase-researcher.md +26 -0
  13. package/agents/gsd-plan-checker.md +12 -0
  14. package/agents/gsd-planner.md +16 -6
  15. package/agents/gsd-project-researcher.md +23 -0
  16. package/agents/gsd-ui-researcher.md +23 -0
  17. package/agents/gsd-verifier.md +55 -1
  18. package/commands/gsd/gsd-ai-integration-phase.md +36 -0
  19. package/commands/gsd/gsd-audit-fix.md +33 -0
  20. package/commands/gsd/gsd-autonomous.md +1 -0
  21. package/commands/gsd/gsd-code-review-fix.md +52 -0
  22. package/commands/gsd/gsd-code-review.md +55 -0
  23. package/commands/gsd/gsd-eval-review.md +32 -0
  24. package/commands/gsd/gsd-explore.md +27 -0
  25. package/commands/gsd/gsd-from-gsd2.md +45 -0
  26. package/commands/gsd/gsd-import.md +36 -0
  27. package/commands/gsd/gsd-intel.md +183 -0
  28. package/commands/gsd/gsd-next.md +2 -0
  29. package/commands/gsd/gsd-reapply-patches.md +58 -3
  30. package/commands/gsd/gsd-review.md +4 -2
  31. package/commands/gsd/gsd-scan.md +26 -0
  32. package/commands/gsd/gsd-undo.md +34 -0
  33. package/commands/gsd/gsd-workstreams.md +6 -6
  34. package/get-shit-done/bin/gsd-tools.cjs +143 -5
  35. package/get-shit-done/bin/lib/commands.cjs +10 -2
  36. package/get-shit-done/bin/lib/config.cjs +71 -37
  37. package/get-shit-done/bin/lib/core.cjs +70 -8
  38. package/get-shit-done/bin/lib/gsd2-import.cjs +511 -0
  39. package/get-shit-done/bin/lib/init.cjs +20 -6
  40. package/get-shit-done/bin/lib/intel.cjs +660 -0
  41. package/get-shit-done/bin/lib/learnings.cjs +378 -0
  42. package/get-shit-done/bin/lib/milestone.cjs +25 -15
  43. package/get-shit-done/bin/lib/model-profiles.cjs +17 -17
  44. package/get-shit-done/bin/lib/phase.cjs +148 -112
  45. package/get-shit-done/bin/lib/roadmap.cjs +12 -5
  46. package/get-shit-done/bin/lib/security.cjs +119 -0
  47. package/get-shit-done/bin/lib/state.cjs +283 -221
  48. package/get-shit-done/bin/lib/template.cjs +8 -4
  49. package/get-shit-done/bin/lib/verify.cjs +42 -5
  50. package/get-shit-done/references/ai-evals.md +156 -0
  51. package/get-shit-done/references/ai-frameworks.md +186 -0
  52. package/get-shit-done/references/common-bug-patterns.md +114 -0
  53. package/get-shit-done/references/few-shot-examples/plan-checker.md +73 -0
  54. package/get-shit-done/references/few-shot-examples/verifier.md +109 -0
  55. package/get-shit-done/references/gates.md +70 -0
  56. package/get-shit-done/references/ios-scaffold.md +123 -0
  57. package/get-shit-done/references/model-profile-resolution.md +6 -7
  58. package/get-shit-done/references/model-profiles.md +20 -14
  59. package/get-shit-done/references/planning-config.md +237 -0
  60. package/get-shit-done/references/thinking-models-debug.md +44 -0
  61. package/get-shit-done/references/thinking-models-execution.md +50 -0
  62. package/get-shit-done/references/thinking-models-planning.md +62 -0
  63. package/get-shit-done/references/thinking-models-research.md +50 -0
  64. package/get-shit-done/references/thinking-models-verification.md +55 -0
  65. package/get-shit-done/references/thinking-partner.md +96 -0
  66. package/get-shit-done/references/universal-anti-patterns.md +6 -1
  67. package/get-shit-done/references/verification-overrides.md +227 -0
  68. package/get-shit-done/templates/AI-SPEC.md +246 -0
  69. package/get-shit-done/workflows/add-tests.md +3 -0
  70. package/get-shit-done/workflows/add-todo.md +2 -0
  71. package/get-shit-done/workflows/ai-integration-phase.md +284 -0
  72. package/get-shit-done/workflows/audit-fix.md +154 -0
  73. package/get-shit-done/workflows/autonomous.md +33 -2
  74. package/get-shit-done/workflows/check-todos.md +2 -0
  75. package/get-shit-done/workflows/cleanup.md +2 -0
  76. package/get-shit-done/workflows/code-review-fix.md +497 -0
  77. package/get-shit-done/workflows/code-review.md +515 -0
  78. package/get-shit-done/workflows/complete-milestone.md +40 -15
  79. package/get-shit-done/workflows/diagnose-issues.md +1 -1
  80. package/get-shit-done/workflows/discovery-phase.md +3 -1
  81. package/get-shit-done/workflows/discuss-phase-assumptions.md +1 -1
  82. package/get-shit-done/workflows/discuss-phase.md +21 -7
  83. package/get-shit-done/workflows/do.md +2 -0
  84. package/get-shit-done/workflows/docs-update.md +2 -0
  85. package/get-shit-done/workflows/eval-review.md +155 -0
  86. package/get-shit-done/workflows/execute-phase.md +307 -57
  87. package/get-shit-done/workflows/execute-plan.md +64 -93
  88. package/get-shit-done/workflows/explore.md +136 -0
  89. package/get-shit-done/workflows/help.md +1 -1
  90. package/get-shit-done/workflows/import.md +273 -0
  91. package/get-shit-done/workflows/inbox.md +387 -0
  92. package/get-shit-done/workflows/manager.md +4 -10
  93. package/get-shit-done/workflows/new-milestone.md +3 -1
  94. package/get-shit-done/workflows/new-project.md +2 -0
  95. package/get-shit-done/workflows/new-workspace.md +2 -0
  96. package/get-shit-done/workflows/next.md +56 -0
  97. package/get-shit-done/workflows/note.md +2 -0
  98. package/get-shit-done/workflows/plan-phase.md +97 -17
  99. package/get-shit-done/workflows/plant-seed.md +3 -0
  100. package/get-shit-done/workflows/pr-branch.md +41 -13
  101. package/get-shit-done/workflows/profile-user.md +4 -2
  102. package/get-shit-done/workflows/quick.md +99 -4
  103. package/get-shit-done/workflows/remove-workspace.md +2 -0
  104. package/get-shit-done/workflows/review.md +53 -6
  105. package/get-shit-done/workflows/scan.md +98 -0
  106. package/get-shit-done/workflows/secure-phase.md +2 -0
  107. package/get-shit-done/workflows/settings.md +18 -3
  108. package/get-shit-done/workflows/ship.md +3 -0
  109. package/get-shit-done/workflows/ui-phase.md +10 -2
  110. package/get-shit-done/workflows/ui-review.md +2 -0
  111. package/get-shit-done/workflows/undo.md +314 -0
  112. package/get-shit-done/workflows/update.md +2 -0
  113. package/get-shit-done/workflows/validate-phase.md +2 -0
  114. package/get-shit-done/workflows/verify-phase.md +83 -0
  115. package/get-shit-done/workflows/verify-work.md +12 -1
  116. package/package.json +1 -1
  117. package/skills/gsd-code-review/SKILL.md +48 -0
  118. package/skills/gsd-code-review-fix/SKILL.md +44 -0
@@ -0,0 +1,246 @@
+ # AI-SPEC — Phase {N}: {phase_name}
+
+ > AI design contract generated by `/gsd-ai-integration-phase`. Consumed by `gsd-planner` and `gsd-eval-auditor`.
+ > Locks framework selection, implementation guidance, and evaluation strategy before planning begins.
+
+ ---
+
+ ## 1. System Classification
+
+ **System Type:** <!-- RAG | Multi-Agent | Conversational | Extraction | Autonomous Agent | Content Generation | Code Automation | Hybrid -->
+
+ **Description:**
+ <!-- One-paragraph description of what this AI system does, who uses it, and what "good" looks like -->
+
+ **Critical Failure Modes:**
+ <!-- The 3-5 behaviors that absolutely cannot go wrong in this system -->
+ 1.
+ 2.
+ 3.
+
+ ---
+
+ ## 1b. Domain Context
+
+ > Researched by `gsd-domain-researcher`. Grounds the evaluation strategy in domain expert knowledge.
+
+ **Industry Vertical:** <!-- healthcare | legal | finance | customer service | education | developer tooling | e-commerce | etc. -->
+
+ **User Population:** <!-- who uses this system and in what context -->
+
+ **Stakes Level:** <!-- Low | Medium | High | Critical -->
+
+ **Output Consequence:** <!-- what happens downstream when the AI output is acted on -->
+
+ ### What Domain Experts Evaluate Against
+
+ <!-- Domain-specific rubric ingredients — in practitioner language, not AI jargon -->
+ <!-- Format: Dimension / Good (expert accepts) / Bad (expert flags) / Stakes / Source -->
+
+ ### Known Failure Modes in This Domain
+
+ <!-- Domain-specific failure modes from research — not generic hallucination, but how it manifests here -->
+
+ ### Regulatory / Compliance Context
+
+ <!-- Relevant regulations or constraints — or "None identified" if genuinely none apply -->
+
+ ### Domain Expert Roles for Evaluation
+
+ | Role | Responsibility |
+ |------|---------------|
+ | <!-- e.g., Senior practitioner --> | <!-- Dataset labeling / rubric calibration / production sampling --> |
+
+ ---
+
+ ## 2. Framework Decision
+
+ **Selected Framework:** <!-- e.g., LlamaIndex v0.10.x -->
+
+ **Version:** <!-- Pin the version -->
+
+ **Rationale:**
+ <!-- Why this framework fits this system type, team context, and production requirements -->
+
+ **Alternatives Considered:**
+
+ | Framework | Ruled Out Because |
+ |-----------|------------------|
+ | | |
+
+ **Vendor Lock-In Accepted:** <!-- Yes / No / Partial — document the trade-off consciously -->
+
+ ---
+
+ ## 3. Framework Quick Reference
+
+ > Fetched from official docs by `gsd-ai-researcher`. Distilled for this specific use case.
+
+ ### Installation
+ ```bash
+ # Install command(s)
+ ```
+
+ ### Core Imports
+ ```python
+ # Key imports for this use case
+ ```
+
+ ### Entry Point Pattern
+ ```python
+ # Minimal working example for this system type
+ ```
+
+ ### Key Abstractions
+ <!-- Framework-specific concepts the developer must understand before coding -->
+ | Concept | What It Is | When You Use It |
+ |---------|-----------|-----------------|
+ | | | |
+
+ ### Common Pitfalls
+ <!-- Gotchas specific to this framework and system type — from docs, issues, and community reports -->
+ 1.
+ 2.
+ 3.
+
+ ### Recommended Project Structure
+ ```
+ project/
+ ├── # Framework-specific folder layout
+ ```
+
+ ---
+
+ ## 4. Implementation Guidance
+
+ **Model Configuration:**
+ <!-- Which model(s), temperature, max tokens, and other key parameters -->
+
+ **Core Pattern:**
+ <!-- The primary implementation pattern for this system type in this framework -->
+
+ **Tool Use:**
+ <!-- Tools/integrations needed and how to configure them -->
+
+ **State Management:**
+ <!-- How state is persisted, retrieved, and updated -->
+
+ **Context Window Strategy:**
+ <!-- How to manage context limits for this system type -->
+
+ ---
+
+ ## 4b. AI Systems Best Practices
+
+ > Written by `gsd-ai-researcher`. Cross-cutting patterns every developer building AI systems needs — independent of framework choice.
+
+ ### Structured Outputs with Pydantic
+
+ <!-- Framework-specific Pydantic integration pattern for this use case -->
+ <!-- Include: output model definition, how the framework uses it, retry logic on validation failure -->
+
+ ```python
+ # Pydantic output model for this system type
+ ```
+
+ ### Async-First Design
+
+ <!-- How async is handled in this framework, the one common mistake, and when to stream vs. await -->
+
+ ### Prompt Engineering Discipline
+
+ <!-- System vs. user prompt separation, few-shot guidance, token budget strategy -->
+
+ ### Context Window Management
+
+ <!-- Strategy specific to this system type: RAG chunking / conversation summarisation / agent compaction -->
+
+ ### Cost and Latency Budget
+
+ <!-- Per-call cost estimate, caching strategy, sub-task model routing -->
+
+ ---
+
+ ## 5. Evaluation Strategy
+
+ ### Dimensions
+
+ | Dimension | Rubric (Pass/Fail or 1-5) | Measurement Approach | Priority |
+ |-----------|--------------------------|---------------------|----------|
+ | | | Code / LLM Judge / Human | Critical / High / Medium |
+
+ ### Eval Tooling
+
+ **Primary Tool:** <!-- e.g., RAGAS + Langfuse -->
+
+ **Setup:**
+ ```bash
+ # Install and configure
+ ```
+
+ **CI/CD Integration:**
+ ```bash
+ # Command to run evals in CI/CD pipeline
+ ```
+
+ ### Reference Dataset
+
+ **Size:** <!-- e.g., 20 examples to start -->
+
+ **Composition:**
+ <!-- What scenario types the dataset covers: critical paths, edge cases, failure modes -->
+
+ **Labeling:**
+ <!-- Who labels examples and how (domain expert, LLM judge with calibration, etc.) -->
+
+ ---
+
+ ## 6. Guardrails
+
+ ### Online (Real-Time)
+
+ | Guardrail | Trigger | Intervention |
+ |-----------|---------|--------------|
+ | | | Block / Escalate / Flag |
+
+ ### Offline (Flywheel)
+
+ | Metric | Sampling Strategy | Action on Degradation |
+ |--------|------------------|----------------------|
+ | | | |
+
+ ---
+
+ ## 7. Production Monitoring
+
+ **Tracing Tool:** <!-- e.g., Langfuse self-hosted -->
+
+ **Key Metrics to Track:**
+ <!-- 3-5 metrics that will be monitored in production -->
+
+ **Alert Thresholds:**
+ <!-- When to page/alert -->
+
+ **Smart Sampling Strategy:**
+ <!-- How to select interactions for human review — signal-based filters -->
+
+ ---
+
+ ## Checklist
+
+ - [ ] System type classified
+ - [ ] Critical failure modes identified (≥ 3)
+ - [ ] Domain context researched (Section 1b: vertical, stakes, expert criteria, failure modes)
+ - [ ] Regulatory/compliance context identified or explicitly noted as none
+ - [ ] Domain expert roles defined for evaluation involvement
+ - [ ] Framework selected with rationale documented
+ - [ ] Alternatives considered and ruled out
+ - [ ] Framework quick reference written (install, imports, pattern, pitfalls)
+ - [ ] AI systems best practices written (Section 4b: Pydantic, async, prompt discipline, context)
+ - [ ] Evaluation dimensions grounded in domain rubric ingredients
+ - [ ] Each eval dimension has a concrete rubric (Good/Bad in domain language)
+ - [ ] Eval tooling selected — Arize Phoenix default confirmed or override noted
+ - [ ] Reference dataset spec written (size ≥ 10, composition + labeling defined)
+ - [ ] CI/CD eval integration specified
+ - [ ] Online guardrails defined
+ - [ ] Production monitoring configured (tracing tool + sampling strategy)
@@ -108,6 +108,9 @@ read each file to verify classification. Don't classify based on filename alone.
  <step name="present_classification">
  Present the classification to the user for confirmation before proceeding:

+
+ **Text mode (`workflow.text_mode: true` in config or `--text` flag):** Set `TEXT_MODE=true` if `--text` is present in `$ARGUMENTS` OR `text_mode` from init JSON is `true`. When TEXT_MODE is active, replace every `question` call with a plain-text numbered list and ask the user to type their choice number. This is required for non-OpenCode runtimes (OpenAI Codex, Gemini CLI, etc.) where `question` is not available.
+
  ```
  question(
  header: "Test Classification",
@@ -70,6 +70,8 @@ If potential duplicate found:
  1. read the existing todo
  2. Compare scope

+
+ **Text mode (`workflow.text_mode: true` in config or `--text` flag):** Set `TEXT_MODE=true` if `--text` is present in `$ARGUMENTS` OR `text_mode` from init JSON is `true`. When TEXT_MODE is active, replace every `question` call with a plain-text numbered list and ask the user to type their choice number. This is required for non-OpenCode runtimes (OpenAI Codex, Gemini CLI, etc.) where `question` is not available.
  If overlapping, use question:
  - header: "Duplicate?"
  - question: "Similar todo exists: [title]. What would you like to do?"
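Both hunks above add the same TEXT_MODE paragraph. A minimal shell sketch of that toggle (the `$ARGUMENTS` and init-JSON field names come from the paragraph; the example invocation and the parsing itself are assumptions):

```shell
# Sketch: derive TEXT_MODE from a --text flag or the init JSON field (assumed parsing).
ARGUMENTS="--text 3"          # example invocation (assumed)
INIT_TEXT_MODE="false"        # text_mode field parsed from init JSON (assumed)

TEXT_MODE=false
if [[ " $ARGUMENTS " == *" --text "* || "$INIT_TEXT_MODE" == "true" ]]; then
  TEXT_MODE=true
fi
echo "TEXT_MODE=$TEXT_MODE"
```

Matching against the space-padded string avoids false positives such as a hypothetical `--text-mode` flag being mistaken for `--text`.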
@@ -0,0 +1,284 @@
+ <objective>
+ Generate an AI design contract (AI-SPEC.md) for phases that involve building AI systems. Orchestrates gsd-framework-selector → gsd-ai-researcher → gsd-domain-researcher → gsd-eval-planner with a validation gate. Inserts between discuss-phase and plan-phase in the GSD lifecycle.
+
+ AI-SPEC.md locks four things before the planner creates tasks:
+ 1. Framework selection (with rationale and alternatives)
+ 2. Implementation guidance (correct syntax, patterns, pitfalls from official docs)
+ 3. Domain context (practitioner rubric ingredients, failure modes, regulatory constraints)
+ 4. Evaluation strategy (dimensions, rubrics, tooling, reference dataset, guardrails)
+
+ This prevents the two most common AI development failures: choosing the wrong framework for the use case, and treating evaluation as an afterthought.
+ </objective>
+
+ <required_reading>
+ @$HOME/.config/opencode/get-shit-done/references/ai-frameworks.md
+ @$HOME/.config/opencode/get-shit-done/references/ai-evals.md
+ </required_reading>
+
+ <process>
+
+ ## 1. Initialize
+
+ ```bash
+ INIT=$(node "$HOME/.config/opencode/get-shit-done/bin/gsd-tools.cjs" init plan-phase "$PHASE")
+ if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
+ ```
+
+ Parse JSON for: `phase_dir`, `phase_number`, `phase_name`, `phase_slug`, `padded_phase`, `has_context`, `has_research`, `commit_docs`.
+
+ **File paths:** `state_path`, `roadmap_path`, `requirements_path`, `context_path`.
+
+ Resolve agent models:
+ ```bash
+ SELECTOR_MODEL=$(node "$HOME/.config/opencode/get-shit-done/bin/gsd-tools.cjs" resolve-model gsd-framework-selector --raw)
+ RESEARCHER_MODEL=$(node "$HOME/.config/opencode/get-shit-done/bin/gsd-tools.cjs" resolve-model gsd-ai-researcher --raw)
+ DOMAIN_MODEL=$(node "$HOME/.config/opencode/get-shit-done/bin/gsd-tools.cjs" resolve-model gsd-domain-researcher --raw)
+ PLANNER_MODEL=$(node "$HOME/.config/opencode/get-shit-done/bin/gsd-tools.cjs" resolve-model gsd-eval-planner --raw)
+ ```
+
+ Check config:
+ ```bash
+ AI_PHASE_ENABLED=$(node "$HOME/.config/opencode/get-shit-done/bin/gsd-tools.cjs" config-get workflow.ai_integration_phase 2>/dev/null || echo "true")
+ ```
+
+ **If `AI_PHASE_ENABLED` is `false`:**
+ ```
+ AI phase is disabled in config. Enable via /gsd-settings.
+ ```
+ Exit workflow.
+
+ **If `planning_exists` is false:** Error — run `/gsd-new-project` first.
+
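The `@file:` indirection in the Initialize step above can be unwrapped with a small guard. A sketch (the temp-file path and JSON payload are invented for illustration; the `@file:` prefix convention is taken from the command shown above):

```shell
# Sketch: gsd-tools may emit "@file:<path>" instead of inline JSON when the
# payload is large; unwrap it before parsing. Path and payload are examples.
printf '%s' '{"phase_dir":".planning/phases/01"}' > /tmp/gsd-init-demo.json
INIT='@file:/tmp/gsd-init-demo.json'

if [[ "$INIT" == @file:* ]]; then
  INIT=$(cat "${INIT#@file:}")   # strip the prefix, read the referenced file
fi
echo "$INIT"
```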
+ ## 2. Parse and Validate Phase
+
+ Extract phase number from $ARGUMENTS. If not provided, detect next unplanned phase.
+
+ ```bash
+ PHASE_INFO=$(node "$HOME/.config/opencode/get-shit-done/bin/gsd-tools.cjs" roadmap get-phase "${PHASE}")
+ ```
+
+ **If `found` is false:** Error with available phases.
+
+ ## 3. Check Prerequisites
+
+ **If `has_context` is false:**
+ ```
+ No CONTEXT.md found for Phase {N}.
+ Recommended: run /gsd-discuss-phase {N} first to capture framework preferences.
+ Continuing without user decisions — framework selector will ask all questions.
+ ```
+ Continue (non-blocking).
+
+ ## 4. Check Existing AI-SPEC
+
+ ```bash
+ AI_SPEC_FILE=$(ls "${PHASE_DIR}"/*-AI-SPEC.md 2>/dev/null | head -1)
+ ```
+
+
+ **Text mode (`workflow.text_mode: true` in config or `--text` flag):** Set `TEXT_MODE=true` if `--text` is present in `$ARGUMENTS` OR `text_mode` from init JSON is `true`. When TEXT_MODE is active, replace every `question` call with a plain-text numbered list and ask the user to type their choice number. This is required for non-OpenCode runtimes (OpenAI Codex, Gemini CLI, etc.) where `question` is not available.
+ **If exists:** Use question:
+ - header: "Existing AI-SPEC"
+ - question: "AI-SPEC.md already exists for Phase {N}. What would you like to do?"
+ - options:
+ - "Update — re-run with existing as baseline"
+ - "View — display current AI-SPEC and exit"
+ - "Skip — keep current AI-SPEC and exit"
+
+ If "View": display file contents, exit.
+ If "Skip": exit.
+ If "Update": continue to step 5.
+
+ ## 5. Spawn gsd-framework-selector
+
+ Display:
+ ```
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ GSD ► AI DESIGN CONTRACT — PHASE {N}: {name}
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ ◆ Step 1/4 — Framework Selection...
+ ```
+
+ Spawn `gsd-framework-selector` with:
+ ```markdown
+ read $HOME/.config/opencode/agents/gsd-framework-selector.md for instructions.
+
+ <objective>
+ Select the right AI framework for Phase {phase_number}: {phase_name}
+ Goal: {phase_goal}
+ </objective>
+
+ <files_to_read>
+ {context_path if exists}
+ {requirements_path if exists}
+ </files_to_read>
+
+ <phase_context>
+ Phase: {phase_number} — {phase_name}
+ Goal: {phase_goal}
+ </phase_context>
+ ```
+
+ Parse selector output for: `primary_framework`, `system_type`, `model_provider`, `eval_concerns`, `alternative_framework`.
+
+ **If selector fails or returns empty:** Exit with error — "Framework selection failed. Re-run /gsd-ai-integration-phase {N} or answer the framework question in /gsd-discuss-phase {N} first."
+
+ ## 6. Initialize AI-SPEC.md
+
+ Copy template:
+ ```bash
+ cp "$HOME/.config/opencode/get-shit-done/templates/AI-SPEC.md" "${PHASE_DIR}/${PADDED_PHASE}-AI-SPEC.md"
+ ```
+
+ Fill in header fields:
+ - Phase number and name
+ - System classification (from selector)
+ - Selected framework (from selector)
+ - Alternative considered (from selector)
+
+ ## 7. Spawn gsd-ai-researcher
+
+ Display:
+ ```
+ ◆ Step 2/4 — Researching {primary_framework} docs + AI systems best practices...
+ ```
+
+ Spawn `gsd-ai-researcher` with:
+ ```markdown
+ read $HOME/.config/opencode/agents/gsd-ai-researcher.md for instructions.
+
+ <objective>
+ Research {primary_framework} for Phase {phase_number}: {phase_name}
+ write Sections 3, 4, and 4b of AI-SPEC.md
+ </objective>
+
+ <files_to_read>
+ {ai_spec_path}
+ {context_path if exists}
+ </files_to_read>
+
+ <input>
+ framework: {primary_framework}
+ system_type: {system_type}
+ model_provider: {model_provider}
+ ai_spec_path: {ai_spec_path}
+ phase_context: Phase {phase_number}: {phase_name} — {phase_goal}
+ </input>
+ ```
+
+ ## 8. Spawn gsd-domain-researcher
+
+ Display:
+ ```
+ ◆ Step 3/4 — Researching domain context and expert evaluation criteria...
+ ```
+
+ Spawn `gsd-domain-researcher` with:
+ ```markdown
+ read $HOME/.config/opencode/agents/gsd-domain-researcher.md for instructions.
+
+ <objective>
+ Research the business domain and expert evaluation criteria for Phase {phase_number}: {phase_name}
+ write Section 1b (Domain Context) of AI-SPEC.md
+ </objective>
+
+ <files_to_read>
+ {ai_spec_path}
+ {context_path if exists}
+ {requirements_path if exists}
+ </files_to_read>
+
+ <input>
+ system_type: {system_type}
+ phase_name: {phase_name}
+ phase_goal: {phase_goal}
+ ai_spec_path: {ai_spec_path}
+ </input>
+ ```
+
+ ## 9. Spawn gsd-eval-planner
+
+ Display:
+ ```
+ ◆ Step 4/4 — Designing evaluation strategy from domain + technical context...
+ ```
+
+ Spawn `gsd-eval-planner` with:
+ ```markdown
+ read $HOME/.config/opencode/agents/gsd-eval-planner.md for instructions.
+
+ <objective>
+ Design evaluation strategy for Phase {phase_number}: {phase_name}
+ write Sections 5, 6, and 7 of AI-SPEC.md
+ AI-SPEC.md now contains domain context (Section 1b) — use it as your rubric starting point.
+ </objective>
+
+ <files_to_read>
+ {ai_spec_path}
+ {context_path if exists}
+ {requirements_path if exists}
+ </files_to_read>
+
+ <input>
+ system_type: {system_type}
+ framework: {primary_framework}
+ model_provider: {model_provider}
+ phase_name: {phase_name}
+ phase_goal: {phase_goal}
+ ai_spec_path: {ai_spec_path}
+ </input>
+ ```
+
+ ## 10. Validate AI-SPEC Completeness
+
+ read the completed AI-SPEC.md. Check that:
+ - Section 2 has a framework name (not placeholder)
+ - Section 1b has at least one domain rubric ingredient (Good/Bad/Stakes)
+ - Section 3 has a non-empty code block (entry point pattern)
+ - Section 4b has a Pydantic example
+ - Section 5 has at least one row in the dimensions table
+ - Section 6 has at least one guardrail or explicit "N/A for internal tool" note
+ - Checklist section at end has 3+ items checked
+
+ **If validation fails:** Display specific missing sections. Ask user if they want to re-run the specific step or continue anyway.
+
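The completeness checks above can be partially mechanized with grep. A sketch under assumptions: the spec path, its contents, and the two probes shown are illustrative, not the full check list:

```shell
# Sketch: two of the AI-SPEC completeness probes, grep-based (illustrative only).
SPEC="/tmp/ai-spec-demo.md"    # example path (assumed)
printf '%s\n' '## 2. Framework Decision' '**Selected Framework:** LangGraph' > "$SPEC"

MISSING=()
grep -q '\*\*Selected Framework:\*\* [A-Za-z]' "$SPEC" \
  || MISSING+=("Section 2: framework name is still a placeholder")
grep -q '### Structured Outputs with Pydantic' "$SPEC" \
  || MISSING+=("Section 4b: Pydantic example missing")

if [ "${#MISSING[@]}" -gt 0 ]; then
  printf 'Missing: %s\n' "${MISSING[@]}"
fi
```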
+ ## 11. Commit
+
+ **If `commit_docs` is true:**
+ ```bash
+ git add "${PHASE_DIR}/${PADDED_PHASE}-AI-SPEC.md"
+ git commit -m "docs({phase_slug}): generate AI-SPEC.md — {primary_framework} + domain context + eval strategy"
+ ```
+
+ ## 12. Display Completion
+
+ ```
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ GSD ► AI-SPEC COMPLETE — PHASE {N}: {name}
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ ◆ Framework: {primary_framework}
+ ◆ System Type: {system_type}
+ ◆ Domain: {domain_vertical from Section 1b}
+ ◆ Eval Dimensions: {eval_concerns}
+ ◆ Tracing Default: Arize Phoenix (or detected existing tool)
+ ◆ Output: {ai_spec_path}
+
+ Next step:
+ /gsd-plan-phase {N} — planner will consume AI-SPEC.md
+ ```
+
+ </process>
+
+ <success_criteria>
+ - [ ] Framework selected with rationale (Section 2)
+ - [ ] AI-SPEC.md created from template
+ - [ ] Framework docs + AI best practices researched (Sections 3, 4, 4b populated)
+ - [ ] Domain context + expert rubric ingredients researched (Section 1b populated)
+ - [ ] Eval strategy grounded in domain context (Sections 5-7 populated)
+ - [ ] Arize Phoenix (or detected tool) set as tracing default in Section 7
+ - [ ] AI-SPEC.md validated (Sections 1b, 2, 3, 4b, 5, 6 all non-empty)
+ - [ ] Committed if commit_docs enabled
+ - [ ] Next step surfaced to user
+ </success_criteria>
@@ -0,0 +1,154 @@
+ <objective>
+ Autonomous audit-to-fix pipeline. Runs an audit, parses findings, classifies each as
+ auto-fixable vs manual-only, spawns executor agents for fixable issues, runs tests
+ after each fix, and commits atomically with finding IDs for traceability.
+ </objective>
+
+ <available_agent_types>
+ - gsd-executor — executes a specific, scoped code change
+ </available_agent_types>
+
+ <process>
+
+ <step name="parse-arguments">
+ Extract flags from the user's invocation:
+
+ - `--max N` — maximum findings to fix (default: **5**)
+ - `--severity high|medium|all` — minimum severity to process (default: **medium**)
+ - `--dry-run` — classify findings without fixing (shows classification table only)
+ - `--source <audit>` — which audit to run (default: **audit-uat**)
+
+ Validate `--source` is a supported audit. Currently supported:
+ - `audit-uat`
+
+ If `--source` is not supported, stop with an error:
+ ```
+ Error: Unsupported audit source "{source}". Supported sources: audit-uat
+ ```
+ </step>
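The flag handling described in this step can be sketched as (flag names and defaults come from the list above; the example invocation and the loop itself are assumptions):

```shell
# Sketch: parse audit-fix flags with the documented defaults.
MAX=5; SEVERITY=medium; DRY_RUN=false; SOURCE=audit-uat

set -- --max 3 --dry-run       # example invocation (assumed)
while [ "$#" -gt 0 ]; do
  case "$1" in
    --max)      MAX="$2"; shift 2 ;;
    --severity) SEVERITY="$2"; shift 2 ;;
    --dry-run)  DRY_RUN=true; shift ;;
    --source)   SOURCE="$2"; shift 2 ;;
    *)          shift ;;       # ignore unrecognized tokens in this sketch
  esac
done
echo "max=$MAX severity=$SEVERITY dry_run=$DRY_RUN source=$SOURCE"
```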
+
+ <step name="run-audit">
+ Invoke the source audit command and capture output.
+
+ For `audit-uat` source:
+ ```bash
+ INIT=$(node "$HOME/.config/opencode/get-shit-done/bin/gsd-tools.cjs" init audit-uat 2>/dev/null || echo "{}")
+ if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
+ ```
+
+ read existing UAT and verification files to extract findings:
+ - glob: `.planning/phases/*/*-UAT.md`
+ - glob: `.planning/phases/*/*-VERIFICATION.md`
+
+ Parse each finding into a structured record:
+ - **ID** — sequential identifier (F-01, F-02, ...)
+ - **description** — concise summary of the issue
+ - **severity** — high, medium, or low
+ - **file_refs** — specific file paths referenced in the finding
+ </step>
+
+ <step name="classify-findings">
+ For each finding, classify as one of:
+
+ - **auto-fixable** — clear code change, specific file referenced, testable fix
+ - **manual-only** — requires design decisions, ambiguous scope, architectural changes, user input needed
+ - **skip** — severity below the `--severity` threshold
+
+ **Classification heuristics** (err on manual-only when uncertain):
+
+ Auto-fixable signals:
+ - References a specific file path + line number
+ - Describes a missing test or assertion
+ - Missing export, wrong import path, typo in identifier
+ - Clear single-file change with obvious expected behavior
+
+ Manual-only signals:
+ - Uses words like "consider", "evaluate", "design", "rethink"
+ - Requires new architecture or API changes
+ - Ambiguous scope or multiple valid approaches
+ - Requires user input or design decisions
+ - Cross-cutting concerns affecting multiple subsystems
+ - Performance or scalability issues without clear fix
+
+ **When uncertain, always classify as manual-only.**
+ </step>
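The `--severity` threshold behind the skip rule above can be compared with a small rank function (a sketch; the ordering high > medium > low is stated in the step, and `all` ranking lowest so that every finding passes is an assumption):

```shell
# Sketch: severity threshold check for the skip classification.
sev_rank() {
  case "$1" in high) echo 3 ;; medium) echo 2 ;; low) echo 1 ;; all) echo 0 ;; esac
}
meets_threshold() {  # usage: meets_threshold <finding_severity> <min_severity>
  [ "$(sev_rank "$1")" -ge "$(sev_rank "$2")" ]
}
meets_threshold low medium || echo "skip: below threshold"
meets_threshold high medium && echo "process"
```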
+
+ <step name="present-classification">
+ Display the classification table:
+
+ ```
+ ## Audit-Fix Classification
+
+ | # | Finding | Severity | Classification | Reason |
+ |---|---------|----------|---------------|--------|
+ | F-01 | Missing export in index.ts | high | auto-fixable | Specific file, clear fix |
+ | F-02 | No error handling in payment flow | high | manual-only | Requires design decisions |
+ | F-03 | Test stub with 0 assertions | medium | auto-fixable | Clear test gap |
+ ```
+
+ If `--dry-run` was specified, **stop here and exit**. The classification table is the
+ final output — do not proceed to fixing.
+ </step>
+
+ <step name="fix-loop">
+ For each **auto-fixable** finding (up to `--max`, ordered by severity desc):
+
+ **a. Spawn executor agent:**
+ ```
+ @gsd-executor "Fix finding {ID}: {description}. Files: {file_refs}. Make the minimal change to resolve this specific finding. Do not refactor surrounding code."
+ ```
+
+ **b. Run tests:**
+ ```bash
+ npm test 2>&1 | tail -20
+ ```
+
+ **c. If tests pass** — commit atomically:
+ ```bash
+ git add {changed_files}
+ git commit -m "fix({scope}): resolve {ID} — {description}"
+ ```
+ The commit message **must** include the finding ID (e.g., F-01) for traceability.
+
+ **d. If tests fail** — revert changes, mark finding as `fix-failed`, and **stop the pipeline**:
+ ```bash
+ git checkout -- {changed_files} 2>/dev/null
+ ```
+ Log the failure reason and stop processing — do not continue to the next finding.
+ A test failure indicates the codebase may be in an unexpected state, so the pipeline
+ must halt to avoid cascading issues. Remaining auto-fixable findings will appear in the
+ report as `not-attempted`.
+ </step>
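The fix loop's test gate and halt-on-failure behavior can be sketched as (a toy sketch: `fix_finding` and `run_tests` are stand-ins for the executor spawn and `npm test`, with `run_tests` hard-wired to fail so the halt path is visible):

```shell
# Sketch: test-gated fix loop that reverts and halts on the first failure.
fix_finding() { :; }           # stand-in for the executor agent step
run_tests()  { return 1; }     # stand-in for npm test; simulate a failing run

RESULT=""
for ID in F-01 F-03; do
  fix_finding "$ID"
  if run_tests; then
    RESULT="$RESULT committed($ID)"
  else
    RESULT="$RESULT reverted($ID) pipeline-halted"
    break                      # stop: do not attempt remaining findings
  fi
done
echo "${RESULT# }"
```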
+
+ <step name="report">
+ Present the final summary:
+
+ ```
+ ## Audit-Fix Complete
+
+ **Source:** {audit_command}
+ **Findings:** {total} total, {auto} auto-fixable, {manual} manual-only
+ **Fixed:** {fixed_count}/{auto} auto-fixable findings
+ **Failed:** {failed_count} (reverted)
+
+ | # | Finding | Status | Commit |
+ |---|---------|--------|--------|
+ | F-01 | Missing export | Fixed | abc1234 |
+ | F-03 | Test stub | Fix failed | (reverted) |
+
+ ### Manual-only findings (require developer attention):
+ - F-02: No error handling in payment flow — requires design decisions
+ ```
+ </step>
+
+ </process>
+
+ <success_criteria>
+ - Auto-fixable findings processed sequentially until --max reached or a test failure stops the pipeline
+ - Tests pass after each committed fix (no broken commits)
+ - Failed fixes are reverted cleanly (no partial changes left)
+ - Pipeline stops after the first test failure (no cascading fixes)
+ - Every commit message contains the finding ID
+ - Manual-only findings are surfaced for developer attention
+ - --dry-run produces a useful standalone classification table
+ </success_criteria>