gsd-opencode 1.33.3 → 1.35.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (118)
  1. package/agents/gsd-advisor-researcher.md +23 -0
  2. package/agents/gsd-ai-researcher.md +142 -0
  3. package/agents/gsd-code-fixer.md +523 -0
  4. package/agents/gsd-code-reviewer.md +361 -0
  5. package/agents/gsd-debugger.md +14 -1
  6. package/agents/gsd-domain-researcher.md +162 -0
  7. package/agents/gsd-eval-auditor.md +170 -0
  8. package/agents/gsd-eval-planner.md +161 -0
  9. package/agents/gsd-executor.md +70 -7
  10. package/agents/gsd-framework-selector.md +167 -0
  11. package/agents/gsd-intel-updater.md +320 -0
  12. package/agents/gsd-phase-researcher.md +26 -0
  13. package/agents/gsd-plan-checker.md +12 -0
  14. package/agents/gsd-planner.md +16 -6
  15. package/agents/gsd-project-researcher.md +23 -0
  16. package/agents/gsd-ui-researcher.md +23 -0
  17. package/agents/gsd-verifier.md +55 -1
  18. package/commands/gsd/gsd-ai-integration-phase.md +36 -0
  19. package/commands/gsd/gsd-audit-fix.md +33 -0
  20. package/commands/gsd/gsd-autonomous.md +1 -0
  21. package/commands/gsd/gsd-code-review-fix.md +52 -0
  22. package/commands/gsd/gsd-code-review.md +55 -0
  23. package/commands/gsd/gsd-eval-review.md +32 -0
  24. package/commands/gsd/gsd-explore.md +27 -0
  25. package/commands/gsd/gsd-from-gsd2.md +45 -0
  26. package/commands/gsd/gsd-import.md +36 -0
  27. package/commands/gsd/gsd-intel.md +183 -0
  28. package/commands/gsd/gsd-next.md +2 -0
  29. package/commands/gsd/gsd-reapply-patches.md +58 -3
  30. package/commands/gsd/gsd-review.md +4 -2
  31. package/commands/gsd/gsd-scan.md +26 -0
  32. package/commands/gsd/gsd-undo.md +34 -0
  33. package/commands/gsd/gsd-workstreams.md +6 -6
  34. package/get-shit-done/bin/gsd-tools.cjs +143 -5
  35. package/get-shit-done/bin/lib/commands.cjs +10 -2
  36. package/get-shit-done/bin/lib/config.cjs +71 -37
  37. package/get-shit-done/bin/lib/core.cjs +70 -8
  38. package/get-shit-done/bin/lib/gsd2-import.cjs +511 -0
  39. package/get-shit-done/bin/lib/init.cjs +20 -6
  40. package/get-shit-done/bin/lib/intel.cjs +660 -0
  41. package/get-shit-done/bin/lib/learnings.cjs +378 -0
  42. package/get-shit-done/bin/lib/milestone.cjs +25 -15
  43. package/get-shit-done/bin/lib/model-profiles.cjs +17 -17
  44. package/get-shit-done/bin/lib/phase.cjs +148 -112
  45. package/get-shit-done/bin/lib/roadmap.cjs +12 -5
  46. package/get-shit-done/bin/lib/security.cjs +119 -0
  47. package/get-shit-done/bin/lib/state.cjs +283 -221
  48. package/get-shit-done/bin/lib/template.cjs +8 -4
  49. package/get-shit-done/bin/lib/verify.cjs +42 -5
  50. package/get-shit-done/references/ai-evals.md +156 -0
  51. package/get-shit-done/references/ai-frameworks.md +186 -0
  52. package/get-shit-done/references/common-bug-patterns.md +114 -0
  53. package/get-shit-done/references/few-shot-examples/plan-checker.md +73 -0
  54. package/get-shit-done/references/few-shot-examples/verifier.md +109 -0
  55. package/get-shit-done/references/gates.md +70 -0
  56. package/get-shit-done/references/ios-scaffold.md +123 -0
  57. package/get-shit-done/references/model-profile-resolution.md +6 -7
  58. package/get-shit-done/references/model-profiles.md +20 -14
  59. package/get-shit-done/references/planning-config.md +237 -0
  60. package/get-shit-done/references/thinking-models-debug.md +44 -0
  61. package/get-shit-done/references/thinking-models-execution.md +50 -0
  62. package/get-shit-done/references/thinking-models-planning.md +62 -0
  63. package/get-shit-done/references/thinking-models-research.md +50 -0
  64. package/get-shit-done/references/thinking-models-verification.md +55 -0
  65. package/get-shit-done/references/thinking-partner.md +96 -0
  66. package/get-shit-done/references/universal-anti-patterns.md +6 -1
  67. package/get-shit-done/references/verification-overrides.md +227 -0
  68. package/get-shit-done/templates/AI-SPEC.md +246 -0
  69. package/get-shit-done/workflows/add-tests.md +3 -0
  70. package/get-shit-done/workflows/add-todo.md +2 -0
  71. package/get-shit-done/workflows/ai-integration-phase.md +284 -0
  72. package/get-shit-done/workflows/audit-fix.md +154 -0
  73. package/get-shit-done/workflows/autonomous.md +33 -2
  74. package/get-shit-done/workflows/check-todos.md +2 -0
  75. package/get-shit-done/workflows/cleanup.md +2 -0
  76. package/get-shit-done/workflows/code-review-fix.md +497 -0
  77. package/get-shit-done/workflows/code-review.md +515 -0
  78. package/get-shit-done/workflows/complete-milestone.md +40 -15
  79. package/get-shit-done/workflows/diagnose-issues.md +1 -1
  80. package/get-shit-done/workflows/discovery-phase.md +3 -1
  81. package/get-shit-done/workflows/discuss-phase-assumptions.md +1 -1
  82. package/get-shit-done/workflows/discuss-phase.md +21 -7
  83. package/get-shit-done/workflows/do.md +2 -0
  84. package/get-shit-done/workflows/docs-update.md +2 -0
  85. package/get-shit-done/workflows/eval-review.md +155 -0
  86. package/get-shit-done/workflows/execute-phase.md +307 -57
  87. package/get-shit-done/workflows/execute-plan.md +64 -93
  88. package/get-shit-done/workflows/explore.md +136 -0
  89. package/get-shit-done/workflows/help.md +1 -1
  90. package/get-shit-done/workflows/import.md +273 -0
  91. package/get-shit-done/workflows/inbox.md +387 -0
  92. package/get-shit-done/workflows/manager.md +4 -10
  93. package/get-shit-done/workflows/new-milestone.md +3 -1
  94. package/get-shit-done/workflows/new-project.md +2 -0
  95. package/get-shit-done/workflows/new-workspace.md +2 -0
  96. package/get-shit-done/workflows/next.md +56 -0
  97. package/get-shit-done/workflows/note.md +2 -0
  98. package/get-shit-done/workflows/plan-phase.md +97 -17
  99. package/get-shit-done/workflows/plant-seed.md +3 -0
  100. package/get-shit-done/workflows/pr-branch.md +41 -13
  101. package/get-shit-done/workflows/profile-user.md +4 -2
  102. package/get-shit-done/workflows/quick.md +99 -4
  103. package/get-shit-done/workflows/remove-workspace.md +2 -0
  104. package/get-shit-done/workflows/review.md +53 -6
  105. package/get-shit-done/workflows/scan.md +98 -0
  106. package/get-shit-done/workflows/secure-phase.md +2 -0
  107. package/get-shit-done/workflows/settings.md +18 -3
  108. package/get-shit-done/workflows/ship.md +3 -0
  109. package/get-shit-done/workflows/ui-phase.md +10 -2
  110. package/get-shit-done/workflows/ui-review.md +2 -0
  111. package/get-shit-done/workflows/undo.md +314 -0
  112. package/get-shit-done/workflows/update.md +2 -0
  113. package/get-shit-done/workflows/validate-phase.md +2 -0
  114. package/get-shit-done/workflows/verify-phase.md +83 -0
  115. package/get-shit-done/workflows/verify-work.md +12 -1
  116. package/package.json +1 -1
  117. package/skills/gsd-code-review/SKILL.md +48 -0
  118. package/skills/gsd-code-review-fix/SKILL.md +44 -0
@@ -0,0 +1,246 @@
+ # AI-SPEC — Phase {N}: {phase_name}
+
+ > AI design contract generated by `/gsd-ai-integration-phase`. Consumed by `gsd-planner` and `gsd-eval-auditor`.
+ > Locks framework selection, implementation guidance, and evaluation strategy before planning begins.
+
+ ---
+
+ ## 1. System Classification
+
+ **System Type:** <!-- RAG | Multi-Agent | Conversational | Extraction | Autonomous Agent | Content Generation | Code Automation | Hybrid -->
+
+ **Description:**
+ <!-- One-paragraph description of what this AI system does, who uses it, and what "good" looks like -->
+
+ **Critical Failure Modes:**
+ <!-- The 3-5 behaviors that absolutely cannot go wrong in this system -->
+ 1.
+ 2.
+ 3.
+
+ ---
+
+ ## 1b. Domain Context
+
+ > Researched by `gsd-domain-researcher`. Grounds the evaluation strategy in domain expert knowledge.
+
+ **Industry Vertical:** <!-- healthcare | legal | finance | customer service | education | developer tooling | e-commerce | etc. -->
+
+ **User Population:** <!-- who uses this system and in what context -->
+
+ **Stakes Level:** <!-- Low | Medium | High | Critical -->
+
+ **Output Consequence:** <!-- what happens downstream when the AI output is acted on -->
+
+ ### What Domain Experts Evaluate Against
+
+ <!-- Domain-specific rubric ingredients — in practitioner language, not AI jargon -->
+ <!-- Format: Dimension / Good (expert accepts) / Bad (expert flags) / Stakes / Source -->
+
+ ### Known Failure Modes in This Domain
+
+ <!-- Domain-specific failure modes from research — not generic hallucination, but how it manifests here -->
+
+ ### Regulatory / Compliance Context
+
+ <!-- Relevant regulations or constraints — or "None identified" if genuinely none apply -->
+
+ ### Domain Expert Roles for Evaluation
+
+ | Role | Responsibility |
+ |------|---------------|
+ | <!-- e.g., Senior practitioner --> | <!-- Dataset labeling / rubric calibration / production sampling --> |
+
+ ---
+
+ ## 2. Framework Decision
+
+ **Selected Framework:** <!-- e.g., LlamaIndex v0.10.x -->
+
+ **Version:** <!-- Pin the version -->
+
+ **Rationale:**
+ <!-- Why this framework fits this system type, team context, and production requirements -->
+
+ **Alternatives Considered:**
+
+ | Framework | Ruled Out Because |
+ |-----------|------------------|
+ | | |
+
+ **Vendor Lock-In Accepted:** <!-- Yes / No / Partial — document the trade-off consciously -->
+
+ ---
+
+ ## 3. Framework Quick Reference
+
+ > Fetched from official docs by `gsd-ai-researcher`. Distilled for this specific use case.
+
+ ### Installation
+ ```bash
+ # Install command(s)
+ ```
+
+ ### Core Imports
+ ```python
+ # Key imports for this use case
+ ```
+
+ ### Entry Point Pattern
+ ```python
+ # Minimal working example for this system type
+ ```
+
+ ### Key Abstractions
+ <!-- Framework-specific concepts the developer must understand before coding -->
+ | Concept | What It Is | When You Use It |
+ |---------|-----------|-----------------|
+ | | | |
+
+ ### Common Pitfalls
+ <!-- Gotchas specific to this framework and system type — from docs, issues, and community reports -->
+ 1.
+ 2.
+ 3.
+
+ ### Recommended Project Structure
+ ```
+ project/
+ ├── # Framework-specific folder layout
+ ```
+
+ ---
+
+ ## 4. Implementation Guidance
+
+ **Model Configuration:**
+ <!-- Which model(s), temperature, max tokens, and other key parameters -->
+
+ **Core Pattern:**
+ <!-- The primary implementation pattern for this system type in this framework -->
+
+ **Tool Use:**
+ <!-- Tools/integrations needed and how to configure them -->
+
+ **State Management:**
+ <!-- How state is persisted, retrieved, and updated -->
+
+ **Context Window Strategy:**
+ <!-- How to manage context limits for this system type -->
+
+ ---
+
+ ## 4b. AI Systems Best Practices
+
+ > Written by `gsd-ai-researcher`. Cross-cutting patterns every developer building AI systems needs — independent of framework choice.
+
+ ### Structured Outputs with Pydantic
+
+ <!-- Framework-specific Pydantic integration pattern for this use case -->
+ <!-- Include: output model definition, how the framework uses it, retry logic on validation failure -->
+
+ ```python
+ # Pydantic output model for this system type
+ ```
+
+ ### Async-First Design
+
+ <!-- How async is handled in this framework, the one common mistake, and when to stream vs. await -->
+
+ ### Prompt Engineering Discipline
+
+ <!-- System vs. user prompt separation, few-shot guidance, token budget strategy -->
+
+ ### Context Window Management
+
+ <!-- Strategy specific to this system type: RAG chunking / conversation summarisation / agent compaction -->
+
+ ### Cost and Latency Budget
+
+ <!-- Per-call cost estimate, caching strategy, sub-task model routing -->
+
+ ---
+
+ ## 5. Evaluation Strategy
+
+ ### Dimensions
+
+ | Dimension | Rubric (Pass/Fail or 1-5) | Measurement Approach | Priority |
+ |-----------|--------------------------|---------------------|----------|
+ | | | Code / LLM Judge / Human | Critical / High / Medium |
+
+ ### Eval Tooling
+
+ **Primary Tool:** <!-- e.g., RAGAS + Langfuse -->
+
+ **Setup:**
+ ```bash
+ # Install and configure
+ ```
+
+ **CI/CD Integration:**
+ ```bash
+ # Command to run evals in CI/CD pipeline
+ ```
+
+ ### Reference Dataset
+
+ **Size:** <!-- e.g., 20 examples to start -->
+
+ **Composition:**
+ <!-- What scenario types the dataset covers: critical paths, edge cases, failure modes -->
+
+ **Labeling:**
+ <!-- Who labels examples and how (domain expert, LLM judge with calibration, etc.) -->
+
+ ---
+
+ ## 6. Guardrails
+
+ ### Online (Real-Time)
+
+ | Guardrail | Trigger | Intervention |
+ |-----------|---------|--------------|
+ | | | Block / Escalate / Flag |
+
+ ### Offline (Flywheel)
+
+ | Metric | Sampling Strategy | Action on Degradation |
+ |--------|------------------|----------------------|
+ | | | |
+
+ ---
+
+ ## 7. Production Monitoring
+
+ **Tracing Tool:** <!-- e.g., Langfuse self-hosted -->
+
+ **Key Metrics to Track:**
+ <!-- 3-5 metrics that will be monitored in production -->
+
+ **Alert Thresholds:**
+ <!-- When to page/alert -->
+
+ **Smart Sampling Strategy:**
+ <!-- How to select interactions for human review — signal-based filters -->
+
+ ---
+
+ ## Checklist
+
+ - [ ] System type classified
+ - [ ] Critical failure modes identified (≥ 3)
+ - [ ] Domain context researched (Section 1b: vertical, stakes, expert criteria, failure modes)
+ - [ ] Regulatory/compliance context identified or explicitly noted as none
+ - [ ] Domain expert roles defined for evaluation involvement
+ - [ ] Framework selected with rationale documented
+ - [ ] Alternatives considered and ruled out
+ - [ ] Framework quick reference written (install, imports, pattern, pitfalls)
+ - [ ] AI systems best practices written (Section 4b: Pydantic, async, prompt discipline, context)
+ - [ ] Evaluation dimensions grounded in domain rubric ingredients
+ - [ ] Each eval dimension has a concrete rubric (Good/Bad in domain language)
+ - [ ] Eval tooling selected — Arize Phoenix default confirmed or override noted
+ - [ ] Reference dataset spec written (size ≥ 10, composition + labeling defined)
+ - [ ] CI/CD eval integration specified
+ - [ ] Online guardrails defined
+ - [ ] Production monitoring configured (tracing tool + sampling strategy)
@@ -108,6 +108,9 @@ read each file to verify classification. Don't classify based on filename alone.
  <step name="present_classification">
  Present the classification to the user for confirmation before proceeding:

+
+ **Text mode (`workflow.text_mode: true` in config or `--text` flag):** Set `TEXT_MODE=true` if `--text` is present in `$ARGUMENTS` OR `text_mode` from init JSON is `true`. When TEXT_MODE is active, replace every `question` call with a plain-text numbered list and ask the user to type their choice number. This is required for non-OpenCode runtimes (OpenAI Codex, Gemini CLI, etc.) where `question` is not available.
+
  ```
  question(
  header: "Test Classification",
@@ -70,6 +70,8 @@ If potential duplicate found:
  1. read the existing todo
  2. Compare scope

+
+ **Text mode (`workflow.text_mode: true` in config or `--text` flag):** Set `TEXT_MODE=true` if `--text` is present in `$ARGUMENTS` OR `text_mode` from init JSON is `true`. When TEXT_MODE is active, replace every `question` call with a plain-text numbered list and ask the user to type their choice number. This is required for non-OpenCode runtimes (OpenAI Codex, Gemini CLI, etc.) where `question` is not available.
  If overlapping, use question:
  - header: "Duplicate?"
  - question: "Similar todo exists: [title]. What would you like to do?"
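Both hunks above add the same TEXT_MODE paragraph. A minimal shell sketch of that toggle (the `$ARGUMENTS` and init-JSON field names come from the paragraph; the example invocation and the parsing itself are assumptions):

```shell
# Sketch: derive TEXT_MODE from a --text flag or the init JSON field (assumed parsing).
ARGUMENTS="--text 3"          # example invocation (assumed)
INIT_TEXT_MODE="false"        # text_mode field parsed from init JSON (assumed)

TEXT_MODE=false
if [[ " $ARGUMENTS " == *" --text "* || "$INIT_TEXT_MODE" == "true" ]]; then
  TEXT_MODE=true
fi
echo "TEXT_MODE=$TEXT_MODE"
```

Matching against the space-padded string avoids false positives such as a hypothetical `--text-mode` flag being mistaken for `--text`.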
@@ -0,0 +1,284 @@
+ <objective>
+ Generate an AI design contract (AI-SPEC.md) for phases that involve building AI systems. Orchestrates gsd-framework-selector → gsd-ai-researcher → gsd-domain-researcher → gsd-eval-planner with a validation gate. Inserts between discuss-phase and plan-phase in the GSD lifecycle.
+
+ AI-SPEC.md locks four things before the planner creates tasks:
+ 1. Framework selection (with rationale and alternatives)
+ 2. Implementation guidance (correct syntax, patterns, pitfalls from official docs)
+ 3. Domain context (practitioner rubric ingredients, failure modes, regulatory constraints)
+ 4. Evaluation strategy (dimensions, rubrics, tooling, reference dataset, guardrails)
+
+ This prevents the two most common AI development failures: choosing the wrong framework for the use case, and treating evaluation as an afterthought.
+ </objective>
+
+ <required_reading>
+ @$HOME/.config/opencode/get-shit-done/references/ai-frameworks.md
+ @$HOME/.config/opencode/get-shit-done/references/ai-evals.md
+ </required_reading>
+
+ <process>
+
+ ## 1. Initialize
+
+ ```bash
+ INIT=$(node "$HOME/.config/opencode/get-shit-done/bin/gsd-tools.cjs" init plan-phase "$PHASE")
+ if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
+ ```
+
+ Parse JSON for: `phase_dir`, `phase_number`, `phase_name`, `phase_slug`, `padded_phase`, `has_context`, `has_research`, `commit_docs`.
+
+ **File paths:** `state_path`, `roadmap_path`, `requirements_path`, `context_path`.
+
+ Resolve agent models:
+ ```bash
+ SELECTOR_MODEL=$(node "$HOME/.config/opencode/get-shit-done/bin/gsd-tools.cjs" resolve-model gsd-framework-selector --raw)
+ RESEARCHER_MODEL=$(node "$HOME/.config/opencode/get-shit-done/bin/gsd-tools.cjs" resolve-model gsd-ai-researcher --raw)
+ DOMAIN_MODEL=$(node "$HOME/.config/opencode/get-shit-done/bin/gsd-tools.cjs" resolve-model gsd-domain-researcher --raw)
+ PLANNER_MODEL=$(node "$HOME/.config/opencode/get-shit-done/bin/gsd-tools.cjs" resolve-model gsd-eval-planner --raw)
+ ```
+
+ Check config:
+ ```bash
+ AI_PHASE_ENABLED=$(node "$HOME/.config/opencode/get-shit-done/bin/gsd-tools.cjs" config-get workflow.ai_integration_phase 2>/dev/null || echo "true")
+ ```
+
+ **If `AI_PHASE_ENABLED` is `false`:**
+ ```
+ AI phase is disabled in config. Enable via /gsd-settings.
+ ```
+ Exit workflow.
+
+ **If `planning_exists` is false:** Error — run `/gsd-new-project` first.
+
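The `@file:` indirection in the Initialize step above can be unwrapped with a small guard. A sketch (the temp-file path and JSON payload are invented for illustration; the `@file:` prefix convention is taken from the command shown above):

```shell
# Sketch: gsd-tools may emit "@file:<path>" instead of inline JSON when the
# payload is large; unwrap it before parsing. Path and payload are examples.
printf '%s' '{"phase_dir":".planning/phases/01"}' > /tmp/gsd-init-demo.json
INIT='@file:/tmp/gsd-init-demo.json'

if [[ "$INIT" == @file:* ]]; then
  INIT=$(cat "${INIT#@file:}")   # strip the prefix, read the referenced file
fi
echo "$INIT"
```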
+ ## 2. Parse and Validate Phase
+
+ Extract phase number from $ARGUMENTS. If not provided, detect next unplanned phase.
+
+ ```bash
+ PHASE_INFO=$(node "$HOME/.config/opencode/get-shit-done/bin/gsd-tools.cjs" roadmap get-phase "${PHASE}")
+ ```
+
+ **If `found` is false:** Error with available phases.
+
+ ## 3. Check Prerequisites
+
+ **If `has_context` is false:**
+ ```
+ No CONTEXT.md found for Phase {N}.
+ Recommended: run /gsd-discuss-phase {N} first to capture framework preferences.
+ Continuing without user decisions — framework selector will ask all questions.
+ ```
+ Continue (non-blocking).
+
+ ## 4. Check Existing AI-SPEC
+
+ ```bash
+ AI_SPEC_FILE=$(ls "${PHASE_DIR}"/*-AI-SPEC.md 2>/dev/null | head -1)
+ ```
+
+
+ **Text mode (`workflow.text_mode: true` in config or `--text` flag):** Set `TEXT_MODE=true` if `--text` is present in `$ARGUMENTS` OR `text_mode` from init JSON is `true`. When TEXT_MODE is active, replace every `question` call with a plain-text numbered list and ask the user to type their choice number. This is required for non-OpenCode runtimes (OpenAI Codex, Gemini CLI, etc.) where `question` is not available.
+ **If exists:** Use question:
+ - header: "Existing AI-SPEC"
+ - question: "AI-SPEC.md already exists for Phase {N}. What would you like to do?"
+ - options:
+ - "Update — re-run with existing as baseline"
+ - "View — display current AI-SPEC and exit"
+ - "Skip — keep current AI-SPEC and exit"
+
+ If "View": display file contents, exit.
+ If "Skip": exit.
+ If "Update": continue to step 5.
+
+ ## 5. Spawn gsd-framework-selector
+
+ Display:
+ ```
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ GSD ► AI DESIGN CONTRACT — PHASE {N}: {name}
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ ◆ Step 1/4 — Framework Selection...
+ ```
+
+ Spawn `gsd-framework-selector` with:
+ ```markdown
+ read $HOME/.config/opencode/agents/gsd-framework-selector.md for instructions.
+
+ <objective>
+ Select the right AI framework for Phase {phase_number}: {phase_name}
+ Goal: {phase_goal}
+ </objective>
+
+ <files_to_read>
+ {context_path if exists}
+ {requirements_path if exists}
+ </files_to_read>
+
+ <phase_context>
+ Phase: {phase_number} — {phase_name}
+ Goal: {phase_goal}
+ </phase_context>
+ ```
+
+ Parse selector output for: `primary_framework`, `system_type`, `model_provider`, `eval_concerns`, `alternative_framework`.
+
+ **If selector fails or returns empty:** Exit with error — "Framework selection failed. Re-run /gsd-ai-integration-phase {N} or answer the framework question in /gsd-discuss-phase {N} first."
+
+ ## 6. Initialize AI-SPEC.md
+
+ Copy template:
+ ```bash
+ cp "$HOME/.config/opencode/get-shit-done/templates/AI-SPEC.md" "${PHASE_DIR}/${PADDED_PHASE}-AI-SPEC.md"
+ ```
+
+ Fill in header fields:
+ - Phase number and name
+ - System classification (from selector)
+ - Selected framework (from selector)
+ - Alternative considered (from selector)
+
+ ## 7. Spawn gsd-ai-researcher
+
+ Display:
+ ```
+ ◆ Step 2/4 — Researching {primary_framework} docs + AI systems best practices...
+ ```
+
+ Spawn `gsd-ai-researcher` with:
+ ```markdown
+ read $HOME/.config/opencode/agents/gsd-ai-researcher.md for instructions.
+
+ <objective>
+ Research {primary_framework} for Phase {phase_number}: {phase_name}
+ write Sections 3, 4, and 4b of AI-SPEC.md
+ </objective>
+
+ <files_to_read>
+ {ai_spec_path}
+ {context_path if exists}
+ </files_to_read>
+
+ <input>
+ framework: {primary_framework}
+ system_type: {system_type}
+ model_provider: {model_provider}
+ ai_spec_path: {ai_spec_path}
+ phase_context: Phase {phase_number}: {phase_name} — {phase_goal}
+ </input>
+ ```
+
+ ## 8. Spawn gsd-domain-researcher
+
+ Display:
+ ```
+ ◆ Step 3/4 — Researching domain context and expert evaluation criteria...
+ ```
+
+ Spawn `gsd-domain-researcher` with:
+ ```markdown
+ read $HOME/.config/opencode/agents/gsd-domain-researcher.md for instructions.
+
+ <objective>
+ Research the business domain and expert evaluation criteria for Phase {phase_number}: {phase_name}
+ write Section 1b (Domain Context) of AI-SPEC.md
+ </objective>
+
+ <files_to_read>
+ {ai_spec_path}
+ {context_path if exists}
+ {requirements_path if exists}
+ </files_to_read>
+
+ <input>
+ system_type: {system_type}
+ phase_name: {phase_name}
+ phase_goal: {phase_goal}
+ ai_spec_path: {ai_spec_path}
+ </input>
+ ```
+
+ ## 9. Spawn gsd-eval-planner
+
+ Display:
+ ```
+ ◆ Step 4/4 — Designing evaluation strategy from domain + technical context...
+ ```
+
+ Spawn `gsd-eval-planner` with:
+ ```markdown
+ read $HOME/.config/opencode/agents/gsd-eval-planner.md for instructions.
+
+ <objective>
+ Design evaluation strategy for Phase {phase_number}: {phase_name}
+ write Sections 5, 6, and 7 of AI-SPEC.md
+ AI-SPEC.md now contains domain context (Section 1b) — use it as your rubric starting point.
+ </objective>
+
+ <files_to_read>
+ {ai_spec_path}
+ {context_path if exists}
+ {requirements_path if exists}
+ </files_to_read>
+
+ <input>
+ system_type: {system_type}
+ framework: {primary_framework}
+ model_provider: {model_provider}
+ phase_name: {phase_name}
+ phase_goal: {phase_goal}
+ ai_spec_path: {ai_spec_path}
+ </input>
+ ```
+
+ ## 10. Validate AI-SPEC Completeness
+
+ read the completed AI-SPEC.md. Check that:
+ - Section 2 has a framework name (not placeholder)
+ - Section 1b has at least one domain rubric ingredient (Good/Bad/Stakes)
+ - Section 3 has a non-empty code block (entry point pattern)
+ - Section 4b has a Pydantic example
+ - Section 5 has at least one row in the dimensions table
+ - Section 6 has at least one guardrail or explicit "N/A for internal tool" note
+ - Checklist section at end has 3+ items checked
+
+ **If validation fails:** Display specific missing sections. Ask user if they want to re-run the specific step or continue anyway.
+
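The completeness checks above can be partially mechanized with grep. A sketch under assumptions: the spec path, its contents, and the two probes shown are illustrative, not the full check list:

```shell
# Sketch: two of the AI-SPEC completeness probes, grep-based (illustrative only).
SPEC="/tmp/ai-spec-demo.md"    # example path (assumed)
printf '%s\n' '## 2. Framework Decision' '**Selected Framework:** LangGraph' > "$SPEC"

MISSING=()
grep -q '\*\*Selected Framework:\*\* [A-Za-z]' "$SPEC" \
  || MISSING+=("Section 2: framework name is still a placeholder")
grep -q '### Structured Outputs with Pydantic' "$SPEC" \
  || MISSING+=("Section 4b: Pydantic example missing")

if [ "${#MISSING[@]}" -gt 0 ]; then
  printf 'Missing: %s\n' "${MISSING[@]}"
fi
```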
+ ## 11. Commit
+
+ **If `commit_docs` is true:**
+ ```bash
+ git add "${PHASE_DIR}/${PADDED_PHASE}-AI-SPEC.md"
+ git commit -m "docs({phase_slug}): generate AI-SPEC.md — {primary_framework} + domain context + eval strategy"
+ ```
+
+ ## 12. Display Completion
+
+ ```
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ GSD ► AI-SPEC COMPLETE — PHASE {N}: {name}
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ ◆ Framework: {primary_framework}
+ ◆ System Type: {system_type}
+ ◆ Domain: {domain_vertical from Section 1b}
+ ◆ Eval Dimensions: {eval_concerns}
+ ◆ Tracing Default: Arize Phoenix (or detected existing tool)
+ ◆ Output: {ai_spec_path}
+
+ Next step:
+ /gsd-plan-phase {N} — planner will consume AI-SPEC.md
+ ```
+
+ </process>
+
+ <success_criteria>
+ - [ ] Framework selected with rationale (Section 2)
+ - [ ] AI-SPEC.md created from template
+ - [ ] Framework docs + AI best practices researched (Sections 3, 4, 4b populated)
+ - [ ] Domain context + expert rubric ingredients researched (Section 1b populated)
+ - [ ] Eval strategy grounded in domain context (Sections 5-7 populated)
+ - [ ] Arize Phoenix (or detected tool) set as tracing default in Section 7
+ - [ ] AI-SPEC.md validated (Sections 1b, 2, 3, 4b, 5, 6 all non-empty)
+ - [ ] Committed if commit_docs enabled
+ - [ ] Next step surfaced to user
+ </success_criteria>
@@ -0,0 +1,154 @@
+ <objective>
+ Autonomous audit-to-fix pipeline. Runs an audit, parses findings, classifies each as
+ auto-fixable vs manual-only, spawns executor agents for fixable issues, runs tests
+ after each fix, and commits atomically with finding IDs for traceability.
+ </objective>
+
+ <available_agent_types>
+ - gsd-executor — executes a specific, scoped code change
+ </available_agent_types>
+
+ <process>
+
+ <step name="parse-arguments">
+ Extract flags from the user's invocation:
+
+ - `--max N` — maximum findings to fix (default: **5**)
+ - `--severity high|medium|all` — minimum severity to process (default: **medium**)
+ - `--dry-run` — classify findings without fixing (shows classification table only)
+ - `--source <audit>` — which audit to run (default: **audit-uat**)
+
+ Validate `--source` is a supported audit. Currently supported:
+ - `audit-uat`
+
+ If `--source` is not supported, stop with an error:
+ ```
+ Error: Unsupported audit source "{source}". Supported sources: audit-uat
+ ```
+ </step>
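The flag handling described in this step can be sketched as (flag names and defaults come from the list above; the example invocation and the loop itself are assumptions):

```shell
# Sketch: parse audit-fix flags with the documented defaults.
MAX=5; SEVERITY=medium; DRY_RUN=false; SOURCE=audit-uat

set -- --max 3 --dry-run       # example invocation (assumed)
while [ "$#" -gt 0 ]; do
  case "$1" in
    --max)      MAX="$2"; shift 2 ;;
    --severity) SEVERITY="$2"; shift 2 ;;
    --dry-run)  DRY_RUN=true; shift ;;
    --source)   SOURCE="$2"; shift 2 ;;
    *)          shift ;;       # ignore unrecognized tokens in this sketch
  esac
done
echo "max=$MAX severity=$SEVERITY dry_run=$DRY_RUN source=$SOURCE"
```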
+
+ <step name="run-audit">
+ Invoke the source audit command and capture output.
+
+ For `audit-uat` source:
+ ```bash
+ INIT=$(node "$HOME/.config/opencode/get-shit-done/bin/gsd-tools.cjs" init audit-uat 2>/dev/null || echo "{}")
+ if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
+ ```
+
+ read existing UAT and verification files to extract findings:
+ - glob: `.planning/phases/*/*-UAT.md`
+ - glob: `.planning/phases/*/*-VERIFICATION.md`
+
+ Parse each finding into a structured record:
+ - **ID** — sequential identifier (F-01, F-02, ...)
+ - **description** — concise summary of the issue
+ - **severity** — high, medium, or low
+ - **file_refs** — specific file paths referenced in the finding
+ </step>
+
+ <step name="classify-findings">
+ For each finding, classify as one of:
+
+ - **auto-fixable** — clear code change, specific file referenced, testable fix
+ - **manual-only** — requires design decisions, ambiguous scope, architectural changes, user input needed
+ - **skip** — severity below the `--severity` threshold
+
+ **Classification heuristics** (err on manual-only when uncertain):
+
+ Auto-fixable signals:
+ - References a specific file path + line number
+ - Describes a missing test or assertion
+ - Missing export, wrong import path, typo in identifier
+ - Clear single-file change with obvious expected behavior
+
+ Manual-only signals:
+ - Uses words like "consider", "evaluate", "design", "rethink"
+ - Requires new architecture or API changes
+ - Ambiguous scope or multiple valid approaches
+ - Requires user input or design decisions
+ - Cross-cutting concerns affecting multiple subsystems
+ - Performance or scalability issues without clear fix
+
+ **When uncertain, always classify as manual-only.**
+ </step>
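The `--severity` threshold behind the skip rule above can be compared with a small rank function (a sketch; the ordering high > medium > low is stated in the step, and `all` ranking lowest so that every finding passes is an assumption):

```shell
# Sketch: severity threshold check for the skip classification.
sev_rank() {
  case "$1" in high) echo 3 ;; medium) echo 2 ;; low) echo 1 ;; all) echo 0 ;; esac
}
meets_threshold() {  # usage: meets_threshold <finding_severity> <min_severity>
  [ "$(sev_rank "$1")" -ge "$(sev_rank "$2")" ]
}
meets_threshold low medium || echo "skip: below threshold"
meets_threshold high medium && echo "process"
```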
+
+ <step name="present-classification">
+ Display the classification table:
+
+ ```
+ ## Audit-Fix Classification
+
+ | # | Finding | Severity | Classification | Reason |
+ |---|---------|----------|---------------|--------|
+ | F-01 | Missing export in index.ts | high | auto-fixable | Specific file, clear fix |
+ | F-02 | No error handling in payment flow | high | manual-only | Requires design decisions |
+ | F-03 | Test stub with 0 assertions | medium | auto-fixable | Clear test gap |
+ ```
+
+ If `--dry-run` was specified, **stop here and exit**. The classification table is the
+ final output — do not proceed to fixing.
+ </step>
+
+ <step name="fix-loop">
+ For each **auto-fixable** finding (up to `--max`, ordered by severity desc):
+
+ **a. Spawn executor agent:**
+ ```
+ @gsd-executor "Fix finding {ID}: {description}. Files: {file_refs}. Make the minimal change to resolve this specific finding. Do not refactor surrounding code."
+ ```
+
+ **b. Run tests:**
+ ```bash
+ npm test 2>&1 | tail -20
+ ```
+
+ **c. If tests pass** — commit atomically:
+ ```bash
+ git add {changed_files}
+ git commit -m "fix({scope}): resolve {ID} — {description}"
+ ```
+ The commit message **must** include the finding ID (e.g., F-01) for traceability.
+
+ **d. If tests fail** — revert changes, mark finding as `fix-failed`, and **stop the pipeline**:
+ ```bash
+ git checkout -- {changed_files} 2>/dev/null
+ ```
+ Log the failure reason and stop processing — do not continue to the next finding.
+ A test failure indicates the codebase may be in an unexpected state, so the pipeline
+ must halt to avoid cascading issues. Remaining auto-fixable findings will appear in the
+ report as `not-attempted`.
+ </step>
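The fix loop's test gate and halt-on-failure behavior can be sketched as (a toy sketch: `fix_finding` and `run_tests` are stand-ins for the executor spawn and `npm test`, with `run_tests` hard-wired to fail so the halt path is visible):

```shell
# Sketch: test-gated fix loop that reverts and halts on the first failure.
fix_finding() { :; }           # stand-in for the executor agent step
run_tests()  { return 1; }     # stand-in for npm test; simulate a failing run

RESULT=""
for ID in F-01 F-03; do
  fix_finding "$ID"
  if run_tests; then
    RESULT="$RESULT committed($ID)"
  else
    RESULT="$RESULT reverted($ID) pipeline-halted"
    break                      # stop: do not attempt remaining findings
  fi
done
echo "${RESULT# }"
```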
+
+ <step name="report">
+ Present the final summary:
+
+ ```
+ ## Audit-Fix Complete
+
+ **Source:** {audit_command}
+ **Findings:** {total} total, {auto} auto-fixable, {manual} manual-only
+ **Fixed:** {fixed_count}/{auto} auto-fixable findings
+ **Failed:** {failed_count} (reverted)
+
+ | # | Finding | Status | Commit |
+ |---|---------|--------|--------|
+ | F-01 | Missing export | Fixed | abc1234 |
+ | F-03 | Test stub | Fix failed | (reverted) |
+
+ ### Manual-only findings (require developer attention):
+ - F-02: No error handling in payment flow — requires design decisions
+ ```
+ </step>
+
+ </process>
+
+ <success_criteria>
+ - Auto-fixable findings processed sequentially until --max reached or a test failure stops the pipeline
+ - Tests pass after each committed fix (no broken commits)
+ - Failed fixes are reverted cleanly (no partial changes left)
+ - Pipeline stops after the first test failure (no cascading fixes)
+ - Every commit message contains the finding ID
+ - Manual-only findings are surfaced for developer attention
+ - --dry-run produces a useful standalone classification table
+ </success_criteria>