gsd-opencode 1.33.3 → 1.35.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/agents/gsd-advisor-researcher.md +23 -0
- package/agents/gsd-ai-researcher.md +142 -0
- package/agents/gsd-code-fixer.md +523 -0
- package/agents/gsd-code-reviewer.md +361 -0
- package/agents/gsd-debugger.md +14 -1
- package/agents/gsd-domain-researcher.md +162 -0
- package/agents/gsd-eval-auditor.md +170 -0
- package/agents/gsd-eval-planner.md +161 -0
- package/agents/gsd-executor.md +70 -7
- package/agents/gsd-framework-selector.md +167 -0
- package/agents/gsd-intel-updater.md +320 -0
- package/agents/gsd-phase-researcher.md +26 -0
- package/agents/gsd-plan-checker.md +12 -0
- package/agents/gsd-planner.md +16 -6
- package/agents/gsd-project-researcher.md +23 -0
- package/agents/gsd-ui-researcher.md +23 -0
- package/agents/gsd-verifier.md +55 -1
- package/commands/gsd/gsd-ai-integration-phase.md +36 -0
- package/commands/gsd/gsd-audit-fix.md +33 -0
- package/commands/gsd/gsd-autonomous.md +1 -0
- package/commands/gsd/gsd-code-review-fix.md +52 -0
- package/commands/gsd/gsd-code-review.md +55 -0
- package/commands/gsd/gsd-eval-review.md +32 -0
- package/commands/gsd/gsd-explore.md +27 -0
- package/commands/gsd/gsd-from-gsd2.md +45 -0
- package/commands/gsd/gsd-import.md +36 -0
- package/commands/gsd/gsd-intel.md +183 -0
- package/commands/gsd/gsd-next.md +2 -0
- package/commands/gsd/gsd-reapply-patches.md +58 -3
- package/commands/gsd/gsd-review.md +4 -2
- package/commands/gsd/gsd-scan.md +26 -0
- package/commands/gsd/gsd-undo.md +34 -0
- package/commands/gsd/gsd-workstreams.md +6 -6
- package/get-shit-done/bin/gsd-tools.cjs +143 -5
- package/get-shit-done/bin/lib/commands.cjs +10 -2
- package/get-shit-done/bin/lib/config.cjs +71 -37
- package/get-shit-done/bin/lib/core.cjs +70 -8
- package/get-shit-done/bin/lib/gsd2-import.cjs +511 -0
- package/get-shit-done/bin/lib/init.cjs +20 -6
- package/get-shit-done/bin/lib/intel.cjs +660 -0
- package/get-shit-done/bin/lib/learnings.cjs +378 -0
- package/get-shit-done/bin/lib/milestone.cjs +25 -15
- package/get-shit-done/bin/lib/model-profiles.cjs +17 -17
- package/get-shit-done/bin/lib/phase.cjs +148 -112
- package/get-shit-done/bin/lib/roadmap.cjs +12 -5
- package/get-shit-done/bin/lib/security.cjs +119 -0
- package/get-shit-done/bin/lib/state.cjs +283 -221
- package/get-shit-done/bin/lib/template.cjs +8 -4
- package/get-shit-done/bin/lib/verify.cjs +42 -5
- package/get-shit-done/references/ai-evals.md +156 -0
- package/get-shit-done/references/ai-frameworks.md +186 -0
- package/get-shit-done/references/common-bug-patterns.md +114 -0
- package/get-shit-done/references/few-shot-examples/plan-checker.md +73 -0
- package/get-shit-done/references/few-shot-examples/verifier.md +109 -0
- package/get-shit-done/references/gates.md +70 -0
- package/get-shit-done/references/ios-scaffold.md +123 -0
- package/get-shit-done/references/model-profile-resolution.md +6 -7
- package/get-shit-done/references/model-profiles.md +20 -14
- package/get-shit-done/references/planning-config.md +237 -0
- package/get-shit-done/references/thinking-models-debug.md +44 -0
- package/get-shit-done/references/thinking-models-execution.md +50 -0
- package/get-shit-done/references/thinking-models-planning.md +62 -0
- package/get-shit-done/references/thinking-models-research.md +50 -0
- package/get-shit-done/references/thinking-models-verification.md +55 -0
- package/get-shit-done/references/thinking-partner.md +96 -0
- package/get-shit-done/references/universal-anti-patterns.md +6 -1
- package/get-shit-done/references/verification-overrides.md +227 -0
- package/get-shit-done/templates/AI-SPEC.md +246 -0
- package/get-shit-done/workflows/add-tests.md +3 -0
- package/get-shit-done/workflows/add-todo.md +2 -0
- package/get-shit-done/workflows/ai-integration-phase.md +284 -0
- package/get-shit-done/workflows/audit-fix.md +154 -0
- package/get-shit-done/workflows/autonomous.md +33 -2
- package/get-shit-done/workflows/check-todos.md +2 -0
- package/get-shit-done/workflows/cleanup.md +2 -0
- package/get-shit-done/workflows/code-review-fix.md +497 -0
- package/get-shit-done/workflows/code-review.md +515 -0
- package/get-shit-done/workflows/complete-milestone.md +40 -15
- package/get-shit-done/workflows/diagnose-issues.md +1 -1
- package/get-shit-done/workflows/discovery-phase.md +3 -1
- package/get-shit-done/workflows/discuss-phase-assumptions.md +1 -1
- package/get-shit-done/workflows/discuss-phase.md +21 -7
- package/get-shit-done/workflows/do.md +2 -0
- package/get-shit-done/workflows/docs-update.md +2 -0
- package/get-shit-done/workflows/eval-review.md +155 -0
- package/get-shit-done/workflows/execute-phase.md +307 -57
- package/get-shit-done/workflows/execute-plan.md +64 -93
- package/get-shit-done/workflows/explore.md +136 -0
- package/get-shit-done/workflows/help.md +1 -1
- package/get-shit-done/workflows/import.md +273 -0
- package/get-shit-done/workflows/inbox.md +387 -0
- package/get-shit-done/workflows/manager.md +4 -10
- package/get-shit-done/workflows/new-milestone.md +3 -1
- package/get-shit-done/workflows/new-project.md +2 -0
- package/get-shit-done/workflows/new-workspace.md +2 -0
- package/get-shit-done/workflows/next.md +56 -0
- package/get-shit-done/workflows/note.md +2 -0
- package/get-shit-done/workflows/plan-phase.md +97 -17
- package/get-shit-done/workflows/plant-seed.md +3 -0
- package/get-shit-done/workflows/pr-branch.md +41 -13
- package/get-shit-done/workflows/profile-user.md +4 -2
- package/get-shit-done/workflows/quick.md +99 -4
- package/get-shit-done/workflows/remove-workspace.md +2 -0
- package/get-shit-done/workflows/review.md +53 -6
- package/get-shit-done/workflows/scan.md +98 -0
- package/get-shit-done/workflows/secure-phase.md +2 -0
- package/get-shit-done/workflows/settings.md +18 -3
- package/get-shit-done/workflows/ship.md +3 -0
- package/get-shit-done/workflows/ui-phase.md +10 -2
- package/get-shit-done/workflows/ui-review.md +2 -0
- package/get-shit-done/workflows/undo.md +314 -0
- package/get-shit-done/workflows/update.md +2 -0
- package/get-shit-done/workflows/validate-phase.md +2 -0
- package/get-shit-done/workflows/verify-phase.md +83 -0
- package/get-shit-done/workflows/verify-work.md +12 -1
- package/package.json +1 -1
- package/skills/gsd-code-review/SKILL.md +48 -0
- package/skills/gsd-code-review-fix/SKILL.md +44 -0
@@ -0,0 +1,44 @@
# Thinking Models: Debug Cluster

Structured reasoning models for the **debugger** agent. Apply these at decision points during investigation, not continuously. Each model counters a specific documented failure mode.

Source: Curated from [thinking-partner](https://github.com/mattnowdev/thinking-partner) model catalog (150+ models). Selected for direct applicability to GSD debugging workflow.

## Conflict Resolution

**Fault Tree and Hypothesis-Driven are sequential:** Fault Tree FIRST (generate the tree of possible causes), Hypothesis-Driven SECOND (test each branch systematically). Fault Tree provides the map; Hypothesis-Driven provides the discipline to traverse it.

## 1. Fault Tree Analysis

**Counters:** Jumping to conclusions without systematically mapping failure paths.

Before testing any hypothesis, build a fault tree: start with the observed symptom as the root node, then branch into all possible causes at each level (hardware, software, configuration, data, environment). Use AND/OR gates -- some failures require multiple conditions (AND), others have independent triggers (OR). This tree becomes your investigation roadmap. Prioritize branches by likelihood and testability, but do NOT prune branches just because they seem unlikely -- unlikely causes that are easy to test should be tested early.
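
The tree-and-traversal idea above can be sketched in code. This is an illustrative sketch only; the `FaultNode` shape and the likelihood-over-cost ordering are assumptions for illustration, not part of the GSD toolchain:

```typescript
type Gate = "AND" | "OR";

interface FaultNode {
  cause: string;
  gate?: Gate;            // how child causes combine (default: OR)
  children?: FaultNode[];
  likelihood?: number;    // rough prior, 0..1
  testCost?: number;      // relative effort to test (1 = cheap)
}

// Collect leaf causes, ranked so that cheap-to-test branches surface
// early even when their likelihood is low.
function investigationOrder(root: FaultNode): FaultNode[] {
  const leaves: FaultNode[] = [];
  const walk = (n: FaultNode) => {
    if (!n.children || n.children.length === 0) leaves.push(n);
    else n.children.forEach(walk);
  };
  walk(root);
  // Higher likelihood and lower test cost both move a branch up the list.
  return leaves.sort(
    (a, b) =>
      (b.likelihood ?? 0.1) / (b.testCost ?? 1) -
      (a.likelihood ?? 0.1) / (a.testCost ?? 1),
  );
}

const tree: FaultNode = {
  cause: "API returns 500",
  gate: "OR",
  children: [
    { cause: "bad env var", likelihood: 0.2, testCost: 1 },
    { cause: "race between workers", likelihood: 0.4, testCost: 5 },
    { cause: "stale cache", likelihood: 0.1, testCost: 1 },
  ],
};

// Cheap checks first: bad env var, stale cache, then the race.
console.log(investigationOrder(tree).map((n) => n.cause));
```

Note how the unlikely-but-cheap "stale cache" branch outranks the likely-but-expensive race condition, matching the "do not prune unlikely branches" rule above.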

## 2. Hypothesis-Driven Investigation

**Counters:** Making random changes and hoping something works -- the "shotgun debugging" anti-pattern.

For each hypothesis from the fault tree, follow the strict protocol: PREDICT ("If hypothesis H is correct, then test T should produce result R"), TEST (execute exactly one test), OBSERVE (record the actual result), CONCLUDE (matched = SUPPORTED, failed = ELIMINATED, unexpected = new evidence). Never skip the PREDICT step -- without a prediction, you cannot distinguish a meaningful result from noise. Never change more than one variable per test -- if you change two things and the bug disappears, you don't know which change fixed it.
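
The PREDICT/TEST/OBSERVE/CONCLUDE protocol can be made mechanical. A minimal sketch with invented names (not part of the GSD toolchain); it refuses to conclude anything when the PREDICT step was skipped:

```typescript
type Verdict = "SUPPORTED" | "ELIMINATED" | "NEW_EVIDENCE";

interface HypothesisTest {
  hypothesis: string;
  prediction: string;      // "If H is correct, test T should produce R" -- written BEFORE testing
  expectedIfTrue: string;  // R, the result predicted if the hypothesis holds
  expectedIfFalse: string; // the result that cleanly rules the hypothesis out
}

function conclude(t: HypothesisTest, observed: string): Verdict {
  if (!t.prediction) {
    // Without a prediction, a result cannot be distinguished from noise.
    throw new Error("PREDICT step skipped: record the expected result before testing");
  }
  if (observed === t.expectedIfTrue) return "SUPPORTED";
  if (observed === t.expectedIfFalse) return "ELIMINATED";
  return "NEW_EVIDENCE"; // unexpected result: feed it back into the fault tree
}
```

The point of the sketch is the guard clause: a test run without a recorded prediction is an error, not evidence.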

## 3. Occam's Razor

**Counters:** Pursuing elaborate explanations when simple ones have not been ruled out.

Before investigating complex multi-component interaction bugs, race conditions, or framework-level issues, verify the simple explanations first: typo in variable name, wrong file path, missing import, incorrect config value, stale cache, wrong environment variable. These "boring" causes account for the majority of bugs. Only escalate to complex hypotheses AFTER the simple ones are eliminated. If your current hypothesis requires 3+ things to go wrong simultaneously, step back and look for a single-point failure.

## 4. Counterfactual Thinking

**Counters:** Failing to isolate causation by not asking "what if we changed just this one thing?"

When you have a hypothesis about the root cause, construct a counterfactual: "If I change ONLY this one variable/config/line, the bug should disappear (or appear)." Execute the counterfactual test. If the bug persists after your targeted change, your hypothesis is wrong -- the cause is elsewhere. If the bug disappears, you have strong causal evidence. This is more powerful than correlation ("the bug appeared after deploy X") because it tests the mechanism, not just the timeline.

---

## When NOT to Think

Skip structured reasoning models when the situation does not benefit from them:

- **Obvious single-cause bugs** -- If the error message names the exact file, line, and cause (e.g., `TypeError: Cannot read property 'x' of undefined at foo.js:42`), fix it directly. Do not build a fault tree for a null reference with a stack trace.
- **Reproducing a known fix** -- If you already know the root cause from a previous investigation or the user told you exactly what is wrong, skip hypothesis-driven investigation and go straight to the fix.
- **Typos, missing imports, wrong paths** -- If Occam's Razor would immediately resolve it, apply the fix without invoking the full model. The model exists for when simple checks fail, not to gate simple checks.
- **Reading error logs** -- Reading and understanding error output is normal debugging, not a "decision point." Only invoke models when you have multiple plausible hypotheses and need to choose which to test first.

@@ -0,0 +1,50 @@
# Thinking Models: Execution Cluster

Structured reasoning models for the **executor** agent. Apply these at decision points during task execution, not continuously. Each model counters a specific documented failure mode.

Source: Curated from [thinking-partner](https://github.com/mattnowdev/thinking-partner) model catalog (150+ models). Selected for direct applicability to GSD execution workflow.

## Conflict Resolution

**Forcing Function and First Principles both push toward "do it now".** Run First Principles FIRST (understand the constraint), Forcing Function SECOND (create the mechanism). Sequential, not competing.

## 1. Circle of Concern vs Circle of Control

**Counters:** Executor trying to fix things outside its scope -- upstream bugs, unrelated tech debt, infrastructure issues.

Before modifying any code not explicitly listed in the plan's `<files>` section, ask: Is this in my Circle of Control (plan scope) or my Circle of Concern (things I notice but shouldn't fix)? If Circle of Concern: document it as a deviation note or deferred item, do NOT fix it. The executor's job is to build what the plan says, not to improve the codebase. Scope creep from "while I'm here" fixes is the #1 cause of executor overruns.

## 2. Forcing Function

**Counters:** Deferring hard decisions to runtime instead of resolving them at build time.

When you encounter an ambiguous requirement or unclear integration point, create a forcing function that makes the decision explicit NOW rather than hiding it behind a TODO or runtime check. Examples: use a TypeScript `never` type to force exhaustive switches, add a build-time assertion for required config values, create an interface that forces callers to handle error cases. If a decision truly cannot be made at build time, document it as a `checkpoint:decision` deviation -- do not silently defer.
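
The TypeScript `never` trick mentioned above looks like this in practice (the `DeployTarget` union and bucket names are invented for illustration):

```typescript
type DeployTarget = "staging" | "production";

function assertNever(x: never): never {
  throw new Error(`Unhandled case: ${String(x)}`);
}

function bucketFor(target: DeployTarget): string {
  switch (target) {
    case "staging":
      return "assets-staging";
    case "production":
      return "assets-prod";
    default:
      // Adding a member to DeployTarget makes this line a compile error,
      // forcing the new case to be handled at build time, not at runtime.
      return assertNever(target);
  }
}
```

Once every case is handled, `target` narrows to `never` in the `default` branch; any future union member widens it back and the build fails, which is exactly the forcing function.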

## 3. First Principles Thinking

**Counters:** Copying patterns from existing code without understanding whether they fit the current task.

Before copying a pattern from another file or phase, decompose WHY that pattern exists: What constraint does it satisfy? Does your current task have the same constraint? If not, the pattern may be cargo cult. Build your implementation from the task's actual requirements, not from the nearest existing example. When in doubt, the plan's `<action>` steps define what to build -- derive the implementation from those, not from adjacent code.

## 4. Occam's Razor

**Counters:** Over-engineering simple tasks with unnecessary abstractions, generics, or future-proofing.

Before adding an abstraction layer, generic type parameter, factory pattern, or configuration option, ask: Does the plan REQUIRE this flexibility? If the plan says "create a function that does X", create a function that does X -- not a configurable, extensible, pluggable framework that could theoretically do X through Y through Z. The simplest implementation that satisfies the plan's `<done>` condition is the correct one. Add complexity only when the plan explicitly calls for it.

## 5. Chesterton's Fence

**Counters:** Removing or modifying existing code without understanding why it was written that way.

Before removing, replacing, or significantly modifying existing code that the plan touches, determine WHY it exists. Check: git blame for the commit that introduced it, comments explaining the rationale, test cases that exercise it, the PLAN.md or SUMMARY.md that created it. If the purpose is unclear, keep it and add a comment noting the uncertainty -- do NOT remove code whose purpose you don't understand. If the plan explicitly says to remove it, still document what it did in the deviation notes.

---

## When NOT to Think

Skip structured reasoning models when the situation does not benefit from them:

- **Straightforward task actions** -- If the plan says "create file X with content Y" and the action is unambiguous, execute it directly. Do not invoke First Principles to analyze why you are creating a file the plan told you to create.
- **Following established project patterns** -- If the codebase has a clear, consistent pattern (e.g., every route handler follows the same structure) and the plan says to add another one, follow the pattern. Chesterton's Fence applies to removing patterns, not to following them.
- **Trivial file edits** -- Adding an import, fixing a typo, updating a version number. These are mechanical changes that do not involve design decisions.
- **Running verify commands** -- Executing the plan's `<verify>` steps is procedural. Only invoke models if a verify step fails and you need to decide how to respond.

@@ -0,0 +1,62 @@
# Thinking Models: Planning Cluster

Structured reasoning models for the **planner** and **roadmapper** agents. Apply these at decision points during plan creation, not continuously. Each model counters a specific documented failure mode.

Source: Curated from [thinking-partner](https://github.com/mattnowdev/thinking-partner) model catalog (150+ models). Selected for direct applicability to GSD planning workflow.

## Conflict Resolution

Pre-Mortem and Constraint Analysis both analyze risk at different granularities. Run Constraint Analysis FIRST (identify the hardest constraint), then Pre-Mortem (enumerate failure modes around that constraint and the rest of the plan).

## 1. Pre-Mortem Analysis

**Counters:** Optimistic plan decomposition that ignores failure modes.

Before finalizing this plan, assume it has already failed. List the 3 most likely reasons for failure -- missing dependency, wrong decomposition, underestimated complexity -- and add mitigation steps or acceptance criteria that would catch each failure early.

## 2. MECE Decomposition

**Counters:** Overlapping tasks (merge conflicts) or gapped tasks (missing requirements).

Verify this task breakdown is MECE at the REQUIREMENT level: (1) list every requirement from the phase goal, (2) confirm each maps to exactly one task's `<done>`, (3) if two tasks modify the same file, confirm they modify DIFFERENT sections or serve DIFFERENT requirements, (4) flag any requirement not covered by any task.
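
Steps (1)-(4) above amount to a coverage check that could be scripted. A sketch under assumed field names (`covers` stands in for the requirement ids a task's `<done>` satisfies; none of this is the GSD plan schema):

```typescript
interface Task {
  id: string;
  covers: string[]; // requirement ids this task's <done> satisfies (assumed field)
}

function checkMece(requirements: string[], tasks: Task[]) {
  const coverage = new Map<string, string[]>();
  for (const req of requirements) coverage.set(req, []);
  for (const t of tasks) {
    for (const req of t.covers) coverage.get(req)?.push(t.id);
  }
  return {
    // Not "collectively exhaustive": no task satisfies these requirements.
    gaps: requirements.filter((r) => coverage.get(r)!.length === 0),
    // Not "mutually exclusive": more than one task claims the same requirement.
    overlaps: requirements.filter((r) => coverage.get(r)!.length > 1),
  };
}
```

Both output lists should be empty for a MECE breakdown; anything in `gaps` is step (4)'s flag, anything in `overlaps` triggers step (3)'s different-sections check.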

## 3. Constraint Analysis

**Counters:** Deferring the hardest constraint to the last task, causing late-stage failures.

Identify the single hardest constraint in this phase -- the one thing that, if it doesn't work, makes everything else irrelevant. Schedule that constraint as task 1 or 2, not last. If the constraint involves an external API or unfamiliar library, add a spike/proof-of-concept task before the main implementation.

## 4. Reversibility Test

**Counters:** Over-analyzing cheap decisions, under-analyzing costly ones.

For each significant decision in this plan, classify as REVERSIBLE (can change later with low cost) or IRREVERSIBLE (changing later requires migration, breaking changes, or significant rework). Spend analysis time proportional to irreversibility. For irreversible decisions, document the rationale in the plan.

## 5. Curse of Knowledge Counter

**Counters:** Plan-to-executor ambiguity from compressed instructions.

For each `<action>` step, re-read it as if you have NEVER seen this codebase. Is every noun unambiguous (which file? which function? which endpoint?)? Is every verb specific (add WHERE? modify HOW?)? If a step could be interpreted two ways, rewrite it. Include file paths, function names, and expected behavior in every action step.

## 6. Base Rate Neglect Counter

**Counters:** Planners ignoring low-confidence research caveats.

Before finalizing the plan, read ALL `[NEEDS DECISION]` items and LOW-confidence recommendations from SUMMARY.md. For each: either (a) create a `checkpoint:decision` task to resolve it, or (b) document why the risk is acceptable in the plan's deviation notes. LOW-confidence items that are silently accepted become undocumented technical debt.

## Gap Closure Mode: Root-Cause Check

**Applies only when:** Planner enters gap closure mode (triggered by `gaps_found` in VERIFICATION.md).

Before writing the fix plan, apply a single "why" round: Why did this gap occur? Was it a plan deficiency (wrong task), an execution miss (correct task, wrong implementation), or a changed assumption (environment/dependency shift)? The fix plan must target the root cause category, not just the symptom.

---

## When NOT to Think

Skip structured reasoning models when the situation does not benefit from them:

- **Single-task plans** -- If the phase has one clear requirement and one obvious task, do not run Pre-Mortem or MECE analysis; write the task directly.
- **Well-researched phases** -- If RESEARCH.md has HIGH-confidence recommendations for every decision and no `[NEEDS DECISION]` items, skip Base Rate Neglect Counter. The research already resolved uncertainty.
- **Revision iterations** -- When revising a plan based on checker feedback, focus on fixing the flagged issues. Do not re-run the full model suite on every revision pass -- apply only the model relevant to the specific issue (e.g., MECE if the checker found a coverage gap).
- **Boilerplate plans** -- Configuration changes, version bumps, documentation updates. These do not have failure modes worth pre-mortem analysis.

@@ -0,0 +1,50 @@
# Thinking Models: Research Cluster

Structured reasoning models for the **researcher** and **synthesizer** agents. Apply these at decision points during research and synthesis, not continuously. Each model counters a specific documented failure mode.

Source: Curated from [thinking-partner](https://github.com/mattnowdev/thinking-partner) model catalog (150+ models). Selected for direct applicability to GSD research workflow.

## Conflict Resolution

**First Principles and Steel Man both expand scope** -- run First Principles FIRST (decompose the problem), then Steel Man (strengthen alternatives). Don't run simultaneously.

## 1. First Principles Thinking

**Counters:** Accepting surface-level explanations without decomposing into fundamental components.

Before accepting any technology recommendation or architectural pattern, decompose it to its fundamental constraints: What problem does this solve? What are the non-negotiable requirements? What are the physical/logical limits? Build your recommendation UP from these constraints rather than DOWN from conventional wisdom. If you cannot explain WHY a recommendation is correct from first principles, flag it as `[LOW]` regardless of source count.

## 2. Simpson's Paradox Awareness

**Counters:** Synthesizer aggregating conflicting research without checking for confounding splits.

When combining findings from multiple research documents that show contradictory results, check whether the contradiction disappears when you split by a hidden variable: framework version, deployment target, project scale, or use case category. A library that benchmarks faster overall may be slower for YOUR specific workload. Before resolving contradictions by majority vote, ask: "Is there a subgroup split that explains why both findings are correct in their own context?"

## 3. Survivorship Bias

**Counters:** Only finding successful examples while missing failures and abandoned approaches.

After gathering evidence FOR a recommended approach, actively search for projects that ABANDONED it. Check GitHub issues for "migrated away from", "replaced X with", or "problems with X at scale". A technology with 10 success stories and 100 quiet failures looks great until you check the graveyard. Weight negative evidence (migration-away stories, deprecation notices, unresolved issues) MORE heavily than positive evidence -- failures are underreported.

## 4. Confirmation Bias Counter

**Counters:** Searching for evidence that confirms initial hypothesis while ignoring disconfirming evidence.

After forming your initial recommendation, spend one full research cycle searching AGAINST it. Use search terms like "{technology} problems", "{technology} alternatives", "why not {technology}", "{technology} vs {competitor}". For each piece of disconfirming evidence found, either (a) refute it with higher-confidence sources, or (b) add it as a caveat to your recommendation. If you cannot find ANY criticism of your recommendation, your search was too narrow -- widen it.
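
The search-term templates above are mechanical enough to generate. A trivial sketch; the function name is made up:

```typescript
// Expand the disconfirmation templates for one technology and one competitor.
function disconfirmingQueries(tech: string, competitor: string): string[] {
  return [
    `${tech} problems`,
    `${tech} alternatives`,
    `why not ${tech}`,
    `${tech} vs ${competitor}`,
  ];
}
```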

## 5. Steel Man

**Counters:** Dismissing alternative approaches without giving them their strongest possible form.

Before recommending against an alternative technology or approach, construct its STRONGEST possible case. What would a passionate advocate say? What use cases does it serve better than your recommendation? What trade-offs favor it? Present the steel-manned alternative alongside your recommendation with an honest comparison. If the steel-manned alternative is competitive, flag the decision as `[NEEDS DECISION]` rather than making a unilateral recommendation.

---

## When NOT to Think

Skip structured reasoning models when the situation does not benefit from them:

- **Locked decisions from CONTEXT.md** -- If the user already decided "use library X", do not run Steel Man analysis on alternatives or First Principles decomposition of the choice. Research how to use X well, not whether X is the right choice.
- **Standard stack lookups** -- If you are simply checking the latest version of a well-known library or reading its API docs, do not invoke Survivorship Bias or Confirmation Bias Counter. These models are for evaluating contested recommendations, not for factual lookups.
- **Single-technology phases** -- If the phase involves one technology with no alternatives to evaluate (e.g., "add ESLint rule X"), skip comparative models (Steel Man, Confirmation Bias Counter). Just research the implementation.
- **Codebase-only research** -- If the research is purely internal (understanding existing code patterns, finding where a function is called), structured reasoning models add no value. Use grep and read the code.

@@ -0,0 +1,55 @@
# Thinking Models: Verification Cluster

Structured reasoning models for the **verifier** and **plan-checker** agents. Apply these during verification passes, not continuously. Each model counters a specific documented failure mode.

Source: Curated from [thinking-partner](https://github.com/mattnowdev/thinking-partner) model catalog (150+ models). Selected for direct applicability to GSD verification workflow.

## Conflict Resolution

**Inversion** and **Confirmation Bias Counter** both look for failures but serve different purposes. Run them in sequence:

1. **Inversion FIRST** (brainstorm): generate 3 ways this could be wrong
2. **Confirmation Bias Counter SECOND** (structured check): find one partial requirement, one misleading test, one uncovered error path

Inversion generates the list; Confirmation Bias Counter is the discipline to verify items on it.

## 1. Inversion

**Counters:** Verifiers confirming success rather than finding failures.

Instead of checking what IS correct, list 3 specific ways this implementation could be WRONG despite passing tests: missing edge cases, silent data loss, race conditions, unhandled error paths. For each, write a concrete check (grep for pattern, test with specific input, verify error handling exists). Additionally, check whether any documented DEVIATION in SUMMARY.md changes the meaning or applicability of a must-have. If a must-have was written assuming approach A but the executor used approach B, the must-have may need reinterpretation, not literal checking.

## 2. Chesterton's Fence

**Counters:** Flagging purposeful code as dead or unnecessary.

Before flagging any existing code as dead, redundant, or overcomplicated, determine WHY it was written that way. Check git blame, comments, test cases, and the PLAN.md that created it. If the reason is unclear, flag as "purpose unknown -- recommend keeping with WARNING, not removing" and include the git blame hash for the commit that introduced it.

## 3. Confirmation Bias Counter

**Counters:** Verifiers primed by SUMMARY.md claims to see success.

After your initial verification pass, do a DISCONFIRMATION pass: (1) find one requirement that is only partially met, (2) find one test that passes but does not actually test the stated behavior, (3) find one error path that has no test coverage. Report these even if overall verification passes.

## 4. Planning Fallacy Calibration

**Counters:** Accepting over-scoped plans as reasonable (plan-checker).

For each task estimated as "simple" or "small", check: does it touch more than 2 files? Does it require understanding an unfamiliar API? Does it modify shared infrastructure? If yes to any, flag as likely underestimated. Plans with >5 tasks or tasks touching >4 files per task are over-scoped.
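
The checklist above reduces to two small predicates. A sketch with illustrative field names (not the GSD plan schema); the thresholds (>2 files, >5 tasks, >4 files per task) come straight from the text:

```typescript
interface PlannedTask {
  id: string;
  estimate: "simple" | "small" | "medium" | "large";
  filesTouched: number;
  unfamiliarApi: boolean;
  sharedInfra: boolean;
}

// "Simple" or "small" tasks that touch >2 files, need an unfamiliar API,
// or modify shared infrastructure are likely underestimated.
function likelyUnderestimated(t: PlannedTask): boolean {
  if (t.estimate !== "simple" && t.estimate !== "small") return false;
  return t.filesTouched > 2 || t.unfamiliarApi || t.sharedInfra;
}

// Plans with >5 tasks, or any task touching >4 files, are over-scoped.
function overScoped(tasks: PlannedTask[]): boolean {
  return tasks.length > 5 || tasks.some((t) => t.filesTouched > 4);
}
```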

## 5. Counterfactual Thinking

**Counters:** Plans that assume success at every step with no error recovery (plan-checker).

For each plan, ask: "What would happen if the executor followed this plan EXACTLY as written but encountered a common failure: dependency version mismatch, API returning unexpected format, file already modified by prior plan?" If the plan has no contingency path and the `<action>` steps assume success at every point, flag as WARNING: "No error recovery path for task T{n}."

---

## When NOT to Think

Skip structured reasoning models when the situation does not benefit from them:

- **Re-verification of previously passed items** -- When in re-verification mode, items that passed the initial check only need a quick regression check (existence + basic sanity), not the full Inversion + Confirmation Bias Counter treatment.
- **Binary existence checks** -- If a must-have is "file X exists with >N lines" and the file clearly exists with substantive content, do not run Counterfactual Thinking on it. Reserve models for ambiguous or wiring-dependent must-haves.
- **Straightforward test results** -- If `<verify>` commands produce clear pass/fail output (e.g., test suite exits 0 with all tests passing), accept the result. Only invoke models when test results are ambiguous or when you suspect the tests do not actually test what they claim.
- **INFO-level issues** -- Do not apply structured reasoning to decide whether an INFO-level observation is actually a BLOCKER. INFO items are informational by definition and never trigger gates.

@@ -0,0 +1,96 @@
# Thinking Partner Integration

Conditional extended thinking at workflow decision points. Activates when `features.thinking_partner: true` in `.planning/config.json` (default: false).

---

## Tradeoff Detection Signals

The thinking partner activates when developer responses contain specific signals indicating competing priorities:

**Keyword signals:**
- "or" / "versus" / "vs" connecting two approaches
- "tradeoff" / "trade-off" / "tradeoffs"
- "on one hand" / "on the other hand"
- "pros and cons"
- "not sure between" / "torn between"

**Structural signals:**
- Developer lists 2+ competing options
- Developer asks "which is better" or "what would you recommend"
- Developer reverses a previous decision ("actually, maybe we should...")

**When NOT to activate:**
- Developer has already made a clear choice
- The "or" is rhetorical or trivial (e.g., "tabs or spaces" — use project convention)
- Simple yes/no questions
- Developer explicitly asks to move on
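The keyword checks above can be sketched as a lightweight detector. This is an illustrative sketch, not GSD's actual implementation: the function name and regex list are assumptions, and the bare-"or" structural signal is omitted because it needs more than keyword matching to avoid false positives.

```javascript
// Hypothetical tradeoff-signal detector. The keyword list mirrors the
// signals documented above; the bare "or" case is intentionally left out
// because it requires structural analysis of the two options.
const TRADEOFF_PATTERNS = [
  /\b(tradeoffs?|trade-offs?)\b/i,
  /\bversus\b|\bvs\.?\b/i,
  /\bon (?:the )?one hand\b/i,
  /\bpros and cons\b/i,
  /\b(?:not sure|torn) between\b/i,
  /\bwhich is better\b/i,
];

function hasTradeoffSignal(response) {
  return TRADEOFF_PATTERNS.some((re) => re.test(response));
}
```

A real implementation would also apply the "When NOT to activate" rules, e.g. suppressing the signal once the developer has made a clear choice.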

---

## Integration Points

### 1. Discuss Phase — Tradeoff Deep-Dive

**When:** During `discuss_areas` step, after a developer answer reveals competing priorities.

**What:** Pause the normal question flow and offer a brief structured analysis:

```
I notice competing priorities here — {X} optimizes for {A} while {Y} optimizes for {B}.

Want me to think through the tradeoffs before we decide?
[Yes, analyze tradeoffs] / [No, I've decided]
```

If yes, provide a brief (3-5 bullet) analysis covering:
- What each approach optimizes for
- What each approach sacrifices
- Which aligns better with the project's stated goals (from PROJECT.md)
- A recommendation with reasoning

Then return to the normal discussion flow.

### 2. Plan Phase — Architectural Decision Analysis

**When:** During step 11 (Handle Checker Return), when the plan-checker flags issues containing architectural tradeoff keywords.

**What:** Before sending to the revision loop, analyze the architectural decision:

```
The plan-checker flagged an architectural tradeoff: {issue description}

Brief analysis:
- Option A: {approach} — {pros/cons}
- Option B: {approach} — {pros/cons}
- Recommendation: {choice} because {reasoning aligned with phase goals}

Apply this recommendation to the revision? [Yes] / [No, let me decide]
```

### 3. Explore — Approach Comparison (requires #1729)

**When:** During Socratic conversation, when multiple viable approaches emerge.
**Note:** This integration point will be added when `/gsd-explore` (#1729) lands.

---

## Configuration

```json
{
  "features": {
    "thinking_partner": true
  }
}
```

Default: `false`. The thinking partner is opt-in because it adds latency to interactive workflows.

---

## Design Principles

1. **Lightweight** — inline analysis, not a separate interactive session
2. **Opt-in** — must be explicitly enabled, never activates by default
3. **Skippable** — always offer "No, I've decided" to bypass
4. **Brief** — 3-5 bullets max, not a full research report
5. **Aligned** — recommendations reference PROJECT.md goals when available

@@ -21,7 +21,7 @@ Rules that apply to ALL workflows and agents. Individual workflows may have addi

## Subagent Rules

10. **NEVER** use non-GSD agent types (generic agents, `Explore`, `Plan`, `bash`, `feature-dev`, etc.) -- ALWAYS use an `@gsd-{agent}` call (e.g., `@gsd-phase-researcher`, `@gsd-executor`, `@gsd-planner`). GSD agents have project-aware prompts, audit logging, and workflow context. Generic agents bypass all of this.
11. **Do not** re-litigate decisions that are already locked in CONTEXT.md (or PROJECT.md ## Context section) -- respect locked decisions unconditionally.

## Questioning Anti-Patterns

@@ -56,3 +56,8 @@ Reference: `references/questioning.md` for the full anti-pattern list.

25. **Always use `gsd-tools.cjs`** (not `gsd-tools.js` or any other variant) -- GSD uses CommonJS for Node.js CLI compatibility.
26. **Plan files MUST follow the `{padded_phase}-{NN}-PLAN.md` pattern** (e.g., `01-01-PLAN.md`). Never use `PLAN-01.md`, `plan-01.md`, or any other variation -- gsd-tools detection depends on this exact pattern.
27. **Do not start executing the next plan before writing the SUMMARY.md for the current plan** -- downstream plans may reference it via `@` includes.

## iOS / Apple Platform Rules

28. **NEVER use `Package.swift` + `.executableTarget` (or `.target`) as the primary build system for iOS apps.** SPM executable targets produce macOS CLI binaries, not iOS `.app` bundles. They cannot be installed on iOS devices or submitted to the App Store. Use XcodeGen (`project.yml` + `xcodegen generate`) to create a proper `.xcodeproj`. See `references/ios-scaffold.md` for the full pattern.
29. **Verify SwiftUI API availability before use.** Many SwiftUI APIs require a specific minimum iOS version (e.g., `NavigationSplitView` is iOS 16+; `List(selection:)` with multi-select and `@Observable` require iOS 17). If a plan uses an API that exceeds the declared `IPHONEOS_DEPLOYMENT_TARGET`, raise the deployment target or add `#available` guards.

@@ -0,0 +1,227 @@

# Verification Overrides

Mechanism for intentionally accepting must-have failures when the deviation is known and acceptable. Prevents verification loops on items that will never pass as originally specified.

<override_format>

## Override Format

Overrides are declared in the VERIFICATION.md frontmatter under an `overrides:` key:

```yaml
---
phase: 03-authentication
verified: 2026-04-05T12:00:00Z
status: passed
score: 5/5
overrides_applied: 2
overrides:
  - must_have: "OAuth2 PKCE flow implemented"
    reason: "Using session-based auth instead — PKCE unnecessary for server-rendered app"
    accepted_by: "dave"
    accepted_at: "2026-04-04T15:30:00Z"
  - must_have: "Rate limiting on login endpoint"
    reason: "Deferred to Phase 5 (infrastructure) — tracked in ROADMAP.md"
    accepted_by: "dave"
    accepted_at: "2026-04-04T15:30:00Z"
---
```

### Required Fields

| Field | Type | Description |
|-------|------|-------------|
| `must_have` | string | The must-have truth, artifact description, or key link being overridden. Does not need to be an exact match — fuzzy matching applies. |
| `reason` | string | Why this deviation is acceptable. Must be specific — not just "not needed". |
| `accepted_by` | string | Who accepted the override (username or role). Required. |
| `accepted_at` | string | ISO timestamp of when the override was accepted. Required. |
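A sketch of how the required fields might be checked when loading overrides. `validateOverride` is a hypothetical helper for illustration, not part of gsd-tools.

```javascript
// Hypothetical validator for one override entry from the frontmatter.
// All four fields are required strings per the table above.
const REQUIRED_FIELDS = ['must_have', 'reason', 'accepted_by', 'accepted_at'];

function validateOverride(entry) {
  const missing = REQUIRED_FIELDS.filter(
    (field) => typeof entry[field] !== 'string' || entry[field].trim() === ''
  );
  return { valid: missing.length === 0, missing };
}
```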

</override_format>

## When to Use

Overrides apply when a phase intentionally deviated from the original plan during execution — for example, a requirement was descoped, an alternative approach was chosen, or a dependency changed.

Without overrides, the verifier reports these as FAIL even though the deviation was intentional. Overrides let the developer mark specific items as `PASSED (override)` with a documented reason.

Overrides are appropriate when:
- A requirement changed after planning but ROADMAP.md hasn't been updated yet
- An alternative implementation satisfies the intent but not the literal wording
- A must-have is deferred to a later phase with explicit tracking
- External constraints make the original must-have impossible or unnecessary

## When NOT to Use

Overrides are NOT appropriate when:
- The implementation is simply incomplete — fix it instead
- The must-have is unclear — clarify it instead
- The developer wants to skip verification — that undermines the process
- Multiple must-haves are failing for the same phase — if more than 2-3 items need overrides, revisit the plan instead of overriding in bulk

<matching_rules>

## Matching Rules

Override matching uses **fuzzy matching**, not exact string comparison. This accommodates minor wording differences between how must-haves are phrased in ROADMAP.md, PLAN.md frontmatter, and the override entry.

### Matching Algorithm

1. **Normalize both strings:** lowercase both, strip punctuation, collapse whitespace (comparison is case-insensitive)
2. **Token overlap:** split into words, compute the intersection
3. **Match threshold:** 80% token overlap in EITHER direction (override tokens found in must-have, OR must-have tokens found in override)
4. **Key noun priority:** nouns and technical terms (file paths, component names, API endpoints) are weighted higher than common words
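Steps 1-3 can be sketched as follows. This is an illustrative implementation, not the one gsd-tools ships: it omits step 4 (key-noun weighting), so it is stricter than the full algorithm, and the tokenizer details are assumptions.

```javascript
// Normalize (step 1) and split into tokens (step 2).
// Path-ish characters are kept so tokens like "Chat.tsx" survive intact.
function tokenize(s) {
  return s
    .toLowerCase()
    .replace(/[^a-z0-9./_-]+/g, ' ')
    .split(/\s+/)
    .filter(Boolean);
}

// Fraction of tokens in `a` that also appear in `b`.
function overlapRatio(a, b) {
  const setB = new Set(b);
  const hits = a.filter((token) => setB.has(token)).length;
  return a.length ? hits / a.length : 0;
}

// Step 3: match if >=80% of EITHER side's tokens appear in the other.
function overrideMatches(mustHave, override, threshold = 0.8) {
  const m = tokenize(mustHave);
  const o = tokenize(override);
  return overlapRatio(o, m) >= threshold || overlapRatio(m, o) >= threshold;
}
```

Without step 4, looser pairs like the OAuth2 example in the table below would need the noun weighting to cross the threshold.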

### Examples

| Must-Have | Override `must_have` | Match? | Reason |
|-----------|---------------------|--------|--------|
| "User can authenticate via OAuth2 PKCE" | "OAuth2 PKCE flow implemented" | Yes | Key terms `OAuth2` and `PKCE` overlap, 80% threshold met |
| "Rate limiting on /api/auth/login" | "Rate limiting on login endpoint" | Yes | `rate limiting` + `login` overlap |
| "Chat component renders messages" | "OAuth2 PKCE flow implemented" | No | No meaningful token overlap |
| "src/components/Chat.tsx provides message list" | "Chat.tsx message list rendering" | Yes | `Chat.tsx` + `message` + `list` overlap |

### Ambiguity Resolution

If an override matches multiple must-haves, apply it to the **most specific match** (highest token overlap percentage). If still ambiguous, apply to the first match and log a warning.

</matching_rules>

<verifier_behavior>

## Verifier Behavior with Overrides

### Check Order

The override check happens **before marking a must-have as FAIL**. The flow is:

1. Evaluate the must-have against the codebase (Steps 3-5 of the verification process)
2. If the evaluation result is FAIL or UNCERTAIN:
   a. Check the `overrides:` array in VERIFICATION.md frontmatter for a fuzzy match
   b. If an override is found: mark as `PASSED (override)` instead of FAIL
   c. If no override is found: mark as FAIL as normal
3. If the evaluation result is PASS: mark as VERIFIED (overrides are irrelevant)
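The check order above can be sketched as a small resolver. This is illustrative: `findOverride` stands in for the fuzzy matcher described in Matching Rules, and the function and field names are assumptions.

```javascript
// Sketch of the check order: overrides are consulted only when the
// evaluation is FAIL or UNCERTAIN; a PASS never needs an override.
function resolveStatus(evaluation, mustHave, overrides, findOverride) {
  if (evaluation === 'PASS') {
    return { status: 'VERIFIED' };                 // step 3
  }
  const override = findOverride(mustHave, overrides); // step 2a
  if (override) {
    return {                                       // step 2b
      status: 'PASSED (override)',
      evidence: `Override: ${override.reason} — accepted by ${override.accepted_by}`,
    };
  }
  return { status: 'FAILED' };                     // step 2c
}
```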

### Output Format

Overridden items appear with a distinct status in all verification tables:

```markdown
| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | User can authenticate | VERIFIED | OAuth session flow working |
| 2 | OAuth2 PKCE flow | PASSED (override) | Override: Using session-based auth — accepted by dave on 2026-04-04 |
| 3 | Chat renders messages | FAILED | Component returns placeholder |
```

The `PASSED (override)` status must be visually distinct from both `VERIFIED` and `FAILED`. In the evidence column, include the override reason and who accepted it.

### Impact on Overall Status

- `PASSED (override)` items count toward the passing score, not the failing score
- A phase with all items either VERIFIED or PASSED (override) can have status `passed`
- Overrides do NOT suppress `human_needed` items — those still require human testing

### Frontmatter Score

The score and override count in the frontmatter reflect applied overrides:

```yaml
score: 5/5  # includes 2 overrides
overrides_applied: 2
```

</verifier_behavior>

<creating_overrides>

## Creating Overrides

### Interactive Override Suggestion

When the verifier marks a must-have as FAIL and the failure looks intentional (e.g., an alternative implementation exists, or the code explicitly handles the case differently), the verifier should suggest creating an override:

````markdown
### F-002: OAuth2 PKCE flow

**Status:** FAILED
**Evidence:** No PKCE implementation found. Session-based auth used instead.

**This looks intentional.** The codebase uses session-based authentication, which achieves the same goal differently. To accept this deviation, add an override to the VERIFICATION.md frontmatter:

```yaml
overrides:
  - must_have: "OAuth2 PKCE flow implemented"
    reason: "Using session-based auth instead — PKCE unnecessary for server-rendered app"
    accepted_by: "{your name}"
    accepted_at: "{current ISO timestamp}"
```

Then re-run verification to apply.
````

### Override via gsd-tools

Overrides can also be managed through the verification workflow:

1. Run `/gsd-verify-work` — verification finds gaps
2. Review the gaps — determine which are intentional deviations
3. Add override entries to the VERIFICATION.md frontmatter
4. Re-run `/gsd-verify-work` — overrides are applied, remaining gaps shown

</creating_overrides>

<override_lifecycle>

## Override Lifecycle

### During Re-verification

When a phase is re-verified (e.g., after gap closure):
- Existing overrides carry forward automatically
- If the underlying code now satisfies the must-have, the override becomes unnecessary — mark as VERIFIED instead
- Overrides are never removed automatically; they persist as documentation

### At Milestone Completion

During `/gsd-audit-milestone`, overrides are surfaced in the audit report:

```
### Verification Overrides ({count} across {phase_count} phases)

| Phase | Must-Have | Reason | Accepted By |
|-------|----------|--------|-------------|
| 03 | OAuth2 PKCE | Session-based auth used instead | dave |
```

This gives the team visibility into all accepted deviations before closing the milestone.

### Cleanup

Stale overrides (where the must-have was later implemented or removed from ROADMAP.md) can be cleaned up during milestone completion. They are informational — leaving them causes no harm.

</override_lifecycle>

## Example VERIFICATION.md

```markdown
---
phase: 03-api-layer
verified: 2026-04-05T12:00:00Z
status: passed
score: 3/3
overrides_applied: 1
overrides:
  - must_have: "paginated API responses"
    reason: "Descoped — dataset under 100 items, pagination adds complexity without value"
    accepted_by: "dave"
    accepted_at: "2026-04-04T15:30:00Z"
---

## Phase 3: API Layer — Verification

| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | REST endpoints return JSON | VERIFIED | curl tests confirm |
| 2 | Paginated API responses | PASSED (override) | Descoped — see override: dataset under 100 items |
| 3 | Authentication middleware | VERIFIED | JWT validation working |
```