npm - gsd-antigravity-kit - Versions diffs - 1.32.0 → 2.0.0 - Mend

gsd-antigravity-kit 1.32.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (163) hide show

package/.agent/skills/gsd/references/docs/agent-contracts.md ADDED Viewed

@@ -0,0 +1,79 @@
+# Agent Contracts
+Completion markers and handoff schemas for all GSD agents. Workflows use these markers to detect agent completion and route accordingly.
+This doc describes what IS, not what should be. Casing inconsistencies are documented as they appear in agent source files.
+---
+## Agent Registry
+| Agent | Role | Completion Markers |
+|-------|------|--------------------|
+| gsd-planner | Plan creation | `## PLANNING COMPLETE` |
+| gsd-executor | Plan execution | `## PLAN COMPLETE`, `## CHECKPOINT REACHED` |
+| gsd-phase-researcher | Phase-scoped research | `## RESEARCH COMPLETE`, `## RESEARCH BLOCKED` |
+| gsd-project-researcher | Project-wide research | `## RESEARCH COMPLETE`, `## RESEARCH BLOCKED` |
+| gsd-plan-checker | Plan validation | `## VERIFICATION PASSED`, `## ISSUES FOUND` |
+| gsd-research-synthesizer | Multi-research synthesis | `## SYNTHESIS COMPLETE`, `## SYNTHESIS BLOCKED` |
+| gsd-debugger | Debug investigation | `## DEBUG COMPLETE`, `## ROOT CAUSE FOUND`, `## CHECKPOINT REACHED` |
+| gsd-roadmapper | Roadmap creation/revision | `## ROADMAP CREATED`, `## ROADMAP REVISED`, `## ROADMAP BLOCKED` |
+| gsd-ui-auditor | UI review | `## UI REVIEW COMPLETE` |
+| gsd-ui-checker | UI validation | `## ISSUES FOUND` |
+| gsd-ui-researcher | UI spec creation | `## UI-SPEC COMPLETE`, `## UI-SPEC BLOCKED` |
+| gsd-verifier | Post-execution verification | `## Verification Complete` (title case) |
+| gsd-integration-checker | Cross-phase integration check | `## Integration Check Complete` (title case) |
+| gsd-nyquist-auditor | Sampling audit | `## PARTIAL`, `## ESCALATE` (non-standard) |
+| gsd-security-auditor | Security audit | `## OPEN_THREATS`, `## ESCALATE` (non-standard) |
+| gsd-codebase-mapper | Codebase analysis | No marker (writes docs directly) |
+| gsd-assumptions-analyzer | Assumption extraction | No marker (returns `## Assumptions` sections) |
+| gsd-doc-verifier | Doc validation | No marker (writes JSON to `.planning/tmp/`) |
+| gsd-doc-writer | Doc generation | No marker (writes docs directly) |
+| gsd-advisor-researcher | Advisory research | No marker (utility agent) |
+| gsd-user-profiler | User profiling | No marker (returns JSON in analysis tags) |
+| gsd-intel-updater | Codebase intelligence analysis | `## INTEL UPDATE COMPLETE`, `## INTEL UPDATE FAILED` |
+## Marker Rules
+1. **ALL-CAPS markers** (e.g., `## PLANNING COMPLETE`) are the standard convention
+2. **Title-case markers** (e.g., `## Verification Complete`) exist in gsd-verifier and gsd-integration-checker -- these are intentional as-is, not bugs
+3. **Non-standard markers** (e.g., `## PARTIAL`, `## ESCALATE`) in audit agents indicate partial results requiring orchestrator judgment
+4. **Agents without markers** either write artifacts directly to disk or return structured data (JSON/sections) that the caller parses
+5. Markers must appear as H2 headings (`## `) at the start of a line in the agent's final output
+## Key Handoff Contracts
+### Planner -> Executor (via PLAN.md)
+| Field | Required | Description |
+|-------|----------|-------------|
+| Frontmatter | Yes | phase, plan, type, wave, depends_on, files_modified, autonomous, requirements |
+| `<objective>` | Yes | What the plan achieves |
+| `<tasks>` | Yes | Ordered task list with type, files, action, verify, acceptance_criteria |
+| `<verification>` | Yes | Overall verification steps |
+| `<success_criteria>` | Yes | Measurable completion criteria |
+### Executor -> Verifier (via SUMMARY.md)
+| Field | Required | Description |
+|-------|----------|-------------|
+| Frontmatter | Yes | phase, plan, subsystem, tags, key-files, metrics |
+| Commits table | Yes | Per-task commit hashes and descriptions |
+| Deviations section | Yes | Auto-fixed issues or "None" |
+| Self-Check | Yes | PASSED or FAILED with details |
+## Workflow Regex Patterns
+Workflows match these markers to detect agent completion:
+**plan-phase.md matches:**
+- `## RESEARCH COMPLETE` / `## RESEARCH BLOCKED` (researcher output)
+- `## PLANNING COMPLETE` (planner output)
+- `## CHECKPOINT REACHED` (planner/executor pause)
+- `## VERIFICATION PASSED` / `## ISSUES FOUND` (plan-checker output)
+**execute-phase.md matches:**
+- `## PHASE COMPLETE` (all plans in phase done)
+- `## Self-Check: FAILED` (summary self-check)
+> **NOTE:** `## PLAN COMPLETE` is the gsd-executor's completion marker but execute-phase.md does not regex-match it. Instead, it detects executor completion via spot-checks (SUMMARY.md existence, git commit state). This is intentional behavior, not a mismatch.

package/.agent/skills/gsd/references/docs/common-bug-patterns.md ADDED Viewed

@@ -0,0 +1,114 @@
+# Common Bug Patterns
+Checklist of frequent bug patterns to scan before forming hypotheses. Ordered by frequency. Check these FIRST — they cover ~80% of bugs across all technology stacks.
+<patterns>
+## Null / Undefined Access
+- [ ] Accessing property on `null` or `undefined` — missing null check or optional chaining
+- [ ] Function returns `undefined` instead of expected value — missing `return` statement or wrong branch
+- [ ] Array/object destructuring on `null`/`undefined` — API returned error shape instead of data
+- [ ] Optional parameter used without default — caller omitted argument
+## Off-by-One / Boundary
+- [ ] Loop starts at 1 instead of 0, or ends at `length` instead of `length - 1`
+- [ ] Fence-post error — "N items need N-1 separators" miscounted
+- [ ] Inclusive vs exclusive range boundary — `<` vs `<=`, slice/substring end index
+- [ ] Empty collection not handled — `.length === 0` falls through to logic assuming items exist
+## Async / Timing
+- [ ] Missing `await` on async function — gets Promise object instead of resolved value
+- [ ] Race condition — two async operations read/write same state without coordination
+- [ ] Stale closure — callback captures old variable value, not current one
+- [ ] Event handler fires before setup complete — initialization order dependency
+- [ ] Timeout/interval not cleaned up — fires after component/context destroyed
+## State Management
+- [ ] Mutating shared state — object/array modified in place affects other consumers
+- [ ] State updated but UI not re-rendered — missing reactive trigger or wrong reference
+- [ ] Stale state in event handler — closure captures state at bind time, not current value
+- [ ] Multiple sources of truth — same data stored in two places, one gets out of sync
+- [ ] State machine allows invalid transition — missing guard condition
+## Import / Module
+- [ ] Circular dependency — module A imports B, B imports A, one gets `undefined`
+- [ ] Default vs named export mismatch — `import X` vs `import { X }`
+- [ ] Wrong file extension — `.js` vs `.cjs` vs `.mjs`, `.ts` vs `.tsx`
+- [ ] Path case sensitivity — works on Windows/macOS, fails on Linux
+- [ ] Missing file extension in import — ESM requires explicit extensions
+## Type / Coercion
+- [ ] String vs number comparison — `"5" > "10"` is `true` (lexicographic), `5 > 10` is `false`
+- [ ] Implicit type coercion — `==` instead of `===`, truthy/falsy surprises (`0`, `""`, `[]`)
+- [ ] Integer overflow or floating point — `0.1 + 0.2 !== 0.3`, large numbers lose precision
+- [ ] Boolean vs truthy check — value is `0` or `""` which is valid but falsy
+## Environment / Config
+- [ ] Environment variable missing or wrong — different value in dev vs prod vs CI
+- [ ] Hardcoded path or URL — works on one machine, fails on another
+- [ ] Port already in use — previous process still running
+- [ ] File permission denied — different user/group in deployment
+- [ ] Missing dependency — not in package.json or not installed
+## Data Shape / API Contract
+- [ ] API response shape changed — backend updated, frontend expects old format
+- [ ] Array where object expected (or vice versa) — `data` vs `data.results` vs `data[0]`
+- [ ] Missing field in payload — required field omitted, backend returns validation error
+- [ ] Date/time format mismatch — ISO string vs timestamp vs locale string
+- [ ] Encoding mismatch — UTF-8 vs Latin-1, URL encoding, HTML entities
+## Regex / String
+- [ ] Regex `g` flag with `.test()` then `.exec()` — `lastIndex` not reset between calls
+- [ ] Missing escape — `.` matches any char, `$` is special, backslash needs doubling
+- [ ] Greedy match captures too much — `.*` eats through delimiters, need `.*?`
+- [ ] String interpolation in wrong quote type — template literals need backticks
+## Error Handling
+- [ ] Catch block swallows error — empty `catch {}` or logs but doesn't rethrow/handle
+- [ ] Wrong error type caught — catches base `Error` when specific type needed
+- [ ] Error in error handler — cleanup code throws, masking original error
+- [ ] Promise rejection unhandled — missing `.catch()` or try/catch around `await`
+## Scope / Closure
+- [ ] Variable shadowing — inner scope declares same name, hides outer variable
+- [ ] Loop variable capture — all closures share same `var i`, use `let` or bind
+- [ ] `this` binding lost — callback loses context, need `.bind()` or arrow function
+- [ ] Block scope vs function scope — `var` hoisted to function, `let`/`const` block-scoped
+</patterns>
+<usage>
+## How to Use This Checklist
+1. **Before forming any hypothesis**, scan the relevant categories based on the symptom
+2. **Match symptom to pattern** — if the bug involves "undefined is not an object", check Null/Undefined first
+3. **Each checked pattern is a hypothesis candidate** — verify or eliminate with evidence
+4. **If no pattern matches**, proceed to open-ended investigation
+### Symptom-to-Category Quick Map
+| Symptom | Check First |
+|---------|------------|
+| "Cannot read property of undefined/null" | Null/Undefined Access |
+| "X is not a function" | Import/Module, Type/Coercion |
+| Works sometimes, fails sometimes | Async/Timing, State Management |
+| Works locally, fails in CI/prod | Environment/Config |
+| Wrong data displayed | Data Shape, State Management |
+| Off by one item / missing last item | Off-by-One/Boundary |
+| "Unexpected token" / parse error | Data Shape, Type/Coercion |
+| Memory leak / growing resource usage | Async/Timing (cleanup), Scope/Closure |
+| Infinite loop / max call stack | State Management, Async/Timing |
+</usage>

package/.agent/skills/gsd/references/docs/context-budget.md ADDED Viewed

@@ -0,0 +1,49 @@
+# Context Budget Rules
+Standard rules for keeping orchestrator context lean. Reference this in workflows that spawn subagents or read significant content.
+See also: `references/universal-anti-patterns.md` for the complete set of universal rules.
+---
+## Universal Rules
+Every workflow that spawns agents or reads significant content must follow these rules:
+1. **Never** read agent definition files (`agents/*.md`) -- `subagent_type` auto-loads them
+2. **Never** inline large files into subagent prompts -- tell agents to read files from disk instead
+3. **Read depth scales with context window** -- check `context_window_tokens` in `.planning/config.json`:
+   - At < 500000 tokens (default 200k): read only frontmatter, status fields, or summaries. Never read full SUMMARY.md, VERIFICATION.md, or RESEARCH.md bodies.
+   - At >= 500000 tokens (1M model): MAY read full subagent output bodies when the content is needed for inline presentation or decision-making. Still avoid unnecessary reads.
+4. **Delegate** heavy work to subagents -- the orchestrator routes, it doesn't execute
+5. **Proactive warning**: If you've already consumed significant context (large file reads, multiple subagent results), warn the user: "Context budget is getting heavy. Consider checkpointing progress."
+## Read Depth by Context Window
+| Context Window | Subagent Output Reading | SUMMARY.md | VERIFICATION.md | PLAN.md (other phases) |
+|---------------|------------------------|------------|-----------------|------------------------|
+| < 500k (200k model) | Frontmatter only | Frontmatter only | Frontmatter only | Current phase only |
+| >= 500k (1M model) | Full body permitted | Full body permitted | Full body permitted | Current phase only |
+**How to check:** Read `.planning/config.json` and inspect `context_window_tokens`. If the field is absent, treat as 200k (conservative default).
+## Context Degradation Tiers
+Monitor context usage and adjust behavior accordingly:
+| Tier | Usage | Behavior |
+|------|-------|----------|
+| PEAK | 0-30% | Full operations. Read bodies, spawn multiple agents, inline results. |
+| GOOD | 30-50% | Normal operations. Prefer frontmatter reads, delegate aggressively. |
+| DEGRADING | 50-70% | Economize. Frontmatter-only reads, minimal inlining, warn user about budget. |
+| POOR | 70%+ | Emergency mode. Checkpoint progress immediately. No new reads unless critical. |
+## Context Degradation Warning Signs
+Quality degrades gradually before panic thresholds fire. Watch for these early signals:
+- **Silent partial completion** -- agent claims task is done but implementation is incomplete. Self-check catches file existence but not semantic completeness. Always verify agent output meets the plan's must_haves, not just that files exist.
+- **Increasing vagueness** -- agent starts using phrases like "appropriate handling" or "standard patterns" instead of specific code. This indicates context pressure even before budget warnings fire.
+- **Skipped steps** -- agent omits protocol steps it would normally follow. If an agent's success criteria has 8 items but it only reports 5, suspect context pressure.
+When delegating to agents, the orchestrator cannot verify semantic correctness of agent output -- only structural completeness. This is a fundamental limitation. Mitigate with must_haves.truths and spot-check verification.

package/.agent/skills/gsd/references/docs/domain-probes.md ADDED Viewed

@@ -0,0 +1,125 @@
+# Domain-Aware Probing Patterns
+Shared reference for `/gsd-begin`, `/gsd-discuss-phase`, and domain exploration workflows.
+When the user mentions a technology area, use these probes to ask insightful follow-up questions. Don't run through them as a checklist -- pick the 2-3 most relevant based on context. The goal is to surface hidden assumptions and trade-offs the user may not have considered yet.
+---
+## Authentication
+| User mentions | Agent probes with domain knowledge |
+|---|---|
+| "login" or "auth" | OAuth (which providers?), JWT, or session-based? Do you need social login or just email/password? |
+| "users" or "accounts" | MFA required? Password reset flow? Email verification? |
+| "sessions" | Session duration and refresh strategy? Server-side sessions or stateless tokens? |
+| "roles" or "permissions" | RBAC, ABAC, or simple role checks? How many distinct roles? |
+| "API keys" | Key rotation strategy? Scoped permissions per key? Rate limiting per key? |
+---
+## Real-Time Updates
+| User mentions | Agent probes with domain knowledge |
+|---|---|
+| "real-time" or "live updates" | WebSockets, SSE, or polling? What specifically needs to be real-time vs. eventual? |
+| "notifications" | Push notifications (browser/mobile), in-app only, or both? Persistence and read receipts? |
+| "collaboration" or "multiplayer" | Conflict resolution strategy? Operational transforms or CRDTs? Expected concurrent users? |
+| "chat" or "messaging" | Message history and search? Typing indicators? Read receipts? |
+| "streaming" | Reconnection strategy? What happens when the connection drops -- queue or discard? |
+---
+## Dashboard
+| User mentions | Agent probes with domain knowledge |
+|---|---|
+| "dashboard" | What data sources feed it? How many distinct views? |
+| "charts" or "graphs" | Interactive or static? Drill-down capability? Export to CSV/PDF? |
+| "metrics" or "KPIs" | Refresh strategy -- real-time, periodic polling, or on-demand? Acceptable staleness? |
+| "admin panel" | Role-based visibility? Which actions beyond viewing (edit, delete, approve)? |
+| "mobile" or "responsive" | Simplified mobile view or full parity? Touch interactions for charts? |
+---
+## API Design
+| User mentions | Agent probes with domain knowledge |
+|---|---|
+| "API" | REST, GraphQL, or RPC-style? Internal only or public-facing? |
+| "endpoints" or "routes" | Versioning strategy (URL path, header, query param)? Breaking change policy? |
+| "pagination" | Cursor-based or offset? Expected result set sizes? Stable ordering guarantee? |
+| "rate limiting" | Per-user, per-IP, or per-API-key? Burst allowance? How to communicate limits to clients? |
+| "errors" | Structured error format? Error codes vs. messages? How much detail in production errors? |
+---
+## Database
+| User mentions | Agent probes with domain knowledge |
+|---|---|
+| "database" or "storage" | SQL or NoSQL? What drives the choice -- relational integrity, flexibility, scale? |
+| "ORM" or "queries" | ORM (which one?) or raw queries? Query builder as middle ground? |
+| "migrations" | Migration tool? Rollback strategy? How do you handle data migrations vs. schema migrations? |
+| "seeding" or "test data" | Seed data for development? Realistic fake data or minimal fixtures? |
+| "scale" or "performance" | Read/write ratio? Read replicas? Connection pooling strategy? |
+---
+## Search
+| User mentions | Agent probes with domain knowledge |
+|---|---|
+| "search" | Full-text or exact match? Dedicated search engine (Elasticsearch, Meilisearch) or database-level? |
+| "filtering" or "facets" | Faceted filtering? How many filter dimensions? Combined filters (AND/OR)? |
+| "autocomplete" or "typeahead" | Debounce strategy? Minimum character threshold? Result ranking? |
+| "indexing" | Index size and update frequency? Real-time indexing or batch? Acceptable index lag? |
+| "fuzzy" or "typo tolerance" | Fuzzy matching? Synonym support? Language-specific stemming? |
+---
+## File Upload/Storage
+| User mentions | Agent probes with domain knowledge |
+|---|---|
+| "upload" or "file upload" | Local filesystem or cloud (S3, GCS, Azure Blob)? Direct upload or through server? |
+| "images" or "media" | Processing pipeline -- resize, compress, thumbnail generation? Format conversion? |
+| "size limits" | Max file size? Max total storage per user? What happens when limits are hit? |
+| "CDN" | CDN for delivery? Cache invalidation for updated files? Signed URLs for access control? |
+| "documents" or "attachments" | Virus scanning? Preview generation? Versioning of uploaded files? |
+---
+## Caching
+| User mentions | Agent probes with domain knowledge |
+|---|---|
+| "caching" or "performance" | Where to cache -- browser, CDN, application layer, database query cache? |
+| "invalidation" | Invalidation strategy -- TTL, event-driven, or manual? Cache-aside vs. write-through? |
+| "stale data" | Acceptable staleness window? Stale-while-revalidate pattern? |
+| "Redis" or "Memcached" | Cache topology -- single node or clustered? Persistence needed or pure cache? |
+| "CDN" or "edge" | Edge caching for static assets? Dynamic content at the edge? Cache key strategy? |
+---
+## Testing
+| User mentions | Agent probes with domain knowledge |
+|---|---|
+| "testing" or "tests" | Unit, integration, and E2E balance? Where do you invest most testing effort? |
+| "mocking" or "stubs" | Mock external services or use test containers? Database mocking strategy? |
+| "CI" or "pipeline" | Tests in CI? Parallel test execution? Test-on-PR or test-on-push? |
+| "coverage" | Coverage targets? Coverage as gate or advisory? Which metrics (line, branch, function)? |
+| "E2E" or "browser testing" | Playwright, Cypress, or other? Headed vs. headless? Visual regression testing? |
+---
+## Deployment
+| User mentions | Agent probes with domain knowledge |
+|---|---|
+| "deploy" or "hosting" | Container, serverless, or traditional VM/VPS? Managed platform (Vercel, Railway) or self-hosted? |
+| "CI/CD" or "pipeline" | GitHub Actions, GitLab CI, or other? Deploy on merge to main or manual trigger? |
+| "environments" | How many environments (dev, staging, prod)? Environment parity strategy? |
+| "rollback" | Rollback strategy? Blue-green, canary, or instant rollback? Database rollback considerations? |
+| "secrets" or "config" | Secret management -- env vars, vault, or platform-native? Per-environment config strategy? |

package/.agent/skills/gsd/references/docs/few-shot-examples/plan-checker.md ADDED Viewed

@@ -0,0 +1,73 @@
+---
+component: plan-checker
+version: 1
+last_calibrated: 2026-03-24
+---
+# Plan-Checker Few-Shot Examples
+## Positive Examples
+### Example 1: BLOCKER flagged for vague task action missing file and function names
+**Input:** Plan task with action: "Implement the authentication feature for the API routes."
+**Output:**
+> ```yaml
+> issues:
+>   - dimension: task_completeness
+>     severity: BLOCKER
+>     finding: "Task T1 action says 'implement the authentication feature' without naming target files, functions to create, or middleware to apply. Executor cannot determine what to build."
+>     affected_field: "<action>"
+>     suggested_fix: "Specify: create authMiddleware in src/middleware/auth.js, apply to routes in src/routes/api.js lines 12-45, verify with integration test"
+> ```
+**Why this is good:** The checker cited the specific dimension (task_completeness), quoted the problematic text, explained why it is a blocker (executor cannot determine what to build), and gave a concrete fix with file paths and function names. The finding is actionable -- the planner knows exactly what to add.
+### Example 2: BLOCKER for same-wave file conflict between two plans
+**Input:** Two plans in wave 1 -- Plan 01 modifies `get-shit-done/workflows/execute-phase.md`, Plan 02 also modifies `get-shit-done/workflows/execute-phase.md`. No `depends_on` declared.
+**Output:**
+> ```yaml
+> issues:
+>   - dimension: dependency_correctness
+>     severity: BLOCKER
+>     finding: "Plans 01 and 02 both modify get-shit-done/workflows/execute-phase.md in wave 1 with no depends_on relationship. Concurrent execution will cause merge conflicts or lost changes."
+>     affected_field: "files_modified"
+>     suggested_fix: "Either move Plan 02 to wave 2 with depends_on: ['01'] or consolidate the file changes into a single plan"
+> ```
+**Why this is good:** The checker identified a real structural problem -- two plans modifying the same file in the same wave without a dependency relationship. It cited dependency_correctness, named both plans, the conflicting file, and provided two alternative fixes.
+## Negative Examples
+### Example 1: All dimensions pass when verify commands are missing
+**Input:** Plan with 3 tasks. Task T2 has `<verify></verify>` (empty). Task T3 has `<verify>echo "done"</verify>` (trivial, cannot distinguish pass from fail).
+**Output:**
+> ## CHECK PASSED
+> Plans: 1 | Tasks: 3 | Dimensions: 10 | Issues: 0
+**Why this is bad:** The checker missed two violations. verification_derivation requires verify commands to be runnable and test actual output -- an empty verify block is a BLOCKER. Verify commands must distinguish pass from fail -- `echo "done"` always returns 0 regardless of implementation state. Both should have been flagged.
+### Example 2: False positive INFO for a correctly-sized plan
+**Input:** Plan with 3 tasks, 6 unique files, 1 dependency. All within limits.
+**Output:**
+> ```yaml
+> issues:
+>   - dimension: scope_sanity
+>     severity: INFO
+>     finding: "Plan has 3 tasks -- consider splitting into smaller plans for faster iteration"
+>     affected_field: "task count"
+>     suggested_fix: "Split tasks into separate plans"
+> ```
+**Why this is bad:** The checker flagged a non-issue. scope_sanity allows 2-3 tasks per plan -- 3 tasks is within limits. The checker applied a personal preference ("smaller is better") rather than the documented threshold. This wastes planner time on false positives and erodes trust in the checker's judgment. A correct check would produce no issue for this plan.

package/.agent/skills/gsd/references/docs/few-shot-examples/verifier.md ADDED Viewed

@@ -0,0 +1,109 @@
+---
+component: verifier
+version: 2
+last_calibrated: 2026-03-25
+calibration_source: "Adapted from thinking-partner verification corpus"
+---
+# Verifier Few-Shot Examples
+## Positive Examples
+### Example 1: Thorough multi-level artifact verification with real gap discovery
+**Input:** Must-have artifact: `get-shit-done/workflows/context-bridge.md: >50 lines`
+**Output:**
+> **L1 (Existence):** EXISTS -- `get-shit-done/workflows/context-bridge.md` found, 187 lines.
+> **L2 (Substantive):** SUBSTANTIVE -- No TODO/FIXME markers, 4 defined sections (`updateContextTier`, `getContextBudget`, `injectContextWarning`, `resetTierState`), non-trivial logic with config reads and threshold calculations.
+> **L3 (Wired):** WIRED -- Referenced by `get-shit-done/workflows/execute-phase.md` at step 3. Called by `agents/gsd-executor.md` via @-reference include. Arguments verified: receives planning directory path matching the expected parameter.
+> **Status: PASSED**
+**Why this is good:** The verifier checked all three levels independently with specific evidence at each level -- file path, line count, exported symbols, import location, and argument verification. It did not stop at "file exists" but traced wiring through to actual usage with correct arguments.
+### Example 2: Correct FAILED classification with grep evidence
+**Input:** Must-have truth: `validate-commit.js rejects commits missing scope parentheses`
+**Output:**
+> **Method:** Ran `grep -n "scope" get-shit-done/bin/validate-commit.js` to locate scope validation logic.
+> **Evidence:** Lines 34-41 show regex `/^(feat|fix|refactor|test|docs|chore|wip)\(/` which requires opening paren after type. However, no test for MISSING closing paren exists -- the regex accepts `feat(: description` (missing scope content and closing paren).
+> **Ran:** `echo "feat(: test" | node get-shit-done/bin/validate-commit.js` -- exit code 0 (accepted).
+> **Status: FAILED** -- Commit messages with empty scope `feat(: desc)` are accepted. The truth "rejects commits missing scope parentheses" does not hold for empty-scope edge case.
+**Why this is good:** The verifier did not trust the code's intent -- it ran an actual test with an edge case input and reported the specific failure. Evidence includes the regex, the test command, the exit code, and why the truth does not hold.
+### Example 3: Catching missing wiring (calibration-derived gap pattern)
+**Input:** Must-have key_link: `check-subagent-output.js imports and calls feedback-loop.js extractFeedback()`
+**Output:**
+> **Method:** Ran `grep -n "feedback-loop\|extractFeedback" get-shit-done/bin/check-subagent-output.js`.
+> **Evidence:** Zero matches. The file `feedback-loop.js` exists (172 lines, substantive), and `tests/feedback-loop.test.js` passes (229 lines). But `check-subagent-output.js` never imports or calls it.
+> **Cross-check:** `grep -rn "require.*feedback-loop" get-shit-done/bin/` returns zero results across all scripts.
+> **Status: FAILED** -- The artifact exists and is tested, but is completely orphaned. The key_link "imports and calls" does not hold. This is a **missing wiring** gap.
+**Why this is good:** The verifier caught a common calibration pattern: code that exists, is tested, but is never actually wired into the system. L1 and L2 pass, but L3 (wiring) fails. This pattern accounts for 37% of all gaps found in calibration corpus.
+### Example 4: Catching missing config schema entries (calibration-derived gap pattern)
+**Input:** Must-have truth: `Config toggles features.architecture_graph and features.architecture_guard exist with defaults`
+**Output:**
+> **Method:** Ran `grep "architecture_graph\|architecture_guard" get-shit-done/bin/lib/config-schema.json`.
+> **Evidence:** Zero matches in schema. However, `grep "architecture" get-shit-done/bin/lib/config.cjs` shows both flags in CONFIG_DEFAULTS (lines 426, 428).
+> **Gap:** Defaults exist in code but schema validation doesn't know about them. `configValidate()` would strip these fields on load, making the defaults unreachable.
+> **Status: FAILED** -- Config flags work in code but are invisible to schema validation. This is a **missing wiring** gap between config layers.
+**Why this is good:** The verifier checked multiple config layers (schema vs defaults vs code) rather than just one. Config wiring requires schema + defaults + code to agree -- checking only one layer misses layer-mismatch bugs.
+### Example 5: Distinguishing planning gaps from execution failures
+**Input:** Must-have truth: `All MSYS regex patterns consolidated into shared module`
+**Output:**
+> **Method:** Ran `grep -rn "\/\[a-zA-Z\]\/" get-shit-done/bin/` to find inline MSYS regex.
+> **Evidence:** 2 matches in `commands/gsd/misc.md` (lines 26, 487). These were NOT in the original plan scope -- Phase 148 extracted code into misc.md AFTER Phase 149's plan was written.
+> **Assessment:** The executor correctly completed all planned tasks. The gap is a **planning ordering issue**, not an execution failure.
+> **Status: PASS_WITH_NOTES** -- Executor did what was planned. Unconsolidated regex noted as follow-up, not as a failure of this phase.
+**Why this is good:** The verifier distinguished between what was planned vs what exists now. Code that changed AFTER plan creation is a planning gap, not an executor failure. This prevents false FAIL verdicts on correctly-executed work.
+## Calibration-Derived Gap Patterns
+Based on analysis of verification corpus (80% pass rate, 8 gaps):
+| Pattern | Count | % of Gaps | What to Check |
+|---------|-------|-----------|---------------|
+| Missing wiring | 3 | 37% | File exists + tested but never imported/called by consumer |
+| Missing tests | 2 | 25% | Feature works but test file excluded from CI or never runs |
+| Other | 3 | 38% | Config layer mismatches, planning ordering issues, schema gaps |
+## Negative Examples
+### Example 1: Blanket pass with no per-criterion evidence
+**Input:** Must-haves: 5 truths, 3 artifacts, 2 key_links from plan frontmatter.
+**Output:**
+> All must-haves have been verified. The implementation matches the plan requirements. Files exist and are properly connected. Status: PASSED.
+> must_haves_checked: 10, must_haves_passed: 10
+**Why this is bad:** No individual must-have is listed or checked. No file paths, no line numbers, no grep output, no level-by-level breakdown. The verifier restated the must-haves as passing without showing any evidence. This would miss stubs, orphaned exports, and broken wiring.
+### Example 2: File existence check without substance or wiring verification
+**Input:** Must-have artifact: `get-shit-done/bin/lib/audit-checks/index.cjs: >100 lines`
+**Output:**
+> **L1 (Existence):** EXISTS -- File found at `get-shit-done/bin/lib/audit-checks/index.cjs`, 142 lines.
+> **Status: PASSED**
+**Why this is bad:** The verifier stopped at Level 1. The file has 142 lines but could contain `// TODO: implement all checks` with stub functions returning empty objects. Level 2 (substantive) and Level 3 (wired) were skipped entirely. A file that exists but is never imported or contains only placeholder code should not pass.

package/.agent/skills/gsd/references/docs/gate-prompts.md ADDED Viewed

@@ -0,0 +1,100 @@
+# Gate Prompt Patterns
+Reusable prompt patterns for structured gate checks in workflows and agents.
+**For checkpoint box format details, see `references/ui-brand.md`** -- checkpoint boxes use double-line border drawing with 62-character inner width.
+## Rules
+- `header` must be max 12 characters
+- `multiSelect` is always `false` for gate checks
+- Always handle the "Other" case (user typed a freeform response instead of selecting)
+- Max 4 options per prompt -- if more are needed, use a 2-step flow
+---
+## Pattern: approve-revise-abort
+3-option gate for plan approval, gap-closure approval.
+- question: "Approve these {noun}?"
+- header: "Approve?"
+- options: Approve | Request changes | Abort
+## Pattern: yes-no
+Simple 2-option confirmation for re-planning, rebuild, replace plans, commit.
+- question: "{Specific question about the action}"
+- header: "Confirm"
+- options: Yes | No
+## Pattern: stale-continue
+2-option refresh gate for staleness warnings, timestamp freshness.
+- question: "{Artifact} may be outdated. Refresh or continue?"
+- header: "Stale"
+- options: Refresh | Continue anyway
+## Pattern: yes-no-pick
+3-option selection for seed selection, item inclusion.
+- question: "Include {items} in planning?"
+- header: "Include?"
+- options: Yes, all | Let me pick | No
+## Pattern: multi-option-failure
+4-option failure handler for build failures.
+- question: "Plan {id} failed. How should we proceed?"
+- header: "Failed"
+- options: Retry | Skip | Rollback | Abort
+## Pattern: multi-option-escalation
+4-option escalation for review escalation (max retries exceeded).
+- question: "Phase {N} has failed verification {attempt} times. How should we proceed?"
+- header: "Escalate"
+- options: Accept gaps | Re-plan (via /gsd-plan-phase) | Debug (via /gsd-debug) | Retry
+## Pattern: multi-option-gaps
+4-option gap handler for review gaps-found.
+- question: "{count} verification gaps need attention. How should we proceed?"
+- header: "Gaps"
+- options: Auto-fix | Override | Manual | Skip
+## Pattern: multi-option-priority
+4-option priority selection for milestone gap priority.
+- question: "Which gaps should we address?"
+- header: "Priority"
+- options: Must-fix only | Must + should | Everything | Let me pick
+## Pattern: toggle-confirm
+2-option confirmation for enabling/disabling boolean features.
+- question: "Enable {feature_name}?"
+- header: "Toggle"
+- options: Enable | Disable
+## Pattern: action-routing
+Up to 4 suggested next actions with selection (status, resume workflows).
+- question: "What would you like to do next?"
+- header: "Next Step"
+- options: {primary action} | {alternative 1} | {alternative 2} | Something else
+- Note: Dynamically generate options from workflow state. Always include "Something else" as last option.
+## Pattern: scope-confirm
+3-option confirmation for quick task scope validation.
+- question: "This task looks complex. Proceed as quick task or use full planning?"
+- header: "Scope"
+- options: Quick task | Full plan (via /gsd-plan-phase) | Revise
+## Pattern: depth-select
+3-option depth selection for planning workflow preferences.
+- question: "How thorough should planning be?"
+- header: "Depth"
+- options: Quick (3-5 phases, skip research) | Standard (5-8 phases, default) | Comprehensive (8-12 phases, deep research)
+## Pattern: context-handling
+3-option handler for existing CONTEXT.md in discuss workflow.
+- question: "Phase {N} already has a CONTEXT.md. How should we handle it?"
+- header: "Context"
+- options: Overwrite | Append | Cancel
+## Pattern: gray-area-option
+Dynamic template for presenting gray area choices in discuss workflow.
+- question: "{Gray area title}"
+- header: "Decision"
+- options: {Option 1} | {Option 2} | Let Antigravity decide
+- Note: Options generated at runtime. Always include "Let Antigravity decide" as last option.