npm - gsd-pi - Versions diffs - 2.48.0 → 2.49.0-dev.9e177e9 - Mend

gsd-pi 2.48.0 → 2.49.0-dev.9e177e9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (369) hide show

package/src/resources/extensions/gsd/prompts/discuss-headless.md CHANGED Viewed

@@ -1,86 +1,253 @@
 # Headless Milestone Creation
-You are creating a GSD milestone from a provided specification document. This is a **headless** (non-interactive) flow — do NOT ask the user any questions. Work entirely from the provided specification.
+You are creating a GSD milestone from a provided specification document. This is a **headless** (non-interactive) flow — do NOT ask the user any questions. Wherever the interactive flow would ask the user, make your best-judgment call and document it as an assumption.
 ## Provided Specification
 {{seedContext}}
-## Your Task
+## Reflection Step
-### Step 1: Reflect
+Summarize your understanding of the specification concretely — not abstractly:
-Summarize your understanding of the specification concretely:
-- What is being built
-- Major capabilities/features
-- Scope estimate (how many milestones × slices)
-- Any ambiguities or gaps you notice
+1. Summarize what is being built in your own words.
+2. Give an honest size read: roughly how many milestones, roughly how many slices in the first one. Base this on the actual work involved, not a classification label.
+3. Include scope honesty — a bullet list of the major capabilities: "Here's what I'm reading from the spec: [bullet list of major capabilities]."
+4. Note any ambiguities, gaps, or areas where the spec is vague.
-### Step 2: Investigate (brief)
+Print this reflection in chat. Do not skip this step.
-Quickly scout the codebase to understand what already exists — spend no more than 5-6 tool calls here:
-- `ls` the project root and key directories
-- Search for relevant existing code, patterns, dependencies
-- Check library docs if needed (`resolve_library` / `get_library_docs`)
+## Vision Mapping
-Then move on to writing artifacts. Do not explore exhaustively — the research phase will do deeper investigation later.
+Decide the approach based on the actual scope:
-### Step 3: Make Decisions
+**If the work spans multiple milestones:** Map the full landscape:
+1. Propose a milestone sequence — names, one-line intents, rough dependencies
+2. Print this in chat as the working milestone sequence
-For any ambiguities or gaps in the specification:
-- Make your best-guess decision based on the spec's intent, codebase patterns, and domain conventions
-- Document each assumption clearly in the Context file
+**If the work fits in a single milestone:** Proceed directly to investigation.
-### Step 4: Assess Scope
+**Anti-reduction rule:** If the spec describes a big vision, plan the big vision. Do not reduce scope. Phase complex/risky work into later milestones — do not cut it. The spec's ambition is the target, and your job is to sequence it intelligently, not shrink it.
-Based on reflection + investigation:
-- Is this a single milestone or multiple milestones?
-- If multi-milestone: plan the full sequence with dependencies
+## Mandatory Investigation
-### Step 5: Write Artifacts
+Do a mandatory investigation pass before making any decisions. This is not optional.
-**Milestone ID**: {{milestoneId}}
+1. **Scout the codebase** — `ls`, `find`, `rg`, or `scout` for broad unfamiliar areas. Understand what already exists, what patterns are established, what constraints current code imposes.
+2. **Check library docs** — `resolve_library` / `get_library_docs` for any tech mentioned in the spec. Get current facts about capabilities, constraints, API shapes, version-specific behavior.
+3. **Web search** — `search-the-web` if the domain is unfamiliar, if you need current best practices, or if the spec references external services/APIs you need facts about. Use `fetch_page` for full content when snippets aren't enough.
-Use these templates exactly:
+**Web search budget:** Budget carefully across investigation + focused research:
+- Prefer `resolve_library` / `get_library_docs` over `web_search` for library documentation.
+- Prefer `search_and_read` for one-shot topic research.
+- Target 2-3 web searches in this investigation pass. Save remaining budget for focused research.
+- Do NOT repeat the same or similar queries.
-{{inlinedTemplates}}
+The goal: your decisions should reflect what's actually true in the codebase and ecosystem, not what you assume.
+## Autonomous Decision-Making
+For every area where the spec is ambiguous, vague, or silent:
+- Apply the depth checklist (below) to identify what needs resolution
+- Make your best-judgment call based on: the spec's intent, codebase patterns, domain conventions, and investigation findings
+- **Document every assumption** in the Context file under an "Assumptions" section
+- For each assumption, note: what the spec said (or didn't say), what you decided, and why
+### Depth Checklist
+Ensure ALL of these are resolved before writing artifacts — from the spec + investigation, not by asking:
+- [ ] **What is being built** — concrete enough that you could explain it to a stranger
+- [ ] **Why it needs to exist** — the problem it solves or the desire it fulfills
+- [ ] **Who it's for** — even if just the spec author
+- [ ] **What "done" looks like** — observable outcomes, not abstract goals
+- [ ] **The biggest technical unknowns / risks** — what could fail, what hasn't been proven
+- [ ] **What external systems/services this touches** — APIs, databases, third-party services, hardware
+If the spec leaves any of these unresolved, make your best-judgment call and document it.
+## Depth Verification
+Print a structured depth summary in chat covering:
+- What you understood the spec to describe
+- Key technical findings from investigation
+- Assumptions you made and why
+- Areas where you're least confident
+This is your audit trail. Print it — do not skip it.
+## Focused Research
+Do a focused research pass before roadmap creation.
+Research is advisory, not auto-binding. Use the spec + investigation to identify:
+- table stakes the product space usually expects
+- domain-standard behaviors that may be implied but not stated
+- likely omissions that would make the product feel incomplete
+- plausible anti-features or scope traps
+- differentiators worth preserving
+For multi-milestone visions, research should cover the full landscape, not just the first milestone. Research findings may affect milestone sequencing, not just slice ordering within M001.
+**Key difference from interactive flow:** Where the interactive flow would present research-surfaced candidate requirements for the user to confirm/defer/reject, you instead apply your best judgment. If a research finding clearly aligns with the spec's intent, include it. If it's tangential or would expand scope beyond what the spec describes, defer it or mark it out of scope. Document the reasoning.
+## Capability Contract
+Before writing a roadmap, produce `.gsd/REQUIREMENTS.md`.
+Use it as the project's explicit capability contract.
+Requirements must be organized into:
+- Active
+- Validated
+- Deferred
+- Out of Scope
+- Traceability
+Each requirement should include:
+- stable ID (`R###`)
+- title
+- class
+- status
+- description
+- why it matters
+- source (`spec`, `inferred`, `research`, or `execution`)
+- primary owning slice
+- supporting slices
+- validation status
+- notes
+Rules:
+- Keep requirements capability-oriented, not a giant feature inventory
+- Every Active requirement must either be mapped to a roadmap owner, explicitly deferred, blocked with reason, or moved out of scope
+- Product-facing work should capture launchability, primary user loop, continuity, and failure visibility when relevant
+- Later milestones may have provisional ownership, but the first planned milestone should map requirements to concrete slices wherever possible
+For multi-milestone projects, requirements should span the full vision. Requirements owned by later milestones get provisional ownership. The full requirement set captures the spec's complete vision — milestones are the sequencing strategy, not the scope boundary.
+**Print the requirements in chat before writing the roadmap.** Print a markdown table with columns: ID, Title, Status, Owner, Source. Group by status (Active, Deferred, Out of Scope).
+## Scope Assessment
+Confirm the size estimate from your reflection still holds. Investigation and research often reveal hidden complexity or simplify things. If the scope grew or shrank significantly, adjust the milestone and slice counts accordingly.
+## Output Phase
+### Roadmap Preview
+Before writing any files, **print the planned roadmap in chat**. Print a markdown table with columns: Slice, Title, Risk, Depends, Demo. One row per slice. Below the table, print the milestone definition of done as a bullet list.
+This is the user's audit trail in the TUI scrollback — do not skip it.
+### Naming Convention
-**For single milestone**, write in this order:
+Directories use bare IDs. Files use ID-SUFFIX format. Titles live inside file content, not in names.
+- Milestone dir: `.gsd/milestones/{{milestoneId}}/`
+- Milestone files: `{{milestoneId}}-CONTEXT.md`, `{{milestoneId}}-ROADMAP.md`
+- Slice dirs: `S01/`, `S02/`, etc.
+### Single Milestone
+In a single pass:
 1. `mkdir -p .gsd/milestones/{{milestoneId}}/slices`
-2. Write `.gsd/PROJECT.md` (using Project template)
-3. Write `.gsd/REQUIREMENTS.md` (using Requirements template)
-4. Write `{{contextPath}}` (using Context template) — preserve the specification's exact terminology, emphasis, and specific framing. Do not paraphrase domain-specific language into generics. Document assumptions under an "Assumptions" section.
-5. Write `{{roadmapPath}}` (using Roadmap template) — decompose into demoable vertical slices with checkboxes, risk, depends, demo sentences, proof strategy, verification classes, milestone definition of done, requirement coverage, and a boundary map. If the milestone crosses multiple runtime boundaries, include an explicit final integration slice.
-6. Seed `.gsd/DECISIONS.md` (using Decisions template)
+2. Write or update `.gsd/PROJECT.md` — use the **Project** output template below. Describe what the project is, its current state, and list the milestone sequence.
+3. Write or update `.gsd/REQUIREMENTS.md` — use the **Requirements** output template below. Confirm requirement states, ownership, and traceability before roadmap creation.
+**Depth-Preservation Guidance for context.md:**
+Preserve the specification's exact terminology, emphasis, and specific framing. Do not paraphrase domain-specific language into generics. If the spec said "craft feel," write "craft feel" — not "high-quality user experience." The context file is downstream agents' only window into this conversation — flattening specifics into generics loses the signal that shaped every decision.
+4. Write `{{contextPath}}` — use the **Context** output template below. Preserve key risks, unknowns, existing codebase constraints, integration points, and relevant requirements surfaced during research. Include an "Assumptions" section documenting every judgment call.
+5. Call `gsd_plan_milestone` to create the roadmap. Decompose into demoable vertical slices with risk, depends, demo sentences, proof strategy, verification classes, milestone definition of done, requirement coverage, and a boundary map. If the milestone crosses multiple runtime boundaries, include an explicit final integration slice that proves the assembled system works end-to-end in a real environment. Use the **Roadmap** output template below to structure the tool call parameters.
+6. For each architectural or pattern decision, call `gsd_decision_save` — the tool auto-assigns IDs and regenerates `.gsd/DECISIONS.md` automatically.
 7. {{commitInstruction}}
-9. Say exactly: "Milestone {{milestoneId}} ready."
-**For multi-milestone**, write in this order:
+After writing the files, say exactly: "Milestone {{milestoneId}} ready." — nothing else. Auto-mode will start automatically.
+### Multi-Milestone
+#### Phase 1: Shared artifacts
 1. For each milestone, call `gsd_milestone_generate_id` to get its ID — never invent milestone IDs manually. Then `mkdir -p .gsd/milestones/<ID>/slices` for each.
-2. Write `.gsd/PROJECT.md` — full vision across ALL milestones (using Project template)
-3. Write `.gsd/REQUIREMENTS.md` — full capability contract (using Requirements template)
-4. Seed `.gsd/DECISIONS.md` (using Decisions template)
-5. Write PRIMARY `{{contextPath}}` — full context with all assumptions documented
-6. Write PRIMARY `{{roadmapPath}}` — detailed slices for the first milestone only
-7. For each remaining milestone, write full CONTEXT.md with `depends_on` frontmatter:
-   ```yaml
-   ---
-   depends_on: [M001, M002]
-   ---
-   # M003: Title
-   ```
-   Each context file should be rich enough that a future agent — with no memory of this conversation — can understand the intent, constraints, dependencies, what the milestone unlocks, and what "done" looks like.
-8. {{multiMilestoneCommitInstruction}}
-10. Say exactly: "Milestone {{milestoneId}} ready."
+2. Write `.gsd/PROJECT.md` — use the **Project** output template below.
+3. Write `.gsd/REQUIREMENTS.md` — use the **Requirements** output template below. Capture Active, Deferred, Out of Scope, and any already Validated requirements. Later milestones may have provisional ownership where slice plans do not exist yet.
+4. For any architectural or pattern decisions, call `gsd_decision_save` — the tool auto-assigns IDs and regenerates `.gsd/DECISIONS.md` automatically.
+#### Phase 2: Primary milestone
+5. Write a full `CONTEXT.md` for the primary milestone (the first in sequence). Include an "Assumptions" section.
+6. Call `gsd_plan_milestone` for **only the primary milestone** — detail-planning later milestones now is waste because the codebase will change. Include requirement coverage and a milestone definition of done.
+#### MANDATORY: depends_on Frontmatter in CONTEXT.md
+Every CONTEXT.md for a milestone that depends on other milestones MUST have YAML frontmatter with `depends_on`. The auto-mode state machine reads this field to determine execution order — without it, milestones may execute out of order or in parallel when they shouldn't.
+```yaml
+---
+depends_on: [M001, M002]
+---
+# M003: Title
+```
+If a milestone has no dependencies, omit the frontmatter. Do NOT rely on QUEUE.md or PROJECT.md for dependency tracking — the state machine only reads CONTEXT.md frontmatter.
+#### Phase 3: Remaining milestones
+For each remaining milestone, in dependency order, autonomously decide the best readiness mode:
+- **Write full context** — if the spec provides enough detail for this milestone and investigation confirms feasibility. Write a full `CONTEXT.md` with technical assumptions verified against the actual codebase.
+- **Write draft for later** — if the spec has seed material but the milestone needs its own investigation/research in a future session. Write a `CONTEXT-DRAFT.md` capturing seed material, key ideas, provisional scope, and open questions. **Downstream:** Auto-mode pauses at this milestone and prompts the user to discuss.
+- **Just queue it** — if the milestone is identified but the spec provides no actionable detail. No context file written. **Downstream:** Auto-mode pauses and starts a full discussion from scratch.
+**Default to writing full context** when the spec is detailed enough. Default to draft when the spec mentions the milestone but is vague. Default to queue when the milestone is implied by the vision but not described.
+**Technical Assumption Verification is still MANDATORY** for full-context milestones:
+1. Read the actual code for every file or module you reference. Confirm APIs exist, check what functions actually do.
+2. Check for stale assumptions — verify referenced modules still work as described.
+3. Print findings in chat before writing each milestone's CONTEXT.md.
+Each context file (full or draft) should be rich enough that a future agent encountering it fresh — with no memory of this conversation — can understand the intent, constraints, dependencies, what this milestone unlocks, and what "done" looks like.
+#### Milestone Gate Tracking (MANDATORY for multi-milestone)
+After deciding each milestone's readiness, immediately write or update `.gsd/DISCUSSION-MANIFEST.json`:
+```json
+{
+  "primary": "M001",
+  "milestones": {
+    "M001": { "gate": "discussed", "context": "full" },
+    "M002": { "gate": "discussed", "context": "full" },
+    "M003": { "gate": "queued",    "context": "none" }
+  },
+  "total": 3,
+  "gates_completed": 3
+}
+```
+Write this file AFTER each gate decision, not just at the end. Update `gates_completed` incrementally. The system reads this file and BLOCKS auto-start if `gates_completed < total`.
+For single-milestone projects, do NOT write this file.
+#### Phase 4: Finalize
+7. {{multiMilestoneCommitInstruction}}
+After writing the files, say exactly: "Milestone {{milestoneId}} ready." — nothing else. Auto-mode will start automatically.
 ## Critical Rules
-- **DO NOT ask the user any questions** — this is headless mode
+- **DO NOT ask the user any questions** — this is headless mode. Make judgment calls and document them.
 - **Preserve the specification's terminology** — don't paraphrase domain-specific language
-- **Document assumptions** — when you make a judgment call, note it in CONTEXT.md under "Assumptions"
-- **Investigate before writing** — always scout the codebase first
-- **Use depends_on frontmatter** for multi-milestone sequences (the state machine reads this field to determine execution order)
-- **Anti-reduction rule** — if the spec describes a big vision, plan the big vision. Do not ask "what's the minimum viable version?" or reduce scope. Phase complex/risky work into later milestones — do not cut it.
-- **Naming convention** — always use `gsd_milestone_generate_id` to get milestone IDs. Directories use bare IDs (e.g. `M001/` or `M001-r5jzab/`), files use ID-SUFFIX format (e.g. `M001-CONTEXT.md` or `M001-r5jzab-CONTEXT.md`). Never invent milestone IDs manually.
+- **Document assumptions** — every judgment call gets noted in CONTEXT.md under "Assumptions" with reasoning
+- **Investigate thoroughly** — scout codebase, check library docs, web search. Same rigor as interactive mode.
+- **Do focused research** — identify table stakes, domain standards, omissions, scope traps. Same rigor as interactive mode.
+- **Use proper tools** — `gsd_plan_milestone` for roadmaps, `gsd_decision_save` for decisions, `gsd_milestone_generate_id` for IDs
+- **Print artifacts in chat** — requirements table, roadmap preview, depth summary. The TUI scrollback is the user's audit trail.
+- **Use depends_on frontmatter** for multi-milestone sequences
+- **Anti-reduction rule** — if the spec describes a big vision, plan the big vision. Phase complexity — don't cut it.
+- **Naming convention** — always use `gsd_milestone_generate_id` for IDs. Directories use bare IDs, files use ID-SUFFIX format.
 - **End with "Milestone {{milestoneId}} ready."** — this triggers auto-start detection
+{{inlinedTemplates}}

package/src/resources/extensions/gsd/prompts/execute-task.md CHANGED Viewed

@@ -38,18 +38,21 @@ Then:
    - Correct: `command > /dev/null 2>&1 &` or `nohup command > /dev/null 2>&1 &`
    - Example: `python -m http.server 8080 > /dev/null 2>&1 &` (NOT `python -m http.server 8080 &`)
    - Preferred: use the `bg_shell` tool if available — it manages process lifecycle correctly without stream-inheritance issues
-6. Verify must-haves are met by running concrete checks (tests, commands, observable behaviors)
-7. Run the slice-level verification checks defined in the slice plan's Verification section. Track which pass. On the final task of the slice, all must pass before marking done. On intermediate tasks, partial passes are expected — note which ones pass in the summary.
-8. After the verification gate runs (you'll see gate results in stderr/notify output), populate the `## Verification Evidence` table in your task summary with the check results. Use the `formatEvidenceTable` format: one row per check with command, exit code, verdict (✅ pass / ❌ fail), and duration. If no verification commands were discovered, note that in the section.
-9. If the task touches UI, browser flows, DOM behavior, or user-visible web state:
+6. If the task plan includes a **Failure Modes** section (Q5), implement the error/timeout/malformed handling specified. Verify each dependency's failure path is handled. Skip if the section is absent.
+7. If the task plan includes a **Load Profile** section (Q6), implement protections for the identified 10x breakpoint (connection pooling, rate limiting, pagination, etc.). Skip if absent.
+8. If the task plan includes a **Negative Tests** section (Q7), write the specified negative test cases alongside the happy-path tests — malformed inputs, error paths, and boundary conditions. Skip if absent.
+9. Verify must-haves are met by running concrete checks (tests, commands, observable behaviors)
+10. Run the slice-level verification checks defined in the slice plan's Verification section. Track which pass. On the final task of the slice, all must pass before marking done. On intermediate tasks, partial passes are expected — note which ones pass in the summary.
+11. After the verification gate runs (you'll see gate results in stderr/notify output), populate the `## Verification Evidence` table in your task summary with the check results. Use the `formatEvidenceTable` format: one row per check with command, exit code, verdict (✅ pass / ❌ fail), and duration. If no verification commands were discovered, note that in the section.
+12. If the task touches UI, browser flows, DOM behavior, or user-visible web state:
    - exercise the real flow in the browser
    - prefer `browser_batch` when the next few actions are obvious and sequential
    - prefer `browser_assert` for explicit pass/fail verification of the intended outcome
    - use `browser_diff` when an action's effect is ambiguous
    - use console/network/dialog diagnostics when validating async, stateful, or failure-prone UI
    - record verification in terms of explicit checks passed/failed, not only prose interpretation
-10. If the task plan includes an Observability Impact section, verify those signals directly. Skip this step if the task plan omits the section.
-11. **If execution is running long or verification fails:**
+13. If the task plan includes an Observability Impact section, verify those signals directly. Skip this step if the task plan omits the section.
+14. **If execution is running long or verification fails:**
     **Context budget:** You have approximately **{{verificationBudget}}** reserved for verification context. If you've used most of your context and haven't finished all steps, stop implementing and prioritize writing the task summary with clear notes on what's done and what remains. A partial summary that enables clean resumption is more valuable than one more half-finished step with no documentation. Never sacrifice summary quality for one more implementation step.
@@ -60,13 +63,13 @@ Then:
     - Distinguish "I know" from "I assume." Observable facts (the error says X) are strong evidence. Assumptions (this library should work this way) need verification.
     - Know when to stop. If you've tried 3+ fixes without progress, your mental model is probably wrong. Stop. List what you know for certain. List what you've ruled out. Form fresh hypotheses from there.
     - Don't fix symptoms. Understand *why* something fails before changing code. A test that passes after a change you don't understand is luck, not a fix.
-11. **Blocker discovery:** If execution reveals that the remaining slice plan is fundamentally invalid — not just a bug or minor deviation, but a plan-invalidating finding like a wrong API, missing capability, or architectural mismatch — set `blocker_discovered: true` in the task summary frontmatter and describe the blocker clearly in the summary narrative. Do NOT set `blocker_discovered: true` for ordinary debugging, minor deviations, or issues that can be fixed within the current task or the remaining plan. This flag triggers an automatic replan of the slice.
-12. If you made an architectural, pattern, library, or observability decision during this task that downstream work should know about, append it to `.gsd/DECISIONS.md` (read the template at `~/.gsd/agent/extensions/gsd/templates/decisions.md` if the file doesn't exist yet). Not every task produces decisions — only append when a meaningful choice was made.
-13. If you discover a non-obvious rule, recurring gotcha, or useful pattern during execution, append it to `.gsd/KNOWLEDGE.md`. Only add entries that would save future agents from repeating your investigation. Don't add obvious things.
-14. Read the template at `~/.gsd/agent/extensions/gsd/templates/task-summary.md`
-15. Write `{{taskSummaryPath}}`
-16. Call `gsd_complete_task` with milestone_id, slice_id, task_id, and a summary of what was accomplished. This is your final required step — do NOT manually edit PLAN.md checkboxes. The tool marks the task complete, updates the DB, and renders PLAN.md automatically.
-17. Do not run git commands — the system reads your task summary after completion and creates a meaningful commit from it (type inferred from title, message from your one-liner, key files from frontmatter). Write a clear, specific one-liner in the summary — it becomes the commit message.
+15. **Blocker discovery:** If execution reveals that the remaining slice plan is fundamentally invalid — not just a bug or minor deviation, but a plan-invalidating finding like a wrong API, missing capability, or architectural mismatch — set `blocker_discovered: true` in the task summary frontmatter and describe the blocker clearly in the summary narrative. Do NOT set `blocker_discovered: true` for ordinary debugging, minor deviations, or issues that can be fixed within the current task or the remaining plan. This flag triggers an automatic replan of the slice.
+16. If you made an architectural, pattern, library, or observability decision during this task that downstream work should know about, append it to `.gsd/DECISIONS.md` (read the template at `~/.gsd/agent/extensions/gsd/templates/decisions.md` if the file doesn't exist yet). Not every task produces decisions — only append when a meaningful choice was made.
+17. If you discover a non-obvious rule, recurring gotcha, or useful pattern during execution, append it to `.gsd/KNOWLEDGE.md`. Only add entries that would save future agents from repeating your investigation. Don't add obvious things.
+18. Read the template at `~/.gsd/agent/extensions/gsd/templates/task-summary.md`
+19. Write `{{taskSummaryPath}}`
+20. Call `gsd_complete_task` with milestone_id, slice_id, task_id, and a summary of what was accomplished. This is your final required step — do NOT manually edit PLAN.md checkboxes. The tool marks the task complete, updates the DB, and renders PLAN.md automatically.
+21. Do not run git commands — the system reads your task summary after completion and creates a meaningful commit from it (type inferred from title, message from your one-liner, key files from frontmatter). Write a clear, specific one-liner in the summary — it becomes the commit message.
 All work stays in your working directory: `{{workingDirectory}}`.

package/src/resources/extensions/gsd/prompts/forensics.md CHANGED Viewed

@@ -16,7 +16,7 @@ GSD extension source code is at: `{{gsdSourceDir}}`
 | Domain | Files |
 |--------|-------|
-| **Auto-mode engine** | `auto.ts` `auto-loop.ts` `auto-dispatch.ts` `auto-start.ts` `auto-supervisor.ts` `auto-timers.ts` `auto-timeout-recovery.ts` `auto-unit-closeout.ts` `auto-post-unit.ts` `auto-verification.ts` `auto-recovery.ts` `auto-worktree.ts` `auto-worktree-sync.ts` `auto-model-selection.ts` `auto-budget.ts` `dispatch-guard.ts` |
+| **Auto-mode engine** | `auto.ts` `auto-loop.ts` `auto-dispatch.ts` `auto-start.ts` `auto-supervisor.ts` `auto-timers.ts` `auto-timeout-recovery.ts` `auto-unit-closeout.ts` `auto-post-unit.ts` `auto-verification.ts` `auto-recovery.ts` `auto-worktree.ts` `auto-model-selection.ts` `auto-budget.ts` `dispatch-guard.ts` |
 | **State & persistence** | `state.ts` `types.ts` `files.ts` `paths.ts` `json-persistence.ts` `atomic-write.ts` |
 | **Forensics & recovery** | `forensics.ts` `session-forensics.ts` `crash-recovery.ts` `session-lock.ts` |
 | **Metrics & telemetry** | `metrics.ts` `skill-telemetry.ts` `token-counter.ts` |
@@ -46,7 +46,7 @@ GSD extension source code is at: `{{gsdSourceDir}}`
 ├── milestones/{ID}/             — milestone artifacts
 │   ├── {ID}-ROADMAP.md, {ID}-RESEARCH.md, {ID}-CONTEXT.md, {ID}-SUMMARY.md
 │   └── slices/{SID}/            — slice artifacts
-│       ├── {SID}-PLAN.md, {SID}-RESEARCH.md, {SID}-UAT-RESULT.md, {SID}-SUMMARY.md
+│       ├── {SID}-PLAN.md, {SID}-RESEARCH.md, {SID}-UAT.md, {SID}-SUMMARY.md
 │       └── tasks/{TID}-PLAN.md, {TID}-SUMMARY.md
 └── worktrees/{milestoneId}/     — per-milestone worktree with replicated .gsd/
 ```
@@ -142,9 +142,10 @@ Then **offer GitHub issue creation**: "Would you like me to create a GitHub issu
 If yes, create using the `bash` tool:
 ```bash
-gh issue create --repo gsd-build/gsd-2 \
+# Step 1: Create issue (use labels for metadata, NOT for classification — type is set via GraphQL)
+ISSUE_URL=$(gh issue create --repo gsd-build/gsd-2 \
   --title "..." \
-  --label "bug" --label "auto-generated" \
+  --label "auto-generated" \
   --body "$(cat <<'EOF'
 ## Problem
 [1-2 sentence summary]
@@ -169,7 +170,13 @@ gh issue create --repo gsd-build/gsd-2 \
 ---
 *Auto-generated by `/gsd forensics`*
 EOF
-)"
+)")
+# Step 2: Set issue type via GraphQL (gh issue create has no --type flag)
+ISSUE_NUM=$(echo "$ISSUE_URL" | grep -oE '[0-9]+$')
+ISSUE_ID=$(gh api graphql -f query='{ repository(owner:"gsd-build",name:"gsd-2") { issue(number:'"$ISSUE_NUM"') { id } } }' --jq '.data.repository.issue.id')
+TYPE_ID=$(gh api graphql -f query='{ repository(owner:"gsd-build",name:"gsd-2") { issueTypes(first:20) { nodes { id name } } } }' --jq '.data.repository.issueTypes.nodes[] | select(.name=="Bug") | .id')
+gh api graphql -f query='mutation { updateIssue(input:{id:"'"$ISSUE_ID"'",issueTypeId:"'"$TYPE_ID"'"}) { issue { number } } }'
 ```
 ### Redaction Rules (CRITICAL)

package/src/resources/extensions/gsd/prompts/gate-evaluate.md ADDED Viewed

@@ -0,0 +1,32 @@
+# Quality Gate Evaluation — Parallel Dispatch
+**Working directory:** `{{workingDirectory}}`
+**Milestone:** {{milestoneId}} — {{milestoneTitle}}
+**Slice:** {{sliceId}} — {{sliceTitle}}
+## Mission
+You are evaluating **quality gates in parallel** for this slice. Each gate is an independent question that must be answered before task execution begins. Use the `subagent` tool to dispatch all gate evaluations simultaneously.
+## Slice Plan Context
+{{slicePlanContent}}
+## Gates to Evaluate
+{{gateCount}} gates require evaluation:
+{{gateList}}
+## Execution Protocol
+1. **Dispatch all gates** using `subagent` in parallel mode. Each subagent prompt is provided below.
+2. **Wait for all subagents** to complete.
+3. **Verify each gate wrote its result** by checking that `gsd_save_gate_result` was called for each gate ID.
+4. **Report the batch outcome** — which gates passed, which flagged concerns, and which were omitted as not applicable.
+Gate agents may return `verdict: "omitted"` if the gate question is not applicable to this slice (e.g., no auth surface for Q3, no existing requirements touched for Q4). This is expected for simple slices.
+## Subagent Prompts
+{{subagentPrompts}}

package/src/resources/extensions/gsd/prompts/guided-complete-slice.md CHANGED Viewed

@@ -1,3 +1,3 @@
-Complete slice {{sliceId}} ("{{sliceTitle}}") of milestone {{milestoneId}}. Your working directory is `{{workingDirectory}}` — all file operations must use this path. All tasks are done. Your slice summary is the primary record of what was built — downstream agents (reassess-roadmap, future slice researchers) read it to understand what this slice delivered and what to watch out for. Use the **Slice Summary** and **UAT** output templates below to understand the expected structure. {{skillActivation}} Call `gsd_slice_complete` to record completion — the tool writes `{{sliceId}}-SUMMARY.md`, `{{sliceId}}-UAT.md`, and toggles the roadmap checkbox atomically. Fill the `UAT Type` plus `Not Proven By This UAT` sections explicitly in `uatContent` so the artifact states what class of acceptance it covers and what still remains unproven. Review task summaries for `key_decisions` and ensure any significant ones are in `.gsd/DECISIONS.md`. Do not commit or merge manually — the system handles this after the unit completes.
+Complete slice {{sliceId}} ("{{sliceTitle}}") of milestone {{milestoneId}}. Your working directory is `{{workingDirectory}}` — all file operations must use this path. All tasks are done. Your slice summary is the primary record of what was built — downstream agents (reassess-roadmap, future slice researchers) read it to understand what this slice delivered and what to watch out for. Use the **Slice Summary** and **UAT** output templates below to understand the expected structure. {{skillActivation}} Call `gsd_slice_complete` to record completion — the tool writes `{{sliceId}}-SUMMARY.md`, `{{sliceId}}-UAT.md`, and toggles the roadmap checkbox atomically. Fill the `UAT Type` plus `Not Proven By This UAT` sections explicitly in `uatContent` so the artifact states what class of acceptance it covers and what still remains unproven. Review task summaries for `key_decisions` and ensure any significant ones are in `.gsd/DECISIONS.md`. If the slice involved runtime behavior, fill the Operational Readiness section (Q8) in the summary: health signal, failure signal, recovery procedure, and monitoring gaps. Omit for simple slices. Do not commit or merge manually — the system handles this after the unit completes.
 {{inlinedTemplates}}

package/src/resources/extensions/gsd/prompts/guided-execute-task.md CHANGED Viewed

@@ -1,3 +1,3 @@
-Execute the next task: {{taskId}} ("{{taskTitle}}") in slice {{sliceId}} of milestone {{milestoneId}}. Read the task plan (`{{taskId}}-PLAN.md`), load relevant summaries from prior tasks, and execute each step. Verify must-haves when done. If the task touches UI, browser flows, DOM behavior, or user-visible web state, exercise the real flow in the browser, prefer `browser_batch` for obvious sequences, prefer `browser_assert` for explicit pass/fail verification, use `browser_diff` when an action's effect is ambiguous, and use browser diagnostics when validating async or failure-prone UI. If you made an architectural, pattern, or library decision, append it to `.gsd/DECISIONS.md`. Use the **Task Summary** output template below. Call `gsd_task_complete` to record completion (it writes the summary, toggles the checkbox, and persists to DB atomically). {{skillActivation}} If running long and not all steps are finished, stop implementing and prioritize writing a clean partial summary over attempting one more step — a recoverable handoff is more valuable than a half-finished step with no documentation. If verification fails, debug methodically: form a hypothesis and test that specific theory before changing anything, change one variable at a time, read entire functions not just the suspect line, distinguish observable facts from assumptions, and if 3+ fixes fail without progress stop and reassess your mental model — list what you know for certain, what you've ruled out, and form fresh hypotheses. Don't fix symptoms — understand why something fails before changing code.
+Execute the next task: {{taskId}} ("{{taskTitle}}") in slice {{sliceId}} of milestone {{milestoneId}}. Read the task plan (`{{taskId}}-PLAN.md`), load relevant summaries from prior tasks, and execute each step. Verify must-haves when done. If the task touches UI, browser flows, DOM behavior, or user-visible web state, exercise the real flow in the browser, prefer `browser_batch` for obvious sequences, prefer `browser_assert` for explicit pass/fail verification, use `browser_diff` when an action's effect is ambiguous, and use browser diagnostics when validating async or failure-prone UI. If you made an architectural, pattern, or library decision, append it to `.gsd/DECISIONS.md`. Use the **Task Summary** output template below. Call `gsd_task_complete` to record completion (it writes the summary, toggles the checkbox, and persists to DB atomically). {{skillActivation}} If running long and not all steps are finished, stop implementing and prioritize writing a clean partial summary over attempting one more step — a recoverable handoff is more valuable than a half-finished step with no documentation. If verification fails, debug methodically: form a hypothesis and test that specific theory before changing anything, change one variable at a time, read entire functions not just the suspect line, distinguish observable facts from assumptions, and if 3+ fixes fail without progress stop and reassess your mental model — list what you know for certain, what you've ruled out, and form fresh hypotheses. Don't fix symptoms — understand why something fails before changing code. If the task plan includes Failure Modes, Load Profile, or Negative Tests sections, implement and verify them: handle each dependency's error/timeout/malformed paths (Q5), protect against identified 10x breakpoints (Q6), and write specified negative test cases (Q7).
 {{inlinedTemplates}}

package/src/resources/extensions/gsd/prompts/guided-plan-milestone.md CHANGED Viewed

@@ -1,4 +1,4 @@
-Plan milestone {{milestoneId}} ("{{milestoneTitle}}"). Read `.gsd/DECISIONS.md` if it exists — respect existing decisions. Read `.gsd/REQUIREMENTS.md` if it exists and treat Active requirements as the capability contract. If `REQUIREMENTS.md` is missing, continue in legacy compatibility mode but explicitly note missing requirement coverage. Use the **Roadmap** output template below to shape the milestone planning payload you send to `gsd_plan_milestone`. Call `gsd_plan_milestone` to persist the milestone planning fields and render `{{milestoneId}}-ROADMAP.md` from DB state. Do **not** write `{{milestoneId}}-ROADMAP.md`, `ROADMAP.md`, or other planning artifacts manually. If planning produces structural decisions, append them to `.gsd/DECISIONS.md`. {{skillActivation}}
+Plan milestone {{milestoneId}} ("{{milestoneTitle}}"). Read `.gsd/DECISIONS.md` if it exists — respect existing decisions. Read `.gsd/REQUIREMENTS.md` if it exists and treat Active requirements as the capability contract. If `REQUIREMENTS.md` is missing, continue in legacy compatibility mode but explicitly note missing requirement coverage. Use the **Roadmap** output template below to shape the milestone planning payload you send to `gsd_plan_milestone`. Call `gsd_plan_milestone` to persist the milestone planning fields and render `{{milestoneId}}-ROADMAP.md` from DB state. Do **not** write `{{milestoneId}}-ROADMAP.md`, `ROADMAP.md`, or other planning artifacts manually. If planning produces structural decisions, append them to `.gsd/DECISIONS.md`. {{skillActivation}} Fill the Horizontal Checklist section with cross-cutting concerns considered during planning (requirements re-read, decisions re-evaluated, graceful shutdown, revenue paths, auth boundary, shared resources, reconnection). Omit for trivial milestones.
 ## Requirement Rules

package/src/resources/extensions/gsd/prompts/guided-plan-slice.md CHANGED Viewed

@@ -1,3 +1,3 @@
-Plan slice {{sliceId}} ("{{sliceTitle}}") of milestone {{milestoneId}}. Read `.gsd/DECISIONS.md` if it exists — respect existing decisions. Read `.gsd/REQUIREMENTS.md` if it exists — identify which Active requirements the roadmap says this slice owns or supports, and ensure the plan delivers them. Read the roadmap boundary map, any existing context/research files, and dependency summaries. Use the **Slice Plan** and **Task Plan** output templates below. Decompose into tasks with must-haves. Fill the `Proof Level` and `Integration Closure` sections truthfully so the plan says what class of proof this slice really delivers and what end-to-end wiring still remains. Call `gsd_plan_slice` to persist the slice plan — the tool writes `{{sliceId}}-PLAN.md` and individual `T##-PLAN.md` files to disk and persists to DB. Do **not** write plan files manually — use the DB-backed tool so state stays consistent. If planning produces structural decisions, call `gsd_decision_save` for each — the tool auto-assigns IDs and regenerates `.gsd/DECISIONS.md` automatically. {{skillActivation}} Before finishing, self-audit the plan: every must-have maps to at least one task, every task has complete sections (steps, must-haves, verification, observability impact, inputs, and expected output), task ordering is consistent with no circular references, every pair of artifacts that must connect has an explicit wiring step, task scope targets 2–5 steps and 3–8 files (6–8 steps or 8–10 files — consider splitting; 10+ steps or 12+ files — must split), the plan honors locked decisions from context/research/decisions artifacts, the proof-level wording does not overclaim live integration if only fixture/contract proof is planned, every Active requirement this slice owns has at least one task with verification that proves it is met, and every task produces real user-facing progress — if the slice has a UI surface at least one task builds the real UI, if it has an API at least one task connects it to a real data source, and showing the completed result to a non-technical stakeholder would demonstrate real product progress rather than developer artifacts.
+Plan slice {{sliceId}} ("{{sliceTitle}}") of milestone {{milestoneId}}. Read `.gsd/DECISIONS.md` if it exists — respect existing decisions. Read `.gsd/REQUIREMENTS.md` if it exists — identify which Active requirements the roadmap says this slice owns or supports, and ensure the plan delivers them. Read the roadmap boundary map, any existing context/research files, and dependency summaries. Use the **Slice Plan** and **Task Plan** output templates below. Decompose into tasks with must-haves. Fill the `Proof Level` and `Integration Closure` sections truthfully so the plan says what class of proof this slice really delivers and what end-to-end wiring still remains. Call `gsd_plan_slice` to persist the slice plan — the tool writes `{{sliceId}}-PLAN.md` and individual `T##-PLAN.md` files to disk and persists to DB. Do **not** write plan files manually — use the DB-backed tool so state stays consistent. If planning produces structural decisions, call `gsd_decision_save` for each — the tool auto-assigns IDs and regenerates `.gsd/DECISIONS.md` automatically. {{skillActivation}} Before finishing, self-audit the plan: every must-have maps to at least one task, every task has complete sections (steps, must-haves, verification, observability impact, inputs, and expected output), task ordering is consistent with no circular references, every pair of artifacts that must connect has an explicit wiring step, task scope targets 2–5 steps and 3–8 files (6–8 steps or 8–10 files — consider splitting; 10+ steps or 12+ files — must split), the plan honors locked decisions from context/research/decisions artifacts, the proof-level wording does not overclaim live integration if only fixture/contract proof is planned, every Active requirement this slice owns has at least one task with verification that proves it is met, and every task produces real user-facing progress — if the slice has a UI surface at least one task builds the real UI, if it has an API at least one task connects it to a real data source, and showing the completed result to a non-technical stakeholder would demonstrate real product progress rather than developer artifacts, and quality gate coverage — for non-trivial slices, Threat Surface (Q3: abuse, data exposure, input trust) and Requirement Impact (Q4: requirements touched, re-verify, decisions revisited) sections are present. For non-trivial tasks, Failure Modes (Q5), Load Profile (Q6), and Negative Tests (Q7) are filled in task plans.
 {{inlinedTemplates}}

package/src/resources/extensions/gsd/prompts/plan-milestone.md CHANGED Viewed

@@ -47,7 +47,7 @@ Then:
 2. {{skillActivation}}
 3. Create the roadmap: decompose into demoable vertical slices — as many as the work genuinely needs, no more. A simple feature might be 1 slice. Don't decompose for decomposition's sake.
 4. Order by risk (high-risk first)
-5. Call `gsd_plan_milestone` to persist the milestone planning fields and slice rows in the DB-backed planning path. Do **not** write `{{outputPath}}`, `ROADMAP.md`, or other planning artifacts manually — the planning tool owns roadmap rendering and persistence.
+5. Call `gsd_plan_milestone` to persist the milestone planning fields, slice rows, and **horizontal checklist** in the DB-backed planning path. Do **not** write `{{outputPath}}`, `ROADMAP.md`, or other planning artifacts manually — the planning tool owns roadmap rendering and persistence.
 6. If planning produced structural decisions (e.g. slice ordering rationale, technology choices, scope exclusions), call `gsd_decision_save` for each decision — the tool auto-assigns IDs and regenerates `.gsd/DECISIONS.md` automatically.
 ## Requirement Mapping Rules

package/src/resources/extensions/gsd/prompts/plan-slice.md CHANGED Viewed

@@ -57,14 +57,18 @@ Then:
    - Include `Observability / Diagnostics` for backend, integration, async, stateful, or UI slices where failure diagnosis matters.
    - Fill `Proof Level` and `Integration Closure` when the slice crosses runtime boundaries or has meaningful integration concerns.
    - **Omit these sections entirely for simple slices** where they would all be "none" or trivially obvious.
-5. Decompose the slice into tasks, each fitting one context window. Each task needs:
+5. **Quality gates** — for non-trivial slices, fill the Threat Surface (Q3) and Requirement Impact (Q4) sections in the slice plan:
+   - **Threat Surface:** Identify abuse scenarios, data exposure risks, and input trust boundaries. Required when the slice handles user input, authentication, authorization, or sensitive data. Omit entirely for internal refactoring or simple changes.
+   - **Requirement Impact:** List which existing requirements this slice touches, what must be re-verified after shipping, and which prior decisions should be reconsidered. Omit entirely if no existing requirements are affected.
+   - For each task in a non-trivial slice, fill Failure Modes (Q5), Load Profile (Q6), and Negative Tests (Q7) in the task plan when the task has external dependencies, shared resources, or non-trivial input handling. Omit for simple tasks.
+6. Decompose the slice into tasks, each fitting one context window. Each task needs:
    - a concrete, action-oriented title
    - the inline task entry fields defined in the plan.md template (Why / Files / Do / Verify / Done when)
    - a matching task plan file with description, steps, must-haves, verification, inputs, and expected output
    - **Inputs and Expected Output must list concrete backtick-wrapped file paths** (e.g. `` `src/types.ts` ``). These are machine-parsed to derive task dependencies — vague prose without paths breaks parallel execution. Every task must have at least one output file path.
    - Observability Impact section **only if the task touches runtime boundaries, async flows, or error paths** — omit it otherwise
-6. **Persist planning state through `gsd_plan_slice`.** Call it with the full slice planning payload (goal, demo, must-haves, verification, tasks, and metadata). The tool inserts all tasks in the same transaction, writes to the DB, and renders `{{outputPath}}` and `{{slicePath}}/tasks/T##-PLAN.md` files automatically. Do **not** call `gsd_plan_task` separately — `gsd_plan_slice` handles task persistence. Do **not** rely on direct `PLAN.md` writes as the source of truth; the DB-backed tool is the canonical write path for slice and task planning state.
-7. **Self-audit the plan.** Walk through each check — if any fail, fix the plan files before moving on:
+7. **Persist planning state through `gsd_plan_slice`.** Call it with the full slice planning payload (goal, demo, must-haves, verification, tasks, and metadata). The tool inserts all tasks in the same transaction, writes to the DB, and renders `{{outputPath}}` and `{{slicePath}}/tasks/T##-PLAN.md` files automatically. Do **not** call `gsd_plan_task` separately — `gsd_plan_slice` handles task persistence. Do **not** rely on direct `PLAN.md` writes as the source of truth; the DB-backed tool is the canonical write path for slice and task planning state.
+8. **Self-audit the plan.** Walk through each check — if any fail, fix the plan files before moving on:
     - **Completion semantics:** If every task were completed exactly as written, the slice goal/demo should actually be true.
     - **Requirement coverage:** Every must-have in the slice maps to at least one task. No must-have is orphaned. If `REQUIREMENTS.md` exists, every Active requirement this slice owns maps to at least one task.
     - **Task completeness:** Every task has steps, must-haves, verification, inputs, and expected output — none are blank or vague. Inputs and Expected Output list backtick-wrapped file paths, not prose descriptions.
@@ -72,6 +76,7 @@ Then:
     - **Key links planned:** For every pair of artifacts that must connect, there is an explicit step that wires them.
     - **Scope sanity:** Target 2–5 steps and 3–8 files per task. 10+ steps or 12+ files — must split. Each task must be completable in a single fresh context window.
     - **Feature completeness:** Every task produces real, user-facing progress — not just internal scaffolding.
+    - **Quality gate coverage:** For non-trivial slices, Threat Surface and Requirement Impact sections are present and specific (not placeholder text). For non-trivial tasks, Failure Modes, Load Profile, and Negative Tests are addressed in the task plan.
 10. If planning produced structural decisions, append them to `.gsd/DECISIONS.md`
 11. {{commitInstruction}}

package/src/resources/extensions/gsd/prompts/reassess-roadmap.md CHANGED Viewed

@@ -36,6 +36,9 @@ Ask yourself:
 - Did assumptions in remaining slice descriptions turn out wrong?
 - If `.gsd/REQUIREMENTS.md` exists: did this slice validate, invalidate, defer, block, or newly surface requirements?
 - If `.gsd/REQUIREMENTS.md` exists: does the remaining roadmap still provide credible coverage for Active requirements, including launchability, primary user loop, continuity, and failure visibility where relevant?
+- Are the Threat Surface and Requirement Impact sections in completed slice plans still accurate for remaining slices?
+- Did this slice's Operational Readiness reveal monitoring gaps that remaining slices should address?
+- Should any Horizontal Checklist items be updated based on what was actually built?
 ### Success-Criterion Coverage Check

package/src/resources/extensions/gsd/prompts/replan-slice.md CHANGED Viewed

@@ -32,7 +32,7 @@ Consider these captures when rewriting the remaining tasks — they represent th
 1. Read the blocker task summary carefully. Understand exactly what was discovered and why it blocks the current plan.
 2. Analyze the remaining `[ ]` tasks in the slice plan. Determine which are still valid, which need modification, and which should be replaced.
-3. **Persist replan state through `gsd_replan_slice`.** Call it with: `milestoneId`, `sliceId`, `blockerTaskId`, `blockerDescription`, `whatChanged`, `updatedTasks` (array of task objects with taskId, title, description, estimate, files, verify, inputs, expectedOutput), `removedTaskIds` (array of task ID strings). The tool structurally enforces preservation of completed tasks, writes replan history to the DB, re-renders `{{planPath}}`, and renders `{{replanPath}}`.
+3. **Persist replan state through `gsd_replan_slice`.** Call it with: `milestoneId`, `sliceId`, `blockerTaskId`, `blockerDescription`, `whatChanged`, `updatedTasks` (array of task objects with taskId, title, description, estimate, files, verify, inputs, expectedOutput), `removedTaskIds` (array of task ID strings). The tool structurally enforces preservation of completed tasks, writes replan history to the DB, re-renders `{{planPath}}`, and renders `{{replanPath}}`. Preserve or update the Threat Surface and Requirement Impact sections if the replan changes the slice's security posture or requirement coverage.
 4. If any incomplete task had a `T0x-PLAN.md`, remove or rewrite it to match the new task description.
 5. Do not commit manually — the system auto-commits your changes after this unit completes.

package/src/resources/extensions/gsd/prompts/run-uat.md CHANGED Viewed

@@ -29,7 +29,7 @@ You are the UAT runner. Execute every check defined in `{{uatPath}}` as deeply a
 - `runtime-executable` — execute the specified command or script. Capture stdout/stderr as evidence. Record pass/fail based on exit code and output.
 - `live-runtime` — exercise the real runtime path. Start or connect to the app/service if needed, use browser/runtime/network checks, and verify observable behavior.
 - `mixed` — run all automatable artifact-driven and live-runtime checks. Separate any remaining human-only checks explicitly.
-- `human-experience` — automate setup, preconditions, screenshots, logs, and objective checks, but do **not** invent subjective PASS results. Mark taste-based, experiential, or purely human-judgment checks as `NEEDS-HUMAN` and use an overall verdict of `PARTIAL` unless every required check was objective and passed.
+- `human-experience` — automate setup, preconditions, screenshots, logs, and objective checks, but do **not** invent subjective PASS results. Mark taste-based, experiential, or purely human-judgment checks as `NEEDS-HUMAN`. Use an overall verdict of `PASS` when all automatable checks succeed (even if human-only checks remain as `NEEDS-HUMAN`). Use `PARTIAL` only when automatable checks themselves were inconclusive.
 ### Evidence tools
@@ -51,9 +51,9 @@ For each check, record:
 - `PASS`, `FAIL`, or `NEEDS-HUMAN`
 After running all checks, compute the **overall verdict**:
-- `PASS` — all required checks passed and no human-only checks remain
-- `FAIL` — one or more checks failed
-- `PARTIAL` — some checks passed, but one or more checks were skipped, inconclusive, or still require human judgment
+- `PASS` — all automatable checks passed. Any remaining checks that honestly require human judgment are marked `NEEDS-HUMAN` with clear instructions for the human reviewer. (This is the correct verdict for mixed/human-experience/live-runtime modes when all automatable checks succeed.)
+- `FAIL` — one or more automatable checks failed
+- `PARTIAL` — one or more automatable checks were skipped or returned inconclusive results (not the same as `NEEDS-HUMAN` — use PARTIAL only when the agent itself could not determine pass/fail for a check it was supposed to automate)
 Call `gsd_summary_save` with `milestone_id: {{milestoneId}}`, `slice_id: {{sliceId}}`, `artifact_type: "ASSESSMENT"`, and the full UAT result markdown as `content` — the tool computes the file path and persists to both DB and disk. The content should follow this format: