@tekyzinc/gsd-t 2.31.16 → 2.31.18

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -17,6 +17,30 @@ If status is not VERIFIED:
 
  If `--force` flag provided, proceed with warning in archive.
 
+ ## Step 1.5: Smoke Test Artifact Gate (MANDATORY — Categories 2 and 7)
+
+ Before archiving, verify that high-risk features have testable artifacts. This gate catches what code review and unit tests cannot.
+
+ **Scan this milestone's domains for any of the following:**
+ - Audio capture/playback, speech recognition/synthesis
+ - GPU/WebGPU/WebGL compute or rendering
+ - ML inference, model loading, quantized model execution
+ - Background workers, service workers, IPC channels
+ - Native APIs (camera, bluetooth, filesystem, microphone)
+ - WebAssembly modules
+ - Any feature whose only prior "test" was manual user interaction
+
+ **For each high-risk feature found:**
+
+ 1. Check that a smoke test script exists (in `scripts/`, `tests/`, or `.gsd-t/smoke-tests/`)
+ 2. Check that the script was run and passed (evidence in token-log.md, CI output, or a `.gsd-t/smoke-tests/{feature}.md` file with run results)
+ 3. If manual steps remain unavoidable: `.gsd-t/smoke-tests/{feature}.md` must exist documenting exact steps and confirming they passed
+
+ **If any high-risk feature lacks a smoke test artifact → BLOCK completion.**
+ Do not proceed to archiving. Create the smoke test now, run it, confirm it passes, then continue.
+
+ > This gate exists because complete-milestone is the last opportunity to catch "shipped blind" features before they become user-facing bugs requiring 15 debug sessions to resolve.
+
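The artifact checks in Step 1.5 above can be sketched as a small shell gate. This is a minimal illustration, not part of the package: the `gate` helper and the `mic-capture` feature slug are invented; only the three search locations come from the step itself.

```shell
#!/bin/sh
# Sketch of the Step 1.5 artifact check for one feature.
# gate() succeeds if a smoke test script exists in any of the doc's three
# locations, or if a documented manual procedure exists under .gsd-t/smoke-tests/.
gate() {
  for f in "scripts/$1.sh" "tests/$1.sh" ".gsd-t/smoke-tests/$1.sh" ".gsd-t/smoke-tests/$1.md"; do
    if [ -e "$f" ]; then echo "OK: $f"; return 0; fi
  done
  echo "BLOCK: no smoke test artifact for $1"
  return 1
}

# Demo in a throwaway sandbox.
cd "$(mktemp -d)" || exit 1
mkdir -p scripts tests .gsd-t/smoke-tests
gate mic-capture || true                  # nothing exists yet, so the gate blocks
touch .gsd-t/smoke-tests/mic-capture.md   # document the manual procedure
gate mic-capture                          # artifact now present, gate passes
```

In real use a non-zero exit would halt the archive step until the artifact is created.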
  ## Step 2: Gap Analysis Gate
 
  After verification passes, run a gap analysis against `docs/requirements.md` scoped to this milestone's deliverables:
@@ -71,6 +71,27 @@ The contract didn't specify something it should have. Symptoms:
 
  → Update the contract, then fix implementations on both sides.
 
+ ## Step 2.5: Reproduce First (MANDATORY — Category 5)
+
+ **A fix attempt without a reproduction script is a guess, not a fix.**
+
+ Before touching any code:
+
+ 1. **Write a reproduction script** that demonstrates the bug. Automate as much as possible:
+    - Unit/integration bug → write a failing test that proves the bug exists
+    - UI/audio/GPU/worker bug (not fully automatable) → write the closest possible script: a headless probe, a log-based trigger, a mock that replicates the failure path. Document the manual remainder explicitly.
+    - If you cannot write any form of reproduction → you do not yet understand the bug. Keep investigating until you can.
+
+ 2. **Run the reproduction** and confirm it fails before attempting any fix.
+
+ 3. **Never close a debug session with "ready for testing."** A session closes only when the reproduction script passes. If manual steps remain, document them explicitly and confirm they passed.
+
+ 4. **Log the reproduction script path** in `.gsd-t/progress.md` Decision Log: what it tests, how to run it, what passing looks like.
+
+ > This rule exists because code review cannot detect silent runtime failures (GPU compute shaders, audio context state, worker message drops). Only execution proves correctness.
+
+ ---
+
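A reproduction script in the Step 2.5 sense can be very small: it only has to fail while the bug exists and pass once the bug is fixed. The sketch below uses an invented `slugify` bug; every name in it is hypothetical.

```shell
#!/bin/sh
# Hypothetical reproduction script: two distinct titles collapse to one slug.
# The buggy slugify() strips digits along with punctuation.
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z'
}

A=$(slugify "Report 1")
B=$(slugify "Report 2")
if [ "$A" = "$B" ]; then
  STATUS=1; echo "REPRO: both titles slug to '$A', bug present"
else
  STATUS=0; echo "PASS: slugs are distinct"
fi
# In real use the script would exit $STATUS; the debug session closes
# only once this prints PASS.
```

Running it before any fix confirms the failure (step 2 of the rule); the logged path and "what passing looks like" go into the Decision Log (step 4).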
  ## Step 3: Debug (Solo or Team)
 
  ### Deviation Rules
@@ -84,13 +105,14 @@ When you encounter unexpected situations during the fix:
  **3-attempt limit**: If your fix doesn't work after 3 attempts, log to `.gsd-t/deferred-items.md` and stop trying.
 
  ### Solo Mode
- 1. Reproduce the issue
+ 1. Reproduce the issue — **reproduction script must exist before step 2** (see Step 2.5)
  2. Trace through the relevant domain(s)
  3. Check contract compliance at each boundary
  4. Identify root cause
  5. **Destructive Action Guard**: If the fix requires destructive or structural changes (dropping tables, removing columns, changing schema, replacing architecture patterns, removing working modules) → STOP and present the change to the user with what exists, what will change, what will break, and a safe migration path. Wait for explicit approval.
  6. Fix and test — **adapt the fix to existing structures**, not the other way around
  7. Update contracts if needed
+ 8. **Category 6 — Bug Isolation Check**: After applying the fix, run the FULL test suite and all smoke tests — not just the reproduction script. Do not assume the bug was isolated. A fix that resolves one failure frequently uncovers adjacent failures. Every test must pass before the session closes.
 
  ### Team Mode (for complex cross-domain bugs)
  ```
@@ -19,10 +19,11 @@ Identify:
  - Which tasks are unblocked (no pending dependencies)
  - Which tasks are blocked (waiting on checkpoints)
 
- ## Step 2: QA Setup
+ ## Step 2: QA Subagent
 
- After completing each task, spawn a QA subagent via the Task tool to run tests:
+ In solo mode, QA runs inside each domain subagent (see Step 3). In team mode, the lead spawns QA subagents at each domain checkpoint using the pattern below.
 
+ **QA subagent prompt:**
  ```
  Task subagent (general-purpose, model: haiku):
  "Run the full test suite for this project and report pass/fail counts.
@@ -31,21 +32,9 @@ Write edge case tests for any new code paths in this task: {task description}.
  Report: test pass/fail status and any coverage gaps found."
  ```
 
- **OBSERVABILITY LOGGING (MANDATORY):**
- Before spawning — run via Bash:
- `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
- After subagent returns — run via Bash:
- `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
- Compute tokens and compaction:
- - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
- - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
- Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
- `| {DT_START} | {DT_END} | gsd-t-execute | Step 2 | haiku | {DURATION}s | task: {task-name}, {pass/fail} | {TOKENS} | {COMPACTED} |`
  If QA found issues, append each to `.gsd-t/qa-issues.md` (create with header `| Date | Command | Step | Model | Duration(s) | Severity | Finding |` if missing):
  `| {DT_START} | gsd-t-execute | Step 2 | haiku | {DURATION}s | {severity} | {finding} |`
 
- QA failure on any task blocks proceeding to the next task.
-
  ## Step 3: Choose Execution Mode
 
  ### Wave Scheduling (read first)
@@ -60,42 +49,81 @@ Before choosing solo or team mode, read the `## Wave Execution Groups` section i
 
  **If no wave groups are defined** (older plans): fall back to the `Execution Order` list.
 
- ### Solo Mode (default)
- Execute tasks yourself following the wave groups (or execution order) in `integration-points.md`.
-
- ### Deviation Rules (MANDATORY for every task)
-
- When you encounter unexpected situations during a task:
-
- 1. **Bug in existing code blocking progress** → Fix it immediately, up to 3 attempts. If still blocked after 3 attempts, add to `.gsd-t/deferred-items.md` and move to the next task.
- 2. **Missing functionality that the task clearly requires** → Add the minimum required code to unblock the task. Do not gold-plate. Document in commit message.
- 3. **Blocker (missing file, wrong API, broken dependency)** → Fix the blocker and continue. Add a note to `.gsd-t/deferred-items.md` if the fix was more than trivial.
- 4. **Architectural change required** → STOP immediately. Do NOT proceed. Apply the Destructive Action Guard: explain what exists, what needs to change, what breaks, and a migration path. Wait for explicit user approval.
-
- **3-attempt limit**: If any fix attempt fails 3 times, move on — don't loop. Log to `.gsd-t/deferred-items.md`.
-
- For each task:
- 1. Read the task description, files list, and contract refs
- 2. Read the relevant contract(s) — implement EXACTLY what they specify
- 3. Read the domain's constraints.md — follow all patterns
- 4. **Destructive Action Guard**: Before implementing, check if the task involves any destructive or structural changes (DROP TABLE, schema changes that lose data, removing existing modules, replacing architecture patterns). If YES → STOP and present the change to the user with what exists, what will change, what will break, and a safe migration path. Wait for explicit approval before proceeding.
- 5. Implement the task
- 6. Verify acceptance criteria are met
- 7. **Write comprehensive tests for every new or changed code path** (MANDATORY — no exceptions):
-    a. **Unit/integration tests**: Cover the happy path, common edge cases, error cases, and boundary conditions for every new or modified function
-    b. **Playwright E2E tests** (if UI/routes/flows/modes changed): Detect `playwright.config.*` or Playwright in dependencies. If present:
-       - Create NEW specs for new features, pages, modes, or flows — not just update existing ones
-       - Cover: happy path, form validation errors, empty states, loading states, error states, responsive breakpoints
-       - Cover: all feature modes/flags (e.g., if `--component` mode was added, test `--component` mode specifically)
-       - Cover: common edge cases (network errors, invalid input, rapid clicking, back/forward navigation)
-       - Use descriptive test names that explain the scenario being tested
-    c. If no test framework exists: set one up as part of the task (at minimum, Playwright for E2E)
-    d. **"No feature code without test code"** — implementation and tests are ONE deliverable, not two separate steps
- 8. **Run ALL tests** — unit, integration, and full Playwright suite. Fix any failures before proceeding (up to 2 attempts)
- 9. Run the Pre-Commit Gate checklist from CLAUDE.md — update ALL affected docs BEFORE committing
- 10. **Commit immediately** after each task: `feat({domain}/task-{N}): {description}` — do NOT batch commits at phase end
- 11. Update `.gsd-t/progress.md` — mark task complete
- 12. If you've reached a CHECKPOINT in integration-points.md, pause and verify the contract before continuing
+ ### Solo Mode (default) — Domain Subagent Pattern
+
+ Each domain's work runs in an isolated Task subagent with a fresh context window. The orchestrator (this agent) stays lightweight — it only spawns subagents, collects summaries, verifies checkpoints, and updates progress.
+
+ **OBSERVABILITY LOGGING (MANDATORY) — repeat for every domain subagent spawn:**
+
+ Before spawning — run via Bash:
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+
+ After subagent returns — run via Bash:
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+
+ Compute tokens and compaction:
+ - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
+ - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+
+ Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
+ `| {DT_START} | {DT_END} | gsd-t-execute | domain:{domain-name} | sonnet | {DURATION}s | {N} tasks, {pass/fail} | {TOKENS} | {COMPACTED} |`
+
+ **For each domain (in wave order), spawn:**
+
+ ```
+ Task subagent (general-purpose, model: sonnet, mode: bypassPermissions):
+ "You are executing all tasks for the {domain-name} domain.
+
+ Read before starting (load your own context — do not assume anything):
+ 1. CLAUDE.md — project conventions (CRITICAL)
+ 2. .gsd-t/domains/{domain-name}/scope.md — what you own
+ 3. .gsd-t/domains/{domain-name}/constraints.md — patterns to follow
+ 4. ALL files in .gsd-t/contracts/ — your interfaces
+ 5. .gsd-t/domains/{domain-name}/tasks.md — your task list
+ 6. .gsd-t/contracts/integration-points.md — wave order and checkpoints
+
+ Execute each incomplete task in order:
+ 1. Read task description, files list, and contract refs
+ 2. Read relevant contracts — implement EXACTLY what they specify
+ 3. Destructive Action Guard: if task involves DROP TABLE, schema changes that lose
+    data, removing working modules, or replacing architecture patterns → write a
+    NEEDS-APPROVAL entry to .gsd-t/deferred-items.md, skip the task, continue
+ 4. Implement the task
+ 5. Verify acceptance criteria are met
+ 6. Write comprehensive tests (MANDATORY — no feature code without test code):
+    - Unit/integration: happy path + edge cases + error cases for every new/changed function
+    - Playwright E2E (if UI/routes/flows changed): new specs for new features, cover
+      all modes, form validation, empty/loading/error states, common edge cases
+    - If no test framework exists: set one up as part of this task
+ 7. Run ALL tests — unit, integration, Playwright. Fix failures (up to 2 attempts)
+ 8. Run Pre-Commit Gate checklist from CLAUDE.md — update all affected docs BEFORE committing
+ 9. Commit immediately: feat({domain-name}/task-{N}): {description}
+ 10. Update .gsd-t/progress.md — mark task complete
+ 11. Spawn QA subagent (model: haiku) after each task:
+     'Run the full test suite. Read .gsd-t/contracts/ for definitions.
+     Report: pass/fail counts and coverage gaps.'
+     If QA fails, fix before proceeding. Append issues to .gsd-t/qa-issues.md.
+
+ Deviation rules:
+ - Bug blocking progress → fix, max 3 attempts; if still blocked, log to
+   .gsd-t/deferred-items.md and continue to next task
+ - Missing dependency the task requires → add minimum needed, document in commit message
+ - Non-trivial blocker → fix and log to .gsd-t/deferred-items.md
+ - Architectural change required → write NEEDS-APPROVAL to .gsd-t/deferred-items.md,
+   skip the task, continue — never self-approve structural changes
+
+ When all tasks are complete, report:
+ - Tasks completed: N/N
+ - Test results: pass/fail counts
+ - Commits made: list of commit hashes
+ - Deferred items (if any): list from .gsd-t/deferred-items.md"
+ ```
+
+ **After each domain subagent returns (orchestrator responsibilities):**
+ 1. Log to `.gsd-t/token-log.md` (see observability block above)
+ 2. Check `.gsd-t/deferred-items.md` for any `NEEDS-APPROVAL` entries — if found, STOP and present to user before spawning the next domain
+ 3. If a CHECKPOINT is reached per `integration-points.md`, verify contract compliance (see Step 4) before proceeding to the next wave/domain
+ 4. Update `.gsd-t/progress.md` with domain completion status
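The compaction arithmetic in the observability block above can be checked in isolation. The token values below are invented; the variable names and both formulas follow the block itself. COMPACTED is simplified to a flag here, where the doc records the end datetime.

```shell
#!/bin/sh
# Token accounting across a possible context compaction.
TOK_MAX=200000
TOK_START=150000   # tokens used when the subagent was spawned
TOK_END=20000      # tokens used when it returned; lower than TOK_START means a compaction happened
if [ "$TOK_END" -ge "$TOK_START" ]; then
  TOKENS=$((TOK_END-TOK_START)); COMPACTED=null
else
  # The run consumed the remainder of the old window, then TOK_END of the fresh one.
  TOKENS=$(((TOK_MAX-TOK_START)+TOK_END)); COMPACTED="detected"
fi
echo "TOKENS=$TOKENS COMPACTED=$COMPACTED"
```

With these sample values the run burned the remaining 50,000 tokens of the old window plus 20,000 of the new one, so 70,000 tokens are logged.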
 
  ### Team Mode (when agent teams are enabled)
  Spawn teammates for domains within the same wave. Only domains in the same wave can run in parallel — do not spawn teammates for domains in different waves simultaneously:
@@ -17,6 +17,63 @@ If `.gsd-t/` doesn't exist, create the full directory structure:
  └── progress.md
  ```
 
+ ## Step 1.5: Assumption Audit (MANDATORY — complete before domain work begins)
+
+ Before partitioning, surface and lock down all assumptions baked into the requirements. Unexamined assumptions become architectural decisions no one approved.
+
+ Work through each category below. For every match found, write the explicit disposition into the affected domain's `constraints.md` and into the Decision Log in `.gsd-t/progress.md`.
+
+ ---
+
+ ### Category 1: External Reference Assumptions
+
+ Scan requirements for any external project, file, component, library, or URL mentioned by name or path. For each one found, explicitly confirm which disposition applies — and lock it in the contract before any domain touches it:
+
+ | Disposition | Meaning |
+ |-------------|---------|
+ | `USE` | Import and depend on it — treat as a dependency |
+ | `INSPECT` | Read source for patterns only — do not import or copy code |
+ | `BUILD` | Build equivalent functionality from scratch — do not read or use it |
+
+ **No external reference survives partition without a locked disposition.**
+
+ Trigger phrases to watch for: "reference X", "like X", "similar to Y", "see W for how it handles Z", any file path or project name, any URL.
+
+ > If Level 3 (Full Auto): state the inferred disposition and reason; lock it unless it's ambiguous.
+ > If ambiguous (e.g., "reference X" could mean USE or INSPECT): pause and ask the user before proceeding.
+
+ ---
+
+ ### Category 3: Black Box Assumptions
+
+ Any component, module, or library **not written in this milestone** that a domain will call, import, or depend on → the agent that executes that domain must read its source before treating it as correct. This includes internal project modules written in a previous milestone.
+
+ For each such component identified:
+ 1. Name it explicitly in the domain's `constraints.md` under a `## Must Read Before Using` section
+ 2. List the specific functions or behaviors the domain depends on
+ 3. The execute agent is prohibited from treating it as a black box — it must read the listed items before implementing
+
+ ---
+
+ ### Category 4: User Intent Assumptions
+
+ Scan requirements for ambiguous language. Flag every instance where intent could be interpreted more than one way. Common patterns:
+
+ - "like X" / "similar to Y" — does this mean the same UX, the same architecture, or just the same concept?
+ - "the way X handles it" — inspiration, direct port, or behavioral equivalent?
+ - "reference Z" — does this mean read it, use it, or replicate it?
+ - "build something that does W" — from scratch, or using an existing library?
+ - Any requirement where a reasonable developer could make two different implementation choices
+
+ For each ambiguous item:
+ 1. State the two (or more) possible interpretations explicitly
+ 2. State which interpretation you are locking in and why
+ 3. If genuinely unclear: pause and ask the user — do not infer and proceed
+
+ > **Rule**: Ambiguous intent that reaches execute unresolved becomes a wrong assumption. Resolve it here or pay for it in debug sessions.
+
+ ---
+
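A rough first pass over the Category 1 and 4 trigger phrases can be mechanical. The pattern list and the sample requirement lines below are illustrative only; each match is just a candidate for the audit, and assigning USE / INSPECT / BUILD or resolving intent remains a judgment call.

```shell
#!/bin/sh
# Flag candidate trigger phrases in a requirements file for manual review.
# The sample requirements and the regex are invented for illustration.
REQ=$(mktemp)
cat > "$REQ" <<'EOF'
Build an uploader similar to ProjectX's dropzone.
Parse the config file at startup.
See https://example.com/spec for the handshake details.
EOF
# Lines 1 and 3 should surface; line 2 contains no trigger phrase.
grep -nEi 'reference|like |similar to|the way .* handles|https?://' "$REQ"
```

The matched lines then get an explicit disposition (Category 1) or a locked interpretation (Category 4) in the Decision Log.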
  ## Step 2: Identify Domains
 
  Decompose the milestone into 2-5 independent domains. Each domain should:
@@ -24,6 +24,37 @@ Run the full test audit directly:
 
  Verification cannot complete if any test fails or critical contract gaps remain.
 
+ ## Step 2.5: High-Risk Domain Gate (MANDATORY — Categories 2 and 7)
+
+ Before running standard verification dimensions, check whether this milestone involves any high-risk domain:
+
+ **High-risk domains**: audio capture/playback, GPU/WebGPU/WebGL, ML/inference/model loading, background workers, native APIs (camera, bluetooth, filesystem), IPC, WebAssembly, real-time data streams.
+
+ **If any high-risk domain is present:**
+
+ ### Category 2 — Technology Reliability Gate
+ Initialization success does not prove runtime correctness. These technologies can initialize cleanly and fail silently at runtime (compute shader errors, audio context state loss, worker message drops, inference failures).
+
+ For each high-risk domain:
+ 1. A **smoke test script** must exist that exercises actual runtime behavior — not just initialization
+ 2. The smoke test must have been run and passed
+ 3. "It initialized without throwing" is NOT a passing smoke test
+ 4. If no smoke test exists → create one now before proceeding with any other verification dimension
+ 5. Smoke test failure → verification FAIL (not WARN)
+
+ ### Category 7 — Manual QA as Test Gate
+ "The user will manually test it" is not a test artifact. Scan the milestone's domains for any feature whose acceptance criteria rely solely on manual user testing.
+
+ For each such feature:
+ 1. A smoke test script must exist that automates as much of the verification as possible
+ 2. Any remaining manual steps must be explicitly documented in `.gsd-t/smoke-tests/{feature}.md` with exact steps and expected outcomes
+ 3. The documented manual steps must have been executed and passed (noted in the file)
+ 4. If neither automated smoke test nor documented manual procedure exists → verification FAIL
+
+ > These gates exist because the pre-commit checklist "did you run the affected tests?" is meaningless when the only test is "user presses Ctrl+Space." That is not a test. It is hope.
+
+ ---
+
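The Category 2 distinction between initializing and actually working can be shown with a stub. Everything here is invented: `engine` stands in for any GPU/audio/ML component that starts cleanly but produces garbage at runtime.

```shell
#!/bin/sh
# A stub component that initializes fine but fails at runtime.
engine() {
  case "$1" in
    init)  return 0 ;;              # clean startup: proves nothing by itself
    infer) echo "NaN"; return 0 ;;  # silent runtime failure: bad output, exit 0
  esac
}

engine init && echo "init ok"       # an init-only check would stop here and "pass"

# A real smoke test asserts on runtime output, not on a clean exit code.
RESULT=$(engine infer)
case "$RESULT" in
  NaN|"") echo "SMOKE FAIL: runtime output invalid ($RESULT)" ;;
  *)      echo "SMOKE PASS: $RESULT" ;;
esac
```

The init check and the runtime exit code both report success; only the assertion on the actual output catches the failure, which is the gate's point.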
  ## Step 3: Define Verification Dimensions
 
  Standard dimensions (adjust based on project):
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@tekyzinc/gsd-t",
-   "version": "2.31.16",
+   "version": "2.31.18",
    "description": "GSD-T: Contract-Driven Development for Claude Code — 46 slash commands with backlog management, impact analysis, test sync, milestone archival, and PRD generation",
    "author": "Tekyz, Inc.",
    "license": "MIT",