@tekyzinc/gsd-t 2.31.16 → 2.31.18

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -17,6 +17,30 @@ If status is not VERIFIED:
 
  If `--force` flag provided, proceed with warning in archive.
 
+ ## Step 1.5: Smoke Test Artifact Gate (MANDATORY — Categories 2 and 7)
+
+ Before archiving, verify that high-risk features have testable artifacts. This gate catches what code review and unit tests cannot.
+
+ **Scan this milestone's domains for any of the following:**
+ - Audio capture/playback, speech recognition/synthesis
+ - GPU/WebGPU/WebGL compute or rendering
+ - ML inference, model loading, quantized model execution
+ - Background workers, service workers, IPC channels
+ - Native APIs (camera, bluetooth, filesystem, microphone)
+ - WebAssembly modules
+ - Any feature whose only prior "test" was manual user interaction
+
+ **For each high-risk feature found:**
+
+ 1. Check that a smoke test script exists (in `scripts/`, `tests/`, or `.gsd-t/smoke-tests/`)
+ 2. Check that the script was run and passed (evidence in token-log.md, CI output, or a `.gsd-t/smoke-tests/{feature}.md` file with run results)
+ 3. If manual steps remain unavoidable: `.gsd-t/smoke-tests/{feature}.md` must exist documenting exact steps and confirming they passed
+
+ **If any high-risk feature lacks a smoke test artifact → BLOCK completion.**
+ Do not proceed to archiving. Create the smoke test now, run it, confirm it passes, then continue.
+
+ > This gate exists because complete-milestone is the last opportunity to catch "shipped blind" features before they become user-facing bugs requiring 15 debug sessions to resolve.
+
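The artifact checks in Step 1.5 above can be sketched as a small shell gate. This is a minimal illustration, not part of the package: the `gate` helper and the `mic-capture` feature slug are invented; only the three search locations come from the step itself.

```shell
#!/bin/sh
# Sketch of the Step 1.5 artifact check for one feature.
# gate() succeeds if a smoke test script exists in any of the doc's three
# locations, or if a documented manual procedure exists under .gsd-t/smoke-tests/.
gate() {
  for f in "scripts/$1.sh" "tests/$1.sh" ".gsd-t/smoke-tests/$1.sh" ".gsd-t/smoke-tests/$1.md"; do
    if [ -e "$f" ]; then echo "OK: $f"; return 0; fi
  done
  echo "BLOCK: no smoke test artifact for $1"
  return 1
}

# Demo in a throwaway sandbox.
cd "$(mktemp -d)" || exit 1
mkdir -p scripts tests .gsd-t/smoke-tests
gate mic-capture || true                  # nothing exists yet, so the gate blocks
touch .gsd-t/smoke-tests/mic-capture.md   # document the manual procedure
gate mic-capture                          # artifact now present, gate passes
```

In real use a non-zero exit would halt the archive step until the artifact is created.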
  ## Step 2: Gap Analysis Gate
 
  After verification passes, run a gap analysis against `docs/requirements.md` scoped to this milestone's deliverables:
@@ -71,6 +71,27 @@ The contract didn't specify something it should have. Symptoms:
 
  → Update the contract, then fix implementations on both sides.
 
+ ## Step 2.5: Reproduce First (MANDATORY — Category 5)
+
+ **A fix attempt without a reproduction script is a guess, not a fix.**
+
+ Before touching any code:
+
+ 1. **Write a reproduction script** that demonstrates the bug. Automate as much as possible:
+    - Unit/integration bug → write a failing test that proves the bug exists
+    - UI/audio/GPU/worker bug (not fully automatable) → write the closest possible script: a headless probe, a log-based trigger, a mock that replicates the failure path. Document the manual remainder explicitly.
+    - If you cannot write any form of reproduction → you do not yet understand the bug. Keep investigating until you can.
+
+ 2. **Run the reproduction** and confirm it fails before attempting any fix.
+
+ 3. **Never close a debug session with "ready for testing."** A session closes only when the reproduction script passes. If manual steps remain, document them explicitly and confirm they passed.
+
+ 4. **Log the reproduction script path** in `.gsd-t/progress.md` Decision Log: what it tests, how to run it, what passing looks like.
+
+ > This rule exists because code review cannot detect silent runtime failures (GPU compute shaders, audio context state, worker message drops). Only execution proves correctness.
+
+ ---
+
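A reproduction script in the Step 2.5 sense can be very small: it only has to fail while the bug exists and pass once the bug is fixed. The sketch below uses an invented `slugify` bug; every name in it is hypothetical.

```shell
#!/bin/sh
# Hypothetical reproduction script: two distinct titles collapse to one slug.
# The buggy slugify() strips digits along with punctuation.
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z'
}

A=$(slugify "Report 1")
B=$(slugify "Report 2")
if [ "$A" = "$B" ]; then
  STATUS=1; echo "REPRO: both titles slug to '$A', bug present"
else
  STATUS=0; echo "PASS: slugs are distinct"
fi
# In real use the script would exit $STATUS; the debug session closes
# only once this prints PASS.
```

Running it before any fix confirms the failure (step 2 of the rule); the logged path and "what passing looks like" go into the Decision Log (step 4).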
  ## Step 3: Debug (Solo or Team)
 
  ### Deviation Rules
@@ -84,13 +105,14 @@ When you encounter unexpected situations during the fix:
  **3-attempt limit**: If your fix doesn't work after 3 attempts, log to `.gsd-t/deferred-items.md` and stop trying.
 
  ### Solo Mode
- 1. Reproduce the issue
+ 1. Reproduce the issue — **reproduction script must exist before step 2** (see Step 2.5)
  2. Trace through the relevant domain(s)
  3. Check contract compliance at each boundary
  4. Identify root cause
  5. **Destructive Action Guard**: If the fix requires destructive or structural changes (dropping tables, removing columns, changing schema, replacing architecture patterns, removing working modules) → STOP and present the change to the user with what exists, what will change, what will break, and a safe migration path. Wait for explicit approval.
  6. Fix and test — **adapt the fix to existing structures**, not the other way around
  7. Update contracts if needed
+ 8. **Category 6 — Bug Isolation Check**: After applying the fix, run the FULL test suite and all smoke tests — not just the reproduction script. Do not assume the bug was isolated. A fix that resolves one failure frequently uncovers adjacent failures. Every test must pass before the session closes.
 
  ### Team Mode (for complex cross-domain bugs)
  ```
@@ -19,10 +19,11 @@ Identify:
  - Which tasks are unblocked (no pending dependencies)
  - Which tasks are blocked (waiting on checkpoints)
 
- ## Step 2: QA Setup
+ ## Step 2: QA Subagent
 
- After completing each task, spawn a QA subagent via the Task tool to run tests:
+ In solo mode, QA runs inside each domain subagent (see Step 3). In team mode, the lead spawns QA subagents at each domain checkpoint using the pattern below.
 
+ **QA subagent prompt:**
  ```
  Task subagent (general-purpose, model: haiku):
  "Run the full test suite for this project and report pass/fail counts.
@@ -31,21 +32,9 @@ Write edge case tests for any new code paths in this task: {task description}.
  Report: test pass/fail status and any coverage gaps found."
  ```
 
- **OBSERVABILITY LOGGING (MANDATORY):**
- Before spawning — run via Bash:
- `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
- After subagent returns — run via Bash:
- `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
- Compute tokens and compaction:
- - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
- - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
- Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
- `| {DT_START} | {DT_END} | gsd-t-execute | Step 2 | haiku | {DURATION}s | task: {task-name}, {pass/fail} | {TOKENS} | {COMPACTED} |`
  If QA found issues, append each to `.gsd-t/qa-issues.md` (create with header `| Date | Command | Step | Model | Duration(s) | Severity | Finding |` if missing):
  `| {DT_START} | gsd-t-execute | Step 2 | haiku | {DURATION}s | {severity} | {finding} |`
 
- QA failure on any task blocks proceeding to the next task.
-
  ## Step 3: Choose Execution Mode
 
  ### Wave Scheduling (read first)
@@ -60,42 +49,81 @@ Before choosing solo or team mode, read the `## Wave Execution Groups` section i
 
  **If no wave groups are defined** (older plans): fall back to the `Execution Order` list.
 
- ### Solo Mode (default)
- Execute tasks yourself following the wave groups (or execution order) in `integration-points.md`.
-
- ### Deviation Rules (MANDATORY for every task)
-
- When you encounter unexpected situations during a task:
-
- 1. **Bug in existing code blocking progress** → Fix it immediately, up to 3 attempts. If still blocked after 3 attempts, add to `.gsd-t/deferred-items.md` and move to the next task.
- 2. **Missing functionality that the task clearly requires** → Add the minimum required code to unblock the task. Do not gold-plate. Document in commit message.
- 3. **Blocker (missing file, wrong API, broken dependency)** → Fix the blocker and continue. Add a note to `.gsd-t/deferred-items.md` if the fix was more than trivial.
- 4. **Architectural change required** → STOP immediately. Do NOT proceed. Apply the Destructive Action Guard: explain what exists, what needs to change, what breaks, and a migration path. Wait for explicit user approval.
-
- **3-attempt limit**: If any fix attempt fails 3 times, move on — don't loop. Log to `.gsd-t/deferred-items.md`.
-
- For each task:
- 1. Read the task description, files list, and contract refs
- 2. Read the relevant contract(s) — implement EXACTLY what they specify
- 3. Read the domain's constraints.md — follow all patterns
- 4. **Destructive Action Guard**: Before implementing, check if the task involves any destructive or structural changes (DROP TABLE, schema changes that lose data, removing existing modules, replacing architecture patterns). If YES → STOP and present the change to the user with what exists, what will change, what will break, and a safe migration path. Wait for explicit approval before proceeding.
- 5. Implement the task
- 6. Verify acceptance criteria are met
- 7. **Write comprehensive tests for every new or changed code path** (MANDATORY — no exceptions):
-    a. **Unit/integration tests**: Cover the happy path, common edge cases, error cases, and boundary conditions for every new or modified function
-    b. **Playwright E2E tests** (if UI/routes/flows/modes changed): Detect `playwright.config.*` or Playwright in dependencies. If present:
-       - Create NEW specs for new features, pages, modes, or flows — not just update existing ones
-       - Cover: happy path, form validation errors, empty states, loading states, error states, responsive breakpoints
-       - Cover: all feature modes/flags (e.g., if `--component` mode was added, test `--component` mode specifically)
-       - Cover: common edge cases (network errors, invalid input, rapid clicking, back/forward navigation)
-       - Use descriptive test names that explain the scenario being tested
-    c. If no test framework exists: set one up as part of the task (at minimum, Playwright for E2E)
-    d. **"No feature code without test code"** — implementation and tests are ONE deliverable, not two separate steps
- 8. **Run ALL tests** — unit, integration, and full Playwright suite. Fix any failures before proceeding (up to 2 attempts)
- 9. Run the Pre-Commit Gate checklist from CLAUDE.md — update ALL affected docs BEFORE committing
- 10. **Commit immediately** after each task: `feat({domain}/task-{N}): {description}` — do NOT batch commits at phase end
- 11. Update `.gsd-t/progress.md` — mark task complete
- 12. If you've reached a CHECKPOINT in integration-points.md, pause and verify the contract before continuing
+ ### Solo Mode (default) — Domain Subagent Pattern
+
+ Each domain's work runs in an isolated Task subagent with a fresh context window. The orchestrator (this agent) stays lightweight — it only spawns subagents, collects summaries, verifies checkpoints, and updates progress.
+
+ **OBSERVABILITY LOGGING (MANDATORY) — repeat for every domain subagent spawn:**
+
+ Before spawning — run via Bash:
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+
+ After subagent returns — run via Bash:
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+
+ Compute tokens and compaction:
+ - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
+ - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+
+ Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
+ `| {DT_START} | {DT_END} | gsd-t-execute | domain:{domain-name} | sonnet | {DURATION}s | {N} tasks, {pass/fail} | {TOKENS} | {COMPACTED} |`
+
+ **For each domain (in wave order), spawn:**
+
+ ```
+ Task subagent (general-purpose, model: sonnet, mode: bypassPermissions):
+ "You are executing all tasks for the {domain-name} domain.
+
+ Read before starting (load your own context — do not assume anything):
+ 1. CLAUDE.md — project conventions (CRITICAL)
+ 2. .gsd-t/domains/{domain-name}/scope.md — what you own
+ 3. .gsd-t/domains/{domain-name}/constraints.md — patterns to follow
+ 4. ALL files in .gsd-t/contracts/ — your interfaces
+ 5. .gsd-t/domains/{domain-name}/tasks.md — your task list
+ 6. .gsd-t/contracts/integration-points.md — wave order and checkpoints
+
+ Execute each incomplete task in order:
+ 1. Read task description, files list, and contract refs
+ 2. Read relevant contracts — implement EXACTLY what they specify
+ 3. Destructive Action Guard: if task involves DROP TABLE, schema changes that lose
+    data, removing working modules, or replacing architecture patterns → write a
+    NEEDS-APPROVAL entry to .gsd-t/deferred-items.md, skip the task, continue
+ 4. Implement the task
+ 5. Verify acceptance criteria are met
+ 6. Write comprehensive tests (MANDATORY — no feature code without test code):
+    - Unit/integration: happy path + edge cases + error cases for every new/changed function
+    - Playwright E2E (if UI/routes/flows changed): new specs for new features, cover
+      all modes, form validation, empty/loading/error states, common edge cases
+    - If no test framework exists: set one up as part of this task
+ 7. Run ALL tests — unit, integration, Playwright. Fix failures (up to 2 attempts)
+ 8. Run Pre-Commit Gate checklist from CLAUDE.md — update all affected docs BEFORE committing
+ 9. Commit immediately: feat({domain-name}/task-{N}): {description}
+ 10. Update .gsd-t/progress.md — mark task complete
+ 11. Spawn QA subagent (model: haiku) after each task:
+     'Run the full test suite. Read .gsd-t/contracts/ for definitions.
+     Report: pass/fail counts and coverage gaps.'
+     If QA fails, fix before proceeding. Append issues to .gsd-t/qa-issues.md.
+
+ Deviation rules:
+ - Bug blocking progress → fix, max 3 attempts; if still blocked, log to
+   .gsd-t/deferred-items.md and continue to next task
+ - Missing dependency the task requires → add minimum needed, document in commit message
+ - Non-trivial blocker → fix and log to .gsd-t/deferred-items.md
+ - Architectural change required → write NEEDS-APPROVAL to .gsd-t/deferred-items.md,
+   skip the task, continue — never self-approve structural changes
+
+ When all tasks are complete, report:
+ - Tasks completed: N/N
+ - Test results: pass/fail counts
+ - Commits made: list of commit hashes
+ - Deferred items (if any): list from .gsd-t/deferred-items.md"
+ ```
+
+ **After each domain subagent returns (orchestrator responsibilities):**
+ 1. Log to `.gsd-t/token-log.md` (see observability block above)
+ 2. Check `.gsd-t/deferred-items.md` for any `NEEDS-APPROVAL` entries — if found, STOP and present to user before spawning the next domain
+ 3. If a CHECKPOINT is reached per `integration-points.md`, verify contract compliance (see Step 4) before proceeding to the next wave/domain
+ 4. Update `.gsd-t/progress.md` with domain completion status
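The compaction arithmetic in the observability block above can be checked in isolation. The token values below are invented; the variable names and both formulas follow the block itself. COMPACTED is simplified to a flag here, where the doc records the end datetime.

```shell
#!/bin/sh
# Token accounting across a possible context compaction.
TOK_MAX=200000
TOK_START=150000   # tokens used when the subagent was spawned
TOK_END=20000      # tokens used when it returned; lower than TOK_START means a compaction happened
if [ "$TOK_END" -ge "$TOK_START" ]; then
  TOKENS=$((TOK_END-TOK_START)); COMPACTED=null
else
  # The run consumed the remainder of the old window, then TOK_END of the fresh one.
  TOKENS=$(((TOK_MAX-TOK_START)+TOK_END)); COMPACTED="detected"
fi
echo "TOKENS=$TOKENS COMPACTED=$COMPACTED"
```

With these sample values the run burned the remaining 50,000 tokens of the old window plus 20,000 of the new one, so 70,000 tokens are logged.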
 
  ### Team Mode (when agent teams are enabled)
  Spawn teammates for domains within the same wave. Only domains in the same wave can run in parallel — do not spawn teammates for domains in different waves simultaneously:
@@ -17,6 +17,63 @@ If `.gsd-t/` doesn't exist, create the full directory structure:
  └── progress.md
  ```
 
+ ## Step 1.5: Assumption Audit (MANDATORY — complete before domain work begins)
+
+ Before partitioning, surface and lock down all assumptions baked into the requirements. Unexamined assumptions become architectural decisions no one approved.
+
+ Work through each category below. For every match found, write the explicit disposition into the affected domain's `constraints.md` and into the Decision Log in `.gsd-t/progress.md`.
+
+ ---
+
+ ### Category 1: External Reference Assumptions
+
+ Scan requirements for any external project, file, component, library, or URL mentioned by name or path. For each one found, explicitly confirm which disposition applies — and lock it in the contract before any domain touches it:
+
+ | Disposition | Meaning |
+ |-------------|---------|
+ | `USE` | Import and depend on it — treat as a dependency |
+ | `INSPECT` | Read source for patterns only — do not import or copy code |
+ | `BUILD` | Build equivalent functionality from scratch — do not read or use it |
+
+ **No external reference survives partition without a locked disposition.**
+
+ Trigger phrases to watch for: "reference X", "like X", "similar to Y", "see W for how it handles Z", any file path or project name, any URL.
+
+ > If Level 3 (Full Auto): state the inferred disposition and reason; lock it unless it's ambiguous.
+ > If ambiguous (e.g., "reference X" could mean USE or INSPECT): pause and ask the user before proceeding.
+
+ ---
+
+ ### Category 3: Black Box Assumptions
+
+ Any component, module, or library **not written in this milestone** that a domain will call, import, or depend on → the agent that executes that domain must read its source before treating it as correct. This includes internal project modules written in a previous milestone.
+
+ For each such component identified:
+ 1. Name it explicitly in the domain's `constraints.md` under a `## Must Read Before Using` section
+ 2. List the specific functions or behaviors the domain depends on
+ 3. The execute agent is prohibited from treating it as a black box — it must read the listed items before implementing
+
+ ---
+
+ ### Category 4: User Intent Assumptions
+
+ Scan requirements for ambiguous language. Flag every instance where intent could be interpreted more than one way. Common patterns:
+
+ - "like X" / "similar to Y" — does this mean the same UX, the same architecture, or just the same concept?
+ - "the way X handles it" — inspiration, direct port, or behavioral equivalent?
+ - "reference Z" — does this mean read it, use it, or replicate it?
+ - "build something that does W" — from scratch, or using an existing library?
+ - Any requirement where a reasonable developer could make two different implementation choices
+
+ For each ambiguous item:
+ 1. State the two (or more) possible interpretations explicitly
+ 2. State which interpretation you are locking in and why
+ 3. If genuinely unclear: pause and ask the user — do not infer and proceed
+
+ > **Rule**: Ambiguous intent that reaches execute unresolved becomes a wrong assumption. Resolve it here or pay for it in debug sessions.
+
+ ---
+
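A rough first pass over the Category 1 and 4 trigger phrases can be mechanical. The pattern list and the sample requirement lines below are illustrative only; each match is just a candidate for the audit, and assigning USE / INSPECT / BUILD or resolving intent remains a judgment call.

```shell
#!/bin/sh
# Flag candidate trigger phrases in a requirements file for manual review.
# The sample requirements and the regex are invented for illustration.
REQ=$(mktemp)
cat > "$REQ" <<'EOF'
Build an uploader similar to ProjectX's dropzone.
Parse the config file at startup.
See https://example.com/spec for the handshake details.
EOF
# Lines 1 and 3 should surface; line 2 contains no trigger phrase.
grep -nEi 'reference|like |similar to|the way .* handles|https?://' "$REQ"
```

The matched lines then get an explicit disposition (Category 1) or a locked interpretation (Category 4) in the Decision Log.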
  ## Step 2: Identify Domains
 
  Decompose the milestone into 2-5 independent domains. Each domain should:
@@ -24,6 +24,37 @@ Run the full test audit directly:
 
  Verification cannot complete if any test fails or critical contract gaps remain.
 
+ ## Step 2.5: High-Risk Domain Gate (MANDATORY — Categories 2 and 7)
+
+ Before running standard verification dimensions, check whether this milestone involves any high-risk domain:
+
+ **High-risk domains**: audio capture/playback, GPU/WebGPU/WebGL, ML/inference/model loading, background workers, native APIs (camera, bluetooth, filesystem), IPC, WebAssembly, real-time data streams.
+
+ **If any high-risk domain is present:**
+
+ ### Category 2 — Technology Reliability Gate
+ Initialization success does not prove runtime correctness. These technologies can initialize cleanly and fail silently at runtime (compute shader errors, audio context state loss, worker message drops, inference failures).
+
+ For each high-risk domain:
+ 1. A **smoke test script** must exist that exercises actual runtime behavior — not just initialization
+ 2. The smoke test must have been run and passed
+ 3. "It initialized without throwing" is NOT a passing smoke test
+ 4. If no smoke test exists → create one now before proceeding with any other verification dimension
+ 5. Smoke test failure → verification FAIL (not WARN)
+
+ ### Category 7 — Manual QA as Test Gate
+ "The user will manually test it" is not a test artifact. Scan the milestone's domains for any feature whose acceptance criteria rely solely on manual user testing.
+
+ For each such feature:
+ 1. A smoke test script must exist that automates as much of the verification as possible
+ 2. Any remaining manual steps must be explicitly documented in `.gsd-t/smoke-tests/{feature}.md` with exact steps and expected outcomes
+ 3. The documented manual steps must have been executed and passed (noted in the file)
+ 4. If neither automated smoke test nor documented manual procedure exists → verification FAIL
+
+ > These gates exist because the pre-commit checklist "did you run the affected tests?" is meaningless when the only test is "user presses Ctrl+Space." That is not a test. It is hope.
+
+ ---
+
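The Category 2 distinction between initializing and actually working can be shown with a stub. Everything here is invented: `engine` stands in for any GPU/audio/ML component that starts cleanly but produces garbage at runtime.

```shell
#!/bin/sh
# A stub component that initializes fine but fails at runtime.
engine() {
  case "$1" in
    init)  return 0 ;;              # clean startup: proves nothing by itself
    infer) echo "NaN"; return 0 ;;  # silent runtime failure: bad output, exit 0
  esac
}

engine init && echo "init ok"       # an init-only check would stop here and "pass"

# A real smoke test asserts on runtime output, not on a clean exit code.
RESULT=$(engine infer)
case "$RESULT" in
  NaN|"") echo "SMOKE FAIL: runtime output invalid ($RESULT)" ;;
  *)      echo "SMOKE PASS: $RESULT" ;;
esac
```

The init check and the runtime exit code both report success; only the assertion on the actual output catches the failure, which is the gate's point.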
  ## Step 3: Define Verification Dimensions
 
  Standard dimensions (adjust based on project):
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@tekyzinc/gsd-t",
-   "version": "2.31.16",
+   "version": "2.31.18",
    "description": "GSD-T: Contract-Driven Development for Claude Code — 46 slash commands with backlog management, impact analysis, test sync, milestone archival, and PRD generation",
    "author": "Tekyz, Inc.",
    "license": "MIT",