npm - @tekyzinc/gsd-t - Versions diffs - 2.39.13 → 2.45.11 - Mend

@tekyzinc/gsd-t 2.39.13 → 2.45.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (45) hide show

package/README.md +17 -9
package/bin/desktop.ini +2 -0
package/bin/global-sync-manager.js +350 -0
package/bin/gsd-t.js +592 -2
package/bin/metrics-collector.js +167 -0
package/bin/metrics-rollup.js +200 -0
package/bin/patch-lifecycle.js +195 -0
package/bin/rule-engine.js +160 -0
package/commands/desktop.ini +2 -0
package/commands/gsd-t-complete-milestone.md +192 -5
package/commands/gsd-t-debug.md +16 -2
package/commands/gsd-t-execute.md +257 -52
package/commands/gsd-t-help.md +25 -10
package/commands/gsd-t-integrate.md +35 -7
package/commands/gsd-t-metrics.md +143 -0
package/commands/gsd-t-plan.md +49 -2
package/commands/gsd-t-quick.md +15 -3
package/commands/gsd-t-status.md +78 -0
package/commands/gsd-t-test-sync.md +2 -2
package/commands/gsd-t-verify.md +140 -9
package/commands/gsd-t-visualize.md +11 -1
package/commands/gsd-t-wave.md +34 -19
package/docs/GSD-T-README.md +9 -6
package/docs/architecture.md +84 -2
package/docs/ci-examples/desktop.ini +2 -0
package/docs/ci-examples/github-actions.yml +104 -0
package/docs/ci-examples/gitlab-ci.yml +116 -0
package/docs/desktop.ini +2 -0
package/docs/infrastructure.md +87 -1
package/docs/prd-graph-engine.md +2 -2
package/docs/prd-gsd2-hybrid.md +258 -135
package/docs/requirements.md +63 -2
package/examples/.gsd-t/contracts/desktop.ini +2 -0
package/examples/.gsd-t/desktop.ini +2 -0
package/examples/.gsd-t/domains/desktop.ini +2 -0
package/examples/.gsd-t/domains/example-domain/desktop.ini +2 -0
package/examples/desktop.ini +2 -0
package/examples/rules/.gitkeep +0 -0
package/package.json +40 -40
package/scripts/desktop.ini +2 -0
package/scripts/gsd-t-dashboard-server.js +19 -2
package/scripts/gsd-t-dashboard.html +63 -0
package/scripts/gsd-t-event-writer.js +1 -0
package/templates/CLAUDE-global.md +30 -9
package/templates/desktop.ini +2 -0

package/commands/gsd-t-metrics.md ADDED Viewed

@@ -0,0 +1,143 @@
+# GSD-T: Metrics — View Task Telemetry and Process Health
+You are displaying metrics data from the GSD-T telemetry system. Read JSONL files directly — no module imports needed.
+## Step 1: Load Metrics Data
+Read:
+1. `.gsd-t/metrics/task-metrics.jsonl` — per-task telemetry records
+2. `.gsd-t/metrics/rollup.jsonl` — per-milestone aggregation with ELO and heuristics
+3. `.gsd-t/progress.md` — current milestone ID (for default filter)
+If neither file exists: display "No metrics data yet. Metrics are collected automatically during execute, quick, and debug commands." and stop.
+If `$ARGUMENTS` contains a milestone ID (e.g., "M25"), use that as the filter. Otherwise, use the current active milestone from progress.md.
+## Step 2: Display Milestone Summary
+From `rollup.jsonl`, find the entry matching the target milestone. Display:
+```
+## Metrics — {milestone}
+| Metric              | Value                          |
+|---------------------|--------------------------------|
+| Tasks               | {total_tasks}                  |
+| First-pass rate     | {first_pass_rate * 100}%       |
+| Avg duration        | {avg_duration_s}s              |
+| Avg context         | {avg_context_pct}%             |
+| Total fix cycles    | {total_fix_cycles}             |
+| Total tokens        | {total_tokens}                 |
+```
+If no rollup entry exists for the milestone, compute summary directly from task-metrics.jsonl records.
+## Step 3: Display Process ELO
+```
+## Process ELO
+{elo_after} ({elo_delta > 0 ? '↑' : '↓'} {elo_delta} from {elo_before})
+```
+If no previous milestone, show: `{elo_after} (baseline — first milestone)`
+## Step 4: Display Signal Distribution
+From rollup `signal_distribution` or computed from task-metrics:
+```
+## Signal Distribution
+| Signal Type      | Count |
+|------------------|-------|
+| pass-through     | {N}   |
+| fix-cycle        | {N}   |
+| debug-invoked    | {N}   |
+| user-correction  | {N}   |
+| phase-skip       | {N}   |
+```
+## Step 5: Display Domain Breakdown
+From rollup `domain_breakdown`:
+```
+## Domain Breakdown
+| Domain            | Tasks | Pass% | Avg Duration |
+|-------------------|-------|-------|--------------|
+| {domain}          | {N}   | {N}%  | {N}s         |
+```
+## Step 6: Display Trend Comparison
+If `trend_delta` exists (previous milestone data available):
+```
+## Trend vs Previous Milestone
+| Metric          | Delta                    |
+|-----------------|--------------------------|
+| First-pass rate | {delta > 0 ? '↑' : '↓'} {delta}% |
+| Avg duration    | {delta > 0 ? '↑' : '↓'} {delta}s |
+| ELO             | {delta > 0 ? '↑' : '↓'} {delta}  |
+```
+If no previous milestone: "First milestone — no trend data yet."
+## Step 7: Display Heuristic Anomalies
+If `heuristic_flags` has entries:
+```
+## Anomaly Detection
+| Heuristic                     | Severity | Description              |
+|-------------------------------|----------|--------------------------|
+| {heuristic}                   | {sev}    | {description}            |
+```
+If no anomalies: "No anomalies detected."
+## Step 8: Cross-Project Comparison (when --cross-project flag present)
+If `$ARGUMENTS` contains `--cross-project`:
+1. Run via Bash:
+   ```bash
+   node -e "const g = require('./bin/global-sync-manager.js'); const r = g.compareSignalDistributions(require('./package.json').name || require('path').basename(process.cwd())); console.log(JSON.stringify(r));" 2>/dev/null
+   ```
+2. If the result has `insufficient_data: true`, display:
+   "No global metrics yet — complete milestones in multiple projects to enable cross-project comparison"
+3. Otherwise, display the cross-project signal distribution comparison:
+   ```
+   ## Cross-Project Signal Distribution
+   | Project              | Tasks | Pass-Through | Fix-Cycle | Other |
+   |----------------------|-------|--------------|-----------|-------|
+   | {project} {★ if is_queried} | {N} | {rate}    | {rate}    | ...   |
+   ```
+4. If `$ARGUMENTS` also contains `--domain {domainType}`, run:
+   ```bash
+   node -e "const g = require('./bin/global-sync-manager.js'); const r = g.getDomainTypeComparison('{domainType}'); console.log(JSON.stringify(r));" 2>/dev/null
+   ```
+   Display the domain-type comparison table:
+   ```
+   ## Domain-Type Comparison: {domainType}
+   | Project              | Tasks | Pass-Through | Fix-Cycle |
+   |----------------------|-------|--------------|-----------|
+   | {project}            | {N}   | {count}      | {count}   |
+   ```
+If `--cross-project` is NOT in `$ARGUMENTS`: skip this step entirely (no change to existing behavior).
+$ARGUMENTS
+## Auto-Clear
+All work is committed to project files. Execute `/clear` to free the context window for the next command.

package/commands/gsd-t-plan.md CHANGED Viewed

@@ -25,6 +25,24 @@ If `.gsd-t/graph/meta.json` exists (graph index is available):
 If graph is not available, skip this step.
+## Step 1.7: Pre-Mortem — Historical Failure Analysis
+Before creating task lists, check historical task-metrics for domain-level failure patterns from previous milestones:
+1. Run via Bash:
+   `node -e "const c = require('./bin/metrics-collector.js'); const domains = [/* list domain names from scope files */]; domains.forEach(d => { const w = c.getPreFlightWarnings(d); if(w.length) w.forEach(x => console.log('⚠️ ' + x)); });" 2>/dev/null || true`
+2. If any domain has `first_pass_rate < 0.6` historically:
+   - Display warning inline: `⚠️ Domain {name} has historically low first-pass rate ({rate}%). Consider: smaller tasks, more explicit acceptance criteria, or additional contract detail.`
+   - This is **non-blocking** — it informs task design, does not prevent planning.
+3. If `.gsd-t/metrics/task-metrics.jsonl` does not exist: skip this step silently (first milestone, no historical data).
+4. **Rule-based pre-mortem**: Run via Bash:
+   `node -e "const re = require('./bin/rule-engine.js'); const domains = [/* list domain names */]; domains.forEach(d => { const rules = re.getPreMortemRules(d); if(rules.length) rules.forEach(r => console.log('RULE ' + r.id + ': ' + r.name + ' — historically triggered for domains like ' + d)); });" 2>/dev/null || true`
+   If matching rules found: display warnings inline (non-blocking — informs task design). Falls back gracefully if rules.jsonl does not exist or is empty.
 ## Step 2: Create Task Lists Per Domain
 ### SharedCore-First Pre-Check
@@ -67,6 +85,30 @@ For each domain, write `.gsd-t/domains/{domain-name}/tasks.md`:
 4. **Contract-bound**: Every task that crosses a domain boundary must reference the specific contract it implements
 5. **Ordered**: Tasks within a domain are numbered in execution order
 6. **No implicit knowledge**: Don't assume the executing agent remembers previous tasks — reference contracts and files explicitly
+7. **Context-window fit**: Each task MUST be executable within a single context window. Apply the scope validation heuristics below.
+### Task Scope Validation
+After writing each task, apply this heuristic check before finalizing:
+**Splitting candidates — flag if ANY of these are true:**
+- Task lists **more than 5 files** to modify or create
+- Task has **more than 3 complex dependencies** (other tasks, contracts, or external systems it must read and understand)
+- Task description spans multiple distinct concerns (e.g., "implement X and also refactor Y and update Z docs")
+**Warning threshold:** If a task is flagged, emit:
+> ⚠️ **Task scope warning — {domain} Task {N}**: Estimated context load is high ({N} files, {N} dependencies). This task may approach the 70% context window threshold. Consider splitting into:
+> - Task {N}a: {first concern}
+> - Task {N}b: {second concern}
+**Auto-split rule (Level 3 Full Auto):** If a task has >5 files AND >3 dependencies, split it automatically. Renumber subsequent tasks. Document the split rationale in the task's Dependencies field.
+**Guidance for estimating context size:**
+- Each file to read ≈ 1–5% of context window (varies by file size)
+- CLAUDE.md + scope.md + constraints.md + contracts ≈ 15–25% baseline overhead
+- Tasks with >5 files or >3 cross-domain contracts commonly exceed 70% total context
+This rule implements the "task must fit in one context window" constraint — a task that compacts its subagent is a task that produces incomplete or corrupt output.
 ### Cross-Domain Duplicate Operation Scan
@@ -259,8 +301,13 @@ After subagent returns — run via Bash:
 Compute tokens and compaction:
 - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
 - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-plan | Step 7 | haiku | {DURATION}s | {PASS/FAIL}, iteration {N} | {TOKENS} | {COMPACTED} |`
+Compute context utilization — run via Bash:
+`if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
+Alert on context thresholds (display to user inline):
+- If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
+- If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-plan | Step 7 | haiku | {DURATION}s | {PASS/FAIL}, iteration {N} | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
 If validation FAIL, append each gap to `.gsd-t/qa-issues.md` (create with header `| Date | Command | Step | Model | Duration(s) | Severity | Finding |` if missing):
 `| {DT_START} | gsd-t-plan | Step 7 | haiku | {DURATION}s | medium | {gap description} |`

package/commands/gsd-t-quick.md CHANGED Viewed

@@ -83,6 +83,16 @@ When you encounter unexpected situations:
 5. Verify it works
 6. Commit: `[quick] {description}`
+## Step 3.5: Emit Task Metrics
+After committing, emit a task-metrics record for this quick task — run via Bash:
+`node bin/metrics-collector.js --milestone {current-milestone-or-none} --domain {domain-or-quick} --task quick-{timestamp} --command quick --duration_s {elapsed} --tokens_used {estimated} --context_pct ${CTX_PCT:-0} --pass {true|false} --fix_cycles {0|N} --signal_type {pass-through|fix-cycle} --notes "[quick] {description}" 2>/dev/null || true`
+Signal type: `pass-through` if task completed on first attempt; `fix-cycle` if rework was needed.
+Emit task_complete event — run via Bash:
+`node ~/.claude/scripts/gsd-t-event-writer.js --type task_complete --command gsd-t-quick --reasoning "signal_type={signal_type}, domain={domain}" --outcome {success|failure} || true`
 ## Step 4: Document Ripple (if GSD-T is active)
 If `.gsd-t/progress.md` exists, assess what documentation was affected and update ALL relevant files:
@@ -123,9 +133,11 @@ Quick does not mean skip testing. Before committing:
    - Playwright E2E specs (if UI/routes/flows/modes changed): create new specs for new functionality, update existing specs for changed behavior
    - Cover all modes/flags affected by this change
    - "No feature code without test code" applies to quick tasks too
-2. **Run the FULL test suite** — not just affected tests:
-   - All unit/integration tests
-   - Full Playwright E2E suite (if configured)
+2. **Run ALL configured test suites** — not just affected tests, not just one suite:
+   a. Detect all runners: check for vitest/jest config, playwright.config.*, cypress.config.*
+   b. Run EVERY detected suite. Unit tests alone are NEVER sufficient when E2E exists.
+   c. If `playwright.config.*` exists → `npx playwright test` (full suite)
+   d. Report ALL results: "Unit: X/Y pass | E2E: X/Y pass"
    - Fix any failures before proceeding (up to 2 attempts)
 3. **Verify against requirements**:
    - Does the change satisfy its intended requirement?

package/commands/gsd-t-status.md CHANGED Viewed

@@ -57,6 +57,59 @@ If `.gsd-t/backlog.md` exists, read and parse it. Show total count and top 3 ite
 If there are blockers or issues, highlight them.
 If the user provides $ARGUMENTS, focus the status on that specific domain or aspect.
+## Token Usage Breakdown
+If `.gsd-t/token-log.md` exists, read it and append a token breakdown to the status report.
+Parse each row in the table. Handle both old format (9 columns) and extended format (12 columns with Domain, Task, Ctx%). Rows with missing or empty Domain column are assigned domain "(untagged)".
+### Token Usage by Domain
+Group rows by Domain. For each domain, sum Tokens and collect all Ctx% values (ignoring "N/A" and empty). Display:
+```
+## Token Usage by Domain
+| Domain         | Tokens | Subagents | Peak Ctx% |
+|----------------|--------|-----------|-----------|
+| auth           | 12,400 | 4         | 14%       |
+| notifications  | 45,200 | 3         | 52% ⚠️    |
+| (untagged)     | 8,100  | 6         | N/A       |
+```
+Flag any domain where Peak Ctx% >= 70 with `⚠️` suffix.
+### Token Usage by Phase/Command
+Group rows by Command. For each command, sum Tokens and count subagent rows. Display:
+```
+## Token Usage by Command
+| Command       | Tokens | Subagents |
+|---------------|--------|-----------|
+| gsd-t-execute | 86,200 | 14        |
+| gsd-t-wave    | 12,400 | 9         |
+| gsd-t-plan    | 3,400  | 1         |
+```
+If token-log.md does not exist or is empty, skip this section entirely (no error).
+## Process Health
+If `.gsd-t/metrics/rollup.jsonl` exists, read the latest entry and append to the status report:
+```
+Process Health:
+  ELO: {elo_after} ({elo_delta > 0 ? '↑' : '↓'} {elo_delta})
+  Quality: {first_pass_rate * 100}% first-pass rate | {total_fix_cycles} fix cycles
+```
+If `.gsd-t/metrics/task-metrics.jsonl` exists but no rollup.jsonl, compute first_pass_rate directly from task-metrics for the current milestone and display:
+```
+Process Health:
+  Quality: {rate}% first-pass rate (current milestone, no rollup yet)
+```
+If neither file exists, skip this section entirely.
 ## Graph Status
 If `.gsd-t/graph/meta.json` exists, read it and append to the status report:
@@ -81,6 +134,31 @@ After displaying the project status, check for GSD-T updates:
 5. If versions match, skip — don't show anything
+## Global ELO & Cross-Project Rankings
+After the Process Health section, check for global metrics:
+1. Run via Bash:
+   ```bash
+   node -e "const g = require('./bin/global-sync-manager.js'); const name = (() => { try { return require('./package.json').name; } catch { return require('path').basename(process.cwd()); } })(); const elo = g.getGlobalELO(name); const ranks = g.getProjectRankings(); console.log(JSON.stringify({ elo, ranks, name }));" 2>/dev/null
+   ```
+2. If the result returns `elo: null` or the command fails: display "No global metrics yet" and skip.
+3. If global ELO data exists, display:
+   ```
+   Global ELO: {elo} (rank #{position} of {total} projects)
+   ```
+   Where position is the 1-based index of the current project in the rankings array.
+4. If 2+ projects have global rollup data, display the top 5 rankings:
+   ```
+   ## Cross-Project Rankings (Top 5)
+   | Rank | Project          | ELO    | Latest Milestone |
+   |------|------------------|--------|------------------|
+   | 1    | {project}        | {elo}  | {milestone}      |
+   ```
 $ARGUMENTS
 ## Auto-Clear

package/commands/gsd-t-test-sync.md CHANGED Viewed

@@ -111,8 +111,8 @@ pytest tests/test_{module}.py -v
 npm test -- --testPathPattern="{module}"
 ```
-### B) E2E Tests
-If an E2E framework is detected, run E2E tests affected by the changes:
+### B) E2E Tests (MANDATORY when config exists)
+If `playwright.config.*` or `cypress.config.*` exists, you MUST run E2E tests — skipping is never acceptable:
 ```bash
 # Playwright

package/commands/gsd-t-verify.md CHANGED Viewed

@@ -199,6 +199,95 @@ Create or update `.gsd-t/verify-report.md`:
 | 2 | ui | Add loading states for async calls | WARN |
 ```
+## Step 5.25: Metrics Quality Budget Check
+Check task-metrics for the current milestone to detect quality budget violations:
+1. Run via Bash:
+   `node -e "const c = require('./bin/metrics-collector.js'); const r = c.readTaskMetrics({milestone: '{milestone-id}'}); if(!r.length){console.log('No metrics data — quality budget check skipped');process.exit(0);} const pass=r.filter(t=>t.fix_cycles===0&&t.pass).length; const rate=pass/r.length; console.log('First-pass rate: '+(rate*100).toFixed(1)+'% ('+pass+'/'+r.length+')'); if(rate<0.6) console.log('⚠️ Quality budget WARNING: first-pass rate below 60%');" 2>/dev/null || true`
+2. Run heuristics check via Bash:
+   `node -e "const m=require('./bin/metrics-rollup.js'); const r=m.readRollups({milestone:'{milestone-id}'}); if(r.length&&r[r.length-1].heuristic_flags.some(f=>f.severity==='HIGH')) console.log('⚠️ HIGH severity heuristic flag detected — review before completing milestone');" 2>/dev/null || true`
+3. Display quality metrics summary inline. Quality budget violation is a **WARNING** (non-blocking) — does not fail verify.
+4. Include quality budget status in the verification report (Step 5):
+   `- Quality Budget: {PASS/WARN} — first-pass rate {N}%{, HIGH heuristic: {name} if any}`
+## Step 5.5: Goal-Backward Verification (Post-Gate Behavior Check)
+This step runs **after all 8 quality gates pass**. It verifies that milestone goals are actually achieved end-to-end — not just structurally present. It catches placeholder implementations that pass all structural gates.
+Refer to `.gsd-t/contracts/goal-backward-contract.md` for the full verification flow, placeholder patterns, and findings report format.
+### 5.5.1 Load Milestone Goals and Requirements
+1. Read `.gsd-t/progress.md` — extract the current milestone name and goals
+2. Read `docs/requirements.md` — identify **critical requirements** (skip trivial/low-priority items)
+### 5.5.2 Trace Requirements to Behavior
+For each critical requirement:
+1. **If `.gsd-t/graph/meta.json` exists (graph available)**:
+   - Trace the requirement → code path → behavior chain using graph queries
+   - Use `getRequirementFor`, `getCallers`, and `getTestsFor` to build the chain
+   - Flag requirements with no traceable code path as CRITICAL findings
+2. **If graph is not available (fallback to grep)**:
+   - Search the codebase for the feature/function implementing each requirement
+   - Trace from entry point → core logic → output/response
+### 5.5.3 Scan for Placeholder Patterns
+For each file identified in the requirement traces above, scan for these placeholder patterns:
+| Pattern | Detection Hint | Severity |
+|---------|---------------|----------|
+| console.log placeholder | `console.log.*TODO\|console.log.*implement` | CRITICAL |
+| TODO/FIXME in implementation | `// TODO\|// FIXME\|# TODO\|# FIXME` in non-test files | CRITICAL |
+| Empty function body | `function \w+\(\) \{\}` or `\(\) => \{\}` with no logic | CRITICAL |
+| Throw not-implemented | `throw new Error.*not implemented\|throw new Error.*TODO` | CRITICAL |
+| Hardcoded return | `return "success"\|return true` with no conditional logic | HIGH |
+| Static UI text | Static `<span>` or text that never updates based on state | HIGH |
+| Pass-through stub | `return input\|return req\|return data` with no transformation | MEDIUM |
+### 5.5.4 Produce Findings Report
+Format findings per the goal-backward-contract.md report format:
+```markdown
+## Goal-Backward Verification Report
+### Status: PASS | FAIL
+### Findings
+| # | Requirement | File:Line | Pattern | Severity | Description |
+|---|-------------|-----------|---------|----------|-------------|
+| 1 | {req-id}    | {path}:{line} | {pattern} | {severity} | {what's wrong} |
+### Summary
+- Requirements checked: {N}
+- Findings: {N} ({critical}, {high}, {medium})
+- Verdict: {PASS if 0 critical/high, FAIL otherwise}
+```
+### 5.5.5 Apply Blocking Rules
+- **CRITICAL or HIGH findings** → Goal-Backward status = **FAIL** — block verification
+  - Append findings to the Critical section of the verification report (Step 5)
+  - Set overall verification status to FAIL
+- **MEDIUM findings** → Goal-Backward status = **WARN** — log but do not block
+  - Append findings to the Warnings section of the verification report (Step 5)
+- **No findings** → Goal-Backward status = **PASS** — add to verification report summary
+Add a `Goal-Backward:` line to the Step 5 verification report summary:
+```
+- Goal-Backward: {PASS/WARN/FAIL} — {N} requirements checked, {N} findings ({critical} critical, {high} high, {medium} medium)
+```
+---
 ## Step 6: Handle Remediation
 If there are CRITICAL findings:
@@ -217,15 +306,9 @@ Update `.gsd-t/progress.md`:
 ### Autonomy Behavior
-**Level 3 (Full Auto)**:
-- VERIFIED → Log "✅ Verify complete — all quality gates passed" and auto-advance to complete-milestone. Do NOT wait for user input.
-- CONDITIONAL PASS → Log warnings, treat as VERIFIED, and auto-advance. Do NOT wait for user input.
-- FAIL → Auto-execute remediation tasks (up to 2 fix attempts). If still failing after 2 attempts, STOP and report to user.
-**Level 1–2**:
-- VERIFIED → Milestone complete, proceed to next milestone or ship
-- CONDITIONAL PASS → User decides if warnings are acceptable
-- FAIL → Return to execute phase for remediation tasks
+**All Levels**:
+- VERIFIED or CONDITIONAL PASS → **Auto-invoke complete-milestone** (see Step 8 below). Completing a verified milestone is mechanical — there is no judgment call that benefits from user review.
+- FAIL → **Level 3**: Auto-execute remediation tasks (up to 2 fix attempts). If still failing after 2 attempts, STOP and report to user. **Level 1–2**: Return to execute phase for remediation tasks.
 ## Document Ripple
@@ -238,6 +321,54 @@ Update `.gsd-t/progress.md`:
 4. **`.gsd-t/techdebt.md`** — If verification found new quality or security issues, add as debt
 5. **`docs/requirements.md`** — If verification revealed unmet requirements, update status
+## Step 8: Auto-Invoke Complete-Milestone
+**This step is MANDATORY and runs at ALL autonomy levels.** Completing a verified milestone is a mechanical operation (archive, tag, bump version, update docs). There is no decision that benefits from user review — the decision was made when verification passed.
+If status is VERIFY-FAILED:
+- Do NOT invoke complete-milestone
+- Report failures and stop
+If status is VERIFIED or VERIFIED-WITH-WARNINGS:
+1. Log: "✅ Verify complete — spawning complete-milestone agent..."
+**OBSERVABILITY LOGGING (MANDATORY):**
+Before spawning — run via Bash:
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+2. Spawn a Task subagent (model: sonnet, mode: bypassPermissions):
+```
+"Execute the complete-milestone phase of the current GSD-T milestone.
+Read and follow the full instructions in commands/gsd-t-complete-milestone.md
+(resolve from ~/.claude/commands/ if not in project).
+Read .gsd-t/progress.md for current milestone and state.
+Read CLAUDE.md for project conventions.
+Read .gsd-t/contracts/ for domain interfaces.
+Complete the phase fully:
+- Follow every step in the command file
+- Update .gsd-t/progress.md status when done
+- Run document ripple as specified
+- Commit your work
+Report back: one-line status summary."
+```
+After subagent returns — run via Bash:
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+Compute tokens and compaction:
+- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
+- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+Append to `.gsd-t/token-log.md`:
+`| {DT_START} | {DT_END} | gsd-t-verify | Step 8 | sonnet | {DURATION}s | auto-complete-milestone | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+3. Verify subagent result: Read `.gsd-t/progress.md` — confirm status is COMPLETED. If not, report the failure.
+**Why this is mandatory**: Without auto-completion, verified milestones remain in VERIFIED state indefinitely. Requirements stay unmarked, progress.md is stale, and future sessions cannot tell the work was done. This is the root cause of "GSD-T forgot it did this work" — the milestone was built and verified but never formally completed.
+**Why a subagent**: Complete-milestone is a 12-step process (gap analysis, archive, version bump, git tag, doc ripple). Verify is already heavy with 8+ quality gates. Spawning a fresh-context subagent avoids compaction risk — and complete-milestone loads everything it needs from files (progress.md, verify-report.md, contracts).
 $ARGUMENTS
 ## Auto-Clear

package/commands/gsd-t-visualize.md CHANGED Viewed

@@ -39,7 +39,17 @@ Run via Bash:
 node ~/.claude/scripts/gsd-t-event-writer.js --type command_invoked --command gsd-t-visualize --reasoning "Launching dashboard" || true
 ```
-## Step 1.5: Graph Data for Dashboard
+## Step 1.5: Context Metrics for Dashboard
+If `.gsd-t/token-log.md` exists, the dashboard server automatically reads it and provides context utilization metrics for visualization. These metrics are served from the `/api/token-breakdown` endpoint and rendered as:
+1. **Context utilization timeline** — Ctx% over time, ordered by Datetime-start
+2. **Token breakdown by domain** — bar chart grouping Tokens by Domain column (gracefully handles older rows without Domain column — they are grouped as "(untagged)")
+3. **Compaction proximity warnings** — rows where Ctx% >= 70 are highlighted; rows where Ctx% >= 85 are marked critical (🔴)
+If `.gsd-t/token-log.md` does not exist, context metrics panels are hidden (not shown as errors).
+## Step 1.6: Graph Data for Dashboard
 If `.gsd-t/graph/index.json` exists, the dashboard can render entity-relationship visualizations from the graph data. The dashboard server will detect and serve graph data automatically — no additional configuration needed.

package/commands/gsd-t-wave.md CHANGED Viewed

@@ -79,8 +79,13 @@ After phase agent returns — run via Bash:
 Compute tokens and compaction:
 - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
 - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-wave | {PHASE} | sonnet | {DURATION}s | phase: {PHASE} | {TOKENS} | {COMPACTED} |`
+Compute context utilization — run via Bash:
+`if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
+Alert on context thresholds (display to user inline):
+- If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
+- If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-wave | {PHASE} | sonnet | {DURATION}s | phase: {PHASE} | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
 ### Phase Sequence
@@ -114,8 +119,13 @@ Spawn agent → `commands/gsd-t-impact.md`
 #### 5. EXECUTE
 Spawn agent → `commands/gsd-t-execute.md`
-- This is the heaviest phase. The execute agent will handle its own domain agent spawning and QA agent internally.
-- After: Read `progress.md`, verify status = EXECUTED
+- This is the heaviest phase. The execute agent uses **task-level dispatch** (fresh-dispatch-contract.md): one Task subagent per task within each domain, each receiving only scope.md + relevant contracts + single task + graph context + up to 5 prior summaries. The execute agent handles domain task-dispatching and QA internally.
+- **Adaptive replanning**: After each domain completes, the execute agent runs a replan check (per `adaptive-replan-contract.md`). If a completed domain's task summaries reveal new constraints (e.g., deprecated API, wrong column name, incompatible library), the execute agent checks remaining domains' `tasks.md` files for invalidated assumptions and revises them on disk before dispatching the next domain. Maximum 2 replan cycles per execute run — if exceeded, execution pauses for user input. All replan decisions are logged to the Decision Log in `progress.md`. The wave phase summary includes any replan actions taken.
+- **Team/parallel mode**: If the plan defines parallel domains (same wave), the execute agent dispatches each domain teammate with `isolation: "worktree"` (per worktree-isolation-contract.md). Each domain works in an isolated git worktree. After all domains complete, the execute agent runs the Sequential Merge Protocol: merge domain A → test → merge domain B → test. Per-domain rollback if tests fail. Worktrees are cleaned up after all merges complete.
+- After: Read `progress.md`, verify status = EXECUTED. Phase summary must include replan actions if any occurred:
+  ```
+  📋 Phase 5 (EXECUTE): {N}/{N} tasks done | Replan cycles: {N} | Domains revised: {list or "none"}
+  ```
 #### 6. TEST-SYNC
 Spawn agent → `commands/gsd-t-test-sync.md`
@@ -125,15 +135,19 @@ Spawn agent → `commands/gsd-t-test-sync.md`
 Spawn agent → `commands/gsd-t-integrate.md`
 - After: Read `progress.md`, verify status = INTEGRATED
-#### 8. VERIFY
+#### 8. VERIFY + COMPLETE
 Spawn agent → `commands/gsd-t-verify.md`
+- The verify agent runs all 8 standard quality gates **plus** the goal-backward verification step (Step 5.5 in gsd-t-verify.md), which checks that milestone goals are actually achieved end-to-end and scans for placeholder patterns per `.gsd-t/contracts/goal-backward-contract.md`
+- Goal-backward runs after all structural gates pass — CRITICAL or HIGH findings block verification; MEDIUM findings are warnings only
+- **Verify auto-invokes complete-milestone** (Step 8 of gsd-t-verify.md). The verify agent handles both verification AND milestone completion in a single agent context. Do NOT spawn a separate complete agent.
 - After: Read `progress.md`, check status:
-  - VERIFIED → proceed to Complete
-  - VERIFY_FAILED → handle remediation (see Error Recovery)
-#### 9. COMPLETE
-Spawn agent → `commands/gsd-t-complete-milestone.md`
-- After: Read `progress.md`, verify status = COMPLETED
+  - COMPLETED → milestone done (verify passed and auto-completed)
+  - VERIFIED → verify passed but complete-milestone failed — spawn a standalone complete agent as fallback
+  - VERIFY_FAILED → handle remediation (see Error Recovery) — includes goal-backward failures
+- Phase summary must include the `Goal-Backward:` line from verify-report.md:
+  ```
+  📋 Phase 8 (VERIFY+COMPLETE): {N} gates passed | Goal-Backward: {PASS/WARN/FAIL} — {N} requirements checked, {N} findings
+  ```
 ### Between Each Phase
@@ -286,16 +300,17 @@ If command files in `~/.claude/commands/` are tampered with, wave agents will ex
 │    check           check       check       check +       check              │
 │                                           gate                              │
 │                                                                              │
-│  ┌──────────┐   ┌────────┐   ┌───────────┐       ┌─────────────────┐       │
-│  │ COMPLETE │ ← │ VERIFY │ ← │ INTEGRATE │ ←──── │ FULL TEST-SYNC  │       │
-│  │ agent 9  │   │agent 8 │   │  agent 7  │       │    agent 6      │       │
-│  └────┬────┘   └────┬────┘   └─────┬─────┘       └────────┬────────┘       │
-│       ↓              ↓              ↓                      ↓               │
-│    archive        status +       status                 status              │
-│    git tag        gate check     check                  check               │
+│  ┌──────────────────┐   ┌───────────┐       ┌─────────────────┐            │
+│  │ VERIFY+COMPLETE  │ ← │ INTEGRATE │ ←──── │ FULL TEST-SYNC  │            │
+│  │    agent 8       │   │  agent 7  │       │    agent 6      │            │
+│  └────────┬─────────┘   └─────┬─────┘       └────────┬────────┘            │
+│           ↓                    ↓                      ↓                     │
+│    gate check →             status                 status                   │
+│    auto-complete            check                  check                    │
+│    archive + tag                                                            │
 │                                                                              │
 │  Each agent: fresh context window, reads state from files, dies when done   │
-│  Orchestrator: ~30KB total, never compacts                                  │
+│  Orchestrator: 8 agents (was 9), ~30KB total, never compacts               │
 └──────────────────────────────────────────────────────────────────────────────┘
 ```