npm - @tekyzinc/gsd-t - Versions diffs - 2.39.13 → 2.46.11 - Mend

@tekyzinc/gsd-t 2.39.13 → 2.46.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (50) hide show

package/CHANGELOG.md +12 -0
package/README.md +19 -10
package/bin/desktop.ini +2 -0
package/bin/global-sync-manager.js +350 -0
package/bin/gsd-t.js +592 -2
package/bin/metrics-collector.js +167 -0
package/bin/metrics-rollup.js +200 -0
package/bin/patch-lifecycle.js +195 -0
package/bin/rule-engine.js +160 -0
package/commands/desktop.ini +2 -0
package/commands/gsd-t-complete-milestone.md +194 -6
package/commands/gsd-t-debug.md +38 -3
package/commands/gsd-t-doc-ripple.md +148 -0
package/commands/gsd-t-execute.md +328 -54
package/commands/gsd-t-help.md +32 -10
package/commands/gsd-t-integrate.md +59 -7
package/commands/gsd-t-metrics.md +143 -0
package/commands/gsd-t-plan.md +49 -2
package/commands/gsd-t-qa.md +26 -5
package/commands/gsd-t-quick.md +36 -3
package/commands/gsd-t-status.md +78 -0
package/commands/gsd-t-test-sync.md +23 -2
package/commands/gsd-t-verify.md +142 -10
package/commands/gsd-t-visualize.md +11 -1
package/commands/gsd-t-wave.md +64 -18
package/docs/GSD-T-README.md +10 -6
package/docs/architecture.md +84 -2
package/docs/ci-examples/desktop.ini +2 -0
package/docs/ci-examples/github-actions.yml +104 -0
package/docs/ci-examples/gitlab-ci.yml +116 -0
package/docs/desktop.ini +2 -0
package/docs/framework-comparison-scorecard.md +160 -0
package/docs/infrastructure.md +87 -1
package/docs/prd-graph-engine.md +2 -2
package/docs/prd-gsd2-hybrid.md +258 -135
package/docs/requirements.md +66 -2
package/examples/.gsd-t/contracts/desktop.ini +2 -0
package/examples/.gsd-t/desktop.ini +2 -0
package/examples/.gsd-t/domains/desktop.ini +2 -0
package/examples/.gsd-t/domains/example-domain/desktop.ini +2 -0
package/examples/desktop.ini +2 -0
package/examples/rules/.gitkeep +0 -0
package/examples/rules/desktop.ini +2 -0
package/package.json +40 -40
package/scripts/desktop.ini +2 -0
package/scripts/gsd-t-dashboard-server.js +19 -2
package/scripts/gsd-t-dashboard.html +63 -0
package/scripts/gsd-t-event-writer.js +1 -0
package/templates/CLAUDE-global.md +92 -10
package/templates/desktop.ini +2 -0

package/commands/gsd-t-integrate.md CHANGED Viewed

@@ -50,9 +50,26 @@ For each contract file:
 Fix any mismatches BEFORE proceeding to integration.
+## Step 2.5: Worktree Merge Status Check
+Before wiring integration points, check whether team mode execution left any domains with rolled-back worktree merges:
+1. Read `.gsd-t/progress.md` — look for `[rollback]` entries in the Decision Log from the execute phase
+2. If any domains were rolled back: list them and their failure reasons before proceeding
+3. Integration point wiring should only proceed for domains whose worktree merges PASSED — rolled-back domains are not yet in the main working tree
+If rolled-back domains exist, report them to the user (or if Level 3: log to `.gsd-t/deferred-items.md` as `[integration-gap] {domain}: not yet merged — worktree rollback during execute`). Do NOT attempt to re-merge rolled-back domains here; that requires re-running execute for the affected domain.
 ## Step 3: Wire Integration Points
-Work through each integration point in `integration-points.md`:
+Work through each integration point in `integration-points.md`. If integration work spans multiple domains with independent tasks, use the **task-level dispatch pattern** (per fresh-dispatch-contract.md): spawn one Task subagent per integration task, passing only the relevant contracts, the specific integration point to wire, and summaries from prior integration tasks (max 5, 10-20 lines each). This prevents context accumulation across integration tasks.
+**Multi-domain integration merging**: If integration work itself requires merging domain outputs that weren't merged during execute (e.g., domains executed in separate waves and integration needs to combine them), use the Sequential Merge Protocol from `.gsd-t/contracts/worktree-isolation-contract.md`:
+1. Sort domains by dependency order (from integration-points.md)
+2. Merge domain A's branch → run tests → merge domain B's branch → run tests
+3. If tests fail after a merge, roll back that domain's merge and log the failure
+4. Contract validation runs between merges
+5. All temporary branches cleaned up after integration completes
 For each connection:
 1. Identify the producing domain (provides the interface)
@@ -105,8 +122,14 @@ Spawn a QA subagent via the Task tool to verify contract compliance at all domai
 Task subagent (general-purpose, model: haiku):
 "Run contract compliance tests for this integration. Read .gsd-t/contracts/ for all contract definitions.
 Test every domain boundary: verify that producers and consumers match their contract shapes.
-Run the full test suite.
-Report: boundary-by-boundary test results with pass/fail counts."
+Run ALL configured test suites — detect and run every one:
+a. Unit tests (vitest/jest/mocha): run the full suite
+b. E2E tests: check for playwright.config.* or cypress.config.* — if found, run the FULL E2E suite
+c. NEVER skip E2E when a config file exists. Running only unit tests is a QA FAILURE.
+d. AUDIT E2E test quality: Review each Playwright spec — if any test only checks element existence
+   (isVisible, toBeAttached, toBeEnabled) without verifying functional behavior (state changes,
+   data loaded, content updated after actions), flag it as 'SHALLOW TEST — needs functional assertions'.
+Report: 'Unit: X/Y pass | E2E: X/Y pass (or N/A if no config) | Boundary: pass/fail by contract | Shallow tests: N'"
 ```
 **OBSERVABILITY LOGGING (MANDATORY):**
@@ -117,8 +140,13 @@ After subagent returns — run via Bash:
 Compute tokens and compaction:
 - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
 - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-integrate | Step 5 | haiku | {DURATION}s | {pass/fail}, {N} boundaries tested | {TOKENS} | {COMPACTED} |`
+Compute context utilization — run via Bash:
+`if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
+Alert on context thresholds (display to user inline):
+- If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
+- If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-integrate | Step 5 | haiku | {DURATION}s | {pass/fail}, {N} boundaries tested | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
 If QA found issues, append each to `.gsd-t/qa-issues.md` (create with header `| Date | Command | Step | Model | Duration(s) | Severity | Finding |` if missing):
 `| {DT_START} | gsd-t-integrate | Step 5 | haiku | {DURATION}s | {severity} | {finding} |`
@@ -145,11 +173,35 @@ Integration is where the real system takes shape. Verify documentation matches r
 After integration and doc ripple, verify everything works together:
 1. **Update tests**: Add or update integration tests for newly wired domain boundaries
-2. **Run all tests**: Execute the full test suite — integration often introduces cross-domain failures
+2. **Run ALL configured test suites** — detect and run every one:
+   a. Unit/integration tests (vitest/jest/mocha)
+   b. If `playwright.config.*` exists → run `npx playwright test` (full suite, not just affected specs)
+   c. Unit tests alone are NEVER sufficient when E2E exists
+   d. Report: "Unit: X/Y pass | E2E: X/Y pass"
 3. **Verify passing**: All tests must pass. If any fail, fix before proceeding (up to 2 attempts)
-4. **Run E2E tests**: If an E2E framework exists, run the full E2E suite — integration is where end-to-end flows break
+4. **Functional test quality**: Spot-check E2E specs — every assertion must verify functional behavior (state changed, data loaded, content updated after action), not just element existence. Shallow tests that would pass on an empty HTML page are not acceptable.
 5. **Smoke test results**: Ensure the Step 4 smoke test results are still valid after any fixes
+## Step 7.5: Doc-Ripple (Automated)
+After all integration work is committed but before reporting completion:
+1. Run threshold check — read `git diff --name-only HEAD~1` and evaluate against doc-ripple-contract.md trigger conditions
+2. If SKIP: log "Doc-ripple: SKIP — {reason}" and proceed to completion
+3. If FIRE: spawn doc-ripple agent:
+⚙ [{model}] gsd-t-doc-ripple → blast radius analysis + parallel updates
+Task subagent (general-purpose, model: sonnet):
+"Execute the doc-ripple workflow per commands/gsd-t-doc-ripple.md.
+Git diff context: {files changed list}
+Command that triggered: integrate
+Produce manifest at .gsd-t/doc-ripple-manifest.md.
+Update all affected documents.
+Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
+4. After doc-ripple returns, verify manifest exists and report summary inline
 ## Step 8: Handle Integration Issues
 For each issue found:

package/commands/gsd-t-metrics.md ADDED Viewed

@@ -0,0 +1,143 @@
+# GSD-T: Metrics — View Task Telemetry and Process Health
+You are displaying metrics data from the GSD-T telemetry system. Read JSONL files directly — no module imports needed.
+## Step 1: Load Metrics Data
+Read:
+1. `.gsd-t/metrics/task-metrics.jsonl` — per-task telemetry records
+2. `.gsd-t/metrics/rollup.jsonl` — per-milestone aggregation with ELO and heuristics
+3. `.gsd-t/progress.md` — current milestone ID (for default filter)
+If neither file exists: display "No metrics data yet. Metrics are collected automatically during execute, quick, and debug commands." and stop.
+If `$ARGUMENTS` contains a milestone ID (e.g., "M25"), use that as the filter. Otherwise, use the current active milestone from progress.md.
+## Step 2: Display Milestone Summary
+From `rollup.jsonl`, find the entry matching the target milestone. Display:
+```
+## Metrics — {milestone}
+| Metric              | Value                          |
+|---------------------|--------------------------------|
+| Tasks               | {total_tasks}                  |
+| First-pass rate     | {first_pass_rate * 100}%       |
+| Avg duration        | {avg_duration_s}s              |
+| Avg context         | {avg_context_pct}%             |
+| Total fix cycles    | {total_fix_cycles}             |
+| Total tokens        | {total_tokens}                 |
+```
+If no rollup entry exists for the milestone, compute summary directly from task-metrics.jsonl records.
+## Step 3: Display Process ELO
+```
+## Process ELO
+{elo_after} ({elo_delta > 0 ? '↑' : '↓'} {elo_delta} from {elo_before})
+```
+If no previous milestone, show: `{elo_after} (baseline — first milestone)`
+## Step 4: Display Signal Distribution
+From rollup `signal_distribution` or computed from task-metrics:
+```
+## Signal Distribution
+| Signal Type      | Count |
+|------------------|-------|
+| pass-through     | {N}   |
+| fix-cycle        | {N}   |
+| debug-invoked    | {N}   |
+| user-correction  | {N}   |
+| phase-skip       | {N}   |
+```
+## Step 5: Display Domain Breakdown
+From rollup `domain_breakdown`:
+```
+## Domain Breakdown
+| Domain            | Tasks | Pass% | Avg Duration |
+|-------------------|-------|-------|--------------|
+| {domain}          | {N}   | {N}%  | {N}s         |
+```
+## Step 6: Display Trend Comparison
+If `trend_delta` exists (previous milestone data available):
+```
+## Trend vs Previous Milestone
+| Metric          | Delta                    |
+|-----------------|--------------------------|
+| First-pass rate | {delta > 0 ? '↑' : '↓'} {delta}% |
+| Avg duration    | {delta > 0 ? '↑' : '↓'} {delta}s |
+| ELO             | {delta > 0 ? '↑' : '↓'} {delta}  |
+```
+If no previous milestone: "First milestone — no trend data yet."
+## Step 7: Display Heuristic Anomalies
+If `heuristic_flags` has entries:
+```
+## Anomaly Detection
+| Heuristic                     | Severity | Description              |
+|-------------------------------|----------|--------------------------|
+| {heuristic}                   | {sev}    | {description}            |
+```
+If no anomalies: "No anomalies detected."
+## Step 8: Cross-Project Comparison (when --cross-project flag present)
+If `$ARGUMENTS` contains `--cross-project`:
+1. Run via Bash:
+   ```bash
+   node -e "const g = require('./bin/global-sync-manager.js'); const r = g.compareSignalDistributions(require('./package.json').name || require('path').basename(process.cwd())); console.log(JSON.stringify(r));" 2>/dev/null
+   ```
+2. If the result has `insufficient_data: true`, display:
+   "No global metrics yet — complete milestones in multiple projects to enable cross-project comparison"
+3. Otherwise, display the cross-project signal distribution comparison:
+   ```
+   ## Cross-Project Signal Distribution
+   | Project              | Tasks | Pass-Through | Fix-Cycle | Other |
+   |----------------------|-------|--------------|-----------|-------|
+   | {project} {★ if is_queried} | {N} | {rate}    | {rate}    | ...   |
+   ```
+4. If `$ARGUMENTS` also contains `--domain {domainType}`, run:
+   ```bash
+   node -e "const g = require('./bin/global-sync-manager.js'); const r = g.getDomainTypeComparison('{domainType}'); console.log(JSON.stringify(r));" 2>/dev/null
+   ```
+   Display the domain-type comparison table:
+   ```
+   ## Domain-Type Comparison: {domainType}
+   | Project              | Tasks | Pass-Through | Fix-Cycle |
+   |----------------------|-------|--------------|-----------|
+   | {project}            | {N}   | {count}      | {count}   |
+   ```
+If `--cross-project` is NOT in `$ARGUMENTS`: skip this step entirely (no change to existing behavior).
+$ARGUMENTS
+## Auto-Clear
+All work is committed to project files. Execute `/clear` to free the context window for the next command.

package/commands/gsd-t-plan.md CHANGED Viewed

@@ -25,6 +25,24 @@ If `.gsd-t/graph/meta.json` exists (graph index is available):
 If graph is not available, skip this step.
+## Step 1.7: Pre-Mortem — Historical Failure Analysis
+Before creating task lists, check historical task-metrics for domain-level failure patterns from previous milestones:
+1. Run via Bash:
+   `node -e "const c = require('./bin/metrics-collector.js'); const domains = [/* list domain names from scope files */]; domains.forEach(d => { const w = c.getPreFlightWarnings(d); if(w.length) w.forEach(x => console.log('⚠️ ' + x)); });" 2>/dev/null || true`
+2. If any domain has `first_pass_rate < 0.6` historically:
+   - Display warning inline: `⚠️ Domain {name} has historically low first-pass rate ({rate}%). Consider: smaller tasks, more explicit acceptance criteria, or additional contract detail.`
+   - This is **non-blocking** — it informs task design, does not prevent planning.
+3. If `.gsd-t/metrics/task-metrics.jsonl` does not exist: skip this step silently (first milestone, no historical data).
+4. **Rule-based pre-mortem**: Run via Bash:
+   `node -e "const re = require('./bin/rule-engine.js'); const domains = [/* list domain names */]; domains.forEach(d => { const rules = re.getPreMortemRules(d); if(rules.length) rules.forEach(r => console.log('RULE ' + r.id + ': ' + r.name + ' — historically triggered for domains like ' + d)); });" 2>/dev/null || true`
+   If matching rules found: display warnings inline (non-blocking — informs task design). Falls back gracefully if rules.jsonl does not exist or is empty.
 ## Step 2: Create Task Lists Per Domain
 ### SharedCore-First Pre-Check
@@ -67,6 +85,30 @@ For each domain, write `.gsd-t/domains/{domain-name}/tasks.md`:
 4. **Contract-bound**: Every task that crosses a domain boundary must reference the specific contract it implements
 5. **Ordered**: Tasks within a domain are numbered in execution order
 6. **No implicit knowledge**: Don't assume the executing agent remembers previous tasks — reference contracts and files explicitly
+7. **Context-window fit**: Each task MUST be executable within a single context window. Apply the scope validation heuristics below.
+### Task Scope Validation
+After writing each task, apply this heuristic check before finalizing:
+**Splitting candidates — flag if ANY of these are true:**
+- Task lists **more than 5 files** to modify or create
+- Task has **more than 3 complex dependencies** (other tasks, contracts, or external systems it must read and understand)
+- Task description spans multiple distinct concerns (e.g., "implement X and also refactor Y and update Z docs")
+**Warning threshold:** If a task is flagged, emit:
+> ⚠️ **Task scope warning — {domain} Task {N}**: Estimated context load is high ({N} files, {N} dependencies). This task may approach the 70% context window threshold. Consider splitting into:
+> - Task {N}a: {first concern}
+> - Task {N}b: {second concern}
+**Auto-split rule (Level 3 Full Auto):** If a task has >5 files AND >3 dependencies, split it automatically. Renumber subsequent tasks. Document the split rationale in the task's Dependencies field.
+**Guidance for estimating context size:**
+- Each file to read ≈ 1–5% of context window (varies by file size)
+- CLAUDE.md + scope.md + constraints.md + contracts ≈ 15–25% baseline overhead
+- Tasks with >5 files or >3 cross-domain contracts commonly exceed 70% total context
+This rule implements the "task must fit in one context window" constraint — a task that compacts its subagent is a task that produces incomplete or corrupt output.
 ### Cross-Domain Duplicate Operation Scan
@@ -259,8 +301,13 @@ After subagent returns — run via Bash:
 Compute tokens and compaction:
 - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
 - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-plan | Step 7 | haiku | {DURATION}s | {PASS/FAIL}, iteration {N} | {TOKENS} | {COMPACTED} |`
+Compute context utilization — run via Bash:
+`if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
+Alert on context thresholds (display to user inline):
+- If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
+- If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-plan | Step 7 | haiku | {DURATION}s | {PASS/FAIL}, iteration {N} | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
 If validation FAIL, append each gap to `.gsd-t/qa-issues.md` (create with header `| Date | Command | Step | Model | Duration(s) | Severity | Finding |` if missing):
 `| {DT_START} | gsd-t-plan | Step 7 | haiku | {DURATION}s | medium | {gap description} |`

package/commands/gsd-t-qa.md CHANGED Viewed

@@ -81,13 +81,16 @@ Your behavior depends on which phase spawned you:
 ### During Verify
 **Trigger**: Lead invokes verify phase
-**Action**: Full test audit
+**Action**: Full test audit + shallow test detection
 1. Run ALL tests — contract tests, acceptance tests, edge case tests, existing project tests
 2. Coverage audit: For every contract, confirm tests exist and pass
 3. For every new feature/mode/flow, confirm Playwright specs cover happy path, error states, edge cases
-4. Gap report: List any untested contracts or code paths
-5. Report: `QA: {pass|fail} — {N} contract tests, {N} acceptance tests, {N} edge case tests. Gaps: {list or "none"}`
+4. **Shallow test audit**: Read every Playwright spec file. For each `test()` block, check whether the assertions verify functional behavior (state changes, data flow, content updates after actions) or only check element existence (isVisible, toBeAttached, toBeEnabled). Flag any test that would pass on an empty HTML shell as `SHALLOW — needs functional assertions`.
+5. Gap report: List any untested contracts, code paths, AND shallow tests
+6. Report: `QA: {pass|fail} — {N} contract tests, {N} acceptance tests, {N} edge case tests. Gaps: {list or "none"}. Shallow E2E tests: {N} (list or "none")`
+**Shallow tests block verification.** A passing E2E suite where tests don't actually verify feature behavior is equivalent to a failing suite.
 ### During Quick
 **Trigger**: Lead runs a quick task
@@ -189,10 +192,28 @@ For each table in `schema-contract.md`:
 For each component in `component-contract.md`:
 - Each `## ComponentName` → one `test.describe` block
 - `Props:` → renders with required props, handles missing optional props
-- `Events:` → event handlers fire correctly
-- API references → verify correct API calls made
+- `Events:` → event handlers fire correctly AND produce the expected state change
+- API references → verify correct API calls made AND responses rendered correctly
 - Auto-generate: empty form, partial form, network error handling
+### Functional E2E Test Standard (MANDATORY for all Playwright specs)
+**E2E tests that only verify element existence are LAYOUT tests, not functional tests. Layout tests pass even when every feature is broken. This is a QA failure.**
+Every Playwright spec MUST verify functional behavior — that actions produce the correct outcome:
+| Test Pattern | WRONG (layout test) | RIGHT (functional test) |
+|---|---|---|
+| Tab switching | `expect(tab).toBeVisible()` | Click tab → assert NEW content loaded (text, data unique to that tab) |
+| Form submit | `expect(submitBtn).toBeEnabled()` | Fill form → submit → assert success message AND data persisted (API call, list updated) |
+| Terminal/editor | `expect(terminal).toBeAttached()` | Open terminal → type command → assert output appears |
+| WebSocket | `expect(statusBadge).toBeVisible()` | Wait for connection → assert status text changes to "Connected" → send message → assert response |
+| Navigation | `expect(link).toHaveAttribute('href')` | Click link → assert URL changed AND destination content rendered |
+| Toggle/mode | `expect(toggle).toBeVisible()` | Click toggle → assert the EFFECT (dark mode CSS applied, panel expanded with content, feature enabled) |
+| Error state | `expect(errorDiv).toBeVisible()` | Trigger error → assert message content → assert recovery action works |
+**Rule: If a test would pass on an empty HTML shell with the right element IDs, it is not a functional test. Every assertion must prove the feature works, not that the element exists.**
 ## Test File Conventions
 - **Location**: Project's test directory (detected from `playwright.config.*` or `package.json`)

package/commands/gsd-t-quick.md CHANGED Viewed

@@ -83,6 +83,16 @@ When you encounter unexpected situations:
 5. Verify it works
 6. Commit: `[quick] {description}`
+## Step 3.5: Emit Task Metrics
+After committing, emit a task-metrics record for this quick task — run via Bash:
+`node bin/metrics-collector.js --milestone {current-milestone-or-none} --domain {domain-or-quick} --task quick-{timestamp} --command quick --duration_s {elapsed} --tokens_used {estimated} --context_pct ${CTX_PCT:-0} --pass {true|false} --fix_cycles {0|N} --signal_type {pass-through|fix-cycle} --notes "[quick] {description}" 2>/dev/null || true`
+Signal type: `pass-through` if task completed on first attempt; `fix-cycle` if rework was needed.
+Emit task_complete event — run via Bash:
+`node ~/.claude/scripts/gsd-t-event-writer.js --type task_complete --command gsd-t-quick --reasoning "signal_type={signal_type}, domain={domain}" --outcome {success|failure} || true`
 ## Step 4: Document Ripple (if GSD-T is active)
 If `.gsd-t/progress.md` exists, assess what documentation was affected and update ALL relevant files:
@@ -123,9 +133,12 @@ Quick does not mean skip testing. Before committing:
    - Playwright E2E specs (if UI/routes/flows/modes changed): create new specs for new functionality, update existing specs for changed behavior
    - Cover all modes/flags affected by this change
    - "No feature code without test code" applies to quick tasks too
-2. **Run the FULL test suite** — not just affected tests:
-   - All unit/integration tests
-   - Full Playwright E2E suite (if configured)
+   - **Functional tests only** — every E2E assertion must verify an action produced the correct outcome (state changed, data loaded, content updated). Tests that only check element existence (`isVisible`, `toBeEnabled`) are shallow/layout tests and are not acceptable. If a test would pass on an empty HTML page with the right IDs, rewrite it.
+2. **Run ALL configured test suites** — not just affected tests, not just one suite:
+   a. Detect all runners: check for vitest/jest config, playwright.config.*, cypress.config.*
+   b. Run EVERY detected suite. Unit tests alone are NEVER sufficient when E2E exists.
+   c. If `playwright.config.*` exists → `npx playwright test` (full suite)
+   d. Report ALL results: "Unit: X/Y pass | E2E: X/Y pass"
    - Fix any failures before proceeding (up to 2 attempts)
 3. **Verify against requirements**:
    - Does the change satisfy its intended requirement?
@@ -133,6 +146,26 @@ Quick does not mean skip testing. Before committing:
    - If a contract exists for the interface touched, does the code still match?
 4. **No test framework?**: Set one up, or at minimum manually verify and document how in the commit message
+## Step 6: Doc-Ripple (Automated)
+After all work is committed but before reporting completion:
+1. Run threshold check — read `git diff --name-only HEAD~1` and evaluate against doc-ripple-contract.md trigger conditions
+2. If SKIP: log "Doc-ripple: SKIP — {reason}" and proceed to completion
+3. If FIRE: spawn doc-ripple agent:
+⚙ [{model}] gsd-t-doc-ripple → blast radius analysis + parallel updates
+Task subagent (general-purpose, model: sonnet):
+"Execute the doc-ripple workflow per commands/gsd-t-doc-ripple.md.
+Git diff context: {files changed list}
+Command that triggered: quick
+Produce manifest at .gsd-t/doc-ripple-manifest.md.
+Update all affected documents.
+Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
+4. After doc-ripple returns, verify manifest exists and report summary inline
 $ARGUMENTS
 ## Auto-Clear

package/commands/gsd-t-status.md CHANGED Viewed

@@ -57,6 +57,59 @@ If `.gsd-t/backlog.md` exists, read and parse it. Show total count and top 3 ite
 If there are blockers or issues, highlight them.
 If the user provides $ARGUMENTS, focus the status on that specific domain or aspect.
+## Token Usage Breakdown
+If `.gsd-t/token-log.md` exists, read it and append a token breakdown to the status report.
+Parse each row in the table. Handle both old format (9 columns) and extended format (12 columns with Domain, Task, Ctx%). Rows with missing or empty Domain column are assigned domain "(untagged)".
+### Token Usage by Domain
+Group rows by Domain. For each domain, sum Tokens and collect all Ctx% values (ignoring "N/A" and empty). Display:
+```
+## Token Usage by Domain
+| Domain         | Tokens | Subagents | Peak Ctx% |
+|----------------|--------|-----------|-----------|
+| auth           | 12,400 | 4         | 14%       |
+| notifications  | 45,200 | 3         | 52% ⚠️    |
+| (untagged)     | 8,100  | 6         | N/A       |
+```
+Flag any domain where Peak Ctx% >= 70 with `⚠️` suffix.
+### Token Usage by Phase/Command
+Group rows by Command. For each command, sum Tokens and count subagent rows. Display:
+```
+## Token Usage by Command
+| Command       | Tokens | Subagents |
+|---------------|--------|-----------|
+| gsd-t-execute | 86,200 | 14        |
+| gsd-t-wave    | 12,400 | 9         |
+| gsd-t-plan    | 3,400  | 1         |
+```
+If token-log.md does not exist or is empty, skip this section entirely (no error).
+## Process Health
+If `.gsd-t/metrics/rollup.jsonl` exists, read the latest entry and append to the status report:
+```
+Process Health:
+  ELO: {elo_after} ({elo_delta > 0 ? '↑' : '↓'} {elo_delta})
+  Quality: {first_pass_rate * 100}% first-pass rate | {total_fix_cycles} fix cycles
+```
+If `.gsd-t/metrics/task-metrics.jsonl` exists but no rollup.jsonl, compute first_pass_rate directly from task-metrics for the current milestone and display:
+```
+Process Health:
+  Quality: {rate}% first-pass rate (current milestone, no rollup yet)
+```
+If neither file exists, skip this section entirely.
 ## Graph Status
 If `.gsd-t/graph/meta.json` exists, read it and append to the status report:
@@ -81,6 +134,31 @@ After displaying the project status, check for GSD-T updates:
 5. If versions match, skip — don't show anything
+## Global ELO & Cross-Project Rankings
+After the Process Health section, check for global metrics:
+1. Run via Bash:
+   ```bash
+   node -e "const g = require('./bin/global-sync-manager.js'); const name = (() => { try { return require('./package.json').name; } catch { return require('path').basename(process.cwd()); } })(); const elo = g.getGlobalELO(name); const ranks = g.getProjectRankings(); console.log(JSON.stringify({ elo, ranks, name }));" 2>/dev/null
+   ```
+2. If the result returns `elo: null` or the command fails: display "No global metrics yet" and skip.
+3. If global ELO data exists, display:
+   ```
+   Global ELO: {elo} (rank #{position} of {total} projects)
+   ```
+   Where position is the 1-based index of the current project in the rankings array.
+4. If 2+ projects have global rollup data, display the top 5 rankings:
+   ```
+   ## Cross-Project Rankings (Top 5)
+   | Rank | Project          | ELO    | Latest Milestone |
+   |------|------------------|--------|------------------|
+   | 1    | {project}        | {elo}  | {milestone}      |
+   ```
 $ARGUMENTS
 ## Auto-Clear

package/commands/gsd-t-test-sync.md CHANGED Viewed

@@ -111,8 +111,8 @@ pytest tests/test_{module}.py -v
 npm test -- --testPathPattern="{module}"
 ```
-### B) E2E Tests
-If an E2E framework is detected, run E2E tests affected by the changes:
+### B) E2E Tests (MANDATORY when config exists)
+If `playwright.config.*` or `cypress.config.*` exists, you MUST run E2E tests — skipping is never acceptable:
 ```bash
 # Playwright
@@ -151,6 +151,27 @@ If Playwright is configured (`playwright.config.*` or Playwright in dependencies
 **This is NOT optional.** Every new code path that a user can reach must have a Playwright spec. "We'll add tests later" is never acceptable.
+**FUNCTIONAL TESTS — NOT LAYOUT TESTS (MANDATORY):**
+E2E specs that only check element existence (`isVisible`, `toBeAttached`, `toBeEnabled`) are
+layout tests. Layout tests pass even when every feature is broken — they are worthless for QA.
+Every Playwright assertion MUST verify **functional behavior** — that an action produced the
+correct outcome:
+- **Tab/navigation**: Click → assert the NEW content loaded (unique text, data, or elements
+  that only appear on the destination view). Never just assert the tab element exists.
+- **Forms**: Fill → submit → assert success feedback AND data persisted (API call observed
+  via `page.waitForResponse`, or list/table updated with new entry).
+- **Interactive widgets** (terminals, editors, code panels): Open → interact → assert the
+  widget responded (keystroke produced output, content was saved, command executed).
+- **Connections** (WebSocket, SSE, polling): Assert status transitions ("Connecting" →
+  "Connected") and verify data flows through the connection.
+- **State toggles** (dark mode, expand/collapse, enable/disable): Assert the EFFECT of the
+  toggle, not just that the toggle control exists.
+- **Error handling**: Trigger error → assert error content → assert recovery path works.
+**Rule: If a test would pass on an empty HTML page with the correct element IDs and no
+JavaScript, it is not a functional test. Rewrite it.**
 ### D) Capture Results
 For all test types:
 - PASS: Test still valid