@tekyzinc/gsd-t 2.39.13 → 2.45.11
- package/README.md +17 -9
- package/bin/desktop.ini +2 -0
- package/bin/global-sync-manager.js +350 -0
- package/bin/gsd-t.js +592 -2
- package/bin/metrics-collector.js +167 -0
- package/bin/metrics-rollup.js +200 -0
- package/bin/patch-lifecycle.js +195 -0
- package/bin/rule-engine.js +160 -0
- package/commands/desktop.ini +2 -0
- package/commands/gsd-t-complete-milestone.md +192 -5
- package/commands/gsd-t-debug.md +16 -2
- package/commands/gsd-t-execute.md +257 -52
- package/commands/gsd-t-help.md +25 -10
- package/commands/gsd-t-integrate.md +35 -7
- package/commands/gsd-t-metrics.md +143 -0
- package/commands/gsd-t-plan.md +49 -2
- package/commands/gsd-t-quick.md +15 -3
- package/commands/gsd-t-status.md +78 -0
- package/commands/gsd-t-test-sync.md +2 -2
- package/commands/gsd-t-verify.md +140 -9
- package/commands/gsd-t-visualize.md +11 -1
- package/commands/gsd-t-wave.md +34 -19
- package/docs/GSD-T-README.md +9 -6
- package/docs/architecture.md +84 -2
- package/docs/ci-examples/desktop.ini +2 -0
- package/docs/ci-examples/github-actions.yml +104 -0
- package/docs/ci-examples/gitlab-ci.yml +116 -0
- package/docs/desktop.ini +2 -0
- package/docs/infrastructure.md +87 -1
- package/docs/prd-graph-engine.md +2 -2
- package/docs/prd-gsd2-hybrid.md +258 -135
- package/docs/requirements.md +63 -2
- package/examples/.gsd-t/contracts/desktop.ini +2 -0
- package/examples/.gsd-t/desktop.ini +2 -0
- package/examples/.gsd-t/domains/desktop.ini +2 -0
- package/examples/.gsd-t/domains/example-domain/desktop.ini +2 -0
- package/examples/desktop.ini +2 -0
- package/examples/rules/.gitkeep +0 -0
- package/package.json +40 -40
- package/scripts/desktop.ini +2 -0
- package/scripts/gsd-t-dashboard-server.js +19 -2
- package/scripts/gsd-t-dashboard.html +63 -0
- package/scripts/gsd-t-event-writer.js +1 -0
- package/templates/CLAUDE-global.md +30 -9
- package/templates/desktop.ini +2 -0
|
@@ -0,0 +1,143 @@
|
|
|
1
|
+
# GSD-T: Metrics — View Task Telemetry and Process Health
|
|
2
|
+
|
|
3
|
+
You are displaying metrics data from the GSD-T telemetry system. Read JSONL files directly — no module imports needed.
|
|
4
|
+
|
|
5
|
+
## Step 1: Load Metrics Data
|
|
6
|
+
|
|
7
|
+
Read:
|
|
8
|
+
1. `.gsd-t/metrics/task-metrics.jsonl` — per-task telemetry records
|
|
9
|
+
2. `.gsd-t/metrics/rollup.jsonl` — per-milestone aggregation with ELO and heuristics
|
|
10
|
+
3. `.gsd-t/progress.md` — current milestone ID (for default filter)
|
|
11
|
+
|
|
12
|
+
If neither metrics file (`task-metrics.jsonl` or `rollup.jsonl`) exists: display "No metrics data yet. Metrics are collected automatically during execute, quick, and debug commands." and stop.
|
|
13
|
+
|
|
14
|
+
If `$ARGUMENTS` contains a milestone ID (e.g., "M25"), use that as the filter. Otherwise, use the current active milestone from progress.md.
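The argument handling described here can be sketched as a small parser. This is only an illustration: the function name `parseMetricsArgs` is hypothetical, and the assumption that milestone IDs always look like `M` followed by digits is inferred from the "M25" example, not confirmed by the package. The `--cross-project` and `--domain` flags are the ones Step 8 of this command refers to.

```javascript
// Hypothetical helper: split the raw $ARGUMENTS string into the pieces
// the metrics command cares about. Milestone IDs are ASSUMED to match
// /^M\d+$/ based on the "M25" example.
function parseMetricsArgs(argString) {
  const tokens = argString.trim().split(/\s+/).filter(Boolean);
  const result = { milestone: null, crossProject: false, domainType: null };

  for (let i = 0; i < tokens.length; i += 1) {
    if (tokens[i] === '--cross-project') {
      result.crossProject = true;
    } else if (tokens[i] === '--domain' && i + 1 < tokens.length) {
      // The value following --domain is the domain type.
      result.domainType = tokens[i + 1];
      i += 1;
    } else if (/^M\d+$/.test(tokens[i])) {
      result.milestone = tokens[i];
    }
  }
  return result;
}
```

When `milestone` comes back as `null`, the caller would fall back to the active milestone recorded in progress.md.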
|
|
15
|
+
|
|
16
|
+
## Step 2: Display Milestone Summary
|
|
17
|
+
|
|
18
|
+
From `rollup.jsonl`, find the entry matching the target milestone. Display:
|
|
19
|
+
|
|
20
|
+
```
|
|
21
|
+
## Metrics — {milestone}
|
|
22
|
+
|
|
23
|
+
| Metric | Value |
|
|
24
|
+
|---------------------|--------------------------------|
|
|
25
|
+
| Tasks | {total_tasks} |
|
|
26
|
+
| First-pass rate | {first_pass_rate * 100}% |
|
|
27
|
+
| Avg duration | {avg_duration_s}s |
|
|
28
|
+
| Avg context | {avg_context_pct}% |
|
|
29
|
+
| Total fix cycles | {total_fix_cycles} |
|
|
30
|
+
| Total tokens | {total_tokens} |
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
If no rollup entry exists for the milestone, compute summary directly from task-metrics.jsonl records.
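A minimal sketch of that fallback computation, assuming each JSONL record carries the fields implied by the collector's CLI flags (`milestone`, `pass`, `fix_cycles`, `duration_s`, `tokens_used`, `context_pct`). The function name `summarizeTasks` is hypothetical, and the real record schema may differ.

```javascript
// Hypothetical fallback: build the milestone summary from raw
// task-metrics.jsonl text when no rollup entry exists.
// Field names are ASSUMED from the metrics-collector CLI flags.
function summarizeTasks(jsonlText, milestone) {
  const records = jsonlText
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line)) // throws on a malformed line
    .filter((r) => r.milestone === milestone);

  if (records.length === 0) return null;

  const sum = (key) => records.reduce((acc, r) => acc + (Number(r[key]) || 0), 0);
  // "First pass" here means the task passed with zero fix cycles.
  const firstPass = records.filter((r) => r.pass && r.fix_cycles === 0).length;

  return {
    total_tasks: records.length,
    first_pass_rate: firstPass / records.length,
    avg_duration_s: sum('duration_s') / records.length,
    avg_context_pct: sum('context_pct') / records.length,
    total_fix_cycles: sum('fix_cycles'),
    total_tokens: sum('tokens_used'),
  };
}
```

The returned keys mirror the placeholders in the summary table, so the same display template can be filled from either source.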
|
|
34
|
+
|
|
35
|
+
## Step 3: Display Process ELO
|
|
36
|
+
|
|
37
|
+
```
|
|
38
|
+
## Process ELO
|
|
39
|
+
|
|
40
|
+
{elo_after} ({elo_delta > 0 ? '↑' : '↓'} {elo_delta} from {elo_before})
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
If no previous milestone, show: `{elo_after} (baseline — first milestone)`
|
|
44
|
+
|
|
45
|
+
## Step 4: Display Signal Distribution
|
|
46
|
+
|
|
47
|
+
From rollup `signal_distribution` or computed from task-metrics:
|
|
48
|
+
|
|
49
|
+
```
|
|
50
|
+
## Signal Distribution
|
|
51
|
+
|
|
52
|
+
| Signal Type | Count |
|
|
53
|
+
|------------------|-------|
|
|
54
|
+
| pass-through | {N} |
|
|
55
|
+
| fix-cycle | {N} |
|
|
56
|
+
| debug-invoked | {N} |
|
|
57
|
+
| user-correction | {N} |
|
|
58
|
+
| phase-skip | {N} |
|
|
59
|
+
```
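When the rollup has no `signal_distribution`, the counts can be derived from the task records. The sketch below assumes each record has a `signal_type` field holding one of the five values in the table; the helper name `countSignals` is hypothetical.

```javascript
// The five signal types listed in the Signal Distribution table.
const SIGNAL_TYPES = [
  'pass-through',
  'fix-cycle',
  'debug-invoked',
  'user-correction',
  'phase-skip',
];

// Hypothetical helper: count records per signal type.
// Records with an unknown or missing signal_type are ignored.
function countSignals(records) {
  const counts = Object.fromEntries(SIGNAL_TYPES.map((type) => [type, 0]));
  for (const record of records) {
    if (Object.prototype.hasOwnProperty.call(counts, record.signal_type)) {
      counts[record.signal_type] += 1;
    }
  }
  return counts;
}
```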
|
|
60
|
+
|
|
61
|
+
## Step 5: Display Domain Breakdown
|
|
62
|
+
|
|
63
|
+
From rollup `domain_breakdown`:
|
|
64
|
+
|
|
65
|
+
```
|
|
66
|
+
## Domain Breakdown
|
|
67
|
+
|
|
68
|
+
| Domain | Tasks | Pass% | Avg Duration |
|
|
69
|
+
|-------------------|-------|-------|--------------|
|
|
70
|
+
| {domain} | {N} | {N}% | {N}s |
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
## Step 6: Display Trend Comparison
|
|
74
|
+
|
|
75
|
+
If `trend_delta` exists (previous milestone data available):
|
|
76
|
+
|
|
77
|
+
```
|
|
78
|
+
## Trend vs Previous Milestone
|
|
79
|
+
|
|
80
|
+
| Metric | Delta |
|
|
81
|
+
|-----------------|--------------------------|
|
|
82
|
+
| First-pass rate | {delta > 0 ? '↑' : '↓'} {delta}% |
|
|
83
|
+
| Avg duration | {delta > 0 ? '↑' : '↓'} {delta}s |
|
|
84
|
+
| ELO | {delta > 0 ? '↑' : '↓'} {delta} |
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
If no previous milestone: "First milestone — no trend data yet."
|
|
88
|
+
|
|
89
|
+
## Step 7: Display Heuristic Anomalies
|
|
90
|
+
|
|
91
|
+
If `heuristic_flags` has entries:
|
|
92
|
+
|
|
93
|
+
```
|
|
94
|
+
## Anomaly Detection
|
|
95
|
+
|
|
96
|
+
| Heuristic | Severity | Description |
|
|
97
|
+
|-------------------------------|----------|--------------------------|
|
|
98
|
+
| {heuristic} | {sev} | {description} |
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
If no anomalies: "No anomalies detected."
|
|
102
|
+
|
|
103
|
+
## Step 8: Cross-Project Comparison (when --cross-project flag present)
|
|
104
|
+
|
|
105
|
+
If `$ARGUMENTS` contains `--cross-project`:
|
|
106
|
+
|
|
107
|
+
1. Run via Bash:
|
|
108
|
+
```bash
|
|
109
|
+
node -e "const g = require('./bin/global-sync-manager.js'); const name = (() => { try { return require('./package.json').name || require('path').basename(process.cwd()); } catch { return require('path').basename(process.cwd()); } })(); const r = g.compareSignalDistributions(name); console.log(JSON.stringify(r));" 2>/dev/null
|
|
110
|
+
```
|
|
111
|
+
|
|
112
|
+
2. If the result has `insufficient_data: true`, display:
|
|
113
|
+
"No global metrics yet — complete milestones in multiple projects to enable cross-project comparison"
|
|
114
|
+
|
|
115
|
+
3. Otherwise, display the cross-project signal distribution comparison:
|
|
116
|
+
```
|
|
117
|
+
## Cross-Project Signal Distribution
|
|
118
|
+
|
|
119
|
+
| Project | Tasks | Pass-Through | Fix-Cycle | Other |
|
|
120
|
+
|----------------------|-------|--------------|-----------|-------|
|
|
121
|
+
| {project} {★ if is_queried} | {N} | {rate} | {rate} | ... |
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
4. If `$ARGUMENTS` also contains `--domain {domainType}`, run:
|
|
125
|
+
```bash
|
|
126
|
+
node -e "const g = require('./bin/global-sync-manager.js'); const r = g.getDomainTypeComparison('{domainType}'); console.log(JSON.stringify(r));" 2>/dev/null
|
|
127
|
+
```
|
|
128
|
+
Display the domain-type comparison table:
|
|
129
|
+
```
|
|
130
|
+
## Domain-Type Comparison: {domainType}
|
|
131
|
+
|
|
132
|
+
| Project | Tasks | Pass-Through | Fix-Cycle |
|
|
133
|
+
|----------------------|-------|--------------|-----------|
|
|
134
|
+
| {project} | {N} | {count} | {count} |
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
If `--cross-project` is NOT in `$ARGUMENTS`: skip this step entirely (no change to existing behavior).
|
|
138
|
+
|
|
139
|
+
$ARGUMENTS
|
|
140
|
+
|
|
141
|
+
## Auto-Clear
|
|
142
|
+
|
|
143
|
+
All work is committed to project files. Execute `/clear` to free the context window for the next command.
|
package/commands/gsd-t-plan.md
CHANGED
|
@@ -25,6 +25,24 @@ If `.gsd-t/graph/meta.json` exists (graph index is available):
|
|
|
25
25
|
|
|
26
26
|
If graph is not available, skip this step.
|
|
27
27
|
|
|
28
|
+
## Step 1.7: Pre-Mortem — Historical Failure Analysis
|
|
29
|
+
|
|
30
|
+
Before creating task lists, check historical task-metrics for domain-level failure patterns from previous milestones:
|
|
31
|
+
|
|
32
|
+
1. Run via Bash:
|
|
33
|
+
`node -e "const c = require('./bin/metrics-collector.js'); const domains = [/* list domain names from scope files */]; domains.forEach(d => { const w = c.getPreFlightWarnings(d); if(w.length) w.forEach(x => console.log('⚠️ ' + x)); });" 2>/dev/null || true`
|
|
34
|
+
|
|
35
|
+
2. If any domain has `first_pass_rate < 0.6` historically:
|
|
36
|
+
- Display warning inline: `⚠️ Domain {name} has historically low first-pass rate ({rate}%). Consider: smaller tasks, more explicit acceptance criteria, or additional contract detail.`
|
|
37
|
+
- This is **non-blocking** — it informs task design, does not prevent planning.
|
|
38
|
+
|
|
39
|
+
3. If `.gsd-t/metrics/task-metrics.jsonl` does not exist: skip this step silently (first milestone, no historical data).
|
|
40
|
+
|
|
41
|
+
4. **Rule-based pre-mortem**: Run via Bash:
|
|
42
|
+
`node -e "const re = require('./bin/rule-engine.js'); const domains = [/* list domain names */]; domains.forEach(d => { const rules = re.getPreMortemRules(d); if(rules.length) rules.forEach(r => console.log('RULE ' + r.id + ': ' + r.name + ' — historically triggered for domains like ' + d)); });" 2>/dev/null || true`
|
|
43
|
+
|
|
44
|
+
If matching rules found: display warnings inline (non-blocking — informs task design). Falls back gracefully if rules.jsonl does not exist or is empty.
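The historical first-pass check in item 2 can be sketched as follows. This is not the implementation of `getPreFlightWarnings`, whose internals are not shown here; it is a standalone illustration that assumes task records expose `domain`, `pass`, and `fix_cycles`, and the name `lowFirstPassDomains` is hypothetical.

```javascript
// Hypothetical sketch: group historical task records by domain and
// report domains whose first-pass rate falls below the threshold.
function lowFirstPassDomains(records, threshold = 0.6) {
  const byDomain = new Map();
  for (const r of records) {
    const stats = byDomain.get(r.domain) || { total: 0, firstPass: 0 };
    stats.total += 1;
    if (r.pass && r.fix_cycles === 0) stats.firstPass += 1;
    byDomain.set(r.domain, stats);
  }

  const warnings = [];
  for (const [domain, { total, firstPass }] of byDomain) {
    const rate = firstPass / total;
    if (rate < threshold) {
      warnings.push({
        domain,
        rate,
        message: `⚠️ Domain ${domain} has historically low first-pass rate (${Math.round(rate * 100)}%).`,
      });
    }
  }
  return warnings;
}
```

The result is advisory only, matching the non-blocking behavior described above: an empty array simply means no warning is shown.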
|
|
45
|
+
|
|
28
46
|
## Step 2: Create Task Lists Per Domain
|
|
29
47
|
|
|
30
48
|
### SharedCore-First Pre-Check
|
|
@@ -67,6 +85,30 @@ For each domain, write `.gsd-t/domains/{domain-name}/tasks.md`:
|
|
|
67
85
|
4. **Contract-bound**: Every task that crosses a domain boundary must reference the specific contract it implements
|
|
68
86
|
5. **Ordered**: Tasks within a domain are numbered in execution order
|
|
69
87
|
6. **No implicit knowledge**: Don't assume the executing agent remembers previous tasks — reference contracts and files explicitly
|
|
88
|
+
7. **Context-window fit**: Each task MUST be executable within a single context window. Apply the scope validation heuristics below.
|
|
89
|
+
|
|
90
|
+
### Task Scope Validation
|
|
91
|
+
|
|
92
|
+
After writing each task, apply this heuristic check before finalizing:
|
|
93
|
+
|
|
94
|
+
**Splitting candidates — flag if ANY of these are true:**
|
|
95
|
+
- Task lists **more than 5 files** to modify or create
|
|
96
|
+
- Task has **more than 3 complex dependencies** (other tasks, contracts, or external systems it must read and understand)
|
|
97
|
+
- Task description spans multiple distinct concerns (e.g., "implement X and also refactor Y and update Z docs")
|
|
98
|
+
|
|
99
|
+
**Warning threshold:** If a task is flagged, emit:
|
|
100
|
+
> ⚠️ **Task scope warning — {domain} Task {N}**: Estimated context load is high ({N} files, {N} dependencies). This task may approach the 70% context window threshold. Consider splitting into:
|
|
101
|
+
> - Task {N}a: {first concern}
|
|
102
|
+
> - Task {N}b: {second concern}
|
|
103
|
+
|
|
104
|
+
**Auto-split rule (Level 3 Full Auto):** If a task has >5 files AND >3 dependencies, split it automatically. Renumber subsequent tasks. Document the split rationale in the task's Dependencies field.
|
|
105
|
+
|
|
106
|
+
**Guidance for estimating context size:**
|
|
107
|
+
- Each file to read ≈ 1–5% of context window (varies by file size)
|
|
108
|
+
- CLAUDE.md + scope.md + constraints.md + contracts ≈ 15–25% baseline overhead
|
|
109
|
+
- Tasks with >5 files or >3 cross-domain contracts commonly exceed 70% total context
|
|
110
|
+
|
|
111
|
+
This rule implements the "task must fit in one context window" constraint — a task that compacts its subagent is a task that produces incomplete or corrupt output.
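As an illustration of the heuristic, the check below flags a task when any splitting condition holds and marks it for auto-split only when both the file and dependency limits are exceeded. The task shape (`files`, `dependencies`, `concerns` arrays) and the name `checkTaskScope` are assumptions made for this sketch; tasks.md is Markdown, so a real check would first have to extract these lists.

```javascript
// Hypothetical task shape: { files: string[], dependencies: string[], concerns: string[] }.
// Thresholds follow the text: more than 5 files, more than 3 complex dependencies.
function checkTaskScope(task) {
  const tooManyFiles = task.files.length > 5;
  const tooManyDeps = task.dependencies.length > 3;
  const multipleConcerns = task.concerns.length > 1;

  return {
    // Any single condition is enough to emit the scope warning.
    flagged: tooManyFiles || tooManyDeps || multipleConcerns,
    // Level 3 auto-split requires BOTH limits to be exceeded.
    autoSplit: tooManyFiles && tooManyDeps,
  };
}
```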
|
|
70
112
|
|
|
71
113
|
### Cross-Domain Duplicate Operation Scan
|
|
72
114
|
|
|
@@ -259,8 +301,13 @@ After subagent returns — run via Bash:
|
|
|
259
301
|
Compute tokens and compaction:
|
|
260
302
|
- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
|
|
261
303
|
- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
|
|
262
|
-
|
|
263
|
-
|
|
304
|
+
Compute context utilization — run via Bash:
|
|
305
|
+
`if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
|
|
306
|
+
Alert on context thresholds (display to user inline):
|
|
307
|
+
- If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
|
|
308
|
+
- If CTX_PCT >= 70 (and below 85): `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
|
|
309
|
+
Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
|
|
310
|
+
`| {DT_START} | {DT_END} | gsd-t-plan | Step 7 | haiku | {DURATION}s | {PASS/FAIL}, iteration {N} | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
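The token and context arithmetic above is expressed in shell; the same logic is sketched here in JavaScript to make the two branches explicit. The function names are hypothetical, and the percentage is truncated to one decimal on the assumption that this mirrors what `bc` does with `scale=1`.

```javascript
// Tokens consumed by the subagent. If the end reading is lower than the
// start reading, the context was compacted and the counter wrapped.
function tokenDelta(tokStart, tokEnd, tokMax) {
  if (tokEnd >= tokStart) {
    return { tokens: tokEnd - tokStart, compacted: false };
  }
  return { tokens: (tokMax - tokStart) + tokEnd, compacted: true };
}

// Context utilization with the 70% / 85% alert thresholds from the text.
function contextAlert(used, max) {
  if (!(max > 0)) return { ctxPct: 'N/A', level: 'none' };
  // Truncate to one decimal place (assumed to match bc with scale=1).
  const ctxPct = Math.floor((used * 1000) / max) / 10;
  let level = 'ok';
  if (ctxPct >= 85) level = 'critical';
  else if (ctxPct >= 70) level = 'warning';
  return { ctxPct, level };
}
```

Checking the 85% threshold first keeps the two alerts mutually exclusive, so a single reading produces a single message.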
|
|
264
311
|
If validation FAIL, append each gap to `.gsd-t/qa-issues.md` (create with header `| Date | Command | Step | Model | Duration(s) | Severity | Finding |` if missing):
|
|
265
312
|
`| {DT_START} | gsd-t-plan | Step 7 | haiku | {DURATION}s | medium | {gap description} |`
|
|
266
313
|
|
package/commands/gsd-t-quick.md
CHANGED
|
@@ -83,6 +83,16 @@ When you encounter unexpected situations:
|
|
|
83
83
|
5. Verify it works
|
|
84
84
|
6. Commit: `[quick] {description}`
|
|
85
85
|
|
|
86
|
+
## Step 3.5: Emit Task Metrics
|
|
87
|
+
|
|
88
|
+
After committing, emit a task-metrics record for this quick task — run via Bash:
|
|
89
|
+
`node bin/metrics-collector.js --milestone {current-milestone-or-none} --domain {domain-or-quick} --task quick-{timestamp} --command quick --duration_s {elapsed} --tokens_used {estimated} --context_pct ${CTX_PCT:-0} --pass {true|false} --fix_cycles {0|N} --signal_type {pass-through|fix-cycle} --notes "[quick] {description}" 2>/dev/null || true`
|
|
90
|
+
|
|
91
|
+
Signal type: `pass-through` if task completed on first attempt; `fix-cycle` if rework was needed.
|
|
92
|
+
|
|
93
|
+
Emit task_complete event — run via Bash:
|
|
94
|
+
`node ~/.claude/scripts/gsd-t-event-writer.js --type task_complete --command gsd-t-quick --reasoning "signal_type={signal_type}, domain={domain}" --outcome {success|failure} || true`
|
|
95
|
+
|
|
86
96
|
## Step 4: Document Ripple (if GSD-T is active)
|
|
87
97
|
|
|
88
98
|
If `.gsd-t/progress.md` exists, assess what documentation was affected and update ALL relevant files:
|
|
@@ -123,9 +133,11 @@ Quick does not mean skip testing. Before committing:
|
|
|
123
133
|
- Playwright E2E specs (if UI/routes/flows/modes changed): create new specs for new functionality, update existing specs for changed behavior
|
|
124
134
|
- Cover all modes/flags affected by this change
|
|
125
135
|
- "No feature code without test code" applies to quick tasks too
|
|
126
|
-
2. **Run
|
|
127
|
-
|
|
128
|
-
|
|
136
|
+
2. **Run ALL configured test suites** — not just affected tests, not just one suite:
|
|
137
|
+
a. Detect all runners: check for vitest/jest config, playwright.config.*, cypress.config.*
|
|
138
|
+
b. Run EVERY detected suite. Unit tests alone are NEVER sufficient when E2E exists.
|
|
139
|
+
c. If `playwright.config.*` exists → `npx playwright test` (full suite)
|
|
140
|
+
d. Report ALL results: "Unit: X/Y pass | E2E: X/Y pass"
|
|
129
141
|
- Fix any failures before proceeding (up to 2 attempts)
|
|
130
142
|
3. **Verify against requirements**:
|
|
131
143
|
- Does the change satisfy its intended requirement?
|
package/commands/gsd-t-status.md
CHANGED
|
@@ -57,6 +57,59 @@ If `.gsd-t/backlog.md` exists, read and parse it. Show total count and top 3 ite
|
|
|
57
57
|
If there are blockers or issues, highlight them.
|
|
58
58
|
If the user provides $ARGUMENTS, focus the status on that specific domain or aspect.
|
|
59
59
|
|
|
60
|
+
## Token Usage Breakdown
|
|
61
|
+
|
|
62
|
+
If `.gsd-t/token-log.md` exists, read it and append a token breakdown to the status report.
|
|
63
|
+
|
|
64
|
+
Parse each row in the table. Handle both old format (9 columns) and extended format (12 columns with Domain, Task, Ctx%). Rows with missing or empty Domain column are assigned domain "(untagged)".
|
|
65
|
+
|
|
66
|
+
### Token Usage by Domain
|
|
67
|
+
Group rows by Domain. For each domain, sum Tokens and collect all Ctx% values (ignoring "N/A" and empty). Display:
|
|
68
|
+
|
|
69
|
+
```
|
|
70
|
+
## Token Usage by Domain
|
|
71
|
+
| Domain | Tokens | Subagents | Peak Ctx% |
|
|
72
|
+
|----------------|--------|-----------|-----------|
|
|
73
|
+
| auth | 12,400 | 4 | 14% |
|
|
74
|
+
| notifications  | 45,200 | 3         | 72% ⚠️   |
|
|
75
|
+
| (untagged) | 8,100 | 6 | N/A |
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
Flag any domain where Peak Ctx% >= 70 with `⚠️` suffix.
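A sketch of the per-domain grouping, assuming the 12-column header order used when token-log.md is created (Tokens in column 8, Domain in column 10, Ctx% in column 12) and that older rows stop after the Compacted column. The name `tokensByDomain` is hypothetical.

```javascript
// Hypothetical parser for .gsd-t/token-log.md. Handles both the old
// 9-column rows and the extended 12-column rows.
function tokensByDomain(markdown) {
  const totals = new Map();
  for (const rawLine of markdown.split('\n')) {
    const line = rawLine.trim();
    if (!line.startsWith('|')) continue;

    // Drop the outer pipes, then split into trimmed cells.
    const cells = line.replace(/^\||\|$/g, '').split('|').map((c) => c.trim());
    // Skip the header row and the |---|---| separator row.
    if (cells[0] === 'Datetime-start' || /^:?-+:?$/.test(cells[0])) continue;
    if (cells.length < 8) continue;

    const tokens = Number(cells[7].replace(/,/g, '')) || 0;
    const domain = cells[9] || '(untagged)'; // missing or empty Domain
    const ctx = parseFloat(cells[11]);       // NaN for "N/A", empty, or absent

    const entry = totals.get(domain) || { tokens: 0, subagents: 0, peakCtx: null };
    entry.tokens += tokens;
    entry.subagents += 1;
    if (!Number.isNaN(ctx)) {
      entry.peakCtx = entry.peakCtx === null ? ctx : Math.max(entry.peakCtx, ctx);
    }
    totals.set(domain, entry);
  }
  return totals;
}

// A domain earns the warning suffix when its peak reaches 70 or more.
function needsWarning(entry) {
  return entry.peakCtx !== null && entry.peakCtx >= 70;
}
```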
|
|
79
|
+
|
|
80
|
+
### Token Usage by Phase/Command
|
|
81
|
+
Group rows by Command. For each command, sum Tokens and count subagent rows. Display:
|
|
82
|
+
|
|
83
|
+
```
|
|
84
|
+
## Token Usage by Command
|
|
85
|
+
| Command | Tokens | Subagents |
|
|
86
|
+
|---------------|--------|-----------|
|
|
87
|
+
| gsd-t-execute | 86,200 | 14 |
|
|
88
|
+
| gsd-t-wave | 12,400 | 9 |
|
|
89
|
+
| gsd-t-plan | 3,400 | 1 |
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
If token-log.md does not exist or is empty, skip this section entirely (no error).
|
|
93
|
+
|
|
94
|
+
## Process Health
|
|
95
|
+
|
|
96
|
+
If `.gsd-t/metrics/rollup.jsonl` exists, read the latest entry and append to the status report:
|
|
97
|
+
|
|
98
|
+
```
|
|
99
|
+
Process Health:
|
|
100
|
+
ELO: {elo_after} ({elo_delta > 0 ? '↑' : '↓'} {elo_delta})
|
|
101
|
+
Quality: {first_pass_rate * 100}% first-pass rate | {total_fix_cycles} fix cycles
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
If `.gsd-t/metrics/task-metrics.jsonl` exists but no rollup.jsonl, compute first_pass_rate directly from task-metrics for the current milestone and display:
|
|
105
|
+
|
|
106
|
+
```
|
|
107
|
+
Process Health:
|
|
108
|
+
Quality: {rate}% first-pass rate (current milestone, no rollup yet)
|
|
109
|
+
```
|
|
110
|
+
|
|
111
|
+
If neither file exists, skip this section entirely.
|
|
112
|
+
|
|
60
113
|
## Graph Status
|
|
61
114
|
|
|
62
115
|
If `.gsd-t/graph/meta.json` exists, read it and append to the status report:
|
|
@@ -81,6 +134,31 @@ After displaying the project status, check for GSD-T updates:
|
|
|
81
134
|
|
|
82
135
|
5. If versions match, skip — don't show anything
|
|
83
136
|
|
|
137
|
+
## Global ELO & Cross-Project Rankings
|
|
138
|
+
|
|
139
|
+
After the Process Health section, check for global metrics:
|
|
140
|
+
|
|
141
|
+
1. Run via Bash:
|
|
142
|
+
```bash
|
|
143
|
+
node -e "const g = require('./bin/global-sync-manager.js'); const name = (() => { try { return require('./package.json').name; } catch { return require('path').basename(process.cwd()); } })(); const elo = g.getGlobalELO(name); const ranks = g.getProjectRankings(); console.log(JSON.stringify({ elo, ranks, name }));" 2>/dev/null
|
|
144
|
+
```
|
|
145
|
+
|
|
146
|
+
2. If the result returns `elo: null` or the command fails: display "No global metrics yet" and skip.
|
|
147
|
+
|
|
148
|
+
3. If global ELO data exists, display:
|
|
149
|
+
```
|
|
150
|
+
Global ELO: {elo} (rank #{position} of {total} projects)
|
|
151
|
+
```
|
|
152
|
+
Where position is the 1-based index of the current project in the rankings array.
|
|
153
|
+
|
|
154
|
+
4. If 2+ projects have global rollup data, display the top 5 rankings:
|
|
155
|
+
```
|
|
156
|
+
## Cross-Project Rankings (Top 5)
|
|
157
|
+
| Rank | Project | ELO | Latest Milestone |
|
|
158
|
+
|------|------------------|--------|------------------|
|
|
159
|
+
| 1 | {project} | {elo} | {milestone} |
|
|
160
|
+
```
|
|
161
|
+
|
|
84
162
|
$ARGUMENTS
|
|
85
163
|
|
|
86
164
|
## Auto-Clear
|
package/commands/gsd-t-test-sync.md
CHANGED
|
@@ -111,8 +111,8 @@ pytest tests/test_{module}.py -v
|
|
|
111
111
|
npm test -- --testPathPattern="{module}"
|
|
112
112
|
```
|
|
113
113
|
|
|
114
|
-
### B) E2E Tests
|
|
115
|
-
If
|
|
114
|
+
### B) E2E Tests (MANDATORY when config exists)
|
|
115
|
+
If `playwright.config.*` or `cypress.config.*` exists, you MUST run E2E tests — skipping is never acceptable:
|
|
116
116
|
|
|
117
117
|
```bash
|
|
118
118
|
# Playwright
|
package/commands/gsd-t-verify.md
CHANGED
|
@@ -199,6 +199,95 @@ Create or update `.gsd-t/verify-report.md`:
|
|
|
199
199
|
| 2 | ui | Add loading states for async calls | WARN |
|
|
200
200
|
```
|
|
201
201
|
|
|
202
|
+
## Step 5.25: Metrics Quality Budget Check
|
|
203
|
+
|
|
204
|
+
Check task-metrics for the current milestone to detect quality budget violations:
|
|
205
|
+
|
|
206
|
+
1. Run via Bash:
|
|
207
|
+
`node -e "const c = require('./bin/metrics-collector.js'); const r = c.readTaskMetrics({milestone: '{milestone-id}'}); if(!r.length){console.log('No metrics data — quality budget check skipped');process.exit(0);} const pass=r.filter(t=>t.fix_cycles===0&&t.pass).length; const rate=pass/r.length; console.log('First-pass rate: '+(rate*100).toFixed(1)+'% ('+pass+'/'+r.length+')'); if(rate<0.6) console.log('⚠️ Quality budget WARNING: first-pass rate below 60%');" 2>/dev/null || true`
|
|
208
|
+
|
|
209
|
+
2. Run heuristics check via Bash:
|
|
210
|
+
`node -e "const m=require('./bin/metrics-rollup.js'); const r=m.readRollups({milestone:'{milestone-id}'}); if(r.length&&r[r.length-1].heuristic_flags.some(f=>f.severity==='HIGH')) console.log('⚠️ HIGH severity heuristic flag detected — review before completing milestone');" 2>/dev/null || true`
|
|
211
|
+
|
|
212
|
+
3. Display quality metrics summary inline. Quality budget violation is a **WARNING** (non-blocking) — does not fail verify.
|
|
213
|
+
|
|
214
|
+
4. Include quality budget status in the verification report (Step 5):
|
|
215
|
+
`- Quality Budget: {PASS/WARN} — first-pass rate {N}%{, HIGH heuristic: {name} if any}`
|
|
216
|
+
|
|
217
|
+
## Step 5.5: Goal-Backward Verification (Post-Gate Behavior Check)
|
|
218
|
+
|
|
219
|
+
This step runs **after all 8 quality gates pass**. It verifies that milestone goals are actually achieved end-to-end — not just structurally present. It catches placeholder implementations that pass all structural gates.
|
|
220
|
+
|
|
221
|
+
Refer to `.gsd-t/contracts/goal-backward-contract.md` for the full verification flow, placeholder patterns, and findings report format.
|
|
222
|
+
|
|
223
|
+
### 5.5.1 Load Milestone Goals and Requirements
|
|
224
|
+
|
|
225
|
+
1. Read `.gsd-t/progress.md` — extract the current milestone name and goals
|
|
226
|
+
2. Read `docs/requirements.md` — identify **critical requirements** (skip trivial/low-priority items)
|
|
227
|
+
|
|
228
|
+
### 5.5.2 Trace Requirements to Behavior
|
|
229
|
+
|
|
230
|
+
For each critical requirement:
|
|
231
|
+
|
|
232
|
+
1. **If `.gsd-t/graph/meta.json` exists (graph available)**:
|
|
233
|
+
- Trace the requirement → code path → behavior chain using graph queries
|
|
234
|
+
- Use `getRequirementFor`, `getCallers`, and `getTestsFor` to build the chain
|
|
235
|
+
- Flag requirements with no traceable code path as CRITICAL findings
|
|
236
|
+
|
|
237
|
+
2. **If graph is not available (fallback to grep)**:
|
|
238
|
+
- Search the codebase for the feature/function implementing each requirement
|
|
239
|
+
- Trace from entry point → core logic → output/response
|
|
240
|
+
|
|
241
|
+
### 5.5.3 Scan for Placeholder Patterns
|
|
242
|
+
|
|
243
|
+
For each file identified in the requirement traces above, scan for these placeholder patterns:
|
|
244
|
+
|
|
245
|
+
| Pattern | Detection Hint | Severity |
|
|
246
|
+
|---------|---------------|----------|
|
|
247
|
+
| console.log placeholder | `console.log.*TODO\|console.log.*implement` | CRITICAL |
|
|
248
|
+
| TODO/FIXME in implementation | `// TODO\|// FIXME\|# TODO\|# FIXME` in non-test files | CRITICAL |
|
|
249
|
+
| Empty function body | `function \w+\(\) \{\}` or `\(\) => \{\}` with no logic | CRITICAL |
|
|
250
|
+
| Throw not-implemented | `throw new Error.*not implemented\|throw new Error.*TODO` | CRITICAL |
|
|
251
|
+
| Hardcoded return | `return "success"\|return true` with no conditional logic | HIGH |
|
|
252
|
+
| Static UI text | Static `<span>` or text that never updates based on state | HIGH |
|
|
253
|
+
| Pass-through stub | `return input\|return req\|return data` with no transformation | MEDIUM |
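A few of the table's detection hints can be turned into a line-by-line scanner. The sketch below covers only the four CRITICAL patterns, reads the `\|` in the table as regex alternation, and uses hypothetical names (`PLACEHOLDER_PATTERNS`, `scanSource`). Excluding test files and handling the HIGH and MEDIUM patterns, which need more context than a single line, are left out.

```javascript
// Regexes adapted from the Detection Hint column (CRITICAL rows only).
const PLACEHOLDER_PATTERNS = [
  { name: 'console.log placeholder', regex: /console\.log.*(TODO|implement)/, severity: 'CRITICAL' },
  { name: 'TODO/FIXME in implementation', regex: /(\/\/|#) (TODO|FIXME)/, severity: 'CRITICAL' },
  { name: 'Empty function body', regex: /function \w+\(\) \{\}|\(\) => \{\}/, severity: 'CRITICAL' },
  { name: 'Throw not-implemented', regex: /throw new Error.*(not implemented|TODO)/, severity: 'CRITICAL' },
];

// Hypothetical scanner: returns one finding per (line, pattern) match,
// with 1-based line numbers for the File:Line column of the report.
function scanSource(path, source) {
  const findings = [];
  source.split('\n').forEach((text, index) => {
    for (const pattern of PLACEHOLDER_PATTERNS) {
      if (pattern.regex.test(text)) {
        findings.push({
          file: path,
          line: index + 1,
          pattern: pattern.name,
          severity: pattern.severity,
        });
      }
    }
  });
  return findings;
}
```

Simple line-based matching like this will produce false positives (for example, an intentionally empty callback), so findings are better treated as candidates for review than as proof of a placeholder.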
|
|
254
|
+
|
|
255
|
+
### 5.5.4 Produce Findings Report
|
|
256
|
+
|
|
257
|
+
Format findings per the goal-backward-contract.md report format:
|
|
258
|
+
|
|
259
|
+
```markdown
|
|
260
|
+
## Goal-Backward Verification Report
|
|
261
|
+
|
|
262
|
+
### Status: PASS | FAIL
|
|
263
|
+
|
|
264
|
+
### Findings
|
|
265
|
+
| # | Requirement | File:Line | Pattern | Severity | Description |
|
|
266
|
+
|---|-------------|-----------|---------|----------|-------------|
|
|
267
|
+
| 1 | {req-id} | {path}:{line} | {pattern} | {severity} | {what's wrong} |
|
|
268
|
+
|
|
269
|
+
### Summary
|
|
270
|
+
- Requirements checked: {N}
|
|
271
|
+
- Findings: {N} ({critical}, {high}, {medium})
|
|
272
|
+
- Verdict: {PASS if 0 critical/high, FAIL otherwise}
|
|
273
|
+
```
|
|
274
|
+
|
|
275
|
+
### 5.5.5 Apply Blocking Rules
|
|
276
|
+
|
|
277
|
+
- **CRITICAL or HIGH findings** → Goal-Backward status = **FAIL** — block verification
|
|
278
|
+
- Append findings to the Critical section of the verification report (Step 5)
|
|
279
|
+
- Set overall verification status to FAIL
|
|
280
|
+
- **MEDIUM findings** → Goal-Backward status = **WARN** — log but do not block
|
|
281
|
+
- Append findings to the Warnings section of the verification report (Step 5)
|
|
282
|
+
- **No findings** → Goal-Backward status = **PASS** — add to verification report summary
|
|
283
|
+
|
|
284
|
+
Add a `Goal-Backward:` line to the Step 5 verification report summary:
|
|
285
|
+
```
|
|
286
|
+
- Goal-Backward: {PASS/WARN/FAIL} — {N} requirements checked, {N} findings ({critical} critical, {high} high, {medium} medium)
|
|
287
|
+
```
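The blocking rules reduce to a small decision function. The sketch assumes findings are objects with a `severity` of `CRITICAL`, `HIGH`, or `MEDIUM`, as in the findings table; the name `goalBackwardStatus` is hypothetical.

```javascript
// Map a list of findings to the Goal-Backward status:
// any CRITICAL or HIGH finding fails, MEDIUM-only warns, none passes.
function goalBackwardStatus(findings) {
  const count = (severity) => findings.filter((f) => f.severity === severity).length;
  const critical = count('CRITICAL');
  const high = count('HIGH');
  const medium = count('MEDIUM');

  let status = 'PASS';
  if (critical + high > 0) status = 'FAIL';
  else if (medium > 0) status = 'WARN';

  return { status, critical, high, medium };
}
```

The per-severity counts are returned alongside the status so the summary line can be filled in from one call.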
|
|
288
|
+
|
|
289
|
+
---
|
|
290
|
+
|
|
202
291
|
## Step 6: Handle Remediation
|
|
203
292
|
|
|
204
293
|
If there are CRITICAL findings:
|
|
@@ -217,15 +306,9 @@ Update `.gsd-t/progress.md`:
|
|
|
217
306
|
|
|
218
307
|
### Autonomy Behavior
|
|
219
308
|
|
|
220
|
-
**
|
|
221
|
-
- VERIFIED
|
|
222
|
-
-
|
|
223
|
-
- FAIL → Auto-execute remediation tasks (up to 2 fix attempts). If still failing after 2 attempts, STOP and report to user.
|
|
224
|
-
|
|
225
|
-
**Level 1–2**:
|
|
226
|
-
- VERIFIED → Milestone complete, proceed to next milestone or ship
|
|
227
|
-
- CONDITIONAL PASS → User decides if warnings are acceptable
|
|
228
|
-
- FAIL → Return to execute phase for remediation tasks
|
|
309
|
+
**All Levels**:
|
|
310
|
+
- VERIFIED or CONDITIONAL PASS → **Auto-invoke complete-milestone** (see Step 8 below). Completing a verified milestone is mechanical — there is no judgment call that benefits from user review.
|
|
311
|
+
- FAIL → **Level 3**: Auto-execute remediation tasks (up to 2 fix attempts). If still failing after 2 attempts, STOP and report to user. **Level 1–2**: Return to execute phase for remediation tasks.
|
|
229
312
|
|
|
230
313
|
## Document Ripple
|
|
231
314
|
|
|
@@ -238,6 +321,54 @@ Update `.gsd-t/progress.md`:
|
|
|
238
321
|
4. **`.gsd-t/techdebt.md`** — If verification found new quality or security issues, add as debt
|
|
239
322
|
5. **`docs/requirements.md`** — If verification revealed unmet requirements, update status
|
|
240
323
|
|
|
324
|
+
## Step 8: Auto-Invoke Complete-Milestone
|
|
325
|
+
|
|
326
|
+
**This step is MANDATORY and runs at ALL autonomy levels.** Completing a verified milestone is a mechanical operation (archive, tag, bump version, update docs). There is no decision that benefits from user review — the decision was made when verification passed.
|
|
327
|
+
|
|
328
|
+
If status is VERIFY-FAILED:
|
|
329
|
+
- Do NOT invoke complete-milestone
|
|
330
|
+
- Report failures and stop
|
|
331
|
+
|
|
332
|
+
If status is VERIFIED or VERIFIED-WITH-WARNINGS:
|
|
333
|
+
1. Log: "✅ Verify complete — spawning complete-milestone agent..."
|
|
334
|
+
|
|
335
|
+
**OBSERVABILITY LOGGING (MANDATORY):**
|
|
336
|
+
Before spawning — run via Bash:
|
|
337
|
+
`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
|
|
338
|
+
|
|
339
|
+
2. Spawn a Task subagent (model: sonnet, mode: bypassPermissions):
|
|
340
|
+
```
|
|
341
|
+
"Execute the complete-milestone phase of the current GSD-T milestone.
|
|
342
|
+
|
|
343
|
+
Read and follow the full instructions in commands/gsd-t-complete-milestone.md
|
|
344
|
+
(resolve from ~/.claude/commands/ if not in project).
|
|
345
|
+
Read .gsd-t/progress.md for current milestone and state.
|
|
346
|
+
Read CLAUDE.md for project conventions.
|
|
347
|
+
Read .gsd-t/contracts/ for domain interfaces.
|
|
348
|
+
|
|
349
|
+
Complete the phase fully:
|
|
350
|
+
- Follow every step in the command file
|
|
351
|
+
- Update .gsd-t/progress.md status when done
|
|
352
|
+
- Run document ripple as specified
|
|
353
|
+
- Commit your work
|
|
354
|
+
|
|
355
|
+
Report back: one-line status summary."
|
|
356
|
+
```
|
|
357
|
+
|
|
358
|
+
After subagent returns — run via Bash:
|
|
359
|
+
`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
|
|
360
|
+
Compute tokens and compaction:
|
|
361
|
+
- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
|
|
362
|
+
- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
|
|
363
|
+
Append to `.gsd-t/token-log.md`:
|
|
364
|
+
`| {DT_START} | {DT_END} | gsd-t-verify | Step 8 | sonnet | {DURATION}s | auto-complete-milestone | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
|
|
365
|
+
|
|
366
|
+
3. Verify subagent result: Read `.gsd-t/progress.md` — confirm status is COMPLETED. If not, report the failure.
|
|
367
|
+
|
|
368
|
+
**Why this is mandatory**: Without auto-completion, verified milestones remain in VERIFIED state indefinitely. Requirements stay unmarked, progress.md is stale, and future sessions cannot tell the work was done. This is the root cause of "GSD-T forgot it did this work" — the milestone was built and verified but never formally completed.
|
|
369
|
+
|
|
370
|
+
**Why a subagent**: Complete-milestone is a 12-step process (gap analysis, archive, version bump, git tag, doc ripple). Verify is already heavy with 8+ quality gates. Spawning a fresh-context subagent avoids compaction risk — and complete-milestone loads everything it needs from files (progress.md, verify-report.md, contracts).
|
|
371
|
+
|
|
241
372
|
$ARGUMENTS
|
|
242
373
|
|
|
243
374
|
## Auto-Clear
|
package/commands/gsd-t-visualize.md
CHANGED
|
@@ -39,7 +39,17 @@ Run via Bash:
|
|
|
39
39
|
node ~/.claude/scripts/gsd-t-event-writer.js --type command_invoked --command gsd-t-visualize --reasoning "Launching dashboard" || true
|
|
40
40
|
```
|
|
41
41
|
|
|
42
|
-
## Step 1.5:
|
|
42
|
+
## Step 1.5: Context Metrics for Dashboard
|
|
43
|
+
|
|
44
|
+
If `.gsd-t/token-log.md` exists, the dashboard server automatically reads it and provides context utilization metrics for visualization. These metrics are served from the `/api/token-breakdown` endpoint and rendered as:
|
|
45
|
+
|
|
46
|
+
1. **Context utilization timeline** — Ctx% over time, ordered by Datetime-start
|
|
47
|
+
2. **Token breakdown by domain** — bar chart grouping Tokens by Domain column (gracefully handles older rows without Domain column — they are grouped as "(untagged)")
|
|
48
|
+
3. **Compaction proximity warnings** — rows where Ctx% >= 70 are highlighted; rows where Ctx% >= 85 are marked critical (🔴)
|
|
49
|
+
|
|
50
|
+
If `.gsd-t/token-log.md` does not exist, context metrics panels are hidden (not shown as errors).
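The "(untagged)" grouping rule for the token breakdown can be illustrated with a short awk sketch. It is not the dashboard server's code, which this diff does not show. The function name `tokens_by_domain` is hypothetical, and the column positions are assumed from the 12-column header used elsewhere in this package (`Tokens` is the 8th column, `Domain` the 10th).

```shell
#!/usr/bin/env bash
# Hypothetical helper: sums the Tokens column per Domain for a token log.
# Rows with an empty or missing Domain column are grouped as "(untagged)".
tokens_by_domain() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
      # With -F"|" the first field is the empty text before the leading pipe,
      # so Tokens is field 9 and Domain is field 11.
      tokens = trim($9)
      if (tokens !~ /^[0-9]+$/) next      # header, separator, or malformed row
      domain = (NF >= 11) ? trim($11) : ""
      if (domain == "") domain = "(untagged)"
      sum[domain] += tokens
    }
    END { for (d in sum) printf "%s\t%d\n", d, sum[d] }
  ' "$1" | LC_ALL=C sort
}

# Usage: skip silently when the log is absent, mirroring the hidden-panel rule.
if [ -f .gsd-t/token-log.md ]; then
  tokens_by_domain .gsd-t/token-log.md
fi
```

Skipping any row whose Tokens cell is not a plain integer keeps the header row and older, shorter rows from breaking the sum.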
|
|
51
|
+
|
|
52
|
+
## Step 1.6: Graph Data for Dashboard
|
|
43
53
|
|
|
44
54
|
If `.gsd-t/graph/index.json` exists, the dashboard can render entity-relationship visualizations from the graph data. The dashboard server will detect and serve graph data automatically — no additional configuration needed.
|
|
45
55
|
|
package/commands/gsd-t-wave.md
CHANGED
|
@@ -79,8 +79,13 @@ After phase agent returns — run via Bash:
|
|
|
79
79
|
Compute tokens and compaction:
|
|
80
80
|
- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
|
|
81
81
|
- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
|
|
82
|
-
|
|
83
|
-
|
|
82
|
+
Compute context utilization — run via Bash:
|
|
83
|
+
`if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
|
|
84
|
+
Alert on context thresholds (display to user inline):
|
|
85
|
+
- If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
|
|
86
|
+
- If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
|
|
87
|
+
Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
|
|
88
|
+
`| {DT_START} | {DT_END} | gsd-t-wave | {PHASE} | sonnet | {DURATION}s | phase: {PHASE} | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
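One practical detail of the threshold alerts: `bc` with `scale=1` yields fractional values such as `72.5`, which the integer-only `[ -ge ]` test rejects, and `CTX_PCT` may also be the string `N/A`. A minimal sketch of a comparison that tolerates both follows. The function name `ctx_alert` and its return labels are hypothetical and not taken from the package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints CRITICAL, WARNING, or nothing for a CTX_PCT value.
ctx_alert() {
  local pct=$1
  # "N/A" (or any other non-numeric value) means the maximum is unknown: no alert.
  case $pct in
    ''|*[!0-9.]*) return 0 ;;
  esac
  # Compare in awk because the value may be fractional.
  # The 85 check runs first so that a critical value is not also reported as a warning.
  if awk -v p="$pct" 'BEGIN { exit !(p >= 85) }'; then
    echo "CRITICAL"
  elif awk -v p="$pct" 'BEGIN { exit !(p >= 70) }'; then
    echo "WARNING"
  fi
}

ctx_alert 91.3   # prints: CRITICAL
ctx_alert 72.5   # prints: WARNING
ctx_alert 12.0   # prints nothing
ctx_alert N/A    # prints nothing
```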
|
|
84
89
|
|
|
85
90
|
### Phase Sequence
|
|
86
91
|
|
|
@@ -114,8 +119,13 @@ Spawn agent → `commands/gsd-t-impact.md`
|
|
|
114
119
|
|
|
115
120
|
#### 5. EXECUTE
|
|
116
121
|
Spawn agent → `commands/gsd-t-execute.md`
|
|
117
|
-
- This is the heaviest phase. The execute agent
|
|
118
|
-
- After
|
|
122
|
+
- This is the heaviest phase. The execute agent uses **task-level dispatch** (fresh-dispatch-contract.md): one Task subagent per task within each domain, each receiving only scope.md + relevant contracts + single task + graph context + up to 5 prior summaries. The execute agent handles domain task-dispatching and QA internally.
|
|
123
|
+
- **Adaptive replanning**: After each domain completes, the execute agent runs a replan check (per `adaptive-replan-contract.md`). If a completed domain's task summaries reveal new constraints (e.g., deprecated API, wrong column name, incompatible library), the execute agent checks remaining domains' `tasks.md` files for invalidated assumptions and revises them on disk before dispatching the next domain. Maximum 2 replan cycles per execute run — if exceeded, execution pauses for user input. All replan decisions are logged to the Decision Log in `progress.md`. The wave phase summary includes any replan actions taken.
|
|
124
|
+
- **Team/parallel mode**: If the plan defines parallel domains (same wave), the execute agent dispatches each domain teammate with `isolation: "worktree"` (per worktree-isolation-contract.md). Each domain works in an isolated git worktree. After all domains complete, the execute agent runs the Sequential Merge Protocol: merge domain A → test → merge domain B → test. Per-domain rollback if tests fail. Worktrees are cleaned up after all merges complete.
|
|
125
|
+
- After: Read `progress.md`, verify status = EXECUTED. Phase summary must include replan actions if any occurred:
|
|
126
|
+
```
|
|
127
|
+
📋 Phase 5 (EXECUTE): {N}/{N} tasks done | Replan cycles: {N} | Domains revised: {list or "none"}
|
|
128
|
+
```
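The Sequential Merge Protocol described for team/parallel mode (merge, test, roll back per domain) can be sketched as a loop over domains. This is an assumption-laden illustration, not the execute agent's implementation: `merge_domain`, `run_tests`, and `rollback_domain` are hypothetical hooks, stubbed here so the sketch runs on its own. Continuing with the next domain after a rollback is also an assumption, since the source only says "per-domain rollback if tests fail".

```shell
#!/usr/bin/env bash
# Hypothetical driver: merges domains one at a time, testing after each merge.
sequential_merge() {
  local domain
  for domain in "$@"; do
    if merge_domain "$domain" && run_tests "$domain"; then
      echo "merged: $domain"
    else
      # Undo only this domain's merge; earlier merges stay in place.
      rollback_domain "$domain"
      echo "rolled back: $domain"
    fi
  done
}

# Stand-in hooks for demonstration only. A real run would wrap a git merge of
# the domain's worktree branch, the project's test command, and a reset.
merge_domain()    { :; }
run_tests()       { [ "$1" != "billing" ]; }   # pretend only "billing" breaks the suite
rollback_domain() { :; }

sequential_merge auth billing search
# prints:
#   merged: auth
#   rolled back: billing
#   merged: search
```

Testing after every merge, rather than once at the end, is what makes the rollback attributable to a single domain.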
|
|
119
129
|
|
|
120
130
|
#### 6. TEST-SYNC
|
|
121
131
|
Spawn agent → `commands/gsd-t-test-sync.md`
|
|
@@ -125,15 +135,19 @@ Spawn agent → `commands/gsd-t-test-sync.md`
|
|
|
125
135
|
Spawn agent → `commands/gsd-t-integrate.md`
|
|
126
136
|
- After: Read `progress.md`, verify status = INTEGRATED
|
|
127
137
|
|
|
128
|
-
#### 8. VERIFY
|
|
138
|
+
#### 8. VERIFY + COMPLETE
|
|
129
139
|
Spawn agent → `commands/gsd-t-verify.md`
|
|
140
|
+
- The verify agent runs all 8 standard quality gates **plus** the goal-backward verification step (Step 5.5 in gsd-t-verify.md), which checks that milestone goals are actually achieved end-to-end and scans for placeholder patterns per `.gsd-t/contracts/goal-backward-contract.md`
|
|
141
|
+
- Goal-backward runs after all structural gates pass — CRITICAL or HIGH findings block verification; MEDIUM findings are warnings only
|
|
142
|
+
- **Verify auto-invokes complete-milestone** (Step 8 of gsd-t-verify.md). The verify agent handles both verification AND milestone completion in a single agent context. Do NOT spawn a separate complete agent.
|
|
130
143
|
- After: Read `progress.md`, check status:
|
|
131
|
-
-
|
|
132
|
-
-
|
|
133
|
-
|
|
134
|
-
|
|
135
|
-
|
|
136
|
-
|
|
144
|
+
- COMPLETED → milestone done (verify passed and auto-completed)
|
|
145
|
+
- VERIFIED → verify passed but complete-milestone failed — spawn a standalone complete agent as fallback
|
|
146
|
+
- VERIFY_FAILED → handle remediation (see Error Recovery) — includes goal-backward failures
|
|
147
|
+
- Phase summary must include the `Goal-Backward:` line from verify-report.md:
|
|
148
|
+
```
|
|
149
|
+
📋 Phase 8 (VERIFY+COMPLETE): {N} gates passed | Goal-Backward: {PASS/WARN/FAIL} — {N} requirements checked, {N} findings
|
|
150
|
+
```
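The status handling and the goal-backward severity rule for this phase can be sketched as two small shell functions. Both are illustrations under assumptions: the function names and the action labels are hypothetical, the status is passed in as an argument because the layout of `progress.md` is not shown here, and the mapping of severities onto `PASS/WARN/FAIL` is inferred from the bullets above rather than confirmed by the contract file.

```shell
#!/usr/bin/env bash
# Hypothetical dispatcher: maps the post-verify status to the orchestrator's next step.
next_action() {
  case $1 in
    COMPLETED)     echo "done" ;;                      # verify passed and auto-completed
    VERIFIED)      echo "spawn-complete-fallback" ;;   # complete-milestone failed
    VERIFY_FAILED) echo "remediate" ;;                 # includes goal-backward failures
    *)             echo "unknown-status" ;;
  esac
}

# Hypothetical gate: takes the severities of all goal-backward findings.
# CRITICAL or HIGH blocks verification; MEDIUM is a warning only.
goal_backward_outcome() {
  local sev outcome=PASS
  for sev in "$@"; do
    case $sev in
      CRITICAL|HIGH) echo FAIL; return 0 ;;
      MEDIUM)        outcome=WARN ;;
    esac
  done
  echo "$outcome"
}

next_action VERIFIED                # prints: spawn-complete-fallback
goal_backward_outcome MEDIUM HIGH   # prints: FAIL
goal_backward_outcome MEDIUM        # prints: WARN
goal_backward_outcome               # prints: PASS
```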
|
|
137
151
|
|
|
138
152
|
### Between Each Phase
|
|
139
153
|
|
|
@@ -286,16 +300,17 @@ If command files in `~/.claude/commands/` are tampered with, wave agents will ex
|
|
|
286
300
|
│ check check check check + check │
|
|
287
301
|
│ gate │
|
|
288
302
|
│ │
|
|
289
|
-
│
|
|
290
|
-
│ │ COMPLETE
|
|
291
|
-
│ │
|
|
292
|
-
│
|
|
293
|
-
│
|
|
294
|
-
│
|
|
295
|
-
│
|
|
303
|
+
│ ┌──────────────────┐ ┌───────────┐ ┌─────────────────┐ │
|
|
304
|
+
│ │ VERIFY+COMPLETE │ ← │ INTEGRATE │ ←──── │ FULL TEST-SYNC │ │
|
|
305
|
+
│ │ agent 8 │ │ agent 7 │ │ agent 6 │ │
|
|
306
|
+
│ └────────┬─────────┘ └─────┬─────┘ └────────┬────────┘ │
|
|
307
|
+
│ ↓ ↓ ↓ │
|
|
308
|
+
│ gate check → status status │
|
|
309
|
+
│ auto-complete check check │
|
|
310
|
+
│ archive + tag │
|
|
296
311
|
│ │
|
|
297
312
|
│ Each agent: fresh context window, reads state from files, dies when done │
|
|
298
|
-
│ Orchestrator: ~30KB total, never compacts
|
|
313
|
+
│ Orchestrator: 8 agents (was 9), ~30KB total, never compacts │
|
|
299
314
|
└──────────────────────────────────────────────────────────────────────────────┘
|
|
300
315
|
```
|
|
301
316
|
|