@massu/core 0.5.0 → 0.6.0

Files changed (118)
  1. package/README.md +40 -0
  2. package/agents/massu-architecture-reviewer.md +104 -0
  3. package/agents/massu-blast-radius-analyzer.md +84 -0
  4. package/agents/massu-competitive-scorer.md +126 -0
  5. package/agents/massu-help-sync.md +73 -0
  6. package/agents/massu-migration-writer.md +94 -0
  7. package/agents/massu-output-scorer.md +87 -0
  8. package/agents/massu-pattern-reviewer.md +84 -0
  9. package/agents/massu-plan-auditor.md +170 -0
  10. package/agents/massu-schema-sync-verifier.md +70 -0
  11. package/agents/massu-security-reviewer.md +98 -0
  12. package/agents/massu-ux-reviewer.md +106 -0
  13. package/commands/_shared-preamble.md +53 -23
  14. package/commands/_shared-references/auto-learning-protocol.md +71 -0
  15. package/commands/_shared-references/blast-radius-protocol.md +76 -0
  16. package/commands/_shared-references/security-pre-screen.md +64 -0
  17. package/commands/_shared-references/test-first-protocol.md +87 -0
  18. package/commands/_shared-references/verification-table.md +52 -0
  19. package/commands/massu-article-review.md +343 -0
  20. package/commands/massu-autoresearch/references/eval-runner.md +84 -0
  21. package/commands/massu-autoresearch/references/safety-rails.md +125 -0
  22. package/commands/massu-autoresearch/references/scoring-protocol.md +151 -0
  23. package/commands/massu-autoresearch.md +258 -0
  24. package/commands/massu-batch.md +44 -12
  25. package/commands/massu-bearings.md +42 -8
  26. package/commands/massu-checkpoint.md +588 -0
  27. package/commands/massu-ci-fix.md +2 -2
  28. package/commands/massu-command-health.md +132 -0
  29. package/commands/massu-command-improve.md +232 -0
  30. package/commands/massu-commit.md +205 -44
  31. package/commands/massu-create-plan.md +239 -57
  32. package/commands/massu-data/references/common-queries.md +79 -0
  33. package/commands/massu-data/references/table-guide.md +50 -0
  34. package/commands/massu-data.md +66 -0
  35. package/commands/massu-dead-code.md +29 -34
  36. package/commands/massu-debug/references/auto-learning.md +61 -0
  37. package/commands/massu-debug/references/codegraph-tracing.md +80 -0
  38. package/commands/massu-debug/references/common-shortcuts.md +98 -0
  39. package/commands/massu-debug/references/investigation-phases.md +294 -0
  40. package/commands/massu-debug/references/report-format.md +107 -0
  41. package/commands/massu-debug.md +105 -386
  42. package/commands/massu-docs.md +1 -1
  43. package/commands/massu-full-audit.md +61 -0
  44. package/commands/massu-gap-enhancement-analyzer.md +276 -16
  45. package/commands/massu-golden-path/references/approval-points.md +216 -0
  46. package/commands/massu-golden-path/references/competitive-mode.md +273 -0
  47. package/commands/massu-golden-path/references/error-handling.md +121 -0
  48. package/commands/massu-golden-path/references/phase-0-requirements.md +53 -0
  49. package/commands/massu-golden-path/references/phase-1-plan-creation.md +168 -0
  50. package/commands/massu-golden-path/references/phase-2-implementation.md +397 -0
  51. package/commands/massu-golden-path/references/phase-2.5-gap-analyzer.md +156 -0
  52. package/commands/massu-golden-path/references/phase-3-simplify.md +40 -0
  53. package/commands/massu-golden-path/references/phase-4-commit.md +94 -0
  54. package/commands/massu-golden-path/references/phase-5-push.md +116 -0
  55. package/commands/massu-golden-path/references/phase-5.5-production-verify.md +170 -0
  56. package/commands/massu-golden-path/references/phase-6-completion.md +113 -0
  57. package/commands/massu-golden-path/references/qa-evaluator-spec.md +137 -0
  58. package/commands/massu-golden-path/references/sprint-contract-protocol.md +117 -0
  59. package/commands/massu-golden-path/references/vr-visual-calibration.md +73 -0
  60. package/commands/massu-golden-path.md +114 -848
  61. package/commands/massu-guide.md +72 -69
  62. package/commands/massu-hooks.md +27 -12
  63. package/commands/massu-hotfix.md +221 -144
  64. package/commands/massu-incident.md +49 -20
  65. package/commands/massu-infra-audit.md +187 -0
  66. package/commands/massu-learning-audit.md +211 -0
  67. package/commands/massu-loop/references/auto-learning.md +49 -0
  68. package/commands/massu-loop/references/checkpoint-audit.md +40 -0
  69. package/commands/massu-loop/references/guardrails.md +17 -0
  70. package/commands/massu-loop/references/iteration-structure.md +115 -0
  71. package/commands/massu-loop/references/loop-controller.md +188 -0
  72. package/commands/massu-loop/references/plan-extraction.md +78 -0
  73. package/commands/massu-loop/references/vr-plan-spec.md +140 -0
  74. package/commands/massu-loop-playwright.md +9 -9
  75. package/commands/massu-loop.md +115 -670
  76. package/commands/massu-new-pattern.md +423 -0
  77. package/commands/massu-perf.md +422 -0
  78. package/commands/massu-plan-audit.md +1 -1
  79. package/commands/massu-plan.md +389 -122
  80. package/commands/massu-production-verify.md +433 -0
  81. package/commands/massu-push.md +62 -378
  82. package/commands/massu-recap.md +29 -3
  83. package/commands/massu-rollback.md +613 -0
  84. package/commands/massu-scaffold-hook.md +2 -4
  85. package/commands/massu-scaffold-page.md +2 -3
  86. package/commands/massu-scaffold-router.md +1 -2
  87. package/commands/massu-security.md +619 -0
  88. package/commands/massu-simplify.md +115 -85
  89. package/commands/massu-squirrels.md +2 -2
  90. package/commands/massu-tdd.md +38 -22
  91. package/commands/massu-test.md +3 -3
  92. package/commands/massu-type-mismatch-audit.md +469 -0
  93. package/commands/massu-ui-audit.md +587 -0
  94. package/commands/massu-verify-playwright.md +287 -32
  95. package/commands/massu-verify.md +150 -46
  96. package/dist/cli.js +146 -95
  97. package/package.json +6 -2
  98. package/patterns/build-patterns.md +302 -0
  99. package/patterns/component-patterns.md +246 -0
  100. package/patterns/display-patterns.md +185 -0
  101. package/patterns/form-patterns.md +890 -0
  102. package/patterns/integration-testing-checklist.md +445 -0
  103. package/patterns/security-patterns.md +219 -0
  104. package/patterns/testing-patterns.md +569 -0
  105. package/patterns/tool-routing.md +81 -0
  106. package/patterns/ui-patterns.md +371 -0
  107. package/protocols/plan-implementation.md +267 -0
  108. package/protocols/recovery.md +225 -0
  109. package/protocols/verification.md +404 -0
  110. package/reference/command-taxonomy.md +178 -0
  111. package/reference/cr-rules-reference.md +76 -0
  112. package/reference/hook-execution-order.md +148 -0
  113. package/reference/lessons-learned.md +175 -0
  114. package/reference/patterns-quickref.md +208 -0
  115. package/reference/standards.md +135 -0
  116. package/reference/subagents-reference.md +17 -0
  117. package/reference/vr-verification-reference.md +867 -0
  118. package/src/commands/install-commands.ts +149 -53
@@ -0,0 +1,132 @@
1
+ ---
2
+ name: massu-command-health
3
+ description: "When user asks about command quality, says 'how are my commands', 'command health', or wants to see which slash commands have quality issues"
4
+ allowed-tools: Bash(*), Read(*)
5
+ ---
6
+ name: massu-command-health
7
+
8
+ # Massu Command Health: Quality Score Dashboard
9
+
10
+ ## Purpose
11
+
12
+ Read `.claude/metrics/command-scores.jsonl` and display a summary of command quality over time. This is a READ-ONLY command — it does not modify anything.
13
+
14
+ ---
15
+
16
+ ## Data Sources
17
+
18
+ ### Quality Scores
19
+ Each line in `.claude/metrics/command-scores.jsonl` is a JSON object:
20
+
21
+ ```json
22
+ {"command":"massu-create-plan","timestamp":"2026-03-18T14:30:00","scores":{"items_have_acceptance_criteria":true,"references_real_tables":true,"ui_items_have_paths":false,"has_vr_types":true,"explicit_counts":true},"pass_rate":"4/5","input_summary":"knowledge-graph-phase-5"}
23
+ ```
24
+
25
+ ### Invocation Frequency
26
+ Each line in `.claude/metrics/command-invocations.jsonl` tracks when a command was used:
27
+
28
+ ```json
29
+ {"skill":"massu-create-plan","timestamp":"2026-03-18T14:30:00Z"}
30
+ ```
31
+
32
+ ### Autoresearch Runs
33
+ Each line in `.claude/metrics/autoresearch-runs.jsonl` tracks autonomous optimization iterations:
34
+ ```json
35
+ {"command":"massu-article-review","iteration":5,"timestamp":"ISO8601","score_before":75,"score_after":87.5,"action":"kept","edit_summary":"Added worked example for gap analysis"}
36
+ ```
37
+
38
+ ---
39
+
40
+ ## Output Format
41
+
42
+ ```
43
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
44
+ COMMAND HEALTH — [date]
45
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
46
+
47
+ OVERVIEW
48
+ Total scored runs: [N]
49
+ Commands tracked: [N]
50
+ Date range: [earliest] — [latest]
51
+
52
+ SCORECARD
53
+ Command Last 5 avg Trend Weakest check Runs
54
+ ─────────────────────────────────────────────────────────────────────────────────────
55
+ massu-create-plan 80% = ui_items_have_paths (40%) 12
56
+ massu-loop 90% ^ memory_persisted (60%) 8
57
+ massu-article-review 100% = — 5
58
+ massu-plan 75% v zero_gaps_at_exit (50%) 6
59
+ massu-debug 85% = regression_test_added (60%) 4
60
+
61
+ Trend: ^ improving (last 3 > prior 3) v declining = stable
62
+
63
+ USAGE (last 7 days from command-invocations.jsonl)
64
+ Command Invocations/week
65
+ ─────────────────────────────────────────
66
+ massu-create-plan 12
67
+ massu-debug 8
68
+ massu-loop 6
69
+ massu-bearings 5
70
+ massu-commit 4
71
+ (or "No invocation data yet")
72
+
73
+ ALERTS (commands below 60% on last 3 runs)
74
+ ! massu-plan — 50% on last 3 runs. Weakest: zero_gaps_at_exit
75
+ (or "No alerts — all commands above threshold")
76
+
77
+ CHECK DETAIL (per-command breakdown, last 10 runs)
78
+ massu-create-plan:
79
+ items_have_acceptance_criteria 9/10 (90%)
80
+ references_real_tables 8/10 (80%)
81
+ ui_items_have_paths 4/10 (40%) <-- weakest
82
+ has_vr_types 7/10 (70%)
83
+ explicit_counts 9/10 (90%)
84
+
85
+ [repeat for each command with data]
86
+
87
+ AUTORESEARCH (from autoresearch-runs.jsonl)
88
+ Command Last run Iterations Score: start -> end
89
+ ─────────────────────────────────────────────────────────────────────────
90
+ massu-article-review 2026-03-19 12 62% -> 87%
91
+ (or "No autoresearch runs yet")
92
+
93
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
94
+ ```
95
+
96
+ ---
97
+
98
+ ## Logic
99
+
100
+ 1. Read `.claude/metrics/command-scores.jsonl` AND `.claude/metrics/command-invocations.jsonl`
101
+ 2. If both are empty or missing: display "No command data recorded yet. Scores and invocations accumulate automatically as you use commands."
102
+ 3. Parse each line as JSON
103
+ 4. Group scores by `command`, group invocations by `skill`
104
+ 5. For each command:
105
+ - Count total runs
106
+ - Calculate average pass rate for last 5 runs
107
+ - Calculate trend (compare last 3 avg vs prior 3 avg)
108
+ - Find weakest check (lowest pass rate across all runs)
109
+ - Per-check breakdown for last 10 runs
110
+ 6. Sort by average pass rate (lowest first — worst commands at top)
111
+ 7. Flag any command below 60% on last 3 runs as an ALERT
112
+ 8. Read `.claude/metrics/autoresearch-runs.jsonl` — group by command, find last run's iteration range, extract start/end scores, display in AUTORESEARCH section
113
+
114
+ ---
115
+
116
+ ## Arguments
117
+
118
+ Optional: command name to show detail for just one command.
119
+
120
+ ```
121
+ /massu-command-health # Full dashboard
122
+ /massu-command-health massu-create-plan # Detail for one command
123
+ ```
124
+
125
+ ---
126
+
127
+ ## START NOW
128
+
129
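+ 1. Read `.claude/metrics/command-scores.jsonl`, `.claude/metrics/command-invocations.jsonl`, and `.claude/metrics/autoresearch-runs.jsonl`
130
+ 2. Parse and aggregate scores, invocations, and autoresearch runs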
131
+ 3. Display dashboard in the format above
132
+ 4. Do NOT modify any files
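Reviewer note on `massu-command-health.md`: the Logic steps in the file above (parse JSONL, group by `command`, average the last 5 runs, compare the last 3 against the prior 3, find the weakest check, alert below 60%) can be sketched as follows. This is a minimal illustration, not code from the package. The function names, the choice to skip malformed lines, the per-run rate computed from the `scores` booleans, and the sample records are all assumptions made for the example.

```typescript
// One line of command-scores.jsonl, following the example record in the diff.
interface ScoreRecord {
  command: string;
  timestamp: string; // assumed: ISO 8601 strings in one consistent format
  scores: { [check: string]: boolean };
}

interface CommandSummary {
  command: string;
  runs: number;
  lastFiveAvg: number; // fraction in [0, 1]
  trend: "^" | "v" | "=";
  weakestCheck: string | null;
  alert: boolean;
}

// Assumption: blank or malformed lines are skipped rather than treated as fatal.
function parseJsonl(text: string): ScoreRecord[] {
  const records: ScoreRecord[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      records.push(JSON.parse(line) as ScoreRecord);
    } catch {
      // Not valid JSON: ignore this line.
    }
  }
  return records;
}

function mean(xs: number[]): number {
  return xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Fraction of checks that passed in a single run.
function runRate(record: ScoreRecord): number {
  return mean(Object.keys(record.scores).map((c) => (record.scores[c] ? 1 : 0)));
}

function summarize(records: ScoreRecord[]): CommandSummary[] {
  const byCommand: { [command: string]: ScoreRecord[] } = {};
  for (const r of records) {
    (byCommand[r.command] = byCommand[r.command] || []).push(r);
  }
  const summaries = Object.keys(byCommand).map((command): CommandSummary => {
    const runs = byCommand[command].slice().sort((a, b) => a.timestamp.localeCompare(b.timestamp));
    const rates = runs.map(runRate);
    const lastThree = mean(rates.slice(-3));
    const priorThree = rates.slice(-6, -3); // empty when there are 3 runs or fewer
    let trend: "^" | "v" | "=" = "=";
    if (priorThree.length > 0 && lastThree > mean(priorThree)) trend = "^";
    if (priorThree.length > 0 && lastThree < mean(priorThree)) trend = "v";
    // Per-check pass/fail history across all runs; the lowest mean is the weakest check.
    const passes: { [check: string]: number[] } = {};
    for (const run of runs) {
      for (const check of Object.keys(run.scores)) {
        (passes[check] = passes[check] || []).push(run.scores[check] ? 1 : 0);
      }
    }
    const checks = Object.keys(passes).sort((a, b) => mean(passes[a]) - mean(passes[b]));
    return {
      command,
      runs: runs.length,
      lastFiveAvg: mean(rates.slice(-5)),
      trend,
      weakestCheck: checks.length > 0 ? checks[0] : null,
      alert: lastThree < 0.6, // ALERT threshold from step 7
    };
  });
  // Worst commands first, as in step 6.
  return summaries.sort((a, b) => a.lastFiveAvg - b.lastFiveAvg);
}

// Invented sample data, including one malformed line that is skipped.
const sample = [
  JSON.stringify({ command: "massu-plan", timestamp: "2026-03-01T10:00:00Z", scores: { explicit_counts: true, zero_gaps_at_exit: false } }),
  JSON.stringify({ command: "massu-plan", timestamp: "2026-03-02T10:00:00Z", scores: { explicit_counts: true, zero_gaps_at_exit: false } }),
  "not json",
  JSON.stringify({ command: "massu-plan", timestamp: "2026-03-03T10:00:00Z", scores: { explicit_counts: true, zero_gaps_at_exit: true } }),
  JSON.stringify({ command: "massu-plan", timestamp: "2026-03-04T10:00:00Z", scores: { explicit_counts: false, zero_gaps_at_exit: false } }),
  JSON.stringify({ command: "massu-debug", timestamp: "2026-03-02T12:00:00Z", scores: { regression_test_added: true } }),
].join("\n");
const summary = summarize(parseJsonl(sample));
```

With this sample, `massu-plan` sorts ahead of `massu-debug` because its average is lower, and it is the only command flagged as an alert. The sketch computes the weakest check across all runs, as the Logic section states; a last-10 window, as used in CHECK DETAIL, would only need an extra `slice(-10)`.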
@@ -0,0 +1,232 @@
1
+ ---
2
+ name: massu-command-improve
3
+ description: "When user wants to improve a specific command's quality score, says 'improve this command', 'fix command issues', or after command-health shows weak commands"
4
+ allowed-tools: Bash(*), Read(*), Write(*), Edit(*), Grep(*), Glob(*)
5
+ ---
6
+ name: massu-command-improve
7
+
8
+ # Massu Command Improve: Score-Driven Prompt Optimization
9
+
10
+ ## Purpose
11
+
12
+ Read accumulated command quality scores from `.claude/metrics/command-scores.jsonl`, identify consistently failing checks, propose **one targeted prompt edit at a time** to the command file, and wait for explicit user approval before applying.
13
+
14
+ **This is the manual-approval counterpart to autonomous autoresearch.** Scoring is automated; improvements require human judgment.
15
+
16
+ ---
17
+
18
+ ## ARGUMENTS
19
+
20
+ ```
21
+ /massu-command-improve # Auto-pick weakest command
22
+ /massu-command-improve massu-debug # Target specific command
23
+ ```
24
+
25
+ **Arguments from $ARGUMENTS**: {{ARGUMENTS}}
26
+
27
+ ---
28
+
29
+ ## NON-NEGOTIABLE RULES
30
+
31
+ 1. **ONE change per iteration** — never batch multiple edits. Single-variable perturbation only.
32
+ 2. **ALWAYS show the exact diff** — user must see before/after text.
33
+ 3. **NEVER apply without approval** — wait for explicit "yes", "approved", "apply it", etc.
34
+ 4. **ALWAYS create backup** — save original section to `.claude/metrics/backups/[command]-v[N].md.bak` before editing.
35
+ 5. **ALWAYS log the change** — append to `.claude/metrics/command-changelog.jsonl` after applying.
36
+ 6. **Target the weakest check** — don't guess what to improve; let the data decide.
37
+
38
+ ---
39
+
40
+ ## EXECUTION STEPS
41
+
42
+ ### Step 1: Load Score Data
43
+
44
+ Read `.claude/metrics/command-scores.jsonl`. If empty or missing:
45
+ ```
46
+ No command scores recorded yet.
47
+ Scores accumulate automatically as you use instrumented commands:
48
+ massu-article-review, massu-create-plan, massu-loop, massu-plan, massu-debug
49
+
50
+ Run these commands normally and come back when you have 5+ scored runs.
51
+ ```
52
+
53
+ ### Step 2: Analyze Weaknesses
54
+
55
+ For each command in the data:
56
+
57
+ 1. Parse all JSONL lines for that command
58
+ 2. Extract individual check pass/fail counts across all runs
59
+ 3. Calculate per-check pass rate (e.g., `root_cause_identified`: 7/10 = 70%)
60
+ 4. Identify the **single weakest check** (lowest pass rate)
61
+ 5. Calculate overall command pass rate (average across all checks)
62
+
63
+ If `$ARGUMENTS` specifies a command, use that. Otherwise, auto-select the command with the lowest overall pass rate.
64
+
65
+ **Minimum data requirement**: At least 3 scored runs for the target command. If fewer:
66
+ ```
67
+ [command] has only [N] scored runs. Need at least 3 for reliable analysis.
68
+ Keep using the command normally — scores accumulate automatically.
69
+ ```
70
+
71
+ ### Step 3: Diagnose the Failing Check
72
+
73
+ Read the target command file (`.claude/commands/[command].md`).
74
+
75
+ For the weakest check, analyze WHY it might be failing:
76
+
77
+ | Check Pattern | Likely Prompt Cause |
78
+ |--------------|---------------------|
79
+ | Check passes < 50% | The command doesn't mention this requirement at all, or buries it |
80
+ | Check passes 50-70% | The requirement exists but is vague or easy to skip |
81
+ | Check passes 70-85% | The requirement exists but lacks a concrete example or enforcement |
82
+ | Check passes > 85% | Healthy — move to next weakest check |
83
+
84
+ ### Step 4: Propose ONE Edit
85
+
86
+ Design a single, targeted prompt edit that addresses the weakest check. The edit should be one of:
87
+
88
+ | Edit Type | When to Use | Example |
89
+ |-----------|-------------|---------|
90
+ | **Add explicit rule** | Check requirement is missing from prompt | "Your debug report MUST include a specific root cause statement, not 'probably' or 'might be'" |
91
+ | **Add worked example** | Check exists but is vague | Add a before/after example of what good looks like |
92
+ | **Promote to NON-NEGOTIABLE** | Check exists but is buried | Move the requirement higher in the file, add to the rules table |
93
+ | **Add enforcement step** | Check exists but is easy to skip | Add a numbered step to START NOW that explicitly requires this check |
94
+ | **Add banned pattern** | Check fails because of a specific anti-pattern | "NEVER produce a debug report without [X]" |
95
+
96
+ ### Step 5: Present to User
97
+
98
+ Display:
99
+
100
+ ```
101
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
102
+ COMMAND IMPROVE — [command]
103
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
104
+
105
+ DATA SUMMARY
106
+ Scored runs: [N]
107
+ Overall rate: [X]%
108
+ Date range: [earliest] — [latest]
109
+
110
+ WEAKEST CHECK
111
+ Name: [check_name]
112
+ Pass rate: [X/N] ([Y]%)
113
+ Diagnosis: [why it's failing]
114
+
115
+ PROPOSED EDIT
116
+ Type: [Add rule / Add example / Promote / Add step / Ban pattern]
117
+ Target section: [which section of the command file]
118
+
119
+ --- BEFORE ---
120
+ [exact text being replaced or location of insertion]
121
+
122
+ --- AFTER ---
123
+ [exact new text]
124
+
125
+ --- END DIFF ---
126
+
127
+ RATIONALE
128
+ This targets [check_name] by [explanation of why this change should help].
129
+
130
+ APPROVE?
131
+ Reply "yes" to apply, "skip" to move to next check, or "no" to stop.
132
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
133
+ ```
134
+
135
+ ### Step 6: Wait for Approval
136
+
137
+ **STOP HERE. Do NOT proceed without explicit user approval.**
138
+
139
+ - **"yes" / "approved" / "apply it"** → proceed to Step 7
140
+ - **"skip"** → move to next weakest check on the same command, or next command
141
+ - **"no" / "stop"** → end the session, log the rejection
142
+
143
+ ### Step 7: Apply Change
144
+
145
+ 1. **Create backup**:
146
+ - Check `.claude/metrics/backups/` exists (create if not)
147
+ - Count existing backups for this command to determine version number
148
+ - Save the ORIGINAL section (not whole file) to `.claude/metrics/backups/[command]-v[N].md.bak`
149
+
150
+ 2. **Apply the edit** using the Edit tool
151
+
152
+ 3. **Log the change** — append to `.claude/metrics/command-changelog.jsonl`:
153
+ ```json
154
+ {"command":"[command]","timestamp":"ISO8601","action":"applied","target_check":"[check_name]","check_pass_rate_before":"[X/N]","edit_type":"[type]","edit_summary":"[1-line description]","backup_file":"[path]"}
155
+ ```
156
+
157
+ 4. **Confirm to user**:
158
+ ```
159
+ Applied. Backup saved to [path].
160
+
161
+ Next time you run /[command], the scoring will capture whether this
162
+ change improved [check_name]. Check /massu-command-health after 3+ runs.
163
+ ```
164
+
165
+ ### Step 8: Continue or Stop
166
+
167
+ After applying (or skipping), offer to continue:
168
+ - If there are more weak checks on the same command (pass rate < 85%), offer to address the next one
169
+ - If the current command is healthy, offer to move to the next weakest command
170
+ - If all commands are above 85%, report "All commands healthy"
171
+
172
+ **Remember: ONE change at a time.** Even if 3 checks are weak, propose and apply them individually so you can measure each improvement's effect.
173
+
174
+ ---
175
+
176
+ ## REJECTION LOGGING
177
+
178
+ If the user rejects a proposal, log it so future runs don't re-propose the same thing:
179
+
180
+ ```json
181
+ {"command":"[command]","timestamp":"ISO8601","action":"rejected","target_check":"[check_name]","edit_type":"[type]","edit_summary":"[1-line description]","rejection_reason":"[user's reason if given, or 'no reason provided']"}
182
+ ```
183
+
184
+ Before proposing an edit, check the changelog for recent rejections of the same check+type combination. If found, try a DIFFERENT edit type for the same check.
185
+
186
+ ---
187
+
188
+ ## MEASURING IMPROVEMENT
189
+
190
+ After applying a change, the improvement is measured passively:
191
+
192
+ 1. User continues using the command normally
193
+ 2. Silent scoring appends new data to `command-scores.jsonl`
194
+ 3. Next time `/massu-command-health` or `/massu-bearings` runs, the trend shows whether the check improved
195
+ 4. Next time `/massu-command-improve` runs, it uses the updated data — if the check improved, it moves to the next weakest
196
+
197
+ The feedback loop:
198
+ ```
199
+ Score → Diagnose → Propose → Approve → Apply → Score again → ...
200
+ ```
201
+
202
+ ---
203
+
204
+ ## EDGE CASES
205
+
206
+ - **No JSONL file**: Tell user to use instrumented commands first
207
+ - **< 3 runs for target command**: Tell user to accumulate more data
208
+ - **All checks > 85%**: "All commands healthy — no improvements needed"
209
+ - **User specifies unknown command**: List the 5 instrumented commands
210
+ - **Changelog shows same check was improved before**: Note this in the proposal ("Previously improved on [date] — may need a different approach")
211
+
212
+ ---
213
+
214
+ ## START NOW
215
+
216
+ 1. Read `.claude/metrics/command-scores.jsonl`
217
+ 2. If `$ARGUMENTS` specifies a command, target that; otherwise auto-pick weakest
218
+ 3. Analyze per-check pass rates
219
+ 4. Identify weakest check
220
+ 5. Read the command file
221
+ 6. Diagnose why the check fails
222
+ 7. Propose ONE targeted edit with exact diff
223
+ 8. **WAIT for user approval**
224
+ 9. If approved: backup → apply → log → confirm
225
+ 10. Offer to continue with next weak check
226
+
227
+ ---
228
+
229
+ ## Related Commands
230
+
231
+ - `/massu-autoresearch [command]` — Autonomous version. Runs the optimize loop unattended with git-based accept/reject. Use for overnight runs.
232
+ - `/massu-command-health` — Read-only dashboard. Shows scores, trends, and weakest checks.
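Reviewer note on `massu-command-improve.md`: the diagnosis table in Step 3, the rejection check described under REJECTION LOGGING, and the backup naming in Step 7 can be sketched together as below. This is a hedged illustration, not the package's implementation. The helper names, the candidate ordering per pass-rate band, the treatment of exactly 85% as healthy, ignoring how recent a rejection is, and deriving `N` as the existing backup count plus one are all assumptions.

```typescript
// Edit types listed in the Step 4 table.
type EditType =
  | "Add explicit rule"
  | "Add worked example"
  | "Promote to NON-NEGOTIABLE"
  | "Add enforcement step"
  | "Add banned pattern";

// Subset of the command-changelog.jsonl fields shown in the diff.
interface ChangelogEntry {
  command: string;
  action: "applied" | "rejected";
  target_check: string;
  edit_type: string;
}

interface Diagnosis {
  cause: string;
  candidates: EditType[]; // assumed preference order, not specified by the source
}

// Map a per-check pass rate (fraction in [0, 1]) to the bands of the Step 3 table.
function diagnose(rate: number): Diagnosis {
  if (rate < 0.5) {
    return { cause: "requirement missing or buried", candidates: ["Add explicit rule", "Promote to NON-NEGOTIABLE"] };
  }
  if (rate < 0.7) {
    return { cause: "requirement vague or easy to skip", candidates: ["Add worked example", "Add enforcement step"] };
  }
  if (rate < 0.85) {
    return { cause: "lacks a concrete example or enforcement", candidates: ["Add worked example", "Add enforcement step"] };
  }
  return { cause: "healthy", candidates: [] };
}

// Pick the first candidate edit type that was not already rejected for this command and check.
function pickEditType(rate: number, changelog: ChangelogEntry[], command: string, check: string): EditType | null {
  const rejected = changelog
    .filter((e) => e.action === "rejected" && e.command === command && e.target_check === check)
    .map((e) => e.edit_type);
  const remaining = diagnose(rate).candidates.filter((t) => rejected.indexOf(t) === -1);
  return remaining.length > 0 ? remaining[0] : null;
}

// Build the next backup path from the file names already present in the backups directory.
function backupPath(command: string, existingFiles: string[]): string {
  const prefix = command + "-v";
  const suffix = ".md.bak";
  const count = existingFiles.filter((name) => {
    if (name.indexOf(prefix) !== 0 || name.slice(-suffix.length) !== suffix) return false;
    // Only count names whose version part is purely numeric, e.g. "massu-loop-v1.md.bak".
    return /^\d+$/.test(name.slice(prefix.length, name.length - suffix.length));
  }).length;
  return ".claude/metrics/backups/" + command + "-v" + (count + 1) + suffix;
}

// Invented example data.
const changelog: ChangelogEntry[] = [
  { command: "massu-debug", action: "rejected", target_check: "root_cause_identified", edit_type: "Add explicit rule" },
];
const nextEdit = pickEditType(0.4, changelog, "massu-debug", "root_cause_identified");
const nextBackup = backupPath("massu-loop", ["massu-loop-v1.md.bak", "massu-loop-playwright-v1.md.bak"]);
```

Here `nextEdit` falls through to the second candidate because the first was rejected earlier, and `nextBackup` ignores the `massu-loop-playwright` backup when counting versions for `massu-loop`.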