npm - @massu/core - Versions diffs - 0.5.0 → 0.6.0 - Mend

@massu/core 0.5.0 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (118) hide show

package/README.md +40 -0
package/agents/massu-architecture-reviewer.md +104 -0
package/agents/massu-blast-radius-analyzer.md +84 -0
package/agents/massu-competitive-scorer.md +126 -0
package/agents/massu-help-sync.md +73 -0
package/agents/massu-migration-writer.md +94 -0
package/agents/massu-output-scorer.md +87 -0
package/agents/massu-pattern-reviewer.md +84 -0
package/agents/massu-plan-auditor.md +170 -0
package/agents/massu-schema-sync-verifier.md +70 -0
package/agents/massu-security-reviewer.md +98 -0
package/agents/massu-ux-reviewer.md +106 -0
package/commands/_shared-preamble.md +53 -23
package/commands/_shared-references/auto-learning-protocol.md +71 -0
package/commands/_shared-references/blast-radius-protocol.md +76 -0
package/commands/_shared-references/security-pre-screen.md +64 -0
package/commands/_shared-references/test-first-protocol.md +87 -0
package/commands/_shared-references/verification-table.md +52 -0
package/commands/massu-article-review.md +343 -0
package/commands/massu-autoresearch/references/eval-runner.md +84 -0
package/commands/massu-autoresearch/references/safety-rails.md +125 -0
package/commands/massu-autoresearch/references/scoring-protocol.md +151 -0
package/commands/massu-autoresearch.md +258 -0
package/commands/massu-batch.md +44 -12
package/commands/massu-bearings.md +42 -8
package/commands/massu-checkpoint.md +588 -0
package/commands/massu-ci-fix.md +2 -2
package/commands/massu-command-health.md +132 -0
package/commands/massu-command-improve.md +232 -0
package/commands/massu-commit.md +205 -44
package/commands/massu-create-plan.md +239 -57
package/commands/massu-data/references/common-queries.md +79 -0
package/commands/massu-data/references/table-guide.md +50 -0
package/commands/massu-data.md +66 -0
package/commands/massu-dead-code.md +29 -34
package/commands/massu-debug/references/auto-learning.md +61 -0
package/commands/massu-debug/references/codegraph-tracing.md +80 -0
package/commands/massu-debug/references/common-shortcuts.md +98 -0
package/commands/massu-debug/references/investigation-phases.md +294 -0
package/commands/massu-debug/references/report-format.md +107 -0
package/commands/massu-debug.md +105 -386
package/commands/massu-docs.md +1 -1
package/commands/massu-full-audit.md +61 -0
package/commands/massu-gap-enhancement-analyzer.md +276 -16
package/commands/massu-golden-path/references/approval-points.md +216 -0
package/commands/massu-golden-path/references/competitive-mode.md +273 -0
package/commands/massu-golden-path/references/error-handling.md +121 -0
package/commands/massu-golden-path/references/phase-0-requirements.md +53 -0
package/commands/massu-golden-path/references/phase-1-plan-creation.md +168 -0
package/commands/massu-golden-path/references/phase-2-implementation.md +397 -0
package/commands/massu-golden-path/references/phase-2.5-gap-analyzer.md +156 -0
package/commands/massu-golden-path/references/phase-3-simplify.md +40 -0
package/commands/massu-golden-path/references/phase-4-commit.md +94 -0
package/commands/massu-golden-path/references/phase-5-push.md +116 -0
package/commands/massu-golden-path/references/phase-5.5-production-verify.md +170 -0
package/commands/massu-golden-path/references/phase-6-completion.md +113 -0
package/commands/massu-golden-path/references/qa-evaluator-spec.md +137 -0
package/commands/massu-golden-path/references/sprint-contract-protocol.md +117 -0
package/commands/massu-golden-path/references/vr-visual-calibration.md +73 -0
package/commands/massu-golden-path.md +114 -848
package/commands/massu-guide.md +72 -69
package/commands/massu-hooks.md +27 -12
package/commands/massu-hotfix.md +221 -144
package/commands/massu-incident.md +49 -20
package/commands/massu-infra-audit.md +187 -0
package/commands/massu-learning-audit.md +211 -0
package/commands/massu-loop/references/auto-learning.md +49 -0
package/commands/massu-loop/references/checkpoint-audit.md +40 -0
package/commands/massu-loop/references/guardrails.md +17 -0
package/commands/massu-loop/references/iteration-structure.md +115 -0
package/commands/massu-loop/references/loop-controller.md +188 -0
package/commands/massu-loop/references/plan-extraction.md +78 -0
package/commands/massu-loop/references/vr-plan-spec.md +140 -0
package/commands/massu-loop-playwright.md +9 -9
package/commands/massu-loop.md +115 -670
package/commands/massu-new-pattern.md +423 -0
package/commands/massu-perf.md +422 -0
package/commands/massu-plan-audit.md +1 -1
package/commands/massu-plan.md +389 -122
package/commands/massu-production-verify.md +433 -0
package/commands/massu-push.md +62 -378
package/commands/massu-recap.md +29 -3
package/commands/massu-rollback.md +613 -0
package/commands/massu-scaffold-hook.md +2 -4
package/commands/massu-scaffold-page.md +2 -3
package/commands/massu-scaffold-router.md +1 -2
package/commands/massu-security.md +619 -0
package/commands/massu-simplify.md +115 -85
package/commands/massu-squirrels.md +2 -2
package/commands/massu-tdd.md +38 -22
package/commands/massu-test.md +3 -3
package/commands/massu-type-mismatch-audit.md +469 -0
package/commands/massu-ui-audit.md +587 -0
package/commands/massu-verify-playwright.md +287 -32
package/commands/massu-verify.md +150 -46
package/dist/cli.js +146 -95
package/package.json +6 -2
package/patterns/build-patterns.md +302 -0
package/patterns/component-patterns.md +246 -0
package/patterns/display-patterns.md +185 -0
package/patterns/form-patterns.md +890 -0
package/patterns/integration-testing-checklist.md +445 -0
package/patterns/security-patterns.md +219 -0
package/patterns/testing-patterns.md +569 -0
package/patterns/tool-routing.md +81 -0
package/patterns/ui-patterns.md +371 -0
package/protocols/plan-implementation.md +267 -0
package/protocols/recovery.md +225 -0
package/protocols/verification.md +404 -0
package/reference/command-taxonomy.md +178 -0
package/reference/cr-rules-reference.md +76 -0
package/reference/hook-execution-order.md +148 -0
package/reference/lessons-learned.md +175 -0
package/reference/patterns-quickref.md +208 -0
package/reference/standards.md +135 -0
package/reference/subagents-reference.md +17 -0
package/reference/vr-verification-reference.md +867 -0
package/src/commands/install-commands.ts +149 -53

package/commands/massu-command-health.md ADDED Viewed

@@ -0,0 +1,132 @@
+---
+name: massu-command-health
+description: "When user asks about command quality, says 'how are my commands', 'command health', or wants to see which slash commands have quality issues"
+allowed-tools: Bash(*), Read(*)
+---
+name: massu-command-health
+# Massu Command Health: Quality Score Dashboard
+## Purpose
+Read `.claude/metrics/command-scores.jsonl` and display a summary of command quality over time. This is a READ-ONLY command — it does not modify anything.
+---
+## Data Sources
+### Quality Scores
+Each line in `.claude/metrics/command-scores.jsonl` is a JSON object:
+```json
+{"command":"massu-create-plan","timestamp":"2026-03-18T14:30:00","scores":{"items_have_acceptance_criteria":true,"references_real_tables":true,"ui_items_have_paths":false,"has_vr_types":true,"explicit_counts":true},"pass_rate":"4/5","input_summary":"knowledge-graph-phase-5"}
+```
+### Invocation Frequency
+Each line in `.claude/metrics/command-invocations.jsonl` tracks when a command was used:
+### Autoresearch Runs
+Each line in `.claude/metrics/autoresearch-runs.jsonl` tracks autonomous optimization iterations:
+```json
+{"command":"massu-article-review","iteration":5,"timestamp":"ISO8601","score_before":75,"score_after":87.5,"action":"kept","edit_summary":"Added worked example for gap analysis"}
+```
+```json
+{"skill":"massu-create-plan","timestamp":"2026-03-18T14:30:00Z"}
+```
+---
+## Output Format
+```
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+COMMAND HEALTH — [date]
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+OVERVIEW
+  Total scored runs: [N]
+  Commands tracked:  [N]
+  Date range:        [earliest] — [latest]
+SCORECARD
+  Command                   Last 5 avg    Trend    Weakest check                    Runs
+  ─────────────────────────────────────────────────────────────────────────────────────
+  massu-create-plan          80%           =        ui_items_have_paths (40%)         12
+  massu-loop                 90%           ^        memory_persisted (60%)             8
+  massu-article-review      100%           =        —                                  5
+  massu-plan                 75%           v        zero_gaps_at_exit (50%)            6
+  massu-debug                85%           =        regression_test_added (60%)        4
+  Trend: ^ improving (last 3 > prior 3)  v declining  = stable
+USAGE (last 7 days from command-invocations.jsonl)
+  Command                   Invocations/week
+  ─────────────────────────────────────────
+  massu-create-plan          12
+  massu-debug                 8
+  massu-loop                  6
+  massu-bearings              5
+  massu-commit                4
+  (or "No invocation data yet")
+ALERTS (commands below 60% on last 3 runs)
+  ! massu-plan — 50% on last 3 runs. Weakest: zero_gaps_at_exit
+  (or "No alerts — all commands above threshold")
+CHECK DETAIL (per-command breakdown, last 10 runs)
+  massu-create-plan:
+    items_have_acceptance_criteria   9/10 (90%)
+    references_real_tables           8/10 (80%)
+    ui_items_have_paths              4/10 (40%)  <-- weakest
+    has_vr_types                     7/10 (70%)
+    explicit_counts                  9/10 (90%)
+  [repeat for each command with data]
+AUTORESEARCH (from autoresearch-runs.jsonl)
+  Command               Last run         Iterations   Score: start -> end
+  ─────────────────────────────────────────────────────────────────────────
+  massu-article-review   2026-03-19       12           62% -> 87%
+  (or "No autoresearch runs yet")
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+```
+---
+## Logic
+1. Read `.claude/metrics/command-scores.jsonl` AND `.claude/metrics/command-invocations.jsonl`
+2. If both empty: display "No command data recorded yet. Scores and invocations accumulate automatically as you use commands."
+3. Parse each line as JSON
+4. Group scores by `command`, group invocations by `skill`
+5. For each command:
+   - Count total runs
+   - Calculate average pass rate for last 5 runs
+   - Calculate trend (compare last 3 avg vs prior 3 avg)
+   - Find weakest check (lowest pass rate across all runs)
+   - Per-check breakdown for last 10 runs
+6. Sort by average pass rate (lowest first — worst commands at top)
+7. Flag any command below 60% on last 3 runs as an ALERT
+8. Read `.claude/metrics/autoresearch-runs.jsonl` — group by command, find last run's iteration range, extract start/end scores, display in AUTORESEARCH section
+---
+## Arguments
+Optional: command name to show detail for just one command.
+```
+/massu-command-health                    # Full dashboard
+/massu-command-health massu-create-plan   # Detail for one command
+```
+---
+## START NOW
+1. Read `.claude/metrics/command-scores.jsonl`
+2. Parse and aggregate scores
+3. Display dashboard in the format above
+4. Do NOT modify any files

package/commands/massu-command-improve.md ADDED Viewed

@@ -0,0 +1,232 @@
+---
+name: massu-command-improve
+description: "When user wants to improve a specific command's quality score, says 'improve this command', 'fix command issues', or after command-health shows weak commands"
+allowed-tools: Bash(*), Read(*), Write(*), Edit(*), Grep(*), Glob(*)
+---
+name: massu-command-improve
+# Massu Command Improve: Score-Driven Prompt Optimization
+## Purpose
+Read accumulated command quality scores from `.claude/metrics/command-scores.jsonl`, identify consistently failing checks, propose **one targeted prompt edit at a time** to the command file, and wait for explicit user approval before applying.
+**This is the manual-approval counterpart to autonomous autoresearch.** Scoring is automated; improvements require human judgment.
+---
+## ARGUMENTS
+```
+/massu-command-improve                    # Auto-pick weakest command
+/massu-command-improve massu-debug         # Target specific command
+```
+**Arguments from $ARGUMENTS**: {{ARGUMENTS}}
+---
+## NON-NEGOTIABLE RULES
+1. **ONE change per iteration** — never batch multiple edits. Single-variable perturbation only.
+2. **ALWAYS show the exact diff** — user must see before/after text.
+3. **NEVER apply without approval** — wait for explicit "yes", "approved", "apply it", etc.
+4. **ALWAYS create backup** — save original section to `.claude/metrics/backups/[command]-v[N].md.bak` before editing.
+5. **ALWAYS log the change** — append to `.claude/metrics/command-changelog.jsonl` after applying.
+6. **Target the weakest check** — don't guess what to improve; let the data decide.
+---
+## EXECUTION STEPS
+### Step 1: Load Score Data
+Read `.claude/metrics/command-scores.jsonl`. If empty or missing:
+```
+No command scores recorded yet.
+Scores accumulate automatically as you use instrumented commands:
+  massu-article-review, massu-create-plan, massu-loop, massu-plan, massu-debug
+Run these commands normally and come back when you have 5+ scored runs.
+```
+### Step 2: Analyze Weaknesses
+For each command in the data:
+1. Parse all JSONL lines for that command
+2. Extract individual check pass/fail counts across all runs
+3. Calculate per-check pass rate (e.g., `root_cause_identified`: 7/10 = 70%)
+4. Identify the **single weakest check** (lowest pass rate)
+5. Calculate overall command pass rate (average across all checks)
+If `$ARGUMENTS` specifies a command, use that. Otherwise, auto-select the command with the lowest overall pass rate.
+**Minimum data requirement**: At least 3 scored runs for the target command. If fewer:
+```
+[command] has only [N] scored runs. Need at least 3 for reliable analysis.
+Keep using the command normally — scores accumulate automatically.
+```
+### Step 3: Diagnose the Failing Check
+Read the target command file (`.claude/commands/[command].md`).
+For the weakest check, analyze WHY it might be failing:
+| Check Pattern | Likely Prompt Cause |
+|--------------|---------------------|
+| Check passes < 50% | The command doesn't mention this requirement at all, or buries it |
+| Check passes 50-70% | The requirement exists but is vague or easy to skip |
+| Check passes 70-85% | The requirement exists but lacks a concrete example or enforcement |
+| Check passes > 85% | Healthy — move to next weakest check |
+### Step 4: Propose ONE Edit
+Design a single, targeted prompt edit that addresses the weakest check. The edit should be one of:
+| Edit Type | When to Use | Example |
+|-----------|-------------|---------|
+| **Add explicit rule** | Check requirement is missing from prompt | "Your debug report MUST include a specific root cause statement, not 'probably' or 'might be'" |
+| **Add worked example** | Check exists but is vague | Add a before/after example of what good looks like |
+| **Promote to NON-NEGOTIABLE** | Check exists but is buried | Move the requirement higher in the file, add to the rules table |
+| **Add enforcement step** | Check exists but is easy to skip | Add a numbered step to START NOW that explicitly requires this check |
+| **Add banned pattern** | Check fails because of a specific anti-pattern | "NEVER produce a debug report without [X]" |
+### Step 5: Present to User
+Display:
+```
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+COMMAND IMPROVE — [command]
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+DATA SUMMARY
+  Scored runs:     [N]
+  Overall rate:    [X]%
+  Date range:      [earliest] — [latest]
+WEAKEST CHECK
+  Name:            [check_name]
+  Pass rate:       [X/N] ([Y]%)
+  Diagnosis:       [why it's failing]
+PROPOSED EDIT
+  Type:            [Add rule / Add example / Promote / Add step / Ban pattern]
+  Target section:  [which section of the command file]
+  --- BEFORE ---
+  [exact text being replaced or location of insertion]
+  --- AFTER ---
+  [exact new text]
+  --- END DIFF ---
+RATIONALE
+  This targets [check_name] by [explanation of why this change should help].
+APPROVE?
+  Reply "yes" to apply, "skip" to move to next check, or "no" to stop.
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+```
+### Step 6: Wait for Approval
+**STOP HERE. Do NOT proceed without explicit user approval.**
+- **"yes" / "approved" / "apply it"** → proceed to Step 7
+- **"skip"** → move to next weakest check on the same command, or next command
+- **"no" / "stop"** → end the session, log the rejection
+### Step 7: Apply Change
+1. **Create backup**:
+   - Check `.claude/metrics/backups/` exists (create if not)
+   - Count existing backups for this command to determine version number
+   - Save the ORIGINAL section (not whole file) to `.claude/metrics/backups/[command]-v[N].md.bak`
+2. **Apply the edit** using the Edit tool
+3. **Log the change** — append to `.claude/metrics/command-changelog.jsonl`:
+```json
+{"command":"[command]","timestamp":"ISO8601","action":"applied","target_check":"[check_name]","check_pass_rate_before":"[X/N]","edit_type":"[type]","edit_summary":"[1-line description]","backup_file":"[path]"}
+```
+4. **Confirm to user**:
+```
+Applied. Backup saved to [path].
+Next time you run /[command], the scoring will capture whether this
+change improved [check_name]. Check /massu-command-health after 3+ runs.
+```
+### Step 8: Continue or Stop
+After applying (or skipping), offer to continue:
+- If there are more weak checks on the same command (pass rate < 85%), offer to address the next one
+- If the current command is healthy, offer to move to the next weakest command
+- If all commands are above 85%, report "All commands healthy"
+**Remember: ONE change at a time.** Even if 3 checks are weak, propose and apply them individually so you can measure each improvement's effect.
+---
+## REJECTION LOGGING
+If the user rejects a proposal, log it so future runs don't re-propose the same thing:
+```json
+{"command":"[command]","timestamp":"ISO8601","action":"rejected","target_check":"[check_name]","edit_type":"[type]","edit_summary":"[1-line description]","rejection_reason":"[user's reason if given, or 'no reason provided']"}
+```
+Before proposing an edit, check the changelog for recent rejections of the same check+type combination. If found, try a DIFFERENT edit type for the same check.
+---
+## MEASURING IMPROVEMENT
+After applying a change, the improvement is measured passively:
+1. User continues using the command normally
+2. Silent scoring appends new data to `command-scores.jsonl`
+3. Next time `/massu-command-health` or `/massu-bearings` runs, the trend shows whether the check improved
+4. Next time `/massu-command-improve` runs, it uses the updated data — if the check improved, it moves to the next weakest
+The feedback loop:
+```
+Score → Diagnose → Propose → Approve → Apply → Score again → ...
+```
+---
+## EDGE CASES
+- **No JSONL file**: Tell user to use instrumented commands first
+- **< 3 runs for target command**: Tell user to accumulate more data
+- **All checks > 85%**: "All commands healthy — no improvements needed"
+- **User specifies unknown command**: List the 5 instrumented commands
+- **Changelog shows same check was improved before**: Note this in the proposal ("Previously improved on [date] — may need a different approach")
+---
+## START NOW
+1. Read `.claude/metrics/command-scores.jsonl`
+2. If `$ARGUMENTS` specifies a command, target that; otherwise auto-pick weakest
+3. Analyze per-check pass rates
+4. Identify weakest check
+5. Read the command file
+6. Diagnose why the check fails
+7. Propose ONE targeted edit with exact diff
+8. **WAIT for user approval**
+9. If approved: backup → apply → log → confirm
+10. Offer to continue with next weak check
+---
+## Related Commands
+- `/massu-autoresearch [command]` — Autonomous version. Runs the optimize loop unattended with git-based accept/reject. Use for overnight runs.
+- `/massu-command-health` — Read-only dashboard. Shows scores, trends, and weakest checks.