@massu/core 0.5.0 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (118)
  1. package/README.md +40 -0
  2. package/agents/massu-architecture-reviewer.md +104 -0
  3. package/agents/massu-blast-radius-analyzer.md +84 -0
  4. package/agents/massu-competitive-scorer.md +126 -0
  5. package/agents/massu-help-sync.md +73 -0
  6. package/agents/massu-migration-writer.md +94 -0
  7. package/agents/massu-output-scorer.md +87 -0
  8. package/agents/massu-pattern-reviewer.md +84 -0
  9. package/agents/massu-plan-auditor.md +170 -0
  10. package/agents/massu-schema-sync-verifier.md +70 -0
  11. package/agents/massu-security-reviewer.md +98 -0
  12. package/agents/massu-ux-reviewer.md +106 -0
  13. package/commands/_shared-preamble.md +53 -23
  14. package/commands/_shared-references/auto-learning-protocol.md +71 -0
  15. package/commands/_shared-references/blast-radius-protocol.md +76 -0
  16. package/commands/_shared-references/security-pre-screen.md +64 -0
  17. package/commands/_shared-references/test-first-protocol.md +87 -0
  18. package/commands/_shared-references/verification-table.md +52 -0
  19. package/commands/massu-article-review.md +343 -0
  20. package/commands/massu-autoresearch/references/eval-runner.md +84 -0
  21. package/commands/massu-autoresearch/references/safety-rails.md +125 -0
  22. package/commands/massu-autoresearch/references/scoring-protocol.md +151 -0
  23. package/commands/massu-autoresearch.md +258 -0
  24. package/commands/massu-batch.md +44 -12
  25. package/commands/massu-bearings.md +42 -8
  26. package/commands/massu-checkpoint.md +588 -0
  27. package/commands/massu-ci-fix.md +2 -2
  28. package/commands/massu-command-health.md +132 -0
  29. package/commands/massu-command-improve.md +232 -0
  30. package/commands/massu-commit.md +205 -44
  31. package/commands/massu-create-plan.md +239 -57
  32. package/commands/massu-data/references/common-queries.md +79 -0
  33. package/commands/massu-data/references/table-guide.md +50 -0
  34. package/commands/massu-data.md +66 -0
  35. package/commands/massu-dead-code.md +29 -34
  36. package/commands/massu-debug/references/auto-learning.md +61 -0
  37. package/commands/massu-debug/references/codegraph-tracing.md +80 -0
  38. package/commands/massu-debug/references/common-shortcuts.md +98 -0
  39. package/commands/massu-debug/references/investigation-phases.md +294 -0
  40. package/commands/massu-debug/references/report-format.md +107 -0
  41. package/commands/massu-debug.md +105 -386
  42. package/commands/massu-docs.md +1 -1
  43. package/commands/massu-full-audit.md +61 -0
  44. package/commands/massu-gap-enhancement-analyzer.md +276 -16
  45. package/commands/massu-golden-path/references/approval-points.md +216 -0
  46. package/commands/massu-golden-path/references/competitive-mode.md +273 -0
  47. package/commands/massu-golden-path/references/error-handling.md +121 -0
  48. package/commands/massu-golden-path/references/phase-0-requirements.md +53 -0
  49. package/commands/massu-golden-path/references/phase-1-plan-creation.md +168 -0
  50. package/commands/massu-golden-path/references/phase-2-implementation.md +397 -0
  51. package/commands/massu-golden-path/references/phase-2.5-gap-analyzer.md +156 -0
  52. package/commands/massu-golden-path/references/phase-3-simplify.md +40 -0
  53. package/commands/massu-golden-path/references/phase-4-commit.md +94 -0
  54. package/commands/massu-golden-path/references/phase-5-push.md +116 -0
  55. package/commands/massu-golden-path/references/phase-5.5-production-verify.md +170 -0
  56. package/commands/massu-golden-path/references/phase-6-completion.md +113 -0
  57. package/commands/massu-golden-path/references/qa-evaluator-spec.md +137 -0
  58. package/commands/massu-golden-path/references/sprint-contract-protocol.md +117 -0
  59. package/commands/massu-golden-path/references/vr-visual-calibration.md +73 -0
  60. package/commands/massu-golden-path.md +114 -848
  61. package/commands/massu-guide.md +72 -69
  62. package/commands/massu-hooks.md +27 -12
  63. package/commands/massu-hotfix.md +221 -144
  64. package/commands/massu-incident.md +49 -20
  65. package/commands/massu-infra-audit.md +187 -0
  66. package/commands/massu-learning-audit.md +211 -0
  67. package/commands/massu-loop/references/auto-learning.md +49 -0
  68. package/commands/massu-loop/references/checkpoint-audit.md +40 -0
  69. package/commands/massu-loop/references/guardrails.md +17 -0
  70. package/commands/massu-loop/references/iteration-structure.md +115 -0
  71. package/commands/massu-loop/references/loop-controller.md +188 -0
  72. package/commands/massu-loop/references/plan-extraction.md +78 -0
  73. package/commands/massu-loop/references/vr-plan-spec.md +140 -0
  74. package/commands/massu-loop-playwright.md +9 -9
  75. package/commands/massu-loop.md +115 -670
  76. package/commands/massu-new-pattern.md +423 -0
  77. package/commands/massu-perf.md +422 -0
  78. package/commands/massu-plan-audit.md +1 -1
  79. package/commands/massu-plan.md +389 -122
  80. package/commands/massu-production-verify.md +433 -0
  81. package/commands/massu-push.md +62 -378
  82. package/commands/massu-recap.md +29 -3
  83. package/commands/massu-rollback.md +613 -0
  84. package/commands/massu-scaffold-hook.md +2 -4
  85. package/commands/massu-scaffold-page.md +2 -3
  86. package/commands/massu-scaffold-router.md +1 -2
  87. package/commands/massu-security.md +619 -0
  88. package/commands/massu-simplify.md +115 -85
  89. package/commands/massu-squirrels.md +2 -2
  90. package/commands/massu-tdd.md +38 -22
  91. package/commands/massu-test.md +3 -3
  92. package/commands/massu-type-mismatch-audit.md +469 -0
  93. package/commands/massu-ui-audit.md +587 -0
  94. package/commands/massu-verify-playwright.md +287 -32
  95. package/commands/massu-verify.md +150 -46
  96. package/dist/cli.js +146 -95
  97. package/package.json +6 -2
  98. package/patterns/build-patterns.md +302 -0
  99. package/patterns/component-patterns.md +246 -0
  100. package/patterns/display-patterns.md +185 -0
  101. package/patterns/form-patterns.md +890 -0
  102. package/patterns/integration-testing-checklist.md +445 -0
  103. package/patterns/security-patterns.md +219 -0
  104. package/patterns/testing-patterns.md +569 -0
  105. package/patterns/tool-routing.md +81 -0
  106. package/patterns/ui-patterns.md +371 -0
  107. package/protocols/plan-implementation.md +267 -0
  108. package/protocols/recovery.md +225 -0
  109. package/protocols/verification.md +404 -0
  110. package/reference/command-taxonomy.md +178 -0
  111. package/reference/cr-rules-reference.md +76 -0
  112. package/reference/hook-execution-order.md +148 -0
  113. package/reference/lessons-learned.md +175 -0
  114. package/reference/patterns-quickref.md +208 -0
  115. package/reference/standards.md +135 -0
  116. package/reference/subagents-reference.md +17 -0
  117. package/reference/vr-verification-reference.md +867 -0
  118. package/src/commands/install-commands.ts +149 -53

package/commands/massu-golden-path/references/qa-evaluator-spec.md
@@ -0,0 +1,137 @@
# QA Evaluator Specification

> Reference doc for `/massu-golden-path` Phase 2C. Return to `phase-2-implementation.md` for full Phase 2.

## Purpose

An adversarial functional QA agent that exercises the running application via Playwright MCP, tuned for skepticism. Catches "compiles but doesn't work" failures that self-evaluation and code review miss.

**Origin**: Adapted from Anthropic Labs' harness design. Key insight: separating generation from evaluation eliminates self-praise bias. Tuning a standalone evaluator to be skeptical is far more tractable than making a generator self-critical.

---

## Evaluation Dimensions

The QA evaluator grades each implemented plan item across 4 dimensions:

| # | Dimension | Weight | What It Checks |
|---|-----------|--------|----------------|
| 1 | **Functionality** | High | Can users complete the primary workflow? Navigate to feature, interact, verify result. |
| 2 | **Completeness** | High | Are ALL contract acceptance criteria met? No stubs, no mock data, no placeholder text. |
| 3 | **Data Integrity** | Medium | Do write→store→read→display roundtrips work? (Aligns with CR-47 VR-ROUNDTRIP) |
| 4 | **Design Compliance** | Medium | Does the UI follow the design system tokens and component specs? (Aligns with VR-TOKEN, VR-SPEC-MATCH) |

---

## Grading Rubric

Per plan item:

| Grade | Meaning | Gate Impact |
|-------|---------|-------------|
| **PASS** | All contract criteria met, feature works as intended | QA_GATE remains PASS |
| **PARTIAL** | Most criteria met but 1-2 minor gaps (e.g., missing empty state) | QA_GATE: FAIL — must fix |
| **FAIL** | Core functionality broken, mock data, unwired features | QA_GATE: FAIL — must fix |

**Failure threshold**: Any single FAIL or PARTIAL = QA_GATE: FAIL. The article emphasizes that a lenient evaluator defeats the purpose.

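The gate logic above is mechanical, so it can be sketched in a few lines of TypeScript (type and function names here are illustrative, not part of the package API):

```typescript
type ItemGrade = "PASS" | "PARTIAL" | "FAIL";

// Any single non-PASS grade fails the whole gate. PARTIAL is deliberately
// not half credit, per the anti-leniency rules. An empty list (no
// contracted items) passes vacuously, matching the skip behavior for
// backend-only plans.
function qaGate(grades: ItemGrade[]): "PASS" | "FAIL" {
  return grades.every((g) => g === "PASS") ? "PASS" : "FAIL";
}
```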
---

## Known Failure Patterns to Check

These patterns are derived from production incidents. The QA evaluator MUST actively check for each:

| Pattern | Source | How to Detect |
|---------|--------|---------------|
| **Mock/hardcoded data** | Common incident pattern | Data doesn't change when DB changes; look for hardcoded arrays in component files |
| **Write succeeds but read/display broken** | Data visibility incidents | Submit form successfully, navigate away and back, verify data persists and displays |
| **Feature stubs** | Multiple incidents | Component renders but onClick/onSubmit handlers are empty or log-only |
| **Invisible elements** | Visibility incidents | Elements exist in DOM but have `display:none`, `opacity:0`, or are behind other elements |
| **Missing query invalidation** | Common pattern | Create/update item, verify list updates without manual refresh |
| **Broken dark mode** | Design audit findings | Toggle theme, verify all text visible, no invisible-on-dark elements |

---

## Evaluation Protocol

For each plan item with a sprint contract:

```
1. NAVIGATE to the affected page using Playwright MCP
   - browser_navigate to the target URL
   - browser_snapshot to verify page loaded (not error/auth page)
   - browser_console_messages to check for React errors

2. EXERCISE the feature as a real user would
   - Follow the happy path described in contract criteria
   - browser_click, browser_fill_form, browser_select_option as needed
   - Wait 2-3 seconds after interactions for async operations

3. VERIFY against sprint contract acceptance criteria
   - Check each criterion explicitly
   - browser_snapshot after key interactions for evidence
   - browser_network_requests to verify API calls succeed

4. CHECK for known failure patterns
   - Look for hardcoded data (browser_evaluate to check DOM for static arrays)
   - Verify write→read roundtrip if applicable
   - Check empty/loading/error states by testing edge conditions

5. GRADE the item: PASS / PARTIAL / FAIL
   - Include specific evidence for any non-PASS grade
   - Reference the specific contract criterion that failed
```

---

## Conditional Activation

The QA evaluator only spawns when the plan touches UI files:
- `src/app/**/*.tsx`
- `src/components/**/*.tsx`

For backend-only plans (routers, crons, migrations with no UI), skip with log note:
```
QA Evaluator: SKIPPED (no UI files in plan)
```

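As a sketch, the activation check needs no glob library; prefix and extension tests cover the two patterns above (function names are hypothetical, not the package's actual API):

```typescript
// A plan "touches UI" when any changed file falls under the two glob
// roots above. Prefix + extension checks are sufficient here, so no
// glob library is assumed.
function touchesUi(changedFiles: string[]): boolean {
  const uiRoots = ["src/app/", "src/components/"];
  return changedFiles.some(
    (f) => uiRoots.some((root) => f.startsWith(root)) && f.endsWith(".tsx")
  );
}

// Backend-only plans skip the evaluator with the log note shown above.
function qaEvaluatorMode(changedFiles: string[]): string {
  return touchesUi(changedFiles)
    ? "QA Evaluator: ACTIVE"
    : "QA Evaluator: SKIPPED (no UI files in plan)";
}
```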
---

## Relationship to Phase 2G

| Aspect | QA Evaluator (Phase 2C) | Browser Verification (Phase 2G) |
|--------|------------------------|--------------------------------|
| **When** | After implementation, during review | After all reviews pass |
| **Focus** | Contract compliance, feature functionality | Load audit, performance, interactive inventory |
| **Scope** | Only plan items with contracts | All pages affected by changes |
| **Adversarial?** | Yes — tuned for skepticism | No — comprehensive but not adversarial |
| **Fixes** | Reports findings; main agent fixes | Fixes issues directly |

They are **complementary, not redundant**. QA evaluator asks "does it do what we agreed?" while Phase 2G asks "does everything still work correctly?"

---

## Evaluator Prompt Tuning

The article explicitly notes that Claude is "a poor QA agent" out of the box — it identifies issues then talks itself into approving. Effective QA evaluation requires iterative prompt tuning.

### Tuning Protocol

After each golden-path run:

1. **Review QA evaluator findings log** — did it catch real bugs?
2. **Check Phase 2G results** — did browser verification catch anything QA evaluator missed?
3. **Check production** — did anything break after deploy that should have been caught?
4. **Update evaluator prompt** if judgment divergences found:
   - Add the missed pattern to the "Known Failure Patterns" list above
   - Strengthen the prompt language for that failure mode
   - Document the tuning decision in memory

### Anti-Leniency Rules

The evaluator prompt includes these rules to prevent the natural tendency toward generosity:

1. **Never say "this is acceptable because..."** — if criteria aren't met, it's FAIL
2. **Never give benefit of the doubt** — if you can't verify it works, it's FAIL
3. **Partial credit is still failure** — PARTIAL means "not done yet"
4. **Evidence required** — every PASS must cite specific evidence (screenshot, DOM state, network response)

package/commands/massu-golden-path/references/sprint-contract-protocol.md
@@ -0,0 +1,117 @@
# Sprint Contract Protocol

> Reference doc for `/massu-golden-path` Phase 2A.5. Return to `phase-2-implementation.md` for full Phase 2.

## What Is a Sprint Contract?

A sprint contract is a **negotiated definition-of-done** established for each plan item **before implementation begins**. It bridges the gap between high-level plan items and testable implementation by defining specific, measurable acceptance criteria that both the implementing agent and the evaluator agree on.

**Origin**: Adapted from Anthropic Labs' harness design pattern where generator and evaluator agents negotiate per-sprint contracts before coding starts. This prevents the "I implemented something adjacent to the plan item" failure mode.

---

## Contract Template

For each plan item in the Phase 2A tracking table, add these columns:

| Column | Content |
|--------|---------|
| **Scope Boundary** | What is IN scope and what is explicitly OUT of scope for this item |
| **Implementation Approach** | High-level approach (which files, which patterns) |
| **Acceptance Criteria** | 3-5 testable statements per item (see quality bar below) |
| **VR-* Mapping** | Which verification types apply and their expected output |

### Example Contract

```
Plan Item: P3-001 — Add contact activity timeline to CRM detail page
Scope Boundary:
  IN: Activity timeline component on contact detail, showing calls/emails/meetings
  OUT: Creating new activities (that's P3-002), activity filtering, pagination
Implementation Approach:
  - New ActivityTimeline component in src/components/crm/
  - tRPC query in contacts router using 3-step pattern
  - Render in contact detail page right column
Acceptance Criteria:
  1. Timeline renders with most recent activity first
  2. Each activity shows type icon, timestamp (relative), and summary text
  3. Empty state shows "No activities recorded" when contact has zero activities
  4. Loading state shows 3 skeleton rows while fetching
  5. Clicking an activity row navigates to the activity detail (or opens Sheet)
VR-* Mapping:
  - VR-GREP: ActivityTimeline component exists
  - VR-RENDER: Component rendered in contact detail page
  - VR-VISUAL: Route passes weighted scoring >= 3.0
  - VR-ROUNDTRIP: Activity data flows from DB to display
```

---

## Contract Quality Bar

Acceptance criteria MUST be specific enough that **two independent evaluators would agree on PASS/FAIL**.

| Quality | Example | Verdict |
|---------|---------|---------|
| **BAD** | "UI looks good" | Subjective, unmeasurable |
| **BAD** | "Feature works correctly" | Too vague, no specific behavior |
| **OKAY** | "Table renders contact data" | Testable but not specific enough |
| **GOOD** | "DataTable renders with sortable columns for name, email, company. Clicking column header toggles sort direction. Empty state shows 'No contacts found' message." | Specific, testable, includes edge case |
| **GOOD** | "Form submits and shows toast.success('Contact created'). New contact appears in list without page refresh." | Verifiable behavior with specific UI feedback |

### Criteria Categories

Each contract should include criteria from at least 3 of these categories:

1. **Happy path**: Primary workflow completes successfully
2. **Data display**: Correct data appears in the right places
3. **Empty/loading/error states**: All states handled
4. **User feedback**: Success/error messages shown
5. **Edge cases**: Null values, long text, missing data

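If each criterion is tagged with its category (the tagging scheme here is an assumption for illustration, not something the protocol mandates), the coverage rule can be checked mechanically:

```typescript
type CriterionCategory =
  | "happy-path"   // primary workflow completes
  | "data-display" // correct data in the right places
  | "states"       // empty/loading/error states
  | "feedback"     // success/error messages
  | "edge-cases";  // null values, long text, missing data

interface Criterion {
  text: string;
  category: CriterionCategory;
}

// A contract needs criteria drawn from at least 3 of the 5 categories.
function hasCategoryCoverage(criteria: Criterion[]): boolean {
  return new Set(criteria.map((c) => c.category)).size >= 3;
}
```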
---

## Negotiation Rules

1. **Max 3 negotiation rounds per item.** If generator and auditor can't agree after 3 rounds, escalate to user via AskUserQuestion.
2. **Auditor challenges vague criteria.** If a criterion uses words like "good", "correct", "proper" without specifics, the auditor MUST push back.
3. **Generator can propose simpler criteria** if the auditor's demands exceed the plan item's scope.
4. **No gold-plating.** Criteria must match the plan item scope, not expand it. If the auditor wants more, that's a new plan item.

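Rule 2's pushback lends itself to a first-pass lint before the auditor even reads the contract. A minimal sketch; the vague-word list is illustrative and should grow as judgment divergences appear:

```typescript
// Flags criteria that lean on vague qualifiers without specifics.
// Illustrative word list -- not an exhaustive or official set.
const VAGUE_WORDS = ["good", "correct", "proper", "nice", "works well"];

function vagueCriteria(criteria: string[]): string[] {
  return criteria.filter((c) =>
    VAGUE_WORDS.some((w) => new RegExp(`\\b${w}\\b`, "i").test(c))
  );
}
```

A flagged criterion is not automatically rejected; it just forces the auditor's attention there first.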
---

## Contract Storage

Contracts are stored as **additional columns in the Phase 2A tracking table**, not as separate files. This avoids file proliferation and keeps contracts co-located with the items they describe.

```
| Item # | Type | Description | Location | Verification | Contract | Acceptance Criteria | Status |
|--------|------|-------------|----------|--------------|----------|---------------------|--------|
| P3-001 | COMPONENT | Activity timeline | src/components/crm/ | VR-RENDER | See above | 5 criteria | PENDING |
```

---

## When to Skip

Sprint contracts can be marked **N/A** for:
- Pure refactors with no user-facing behavior change (verified by VR-BUILD + VR-TYPE + VR-TEST)
- Documentation-only items
- Migration items where the SQL IS the contract (the migration either applies or it doesn't)

Mark as: `Contract: N/A — [reason]`

---

## Relationship to Existing Verification

Sprint contracts **ADD** acceptance criteria on top of existing VR-* checks. They do NOT replace them.

```
VR-* checks = "Does the code meet technical standards?"
Sprint contracts = "Does the code do what we agreed it would do?"
```

Both must pass. A plan item is NOT complete unless:
1. All VR-* checks pass
2. All contract acceptance criteria are met
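That two-part rule reduces to a single predicate. A sketch with hypothetical names:

```typescript
interface ItemVerification {
  vrChecksPassed: boolean[]; // one entry per applicable VR-* check
  criteriaMet: boolean[];    // one entry per contract acceptance criterion
  contractNA?: boolean;      // items marked "Contract: N/A -- [reason]"
}

// Complete only when every VR-* check passes AND every contract
// criterion is met (or the contract was legitimately marked N/A).
function itemComplete(item: ItemVerification): boolean {
  const vrOk = item.vrChecksPassed.every(Boolean);
  const contractOk = item.contractNA || item.criteriaMet.every(Boolean);
  return vrOk && contractOk;
}
```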

package/commands/massu-golden-path/references/vr-visual-calibration.md
@@ -0,0 +1,73 @@
# VR-VISUAL Calibration Examples

> Reference doc for VR-VISUAL 4-dimension weighted scoring.
> Used by `scripts/ui-review.sh` and golden-path Phase 2.5 gap analysis.

## Purpose

Few-shot calibration examples that anchor the LLM evaluator's scoring, reducing drift across runs. The article's author found that calibrating the evaluator with detailed score breakdowns ensured judgment aligned with preferences.

---

## Scoring Dimensions (Recap)

| Dimension | Weight | What It Measures |
|-----------|--------|------------------|
| Design Quality | 2x | Coherent visual identity, deliberate choices |
| Functionality | 2x | Usable, clear hierarchy, findable actions |
| Craft | 1x | Spacing, typography, color consistency |
| Completeness | 1x | All states handled, no broken elements |

**Weighted Score** = (DQ×2 + FN×2 + CR×1 + CO×1) / 6
**PASS threshold**: >= 3.0/5.0

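The formula and threshold can be sketched directly; the flag band for 3.0-3.2 scores comes from the Calibration Notes below, and the names are illustrative:

```typescript
interface VisualScores {
  dq: number; // Design Quality, 1-5
  fn: number; // Functionality, 1-5
  cr: number; // Craft, 1-5
  co: number; // Completeness, 1-5
}

// Weighted Score = (DQ*2 + FN*2 + CR*1 + CO*1) / 6
function weightedScore(s: VisualScores): number {
  return (s.dq * 2 + s.fn * 2 + s.cr + s.co) / 6;
}

// PASS at >= 3.0; scores of 3.0-3.2 pass but are flagged for
// improvement in post-build reflection (see Calibration Notes).
function vrVisualVerdict(s: VisualScores): "FAIL" | "PASS_FLAGGED" | "PASS" {
  const w = weightedScore(s);
  if (w < 3.0) return "FAIL";
  return w <= 3.2 ? "PASS_FLAGGED" : "PASS";
}
```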
---

## Score 5: Excellent

**Description**: A page that a senior designer would approve without changes. Distinct visual identity, not generic template output.

**Design Quality (5)**: Colors, typography, and spacing work together to create a cohesive mood. The page has personality — you could tell it's this product without seeing the logo. No purple gradients over white cards. No generic SaaS dashboard look.

**Functionality (5)**: Primary action is immediately obvious. Navigation is intuitive. Information hierarchy is clear — the most important data is most prominent. Secondary actions are accessible but don't compete with primary.

**Craft (5)**: Consistent spacing throughout. Typography has clear hierarchy (headings, body, captions). Color palette is harmonious with adequate contrast ratios. No misaligned elements, no inconsistent border radii, no orphaned text.

**Completeness (5)**: Loading states use appropriate skeletons (not spinners for tabular data). Empty states have helpful messaging and a call-to-action. Error states are specific and actionable. No broken images, no placeholder text, no "TODO" comments visible.

---

## Score 3: Acceptable

**Description**: Functional and usable but unremarkable. Would not embarrass the team but wouldn't impress either. Typical of competent but uninspired output.

**Design Quality (3)**: Uses the design system correctly but without distinction. Colors are from the palette but the combination doesn't create a strong identity. Layout works but doesn't guide the eye naturally. It's "fine."

**Functionality (3)**: Primary actions are findable but may require a moment of scanning. Information is present but hierarchy could be clearer. Navigation works but some paths feel one extra click too deep.

**Craft (3)**: Mostly consistent spacing with occasional irregularities. Typography hierarchy exists but some levels are too similar in weight/size. Colors pass contrast checks but some combinations feel muddy.

**Completeness (3)**: Most states handled. May be missing one of: loading skeleton for a secondary section, empty state for an uncommon scenario, or error messaging for a specific failure mode. Nothing broken, but gaps exist in edge cases.

---

## Score 1: Failing

**Description**: Visually broken or fundamentally confusing. Would be flagged in any review.

**Design Quality (1)**: Generic AI-generated appearance — default component library styling, no visual identity. Or worse: clashing colors, inconsistent themes, elements that look like they belong to different products.

**Functionality (1)**: Primary action unclear. User has to guess what to do. Important data is buried or missing. Navigation is confusing — back button doesn't go where expected, breadcrumbs are wrong.

**Craft (1)**: Overlapping elements. Cut-off text. Inconsistent spacing (some sections have gaps, others are cramped). Typography has no clear hierarchy. Colors clash or have poor contrast (text invisible on background).

**Completeness (1)**: Obvious missing states — raw error objects displayed, blank sections where data should be, spinner that never resolves, "undefined" or "null" visible in the UI. Broken images showing alt text or image-not-found icons.

---

## Calibration Notes

- **Design quality and functionality are weighted 2x** because Claude already scores well on craft and completeness by default. The emphasis pushes toward more distinctive, user-centric output.
- **"Museum quality"** language in criteria pushes toward convergent aesthetics (the article noted this effect). Massu's criteria use more practical language ("distinct visual identity", "deliberate design choices") to encourage diversity.
- **Score 3.0 is the minimum bar, not the target.** Golden path should aim for 3.5+ on weighted score. Pages scoring 3.0-3.2 pass but should be flagged for improvement in post-build reflection.
- **Scores are route-specific.** An admin settings page scoring 3.5 is fine; a customer-facing landing page scoring 3.5 should be improved.