npm - hatch3r - Versions diffs - 1.9.0 → 2.0.0 - Mend

hatch3r 1.9.0 → 2.0.0


Files changed (288)

package/dist/content/rules/hatch3r-reviewer-calibration.md ADDED

@@ -0,0 +1,84 @@
+---
+id: hatch3r-reviewer-calibration
+type: rule
+description: "Reviewer runtime confidence-calibration contract: every Nth (default N=5) consecutive clean PASS triggers an out-of-band second-pass review before loop exit; divergence reverts to REQUEST CHANGES; each second pass logs to .hatch3r/calibration-log.jsonl. Canonical source of the N-default and the directive that agents/hatch3r-reviewer.md and calibration-protocol.md reference."
+tags: [review, orchestration, floor:protocol]
+scope: always
+precedence: high
+quality_charter: agents/shared/quality-charter.md
+cache_friendly: true
+---
+# hatch3r Reviewer Confidence Calibration
+**Pillars:** P2 (Scientific & Practical Quality), P5 (Governance Self-Quality)
+A reviewer's `confidence` rating is self-assigned by the same model that produced the verdict. Without an out-of-band check it is structurally over-trusted: LLM judges systematically overstate confidence — predicted confidence significantly exceeds realized correctness (Tian et al. 2025, arxiv:2508.06225) — so a self-reported clean PASS carries a non-zero, unmeasured miscalibration probability at runtime. This rule is the canonical, always-on source for the **runtime** (within-loop) bound that closes that gap before the review loop exits on a clean PASS. It owns the N-default and the directive that `agents/hatch3r-reviewer.md` §Runtime Confidence Calibration and the across-cycle calibration protocol cite.
+Scope split (do not duplicate across the two artifacts):
+- **Runtime, within-loop (this rule + `agents/hatch3r-reviewer.md`):** bounds an unbounded run of self-trusted clean verdicts inside one review-loop session. Fires before loop exit.
+- **Across-cycle measurement (the across-cycle calibration protocol):** samples N=20 prior-cycle PASS findings at cycle close and scores realized over-claim rate. Fires at cycle archive time.
+The two are complements, not substitutes — neither replaces the other.
+## Directive (verbatim)
+> Every Nth consecutive clean PASS verdict on a review-loop exit triggers one out-of-band second-pass review of the same diff. If the second pass surfaces any Critical or Warning the first pass did not, the loop does NOT exit clean — it reverts to REQUEST CHANGES. Each second pass appends one record to `.hatch3r/calibration-log.jsonl`.
+## N-default (authoritative)
+`N = 5` consecutive clean PASS verdicts for general diffs; `N = 1` for safety-class diffs (auth / security / migration — see the high-risk fast path in Trigger). These are the single source of truth for the defaults; `agents/hatch3r-reviewer.md` and the across-cycle calibration protocol cite these values rather than redeclaring them. The lowered safety-class default fires the second pass on the first clean PASS so an auth, security, or migration change never merges on a single self-trusted verdict (D23-2).
+- **Counter owner — the orchestrator, NOT the reviewer.** The reviewer sub-agent is spawned stateless per iteration and the review loop exits on the first clean verdict, so a reviewer-owned counter can never exceed 1 and the second pass would never fire. The orchestrator owns `consecutive_clean_pass_count` and reads/writes it; the reviewer only reports its per-verdict outcome.
+- **Counter scope — across top-level runs, persisted.** Count consecutive clean PASS verdicts across top-level pipeline runs, not within one loop and not per-iteration (the loop exits on the first clean verdict, so within a single loop the count advances by at most 1). The orchestrator persists the running count to project-local `.hatch3r/calibration-state.json` (`{ "consecutive_clean_pass_count": <int>, "updated_at": "<ISO-8601>" }`), written atomically via `src/merge/safeWrite.ts`. On each top-level run the orchestrator reads the prior count, increments on a would-be-clean exit, and resets to 0 on any REQUEST CHANGES or DESIGN_OBJECTION verdict. A missing/unparseable file is treated as count 0.
+- **Project override:** a project may set a different cadence via its own config; the override widens or narrows the cadence but never disables the second pass while a second pass remains available (see Unavailability below).
+## Trigger
+The orchestrator evaluates the trigger at the would-be-clean loop exit (the point where the loop would return a clean PASS — 0 Critical + 0 Warning — to Phase 4), using the cross-run counter it persisted per N-default above. Either branch fires the second pass:
+- **Cadence branch (default):** the post-increment `consecutive_clean_pass_count` (prior persisted count + 1 for this run) is a multiple of `N`.
+- **High-risk fast path (safety-class, N=1):** the reviewed diff touches any safety-class surface — a file tagged `floor:security`, auth/authn code (the `hatch3r-security` (CQ3) dispatch set in `agents/hatch3r-reviewer.md`: `src/auth/**`, OAuth/OIDC config, WebAuthn/passkey server, release-pipeline files, dependency manifest/lockfile), any change that triggers the CQ3 security specialist, OR a schema/event-schema migration (the `migration.review` surface — schema DDL, backfills, event-schema changes). For a safety-class diff, fire the second pass on the **first** clean PASS, independent of the cadence counter (do not wait for the Nth). The fast-path branch still increments and persists the cross-run counter; it only lowers the firing threshold to `N=1` for that run.
+## Action
+Run one second-pass review of the same diff with an independent judge:
+1. **Documented setup recommendation — a different model class.** A same-model-family critique shares the generator's blind spot, so a same-family second pass cannot detect the error classes the family is systematically biased to produce (Huang et al., ICLR 2024, "Large Language Models Cannot Self-Correct Reasoning Yet"). Route the second pass to a different model class wherever the deployment can — this is the recommended project setup, not best-effort. The second pass renders its own independent verdict + confidence.
+2. **Fallback — same model class re-rolled at higher temperature,** used ONLY when no second model class is routable. Because this fallback does not break the shared-blind-spot, it is a weaker check: emit `calibration: degraded (same-family re-roll)` in the verdict for that run so the weakened independence is visible and never asserted as a clean cross-family check. Record the model class used in the log (`second_pass_model_class: re-roll`).
+The second pass applies the same Review Checklist as the first (`agents/hatch3r-reviewer.md` → Review Checklist); it is a full re-review, not a spot check.
+## Divergence handling
+- **Divergent** — the second pass surfaces any Critical or Warning the first pass did not: do NOT exit clean. Revert the loop verdict to REQUEST CHANGES, record both verdicts, and feed the divergence to the next fixer iteration.
+- **Aligned** — both passes agree (both clean): exit clean and record alignment.
+A divergent second pass is the failure mode of interest — it is the runtime signal that the first pass was over-confident.
+## Logging
+Append exactly one record per second pass to `.hatch3r/calibration-log.jsonl` (project-local, JSON Lines) via the atomic append path in `src/merge/safeWrite.ts`. One JSON object per line:
+```json
+{"timestamp":"<ISO-8601>","first_pass_verdict":"PASS","second_pass_verdict":"PASS|REQUEST CHANGES","divergent":false,"second_pass_model_class":"different|re-roll","consecutive_clean_count":5,"trigger":"cadence|high-risk"}
+```
+`consecutive_clean_count` is the post-increment cross-run count at firing time; `trigger` records which Trigger branch fired (`high-risk` when the diff touched a safety-class surface and the second pass fired on the first clean PASS under the `N=1` fast path). `second_pass_model_class` is `different` for a cross-family second pass or `re-roll` for the same-family fallback; a `re-roll` record corresponds to a `calibration: degraded (same-family re-roll)` verdict annotation per Action. The project-local over-claim rate derived from this log feeds the iteration-summary `Confidence` field per `rules/hatch3r-iteration-summary.md`.
+## Unavailability (visible skip, never silent)
+Skip the second pass ONLY when no second model class is available AND the orchestrator has disabled same-model re-roll. In that case emit `calibration: skipped (no second pass available)` in the verdict so the gap is visible rather than silent — a silent skip is a Silent-Failure-Contract violation. A skip does NOT reset the consecutive-clean-PASS counter; the next eligible exit re-attempts the second pass.
+## Pillar Service
+- **P2 Scientific & Practical Quality (primary).** Adds an adversarial out-of-band check to a self-assigned confidence value; over-claimed clean verdicts become detectable at runtime, not just at cycle close.
+- **P5 Governance Self-Quality (supporting).** Removes the "reviewer as sole judge of its own confidence" structural over-trust pattern from the within-loop path, mirroring the across-cycle loop that `calibration-protocol.md` adds at cycle scope.
+## References
+- `agents/hatch3r-reviewer.md` §Runtime Confidence Calibration — the consuming agent body that invokes this contract (accessed 2026-05-28, trust tier: canonical).
+- The across-cycle calibration protocol §Runtime complement (F13.2-F1) — the across-cycle measurement loop this runtime bound complements (accessed 2026-05-28, trust tier: canonical).
+- `rules/hatch3r-iteration-summary.md` — consumes the project-local over-claim rate for the `Confidence` field (accessed 2026-05-28, trust tier: canonical).
+- Tian, Z. et al. "Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution" (arxiv:2508.06225). `https://arxiv.org/abs/2508.06225` (accessed 2026-06-09, peer-reviewed-methodology). Evidence that an LLM judge's predicted confidence significantly overstates realized correctness (the Overconfidence Phenomenon), so a self-reported clean PASS is structurally over-trusted — motivating the out-of-band second pass.
+- Huang, J. et al. "Large Language Models Cannot Self-Correct Reasoning Yet." ICLR 2024 (arxiv:2310.01798). `https://arxiv.org/abs/2310.01798` (accessed 2026-06-06, peer-reviewed-methodology). Evidence that same-model self-critique shares the generator's blind spot, motivating the different-model-class setup recommendation in Action and the lowered safety-class `N=1` second-pass cadence (D23-2).
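
Annotation (not part of the diff): the counter and Trigger rules in the file above can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the package's implementation. The function and type names (`readPriorCount`, `countAfterVerdict`, `evaluateCleanExit`, `TriggerDecision`) are hypothetical; only the field name `consecutive_clean_pass_count`, the verdict strings, and the `cadence` / `high-risk` labels come from the rule text. File I/O and the atomic write via `src/merge/safeWrite.ts` are left out because that module's API is not shown in the diff.

```typescript
// Hypothetical sketch of the orchestrator-owned counter and trigger check.
type Verdict = "PASS" | "REQUEST CHANGES" | "DESIGN_OBJECTION";
type TriggerKind = "cadence" | "high-risk";

interface TriggerDecision {
  count: number;               // post-increment cross-run count to persist
  fire: boolean;               // whether the second pass should run
  trigger: TriggerKind | null; // which Trigger branch fired, if any
}

// Per the rule, a missing or unparseable state file is treated as count 0.
// `raw` is the state file content, or null when the file does not exist.
function readPriorCount(raw: string | null): number {
  if (raw === null) return 0;
  try {
    const parsed = JSON.parse(raw) as { consecutive_clean_pass_count?: unknown };
    const n = parsed.consecutive_clean_pass_count;
    return typeof n === "number" && Number.isInteger(n) && n >= 0 ? n : 0;
  } catch {
    return 0;
  }
}

// Increment on a clean PASS; reset to 0 on REQUEST CHANGES or DESIGN_OBJECTION.
function countAfterVerdict(priorCount: number, verdict: Verdict): number {
  return verdict === "PASS" ? priorCount + 1 : 0;
}

// Evaluated at the would-be-clean loop exit. `n` is the cadence (default 5).
// Both branches increment the counter; the safety-class fast path fires on
// the first clean PASS regardless of the cadence.
function evaluateCleanExit(
  priorCount: number,
  n: number,
  safetyClass: boolean,
): TriggerDecision {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("cadence N must be a positive integer");
  }
  const count = countAfterVerdict(priorCount, "PASS");
  if (safetyClass) return { count, fire: true, trigger: "high-risk" };
  if (count % n === 0) return { count, fire: true, trigger: "cadence" };
  return { count, fire: false, trigger: null };
}

// Usage: four prior clean runs, general diff, default cadence.
const decision = evaluateCleanExit(
  readPriorCount('{"consecutive_clean_pass_count":4}'),
  5,
  false,
);
console.log(decision);
```

The sketch labels a run `high-risk` whenever the diff is safety-class, even if the cadence would also have fired. That ordering is an assumption drawn from the Logging section's description of the `trigger` field.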

package/dist/content/rules/hatch3r-reviewer-calibration.mdc ADDED

@@ -0,0 +1,84 @@
+---
+id: hatch3r-reviewer-calibration
+type: rule
+description: "Reviewer runtime confidence-calibration contract: every Nth (default N=5) consecutive clean PASS triggers an out-of-band second-pass review before loop exit; divergence reverts to REQUEST CHANGES; each second pass logs to .hatch3r/calibration-log.jsonl. Canonical source of the N-default and the directive that agents/hatch3r-reviewer.md and calibration-protocol.md reference."
+tags: [review, orchestration, floor:protocol]
+alwaysApply: true
+precedence: high
+quality_charter: agents/shared/quality-charter.md
+cache_friendly: true
+---
+# hatch3r Reviewer Confidence Calibration
+**Pillars:** P2 (Scientific & Practical Quality), P5 (Governance Self-Quality)
+A reviewer's `confidence` rating is self-assigned by the same model that produced the verdict. Without an out-of-band check it is structurally over-trusted: LLM judges systematically overstate confidence — predicted confidence significantly exceeds realized correctness (Tian et al. 2025, arxiv:2508.06225) — so a self-reported clean PASS carries a non-zero, unmeasured miscalibration probability at runtime. This rule is the canonical, always-on source for the **runtime** (within-loop) bound that closes that gap before the review loop exits on a clean PASS. It owns the N-default and the directive that `agents/hatch3r-reviewer.md` §Runtime Confidence Calibration and the across-cycle calibration protocol cite.
+Scope split (do not duplicate across the two artifacts):
+- **Runtime, within-loop (this rule + `agents/hatch3r-reviewer.md`):** bounds an unbounded run of self-trusted clean verdicts inside one review-loop session. Fires before loop exit.
+- **Across-cycle measurement (the across-cycle calibration protocol):** samples N=20 prior-cycle PASS findings at cycle close and scores realized over-claim rate. Fires at cycle archive time.
+The two are complements, not substitutes — neither replaces the other.
+## Directive (verbatim)
+> Every Nth consecutive clean PASS verdict on a review-loop exit triggers one out-of-band second-pass review of the same diff. If the second pass surfaces any Critical or Warning the first pass did not, the loop does NOT exit clean — it reverts to REQUEST CHANGES. Each second pass appends one record to `.hatch3r/calibration-log.jsonl`.
+## N-default (authoritative)
+`N = 5` consecutive clean PASS verdicts for general diffs; `N = 1` for safety-class diffs (auth / security / migration — see the high-risk fast path in Trigger). These are the single source of truth for the defaults; `agents/hatch3r-reviewer.md` and the across-cycle calibration protocol cite these values rather than redeclaring them. The lowered safety-class default fires the second pass on the first clean PASS so an auth, security, or migration change never merges on a single self-trusted verdict (D23-2).
+- **Counter owner — the orchestrator, NOT the reviewer.** The reviewer sub-agent is spawned stateless per iteration and the review loop exits on the first clean verdict, so a reviewer-owned counter can never exceed 1 and the second pass would never fire. The orchestrator owns `consecutive_clean_pass_count` and reads/writes it; the reviewer only reports its per-verdict outcome.
+- **Counter scope — across top-level runs, persisted.** Count consecutive clean PASS verdicts across top-level pipeline runs, not within one loop and not per-iteration (the loop exits on the first clean verdict, so within a single loop the count advances by at most 1). The orchestrator persists the running count to project-local `.hatch3r/calibration-state.json` (`{ "consecutive_clean_pass_count": <int>, "updated_at": "<ISO-8601>" }`), written atomically via `src/merge/safeWrite.ts`. On each top-level run the orchestrator reads the prior count, increments on a would-be-clean exit, and resets to 0 on any REQUEST CHANGES or DESIGN_OBJECTION verdict. A missing/unparseable file is treated as count 0.
+- **Project override:** a project may set a different cadence via its own config; the override widens or narrows the cadence but never disables the second pass while a second pass remains available (see Unavailability below).
+## Trigger
+The orchestrator evaluates the trigger at the would-be-clean loop exit (the point where the loop would return a clean PASS — 0 Critical + 0 Warning — to Phase 4), using the cross-run counter it persisted per N-default above. Either branch fires the second pass:
+- **Cadence branch (default):** the post-increment `consecutive_clean_pass_count` (prior persisted count + 1 for this run) is a multiple of `N`.
+- **High-risk fast path (safety-class, N=1):** the reviewed diff touches any safety-class surface — a file tagged `floor:security`, auth/authn code (the `hatch3r-security` (CQ3) dispatch set in `agents/hatch3r-reviewer.md`: `src/auth/**`, OAuth/OIDC config, WebAuthn/passkey server, release-pipeline files, dependency manifest/lockfile), any change that triggers the CQ3 security specialist, OR a schema/event-schema migration (the `migration.review` surface — schema DDL, backfills, event-schema changes). For a safety-class diff, fire the second pass on the **first** clean PASS, independent of the cadence counter (do not wait for the Nth). The fast-path branch still increments and persists the cross-run counter; it only lowers the firing threshold to `N=1` for that run.
+## Action
+Run one second-pass review of the same diff with an independent judge:
+1. **Documented setup recommendation — a different model class.** A same-model-family critique shares the generator's blind spot, so a same-family second pass cannot detect the error classes the family is systematically biased to produce (Huang et al., ICLR 2024, "Large Language Models Cannot Self-Correct Reasoning Yet"). Route the second pass to a different model class wherever the deployment can — this is the recommended project setup, not best-effort. The second pass renders its own independent verdict + confidence.
+2. **Fallback — same model class re-rolled at higher temperature,** used ONLY when no second model class is routable. Because this fallback does not break the shared-blind-spot, it is a weaker check: emit `calibration: degraded (same-family re-roll)` in the verdict for that run so the weakened independence is visible and never asserted as a clean cross-family check. Record the model class used in the log (`second_pass_model_class: re-roll`).
+The second pass applies the same Review Checklist as the first (`agents/hatch3r-reviewer.md` → Review Checklist); it is a full re-review, not a spot check.
+## Divergence handling
+- **Divergent** — the second pass surfaces any Critical or Warning the first pass did not: do NOT exit clean. Revert the loop verdict to REQUEST CHANGES, record both verdicts, and feed the divergence to the next fixer iteration.
+- **Aligned** — both passes agree (both clean): exit clean and record alignment.
+A divergent second pass is the failure mode of interest — it is the runtime signal that the first pass was over-confident.
+## Logging
+Append exactly one record per second pass to `.hatch3r/calibration-log.jsonl` (project-local, JSON Lines) via the atomic append path in `src/merge/safeWrite.ts`. One JSON object per line:
+```json
+{"timestamp":"<ISO-8601>","first_pass_verdict":"PASS","second_pass_verdict":"PASS|REQUEST CHANGES","divergent":false,"second_pass_model_class":"different|re-roll","consecutive_clean_count":5,"trigger":"cadence|high-risk"}
+```
+`consecutive_clean_count` is the post-increment cross-run count at firing time; `trigger` records which Trigger branch fired (`high-risk` when the diff touched a safety-class surface and the second pass fired on the first clean PASS under the `N=1` fast path). `second_pass_model_class` is `different` for a cross-family second pass or `re-roll` for the same-family fallback; a `re-roll` record corresponds to a `calibration: degraded (same-family re-roll)` verdict annotation per Action. The project-local over-claim rate derived from this log feeds the iteration-summary `Confidence` field per `rules/hatch3r-iteration-summary.md`.
+## Unavailability (visible skip, never silent)
+Skip the second pass ONLY when no second model class is available AND the orchestrator has disabled same-model re-roll. In that case emit `calibration: skipped (no second pass available)` in the verdict so the gap is visible rather than silent — a silent skip is a Silent-Failure-Contract violation. A skip does NOT reset the consecutive-clean-PASS counter; the next eligible exit re-attempts the second pass.
+## Pillar Service
+- **P2 Scientific & Practical Quality (primary).** Adds an adversarial out-of-band check to a self-assigned confidence value; over-claimed clean verdicts become detectable at runtime, not just at cycle close.
+- **P5 Governance Self-Quality (supporting).** Removes the "reviewer as sole judge of its own confidence" structural over-trust pattern from the within-loop path, mirroring the across-cycle loop that `calibration-protocol.md` adds at cycle scope.
+## References
+- `agents/hatch3r-reviewer.md` §Runtime Confidence Calibration — the consuming agent body that invokes this contract (accessed 2026-05-28, trust tier: canonical).
+- The across-cycle calibration protocol §Runtime complement (F13.2-F1) — the across-cycle measurement loop this runtime bound complements (accessed 2026-05-28, trust tier: canonical).
+- `rules/hatch3r-iteration-summary.md` — consumes the project-local over-claim rate for the `Confidence` field (accessed 2026-05-28, trust tier: canonical).
+- Tian, Z. et al. "Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution" (arxiv:2508.06225). `https://arxiv.org/abs/2508.06225` (accessed 2026-06-09, peer-reviewed-methodology). Evidence that an LLM judge's predicted confidence significantly overstates realized correctness (the Overconfidence Phenomenon), so a self-reported clean PASS is structurally over-trusted — motivating the out-of-band second pass.
+- Huang, J. et al. "Large Language Models Cannot Self-Correct Reasoning Yet." ICLR 2024 (arxiv:2310.01798). `https://arxiv.org/abs/2310.01798` (accessed 2026-06-06, peer-reviewed-methodology). Evidence that same-model self-critique shares the generator's blind spot, motivating the different-model-class setup recommendation in Action and the lowered safety-class `N=1` second-pass cadence (D23-2).
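
Annotation (not part of the diff): the Divergence handling and Logging sections above can be illustrated with a small TypeScript sketch that builds one JSON Lines record per second pass. It is a hedged example, not the package's code. The names `Finding`, `CalibrationRecord`, `toLogLine`, and `verdictAnnotation` are hypothetical, as is the `Info` severity, which stands in for any finding below Warning. The record fields and annotation string are copied from the rule. The sketch also assumes the second-pass verdict is derived directly from its findings, and it leaves the actual append (via `src/merge/safeWrite.ts`) out because that API is not shown.

```typescript
// Hypothetical sketch: derive divergence and serialize one log record.
type Severity = "Critical" | "Warning" | "Info"; // "Info" is a placeholder
interface Finding {
  severity: Severity;
  summary: string;
}

type ModelClass = "different" | "re-roll";

interface CalibrationRecord {
  timestamp: string;
  first_pass_verdict: "PASS";
  second_pass_verdict: "PASS" | "REQUEST CHANGES";
  divergent: boolean;
  second_pass_model_class: ModelClass;
  consecutive_clean_count: number;
  trigger: "cadence" | "high-risk";
}

// The first pass was clean (0 Critical + 0 Warning), so any Critical or
// Warning in the second pass is one the first pass did not surface.
function toLogLine(
  secondPassFindings: Finding[],
  modelClass: ModelClass,
  count: number,
  trigger: "cadence" | "high-risk",
  now: Date,
): string {
  const divergent = secondPassFindings.some(
    (f) => f.severity === "Critical" || f.severity === "Warning",
  );
  const record: CalibrationRecord = {
    timestamp: now.toISOString(),
    first_pass_verdict: "PASS",
    second_pass_verdict: divergent ? "REQUEST CHANGES" : "PASS",
    divergent,
    second_pass_model_class: modelClass,
    consecutive_clean_count: count,
    trigger,
  };
  // JSON Lines: exactly one object per line, newline-terminated.
  return JSON.stringify(record) + "\n";
}

// A same-family re-roll must be flagged as degraded in the verdict.
function verdictAnnotation(modelClass: ModelClass): string | null {
  return modelClass === "re-roll"
    ? "calibration: degraded (same-family re-roll)"
    : null;
}

// Usage: a cross-family second pass that finds a Warning the first pass missed.
const line = toLogLine(
  [{ severity: "Warning", summary: "unvalidated input on new endpoint" }],
  "different",
  5,
  "cadence",
  new Date(),
);
console.log(line.trim(), verdictAnnotation("different"));
```

When `divergent` is true the orchestrator would revert the loop verdict to REQUEST CHANGES and pass the second-pass findings to the next fixer iteration; that hand-off is outside this sketch.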

package/dist/content/rules/hatch3r-right-sizing.md ADDED

@@ -0,0 +1,68 @@
+---
+id: hatch3r-right-sizing
+type: rule
+description: Right-size every investment in robustness, scalability, testing, and infra to the project's maturity tier — invest only as much complexity as it takes to reach the next stage, never default to enterprise-grade. The universal floor (security, correctness, accessibility basics, baseline tests on changed surfaces) never relaxes. Overengineering and premature bureaucracy are P4 violations.
+tags: [right-sizing, code-quality, floor:content-quality]
+precedence: high
+scope: always
+quality_charter: agents/shared/quality-charter.md
+cache_friendly: true
+---
+# hatch3r Right-Sizing
+**Pillars:** P4 (Comprehensive Lean Coverage), CQ8 (Maintainability Quality)
+## North-Star Directive
+Invest in robustness, scalability, testing, and infrastructure in proportion to the project's maturity — and no further. Use only as much complexity as it takes to reach the **next** stage. Never default to enterprise-grade.
+Overengineering is a defect, not diligence. Building a generalized framework, a sharded data tier, or a mutation-testing harness for a single-author prototype is the same category of error as skipping a needed test — it spends scarce effort on the wrong axis. Premature bureaucracy (approval gates, ADRs on trivial choices, governance ceremony a two-person team cannot sustain) is the identical failure wearing a process costume.
+The maturity tier is an **investment-calibration dial, not a content gate**. Every capability — every specialist agent, every rule, every check — ships at every tier. The tier sets how DEEP you build, never WHETHER the concern applies. A solo project still cares about reliability and testing; it gets right-sized versions, not withheld ones.
+## The Universal Floor (never relaxed, any tier including solo)
+Four floors bind at every tier. No calibration choice may drop below them:
+1. **Security** — auth correctness on protected routes, no committed secrets, dependency hygiene (lockfile + install integrity), input validation. The `floor:security` controls in `agents/hatch3r-security.md` bind in full at solo; only supply-chain and governance DEPTH scales upward.
+2. **Correctness & data integrity** — logic is correct against its stated contract; schema migrations are reversible (expand-contract); no destructive single-deploy schema change; no silent data corruption.
+3. **Accessibility basics** — keyboard operability, semantic markup, axe-core serious+critical = 0 on shipped UI.
+4. **Baseline tests on changed surfaces** — a smoke / happy-path test on every changed surface; mocks justified; deterministic runs.
+If a calibration choice would drop below any floor, the floor wins. State the conflict; do not silently relax.
+## The Tier → Depth Ladder
+Each tier adds depth on top of the one below. The enterprise column is the deepest — it holds the historical absolute thresholds the CQ specialists enforce.
+| Tier | Investment posture | Build for |
+|------|--------------------|-----------|
+| **solo** | Universal floor only. Ship the smallest thing that is correct, secure, accessible, and tested on its changed surfaces. No speculative abstraction, no infra a single author cannot operate. | `team` |
+| **team** | + shared-codebase discipline: duplication control (jscpd ≤7%), design-system reuse, structured logging with correlation ids, ADRs on genuine architectural decisions (not trivia). | `scaleup` |
+| **scaleup** | + production operations: SLOs defined, distributed tracing on the request path, performance budgets, statelessness / idempotency / back-pressure on mutating writes, an incident-response path. | `enterprise` |
+| **enterprise** | + org governance: full mutation / property / contract testing, AI-eval coverage, extensibility governance, FinOps cost attribution, a published deprecation policy. The deepest column — today's absolute audit thresholds. | (steady state) |
+## Build for the NEXT Stage, Not the Final One
+When the right depth is ambiguous, build one tier up — never enterprise by default. Leave a documented seam (an interface boundary, a config indirection, a noted extension point), not a built-out cathedral. "Make the next step cheap" replaces "build everything now": cheap-to-extend later beats expensive-and-speculative now, because most speculative depth is never exercised — the project pivots or the need never lands.
+The right question is never "what would a mature system have here?" It is: **what does THIS project at THIS tier need to take its next step safely?** Answer that, leave the seam, ship.
+## Reading the Project's Maturity
+Resolve the tier in this order:
+1. **Adapter header** — generated artifacts carry the tier on their first line. Cursor: `<!-- hatch3r: right-size to maturity=<tier>… -->`. Copilot / inlined surfaces: `> hatch3r: right-size to maturity=<tier>…`. This is the fastest and most authoritative signal at agent runtime.
+2. **Manifest** — `.hatch3r/hatch.json` → `maturity`. Absent → treat as `solo`.
+3. **Ask** — if the tier is undiscoverable from both AND the decision is consequential (it changes the artifact's shape or cost), ask via `agents/shared/user-question-protocol.md`. Default to `solo` when no answer arrives.
+## Overengineering Is a P4 Violation
+Shipping depth the tier did not call for is over-fitting the solution to an imagined future — a P4 (Lean Coverage) violation, because the unused machinery earns no value against its complexity cost. The CQ specialist agents enforce right-sizing through their `## Tier calibration` ladders: the solo column equals the universal floor, the enterprise column equals the absolute threshold.
+A reviewer who finds enterprise-grade machinery (sharding, a plugin system, multi-burn-rate SLO alerting, a mutation-testing gate) on a solo or team project files a right-sizing finding — Info when it is dormant and cheap to remove, escalating to Medium when the unused depth slows the change under review or blocks the next step. Under-investment relative to the tier (no SLO on a scaleup request path, no design-system on a team frontend) is the symmetric finding: the floor and the tier ladder cut both ways.
+## References
+- "YAGNI (You Aren't Gonna Need It)." Laws of Software Engineering. URL: https://lawsofsoftwareengineering.com/laws/yagni/ — accessed 2026-06-03. Trust tier: established curated engineering-principles reference (named-law catalog). Synthesized: build only what is required now; YAGNI depends on cheap refactoring (test coverage + CI) so deferring is safe; iterate with real use-case data rather than speculative architecture.
+- Fritzsche, R. "Avoiding Over-Engineering: Focus on Real Problems in Software Development." 2025. URL: https://ricofritzsche.me/avoiding-over-engineering-focus-on-real-problems-in-software-development/ — accessed 2026-06-03. Trust tier: practitioner long-form with named author, citing Knuth premature-optimization + startup-scaling research (70% of failed startups scaled too early). Synthesized: build the simplest version first then iterate on observed problems; scale progressively with real growth, not imagined traffic; premature abstraction hinders maintainability.
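
Annotation (not part of the diff): the resolution order in "Reading the Project's Maturity" and the "Build for" column of the tier ladder can be sketched in TypeScript as below. This is an illustrative sketch under assumptions, not hatch3r's implementation. The names `resolveTier`, `tierFromHeader`, `tierFromManifest`, `asTier`, and `BUILD_FOR` are hypothetical. The header pattern matches only the `maturity=<tier>` fragment quoted in the rule, treating an unparseable manifest as absent is an assumption, and the "Ask" step is reduced to the `solo` default that applies when no answer arrives.

```typescript
// Hypothetical sketch of tier resolution: adapter header, then manifest,
// then the solo default.
type Tier = "solo" | "team" | "scaleup" | "enterprise";

// "Build for" column of the ladder: each tier targets the next one up.
const BUILD_FOR: Record<Tier, Tier | null> = {
  solo: "team",
  team: "scaleup",
  scaleup: "enterprise",
  enterprise: null, // steady state
};

function asTier(value: unknown): Tier | null {
  return typeof value === "string" &&
    Object.prototype.hasOwnProperty.call(BUILD_FOR, value)
    ? (value as Tier)
    : null;
}

// Matches both quoted header forms, which share the same fragment.
function tierFromHeader(firstLine: string): Tier | null {
  const match = /hatch3r: right-size to maturity=([a-z]+)/.exec(firstLine);
  return match ? asTier(match[1]) : null;
}

// `manifestJson` is the content of .hatch3r/hatch.json, or null if absent.
function tierFromManifest(manifestJson: string | null): Tier | null {
  if (manifestJson === null) return null;
  try {
    const parsed = JSON.parse(manifestJson) as { maturity?: unknown } | null;
    return parsed ? asTier(parsed.maturity) : null;
  } catch {
    return null; // assumption: unparseable manifest is treated as absent
  }
}

function resolveTier(
  firstLine: string,
  manifestJson: string | null,
): { tier: Tier; source: "header" | "manifest" | "default" } {
  const fromHeader = tierFromHeader(firstLine);
  if (fromHeader) return { tier: fromHeader, source: "header" };
  const fromManifest = tierFromManifest(manifestJson);
  if (fromManifest) return { tier: fromManifest, source: "manifest" };
  return { tier: "solo", source: "default" };
}

// Usage: the header wins over the manifest when both are present.
const resolved = resolveTier(
  "<!-- hatch3r: right-size to maturity=team -->",
  '{"maturity":"enterprise"}',
);
console.log(resolved, "build for:", BUILD_FOR[resolved.tier]);
```

Keeping the ladder as a lookup table rather than index arithmetic makes the `enterprise` steady state explicit and avoids an out-of-range lookup.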

package/dist/content/rules/hatch3r-right-sizing.mdc ADDED

@@ -0,0 +1,66 @@
+---
+id: hatch3r-right-sizing
+type: rule
+description: Right-size every investment in robustness, scalability, testing, and infra to the project's maturity tier — invest only as much complexity as it takes to reach the next stage, never default to enterprise-grade. The universal floor (security, correctness, accessibility basics, baseline tests on changed surfaces) never relaxes. Overengineering and premature bureaucracy are P4 violations.
+tags: [right-sizing, code-quality, floor:content-quality]
+precedence: high
+alwaysApply: true
+---
+# hatch3r Right-Sizing
+**Pillars:** P4 (Comprehensive Lean Coverage), CQ8 (Maintainability Quality)
+## North-Star Directive
+Invest in robustness, scalability, testing, and infrastructure in proportion to the project's maturity — and no further. Use only as much complexity as it takes to reach the **next** stage. Never default to enterprise-grade.
+Overengineering is a defect, not diligence. Building a generalized framework, a sharded data tier, or a mutation-testing harness for a single-author prototype is the same category of error as skipping a needed test — it spends scarce effort on the wrong axis. Premature bureaucracy (approval gates, ADRs on trivial choices, governance ceremony a two-person team cannot sustain) is the identical failure wearing a process costume.
+The maturity tier is an **investment-calibration dial, not a content gate**. Every capability — every specialist agent, every rule, every check — ships at every tier. The tier sets how DEEP you build, never WHETHER the concern applies. A solo project still cares about reliability and testing; it gets right-sized versions, not withheld ones.
+## The Universal Floor (never relaxed, any tier including solo)
+Four floors bind at every tier. No calibration choice may drop below them:
+1. **Security** — auth correctness on protected routes, no committed secrets, dependency hygiene (lockfile + install integrity), input validation. The `floor:security` controls in `agents/hatch3r-security.md` bind in full at solo; only supply-chain and governance DEPTH scales upward.
+2. **Correctness & data integrity** — logic is correct against its stated contract; schema migrations are reversible (expand-contract); no destructive single-deploy schema change; no silent data corruption.
+3. **Accessibility basics** — keyboard operability, semantic markup, axe-core serious+critical = 0 on shipped UI.
+4. **Baseline tests on changed surfaces** — a smoke / happy-path test on every changed surface; mocks justified; deterministic runs.
+If a calibration choice would drop below any floor, the floor wins. State the conflict; do not silently relax.
+## The Tier → Depth Ladder
+Each tier adds depth on top of the one below. The enterprise column is the deepest — it holds the historical absolute thresholds the CQ specialists enforce.
+| Tier | Investment posture | Build for |
+|------|--------------------|-----------|
+| **solo** | Universal floor only. Ship the smallest thing that is correct, secure, accessible, and tested on its changed surfaces. No speculative abstraction, no infra a single author cannot operate. | `team` |
+| **team** | + shared-codebase discipline: duplication control (jscpd ≤7%), design-system reuse, structured logging with correlation ids, ADRs on genuine architectural decisions (not trivia). | `scaleup` |
+| **scaleup** | + production operations: SLOs defined, distributed tracing on the request path, performance budgets, statelessness / idempotency / back-pressure on mutating writes, an incident-response path. | `enterprise` |
+| **enterprise** | + org governance: full mutation / property / contract testing, AI-eval coverage, extensibility governance, FinOps cost attribution, a published deprecation policy. The deepest tier: today's absolute audit thresholds. | (steady state) |
+## Build for the NEXT Stage, Not the Final One
+When the right depth is ambiguous, build one tier up — never enterprise by default. Leave a documented seam (an interface boundary, a config indirection, a noted extension point), not a built-out cathedral. "Make the next step cheap" replaces "build everything now": cheap-to-extend later beats expensive-and-speculative now, because most speculative depth is never exercised — the project pivots or the need never lands.
+The right question is never "what would a mature system have here?" It is: **what does THIS project at THIS tier need to take its next step safely?** Answer that, leave the seam, ship.
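The "Build for" column of the ladder can be read as a simple lookup. The sketch below is illustrative only: `LADDER` and `build_target` are hypothetical names, not part of hatch3r, and the only assumption is that the tiers are ordered as in the table above.

```ruby
# Tier order as listed in the ladder table.
LADDER = %w[solo team scaleup enterprise].freeze

# Hypothetical helper: the tier to build toward (the "Build for" column).
# enterprise is the steady state, so it maps to itself.
def build_target(tier)
  index = LADDER.index(tier) or raise ArgumentError, "unknown tier: #{tier}"
  LADDER[[index + 1, LADDER.size - 1].min]
end
```

The helper steps up one tier at a time, so a solo project targets `team` and never jumps to `enterprise`.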
+## Reading the Project's Maturity
+Resolve the tier in this order:
+1. **Adapter header** — generated artifacts carry the tier on their first line. Cursor: `<!-- hatch3r: right-size to maturity=<tier>… -->`. Copilot / inlined surfaces: `> hatch3r: right-size to maturity=<tier>…`. This is the fastest and most authoritative signal at agent runtime.
+2. **Manifest** — `.hatch3r/hatch.json` → `maturity`. Absent → treat as `solo`.
+3. **Ask** — if the tier is undiscoverable from both AND the decision is consequential (it changes the artifact's shape or cost), ask via `agents/shared/user-question-protocol.md`. Default to `solo` when no answer arrives.
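The resolution order above might be sketched as follows. This is a minimal illustration, not hatch3r's implementation: the helper names are hypothetical, the regular expression reads only the `maturity=<tier>` part of the header (the rest of the header is elided in the rule and is not assumed here), and the "Ask" step is reduced to the documented `solo` default.

```ruby
require "json"

KNOWN_TIERS = %w[solo team scaleup enterprise].freeze

# Hypothetical helper: read the tier from an adapter header line such as
# "<!-- hatch3r: right-size to maturity=team ... -->" or
# "> hatch3r: right-size to maturity=team ...".
def tier_from_header(first_line)
  match = first_line.to_s.match(/hatch3r: right-size to maturity=([a-z]+)/)
  tier = match && match[1]
  KNOWN_TIERS.include?(tier) ? tier : nil
end

# Hypothetical helper: read `maturity` from the manifest's JSON text.
def tier_from_manifest(manifest_json)
  return nil if manifest_json.nil?

  data = JSON.parse(manifest_json)
  tier = data.is_a?(Hash) ? data["maturity"] : nil
  KNOWN_TIERS.include?(tier) ? tier : nil
rescue JSON::ParserError
  nil
end

# Header first, then manifest, then the documented default of "solo".
def resolve_tier(first_line: nil, manifest_json: nil)
  tier_from_header(first_line) || tier_from_manifest(manifest_json) || "solo"
end
```

A caller would pass the first line of the generated artifact and the contents of `.hatch3r/hatch.json`, if either exists.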
+## Overengineering Is a P4 Violation
+Shipping depth the tier did not call for is over-fitting the solution to an imagined future — a P4 (Lean Coverage) violation, because the unused machinery earns no value against its complexity cost. The CQ specialist agents enforce right-sizing through their `## Tier calibration` ladders: the solo column equals the universal floor, the enterprise column equals the absolute threshold.
+A reviewer who finds enterprise-grade machinery (sharding, a plugin system, multi-burn-rate SLO alerting, a mutation-testing gate) on a solo or team project files a right-sizing finding — Info when it is dormant and cheap to remove, escalating to Medium when the unused depth slows the change under review or blocks the next step. Under-investment relative to the tier (no SLO on a scaleup request path, no design-system on a team frontend) is the symmetric finding: the floor and the tier ladder cut both ways.
+## References
+- "YAGNI (You Aren't Gonna Need It)." Laws of Software Engineering. URL: https://lawsofsoftwareengineering.com/laws/yagni/ — accessed 2026-06-03. Trust tier: established curated engineering-principles reference (named-law catalog). Synthesized: build only what is required now; YAGNI depends on cheap refactoring (test coverage + CI) so deferring is safe; iterate with real use-case data rather than speculative architecture.
+- Fritzsche, R. "Avoiding Over-Engineering: Focus on Real Problems in Software Development." 2025. URL: https://ricofritzsche.me/avoiding-over-engineering-focus-on-real-problems-in-software-development/ — accessed 2026-06-03. Trust tier: practitioner long-form with named author, citing Knuth premature-optimization + startup-scaling research (70% of failed startups scaled too early). Synthesized: build the simplest version first then iterate on observed problems; scale progressively with real growth, not imagined traffic; premature abstraction hinders maintainability.

package/dist/content/rules/hatch3r-ruby-rails-patterns.md ADDED

@@ -0,0 +1,111 @@
+---
+id: hatch3r-ruby-rails-patterns
+type: rule
+description: Ruby 3.3+ and Rails 8.x conventions covering Hotwire (Turbo + Stimulus), ActiveRecord patterns, Sidekiq jobs, RSpec testing, RuboCop / Standard, and YJIT performance
+scope: conditional
+globs: "**/*.rb,**/*.rake,**/Gemfile,**/Gemfile.lock,**/Rakefile,**/config.ru,**/.rubocop.yml,**/.rubocop.yaml,**/.standard.yml,**/app/**,**/config/**,**/db/migrate/**,**/lib/**,**/spec/**,**/test/**"
+tags: [implementation, lang:ruby]
+quality_charter: agents/shared/quality-charter.md
+cache_friendly: true
+---
+# Ruby / Rails Patterns
+**Pillars:** P2 (Scientific & Practical Quality), CQ8 (Maintainability Quality)
+> Applies when the project ships a Ruby application. Detection signals: `Gemfile` at repo root, `config/application.rb` (Rails), `.ruby-version`, or any `*.rb` file. Sinatra and Hanami projects share most of the Ruby-level guidance here.
+## Ruby Language Floor
+- Target Ruby 3.3+ (3.4 recommended for new projects). Use pattern matching (`case/in`), rightward assignment (`x => y`), endless methods (`def square(x) = x * x`) when they improve readability — not as defaults.
+- Enable YJIT in production (`--yjit` flag or `RUBY_YJIT_ENABLE=1`). Reported throughput gains on Rails workloads are typically in the 15–25% range with no code changes; measure on your own workload before relying on that figure.
+- Sorbet (`sorbet-runtime`) or RBS (`steep`) for gradual typing. Type-check business logic and public API surfaces; skip view code and trivial helpers.
+- Format with Standard Ruby (`standardrb`) or RuboCop with `rubocop-rails` + `rubocop-rspec`. Pin in CI; reformat-on-save in editors.
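The three language features named in the first bullet look like this in plain Ruby. The event shape and method names are hypothetical and chosen only for illustration.

```ruby
# Endless method: reasonable for a one-expression body.
def square(x) = x * x

# Pattern matching with case/in on a hash-shaped event (hypothetical shape).
def describe_event(event)
  case event
  in { type: "signup", user: { name: String => name } }
    "signup by #{name}"
  in { type: "purchase", amount: Integer | Float => amount } if amount.positive?
    "purchase of #{amount}"
  else
    "unknown event"
  end
end

# Rightward assignment destructures a hash into local variables.
# It raises NoMatchingPatternKeyError when a key is missing.
def full_name(person)
  person => { first:, last: }
  "#{first} #{last}"
end
```

Each form replaces a few lines of manual `if` / `fetch` code. Where a plain `def` or `Hash#fetch` reads just as clearly, the rule's "not as defaults" caveat applies.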
+## Project Layout (Rails)
+- Default Rails structure:
+  - `app/models/` — ActiveRecord models and POROs.
+  - `app/controllers/` — controllers (HTTP only).
+  - `app/views/` — templates (ERB / Slim / HAML).
+  - `app/components/` — ViewComponent (`view_component` gem) for reusable UI components.
+  - `app/services/<Domain>/` — service objects (single public `call` method).
+  - `app/jobs/` — Active Job / Sidekiq workers.
+  - `app/policies/` — Pundit policies (or equivalent authorization).
+- Service objects (`app/services/`) for multi-step business operations. Thin controllers → service object → return result struct. Never put complex logic in controllers or models.
+- Keep models focused: validations, associations, scopes. Move complex queries to query objects (`app/queries/`) and complex callbacks to dedicated observers / commands.
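A service object with a single public `call` and a result struct might look like the sketch below. It is plain Ruby so it runs without Rails. The `Billing` namespace, `ChargeCustomer` class, and `gateway` collaborator are hypothetical stand-ins for whatever lives under `app/services/<Domain>/`.

```ruby
module Billing
  # Result struct returned to the controller; callers branch on `success?`.
  Result = Struct.new(:success, :value, :error, keyword_init: true) do
    def success?
      success
    end
  end

  # Hypothetical service: one public `call`, collaborators injected.
  class ChargeCustomer
    def initialize(gateway:)
      @gateway = gateway
    end

    def call(customer_id:, amount_cents:)
      return failure("amount must be positive") unless amount_cents.positive?

      receipt = @gateway.charge(customer_id, amount_cents)
      Result.new(success: true, value: receipt, error: nil)
    rescue StandardError => e
      failure(e.message)
    end

    private

    def failure(message)
      Result.new(success: false, value: nil, error: message)
    end
  end
end
```

A controller action would then reduce to building the service, invoking `call`, and rendering based on `result.success?`.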
+## Rails 8.x
+- Rails 8.0 is the floor (Nov 2024 release). It ships Solid Queue, Solid Cache, and Solid Cable as database-backed defaults, so new apps can skip Redis unless throughput requires it.
+- Hotwire (Turbo + Stimulus) is the default for interactive UI — no separate SPA. Use `turbo_frame_tag` and `turbo_stream` responses for in-page updates without writing custom JavaScript.
+- Authentication: built-in `bin/rails generate authentication` scaffold (Rails 8 default). Use `Devise` only if the project needs OAuth / SAML out of the box.
+- Skip Webpacker — use the bundled `propshaft` asset pipeline + `importmap-rails` for ESM imports without a Node build step. Use `jsbundling-rails` (esbuild/rollup/vite) only when the project needs heavy JS tooling.
+## ActiveRecord
+- Declare strong parameters explicitly in controllers (`params.expect(user: [:name, :email])`). Mass-assignment vulnerabilities are real.
+- N+1 query prevention: eager-load with `.includes(:association)` or `.preload(:association)`. Use the `bullet` gem in development + CI to detect N+1 patterns.
+- Avoid `Model.all.each` over large tables — use `find_each(batch_size: 100)` for batched iteration with constant memory.
+- Migrations are forward-only in production. Mark destructive migrations with `safety_assured` (`strong_migrations` gem) only after review. Run migrations in a separate deploy step from code rollout to maintain rollback ability.
+- Inspect the query plan of complex scopes with `ActiveRecord::Relation#explain`; avoid hand-written SQL strings (use Arel or query objects for parameterized custom SQL).
+## Hotwire & ViewComponent
+- Turbo Frames (`turbo_frame_tag`) for in-page partial updates. Turbo Streams (`turbo_stream.replace`, `.append`, `.update`) for server-pushed UI updates over WebSocket / Server-Sent Events.
+- Stimulus controllers for client-side interactivity (`app/javascript/controllers/`). Keep controllers small (≤100 lines). Use Stimulus values + classes for state; never reach into other controllers' DOM.
+- ViewComponent (`view_component` gem) for testable, reusable UI components. Each component has a `*.rb` class and `*.html.erb` template with co-located preview (`spec/components/<name>_preview.rb`).
+- Avoid jQuery and ad-hoc JavaScript files — Stimulus and Turbo cover 90% of interactivity needs in Rails apps.
+## Background Jobs
+- Active Job with SolidQueue (Rails 8 default), Sidekiq (Redis-backed), or GoodJob (Postgres-backed). Pick one and document in `docs/architecture.md`.
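+- Configure retry policy explicitly: `retry_on StandardError, attempts: 3, wait: :polynomially_longer` (the current name for the former `:exponentially_longer`, which was deprecated in Rails 7.1). Do not rely on adapter retry defaults, which differ between backends.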
+- Idempotency keys for jobs touching external APIs — pass the key as a job argument, persist on first execution, no-op on retry with same key.
+- Set queue priorities with `queue_as` (`:critical`, `:default`, or `:low`): critical for user-facing, latency-sensitive work; low for background reporting.
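The idempotency-key bullet can be illustrated without Rails. In the sketch below, `CompletedKeys`, `SyncInvoiceJob`, and the `api` collaborator are hypothetical. The in-memory set stands in for persistent storage, and a production version would need a database-level uniqueness guarantee to close the window between the check and the write.

```ruby
require "set"

# Hypothetical in-memory record of completed keys. A real app would persist
# them, for example in a table with a unique index on the key.
class CompletedKeys
  def initialize
    @keys = Set.new
  end

  def done?(key)
    @keys.include?(key)
  end

  def mark_done(key)
    @keys.add(key)
  end
end

# Hypothetical job body, written as a plain class so it runs without Rails.
class SyncInvoiceJob
  def initialize(completed:, api:)
    @completed = completed
    @api = api
  end

  # The idempotency key travels as a job argument, as the bullet describes.
  def perform(invoice_id, idempotency_key)
    return :skipped if @completed.done?(idempotency_key)

    @api.push(invoice_id)                 # external side effect
    @completed.mark_done(idempotency_key) # recorded only after success
    :performed
  end
end
```

Because the key is recorded only after the external call succeeds, a retry after a failure still performs the work, while a retry after a success is a no-op.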
+## Testing
+- RSpec (`rspec-rails`) for new projects — `Capybara` for system tests. Minitest is acceptable for legacy / official-Rails-pattern projects.
+- Test types under `spec/`:
+  - `spec/models/`, `spec/services/`, `spec/jobs/` — unit tests.
+  - `spec/requests/` — request specs (full middleware stack, faster than feature specs).
+  - `spec/system/` — system tests (Capybara + headless Chrome).
+- Database cleanup: `database_cleaner-active_record` with `:truncation` for system tests, transactional fixtures for unit tests. Never use `DatabaseCleaner` against production-like data.
+- Mock HTTP with `webmock` + VCR for cassette-based replay. Never hit real network in tests.
+- Factory definitions in `spec/factories/` with `factory_bot_rails`. Avoid fixtures — they become stale and tightly coupled.
+- Coverage: `simplecov` with floor 80% in `app/`; 90% in `app/services/` and `app/policies/`.
+## Security
+- Brakeman in CI: `bundle exec brakeman --no-pager`. Block merge on high-confidence warnings.
+- Strong parameters on every controller action that mutates state. Never `params.permit!` blindly.
+- Authorization via Pundit policies (`app/policies/`). Controllers call `authorize @post` before mutations. Never authorize in views — too late.
+- CSRF: Rails enables `protect_from_forgery` by default. Do not disable globally; disable per-action only for explicit API endpoints with token auth.
+- Encrypted credentials: `bin/rails credentials:edit` for secrets at rest. Never commit `master.key` to VCS.
+## Bundler & Dependency Hygiene
+- Pin gems in `Gemfile` with pessimistic version constraints (`~> 7.2`). Avoid `gem 'foo'` without a version pin.
+- `Gemfile.lock` committed for applications. Library gems typically omit the lock.
+- Vulnerability scanning: `bundle audit check --update` (`bundler-audit` gem) against the rubysec/ruby-advisory-db. Block merge on advisories without acknowledged remediation.
+- License compliance: `license_finder` with an allowlist. Block GPL contamination.
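What a pessimistic constraint admits can be checked with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The `allowed?` helper name is hypothetical.

```ruby
require "rubygems"

# True when `version` satisfies a Gemfile-style constraint string.
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

allowed?("~> 7.2", "7.9.0")   # => true  (>= 7.2, < 8.0)
allowed?("~> 7.2", "8.0.0")   # => false
allowed?("~> 7.2.1", "7.2.5") # => true  (>= 7.2.1, < 7.3)
allowed?("~> 7.2.1", "7.3.0") # => false
```

The number of segments in the constraint sets how much drift is allowed: two segments permit minor upgrades, three permit only patch upgrades.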
+## Performance
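+- YJIT enabled in production (`config/boot.rb`: `RubyVM::YJIT.enable`, available from Ruby 3.3). `ruby --yjit --version` confirms the interpreter was built with YJIT; `RubyVM::YJIT.enabled?` confirms it is active in the running process.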
+- Profile with `rack-mini-profiler` in dev / staging; `vernier` or `stackprof` for production captures.
+- Use `Bullet` to catch N+1 queries in dev / CI. Treat N+1 violations as test failures.
+- Cache layer: `Rails.cache.fetch` for read-heavy data with explicit TTL. Use Solid Cache (Rails 8 default), Memcached, or Redis — pin one per environment.
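A runtime check for YJIT could be exposed from a health endpoint or a boot-time log line. The sketch below uses only `RubyVM::YJIT.enabled?`, and the `yjit_status` helper name is hypothetical.

```ruby
# Reports whether YJIT is active in the current process.
# :unavailable covers interpreters that do not define RubyVM::YJIT.
def yjit_status
  return :unavailable unless defined?(RubyVM::YJIT)

  RubyVM::YJIT.enabled? ? :enabled : :disabled
end
```

Logging the result once at boot makes it visible when a deploy silently loses the `--yjit` flag or the `RUBY_YJIT_ENABLE` variable.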
+## References
+- Ruby 3.3 release notes: https://www.ruby-lang.org/en/news/2023/12/25/ruby-3-3-0-released/ (accessed 2026-05-27, official-docs)
+- Rails 8 release notes: https://rubyonrails.org/2024/11/8/Rails-8-no-paas-required (accessed 2026-05-27, official-docs)
+- Hotwire docs: https://hotwired.dev/ (accessed 2026-05-27, official-docs)
+- ViewComponent: https://viewcomponent.org/ (accessed 2026-05-27, official-docs)
+## Cross-References
+- `rules/hatch3r-api-design.md` — REST contract floors apply to Rails API endpoints.
+- `rules/hatch3r-testing.md` — coverage thresholds carry over to `bundle exec rspec` + SimpleCov.
+- `rules/hatch3r-secrets-management.md` — credentials and `.env` handling patterns.

package/dist/content/rules/hatch3r-ruby-rails-patterns.mdc ADDED

@@ -0,0 +1,106 @@
+---
+description: Ruby 3.3+ and Rails 8.x conventions covering Hotwire (Turbo + Stimulus), ActiveRecord patterns, Sidekiq jobs, RSpec testing, RuboCop / Standard, and YJIT performance
+globs: ["**/*.rb", "**/*.rake", "**/Gemfile", "**/Gemfile.lock", "**/Rakefile", "**/config.ru", "**/.rubocop.yml", "**/.rubocop.yaml", "**/.standard.yml", "**/app/**", "**/config/**", "**/db/migrate/**", "**/lib/**", "**/spec/**", "**/test/**"]
+alwaysApply: false
+---
+# Ruby / Rails Patterns
+**Pillars:** P2 (Scientific & Practical Quality), CQ8 (Maintainability Quality)
+> Applies when the project ships a Ruby application. Detection signals: `Gemfile` at repo root, `config/application.rb` (Rails), `.ruby-version`, or any `*.rb` file. Sinatra and Hanami projects share most of the Ruby-level guidance here.
+## Ruby Language Floor
+- Target Ruby 3.3+ (3.4 recommended for new projects). Use pattern matching (`case/in`), rightward assignment (`x => y`), endless methods (`def square(x) = x * x`) when they improve readability — not as defaults.
+- Enable YJIT in production (`--yjit` flag or `RUBY_YJIT_ENABLE=1`). Reported throughput gains on Rails workloads are typically in the 15–25% range with no code changes; measure on your own workload before relying on that figure.
+- Sorbet (`sorbet-runtime`) or RBS (`steep`) for gradual typing. Type-check business logic and public API surfaces; skip view code and trivial helpers.
+- Format with Standard Ruby (`standardrb`) or RuboCop with `rubocop-rails` + `rubocop-rspec`. Pin in CI; reformat-on-save in editors.
+## Project Layout (Rails)
+- Default Rails structure:
+  - `app/models/` — ActiveRecord models and POROs.
+  - `app/controllers/` — controllers (HTTP only).
+  - `app/views/` — templates (ERB / Slim / HAML).
+  - `app/components/` — ViewComponent (`view_component` gem) for reusable UI components.
+  - `app/services/<Domain>/` — service objects (single public `call` method).
+  - `app/jobs/` — Active Job / Sidekiq workers.
+  - `app/policies/` — Pundit policies (or equivalent authorization).
+- Service objects (`app/services/`) for multi-step business operations. Thin controllers → service object → return result struct. Never put complex logic in controllers or models.
+- Keep models focused: validations, associations, scopes. Move complex queries to query objects (`app/queries/`) and complex callbacks to dedicated observers / commands.
+## Rails 8.x
+- Rails 8.0 is the floor (Nov 2024 release). It ships Solid Queue, Solid Cache, and Solid Cable as database-backed defaults, so new apps can skip Redis unless throughput requires it.
+- Hotwire (Turbo + Stimulus) is the default for interactive UI — no separate SPA. Use `turbo_frame_tag` and `turbo_stream` responses for in-page updates without writing custom JavaScript.
+- Authentication: built-in `bin/rails generate authentication` scaffold (Rails 8 default). Use `Devise` only if the project needs OAuth / SAML out of the box.
+- Skip Webpacker — use the bundled `propshaft` asset pipeline + `importmap-rails` for ESM imports without a Node build step. Use `jsbundling-rails` (esbuild/rollup/vite) only when the project needs heavy JS tooling.
+## ActiveRecord
+- Declare strong parameters explicitly in controllers (`params.expect(user: [:name, :email])`). Mass-assignment vulnerabilities are real.
+- N+1 query prevention: eager-load with `.includes(:association)` or `.preload(:association)`. Use the `bullet` gem in development + CI to detect N+1 patterns.
+- Avoid `Model.all.each` over large tables — use `find_each(batch_size: 100)` for batched iteration with constant memory.
+- Migrations are forward-only in production. Mark destructive migrations with `safety_assured` (`strong_migrations` gem) only after review. Run migrations in a separate deploy step from code rollout to maintain rollback ability.
+- Inspect the query plan of complex scopes with `ActiveRecord::Relation#explain`; avoid hand-written SQL strings (use Arel or query objects for parameterized custom SQL).
+## Hotwire & ViewComponent
+- Turbo Frames (`turbo_frame_tag`) for in-page partial updates. Turbo Streams (`turbo_stream.replace`, `.append`, `.update`) for server-pushed UI updates over WebSocket / Server-Sent Events.
+- Stimulus controllers for client-side interactivity (`app/javascript/controllers/`). Keep controllers small (≤100 lines). Use Stimulus values + classes for state; never reach into other controllers' DOM.
+- ViewComponent (`view_component` gem) for testable, reusable UI components. Each component has a `*.rb` class and `*.html.erb` template with co-located preview (`spec/components/<name>_preview.rb`).
+- Avoid jQuery and ad-hoc JavaScript files — Stimulus and Turbo cover 90% of interactivity needs in Rails apps.
+## Background Jobs
+- Active Job with SolidQueue (Rails 8 default), Sidekiq (Redis-backed), or GoodJob (Postgres-backed). Pick one and document in `docs/architecture.md`.
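+- Configure retry policy explicitly: `retry_on StandardError, attempts: 3, wait: :polynomially_longer` (the current name for the former `:exponentially_longer`, which was deprecated in Rails 7.1). Do not rely on adapter retry defaults, which differ between backends.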
+- Idempotency keys for jobs touching external APIs — pass the key as a job argument, persist on first execution, no-op on retry with same key.
+- Set queue priorities with `queue_as` (`:critical`, `:default`, or `:low`): critical for user-facing, latency-sensitive work; low for background reporting.
+## Testing
+- RSpec (`rspec-rails`) for new projects — `Capybara` for system tests. Minitest is acceptable for legacy / official-Rails-pattern projects.
+- Test types under `spec/`:
+  - `spec/models/`, `spec/services/`, `spec/jobs/` — unit tests.
+  - `spec/requests/` — request specs (full middleware stack, faster than feature specs).
+  - `spec/system/` — system tests (Capybara + headless Chrome).
+- Database cleanup: `database_cleaner-active_record` with `:truncation` for system tests, transactional fixtures for unit tests. Never use `DatabaseCleaner` against production-like data.
+- Mock HTTP with `webmock` + VCR for cassette-based replay. Never hit real network in tests.
+- Factory definitions in `spec/factories/` with `factory_bot_rails`. Avoid fixtures — they become stale and tightly coupled.
+- Coverage: `simplecov` with floor 80% in `app/`; 90% in `app/services/` and `app/policies/`.
+## Security
+- Brakeman in CI: `bundle exec brakeman --no-pager`. Block merge on high-confidence warnings.
+- Strong parameters on every controller action that mutates state. Never `params.permit!` blindly.
+- Authorization via Pundit policies (`app/policies/`). Controllers call `authorize @post` before mutations. Never authorize in views — too late.
+- CSRF: Rails enables `protect_from_forgery` by default. Do not disable globally; disable per-action only for explicit API endpoints with token auth.
+- Encrypted credentials: `bin/rails credentials:edit` for secrets at rest. Never commit `master.key` to VCS.
+## Bundler & Dependency Hygiene
+- Pin gems in `Gemfile` with pessimistic version constraints (`~> 7.2`). Avoid `gem 'foo'` without a version pin.
+- `Gemfile.lock` committed for applications. Library gems typically omit the lock.
+- Vulnerability scanning: `bundle audit check --update` (`bundler-audit` gem) against the rubysec/ruby-advisory-db. Block merge on advisories without acknowledged remediation.
+- License compliance: `license_finder` with an allowlist. Block GPL contamination.
+## Performance
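+- YJIT enabled in production (`config/boot.rb`: `RubyVM::YJIT.enable`, available from Ruby 3.3). `ruby --yjit --version` confirms the interpreter was built with YJIT; `RubyVM::YJIT.enabled?` confirms it is active in the running process.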
+- Profile with `rack-mini-profiler` in dev / staging; `vernier` or `stackprof` for production captures.
+- Use `Bullet` to catch N+1 queries in dev / CI. Treat N+1 violations as test failures.
+- Cache layer: `Rails.cache.fetch` for read-heavy data with explicit TTL. Use Solid Cache (Rails 8 default), Memcached, or Redis — pin one per environment.
+## References
+- Ruby 3.3 release notes: https://www.ruby-lang.org/en/news/2023/12/25/ruby-3-3-0-released/ (accessed 2026-05-27, official-docs)
+- Rails 8 release notes: https://rubyonrails.org/2024/11/8/Rails-8-no-paas-required (accessed 2026-05-27, official-docs)
+- Hotwire docs: https://hotwired.dev/ (accessed 2026-05-27, official-docs)
+- ViewComponent: https://viewcomponent.org/ (accessed 2026-05-27, official-docs)
+## Cross-References
+- `rules/hatch3r-api-design.md` — REST contract floors apply to Rails API endpoints.
+- `rules/hatch3r-testing.md` — coverage thresholds carry over to `bundle exec rspec` + SimpleCov.
+- `rules/hatch3r-secrets-management.md` — credentials and `.env` handling patterns.