npm - devrites - Versions diffs - 2.0.0 → 2.2.0 - Mend

devrites 2.0.0 → 2.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (24) hide show

package/CHANGELOG.md +17 -0
package/README.md +2 -1
package/docs/flow.md +1 -1
package/docs/orchestration.md +71 -0
package/docs/skills.md +1 -1
package/pack/.claude/agents/devrites-code-reviewer.md +34 -3
package/pack/.claude/agents/devrites-performance-reviewer.md +56 -8
package/pack/.claude/agents/devrites-security-auditor.md +21 -1
package/pack/.claude/rules/README.md +4 -2
package/pack/.claude/rules/afk-hitl.md +6 -0
package/pack/.claude/rules/core.md +1 -1
package/pack/.claude/rules/deprecation.md +50 -0
package/pack/.claude/rules/observability.md +60 -0
package/pack/.claude/rules/patterns.md +3 -0
package/pack/.claude/rules/performance.md +9 -0
package/pack/.claude/rules/security.md +31 -0
package/pack/.claude/skills/devrites-browser-proof/SKILL.md +23 -3
package/pack/.claude/skills/rite-prove/SKILL.md +10 -0
package/pack/.claude/skills/rite-review/reference/five-axis-review.md +15 -1
package/pack/.claude/skills/rite-review/reference/performance-checklist.md +80 -0
package/pack/.claude/skills/rite-review/reference/performance-review.md +4 -0
package/pack/.claude/skills/rite-seal/SKILL.md +11 -0
package/package.json +1 -1
package/scripts/validate.sh +15 -6

package/CHANGELOG.md CHANGED Viewed

@@ -2,6 +2,23 @@
 All notable changes to DevRites are documented here. The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and DevRites adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). Releases are generated automatically by [semantic-release](https://semantic-release.gitbook.io/) from Conventional Commits on `main`.
+## [2.2.0](https://github.com/ViktorsBaikers/DevRites/compare/v2.1.0...v2.2.0) (2026-06-23)
+### Added
+* **rules:** add observability + deprecation rules, OWASP LLM Top 10 ([0c29760](https://github.com/ViktorsBaikers/DevRites/commit/0c29760e504c22c18d7ca2c5e1986746ba95a07d))
+### Fixed
+* **release:** drop duplicate 2.1.0 changelog block ([0a7b510](https://github.com/ViktorsBaikers/DevRites/commit/0a7b5101ecd05e4c0b4d3812cefc68c0f92f1626))
+## [2.1.0](https://github.com/ViktorsBaikers/DevRites/compare/v2.0.0...v2.1.0) (2026-06-23)
+### Added
+* **agents:** code reviewer gets structural-depth lenses ([b37c9d2](https://github.com/ViktorsBaikers/DevRites/commit/b37c9d259b99613b971269fac2943977dedbea6d))
+* **agents:** perf reviewer gets Source/Measured CWV modes ([be82b20](https://github.com/ViktorsBaikers/DevRites/commit/be82b20b37252a9f55d77e0f71659ec58a213ca6))
 ## [2.0.0](https://github.com/ViktorsBaikers/DevRites/compare/v1.18.0...v2.0.0) (2026-06-23)
 ### ⚠ BREAKING CHANGES

package/README.md CHANGED Viewed

@@ -94,7 +94,7 @@ Full diagram set (lifecycle, polish orchestrator, review fan-out, debug loop,
 rules carrier, workspace state, namespace map) →
 [`docs/flow.md`](docs/flow.md).
-**Status:** [`v2.0.0`](https://github.com/ViktorsBaikers/DevRites/releases/tag/v2.0.0) — see [`CHANGELOG.md`](CHANGELOG.md) for release notes.
+**Status:** [`v2.2.0`](https://github.com/ViktorsBaikers/DevRites/releases/tag/v2.2.0) — see [`CHANGELOG.md`](CHANGELOG.md) for release notes.
 ## Contents
@@ -113,6 +113,7 @@ rules carrier, workspace state, namespace map) →
 [skills catalogue](docs/skills.md) ·
 [command map](docs/command-map.md) ·
 [flow diagrams](docs/flow.md) ·
+[orchestration](docs/orchestration.md) ·
 [usage examples](docs/usage.md) ·
 [release pipeline](docs/release.md) ·
 [engineering rules](pack/.claude/rules/README.md)

package/docs/flow.md CHANGED Viewed

@@ -177,7 +177,7 @@ demands. No carrier skill, no session-start autoload.
 ```mermaid
 flowchart TD
     R[rite-* skill<br/>step 0] -->|always-on| Core[.claude/rules/core.md]
-    R -->|on demand index| Idx[(.claude/rules/README.md<br/>15 specialist rule files)]
+    R -->|on demand index| Idx[(.claude/rules/README.md<br/>19 specialist rule files)]
     Idx --> CS[coding-style.md]
     Idx --> EH[error-handling.md]
     Idx --> T[testing.md]

package/docs/orchestration.md ADDED Viewed

@@ -0,0 +1,71 @@
+# Orchestration patterns
+How DevRites coordinates multiple agents — the patterns it uses, the ones it deliberately avoids,
+and where Claude Code's Agent Teams and worktree isolation fit. The rule that governs this lives in
+[`pack/.claude/rules/agents.md`](../pack/.claude/rules/agents.md); this doc is the map.
+## The model
+DevRites separates three roles and never blurs them:
+- **Orchestrator** — the active `rite-*` skill (chiefly `/rite-build` and `/rite-seal`). It owns
+  the gates and the `.devrites/` workspace, dispatches the other agents, and is the *single
+  canonical writer* of workspace state.
+- **Reviewers** — fresh-context, **read-only** subagents under `.claude/agents/`. Each gets the
+  workspace path + the diff and returns labelled findings; read-only is enforced at the tool layer
+  (`devrites-reviewer-readonly.sh`), not merely promised.
+- **Executor** — `devrites-slice-wright`, the one **write-capable** agent. It implements a single
+  fully-specified slice in a fresh context and returns code + tests; it never writes the `.devrites/`
+  bookkeeping (the orchestrator does).
+Fresh, undirected context is the point: an agent gets the contract, not the author's reasoning, so
+its judgment is independent.
+## Endorsed patterns
+1. **Direct (no orchestration).** Most phases are one skill doing one job. Don't spawn an agent for
+   work the phase can do inline.
+2. **Parallel read-only fan-out.** At `/rite-seal` the relevant reviewers run *in parallel*, then
+   the orchestrator reconciles — banding by confidence and surfacing genuine disagreement explicitly
+   rather than averaging it away.
+3. **Single writer.** Exactly one `devrites-slice-wright` per slice, never a parallel fan-out of
+   writers — concurrent writers make conflicting implicit decisions that corrupt a coherent design.
+4. **Lifecycle as user-driven verbs.** Each verb performs one mutation and stops; chaining is
+   explicit (the user types the next command), so there are no hidden side effects between phases.
+   `/rite-autocomplete` is the one deliberate exception — an opt-in unattended driver.
+5. **Adversarial single-claim check.** `devrites-doubt` spawns a fresh reviewer to try to refute one
+   load-bearing decision, rather than asking the author to re-grade their own work.
+## Anti-patterns DevRites avoids
+- **A persona that paraphrases another.** Passing one agent's summary to the next is lossy telephone;
+  reviewers read the raw diff and contract, not a digest of the author's reasoning.
+- **Parallel writers.** See single-writer above — two agents editing the same feature concurrently
+  is a merge of conflicting decisions, not a speed-up.
+- **A router that does the work.** `/rite` is a *thin dispatcher* — it renders the menu, resolves a
+  verb to a skill, and gets out of the way. It holds no phase logic and produces no artifact itself.
+  (This is the distinction that makes a router fine: the anti-pattern is a "meta-orchestrator"
+  persona that paraphrases and re-decides on every call, not a dispatch table.)
+- **Deep persona trees.** Agents don't call agents that call agents. The orchestrator dispatches one
+  level deep — reviewers and the wright — and reconciles the results itself.
+## Agent Teams and worktree isolation
+Two Claude Code capabilities sit adjacent to DevRites' model; the stance on each is deliberate.
+- **Agent Teams.** DevRites does **not** use Agent Teams to run the lifecycle. The on-disk workspace
+  plus fresh-context read-only fan-out already give independent context per agent without the
+  coordination overhead, and the lifecycle is intentionally single-slice / single-writer. Reach for
+  Agent Teams for work *outside* that discipline — competing-hypothesis debugging, or exploring
+  several genuinely different approaches in parallel — not to parallelise a DevRites build.
+- **Worktree isolation.** Orthogonal to DevRites. A single feature is single-writer on one branch by
+  design, so DevRites doesn't spawn worktrees itself — but you can run DevRites *inside* a git
+  worktree to drive two features in parallel without them colliding. The `.devrites/ACTIVE` sentinel
+  is per-working-tree, so each worktree carries its own active feature.
+## See also
+- [`pack/.claude/rules/agents.md`](../pack/.claude/rules/agents.md) — the reviewer / executor roster
+  and the when-to-fan-out rules.
+- [`architecture.md`](architecture.md) — the full layer model.
+- [`flow.md`](flow.md) — phase-by-phase flow and the public/internal namespace.

package/docs/skills.md CHANGED Viewed

@@ -154,7 +154,7 @@ The 9 model-invoked internal specialists (hidden from the menu): `devrites-inter
 | Skill | What It Does | Use When |
 |---|---|---|
-| (engineering rules) | Live at `.claude/rules/` post-install — each `rite-*` skill Reads `.claude/rules/core.md` as its first step (step 0); the other 15 rule files (`coding-style.md`, `error-handling.md`, `testing.md`, `code-review.md`, `security.md`, `performance.md`, `patterns.md`, `git-workflow.md`, `hooks.md`, `documentation.md`, `development-workflow.md`, `agents.md`, `context-hygiene.md`, `afk-hitl.md`, `anti-patterns.md`) load on demand per skill body. No carrier skill, no session-start autoload. | n/a — step-0 core / on-demand by path. |
+| (engineering rules) | Live at `.claude/rules/` post-install — each `rite-*` skill Reads `.claude/rules/core.md` as its first step (step 0); the other 19 rule files (`coding-style.md`, `prose-style.md`, `error-handling.md`, `testing.md`, `code-review.md`, `security.md`, `performance.md`, `observability.md`, `patterns.md`, `git-workflow.md`, `hooks.md`, `documentation.md`, `development-workflow.md`, `deprecation.md`, `agents.md`, `context-hygiene.md`, `anti-patterns.md`, `afk-hitl.md`, `tooling.md`) load on demand per skill body. No carrier skill, no session-start autoload. | n/a — step-0 core / on-demand by path. |
 ---

package/pack/.claude/agents/devrites-code-reviewer.md CHANGED Viewed

@@ -25,13 +25,44 @@ scope. Read `spec.md` (objective + acceptance criteria), `tasks.md`, `decisions.
 - **Tests first** — do they exist and would they fail if the code were wrong? Do they
   cover the acceptance criteria and the edge/error cases?
 - **Correctness** — logic, null/empty/boundary, error paths, races, wrong assumptions.
-- **Readability** — naming, function size, nesting, comments that explain *why*.
+- **Readability** — naming, function size, nesting, comments that explain *why*. Watch
+  for a new conditional **bolted onto an unrelated flow** (a design smell, not a nit —
+  the logic wants its own helper / state / policy) and **repeated conditionals on the
+  same shape**, which signal a missing model or dispatcher.
 - **Architecture** — right boundary, coupling/cohesion, fits existing patterns, no
-  premature abstraction.
-- **Maintainability** — dead code, leftover TODOs/logs, convention drift.
+  premature abstraction. Press three structural questions: does a refactor **reduce**
+  complexity or just **relocate** it (count the concepts a reader must hold — if a
+  "cleaner" version leaves that count unchanged, it isn't cleaner); is feature-specific
+  logic **leaking into a shared/general module** instead of its owning layer; is a **type
+  boundary** left implicit by a gratuitous `any`/`unknown`/cast or a silent fallback that
+  papers over an unclear invariant.
+- **Maintainability** — dead code, leftover TODOs/logs, convention drift. Watch **file
+  size, not just diff size**: a small diff that pushes an already-large file further past
+  a healthy boundary wants decomposition (extract helpers / split modules) *first* — flag
+  decompose-then-add.
 - **Standards** — conformance to the project's conventions and the DevRites rules
   (naming, error handling, security, git/commit hygiene where the diff touches them).
+## Structural depth — propose the move, not just the problem
+When you flag a structural finding, name the **remedy**, don't stop at "this is complex" —
+a finding that only describes the smell leaves the author guessing. Reach for a named
+restructuring and prefer the one that **removes moving pieces** over one that spreads the
+same complexity around:
+- Replace a chain of conditionals with a typed model or an explicit dispatcher.
+- Collapse duplicate branches into one clearer flow.
+- Separate orchestration from business logic so each reads on its own.
+- Move feature-specific logic out of a shared module into the package that owns it; reuse
+  the canonical helper instead of a bespoke near-duplicate.
+- Make a type boundary explicit so downstream branching disappears.
+- Delete a pass-through wrapper that adds indirection without clarifying the API.
+- Extract a helper, or split a large file into focused modules.
+Severity follows impact, not how structural it is: a real maintainability risk is
+**Important**; a behavior-preserving tidy-up the author can take or leave is a
+**Suggestion**. Lead with the structural finding — if you have one and ten nits, the
+structural one *is* the review. Stay in feature scope; a project-wide restructuring is an
+FYI follow-up, not a blocker on this diff.
 ## Rules
 - Stay in feature scope (touched files + diff). Out-of-scope problems → FYI follow-ups.
 - Do **not** edit code. Return findings only.

package/pack/.claude/agents/devrites-performance-reviewer.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: devrites-performance-reviewer
-description: Fresh-context, measure-first performance reviewer for /rite-seal. Use to independently review a DevRites feature diff for N+1s, hot-path work, payload/bundle size, and Core Web Vitals risks. Won't claim a slowdown without a number or a measurement to take.
+description: Fresh-context, measure-first performance reviewer for /rite-seal. Use to independently review a DevRites feature diff for N+1s, hot-path work, payload/bundle size, and Core Web Vitals risks. Runs in Source mode (static scan, findings tagged potential) or Measured mode (judges real Lighthouse/PSI/CrUX/trace numbers and leads with a source-labeled CWV scorecard). Won't claim a slowdown without a number or a measurement to take, and never presents lab data as field data.
 tools: Read, Grep, Glob, Bash
 hooks:
   PreToolUse:
@@ -17,19 +17,50 @@ You are measure-first: no performance claim without a number or a specified meas
 ## Inputs
 Workspace `.devrites/work/<slug>/`: read `spec.md` (any perf budget), `evidence.md`,
-`touched-files.md`. Run `git diff` and read the touched files.
+`touched-files.md`. Run `git diff` and read the touched files. Also look for Core Web
+Vitals artifacts — numbers already in `evidence.md`, a Lighthouse / PageSpeed Insights /
+CrUX JSON path the builder left, or a browser-proof capture in `browser-evidence.md`.
+Read the baseline checklist on demand (resolve the path like the readonly hook):
+```
+C=.claude/skills/rite-review/reference/performance-checklist.md
+[ -f "$C" ] || C="$CLAUDE_PLUGIN_ROOT/pack/.claude/skills/rite-review/reference/performance-checklist.md"
+[ -f "$C" ] || C=pack/.claude/skills/rite-review/reference/performance-checklist.md
+```
+## Two modes (the inputs set the mode, not a flag)
+- **Source mode** — default, no perf artifacts present. Scan the diff statically for
+  structural anti-patterns. Every frontend finding is **potential impact**, never a
+  measurement; name the command that would confirm it. Emit **no scorecard**.
+- **Measured mode** — a CWV artifact or a real number exists. Judge it against the
+  `spec.md` budget or the pre-change baseline, and lead with the scorecard.
+Source mode is the same discipline as the old "specify the measurement" — it just names
+the contract for when the scorecard appears.
 ## Review (feature scope)
-- **Backend** — N+1 queries, missing indexes on new queries, unbounded result sets,
-  per-request work that should be cached/batched, blocking sync work.
-- **Frontend (Core Web Vitals)** — LCP (oversized images, render-blocking work), CLS
-  (layout shift), INP (interaction latency), bundle growth, unnecessary re-renders.
+- **Backend** (always, every feature) — N+1 queries, missing indexes on new queries,
+  unbounded result sets, per-request work that should be cached/batched, blocking sync
+  work. *AI-codegen smell:* over-fetching "just in case", sequential `await`s where
+  `Promise.all` fits, redundant calls a dedup would collapse.
+- **Frontend (Core Web Vitals)** — only when the feature is UI-facing. Identify the
+  framework and rendering model first (React / Vue / Svelte / Angular / Next / Astro /
+  vanilla) and apply only that stack's idioms — don't recommend `next/image` to a Vue app
+  or `React.memo` to Svelte. Check LCP (oversized images, render-blocking work, missing
+  `fetchpriority`), CLS (layout shift, missing image dimensions), INP (long tasks, heavy
+  event handlers), bundle growth, unnecessary re-renders. *AI-codegen smell:* `memo` /
+  `useMemo` / `useCallback` wrapping everything, over-eager effect deps, broad watchers.
 - **General** — accidental quadratic loops, repeated hot-path work, large allocations.
 ## Measure-first discipline
-- If a real number exists in `evidence.md`, judge it against the budget/baseline.
+- If a real number exists, judge it against the budget/baseline; state the before/after.
 - If not, **specify the measurement** (command, scenario, metric) instead of asserting a
   regression. Distinguish "measured regression" from "likely hot spot, verify with X".
+- **Source-honesty.** Label every measured CWV value with where it came from —
+  `Field (CrUX)` (real users, p75), `Lab (Lighthouse)` (one synthetic run), or
+  `Trace (DevTools)`. Field and lab are not interchangeable; presenting one as the other
+  is fabrication. Reading static source cannot measure LCP / INP / CLS — never invent a
+  number for a value you did not capture.
 ## Rules
 - Don't edit. Findings only, labeled Critical / Important / Suggestion / Nit / FYI with
@@ -37,9 +68,26 @@ Workspace `.devrites/work/<slug>/`: read `spec.md` (any perf budget), `evidence.
   micro-opt with no measured impact is a Suggestion at most. Feature scope only.
 ## Output
+**Measured mode** — lead with a compact scorecard, then the line findings:
+```
+Performance review (<slug>) — independent
+Scorecard (source-labeled):
+  LCP <value>  <Field(CrUX) | Lab(LH) | Trace>  <Good/Needs Work/Poor>  (target ≤2.5s)
+  INP <value>  <source>                          <status>               (target ≤200ms)
+  CLS <value>  <source>                          <status>               (target ≤0.1)
+  [Lighthouse perf <score> Lab(LH)]  Artifacts: <which>  Stack: <detected>
+[Critical]/[Important] file:line — issue. measured: <number>. direction.
+[Suggestion]/[Nit]/[FYI] ...
+Budget: <breached? | none stated>
+Verdict: <blockers? none/list>
+```
+**Source mode** — no artifacts; one scorecard line, findings tagged `potential`:
 ```
 Performance review (<slug>) — independent
-[Important] file:line — issue. measured: <number | "measure: <cmd/metric>">. direction.
+Scorecard: not measured (Source mode)
+[Important] file:line — issue. potential; verify: <cmd/metric>. direction.
 [Suggestion]/[Nit]/[FYI] ...
 Budget: <breached? | none stated>
 To prove any win: <measure X before/after>

package/pack/.claude/agents/devrites-security-auditor.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: devrites-security-auditor
-description: Fresh-context security auditor for /rite-seal. Use to independently audit a DevRites feature diff for OWASP Top 10 issues, trust-boundary violations, secrets, and dependency risk. Adversarial — assumes input is hostile.
+description: Fresh-context security auditor for /rite-seal. Use to independently audit a DevRites feature diff for OWASP Top 10 issues, trust-boundary violations, secrets, dependency risk, and — when the feature has an AI/LLM surface (model calls, agents, RAG, tool-use) — the OWASP LLM Top 10. Adversarial — assumes input is hostile.
 tools: Read, Grep, Glob, Bash
 hooks:
   PreToolUse:
@@ -31,6 +31,25 @@ Workspace `.devrites/work/<slug>/`: read `spec.md` (data model / API / affected
 - **Dependencies** — new/updated packages free of known-vuln versions.
 - **Deserialization** of untrusted data.
+## AI / LLM surface (only when the feature calls a model / builds an agent / does RAG / exposes tool-use)
+Apply the OWASP LLM Top 10 (`.claude/rules/security.md` § AI / LLM features):
+- **Prompt injection (LLM01)** — untrusted text fenced as data, not concatenated into a
+  privileged prompt; no authority-widening.
+- **Improper output handling (LLM05)** — model output treated as untrusted input: escaped /
+  parameterized / validated before HTML, SQL, shell, or a tool call. Never `eval`/render/exec raw.
+- **Excessive agency (LLM06)** — least tools/scopes/autonomy; destructive or outbound actions
+  behind a model decision gated or allowlisted, not taken on the model's say-so.
+- **Disclosure / prompt leakage (LLM02 / LLM07)** — no secret in the system prompt or context;
+  authz server-side, not prompt-enforced; PII/secrets not fed to the model or logged.
+- **Supply chain & poisoning (LLM03 / LLM04 / LLM08)** — models, weights, datasets, and RAG/
+  embedding sources pinned, vetted, and treated as untrusted.
+- **Overreliance (LLM09)** / **unbounded consumption (LLM10)** — grounded + human-in-loop for
+  consequential calls; rate/token/cost/time limits on model calls.
+When the diff touches DevRites' own agent surface (new agent, hook, or tool grant), apply the same
+lens to the pack itself: confirm least agency (read-only at the tool layer where it should be),
+no secret in any prompt, and model/tool output not trusted as instructions.
 ## Trust boundary
 Apply the three-tier discipline per `.claude/rules/security.md`. Flag any value
 reaching the trusted tier without crossing the boundary.
@@ -49,5 +68,6 @@ Security audit (<slug>) — independent
 [Important]/[Suggestion]/[Nit]/[FYI] ...
 Boundary check: <skips? | clean>
 Dependencies: <audited; issues?>
+LLM surface: <n/a | audited; issues?>
 Verdict: <GO-able / NO-GO — blockers>
 ```

package/pack/.claude/rules/README.md CHANGED Viewed

@@ -11,8 +11,8 @@ prefers what's already there).
 ## Loading model — progressive disclosure
 To keep context lean, the rules follow Claude's progressive-disclosure pattern. There
-are 18 rule files (plus this README index): each DevRites `rite-*` skill Reads
-`.claude/rules/core.md` as its first step; the other 17 rule files load on demand by the
+are 20 rule files (plus this README index): each DevRites `rite-*` skill Reads
+`.claude/rules/core.md` as its first step; the other 19 rule files load on demand by the
 phase that needs them.
 ### Always-on (read by each `rite-*` skill as step 0)
@@ -32,11 +32,13 @@ phase that needs them.
 | `code-review.md` | Small PRs, severity labels, what to check, actionable feedback. | `/rite-review`, `/rite-seal`. |
 | `security.md` | Untrusted input, least privilege, secrets, three-tier trust boundary, fail closed. | When input / auth / data / integrations are in scope. |
 | `performance.md` | Measure first, common pitfalls, prove the win. | When perf is in scope. |
+| `observability.md` | Structured logs, metrics/SLIs, traces, symptom-based alerts, verify-the-telemetry-fires — proof a feature works in prod. | When the change has a runtime surface (endpoint, job, integration, user flow); `/rite-prove`, `/rite-seal`. |
 | `patterns.md` | SOLID, composition, loose coupling, avoid over-engineering. | `/rite-build`, simplification audit. |
 | `git-workflow.md` | Conventional Commits, atomic commits, small PRs. | `/rite-ship` commit / push / tag steps. |
 | `hooks.md` | Stage checks by cost, fast local hooks, secret scanning. | Reference-only — read when setting up the project's git hooks; not auto-loaded by any phase. |
 | `documentation.md` | Explain why, keep current, record decisions. | `/rite-spec`, `/rite-define`, `/rite-seal`. |
 | `development-workflow.md` | Small batches, trunk-always-green, definition of done. | `/rite-define`, `/rite-plan`. |
+| `deprecation.md` | Code-as-liability, Hyrum's law, prove-unused-before-remove, expand→contract, deprecate-before-delete. The safe path behind the irreversible-migration gate. | When removing / replacing / migrating code, a feature, an API, or data. |
 | `agents.md` | DevRites review subagents + specialist skills, when to fan out. | `/rite-review`, `/rite-seal`. |
 | `context-hygiene.md` | `/clear` vs `/compact`, lost-in-the-middle, phase-aware hygiene footer. | Phase-end hygiene footer; choosing `/clear` vs `/compact`. |
 | `anti-patterns.md` | Pack-wide rationalizations + red flags the agent reaches for. | Loaded by each `rite-*/reference/anti-patterns.md`; loaded directly when reluctance is broader than the active phase. |

package/pack/.claude/rules/afk-hitl.md CHANGED Viewed

@@ -107,6 +107,12 @@ The following always invoke the checkpoint protocol, regardless of `Mode`, `Gate
 - Filesystem destruction outside the workspace.
 - Red tests / types / lint on slice completion (fail-on-red).
+When a pause clears and you proceed with a destructive migration, a removal, or a
+public-API break, take the **safe path** the gate stopped you for: expand→contract,
+prove the old path unused before removing it, and a rollback for every destructive step
+([`deprecation.md`](deprecation.md)). The gate exists to make you do it right, not to
+abandon the work.
 By default, AFK widens what's *automatic*; it never widens what's *irreversible*.
 **Maximal autonomy (`allow_irreversible: true` — opt-in, dangerous).** Setting this key in

package/pack/.claude/rules/core.md CHANGED Viewed

@@ -1,7 +1,7 @@
 # DevRites core rules — always-on
 The minimal always-on subset of the DevRites engineering rules. DevRites
-`rite-*` skills Read `.claude/rules/core.md` as their first step; the other 15
+`rite-*` skills Read `.claude/rules/core.md` as their first step; the other 19
 rule files in this directory load on demand by the phase that needs them (see
 `README.md` for the index).

package/pack/.claude/rules/deprecation.md ADDED Viewed

@@ -0,0 +1,50 @@
+# Deprecation & migration
+Code is a liability, not an asset. Every line carries ongoing cost — bugs to fix, dependencies
+to patch, security to maintain, the next engineer to onboard. Most teams build well and remove
+badly, so dead and superseded code accumulates. Removing code that no longer earns its keep, and
+moving users safely off the old path, is its own discipline — and a riskier one than writing new
+code, because the failure mode is breaking something you forgot depended on it.
+DevRites already **gates** the dangerous moves: destructive migration, auth/authz change,
+public-API break, and data-loss paths always pause ([`afk-hitl.md`](afk-hitl.md) irreversible-risk
+list). The gate stops you; this rule is the safe path it stops you to take.
+## Code is a liability
+Unused code is pure cost with no return — deleting earned-out code is a feature, not housekeeping.
+But "I think this is unused" is a guess; prove it (below) before you act on it.
+## Hyrum's law — observable behavior is the contract
+With enough consumers, *every* observable behavior is depended on by someone — not just the
+documented API, but the quirks: timing, error shapes, ordering, the off-by-one nobody noticed.
+Removing or changing a behavior you think is incidental can break invisible dependents. Assume a
+consumer relies on it; verify against real usage rather than the spec's intent.
+## Prove it's unused before you remove it
+- Find the dependents: callers, subscribers, stored data, external clients, scheduled jobs. Use a
+  code-intelligence index for blast radius ([`tooling.md`](tooling.md)), then confirm against
+  **runtime** usage — a log/metric on the path proves zero traffic in a way static search can't
+  ([`observability.md`](observability.md)). No-usage-confirmed beats no-usage-assumed.
+- If you can't prove zero usage, you're not removing — you're deprecating (below).
+## Expand → contract (parallel change)
+Never big-bang a breaking change. Three independently-shippable, independently-reversible steps:
+1. **Expand** — add the new path alongside the old; both work.
+2. **Migrate** — move consumers and data to the new path; watch the old path's usage fall to zero.
+3. **Contract** — remove the old path *only once telemetry confirms it's unused*.
+The same shape applies to data: add column → backfill → switch reads → drop the old column. Every
+destructive step has a rollback, or it doesn't ship.
+## Deprecate before delete
+- Mark the old path deprecated with a pointer to the replacement and a **removal trigger** — a
+  date or a condition ("when v1 traffic hits zero"), not a vague "soon". A deprecation with no
+  trigger is a permanent TODO.
+- Emit a usage signal on the deprecated path so the removal trigger is a measured fact, not a
+  hope ([`observability.md`](observability.md)).
+## Scope
+A removal or migration is a feature with its own spec, slices, and evidence — not a drive-by
+delete while you're in another change. A surprise deletion in an unrelated diff is a red flag
+([`anti-patterns.md`](anti-patterns.md)), and a destructive step trips the seal's risk-and-rollback
+check regardless.

package/pack/.claude/rules/observability.md ADDED Viewed

@@ -0,0 +1,60 @@
+# Observability
+Observability is proof the feature works in **production** — the evidence ladder extended
+past your machine. `/rite-prove` shows it works on localhost; observability is how you know
+it still works, and why it broke, once real traffic hits it. Un-instrumented code is a claim
+you can't verify after deploy.
+## Scope — when this applies
+Only when the change has a runtime surface worth debugging in prod: a new endpoint/route, a
+background job, a queue consumer, an external integration, a user-facing flow, or a new error
+path. Skip it for pure-internal refactors, docs, config-only, or type-only changes — the same
+scope discipline as [`performance.md`](performance.md). Don't instrument a typo fix.
+## The on-call test
+The litmus for "is this observable": **if this breaks at 3am, can you tell *what* broke and
+*why* from the signals alone — without shipping a new build just to add logging?** If the
+answer is no, it isn't done. Instrument the failure path you just wrote, not only the happy
+path.
+## Structured logs
+- Log the events you'd need to reconstruct a failure: request boundaries, state transitions,
+  external-call outcomes, validation rejections, and authz denials.
+- Structured (key/value or JSON), not string soup — a log you can't query is a log you won't
+  read. Carry a correlation id (request / trace / job id) so one incident's lines join up.
+- **Never log secrets, tokens, or PII** ([`security.md`](security.md),
+  [`error-handling.md`](error-handling.md)). Levels mean something: `error` is a page-worthy
+  claim, not routine flow.
+## Metrics & SLIs
+- Cover the signals that page someone: request rate, error rate, latency/duration, and
+  saturation of any bounded resource the change adds (a pool, a queue, a cache).
+- Emit a counter on the **failure** branch, not just success — an error you don't count is an
+  error you can't alert on.
+- Name the one Service Level Indicator for the feature's critical path; pin a target (SLO)
+  when the project tracks them.
+## Traces (across a boundary)
+When a request crosses a service, queue, or async boundary, propagate a trace/correlation id
+so the end-to-end path is reconstructable, and span the external call and the slow operation.
+A latency regression you can't attribute to a span is a guess.
+## Alerts — symptom, not cause
+Alert on user-visible symptoms (error-rate spike, SLO burn), not on every internal gauge — a
+noisy alert gets muted, and a muted alert is no alert. Every alert names an owner and a first
+action, or it's noise.
+## Verify the telemetry fires (evidence, not assumption)
+Instrumentation you added but never watched emit is unproven — the same standing as a test you
+never saw fail ([`testing.md`](testing.md) "See it fail first"). Trigger the path, confirm the
+log line / metric / span actually appears, and record the observation in `evidence.md`. "I
+added logging" with no observed emission is not done.
+## Confirm-before-remove
+Telemetry is also how you prove a removal is safe: query real usage before deleting code or a
+feature, rather than assuming it's dead ([`deprecation.md`](deprecation.md)). No-usage-confirmed
+beats no-usage-assumed.
+## Scope discipline
+Instrument what the change touches. Retrofitting observability across a whole service is its
+own effort — record it as a follow-up, don't smuggle it into an unrelated change.

package/pack/.claude/rules/patterns.md CHANGED Viewed

@@ -26,6 +26,9 @@ makes the design simpler to reason about, not to show it's there.
   cost you pay forever for a benefit you may never get.
 - Don't force a pattern everywhere; not every problem needs one.
 - Watch the cost: misused patterns add memory/indirection/overhead and obscure intent.
+- A refactor must **reduce** complexity, not just **relocate** it. Count the concepts a
+  reader must hold; if a "cleaner" version leaves that count unchanged it isn't cleaner —
+  prefer the restructuring that makes whole branches/modes/layers disappear.
 ## Anti-patterns to name and avoid
 - God object / god function doing everything; tight coupling across layers.

package/pack/.claude/rules/performance.md CHANGED Viewed

@@ -21,6 +21,15 @@ Measure first. An optimization without a measurement is a guess that adds comple
 - Don't trade correctness or readability for a micro-win that doesn't matter.
 - Prefer a better algorithm or query over micro-tuning; the big wins are structural.
+## Frontend — Core Web Vitals
+For UI work, measure-first means LCP / INP / CLS judged against real numbers, each labeled
+by source (`Field (CrUX)`, `Lab (Lighthouse)`, `Trace (DevTools)`) — field and lab are not
+interchangeable, and static source cannot measure a CWV. The reviewer captures these via the
+browser-proof ladder when a budget exists, then judges them in Measured mode (a
+source-labeled scorecard); with no artifact it runs in Source mode and names the command.
+Baseline checks + measurement commands:
+[`rite-review/reference/performance-checklist.md`](../skills/rite-review/reference/performance-checklist.md).
 ## Scope
 Optimize what the change touches or what a measurement flags. Project-wide performance
 work is its own effort — record it as a follow-up, don't smuggle it into an unrelated

package/pack/.claude/rules/security.md CHANGED Viewed

@@ -61,3 +61,34 @@ still untrusted data, and a fresh observation of the live code always overrides
   frontmatter hook (`devrites-reviewer-readonly.sh`) so a redirection attempt can't become a
   write; the one write-capable agent (`devrites-slice-wright`) is fenced to its `touched-files.md`
   scope separately (`devrites-wright-scope.sh` + `reconcile.sh`).
+## AI / LLM features — the OWASP LLM Top 10
+When the feature *itself* calls a model, builds an agent, does RAG, or exposes tool-use, the
+attack surface is the model, not just the code around it. The prompt-injection section above is
+the defender's baseline — it hardens DevRites' own agents (LLM01 from the inside); apply the same
+untrusted-content discipline to the user's LLM surface, plus the rest of the taxonomy. Conditional,
+like the rest of this file: it applies when an LLM surface is in scope, not to every change.
+- **Prompt injection (LLM01)** — untrusted text (user input, retrieved docs, tool output) is
+  data, never instructions. Don't concatenate it into a privileged prompt; fence it, and never
+  let it widen the model's authority. (The agent baseline above.)
+- **Improper output handling (LLM05)** — model output is untrusted *input* to the next system.
+  Never `eval` / render / exec it raw — escape before HTML, parameterize before SQL, validate
+  before a tool call. A model that emits `<script>` or `DROP TABLE` is just another injection
+  vector.
+- **Excessive agency (LLM06)** — give the model the *least* tools, scopes, and autonomy the task
+  needs. A destructive or outbound action behind a model decision needs a human gate or a hard
+  allowlist, not the model's say-so. (DevRites enforces this on itself: reviewers are read-only at
+  the tool layer; the one writer is scope-fenced.)
+- **Sensitive-info disclosure (LLM02) / system-prompt leakage (LLM07)** — assume the system prompt
+  and context are extractable. Put no secret in them; keep authz server-side, never "the prompt
+  told it not to"; don't feed PII/secrets to a model or log prompts/outputs in the clear.
+- **Supply chain & poisoning (LLM03 / LLM04 / LLM08)** — pin and vet models, weights, and datasets
+  like dependencies; treat third-party models and training/RAG data as untrusted. Embedding and
+  retrieval sources are an injection and poisoning surface — validate what you index.
+- **Misinformation / overreliance (LLM09)** — the model can be confidently wrong. Ground answers,
+  cite sources, keep a human in the loop for consequential decisions, and don't present generated
+  content as verified fact.
+- **Unbounded consumption (LLM10)** — rate-limit, cap tokens/cost, and time-out model calls; an
+  open-ended prompt loop is both a DoS and a bill.

package/pack/.claude/skills/devrites-browser-proof/SKILL.md CHANGED Viewed

@@ -22,12 +22,32 @@ of the ladder that's available; record which one.
    the project's existing commands. Don't add a new framework.
 5. **Manual fallback** — none available: record the limitation + exact manual steps.
+## Core Web Vitals capture (when the spec states a perf budget)
+If `spec.md` carries a perf budget — or a frontend regression risk is visible — capture the
+CWV numbers here so the perf reviewer judges real data instead of guessing. Use the highest
+rung available; **detect, don't install**.
+1. **Chrome DevTools MCP** (if configured) — `lighthouse_audit` for LCP/INP/CLS + the
+   Lighthouse performance score. Source label: **Lab (Lighthouse)**. A `performance_*` trace
+   gives **Trace (DevTools)** attribution.
+2. **browser-harness** — drive the route, capture a performance trace over CDP. Source
+   label: **Trace (DevTools)**.
+3. **CrUX / PageSpeed Insights** — only if the user supplied an API key. Field data, p75.
+   Source label: **Field (CrUX)**.
+4. **None available** → mark CWV **pending (manual)** and name the exact command
+   (`npx lighthouse <url> --output json …`). Don't install anything; don't fake a number.
+Write each captured value **with its source label** to `evidence.md` (so the perf reviewer
+reads it in Measured mode) and note the tool + route in `browser-evidence.md`. Never present
+a lab value as a field value or vice versa.
 ## Evidence schema → `browser-evidence.md`
 Tooling used · route(s) · viewports (375/768/1280) · screenshot paths **opened and
 described** · console errors/warnings · network failures · interaction path tested ·
-accessibility basics · responsive checks · **design-reference match** (if the spec saved
-references in `references/`, compare the built UI to them and note match/diffs) ·
-limitations.
+accessibility basics · responsive checks · **CWV capture** (tool + route + each
+source-labeled value, or `pending (manual)` + the command) · **design-reference match** (if
+the spec saved references in `references/`, compare the built UI to them and note
+match/diffs) · limitations.
 ## Hard rules
 - A screenshot **path is not proof** — open it and describe what's visible.

package/pack/.claude/skills/rite-prove/SKILL.md CHANGED Viewed

@@ -37,6 +37,8 @@ the affected criteria/routes to refresh proof before `/rite-seal`.
 pull these via `Read` when relevant:
 - `testing.md` — pyramid, determinism, no-flake discipline.
 - `performance.md` — measure first when perf is in scope.
+- `observability.md` — when the change has a runtime surface (endpoint, job, integration,
+  user flow): telemetry must be present **and observed to emit**, not assumed.
 ## Operating rules
 - Evidence over confidence. Feature scope only — fix within the feature or record a
@@ -97,6 +99,14 @@ pull these via `Read` when relevant:
    example tests miss the edge cases these explore. If the same unit regenerated from a paraphrased
    spec (or a second sample) **diverges in behaviour on shared inputs**, treat that as a low-confidence
    signal: under AFK it blocks an auto-GO and routes to HITL.
+5b. **Observability check (runtime surface only).** If the feature added an endpoint, job,
+   queue consumer, external integration, user-facing flow, or a new error path, apply the
+   on-call test (`observability.md`): are the signals needed to debug a prod failure present —
+   structured logs on the failure path, a metric/counter on errors, a trace id across any
+   boundary? Then **observe them fire**: trigger the path and confirm the log line / metric /
+   span actually emits, and record that observation in `evidence.md`. Instrumentation never seen
+   emitting is unproven, not done. Skip entirely for pure-internal / docs / config / type-only
+   changes — don't instrument a typo fix.
 6. **On failure** → [failure-triage](reference/failure-triage.md) +
    `devrites-debug-recovery`. Reproduce → isolate → fix within scope → re-run; if a fix
    would exceed scope, record a blocker.

package/pack/.claude/skills/rite-review/reference/five-axis-review.md CHANGED Viewed

@@ -20,10 +20,21 @@ is complete, and to scope anything the agent could not (e.g. UI-only lenses belo
 ## 2. Readability
 - Can the next engineer understand it without the author? Naming, function length,
   nesting depth, comments that explain *why* not *what*.
+- Structural smells: a conditional **bolted onto an unrelated flow** (wants its own
+  helper/state/policy — a design smell, not a nit); **repeated conditionals on the same
+  shape** (a missing model or dispatcher).
 ## 3. Architecture
 - Right seam/boundary? Coupling and cohesion. Does it fit existing patterns or
   introduce a competing one? Is the abstraction earned (not premature)?
+- Does a refactor **reduce** complexity or just **relocate** it? Count the concepts a
+  reader must hold; a "cleaner" version that leaves that count unchanged isn't cleaner.
+- Is feature-specific logic **leaking into a shared module** instead of its owning layer?
+  Is a **type boundary** left implicit by a gratuitous `any`/cast or a silent fallback?
+- **Name the remedy, not just the smell** — replace a conditional chain with a typed
+  dispatcher, separate orchestration from business logic, move feature logic to its owning
+  package, delete a pass-through wrapper, split a large file. Prefer the move that removes
+  moving pieces over one that re-centralizes the same complexity.
 ## 4. Security
 - Trust boundaries, input validation, authz checks, secrets handling. Hand off to
@@ -43,4 +54,7 @@ is complete, and to scope anything the agent could not (e.g. UI-only lenses belo
 ## Sizing & speed
 Prefer reviewing roughly one slice / ~100 lines of meaningful change at a time. Larger
-diffs hide defects — recommend splitting rather than rubber-stamping.
+diffs hide defects — recommend splitting rather than rubber-stamping. Watch **file size,
+not just diff size**: a small diff that pushes an already-large file further past a healthy
+boundary wants decomposition (extract helpers / split modules) *first* — decompose, then
+add.

package/pack/.claude/skills/rite-review/reference/performance-checklist.md ADDED Viewed

@@ -0,0 +1,80 @@
+# Performance checklist (baseline for the perf reviewer)
+The minimum baseline `devrites-performance-reviewer` checks against, in feature scope. It
+is a checklist, not a mandate: flag what the diff actually touches or what a measurement
+flags, never a project-wide sweep. The philosophy lives in
+[`rules/performance.md`](../../../rules/performance.md) (measure first) — this file is the
+concrete what-to-look-for.
+## Core Web Vitals targets
+| Metric | Good | Needs work | Poor |
+|---|---|---|---|
+| LCP — Largest Contentful Paint | ≤ 2.5s | ≤ 4.0s | > 4.0s |
+| INP — Interaction to Next Paint | ≤ 200ms | ≤ 500ms | > 500ms |
+| CLS — Cumulative Layout Shift | ≤ 0.1 | ≤ 0.25 | > 0.25 |
+A CWV value only counts when it carries a source — `Field (CrUX)`, `Lab (Lighthouse)`, or
+`Trace (DevTools)`. No source → it's a Source-mode hypothesis, not a measurement.
+## Frontend (only when the feature is UI-facing)
+Identify the framework first; apply only its idioms.
+- **Images** — modern format (WebP/AVIF); responsive `srcset`/`sizes`; explicit
+  `width`/`height` to reserve space (CLS); below-the-fold `loading="lazy"`; the LCP image
+  gets `fetchpriority="high"` and is **not** lazy-loaded.
+- **JavaScript** — initial bundle stays small (≈200KB gzipped is the usual line); code-split
+  routes and heavy features; no blocking script in `<head>` without `defer`/`async`; break
+  long tasks (>50ms) so the main thread stays free (the main INP lever); `memo`/`useMemo`/
+  `useCallback` only where profiling shows a win, not wrapped over everything.
+- **CSS** — critical CSS not render-blocking; no per-render CSS-in-JS runtime cost in prod.
+- **Fonts** — 2–3 families max; WOFF2; self-hosted where possible; preload the LCP font;
+  `font-display: swap`; subset with `unicode-range`.
+- **Rendering** — no layout thrashing (batch reads, then writes); animate `transform`/
+  `opacity` only; virtualize long lists; `content-visibility: auto` for off-screen sections;
+  don't break bfcache (no `unload` handler, no `Cache-Control: no-store` on HTML).
+## Backend (every feature, UI or not)
+- No N+1 queries; eager-load / join / batch instead.
+- New queries have the indexes they need for their filter/sort columns.
+- List endpoints paginate — never an unbounded `SELECT *`.
+- Per-request work that doesn't change per call is cached or hoisted.
+- Responses compressed (gzip/brotli); bulk operations instead of a loop of single calls.
+## Network
+- Static assets cached with a long `max-age` + content hashing.
+- Known origins `preconnect`-ed; no unnecessary redirects; HTTP/2 or HTTP/3 where available.
+## AI-codegen perf smells (fold into the area above, not a separate finding category)
+- State duplicated instead of lifted; effects with over-broad deps that re-run needlessly.
+- Sequential `await`s where `Promise.all` / parallel fetch fits.
+- Over-fetching "just in case"; redundant calls a dedup would collapse.
+- Defensive memoization wrapping cheap components — cost with no benefit.
+## Modern APIs to consider (one-liners — reach for these, don't gate on them)
+`scheduler.yield()` to keep input responsive in long loops · Speculation Rules for
+prefetch/prerender · View Transitions on SPA navigation · Long Animation Frames (LoAF) for
+production INP attribution · `fetchpriority` on critical non-image resources.
+## Measurement commands (name these in Source mode; don't run installs in review)
+```bash
+# Lighthouse (lab)
+npx lighthouse <url> --output json --output-path ./report.json
+# or via the Chrome DevTools MCP CLI, no install:
+npx -p chrome-devtools-mcp chrome-devtools lighthouse_audit --output-format=json > report.json
+# Bundle size
+npx vite-bundle-visualizer        # Vite
+npx webpack-bundle-analyzer stats.json   # webpack
+# Field data: CrUX / PageSpeed Insights (real users, p75) — needs an API key
+```
+Field is what real users experienced; lab is one synthetic run. Don't present one as the
+other.

package/pack/.claude/skills/rite-review/reference/performance-review.md CHANGED Viewed

@@ -17,6 +17,10 @@ a hunch.
   unnecessary re-renders.
 - **General**: accidental quadratic loops, repeated work in hot paths, large allocations.
+The agent runs **Source mode** (static scan, findings tagged `potential` + the verify
+command) or **Measured mode** (real CWV numbers → a source-labeled scorecard). Baseline
+checks + measurement commands: [`performance-checklist.md`](performance-checklist.md).
 ## Optimize responsibly
 - Fix the measured bottleneck, then **re-measure** to prove the win (before/after in
   `evidence.md`). An optimization with no measured improvement is just added complexity.

package/pack/.claude/skills/rite-seal/SKILL.md CHANGED Viewed

@@ -18,6 +18,9 @@ pull these via `Read` before sealing:
 - `agents.md` — review-subagent fan-out at seal.
 - `code-review.md` — severity labels (Critical / Important / Suggestion / Nit / FYI).
 - `documentation.md` — record decisions in `decisions.md` before sealing.
+- `observability.md` — a runtime surface that ships blind is an Important finding.
+- `deprecation.md` — when the diff removes / migrates code, API, or data (read with the
+  risk-and-rollback step below).
 ## Operating rules
 - Evidence over confidence — a criterion is met only if evidence proves it.
@@ -70,6 +73,14 @@ Read `review.md` and the latest reviewer outputs.
    `/rite-temper`), confirm its **top pre-mortem risks are mitigated** in the diff/evidence and
    that no **Non-goal / deferred item crept into the diff** (scope creep) — either is a finding
    (an unmitigated top risk or smuggled-in out-of-scope work).
+   - **Observability** (`observability.md`): if the diff added a runtime surface (endpoint,
+     job, integration, user flow, error path), a feature shipping with no way to debug it in
+     prod is an **Important** finding, not a pass — `evidence.md` should show telemetry observed
+     to emit (`/rite-prove` step 5b).
+   - **Removal / migration** (`deprecation.md`): if the diff deletes or migrates code, an API,
+     or data, confirm it followed expand→contract, proved the old path unused before removing it,
+     and carries a rollback for every destructive step. A surprise deletion or a one-shot
+     breaking migration is a finding (and trips the irreversible-risk gate, `afk-hitl.md`).
 6. Check **frontend polish** if UI is involved (states, a11y, responsive, design-system,
    browser evidence).
 7. **Independent review** — seal is the final gate, not a re-run of `/rite-review`.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "devrites",
-  "version": "2.0.0",
+  "version": "2.2.0",
   "description": "DevRites — disciplined senior-engineer workflow skills pack for Claude Code",
   "license": "SEE LICENSE IN LICENSE",
   "homepage": "https://github.com/ViktorsBaikers/DevRites#readme",

package/scripts/validate.sh CHANGED Viewed

@@ -18,8 +18,8 @@ AGENT_FILES="devrites-spec-reviewer devrites-code-reviewer devrites-test-analyst
 # ---- 1. bash -n on every shell script ------------------------------------
 section "bash syntax (bash -n)"
-SH_LIST="$ROOT/install.sh $ROOT/uninstall.sh"
-for f in "$ROOT"/scripts/*.sh "$ROOT"/tests/*.sh "$ROOT"/pack/.claude/skills/*/scripts/*.sh; do [ -f "$f" ] && SH_LIST="$SH_LIST $f"; done
+SH_LIST="$ROOT/install.sh $ROOT/uninstall.sh $ROOT/update.sh"
+for f in "$ROOT"/scripts/*.sh "$ROOT"/tests/*.sh "$ROOT"/pack/.claude/hooks/*.sh "$ROOT"/pack/.claude/skills/*/scripts/*.sh; do [ -f "$f" ] && SH_LIST="$SH_LIST $f"; done
 for f in $SH_LIST; do
   if bash -n "$f" 2>/tmp/dr_synerr; then good "syntax ${f#$ROOT/}"; else bad "syntax ${f#$ROOT/}: $(cat /tmp/dr_synerr)"; fi
 done
@@ -174,12 +174,21 @@ else
   good "no false session-start autoload claim in pack/ docs/ README.md"
 fi
-# ---- 15. shellcheck (advisory) -------------------------------------------
-section "shellcheck (advisory)"
+# ---- 15. shellcheck (error = blocking, warning = advisory) ---------------
+# CI runners ship shellcheck, so the error-level gate is enforced on every PR.
+# Locally it self-skips when shellcheck is absent (the gate is non-blocking only
+# where the tool isn't installed — never silently downgraded where it is).
+section "shellcheck (-S error blocking · -S warning advisory)"
 if command -v shellcheck >/dev/null 2>&1; then
-  for f in $SH_LIST; do shellcheck -S warning "$f" || echo "  (shellcheck advisory only — not failing the build)"; done
+  for f in $SH_LIST; do
+    if shellcheck -S error "$f"; then good "shellcheck ${f#"$ROOT"/}"; else bad "shellcheck (error) ${f#"$ROOT"/}"; fi
+  done
+  # warning-level is informational — surfaced per file, never fails the build.
+  for f in $SH_LIST; do
+    shellcheck -S warning "$f" >/dev/null 2>&1 || echo "  advisory (warning-level): ${f#"$ROOT"/}"
+  done
 else
-  echo "skip: shellcheck not installed (optional)"
+  echo "skip: shellcheck not installed locally (optional — CI enforces the error-level gate)"
 fi
 # ---- summary -------------------------------------------------------------