@hopla/claude-setup 1.16.0 → 1.17.1

@@ -9,7 +9,7 @@
  {
  "name": "hopla",
  "description": "Agentic coding system: PIV loop, TDD, debugging, brainstorming, subagent execution, and team workflows",
- "version": "1.16.0",
+ "version": "1.17.1",
  "source": "./",
  "author": {
  "name": "Hopla Tools",
@@ -1,7 +1,7 @@
  {
  "name": "hopla",
  "description": "Agentic coding system for Claude Code: PIV loop (Plan → Implement → Validate), TDD, debugging, brainstorming, subagent execution, and team workflows",
- "version": "1.16.0",
+ "version": "1.17.1",
  "author": {
  "name": "Hopla Tools",
  "email": "julio@hopla.tools"
package/README.md CHANGED
@@ -233,6 +233,9 @@ After each PIV loop, run the `execution-report` skill + `/hopla:system-review` t
  | `brainstorm` | "let's brainstorm", "explore approaches" |
  | `debug` | "debug this", "find the bug", "why is this failing" |
  | `tdd` | "write tests first", "TDD", "red-green-refactor" |
+ | `refactoring` | "refactor", "clean up", "simplify", "extract", "deduplicate" |
+ | `performance` | "slow", "optimize", "bottleneck", "lento", "tarda mucho" |
+ | `migration` | "migrate", "upgrade", "switch from X to Y", "major version bump" |
  | `subagent-execution` | "use subagents", plans with 5+ tasks |
  | `parallel-dispatch` | "run in parallel", "parallelize this", independent tasks |

@@ -441,6 +444,7 @@ project/
  │ ├── rca/ ← Root cause analysis docs (commit)
  │ ├── execution-reports/ ← Post-implementation reports (commit)
  │ ├── system-reviews/ ← Process improvement reports (commit)
+ │ ├── audits/ ← Persistent audit reports (commit — opt-in)
  │ └── code-reviews/ ← Code review reports (don't commit — ephemeral)
  └── .claude/
  └── commands/ ← Project-specific commands (optional)
@@ -435,9 +435,15 @@ Create the following directories (with `.gitkeep` where needed):
  ├── rca/ <- /hopla:rca saves root cause analysis docs here (commit)
  ├── execution-reports/ <- the `execution-report` skill saves here (commit — needed for cross-session learning)
  ├── system-reviews/ <- /hopla:system-review saves here (commit — needed for feedback loop)
+ ├── audits/ <- persistent audit reports worth preserving (commit — opt-in; copy a code review here when you want to keep it)
  └── code-reviews/ <- the `code-review` skill saves here (do NOT commit — ephemeral, consumed by code-review-fix)
  ```

+ **Policy — `audits/` vs `code-reviews/`:**
+
+ - `code-reviews/` is **ephemeral working state**. Every run overwrites/adds files; `code-review-fix` consumes them and they become stale fast. Never commit.
+ - `audits/` is **persistent**. Move or copy a review here when it documents a finding the team should remember (security issue, architectural concern, post-mortem evidence). Commit.
+
  Add to `.gitignore` (create if it doesn't exist):
  ```
  .agents/code-reviews/
@@ -27,8 +27,9 @@ Read the following to understand the project:
  2. `README.md` — project overview and setup
  3. `package.json` or `pyproject.toml` — stack, dependencies, scripts
  4. `.agents/guides/` — if this directory exists, read any guides relevant to the feature being planned (e.g. `@.agents/guides/api-guide.md` when planning an API endpoint)
- 5. `MEMORY.md` (if it exists at project root or `~/.claude/`)check for user preferences that affect this feature (UI patterns like modal vs inline, keyboard shortcuts, component conventions)
- 6. `.agents/execution-reports/` if this directory exists, scan recent reports (last 3-5) for technical patterns discovered and gotchas relevant to the feature being planned. These contain real-world learnings from previous implementations that prevent re-discovering known issues.
+ 5. `.agents/specs/` — if this directory exists, scan for design specs that match the feature name. These come from the `brainstorm` skill and already document the chosen approach, files affected, edge cases, and open questions. If a matching spec exists, it is the authoritative design; the plan turns that design into tasks. If no spec exists and the feature is non-trivial, suggest running the `brainstorm` skill first.
+ 6. `MEMORY.md` (if it exists at project root or `~/.claude/`): check for user preferences that affect this feature (UI patterns like modal vs inline, keyboard shortcuts, component conventions)
+ 7. `.agents/execution-reports/` — if this directory exists, scan recent reports (last 3-5) for technical patterns discovered and gotchas relevant to the feature being planned. These contain real-world learnings from previous implementations that prevent re-discovering known issues.

  Then run:

@@ -89,7 +90,23 @@ Based on research, define:
  - **API surface enumeration (security/access control plans):** When the plan modifies access control, authorization, or data visibility, enumerate ALL API surfaces that serve the same data — REST endpoints, WebSocket handlers, Durable Object methods, and any other data paths. Each surface must be updated consistently. Add a task for each surface, not just the primary one.
  - **Role access matrix:** For features involving multiple user roles or multi-tenant access (admin, member, viewer, buyer, external user), define a matrix: what data does each role see? What endpoints does each role call? What filters apply per role?
  - **External integration buffer:** If the feature integrates an external API or third-party service, budget 2x the estimated time. Document: do we have working test credentials? Is the SDK tested in our runtime (Workers, Node, edge, etc.)? Are there known deprecations or version constraints?
- - **UI iteration budget:** For features with significant UI (new pages, complex forms, interactive grids), note that UI specifications are provisionalvisual polish typically needs iteration on user feedback. Specify what "good enough for v1" looks like vs. future polish so scope creep is not classified as plan failure.
+ - **UI iteration budget (MANDATORY when UI is detected):** If the feature touches a visible UI surface (see "UI detection heuristic" below), the plan's "Out of Scope" OR "Notes for Executing Agent" section MUST include the line `Expected UX iterations: N`, where N is an honest positive integer (default 2-4 for new components, 1-2 for tweaks to existing components). Do NOT write `0`; manual smoke surfaces 2-4 deviations on average for any plan that adds or modifies a label, panel, modal, or interactive flow. Plans that don't budget for this perceive routine UX feedback as scope creep. The original "what 'good enough for v1' looks like" guidance still applies; specify it alongside the iteration count.
+ - **UI detection heuristic:** A feature touches UI when ANY of the following matches the user's request OR the files the plan will create/modify:
+ - **Vocabulary triggers:** `component`, `panel`, `modal`, `dialog`, `card`, `badge`, `banner`, `label`, `button`, `flow`, `screen`, `page`, `view`, `render`, `layout`, `status indicator`, `tooltip`, `wizard`, `form`, `grid`, `table` (when interactive), `icon`
+ - **File path triggers:** `src/components/`, `src/pages/`, `src/views/`, `src/modules/*/components/`, `src/screens/`, files ending in `.tsx`, `.jsx`, `.vue`, `.svelte`
+ - **Plan task triggers:** any task with Action `create | modify` whose File path matches the above
+ - When uncertain, err on inclusion: a 2-line UX iteration budget costs nothing; a missing one re-classifies normal feedback as scope creep.
+ - **Domain Assumptions (MANDATORY when domain vocabulary is detected):** If the feature uses project-specific domain terms (see "Domain vocabulary detection heuristic" below), the plan MUST include a `## Domain Assumptions` subsection in the plan template — placed BEFORE `## Implementation Tasks` — listing each user-confirmable assumption as a bullet. Each bullet describes ONE inference the planner made about meaning, behavior, or business rule. The user confirms or corrects each bullet before implementation begins. This is distinct from `Verified Assumptions` (Phase 3, technical state) — Domain Assumptions cover semantic meaning that only the user can validate.
+ - **Domain vocabulary detection heuristic:** A feature uses domain vocabulary when ANY of the following category-based triggers fires (the categories below are universal — examples vary by project):
+ - **Entity state semantics:** any term describing a state, condition, or status of an entity in the project's domain. Generic examples: `Active` / `Inactive`, `Pending` / `Approved` / `Rejected`, `Open` / `Closed`, `Available` / `Unavailable`. Project-specific examples come from the project's own vocabulary.
+ - **Lifecycle states:** `draft` / `published` / `archived`, `created` / `processed` / `completed`, `submitted` / `reviewed` / `accepted`. Any sequence of states an entity passes through over time.
+ - **Grades / tiers / quality levels:** `A` / `B` / `C` grading, `premium` / `standard` / `basic`, `hot` / `warm` / `cold`, condition tiers, severity levels. Any ordinal classification with semantic boundaries.
+ - **Roles / permissions / multi-tenant access:** `admin` / `member` / `viewer` / `owner`, internal vs external users, buyer vs seller. Anything where access or behavior depends on who the actor is.
+ - **Color-coded UI states with semantic meaning:** green=ok / amber=warning / red=error, or any project-specific color → state mapping. The mapping itself is a domain assumption.
+ - **Business rules:** "when X happens, then Y is forbidden / required / triggered". Any conditional behavior that encodes a policy decision rather than a technical constraint.
+ - **Project-specific vocabulary harvest:** read the project's root `CLAUDE.md`. If a `Domain`, `Glossary`, `Vocabulary`, or similar section exists, harvest its terms. Any of those terms appearing in the user's request triggers the heuristic. **Fallback when no such section exists:** rely on the category-based triggers above.
+ - **Sentinel check:** if the planner wrote sentences in the plan that describe behavior using nouns or adjectives that are NOT in the user's verbatim request, those are inferred meanings — surface them as Domain Assumptions.
+ - When uncertain, err on inclusion: a `Domain Assumptions` subsection with 1-2 bullets is cheaper than a misaligned implementation surfaced during manual smoke.

  ## Phase 5: Generate the Plan

@@ -125,6 +142,16 @@ Key files the executing agent must read before starting:
  - `[path/to/file]` — [why it's relevant]
  - `[path/to/file]` — [why it's relevant]

+ ## Domain Assumptions (if applicable)
+
+ > Include this section when the feature uses domain vocabulary (entity states, lifecycle, grades/tiers, roles, color-coded states, business rules, or any project-specific terms in the project's CLAUDE.md). **Skip entirely** (do not write "N/A") when no domain semantics are involved.
+
+ Each bullet is a user-confirmable assumption the planner made about meaning, behavior, or business rule. The user MUST confirm or correct each bullet BEFORE execution begins.
+
+ - [Assumption 1 — entity state: e.g., "An entity with `status='archived'` is excluded from active list queries — confirm."]
+ - [Assumption 2 — derived/computed value: e.g., "The `expired` badge shows when `expiresAt < today` AND `status != 'archived'` — confirm formula."]
+ - [Assumption 3 — grade / tier / business rule: e.g., "Tier B users have read access to the audit log but cannot export it — confirm boundary."]
+
  ## Implementation Tasks

  > All fields are required. Use `N/A` if a field does not apply — never leave a field blank.
@@ -196,6 +223,8 @@ Scoring guide:
  [Any important context, warnings, or decisions made during planning that the executing agent needs to know]

  > **UI Styling Note:** UI styling specifications (colors, sizes, variants, labels, spacing) are `[provisional]` proposals — expect them to change once the user sees the implementation. Implement as specified but do not over-invest in pixel-perfect adherence; plan for iteration.
+ >
+ > **Expected UX iterations: N** — *(include this line when the feature touches a visible UI surface — component, panel, modal, label, card, dialog, button, or interactive flow; **omit entirely otherwise** — do not write "N/A")*. N is the honest number of visual iterations the planner expects on this feature during manual smoke. Forbidden: `0`. Default: `2-4` for new components, `1-2` for tweaks. The user should not classify iterations within this budget as scope creep.
  ```

  ---
@@ -232,6 +261,8 @@ Before saving the draft, review the plan against these criteria:
  - [ ] **Likely follow-ups listed:** If the Out of Scope section has items, the Likely Follow-ups section is populated with naturally adjacent work the user may request
  - [ ] **API surface enumeration (if security/access plan):** All parallel API surfaces (REST, WebSocket, DO) that serve the same data are listed with a task for each
  - [ ] **N+1 query check:** For every task that writes database queries or API calls, verify: is any call inside a loop? Could it be batched? Are there duplicate existence checks before mutations?
+ - [ ] **UX iteration budget declared:** If the feature touches UI per the Phase 4 heuristic, the plan includes `Expected UX iterations: N` (with N a positive integer ≥ 1) in Out of Scope or Notes for Executing Agent. If UI is NOT involved, the line is correctly absent (no `N/A`, no empty placeholder).
+ - [ ] **Domain Assumptions surfaced:** If the feature uses domain vocabulary per the Phase 4 heuristic, the plan includes a `## Domain Assumptions` subsection BEFORE `## Implementation Tasks`, with each bullet phrased as a user-confirmable statement. If no domain vocabulary is involved, the section is correctly absent (no `N/A`, no empty placeholder).

  ## Phase 7: Save Draft and Enter Review Loop

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@hopla/claude-setup",
- "version": "1.16.0",
+ "version": "1.17.1",
  "description": "Hopla team agentic coding system for Claude Code",
  "type": "module",
  "bin": {
@@ -0,0 +1,110 @@
+ ---
+ name: migration
+ description: "Phased migration workflow for upgrading dependencies, switching frameworks, or moving between systems. Use when the user says 'migrate', 'upgrade', 'switch from X to Y', 'move to', 'replace library', 'major version bump', 'deprecated', or when changing a framework/runtime/database version. Do NOT use for greenfield features or small refactors — use plan-feature or refactoring instead."
+ ---
+
+ > 🌐 **Language:** All user-facing output must match the user's language. Code, paths, and commands stay in English.
+
+ # Migration: Move Systems Without Breaking Them
+
+ ## Iron Rule
+
+ **Every migration needs a rollback plan before the first line changes.** If you cannot describe how to undo the migration in one sentence, you are not ready to start it.
+
+ ## Step 1: Classify the Migration
+
+ Ask the user (one question at a time):
+
+ - **Type**: dependency upgrade (major version), framework switch (e.g. Express → Hono), runtime switch (Node → Bun), data store (SQLite → Postgres), API version (v1 → v2)
+ - **Scope**: one module, one service, or the whole codebase?
+ - **Downtime tolerance**: blue/green, zero-downtime (dual-run), acceptable window?
+ - **Deadline driver**: deprecation, security, performance, or opportunistic?
+
+ This framing determines whether the work is a single PR or a multi-phase plan.
+
+ ## Step 2: Audit the Surface
+
+ Map **everything** that will be affected:
+
+ - Imports / usages of the old API (use `grep -r` or `rg` across the codebase)
+ - Public contracts that depend on current behavior (downstream callers, API consumers)
+ - Build / deploy steps tied to the current version
+ - Test suites that assume old behavior
+ - Documentation mentioning the old API
+
+ Write the inventory to `.agents/specs/migration-<topic>.md` with counts — "47 import sites across 12 files". Numbers help you size the work honestly.
+
+ ## Step 3: Read the Upgrade Notes
+
+ Before writing code, read the target's official migration guide / changelog end to end. Note:
+
+ - **Breaking changes** (renamed APIs, removed APIs, default-behavior flips)
+ - **Deprecations** (will break in N+2, not now)
+ - **Required minimum versions** for peer dependencies
+ - **Data-shape changes** that require a migration script
+
+ If the target project has no migration guide, treat it as higher risk and budget more time for exploration.
+
+ ## Step 4: Choose a Strategy
+
+ | Strategy | When to use |
+ |---|---|
+ | **Big bang** | Small codebase, low downstream coupling, clean cut possible |
+ | **Incremental with adapter** | Many call sites — introduce a thin wrapper that presents the old API on top of the new, migrate call sites one by one |
+ | **Dual-run (strangler fig)** | High-risk or zero-downtime — run both old and new side by side, shift traffic gradually |
+ | **Branch by abstraction** | Internal refactor + external API stays stable — hide the switch behind an interface |
+
+ Pick one and document the trade-off in the spec file.
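To make the "incremental with adapter" row concrete, here is a minimal TypeScript sketch — every name in it is hypothetical, not part of this package:

```ts
// Assumed shape of the NEW library's API (stand-in for illustration only).
async function request<T>(opts: { method: string; url: string }): Promise<T> {
  const res = await fetch(opts.url, { method: opts.method });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return (await res.json()) as T;
}

type Callback<T> = (err: Error | null, data?: T) => void;

// Adapter: keeps the OLD callback-style signature that existing call sites use,
// implemented on top of the new API. Call sites migrate one by one; when the
// last one is gone, delete this adapter.
export function getJson<T>(path: string, cb: Callback<T>): void {
  request<T>({ method: "GET", url: path })
    .then((data) => cb(null, data))
    .catch((err) => cb(err instanceof Error ? err : new Error(String(err))));
}
```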
+
+ ## Step 5: Plan the Phases
+
+ For anything non-trivial, run `/hopla:plan-feature` with `migration-<topic>` as the feature name. The plan should specify:
+
+ - **Phase boundaries** (compatibility shim in, call sites migrated, shim removed)
+ - **Rollback plan per phase** (revert commit? feature flag? dual-write?)
+ - **Validation at each phase** (test suite green, feature flags covered, canary metrics)
+ - **Data migration script** (if the storage layer changes) — idempotent, resumable
+
+ Each phase should land as its own PR.
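The "idempotent, resumable" requirement is easiest to see in code. A sketch under assumed names (a generic `db` helper and a `legacy_orders` → `orders` backfill):

```ts
// Hypothetical backfill: copy rows from `legacy_orders` to `orders`.
// Idempotent: re-running skips rows already marked as migrated.
// Resumable: progress is tracked per row, so a crash mid-run loses nothing.

type Row = { id: string; payload: unknown };

interface Db {
  query<T>(sql: string, params?: unknown[]): Promise<T[]>;
  execute(sql: string, params?: unknown[]): Promise<void>;
}

export async function migrateOrders(db: Db, batchSize = 500): Promise<void> {
  for (;;) {
    // Only pick rows not yet migrated — this is what makes re-runs safe.
    const batch = await db.query<Row>(
      "SELECT id, payload FROM legacy_orders WHERE migrated_at IS NULL LIMIT ?",
      [batchSize],
    );
    if (batch.length === 0) return; // nothing left — done (or already done)

    for (const row of batch) {
      // Upsert so a row copied just before a crash is not duplicated on resume.
      await db.execute(
        "INSERT INTO orders (id, payload) VALUES (?, ?) ON CONFLICT (id) DO NOTHING",
        [row.id, JSON.stringify(row.payload)],
      );
      await db.execute(
        "UPDATE legacy_orders SET migrated_at = CURRENT_TIMESTAMP WHERE id = ?",
        [row.id],
      );
    }
  }
}
```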
+
+ ## Step 6: Migrate With Guardrails
+
+ Execute phase by phase. After every phase:
+
+ - Run the full validation pyramid (`commands/guides/validation-pyramid.md`)
+ - Check for mixed-version pitfalls — modules importing both the old and new API in the same request
+ - Confirm the rollback path still works (git revert + redeploy, or feature flag off)
+
+ Never advance to the next phase if validation failed on the previous one.
+
+ ## Step 7: Remove the Old Path
+
+ Once every call site is migrated and observed green in production (where applicable):
+
+ - Delete the compatibility shim
+ - Remove the old dependency (`npm uninstall`, etc.)
+ - Remove the feature flag
+ - Update documentation to reference only the new path
+
+ This "cleanup" step is part of the migration. A migration left half-done with a permanent shim is worse than no migration.
+
+ ## Rules
+
+ - Never migrate on a Friday or before a public release
+ - Keep the rollback plan alive at every phase — if it stops working, pause
+ - Track breaking changes from the target's changelog in the spec, not in memory
+ - Data migrations must be idempotent and resumable — migrations fail mid-run
+ - If the migration drags past its original estimate by 2x, stop and reassess scope
+
+ ## Integration
+
+ - Use `/hopla:plan-feature` to generate the phased plan from the Step 2 inventory
+ - Use the `worktree` skill to keep the migration isolated from other work
+ - The `code-review` skill (checklist sections 2 and 5) catches dual-import patterns and pattern drift
+ - The `performance` skill verifies the migration did not regress hot paths
+
+ ## Next Step
+
+ Once the migration is planned:
+
+ > "Migration classified and inventoried. Saved spec to `.agents/specs/migration-<topic>.md`. Run `/hopla:plan-feature` to generate the phased implementation plan."
@@ -0,0 +1,102 @@
+ ---
+ name: performance
+ description: "Measured performance optimization workflow. Use when the user says 'slow', 'optimize', 'performance', 'bottleneck', 'too slow', 'high memory', 'high CPU', 'lento', 'tarda mucho', or when asking to make something faster. Do NOT use for correctness bugs or new features — use the debug or plan-feature skills instead."
+ ---
+
+ > 🌐 **Language:** All user-facing output must match the user's language. Code, paths, and commands stay in English.
+
+ # Performance: Measure Before You Change
+
+ ## Iron Rule
+
+ **No optimization without a measurement.** Every performance change must start with a number (latency, memory, query count) and end with a comparison. Guessing at hot paths wastes time and often makes things slower.
+
+ ## Step 1: Clarify the Symptom
+
+ Ask the user (one question at a time):
+
+ - What operation feels slow? (page load, API request, build, test run, specific query)
+ - How slow is it? (exact number if possible — "3 seconds", "30 MB", "10s with 100 items")
+ - What is "fast enough"? (target: < 500 ms p95, < 100 MB, etc.)
+ - Is it reproducible, or only under load?
+
+ Without a concrete target, you cannot declare the optimization done.
+
+ ## Step 2: Measure the Baseline
+
+ Pick the right tool for the symptom:
+
+ | Symptom | Measurement |
+ |---|---|
+ | Slow endpoint | `curl -w "%{time_total}"` or APM dashboard (see `guides/mcp-integration.md` for MCP options) |
+ | Slow DB query | `EXPLAIN ANALYZE` (Postgres), `EXPLAIN` (SQLite/MySQL) |
+ | Slow frontend render | Chrome DevTools Performance tab, React Profiler |
+ | Memory growth | `process.memoryUsage()` snapshots, heap dumps |
+ | Slow build/test | Time the command, compare against a clean cache |
+
+ Record the baseline with units. "3.2 s to load /dashboard with 1000 items" — not "it feels slow".
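For an endpoint, the baseline can be captured in a few lines of Node/TypeScript — a sketch with a placeholder URL and run count; substitute the operation that actually feels slow:

```ts
// Measure an endpoint N times and report p50/p95 so the baseline has real units.
async function measure(url: string, runs = 20): Promise<void> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await fetch(url); // the operation under test
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  const pct = (p: number) =>
    samples[Math.min(samples.length - 1, Math.floor((p / 100) * samples.length))];
  console.log(`p50=${pct(50).toFixed(1)} ms  p95=${pct(95).toFixed(1)} ms  (n=${runs})`);
}

await measure("http://localhost:3000/dashboard"); // placeholder URL
```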
+
+ ## Step 3: Identify the Hot Path
+
+ Rank suspects by where the baseline measurement actually spends its time:
+
+ - **N+1 queries** — are there loops calling the DB or an API?
+ - **Missing indexes** — does `EXPLAIN ANALYZE` show a seq scan on a large table?
+ - **Synchronous I/O** — is there a blocking call that could be awaited in parallel (`Promise.all`)?
+ - **Rendering** — are components re-rendering with unchanged props? Are lists virtualized?
+ - **Algorithm** — is there an O(n²) that could be O(n) with a map?
+ - **Caching** — is the same computation repeated without memoization?
+
+ Do **not** guess. Use the profiler output or query plan to pick one suspect.
+
+ ## Step 4: Apply One Change
+
+ Change one thing. Not three.
+
+ - Add the index
+ - Replace the loop with `Promise.all`
+ - Memoize the expensive selector
+ - Batch the API calls
+ - Virtualize the list
+
+ Keep the diff minimal so you can attribute the delta to this change alone.
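As an illustration of the "replace the loop with `Promise.all`" change, a before/after sketch (the `fetchUser` helper and its URL are hypothetical):

```ts
// Hypothetical stand-in for whatever per-item call the loop was making.
async function fetchUser(id: string): Promise<{ id: string; name: string }> {
  const res = await fetch(`https://api.example.com/users/${id}`); // placeholder URL
  return res.json();
}

// Before: N sequential awaits — total latency is the sum of every round trip.
async function loadUsersSequential(ids: string[]) {
  const users: { id: string; name: string }[] = [];
  for (const id of ids) {
    users.push(await fetchUser(id)); // one round trip per iteration
  }
  return users;
}

// After: the same calls issued concurrently — latency is roughly the slowest single call.
async function loadUsersParallel(ids: string[]) {
  return Promise.all(ids.map((id) => fetchUser(id)));
}
```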
+
+ ## Step 5: Measure Again
+
+ Re-run the exact same measurement from Step 2 under the same conditions. Report:
+
+ - Baseline: X
+ - After change: Y
+ - Delta: (X − Y) / X × 100 %
+ - Target: [target from Step 1]
+
+ If you did not hit the target, go back to Step 3 and pick the next suspect. If you regressed, revert and rethink.
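As a worked example: a 3.2 s baseline that drops to 0.9 s gives (3.2 − 0.9) / 3.2 × 100 % ≈ 72 % — a large win, but still short of a 500 ms p95 target, so the loop would continue with the next suspect.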
+
+ ## Step 6: Regression Guard
+
+ Once the target is met, add a guard so future changes do not erode the win:
+
+ - A test with a timeout assertion (e.g. `expect(duration).toBeLessThan(500)`)
+ - A query count assertion (e.g. `expect(dbQueries).toHaveLength(1)`)
+ - A bundle size budget, memory budget, or frame budget if applicable
+
+ Without a guard, the win decays.
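A regression guard can be a single test. A sketch assuming Vitest; `loadDashboard` and the 500 ms budget stand in for the real hot path and the Step 1 target:

```ts
import { describe, it, expect } from "vitest";
import { performance } from "node:perf_hooks";

// Hypothetical function that was optimized — replace with the real hot path.
import { loadDashboard } from "../src/dashboard";

describe("dashboard performance guard", () => {
  it("stays within the agreed latency budget", async () => {
    const start = performance.now();
    await loadDashboard({ items: 1000 });
    const duration = performance.now() - start;

    // Budget from Step 1 — fails loudly if a future change erodes the win.
    expect(duration).toBeLessThan(500);
  });
});
```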
+
+ ## Rules
+
+ - One suspect at a time — never stack optimizations before measuring
+ - Keep the baseline in the commit message so the win is auditable
+ - If the fix adds significant complexity for a small win (< 10 %), consider reverting
+ - Do not optimize code that is not actually hot — premature optimization hurts readability
+
+ ## Integration
+
+ - Use the `code-review` skill checklist section 3 (Performance Problems) for patterns to watch for
+ - If the optimization requires architectural changes, stop and run `/hopla:plan-feature`
+ - After the change lands, the `verify` skill will require the regression guard to run fresh
+
+ ## Next Step
+
+ After the target is met and a regression guard is in place:
+
+ > "Target hit: [baseline → result]. Regression guard added. Say 'commit' to trigger the `git` skill with a `perf:` conventional commit."
@@ -0,0 +1,84 @@
+ ---
+ name: refactoring
+ description: "Safe refactoring workflow with behavior preservation. Use when the user says 'refactor', 'clean up', 'simplify', 'extract', 'restructure', 'deduplicate', 'rename', or when asking to improve code structure without changing behavior. Do NOT use for bug fixes, new features, or performance work — use the debug, plan-feature, or performance skills instead."
+ ---
+
+ > 🌐 **Language:** All user-facing output must match the user's language. Code, paths, and commands stay in English.
+
+ # Refactoring: Restructure Without Changing Behavior
+
+ ## Iron Rule
+
+ **Behavior must be identical before and after.** If a refactor changes observable behavior — output, side effects, error shape, API surface — it is not a refactor. Stop and reclassify the work as a feature change or a bug fix.
+
+ ## Step 1: Confirm the Refactor Is Worth Doing
+
+ Ask the user (one question at a time):
+
+ - What is the current pain? (duplication, unclear naming, deep nesting, coupled modules)
+ - What is the desired structure? (extract helper, collapse abstraction, rename, move, inline)
+ - Is there a test suite, and does it cover the code being refactored?
+
+ If the answers reveal a missing test covering the target, **write the test first** (pin current behavior), then refactor. Untested refactors are rewrites.
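A characterization test that pins current behavior can be very small — a sketch assuming Vitest; `formatInvoiceNumber` and its expected outputs are hypothetical stand-ins for whatever the target does today:

```ts
import { describe, it, expect } from "vitest";

// Hypothetical target of the refactor — pin whatever it does TODAY,
// quirks included, so the refactor can prove behavior is unchanged.
import { formatInvoiceNumber } from "../src/invoices";

describe("formatInvoiceNumber (characterization)", () => {
  it("keeps today's observable behavior", () => {
    expect(formatInvoiceNumber(7, 2024)).toBe("INV-2024-0007");
    expect(formatInvoiceNumber(12345, 2024)).toBe("INV-2024-12345"); // no truncation today
    expect(() => formatInvoiceNumber(-1, 2024)).toThrow();           // current error shape
  });
});
```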
+
+ ## Step 2: Capture the Baseline
+
+ Run the project's validation commands from `CLAUDE.md` (or use `/hopla:validate`). Record:
+
+ - Lint / format — current state
+ - Types — current state
+ - Unit tests — pass/fail count
+ - Relevant integration tests — pass/fail
+
+ Every level must be green before starting. A refactor on top of red tests cannot prove it preserved behavior.
+
+ ## Step 3: Apply the Smallest Safe Change
+
+ Pick one refactor at a time:
+
+ - Extract function / module
+ - Rename (symbol, file)
+ - Inline (remove pointless indirection)
+ - Move (relocate to a better home)
+ - Deduplicate (merge two near-identical pieces)
+ - Replace conditional with polymorphism / table lookup
+ - Flatten / collapse nesting
+
+ **Do not** mix refactors. If the change wants to become a redesign, stop and suggest `/hopla:plan-feature`.
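For example, an "extract function" step in this spirit — a hypothetical before/after where callers and observable output stay identical:

```ts
// Before: the discount rule is buried inside the calculation (hypothetical example).
function checkoutTotalBefore(items: { price: number; qty: number }[]): number {
  let total = 0;
  for (const item of items) total += item.price * item.qty;
  return total > 100 ? total * 0.9 : total; // 10% off above 100
}

// After: the rule is extracted and named — same inputs, same outputs.
function applyBulkDiscount(total: number): number {
  return total > 100 ? total * 0.9 : total; // 10% off above 100
}

function checkoutTotalAfter(items: { price: number; qty: number }[]): number {
  const total = items.reduce((sum, item) => sum + item.price * item.qty, 0);
  return applyBulkDiscount(total);
}
```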
+
+ ## Step 4: Re-run the Baseline
+
+ After each refactor, re-run the same validation set from Step 2. Results must match exactly:
+
+ - Same lint result (0 new warnings unless whitelisted)
+ - Same type result
+ - Same pass/fail count on tests
+ - Same integration result
+
+ If anything diverges, the refactor leaked behavior — revert or fix before continuing.
+
+ ## Step 5: Commit at a Clean Boundary
+
+ When the baseline is restored and the refactor is coherent, suggest a commit via the `git` skill:
+
+ > "Refactor complete — behavior preserved. Say 'commit' to save it with a `refactor:` conventional commit."
+
+ ## Rules
+
+ - One refactor per commit — easier to review, easier to revert
+ - Never combine refactor + feature in the same commit
+ - Prefer many small refactors over one large one
+ - If the test suite is missing, add tests FIRST, then refactor (two commits minimum)
+ - Preserve public API unless the user explicitly approves a breaking change
+
+ ## Integration
+
+ - Pair with the `tdd` skill when adding characterization tests before a refactor
+ - Use the `code-review` skill after the refactor to confirm no pattern violations were introduced
+ - If the refactor touches many files, consider the `worktree` skill for isolation
+
+ ## Next Step
+
+ After the refactor passes validation:
+
+ > "Refactor complete and validated. Say 'commit' to trigger the `git` skill with a `refactor:` conventional commit."