npm - @brunosps00/dev-workflow - Versions diffs - 1.0.0 → 1.0.1 - Mend

@brunosps00/dev-workflow 1.0.0 → 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8) hide show

package/README.md +2 -0
package/package.json +1 -1
package/scaffold/en/agent-instructions.md +7 -3
package/scaffold/en/commands/dw-brainstorm.md +166 -23
package/scaffold/pt-br/agent-instructions.md +7 -3
package/scaffold/pt-br/commands/dw-brainstorm.md +166 -23
package/scaffold/skills/dw-simplification/SKILL.md +10 -5
package/scaffold/skills/dw-simplification/references/deep-modules.md +105 -0

package/README.md CHANGED Viewed

@@ -262,6 +262,8 @@ Incident response (`dw-incident-response`) adapted from [`wilsto/claude-code-sta
 LLM evaluation (`dw-llm-eval`) trajectory-match modes (strict / unordered / subset / superset) and tool-argument matching strategies adapted from [`langchain-ai/agentevals`](https://github.com/langchain-ai/agentevals) (MIT). The broader oracle-ladder framing, judge-calibration discipline, and reference-dataset principle are distilled from the open evaluations literature (OpenAI evals cookbook, Anthropic evals guidance, the academic eval-of-LLM body of work) and rewritten in our voice.
+Four patterns from [`mattpocock/skills`](https://github.com/mattpocock/skills) by Matt Pocock (MIT) were integrated **without adding new commands or skills** — instead they fold into the existing surface as internal modes and inline guidance: (1) **grill-with-docs** → `/dw-brainstorm` `grill` mode (interview discipline that stress-tests plan vocabulary against `.dw/rules/`); (2) **prototype** → `/dw-brainstorm` `prototype` mode (LOGIC terminal app or UI variant dispatch); (3) **improve-codebase-architecture** → `dw-simplification/references/deep-modules.md` (deletion test, locality, leverage, seam, adapter diagnostic invoked by the `refactor-audit` mode); (4) **zoom-out** → one-paragraph guidance in `agent-instructions.md`. The same release also collapses `/dw-brainstorm`'s six legacy flags into a single full-flow entry point that auto-dispatches modes based on project signals.
 ## Migration from v0.x (1.0.0 is a consolidation release)
 v1.0.0 consolidates 30 commands → 20. **The `dev-workflow update` command auto-removes obsolete wrappers** via the migrator that landed in v0.13.0; no manual action required.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@brunosps00/dev-workflow",
-  "version": "1.0.0",
+  "version": "1.0.1",
   "description": "AI-driven development workflow commands for any project. Scaffolds a complete PRD-to-PR pipeline with multi-platform AI assistant support.",
   "bin": {
     "dev-workflow": "./bin/dev-workflow.js"

package/scaffold/en/agent-instructions.md CHANGED Viewed

@@ -26,9 +26,7 @@ This project uses [`@brunosps00/dev-workflow`](https://www.npmjs.com/package/@br
 | "Just the code quality review" | `/dw-review --code-only` |
 | "Time to commit" / changes are validated and ready | `/dw-commit` |
 | "Open a PR" / "Ship this" | `/dw-generate-pr` |
-| "Brainstorm X" / "Explore ideas" | `/dw-brainstorm "X"` |
-| "Research X" / "Compare A vs B with citations" | `/dw-brainstorm --research "X"` |
-| "Code-health audit" / "Find tech debt" / "Refactoring opportunities" | `/dw-brainstorm --refactor` |
+| "Brainstorm X" / "Explore ideas" / "Research X" / "Code-health audit" / "Find tech debt" | `/dw-brainstorm "X"` (auto-dispatches grill / prototype / council / research / refactor-audit / onepager based on signals) |
 | "Where is X?" / "What uses Y?" / "How is Z structured?" | `/dw-intel "<question>"` |
 | "Rebuild the codebase index" / "Refresh intel" | `/dw-intel --build` |
 | "Redesign this UI" / "Audit and ship a new design" | `/dw-redesign-ui "<target>"` |
@@ -58,6 +56,12 @@ When any of these apply, answer directly and do **not** invoke a `dw-*` command:
 - User explicitly says "do this directly" / "skip autopilot" / "no need for a PRD" — honor it.
 - The conversation is already inside a `dw-*` flow (you're already executing tasks; don't start a new pipeline).
+## Zoom-out pattern (for unfamiliar code areas)
+When you land in an area of the codebase you don't know and orientation costs more than the task itself, **don't dive into files first** — ask an exploring agent to produce a map. Give it the project's domain glossary (`.dw/rules/index.md`) and tell it: "zoom out one level — show me the relevant modules, their public surfaces, who calls them, and the data flow between them, using domain glossary vocabulary." Get the lay of the land, then dive. This avoids the trap of reading the deepest file first and reconstructing the architecture from leaves upward.
+Adapted from [`mattpocock/skills/zoom-out`](https://github.com/mattpocock/skills/tree/main/zoom-out) (MIT).
 ## Workflow Reference
 ```

package/scaffold/en/commands/dw-brainstorm.md CHANGED Viewed

@@ -11,14 +11,34 @@ You are a brainstorming facilitator for the current workspace. This command exis
 ## Pipeline Position
 **Predecessor:** (user idea) | **Successor:** `/dw-plan prd`
-## Flags
+## How this command works (auto-dispatch, not flag switchboard)
-- **(default)**: normal brainstorm with 3-7 options (conservative, balanced, bold) and trade-offs. If the product has PRDs or rules, a **Product Inventory** is produced automatically and each option carries a classification tag.
-- **`--onepager`**: at the end of the brainstorm, generate a durable one-pager at `.dw/spec/ideas/<slug>.md` (using `.dw/templates/idea-onepager.md`) with Feature Inventory + Classification & Rationale + MVP Scope + Not Doing + Assumptions. Use when you want a persisted product artifact before moving to `/dw-plan prd`.
-- **`--council`**: after the normal brainstorm, invoke the `dw-council` skill to stress-test the top 2-3 options via 3-5 archetypes (pragmatic-engineer, architect-advisor, security-advocate, product-mind, devils-advocate). Useful when the choice is high-impact and there is genuine dissent between paths.
-- **`--research`**: heavyweight multi-source research mode. Pipeline: scope → plan → retrieve (parallel sources) → triangulate → outline-refine → synthesize → critique → refine → report. Output: cited research document. Use for state-of-the-art reviews, technology comparisons, regulatory landscape mapping. Sub-modes: `quick` (3 phases, 2-5min), `standard` (default, 6 phases, 5-10min), `deep` (8 phases, 10-20min), `ultradeep` (8+ phases, 20-45min).
-- **`--refactor`**: code-smell catalog mode. Audits a target directory or PRD scope for code smells using Martin Fowler's taxonomy (bloaters, change preventers, dispensables, couplers, conditional complexity, DRY violations). Maps each smell to a concrete refactoring technique with before/after sketches. Severity-ordered P0-P3 plan. Output: refactoring opportunities document.
-- Flags are composable where it makes sense: `--onepager --council` produces the one-pager after the council debate. `--research --onepager` saves the research output as a durable one-pager. `--refactor --onepager` saves the refactor plan as a durable one-pager. `--research --refactor` is NOT supported (pick one or the other — different ideation surfaces).
+`/dw-brainstorm` runs **FULL** by default. It opens with a **Signal Reading** phase that inspects the user's request, the project state (PRDs, rules, intel, recent commits), and the conversation so far, then **dispatches one or more internal modes**. The user does not pick a mode — the command does.
+Internal modes (the dispatcher picks 1+):
+| Mode | Auto-fires when |
+|------|-----------------|
+| **option-matrix** (always-on default) | Default surface: 3-7 options (conservative / balanced / bold) with `[IMPROVES] / [CONSOLIDATES] / [NEW]` tags. Always runs unless a hard override says otherwise. |
+| **grill** | Vocabulary feels unsettled — user terms differ from `.dw/rules/` / `.dw/constitution.md`, or two synonyms compete in the same conversation, or a contributor proposes a name that clashes with the glossary. |
+| **prototype** | User asks "is this state model right?" / "what should this look like?" with no clear answer; or the next sensible step is to RUN code, not write words. |
+| **council** | Two or more competing approaches surface with no obvious winner; or consensus forms too quickly (false-consensus signal). |
+| **research** | The question depends on external state-of-the-art ("what's the current best practice for X", multi-source comparisons, regulatory or framework landscape). |
+| **refactor-audit** | User points at a directory or describes a code area as "messy", "needs cleanup", "tech debt"; or a quarterly health-check is requested. |
+| **onepager** | The conversation has converged enough to deserve a durable product artifact (`.dw/spec/ideas/<slug>.md`); or the user signals they're about to call `/dw-plan prd` next. |
+Modes can chain inside one session — grill can surface a design question that the dispatcher then sends to prototype; refactor-audit can produce findings that the dispatcher sends to council for stress-testing.
+### Optional overrides (rarely needed)
+- **`--mode=<name>`** — force a specific dispatch and skip Signal Reading. Names: `option-matrix`, `grill`, `prototype`, `council`, `research`, `refactor-audit`, `onepager`. Combine with `+` to chain explicitly: `--mode=grill+onepager`.
+- **`--quiet`** — skip Signal Reading entirely and run only `option-matrix` as a minimal facilitator.
+Power users who already know what they want can pass `--mode=`. Everyone else gets auto-dispatch by default — the command reads the room and acts.
+### Migration note (transitional)
+Old flag invocations (`--onepager`, `--council`, `--research`, `--refactor`, `--grill`, `--prototype`) remain accepted for one minor cycle and map to the equivalent `--mode=` value. New code should use `--mode=` or rely on auto-dispatch.
 ## Decision Flowchart: Brainstorm vs Direct PRD
@@ -42,15 +62,16 @@ digraph brainstorm_decision {
 When available in the project under `./.agents/skills/`, use these skills to enrich ideation:
-- `dw-council` (opt-in via `--council`): multi-advisor stress-test of the most promising options with mandatory steel-manning and concession tracking. **DO NOT invoke by default** — only when the flag is present or when consensus forms too quickly (false-consensus signal).
-- `dw-ui-discipline`: use when brainstorming involves frontend or UI direction — its hard-gate (scene sentence, surface job) is a generative forcing function during ideation, not just a review check
-- `vercel-react-best-practices`: use when brainstorming React/Next.js architecture or performance trade-offs
-- `security-review`: use when brainstorming touches auth, data handling, or security-sensitive features
+- `dw-council`: invoked by the dispatcher's **council** mode — multi-advisor stress-test of the most promising options with mandatory steel-manning and concession tracking. The dispatcher fires it when 2+ paths tie OR consensus forms too quickly (false-consensus signal). Not invoked on every brainstorm — only when signals justify it.
+- `dw-simplification`: invoked by the dispatcher's **refactor-audit** mode — applies Chesterton's Fence + complexity metrics + the new **deep-modules** reference (deletion test, locality, leverage, interface depth) to every flagged smell.
+- `dw-ui-discipline`: use when brainstorming involves frontend or UI direction — its hard-gate (scene sentence, surface job) is a generative forcing function during ideation, not just a review check. Also picked up by the **prototype** mode's UI branch.
+- `vercel-react-best-practices`: use when brainstorming React/Next.js architecture or performance trade-offs.
+- `security-review`: use when brainstorming touches auth, data handling, or security-sensitive features.
 ## Template Reference
 - Brainstorm matrix template: `.dw/templates/brainstorm-matrix.md` (relative to workspace root)
-- Durable one-pager template: `.dw/templates/idea-onepager.md` (used with `--onepager` flag)
+- Durable one-pager template: `.dw/templates/idea-onepager.md` (used by the **onepager** mode)
 Use this command when the user wants to:
 - Generate ideas for product, UX, architecture, or automation
@@ -63,6 +84,19 @@ Use this command when the user wants to:
 <critical>The brainstorm is a **product-level** phase, not technical. DO NOT dive into architecture, stack, endpoints, schemas. That's the techspec's job. Here we work user journeys, value, features, and boundaries.</critical>
+### 0. Signal Reading (always first, unless `--quiet` or explicit `--mode=`)
+Before producing any output, **read the situation**:
+1. Inspect `.dw/spec/prd-*/`, `.dw/rules/`, `.dw/constitution.md`, `.dw/intel/` if they exist. Note the project's current vocabulary and recent PRDs.
+2. Inspect recent git activity (`git log --oneline -20`) to detect ongoing work.
+3. Re-read the user's request against the Auto-Dispatch table at the top of this file. Match signals to modes.
+4. Decide the dispatch: **option-matrix** always runs unless an explicit mode override skips it. Other modes (grill, prototype, council, research, refactor-audit, onepager) fire **additively** when their signals are present.
+5. State the dispatch decision to the user in one short line: e.g., "Dispatching: option-matrix + onepager (PRD is one step away)" or "Dispatching: grill (vocabulary unsettled in current PRD)". Do not bury this — surface it before running.
+6. Then proceed with the modes in this order (when chained): grill → research → option-matrix → council → refactor-audit → prototype → onepager. Skip modes not in the dispatch.
+### Standard flow (option-matrix mode)
 1. Start by summarizing the problem in 1 to 3 sentences.
 2. **Reframe as "How Might We"**: turn the raw idea into `How might we [verb] for [user] so that [outcome]?`. This pulls the team out of premature "solution mode".
 3. **Product Inventory (required if the product exists)**:
@@ -80,7 +114,7 @@ Use this command when the user wants to:
    - Approximate effort level
 7. Whenever it makes sense, include conservative, balanced, and bold alternatives.
 8. Close with a pragmatic recommendation and clear next steps.
-9. **If the `--onepager` flag is present**: at the end, generate `.dw/spec/ideas/<slug>.md` using `.dw/templates/idea-onepager.md`, filling Feature Inventory, Classification & Rationale, Recommended Direction (product language), MVP Scope (user stories), Not Doing, Key Assumptions, and Open Questions. Report the path to the user.
+9. **If the dispatcher selected `onepager` mode** (auto-fires when conversation has converged enough, or user signals they're heading to `/dw-plan prd` next): at the end, generate `.dw/spec/ideas/<slug>.md` using `.dw/templates/idea-onepager.md`, filling Feature Inventory, Classification & Rationale, Recommended Direction (product language), MVP Scope (user stories), Not Doing, Key Assumptions, and Open Questions. Report the path to the user.
 ## Preferred Response Format
@@ -104,7 +138,7 @@ Use this command when the user wants to:
 - Recommend 1 or 2 paths
 - Explain why they win in the current context
-### 6. One-pager (if `--onepager`)
+### 6. One-pager (if `onepager` mode fired)
 - Path of the created file at `.dw/spec/ideas/<slug>.md`
 ### 7. Next Steps
@@ -141,13 +175,15 @@ At the end, always leave the user in one of these situations:
 - With a clear recommendation (including an IMPROVES/CONSOLIDATES/NEW classification)
 - With better questions to decide
 - With a next workspace command to follow
-- With the one-pager at `.dw/spec/ideas/<slug>.md` (if `--onepager` was used)
-- With the research report at `~/Documents/<Topic>_Research_<date>/` (if `--research`)
-- With the refactor plan at `<target>/refactor-plan.md` (if `--refactor`)
+- With the one-pager at `.dw/spec/ideas/<slug>.md` (if **onepager** mode fired)
+- With the research report at `~/Documents/<Topic>_Research_<date>/` (if **research** mode fired)
+- With the refactor plan at `<target>/refactor-plan.md` (if **refactor-audit** mode fired)
+- With sharpened glossary entries in `.dw/rules/` (if **grill** mode fired)
+- With a runnable throwaway prototype + verdict template (if **prototype** mode fired)
-## Mode: `--research` (multi-source research)
+## Mode: research (multi-source research)
-Activated by the `--research` flag. Replaces the default brainstorm with a structured research pipeline that produces a cited document with verified claims.
+Fires when the question depends on external state-of-the-art (multi-source comparisons, framework / regulatory landscape, decisions needing cited evidence). Override: `--mode=research`. Replaces the default option-matrix with a structured research pipeline that produces a cited document with verified claims.
 <critical>Every factual claim MUST be cited immediately with [N] in the same sentence</critical>
 <critical>NEVER fabricate citations — if no source is found, say so explicitly</critical>
@@ -165,7 +201,7 @@ Activated by the `--research` flag. Replaces the default brainstorm with a struc
 ```
 Selection
 ├── Initial exploration → quick (3 phases, 2-5 min)
-├── Standard research → standard (6 phases, 5-10 min) [DEFAULT for --research]
+├── Standard research → standard (6 phases, 5-10 min) [DEFAULT for research]
 ├── Critical decision → deep (8 phases, 10-20 min)
 └── Comprehensive review → ultradeep (8+ phases, 20-45 min)
 ```
@@ -216,9 +252,9 @@ Target lengths: quick 2-4k words; standard 4-8k; deep 8-15k; ultradeep 15-20k+.
 - Label speculation explicitly.
 - Admit uncertainty: "No sources found for X."
-## Mode: `--refactor` (code-smell catalog)
+## Mode: refactor-audit (code-smell catalog + deep-modules)
-Activated by the `--refactor` flag. Audits a target codebase area for refactoring opportunities using Martin Fowler's smell taxonomy.
+Fires when the user points at a directory or describes a code area as "messy" / "needs cleanup" / "tech debt", or when a quarterly health-check is requested. Override: `--mode=refactor-audit`. Audits the target area for refactoring opportunities using Martin Fowler's smell taxonomy combined with the deep-modules analysis (deletion test, locality, leverage, interface depth) bundled in the `dw-simplification` skill.
 <critical>ASK EXACTLY 3 CLARIFICATION QUESTIONS BEFORE STARTING THE ANALYSIS</critical>
@@ -292,6 +328,111 @@ Saved to `<target>/refactor-plan.md`:
 - Proposing refactors that aren't tested or testable → high risk, won't ship.
 - Ignoring documented architectural decisions in `.dw/rules/` → flagging intentional design as smell.
+## Mode: grill (domain-grilling)
+Fires when vocabulary feels unsettled — user terms differ from `.dw/rules/` / `.dw/constitution.md`, two synonyms compete in the same conversation, or a contributor proposes a name that clashes with the glossary. Override: `--mode=grill`. Replaces the option-generation default with an **interview-style stress-test** of the plan/PRD against the project's domain vocabulary. Each round of questions sharpens one piece. Updates `.dw/rules/` (or `.dw/constitution.md`) inline as terms crystallize — never deferred to "after the conversation."
+<critical>Ask ONE question at a time. Wait for the answer. Don't dump a list of 5 questions and hope for the best.</critical>
+### When to use grill mode
+- Before `/dw-plan prd` when the domain feels unsettled or the team uses competing terms.
+- After `/dw-plan prd` when reviewers flag ambiguous language in the PRD.
+- During architectural discussion when "module", "service", "component" are being used interchangeably and you need to pin the canonical term.
+- When a contributor proposes a name that doesn't match the project's existing glossary.
+### During-session disciplines
+1. **Challenge against the glossary.** Read `.dw/rules/index.md` + per-module `.dw/rules/<module>.md` + `.dw/constitution.md`. Flag terminology conflicts the instant the user uses a term that differs from (or contradicts) what's already documented.
+2. **Sharpen fuzzy language.** When the user says "the user thing" or "the order stuff", propose a precise canonical term. Don't pretend you understood — push back.
+3. **Discuss concrete scenarios.** Force precision via specific edge cases: "What happens to the Order in state X when event Y arrives during retry Z?" Vague answers go back as further questions.
+4. **Cross-reference code.** When the user states a behavior, glance at the codebase to confirm it. Surface contradictions: "You said the API returns `OrderId` but `src/api/orders.ts:42` returns `{ order_id, status }`." Don't argue from generalities.
+5. **Update `.dw/rules/` inline.** When a term crystallizes, write it into the appropriate rules file in the same conversation turn. Lazy file creation: if the file doesn't exist, create it. Format follows the glossary discipline established by the project (see `.dw/rules/index.md` for shape).
+6. **Skip implementation details in the glossary.** `.dw/rules/` and `.dw/constitution.md` describe vocabulary and principles — not implementation. "Order: a customer's request to purchase one or more items, in one of these states: pending, paid, shipped, delivered, refunded" is fine. "Order: a TypeScript class in `src/orders/`" is implementation leak.
+### ADR creation discipline
+Only propose an ADR via `/dw-adr` when **all three** hold:
+| Criterion | Test |
+|-----------|------|
+| **Hard to reverse** | If we change this in 6 months, does it cost >1 week of work? |
+| **Surprising without context** | Would a new contributor reasonably reach a different decision? |
+| **Genuine trade-off** | Was there a real alternative we considered and chose against? |
+If any is missing, skip the ADR. Don't ADR every casual decision — that turns the ADR folder into noise.
+### Output
+grill mode produces:
+- **Updated `.dw/rules/<module>.md`** or `.dw/constitution.md` with crystallized terms.
+- **Updated PRD / TechSpec** if grilling happens mid-plan (terms in the artifact are aligned with the glossary).
+- **Optional `.dw/spec/<prd>/adrs/adr-NNN.md`** if criteria above hold.
+- **NO** option matrix or recommendation (that's option-matrix mode; grill is purely about sharpening). If the dispatcher chained grill+option-matrix, the option matrix runs in a separate phase.
+### When the discipline bends
+- **Greenfield project with no `.dw/rules/`**: grill anyway; the conversation produces the FIRST entries in `.dw/rules/index.md`. That's the value.
+- **Cosmetic terminology disagreements** ("should we call it `userId` or `user_id`?"): skip grill mode; use a coding-conventions ADR or `.dw/rules/index.md` Naming section.
+## Mode: prototype (throwaway prototype)
+Fires when the user asks "is this state model right?" / "what should this look like?" with no clear answer — i.e., the next sensible step is to RUN code, not write words. Override: `--mode=prototype`. Builds a **throwaway prototype that answers a single question**. The question decides the shape — pick a branch.
+<critical>The prototype is throwaway from day one. Don't polish. Don't add tests. Don't extract abstractions. The point is to LEARN something fast and then DELETE or absorb.</critical>
+### Pick a branch
+| User's question | Branch |
+|-----------------|--------|
+| "Does this state/logic model feel right?" | **LOGIC** — interactive terminal app that pushes the state machine through edge cases that are hard to reason about on paper. |
+| "What should this look like?" | **UI** — several radically different UI variations on a single route, switchable via a URL search param and a floating bottom bar. |
+If the question is ambiguous, ask the user. If user not reachable: default by surrounding code (backend module → LOGIC; page/component → UI) and state the assumption at the top of the prototype.
+### Rules (apply to both branches)
+1. **Throwaway from day one, clearly marked.** Place the prototype next to the module/page it's prototyping for (so context is obvious) but name it so a casual reader can see it's a prototype (`prototype-<slug>.ts`, `prototype-route.tsx`, etc.).
+2. **One command to run.** Whatever the project's task runner supports — `pnpm <name>`, `python <path>`, `bun <path>`, etc. The user must start it without thinking.
+3. **No persistence by default.** State lives in memory. Persistence is what the prototype is CHECKING, not what it depends on. If the question explicitly involves a database, hit a scratch DB or a local file with a clear `PROTOTYPE — wipe me` name.
+4. **Skip the polish.** No tests, no error handling beyond what makes the prototype runnable, no abstractions.
+5. **Surface the state.** After every action (LOGIC) or on every variant switch (UI), print or render the full relevant state so the user can see what changed.
+6. **Delete or absorb when done.** When the prototype has answered its question, either delete it or fold the validated decision into real code. Don't leave it rotting in the repo.
+### When done
+The **answer** is the only thing worth keeping. Capture it durably:
+- Commit message that closes the prototype: "removed prototype X; decided <answer> based on <observation>"
+- Or an ADR (if the criteria from grill mode hold)
+- Or `.dw/spec/<prd>/NOTES.md` if mid-PRD
+- Or an issue comment if user-driven
+If the user isn't around, leave a `PROTOTYPE VERDICT: <pending>` placeholder so the next pass can fill it in before deletion.
+### Output
+prototype mode produces:
+- **Throwaway code file(s)** in the appropriate location.
+- A `NOTES.md` next to the prototype with the QUESTION it's answering.
+- After the user runs it and answers the question, instructions to remove the prototype + capture the verdict.
+### Anti-patterns
+- Building a prototype that's actually a feature in disguise — production-quality code, tests, deployment config. That's not a prototype; it's a first draft.
+- Leaving the prototype in the repo "in case we need it later" — six months later it's load-bearing.
+- Not capturing the verdict — the prototype answered the question and the answer evaporated.
+- Multiple prototypes stacked up at once — pick one question, answer it, move.
 ## Inspired by
 The codebase-grounded idea refinement pattern is inspired by [`addyosmani/agent-skills@idea-refine`](https://skills.sh/addyosmani/agent-skills/idea-refine) (Addy Osmani, Google — 1.4K+ installs). Adaptations for dev-workflow:
@@ -301,6 +442,8 @@ The codebase-grounded idea refinement pattern is inspired by [`addyosmani/agent-
 - Output at `.dw/spec/ideas/<slug>.md` (sibling of `prd-<slug>/`) instead of `docs/ideas/` — preserves dev-workflow path conventions.
 - Integration with the existing pipeline: `/dw-plan prd` accepts the one-pager as input, reducing clarification questions.
-Credit: Addy Osmani and the `idea-refine` pattern.
+The **grill** and **prototype** modes are adapted from [`mattpocock/skills/grill-with-docs`](https://github.com/mattpocock/skills/tree/main/grill-with-docs) and [`mattpocock/skills/prototype`](https://github.com/mattpocock/skills/tree/main/prototype) (Matt Pocock, MIT). dev-workflow adaptation: integrated as INTERNAL auto-dispatched modes rather than separate skills, paths rebased on `.dw/rules/` + `.dw/spec/<prd>/`, ADR creation gated on the 3-criteria test (hard-to-reverse + surprising + genuine trade-off).
+Credit: Addy Osmani (idea-refine) and Matt Pocock (grill-with-docs, prototype).
 </system_instructions>

package/scaffold/pt-br/agent-instructions.md CHANGED Viewed

@@ -26,9 +26,7 @@ Este projeto usa [`@brunosps00/dev-workflow`](https://www.npmjs.com/package/@bru
 | "Só code review qualidade" | `/dw-review --code-only` |
 | "Hora de commitar" / mudanças validadas e prontas | `/dw-commit` |
 | "Abre um PR" / "Sobe isso" | `/dw-generate-pr` |
-| "Brainstorm X" / "Explora ideias" | `/dw-brainstorm "X"` |
-| "Research X" / "Compara A vs B com citações" | `/dw-brainstorm --research "X"` |
-| "Auditoria de saúde do código" / "Tech debt" / "Oportunidades de refactor" | `/dw-brainstorm --refactor` |
+| "Brainstorm X" / "Explora ideias" / "Research X" / "Auditoria de saúde do código" / "Tech debt" | `/dw-brainstorm "X"` (auto-dispatch dos modos grill / prototype / council / research / refactor-audit / onepager conforme os sinais) |
 | "Onde está X?" / "O que usa Y?" / "Como Z é estruturado?" | `/dw-intel "<pergunta>"` |
 | "Reconstrói o índice" / "Refresh do intel" | `/dw-intel --build` |
 | "Redesign dessa UI" / "Audita e entrega novo design" | `/dw-redesign-ui "<target>"` |
@@ -58,6 +56,12 @@ Quando qualquer destes se aplica, responda direto e **não** invoque comando `dw
 - Usuário diz explicitamente "faz direto" / "pula autopilot" / "não precisa de PRD" — honre.
 - A conversa já está dentro de um fluxo `dw-*` (você já está executando tasks; não inicie pipeline novo).
+## Padrão zoom-out (para áreas desconhecidas do código)
+Quando você cai numa área do codebase que não conhece e a orientação custa mais que a tarefa em si, **não mergulhe nos arquivos primeiro** — peça a um agente de exploração que produza um mapa. Passe o glossário de domínio do projeto (`.dw/rules/index.md`) e diga: "zoom out de um nível — me mostra os módulos relevantes, suas superfícies públicas, quem chama, e o fluxo de dados entre eles, usando o vocabulário do glossário de domínio." Pegue a visão geral, e só então mergulhe. Isso evita a armadilha de ler o arquivo mais profundo primeiro e reconstruir a arquitetura das folhas pra cima.
+Adaptado de [`mattpocock/skills/zoom-out`](https://github.com/mattpocock/skills/tree/main/zoom-out) (MIT).
 ## Referência de Workflow
 ```

package/scaffold/pt-br/commands/dw-brainstorm.md CHANGED Viewed

@@ -11,14 +11,34 @@ Você é um facilitador de brainstorming para o workspace atual. Este comando ex
 ## Posição no Pipeline
 **Antecessor:** (ideia do usuário) | **Sucessor:** `/dw-plan prd`
-## Flags
+## Como este comando funciona (auto-dispatch, não switchboard de flags)
-- **(padrão)**: brainstorm normal com 3-7 opções (conservadora, equilibrada, ousada) e trade-offs. Se o produto tem PRDs ou rules, **Product Inventory** é produzido automaticamente e cada opção recebe tag de classificação.
-- **`--onepager`**: ao final do brainstorm, gera one-pager durável em `.dw/spec/ideas/<slug>.md` (usando `.dw/templates/idea-onepager.md`) com Feature Inventory + Classification & Rationale + MVP Scope + Not Doing + Assumptions. Use quando quiser artefato persistido antes de seguir para `/dw-plan prd`.
-- **`--council`**: após o brainstorm normal, invoca a skill `dw-council` para stress-test das top 2-3 opções através de 3-5 archetypes (pragmatic-engineer, architect-advisor, security-advocate, product-mind, devils-advocate). Útil quando a escolha é de alto impacto e há genuine dissent entre caminhos.
-- **`--research`**: modo de research multi-source. Pipeline: scope → plan → retrieve (sources paralelos) → triangulate → outline-refine → synthesize → critique → refine → report. Output: documento citado. Use pra state-of-the-art reviews, comparações de tech, regulatory landscape mapping. Sub-modos: `quick` (3 fases, 2-5min), `standard` (default, 6 fases, 5-10min), `deep` (8 fases, 10-20min), `ultradeep` (8+ fases, 20-45min).
-- **`--refactor`**: modo catálogo de code smells. Audita um diretório-alvo ou escopo de PRD por smells usando taxonomia de Martin Fowler (bloaters, change preventers, dispensables, couplers, complexidade condicional, violações DRY). Mapeia cada smell pra técnica de refactoring com sketches before/after. Plano severity-ordered P0-P3. Output: documento de oportunidades de refactor.
-- Flags combináveis onde faz sentido: `--onepager --council` produz one-pager após debate. `--research --onepager` salva research como one-pager durável. `--refactor --onepager` salva plano de refactor como one-pager. `--research --refactor` NÃO suportado (escolha um — surfaces de ideação diferentes).
+`/dw-brainstorm` roda **FULL** por padrão. Abre com uma fase de **Signal Reading** que inspeciona o pedido do usuário, o estado do projeto (PRDs, rules, intel, commits recentes) e a conversa até agora, e então **dispara um ou mais modos internos**. O usuário não escolhe o modo — o comando escolhe.
+Modos internos (o dispatcher seleciona 1+):
+| Modo | Dispara automaticamente quando |
+|------|--------------------------------|
+| **option-matrix** (default sempre ativo) | Surface padrão: 3-7 opções (conservadora / equilibrada / ousada) com tags `[IMPROVES] / [CONSOLIDATES] / [NEW]`. Sempre roda salvo override explícito. |
+| **grill** | Vocabulário está instável — termos do usuário divergem de `.dw/rules/` / `.dw/constitution.md`, ou dois sinônimos competem na mesma conversa, ou alguém propõe um nome que conflita com o glossário. |
+| **prototype** | Usuário pergunta "esse modelo de estado faz sentido?" / "como isso deveria parecer?" sem resposta clara; ou o próximo passo razoável é RODAR código, não escrever palavras. |
+| **council** | Duas ou mais abordagens competem sem vencedor óbvio; ou o consenso se forma rápido demais (sinal de false-consensus). |
+| **research** | A pergunta depende de state-of-the-art externo ("qual é a best practice atual para X", comparações multi-fonte, landscape regulatório ou de framework). |
+| **refactor-audit** | Usuário aponta um diretório ou descreve uma área como "bagunçada", "precisa de limpeza", "tech debt"; ou pede health-check trimestral. |
+| **onepager** | A conversa convergiu o suficiente para merecer um artefato durável (`.dw/spec/ideas/<slug>.md`); ou o usuário sinaliza que vai chamar `/dw-plan prd` em seguida. |
+Modos podem encadear numa sessão — grill pode revelar uma pergunta de design que o dispatcher manda para prototype; refactor-audit pode produzir findings que o dispatcher manda para council pra stress-test.
+### Overrides opcionais (raramente necessários)
+- **`--mode=<nome>`** — força um dispatch específico e pula Signal Reading. Nomes: `option-matrix`, `grill`, `prototype`, `council`, `research`, `refactor-audit`, `onepager`. Combine com `+` para encadear explicitamente: `--mode=grill+onepager`.
+- **`--quiet`** — pula Signal Reading inteiramente e roda apenas `option-matrix` como facilitador mínimo.
+Power users que já sabem o que querem podem passar `--mode=`. Todo mundo mais ganha auto-dispatch por padrão — o comando lê a situação e age.
+### Nota de migração (transitória)
+Invocações antigas com flags (`--onepager`, `--council`, `--research`, `--refactor`, `--grill`, `--prototype`) continuam aceitas por um ciclo minor e mapeiam para o `--mode=` equivalente. Código novo deve usar `--mode=` ou confiar no auto-dispatch.
 ## Fluxograma de Decisão: Brainstorm vs PRD Direto
@@ -42,15 +62,16 @@ digraph brainstorm_decision {
 Quando disponíveis no projeto em `./.agents/skills/`, use para enriquecer a ideação:
-- `dw-council` (opt-in via `--council`): stress-test multi-advisor das opções mais promissoras com steel-manning obrigatório e concession tracking. **NÃO invocar por padrão** — só quando a flag está presente ou quando surge consenso rápido demais (sinal de false consensus).
-- `dw-ui-discipline`: use quando o brainstorm envolver frontend ou direção de UI — o hard-gate (scene sentence, surface job) é forcing function generativa durante ideação, não só check de review
-- `vercel-react-best-practices`: use quando explorar arquitetura React/Next.js ou trade-offs de performance
-- `security-review`: use quando o brainstorm tocar auth, manipulação de dados ou features sensíveis à segurança
+- `dw-council`: invocada pelo modo **council** do dispatcher — stress-test multi-advisor das opções mais promissoras com steel-manning obrigatório e concession tracking. O dispatcher dispara quando 2+ caminhos empatam OU consenso se forma rápido demais (sinal de false-consensus). Não roda em todo brainstorm — só quando os sinais justificam.
+- `dw-simplification`: invocada pelo modo **refactor-audit** — aplica Chesterton's Fence + métricas de complexidade + a nova referência **deep-modules** (deletion test, locality, leverage, interface depth) em todo smell flagueado.
+- `dw-ui-discipline`: use quando o brainstorm envolver frontend ou direção de UI — o hard-gate (scene sentence, surface job) é forcing function generativa durante ideação, não só check de review. Também usado pelo branch UI do modo **prototype**.
+- `vercel-react-best-practices`: use quando explorar arquitetura React/Next.js ou trade-offs de performance.
+- `security-review`: use quando o brainstorm tocar auth, manipulação de dados ou features sensíveis à segurança.
 ## Referência do Template
 - Template da matriz de brainstorm: `.dw/templates/brainstorm-matrix.md` (relativo ao workspace root)
-- Template do one-pager durável: `.dw/templates/idea-onepager.md` (usado com flag `--onepager`)
+- Template do one-pager durável: `.dw/templates/idea-onepager.md` (usado pelo modo **onepager**)
 Use este comando quando o usuario quiser:
 - gerar ideias para produto, UX, arquitetura ou automacao
@@ -63,6 +84,19 @@ Use este comando quando o usuario quiser:
 <critical>O brainstorm é fase **nível de produto**, não técnica. NÃO entre em arquitetura, stack, endpoints, schemas. Isso é trabalho do techspec. Aqui trabalhamos jornada do usuário, valor, features e fronteiras.</critical>
+### 0. Signal Reading (sempre primeiro, exceto com `--quiet` ou `--mode=` explícito)
+Antes de produzir qualquer output, **leia a situação**:
+1. Inspecione `.dw/spec/prd-*/`, `.dw/rules/`, `.dw/constitution.md`, `.dw/intel/` se existirem. Anote vocabulário atual e PRDs recentes.
+2. Inspecione git recente (`git log --oneline -20`) pra detectar trabalho em andamento.
+3. Releia o pedido do usuário contra a tabela de Auto-Dispatch no topo desse arquivo. Casa sinais com modos.
+4. Decida o dispatch: **option-matrix** sempre roda salvo override que pule. Outros modos (grill, prototype, council, research, refactor-audit, onepager) disparam **aditivamente** quando seus sinais estão presentes.
+5. Diga ao usuário em uma linha curta qual o dispatch decidido: ex. "Dispatch: option-matrix + onepager (PRD está um passo à frente)" ou "Dispatch: grill (vocabulário instável no PRD atual)". Não esconda — surface antes de rodar.
+6. Depois, execute os modos nessa ordem (quando encadeados): grill → research → option-matrix → council → refactor-audit → prototype → onepager. Pule modos fora do dispatch.
+### Fluxo padrão (modo option-matrix)
 1. Comece resumindo o problema em 1 a 3 frases.
 2. **Reformule como "How Might We"**: transforme a ideia bruta em `How might we [verbo] para [usuário] de forma que [resultado]?`. Isso tira o time de "solution mode" prematuro.
 3. **Product Inventory (obrigatório se o produto existe)**:
@@ -80,7 +114,7 @@ Use este comando quando o usuario quiser:
    - nível de esforço aproximado
 7. Sempre que fizer sentido, inclua alternativas conservadora, equilibrada e ousada.
 8. Feche com recomendação pragmática e próximos passos claros.
-9. **Se a flag `--onepager` estiver presente**: ao final, gerar `.dw/spec/ideas/<slug>.md` usando `.dw/templates/idea-onepager.md`, preenchendo Feature Inventory, Classification & Rationale, Recommended Direction (linguagem de produto), MVP Scope (user stories), Not Doing, Key Assumptions e Open Questions. Apresentar path ao usuário ao final.
+9. **Se o dispatcher selecionou o modo `onepager`** (auto-dispara quando a conversa converge ou usuário sinaliza que vai pra `/dw-plan prd`): ao final, gerar `.dw/spec/ideas/<slug>.md` usando `.dw/templates/idea-onepager.md`, preenchendo Feature Inventory, Classification & Rationale, Recommended Direction (linguagem de produto), MVP Scope (user stories), Not Doing, Key Assumptions e Open Questions. Apresentar path ao usuário ao final.
 ## Formato de resposta preferido
@@ -104,7 +138,7 @@ Use este comando quando o usuario quiser:
 - recomende 1 ou 2 caminhos
 - diga por que eles vencem no contexto atual
-### 6. One-pager (se `--onepager`)
+### 6. One-pager (se modo `onepager` disparou)
 - path do arquivo criado em `.dw/spec/ideas/<slug>.md`
 ### 7. Próximos passos
@@ -141,13 +175,15 @@ Ao final, sempre deixe o usuario em uma destas situacoes:
 - com uma recomendacao clara (incluindo classificação IMPROVES/CONSOLIDATES/NEW)
 - com perguntas melhores para decidir
 - com um proximo comando do workspace para seguir
-- com o one-pager em `.dw/spec/ideas/<slug>.md` (se `--onepager` foi usado)
-- com o relatório de research em `~/Documents/<Tópico>_Research_<data>/` (se `--research`)
-- com o plano de refactor em `<target>/refactor-plan.md` (se `--refactor`)
+- com o one-pager em `.dw/spec/ideas/<slug>.md` (se modo **onepager** disparou)
+- com o relatório de research em `~/Documents/<Tópico>_Research_<data>/` (se modo **research** disparou)
+- com o plano de refactor em `<target>/refactor-plan.md` (se modo **refactor-audit** disparou)
+- com entradas de glossário sharpened em `.dw/rules/` (se modo **grill** disparou)
+- com um protótipo throwaway rodável + template de verdict (se modo **prototype** disparou)
-## Modo: `--research` (research multi-fonte)
+## Modo: research (research multi-fonte)
-Ativado pela flag `--research`. Substitui o brainstorm padrão por um pipeline estruturado de research que produz documento citado com claims verificados.
+Dispara quando a pergunta depende de state-of-the-art externo (comparações multi-fonte, framework/regulatory landscape, decisões precisando de evidência citada). Override: `--mode=research`. Substitui o option-matrix padrão por um pipeline estruturado de research que produz documento citado com claims verificados.
 <critical>Cada afirmação factual DEVE ser citada imediatamente com [N] na mesma frase</critical>
 <critical>NUNCA fabrique citações — se não encontrar fonte, diga explicitamente</critical>
@@ -165,7 +201,7 @@ Ativado pela flag `--research`. Substitui o brainstorm padrão por um pipeline e
 ```
 Seleção
 ├── Exploração inicial → quick (3 fases, 2-5 min)
-├── Research padrão → standard (6 fases, 5-10 min) [DEFAULT pra --research]
+├── Research padrão → standard (6 fases, 5-10 min) [DEFAULT pra research]
 ├── Decisão crítica → deep (8 fases, 10-20 min)
 └── Review abrangente → ultradeep (8+ fases, 20-45 min)
 ```
@@ -216,9 +252,9 @@ Tamanhos-alvo: quick 2-4k palavras; standard 4-8k; deep 8-15k; ultradeep 15-20k+
 - Rotular especulação explicitamente.
 - Admitir incerteza: "Sem fontes encontradas para X."
-## Modo: `--refactor` (catálogo de code smells)
+## Modo: refactor-audit (catálogo de code smells + deep-modules)
-Ativado pela flag `--refactor`. Audita uma área-alvo do codebase por oportunidades de refactoring usando taxonomia de smells de Martin Fowler.
+Dispara quando o usuário aponta um diretório ou descreve uma área como "bagunçada" / "precisa de limpeza" / "tech debt", ou pede health-check trimestral. Override: `--mode=refactor-audit`. Audita a área-alvo por oportunidades de refactoring usando a taxonomia de smells de Martin Fowler combinada com a análise deep-modules (deletion test, locality, leverage, interface depth) embutida na skill `dw-simplification`.
 <critical>FAÇA EXATAMENTE 3 PERGUNTAS DE CLARIFICAÇÃO ANTES DE INICIAR ANÁLISE</critical>
@@ -292,6 +328,111 @@ Salvo em `<target>/refactor-plan.md`:
 - Propor refactors sem teste ou não-testáveis → alto risco, não shippa.
 - Ignorar decisões arquiteturais documentadas em `.dw/rules/` → flagar design intencional como smell.
+## Modo: grill (domain-grilling)
+Dispara quando o vocabulário está instável — termos do usuário divergem de `.dw/rules/` / `.dw/constitution.md`, dois sinônimos competem, ou alguém propõe um nome que conflita com o glossário. Override: `--mode=grill`. Substitui o option-matrix por um **stress-test estilo entrevista** do plan/PRD contra o vocabulário do projeto. Cada rodada sharpens um pedaço. Atualiza `.dw/rules/` (ou `.dw/constitution.md`) inline conforme termos cristalizam — nunca adia pra "depois da conversa".
+<critical>Pergunte UMA pergunta de cada vez. Espere a resposta. Não despeje 5 perguntas e torça pelo melhor.</critical>
+### Quando usar grill mode
+- Antes de `/dw-plan prd` quando o domínio parece instável ou o time usa termos competindo.
+- Depois de `/dw-plan prd` quando reviewers flagam linguagem ambígua no PRD.
+- Durante discussão de arquitetura quando "módulo", "serviço", "componente" são usados de forma intercambiável e precisa fixar o termo canônico.
+- Quando alguém propõe um nome que não combina com o glossário existente do projeto.
+### Disciplinas durante a sessão
+1. **Desafie contra o glossário.** Leia `.dw/rules/index.md` + `.dw/rules/<modulo>.md` + `.dw/constitution.md`. Flague conflitos de terminologia no instante em que o usuário usa um termo que diverge do que já está documentado.
+2. **Sharpen linguagem vaga.** Quando o usuário disser "a coisa do user" ou "aquele lance de pedidos", proponha um termo canônico preciso. Não finja que entendeu — empurre de volta.
+3. **Discuta cenários concretos.** Force precisão com edge cases específicos: "O que acontece com a Order no estado X quando o evento Y chega durante o retry Z?" Respostas vagas voltam como mais perguntas.
+4. **Cross-reference o código.** Quando o usuário afirmar um comportamento, olhe rápido no codebase pra confirmar. Surface contradições: "Você disse que a API retorna `OrderId` mas `src/api/orders.ts:42` retorna `{ order_id, status }`." Não argumente em generalidades.
+5. **Atualize `.dw/rules/` inline.** Quando um termo cristaliza, escreva no arquivo de rules apropriado no mesmo turn da conversa. Lazy file creation: se o arquivo não existir, crie. Formato segue a disciplina de glossário do projeto (ver `.dw/rules/index.md`).
+6. **Pule detalhe de implementação no glossário.** `.dw/rules/` e `.dw/constitution.md` descrevem vocabulário e princípios — não implementação. "Order: pedido de um cliente para comprar itens, em um destes estados: pending, paid, shipped, delivered, refunded" é bom. "Order: uma classe TypeScript em `src/orders/`" é vazamento de implementação.
+### Disciplina de criação de ADR
+Só proponha um ADR via `/dw-adr` quando **todos os três** valem:
+| Critério | Teste |
+|----------|-------|
+| **Difícil de reverter** | Se mudarmos em 6 meses, custa >1 semana de trabalho? |
+| **Surpreendente sem contexto** | Um novo contribuinte chegaria razoavelmente a uma decisão diferente? |
+| **Trade-off real** | Havia uma alternativa real considerada e descartada? |
+Se algum falta, pule o ADR. Não ADR toda decisão casual — vira ruído na pasta de ADRs.
+### Output
+O modo grill produz:
+- **`.dw/rules/<modulo>.md` ou `.dw/constitution.md` atualizado** com termos cristalizados.
+- **PRD / TechSpec atualizado** se grill rodou no meio do planejamento (termos alinhados com o glossário).
+- **`.dw/spec/<prd>/adrs/adr-NNN.md` opcional** se os critérios acima valem.
+- **NÃO** produz option matrix ou recomendação (esse é o option-matrix; grill é só sharpening). Se o dispatcher encadeou grill+option-matrix, o option matrix roda em fase separada.
+### Quando a disciplina dobra
+- **Projeto greenfield sem `.dw/rules/`**: grille mesmo assim; a conversa produz as PRIMEIRAS entradas em `.dw/rules/index.md`. Isso é o valor.
+- **Discordância cosmética de terminologia** ("usamos `userId` ou `user_id`?"): pule grill mode; use ADR de convenção de código ou seção Naming em `.dw/rules/index.md`.
+## Modo: prototype (protótipo descartável)
+Dispara quando o usuário pergunta "esse modelo de estado faz sentido?" / "como isso deveria parecer?" sem resposta clara — i.e., o próximo passo razoável é RODAR código, não escrever palavras. Override: `--mode=prototype`. Constrói um **protótipo descartável que responde a uma única pergunta**. A pergunta decide a forma — escolha um branch.
+<critical>O protótipo é descartável desde o dia um. Não polir. Não adicionar testes. Não extrair abstrações. O ponto é APRENDER algo rápido e depois DELETAR ou absorver.</critical>
+### Escolha um branch
+| Pergunta do usuário | Branch |
+|---------------------|--------|
+| "Esse modelo de state/logic faz sentido?" | **LOGIC** — terminal app interativo que empurra a máquina de estado por edge cases difíceis de raciocinar no papel. |
+| "Como isso deveria parecer?" | **UI** — várias variações radicalmente diferentes de UI num único route, toggleable por search param e bottom bar flutuante. |
+Se a pergunta é ambígua, pergunte ao usuário. Se não puder alcançar: default pelo contexto (módulo backend → LOGIC; página/componente → UI) e declare a suposição no topo do protótipo.
+### Regras (valem para os dois branches)
+1. **Descartável desde o dia um, claramente marcado.** Coloque o protótipo perto do módulo/página que ele está prototipando (pra contexto) mas nomeie pra que um leitor casual veja que é protótipo (`prototype-<slug>.ts`, `prototype-route.tsx`, etc.).
+2. **Um comando pra rodar.** Qualquer que seja o task runner do projeto — `pnpm <nome>`, `python <path>`, `bun <path>`, etc. O usuário tem que rodar sem pensar.
+3. **Sem persistência por padrão.** Estado vive em memória. Persistência é o que o protótipo está VERIFICANDO, não algo do qual depende. Se a pergunta envolve banco, use um DB scratch ou arquivo local com nome claro `PROTOTYPE — wipe me`.
+4. **Pule o polish.** Sem testes, sem error handling além do mínimo pro protótipo rodar, sem abstrações.
+5. **Surface o estado.** Depois de cada ação (LOGIC) ou troca de variante (UI), imprima ou renderize o estado relevante completo pro usuário ver o que mudou.
+6. **Delete ou absorva quando terminar.** Quando o protótipo respondeu sua pergunta, ou delete ou dobre a decisão validada em código real. Não deixe apodrecendo no repo.
+### Quando terminar
+A **resposta** é a única coisa que vale guardar. Capture duravelmente:
+- Commit message fechando o protótipo: "removed prototype X; decided <resposta> based on <observação>"
+- Ou um ADR (se os critérios do grill valem)
+- Ou `.dw/spec/<prd>/NOTES.md` se mid-PRD
+- Ou comentário em issue se user-driven
+Se o usuário não está por perto, deixe um placeholder `PROTOTYPE VERDICT: <pending>` pro próximo pass preencher antes da deleção.
+### Output
+O modo prototype produz:
+- **Arquivo(s) de código descartável** na localização apropriada.
+- Um `NOTES.md` ao lado do protótipo com a PERGUNTA que está respondendo.
+- Depois do usuário rodar e responder a pergunta, instruções pra remover o protótipo + capturar o verdict.
+### Anti-patterns
+- Construir protótipo que é feature disfarçada — código production-quality, testes, deploy config. Isso não é protótipo; é primeiro draft.
+- Deixar o protótipo no repo "por via das dúvidas" — seis meses depois é load-bearing.
+- Não capturar o verdict — protótipo respondeu a pergunta e a resposta evaporou.
+- Múltiplos protótipos empilhados — escolha uma pergunta, responda, mova.
 ## Inspired by
 O padrão de codebase-grounded idea refinement é inspirado em [`addyosmani/agent-skills@idea-refine`](https://skills.sh/addyosmani/agent-skills/idea-refine) (Addy Osmani, Google — 1.4K+ installs). Adaptações para o dev-workflow:
@@ -301,6 +442,8 @@ O padrão de codebase-grounded idea refinement é inspirado em [`addyosmani/agen
 - Output em `.dw/spec/ideas/<slug>.md` (irmão de `prd-<slug>/`) em vez de `docs/ideas/` — mantém a convenção de paths do dev-workflow.
 - Integração com o pipeline existente: `/dw-plan prd` aceita o one-pager como input, reduzindo perguntas de clarificação.
-Crédito: Addy Osmani e o padrão `idea-refine`.
+Os modos **grill** e **prototype** são adaptados de [`mattpocock/skills/grill-with-docs`](https://github.com/mattpocock/skills/tree/main/grill-with-docs) e [`mattpocock/skills/prototype`](https://github.com/mattpocock/skills/tree/main/prototype) (Matt Pocock, MIT). Adaptação dev-workflow: integrados como modos INTERNOS auto-dispatchados em vez de skills separadas, paths rebaseados em `.dw/rules/` + `.dw/spec/<prd>/`, criação de ADR gated no teste 3-critérios (difícil de reverter + surpreendente + trade-off real).
+Crédito: Addy Osmani (idea-refine) e Matt Pocock (grill-with-docs, prototype).
 </system_instructions>

package/scaffold/skills/dw-simplification/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: dw-simplification
-description: Use when simplifying code. Chesterton's Fence (understand WHY first), behavior-preserving refactor, complexity metrics. Triggers from /dw-review --code-only and /dw-brainstorm --refactor.
+description: Use when simplifying code. Chesterton's Fence (WHY first), behavior-preserving refactor, complexity metrics, deep-modules analysis. Triggers from /dw-review and /dw-brainstorm refactor-audit.
 allowed-tools:
   - Read
   - Edit
@@ -18,7 +18,7 @@ Behavioral discipline for simplifying code without breaking it. The trap of refa
 Read this skill when:
 - `/dw-review --code-only` flagged a complexity issue (deep nesting, long function, duplication).
-- `/dw-brainstorm --refactor` proposed a simplification target.
+- `/dw-brainstorm` dispatched its **refactor-audit** mode and proposed a simplification target.
 - The user explicitly asks to "clean this up" / "simplify X".
 - During `/dw-run` if the implementation accidentally produced complex code that wants pre-commit cleanup.
@@ -117,11 +117,13 @@ For changes that altered cyclomatic complexity, optionally run a complexity anal
 In the formal Level 3 review (post-Level 2 chain from `dw-review-implementation`), code-review flags complexity issues using the patterns above. Each flagged issue references this skill: "consider simplifying via guard clauses; apply Chesterton's Fence — verify why the nested check exists before flattening."
-## How `dw-refactoring-analysis` uses this
+## How `/dw-brainstorm` refactor-audit mode uses this
+The refactor-audit mode dispatched by `/dw-brainstorm` catalogs code smells (Fowler vocabulary) AND runs the deep-modules analysis (`references/deep-modules.md`) against the target area. For each smell or shallow-module flag, the proposed refactor cites:
-The analysis catalogs code smells (Fowler vocabulary). For each smell, the proposed refactor cites:
 1. Which simplification rule applies (early return / extract method / lookup table / etc.).
 2. Whether Chesterton's Fence concerns block the refactor (existing tests inadequate? no recent commits explaining the structure? → flag as YELLOW, don't act).
+3. Whether the deep-modules test points the other way (some "code smells" are actually deep-module wrappers that absorb complexity; the fix is to make the wrapper deeper, not to flatten it).
 ## Anti-patterns
@@ -136,7 +138,10 @@ The analysis catalogs code smells (Fowler vocabulary). For each smell, the propo
 - `references/chestertons-fence.md` — the protocol in detail; case studies of "obvious-but-wrong" removals.
 - `references/complexity-metrics.md` — when each metric (cyclomatic, cognitive, depth, fanout) actually matters; how to measure cheaply.
 - `references/behavior-preserving.md` — characterization tests, refactor with test gate, rollback patterns, codemod tooling per language.
+- `references/deep-modules.md` — high-leverage modules behind small interfaces; deletion test, locality, leverage, seam, adapter diagnostic; anti-patterns (shallow wrapper, god-module). Invoked by `/dw-brainstorm` refactor-audit mode.
 ## Inspired by
-Adapted from [`addyosmani/agent-skills/code-simplification`](https://github.com/addyosmani/agent-skills) by Addy Osmani (MIT license). Core principles (Chesterton's Fence, behavior preservation, scope discipline, Rule of 500) preserved. dev-workflow integration: invoked by `dw-code-review` and `dw-refactoring-analysis` via Complementary Skills.
+Adapted from [`addyosmani/agent-skills/code-simplification`](https://github.com/addyosmani/agent-skills) by Addy Osmani (MIT license). Core principles (Chesterton's Fence, behavior preservation, scope discipline, Rule of 500) preserved. dev-workflow integration: invoked by `dw-code-review` and `/dw-brainstorm` refactor-audit mode via Complementary Skills.
+The deep-modules reference is adapted from [`mattpocock/skills/improve-codebase-architecture`](https://github.com/mattpocock/skills/tree/main/improve-codebase-architecture) by Matt Pocock (MIT license). Core framing (deep modules = high leverage at small interface, deletion test, shallow-wrapper anti-pattern) preserved; paths and integration points rebased on dev-workflow conventions.

package/scaffold/skills/dw-simplification/references/deep-modules.md ADDED Viewed

@@ -0,0 +1,105 @@
+# Deep modules — high leverage at small interface
+A **deep module** absorbs complexity behind a narrow public surface. Its callers see a small, stable API and get a lot of work done per call. A **shallow module** is the opposite — small interface AND small implementation, so it adds an indirection without absorbing any complexity. Or worse: a god-module, deep implementation BUT huge interface, where the surface itself is the complexity.
+The refactor-audit mode of `/dw-brainstorm` checks every candidate module against this framing alongside the Fowler smell taxonomy. A "code smell" can mislead if the construct is actually a deep wrapper doing its job correctly.
+## The premise
+Good abstractions hide complexity. The metric isn't "how short is the code?" — it's "how much complexity does the interface eliminate per unit of public surface area?"
+- **Deep module**: 1000 lines of implementation, 3 public functions. Caller does NOT need to know the 1000 lines.
+- **Shallow module**: 50 lines of implementation, 8 public functions. Caller has to learn all 8 to use the 50 productively.
+- **God-module**: 5000 lines of implementation, 60 public functions. Surface area is itself unmanageable.
+The deep version wins not because it's shorter overall (it's longer), but because the cost of using it is concentrated in the implementation, not spread to every caller.
+## Diagnostic checklist
+When the refactor-audit mode evaluates a module, ask these in order:
+### 1. Deletion test
+> If this module were deleted, how many call sites would break?
+- **0–1 callers**: the module isn't pulling its weight. Inline it; the abstraction is shallower than the call sites' real shape. Mark this as `low` priority candidate for removal.
+- **2–3 callers**: borderline. Look at what each caller does with the result — if they immediately wrap/unwrap, the module is shallow. If they consume the result directly, leave it alone.
+- **4+ callers**: the module is at least somewhat load-bearing. Move to next checks.
+### 2. Locality test
+> Does the caller pass everything the module needs, OR does the module reach into globals / shared state to fill gaps?
+A deep module is **self-contained**: callers pass in arguments, get back results. A module that reads from `process.env`, a shared singleton, or implicit thread-local state has a deeper coupling than its interface admits — the caller has to know about the hidden inputs too.
+If the module reaches into shared state, its **effective interface** is bigger than what the function signature shows. That's a hidden shallowness — the public surface lies about its real cost.
+### 3. Leverage test
+> LOC saved per caller × number of callers — what's the multiplicative payoff?
+Crude formula: if each caller would otherwise write ~N lines of inline code to replace this module's call, and there are M callers, the module's leverage is N × M. A module whose leverage is < 2× its own LOC is shallow — it's not absorbing enough complexity to justify itself.
+### 4. Seam test
+> How often does the public interface change relative to the implementation?
+A stable public interface with frequently-changing implementation is the **ideal seam** — callers stay put while internals evolve. An interface that changes every few months means callers track it; the abstraction is leaking.
+Check `git log --oneline -- <module>` against `git log --oneline -- <module-public-surface-file>`. If they move together, the seam is weak.
+### 5. Adapter test
+> Does this module mediate between two foreign vocabularies?
+The best deep modules are **adapters** — they translate between two domains that have different naming conventions, error semantics, or state models. A module that doesn't mediate between domains is harder to justify as deep; it's probably either a passthrough (shallow) or just a piece of the same domain stretched across files (cohesion issue).
+## Anti-patterns
+### Shallow wrapper
+```ts
+// shallow — passthrough with one line of "work"
+export function getUserName(id: string) {
+  return db.users.findById(id).name;
+}
+```
+The wrapper saves the caller 8 characters (`.name`) and forces them to learn a new function name. The leverage is < 1. Inline it.
+When to keep a one-liner: it's an **adapter** that locks down ONE meaning of a query. If `getUserName` exists specifically because there are 4 candidate fields (`name`, `displayName`, `username`, `fullName`) and the team's canonical answer is "use `name`", the function IS pulling its weight by serving as the project's vocabulary anchor. Document the WHY in the function body.
+### God-module
+A class with 60 public methods is shallow even if each method is 100 lines, because the caller has to learn 60 concepts. Split by **role**: what is the smallest subset of methods a caller typically uses together? Each subset becomes a deep module of its own.
+### Implementation-leak abstraction
+A module that returns its internal representation directly (`return this.cache;`) makes callers depend on the internals. The first time the cache shape changes, every caller breaks. Wrap or copy on egress.
+## When refactor-audit raises a smell, also check deep-modules
+The combined verdict:
+| Fowler smell | Deep-modules check | Action |
+|--------------|--------------------|--------|
+| Long Method | Is the method INSIDE a deep module that callers don't see? | If yes, leave it — internal complexity is OK in deep modules. If no, extract. |
+| Large Class | Is the public surface narrow despite the line count? | If yes, leave it — that's deep. If no, split by role. |
+| Long Parameter List | Does each parameter represent a real choice the caller makes? | If yes, the deep module is forcing necessary configuration; introduce a config object. If parameters are derivable from each other, reduce. |
+| Feature Envy | Is the module a deep adapter between domains? | If yes, the "envy" is the mediation — keep it. If no, move the logic closer to its data. |
+| Middle Man | Does the middle man absorb any complexity? | If no, delete the middle man. If yes, document the why (it's a deep wrapper). |
+A flat "this is too long, extract method" recommendation is shallow analysis. Combining Fowler + deep-modules surfaces when the smell is actually a feature.
+## Output for refactor-audit findings
+When the audit flags a module as a shallow-wrapper or god-module candidate, the finding entry adds a `Deep-modules: <verdict>` line:
+```markdown
+### P1 — Shallow Wrapper
+**Files:** src/lib/getUserName.ts
+**Symptom:** 1-line passthrough wrapper around `db.users.findById(id).name`; 2 callers.
+**Deep-modules:** SHALLOW — fails deletion test (only 2 callers) AND leverage test (0 LOC saved per call).
+**Refactor:** Inline at both call sites. If the wrapper exists for vocabulary reasons, document and keep — otherwise remove.
+**Risk:** Low.
+```