@brunosps00/dev-workflow 0.8.1 → 0.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (96)
  1. package/README.md +30 -27
  2. package/bin/dev-workflow.js +1 -1
  3. package/lib/constants.js +6 -8
  4. package/lib/init.js +6 -0
  5. package/lib/install-deps.js +0 -5
  6. package/lib/migrate-gsd.js +164 -0
  7. package/lib/uninstall.js +2 -2
  8. package/package.json +1 -1
  9. package/scaffold/en/commands/dw-analyze-project.md +6 -11
  10. package/scaffold/en/commands/dw-autopilot.md +10 -17
  11. package/scaffold/en/commands/dw-brainstorm.md +2 -2
  12. package/scaffold/en/commands/dw-bugfix.md +1 -0
  13. package/scaffold/en/commands/dw-code-review.md +7 -5
  14. package/scaffold/en/commands/dw-commit.md +6 -0
  15. package/scaffold/en/commands/dw-create-prd.md +5 -4
  16. package/scaffold/en/commands/dw-create-techspec.md +7 -4
  17. package/scaffold/en/commands/dw-deep-research.md +6 -0
  18. package/scaffold/en/commands/dw-deps-audit.md +1 -0
  19. package/scaffold/en/commands/dw-find-skills.md +4 -4
  20. package/scaffold/en/commands/dw-fix-qa.md +1 -0
  21. package/scaffold/en/commands/dw-generate-pr.md +1 -0
  22. package/scaffold/en/commands/dw-help.md +10 -27
  23. package/scaffold/en/commands/dw-intel.md +99 -30
  24. package/scaffold/en/commands/dw-map-codebase.md +125 -0
  25. package/scaffold/en/commands/dw-new-project.md +1 -1
  26. package/scaffold/en/commands/dw-redesign-ui.md +5 -9
  27. package/scaffold/en/commands/dw-refactoring-analysis.md +8 -6
  28. package/scaffold/en/commands/dw-review-implementation.md +28 -2
  29. package/scaffold/en/commands/dw-run-plan.md +14 -20
  30. package/scaffold/en/commands/dw-run-task.md +5 -4
  31. package/scaffold/en/commands/dw-update.md +3 -1
  32. package/scaffold/en/templates/idea-onepager.md +2 -2
  33. package/scaffold/pt-br/commands/dw-analyze-project.md +6 -11
  34. package/scaffold/pt-br/commands/dw-autopilot.md +10 -17
  35. package/scaffold/pt-br/commands/dw-brainstorm.md +2 -2
  36. package/scaffold/pt-br/commands/dw-bugfix.md +1 -0
  37. package/scaffold/pt-br/commands/dw-code-review.md +7 -5
  38. package/scaffold/pt-br/commands/dw-commit.md +6 -0
  39. package/scaffold/pt-br/commands/dw-create-prd.md +5 -4
  40. package/scaffold/pt-br/commands/dw-create-techspec.md +7 -4
  41. package/scaffold/pt-br/commands/dw-deep-research.md +6 -0
  42. package/scaffold/pt-br/commands/dw-deps-audit.md +1 -0
  43. package/scaffold/pt-br/commands/dw-find-skills.md +4 -4
  44. package/scaffold/pt-br/commands/dw-fix-qa.md +1 -0
  45. package/scaffold/pt-br/commands/dw-generate-pr.md +1 -0
  46. package/scaffold/pt-br/commands/dw-help.md +10 -27
  47. package/scaffold/pt-br/commands/dw-intel.md +99 -30
  48. package/scaffold/pt-br/commands/dw-map-codebase.md +125 -0
  49. package/scaffold/pt-br/commands/dw-new-project.md +1 -1
  50. package/scaffold/pt-br/commands/dw-redesign-ui.md +5 -9
  51. package/scaffold/pt-br/commands/dw-refactoring-analysis.md +8 -6
  52. package/scaffold/pt-br/commands/dw-review-implementation.md +21 -2
  53. package/scaffold/pt-br/commands/dw-run-plan.md +16 -22
  54. package/scaffold/pt-br/commands/dw-run-task.md +5 -4
  55. package/scaffold/pt-br/commands/dw-update.md +3 -1
  56. package/scaffold/pt-br/templates/idea-onepager.md +2 -2
  57. package/scaffold/skills/dw-codebase-intel/SKILL.md +102 -0
  58. package/scaffold/skills/dw-codebase-intel/agents/intel-updater.md +318 -0
  59. package/scaffold/skills/dw-codebase-intel/references/api-design-discipline.md +138 -0
  60. package/scaffold/skills/dw-codebase-intel/references/incremental-update.md +79 -0
  61. package/scaffold/skills/dw-codebase-intel/references/intel-format.md +208 -0
  62. package/scaffold/skills/dw-codebase-intel/references/query-patterns.md +148 -0
  63. package/scaffold/skills/dw-debug-protocol/SKILL.md +106 -0
  64. package/scaffold/skills/dw-debug-protocol/references/error-categorization.md +127 -0
  65. package/scaffold/skills/dw-debug-protocol/references/non-reproducible-strategy.md +108 -0
  66. package/scaffold/skills/dw-debug-protocol/references/six-step-triage.md +139 -0
  67. package/scaffold/skills/dw-debug-protocol/references/stop-the-line.md +52 -0
  68. package/scaffold/skills/dw-execute-phase/SKILL.md +133 -0
  69. package/scaffold/skills/dw-execute-phase/agents/executor.md +264 -0
  70. package/scaffold/skills/dw-execute-phase/agents/plan-checker.md +215 -0
  71. package/scaffold/skills/dw-execute-phase/references/atomic-commits.md +143 -0
  72. package/scaffold/skills/dw-execute-phase/references/plan-verification.md +156 -0
  73. package/scaffold/skills/dw-execute-phase/references/wave-coordination.md +102 -0
  74. package/scaffold/skills/dw-git-discipline/SKILL.md +120 -0
  75. package/scaffold/skills/dw-git-discipline/references/atomic-commits-discipline.md +158 -0
  76. package/scaffold/skills/dw-git-discipline/references/branch-hygiene.md +150 -0
  77. package/scaffold/skills/dw-git-discipline/references/trunk-based-pattern.md +82 -0
  78. package/scaffold/skills/dw-memory/SKILL.md +1 -2
  79. package/scaffold/skills/dw-simplification/SKILL.md +142 -0
  80. package/scaffold/skills/dw-simplification/references/behavior-preserving.md +148 -0
  81. package/scaffold/skills/dw-simplification/references/chestertons-fence.md +152 -0
  82. package/scaffold/skills/dw-simplification/references/complexity-metrics.md +147 -0
  83. package/scaffold/skills/dw-source-grounding/SKILL.md +128 -0
  84. package/scaffold/skills/dw-source-grounding/references/citation-protocol.md +108 -0
  85. package/scaffold/skills/dw-source-grounding/references/freshness-check.md +108 -0
  86. package/scaffold/skills/dw-source-grounding/references/source-priority.md +146 -0
  87. package/scaffold/skills/dw-verify/SKILL.md +0 -1
  88. package/scaffold/skills/vercel-react-best-practices/SKILL.md +4 -0
  89. package/scaffold/skills/vercel-react-best-practices/references/perf-discipline.md +122 -0
  90. package/scaffold/skills/webapp-testing/SKILL.md +5 -0
  91. package/scaffold/skills/webapp-testing/references/security-boundary.md +115 -0
  92. package/scaffold/skills/webapp-testing/references/three-workflow-patterns.md +144 -0
  93. package/scaffold/en/commands/dw-quick.md +0 -85
  94. package/scaffold/en/commands/dw-resume.md +0 -82
  95. package/scaffold/pt-br/commands/dw-quick.md +0 -85
  96. package/scaffold/pt-br/commands/dw-resume.md +0 -82
package/scaffold/skills/dw-debug-protocol/references/non-reproducible-strategy.md
@@ -0,0 +1,108 @@
+ # Non-reproducible bugs — strategy when you can't trigger it on demand
+
+ Some bugs only fire in production, only for some users, only at certain times. The protocol differs from normal debugging because step 1 (reproduce) is the open question. Don't fix what you haven't seen.
+
+ ## Don't fix on guess
+
+ The strongest urge with a non-reproducible bug is: "I bet I know what it is — let me push a fix and see if reports stop." This fails because:
+
+ 1. If the fix doesn't work, you wasted a deploy and learned nothing about the cause.
+ 2. If reports stop (correlation), you can't tell whether your fix worked or the trigger condition stopped occurring.
+ 3. You build a habit of guessing; on the next bug, you guess wrong.
+
+ Instead: instrument first, fix second.
+ ## The four sub-strategies
+
+ ### A. Timing / race conditions
+
+ **Signs:** "Fails 1 out of N runs," "Worked locally, fails in CI," "Started failing after we sped up X."
+
+ **Strategy:**
+ 1. Add timestamped logs at every async boundary in the suspect path.
+ 2. Push to a branch with extra logging; run the suite multiple times in CI.
+ 3. Compare the timestamps of failing runs vs successful runs. Look for events arriving in a different order, or gaps that exceed expectations.
+ 4. Once you see the order anomaly, you have the reproduction (forced via a test that constrains order).
+
+ **Tools:**
+ - `Date.now()` or `performance.now()` in logs.
+ - For network: response time logging.
+ - For tests: repeated runs (e.g. `--repeat 100` where the runner supports it) to amplify rare failures.
+ - Chaos: insert random `await sleep(rand)` at suspected boundaries to provoke order issues.
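The chaos tactic above can be sketched as a small helper. This is a sketch, assuming a Node.js context; `chaosPoint`, `demo`, and the log format are illustrative, not part of this package:

```javascript
// Sketch of the "insert random await sleep(rand)" tactic: a jitter point you
// drop at suspected async boundaries to shake out ordering assumptions.
// chaosPoint and demo are illustrative names, not dev-workflow APIs.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function chaosPoint(label, maxJitterMs = 50) {
  const jitter = Math.floor(Math.random() * maxJitterMs);
  console.log(`[chaos] ${label} +${jitter}ms at ${Date.now()}`);
  await sleep(jitter); // randomizes completion order across concurrent paths
  return jitter;
}

// Example: two writes that "always" land in order. Under jitter, sometimes not.
async function demo() {
  const order = [];
  await Promise.all([
    chaosPoint('write-a', 20).then(() => order.push('a')),
    chaosPoint('write-b', 20).then(() => order.push('b')),
  ]);
  return order; // ['a','b'] or ['b','a']: a test that assumes one order is the bug
}
```

Run the suite with jitter enabled in CI; a failure under jitter is the ordering reproduction step 4 asks for.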
+ ### B. Environment-specific
+
+ **Signs:** "Works in dev, fails in staging," "Works for one user, fails for another," "Worked yesterday, fails today, no code changes."
+
+ **Strategy:**
+ 1. Diff the environments. Run a script that prints config (env vars, dependencies, OS, locale, feature flags) in both.
+ 2. Bisect by environment delta: change one variable at a time in the failing environment toward the working environment. The change that flips the bug is the cause.
+ 3. If user-specific: capture the user's state (account flags, history, A/B variants, browser, region) — diff against a working user.
+
+ **Tools:**
+ - Per-user replay: instrument enough that you can re-run a single user's session.
+ - A/B comparison: working vs failing pair.
+ - Feature flag audit: every flag that differs is a suspect.
+ ### C. State-dependent
+
+ **Signs:** "Happens to long-time users, never new ones," "Only after they've done X then Y," "Cache-related," "Database-state-related."
+
+ **Strategy:**
+ 1. Identify the state precondition. (Usually the failing user has a value or pattern in their data that the test fixtures don't.)
+ 2. Capture the failing user's state (sanitized DB rows, sanitized client state).
+ 3. Reproduce locally by seeding that exact state. Now you have a deterministic reproduction.
+ 4. Once reproduced, run the standard six-step protocol.
+
+ **Tools:**
+ - DB dump / sanitize / load locally.
+ - Client state export (localStorage, IndexedDB, in-memory store snapshot).
+ - Replay system that takes a captured state and replays user actions.
+
+ **Privacy:** Sanitize before capturing. Production user data should never sit in dev environments unsanitized.
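The sanitize step can be sketched like this. A hedged sketch: `sanitizeRow` and the PII field list are illustrative and must be adapted to your schema. The key idea is preserving structural properties (lengths, null-ness, flags), because those usually carry the bug:

```javascript
// Sketch of capture-and-sanitize: strip or mask PII from captured rows before
// they leave production, while keeping the structure that may be the trigger.
// The field list is illustrative; adapt it to your schema.
const PII_FIELDS = new Set(['email', 'name', 'phone', 'address']);

function sanitizeRow(row) {
  const out = {};
  for (const [key, value] of Object.entries(row)) {
    if (!PII_FIELDS.has(key)) {
      out[key] = value; // keep non-PII state exactly: it may be the precondition
    } else if (typeof value === 'string') {
      out[key] = 'x'.repeat(value.length); // preserve length, drop content
    } else {
      out[key] = null;
    }
  }
  return out;
}
```

Seed local fixtures from the sanitized rows; step 3's deterministic reproduction follows from there.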
+ ### D. Frequency-dependent (rare)
+
+ **Signs:** "We see this once a week," "Maybe 0.1% of requests," "Customer complained but logs don't show it."
+
+ **Strategy:**
+ 1. Add specific logging or telemetry that would catch the bug NEXT time it fires.
+ 2. Deploy the logging.
+ 3. Wait for next occurrence (set up an alert if possible).
+ 4. Use the captured event as your reproduction.
+
+ **Tools:**
+ - Structured logs filterable by error pattern.
+ - Sentry / Bugsnag / similar with breadcrumbs (action history before failure).
+ - Custom telemetry on the suspect code path.
+
+ **Time discipline:**
+ - Don't sit waiting forever. Set a deadline ("if no event in 7 days, escalate or reprioritize").
+ - Continue other work in parallel; this bug is "instrumented and waiting," not "actively in progress."
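The instrument-and-wait idea can be sketched as a breadcrumb buffer plus a conditional capture. Names and log shape are illustrative, not this package's or Sentry's API:

```javascript
// Sketch of instrument-first telemetry: keep a ring buffer of recent actions
// (breadcrumbs) and emit a structured report only when the suspected condition
// fires, so the rare event arrives with its own context. Names illustrative.
const breadcrumbs = [];

function crumb(event, data = {}) {
  breadcrumbs.push({ t: Date.now(), event, ...data });
  if (breadcrumbs.length > 50) breadcrumbs.shift(); // keep the last 50 actions
}

function captureIfSuspect(condition, details) {
  if (!condition) return null;
  const report = {
    kind: 'suspect-event',
    details,
    breadcrumbs: [...breadcrumbs], // the action history before the failure
  };
  console.error(JSON.stringify(report)); // filterable by kind in log search
  return report;
}
```

Alert on `kind=suspect-event`; the first captured report is the reproduction input for the six-step protocol.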
+ ## When to ship a guess
+
+ Sometimes the bug is high-impact and waiting for instrumentation isn't acceptable. In that case:
+
+ 1. Acknowledge it's a guess in the commit and PR. Don't pretend you reproduced it.
+ 2. Make the fix MINIMAL. Don't rewrite a module to fix an unconfirmed bug — you'll cause new bugs.
+ 3. Add monitoring/alerting that would catch the bug if it persists. The fix has a verification path: "no recurrence reports in N days."
+ 4. Plan a follow-up: if the bug returns, fall back to the instrument-first protocol.
+
+ A guess fix shipped with explicit acknowledgement is acceptable. A guess fix shipped pretending it was a real diagnosis is not — that pattern erodes the team's trust in fix quality.
+ ## Anti-patterns
+
+ - "Just add a try/catch and log it" → hides the bug; the reports stop but the cause persists.
+ - "Restart the service / clear the cache" → an operational mitigation, not a fix. Document it as such.
+ - A fix shipped without monitoring → no way to know if it worked.
+ - A fix shipped touching unrelated code → "while I was in there" expansion that turns one unverified change into many.
+ - Refusing to ship until reproduced while production is suffering → perfectionism that costs users. Use the "ship a minimal guess" path with clear acknowledgement.
+ ## When to escalate
+
+ After 4 hours of instrumenting and waiting with no reproduction:
+
+ - Surface the situation: what's instrumented, what's expected, what time window you're waiting on.
+ - Ask whether to (a) keep waiting, (b) ship a minimal guess fix with monitoring, or (c) reprioritize.
+ - The decision is the user's/team's, not yours alone. Provide the trade-off.
package/scaffold/skills/dw-debug-protocol/references/six-step-triage.md
@@ -0,0 +1,139 @@
+ # Six-step debug triage
+
+ Six discrete steps. Each has a specific output. Don't move to the next without finishing the current.
+ ## Step 1 — Reproduce
+
+ **Question:** Can I trigger this on demand?
+
+ **Output:** A failing test, a script, or a sequence of UI clicks that produces the bug. Save it. The reproduction IS the hypothesis test for your fix.
+
+ **How:**
+ - From a user report → ask: what version, what environment, what input, what was expected, what happened. Get all five before guessing.
+ - From a stack trace → identify the entry point (HTTP route, CLI command, UI event) and the input that triggered it.
+ - From a log line → find the surrounding context (request ID, user ID, timestamp); reconstruct the scenario.
+ - Write a failing test FIRST when the bug is in pure logic. The test puts the bug on record before you fix it.
+
+ **Common pitfall:** "It happens sometimes." This is not a reproduction. Either:
+ - Find a deterministic trigger (often there's a state precondition you missed), OR
+ - Quantify the rate (8/10 runs) and treat the test as flaky-but-progress.
+ - See `non-reproducible-strategy.md` for state-, timing-, or environment-dependent cases.
+
+ **Done when:** You have a one-line command (or a series of clicks) that produces the bug ≥80% of the time.
+ ## Step 2 — Localize
+
+ **Question:** Where does the invariant break?
+
+ **Output:** A file:line reference where state is wrong. Not where the symptom appears — where the cause lives.
+
+ **How:**
+ - Stack trace → walk UPWARDS from the failure point, asking at each frame: "is the input here valid?" The deepest frame with valid input is where the invariant breaks.
+ - Bisect the change history if a recent commit broke things: `git bisect start && git bisect bad HEAD && git bisect good <known-good-sha>`, then `git bisect run <test-cmd>` to automate the search.
+ - Logs/instrumentation → add explicit log lines at suspected points; reproduce; narrow the window.
+ - Debugger → set breakpoints at boundaries (entry/exit of suspect functions); inspect state.
+
+ **Common pitfall:** Stopping at the first place that catches the error. The error often surfaces far from the cause.
+
+ **Done when:** You can point at one specific function (or 2-3 closely related lines) and say "the violation happens here."
+ ## Step 3 — Reduce
+
+ **Question:** What's the smallest input that triggers this?
+
+ **Output:** A minimal reproduction — ideally 1-3 lines of input or a 5-10 line test case.
+
+ **How:**
+ - Start with your full reproduction.
+ - Remove one element at a time (a config flag, a request field, a piece of state). Check if the bug still reproduces.
+ - Repeat until removing anything else makes the bug disappear.
+
+ **Why bother:**
+ - Smaller repros are faster to test (your fix loop runs in seconds, not minutes).
+ - Smaller repros surface the EXACT trigger; vague repros let you fool yourself.
+ - The minimal repro often makes the cause obvious: "oh, it only fails when the array is empty."
+
+ **Common pitfall:** Skipping this step because "I already know what's wrong." If you really know, the reduction takes 2 minutes. Do it.
+
+ **Done when:** Removing one more thing makes the bug disappear.
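The remove-one-element loop can even be mechanized. A sketch, assuming the step-1 reproduction is callable as a predicate over an input object; `minimizeInput` is an illustrative name:

```javascript
// Sketch of the reduce loop: given a repro input (an object of fields) and a
// predicate that runs the step-1 reproduction, greedily drop every field the
// bug doesn't need. `reproduces(input)` returns true if the bug still fires.
function minimizeInput(input, reproduces) {
  let current = { ...input };
  let shrunk = true;
  while (shrunk) {
    shrunk = false;
    for (const key of Object.keys(current)) {
      const { [key]: _, ...without } = current;
      if (reproduces(without)) { // still fails without this field: drop it
        current = without;
        shrunk = true;
      }
    }
  }
  return current; // removing anything else makes the bug disappear
}
```

The result is the "done when" state by construction: every remaining field is load-bearing for the bug.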
+ ## Step 4 — Fix root cause
+
+ **Question:** WHY is the invariant violated? Fix THAT.
+
+ **Output:** A code change that addresses the cause, not the symptom.
+
+ **How:**
+ - Ask "why?" until you reach a design or assumption that's wrong, not just a bad value.
+   - "X is null" — why?
+   - "Because Y returned null" — why?
+   - "Because the cache was empty when called from this path" — why was the cache empty?
+   - "Because the warm-up runs after this path is reachable" — root cause: ordering.
+ - Fix at the deepest "why" you can change without scope creep.
+ - If the root cause is too deep ("this whole module assumes synchronous I/O"), fix at the shallowest level that makes the bug not recur, AND open a follow-up issue for the deeper fix.
+
+ **Symptoms vs root cause — examples:**
+
+ | Symptom fix (don't) | Root cause fix (do) |
+ |---------------------|---------------------|
+ | Catch the null and return a default | Find why the null was produced and prevent it |
+ | Add a retry loop | Find why the first call failed |
+ | Increase the timeout | Find why the operation is slow |
+ | Add an `if (process.env.NODE_ENV !== 'production')` guard | Find why prod behaves differently |
+ | Suppress the warning | Address what the warning is warning about |
+
+ **Common pitfall:** Adding `try/catch` around the symptom and moving on. This is hiding the bug, not fixing it.
+
+ **Done when:** Reproduction now passes; lint + tests + build all green.
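The table's first row, as a worked example. The `UserCache` scenario is hypothetical (it mirrors the cache-warm-up "why" chain above), not code from this package:

```javascript
// Symptom fix (don't): swallow the null at the call site.
//   const user = cache.getUser(id) ?? DEFAULT_USER;
//
// Root-cause fix (do): the real bug is reads before warm-up, so make that
// state fail loudly at the source instead of producing a misleading null.
class UserCache {
  constructor() {
    this.ready = false;
    this.users = new Map();
  }
  warmUp(users) {
    for (const u of users) this.users.set(u.id, u);
    this.ready = true;
  }
  getUser(id) {
    if (!this.ready) {
      // runtime invariant check: names the violated assumption
      throw new Error('UserCache.getUser called before warmUp(): ordering bug');
    }
    return this.users.get(id) ?? null; // null now means "genuinely absent"
  }
}
```

With the invariant enforced at the source, a null can no longer mean two different things, which is what made the symptom fix a trap.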
+ ## Step 5 — Guard against recurrence
+
+ **Question:** What test would have caught this before deploy?
+
+ **Output:** A regression test (or two) added to the suite that fails without your fix and passes with it.
+
+ **How:**
+ - Take the reproduction from step 1 → make it a test.
+ - The test name should describe the bug ("returns empty list when cache warm-up is delayed", not "fix bug").
+ - The test should fail clearly if the bug returns: an assertion error with a message naming the invariant.
+ - Place the test where future maintainers will find it (alongside the function with the cause, not in some catch-all "bugs.test.ts").
+
+ **When you can't add a test:**
+ - The bug requires production-only conditions: add a logging guard or a runtime invariant check that would fire on recurrence.
+ - The bug is in infra/config: add a CI check that validates the config matches expectations.
+ - The bug is in a third-party library: open an upstream issue AND add a defensive wrapper with a test for your defense.
+
+ **Common pitfall:** "I'll add the test later." You won't. Add it now or the bug returns.
+
+ **Done when:** The test exists, was committed in the same commit as the fix, and fails if the fix is reverted.
+ ## Step 6 — Verify end-to-end
+
+ **Question:** Does the original report scenario now pass?
+
+ **Output:** Proof — manual verification, a screenshot, a log trace, a scripted E2E run.
+
+ **How:**
+ - Re-read the original bug report.
+ - Run through it as if you were the reporter — same inputs, same path, same environment if possible.
+ - Capture evidence (screenshot, terminal output, log line) that the issue no longer reproduces.
+
+ **Why this step matters:**
+ - Unit tests passing ≠ the user-facing scenario working. The user's path may differ from the test's path.
+ - It confirms you fixed the user's bug, not just A bug.
+ - It catches "fixed it for case A but broke case B" regressions.
+
+ **Common pitfall:** Stopping at "tests pass" and skipping the user-facing verification. Tests are necessary, not sufficient.
+
+ **Done when:** You can confidently say "the original reporter would now see the expected behavior."
+ ## After all six steps
+
+ The fix is ready to commit and ship:
+
+ - Commit message: describes the bug, root cause, and fix in 1-3 sentences.
+ - The regression test goes in the same commit as the fix.
+ - If a deeper fix was deferred, the follow-up issue is linked from the commit.
+ - E2E verification evidence is in the PR description (screenshot, log, run output).
+
+ This is what "fixed" looks like. Anything less is "in progress."
package/scaffold/skills/dw-debug-protocol/references/stop-the-line.md
@@ -0,0 +1,52 @@
+ # Stop the line — when to drop everything for a bug
+
+ The Toyota factory metaphor: any worker can pull the cord and halt production when they see a defect. The reason isn't dramatic — it's economic. Defects compound downstream. Fixing one now is cheaper than fixing ten later, on top of all the work that built on the broken state.
+
+ In code, this means: when a real bug surfaces, you stop building features on top of code in an unknown state.
+ ## What "stop the line" actually requires
+
+ 1. **Don't keep coding around it.** If a test is red, you don't merge other PRs that touch the same module until it's green.
+ 2. **Don't ignore "flaky" tests.** Flakiness is a bug — usually a race, sometimes a fixture. "Re-run the suite" is not a fix.
+ 3. **Don't downgrade priority without evidence.** "It's an edge case" requires evidence that the case is rare AND the impact is low. Often only one of those is true.
+ 4. **Document the pause.** A bug is discovered → a tracking issue or task is created → a clear "what's blocked by this" note.
+ ## When the rule bends
+
+ Stop-the-line is not absolute. It bends when:
+
+ - **The bug is in a parallel system you don't own.** Report it; continue your work. (Don't sit blocked waiting on someone else.)
+ - **Production isn't on fire AND a release is shipping in <24h.** Sometimes "ship the working subset, fix the broken one in the next release" is correct. But this requires explicit acknowledgment, not silent slipping.
+ - **The bug is informational, not blocking.** A console warning that doesn't affect behavior, log noise, a deprecated-API notice — those go on a backlog; they don't stop the line.
+ - **Reproduction needs production data.** Continue feature work in parallel branches; don't merge them until the repro is achieved or the scope is isolated.
+ ## When the rule does NOT bend
+
+ - Tests are red and you don't know why → stop.
+ - A previously green build went red after a merge → stop and bisect.
+ - A user report can be reproduced → stop.
+ - "Build passes locally but fails in CI" → stop. CI is closer to the truth.
+ - Coverage drops below the threshold without obvious reason → stop. (Was a test silently removed?)
+ ## The cost of not stopping
+
+ Three failure modes when teams ignore stop-the-line:
+
+ 1. **Compounding defects.** Bug A masks bug B; you fix A and now B surfaces in production. The user reports it as a regression of A's fix.
+ 2. **Lost reproduction.** A bug seen in commit N is hard to reproduce by commit N+5 because the surrounding code moved. The fix becomes "rewrite this whole area" because nobody can isolate the original issue anymore.
+ 3. **Test debt.** Red tests get marked `.skip` "temporarily" — and never come back. Three months later half the suite is skipped and nobody trusts it.
+ ## How to communicate "I'm stopping the line"
+
+ - "Found a bug in X — pausing my current task to investigate. Will report status in 30 minutes."
+ - "I can reproduce the issue. Estimated fix path: Y. Do you want me to push through, or hand off?"
+ - "Stuck at localization for 60 minutes. Tried: A, B, C. Need fresh eyes."
+
+ Brief, factual, no apology. Stopping is the right call; you're naming it.
+ ## Anti-patterns to avoid
+
+ - "Quick fix and we'll come back to it" — you almost never come back to it. Either fix the root cause now or formally defer with a tracking issue.
+ - "It might be flaky" — investigate, don't speculate. Flakiness has a cause.
+ - Silently rolling back instead of investigating — you lose the chance to learn what broke.
+ - "Worked around it" with a `// TODO: actual fix` — the workaround is load-bearing in 3 weeks.
package/scaffold/skills/dw-execute-phase/SKILL.md
@@ -0,0 +1,133 @@
+ ---
+ name: dw-execute-phase
+ description: Phase execution and plan verification for dev-workflow. Two agents (executor for wave-based parallel task dispatch with deviation handling, plan-checker for goal-backward plan verification before execution). Used by /dw-execute-phase, /dw-plan-checker, /dw-run-plan, /dw-autopilot. Adapted from get-shit-done-cc (MIT).
+ allowed-tools:
+ - Read
+ - Write
+ - Edit
+ - Bash
+ - Grep
+ - Glob
+ ---
+
+ # dw-execute-phase
+
+ Bundled skill providing **phase-level execution discipline** for dev-workflow: parallel task dispatch in waves, atomic commit per task, deviation handling mid-execution, and goal-backward plan verification before any execution burns context.
+
+ ## Why a skill (not inline)
+
+ - The execution discipline (wave coordination, deviation rules, checkpoint protocol) is a separate concern from the commands that invoke it. Bundling it as a skill lets multiple commands (`/dw-run-plan`, `/dw-autopilot`, `/dw-execute-phase` itself) reuse the same discipline.
+ - The plan-checker is a verification GATE — it must run before `/dw-run-plan`/`/dw-execute-phase` mutate code, and bundling it makes that contract visible.
+ - The agents own the protocol; the orchestrating commands just wire them up.
+ ## When to Use
+
+ Read this skill when:
+
+ - `/dw-execute-phase` is invoked to run a batch of tasks in parallel waves.
+ - `/dw-plan-checker` is invoked to verify a `tasks.md` file will achieve its PRD goal before execution.
+ - `/dw-run-plan` is invoked (it spawns the executor agent for each wave).
+ - `/dw-autopilot` enters the execution stage (it gates on plan-checker before invoking the executor).
+
+ Do NOT use when:
+
+ - A single one-off change is being made (use `/dw-run-task` directly — no waves needed).
+ - The user is exploring/brainstorming, not executing (use `/dw-brainstorm`).
+ - The plan hasn't been created yet (use `/dw-create-tasks` first).
+ ## Agents
+
+ | Agent | Responsibility | Spawned from |
+ |-------|----------------|--------------|
+ | `agents/executor.md` | Runs tasks in waves, atomic commit per task, handles deviations (3 deviation rules), respects checkpoint markers, writes `SUMMARY.md` per phase | `/dw-execute-phase`, `/dw-run-plan` |
+ | `agents/plan-checker.md` | Goal-backward verification of `tasks.md` before execution. Checks: requirement coverage, task completeness, dependency soundness, artifact wiring, context budget. Returns PASS / REVISE / BLOCK. | `/dw-plan-checker`, `/dw-create-tasks` (auto-gate before declaring tasks ready) |
+ ## How the Two Agents Compose
+
+ The expected flow:
+
+ 1. `/dw-create-tasks` produces `.dw/spec/prd-<slug>/tasks.md` from PRD + TechSpec.
+ 2. **Plan-checker GATE** — `/dw-plan-checker .dw/spec/prd-<slug>/` spawns the plan-checker agent. The agent reads PRD/TechSpec/tasks.md and verifies tasks WILL achieve the goal. Returns one of: `PASS` (proceed), `REVISE` (issues found, planner re-runs), `BLOCK` (fundamental gap, abort).
+ 3. `/dw-execute-phase` spawns the executor agent ONLY if plan-checker returned `PASS`. The executor runs tasks in waves, commits atomically, handles deviations.
+ 4. `/dw-run-qa` runs after all waves complete to validate the implementation against PRD.
+
+ `/dw-autopilot` orchestrates this entire flow with hard gates between stages.
+ ## Wave Concept
+
+ Tasks in `tasks.md` are grouped into **waves** by their `Depends on:` frontmatter:
+
+ - Wave 1: tasks with no dependencies → run in parallel
+ - Wave 2: tasks that depend on Wave 1 → run after Wave 1 commits land
+ - Wave N: ...
+
+ Within a wave, tasks run in parallel via subagent dispatch. Across waves, sequential.
+
+ The executor calculates waves automatically by topologically sorting task dependencies. See `references/wave-coordination.md`.
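The wave computation can be sketched as Kahn-style leveling over the dependency graph. A sketch only, not the package's actual implementation; `computeWaves` and the task shape are illustrative:

```javascript
// Sketch of wave grouping: tasks with no unmet dependencies form wave 1;
// each later wave depends only on tasks from earlier waves. Illustrative,
// not the executor's actual code.
function computeWaves(tasks) {
  // tasks: [{ id, dependsOn: [id, ...] }, ...]
  const pending = new Map(tasks.map((t) => [t.id, new Set(t.dependsOn)]));
  const waves = [];
  const done = new Set();
  while (pending.size > 0) {
    const wave = [...pending.entries()]
      .filter(([, deps]) => [...deps].every((d) => done.has(d)))
      .map(([id]) => id);
    if (wave.length === 0) {
      throw new Error('Dependency cycle in tasks.md; the plan-checker should catch this');
    }
    for (const id of wave) {
      pending.delete(id);
      done.add(id);
    }
    waves.push(wave);
  }
  return waves; // waves[0] runs in parallel first, then waves[1], ...
}
```

A cycle in `Depends on:` makes a wave come up empty, which is one of the dependency-soundness conditions the plan-checker gate exists to reject before execution.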
+ ## Deviation Rules (during execution)
+
+ Mid-task, the executor may discover the plan is incomplete or contradicted by reality. Three rules:
+
+ 1. **Auto-add missing critical functionality** — if a task says "create endpoint" but no validation is specified and the project's CLAUDE.md/rules require it, add the validation as part of the same task. Note it in the task's commit message.
+ 2. **Surface ambiguity, don't guess** — if the plan says "use the existing service" and 3 services match, STOP, write a deviation entry to `.dw/spec/prd-<slug>/deviations.md`, and ask the user.
+ 3. **Block on architectural conflicts** — if the task as planned would violate a locked decision in CONTEXT.md / project rules, abort the task and surface the conflict for re-planning.
+
+ Detail is in `references/atomic-commits.md` (deviation entry format) and `references/plan-verification.md` (what the plan-checker should catch BEFORE execution to prevent deviations).
+ ## Atomic Commit Protocol
+
+ Every task ends with exactly one git commit:
+
+ ```
+ feat(<scope>): <task title> (RF-XX)
+
+ <one-line summary of what this commit delivers>
+
+ - Files added: <list>
+ - Files modified: <list>
+ - Tests added/updated: <list>
+ - Deviations: <link to deviations.md entry, if any>
+
+ Closes RF-XX (partial — see tasks.md).
+ ```
+
+ The commit message format is consistent across waves so `/dw-generate-pr` can build a clean PR body. See `references/atomic-commits.md`.
+ ## Checkpoint Protocol
+
+ If the executor exhausts its context budget mid-phase OR the user signals stop:
+
+ - Write current progress to `.dw/spec/prd-<slug>/active-session.md` (last completed task, next task, blockers).
+ - Exit cleanly with status `CHECKPOINT`.
+ - Next invocation of `/dw-resume` reads this file and resumes from the last completed task.
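A checkpoint file along these lines would satisfy that contract. A sketch: only the three fields named above are specified; the wave and status lines are illustrative additions:

```markdown
# Active session — prd-<slug>

- Last completed task: 03_task.md (committed: <sha>)
- Next task: 04_task.md
- Current wave: 2 of 3
- Blockers: none
- Status: CHECKPOINT
```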
+ ## Files in `.dw/spec/prd-<slug>/`
+
+ | File | Read by | Written by |
+ |------|---------|------------|
+ | `prd.md` | plan-checker, executor | `/dw-create-prd` |
+ | `techspec.md` | plan-checker, executor | `/dw-create-techspec` |
+ | `tasks.md` | plan-checker (verifies), executor (executes) | `/dw-create-tasks` |
+ | `<NN>_task.md` | executor (per-task detail) | `/dw-create-tasks` |
+ | `deviations.md` | plan-checker (next iteration), executor | executor (rule 1/2 deviations) |
+ | `active-session.md` | `/dw-resume`, executor (continuation) | executor (checkpoint) |
+ | `SUMMARY.md` | `/dw-generate-pr` | executor (after final wave) |
+ ## References
+
+ - `references/wave-coordination.md` — how the executor groups tasks into waves and dispatches them in parallel.
+ - `references/plan-verification.md` — the 6-dimension goal-backward analysis the plan-checker performs.
+ - `references/atomic-commits.md` — commit message format, deviation entry format, when to use Edit vs Write.
+ ## Rules
+
+ - **No execution without plan-checker PASS.** `/dw-execute-phase` and `/dw-run-plan` must call plan-checker first; if it returns REVISE or BLOCK, abort.
+ - **One commit per task, no exceptions.** Even trivial tasks commit. This drives traceability and revert safety.
+ - **Deviations are recorded, not silenced.** Every adjustment beyond the plan goes in `deviations.md` with reason.
+ - **Checkpoint > timeout.** When context budget is low, checkpoint cleanly rather than running tasks half-way.
+ - **Wave order is topological, not user-defined.** The executor computes wave boundaries from `Depends on:` fields; users can't override.
+ ## Inspired by
+
+ Adapted from [`get-shit-done-cc`](https://github.com/gsd-build/get-shit-done) (`gsd-executor`, `gsd-plan-checker`) by gsd-build (MIT license). Core protocols (goal-backward verification, atomic commits, deviation handling, checkpoint resume) preserved. Path conventions changed from `.planning/<phase>/` to `.dw/spec/prd-<slug>/`. SDK CLI calls (`gsd-sdk query init.execute-phase`) replaced by inline operations. The companion `gsd-debugger` agent (1452 lines) was NOT ported — its scope overlaps with the existing `/dw-bugfix` and `/dw-fix-qa` commands.