@gitwhy-cli/whyspec 0.1.17 → 0.1.19
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/adapters/claude-code.d.ts +5 -10
- package/dist/adapters/claude-code.d.ts.map +1 -1
- package/dist/adapters/claude-code.js +55 -115
- package/dist/adapters/claude-code.js.map +1 -1
- package/dist/adapters/codex.d.ts.map +1 -1
- package/dist/adapters/codex.js +12 -9
- package/dist/adapters/codex.js.map +1 -1
- package/dist/adapters/types.d.ts +3 -1
- package/dist/adapters/types.d.ts.map +1 -1
- package/dist/adapters/types.js +15 -13
- package/dist/adapters/types.js.map +1 -1
- package/dist/commands/init.d.ts.map +1 -1
- package/dist/commands/init.js +23 -29
- package/dist/commands/init.js.map +1 -1
- package/package.json +2 -2
- package/skill-sources/whyspec-capture/SKILL.md +241 -0
- package/skill-sources/whyspec-debug/SKILL.md +288 -0
- package/skill-sources/whyspec-execute/SKILL.md +210 -0
- package/skill-sources/whyspec-plan/SKILL.md +331 -0
- package/skill-sources/whyspec-search/SKILL.md +137 -0
- package/skill-sources/whyspec-show/SKILL.md +140 -0
- package/skills/whyspec-capture/SKILL.md +0 -154
- package/skills/whyspec-debug/SKILL.md +0 -404
- package/skills/whyspec-execute/SKILL.md +0 -118
- package/skills/whyspec-plan/SKILL.md +0 -170
- package/skills/whyspec-search/SKILL.md +0 -69
- package/skills/whyspec-show/SKILL.md +0 -90
package/skill-sources/whyspec-capture/SKILL.md
@@ -0,0 +1,241 @@
---
name: whyspec-capture
description: Use after coding to preserve reasoning — resolves the Decision Bridge with actual outcomes.
argument-hint: "[change-name]"
---

Capture reasoning — create a context file that resolves the Decision Bridge and preserves the full story.

View the complete story with `/whyspec:show`

---

**Input**: Optionally specify a change name. If omitted, auto-detect the most recently executed change.

## Iron Law

**CAPTURE REASONING, NOT SUMMARIES.** "We used Redis" is a summary. "We chose Redis over in-memory because the app runs on 3 instances and rate limits must be shared — in-memory would let users bypass limits by hitting different instances" is reasoning. Every decision needs the WHY.

## Red Flags — If You're Thinking This, STOP

- "The implementation was straightforward, not much to capture" → Every implementation has decisions. Find them.
- "I'll just summarize what files changed" → That's a git log, not reasoning. Capture WHY, not WHAT.
- "There were no surprises" → Look harder. Did you deviate from the plan at all? Change any approach? That's a surprise.
- "The decisions are obvious from the code" → They're obvious NOW. In 6 months, they won't be. Write the rationale.

## Steps

1. **Select the change**

If ARGUMENTS provides a name, use it. Otherwise:
- Auto-detect the most recently executed change (look for changes with completed tasks)
- If ambiguous, run `whyspec list --json` and let the user select

2. **Read plan files for Decision Bridge mapping**

Read these files from the change folder — **required** before generating context:
- `<path>/intent.md` — the stated intent, "Decisions to Make" checkboxes
- `<path>/design.md` — the approach, "Questions to Resolve" items

Extract and track:
- Every `- [ ]` or `- [x]` item under "Decisions to Make" → each MUST be resolved in the context
- Every item under "Questions to Resolve" → each MUST be answered
- The stated constraints and success criteria → compare against what actually happened

3. **Get capture data from CLI**

```bash
whyspec capture --json "<name>"
```

Parse the JSON response:
- `template`: Context file template
- `commits`: Commits associated with this change (auto-detected from git)
- `files_changed`: Files modified during implementation (auto-detected)
- `decisions_to_make`: Decision checkboxes extracted from plan files
- `change_name`: The change name for the header
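The checkbox tracking in step 2 can be sketched with standard tools. This is a minimal illustration, not part of the skill: the intent.md contents below are invented, and only the `- [ ]` / `- [x]` pattern is taken from the text above.

```shell
# Illustrative sketch only: the intent.md contents are hypothetical.
# Each "- [ ]" / "- [x]" line under "Decisions to Make" must end up
# resolved in the captured context.
cat > /tmp/intent.md <<'EOF'
## Decisions to Make
- [ ] Rate limit storage (Redis vs in-memory)
- [x] Limit granularity (per-IP vs per-token)
EOF
# List every decision checkbox, checked or not, so none is dropped:
grep -E '^- \[[ x]\]' /tmp/intent.md
```

In practice the CLI already returns this list as `decisions_to_make`; the sketch just shows what is being tracked.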
4. **Populate the Decision Bridge**

This is the core of the capture. Map every planned decision to its outcome:

a. **Decisions to Make → Decisions Made**: For EACH checkbox from intent.md, record:
- What was decided
- Why (the rationale — not just the choice, but the reasoning)
- Any constraints that influenced the decision

b. **Questions to Resolve → Answers**: For EACH question from design.md, record:
- The answer that emerged during implementation
- How it was determined

c. **Capture Surprises**: Identify decisions made during implementation that were NOT in the original plan:
- "What did we decide that we didn't plan to decide?"
- "What changed from the original design?"
- "What unexpected requirements emerged?"

If a planned decision was NOT made during implementation, note it as unresolved and ask the user.

<examples>
<good>
## Decisions Made

**Rate limit storage → Redis**
Chose Redis over in-memory. The app runs on 3 instances behind an ALB.
In-memory rate limiting would let users bypass limits by hitting different
instances. Redis adds ~2ms latency per check, but our p99 is already 180ms
so the overhead is negligible. Reused the existing ioredis client at
src/lib/redis.ts rather than adding a new dependency.

**Limit granularity → Both IP and token**
IP-only would block shared offices (NAT). Token-only would let unauthenticated
abuse through. Implemented tiered: 100/15min per IP for unauthenticated,
1000/15min per token for authenticated. The token tier uses the JWT sub claim.

**429 response → Standard with Retry-After**
Went with standard 429 + Retry-After header. Custom error body would require
updating all API clients. The Retry-After header is sufficient for automated
retry logic and doesn't break existing integrations.
Why good: Each decision has the WHAT (choice), WHY (rationale), and HOW
(specific implementation detail). References actual code and numbers.
</good>

<bad>
## Decisions Made
- Used Redis for rate limiting
- Implemented per-IP and per-token limits
- Returns 429 status code
Why bad: Just restates WHAT was done. No WHY. No trade-off reasoning.
A future developer learns nothing about why these choices were made.
</bad>

<good>
## Surprises (not in original plan)

**Added X-Request-ID middleware** — During implementation, discovered that
429 responses were impossible to debug without a request ID. Added
X-Request-ID header generation as a prerequisite in src/middleware/requestId.ts.
This wasn't in the plan but is essential for production debugging.
Follow-up: Should be extracted into its own WhySpec change if we add more observability.

**Changed Redis key schema** — Plan assumed simple key-value, but discovered
the sliding window algorithm needs sorted sets. Changed from `ratelimit:{ip}`
string keys to `ratelimit:{ip}` sorted sets with timestamp scores.
This affects the Redis memory profile — noted in Risks.
Why good: Documents unplanned decisions with full context. Notes follow-up items.
</good>

<bad>
(No surprises section)
Why bad: Every implementation has surprises. If you didn't document any,
you weren't paying attention.
</bad>
</examples>

5. **Generate ctx_<id>.md in SaaS XML format**

Write to `<path>/ctx_<id>.md` using the GitWhy SaaS format:

```xml
<context>
<title>Short title describing what was built and why</title>

<story>
Phase-organized engineering journal. First-person, chronological.
Capture the FULL reasoning — not a summary.

Phase 1 — [Setup/Context]:
What the user asked for, initial understanding, preparation work.

Phase 2 — [Implementation]:
What was built, key decision points encountered, problems solved.
Reference specific files and approaches.

Phase 3 — [Verification]:
How the work was verified, test results, manual checks.
</story>

<reasoning>
Why this approach was chosen over alternatives.

<decisions>
- [Planned decision] — [chosen option] — [rationale]
</decisions>

<rejected>
- [Alternative not chosen] — [why it was rejected]
</rejected>

<tradeoffs>
- [Trade-off accepted] — [what was gained vs lost]
</tradeoffs>

Surprises (decisions not in the original plan):
- [Unexpected decision] — [why it was needed]
</reasoning>

<files>
path/to/file.ts — new — Brief description
path/to/other.ts — modified — Brief description
</files>

<agent>claude-code (model-name)</agent>
<tags>comma, separated, domain, keywords</tags>
<verification>Test results and build status</verification>
<risks>Open questions, follow-up items, known limitations</risks>
</context>
```

6. **Show summary**

```
## Reasoning Captured: <name>

Context: ctx_<id>.md

Decision Bridge:
Planned decisions resolved: N/N
Questions answered: N/N
Surprises captured: N

Files documented: N
Commits linked: N

View the full story: /whyspec:show <name>
```

## Tools

| Tool | When to use | When NOT to use |
|------|------------|-----------------|
| **Read** | Read intent.md and design.md for Decision Bridge mapping (REQUIRED first step) | Don't skip reading plan files |
| **Bash** | Run `whyspec capture --json`, `git log --oneline` to find commits, `git diff` to review changes | Don't modify code during capture |
| **Write** | Create the ctx_<id>.md context file | Don't overwrite existing context files |
| **Grep** | Search for decisions referenced in plan files (verify they were implemented) | Don't search the entire codebase |
| **AskUserQuestion** | When a planned decision was NOT made during implementation — ask the user to resolve | Don't ask about decisions that are clearly resolved in the code |

### AskUserQuestion Format (for unresolved decisions only)

1. **Re-ground**: "Capturing reasoning for **<change>**"
2. **The gap**: Which planned decision wasn't resolved
3. **What you found**: Evidence from the code about what actually happened
4. **Ask for rationale**: "What drove this choice?"

## Rationalization Table

| If you catch yourself thinking... | Reality |
|----------------------------------|---------|
| "This decision is obvious, no need to explain it" | It's obvious now. In 6 months with new team members, it won't be. |
| "The code is self-documenting" | Code shows WHAT. Context captures WHY. They're complementary. |
| "There were no surprises during implementation" | You changed zero things from the plan? Really? Look again. |
| "I'll just list the files that changed" | That's `git diff --stat`. The capture's value is reasoning, not file lists. |
| "The rationale is already in the commit messages" | Commit messages are 1-2 lines. Reasoning is paragraphs. Different depth. |

## Guardrails

- **Must read plan files FIRST** — never generate context without reading intent.md and design.md. The Decision Bridge requires mapping FROM plan TO outcome.
- **Every planned decision must be resolved** — if intent.md lists 5 "Decisions to Make", all 5 must appear in the context. Prompt the user for any that weren't addressed.
- **Never skip surprises** — unplanned decisions are the most valuable context. Actively search for them.
- **Use SaaS XML format exactly** — the `<context>` tags must match the GitWhy format so `git why log` and `git why push` work without conversion.
- **Include verification results** — what tests pass, what was manually verified. Evidence, not claims.
- **Don't fabricate rationale** — if you don't know why a decision was made, ask the user. Invented reasoning is worse than no reasoning.
- **One context per capture** — each `/whyspec:capture` invocation creates exactly one `ctx_<id>.md` file.
package/skill-sources/whyspec-debug/SKILL.md
@@ -0,0 +1,288 @@
---
name: whyspec-debug
description: Use when encountering any bug, test failure, or unexpected behavior — before proposing fixes.
argument-hint: "<bug-description-or-change-name>"
---

# WhySpec Debug — Scientific Investigation

Debug systematically. No fix without root cause.

The investigation is automatically saved as a context file when resolved.

---

**Input**: A bug description, error message, or change name for an existing debug session.

## Iron Law

**NO FIX WITHOUT VERIFIED ROOT CAUSE.** Guessing at fixes creates more bugs than it solves. A wrong diagnosis leads to a wrong fix that masks the real problem.

## Red Flags — If You're Thinking This, STOP

- "The fix is obvious, I'll just apply it" → If it's obvious, verification takes 10 seconds. Do it.
- "I'll try this fix and see if it works" → That's guess-and-check, not debugging. Form a hypothesis first.
- "This is too simple to need the full process" → Simple bugs have root causes too. The process is fast for simple bugs.
- "I already know what's wrong from the stack trace" → The stack trace shows WHERE, not WHY. Investigate the WHY.
- "Let me just add some logging and see" → Logging is a test for a hypothesis. What's your hypothesis?

## Rationalization Table

| If you catch yourself thinking... | Reality |
|----------------------------------|---------|
| "The fix is obvious, skip investigation" | Obvious fixes have root causes too. Verify in 30 seconds. |
| "It's just a typo/config issue" | Confirm it. Read the code. Don't assume. |
| "I'll just try this quick fix first" | The first fix sets the pattern. Do it right from the start. |
| "No time for full investigation" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "I already know what's wrong" | Then verification should take 10 seconds. Do it anyway. |
| "It works on my machine" | That's a symptom, not a diagnosis. Find the environmental difference. |
| "The error message tells me exactly what's wrong" | Error messages describe symptoms. Root causes are upstream. |
| "Let me just revert the last change" | Revert is a workaround, not a fix. Why did the change break things? |

## Tools

| Tool | When to use | When NOT to use |
|------|------------|-----------------|
| **Grep** | Search for error messages, function names, patterns in code | Don't grep before forming hypotheses — symptoms first |
| **Read** | Read suspect files, stack trace locations, config files | Don't read unrelated files — stay focused on hypotheses |
| **Bash** | Run tests to reproduce, `git log`/`git diff` to find trigger commits, execute test commands for hypotheses | Don't run destructive commands or modify production data |
| **Glob** | Find files by pattern when error references unknown paths | Don't glob the entire repo — scope to suspect areas |
| **Write** | Write/update debug.md (investigation state) and ctx_<id>.md (final capture) | Don't write fix code until root cause is verified |
| **WebSearch** | ONLY for: error messages with no codebase matches, library changelogs, CVE lookups | Never search web for: "how to debug X", generic solutions, or before investigating the codebase |
| **AskUserQuestion** | Escalation ONLY — after 2 rounds of failed hypotheses, or when root cause is outside codebase | Don't ask before investigating. The codebase has the answers. |

### Codebase First, Web Never-First

Read the codebase BEFORE considering web search. Web search is justified ONLY when:
- Error message has zero matches in the codebase or git history
- Library version changelog needed (breaking changes between versions)
- Security advisory lookup (CVEs)
- Stack trace references internal framework code you can't read locally

Never search the web for: "how to fix X", architecture decisions, or generic debugging advice.

## Step 0: Team Knowledge Search

Before investigating, check if someone has reasoned about this domain before:

```bash
whyspec search --json "<keywords from bug description>"
```

If results exist:
- Display relevant titles and key decisions from past investigations
- Note past decisions that might inform the current bug (e.g., "The auth middleware was deliberately changed in add-auth — this might explain the current session bug")

If no results: note "No prior context found" and continue.

This takes seconds. It prevents re-investigating solved problems.
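The Step 0 fallback can be sketched with the CLI call mocked out. This is a minimal sketch under assumptions: the real command is `whyspec search --json "<keywords>"`, and the `results` field name here is invented for illustration, not the documented response schema.

```shell
# Mocked response standing in for: whyspec search --json "<keywords>"
# The "results" key is an assumption for this sketch.
response='{"results": []}'
case "$response" in
  *'"results": []'*) msg="No prior context found" ;;
  *)                 msg="Prior context exists; review before investigating" ;;
esac
echo "$msg"
```

Either way the investigation continues; the branch only decides whether past contexts get surfaced first.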
## Step 1: Symptoms Gathering

Create the debug session:

```bash
whyspec debug --json "<bug-name>"
```

Parse the JSON response:
- `path`: Debug session directory
- `template`: debug.md template structure
- `related_contexts`: Past contexts in the same domain

**Gather symptoms** — investigate the codebase directly. Only ask the user if you genuinely can't find the information yourself:

| Symptom | What to capture |
|---------|----------------|
| Expected behavior | What SHOULD happen |
| Actual behavior | What ACTUALLY happens |
| Error messages | Exact text, stack traces, error codes |
| Reproduction steps | Minimal sequence to trigger the bug |
| Timeline | When it started, what changed recently |
| Scope | Who is affected, how often, which environments |

<examples>
<good>
## Symptoms
**Expected:** POST /api/users returns 201 with user object
**Actual:** Returns 500 with "Cannot read properties of undefined (reading 'email')"
**Error:** TypeError at src/handlers/users.ts:47 — `req.body.email` is undefined
**Stack trace:**
```
TypeError: Cannot read properties of undefined (reading 'email')
    at createUser (src/handlers/users.ts:47:28)
    at Layer.handle (node_modules/express/lib/router/layer.js:95:5)
```
**Reproduction:** `curl -X POST localhost:3000/api/users -H "Content-Type: application/json" -d '{"name":"test"}'`
**Timeline:** Started after commit a1b2c3d (merged express-validator upgrade, Apr 8)
**Scope:** All POST endpoints with body parsing, not just /users. GET endpoints unaffected.
Why good: Exact error with file:line, exact reproduction command,
identified the trigger commit, and noticed the scope is broader than reported.
</good>

<bad>
## Symptoms
**Expected:** API should work
**Actual:** Getting 500 errors
**Error:** Server error
Why bad: Vague symptoms lead to vague hypotheses. No file, no line,
no reproduction steps, no timeline.
</bad>
</examples>

**Write debug.md immediately** to `<path>/debug.md`. It persists across context resets.

## Step 2: Hypothesis Formation

Form **3 or more falsifiable hypotheses**:

<examples>
<good>
### H1: express-validator upgrade broke body parsing middleware order
- **Test:** `git diff a1b2c3d -- src/app.ts` — check if middleware registration order changed
- **Disproof:** If middleware order is identical pre/post upgrade, this is wrong
- **Status:** UNTESTED
- **Likelihood:** HIGH (scope matches — all POST endpoints affected, timing matches upgrade)

### H2: express-validator v7 changed req.body population timing
- **Test:** Add `console.log(req.body)` before and after validation middleware, compare output
- **Disproof:** If req.body is populated before validation in both versions, this is wrong
- **Status:** UNTESTED
- **Likelihood:** MEDIUM (v7 changelog mentions "async validation" changes)

### H3: Content-Type header handling changed in the upgrade
- **Test:** Send request with `Content-Type: application/json; charset=utf-8` — does it parse?
- **Disproof:** If extended Content-Type works the same in both versions, this is wrong
- **Status:** UNTESTED
- **Likelihood:** LOW (but worth testing — charset handling has caused issues before)
Why good: Each hypothesis is specific, testable, and falsifiable.
They target different root causes. Likelihood is justified with evidence.
</good>

<bad>
### H1: Something is wrong with the API
- **Test:** Check the API
- **Disproof:** If the API works
### H2: Maybe a dependency issue
- **Test:** Check dependencies
Why bad: Not specific enough to test. "Check the API" is not a concrete action.
</bad>
</examples>

Rank by likelihood. Test the most likely first. Update debug.md before proceeding.

## Step 3: Hypothesis Testing

Test each hypothesis **one at a time, sequentially**:

1. Execute the test described in the hypothesis
2. Record evidence — exact output, logs, observed behavior
3. Evaluate — support, refute, or inconclusive?
4. Update status: `CONFIRMED`, `DISPROVED`, or `INCONCLUSIVE`
5. Update debug.md immediately

**Rules:**
- **One hypothesis at a time** — never test multiple simultaneously
- **Max 3 tests per hypothesis** — if inconclusive after 3, mark INCONCLUSIVE and move on
- **Preserve the crime scene** — record current state before modifying suspect code
- **Update debug.md after each test** — don't batch

If ALL hypotheses are disproved:
- Form new hypotheses based on what evidence revealed
- If stuck after a second round, escalate to the user
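The update-after-each-test rule can be sketched as a scratch-file workflow. This is an illustration only: the hypothesis text is invented, the statuses mirror the CONFIRMED/DISPROVED/INCONCLUSIVE vocabulary above, and `sed -i` assumes GNU sed.

```shell
# Sketch: record a hypothesis in debug.md, then flip its status
# immediately after running its test (never batch updates).
DEBUG_MD=/tmp/debug.md
cat > "$DEBUG_MD" <<'EOF'
### H1: middleware registration order changed in the upgrade
- Status: UNTESTED
EOF
# ...run the hypothesis test here, then record the verdict right away:
sed -i 's/Status: UNTESTED/Status: DISPROVED/' "$DEBUG_MD"
grep 'Status:' "$DEBUG_MD"
```

Because the verdict is written before moving to the next hypothesis, a context reset mid-investigation loses at most the test currently in flight.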
## Step 4: Root Cause Verification

Before proposing ANY fix:

1. **State the root cause** clearly and specifically
2. **Explain the causal chain**: [trigger] → [mechanism] → [symptom]
3. **Verify predictive power**: can you predict the symptom from the cause?

```markdown
## Root Cause
**Cause:** express-validator v7 switched to async validation, body parsing
now completes AFTER route handler starts executing
**Causal chain:** express-validator upgrade → async body parsing → req.body
undefined when handler reads it synchronously → TypeError
**Verified by:** Adding `await` before validation resolved the issue in test
**Confidence:** HIGH
```

| Confidence | Criteria | Action |
|-----------|----------|--------|
| HIGH | Reliable reproduction, clear causal chain | Proceed to fix |
| MEDIUM | Strong evidence, some uncertainty | Proceed with caution, note risks |
| LOW | Circumstantial evidence | **Escalate — do NOT fix** |

## Step 5: Fix + Auto-Capture

Once root cause is verified (HIGH or MEDIUM confidence):

1. **Implement the minimal fix** — fix the bug, don't refactor surrounding code
2. **Verify the fix** — run reproduction steps again, run test suite
3. **Update debug.md** — add fix details and prevention measures:
```markdown
## Fix
**Change:** Added `await` before express-validator `validationResult()` calls
**Files:** src/middleware/validate.ts (3 lines changed)
**Verification:** `npm test` — 47 pass, 0 fail. Manual curl test returns 201.

## Prevention
- Added eslint rule for async validation middleware
- Added integration test: POST with body → verify req.body populated
- Updated UPGRADE.md with express-validator v7 migration notes
```
4. **Commit** atomically with root cause in message
5. **Auto-capture** reasoning:
```bash
whyspec capture --json "<bug-name>"
```
Write `<path>/ctx_<id>.md` with the full investigation story.

6. **Show summary:**
```
## Debug Complete: <bug-name>

Root cause: [one-line summary]
Fix: [what was changed]
Context: ctx_<id>.md

Investigation:
Hypotheses tested: N (M confirmed, P disproved)
Evidence entries: N
Past contexts referenced: N

View full investigation: /whyspec:show <bug-name>
```

## Resuming an Investigation

If debug.md already exists for a change:
1. Read debug.md
2. Check Status and resume from the appropriate step
3. Announce: "Resuming debug session: <name> — Status: <status>"

## Escalation Rules

| Trigger | Action |
|---------|--------|
| All hypotheses disproved (2 rounds) | Present full evidence, ask for new direction |
| Cannot reproduce | Document symptoms, ask for environment details |
| Root cause outside codebase | Document findings, suggest infrastructure investigation |
| Confidence is LOW | Present evidence, explain uncertainty, do NOT fix |
| Fix introduces significant risk | Present fix + risk assessment, ask for approval |
| 3 failed fix attempts | Stop. Present what was tried. Ask for help. |

**Never silently give up.** If stuck, present evidence and ask.

## Guardrails

- **No fix without root cause** — the Iron Law is non-negotiable
- **Max 3 tests per hypothesis** — escalate if inconclusive
- **Always capture reasoning** — every debug session produces both debug.md AND ctx_<id>.md
- **Write debug.md incrementally** — update after EVERY step, not at the end
- **Don't skip team knowledge** — always run Step 0
- **Test one hypothesis at a time** — sequential testing produces clean evidence
- **Preserve evidence** — record state before modifying suspect code
- **Minimal fixes only** — fix the bug, don't refactor