npm - warp-os - Versions diffs - 1.1.2 → 1.2.1 - Mend

warp-os 1.1.2 → 1.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (45)

package/CHANGELOG.md +85 -0
package/README.md +6 -4
package/VERSION +1 -1
package/agents/warp-annotate.md +394 -0
package/agents/warp-browse.md +9 -1
package/agents/warp-build-code.md +9 -1
package/agents/warp-orchestrator.md +10 -1
package/agents/warp-plan-architect.md +120 -1
package/agents/warp-plan-brainstorm.md +93 -2
package/agents/warp-plan-design.md +97 -4
package/agents/warp-plan-onboarding.md +9 -1
package/agents/warp-plan-optimize.md +9 -1
package/agents/warp-plan-scope.md +67 -1
package/agents/warp-plan-security.md +576 -35
package/agents/warp-plan-testdesign.md +9 -1
package/agents/warp-qa-debug.md +117 -1
package/agents/warp-qa-test.md +167 -1
package/agents/warp-release-update.md +290 -4
package/agents/warp-setup.md +9 -1
package/agents/warp-upgrade.md +21 -4
package/bin/hooks/CLAUDE.md +24 -0
package/bin/hooks/_warp_json.sh +4 -2
package/bin/hooks/identity-briefing.sh +20 -13
package/bin/hooks/validate-askuser.sh +41 -0
package/bin/migrate-sessions.js +284 -173
package/dist/warp-annotate/SKILL.md +404 -0
package/dist/warp-browse/SKILL.md +9 -1
package/dist/warp-build-code/SKILL.md +9 -1
package/dist/warp-orchestrator/SKILL.md +10 -1
package/dist/warp-plan-architect/SKILL.md +120 -1
package/dist/warp-plan-brainstorm/SKILL.md +93 -2
package/dist/warp-plan-design/SKILL.md +97 -4
package/dist/warp-plan-onboarding/SKILL.md +9 -1
package/dist/warp-plan-optimize/SKILL.md +9 -1
package/dist/warp-plan-scope/SKILL.md +67 -1
package/dist/warp-plan-security/SKILL.md +578 -35
package/dist/warp-plan-testdesign/SKILL.md +9 -1
package/dist/warp-qa-debug/SKILL.md +117 -1
package/dist/warp-qa-test/SKILL.md +167 -1
package/dist/warp-release-update/SKILL.md +290 -4
package/dist/warp-setup/SKILL.md +9 -1
package/dist/warp-upgrade/SKILL.md +21 -4
package/package.json +2 -2
package/shared/project-hooks.json +7 -0
package/shared/tier1-engineering-constitution.md +9 -1

package/agents/warp-plan-testdesign.md CHANGED

@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 ## AskUserQuestion
+**Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
 **Contract:**
 1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
 2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
 Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
+**Pre-call checklist (verify before every AskUserQuestion invocation):**
+- ☐ Completeness scores in every option label
+- ☐ Recommended option listed first
+- ☐ One decision per question (split if multiple)
+- ☐ Analysis/reasoning already presented in message text above
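Two items on the pre-call checklist are mechanical and can be linted before the tool call: scores in every label, and the recommended option listed first. This is a minimal TypeScript sketch, not part of warp-os. The `AskOption` shape and its `recommended` flag are hypothetical, and the other two items (one decision per question, analysis already presented) still need judgment.

```typescript
// Hypothetical shape for a drafted option; warp-os defines no such type.
interface AskOption {
  label: string;
  recommended?: boolean;
}

// Documented label format: "Option name — X/10 🟢" (or 🟡 / 🔴).
const LABEL_RE = /^(.+) — (\d{1,2})\/10 (🟢|🟡|🔴)$/;

// Rating bands from the spec: 🟢 9-10, 🟡 6-8, 🔴 1-5.
function expectedBadge(score: number): string {
  if (score >= 9) return "🟢";
  if (score >= 6) return "🟡";
  return "🔴";
}

// Returns checklist problems; an empty list means both mechanical checks pass.
function checkOptions(options: AskOption[]): string[] {
  const problems: string[] = [];
  options.forEach((opt, i) => {
    const m = LABEL_RE.exec(opt.label);
    if (m === null) {
      problems.push(`option ${i + 1}: no "X/10" score in label`);
      return;
    }
    const score = Number(m[2]);
    if (score < 1 || score > 10) {
      problems.push(`option ${i + 1}: score ${score} is outside 1-10`);
    } else if (m[3] !== expectedBadge(score)) {
      problems.push(`option ${i + 1}: badge does not match ${score}/10`);
    }
  });
  // A flagged recommendation anywhere other than index 0 fails the check.
  const rec = options.findIndex((o) => o.recommended === true);
  if (rec > 0) problems.push("recommended option is not listed first");
  return problems;
}
```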
 **Formatting:**
 - *Italics* for emphasis, not **bold** (bold for headers only).
-- After each answer: `✔ Decision {N} recorded [quicksave updated]`
+- After each answer: `✔ Decision {N} recorded`
 - Previews under 8 lines. Full mockups go in conversation text before the question.
 ---

package/agents/warp-qa-debug.md CHANGED

@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 ## AskUserQuestion
+**Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
 **Contract:**
 1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
 2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
 Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
+**Pre-call checklist (verify before every AskUserQuestion invocation):**
+- ☐ Completeness scores in every option label
+- ☐ Recommended option listed first
+- ☐ One decision per question (split if multiple)
+- ☐ Analysis/reasoning already presented in message text above
 **Formatting:**
 - *Italics* for emphasis, not **bold** (bold for headers only).
-- After each answer: `✔ Decision {N} recorded [quicksave updated]`
+- After each answer: `✔ Decision {N} recorded`
 - Previews under 8 lines. Full mockups go in conversation text before the question.
 ---
@@ -416,6 +424,36 @@ Do NOT form Hypothesis 4. Instead:
 If after two rounds of 3-strike recovery you still cannot find the root cause, surface the investigation to the user. Show your evidence log. Show your falsified hypotheses. Ask for additional context. The user may know something about the system that is not visible in the code.
+### 2D. WebSearch for Error Patterns
+When hypotheses from code analysis are exhausted (3-strike rule triggered), search the web for the error pattern before escalating to the user. This often reveals known framework bugs, version-specific issues, or configuration problems that are not visible in the code.
+**Sanitization protocol — MANDATORY before any web search:**
+Strip all of the following from the search query:
+- Hostnames, IP addresses, and port numbers
+- Full file paths (use only the filename, not the path)
+- SQL queries or database schema details
+- Customer data, user IDs, or account identifiers
+- API keys, tokens, secrets, or credentials
+- Internal service names (replace with generic terms like "microservice" or "API")
+**Search strategy:**
+```
+Search query format: "[error message pattern]" + [technology/framework name] + [version if known]
+Examples:
+  BAD:  "ECONNREFUSED 10.0.1.42:5432 in /home/deploy/myapp/src/db/pool.ts"
+  GOOD: "ECONNREFUSED" postgres connection pool node.js
+  BAD:  "TypeError: Cannot read property 'userId' of undefined at /srv/api/auth/session.ts:47"
+  GOOD: "TypeError: Cannot read property of undefined" session restore supabase auth
+```
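The 2D sanitization protocol lists categories rather than an algorithm, so any automation is a heuristic. The sketch below uses a hypothetical `sanitizeQuery` helper and regex patterns chosen for illustration. It reproduces the path, IP, and port stripping from the BAD/GOOD examples. Treat it as a first pass before a human reads the query, not as a guarantee that nothing sensitive remains.

```typescript
// Heuristic sketch of the 2D sanitization protocol. Patterns are assumptions,
// not an exhaustive filter; review the final query by eye before searching.
function sanitizeQuery(raw: string, internalNames: string[] = []): string {
  let q = raw;
  // Credentials: key=value pairs and a few common key prefixes (assumed list).
  q = q.replace(/\b(?:api[_-]?key|token|secret|password)\s*[=:]\s*\S+/gi, " ");
  q = q.replace(/\b(?:sk|pk|ghp|gho|xox[abp])[-_][A-Za-z0-9_-]{8,}/g, " ");
  // URLs carry hostnames and paths; drop them entirely.
  q = q.replace(/https?:\/\/\S+/g, " ");
  // Full file paths (Unix, Windows, relative) -> bare filename; drops ":line:col".
  q = q.replace(/(?:[A-Za-z]:|[\w.-]+)?(?:[\\/][\w.-]+)+[\\/]([\w.-]+)(?::\d+)*/g, "$1");
  // IPv4 addresses with an optional port.
  q = q.replace(/\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b/g, " ");
  // host:port pairs. Bare hostnames without a port are NOT caught: a pattern
  // broad enough for them would also remove names like "node.js".
  q = q.replace(/\b(?:localhost|[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+):\d{2,5}\b/g, " ");
  // Internal service names -> generic term.
  for (const name of internalNames) {
    if (name.length > 0) q = q.split(name).join("microservice");
  }
  // Collapse the gaps left by removals.
  return q.replace(/\s+/g, " ").trim();
}
```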
+**Using search results:**
+- If a known issue or workaround is found, use it to form a new hypothesis (label it clearly as "from web search" in your hypothesis log).
+- If the search reveals a framework bug with a specific version, check whether the project uses that version.
+- Do not blindly apply Stack Overflow fixes. Treat web results as hypothesis fuel, not as solutions. Every web-sourced hypothesis still goes through the standard test-and-falsify process.
 ---
 ## PHASE 3: Isolate
@@ -479,6 +517,16 @@ ROOT CAUSE:
 - If the fix requires refactoring an adjacent function: **do not refactor it**. Fix only what is broken. Schedule the refactor separately.
 - Scope lock is not about being timid. It is about keeping the fix reviewable, git blame meaningful, and the regression test focused.
+**Scope lock enforcement:**
+1. **Announce the scope lock boundary explicitly:**
+   ```
+   SCOPE LOCK ACTIVE:
+     Allowed: [file path(s) that may be edited]
+     Blocked: everything else
+   ```
+2. **Before every edit**, verify the target file is within the declared scope. If you catch yourself about to edit a file outside scope, stop, note the issue in the "Scope notes" section of the debug report, and do not make the edit.
+3. **After committing**, run `git diff HEAD~1 --name-only` and verify every changed file is within the declared scope boundary. If any file is outside scope, the commit is a scope lock violation — revert the out-of-scope changes and recommit.
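Step 3 of the scope lock enforcement is a set difference between the declared Allowed list and the files changed by the commit. A minimal sketch, assuming the caller has already captured the `git diff HEAD~1 --name-only` output as a string; `scopeViolations` is a hypothetical name.

```typescript
// Sketch: files changed in the last commit that are outside the SCOPE LOCK
// "Allowed" list. Any non-empty result is a scope lock violation.
function scopeViolations(allowed: string[], diffNameOnly: string): string[] {
  const allow = new Set(allowed.map(normalize));
  return diffNameOnly
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map(normalize)
    .filter((file) => !allow.has(file));
}

// git prints forward-slash paths relative to the repo root; normalize
// Windows separators and a leading "./" so both spellings compare equal.
function normalize(p: string): string {
  return p.replace(/\\/g, "/").replace(/^\.\//, "");
}
```

Exact path matching is deliberate, since the lock declares file paths rather than directories. If the regression test file is committed alongside the fix, it presumably needs to appear in the Allowed list as well, or this check will flag it.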
 ### 3C. Minimal Reproduction Case
 Before writing the fix, create the minimal reproduction as a failing test:
@@ -525,6 +573,26 @@ You know the exact root cause. Write the minimum change that addresses it.
 | Missing error propagation | Propagate the error; add a test for the error path |
 | Wrong data shape assumed | Fix the assumption; add a type or schema check |
+### 4A-Gate. 5-File Blast Radius Check
+Before applying the fix, count the number of files the proposed change will touch. If the fix touches **more than 5 files**, this is a blast radius warning — the fix may be too broad for a scoped debug fix.
+Present via AskUserQuestion:
+> This fix touches {N} files, which exceeds the 5-file blast radius threshold. Broad fixes are harder to review, more likely to introduce regressions, and may indicate the root cause is not fully isolated.
+>
+> Files affected:
+> {list of files}
+>
+> - **A) Proceed** — the fix genuinely needs all these files (e.g., renaming a widely-used function, fixing a type that propagates)
+> - **B) Split** — break the fix into smaller, independently committable changes that can each be verified
+> - **C) Rethink** — the fix is too broad, go back to Phase 2 (Hypothesize) and consider whether the root cause is actually narrower than assumed
+If B: plan the split before writing any code. Each sub-fix should have its own regression test and commit.
+If C: return to Phase 2 with the observation that "fix breadth suggests the root cause may be a symptom of a deeper issue."
+This gate does NOT apply to test files — only production code files count toward the 5-file threshold. A fix that changes 2 production files and 4 test files is fine.
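The 4A-Gate counts production files only, but the skill does not define what counts as a test file. The sketch below assumes common conventions (`__tests__/` and `test/` or `tests/` directories, `*.test.ts` and `*.spec.tsx` style names, pytest's `test_*.py`, Go's `*_test.go`). Adjust these patterns per project.

```typescript
// Assumed test-file conventions; not defined by warp-os.
const TEST_PATTERNS: RegExp[] = [
  /(^|\/)(__tests__|tests?)\//, // files under test directories
  /\.(test|spec)\.[cm]?[jt]sx?$/, // foo.test.ts, foo.spec.jsx, foo.test.mjs
  /(^|\/)test_[^/]+\.py$/, // pytest-style test_foo.py
  /_test\.(py|go)$/, // foo_test.py, foo_test.go
];

function isTestFile(path: string): boolean {
  return TEST_PATTERNS.some((re) => re.test(path));
}

const BLAST_RADIUS_THRESHOLD = 5;

// Only production files count toward the threshold; "more than 5" trips it.
function blastRadius(files: string[]): { production: string[]; exceeds: boolean } {
  const production = files.filter((f) => !isTestFile(f));
  return { production, exceeds: production.length > BLAST_RADIUS_THRESHOLD };
}
```

With two production files and four test files, `exceeds` is `false`, which matches the example given for the gate.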
 ### 4B. Regression Test Passes
 ```bash
@@ -588,6 +656,54 @@ Verification:
   - [ ] Fix does not change behavior for previously working cases
 Scope notes: [any adjacent issues noted but NOT fixed — filed separately]
+Status: [DONE | DONE_WITH_CONCERNS | BLOCKED]
+```
+### 4D-Status. Completion Status Taxonomy
+Every debug session ends with a formal completion status. This status goes in the debug report AND is announced to the user as the final output.
+| Status | Meaning | Required Information |
+|--------|---------|---------------------|
+| **DONE** | Root cause found, fix applied, regression test passes, full suite green. | Root cause statement, commit hash, regression test name. |
+| **DONE_WITH_CONCERNS** | Root cause found and fixed, but secondary issues were observed during investigation that may need attention. | Root cause statement, commit hash, regression test name, plus a numbered list of concerns with severity estimates. |
+| **BLOCKED** | Cannot determine root cause with available information. Investigation halted. | List of hypotheses tested and falsified, evidence collected, specific information or access needed to continue. |
+**DONE example:**
+```
+STATUS: DONE
+  Root cause: Missing `session` dependency in useEffect array (useTrips.ts:31)
+  Fix: commit abc1234
+  Regression test: "refetches trips when session changes" in useTrips.test.ts
+```
+**DONE_WITH_CONCERNS example:**
+```
+STATUS: DONE_WITH_CONCERNS
+  Root cause: Missing `session` dependency in useEffect array (useTrips.ts:31)
+  Fix: commit abc1234
+  Regression test: "refetches trips when session changes" in useTrips.test.ts
+  Concerns:
+    1. useFlights.ts has the same missing dependency pattern (medium risk)
+    2. Session restore takes 200ms+ which causes a visible flash of empty state (low, UX)
+    3. No error boundary around the trips fetch — a network failure would crash the screen (high)
+```
+**BLOCKED example:**
+```
+STATUS: BLOCKED
+  Hypotheses tested:
+    1. Race condition in auth flow → falsified (timing is correct per logs)
+    2. RLS policy misconfigured → falsified (direct query returns correct data)
+    3. Cache serving stale data → falsified (cache is empty on first launch)
+  Evidence collected:
+    - Logs show fetch completes with 0 rows before session restore
+    - Database contains correct data for the user
+    - Issue only occurs on first launch, never on subsequent opens
+  Needed to continue:
+    - Access to Supabase real-time subscription logs (not available locally)
+    - Reproduction on a physical device (currently only reproducible in simulator)
 ```
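Each status in the 4D-Status table requires a different set of fields, so the taxonomy maps naturally onto a discriminated union. This TypeScript sketch uses illustrative names (`DebugStatus`, `renderStatus`) that are not part of warp-os. It shows how the compiler can reject a DONE report that lacks a commit hash, and it renders the same layout as the examples.

```typescript
// Illustrative types for the 4D-Status taxonomy; names are not part of warp-os.
interface Concern {
  text: string;
  severity: string; // free text in the examples: "medium risk", "low, UX", "high"
}

type DebugStatus =
  | { status: "DONE"; rootCause: string; commit: string; regressionTest: string }
  | {
      status: "DONE_WITH_CONCERNS";
      rootCause: string;
      commit: string;
      regressionTest: string;
      concerns: Concern[];
    }
  | { status: "BLOCKED"; hypotheses: string[]; evidence: string[]; needed: string[] };

// Renders a status in the same layout as the examples above.
function renderStatus(s: DebugStatus): string {
  const lines: string[] = [`STATUS: ${s.status}`];
  if (s.status === "BLOCKED") {
    lines.push("  Hypotheses tested:");
    s.hypotheses.forEach((h, i) => lines.push(`    ${i + 1}. ${h}`));
    lines.push("  Evidence collected:");
    s.evidence.forEach((e) => lines.push(`    - ${e}`));
    lines.push("  Needed to continue:");
    s.needed.forEach((n) => lines.push(`    - ${n}`));
    return lines.join("\n");
  }
  // DONE and DONE_WITH_CONCERNS share the first three fields.
  lines.push(`  Root cause: ${s.rootCause}`);
  lines.push(`  Fix: commit ${s.commit}`);
  lines.push(`  Regression test: ${s.regressionTest}`);
  if (s.status === "DONE_WITH_CONCERNS") {
    lines.push("  Concerns:");
    s.concerns.forEach((c, i) => lines.push(`    ${i + 1}. ${c.text} (${c.severity})`));
  }
  return lines.join("\n");
}
```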
 ### 4E. Commit

package/agents/warp-qa-test.md CHANGED

@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 ## AskUserQuestion
+**Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
 **Contract:**
 1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
 2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
 Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
+**Pre-call checklist (verify before every AskUserQuestion invocation):**
+- ☐ Completeness scores in every option label
+- ☐ Recommended option listed first
+- ☐ One decision per question (split if multiple)
+- ☐ Analysis/reasoning already presented in message text above
 **Formatting:**
 - *Italics* for emphasis, not **bold** (bold for headers only).
-- After each answer: `✔ Decision {N} recorded [quicksave updated]`
+- After each answer: `✔ Decision {N} recorded`
 - Previews under 8 lines. Full mockups go in conversation text before the question.
 ---
@@ -279,6 +287,28 @@ Via AskUserQuestion, ask:
 **Goal:** Understand what was built, what was tested, and what changed since the last QA pass.
+### 1-Pre. Clean Working Tree Check
+Before any testing, verify the working tree is clean:
+```bash
+git status --porcelain
+```
+If the output is non-empty (uncommitted changes exist), present via AskUserQuestion:
+> Your working tree has uncommitted changes. Test results may reflect code that is not committed, making them harder to reproduce later.
+>
+> - **A) Commit first** — commit the current changes, then run QA against the committed state
+> - **B) Stash changes** — stash uncommitted work, including untracked files (`git stash --include-untracked`), run QA against the last commit, restore after
+> - **C) Proceed anyway** — run QA as-is (risk: results may include uncommitted changes that are lost later)
+>
+> RECOMMENDATION: Choose A. QA results should always be reproducible from a specific commit.
+If A: guide the user through a commit (or defer to `/warp-release-update` if they want a full ship flow).
+If B: run `git stash --include-untracked` (plain `git stash` leaves untracked files in place), proceed with QA, and remind the user to `git stash pop` after QA completes.
+If C: note in the QA report header: `⚠ QA ran against dirty working tree — results may not be reproducible.`
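The porcelain output checked in 1-Pre also separates tracked changes from untracked files (lines starting with `??`). This distinction matters for option B: plain `git stash` stashes only tracked changes, and untracked files stay in the working tree unless `--include-untracked` (`-u`) is passed. A minimal parsing sketch, assuming porcelain v1 output already captured as a string. `parsePorcelain` is a hypothetical helper, and quoted paths (git quotes names containing unusual characters) are not unquoted here.

```typescript
// Sketch: classify `git status --porcelain` (v1) output. Each line is
// "XY <path>"; "??" marks an untracked file; renames read "R  old -> new".
interface TreeState {
  clean: boolean;
  tracked: string[];
  untracked: string[];
}

function parsePorcelain(output: string): TreeState {
  const tracked: string[] = [];
  const untracked: string[] = [];
  for (const line of output.split(/\r?\n/)) {
    if (line.length < 4) continue; // skip blank or trailing lines
    const code = line.slice(0, 2);
    let path = line.slice(3);
    // For renames and copies, keep only the destination path.
    const arrow = path.indexOf(" -> ");
    if (arrow !== -1) path = path.slice(arrow + 4);
    (code === "??" ? untracked : tracked).push(path);
  }
  return { clean: tracked.length === 0 && untracked.length === 0, tracked, untracked };
}
```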
 ### 1A. Read Pipeline Artifacts
 From `.warp/reports/planning/testspec.md`:
@@ -337,6 +367,35 @@ ENVIRONMENT CHECK:
 If any environment check fails, report as a blocking bug. Do not proceed with manual testing against a broken build.
+### 1D. Diff-Aware Mode
+When on a feature branch with no explicit URL or test scope provided by the user, auto-scope QA to the diff against the base branch:
+```bash
+# Detect base branch (try main, then master, then fall back to HEAD~10)
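+# rev-parse prints the commit SHA on success, so discard its stdout and assign only the branch name
+if git rev-parse --verify --quiet main >/dev/null; then BASE_BRANCH=main
+elif git rev-parse --verify --quiet master >/dev/null; then BASE_BRANCH=master
+else BASE_BRANCH=HEAD~10; fi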
+# Get changed files since branching
+git diff ${BASE_BRANCH}...HEAD --name-only
+```
+Use the changed file list to:
+1. **Prioritize testing** — test areas touched by the diff first, before broader regression checks.
+2. **Scope edge cases** — focus negative testing and edge case testing on the changed code paths.
+3. **Identify blast radius** — determine which features could be affected by the changes (adjacent modules, shared utilities, downstream consumers).
+```
+DIFF-AWARE SCOPE:
+  Branch: [current branch name]
+  Base: [base branch]
+  Files changed: [N]
+  Packages affected: [list]
+  Primary test focus: [areas directly changed]
+  Blast radius: [areas indirectly affected]
+```
+If the user provides an explicit URL or test scope, skip diff-aware mode and use their scope instead. If on the main/master branch (no feature branch), skip diff-aware mode and test the full scope from testspec.md.
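The "Files changed" and "Packages affected" fields of the DIFF-AWARE SCOPE block can be derived from the `--name-only` output of the diff command in this section. Mapping a path to a package is project-specific. This sketch assumes a monorepo layout where `packages/<name>` or `apps/<name>` is a package and otherwise groups by top-level directory. `diffScope` and `packageOf` are hypothetical names.

```typescript
// Sketch: derive DIFF-AWARE SCOPE fields from `git diff <base>...HEAD --name-only`.
interface DiffScope {
  filesChanged: number;
  packages: string[];
}

// Assumed layout: packages/<name>/... and apps/<name>/... are packages;
// anything else is grouped by its top-level directory, root files as "(root)".
function packageOf(path: string): string {
  const parts = path.split("/");
  if (parts.length === 1) return "(root)";
  const top = parts[0] || "(root)";
  if ((top === "packages" || top === "apps") && parts.length > 2) {
    return `${top}/${parts[1]}`;
  }
  return top;
}

function diffScope(nameOnly: string): DiffScope {
  const files = nameOnly
    .split(/\r?\n/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  // De-duplicate and sort so the report is stable across runs.
  const packages = Array.from(new Set(files.map(packageOf))).sort();
  return { filesChanged: files.length, packages };
}
```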
 ---
 ## PHASE 2: Smoke Test
@@ -394,6 +453,48 @@ If PASS:
 ---
+## FIX-DURING-QA MODE (Optional)
+By default, QA operates in **report mode**: document every bug, fix nothing, hand off to the next phase. This is the correct default because fixing during QA creates a moving target.
+However, for small projects, solo developers, or when the user requests it, **fix mode** is available. When a bug is found in fix mode, QA pauses testing to fix the bug immediately before continuing.
+**Activation:** Fix mode is only activated when the user explicitly requests it (e.g., "fix bugs as you find them" or "qa and fix"). Never activate fix mode without user consent.
+### Report Mode (default)
+Document the bug using the standard bug format. Continue testing. All bugs are fixed in the next phase.
+### Fix Mode
+When a bug is found:
+1. **Pause testing.** Do not continue to the next test case.
+2. **Fix the bug** with an atomic commit: `fix(qa): [one-line description of root cause]`
+3. **Write a regression test** that reproduces the bug and verifies the fix.
+4. **Re-verify** the fix does not break any previously passing tests.
+5. **Resume testing** from where you left off.
+### WTF-Likelihood Heuristic (Fix Mode Only)
+Track a rolling risk score that measures how likely the fix-during-QA approach is going off the rails:
+| Event | Score Impact |
+|-------|-------------|
+| Fix requires a revert of a previous fix-during-QA commit | +15% |
+| Fix touches more than 3 files | +5% |
+| Fix touches files unrelated to the feature under test | +20% |
+| Fix passes all tests on first attempt | -5% |
+| Fix is a one-line change | -3% |
+**Soft gate at 20%:** Present via AskUserQuestion:
+> Fix-during-QA risk score is {score}%. Fixes are getting complex or reverting each other. Options:
+> - **A) Switch to report mode** — stop fixing, document remaining bugs for the next phase
+> - **B) Continue fixing** — you understand the risk and want to keep going
+> - **C) Revert all fix-during-QA commits** — undo all QA fixes, switch to report mode, start fresh
+**Hard cap at 50 fixes:** If 50 bugs have been fixed during a single QA session, force-switch to report mode regardless of risk score. This many fixes means the build was not ready for QA.
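The WTF-likelihood rules translate into a small tracker. The event weights, the 20% soft gate, and the 50-fix hard cap come from this section. Starting at 0%, applying each fix's events as one combined delta, and flooring the score at 0% are assumptions the skill does not state. Class and event names are illustrative.

```typescript
// Sketch of the fix-mode risk tracker; weights mirror the event table.
type FixEvent =
  | "revertedEarlierFix"
  | "touchedMoreThan3Files"
  | "touchedUnrelatedFiles"
  | "passedFirstTry"
  | "oneLineChange";

const WEIGHTS: Record<FixEvent, number> = {
  revertedEarlierFix: 15,
  touchedMoreThan3Files: 5,
  touchedUnrelatedFiles: 20,
  passedFirstTry: -5,
  oneLineChange: -3,
};

const SOFT_GATE = 20; // percent
const HARD_CAP = 50; // fixes per QA session

class FixModeTracker {
  private score = 0; // assumed starting point
  private fixes = 0;

  // Record one completed fix and every event that applied to it.
  recordFix(events: FixEvent[]): "continue" | "ask-user" | "force-report-mode" {
    this.fixes += 1;
    const delta = events.reduce((sum, e) => sum + WEIGHTS[e], 0);
    this.score = Math.max(0, this.score + delta); // assumed floor at 0%
    if (this.fixes >= HARD_CAP) return "force-report-mode";
    if (this.score >= SOFT_GATE) return "ask-user";
    return "continue";
  }

  get riskPercent(): number {
    return this.score;
  }
}
```

As written, the tracker returns `"ask-user"` for every later fix while the score stays at or above 20%. That is one reading of a soft gate; a project could instead ask once and remember the answer.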
+---
 ## PHASE 3: Functional Test
 **Goal:** Systematically verify every AC from the testspec. This is the core of QA.
@@ -422,6 +523,40 @@ AC VERIFICATION:
     ...
 ```
+### Test Stub Suggestions
+For every bug found (in any phase), include a skeleton test that would catch this bug if it regressed. Use the project's detected test framework from `.warp/warp-tools.json` (e.g., Jest, Vitest, Playwright, pytest). If no test framework is detected, use a generic pseudocode format.
+**Extended bug report format:**
+```
+BUG: [description]
+  Severity: [critical/high/medium/low/cosmetic]
+  Repro:
+    1. [step]
+    2. [step]
+    3. [observe: what happens]
+  Expected: [what should happen]
+  Actual: [what actually happens]
+  Suggested test:
+    ```[language]
+    test('[description in past tense — e.g., displayed stale data after reconnect]', () => {
+      // Setup: [describe the precondition]
+      // Action: [describe the trigger]
+      // Assert: [describe what to verify]
+    });
+    ```
+```
+**Guidelines for test stubs:**
+- Name the test in past tense describing the bug (e.g., `"crashed when input was empty"`)
+- Include setup, action, and assertion comments even if the implementation is not filled in
+- Match the project's existing test patterns (file location, import style, assertion library)
+- For visual bugs, suggest a screenshot comparison test if Playwright or similar is available
+- For accessibility bugs, suggest an axe-core assertion
+This format applies to ALL bug reports across all phases (smoke test, functional, visual, accessibility, cross-platform). The test stub is a gift to the developer who fixes the bug — it tells them exactly how to verify the fix.
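The "Suggested test" skeleton can be generated mechanically from the bug fields. This sketch emits a Jest/Vitest-style `test(...)` call. The `BugReport` shape is hypothetical, and `JSON.stringify` is used only to produce a correctly escaped string literal for the test name.

```typescript
// Hypothetical bug fields feeding the "Suggested test" skeleton.
interface BugReport {
  description: string; // past tense, e.g. "crashed when input was empty"
  setup: string;
  action: string;
  assertion: string;
}

// Comments must stay on one line, so fold any embedded newlines.
const oneLine = (s: string): string => s.replace(/\s*\n\s*/g, " ").trim();

function renderTestStub(bug: BugReport): string {
  // JSON.stringify yields an escaped double-quoted literal for the test name.
  const name = JSON.stringify(bug.description);
  return [
    `test(${name}, () => {`,
    `  // Setup: ${oneLine(bug.setup)}`,
    `  // Action: ${oneLine(bug.action)}`,
    `  // Assert: ${oneLine(bug.assertion)}`,
    `});`,
  ].join("\n");
}
```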
 ### 3B. Edge Case Testing
 For each edge case enumerated in the testspec, test it:
@@ -1047,6 +1182,37 @@ Create the file if it doesn't exist. Append to it if it does (other QA skills ma
 ---
+## TODOS.md INTEGRATION
+After QA completes and the report is written, check for a `TODOS.md` file in the project root.
+### Cross-reference existing TODOs
+If `TODOS.md` exists, read it and cross-reference against bugs found:
+- Are any open TODO items related to bugs discovered during QA? If so, annotate those TODOs with the QA finding (e.g., `— confirmed as bug in qa-report, severity: high`).
+- Are any TODOs already fixed by the current build? If so, suggest marking them complete.
+### Offer to add deferred bugs
+After the QA report is approved, present via AskUserQuestion if there are medium or low severity bugs that are not ship-blockers:
+> QA found {N} medium/low severity bugs that do not block shipping. Want to add them to TODOS.md for future fixes?
+>
+> - **A) Yes — add all deferred bugs** to TODOS.md with severity and QA date
+> - **B) Let me pick** — show the list and I will choose which to add
+> - **C) No** — the QA report is sufficient, skip TODOS.md
+**Format for TODOS.md entries:**
+```
+- [ ] [severity] [bug description] — from qa-test ({date})
+```
+Only offer this for bugs that are NOT critical or high severity. Critical and high bugs must be fixed before shipping — they do not belong in a TODO list.
+If no `TODOS.md` exists, do not create one. Mention in the QA summary that deferred bugs are documented in the QA report for future reference.
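The TODOS.md offer reduces to a severity filter plus a line formatter. The section excludes critical and high bugs but phrases the offer as "medium/low". This sketch follows the exclusion rule, so it also treats cosmetic bugs as deferrable. It renders the `[severity]` placeholder in brackets and the description as plain text. Those readings, and the `QaBug` shape, are assumptions.

```typescript
// Severity scale from the extended bug report format.
type QaSeverity = "critical" | "high" | "medium" | "low" | "cosmetic";

interface QaBug {
  severity: QaSeverity;
  description: string;
}

// Ship-blockers never go into TODOS.md.
const SHIP_BLOCKERS: QaSeverity[] = ["critical", "high"];

// Formats deferrable bugs as "- [ ] [severity] description — from qa-test (date)".
function todoEntries(bugs: QaBug[], date: string): string[] {
  return bugs
    .filter((b) => SHIP_BLOCKERS.indexOf(b.severity) === -1)
    .map((b) => `- [ ] [${b.severity}] ${b.description} — from qa-test (${date})`);
}
```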
+---
 ## NEXT STEP
 After `.warp/reports/qatesting/qa-report.md` is APPROVED: