npm - odd-studio - Versions diffs - 3.5.1 → 3.6.0 - Mend

odd-studio 3.5.1 → 3.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (25)

package/.claude-plugin/plugin.json +5 -1
package/README.md +28 -1
package/bin/commands/status.js +8 -0
package/bin/commands/upgrade.js +10 -0
package/bin/odd-studio.js +1 -1
package/codex-plugin/.codex-plugin/plugin.json +1 -1
package/codex-plugin/hooks.json +16 -0
package/hooks/odd-studio.sh +93 -0
package/package.json +1 -1
package/plugins/plugin-gates.js +34 -3
package/plugins/plugin-quality-checks.js +20 -0
package/scripts/command-definitions.js +5 -0
package/scripts/scaffold-project.js +3 -2
package/scripts/setup-hooks.js +4 -0
package/scripts/state-schema.js +48 -0
package/skill/SKILL.md +86 -9
package/skill/docs/build/build-protocol.md +34 -0
package/skill/docs/build/code-excellence.md +37 -1
package/skill/docs/build/debug-protocol.md +141 -0
package/skill/docs/chapters/chapter-10.md +4 -4
package/skill/docs/planning/build-planner.md +32 -9
package/skill/odd-debug/SKILL.md +60 -0
package/templates/.odd/state.json +11 -1
package/templates/AGENTS.md +16 -1
package/templates/CLAUDE.md +27 -0

package/skill/SKILL.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: "odd"
-version: "3.5.1"
+version: "3.6.0"
 description: "Outcome-Driven Development planning and build coach. Use /odd to start or resume an ODD project — building personas, writing outcomes, mapping contracts, creating a Master Implementation Plan, and directing an odd-flow-powered build. Designed for domain experts who are not developers. Works with Claude Code, OpenCode, and Codex."
 metadata:
   priority: 10
@@ -23,6 +23,7 @@ metadata:
       - "begin odd"
       - "resume odd"
       - "continue odd"
+      - "odd debug"
       - "odd studio"
       - "outcome-driven development"
       - "odd status"
@@ -33,6 +34,7 @@ metadata:
     allOf:
       - [odd, status]
       - [odd, build]
+      - [odd, debug]
       - [odd, plan]
       - [outcome, driven]
     anyOf:
@@ -42,6 +44,7 @@ metadata:
       - "outcome"
       - "contract map"
       - "phase brief"
+      - "debug"
     noneOf: []
     minScore: 5
 retrieval:
@@ -96,7 +99,7 @@ Display this when no existing state is found:
 ---
-Welcome to ODD Studio v3.5.1.
+Welcome to ODD Studio v3.6.0.
 You are about to plan and build something real — using a methodology called Outcome-Driven Development. Before we write a single line of code, we are going to get precise about three things:
@@ -120,7 +123,7 @@ Display this when existing state is found. Replace the bracketed values with act
 ---
-Welcome back to ODD Studio v3.5.1.
+Welcome back to ODD Studio v3.6.0.
 **Project:** [project.name]
 **Current Phase:** [state.currentPhase]
@@ -136,7 +139,7 @@ Welcome back to ODD Studio v3.5.1.
 **What's next:** [state.nextStep]
-Type `*plan` to continue planning, `*build` to enter build mode, or `*status` for full detail.
+Type `*plan` to continue planning, `*build` to enter build mode, `*debug` to investigate a failing outcome without leaving the ODD flow, or `*status` for full detail.
 ---
@@ -219,6 +222,36 @@ Enter build mode. This command runs the following checks in order before beginni
    - Do NOT run the brief generation and build agents "in parallel" — the brief MUST be confirmed BEFORE any build work begins
    - This is a hard sequential gate. There are no exceptions.
+### `*debug`
+Enter controlled debug mode for the current outcome.
+This command must keep the work inside the ODD flow. It is not a free-form detour.
+Execute these steps in order:
+1. Read `.odd/state.json` and confirm `currentPhase` is `"build"`. If not, explain that debugging only exists inside build work and route back to `*build`.
+2. Take the latest failure from the current conversation, stated in domain language, and identify the active outcome it belongs to.
+3. Read `docs/build/debug-protocol.md` and choose exactly one debug strategy before inspecting code:
+   - `ui-behaviour`
+   - `full-stack`
+   - `auth-security`
+   - `integration-contract`
+   - `background-process`
+   - `performance-state`
+4. Update `.odd/state.json`:
+   - set `buildMode` to `"debug"`
+   - set `verificationConfirmed` to `false`
+   - set `debugStartedAt` to the current timestamp
+   - set `debugStrategy`, `debugTarget`, and `debugSummary`
+5. Call `mcp__odd-flow__memory_store` with key `odd-project-state`, namespace `odd-project`, value set to the full updated `.odd/state.json`
+6. Run the investigation and fix strictly according to the chosen strategy. Do not guess. Do not apply quick fixes. Reproduce first, identify the failing boundary, then fix.
+7. When the fix is ready, update `.odd/state.json` again:
+   - set `buildMode` to `"verify"`
+   - keep `debugStrategy`, `debugTarget`, and `debugSummary` as the latest resolved context
+8. Call `mcp__odd-flow__memory_store` again with the full updated `.odd/state.json`
+9. Restart the verification walkthrough from its first step. A debug session ends only when verification passes.
    **If the brief exists but `briefConfirmed` is not true in state.json:**
    - Present it to the domain expert: "Session Brief [N] exists. Review it at docs/session-brief-[N].md and confirm before we build."
    - Wait for confirmation, then set `briefConfirmed: true` in `.odd/state.json`
@@ -508,6 +541,7 @@ You can use either format:
 |---|---|---|
 | `*plan` | `/odd-plan` | Continue from where you left off in planning |
 | `*build` | `/odd-build` | Enter build mode and initialise odd-flow swarm |
+| `*debug` | `/odd-debug` | Keep debugging inside the active outcome and force an explicit debug strategy before fixing |
 | `*status` | `/odd-status` | Show full project state and progress |
 | `*swarm` | `/odd-swarm` | Build all independent outcomes in the current phase simultaneously |
 | `*deploy` | `/odd-deploy` | Deploy the current verified build to production |
@@ -566,11 +600,54 @@ Enforce this sequence — do not proceed to a later step without the earlier one
 Run when `*build` is called and `servicesConfigured` is false.
 1. **Scaffold.** If `package.json` exists, skip to step 2. If not: `create-next-app` rejects non-empty directories — scaffold into a sibling dir (`${PROJECT_DIR}-scaffold`) then rsync across excluding `.git`, `docs/`, `node_modules/`. Fix `package.json name` after rsync. Tell user they can delete the sibling dir.
-2. **Install deps.** `npm install drizzle-orm drizzle-kit vitest @testing-library/react @vitejs/plugin-react`
-3. **Generate `.env.local`.** Write a placeholder file with every credential the chosen stack needs. Each line must have a comment pointing to exactly where to find the real value in the service dashboard. Include a note: never commit this file, use test keys for payment services.
-4. **Wait.** Display the credential list. Wait for the user to confirm they've filled everything in.
-5. **Verify.** Kill port 3000 (`lsof -ti:3000 | xargs kill 2>/dev/null || true`), run `npm run dev`. Translate any connection errors into plain language. Repeat until server starts cleanly.
-6. **Mark done.** Set `servicesConfigured: true` in `.odd/state.json`. Confirm: "All services connected. Development server running at http://localhost:3000."
+2. **Install deps.** Read `testingFramework` from `.odd/state.json` (default "Vitest"). Install the chosen testing stack:
+   - **Vitest (default):** `npm install --save-dev vitest @testing-library/react @vitejs/plugin-react @testing-library/jest-dom jsdom`
+   - **Jest:** `npm install --save-dev jest @testing-library/react @testing-library/jest-dom ts-jest @types/jest jest-environment-jsdom`
+   - **Playwright:** `npm install --save-dev @playwright/test` then `npx playwright install`
+   - Also install the ORM: `npm install drizzle-orm` (runtime dependency) and `npm install --save-dev drizzle-kit` (migration CLI, only needed during development)
+3. **Scaffold test harness.** Read `testingFramework` from `.odd/state.json` and scaffold the appropriate config. For **Vitest** (the default):
+   - Create `vitest.config.ts`:
+     ```typescript
+     import { defineConfig } from "vitest/config"
+     import react from "@vitejs/plugin-react"
+     import path from "path"
+     export default defineConfig({
+       plugins: [react()],
+       test: {
+         environment: "jsdom",
+         globals: true,
+         setupFiles: ["./tests/setup.ts"],
+         include: ["tests/**/*.test.{ts,tsx}"],
+       },
+       resolve: {
+         alias: {
+           "@": path.resolve(__dirname, "."),
+         },
+       },
+     })
+     ```
+   - Create `tests/setup.ts`:
+     ```typescript
+     import "@testing-library/jest-dom/vitest"
+     ```
+   - Create `tests/setup.test.ts` (smoke test):
+     ```typescript
+     import { describe, it, expect } from "vitest"
+     describe("vitest setup", () => {
+       it("runs", () => {
+         expect(true).toBe(true)
+       })
+     })
+     ```
+   - Add scripts to `package.json`: `"test": "vitest run"` and `"test:watch": "vitest"`
+   - Run `npm test` to confirm the harness works. If the smoke test fails, diagnose and fix before proceeding.
+   - Display: "Test harness configured. `npm test` runs the suite. `npm run test:watch` runs in watch mode."
+4. **Generate `.env.local`.** Write a placeholder file with every credential the chosen stack needs. Each line must have a comment pointing to exactly where to find the real value in the service dashboard. Include a note: never commit this file, use test keys for payment services.
+5. **Wait.** Display the credential list. Wait for the user to confirm they've filled everything in.
+6. **Verify.** Kill port 3000 (`lsof -ti:3000 | xargs kill 2>/dev/null || true`), run `npm run dev`. Translate any connection errors into plain language. Repeat until server starts cleanly.
+7. **Mark done.** Set `servicesConfigured: true` in `.odd/state.json`. Confirm: "All services connected. Development server running at http://localhost:3000. Test harness verified."
 ---
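The `testingFramework` branch in step 2 of the services setup works like a lookup from the recorded choice to a list of install commands. Below is a minimal TypeScript sketch. It assumes the state value is exactly one of `"Vitest"`, `"Jest"`, or `"Playwright"`, or is missing. `testInstallCommands` is a hypothetical name and is not part of the package. The command strings are copied from step 2.

```typescript
// Hypothetical helper: the package does not ship this function.
// Command strings are taken verbatim from step 2 of the setup sequence.
type TestingFramework = "Vitest" | "Jest" | "Playwright";

function testInstallCommands(
  framework: TestingFramework | null | undefined,
): string[] {
  // Step 2 defaults to "Vitest" when state.json has no choice recorded.
  const chosen: TestingFramework = framework ?? "Vitest";
  switch (chosen) {
    case "Jest":
      return [
        "npm install --save-dev jest @testing-library/react @testing-library/jest-dom ts-jest @types/jest jest-environment-jsdom",
      ];
    case "Playwright":
      // Playwright also needs its browser binaries downloaded.
      return ["npm install --save-dev @playwright/test", "npx playwright install"];
    case "Vitest":
    default:
      return [
        "npm install --save-dev vitest @testing-library/react @vitejs/plugin-react @testing-library/jest-dom jsdom",
      ];
  }
}
```

Calling `testInstallCommands(undefined)` returns the Vitest stack, which matches the documented default.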

package/skill/docs/build/build-protocol.md CHANGED

@@ -32,6 +32,40 @@ The domain expert does not re-brief the AI, paste context, identify shared infra
 ---
+### Step 2b — Test
+After the build completes and before verification begins, the build agent runs the test suite automatically.
+**What the build agent tests:**
+Every outcome produces code. Some of that code is pure logic — functions that take inputs and return outputs without touching databases, APIs, or the browser. These functions MUST have tests written alongside the implementation. The build agent writes tests for:
+- **Business rules** — access control, pricing, eligibility, classification logic
+- **Data transformations** — formatting, aggregation, filtering, sorting
+- **Validation** — input parsing, CSV import, form validation, regex matching
+- **Calculations** — mastery scores, scheduling, priority ordering, time-based logic
+- **Safety-critical logic** — safeguarding detection, content filtering, concern routing
+**What is NOT tested at this stage:**
+- Database queries (tested via verification walkthrough)
+- UI rendering (tested via verification walkthrough)
+- External API calls (tested via verification walkthrough)
+- LLM prompt/response cycles (tested via verification walkthrough)
+**The test gate:**
+After the build completes, run `npm test`. If any tests fail:
+1. The build agent fixes the failures immediately — do not proceed to verification with failing tests
+2. Re-run `npm test` until all tests pass
+3. Display to the domain expert: "All [n] tests passing. Ready for verification."
+If no testable pure logic was produced by this outcome (e.g., a purely UI outcome), display: "No new business logic tests required for this outcome. Ready for verification."
+Tests are committed alongside the implementation code. They live in `tests/` mirroring the source structure. Test files must never be deleted — they are regression guards for every future outcome.
+---
 ### Step 3 — Verify
 When ODD Studio reports the build is complete, the verification checklist is on screen.
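The test gate in Step 2b comes down to one decision. Verification is blocked while any test fails. Otherwise the agent shows one of the two passing messages the protocol specifies.

Below is a minimal TypeScript sketch:

- `TestRunSummary`, `testGate`, and the `newPureLogic` flag are hypothetical names.
- Parsing real `npm test` output is out of scope.
- The wording of the blocking message is illustrative.
- The two passing messages are quoted from the protocol.

```typescript
// Hypothetical summary of one `npm test` run plus one fact about the outcome.
interface TestRunSummary {
  passed: number;
  failed: number;
  newPureLogic: boolean; // did this outcome add testable pure logic?
}

interface GateDecision {
  proceedToVerification: boolean;
  message: string;
}

function testGate(run: TestRunSummary): GateDecision {
  if (run.failed > 0) {
    // Failing tests block verification; the build agent fixes and re-runs.
    return {
      proceedToVerification: false,
      message: `${run.failed} failing test(s). Fixing and re-running npm test before verification.`,
    };
  }
  if (!run.newPureLogic) {
    // e.g. a purely UI outcome
    return {
      proceedToVerification: true,
      message: "No new business logic tests required for this outcome. Ready for verification.",
    };
  }
  return {
    proceedToVerification: true,
    message: `All ${run.passed} tests passing. Ready for verification.`,
  };
}
```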

package/skill/docs/build/code-excellence.md CHANGED

@@ -191,6 +191,40 @@ If a build agent finds itself exceeding these limits, it stops and restructures
 ---
+## Testing Standard
+Every pure-logic module — any function that takes inputs and returns outputs without side effects — MUST have a corresponding test file in `tests/` that mirrors the source path. If the source is `lib/psm/mastery.ts`, the test is `tests/lib/psm/mastery.test.ts`.
+### What MUST be tested
+- **Business rules:** access control checks, eligibility logic, classification functions, routing decisions
+- **Data transformations:** formatting, aggregation, filtering, score calculations
+- **Parsing and validation:** CSV import, regex patterns, input sanitisation, form validation
+- **Safety-critical logic:** safeguarding keyword detection, content filtering, concern classification and routing
+- **State machines:** plant growth levels, unlock gates, engagement level thresholds
+### What is NOT unit tested
+- Database queries — these are tested via the verification walkthrough
+- React components — these are tested via the verification walkthrough and design verification
+- LLM calls — these are tested via the verification walkthrough
+- External API integrations — these are tested via the verification walkthrough
+### Test quality rules
+- Test the behaviour, not the implementation. Test what the function returns, not how it computes it.
+- Use `it.each` for data-driven tests with multiple inputs against the same assertion.
+- Use `vi.useFakeTimers()` for time-dependent logic. Clean up with `vi.useRealTimers()`.
+- Use `vi.stubEnv()` for environment variable tests. Clean up with `vi.unstubAllEnvs()`.
+- No mocks for things that can be tested directly. If a function is pure, test it directly.
+- Every test file must pass independently — no shared state between test files.
+### The testing gate
+`npm test` runs before every verification walkthrough. Failing tests block verification. The domain expert never sees a system with broken business logic.
+---
 ## How This Standard Is Enforced
 1. **At build time.** The build agent reads this document before writing any code. It applies the Design-It-Twice protocol internally and outputs only the minimal version.
@@ -199,4 +233,6 @@ If a build agent finds itself exceeding these limits, it stops and restructures
 3. **At verification time.** When the domain expert verifies an outcome, the code behind it has already been through two passes. The domain expert does not review code — but the code they are relying on is clean, minimal, and maintainable.
-4. **At refactor time.** If an outcome is rebuilt after a verification failure, the rebuild starts from scratch against this standard — it does not patch the previous attempt.
+4. **At test time.** `npm test` runs after every build and before every verification. Failing tests block verification. New pure-logic functions without corresponding test files are flagged.
+5. **At refactor time.** If an outcome is rebuilt after a verification failure, the rebuild starts from scratch against this standard — it does not patch the previous attempt. Existing tests must still pass after the rebuild.
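Two parts of this standard can be written as small pure functions: the path rule in the Testing Standard, and the "flagged" check in point 4. Below is a sketch. It assumes sources are `.ts` or `.tsx` files addressed by project-relative paths. `mirroredTestPath` and `findUntested` are hypothetical names. How the package actually flags missing tests is not shown in this diff.

```typescript
// Hypothetical helpers illustrating the tests/ mirroring rule.
// lib/psm/mastery.ts -> tests/lib/psm/mastery.test.ts
function mirroredTestPath(sourcePath: string): string {
  // Normalise Windows separators and a leading "./".
  const normalised = sourcePath.replace(/\\/g, "/").replace(/^\.\//, "");
  const match = /^(.*)\.(tsx?)$/.exec(normalised);
  if (!match) {
    throw new Error(`Expected a .ts or .tsx source file, got: ${sourcePath}`);
  }
  const [, stem, ext] = match;
  return `tests/${stem}.test.${ext}`;
}

// Returns the pure-logic sources that have no mirrored test file.
function findUntested(pureLogicSources: string[], testFiles: Set<string>): string[] {
  return pureLogicSources.filter((src) => !testFiles.has(mirroredTestPath(src)));
}
```

Every path produced this way also matches the `tests/**/*.test.{ts,tsx}` include pattern in the scaffolded `vitest.config.ts`.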

package/skill/docs/build/debug-protocol.md ADDED

@@ -0,0 +1,141 @@
+# ODD Debug Protocol
+Debugging does not sit outside Outcome-Driven Development. It is a controlled sub-mode of the current build.
+When something fails during verification or during an in-progress build, use `*debug`. Do not abandon the active outcome. Do not start free-form fixing. Do not guess.
+## Purpose
+`*debug` exists to keep failure analysis inside the ODD flow:
+- The failing outcome remains the active unit of work
+- The investigation approach is chosen deliberately, not guessed
+- The fix stays tied to the outcome walkthrough and contracts
+- The work returns to verification when the defect is resolved
+## Entry Conditions
+Before debugging:
+1. Read `.odd/state.json`
+2. Confirm `currentPhase` is `"build"`
+3. Identify the active outcome and the latest failure in domain language
+4. Set these fields in `.odd/state.json`:
+   - `buildMode: "debug"`
+   - `verificationConfirmed: false`
+   - `debugStartedAt: <timestamp>`
+   - `debugSummary: <one-sentence failure in domain language>`
+5. Store the updated state to odd-flow with key `odd-project-state`
+Debugging must never mark an outcome verified, complete, or committed.
+## Strategy Selection
+Choose exactly one debug strategy before inspecting code. State the chosen strategy and the reason.
+Use this routing rule so the coding tool does not guess:
+- Choose `ui-behaviour` when the problem is visible in the interface and you do not yet have evidence of a backend or data fault
+- Choose `full-stack` when the failure crosses a user action, server boundary, and persisted state
+- Choose `auth-security` when access, identity, trust, or validation boundaries might be wrong
+- Choose `integration-contract` when one part of the system expects data or sequencing another part does not produce
+- Choose `background-process` when the failure depends on async handoff, jobs, retries, or event delivery
+- Choose `performance-state` when the issue depends on timing, staleness, cache invalidation, or repeated actions
+If more than one strategy seems plausible, do not fix anything yet. Gather one more piece of evidence, then choose the narrowest strategy that still explains the failure.
+### 1. `ui-behaviour`
+Use when the failure is visible in the interface only:
+- layout is wrong
+- a button does nothing
+- a view does not update
+- a message or validation state is missing
+Approach:
+- reproduce in browser first
+- inspect the rendered path backwards to the triggering action
+- verify whether the contract is correct and only the rendering is wrong
+### 2. `full-stack`
+Use when the failure spans browser, route, service, and data:
+- a form submits but the result is missing
+- a saved change does not appear
+- a payment or enrolment looks complete but data is inconsistent
+Approach:
+- trace the full request path
+- identify the first boundary where expected data diverges
+- fix the smallest broken boundary, not the symptom
+### 3. `auth-security`
+Use when the defect touches access, trust, or sensitive behaviour:
+- the wrong person can see or do something
+- a protected route is open
+- a webhook or upload path behaves unsafely
+- a session, role, or permission check is wrong
+Approach:
+- verify actor, boundary, and expected restriction first
+- inspect authentication, authorisation, validation, and side-effect points in order
+- prefer the fix that narrows access and restores explicit checks
+### 4. `integration-contract`
+Use when two outcomes disagree about shared data or sequencing:
+- one screen expects data another workflow never produces
+- a downstream step fails because an upstream assumption changed
+Approach:
+- inspect the contract map and the active outcome contracts
+- find the first producer/consumer mismatch
+- fix the contract implementation or update the outcome if the specification is wrong
+### 5. `background-process`
+Use when the failure depends on queues, jobs, webhooks, scheduled work, or async delivery.
+Approach:
+- identify the triggering event
+- confirm the worker/task started
+- inspect the persisted state before and after the async boundary
+- fix the handoff, retry, or idempotency break
+### 6. `performance-state`
+Use when the issue is stale data, race conditions, repeated actions, caching, or timing-sensitive state.
+Approach:
+- reproduce twice
+- confirm whether the fault is deterministic or timing-sensitive
+- inspect cache/state invalidation boundaries before changing business logic
+## Non-Negotiable Rules
+- Never use “quick fix” or “patch it” reasoning
+- Never change multiple layers at once before reproducing the fault
+- Never skip the reproduction step
+- Never jump to a fix before naming the failing boundary
+- Never broaden the strategy after starting unless new evidence proves the original classification wrong
+- Never leave `buildMode: "debug"` active after the fix is complete
+## Fix Protocol
+After choosing the strategy:
+1. Reproduce the failure
+2. Name the failing boundary
+3. Inspect only the layers required by the chosen strategy
+4. Apply the smallest fix that restores the specified behaviour
+5. Run the relevant automated checks
+6. Set these fields in `.odd/state.json`:
+   - `buildMode: "verify"`
+   - `debugStrategy: <chosen strategy>`
+   - `debugTarget: <affected outcome/surface>`
+   - `debugSummary: <resolved failure summary>`
+7. Store the updated state to odd-flow with key `odd-project-state`
+8. Restart the verification walkthrough from its first step
+If the investigation reveals that the specification is wrong, stop debugging and update the outcome instead. That is not a bug fix. That is a specification correction.
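The Entry Conditions and the Fix Protocol are two state transitions on `.odd/state.json`. Below is a minimal TypeScript sketch. It models only the fields named in this protocol. `DebugFields`, `enterDebug`, and `resolveDebug` are hypothetical names. Storing the result to odd-flow under `odd-project-state` is left out.

```typescript
type DebugStrategy =
  | "ui-behaviour"
  | "full-stack"
  | "auth-security"
  | "integration-contract"
  | "background-process"
  | "performance-state";

// Hypothetical subset of .odd/state.json: only fields named in this protocol.
interface DebugFields {
  currentPhase: string;
  buildMode: string; // template default is "idle"; this protocol uses "debug" and "verify"
  verificationConfirmed: boolean;
  debugStartedAt: string | null;
  debugStrategy: DebugStrategy | null;
  debugTarget: string | null;
  debugSummary: string | null;
}

// Entry Conditions: only valid while the project is in the build phase.
function enterDebug(state: DebugFields, failureSummary: string, now: Date = new Date()): DebugFields {
  if (state.currentPhase !== "build") {
    throw new Error("Debugging only exists inside build work. Route back to *build.");
  }
  return {
    ...state,
    buildMode: "debug",
    verificationConfirmed: false, // debugging never marks an outcome verified
    debugStartedAt: now.toISOString(),
    debugSummary: failureSummary,
  };
}

// Fix Protocol, step 6: record the resolved context and hand back to verification.
function resolveDebug(
  state: DebugFields,
  strategy: DebugStrategy,
  target: string,
  resolvedSummary: string,
): DebugFields {
  if (state.buildMode !== "debug") {
    throw new Error("No active debug session to resolve.");
  }
  return {
    ...state, // verificationConfirmed stays false until the walkthrough passes
    buildMode: "verify", // never leave "debug" active after the fix
    debugStrategy: strategy,
    debugTarget: target,
    debugSummary: resolvedSummary,
  };
}
```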

package/skill/docs/chapters/chapter-10.md CHANGED

@@ -1,6 +1,6 @@
 # Chapter 10: The Build Protocol
-ODD Studio handles all mechanics — context loading, contract validation, re-briefing, committing. You do three things: type /odd, type *build, verify the result. The tool handles the rest.
+ODD Studio handles all mechanics — context loading, contract validation, re-briefing, committing, and controlled debugging. You do three things: type /odd, type *build, verify the result. If verification fails, type *debug. The tool handles the rest.
 The Build Protocol is the repeating rhythm of every session. It is deliberately simple because the complexity belongs in the specification, not in the process. If the specification is precise, the build protocol is almost mechanical. If the build goes wrong, the cause is almost always in the specification, not in the process.
@@ -8,7 +8,7 @@ The Build Protocol is the repeating rhythm of every session. It is deliberately
 - **The tool handles the mechanics. You handle the judgment.** ODD Studio loads context from odd-flow, reads the contract map, identifies the next outcome to build, briefs the AI, waits for the result, and presents you with a verification checklist. Your job is to follow that checklist as the persona and judge whether the result is correct.
-- **The session rhythm is: /odd, *build, verify, confirm.** `/odd` loads the skill and restores project state from odd-flow. `*build` starts the next outcome. You verify the result against the checklist. `confirm` runs Checkpoint (a security scan of what was just built), commits the verified outcome, and advances to the next one. That is it.
+- **The session rhythm is: /odd, *build, verify, confirm. If verification fails: *debug, then verify again.** `/odd` loads the skill and restores project state from odd-flow. `*build` starts the next outcome. You verify the result against the checklist. If it fails, `*debug` classifies the failure and keeps the fix inside the active outcome. `confirm` runs Checkpoint (a security scan of what was just built), commits the verified outcome, and advances to the next one. That is it.
 - **Re-briefing is automatic.** You do not need to remind the AI what your project is, what has been built, or what comes next. odd-flow stores all of this. ODD Studio reads it at the start of every session. If you find yourself explaining context, something is wrong with the state — not with the process.
@@ -22,12 +22,12 @@ The Build Protocol is the repeating rhythm of every session. It is deliberately
 - Building without verifying. Typing `confirm` without following the verification checklist is the fastest way to accumulate hidden defects. Every unverified outcome is a risk to every outcome that depends on it.
-- Worrying about security implementation. Checkpoint runs automatically when you type `confirm`. It scans for exposed secrets, missing authentication checks, and injection vulnerabilities in what was just built. If it finds something, the build agent fixes it before the commit. You do not need to think about this — it happens in the background.
+- Worrying about security implementation. Checkpoint runs automatically before verification is completed and before commit. It scans for exposed secrets, missing authentication checks, insecure session shortcuts, and dangerous rendering or transport shortcuts in what was just built. If it finds something, the build returns to controlled debugging instead of drifting into ad hoc fixes.
 - Batching multiple outcomes into one build. Each outcome has its own verification checklist for a reason. Mixing them makes it impossible to know which outcome caused a failure.
 ## What This Means for You
-Your next session: type `/odd`. Read what ODD Studio tells you about where you are. Type `*build`. Follow the checklist. If it passes, type `confirm`. If it fails, describe the failure in your own words — not in technical terms. ODD Studio handles the fix.
+Your next session: type `/odd`. Read what ODD Studio tells you about where you are. Type `*build`. Follow the checklist. If it passes, type `confirm`. If it fails, describe the failure in your own words — not in technical terms — then type `*debug`. ODD Studio classifies the failure, keeps the work inside the active outcome, and routes the fix back into verification.
 Next: Chapter 11 explains why verification is your job and no tool can replace your judgment.

package/skill/docs/planning/build-planner.md CHANGED

@@ -267,19 +267,42 @@ If the domain expert raises a concern:
 - Always tie reasoning to the specification. "This fits because Outcome 2.1 requires real-time updates for 90 students" — not "This is popular."
 - If a previous choice constrains the next (e.g., choosing Supabase for database means Supabase Auth is available as an auth option), mention it as context but do not force it.
-**Phase 3: Confirm the Fixed Layers**
+**Phase 3: Fixed Layer and Testing Decision**
-After all choices are made, explain the two components that are included in every ODD Studio project regardless of choice:
+After all choices are made, explain the fixed ORM layer:
 **Drizzle ORM** — the database layer that keeps the AI honest.
 "Drizzle is the tool that ensures the build agents always know the exact shape of your data. Every field, every type, every relationship lives in your codebase as versioned migrations. When something goes wrong, we can reverse the last change precisely — the same way git lets us reverse code changes. Without Drizzle, agents are guessing about your database. With it, they know."
-**Vitest** — automated checks for invisible behaviours.
+Drizzle is not negotiable. It exists because the build agents need it, not because of preference.
-"Vitest runs the business rules and calculations you cannot verify by clicking — access control logic, pricing calculations, workflow state transitions. Every outcome built triggers Vitest automatically. If a rule breaks because of a change somewhere else, Vitest catches it before you reach the verification step."
+**Then present the testing decision.**
-These are not negotiable. They exist because the build agents need them, not because of preference.
+"Now we choose your testing framework. Automated tests run the business rules and calculations you cannot verify by clicking — access control logic, pricing calculations, workflow state transitions. Every outcome built triggers the test suite automatically. If a rule breaks because of a change somewhere else, the tests catch it before you reach the verification step."
+Present the testing options:
+**Decision: Testing Framework**
+**Vitest** (recommended)
+- What it is: A fast, modern test runner built for the JavaScript/TypeScript ecosystem, with native ESM support and a jsdom browser environment for component testing.
+- Why it fits: Vitest runs TypeScript test files directly and shares Vite's configuration, so path aliases are declared once in `vitest.config.ts`. Small pure-logic suites typically finish in seconds. Assertions, mocking, and fake timers are built in; coverage only needs an extra provider package (for example `@vitest/coverage-v8`).
+- Trade-off: None significant. This is the default because it works best with the ODD build process.
+**Jest**
+- What it is: The most widely-used JavaScript test runner. Battle-tested, enormous ecosystem.
+- Why it fits: If your team already uses Jest and has existing test patterns, consistency may matter more than speed.
+- Trade-off: Slower than Vitest, requires additional configuration for TypeScript and ESM, heavier setup.
+**Playwright Test** (for E2E-only projects)
+- What it is: A browser-based test runner. Tests run against a real browser, not jsdom.
+- Why it fits: If your project is almost entirely UI with minimal business logic, browser-level testing may be more valuable than unit tests.
+- Trade-off: Much slower per test, requires a running dev server, not suitable for testing pure business logic in isolation.
+"Which testing framework do you prefer? Vitest is the default because it integrates cleanly with the build process — but if you have a strong preference, we will use it."
+If the domain expert has no preference or chooses Vitest, proceed with Vitest. Record the choice.
 **Phase 4: Summarise and Confirm**
@@ -292,7 +315,7 @@ After all decisions are made, present the complete stack as a summary:
 - **ORM**: Drizzle (fixed — build agent requirement)
 - **Auth**: [chosen] — because [reason from their decision]
 - **Hosting**: [chosen] — because [reason from their decision]
-- **Testing**: Vitest (fixed — build agent requirement)
+- **Testing**: [chosen — default Vitest] — because [reason]
 - [Any specialist services]: [chosen] — because [reason from their decision]
 Does this look right? Any second thoughts before I record it?"
@@ -313,7 +336,7 @@ Append a technical decisions section to `CLAUDE.md`:
 - Database: [chosen]
 - ORM: Drizzle
 - Auth: [chosen]
-- Testing: Vitest
+- Testing: [chosen — default Vitest]
 - Hosting: [chosen]
 - [Other services]: [chosen]
@@ -323,7 +346,7 @@ Append a technical decisions section to `CLAUDE.md`:
 - Auth: [why, with reference to specific outcome or persona]
 - Hosting: [why, with reference to specific outcome or persona]
 - Drizzle: type-safe database layer with versioned migrations — build agents always know the exact shape of the data and every change is tracked alongside code changes
-- Vitest: automated testing for invisible business rules — catches regressions before verification
+- Testing: [chosen framework — default Vitest] — automated testing for invisible business rules — catches regressions before verification
 **Alternatives considered (per layer):**
 - Framework: [rejected options and why — specific constraint from the specification]
@@ -346,7 +369,7 @@ Call `mcp__odd-flow__memory_store`:
 - Set `techStackDecided: true`
 - Set `techStack` to the chosen stack description (e.g., "Next.js 16 + Supabase + NextAuth + Vercel")
 - Set `orm` to "Drizzle"
-- Set `testingFramework` to "Vitest"
+- Set `testingFramework` to the chosen testing framework (default "Vitest")
 - Update `nextStep` to "Choose the design approach, then generate architecture and design system documents"
 Confirm to the user: "Technical stack chosen and recorded. Every build agent will read this before writing a line of code."
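The state updates listed in this section amount to a small patch in which only `techStack` and the testing choice vary. Below is a sketch. `techStackStatePatch` is a hypothetical helper. The field names and the `nextStep` text are the ones given in this section.

```typescript
type TestingFramework = "Vitest" | "Jest" | "Playwright";

interface TechStackPatch {
  techStackDecided: true;
  techStack: string;
  orm: "Drizzle";
  testingFramework: TestingFramework;
  nextStep: string;
}

// Hypothetical helper: builds the fields this section says to update.
function techStackStatePatch(techStack: string, testing?: TestingFramework): TechStackPatch {
  return {
    techStackDecided: true,
    techStack,
    orm: "Drizzle", // fixed layer, not a choice
    testingFramework: testing ?? "Vitest", // default when the expert has no preference
    nextStep: "Choose the design approach, then generate architecture and design system documents",
  };
}
```

The patch would be merged into the current state before storing it. For example: `{ ...state, ...techStackStatePatch("Next.js 16 + Supabase + NextAuth + Vercel") }`.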

package/skill/odd-debug/SKILL.md ADDED

@@ -0,0 +1,60 @@
+---
+name: "odd-debug"
+version: "1.0.0"
+description: "ODD Studio debug command. Keeps debugging inside the active outcome, selects the correct debug strategy, and routes back to verification instead of drifting outside the ODD flow."
+metadata:
+  priority: 9
+  pathPatterns:
+    - '.odd/state.json'
+    - 'docs/plan.md'
+    - 'docs/outcomes/**'
+    - 'docs/session-brief*.md'
+  promptSignals:
+    phrases:
+      - "odd debug"
+      - "start odd debug"
+      - "continue odd debug"
+      - "resume odd debug"
+      - "debug this in odd"
+    allOf:
+      - [odd, debug]
+    anyOf:
+      - "debug"
+      - "fix"
+      - "broken"
+      - "verification failed"
+      - "regression"
+    noneOf: []
+    minScore: 5
+retrieval:
+  aliases:
+    - odd debug
+    - debug with odd
+  intents:
+    - start odd debug
+    - continue odd debug
+  entities:
+    - debug strategy
+    - failing outcome
+    - verification failure
+---
+# /odd-debug
+You are executing the ODD Studio `*debug` command.
+Read these two files now:
+1. `.claude/skills/odd/SKILL.md` — the full ODD Studio coach and build protocol
+2. `.claude/skills/odd/docs/build/debug-protocol.md` — the Debug Protocol detail
+Then execute the `*debug` protocol exactly as documented in those files, starting from the state check and selecting the explicit debug strategy before any fix is attempted.
+You must classify the failure into exactly one debug strategy before reading implementation details:
+- `ui-behaviour`
+- `full-stack`
+- `auth-security`
+- `integration-contract`
+- `background-process`
+- `performance-state`
+If the correct strategy is not yet clear, gather evidence first. Never guess. Never jump straight to a fix.
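The rule in the new skill is that a failure is classified into exactly one of six strategies, and if that is not yet possible the only allowed move is to gather more evidence. A minimal TypeScript sketch of that gate follows. The six strategy names come from the skill text; `classifyFailure`, `isDebugStrategy` and the `Classification` type are hypothetical, and the idea that evidence yields a list of candidate strategies is an assumption made for illustration.

```typescript
// The six strategies listed in the /odd-debug skill.
const DEBUG_STRATEGIES = [
  "ui-behaviour",
  "full-stack",
  "auth-security",
  "integration-contract",
  "background-process",
  "performance-state",
] as const;

type DebugStrategy = (typeof DEBUG_STRATEGIES)[number];

// Type guard: narrows an arbitrary string to one of the six strategies.
function isDebugStrategy(value: string): value is DebugStrategy {
  return (DEBUG_STRATEGIES as readonly string[]).includes(value);
}

// Either exactly one strategy, or a request for more evidence.
// There is deliberately no "fix" branch: fixing happens only after classification.
type Classification =
  | { kind: "classified"; strategy: DebugStrategy }
  | { kind: "gather-evidence"; reason: string };

// Hypothetical helper: `candidates` are the strategies the evidence supports so far.
function classifyFailure(candidates: string[]): Classification {
  const unknown = candidates.filter((c) => !isDebugStrategy(c));
  if (unknown.length > 0) {
    return { kind: "gather-evidence", reason: `unknown strategy: ${unknown.join(", ")}` };
  }
  const distinct = Array.from(new Set(candidates.filter(isDebugStrategy)));
  if (distinct.length === 0) {
    return { kind: "gather-evidence", reason: "no strategy fits the evidence yet" };
  }
  if (distinct.length > 1) {
    return { kind: "gather-evidence", reason: `ambiguous: ${distinct.join(", ")}` };
  }
  return { kind: "classified", strategy: distinct[0] };
}

const single = classifyFailure(["auth-security"]);
console.log(single.kind === "classified" ? single.strategy : single.reason); // prints: auth-security
console.log(classifyFailure(["ui-behaviour", "full-stack"]).kind); // prints: gather-evidence
```

Modelling the outcome as a discriminated union means code that wants to proceed to a fix must first check for `kind: "classified"`, which mirrors the "never guess" instruction.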

package/templates/.odd/state.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "version": "2.0.0",
+  "version": "2.1.0",
   "projectName": "{{PROJECT_NAME}}",
   "initialisedAt": null,
   "lastSaved": null,
@@ -14,6 +14,7 @@
   "planApproved": false,
   "techStackDecided": false,
   "designApproachDecided": false,
+  "architectureDocGenerated": false,
   "servicesConfigured": false,
   "sessionBriefExported": false,
   "sessionBriefCount": 0,
@@ -22,5 +23,14 @@
   "swarmActive": false,
   "buildPhase": null,
   "currentBuildPhase": null,
+  "buildMode": "idle",
+  "debugStrategy": null,
+  "debugTarget": null,
+  "debugSummary": null,
+  "debugStartedAt": null,
+  "checkpointStatus": "unknown",
+  "lastCheckpointAt": null,
+  "checkpointFindings": 0,
+  "securityBaselineVersion": "2026-04-12",
   "notes": ""
 }
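The template bumps the state version from 2.0.0 to 2.1.0 and adds ten fields. An existing project's `.odd/state.json` would need those fields filled in. The sketch below shows one way to do that in TypeScript, using the defaults from the template above. It is hypothetical: the package ships a `scripts/state-schema.js` change (+48 lines) whose contents are not shown here, so `migrateToV2_1_0` and `V2_1_0_DEFAULTS` are illustrative names, not the package's API. The sketch also assumes only 2.0.0 state needs upgrading.

```typescript
type Json = string | number | boolean | null;

// Fields added to the state template in 2.1.0, with the template's defaults.
const V2_1_0_DEFAULTS: Record<string, Json> = {
  architectureDocGenerated: false,
  buildMode: "idle",
  debugStrategy: null,
  debugTarget: null,
  debugSummary: null,
  debugStartedAt: null,
  checkpointStatus: "unknown",
  lastCheckpointAt: null,
  checkpointFindings: 0,
  securityBaselineVersion: "2026-04-12",
};

interface State {
  version: string;
  [key: string]: unknown;
}

// Hypothetical migration: adds any missing 2.1.0 field without
// overwriting values the project has already recorded.
function migrateToV2_1_0(state: State): State {
  if (state.version !== "2.0.0") return state; // only upgrades the previous version
  const migrated: State = { ...state, version: "2.1.0" };
  for (const [key, value] of Object.entries(V2_1_0_DEFAULTS)) {
    if (!(key in migrated)) migrated[key] = value;
  }
  return migrated;
}

const old: State = { version: "2.0.0", projectName: "demo", buildPhase: null };
const next = migrateToV2_1_0(old);
console.log(next.version, next.buildMode); // prints: 2.1.0 idle
```

Filling only missing keys keeps the migration safe to run on a file that was partly updated by hand. Copying into a new object leaves the original state untouched if the write fails.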

package/templates/AGENTS.md CHANGED

@@ -6,7 +6,7 @@
 This project uses ODD Studio for planning and building. To activate the ODD coach:
 **In Codex:** Type `use ODD`, `start ODD`, or `begin ODD` to start.
-Use `ODD status` to check the current state before resuming, or `ODD build` to continue the build flow.
+Use `ODD status` to check the current state before resuming, `ODD build` to continue the build flow, or `ODD debug` to investigate a failing outcome without leaving ODD.
 **In OpenCode:** Type `/odd` to start.
@@ -124,6 +124,21 @@ export function canAccess(user: User): boolean {
 ### Security Baseline
 - No hardcoded secrets, API keys, or credentials — use environment variables
 - Validate user input at system boundaries
+- Authenticate and authorise every protected route, action, webhook, and admin surface
+- Verify webhooks, uploads, and third-party callbacks before trusting payloads
+- Use secure session defaults — no localStorage auth/session tokens, no JWT-by-default shortcuts
+- Rate-limit auth, admin, upload, payment, and public write surfaces
+- Record audit trails for admin and security-sensitive actions
+- Never disable TLS, CSRF, origin, or certificate verification in production code
+- Treat any security scan finding as release-blocking until fixed
+## Debugging Inside ODD
+- Use `ODD debug` or `*debug` when verification fails or a build breaks
+- Debugging stays inside the current outcome — it is not a free-form detour
+- Choose an explicit debug strategy before touching code: `ui-behaviour`, `full-stack`, `auth-security`, `integration-contract`, `background-process`, or `performance-state`
+- Reproduce first, identify the failing boundary second, fix third
+- Never apply a “quick fix” without naming the failing boundary
+- After a fix, return to the verification walkthrough from step one
 ## UI Standards (Every UI Outcome)
 - Use shadcn/ui components as the default component library