npm - waypoint-codex - Versions diffs - 1.0.14 → 1.0.16 - Mend

waypoint-codex 1.0.14 → 1.0.16


Files changed (39)

package/templates/.agents/skills/planning/SKILL.md CHANGED

@@ -2,35 +2,74 @@
 name: planning
 description: >
   Interview-driven planning methodology that produces implementation-ready plans.
-  Use for new features, refactoring, architecture changes, migrations, or any non-trivial
-  implementation work. Ask multiple rounds of clarifying questions about product behavior,
-  user expectations, edge cases, and architecture; explore the repo deeply before deciding;
-  do not waste questions on implementation details that can be learned directly from the
-  code or routed docs; and write the final plan into `.waypoint/plans/` so it persists in the repo.
+  Use for non-trivial implementation work such as new features, refactors, migrations,
+  architecture changes, or ambiguous bugs that require repo exploration and product
+  clarification. Do not use for tiny obvious edits, straightforward one-file changes,
+  or post-implementation closeout. Ask multiple rounds of clarifying questions about
+  behavior, constraints, edge cases, and architecture; explore the repo before deciding;
+  and write the final plan into `.waypoint/plans/` so it persists in the repo.
 ---
 # Planning
-Good plans prove you understand the problem. Size matches complexity — a rename might be 20 lines, a complex feature might be 500.
+Good plans prove you understand the problem. Size matches complexity: a rename might be 20 lines, a complex feature might be 500.
-**The handoff test:** Could someone implement this plan without asking you questions? If not, find what's missing.
+The handoff test: could someone implement this plan without asking you questions? If not, find what is missing.
-## When Not To Use This Skill
+## When To Use This Skill
+- Use it for new feature work, refactors, migrations, architecture changes, or bugs where the right fix depends on repo exploration plus product or architectural clarification.
+- Use it when the request can affect contracts, schemas, data flow, compatibility, or multiple call sites.
+- Use it when a durable plan doc will help someone else implement the change without re-discovering the problem.
-- Skip it for tiny obvious edits where a full planning pass would cost more than it saves.
-- Skip it when the user explicitly wants implementation right away and the work is already straightforward.
-- Skip it for post-implementation closeout; use the review or hygiene workflows for that.
+Examples:
-## Read First
+- Use: "Add a new billing flow with backend and frontend changes."
+- Use: "Refactor the event pipeline to a new payload shape."
+- Use: "Migrate the job runner to a new queue contract."
+- Do not use: "Fix a typo in one markdown file."
+- Do not use: "Rename this single variable in one function."
+- Do not use: "Make this one-line styling tweak and ship it."
-Before planning:
+## When Not To Use This Skill
-1. Read `AGENTS.md`
-2. Read `.waypoint/WORKSPACE.md`
-3. Read `.waypoint/ACTIVE_PLANS.md`
-4. Read `.waypoint/DOCS_INDEX.md`
-5. Read `.waypoint/context/SNAPSHOT.md`
-6. Read the routed docs relevant to the task
+- Do not use it for tiny obvious edits where the overhead of a full planning pass would cost more than it saves.
+- Do not use it for straightforward implementation that fits cleanly in one local change and does not need interview-driven discovery.
+- Do not use it for post-implementation closeout; use the review or hygiene workflows for that.
+## Ordered Workflow
+1. Classify the request.
+   - Decide whether the work needs full planning or the reduced-depth exception.
+   - Identify the likely surface area: product behavior, architecture, migration, refactor, or local edit.
+2. Read the routed workspace context.
+   - Read `AGENTS.md`.
+   - Read `.waypoint/WORKSPACE.md`.
+   - Read `.waypoint/ACTIVE_PLANS.md`.
+   - Read `.waypoint/DOCS_INDEX.md`.
+   - Read `.waypoint/context/SNAPSHOT.md`.
+   - Read the routed docs relevant to the task.
+3. Explore the codebase.
+   - Find the real entry points.
+   - Trace callers, imports, and data flow.
+   - Inspect adjacent modules that already solve similar problems.
+   - Identify constraints the change must respect.
+4. Interview the user.
+   - Ask 2-4 focused questions.
+   - Ask about behavior, edge cases, users, tradeoffs, and architecture.
+   - Do not ask for facts that the repo or routed docs already answer.
+5. Repeat exploration and interviewing until the plan is grounded.
+   - Keep drilling until you can explain what exists today, what changes, what could go wrong, and what decisions remain.
+6. Choose the plan depth.
+   - Use the full planning shape by default.
+   - Use the reduced-depth exception only when the task qualifies under the exception rule below.
+7. Write or update the durable plan doc.
+   - Put it under `.waypoint/plans/`.
+   - Choose the smallest routed location that matches the work.
+   - Update `.waypoint/ACTIVE_PLANS.md` when the plan is approved or changes phase.
+8. Self-review the plan against real code.
+   - Verify state invariants, transaction boundaries, and the current tooling.
+   - Remove assumptions that are not backed by the repo or routed docs.
 ## Output Location
@@ -57,7 +96,7 @@ Keep looping until you can explain:
 ## Interviewing
-**Interviewing is the most important part of planning.** You cannot build what you don't understand. Every unasked product, behavior, edge-case, or architecture question is an assumption that will break during implementation.
+Interviewing is the most important part of planning. You cannot build what you do not understand. Every unasked product, behavior, edge-case, or architecture question is an assumption that will break during implementation.
 Interview iteratively: 2-4 questions -> answers -> deeper follow-ups -> repeat. Each round should go deeper.
@@ -73,45 +112,55 @@ Ask aggressively about:
 - what tradeoffs are acceptable
 - what architecture direction the user wants
-Do **not** spend those questions on implementation facts that can be learned from reading the code, routed docs, or external docs already linked by the repo.
+Do not spend those questions on implementation facts that can be learned from reading the code, routed docs, or external docs already linked by the repo.
 Push back when something seems off. Neutrality is not the goal; correctness is.
-## Exploring the Codebase
+## Plan Content
-**More exploration = better plans.** The number one cause of plan failure is insufficient exploration.
+Plans document your understanding. Include what matters for this task.
-Explore until you stop having questions, not until you've "done enough."
+### Mandatory in every plan
-Use the repository like a map:
+- **Current State**: What exists today, including the relevant files, data flow, constraints, and existing patterns.
+- **Changes**: Every file to create, modify, or delete, and how the changes connect.
+- **Decisions**: Why this approach, tradeoffs, and assumptions.
+- **Phase breakdown**: Distinct execution phases in the order they should happen.
+- **Scope checklist**: Concrete implementation items that can be marked done or not done.
+- **Acceptance criteria**: What must be true when each phase is done.
+- **Non-Goals**: Explicitly out of scope items to prevent implementation drift.
+- **TL;DR**: A short summary for quick review.
-- find the real entry points
-- trace callers and imports
-- inspect nearby modules solving similar problems
-- identify existing patterns worth following
-- identify constraints that the change must respect
+### Conditional artifacts
-Do not plan from abstractions alone. Ground major decisions in actual files.
+Add these only when the work actually needs them:
-## Plan Content
+- **Legacy seam inventory**: Required for migrations, refactors, or compatibility-sensitive replacements where legacy readers, writers, consumers, or tests still depend on the old shape.
+- **Removals**: Required when obsolete code, compatibility logic, dead branches, or unused files will be deleted.
+- **Phase checkpoints**: Required when the change needs explicit phase gates, review passes, or staged verification before moving on.
+- **File strategy**: Required when multiple files or new files need justification, locality control, or split decisions.
+- **Test strategy**: Required when the work needs a deliberate minimal test set and the risk is not obvious from the change itself.
+- **Grep gates**: Required when the plan must prove that legacy symbols or shapes are gone before a phase completes.
+- **Cleanup expectations**: Required when the implementation must delete replaced paths before the work is complete.
+- **Test cases**: Required for behavioral changes where concrete input -> expected output examples prevent ambiguity.
+- **Docs/workspace updates**: Required when the change affects durable project behavior or operator-facing guidance.
+  Any new or updated routable doc under `.waypoint/docs/` or `.waypoint/plans/` must include `summary`, `last_updated`, and `read_when` frontmatter so `DOCS_INDEX` can route it.
-Plans document your understanding. Include what matters for this task:
-- **Current State**: What exists today — relevant files, data flows, constraints, existing patterns
-- **Legacy seam inventory**: Every read path, write path, sync or worker path, route contract, frontend consumer, event payload, fixture, and test surface that still depends on the legacy shape
-- **Changes**: Every file to create/modify/delete, how changes connect
-- **Removals**: What obsolete code, compatibility logic, unused files, debug logs, dead props, or stale branches will be deleted as part of the change
-- **Decisions**: Why this approach, tradeoffs, assumptions
-- **Phase breakdown**: Distinct execution phases in the order they should happen
-- **Scope checklist**: Concrete implementation items that can be marked done or not done
-- **Acceptance criteria**: What must be true when each phase is "done"
-- **Phase checkpoints**: What verification, reviewer passes, tests, typechecks, builds, or manual QA must pass before moving to the next phase, with explicit cadence (targeted checks during implementation, full sweeps at phase-complete or pre-commit checkpoints unless the user asks otherwise)
-- **File strategy**: Why each new file is necessary, how edit locality is preserved, and which splits are intentionally avoided
-- **Test strategy**: The smallest durable test set that gives confidence for this change, plus why additional tests are not needed right now
-- **Grep gates**: Exact searches that must return clean before a phase is review-ready or complete
-- **Cleanup expectations**: What legacy or replaced paths must be removed before the work can be called complete
-- **Test cases**: For behavioral changes, include input -> expected output examples
-- **Non-Goals**: Explicitly out of scope to prevent implementation drift
+### Reduced-Depth Exception Rule
+Use a reduced-depth plan only when all of the following are true:
+- The request is small and localized.
+- The change does not alter schemas, contracts, compatibility boundaries, or multi-step data flow.
+- The likely implementation fits in a narrow set of files.
+- The user wants planning support, but a full interview-and-audit pass would add more overhead than value.
+When this exception applies:
+- Ask at most one focused clarification round.
+- Read only the directly relevant files and immediate neighbors.
+- Produce a shorter plan doc that keeps `Current State`, `Changes`, `Decisions`, `Scope checklist`, `Acceptance criteria`, `Test strategy`, and `Non-Goals`.
+- Omit `Legacy seam inventory`, `Grep gates`, and `Phase checkpoints` unless the request unexpectedly grows in scope.
 Use ASCII diagrams when they clarify flow, layering, or state.
@@ -128,18 +177,18 @@ Before presenting the plan, verify against real code:
 ## Rules
-- No TBD
-- No "we'll figure it out during implementation"
-- No literal code unless the user explicitly wants it
-- No pretending you verified something you didn't
-- Approved scope must be explicit enough to act as an execution contract after user approval
-- The plan must be explicit enough to support phase-by-phase execution and checkpoints without rediscovering the intended order in chat
-- Migration and refactor plans should include a legacy seam inventory before implementation starts
-- Migration and refactor phases should include exact grep gates for the legacy symbols being removed
-- Refactor and replacement plans should explicitly call out what legacy or obsolete code will be removed instead of preserving it by default
+- No TBD.
+- No "we'll figure it out during implementation."
+- No literal code unless the user explicitly wants it.
+- No pretending you verified something you did not.
+- Approved scope must be explicit enough to act as an execution contract after user approval.
+- The plan must be explicit enough to support phase-by-phase execution and checkpoints without rediscovering the intended order in chat.
 - Do not split files by concern labels alone. A new file requires a clear boundary, reuse need, or size reason.
 - Do not inflate tests by default. Start from a small high-signal set and expand only when risk justifies it.
-- If the user approves the plan, do not silently defer or drop checklist items later; discuss any proposed scope change first
+- If the user approves the plan, do not silently defer or drop checklist items later; discuss any proposed scope change first.
+- For migrations and refactors, include the conditional legacy seam inventory and exact grep gates required by the work.
+- For refactors and replacements, call out what legacy or obsolete code will be removed instead of preserving it by default.
+- If the reduced-depth exception applies, do not force the full artifact set just to satisfy a template.
 If the change touches durable project behavior, include docs/workspace updates in the plan.
 Write or update the durable plan doc under `.waypoint/plans/` as part of the skill, not as an optional follow-up.
@@ -158,9 +207,10 @@ A good durable plan doc usually includes:
 4. Phase breakdown
 5. Scope checklist
 6. Acceptance criteria
-7. Phase checkpoints
-8. Verification
-9. TL;DR
+7. Verification
+8. TL;DR
+Include `Legacy seam inventory`, `Phase checkpoints`, `Grep gates`, and `Cleanup expectations` only when the conditional rules above require them.
 ## Final Response
@@ -168,7 +218,9 @@ When the plan doc is written:
 - give a short chat summary
 - include the doc path
+- state whether the plan is full-depth or reduced-depth
 - call out any unresolved decisions that still need the user's input
+- list which artifacts are mandatory and which are conditional for this task
 - if there are no unresolved decisions and the user approves the plan, treat that approval as authorization to execute the plan end to end rather than asking again at each obvious next step
 - once approved, update `.waypoint/ACTIVE_PLANS.md` so the active plan, current phase, and current checkpoint are visible during execution
 - once approved, use the plan's checklist, phase checkpoints, and acceptance criteria to decide whether the work is actually done; if anything approved is skipped, report that as partial work or ask to change scope instead of calling it complete
@@ -187,6 +239,7 @@ If the plan would make the implementer ask "where does this hook in?" or "what e
 - Do not leave grep gates implicit. Name the exact legacy symbols or shapes that must be gone before the phase can move forward.
 - Do not dump a transcript into the plan doc. Distill the decisions and requirements into a clean implementation handoff.
 - Do not treat a reviewed plan as a stopping point. Once the user approves it, the workflow expects execution to continue.
+- Do not use the reduced-depth exception for work that crosses contracts, data flow, or compatibility boundaries.
 ## Keep This Skill Sharp
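
The reduced-depth exception rule added to the planning skill is an all-of condition, so it can be read as a single predicate. The sketch below is one possible encoding, not part of the package: `RequestProfile`, its field names, and `choose_plan_depth` are hypothetical names introduced for illustration. The section list is copied from the rule's "shorter plan doc" bullet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestProfile:
    """Hypothetical summary of a request; field names are illustrative only."""

    small_and_localized: bool
    # True if the change alters schemas, contracts, compatibility
    # boundaries, or multi-step data flow.
    alters_contracts_or_data_flow: bool
    fits_narrow_file_set: bool
    full_pass_costs_more_than_value: bool


# Sections the rule says a reduced-depth plan doc keeps.
REDUCED_SECTIONS = [
    "Current State",
    "Changes",
    "Decisions",
    "Scope checklist",
    "Acceptance criteria",
    "Test strategy",
    "Non-Goals",
]


def choose_plan_depth(profile: RequestProfile) -> str:
    """Return "reduced" only when every condition of the exception holds."""
    qualifies = (
        profile.small_and_localized
        and not profile.alters_contracts_or_data_flow
        and profile.fits_narrow_file_set
        and profile.full_pass_costs_more_than_value
    )
    return "reduced" if qualifies else "full"
```

Under these assumptions, a request that touches a contract falls back to the full planning shape even when every other condition holds, which matches the added pitfall about not using the exception across contracts, data flow, or compatibility boundaries.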

package/templates/.agents/skills/planning/agents/openai.yaml CHANGED

@@ -1,4 +1,4 @@
 interface:
   display_name: "Planning"
-  short_description: "Interview, explore, and write an implementation-ready plan into the repo"
-  default_prompt: "Use $planning to deeply explore the repository, interview for the product and architectural details that materially affect the work, and write an implementation-ready plan into .waypoint/plans/."
+  short_description: "Interview, explore, and write a durable plan for non-trivial implementation work"
+  default_prompt: "Use $planning for non-trivial implementation work that needs interview-driven repo exploration and a durable plan doc in .waypoint/plans/. For tiny obvious edits or one-file changes, use the reduced-depth path or skip planning."

package/templates/.agents/skills/replace-dont-layer/SKILL.md CHANGED

@@ -1,32 +1,78 @@
 ---
 name: replace-dont-layer
-description: Prevent layering a new implementation path on top of an older one when the change should replace it. Use when modifying an existing flow, interface, abstraction, migration, or behavior and there is a risk of leaving both old and new paths alive. Determine whether the new path is additive, replacing, or transitional, and remove, redirect, or explicitly deprecate the old path as part of the same work.
+description: Prevent a replacement change from being layered on top of an old path. Use when an existing implementation, flow, abstraction, or behavior already has a clear old path and the requested change is to replace it, redirect traffic to the new path, and remove obsolete glue in the same change. Do not use for additive-only work, broad redesigns, or schema/state hard cuts.
 ---
-Determine whether the requested change is additive, replacing, or transitional.
+# Replace Don't Layer
-Identify the old path, the new path, and where each is used.
+Replace the old path. Do not add the new path beside it.
-If the change is additive, keep both paths only if they serve clearly different intended roles.
+## Core Instruction
-If the change is replacing, do not just add the new path. In the same work:
-- redirect callers to the new path
-- remove or deprecate the old path
-- delete obsolete conditionals, adapters, and compatibility glue
-- update tests to reflect the intended single path
+When the requested change is a replacement, the codebase must end with one clear active path, and the old path must be removed, redirected, or explicitly fenced at a real compatibility boundary in the same work.
-If the change is transitional:
-- keep duplication to the minimum necessary
-- make the temporary path explicit
-- attach a concrete removal condition
-- add a brief in-code comment only if it materially helps future cleanup, and make it specific rather than a vague TODO
+## When To Use
-Before considering the work complete, check whether both old and new paths still exist without a clear reason. If they do, keep cleaning up.
+Use this skill when all of the following are true:
-Rules:
-- Do not keep both paths alive by default.
-- Do not leave old logic in place just because removal feels riskier.
-- Do not call the work complete if the new path exists but the old one still silently handles traffic.
-- Prefer one clear path over split behavior.
-- If temporary coexistence is necessary, make the exit condition explicit.
-- Do not rely on vague TODO comments as the cleanup plan.
+1. There is already an old implementation path in the codebase.
+2. The change is meant to replace that path, not add a separate alternative.
+3. Leaving both paths alive would create split behavior, duplicated logic, or hidden fallback handling.
+4. The work is not primarily a schema migration, persisted-state cutover, or architecture redesign.
+Do not use this skill when the change is truly additive and both paths are intentionally independent.
+## Default Workflow
+1. Identify the old path, the proposed new path, and every live caller or producer/consumer that touches them.
+2. Classify the change as one of three cases:
+   - replacement
+   - additive and independent
+   - transitional with a bounded compatibility bridge
+3. If the change is replacement, update the owning layer first, then redirect callers to the new path, then delete obsolete adapters, branches, flags, and duplicate code.
+4. If the change is additive, keep both paths only when they serve distinct responsibilities and do not share behavior that should be unified.
+5. If the change is transitional, keep the bridge minimal, name the removal condition, and schedule the cleanup in the same change set.
+6. Re-check the diff for any code that still silently preserves the old path, even indirectly through helper functions, compatibility branches, or fallback routing.
+## Rules
+- Do not leave both old and new paths active by default.
+- Do not add a new path beside the old one just to reduce immediate risk.
+- Do not preserve compatibility glue after the redirection is complete.
+- Do not leave hidden fallback behavior, duplicate branching, or parallel implementations in place.
+- Do not use vague TODO comments as a cleanup plan.
+- Do not call the change complete while old logic still handles the same traffic, data, or behavior behind the scenes.
+- Do not use this skill to justify broad redesign work that belongs under `foundational-redesign`.
+- Do not use this skill to police schema or persisted-state compatibility that is already governed by `hard-cut`.
+## Exception Rule
+Keep an old path only at a real compatibility boundary that cannot be removed in the same change.
+The exception is allowed only when the old path is required for one of these:
+- already persisted user or system data
+- on-disk or database state that must still load
+- an external wire format or process boundary
+- a documented public or supported contract
+When the exception applies:
+- isolate it to the exact boundary function, file, or adapter
+- keep the bridge as small as possible
+- name the concrete removal condition
+- avoid spreading compatibility logic to callers or new codepaths
+## Output Contract
+Return a short implementation summary that states:
+- the old path
+- the new path
+- the decision: replacement, additive, or transitional
+- what was removed or redirected
+- whether any compatibility boundary remains
+- the exact removal condition if a transitional bridge remains
+- the verification performed
+If the work is incomplete, say exactly what remains instead of implying it is done.
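
The exception rule in the new replace-dont-layer skill allows an old path to survive only at a real compatibility boundary, isolated to one function, with a named removal condition. A minimal sketch of that shape follows, assuming a persisted settings file whose legacy key was `colour` and whose new key is `color`. Every name here (`load_settings`, `_upgrade_legacy_settings`, `theme_color`, the key names, the removal condition) is hypothetical and does not come from the package.

```python
import json

# Hypothetical removal condition, stated concretely as the rule asks.
LEGACY_REMOVAL_CONDITION = (
    "remove once no settings file on disk still uses the 'colour' key"
)


def _upgrade_legacy_settings(raw: dict) -> dict:
    """Compatibility bridge, isolated to this one boundary function."""
    upgraded = dict(raw)
    if "colour" in upgraded:
        # Legacy on-disk shape: move the old key to the new one.
        # If both keys exist, the new key wins.
        upgraded.setdefault("color", upgraded.pop("colour"))
    return upgraded


def load_settings(text: str) -> dict:
    """Boundary: parse persisted state and return only the new shape."""
    return _upgrade_legacy_settings(json.loads(text))


def theme_color(settings: dict) -> str:
    """Caller on the new path; it never checks for the legacy key."""
    return settings["color"]
```

The design point is that callers such as `theme_color` see one shape only, so the compatibility logic does not spread beyond the loader.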

package/templates/.agents/skills/replace-dont-layer/agents/openai.yaml CHANGED

@@ -0,0 +1,4 @@
+interface:
+  display_name: "Replace Don't Layer"
+  short_description: "Replace an existing path instead of layering the new one on top"
+  default_prompt: "Use $replace-dont-layer when a change is intended to replace an existing implementation path. Identify the old path and the new path, redirect callers to the new path, remove obsolete glue in the same change, and keep coexistence only at a real compatibility boundary."

package/templates/.agents/skills/root-cause-finder/SKILL.md ADDED

@@ -0,0 +1,70 @@
+---
+name: root-cause-finder
+description: Performs root-cause-first debugging and review by tracing expected behavior to the first unintended side effect before changing contracts, parsing, or types. Use when debugging protocol errors, deserialization failures, null payloads, missing fields, restore or hydration issues, state-ownership bugs, unexpected requests, background mutations, or reviewing junior-created code where the visible failure may be downstream noise.
+---
+# Root-Cause Finder
+## Core Instruction
+Before fixing the error, prove whether the code path that produced it was intended.
+Do not stop at the first contract, parsing, type, null, or schema error. Treat it as a possible symptom.
+## Default Workflow
+1. State the expected behavior in plain language.
+2. State the invariant in one sentence.
+3. State what definitely did not happen.
+4. Trace the causal chain from the intended action or system event to the observed system effect.
+5. Ask whether the request or mutation should have happened at all.
+6. Identify the canonical source of truth and every competing source.
+7. Find the first unintended side effect or write.
+8. Only then decide whether a downstream contract fix is still necessary.
+## Questions To Answer In Order
+1. What user action or system event was supposed to happen?
+2. What exact call path caused this request or response?
+3. Should this request, mutation, or side effect have happened at all under the expected behavior and invariants?
+4. Who owns the state at each layer?
+5. Is there observer-driven syncing, lifecycle startup code, persistence restore, retry logic, background work, or multiple sources of truth causing an unintended side effect?
+6. If a contract is violated, is the contract wrong, or did unintended logic reach the contract?
+## Rules
+- Do not make the contract more permissive unless you can prove the observed payload is intended in the final design.
+- Prefer fixing the upstream logic bug over accepting bad downstream data.
+- Separate symptom, trigger, root cause, minimal safe fix, and architectural follow-up.
+- If a low-level fix is still needed, explain why the upstream fix is not sufficient or why both are required.
+- Identify the correct layer to fix first.
+- Name the first visible wrong behavior, not only the final error.
+## Hidden Write Checks
+Treat non-explicit writes as suspicious by default.
+- Audit lifecycle hooks, callbacks, subscribers, watchers, interceptors, middleware, retries, background jobs, cache refreshers, persistence restore, scheduled tasks, and startup code.
+- Check whether derived data is being mirrored into another store, cache, file, queue, session, or database through an observer or helper layer.
+- Prefer explicit command handlers, request handlers, job runners, or user actions as writers; treat startup-time and background writes as suspects until proven intentional.
+- If a framework has automatic reactivity or lifecycle execution, map this rule onto its equivalent constructs without assuming the framework behavior is correct.
+## Output Format
+Use this structure:
+- Expected behavior
+- Invariant
+- What definitely did not happen
+- Bug class
+- Causal chain from intended action to system effect
+- First unintended side effect
+- Canonical source of truth
+- Competing sources of truth
+- Symptom
+- Trigger
+- Root cause
+- Correct layer to fix first
+- Minimal safe fix
+- Architectural follow-up
+- Proposed patch
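
The root-cause-finder rules prefer fixing the upstream logic bug over loosening a downstream contract. The sketch below illustrates that choice with a made-up scenario: a persistence restore that, if it synced on startup, would send an empty payload to a strict parser. `ProfileStore`, `parse_profile`, and the payload shape are hypothetical names chosen for the example, not anything shipped in the package.

```python
from typing import Callable, Optional


def parse_profile(payload: Optional[dict]) -> dict:
    """Strict downstream contract: a profile update must carry a name."""
    if payload is None or "name" not in payload:
        raise ValueError("profile payload must include 'name'")
    return {"name": payload["name"]}


class ProfileStore:
    """Hypothetical store with one explicit writer and a restore hook."""

    def __init__(self, send: Callable[[Optional[dict]], None]) -> None:
        self._send = send
        self._profile: Optional[dict] = None

    def restore(self, saved: Optional[dict]) -> None:
        # Persistence restore only reads into local state. A sync call
        # placed here would send None on first launch: the first
        # unintended side effect, upstream of the parse error.
        self._profile = saved

    def save(self, profile: dict) -> None:
        # Explicit user action: the only intended writer.
        self._profile = profile
        self._send(profile)
```

In this reading, the parser stays strict and the fix lives in the layer that issued the unintended write. The visible `ValueError` would have been the symptom, and the startup sync the root cause.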

package/templates/.agents/skills/root-cause-finder/agents/openai.yaml ADDED

File without changes

package/templates/.agents/skills/test-writing/SKILL.md CHANGED

@@ -1,52 +1,66 @@
 ---
 name: test-writing
-description: Write a small, high-signal test set that protects important behavior without overfitting to implementation details. Use whenever adding or modifying automated tests for a feature, bug fix, refactor, or behavior change. Prefer durable tests that verify user-visible behavior, important business rules, and meaningful failure modes. Avoid redundant, brittle, or low-value tests that increase maintenance cost without materially increasing confidence.
+description: Choose and write the smallest durable automated test set for a non-trivial code change when test selection is ambiguous or high-value. Use when adding or modifying tests for a behavior change, regression fix, refactor, or other change where multiple test scopes are plausible and durable coverage matters. Do not use for documentation-only edits, cosmetic changes, or straightforward one-test maintenance.
 ---
-Write tests for confidence, not volume.
+# Test Writing
-Start with the smallest test set that gives strong confidence in the requested change.
+## Mission
+Choose the smallest durable automated test set that proves the requested change and protects the highest-risk behavior.
-Default test budget for a normal feature or bug fix:
-- one main-path test
-- one key edge case or failure-path test
-- unit tests only for non-trivial pure logic
+## Default Workflow
+1. Identify the contract.
+   - State the behavior that must hold after the change.
+   - Identify the smallest surface that can prove it.
+   - Check whether existing tests already cover the contract.
+2. Rank the risks.
+   - Main path
+   - Highest-value failure path or boundary
+   - Regression that motivated the change
+   - Non-trivial pure logic
+3. Select the minimum set.
+   - Choose one test per distinct risk.
+   - Prefer the highest-level test that proves the contract.
+   - Add unit tests only for non-trivial pure logic that is not already covered.
+4. Remove redundancy.
+   - Drop tests that restate confidence already provided by a stronger test.
+   - Reject assertions that depend on implementation choreography instead of outcomes.
+5. Verify sufficiency.
+   - Confirm every chosen test has a distinct reason to exist.
+   - Confirm the set is still small enough to maintain through refactors.
+6. Report the decision.
+   - State what was chosen, what was omitted, why the set is sufficient, and what remains at risk.
-Test at the highest level that gives strong confidence at reasonable cost.
+## Rules
+- Do not add a test unless it protects a distinct behavior or failure mode.
+- Do not mirror implementation structure in test structure.
+- Do not test trivial helpers, pass-through glue, or obvious mappings unless they contain a real bug risk.
+- Do not add redundant tests across layers unless the lower-level test covers a different risk than the higher-level test.
+- Do not use brittle assertions on incidental DOM shape, private calls, exact class strings, or unstable snapshots.
+- Do not expand to a matrix of cases by default.
+- Do not optimize for test count; optimize for unique confidence.
+- For frontend work, do not choose structural assertions when a user-visible assertion can prove the behavior.
+- For backend or domain logic, do not skip contract, invariant, permission, validation, or state-transition coverage when those are the real risks.
-Prefer tests that:
-- verify observable behavior
-- protect important business rules and invariants
-- cover meaningful boundaries and failure modes
-- survive refactors
-- exercise public interfaces rather than private helpers
+## Exception Rule
+Expand beyond the default budget only when the change introduces an additional high-risk contract that cannot be covered by the same test without losing clarity or coverage.
-Avoid tests that:
-- duplicate confidence across layers without a distinct risk
-- assert implementation choreography instead of outcomes
-- test trivial helpers, thin wrappers, pass-through glue, or obvious mappings
-- encode fragile structure such as incidental DOM shape, exact class strings, private function calls, or unstable snapshots
-- expand a feature into a large matrix of low-value cases unless the risk truly requires it
+The default budget is 2 tests:
+- 1 main-path test
+- 1 highest-value edge, boundary, or failure-path test
-When choosing between many narrow tests and one stronger test, prefer the smaller set that better protects real behavior.
+You may expand to 3 tests only if all of the following are true:
+- each added test covers a distinct high-risk behavior
+- no existing test already proves the same contract
+- a combined test would hide an important failure mode or become unreadable
-For frontend work:
-- prefer stable user-visible behavior over structural assertions
-- add automated regression tests only when the behavior is worth protecting and likely to remain stable
-- do not add large numbers of UI tests for cosmetic or refactor-sensitive details
+Never exceed 4 total tests without explicit human approval.
-For backend or domain logic:
-- prefer behavior-focused tests around contracts, invariants, validation, permissions, state transitions, and real regressions
-- add targeted unit tests for tricky pure logic only when they materially improve confidence
+## Output Contract
+Return the decision in this shape:
+- Chosen tests: list each test and the risk it covers.
+- Omitted tests: list notable tests you did not write and why.
+- Rationale: explain why the selected set is sufficient and minimal.
+- Residual risk: name what remains untested and why that risk is acceptable.
-If an integration-style test already proves the important flow, do not add multiple lower-level tests that mostly restate the same confidence.
-Before finishing, remove or avoid any test whose main effect is to make future refactors harder without protecting an important contract.
-Rules:
-- Do not optimize for test count.
-- Do not mirror the implementation structure in the test structure.
-- Do not create one test per helper by default.
-- Do not add redundant tests across layers unless a distinct risk justifies them.
-- Do not test trivial code just because it exists.
-- Prefer the smallest durable set that gives high confidence.
+If no new test is needed, say that explicitly and explain which existing test already covers the contract.
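
The default budget in the revised test-writing skill is two tests: one main-path test and one highest-value edge, boundary, or failure-path test. A minimal sketch of what that budget could look like in practice is shown here. `apply_discount` is a hypothetical function invented for the example, and the tests are written as plain functions so they run without a test framework.

```python
def apply_discount(total_cents: int, percent: int) -> int:
    """Hypothetical function under test: price after a whole-percent discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total_cents - (total_cents * percent) // 100


def test_discount_reduces_total() -> None:
    # Main path: a typical discount produces the expected price.
    assert apply_discount(2000, 25) == 1500


def test_discount_rejects_out_of_range_percent() -> None:
    # Highest-value failure path: invalid input is refused, not clamped.
    try:
        apply_discount(2000, 150)
    except ValueError:
        return
    raise AssertionError("expected ValueError for percent > 100")
```

Both tests assert on outcomes through the public function, so they should survive a refactor of the arithmetic. A third test would need its own distinct high-risk behavior under the expansion conditions listed in the exception rule.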

package/templates/.agents/skills/test-writing/agents/openai.yaml CHANGED

@@ -1,4 +1,4 @@
 interface:
   display_name: "Test Writing"
-  short_description: "Write high-signal tests with minimal bloat"
-  default_prompt: "Use $test-writing to design and add the smallest durable test set that gives strong confidence for this change."
+  short_description: "Choose the smallest durable test set when test scope is not obvious"
+  default_prompt: "Use $test-writing to choose and add the smallest durable automated test set for this non-trivial change. Report the chosen tests, omitted tests, rationale, and residual risk."