npm - waypoint-codex - Versions diffs - 0.2.0 → 0.3.1 - Mend

waypoint-codex 0.2.0 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18) hide show

package/README.md CHANGED Viewed

@@ -8,7 +8,7 @@ It helps the next agent pick up your repo with full context by keeping the impor
 - `.waypoint/WORKSPACE.md` for live state, with timestamped multi-topic entries
 - `.waypoint/docs/` for durable project memory, with `summary`, `last_updated`, and `read_when` frontmatter on routable docs
 - `.waypoint/DOCS_INDEX.md` for docs routing
-- repo-local skills for planning and audits
+- repo-local skills for planning, audits, verification, workspace compression, and review closure
 ## Install
@@ -59,6 +59,10 @@ repo/
 - `error-audit`
 - `observability-audit`
 - `ux-states-audit`
+- `workspace-compress`
+- `pre-pr-hygiene`
+- `pr-review`
+- `e2e-verify`
 ## Optional reviewer roles
@@ -66,7 +70,6 @@ If you initialize with `--with-roles`, Waypoint scaffolds:
 - `code-health-reviewer`
 - `code-reviewer`
-- `docs-researcher`
 - `plan-reviewer`
 The intended workflow is post-commit: after your own commit lands, run `code-reviewer` and `code-health-reviewer` in parallel in the background, then fix real findings before you call the work finished.

package/dist/src/core.js CHANGED Viewed

@@ -453,6 +453,10 @@ export function doctorRepository(projectRoot) {
         "error-audit",
         "observability-audit",
         "ux-states-audit",
+        "workspace-compress",
+        "pre-pr-hygiene",
+        "pr-review",
+        "e2e-verify",
     ]) {
         const skillPath = path.join(projectRoot, ".agents/skills", skillName, "SKILL.md");
         if (!existsSync(skillPath)) {

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "waypoint-codex",
-  "version": "0.2.0",
+  "version": "0.3.1",
   "description": "Codex-native repository operating system: scaffolding, docs routing, repo-local skills, doctor, and sync.",
   "license": "MIT",
   "type": "module",

package/templates/.agents/skills/e2e-verify/SKILL.md ADDED Viewed

@@ -0,0 +1,63 @@
+---
+name: e2e-verify
+description: Perform manual end-to-end verification for a shipped feature or major change. Use when frontend and backend behavior must be verified together, when a feature needs a realistic walkthrough, or when the agent should manually exercise the flow, inspect logs and persisted state, document issues, fix them, and repeat until no meaningful end-to-end issues remain.
+---
+# E2E Verify
+Use this skill when "it should work" is not enough and the flow needs to be proven end to end.
+## Read First
+Before verification:
+1. Read `.waypoint/SOUL.md`
+2. Read `.waypoint/agent-operating-manual.md`
+3. Read `.waypoint/WORKSPACE.md`
+4. Read `.waypoint/context/MANIFEST.md`
+5. Read every file listed in that manifest
+6. Read the routed docs that define the feature, flow, or contract being verified
+## Step 1: Exercise The Real Flow
+- For browser-facing paths, manually exercise the feature through the real UI.
+- For backend-only or service flows, drive the real API or runtime path directly.
+- Follow the feature from entry point to persistence to user-visible outcome.
+## Step 2: Inspect End-To-End State
+Check the surfaces that prove the system actually behaved correctly:
+- UI state
+- server responses
+- logs
+- background-job state if relevant
+- database or persisted records when relevant
+Do not stop at "the page looked okay."
+## Step 3: Record And Fix Issues
+- Document each meaningful issue you find.
+- Fix the issue when the remediation is clear.
+- Update docs or contracts if verification exposes stale assumptions.
+## Step 4: Repeat Until Clean
+Re-run the end-to-end flow after fixes.
+The skill is complete only when:
+- the intended flow works
+- the persisted state is correct
+- the logs tell a truthful story
+- no meaningful issues remain
+## Step 5: Report Verification Truthfully
+Summarize:
+- the flows exercised
+- the state surfaces inspected
+- the issues found and fixed
+- any residual risks or unverified edges

package/templates/.agents/skills/e2e-verify/agents/openai.yaml ADDED Viewed

@@ -0,0 +1,4 @@
+interface:
+  display_name: "E2E Verify"
+  short_description: "Manually verify a feature end to end"
+  default_prompt: "Use this skill for manual end-to-end verification of a feature or major change. Exercise the real flow, inspect UI plus logs and persisted state, document issues, fix them, and repeat until no meaningful end-to-end issues remain."

package/templates/.agents/skills/planning/SKILL.md CHANGED Viewed

@@ -3,9 +3,10 @@ name: planning
 description: >
   Interview-driven planning methodology that produces implementation-ready plans.
   Use for new features, refactoring, architecture changes, migrations, or any non-trivial
-  implementation work. Ask multiple rounds of clarifying questions, explore the repo deeply
-  before deciding, capture verbatim requirements when needed, and do not stop at vague or
-  partially-investigated plans.
+  implementation work. Ask multiple rounds of clarifying questions about product behavior,
+  user expectations, edge cases, and architecture; explore the repo deeply before deciding;
+  and do not waste questions on implementation details that can be learned directly from the
+  code or routed docs.
 ---
 # Planning
@@ -20,7 +21,7 @@ Before planning:
 1. Read `.waypoint/SOUL.md`
 2. Read `.waypoint/agent-operating-manual.md`
-3. Read `WORKSPACE.md`
+3. Read `.waypoint/WORKSPACE.md`
 4. Read `.waypoint/context/MANIFEST.md`
 5. Read every file listed in the manifest
 6. Read the routed docs relevant to the task
@@ -55,7 +56,7 @@ Keep looping until you can explain:
 ## Interviewing
-**Interviewing is the most important part of planning.** You cannot build what you don't understand. Every unasked question is an assumption that will break during implementation.
+**Interviewing is the most important part of planning.** You cannot build what you don't understand. Every unasked product, behavior, edge-case, or architecture question is an assumption that will break during implementation.
 Interview iteratively: 2-4 questions -> answers -> deeper follow-ups -> repeat. Each round should go deeper.
@@ -63,6 +64,16 @@ Interview iteratively: 2-4 questions -> answers -> deeper follow-ups -> repeat.
 - Feature or migration -> many rounds of questions
 - Architectural work -> keep drilling until the constraints and tradeoffs are clear
+Ask aggressively about:
+- how the feature should behave
+- who the users are
+- which edge cases matter
+- what tradeoffs are acceptable
+- what architecture direction the user wants
+Do **not** spend those questions on implementation facts that can be learned from reading the code, routed docs, or external docs already linked by the repo.
 Push back when something seems off. Neutrality is not the goal; correctness is.
 ## Exploring the Codebase
@@ -111,10 +122,11 @@ Before presenting the plan, verify against real code:
 - No pretending you verified something you didn't
 If the change touches durable project behavior, include docs/workspace updates in the plan.
+If the plan is meant to outlive the current chat, write or update a durable plan doc under `.waypoint/docs/` and make sure it is discoverable through the routed docs layer.
 ## External APIs
-If an external API or library is relevant and the repo lacks durable docs for it, use the `docs-researcher` agent before finalizing the plan.
+If an external API or library is relevant, read the actual upstream docs before finalizing the plan. Prefer the repo's external-doc links or URL registry when one exists. Read the real source docs; do not copy vendor reference material into the repo unless the user explicitly wants a durable local note.
 ## Output
@@ -130,4 +142,3 @@ A good final plan usually includes:
 ## Quality Bar
 If the plan would make the implementer ask "where does this hook in?" or "what exactly am I changing?", it is not done.

package/templates/.agents/skills/pr-review/SKILL.md ADDED Viewed

@@ -0,0 +1,48 @@
+---
+name: pr-review
+description: Triage and close the review loop on an open PR after automated or human review has started. Use when a PR has review comments pending, when automated reviewers are still running, or when you need to wait for review completion, answer every inline comment, fix meaningful issues, push follow-up commits, and keep repeating until no new meaningful review findings remain.
+---
+# PR Review
+Use this skill to drive the PR through review instead of treating review as a one-shot comment sweep.
+## Step 1: Wait For Review To Settle
+- Check the PR's current review and CI status.
+- If automated review is still running, wait for it to finish instead of racing it.
+- If comments are still arriving, do not prematurely declare the loop complete.
+## Step 2: Read Every Review Comment
+- Read all open review comments, especially inline comments.
+- Group duplicates, but do not ignore any comment.
+- Distinguish between meaningful issues, optional suggestions, and comments that should be explicitly declined.
+## Step 3: Triage And Respond Inline
+For every comment:
+- fix it if it is correct and in scope
+- explain clearly if you are declining it
+- reply inline where the comment lives instead of posting a disconnected summary comment
+Do not leave comments unanswered.
+## Step 4: Push The Next Round
+- Make the needed fixes.
+- rerun the relevant verification
+- push follow-up commit(s)
+- return to the PR and continue the loop
+Stay in the loop until no new meaningful issues remain.
+## Step 5: Close With A Crisp State Summary
+Summarize:
+- what was fixed
+- what was intentionally declined
+- what verification ran
+- whether the PR is clear or still waiting on reviewer response

package/templates/.agents/skills/pr-review/agents/openai.yaml ADDED Viewed

@@ -0,0 +1,4 @@
+interface:
+  display_name: "PR Review"
+  short_description: "Close the review loop on an active PR"
+  default_prompt: "Use this skill when a PR has active review comments or automated review in progress. Wait for review to settle, triage every comment, reply inline, fix meaningful issues, push follow-up commits, and repeat until no new meaningful findings remain."

package/templates/.agents/skills/pre-pr-hygiene/SKILL.md ADDED Viewed

@@ -0,0 +1,61 @@
+---
+name: pre-pr-hygiene
+description: Run a broad final hygiene pass before pushing, before opening or updating a PR, or after a large implementation chunk when the diff is substantial and needs a deeper audit than per-commit review. Verify code-guide compliance, docs-to-behavior alignment, shared contract/schema drift, typing gaps, optimistic UI rollback or invalidation strategy, persistence correctness risks, and any other cross-cutting quality issues that would make the next review painful.
+---
+# Pre-PR Hygiene
+Use this skill for the larger final audit before code leaves the machine.
+## Read First
+Before the hygiene pass:
+1. Read `.waypoint/SOUL.md`
+2. Read `.waypoint/agent-operating-manual.md`
+3. Read `.waypoint/WORKSPACE.md`
+4. Read `.waypoint/context/MANIFEST.md`
+5. Read every file listed in that manifest
+6. Read `.waypoint/docs/code-guide.md` and the routed docs relevant to the area being shipped
+## Step 1: Audit The Whole Change Surface
+Inspect the code and docs that are about to ship.
+Look for:
+- code-guide violations such as silent fallbacks, swallowed errors, weak boundary validation, unsafe typing, or stale compatibility layers
+- stale docs, stale routes, or workspace notes that no longer match real behavior
+- shared schema, fixture, or API-contract drift
+- typing gaps such as avoidable `any`, weak narrowing, or unnecessary casts
+- optimistic UI without rollback or invalidation
+- persistence risks such as missing provenance, missing idempotency protection, weak uniqueness, or missing foreign-key style invariants
+Skip checks that truly do not apply, but say that you skipped them.
+## Step 2: Fix Or Stage Findings
+- Fix meaningful issues directly when the right remediation is clear.
+- Update `.waypoint/docs/` when shipped behavior or routes changed.
+- If the live handoff has become bloated or stale, use `workspace-compress`.
+Do not stop at reporting obvious fixable issues.
+## Step 3: Verify Before Ship
+Run the most relevant verification for the area:
+- tests
+- typecheck
+- lint
+- build
+- targeted manual QA
+## Step 4: Report The Gate Result
+Summarize:
+- what you checked
+- what you fixed
+- what verification ran
+- what residual risks remain, if any

package/templates/.agents/skills/pre-pr-hygiene/agents/openai.yaml ADDED Viewed

@@ -0,0 +1,4 @@
+interface:
+  display_name: "Pre-PR Hygiene"
+  short_description: "Run the final cross-cutting ship audit"
+  default_prompt: "Use this skill before pushing or opening/updating a PR for substantial work to do a broader hygiene pass across code, docs, contracts, typing, UI rollback, persistence correctness, and code-guide compliance."

package/templates/.agents/skills/workspace-compress/SKILL.md ADDED Viewed

@@ -0,0 +1,90 @@
+---
+name: workspace-compress
+description: Compress and refresh the repository's live workspace handoff so `WORKSPACE.md` stays short, current, and useful to the next agent. Use after finishing a meaningful chunk of work, before stopping for the session, before asking for review, before opening or updating a PR, or whenever the workspace has started accumulating stale history, repeated status logs, or resolved context that should no longer stay in the live handoff. Keep the minimum current operational state, collapse old resolved entries, and move durable detail into existing routed docs instead of duplicating it in the workspace.
+---
+# Workspace Compress
+Keep `WORKSPACE.md` as a live handoff, not a project diary.
+This skill is for compression, not for erasing context. Preserve what the next agent needs in the first few minutes of a resume, and push durable detail into the docs layer that already exists in the repo.
+## Read First
+Before compressing:
+1. Read `.waypoint/SOUL.md`
+2. Read `.waypoint/agent-operating-manual.md`
+3. Read `.waypoint/WORKSPACE.md`
+4. Read `.waypoint/context/MANIFEST.md`
+5. Read every file listed in that manifest
+6. Read the routed docs relevant to the active workspace sections
+## Step 1: Build Context From Routing, Not From Git Diff
+This skill must work even in a dirty tree or an arbitrary session state.
+- Read the workspace file in full.
+- Read `.waypoint/DOCS_INDEX.md` and the workspace's obvious routing pointers.
+- Read the project or domain docs directly linked from the active sections you may compress.
+- If the workspace references a progress, status, architecture, or release doc, treat that as the durable home for details before removing anything from the live handoff.
+Do not rely on `git diff` as the primary signal for what matters.
+## Step 2: Apply The Handoff Test
+Ask one question:
+**What does the next agent need in the first 10 minutes to resume effectively?**
+Keep only the answer to that question in the workspace. Usually that means:
+- current focus
+- latest verified state
+- open blockers or risks
+- immediate next steps
+- only the last few meaningful timestamped updates
+Usually remove or collapse:
+- resolved implementation logs
+- repeated status updates that say the same thing
+- validation transcripts
+- old milestone history
+- duplicated durable documentation
+Compression is documentation quality, not data loss.
+## Step 3: Compress Safely
+When editing the workspace:
+1. Preserve the active operational truth.
+2. Collapse stale resolved bullets into one short summary when history still matters.
+3. Remove entries that are already preserved elsewhere and no longer affect immediate execution.
+4. Keep timestamp discipline for new or materially revised bullets.
+5. Do not turn the workspace into an archive, changelog, or debug notebook.
+If durable context is missing from `.waypoint/docs/`, add or refresh the smallest coherent routed doc before removing it from the workspace.
+## Step 4: Protect User-Owned State
+- Never overwrite or revert unrelated user changes.
+- If a workspace or doc already has in-flight edits you did not make, read carefully and work around them.
+- Prefer surgical edits over broad rewrites.
+- Do not delete project memory just because live state is being compressed.
+## Step 5: Refresh Routing
+After changing routed docs or the workspace:
+- Run `node .waypoint/scripts/prepare-context.mjs` so the docs index and generated context match the edited sources.
+## Step 6: Report The Result
+Summarize:
+- what stayed in the live handoff
+- what was collapsed or removed
+- which durable docs now hold the preserved detail
+- any remaining risk that still belongs in the workspace

package/templates/.agents/skills/workspace-compress/agents/openai.yaml ADDED Viewed

@@ -0,0 +1,4 @@
+interface:
+  display_name: "Workspace Compress"
+  short_description: "Compress the live workspace handoff"
+  default_prompt: "Use this skill after a meaningful chunk of work, before stopping, before review, or before opening or updating a PR to keep WORKSPACE.md short, current, and useful to the next agent."

package/templates/.codex/config.toml CHANGED Viewed

@@ -3,7 +3,7 @@ multi_agent = true
 [agents]
 max_depth = 1
-max_threads = 4
+max_threads = 24
 [agents."code-health-reviewer"]
 description = "Read-only background reviewer for post-commit maintainability drift, dead code, duplication, and refactoring opportunities worth fixing."
@@ -13,10 +13,6 @@ config_file = "agents/code-health-reviewer.toml"
 description = "Read-only background reviewer for post-commit bugs, regressions, and integration mistakes."
 config_file = "agents/code-reviewer.toml"
-[agents."docs-researcher"]
-description = "Documentation researcher for external APIs, libraries, and tools when the repo lacks durable docs."
-config_file = "agents/docs-researcher.toml"
 [agents."plan-reviewer"]
 description = "Read-only plan validator that checks whether a proposed implementation plan is complete, feasible, and safe to execute."
 config_file = "agents/plan-reviewer.toml"

package/templates/.waypoint/agent-operating-manual.md CHANGED Viewed

@@ -46,6 +46,7 @@ If something important lives only in your head or in the chat transcript, the re
 - Update `.waypoint/docs/` when durable knowledge changes, and refresh each changed routable doc's `last_updated` field.
 - Rebuild `.waypoint/DOCS_INDEX.md` whenever routable docs change.
 - Use the repo-local skills and optional reviewer agents instead of improvising from scratch.
+- Do not kill long-running subagents or reviewer agents just because they are slow. Wait unless they are clearly stuck, failed, or the user redirects the work.
 ## Documentation expectations
@@ -66,6 +67,10 @@ Do not document every trivial implementation detail. Document the non-obvious, d
 - `error-audit` when failures are being swallowed or degraded invisibly
 - `observability-audit` when production debugging signals look weak
 - `ux-states-audit` when async/data-driven UI likely lacks loading, empty, or error states
+- `workspace-compress` after meaningful chunks, before stopping, and before review when the live handoff needs compression
+- `pre-pr-hygiene` before pushing or opening/updating a PR for substantial work
+- `pr-review` once a PR has active review comments or automated review in progress
+- `e2e-verify` for major user-facing or cross-system changes that need manual end-to-end verification
 ## When to use the optional reviewer agents
@@ -73,7 +78,6 @@ If the repo was initialized with Waypoint roles enabled, use them as focused sec
 - `code-reviewer` for correctness and regression review
 - `code-health-reviewer` for maintainability drift
-- `docs-researcher` for external dependency research
 - `plan-reviewer` to challenge weak implementation plans before execution
 ## Post-Commit Review Loop

package/templates/.waypoint/docs/code-guide.md CHANGED Viewed

@@ -1,96 +1,113 @@
 ---
-summary: Opinionated rules for writing and changing Waypoint code so behavior stays explicit, strict, observable, and easy to evolve.
-last_updated: "2026-03-05 21:27 PST"
+summary: Universal coding conventions — explicit behavior, type safety, frontend consistency, reliability, and behavior-focused verification
+last_updated: "2026-03-10 10:05 PDT"
 read_when:
-  - writing new code
-  - changing existing behavior
-  - introducing or removing configuration
-  - handling external input or external systems
-  - adding tests, logging, or state transitions
+  - writing code
+  - coding standards
+  - code conventions
+  - TypeScript
+  - frontend
+  - backend
+  - error handling patterns
+  - testing
 ---
 # Code Guide
-Waypoint favors explicitness over convenience, correctness over compatibility theater, and deletion over accumulation. Code should make the system easier to reason about after the change, not merely pass today.
+Write code that keeps behavior explicit, failure visible, and the next change easier than the last one.
 ## 1. Compatibility is opt-in, not ambient
 Do not preserve old behavior unless a user-facing requirement explicitly asks for it.
-When replacing a path, remove the old one instead of leaving a shim, alias, translation layer, or silent compatibility branch. If compatibility must be preserved, document the exact contract being preserved and the planned removal condition.
+- Remove replaced paths instead of leaving shims, aliases, or silent compatibility branches.
+- Do not keep dead fields, dual formats, or migration-only logic "just in case."
+- If compatibility must stay, document the exact contract being preserved and the removal condition.
-Do not keep dead fields, dead parameters, dual formats, or migration-only logic "just in case". Every compatibility layer becomes part of the design whether intended or not.
+## 2. Type safety is non-negotiable
-## 2. Fail clearly, never quietly
+The compiler is part of the design, not an afterthought.
-Errors are part of the contract. Surface them early and with enough context to diagnose the cause.
+- Write as if strict mode is enabled. Type errors are build blockers.
+- Never use `any` when `unknown`, narrowing, generics, or better shared types can express the real contract.
+- Reuse exported library or app types instead of recreating them locally.
+- Be explicit at boundaries: function params, returns, public interfaces, API payloads, DB rows, and shared contracts.
+- Validate external data at boundaries with schema validation and convert it into trusted internal shapes once.
+- Avoid cross-package type casts unless there is no better contract available; fix the shared types instead when practical.
-Do not swallow errors, downgrade them to logs, or return partial success unless partial success is the explicit API. If an operation can fail in distinct ways, preserve those distinctions. Do not replace a specific failure with a generic one that destroys meaning.
+## 3. Fail clearly, never quietly
-Error messages should identify what failed, at which boundary, and why the system refused to proceed. They should not force readers to infer missing state from generic text.
+Errors are part of the contract.
-## 3. No silent fallback paths
+- Fail explicitly. No silent fallbacks, empty catches, or degraded behavior that pretends everything is fine.
+- Every caught exception must propagate, crash, or be surfaced truthfully to the user or operator.
+- Do not silently switch to worse models, stale cache, inferred defaults, empty values, or best-effort modes unless that degradation is an intentional product behavior.
+- Required configuration has no silent defaults. Missing required config is a startup or boundary failure.
+- Error messages should identify what failed, where, and why.
-Fallbacks are allowed only when they are deliberate product behavior, not a coding reflex.
+## 4. Validate at boundaries
-Do not silently retry with weaker behavior, alternate providers, cached data, inferred defaults, empty values, or best-effort modes unless the user asked for degraded operation and the degraded result remains truthful.
+Anything crossing a boundary is untrusted until proven otherwise.
-If degradation exists, it must be explicit in code, testable, and observable. Hidden fallback logic makes the system look healthy while it is already off-contract.
+- Validate user input, config, files, HTTP responses, generated content, database reads, queue payloads, and external API data at the boundary.
+- Reject invalid data instead of "normalizing" it into something ambiguous.
+- Keep validation near the boundary instead of scattering half-validation deep inside the system.
-## 4. Validate at boundaries, not deep inside
+## 5. Prefer direct code over speculative abstraction
-Anything that crosses a boundary must be treated as untrusted: user input, config, environment, files, network responses, database reads, queue payloads, generated content, and data from other modules.
+Do not invent complexity for hypothetical future needs.
-Validate structure, required fields, allowed values, and invariants at the boundary. Convert external data into a trusted internal shape once. Do not pass loosely-typed or half-validated data deeper into the system and hope downstream code copes.
+- Add abstractions only when multiple concrete cases already demand the same shape.
+- Prefer straightforward code and small duplication over the wrong generic layer.
+- If a helper hides critical validation, state changes, or failure modes, it is probably hurting clarity.
-Boundary validation should reject invalid data, not "normalize" it into something ambiguous.
+## 6. Make state, contracts, and provenance explicit
-## 5. Configuration must be strict
+Readers should be able to tell what states exist, what transitions are legal, and what data can be trusted.
-Missing or invalid configuration should stop the system at startup or at the feature boundary where it becomes required.
+- Use explicit state representations and enforce invariants at the boundary of the operation.
+- Multi-step writes must have clear transaction boundaries.
+- Retryable operations must be idempotent or guarded against duplicate effects.
+- New schema and persistence work should make provenance obvious and protect against duplication with the right uniqueness constraints, foreign keys, or equivalent invariants.
+- Shared schemas, fixtures, and contract types must match the real API and stored data shape.
-Do not hide absent configuration behind guessed defaults, environment-dependent behavior, or implicit no-op modes. Defaults are acceptable only when they are safe, intentional, and documented as part of the product contract.
+## 7. Frontend must reuse and fit the existing system
-Configuration should be centralized enough to audit. A reader should be able to tell which settings matter, what values are valid, and what happens when they are wrong.
+Frontend changes should extend the app, not fork its design language.
-## 6. Prefer direct code over speculative abstraction
+- Before creating a new component, check whether the app already has a component or pattern that should be reused.
+- Reuse existing components when they satisfy the need, even if minor adaptation is required.
+- When a new component is necessary, make it match the design language, interaction model, spacing, states, and compositional patterns of the rest of the app.
+- Handle all states for async and data-driven UI: loading, success, empty, error.
+- Optimistic UI must have an explicit rollback or invalidation strategy. Never leave optimistic state hanging without a recovery path.
-Do not introduce abstractions for imagined future use. Add them only when multiple concrete cases already demand the same shape.
+## 8. Observability is part of correctness
-A small amount of duplication is cheaper than the wrong abstraction. Prefer code that exposes the current domain plainly over generic layers, plugin systems, factories, wrappers, or strategy trees created before the need is real.
+If you cannot see the failure path, you have not finished the work.
-Abstractions must remove real complexity, not relocate it. If a helper hides critical behavior, state changes, validation, or failure modes, it is making the system harder to read.
+- Emit structured logs, metrics, or events at important boundaries and state transitions.
+- Include enough context to reproduce issues without logging secrets or sensitive data.
+- Failed async work, retries, degraded paths, and rejected inputs must leave a useful trace.
+- Do not use noisy logging to compensate for unclear control flow.
-## 7. Make state and invariants explicit
+## 9. Test behavior, not implementation
-State transitions should be visible in code. Readers should be able to answer: what states exist, what moves the system between them, and what must always be true.
+Tests should protect the contract users depend on.
-Do not encode important transitions as scattered flag mutations, ordering assumptions, optional fields, or side effects hidden in utility calls. Avoid representations where invalid states are easy to create and hard to detect.
-When a function changes state, make the transition obvious. When a module depends on an invariant, assert it at the boundary of the operation instead of relying on folklore.
-## 8. Tests define behavior changes, not just regressions
-Any behavior change must update tests to describe the new contract. If the old behavior mattered, remove or rewrite the old tests instead of making them weaker.
-Test observable behavior and boundary cases, not implementation trivia. Cover failure modes, validation rules, configuration strictness, and any intentional degradation path. If a bug fix closes a previously possible bad state, add a test that proves the bad state is now rejected.
-Do not merge code whose behavior changed without leaving behind executable evidence of the new rules.
-## 9. Observability is part of correctness
-Code is not complete if production failures cannot be understood from its signals.
-Emit structured logs, metrics, or events at important boundaries and state transitions, especially around input rejection, external calls, retries, and degraded modes. Observability should explain which path executed and why.
-Do not log noise to compensate for poor design. Prefer a small number of high-value signals tied to decisions, failures, and contract edges.
+- Test observable behavior and boundary cases, not implementation trivia.
+- Never write brittle regression tests that assert exact class strings, styling internals, private helper calls, incidental DOM structure, internal schema representations, or other implementation-detail artifacts.
+- Regression tests must focus on the behavior that was broken and the behavior that is now guaranteed.
+- For backend bugs, prefer behavior-focused regression tests by default.
+- For frontend bugs, prefer manual QA by default; add automated regression coverage only when there is a stable user-visible behavior worth protecting.
+- Do not merge behavior changes without leaving behind executable or clearly documented evidence of the new contract.
 ## 10. Optimize for future legibility
-Write code for the next person who must change it under pressure.
-Keep modules narrow in responsibility. Keep data flow obvious. Keep control flow boring. Prefer designs where the main path is easy to follow and unusual behavior is explicitly named.
+Write code for the next engineer or agent who has to change it under pressure.
-When changing code, improve the shape around the change if needed. Do not leave behind half-migrated designs, obsolete branches, commented-out code, or placeholders for imagined follow-ups.
+- Keep modules narrow in responsibility and data flow obvious.
+- Remove stale branches, half-migrations, dead code, and obsolete docs around the change.
+- Keep docs and shipped behavior aligned.
+- Before pushing or opening a PR, do a hygiene pass for stale docs, drifting contracts, typing gaps, missing rollback strategies, and new persistence correctness risks.
-The best code is not code that can handle every possible future. It is code whose current truth is obvious, whose failures are visible, and whose wrong parts can be deleted without fear.
+The best code is not the most flexible code. It is the code whose current truth is obvious, whose failures are visible, and whose wrong parts can be deleted without fear.

package/templates/managed-agents-block.md CHANGED Viewed

@@ -35,5 +35,8 @@ Working rules:
 - Update `.waypoint/docs/` when behavior or durable project knowledge changes, and refresh `last_updated` on touched routable docs
 - Use the repo-local skills Waypoint ships for structured workflows when relevant
 - If optional reviewer roles are present and you make a commit, run `code-reviewer` and `code-health-reviewer` in parallel before calling the work done
+- Before pushing or opening/updating a PR for substantial work, use `pre-pr-hygiene`
+- Use `pr-review` once a PR has active review comments or automated review in progress
+- Use `e2e-verify` for major user-facing or cross-system changes that need manual end-to-end verification
 - Treat the generated context bundle as required session bootstrap, not optional reference material
 <!-- waypoint:end -->

package/templates/.codex/agents/docs-researcher.toml DELETED Viewed

@@ -1,14 +0,0 @@
-model_reasoning_effort = "medium"
-sandbox_mode = "read-only"
-developer_instructions = """
-Read these files in order before doing anything else:
-1. .waypoint/SOUL.md
-2. .waypoint/agent-operating-manual.md
-3. WORKSPACE.md
-4. .waypoint/context/MANIFEST.md
-5. every file listed in that manifest
-6. .waypoint/agents/docs-researcher.md
-After reading them, follow .waypoint/agents/docs-researcher.md as your operating instructions.
-"""

package/templates/.waypoint/agents/docs-researcher.md DELETED Viewed

@@ -1,73 +0,0 @@
----
-name: docs-researcher
-source: meridian-adapted
----
-You research external tools, APIs, and products, building comprehensive knowledge docs that the repo can reference later.
-## Read First
-1. Read `.waypoint/SOUL.md`
-2. Read `.waypoint/agent-operating-manual.md`
-3. Read `WORKSPACE.md`
-4. Read `.waypoint/context/MANIFEST.md`
-5. Read every file listed in the manifest
-6. Read any existing repo docs for the dependency or integration
-## Critical Rule
-Do not write documentation from memory when the facts are version-sensitive.
-Research first, then write.
-## What You Produce
-Not just API specs. Produce durable repo knowledge:
-- what the tool is and what it's for
-- current version or current relevant state
-- setup requirements
-- conceptual model
-- best practices
-- operations and constraints
-- gotchas and known sharp edges
-- links to official docs for deeper reading
-## Process
-### 1. Check Existing Docs
-Search the repo for existing docs about the tool. Understand what's documented, what's missing, and what's stale.
-### 2. Research
-Use primary sources first:
-- official docs
-- official guides
-- changelogs
-- release notes
-- authoritative repos
-Capture text content, not just code snippets. The conceptual guides matter as much as the method signatures.
-### 3. Write or Update Durable Docs
-Create or update repo docs so future work does not have to repeat the same research.
-### 4. Rebuild the Docs Index
-If you add or update durable docs, rebuild `DOCS_INDEX.md`.
-## Quality Bar
-The repo should be smarter after you finish than before you started.
-## Output
-Summarize:
-- what you researched
-- what matters for this repo
-- what durable docs were added or updated