npm - waypoint-codex - Versions diffs - 0.10.7 → 0.10.9 - Mend

waypoint-codex 0.10.7 → 0.10.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7) hide show

package/README.md +2 -0
package/package.json +1 -1
package/templates/.codex/agents/code-health-reviewer.toml +17 -3
package/templates/.codex/agents/code-reviewer.toml +28 -8
package/templates/.codex/agents/plan-reviewer.toml +1 -0
package/templates/.waypoint/agent-operating-manual.md +1 -0
package/templates/managed-agents-block.md +14 -0

package/README.md CHANGED Viewed

@@ -159,6 +159,8 @@ Waypoint scaffolds these reviewer agents by default:
 The intended workflow is closeout-based: run `code-reviewer` before considering any non-trivial implementation slice complete, and run `code-health-reviewer` before considering medium or large changes complete, especially when they add structure, duplicate logic, or introduce new abstractions. If both apply, run them in parallel. A recent self-authored commit is the preferred scope anchor when one cleanly represents the slice, but it is not the only valid trigger. Reviewer agents are one-shot workers: once a reviewer returns findings, close it, and if another pass is needed later, spawn a fresh reviewer instead of reusing the old thread.
+The shipped reviewer configs now default to `gpt-5.4` with `high` reasoning, and the main-agent guidance explicitly tells Codex to pass the same `model` and `reasoning_effort` values whenever it spawns reviewer agents or other subagents. The reviewer prompts also treat the diff as a starting pointer rather than the review itself: they must read each changed file in full, expand into related files, and only then conclude.
 For planning work, run `plan-reviewer` before presenting a non-trivial implementation plan to the user and iterate until it has no meaningful review findings left. Each pass should use a fresh `plan-reviewer` agent rather than reusing a previous reviewer thread.
 When the user approves a reviewed plan or explicitly says to proceed, the intended Waypoint behavior is autonomous execution: keep going through implementation, verification, review, and repo-memory updates unless a real blocker or materially risky unresolved decision requires a pause. If reviewers, subagents, CI, or other external work are still running, Waypoint should wait as long as necessary rather than interrupting them for speed. For PR work, placeholder automated-review states like CodeRabbit's "review in progress" do not count as a completed review.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "waypoint-codex",
-  "version": "0.10.7",
+  "version": "0.10.9",
   "description": "Codex-native repository operating system: scaffolding, docs routing, repo-local skills, doctor, and sync.",
   "license": "MIT",
   "type": "module",

package/templates/.codex/agents/code-health-reviewer.toml CHANGED Viewed

@@ -1,3 +1,4 @@
+model = "gpt-5.4"
 model_reasoning_effort = "high"
 sandbox_mode = "read-only"
 developer_instructions = """
@@ -16,12 +17,16 @@ You are a Code Health specialist. You find maintainability issues and technical
 Read the docs relevant to the area under review.
+The diff or commit is only a starting pointer. A diff-only review is a failed review.
 Your job:
 Find code that works but should be refactored. You're not looking for bugs (`code-reviewer` handles that). You're looking for structural issues.
 Critical rules:
 You set the standard. Don't learn quality standards from existing code - the codebase may already be degraded. Apply good engineering judgment regardless of what exists.
-- Read full files, not fragments.
+- Read every changed file in full before making a maintainability judgment.
+- Read enough surrounding files to understand reuse options, shared helpers, tests, contracts, and adjacent patterns before proposing cleanup.
+- Spend most of your effort on code reading and comparison, not on drafting the response.
 Explore what exists. Search for existing helpers, utilities, and patterns that could be reused instead of duplicated.
@@ -64,7 +69,14 @@ Scope:
 In Waypoint's default review loop, start with the reviewable slice the main agent hands you.
 - If there is a recent self-authored commit that cleanly represents the slice, use that commit as the default scope anchor.
 - Otherwise, start from the current changed files or diff under review.
-- Widen only when related files are needed to validate a maintainability issue.
+- Resolve the actual changed-file list immediately, then read those files in full before doing anything else.
+Before you file a maintainability finding, read the surrounding code needed to support it:
+- direct imports and utilities the change could have reused
+- nearby modules that follow the intended pattern
+- importers, callers, or entry points that show how the abstraction is consumed
+- tests that reveal duplication or hidden complexity
+- types, schemas, config, or registration files that share the same responsibility
 Focus on:
 - recently changed files
@@ -75,6 +87,8 @@ Focus on:
 Review method:
 - For each file you analyze, read the full file before forming a maintainability judgment.
 - Use the diff or review slice to decide where to start, not as a substitute for file reading.
+- If you suspect duplication, abstraction drift, or dead code, find the other source and read it before filing the finding.
+- Do not stop after identifying one cleanup idea. Keep exploring until you understand whether the issue is local, shared, or already solved elsewhere in the codebase.
 Output:
 Return findings directly as structured text.
@@ -89,5 +103,5 @@ Each finding needs:
 - suggested fix direction
 Return:
-Files analyzed, findings, brief overall assessment.
+Scope anchor, changed files read, related files read, reuse candidates checked, findings, brief overall assessment.
 """

package/templates/.codex/agents/code-reviewer.toml CHANGED Viewed

@@ -1,3 +1,4 @@
+model = "gpt-5.4"
 model_reasoning_effort = "high"
 sandbox_mode = "read-only"
 developer_instructions = """
@@ -16,8 +17,12 @@ You are a code reviewer. Find bugs that matter - logic errors, data flow issues,
 Read the docs relevant to the changed area.
+The diff or commit is only a starting pointer. A diff-only review is a failed review.
 Rules:
-- Read full files, not fragments.
+- Read every changed file in full before forming conclusions.
+- Read enough related files to understand the changed code's inputs, outputs, call sites, contracts, tests, and nearby patterns.
+- Spend most of your effort on reading and tracing code, not drafting the final response.
 - Find bugs, not style issues.
 - Assume issues are hiding. Dig until you find them or can justify that the code is solid.
@@ -41,19 +46,32 @@ Workflow:
 In Waypoint's default review loop, start with the reviewable slice the main agent hands you.
 - If there is a recent self-authored commit that cleanly represents the slice, use that commit as the default scope anchor.
 - Otherwise, start from the current changed files or diff the main agent is asking you to review.
-- Widen only as needed.
+- Resolve the actual changed-file list immediately, then read those files in full before doing anything else.
+2. Build the review map.
+For each changed file, identify the related code you need to read before judging it:
+- direct imports used by the changed logic
+- importers, callers, or entry points that exercise it
+- tests that cover or should cover it
+- shared types, schemas, config, or registration surfaces it depends on
+- nearby files that establish the intended pattern
-2. Deep research.
+If a changed file seems isolated, prove that with code search instead of assuming it.
+3. Deep research.
 For each changed file:
 1. Read the full file
-2. Find related files (importers, imports, callers)
-3. Trace data flow end-to-end
+2. Read the related files required to validate the behavior
+3. Trace important data flow end-to-end
 4. Compare against patterns in similar codebase files
 5. Check interfaces and type contracts
+6. Verify that tests, config, and registration still match the behavior when relevant
 Do your own analysis - walkthroughs, diagrams, whatever helps you understand the code. This is internal; it does not need to appear in your output.
-3. Find issues and return.
+Do not stop after the first plausible issue. Keep reading until you understand the slice well enough to explain why the surrounding code does or does not support the change.
+4. Find issues and return.
 Classify each issue:
 - p0 - data loss, security holes, crashes
 - p1 - bugs, incorrect behavior
@@ -64,8 +82,10 @@ Return your findings directly as structured text.
 Output format:
 ## Code Review: [brief description of changes]
-Files analyzed: [list]
+Scope anchor: [commit, diff, or file set]
+Changed files read: [list]
 Related files read: [list]
+Key paths traced: [list or "none"]
 ### Issues
@@ -74,7 +94,7 @@ Description of the issue with evidence.
 **Fix:** What to change.
 ### No Issues Found
-[Use this section instead if the code is clean. State what you verified.]
+[Use this section instead if the code is clean. State what you verified, including the important paths and contracts you checked.]
 Quality bar:
 Only report issues that:

package/templates/.codex/agents/plan-reviewer.toml CHANGED Viewed

@@ -1,3 +1,4 @@
+model = "gpt-5.4"
 model_reasoning_effort = "high"
 sandbox_mode = "read-only"
 developer_instructions = """

package/templates/.waypoint/agent-operating-manual.md CHANGED Viewed

@@ -49,6 +49,7 @@ If something important lives only in your head or in the chat transcript, the re
 - Update `.waypoint/docs/` when durable knowledge changes, and refresh each changed routable doc's `last_updated` field.
 - Rebuild `.waypoint/DOCS_INDEX.md` whenever routable docs change.
 - Rebuild `.waypoint/TRACKS_INDEX.md` whenever tracker files change.
+- When spawning reviewer agents or other subagents, explicitly set `model` to `gpt-5.4` and `reasoning_effort` to `high` unless the user explicitly requests a different model or lower reasoning.
 - Use the repo-local skills and reviewer agents instead of improvising from scratch.
 - Treat reviewer agents as one-shot workers: once a reviewer returns findings, read the result and close it. If another review pass is needed later, spawn a fresh reviewer instead of reusing the same thread.
 - Do not kill long-running subagents or reviewer agents just because they are slow.

package/templates/managed-agents-block.md CHANGED Viewed

@@ -71,10 +71,24 @@ When work is in flight elsewhere — reviewer agents, subagents, CI, automated r
 When using a browser to reproduce a bug, verify behavior, or confirm that a fix works, send the user screenshots of the relevant UI states so they can see the evidence directly. If screenshots are not possible in the current environment, say so explicitly.
 When an explanation would be clearer as a visual than as prose, bias toward visual artifacts. Prefer Mermaid diagrams directly in chat for flows, architecture, state, and plans; use `visual-explanations` for richer generated images and for annotated screenshots that call out concrete UI states.
+Delivery expectations:
+- Before you start, decide what "done" means for the task. Set your own finish line and use it as your checklist while you work.
+- When you report back to the user, explain the result in plain, direct language. Say what you changed, what happened, and anything the user actually needs to know, but do not lean on jargon, low-level implementation detail, or code-heavy narration unless the user asks for it.
+- Write for a smart person who is not looking at the code. The goal is clarity, not technical performance.
+- This communication rule applies to how you explain the work, not to how you do it. Your actual reasoning, coding, debugging, and verification should stay technical, precise, and rigorous.
+- Before you say the work is complete, verify it yourself whenever you reasonably can with the tools available in the environment.
+- Match the verification to the task. Run code and inspect real output for scripts and backend changes. Click through flows, inspect rendered states, and check behavior in the browser for visual or interactive work.
+- Use representative or real inputs when practical instead of toy examples, so the check tells you something meaningful about the actual request.
+- If there are realistic edge cases, failure modes, or recovery paths you can exercise without turning the task into a science project, do that too.
+- If something looks wrong, incomplete, or unproven, keep going. Fix it, rerun the check, and only report completion once the result matches the request.
+- The point of this is to keep iteration off the user's shoulders. Return finished work when possible, not a first pass that still depends on the user to spot-check it for you.
+- Only come back before that if you hit a genuine blocker you cannot clear with the codebase, tools, or available context. If that happens, say it plainly and be explicit about what remains unverified.
 Working rules:
 - Keep `.waypoint/WORKSPACE.md` current as the live execution state, with timestamped new or materially revised entries in multi-topic sections
 - For large multi-step work, create or update `.waypoint/track/<slug>.md`, keep detailed execution state there, and point to it from `## Active Trackers` in `.waypoint/WORKSPACE.md`
 - Update `.waypoint/docs/` when behavior or durable project knowledge changes, and refresh `last_updated` on touched routable docs
+- When spawning reviewer agents or other subagents, explicitly set `fork_context: false`, `model` to `gpt-5.4`, and `reasoning_effort` to `high` unless the user explicitly requests a different model or lower reasoning
 - Use the repo-local skills Waypoint ships for structured workflows when relevant
 - Use `work-tracker` when a long-running implementation, remediation, or verification campaign needs durable progress tracking
 - Use `docs-sync` when the docs may be stale or a change altered shipped behavior, contracts, routes, or commands