npm - waypoint-codex - Versions diffs - 0.10.6 → 0.10.8 - Mend

waypoint-codex 0.10.6 → 0.10.8

Files changed (7)

package/README.md +4 -2
package/package.json +1 -1
package/templates/.codex/agents/code-health-reviewer.toml +19 -3
package/templates/.codex/agents/code-reviewer.toml +30 -8
package/templates/.codex/agents/plan-reviewer.toml +3 -0
package/templates/.waypoint/agent-operating-manual.md +11 -6
package/templates/managed-agents-block.md +2 -0

package/README.md CHANGED

@@ -157,9 +157,11 @@ Waypoint scaffolds these reviewer agents by default:
 - `code-reviewer`
 - `plan-reviewer`
-The intended workflow is closeout-based: run `code-reviewer` before considering any non-trivial implementation slice complete, and run `code-health-reviewer` before considering medium or large changes complete, especially when they add structure, duplicate logic, or introduce new abstractions. If both apply, run them in parallel. A recent self-authored commit is the preferred scope anchor when one cleanly represents the slice, but it is not the only valid trigger.
+The intended workflow is closeout-based: run `code-reviewer` before considering any non-trivial implementation slice complete, and run `code-health-reviewer` before considering medium or large changes complete, especially when they add structure, duplicate logic, or introduce new abstractions. If both apply, run them in parallel. A recent self-authored commit is the preferred scope anchor when one cleanly represents the slice, but it is not the only valid trigger. Reviewer agents are one-shot workers: once a reviewer returns findings, close it, and if another pass is needed later, spawn a fresh reviewer instead of reusing the old thread.
-For planning work, run `plan-reviewer` before presenting a non-trivial implementation plan to the user and iterate until it has no meaningful review findings left.
+The shipped reviewer configs now default to `gpt-5.4` with `high` reasoning, and the main-agent guidance explicitly tells Codex to pass the same `model` and `reasoning_effort` values whenever it spawns reviewer agents or other subagents. The reviewer prompts also treat the diff as a starting pointer rather than the review itself: they must read each changed file in full, expand into related files, and only then conclude.
+For planning work, run `plan-reviewer` before presenting a non-trivial implementation plan to the user and iterate until it has no meaningful review findings left. Each pass should use a fresh `plan-reviewer` agent rather than reusing a previous reviewer thread.
 When the user approves a reviewed plan or explicitly says to proceed, the intended Waypoint behavior is autonomous execution: keep going through implementation, verification, review, and repo-memory updates unless a real blocker or materially risky unresolved decision requires a pause. If reviewers, subagents, CI, or other external work are still running, Waypoint should wait as long as necessary rather than interrupting them for speed. For PR work, placeholder automated-review states like CodeRabbit's "review in progress" do not count as a completed review.
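The one-shot reviewer rule added to the README in this hunk can be pictured with a minimal Python sketch. Everything in it is hypothetical and for illustration only: `OneShotReviewer`, `ReviewerClosedError`, and `fake_review` are not part of waypoint-codex or Codex, and the real mechanism for spawning and closing agents is not shown in this diff.

```python
from dataclasses import dataclass
from typing import Callable, List


class ReviewerClosedError(RuntimeError):
    """Raised when a closed one-shot reviewer is asked for another pass."""


@dataclass
class OneShotReviewer:
    # `name` mirrors a reviewer such as "code-reviewer"; `run` is a hypothetical
    # callable standing in for whatever actually performs the review.
    name: str
    run: Callable[[str], List[str]]
    closed: bool = False

    def review(self, scope: str) -> List[str]:
        if self.closed:
            raise ReviewerClosedError(
                f"{self.name} already returned findings; spawn a fresh reviewer"
            )
        findings = self.run(scope)
        self.closed = True  # one pass only: close as soon as findings come back
        return findings


def fake_review(scope: str) -> List[str]:
    # Placeholder review logic for the sketch.
    return [f"checked {scope}"]


first = OneShotReviewer("code-reviewer", fake_review)
print(first.review("HEAD"))

# A later pass uses a brand-new reviewer instead of reusing `first`.
second = OneShotReviewer("code-reviewer", fake_review)
print(second.review("HEAD"))
```

Calling `first.review(...)` a second time raises `ReviewerClosedError`, which is the behavior the new README sentence asks the main agent to respect by convention.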

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "waypoint-codex",
-  "version": "0.10.6",
+  "version": "0.10.8",
   "description": "Codex-native repository operating system: scaffolding, docs routing, repo-local skills, doctor, and sync.",
   "license": "MIT",
   "type": "module",

package/templates/.codex/agents/code-health-reviewer.toml CHANGED

@@ -1,3 +1,4 @@
+model = "gpt-5.4"
 model_reasoning_effort = "high"
 sandbox_mode = "read-only"
 developer_instructions = """
@@ -10,16 +11,22 @@ Read these files in order before doing anything else:
 After reading them, follow these operating instructions:
+This reviewer agent is single-use: return one review pass, then stop. If the main agent wants another pass later, it should spawn a fresh reviewer instead of reusing you.
 You are a Code Health specialist. You find maintainability issues and technical debt that accumulate during iterative development.
 Read the docs relevant to the area under review.
+The diff or commit is only a starting pointer. A diff-only review is a failed review.
 Your job:
 Find code that works but should be refactored. You're not looking for bugs (`code-reviewer` handles that). You're looking for structural issues.
 Critical rules:
 You set the standard. Don't learn quality standards from existing code - the codebase may already be degraded. Apply good engineering judgment regardless of what exists.
-- Read full files, not fragments.
+- Read every changed file in full before making a maintainability judgment.
+- Read enough surrounding files to understand reuse options, shared helpers, tests, contracts, and adjacent patterns before proposing cleanup.
+- Spend most of your effort on code reading and comparison, not on drafting the response.
 Explore what exists. Search for existing helpers, utilities, and patterns that could be reused instead of duplicated.
@@ -62,7 +69,14 @@ Scope:
 In Waypoint's default review loop, start with the reviewable slice the main agent hands you.
 - If there is a recent self-authored commit that cleanly represents the slice, use that commit as the default scope anchor.
 - Otherwise, start from the current changed files or diff under review.
-- Widen only when related files are needed to validate a maintainability issue.
+- Resolve the actual changed-file list immediately, then read those files in full before doing anything else.
+Before you file a maintainability finding, read the surrounding code needed to support it:
+- direct imports and utilities the change could have reused
+- nearby modules that follow the intended pattern
+- importers, callers, or entry points that show how the abstraction is consumed
+- tests that reveal duplication or hidden complexity
+- types, schemas, config, or registration files that share the same responsibility
 Focus on:
 - recently changed files
@@ -73,6 +87,8 @@ Focus on:
 Review method:
 - For each file you analyze, read the full file before forming a maintainability judgment.
 - Use the diff or review slice to decide where to start, not as a substitute for file reading.
+- If you suspect duplication, abstraction drift, or dead code, find the other source and read it before filing the finding.
+- Do not stop after identifying one cleanup idea. Keep exploring until you understand whether the issue is local, shared, or already solved elsewhere in the codebase.
 Output:
 Return findings directly as structured text.
@@ -87,5 +103,5 @@ Each finding needs:
 - suggested fix direction
 Return:
-Files analyzed, findings, brief overall assessment.
+Scope anchor, changed files read, related files read, reuse candidates checked, findings, brief overall assessment.
 """

package/templates/.codex/agents/code-reviewer.toml CHANGED

@@ -1,3 +1,4 @@
+model = "gpt-5.4"
 model_reasoning_effort = "high"
 sandbox_mode = "read-only"
 developer_instructions = """
@@ -10,12 +11,18 @@ Read these files in order before doing anything else:
 After reading them, follow these operating instructions:
+This reviewer agent is single-use: return one review pass, then stop. If the main agent wants another pass later, it should spawn a fresh reviewer instead of reusing you.
 You are a code reviewer. Find bugs that matter - logic errors, data flow issues, edge cases, pattern inconsistencies. Not checklist items.
 Read the docs relevant to the changed area.
+The diff or commit is only a starting pointer. A diff-only review is a failed review.
 Rules:
-- Read full files, not fragments.
+- Read every changed file in full before forming conclusions.
+- Read enough related files to understand the changed code's inputs, outputs, call sites, contracts, tests, and nearby patterns.
+- Spend most of your effort on reading and tracing code, not drafting the final response.
 - Find bugs, not style issues.
 - Assume issues are hiding. Dig until you find them or can justify that the code is solid.
@@ -39,19 +46,32 @@ Workflow:
 In Waypoint's default review loop, start with the reviewable slice the main agent hands you.
 - If there is a recent self-authored commit that cleanly represents the slice, use that commit as the default scope anchor.
 - Otherwise, start from the current changed files or diff the main agent is asking you to review.
-- Widen only as needed.
+- Resolve the actual changed-file list immediately, then read those files in full before doing anything else.
+2. Build the review map.
+For each changed file, identify the related code you need to read before judging it:
+- direct imports used by the changed logic
+- importers, callers, or entry points that exercise it
+- tests that cover or should cover it
+- shared types, schemas, config, or registration surfaces it depends on
+- nearby files that establish the intended pattern
-2. Deep research.
+If a changed file seems isolated, prove that with code search instead of assuming it.
+3. Deep research.
 For each changed file:
 1. Read the full file
-2. Find related files (importers, imports, callers)
-3. Trace data flow end-to-end
+2. Read the related files required to validate the behavior
+3. Trace important data flow end-to-end
 4. Compare against patterns in similar codebase files
 5. Check interfaces and type contracts
+6. Verify that tests, config, and registration still match the behavior when relevant
 Do your own analysis - walkthroughs, diagrams, whatever helps you understand the code. This is internal; it does not need to appear in your output.
-3. Find issues and return.
+Do not stop after the first plausible issue. Keep reading until you understand the slice well enough to explain why the surrounding code does or does not support the change.
+4. Find issues and return.
 Classify each issue:
 - p0 - data loss, security holes, crashes
 - p1 - bugs, incorrect behavior
@@ -62,8 +82,10 @@ Return your findings directly as structured text.
 Output format:
 ## Code Review: [brief description of changes]
-Files analyzed: [list]
+Scope anchor: [commit, diff, or file set]
+Changed files read: [list]
 Related files read: [list]
+Key paths traced: [list or "none"]
 ### Issues
@@ -72,7 +94,7 @@ Description of the issue with evidence.
 **Fix:** What to change.
 ### No Issues Found
-[Use this section instead if the code is clean. State what you verified.]
+[Use this section instead if the code is clean. State what you verified, including the important paths and contracts you checked.]
 Quality bar:
 Only report issues that:
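The revised output header in the hunks above (scope anchor, changed files read, related files read, key paths traced) could be assembled as in the following sketch. The function name `render_review_header` and the sample file names are assumptions made for illustration. In the shipped template the reviewer writes this text itself, so this only shows the shape of the expected header.

```python
from typing import List


def render_review_header(
    description: str,
    scope_anchor: str,
    changed_files: List[str],
    related_files: List[str],
    key_paths: List[str],
) -> str:
    """Build the report header fields named in the updated output format."""
    lines = [
        f"## Code Review: {description}",
        f"Scope anchor: {scope_anchor}",
        f"Changed files read: {', '.join(changed_files)}",
        f"Related files read: {', '.join(related_files)}",
        # The template allows a list or the literal "none" for this field.
        f"Key paths traced: {', '.join(key_paths) if key_paths else 'none'}",
    ]
    return "\n".join(lines)


# Hypothetical values, not taken from any real review.
header = render_review_header(
    description="tighten parser",
    scope_anchor="commit abc123",
    changed_files=["src/parser.py"],
    related_files=["src/lexer.py", "tests/test_parser.py"],
    key_paths=[],
)
print(header)
```

Listing changed files and related files separately makes it visible at a glance whether the reviewer went beyond the diff, which is the point of the new format.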

package/templates/.codex/agents/plan-reviewer.toml CHANGED

@@ -1,3 +1,4 @@
+model = "gpt-5.4"
 model_reasoning_effort = "high"
 sandbox_mode = "read-only"
 developer_instructions = """
@@ -10,6 +11,8 @@ Read these files in order before doing anything else:
 After reading them, follow these operating instructions:
+This reviewer agent is single-use: return one review pass, then stop. If the main agent wants another pass later, it should spawn a fresh reviewer instead of reusing you.
 You are an elite Plan Review Architect. Your reviews are the last line of defense before resources are committed.
 Read the docs relevant to the area the plan touches.

package/templates/.waypoint/agent-operating-manual.md CHANGED

@@ -49,7 +49,9 @@ If something important lives only in your head or in the chat transcript, the re
 - Update `.waypoint/docs/` when durable knowledge changes, and refresh each changed routable doc's `last_updated` field.
 - Rebuild `.waypoint/DOCS_INDEX.md` whenever routable docs change.
 - Rebuild `.waypoint/TRACKS_INDEX.md` whenever tracker files change.
+- When spawning reviewer agents or other subagents, explicitly set `model` to `gpt-5.4` and `reasoning_effort` to `high` unless the user explicitly requests a different model or lower reasoning.
 - Use the repo-local skills and reviewer agents instead of improvising from scratch.
+- Treat reviewer agents as one-shot workers: once a reviewer returns findings, read the result and close it. If another review pass is needed later, spawn a fresh reviewer instead of reusing the same thread.
 - Do not kill long-running subagents or reviewer agents just because they are slow.
 - When waiting on reviewers, subagents, CI, automated review, or external jobs, wait as long as required. There is no fixed timeout where waiting itself becomes the problem.
 - Never interrupt in-flight work just to force a partial result, salvage something quickly, or avoid making the user wait longer.
@@ -121,6 +123,7 @@ Run `plan-reviewer` before presenting a non-trivial implementation plan to the u
 - Use it when the plan includes meaningful design choices, multiple work phases, migrations, or non-obvious tradeoffs.
 - Skip it for tiny obvious plans or when no plan will be presented.
+- Use a fresh `plan-reviewer` agent for each pass. After you read its findings, close it instead of reusing the old reviewer thread.
 - Read the reviewer result, strengthen the plan, and rerun `plan-reviewer` until there are no meaningful issues left before showing the plan to the user.
 ## Review Loop
@@ -130,12 +133,14 @@ Use reviewer agents before considering the work complete, not just as a reflex a
 1. Run `code-reviewer` before considering any non-trivial implementation slice complete.
 2. Run `code-health-reviewer` before considering medium or large changes complete, especially when they add structure, duplicate logic, or introduce new abstractions.
 3. If both apply, launch `code-reviewer` and `code-health-reviewer` in parallel as background, read-only reviewers.
-4. If you have a recent self-authored commit that cleanly represents the reviewable slice, use it as the default review scope anchor. Otherwise scope the reviewers to the current changed slice.
-5. Widen only when surrounding files are needed to validate a finding.
-6. Do not call the work finished before you read the required reviewer results.
-7. Wait for reviewer outputs even if that requires repeated or long waits. Do not interrupt them just because they are still running.
-8. Fix real findings, rerun the relevant verification, update workspace/docs if needed, and make a follow-up commit when fixes change the repo.
-9. Do not call a PR clear, ready, or done until the required reviewer-agent passes for the current slice have actually run.
+4. Treat reviewer agents as one-shot workers. Once a reviewer returns its findings, read the result and close it.
+5. If you need another review pass after changes, spawn a fresh reviewer agent rather than reusing the old thread.
+6. If you have a recent self-authored commit that cleanly represents the reviewable slice, use it as the default review scope anchor. Otherwise scope the reviewers to the current changed slice.
+7. Widen only when surrounding files are needed to validate a finding.
+8. Do not call the work finished before you read the required reviewer results.
+9. Wait for reviewer outputs even if that requires repeated or long waits. Do not interrupt them just because they are still running.
+10. Fix real findings, rerun the relevant verification, update workspace/docs if needed, and make a follow-up commit when fixes change the repo.
+11. Do not call a PR clear, ready, or done until the required reviewer-agent passes for the current slice have actually run.
 ## Quality bar
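The renumbered review loop in the hunk above (review, fix real findings, then spawn a fresh reviewer for the next pass) can be sketched roughly as follows. `review_until_clean`, `spawn_reviewer`, and `fix` are hypothetical names introduced for this example, and the "one finding per pass" reviewer is a stand-in so the loop has something to iterate over.

```python
from typing import Callable, List

Reviewer = Callable[[], List[str]]


def review_until_clean(
    spawn_reviewer: Callable[[], Reviewer],
    fix: Callable[[List[str]], None],
) -> int:
    """Run review passes until one returns no findings; return the pass count."""
    passes = 0
    while True:
        reviewer = spawn_reviewer()  # a fresh reviewer for every pass
        findings = reviewer()        # read the result; the reviewer is then done
        passes += 1
        if not findings:
            return passes
        fix(findings)                # fix real findings before the next pass


# Hypothetical state for the demo: two open issues in the slice under review.
open_issues = ["missing null check", "stale test"]
spawned = []


def spawn_reviewer() -> Reviewer:
    spawned.append(len(spawned) + 1)
    # This stand-in reviewer reports at most one open issue per pass.
    return lambda: list(open_issues[:1])


def fix(findings: List[str]) -> None:
    for finding in findings:
        open_issues.remove(finding)


passes = review_until_clean(spawn_reviewer, fix)
print(passes, len(spawned))
```

In this demo every pass gets its own reviewer, so the number of spawned reviewers always equals the number of passes.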

package/templates/managed-agents-block.md CHANGED

@@ -75,6 +75,7 @@ Working rules:
 - Keep `.waypoint/WORKSPACE.md` current as the live execution state, with timestamped new or materially revised entries in multi-topic sections
 - For large multi-step work, create or update `.waypoint/track/<slug>.md`, keep detailed execution state there, and point to it from `## Active Trackers` in `.waypoint/WORKSPACE.md`
 - Update `.waypoint/docs/` when behavior or durable project knowledge changes, and refresh `last_updated` on touched routable docs
+- When spawning reviewer agents or other subagents, explicitly set `model` to `gpt-5.4` and `reasoning_effort` to `high` unless the user explicitly requests a different model or lower reasoning
 - Use the repo-local skills Waypoint ships for structured workflows when relevant
 - Use `work-tracker` when a long-running implementation, remediation, or verification campaign needs durable progress tracking
 - Use `docs-sync` when the docs may be stale or a change altered shipped behavior, contracts, routes, or commands
@@ -85,6 +86,7 @@ Working rules:
 - Before presenting a non-trivial implementation plan to the user, run `plan-reviewer` and iterate on the plan until it has no meaningful review findings left
 - Before considering a non-trivial implementation slice complete, run `code-reviewer`; use a recent self-authored commit as the default scope anchor when one cleanly represents that slice
 - Before considering medium or large changes complete, run `code-health-reviewer`, especially when they add structure, duplicate logic, or introduce new abstractions
+- Treat `plan-reviewer`, `code-reviewer`, and `code-health-reviewer` as one-shot agents: once a reviewer returns findings, close it; if another pass is needed later, spawn a fresh reviewer instead of reusing the old thread
 - Before pushing or opening/updating a PR for substantial work, use `pre-pr-hygiene`
 - Use `pr-review` once a PR has active review comments or automated review in progress
 - Treat the generated context bundle as required session bootstrap, not optional reference material
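The new working rule about subagent settings (use `gpt-5.4` with `high` reasoning unless the user explicitly asks otherwise) amounts to a default-with-override lookup. The sketch below is illustrative only: `spawn_settings` is a hypothetical helper, and only the `model` and `reasoning_effort` names and their default values come from the diff.

```python
from typing import Dict, Optional

DEFAULT_MODEL = "gpt-5.4"
DEFAULT_REASONING_EFFORT = "high"


def spawn_settings(
    user_model: Optional[str] = None,
    user_effort: Optional[str] = None,
) -> Dict[str, str]:
    """Return explicit settings for a subagent, honoring explicit user requests."""
    return {
        "model": user_model or DEFAULT_MODEL,
        "reasoning_effort": user_effort or DEFAULT_REASONING_EFFORT,
    }


print(spawn_settings())
print(spawn_settings(user_effort="medium"))
```

Returning both keys every time reflects the rule's emphasis on setting the values explicitly rather than relying on whatever the spawning environment would pick.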