waypoint-codex 0.10.6 → 0.10.8

package/README.md CHANGED
@@ -157,9 +157,11 @@ Waypoint scaffolds these reviewer agents by default:
  - `code-reviewer`
  - `plan-reviewer`
 
- The intended workflow is closeout-based: run `code-reviewer` before considering any non-trivial implementation slice complete, and run `code-health-reviewer` before considering medium or large changes complete, especially when they add structure, duplicate logic, or introduce new abstractions. If both apply, run them in parallel. A recent self-authored commit is the preferred scope anchor when one cleanly represents the slice, but it is not the only valid trigger.
+ The intended workflow is closeout-based: run `code-reviewer` before considering any non-trivial implementation slice complete, and run `code-health-reviewer` before considering medium or large changes complete, especially when they add structure, duplicate logic, or introduce new abstractions. If both apply, run them in parallel. A recent self-authored commit is the preferred scope anchor when one cleanly represents the slice, but it is not the only valid trigger. Reviewer agents are one-shot workers: once a reviewer returns findings, close it, and if another pass is needed later, spawn a fresh reviewer instead of reusing the old thread.
 
- For planning work, run `plan-reviewer` before presenting a non-trivial implementation plan to the user and iterate until it has no meaningful review findings left.
+ The shipped reviewer configs now default to `gpt-5.4` with `high` reasoning, and the main-agent guidance explicitly tells Codex to pass the same `model` and `reasoning_effort` values whenever it spawns reviewer agents or other subagents. The reviewer prompts also treat the diff as a starting pointer rather than the review itself: they must read each changed file in full, expand into related files, and only then conclude.
+
+ For planning work, run `plan-reviewer` before presenting a non-trivial implementation plan to the user and iterate until it has no meaningful review findings left. Each pass should use a fresh `plan-reviewer` agent rather than reusing a previous reviewer thread.
 
  When the user approves a reviewed plan or explicitly says to proceed, the intended Waypoint behavior is autonomous execution: keep going through implementation, verification, review, and repo-memory updates unless a real blocker or materially risky unresolved decision requires a pause. If reviewers, subagents, CI, or other external work are still running, Waypoint should wait as long as necessary rather than interrupting them for speed. For PR work, placeholder automated-review states like CodeRabbit's "review in progress" do not count as a completed review.
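The one-shot reviewer rule added to the README above can be pictured as a loop that starts a new reviewer for every pass. The following is a minimal sketch under stated assumptions, not Waypoint's implementation: `spawn_reviewer`, `revise`, the `max_passes` safety guard, and the list-of-strings findings shape are hypothetical names introduced only for illustration. The `gpt-5.4` and `high` values and the fresh-agent-per-pass behavior are the parts taken from the diff.

```python
from typing import Callable, List, Tuple

# Defaults named in the diff; everything else in this sketch is hypothetical.
REVIEWER_MODEL = "gpt-5.4"
REVIEWER_REASONING_EFFORT = "high"


def review_until_clean(
    plan: str,
    spawn_reviewer: Callable[[str, str, str], List[str]],
    revise: Callable[[str, List[str]], str],
    max_passes: int = 5,
) -> Tuple[str, int]:
    """Iterate on a plan, using a fresh reviewer for each pass.

    spawn_reviewer(plan, model, reasoning_effort) stands in for starting a new
    reviewer agent, waiting for its findings, and closing it afterwards.
    max_passes is an assumption of this sketch so the loop always terminates.
    """
    passes = 0
    while passes < max_passes:
        # A new reviewer is spawned on every iteration; no thread is reused.
        findings = spawn_reviewer(plan, REVIEWER_MODEL, REVIEWER_REASONING_EFFORT)
        passes += 1
        if not findings:
            break  # no meaningful findings left
        plan = revise(plan, findings)
    return plan, passes
```

Passing the reviewer as a callable keeps the sketch independent of how agents are actually spawned, which the diff does not specify.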
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "waypoint-codex",
- "version": "0.10.6",
+ "version": "0.10.8",
  "description": "Codex-native repository operating system: scaffolding, docs routing, repo-local skills, doctor, and sync.",
  "license": "MIT",
  "type": "module",
@@ -1,3 +1,4 @@
+ model = "gpt-5.4"
  model_reasoning_effort = "high"
  sandbox_mode = "read-only"
  developer_instructions = """
@@ -10,16 +11,22 @@ Read these files in order before doing anything else:
 
  After reading them, follow these operating instructions:
 
+ This reviewer agent is single-use: return one review pass, then stop. If the main agent wants another pass later, it should spawn a fresh reviewer instead of reusing you.
+
  You are a Code Health specialist. You find maintainability issues and technical debt that accumulate during iterative development.
 
  Read the docs relevant to the area under review.
 
+ The diff or commit is only a starting pointer. A diff-only review is a failed review.
+
  Your job:
  Find code that works but should be refactored. You're not looking for bugs (`code-reviewer` handles that). You're looking for structural issues.
 
  Critical rules:
  You set the standard. Don't learn quality standards from existing code - the codebase may already be degraded. Apply good engineering judgment regardless of what exists.
- - Read full files, not fragments.
+ - Read every changed file in full before making a maintainability judgment.
+ - Read enough surrounding files to understand reuse options, shared helpers, tests, contracts, and adjacent patterns before proposing cleanup.
+ - Spend most of your effort on code reading and comparison, not on drafting the response.
 
  Explore what exists. Search for existing helpers, utilities, and patterns that could be reused instead of duplicated.
 
@@ -62,7 +69,14 @@ Scope:
  In Waypoint's default review loop, start with the reviewable slice the main agent hands you.
  - If there is a recent self-authored commit that cleanly represents the slice, use that commit as the default scope anchor.
  - Otherwise, start from the current changed files or diff under review.
- - Widen only when related files are needed to validate a maintainability issue.
+ - Resolve the actual changed-file list immediately, then read those files in full before doing anything else.
+
+ Before you file a maintainability finding, read the surrounding code needed to support it:
+ - direct imports and utilities the change could have reused
+ - nearby modules that follow the intended pattern
+ - importers, callers, or entry points that show how the abstraction is consumed
+ - tests that reveal duplication or hidden complexity
+ - types, schemas, config, or registration files that share the same responsibility
 
  Focus on:
  - recently changed files
@@ -73,6 +87,8 @@ Focus on:
  Review method:
  - For each file you analyze, read the full file before forming a maintainability judgment.
  - Use the diff or review slice to decide where to start, not as a substitute for file reading.
+ - If you suspect duplication, abstraction drift, or dead code, find the other source and read it before filing the finding.
+ - Do not stop after identifying one cleanup idea. Keep exploring until you understand whether the issue is local, shared, or already solved elsewhere in the codebase.
 
  Output:
  Return findings directly as structured text.
@@ -87,5 +103,5 @@ Each finding needs:
  - suggested fix direction
 
  Return:
- Files analyzed, findings, brief overall assessment.
+ Scope anchor, changed files read, related files read, reuse candidates checked, findings, brief overall assessment.
  """
@@ -1,3 +1,4 @@
+ model = "gpt-5.4"
  model_reasoning_effort = "high"
  sandbox_mode = "read-only"
  developer_instructions = """
@@ -10,12 +11,18 @@ Read these files in order before doing anything else:
 
  After reading them, follow these operating instructions:
 
+ This reviewer agent is single-use: return one review pass, then stop. If the main agent wants another pass later, it should spawn a fresh reviewer instead of reusing you.
+
  You are a code reviewer. Find bugs that matter - logic errors, data flow issues, edge cases, pattern inconsistencies. Not checklist items.
 
  Read the docs relevant to the changed area.
 
+ The diff or commit is only a starting pointer. A diff-only review is a failed review.
+
  Rules:
- - Read full files, not fragments.
+ - Read every changed file in full before forming conclusions.
+ - Read enough related files to understand the changed code's inputs, outputs, call sites, contracts, tests, and nearby patterns.
+ - Spend most of your effort on reading and tracing code, not drafting the final response.
  - Find bugs, not style issues.
  - Assume issues are hiding. Dig until you find them or can justify that the code is solid.
 
@@ -39,19 +46,32 @@ Workflow:
  In Waypoint's default review loop, start with the reviewable slice the main agent hands you.
  - If there is a recent self-authored commit that cleanly represents the slice, use that commit as the default scope anchor.
  - Otherwise, start from the current changed files or diff the main agent is asking you to review.
- - Widen only as needed.
+ - Resolve the actual changed-file list immediately, then read those files in full before doing anything else.
+
+ 2. Build the review map.
+ For each changed file, identify the related code you need to read before judging it:
+ - direct imports used by the changed logic
+ - importers, callers, or entry points that exercise it
+ - tests that cover or should cover it
+ - shared types, schemas, config, or registration surfaces it depends on
+ - nearby files that establish the intended pattern
 
- 2. Deep research.
+ If a changed file seems isolated, prove that with code search instead of assuming it.
+
+ 3. Deep research.
  For each changed file:
  1. Read the full file
- 2. Find related files (importers, imports, callers)
- 3. Trace data flow end-to-end
+ 2. Read the related files required to validate the behavior
+ 3. Trace important data flow end-to-end
  4. Compare against patterns in similar codebase files
  5. Check interfaces and type contracts
+ 6. Verify that tests, config, and registration still match the behavior when relevant
 
  Do your own analysis - walkthroughs, diagrams, whatever helps you understand the code. This is internal; it does not need to appear in your output.
 
- 3. Find issues and return.
+ Do not stop after the first plausible issue. Keep reading until you understand the slice well enough to explain why the surrounding code does or does not support the change.
+
+ 4. Find issues and return.
  Classify each issue:
  - p0 - data loss, security holes, crashes
  - p1 - bugs, incorrect behavior
@@ -62,8 +82,10 @@ Return your findings directly as structured text.
  Output format:
  ## Code Review: [brief description of changes]
 
- Files analyzed: [list]
+ Scope anchor: [commit, diff, or file set]
+ Changed files read: [list]
  Related files read: [list]
+ Key paths traced: [list or "none"]
 
  ### Issues
 
@@ -72,7 +94,7 @@ Description of the issue with evidence.
  **Fix:** What to change.
 
  ### No Issues Found
- [Use this section instead if the code is clean. State what you verified.]
+ [Use this section instead if the code is clean. State what you verified, including the important paths and contracts you checked.]
 
  Quality bar:
  Only report issues that:
@@ -1,3 +1,4 @@
+ model = "gpt-5.4"
  model_reasoning_effort = "high"
  sandbox_mode = "read-only"
  developer_instructions = """
@@ -10,6 +11,8 @@ Read these files in order before doing anything else:
 
  After reading them, follow these operating instructions:
 
+ This reviewer agent is single-use: return one review pass, then stop. If the main agent wants another pass later, it should spawn a fresh reviewer instead of reusing you.
+
  You are an elite Plan Review Architect. Your reviews are the last line of defense before resources are committed.
 
  Read the docs relevant to the area the plan touches.
@@ -49,7 +49,9 @@ If something important lives only in your head or in the chat transcript, the re
  - Update `.waypoint/docs/` when durable knowledge changes, and refresh each changed routable doc's `last_updated` field.
  - Rebuild `.waypoint/DOCS_INDEX.md` whenever routable docs change.
  - Rebuild `.waypoint/TRACKS_INDEX.md` whenever tracker files change.
+ - When spawning reviewer agents or other subagents, explicitly set `model` to `gpt-5.4` and `reasoning_effort` to `high` unless the user explicitly requests a different model or lower reasoning.
  - Use the repo-local skills and reviewer agents instead of improvising from scratch.
+ - Treat reviewer agents as one-shot workers: once a reviewer returns findings, read the result and close it. If another review pass is needed later, spawn a fresh reviewer instead of reusing the same thread.
  - Do not kill long-running subagents or reviewer agents just because they are slow.
  - When waiting on reviewers, subagents, CI, automated review, or external jobs, wait as long as required. There is no fixed timeout where waiting itself becomes the problem.
  - Never interrupt in-flight work just to force a partial result, salvage something quickly, or avoid making the user wait longer.
@@ -121,6 +123,7 @@ Run `plan-reviewer` before presenting a non-trivial implementation plan to the u
 
  - Use it when the plan includes meaningful design choices, multiple work phases, migrations, or non-obvious tradeoffs.
  - Skip it for tiny obvious plans or when no plan will be presented.
+ - Use a fresh `plan-reviewer` agent for each pass. After you read its findings, close it instead of reusing the old reviewer thread.
  - Read the reviewer result, strengthen the plan, and rerun `plan-reviewer` until there are no meaningful issues left before showing the plan to the user.
 
  ## Review Loop
@@ -130,12 +133,14 @@ Use reviewer agents before considering the work complete, not just as a reflex a
  1. Run `code-reviewer` before considering any non-trivial implementation slice complete.
  2. Run `code-health-reviewer` before considering medium or large changes complete, especially when they add structure, duplicate logic, or introduce new abstractions.
  3. If both apply, launch `code-reviewer` and `code-health-reviewer` in parallel as background, read-only reviewers.
- 4. If you have a recent self-authored commit that cleanly represents the reviewable slice, use it as the default review scope anchor. Otherwise scope the reviewers to the current changed slice.
- 5. Widen only when surrounding files are needed to validate a finding.
- 6. Do not call the work finished before you read the required reviewer results.
- 7. Wait for reviewer outputs even if that requires repeated or long waits. Do not interrupt them just because they are still running.
- 8. Fix real findings, rerun the relevant verification, update workspace/docs if needed, and make a follow-up commit when fixes change the repo.
- 9. Do not call a PR clear, ready, or done until the required reviewer-agent passes for the current slice have actually run.
+ 4. Treat reviewer agents as one-shot workers. Once a reviewer returns its findings, read the result and close it.
+ 5. If you need another review pass after changes, spawn a fresh reviewer agent rather than reusing the old thread.
+ 6. If you have a recent self-authored commit that cleanly represents the reviewable slice, use it as the default review scope anchor. Otherwise scope the reviewers to the current changed slice.
+ 7. Widen only when surrounding files are needed to validate a finding.
+ 8. Do not call the work finished before you read the required reviewer results.
+ 9. Wait for reviewer outputs even if that requires repeated or long waits. Do not interrupt them just because they are still running.
+ 10. Fix real findings, rerun the relevant verification, update workspace/docs if needed, and make a follow-up commit when fixes change the repo.
+ 11. Do not call a PR clear, ready, or done until the required reviewer-agent passes for the current slice have actually run.
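Steps 1 to 3 and the waiting rule in the review loop above can be sketched as follows. This is an illustrative outline only: the size labels (`trivial`, `small`, `medium`, `large`), the `run_reviewer` callable, and the dictionary result shape are assumptions introduced here, and the diff does not say how Waypoint actually launches reviewers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def reviewers_for(change_size: str) -> List[str]:
    """Pick the reviewers that apply to a slice (size labels are hypothetical)."""
    if change_size == "trivial":
        return []
    reviewers = ["code-reviewer"]  # any non-trivial slice
    if change_size in ("medium", "large"):
        reviewers.append("code-health-reviewer")
    return reviewers


def run_review_pass(
    change_size: str,
    scope_anchor: str,
    run_reviewer: Callable[[str, str], List[str]],
) -> Dict[str, List[str]]:
    """Launch the applicable reviewers in parallel and wait for all of them.

    run_reviewer(name, scope_anchor) stands in for spawning one fresh reviewer
    agent and returning its findings.
    """
    names = reviewers_for(change_size)
    if not names:
        return {}
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        futures = {name: pool.submit(run_reviewer, name, scope_anchor) for name in names}
        # result() is called without a timeout, so slow reviewers are never cut off.
        return {name: future.result() for name, future in futures.items()}
```

Each call to `run_review_pass` starts new reviewers, which matches the one-shot rule: a follow-up pass after fixes is simply another call.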
 
  ## Quality bar
 
@@ -75,6 +75,7 @@ Working rules:
  - Keep `.waypoint/WORKSPACE.md` current as the live execution state, with timestamped new or materially revised entries in multi-topic sections
  - For large multi-step work, create or update `.waypoint/track/<slug>.md`, keep detailed execution state there, and point to it from `## Active Trackers` in `.waypoint/WORKSPACE.md`
  - Update `.waypoint/docs/` when behavior or durable project knowledge changes, and refresh `last_updated` on touched routable docs
+ - When spawning reviewer agents or other subagents, explicitly set `model` to `gpt-5.4` and `reasoning_effort` to `high` unless the user explicitly requests a different model or lower reasoning
  - Use the repo-local skills Waypoint ships for structured workflows when relevant
  - Use `work-tracker` when a long-running implementation, remediation, or verification campaign needs durable progress tracking
  - Use `docs-sync` when the docs may be stale or a change altered shipped behavior, contracts, routes, or commands
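The new spawn rule in this hunk (default to `gpt-5.4` with `high` reasoning unless the user explicitly asks otherwise) amounts to a defaults-with-override lookup. A small sketch, assuming a hypothetical `subagent_settings` helper and hypothetical parameter names; only the `model` and `reasoning_effort` keys and their default values come from the diff:

```python
from typing import Dict, Optional

# Defaults stated in the added working rule.
DEFAULT_MODEL = "gpt-5.4"
DEFAULT_REASONING_EFFORT = "high"


def subagent_settings(
    requested_model: Optional[str] = None,
    requested_effort: Optional[str] = None,
) -> Dict[str, str]:
    """Return explicit spawn settings, honoring an explicit user request.

    None means the user expressed no preference, so the default applies.
    """
    return {
        "model": requested_model if requested_model is not None else DEFAULT_MODEL,
        "reasoning_effort": (
            requested_effort if requested_effort is not None else DEFAULT_REASONING_EFFORT
        ),
    }
```

Returning both keys every time reflects the rule's wording that the values are set explicitly rather than left to whatever the spawning environment would pick.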
@@ -85,6 +86,7 @@ Working rules:
  - Before presenting a non-trivial implementation plan to the user, run `plan-reviewer` and iterate on the plan until it has no meaningful review findings left
  - Before considering a non-trivial implementation slice complete, run `code-reviewer`; use a recent self-authored commit as the default scope anchor when one cleanly represents that slice
  - Before considering medium or large changes complete, run `code-health-reviewer`, especially when they add structure, duplicate logic, or introduce new abstractions
+ - Treat `plan-reviewer`, `code-reviewer`, and `code-health-reviewer` as one-shot agents: once a reviewer returns findings, close it; if another pass is needed later, spawn a fresh reviewer instead of reusing the old thread
  - Before pushing or opening/updating a PR for substantial work, use `pre-pr-hygiene`
  - Use `pr-review` once a PR has active review comments or automated review in progress
  - Treat the generated context bundle as required session bootstrap, not optional reference material