npm - waypoint-codex - Versions diffs - 0.10.2 → 0.10.4 - Mend

waypoint-codex 0.10.2 → 0.10.4

Files changed (6)

package/README.md +3 -1
package/package.json +1 -1
package/templates/.agents/skills/break-it-qa/SKILL.md +3 -0
package/templates/.agents/skills/pr-review/SKILL.md +5 -0
package/templates/.waypoint/agent-operating-manual.md +4 -0
package/templates/managed-agents-block.md +1 -0

package/README.md CHANGED

@@ -161,7 +161,9 @@ The intended workflow is closeout-based: run `code-reviewer` before considering
 For planning work, run `plan-reviewer` before presenting a non-trivial implementation plan to the user and iterate until it has no meaningful review findings left.
-When the user approves a reviewed plan or explicitly says to proceed, the intended Waypoint behavior is autonomous execution: keep going through implementation, verification, review, and repo-memory updates unless a real blocker or materially risky unresolved decision requires a pause. If reviewers, subagents, CI, or other external work are still running, Waypoint should wait as long as necessary rather than interrupting them for speed.
+When the user approves a reviewed plan or explicitly says to proceed, the intended Waypoint behavior is autonomous execution: keep going through implementation, verification, review, and repo-memory updates unless a real blocker or materially risky unresolved decision requires a pause. If reviewers, subagents, CI, or other external work are still running, Waypoint should wait as long as necessary rather than interrupting them for speed. For PR work, placeholder automated-review states like CodeRabbit's "review in progress" do not count as a completed review.
+When browser-based reproduction or verification is part of the work, Waypoint should also send screenshots of the relevant UI states so the user can see the evidence directly.
 ## What makes it different

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "waypoint-codex",
-  "version": "0.10.2",
+  "version": "0.10.4",
   "description": "Codex-native repository operating system: scaffolding, docs routing, repo-local skills, doctor, and sync.",
   "license": "MIT",
   "type": "module",

package/templates/.agents/skills/break-it-qa/SKILL.md CHANGED

@@ -118,6 +118,7 @@ Anti-cheating rules:
 - Use `playwright-interactive`.
 - Exercise the actual UI instead of mocking the flow in code.
 - Keep the scope focused on the feature the user asked you to verify.
+- Capture screenshots of the important states you observe so the user can see the evidence directly.
 ## Step 7: Try To Break It On Purpose
@@ -151,6 +152,7 @@ As you test, keep expanding the break log with new "What if...?" cases that emer
 - Update docs when the verification exposes stale assumptions about how the feature works.
 - Update the break log entry for each attempted action with what happened and whether the feature survived.
 - Require a short observed-result note for every executed item. "Worked" is too weak; capture what actually happened.
+- Save screenshots for the key broken, risky, or fixed states as you go.
 Do not stop at the first bug.
@@ -174,6 +176,7 @@ Summarize:
 - the path to the break log markdown file
 - how many attack items were recorded and exercised
 - how coverage was distributed across steps and categories
+- which screenshots you captured and what each one shows
 - what break attempts you tried
 - which issues you found
 - what you fixed
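
The screenshot additions in this file ask the final summary to list which screenshots were captured and what each one shows; the operating manual change in this same diff also asks for an explicit statement when screenshots are not possible. A minimal sketch of how such a summary section could be assembled follows. The `Screenshot` type, the `summarizeScreenshots` function, and the output wording are all hypothetical and are not part of waypoint-codex.

```typescript
// Illustrative sketch only: these names are assumptions, not waypoint-codex APIs.
interface Screenshot {
  path: string;  // where the image was saved
  shows: string; // short note on the UI state it captures
}

// Builds the screenshot section of a closeout summary.
// When the environment cannot produce screenshots, it says so explicitly
// instead of silently omitting the section.
function summarizeScreenshots(shots: Screenshot[], screenshotsAvailable: boolean): string {
  if (!screenshotsAvailable) {
    return "Screenshots: not available in this environment.";
  }
  if (shots.length === 0) {
    return "Screenshots: none captured.";
  }
  const lines = shots.map((shot) => `- ${shot.path}: ${shot.shows}`);
  return ["Screenshots:", ...lines].join("\n");
}
```

Keeping the "not available" case as its own branch means the absence of visual evidence always appears in the summary rather than being dropped.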

package/templates/.agents/skills/pr-review/SKILL.md CHANGED

@@ -12,6 +12,9 @@ Use this skill to drive the PR through review instead of treating review as a on
 - Check the PR's current review and CI status.
 - If CI is red or pending, inspect the failed check logs before triaging review comments so you do not chase comment fixes while a separate blocker is breaking the branch.
 - If automated review is still running, wait for it to finish instead of racing it.
+- Treat placeholder messages such as CodeRabbit's "review in progress" as unfinished state, not as a meaningful review result.
+- If an automated reviewer like CodeRabbit is still pending, in progress, or has not reached a green/completed check state yet, keep waiting before you conclude there are no findings.
+- Once the automated reviewer check turns green/completed, reread the review comments and threads because the real findings may only appear after the placeholder state clears.
 - If comments are still arriving, do not prematurely declare the loop complete.
 - For stacked or non-`main` PRs, explicitly compare the PR head against its base branch and make sure later fixes on the base branch have actually been merged or rebased forward. Do not assume a sibling/base PR fix is already present in the dependent PR.
 - Keep waiting as long as required. Do not interrupt or abandon the review loop just because CI, reviewers, or automated checks are taking a long time.
@@ -42,10 +45,12 @@ Do not leave comments unanswered.
 - push follow-up commit(s)
 - after pushing, return to the PR and wait for the next round of CI, automated review, and human review comments before deciding whether the loop is complete
 - if CI or review is still in flight, keep waiting instead of assuming your last push ended the process
+- before declaring the PR clear or ready, make sure the required Waypoint reviewer agents for this slice have actually run and that their real findings, if any, were handled
 Stay in the loop until no new meaningful issues remain.
 Never cut the loop short by forcing a partial return from still-running review or verification systems.
 The loop is not complete while any meaningful review thread still lacks an inline response.
+The loop is also not complete if required Waypoint reviewer-agent passes for the current slice have not been run yet.
 ## Step 5: Close With A Crisp State Summary
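
The rules added to this skill amount to a gating check: a placeholder such as "review in progress" is not a result, a reviewer check must reach a completed state, every meaningful thread needs a response, and the required reviewer agents must have run. A minimal sketch of that gate is shown below. Every type and function name here is hypothetical; none of it reflects the CodeRabbit or GitHub APIs or waypoint-codex itself, CI status is not modeled, and treating a completed check whose latest comment is still a placeholder as unfinished is a conservative assumption.

```typescript
// Illustrative sketch only: hypothetical shapes, not a real API.
type CheckState = "pending" | "in_progress" | "completed";

interface AutomatedReview {
  reviewer: string;       // e.g. "CodeRabbit"
  checkState: CheckState; // state of the reviewer's check
  latestComment: string;  // most recent comment body from the reviewer
}

interface LoopState {
  automatedReviews: AutomatedReview[];
  unansweredThreads: number;  // meaningful threads without an inline response
  reviewerAgentsRun: boolean; // required reviewer-agent passes have run
}

// Assumed placeholder phrases; the skill text names only "review in progress".
const PLACEHOLDER_PATTERNS: RegExp[] = [/review in progress/i];

function isPlaceholder(comment: string): boolean {
  return PLACEHOLDER_PATTERNS.some((pattern) => pattern.test(comment));
}

// A review counts as finished only when its check has completed and the
// latest comment is no longer a placeholder.
function isReviewFinished(review: AutomatedReview): boolean {
  return review.checkState === "completed" && !isPlaceholder(review.latestComment);
}

// The loop is complete only when every condition holds at once.
function isLoopComplete(state: LoopState): boolean {
  return (
    state.automatedReviews.every(isReviewFinished) &&
    state.unansweredThreads === 0 &&
    state.reviewerAgentsRun
  );
}
```

When `isLoopComplete` returns false, the behavior the skill describes is to keep waiting and recheck, not to report that there are no findings.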

package/templates/.waypoint/agent-operating-manual.md CHANGED

@@ -54,6 +54,9 @@ If something important lives only in your head or in the chat transcript, the re
 - When waiting on reviewers, subagents, CI, automated review, or external jobs, wait as long as required. There is no fixed timeout where waiting itself becomes the problem.
 - Never interrupt in-flight work just to force a partial result, salvage something quickly, or avoid making the user wait longer.
 - Only stop waiting when the work has actually finished, clearly failed, or the user explicitly redirects the work.
+- When browser work is part of reproduction or verification, send screenshots of the relevant UI states to the user so they can visually confirm what you observed.
+- Capture the states that matter, such as the broken state, the fixed state, or an important intermediate state that explains the issue.
+- If the current environment cannot provide screenshots, state that explicitly instead of silently omitting visual evidence.
 ## Execution autonomy
@@ -129,6 +132,7 @@ Use reviewer agents before considering the work complete, not just as a reflex a
 6. Do not call the work finished before you read the required reviewer results.
 7. Wait for reviewer outputs even if that requires repeated or long waits. Do not interrupt them just because they are still running.
 8. Fix real findings, rerun the relevant verification, update workspace/docs if needed, and make a follow-up commit when fixes change the repo.
+9. Do not call a PR clear, ready, or done until the required reviewer-agent passes for the current slice have actually run.
 ## Quality bar

package/templates/managed-agents-block.md CHANGED

@@ -68,6 +68,7 @@ Prefer existing persisted context over re-interviewing the user.
 If the user approves a plan or explicitly tells you to proceed, treat that as authorization to execute the work end to end. Do not stop mid-implementation for incremental permission unless a real blocker, hidden-risk decision, or explicit user redirect requires a pause.
 When work is in flight elsewhere — reviewer agents, subagents, CI, automated review, external jobs, or other waiting periods — wait as long as required. There is no fixed waiting limit, and slowness alone is not a reason to interrupt or abandon the work.
+When using a browser to reproduce a bug, verify behavior, or confirm that a fix works, send the user screenshots of the relevant UI states so they can see the evidence directly. If screenshots are not possible in the current environment, say so explicitly.
 Working rules:
 - Keep `.waypoint/WORKSPACE.md` current as the live execution state, with timestamped new or materially revised entries in multi-topic sections