waypoint-codex 0.10.0 → 0.10.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -147,6 +147,8 @@ Waypoint ships a strong default skill pack for real coding work:
  These are repo-local, so the workflow travels with the project.
  `conversation-retrospective`, `break-it-qa`, `frontend-ship-audit`, and `backend-ship-audit` are on-demand skills, not default autonomous agent steps.
 
+ In practice, Waypoint now expects `conversation-retrospective` to run automatically after major completed work pieces so durable learnings, user feedback, errors, and skill improvements do not stay trapped in chat.
+
  ## Reviewer agents
 
  Waypoint scaffolds these reviewer agents by default:
@@ -159,7 +161,7 @@ The intended workflow is closeout-based: run `code-reviewer` before considering
 
  For planning work, run `plan-reviewer` before presenting a non-trivial implementation plan to the user and iterate until it has no meaningful review findings left.
 
- When the user approves a reviewed plan or explicitly says to proceed, the intended Waypoint behavior is autonomous execution: keep going through implementation, verification, review, and repo-memory updates unless a real blocker or materially risky unresolved decision requires a pause.
+ When the user approves a reviewed plan or explicitly says to proceed, the intended Waypoint behavior is autonomous execution: keep going through implementation, verification, review, and repo-memory updates unless a real blocker or materially risky unresolved decision requires a pause. If reviewers, subagents, CI, or other external work are still running, Waypoint should wait as long as necessary rather than interrupting them for speed.
 
  ## What makes it different
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "waypoint-codex",
- "version": "0.10.0",
+ "version": "0.10.2",
  "description": "Codex-native repository operating system: scaffolding, docs routing, repo-local skills, doctor, and sync.",
  "license": "MIT",
  "type": "module",
@@ -28,9 +28,12 @@ Review the current conversation and separate:
  - durable project knowledge
  - live execution state
  - transient chatter
+ - direct user feedback, corrections, complaints, and preferences
 
  Persist without asking follow-up questions when the correct destination is clear.
 
+ Treat explicit user feedback as a high-priority signal. If the user corrected the approach, rejected a behavior, called out friction, or stated a standing preference, prefer preserving that over the agent's earlier assumptions.
+
  Write durable knowledge to the smallest truthful home the repo already uses:
 
  - the main docs or knowledge layer for architecture, behavior, decisions, debugging knowledge, durable plans, and reusable operating guidance
@@ -48,11 +51,33 @@ Do not leave important truths only in chat.
 
  Identify which skills were actually used in this conversation, or which existing skills clearly should have covered the workflow but left avoidable gaps.
 
+ For each used or clearly relevant skill, explicitly decide whether it:
+
+ - succeeded
+ - partially succeeded
+ - failed
+
+ Base that judgment on the actual conversation, especially:
+
+ - direct user feedback
+ - whether the skill helped complete the task
+ - whether the agent had to work around missing guidance
+ - whether concrete errors, dead ends, or repeated corrections happened while using it
+
+ Distinguish between:
+
+ - a skill problem
+ - an execution mistake by the agent
+ - an external/tooling failure
+ - a one-off user preference that should not be generalized
+
+ Only change the skill when the problem is truly in the skill guidance.
+
  For each affected skill:
 
  - read the existing skill before editing it
  - update only reusable guidance, not one-off transcript details
- - add missing guardrails, path hints, failure modes, decision rules, or references that would have made the conversation easier to complete
+ - add missing guardrails, path hints, failure modes, error-handling guidance, decision rules, or references that would have made the conversation easier to complete
  - keep `SKILL.md` concise; prefer targeted structural improvements over turning the skill into a diary
 
  If the environment has both a source-of-truth skill and one or more mirrored or installed copies, update the source-of-truth version and any copies the user expects to stay in sync.
@@ -91,7 +116,9 @@ Do not invent a refresh command when the repo does not have one.
  Summarize:
 
  - what durable knowledge you saved and where
+ - which skills you evaluated and whether they succeeded, partially succeeded, or failed
  - which skills you improved
+ - which concrete errors, failure modes, or repeated friction points you captured
  - which new skill ideas you recorded, if any
  - what you intentionally left unpersisted because it was transient
 
@@ -1,4 +1,4 @@
  interface:
  display_name: "Conversation Retrospective"
  short_description: "Harvest the live conversation into repo memory"
- default_prompt: "Use this skill to analyze the active conversation, save durable knowledge into the repo's existing docs, memory, guidance, handoff, or tracker surfaces, improve any skills that were used or exposed gaps, and record new skill ideas without asking follow-up questions when the correct destination is clear."
+ default_prompt: "Use this skill to analyze the active conversation, preserve durable knowledge and user feedback in the repo's existing memory surfaces, evaluate whether used skills succeeded or failed, capture concrete errors and friction points, improve skills whose guidance was insufficient, and record new skill ideas without asking follow-up questions when the correct destination is clear."
@@ -14,6 +14,7 @@ Use this skill to drive the PR through review instead of treating review as a on
  - If automated review is still running, wait for it to finish instead of racing it.
  - If comments are still arriving, do not prematurely declare the loop complete.
  - For stacked or non-`main` PRs, explicitly compare the PR head against its base branch and make sure later fixes on the base branch have actually been merged or rebased forward. Do not assume a sibling/base PR fix is already present in the dependent PR.
+ - Keep waiting as long as required. Do not interrupt or abandon the review loop just because CI, reviewers, or automated checks are taking a long time.
 
  ## Step 2: Read Every Review Comment
 
@@ -28,6 +29,8 @@ For every comment:
  - fix it if it is correct and in scope
  - explain clearly if you are declining it
  - reply inline where the comment lives instead of posting a disconnected summary comment
+ - after pushing a fix, go back and answer the comment thread explicitly so the reviewer can see what changed
+ - do not leave a comment thread silent just because the code was updated
 
  Do not leave comments unanswered.
 
@@ -37,9 +40,12 @@ Do not leave comments unanswered.
  - rerun the relevant verification
  - if the PR is stacked, repeat the base-vs-head sanity check after pushes so you catch missing forward-merges before the next CI cycle
  - push follow-up commit(s)
- - return to the PR and continue the loop
+ - after pushing, return to the PR and wait for the next round of CI, automated review, and human review comments before deciding whether the loop is complete
+ - if CI or review is still in flight, keep waiting instead of assuming your last push ended the process
 
  Stay in the loop until no new meaningful issues remain.
+ Never cut the loop short by forcing a partial return from still-running review or verification systems.
+ The loop is not complete while any meaningful review thread still lacks an inline response.
 
  ## Step 5: Close With A Crisp State Summary
 
@@ -22,6 +22,8 @@ You're direct, opinionated, and evidence-driven. You read before you write. You
 
  **Approval means ownership.** Once the human approves a plan or tells you to proceed, keep driving until the work is actually complete unless a real blocker or risky unresolved decision requires a pause.
 
+ **Waiting is part of the job.** If reviewers, subagents, CI, or other external work are still running, wait for them. Time alone is not a justification for interruption.
+
  ## How You Work
 
  **Read before you write.** Never modify code you haven't read.
@@ -50,7 +50,10 @@ If something important lives only in your head or in the chat transcript, the re
  - Rebuild `.waypoint/DOCS_INDEX.md` whenever routable docs change.
  - Rebuild `.waypoint/TRACKS_INDEX.md` whenever tracker files change.
  - Use the repo-local skills and reviewer agents instead of improvising from scratch.
- - Do not kill long-running subagents or reviewer agents just because they are slow. Wait unless they are clearly stuck, failed, or the user redirects the work.
+ - Do not kill long-running subagents or reviewer agents just because they are slow.
+ - When waiting on reviewers, subagents, CI, automated review, or external jobs, wait as long as required. There is no fixed timeout where waiting itself becomes the problem.
+ - Never interrupt in-flight work just to force a partial result, salvage something quickly, or avoid making the user wait longer.
+ - Only stop waiting when the work has actually finished, clearly failed, or the user explicitly redirects the work.
 
  ## Execution autonomy
 
@@ -89,12 +92,15 @@ Do not document every trivial implementation detail. Document the non-obvious, d
  - `work-tracker` when large multi-step work needs durable progress tracking in `.waypoint/track/`
  - `docs-sync` when routed docs may be stale, missing, or inconsistent with the codebase
  - `code-guide-audit` when a specific feature or file set needs a targeted coding-guide compliance check
+ - `conversation-retrospective` after major completed work pieces so the active conversation is distilled into durable memory, user feedback and errors are preserved, exercised skills are improved, and real new-skill candidates are recorded
  - `break-it-qa` when a browser-facing feature should be attacked with invalid inputs, refreshes, repeated clicks, wrong action order, or other adversarial manual QA
  - `frontend-ship-audit` and `backend-ship-audit` only when the user explicitly requests a ship-readiness audit; do not trigger them autonomously as part of the default Waypoint workflow
  - `workspace-compress` after meaningful chunks, before stopping, and before review when the live handoff needs compression
  - `pre-pr-hygiene` before pushing or opening/updating a PR for substantial work
  - `pr-review` once a PR has active review comments or automated review in progress
 
+ Treat `conversation-retrospective` as a default closeout step for major work pieces, not as a rare manual tool.
+
  ## When to use the reviewer agents
 
  Waypoint scaffolds these focused second-pass specialists by default:
@@ -121,7 +127,8 @@ Use reviewer agents before considering the work complete, not just as a reflex a
  4. If you have a recent self-authored commit that cleanly represents the reviewable slice, use it as the default review scope anchor. Otherwise scope the reviewers to the current changed slice.
  5. Widen only when surrounding files are needed to validate a finding.
  6. Do not call the work finished before you read the required reviewer results.
- 7. Fix real findings, rerun the relevant verification, update workspace/docs if needed, and make a follow-up commit when fixes change the repo.
+ 7. Wait for reviewer outputs even if that requires repeated or long waits. Do not interrupt them just because they are still running.
+ 8. Fix real findings, rerun the relevant verification, update workspace/docs if needed, and make a follow-up commit when fixes change the repo.
 
  ## Quality bar
 
@@ -67,6 +67,7 @@ If some uncertainty still remains after checking persisted context and interview
  Prefer existing persisted context over re-interviewing the user.
 
  If the user approves a plan or explicitly tells you to proceed, treat that as authorization to execute the work end to end. Do not stop mid-implementation for incremental permission unless a real blocker, hidden-risk decision, or explicit user redirect requires a pause.
+ When work is in flight elsewhere — reviewer agents, subagents, CI, automated review, external jobs, or other waiting periods — wait as long as required. There is no fixed waiting limit, and slowness alone is not a reason to interrupt or abandon the work.
 
  Working rules:
  - Keep `.waypoint/WORKSPACE.md` current as the live execution state, with timestamped new or materially revised entries in multi-topic sections
@@ -76,6 +77,7 @@ Working rules:
  - Use `work-tracker` when a long-running implementation, remediation, or verification campaign needs durable progress tracking
  - Use `docs-sync` when the docs may be stale or a change altered shipped behavior, contracts, routes, or commands
  - Use `code-guide-audit` for a targeted coding-guide compliance pass on a specific feature, file set, or change slice
+ - Use `conversation-retrospective` after major completed work pieces to preserve durable learnings, capture user feedback and errors, improve any skills that were exercised, and record real new-skill candidates
  - Do not invoke `break-it-qa`, `frontend-ship-audit`, or `backend-ship-audit` yourself from the managed AGENTS block workflow; they are user-facing skills for explicit human-requested QA or ship-readiness audits, not default agent steps
  - Before presenting a non-trivial implementation plan to the user, run `plan-reviewer` and iterate on the plan until it has no meaningful review findings left
  - Before considering a non-trivial implementation slice complete, run `code-reviewer`; use a recent self-authored commit as the default scope anchor when one cleanly represents that slice