waypoint-codex 0.10.0 → 0.10.2
- package/README.md +3 -1
- package/package.json +1 -1
- package/templates/.agents/skills/conversation-retrospective/SKILL.md +28 -1
- package/templates/.agents/skills/conversation-retrospective/agents/openai.yaml +1 -1
- package/templates/.agents/skills/pr-review/SKILL.md +7 -1
- package/templates/.waypoint/SOUL.md +2 -0
- package/templates/.waypoint/agent-operating-manual.md +9 -2
- package/templates/managed-agents-block.md +2 -0
package/README.md
CHANGED
@@ -147,6 +147,8 @@ Waypoint ships a strong default skill pack for real coding work:
 These are repo-local, so the workflow travels with the project.
 `conversation-retrospective`, `break-it-qa`, `frontend-ship-audit`, and `backend-ship-audit` are on-demand skills, not default autonomous agent steps.
 
+In practice, Waypoint now expects `conversation-retrospective` to run automatically after major completed work pieces so durable learnings, user feedback, errors, and skill improvements do not stay trapped in chat.
+
 ## Reviewer agents
 
 Waypoint scaffolds these reviewer agents by default:
@@ -159,7 +161,7 @@ The intended workflow is closeout-based: run `code-reviewer` before considering
 
 For planning work, run `plan-reviewer` before presenting a non-trivial implementation plan to the user and iterate until it has no meaningful review findings left.
 
-When the user approves a reviewed plan or explicitly says to proceed, the intended Waypoint behavior is autonomous execution: keep going through implementation, verification, review, and repo-memory updates unless a real blocker or materially risky unresolved decision requires a pause.
+When the user approves a reviewed plan or explicitly says to proceed, the intended Waypoint behavior is autonomous execution: keep going through implementation, verification, review, and repo-memory updates unless a real blocker or materially risky unresolved decision requires a pause. If reviewers, subagents, CI, or other external work are still running, Waypoint should wait as long as necessary rather than interrupting them for speed.
 
 ## What makes it different
 
package/package.json
CHANGED
package/templates/.agents/skills/conversation-retrospective/SKILL.md
CHANGED
@@ -28,9 +28,12 @@ Review the current conversation and separate:
 - durable project knowledge
 - live execution state
 - transient chatter
+- direct user feedback, corrections, complaints, and preferences
 
 Persist without asking follow-up questions when the correct destination is clear.
 
+Treat explicit user feedback as a high-priority signal. If the user corrected the approach, rejected a behavior, called out friction, or stated a standing preference, prefer preserving that over the agent's earlier assumptions.
+
 Write durable knowledge to the smallest truthful home the repo already uses:
 
 - the main docs or knowledge layer for architecture, behavior, decisions, debugging knowledge, durable plans, and reusable operating guidance
@@ -48,11 +51,33 @@ Do not leave important truths only in chat.
 
 Identify which skills were actually used in this conversation, or which existing skills clearly should have covered the workflow but left avoidable gaps.
 
+For each used or clearly relevant skill, explicitly decide whether it:
+
+- succeeded
+- partially succeeded
+- failed
+
+Base that judgment on the actual conversation, especially:
+
+- direct user feedback
+- whether the skill helped complete the task
+- whether the agent had to work around missing guidance
+- whether concrete errors, dead ends, or repeated corrections happened while using it
+
+Distinguish between:
+
+- a skill problem
+- an execution mistake by the agent
+- an external/tooling failure
+- a one-off user preference that should not be generalized
+
+Only change the skill when the problem is truly in the skill guidance.
+
 For each affected skill:
 
 - read the existing skill before editing it
 - update only reusable guidance, not one-off transcript details
-- add missing guardrails, path hints, failure modes, decision rules, or references that would have made the conversation easier to complete
+- add missing guardrails, path hints, failure modes, error-handling guidance, decision rules, or references that would have made the conversation easier to complete
 - keep `SKILL.md` concise; prefer targeted structural improvements over turning the skill into a diary
 
 If the environment has both a source-of-truth skill and one or more mirrored or installed copies, update the source-of-truth version and any copies the user expects to stay in sync.
@@ -91,7 +116,9 @@ Do not invent a refresh command when the repo does not have one.
 Summarize:
 
 - what durable knowledge you saved and where
+- which skills you evaluated and whether they succeeded, partially succeeded, or failed
 - which skills you improved
+- which concrete errors, failure modes, or repeated friction points you captured
 - which new skill ideas you recorded, if any
 - what you intentionally left unpersisted because it was transient
 
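The evaluation rules added to `conversation-retrospective/SKILL.md` above can be read as a small decision record per skill. The following is an illustrative Python sketch only and is not part of the Waypoint package: the names `Outcome`, `Cause`, `SkillEvaluation`, and `should_edit_skill` are hypothetical, chosen here to mirror the wording of the diff.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    """The three verdicts the skill text asks for (hypothetical names)."""
    SUCCEEDED = "succeeded"
    PARTIALLY_SUCCEEDED = "partially succeeded"
    FAILED = "failed"


class Cause(Enum):
    """The four kinds of problem the skill text asks the agent to distinguish."""
    SKILL_PROBLEM = "a skill problem"
    EXECUTION_MISTAKE = "an execution mistake by the agent"
    EXTERNAL_FAILURE = "an external/tooling failure"
    ONE_OFF_PREFERENCE = "a one-off user preference"


@dataclass(frozen=True)
class SkillEvaluation:
    skill: str
    outcome: Outcome
    cause: Optional[Cause] = None  # None when no problem was observed


def should_edit_skill(evaluation: SkillEvaluation) -> bool:
    """Only change the skill when the problem is truly in the skill guidance."""
    return evaluation.cause is Cause.SKILL_PROBLEM


# Example: a failure caused by flaky tooling does not justify a skill edit.
flaky = SkillEvaluation("pr-review", Outcome.FAILED, Cause.EXTERNAL_FAILURE)
gap = SkillEvaluation("pr-review", Outcome.PARTIALLY_SUCCEEDED, Cause.SKILL_PROBLEM)
print(should_edit_skill(flaky), should_edit_skill(gap))
```

Keeping the outcome and the cause as separate fields reflects the diff's point that a failed run is not, by itself, evidence that the skill guidance needs to change.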
package/templates/.agents/skills/conversation-retrospective/agents/openai.yaml
CHANGED
@@ -1,4 +1,4 @@
 interface:
 display_name: "Conversation Retrospective"
 short_description: "Harvest the live conversation into repo memory"
-default_prompt: "Use this skill to analyze the active conversation,
+default_prompt: "Use this skill to analyze the active conversation, preserve durable knowledge and user feedback in the repo's existing memory surfaces, evaluate whether used skills succeeded or failed, capture concrete errors and friction points, improve skills whose guidance was insufficient, and record new skill ideas without asking follow-up questions when the correct destination is clear."
package/templates/.agents/skills/pr-review/SKILL.md
CHANGED
@@ -14,6 +14,7 @@ Use this skill to drive the PR through review instead of treating review as a on
 - If automated review is still running, wait for it to finish instead of racing it.
 - If comments are still arriving, do not prematurely declare the loop complete.
 - For stacked or non-`main` PRs, explicitly compare the PR head against its base branch and make sure later fixes on the base branch have actually been merged or rebased forward. Do not assume a sibling/base PR fix is already present in the dependent PR.
+- Keep waiting as long as required. Do not interrupt or abandon the review loop just because CI, reviewers, or automated checks are taking a long time.
 
 ## Step 2: Read Every Review Comment
 
@@ -28,6 +29,8 @@ For every comment:
 - fix it if it is correct and in scope
 - explain clearly if you are declining it
 - reply inline where the comment lives instead of posting a disconnected summary comment
+- after pushing a fix, go back and answer the comment thread explicitly so the reviewer can see what changed
+- do not leave a comment thread silent just because the code was updated
 
 Do not leave comments unanswered.
 
@@ -37,9 +40,12 @@ Do not leave comments unanswered.
 - rerun the relevant verification
 - if the PR is stacked, repeat the base-vs-head sanity check after pushes so you catch missing forward-merges before the next CI cycle
 - push follow-up commit(s)
-- return to the PR and
+- after pushing, return to the PR and wait for the next round of CI, automated review, and human review comments before deciding whether the loop is complete
+- if CI or review is still in flight, keep waiting instead of assuming your last push ended the process
 
 Stay in the loop until no new meaningful issues remain.
+Never cut the loop short by forcing a partial return from still-running review or verification systems.
+The loop is not complete while any meaningful review thread still lacks an inline response.
 
 ## Step 5: Close With A Crisp State Summary
 
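The completion rules that the `pr-review` changes above spell out can be summarized as a single predicate. This is a rough Python sketch under stated assumptions, not code from the package: `ReviewThread`, its fields, and `review_loop_complete` are hypothetical names, and how CI status or review threads are actually fetched is left out entirely.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ReviewThread:
    """Hypothetical view of one review comment thread."""
    meaningful: bool        # worth a response (not noise or an outdated bot note)
    has_inline_reply: bool  # answered in the thread itself, not in a summary comment


def review_loop_complete(
    ci_in_flight: bool,
    review_in_flight: bool,
    new_meaningful_issues: int,
    threads: Iterable[ReviewThread],
) -> bool:
    """Return True only when every condition named in the skill text holds."""
    # Still-running CI or review means the last push has not been judged yet.
    if ci_in_flight or review_in_flight:
        return False
    # Stay in the loop until no new meaningful issues remain.
    if new_meaningful_issues > 0:
        return False
    # Every meaningful thread needs an explicit inline response.
    return all(t.has_inline_reply for t in threads if t.meaningful)


threads = [ReviewThread(True, True), ReviewThread(True, False)]
print(review_loop_complete(False, False, 0, threads))
```

In this example the second thread is meaningful but unanswered, so the loop is not complete even though CI and review have finished and no new issues arrived.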
package/templates/.waypoint/SOUL.md
CHANGED
@@ -22,6 +22,8 @@ You're direct, opinionated, and evidence-driven. You read before you write. You
 
 **Approval means ownership.** Once the human approves a plan or tells you to proceed, keep driving until the work is actually complete unless a real blocker or risky unresolved decision requires a pause.
 
+**Waiting is part of the job.** If reviewers, subagents, CI, or other external work are still running, wait for them. Time alone is not a justification for interruption.
+
 ## How You Work
 
 **Read before you write.** Never modify code you haven't read.
package/templates/.waypoint/agent-operating-manual.md
CHANGED
@@ -50,7 +50,10 @@ If something important lives only in your head or in the chat transcript, the re
 - Rebuild `.waypoint/DOCS_INDEX.md` whenever routable docs change.
 - Rebuild `.waypoint/TRACKS_INDEX.md` whenever tracker files change.
 - Use the repo-local skills and reviewer agents instead of improvising from scratch.
-- Do not kill long-running subagents or reviewer agents just because they are slow.
+- Do not kill long-running subagents or reviewer agents just because they are slow.
+- When waiting on reviewers, subagents, CI, automated review, or external jobs, wait as long as required. There is no fixed timeout where waiting itself becomes the problem.
+- Never interrupt in-flight work just to force a partial result, salvage something quickly, or avoid making the user wait longer.
+- Only stop waiting when the work has actually finished, clearly failed, or the user explicitly redirects the work.
 
 ## Execution autonomy
 
@@ -89,12 +92,15 @@ Do not document every trivial implementation detail. Document the non-obvious, d
 - `work-tracker` when large multi-step work needs durable progress tracking in `.waypoint/track/`
 - `docs-sync` when routed docs may be stale, missing, or inconsistent with the codebase
 - `code-guide-audit` when a specific feature or file set needs a targeted coding-guide compliance check
+- `conversation-retrospective` after major completed work pieces so the active conversation is distilled into durable memory, user feedback and errors are preserved, exercised skills are improved, and real new-skill candidates are recorded
 - `break-it-qa` when a browser-facing feature should be attacked with invalid inputs, refreshes, repeated clicks, wrong action order, or other adversarial manual QA
 - `frontend-ship-audit` and `backend-ship-audit` only when the user explicitly requests a ship-readiness audit; do not trigger them autonomously as part of the default Waypoint workflow
 - `workspace-compress` after meaningful chunks, before stopping, and before review when the live handoff needs compression
 - `pre-pr-hygiene` before pushing or opening/updating a PR for substantial work
 - `pr-review` once a PR has active review comments or automated review in progress
 
+Treat `conversation-retrospective` as a default closeout step for major work pieces, not as a rare manual tool.
+
 ## When to use the reviewer agents
 
 Waypoint scaffolds these focused second-pass specialists by default:
@@ -121,7 +127,8 @@ Use reviewer agents before considering the work complete, not just as a reflex a
 4. If you have a recent self-authored commit that cleanly represents the reviewable slice, use it as the default review scope anchor. Otherwise scope the reviewers to the current changed slice.
 5. Widen only when surrounding files are needed to validate a finding.
 6. Do not call the work finished before you read the required reviewer results.
-7.
+7. Wait for reviewer outputs even if that requires repeated or long waits. Do not interrupt them just because they are still running.
+8. Fix real findings, rerun the relevant verification, update workspace/docs if needed, and make a follow-up commit when fixes change the repo.
 
 ## Quality bar
 
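The waiting rules added to the operating manual above (no fixed timeout; stop only when the work has finished, clearly failed, or the user redirects) resemble an unbounded polling loop. The Python sketch below is one possible reading, not how Waypoint implements anything: `Status`, `wait_for_external_work`, and the `poll` and `user_redirected` callbacks are hypothetical, and the 30-second interval is an arbitrary assumption.

```python
import time
from enum import Enum
from typing import Callable


class Status(Enum):
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"
    REDIRECTED = "redirected"  # the user explicitly redirected the work


def wait_for_external_work(
    poll: Callable[[], Status],
    user_redirected: Callable[[], bool] = lambda: False,
    interval_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Status:
    """Wait for a reviewer, subagent, CI run, or other external job.

    There is deliberately no timeout parameter: elapsed time alone
    never ends the wait.
    """
    while True:
        status = poll()
        if status in (Status.FINISHED, Status.FAILED):
            return status
        if user_redirected():
            return Status.REDIRECTED
        sleep(interval_seconds)


# Usage with a fake job that finishes on the third poll and a no-op sleep.
states = iter([Status.RUNNING, Status.RUNNING, Status.FINISHED])
print(wait_for_external_work(lambda: next(states), sleep=lambda _: None))
```

Injecting `sleep` keeps the sketch testable without real delays. The absence of a deadline argument is the design point: the only exits are the three conditions the manual names.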
package/templates/managed-agents-block.md
CHANGED
@@ -67,6 +67,7 @@ If some uncertainty still remains after checking persisted context and interview
 Prefer existing persisted context over re-interviewing the user.
 
 If the user approves a plan or explicitly tells you to proceed, treat that as authorization to execute the work end to end. Do not stop mid-implementation for incremental permission unless a real blocker, hidden-risk decision, or explicit user redirect requires a pause.
+When work is in flight elsewhere — reviewer agents, subagents, CI, automated review, external jobs, or other waiting periods — wait as long as required. There is no fixed waiting limit, and slowness alone is not a reason to interrupt or abandon the work.
 
 Working rules:
 - Keep `.waypoint/WORKSPACE.md` current as the live execution state, with timestamped new or materially revised entries in multi-topic sections
@@ -76,6 +77,7 @@ Working rules:
 - Use `work-tracker` when a long-running implementation, remediation, or verification campaign needs durable progress tracking
 - Use `docs-sync` when the docs may be stale or a change altered shipped behavior, contracts, routes, or commands
 - Use `code-guide-audit` for a targeted coding-guide compliance pass on a specific feature, file set, or change slice
+- Use `conversation-retrospective` after major completed work pieces to preserve durable learnings, capture user feedback and errors, improve any skills that were exercised, and record real new-skill candidates
 - Do not invoke `break-it-qa`, `frontend-ship-audit`, or `backend-ship-audit` yourself from the managed AGENTS block workflow; they are user-facing skills for explicit human-requested QA or ship-readiness audits, not default agent steps
 - Before presenting a non-trivial implementation plan to the user, run `plan-reviewer` and iterate on the plan until it has no meaningful review findings left
 - Before considering a non-trivial implementation slice complete, run `code-reviewer`; use a recent self-authored commit as the default scope anchor when one cleanly represents that slice