waypoint-codex 0.10.1 → 0.10.2
- package/README.md +2 -0
- package/package.json +1 -1
- package/templates/.agents/skills/conversation-retrospective/SKILL.md +28 -1
- package/templates/.agents/skills/conversation-retrospective/agents/openai.yaml +1 -1
- package/templates/.waypoint/agent-operating-manual.md +3 -0
- package/templates/managed-agents-block.md +1 -0
package/README.md
CHANGED
@@ -147,6 +147,8 @@ Waypoint ships a strong default skill pack for real coding work:
 These are repo-local, so the workflow travels with the project.
 `conversation-retrospective`, `break-it-qa`, `frontend-ship-audit`, and `backend-ship-audit` are on-demand skills, not default autonomous agent steps.
 
+In practice, Waypoint now expects `conversation-retrospective` to run automatically after major completed work pieces so durable learnings, user feedback, errors, and skill improvements do not stay trapped in chat.
+
 ## Reviewer agents
 
 Waypoint scaffolds these reviewer agents by default:
package/package.json
CHANGED
package/templates/.agents/skills/conversation-retrospective/SKILL.md
CHANGED
@@ -28,9 +28,12 @@ Review the current conversation and separate:
 - durable project knowledge
 - live execution state
 - transient chatter
+- direct user feedback, corrections, complaints, and preferences
 
 Persist without asking follow-up questions when the correct destination is clear.
 
+Treat explicit user feedback as a high-priority signal. If the user corrected the approach, rejected a behavior, called out friction, or stated a standing preference, prefer preserving that over the agent's earlier assumptions.
+
 Write durable knowledge to the smallest truthful home the repo already uses:
 
 - the main docs or knowledge layer for architecture, behavior, decisions, debugging knowledge, durable plans, and reusable operating guidance
@@ -48,11 +51,33 @@ Do not leave important truths only in chat.
 
 Identify which skills were actually used in this conversation, or which existing skills clearly should have covered the workflow but left avoidable gaps.
 
+For each used or clearly relevant skill, explicitly decide whether it:
+
+- succeeded
+- partially succeeded
+- failed
+
+Base that judgment on the actual conversation, especially:
+
+- direct user feedback
+- whether the skill helped complete the task
+- whether the agent had to work around missing guidance
+- whether concrete errors, dead ends, or repeated corrections happened while using it
+
+Distinguish between:
+
+- a skill problem
+- an execution mistake by the agent
+- an external/tooling failure
+- a one-off user preference that should not be generalized
+
+Only change the skill when the problem is truly in the skill guidance.
+
 For each affected skill:
 
 - read the existing skill before editing it
 - update only reusable guidance, not one-off transcript details
-- add missing guardrails, path hints, failure modes, decision rules, or references that would have made the conversation easier to complete
+- add missing guardrails, path hints, failure modes, error-handling guidance, decision rules, or references that would have made the conversation easier to complete
 - keep `SKILL.md` concise; prefer targeted structural improvements over turning the skill into a diary
 
 If the environment has both a source-of-truth skill and one or more mirrored or installed copies, update the source-of-truth version and any copies the user expects to stay in sync.
@@ -91,7 +116,9 @@ Do not invent a refresh command when the repo does not have one.
 Summarize:
 
 - what durable knowledge you saved and where
+- which skills you evaluated and whether they succeeded, partially succeeded, or failed
 - which skills you improved
+- which concrete errors, failure modes, or repeated friction points you captured
 - which new skill ideas you recorded, if any
 - what you intentionally left unpersisted because it was transient
 
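The evaluation rules added to `SKILL.md` in the hunks above read like a small decision procedure. The following is a minimal Python sketch of that reading, not part of the package: the names `Outcome`, `Cause`, `SkillEvaluation`, and `should_edit_skill` are hypothetical, and Waypoint itself expresses these rules only as prose guidance for the agent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    """The three verdicts the skill text asks for (hypothetical type)."""
    SUCCEEDED = "succeeded"
    PARTIALLY_SUCCEEDED = "partially succeeded"
    FAILED = "failed"


class Cause(Enum):
    """The four problem categories the skill text distinguishes (hypothetical type)."""
    SKILL_PROBLEM = "a skill problem"
    AGENT_MISTAKE = "an execution mistake by the agent"
    TOOLING_FAILURE = "an external/tooling failure"
    ONE_OFF_PREFERENCE = "a one-off user preference that should not be generalized"


@dataclass
class SkillEvaluation:
    skill: str
    outcome: Outcome
    cause: Optional[Cause] = None  # None when no problem was observed


def should_edit_skill(evaluation: SkillEvaluation) -> bool:
    """Assumed reading of: 'Only change the skill when the problem is
    truly in the skill guidance.'"""
    return evaluation.cause is Cause.SKILL_PROBLEM


# Example: a tooling failure does not justify editing the skill.
flaky = SkillEvaluation("docs-sync", Outcome.FAILED, Cause.TOOLING_FAILURE)
print(should_edit_skill(flaky))
```

Under this reading the outcome and the cause are recorded separately, so a failed run caused by tooling or by the agent's own mistake is still reported in the summary without triggering a skill edit.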
package/templates/.agents/skills/conversation-retrospective/agents/openai.yaml
CHANGED
@@ -1,4 +1,4 @@
 interface:
   display_name: "Conversation Retrospective"
   short_description: "Harvest the live conversation into repo memory"
-  default_prompt: "Use this skill to analyze the active conversation,
+  default_prompt: "Use this skill to analyze the active conversation, preserve durable knowledge and user feedback in the repo's existing memory surfaces, evaluate whether used skills succeeded or failed, capture concrete errors and friction points, improve skills whose guidance was insufficient, and record new skill ideas without asking follow-up questions when the correct destination is clear."
package/templates/.waypoint/agent-operating-manual.md
CHANGED
@@ -92,12 +92,15 @@ Do not document every trivial implementation detail. Document the non-obvious, d
 - `work-tracker` when large multi-step work needs durable progress tracking in `.waypoint/track/`
 - `docs-sync` when routed docs may be stale, missing, or inconsistent with the codebase
 - `code-guide-audit` when a specific feature or file set needs a targeted coding-guide compliance check
+- `conversation-retrospective` after major completed work pieces so the active conversation is distilled into durable memory, user feedback and errors are preserved, exercised skills are improved, and real new-skill candidates are recorded
 - `break-it-qa` when a browser-facing feature should be attacked with invalid inputs, refreshes, repeated clicks, wrong action order, or other adversarial manual QA
 - `frontend-ship-audit` and `backend-ship-audit` only when the user explicitly requests a ship-readiness audit; do not trigger them autonomously as part of the default Waypoint workflow
 - `workspace-compress` after meaningful chunks, before stopping, and before review when the live handoff needs compression
 - `pre-pr-hygiene` before pushing or opening/updating a PR for substantial work
 - `pr-review` once a PR has active review comments or automated review in progress
 
+Treat `conversation-retrospective` as a default closeout step for major work pieces, not as a rare manual tool.
+
 ## When to use the reviewer agents
 
 Waypoint scaffolds these focused second-pass specialists by default:
package/templates/managed-agents-block.md
CHANGED
@@ -77,6 +77,7 @@ Working rules:
 - Use `work-tracker` when a long-running implementation, remediation, or verification campaign needs durable progress tracking
 - Use `docs-sync` when the docs may be stale or a change altered shipped behavior, contracts, routes, or commands
 - Use `code-guide-audit` for a targeted coding-guide compliance pass on a specific feature, file set, or change slice
+- Use `conversation-retrospective` after major completed work pieces to preserve durable learnings, capture user feedback and errors, improve any skills that were exercised, and record real new-skill candidates
 - Do not invoke `break-it-qa`, `frontend-ship-audit`, or `backend-ship-audit` yourself from the managed AGENTS block workflow; they are user-facing skills for explicit human-requested QA or ship-readiness audits, not default agent steps
 - Before presenting a non-trivial implementation plan to the user, run `plan-reviewer` and iterate on the plan until it has no meaningful review findings left
 - Before considering a non-trivial implementation slice complete, run `code-reviewer`; use a recent self-authored commit as the default scope anchor when one cleanly represents that slice
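The working rules in the managed AGENTS block split skills into those the agent may run on its own and those reserved for explicit user requests, with `conversation-retrospective` now on the autonomous side. The following is a rough Python sketch of that split, under stated assumptions: the function names `agent_may_invoke` and `closeout_skills` and the `USER_REQUEST_ONLY` set are hypothetical illustrations, not anything Waypoint ships, and the rules exist in the package only as prose.

```python
from typing import List

# Skills the managed AGENTS block says the agent must not invoke by itself.
USER_REQUEST_ONLY = {"break-it-qa", "frontend-ship-audit", "backend-ship-audit"}


def agent_may_invoke(skill: str, user_requested: bool = False) -> bool:
    """Return True when the agent is allowed to run the skill.

    Assumption: any skill outside USER_REQUEST_ONLY may run autonomously.
    """
    if skill in USER_REQUEST_ONLY:
        return user_requested
    return True


def closeout_skills(major_work_piece_completed: bool) -> List[str]:
    """Skills to run as a default closeout step after a major work piece."""
    if major_work_piece_completed:
        return ["conversation-retrospective"]
    return []


print(agent_may_invoke("break-it-qa"))
print(closeout_skills(True))
```

What counts as a "major completed work piece" is left to the agent's judgment in the source text, so the sketch takes it as a boolean input instead of trying to compute it.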