waypoint-codex 0.19.2 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. package/README.md +18 -37
  2. package/dist/src/cli.js +1 -1
  3. package/dist/src/core.js +33 -116
  4. package/dist/src/docs-index.js +1 -1
  5. package/dist/src/templates.js +0 -10
  6. package/package.json +1 -1
  7. package/templates/.agents/skills/agi-help/SKILL.md +1 -1
  8. package/templates/.agents/skills/code-guide-audit/SKILL.md +1 -5
  9. package/templates/.agents/skills/planning/SKILL.md +12 -10
  10. package/templates/.agents/skills/pr-review/SKILL.md +0 -1
  11. package/templates/.codex/agents/code-health-reviewer.toml +6 -5
  12. package/templates/.codex/agents/code-reviewer.toml +6 -5
  13. package/templates/.codex/agents/plan-reviewer.toml +6 -5
  14. package/templates/.waypoint/ACTIVE_PLANS.md +7 -7
  15. package/templates/.waypoint/README.md +5 -8
  16. package/templates/.waypoint/config.toml +0 -5
  17. package/templates/.waypoint/docs/README.md +1 -3
  18. package/templates/.waypoint/scripts/build-docs-index.mjs +25 -11
  19. package/templates/.waypoint/scripts/prepare-context.mjs +120 -205
  20. package/templates/WORKSPACE.md +2 -6
  21. package/templates/managed-agents-block.md +22 -111
  22. package/dist/src/track-index.js +0 -107
  23. package/templates/.agents/skills/break-it-qa/SKILL.md +0 -184
  24. package/templates/.agents/skills/break-it-qa/agents/openai.yaml +0 -4
  25. package/templates/.agents/skills/conversation-retrospective/SKILL.md +0 -147
  26. package/templates/.agents/skills/conversation-retrospective/agents/openai.yaml +0 -4
  27. package/templates/.agents/skills/docs-sync/SKILL.md +0 -78
  28. package/templates/.agents/skills/docs-sync/agents/openai.yaml +0 -4
  29. package/templates/.agents/skills/merge-ready-owner/SKILL.md +0 -196
  30. package/templates/.agents/skills/merge-ready-owner/agents/openai.yaml +0 -4
  31. package/templates/.agents/skills/pre-pr-hygiene/SKILL.md +0 -83
  32. package/templates/.agents/skills/pre-pr-hygiene/agents/openai.yaml +0 -4
  33. package/templates/.agents/skills/work-tracker/SKILL.md +0 -139
  34. package/templates/.agents/skills/work-tracker/agents/openai.yaml +0 -4
  35. package/templates/.agents/skills/workspace-compress/SKILL.md +0 -113
  36. package/templates/.agents/skills/workspace-compress/agents/openai.yaml +0 -4
  37. package/templates/.waypoint/SOUL.md +0 -71
  38. package/templates/.waypoint/agent-operating-manual.md +0 -178
  39. package/templates/.waypoint/scripts/build-track-index.mjs +0 -169
  40. package/templates/.waypoint/track/README.md +0 -38
  41. package/templates/.waypoint/track/_template.md +0 -48
package/templates/managed-agents-block.md +22 -111
@@ -1,130 +1,41 @@
  <!-- waypoint:start -->
  # Waypoint

- This repository uses Waypoint as its Codex operating system.
-
- These instructions are mandatory for work in this repo. Treat them as overriding any weaker generic guidance outside these files unless the user explicitly tells you otherwise.
+ These instructions are mandatory in this repo and override weaker generic guidance unless the user says otherwise.

  Waypoint owns only the text inside these `waypoint:start/end` markers.
  If you need repo-specific AGENTS instructions, write them outside this managed block.
  Do not put durable repo guidance inside the managed block, because `waypoint init` may replace it during upgrades.

- Stop here if the bootstrap has not been run yet.
+ You are a direct, evidence-driven collaborator. Investigate before claiming status. Fix root causes when the scope supports it. Keep communication concise.

- Run the Waypoint bootstrap only in these cases:
- - at the start of a new session
- - immediately after a compaction
- - if the user explicitly tells you to rerun it
+ This repo's default artifact flow is:
+ 1. `AGENTS.md` for the always-on contract
+ 2. `.waypoint/WORKSPACE.md` for current repo state
+ 3. `.waypoint/ACTIVE_PLANS.md` for the active plan pointer, execution checklist, blockers, and verification state
+ 4. `.waypoint/DOCS_INDEX.md` for durable docs routing
+ 5. `.waypoint/context/SNAPSHOT.md` and `.waypoint/context/RECENT_THREAD.md` for generated volatile context

- Bootstrap sequence:
+ Run the Waypoint bootstrap only at session start, after compaction, or when the user explicitly asks for it:
  1. Run `node .waypoint/scripts/prepare-context.mjs`
- 2. Read `.waypoint/SOUL.md`
- 3. Read `.waypoint/agent-operating-manual.md`
- 4. Read `.waypoint/WORKSPACE.md`
- 5. Read `.waypoint/ACTIVE_PLANS.md`
- 6. Read `.waypoint/context/MANIFEST.md`
- 7. Read every file listed in the manifest
-
- This is mandatory, not optional.
-
- - Do not skip it at session start or after compaction.
- - Do not rerun it mid-conversation just because a task is substantial.
- - Earlier chat context or earlier work in the session does not replace the bootstrap when a new session starts or a compaction happens.
- - If you are not sure whether a new session started or a compaction happened, rerun it.
- - Do not skip the context refresh or skip files in the manifest.
-
- Before making meaningful implementation, review, architectural, or tradeoff decisions, inspect the project root guidance files for persisted project context.
-
- Project guidance rules:
- - Distinguish user-scoped guidance from project-scoped guidance.
- - User-scoped `AGENTS.md` applies across projects and holds durable personal preferences, workflow rules, and collaboration defaults for this user.
- - Prefer `AGENTS.md` in the project root if present.
- - The project root `AGENTS.md` is project-scoped and should hold repo-specific context, constraints, standards, and durable project truth.
- - Look for context sections relevant to the task, including `## Project Context`, `## Frontend Context`, and `## Backend Context`.
- - Treat relevant context sections as active inputs to decision-making, not passive documentation.
- - Apply that context to scope, architecture, implementation depth, review standards, risk tolerance, testing strategy, compatibility expectations, rollout caution, and UX/product quality bar.
+ 2. Read `AGENTS.md`
+ 3. Read `.waypoint/WORKSPACE.md`
+ 4. Read `.waypoint/ACTIVE_PLANS.md`
+ 5. Read `.waypoint/DOCS_INDEX.md`
+ 6. Read `.waypoint/context/SNAPSHOT.md`
+ 7. Read `.waypoint/context/RECENT_THREAD.md`

- Examples of durable context that can materially change the correct approach:
- - internal tool vs public internet-facing product
- - expected scale, criticality, and usage patterns
- - regulatory, privacy, or compliance requirements
- - browser and device support expectations
- - accessibility expectations
- - SEO requirements
- - tenant model and authorization model
- - backward compatibility requirements
- - reliability and observability expectations
- - security posture assumptions
+ Before major implementation or architecture changes, check the repo guidance and routed docs for durable context. Ask only the missing high-leverage questions.

- If relevant context is missing, empty, stale, or insufficient and that gap would materially change the correct approach:
- - do not guess silently
- - if the task touches frontend and the needed frontend project context is not present in `AGENTS.md` or routed docs, use `frontend-context-interview`
- - if the task touches backend and the needed backend project context is not present in `AGENTS.md` or routed docs, use `backend-context-interview`
- - ask only the missing high-leverage questions
- - ask about the project, deployment reality, and operating constraints rather than the concrete feature
- - persist only durable context back into the project guidance file
- - do not write transient task-specific details into context sections
+ Once the user approves a plan or tells you to proceed, that approved scope is the execution contract. Do not silently narrow, defer, or drop approved work unless a real blocker or decision requires discussion.

- If some uncertainty still remains after checking persisted context and interviewing:
- - proceed with explicit assumptions
- - state those assumptions clearly in the work output or review
- - do not present guesses as established project context
+ `WORKSPACE.md` is the live state file. `ACTIVE_PLANS.md` is the active execution checklist. Keep them current when state, blockers, or verification materially change.

- Prefer existing persisted context over re-interviewing the user.
+ Refactor and migration default: use direct replacement, not compatibility scaffolding, unless the user or project docs explicitly require coexistence. Delete obsolete code aggressively and finish the phase back to green. Large destructive edits are allowed when they are the clearest path to the approved target state.

- If the user approves a plan or explicitly tells you to proceed, treat that as authorization to execute the work end to end. An approved plan is the active execution contract: do not silently narrow, defer, or drop planned work because the system feels good enough, the remaining work feels less important, or you would prefer a smaller PR. If you believe the approved scope should change, pause and discuss that change with the user before proceeding. Only change approved scope without that discussion when a real blocker, hidden-risk decision, or explicit user redirect requires it.
- When work is in flight elsewhere — reviewer agents, subagents, CI, automated review, external jobs, or other waiting periods — wait as long as required. There is no fixed waiting limit, and slowness alone is not a reason to interrupt or abandon the work.
- When you use a browser, app, or other interactive UI to inspect, reproduce, or verify something, send the user screenshots of the relevant states so they can see what you saw. If screenshots are not possible in the current environment, say so explicitly.
- When an explanation is clearer visually, use Mermaid diagrams directly in chat for flows, architecture, state, and plans.
+ Use reviewer passes when the work is non-trivial or risky, before PR-ready handoff, and before final closeout when helpful.

- Delivery expectations:
- - Keep communication concise by default. Lead with the answer, diagnosis, decision, or next step, and include only the most important supporting detail unless the user asks for more.
- - For planned work, treat `.waypoint/ACTIVE_PLANS.md` as the live execution contract and define done from the approved scope, current phase checkpoint, and acceptance criteria, not from your own sense that the system is already good enough.
- - Execute approved plans phase by phase. Finish the current phase, run the relevant checkpoint, resolve findings, and only then move to the next phase.
- - When you report back to the user, explain the result in plain, direct language. Say what you changed, what happened, and anything the user actually needs to know, but do not lean on jargon, low-level implementation detail, or code-heavy narration unless the user asks for it.
- - Write for a smart person who is not looking at the code. The goal is clarity, not technical performance.
- - This communication rule applies to how you explain the work, not to how you do it. Your actual reasoning, coding, debugging, and verification should stay technical, precise, and rigorous.
- - When the user shows a bug, broken behavior, or a screenshot of something wrong, investigate before discussing readiness.
- - After investigation, explain the problem to the user before jumping into implementation whenever the diagnosis, tradeoffs, or solution shape are not already obvious.
- - Lead with the useful truth: what is happening, the likely cause, the important options or tradeoffs if they matter, what you checked, and what you are doing next.
- - Fix the underlying problem, not only the visible symptom. If the real fix requires removing a bad old decision, paying down local technical debt, simplifying shaky architecture, or deleting obsolete code, do that instead of hot-patching around it.
- - When replacing a brittle path, aggressively delete obsolete code, stale compatibility branches, dead props, unused files, and debug logs instead of preserving them by default.
- - Do not preserve backward compatibility, old branches, or legacy code paths unless the user or documented project constraints explicitly require that compatibility.
- - Do not ship a bug fix that knowingly leaves the real cause in place behind a cosmetic patch unless the user explicitly asked for a temporary workaround.
- - Do not lead with refusal or readiness-disclaimer language like "I can't call this done yet" unless the user explicitly asked for a ship/readiness judgment.
- - Honesty means accurate diagnosis, explicit uncertainty, and clear verification limits. It does not mean hiding behind procedural disclaimers when you could be investigating.
- - Before you say the work is complete, verify it yourself whenever you reasonably can with the tools available in the environment.
- - Before you report completion, reread `.waypoint/ACTIVE_PLANS.md`, the active tracker if one exists, `WORKSPACE.md`, and any relevant routed docs, then compare the actual result against the approved scope, current phase checkpoint, and acceptance criteria.
- - If that reread shows the task is not actually complete, continue working. Do not stop just to report partial progress as if it were completion.
- - Match the verification to the task. Run code and inspect real output for scripts and backend changes. Click through flows, inspect rendered states, and check behavior in the browser for visual or interactive work.
- - Use representative or real inputs when practical instead of toy examples, so the check tells you something meaningful about the actual request.
- - If there are realistic edge cases, failure modes, or recovery paths you can exercise without turning the task into a science project, do that too.
- - If something looks wrong, incomplete, or unproven, keep going. Fix it, rerun the check, and only report completion once the result matches the request.
- - Do not call work done while approved scope or acceptance criteria remain unfinished. If any approved item was skipped or deferred, report that plainly as partial work or a scope-change proposal, not as completion.
- - The point of this is to keep iteration off the user's shoulders. Return finished work when possible, not a first pass that still depends on the user to spot-check it for you.
- - Only come back before that if you hit a genuine blocker you cannot clear with the codebase, tools, or available context. If that happens, say it plainly and be explicit about what remains unverified.
+ Keep communication concise. Lead with the answer, diagnosis, decision, or next step. Explain the diagnosis before implementation when the cause, tradeoffs, or solution shape are not already obvious.

- Working rules:
- - Treat `.waypoint/WORKSPACE.md` as a mandatory live execution log, not a closeout chore.
- - Treat `.waypoint/ACTIVE_PLANS.md` as the mandatory live execution-contract file for approved plans.
- - Update `.waypoint/WORKSPACE.md` during the work whenever the active goal, current phase, next step, blocker, verification state, or handoff context materially changes.
- - Update `.waypoint/ACTIVE_PLANS.md` whenever the active approved plan, current phase, phase checklist, checkpoint, or approved scope changes.
- - For multi-step work, keep the workspace and active plan file moving as you move: do not wait until the end of the task to reconstruct what happened.
- - If a tracker exists for the active workstream, update the tracker during the work as well and keep `WORKSPACE.md` pointing at the current tracker state.
- - Persist corrections and newly learned context in the right durable layer instead of defaulting to `AGENTS.md`.
- - Update user-scoped `AGENTS.md` only for true cross-project standing preferences or global operating rules.
- - Update the project-scoped repo `AGENTS.md` only for durable repo context or project-wide rules that should always apply in this repo.
- - If the correction is workflow-specific or method-specific, update the relevant repo skill instead. If no existing skill owns it well, propose creating one instead of stuffing that guidance into `AGENTS.md`.
- - Update `.waypoint/docs/` when durable project knowledge changes, update `.waypoint/plans/` when a durable plan changes, update `.waypoint/ACTIVE_PLANS.md` when the active approved plan or current phase changes, and refresh `last_updated` on touched routable docs
- - Keep most work in the main agent. Use repo-local skills, trackers, and reviewer agents when they create clear leverage, not as default ceremony.
- - Let repo-local skills describe their own triggers. The managed block should keep only the high-level rule: use those tools deliberately when they clearly help the task.
- - Do not hide behind generic heuristics like "try the simplest approach first" or "avoid refactoring beyond the ask" when the approved work or root-cause fix clearly requires deeper cleanup. Do the level of work a strong senior engineer would choose for the real codebase.
- - Use reviewer agents proactively at phase checkpoints when the work is non-trivial, risky, user-facing, merge-bound, or otherwise expensive to get wrong.
- - Strong default moments for reviewer-agent passes are: after completing a plan phase, before opening or materially updating a PR, after fixing substantial review findings, and before finally calling the work clear.
- - Do not interrupt implementation for heavyweight checks after every tiny edit. Batch related work into the current plan phase, then run the checkpoint.
- - When `code-reviewer` or `code-health-reviewer` find anything more serious than obvious optional polish, fix those findings, rerun the relevant verification, and run fresh review passes until the remaining feedback is only nitpicks or none.
- - Treat `plan-reviewer`, `code-reviewer`, and `code-health-reviewer` as one-shot agents: once a reviewer returns findings, close it; if another pass is needed later, spawn a fresh reviewer instead of reusing the old thread
- - If you created a PR earlier in the current session and need to push more work, first confirm that PR is still open. If it is closed, create a fresh branch from `origin/main` and open a fresh PR instead of pushing more commits to the old PR branch
- - Treat the generated context bundle as required session bootstrap, not optional reference material
- - After plan approval, own the execution through implementation, verification, and necessary repo-memory updates before surfacing a final completion report
+ Before reporting completion, verify the result yourself when reasonably possible, reread `ACTIVE_PLANS.md` and `WORKSPACE.md`, and compare reality against the approved scope. If the work is not actually complete, keep going.
  <!-- waypoint:end -->
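The 1.0.0 bootstrap replaces the manifest-driven read list with a fixed set of six files. As a minimal sketch of what that fixed order implies, the snippet below reports which of those files are absent. The helper name `missingBootstrapFiles` and the injected `exists` callback are assumptions made for illustration; nothing in the package is confirmed to expose such a function.

```javascript
// Read order taken from steps 2-7 of the 1.0.0 managed block.
const BOOTSTRAP_READ_ORDER = [
  "AGENTS.md",
  ".waypoint/WORKSPACE.md",
  ".waypoint/ACTIVE_PLANS.md",
  ".waypoint/DOCS_INDEX.md",
  ".waypoint/context/SNAPSHOT.md",
  ".waypoint/context/RECENT_THREAD.md",
];

// Hypothetical helper (not part of waypoint-codex): returns the bootstrap
// files that are absent, preserving read order. `exists` is injected so the
// sketch stays free of file-system access.
function missingBootstrapFiles(exists) {
  return BOOTSTRAP_READ_ORDER.filter((file) => !exists(file));
}

// Illustrative usage with an in-memory set standing in for the repo.
const presentFiles = new Set(["AGENTS.md", ".waypoint/WORKSPACE.md"]);
console.log(missingBootstrapFiles((file) => presentFiles.has(file)));
```

In a real repo the callback could wrap `existsSync` from `node:fs`; it is kept abstract here so the example runs anywhere.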
package/dist/src/track-index.js +0 -107
@@ -1,107 +0,0 @@
- import { existsSync, readFileSync, readdirSync, statSync } from "node:fs";
- import path from "node:path";
- const VALID_TRACK_STATUSES = new Set(["active", "blocked", "paused", "done", "archived"]);
- const ACTIVE_TRACK_STATUSES = new Set(["active", "blocked", "paused"]);
- const SKIP_NAMES = new Set(["README.md", "CHANGELOG.md", "LICENSE.md"]);
- function shouldSkipTrackFile(entry) {
-     return SKIP_NAMES.has(entry) || entry.startsWith("_");
- }
- function parseFrontmatter(filePath) {
-     const text = readFileSync(filePath, "utf8");
-     if (!text.startsWith("---\n")) {
-         return { summary: "", lastUpdated: "", readWhen: [], status: "" };
-     }
-     const endIndex = text.indexOf("\n---\n", 4);
-     if (endIndex === -1) {
-         return { summary: "", lastUpdated: "", readWhen: [], status: "" };
-     }
-     const frontmatter = text.slice(4, endIndex);
-     let summary = "";
-     let lastUpdated = "";
-     let status = "";
-     const readWhen = [];
-     let collectingReadWhen = false;
-     for (const rawLine of frontmatter.split("\n")) {
-         const line = rawLine.trim();
-         if (line.startsWith("summary:")) {
-             summary = line.slice("summary:".length).trim().replace(/^['"]|['"]$/g, "");
-             collectingReadWhen = false;
-             continue;
-         }
-         if (line.startsWith("last_updated:")) {
-             lastUpdated = line.slice("last_updated:".length).trim().replace(/^['"]|['"]$/g, "");
-             collectingReadWhen = false;
-             continue;
-         }
-         if (line.startsWith("status:")) {
-             status = line.slice("status:".length).trim().replace(/^['"]|['"]$/g, "").toLowerCase();
-             collectingReadWhen = false;
-             continue;
-         }
-         if (line.startsWith("read_when:")) {
-             collectingReadWhen = true;
-             continue;
-         }
-         if (collectingReadWhen && line.startsWith("- ")) {
-             readWhen.push(line.slice(2).trim());
-             continue;
-         }
-         if (collectingReadWhen && line.length > 0) {
-             collectingReadWhen = false;
-         }
-     }
-     return { summary, lastUpdated, readWhen, status };
- }
- function walkTracks(projectRoot, currentDir, output, invalid) {
-     for (const entry of readdirSync(currentDir)) {
-         const fullPath = path.join(currentDir, entry);
-         const stat = statSync(fullPath);
-         if (stat.isDirectory()) {
-             walkTracks(projectRoot, fullPath, output, invalid);
-             continue;
-         }
-         if (!entry.endsWith(".md") || shouldSkipTrackFile(entry)) {
-             continue;
-         }
-         const { summary, lastUpdated, readWhen, status } = parseFrontmatter(fullPath);
-         const relPath = path.relative(projectRoot, fullPath);
-         if (!summary || !lastUpdated || readWhen.length === 0 || !VALID_TRACK_STATUSES.has(status)) {
-             invalid.push(relPath);
-             continue;
-         }
-         output.push({ path: relPath, summary, readWhen, status });
-     }
- }
- export function renderTracksIndex(projectRoot, trackDir) {
-     const entries = [];
-     const invalidTracks = [];
-     if (existsSync(trackDir)) {
-         walkTracks(projectRoot, trackDir, entries, invalidTracks);
-     }
-     const lines = [
-         "# Tracks Index",
-         "",
-         "Auto-generated by `waypoint sync` / `waypoint doctor`. Read active trackers when resuming long-running work.",
-         "",
-         "## .waypoint/track/",
-         "",
-     ];
-     if (entries.length === 0) {
-         lines.push("No tracker files found.");
-     }
-     else {
-         for (const entry of entries.sort((a, b) => a.path.localeCompare(b.path))) {
-             lines.push(`- **${entry.path}** — [${entry.status}] ${entry.summary}`);
-             lines.push(` Read when: ${entry.readWhen.join("; ")}`);
-         }
-     }
-     lines.push("");
-     return {
-         content: `${lines.join("\n")}`,
-         invalidTracks,
-         activeTrackPaths: entries
-             .filter((entry) => ACTIVE_TRACK_STATUSES.has(entry.status))
-             .map((entry) => entry.path)
-             .sort((a, b) => a.localeCompare(b)),
-     };
- }
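For readers who relied on tracker files, the removed module's acceptance rule can be summarized in a few lines. The sketch below restates the checks from `walkTracks` and the `activeTrackPaths` filter on already-parsed frontmatter. The helper name `classifyTrack`, its return labels, and the sample field values are assumptions introduced for illustration; the status sets are copied from the deleted file.

```javascript
// Status sets copied from the removed dist/src/track-index.js.
const VALID_TRACK_STATUSES = new Set(["active", "blocked", "paused", "done", "archived"]);
const ACTIVE_TRACK_STATUSES = new Set(["active", "blocked", "paused"]);

// Hypothetical helper (not in the package): applies the same checks the
// removed code applied to parsed frontmatter.
function classifyTrack({ summary, lastUpdated, readWhen, status }) {
  const complete = Boolean(summary) && Boolean(lastUpdated) && readWhen.length > 0;
  if (!complete || !VALID_TRACK_STATUSES.has(status)) {
    return "invalid"; // such files were reported through invalidTracks
  }
  // Only active, blocked, and paused trackers appeared in activeTrackPaths.
  return ACTIVE_TRACK_STATUSES.has(status) ? "active" : "inactive";
}

// Illustrative frontmatter values, not taken from any real tracker.
const sampleTrack = {
  summary: "Migrate auth flow",
  lastUpdated: "2025-01-01",
  readWhen: ["resuming auth work"],
  status: "paused",
};
console.log(classifyTrack(sampleTrack)); // prints "active"
console.log(classifyTrack({ ...sampleTrack, status: "wip" })); // prints "invalid"
```

Note that a `paused` tracker still counted as active in the removed code, which is why it was surfaced when resuming long-running work.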
package/templates/.agents/skills/break-it-qa/SKILL.md +0 -184
@@ -1,184 +0,0 @@
- ---
- name: break-it-qa
- description: Verify a user-facing feature by trying to break it on purpose instead of only following the happy path. Use after building forms, multistep flows, settings pages, onboarding, stateful UI, destructive actions, or any browser-facing feature where invalid inputs, refreshes, back navigation, repeated clicks, wrong action order, or recovery paths might expose real bugs.
- ---
-
- # Break-It QA
-
- Use this skill to attack the feature like an impatient, confused, or careless user.
-
- This skill is for adversarial manual QA. It tries to make the feature fail through invalid, interrupted, stale, repeated, or out-of-order interactions instead of only proving the happy path works.
-
- ## Step 1: Ask The Three Setup Questions
-
- Before testing, ask the user these questions if the answer is not already clear from context:
-
- - what exact feature or scope should this cover?
- - how many attack items should the break log reach before stopping?
- - should the skill stop at findings or also fix clear issues after they are found?
-
- Keep this intake short. These are the main user-controlled knobs for the skill.
-
- If the user does not specify a count, use a reasonable default such as `40`.
-
- ## Step 2: Read First
-
- Before verification:
-
- 1. Read `.waypoint/SOUL.md`
- 2. Read `.waypoint/agent-operating-manual.md`
- 3. Read `.waypoint/WORKSPACE.md`
- 4. Read `.waypoint/context/MANIFEST.md`
- 5. Read every file listed in that manifest
- 6. Read the routed docs or nearby code that define the feature being tested
-
- ## Step 3: Identify Break Surfaces
-
- - Identify the happy path first so you know what "broken" means.
- - Find the fragile surfaces: forms, wizards, pending states, destructive actions, async transitions, navigation changes, and persisted state.
- - For each major step or transition, ask explicit "What if...?" questions before testing. Examples:
-   - What if the user refreshes here?
-   - What if they go back now?
-   - What if they click twice?
-   - What if this input is empty, malformed, too long, or contradictory?
-   - What if this action succeeds in the UI but fails in persistence?
-
- Do not test blindly.
-
- ## Step 4: Create A Break Log
-
- Write or update a durable markdown log under `.waypoint/docs/`.
-
- - Prefer a focused path such as `.waypoint/docs/verification/<feature>-break-it-qa.md`.
- - If a routed verification doc already exists for this feature, update it instead of creating a competing file.
- - The log is part of the skill, not an optional extra.
- - Pre-generate the attack plan in this log before executing it. Do not improvise everything live.
-
- Use one item per attempted action. A good entry shape is:
-
- ```markdown
- - [ ] What if the user refreshes on the confirmation step before the request finishes?
-   Step: confirmation
-   Category: navigation
-   Status: pending
-   Observed: not tried yet
- ```
-
- Then update each item as you go:
-
- - `survived`
- - `broke`
- - `fixed`
- - `retested-survived`
- - `blocked`
- - `not-applicable`
-
- Every executed item must include:
-
- - `Step`
- - `Category`
- - `Status`
- - `Observed`
-
- If the user sets a target such as "make this file 150 items long before you stop," treat that as a hard stopping condition unless you hit a real blocker and explain why.
-
- Use consistent categories such as:
-
- - `navigation`
- - `input-validation`
- - `repeat-action`
- - `stale-state`
- - `error-recovery`
- - `destructive-action`
- - `permissions`
- - `async-state`
- - `persistence`
-
- ## Step 5: Enforce Coverage Before Execution
-
- Before you start executing attacks:
-
- - pre-generate a meaningful attack list
- - spread it across the major flow steps
- - spread it across relevant categories
- - make sure the count is not satisfied by one repetitive corner of the feature
-
- Do not treat total item count alone as sufficient coverage.
-
- If the user asks for a large target such as `150`, ensure the log covers multiple steps and multiple categories instead of padding one surface.
-
- Anti-cheating rules:
-
- - no filler items
- - each attack must be meaningfully distinct
- - reworded duplicates do not count toward the target
-
- ## Step 6: Use The Real UI
-
- - Use `playwright-interactive`.
- - Exercise the actual UI instead of mocking the flow in code.
- - Keep the scope focused on the feature the user asked you to verify.
- - Capture screenshots of the important states you observe so the user can see the evidence directly.
-
- ## Step 7: Try To Break It On Purpose
-
- Do more than a happy-path walkthrough.
-
- Actively try:
-
- - invalid inputs
- - empty required fields
- - boundary-length or malformed inputs
- - repeated or double clicks
- - submitting twice
- - wrong action order
- - back and forward navigation
- - page refresh during the flow
- - closing and reopening modals or screens
- - canceling mid-flow and re-entering
- - stale UI state after edits
- - conflicting selections or toggles
- - error recovery after a failed action
-
- If the feature is stateful, also check whether the UI, network result, and persisted state stay coherent after those interactions.
-
- As you test, keep expanding the break log with new "What if...?" cases that emerge from the flow. Do not rely on memory or chat-only notes.
-
- ## Step 8: Record And Fix Real Bugs
-
- - Document each meaningful issue you find.
- - Fix the issue when the remediation is clear and the chosen mode includes fixes.
- - If the behavior is ambiguous, call out the product decision instead of bluffing a fix.
- - Update docs when the verification exposes stale assumptions about how the feature works.
- - Update the break log entry for each attempted action with what happened and whether the feature survived.
- - Require a short observed-result note for every executed item. "Worked" is too weak; capture what actually happened.
- - Save screenshots for the key broken, risky, or fixed states as you go.
-
- Do not stop at the first bug.
-
- ## Step 9: Repeat Until The Feature Resists Abuse
-
- After fixes:
-
- - rerun the relevant happy path
- - rerun the break attempts that previously failed
- - rerun directly related attacks
- - rerun neighboring attacks that touch the same step, state transition, or failure surface
- - verify the fix did not create a new inconsistent state
- - keep adding and executing new "What if...?" items until the requested target coverage is reached
-
- The skill is not done when the feature only works once. It is done when the feature behaves predictably under sloppy real-world use.
-
- ## Step 10: Report Truthfully
-
- Summarize:
-
- - the path to the break log markdown file
- - how many attack items were recorded and exercised
- - how coverage was distributed across steps and categories
- - which screenshots you captured and what each one shows
- - what break attempts you tried
- - which issues you found
- - what you fixed
- - a short systemic-risks summary describing recurring weakness patterns, not just individual bugs
- - what still looks risky or was not exercised
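The removed skill asked for coverage to be spread across categories rather than judged by item count alone. A minimal sketch of how such a distribution could be tallied from a break log follows, assuming entries use the `Category:` and `Status:` fields shown in the entry shape above. The function name `tallyBreakLog` and the sample log text are hypothetical and do not come from the package.

```javascript
// Hypothetical helper: counts break-log entries per Category and per Status.
// It assumes one "Category: ..." and one "Status: ..." line per entry.
function tallyBreakLog(markdown) {
  const byCategory = new Map();
  const byStatus = new Map();
  for (const rawLine of markdown.split("\n")) {
    const match = /^(Category|Status):\s*(.+)$/.exec(rawLine.trim());
    if (!match) {
      continue; // not a field line we count
    }
    const bucket = match[1] === "Category" ? byCategory : byStatus;
    bucket.set(match[2], (bucket.get(match[2]) || 0) + 1);
  }
  return { byCategory, byStatus };
}

// Illustrative log content, invented for this example.
const sampleLog = [
  "- [x] What if the user refreshes on the confirmation step?",
  "  Step: confirmation",
  "  Category: navigation",
  "  Status: survived",
  "  Observed: request completed once and the page reloaded into the success state",
  "- [x] What if the user double-clicks submit?",
  "  Step: submit",
  "  Category: repeat-action",
  "  Status: broke",
  "  Observed: two records were created",
].join("\n");

const tally = tallyBreakLog(sampleLog);
console.log(Object.fromEntries(tally.byCategory));
console.log(Object.fromEntries(tally.byStatus));
```

A log whose `byCategory` map has a single dominant key would be the "one repetitive corner" the skill warned against, whatever its total item count.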
package/templates/.agents/skills/break-it-qa/agents/openai.yaml +0 -4
@@ -1,4 +0,0 @@
- interface:
-   display_name: "Break-It QA"
-   short_description: "Try to break a feature through the UI"
-   default_prompt: "Use $break-it-qa to verify this user-facing feature by trying to break it through the browser with invalid inputs, wrong action order, refreshes, back navigation, repeated clicks, and other adversarial interactions, then fix clear issues and repeat."
@@ -1,147 +0,0 @@
- ---
- name: conversation-retrospective
- description: Harvest durable knowledge, user feedback, skill lessons, and repeated workflow patterns from the active conversation into the repo's existing memory system. Use when the user asks to save what was learned, write down what changed, capture lessons from this thread, update docs or handoff state without more prompting, improve skills that were used or exposed gaps, or record new skill ideas based on repetitive work in the live conversation. Do not use this for generic planning, broad docs audits, or digging through archived session history unless the user explicitly asks for that.
- ---
-
- # Conversation Retrospective
-
- Use this skill to harvest the active conversation into the repo's existing memory system.
-
- This skill works from the live conversation already in context. Do not go hunting through archived session files unless the user explicitly asks for that.
-
- This is a closeout and distillation workflow, not a generic planning pass or a broad docs audit.
-
- ## When Not To Use This Skill
-
- - Skip it for generic planning or implementation design; use the planning workflow for that.
- - Skip it for broad docs audits that are not driven by what happened in this conversation.
- - Skip it when the user wants archived history analysis rather than the live thread; only dig into old sessions if they explicitly ask.
- - Skip it when there is nothing durable to preserve and no skill or workflow lesson to capture.
-
- ## Read First
-
- Before persisting anything:
-
- 1. Read the repo's main agent guidance and project-context files
- 2. Read the repo's current durable memory surfaces, such as docs, workspace/handoff files, trackers, decision logs, or knowledge files
- 3. Read the exact docs, notes, and skill files that the conversation touched
-
- Do not assume the repo uses Waypoint. Adapt to the memory structure that already exists.
-
- ## Step 1: Distill Durable Knowledge
-
- Review the current conversation and separate:
-
- - durable project knowledge
- - live execution state
- - transient chatter
- - direct user feedback, corrections, complaints, and preferences
-
- Persist without asking follow-up questions when the correct destination is clear.
-
- Treat explicit user feedback as a high-priority signal. If the user corrected the approach, rejected a behavior, called out friction, or stated a standing preference, prefer preserving that over the agent's earlier assumptions.
-
- Write durable knowledge to the smallest truthful home the repo already uses:
-
- - the main docs or knowledge layer for architecture, behavior, decisions, debugging knowledge, and reusable operating guidance
- - the repo's plans layer for durable implementation, rollout, migration, or investigation plans
- - the repo's standing guidance file for durable project context or long-lived working rules
- - the repo's live handoff or workspace file for current state, blockers, and immediate next steps
- - the repo's tracker or execution-log layer when the conversation created or materially changed a long-running workstream
-
- If the repo uses doc metadata such as `last_updated`, refresh it when needed.
-
- If the repo has no obvious durable home but the need is clear, create the smallest coherent doc or note that fits the surrounding patterns instead of leaving the learning only in chat.
-
- Do not leave important truths only in chat.
-
- ## Step 2: Improve Existing Skills
-
- Identify which skills were actually used in this conversation, or which existing skills clearly should have covered the workflow but left avoidable gaps.
-
- For each used or clearly relevant skill, explicitly decide whether it:
-
- - succeeded
- - partially succeeded
- - failed
-
- Base that judgment on the actual conversation, especially:
-
- - direct user feedback
- - whether the skill helped complete the task
- - whether the agent had to work around missing guidance
- - whether concrete errors, dead ends, or repeated corrections happened while using it
-
- Distinguish between:
-
- - a skill problem
- - an execution mistake by the agent
- - an external/tooling failure
- - a one-off user preference that should not be generalized
-
- Only change the skill when the problem is truly in the skill guidance.
-
- For each affected skill:
-
- - read the existing skill before editing it
- - update only reusable guidance, not one-off transcript details
- - add missing guardrails, path hints, failure modes, error-handling guidance, decision rules, or references that would have made the conversation easier to complete
- - keep `SKILL.md` concise; prefer targeted structural improvements over turning the skill into a diary
-
- If the environment has both a source-of-truth skill and one or more mirrored or installed copies, update the source-of-truth version and any copies the user expects to stay in sync.
-
- Do not assume there is only one skill location, and do not assume there are many.
-
- ## Step 3: Propose New Skills
-
- When the conversation revealed repetitive work that existing skills do not cover well:
-
- - do not silently scaffold a new skill unless the user asked for implementation
- - record the proposal in the repo's existing docs or idea-capture layer
-
- If there is no obvious place for durable skill proposals, create a small doc such as `skill-ideas.md` in the repo's normal docs area.
-
- Each proposal should include:
-
- - the repeated workflow or problem
- - likely trigger phrases
- - expected outputs or side effects
- - why existing skills were insufficient
-
- Skip this doc when there is no real new-skill candidate.
-
- ## Step 4: Refresh Repo Memory
-
- After changing docs, handoff state, trackers, or skills:
-
- - run whatever repo-local refresh or index step the project uses, if one exists
- - otherwise make sure the edited memory surfaces are internally consistent and discoverable
-
- Do not invent a refresh command when the repo does not have one.
-
- ## Step 5: Report
-
- Summarize:
-
- - what durable knowledge you saved and where
- - which skills you evaluated and whether they succeeded, partially succeeded, or failed
- - which skills you improved
- - which concrete errors, failure modes, or repeated friction points you captured
- - which new skill ideas you recorded, if any
- - what you intentionally left unpersisted because it was transient
-
- If no substantive persistence changes were needed, say that explicitly instead of inventing updates.
-
- ## Gotchas
-
- - Do not turn this skill into a transcript dump. Persist only durable knowledge, live state, or reusable lessons.
- - Do not scatter the same learning across multiple files. Pick the smallest truthful home the repo already uses.
- - Do not blame a skill for a problem that was really an execution mistake or an external tool failure.
- - Do not preserve one-off user phrasing or temporary frustration as if it were standing repo policy unless the user clearly framed it that way.
- - Do not go hunting through archived session files just because the live thread feels incomplete. This skill should work from the current conversation unless the user explicitly broadens the scope.
-
- ## Keep This Skill Sharp
-
- - After meaningful retrospectives, add new gotchas when the same persistence mistake, memory-placement mistake, or skill-triage mistake keeps recurring.
- - Tighten the description if the skill misses real prompts like "save what we learned here" or fires on requests that are really planning or docs-audit work.
- - If the same kind of durable learning keeps needing a custom destination, add that routing guidance to the skill instead of leaving the decision to be rediscovered in chat.
@@ -1,4 +0,0 @@
- interface:
- display_name: "Conversation Retrospective"
- short_description: "Harvest the live conversation into repo memory"
- default_prompt: "Use $conversation-retrospective to preserve the durable lessons, repo-memory updates, and skill learnings from this live conversation."