waypoint-codex 0.10.9 → 0.10.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (37)
  1. package/README.md +4 -3
  2. package/dist/src/core.js +166 -14
  3. package/package.json +1 -1
  4. package/templates/.agents/skills/adversarial-review/SKILL.md +85 -0
  5. package/templates/.agents/skills/adversarial-review/agents/openai.yaml +4 -0
  6. package/templates/.agents/skills/backend-context-interview/SKILL.md +100 -50
  7. package/templates/.agents/skills/backend-context-interview/agents/openai.yaml +2 -2
  8. package/templates/.agents/skills/backend-ship-audit/SKILL.md +23 -25
  9. package/templates/.agents/skills/backend-ship-audit/agents/openai.yaml +4 -3
  10. package/templates/.agents/skills/code-guide-audit/SKILL.md +22 -1
  11. package/templates/.agents/skills/code-guide-audit/agents/openai.yaml +2 -2
  12. package/templates/.agents/skills/conversation-retrospective/SKILL.md +22 -1
  13. package/templates/.agents/skills/conversation-retrospective/agents/openai.yaml +1 -1
  14. package/templates/.agents/skills/docs-sync/SKILL.md +22 -1
  15. package/templates/.agents/skills/docs-sync/agents/openai.yaml +1 -1
  16. package/templates/.agents/skills/frontend-context-interview/SKILL.md +19 -0
  17. package/templates/.agents/skills/frontend-context-interview/agents/openai.yaml +2 -2
  18. package/templates/.agents/skills/frontend-ship-audit/SKILL.md +20 -0
  19. package/templates/.agents/skills/frontend-ship-audit/agents/openai.yaml +4 -3
  20. package/templates/.agents/skills/planning/SKILL.md +20 -0
  21. package/templates/.agents/skills/planning/agents/openai.yaml +1 -1
  22. package/templates/.agents/skills/pr-review/SKILL.md +20 -0
  23. package/templates/.agents/skills/pr-review/agents/openai.yaml +1 -1
  24. package/templates/.agents/skills/pre-pr-hygiene/SKILL.md +20 -0
  25. package/templates/.agents/skills/pre-pr-hygiene/agents/openai.yaml +1 -1
  26. package/templates/.agents/skills/visual-explanations/SKILL.md +14 -0
  27. package/templates/.agents/skills/visual-explanations/agents/openai.yaml +1 -1
  28. package/templates/.agents/skills/work-tracker/SKILL.md +20 -0
  29. package/templates/.agents/skills/work-tracker/agents/openai.yaml +1 -1
  30. package/templates/.agents/skills/workspace-compress/SKILL.md +20 -0
  31. package/templates/.agents/skills/workspace-compress/agents/openai.yaml +1 -1
  32. package/templates/.codex/agents/code-health-reviewer.toml +2 -1
  33. package/templates/.codex/agents/code-reviewer.toml +2 -1
  34. package/templates/.gitignore.snippet +2 -0
  35. package/templates/.waypoint/SOUL.md +1 -1
  36. package/templates/.waypoint/agent-operating-manual.md +13 -14
  37. package/templates/managed-agents-block.md +1 -2
package/README.md CHANGED
@@ -37,7 +37,7 @@ The philosophy is simple:
  - more markdown
  - better continuity for the next agent
 
- By default, Waypoint appends a `.gitignore` snippet that ignores the exact Waypoint-created skill directories and reviewer-agent config files, plus everything under `.waypoint/` except `.waypoint/docs/`, while still ignoring the scaffolded `.waypoint/docs/README.md` and `.waypoint/docs/code-guide.md` assets. User-authored durable docs stay trackable; workspace, context, indexes, and other operational state remain local.
+ By default, Waypoint keeps its `.gitignore` rules inside a comment-delimited `# Waypoint state` section. That section ignores the exact Waypoint-created skill directories and reviewer-agent config files, plus everything under `.waypoint/` except `.waypoint/docs/`, while still ignoring the scaffolded `.waypoint/docs/README.md` and `.waypoint/docs/code-guide.md` assets. User-authored durable docs stay trackable; workspace, context, indexes, and other operational state remain local.
 
  ## Best fit
 
@@ -136,6 +136,7 @@ Waypoint ships a strong default skill pack for real coding work:
  - `work-tracker`
  - `docs-sync`
  - `code-guide-audit`
+ - `adversarial-review`
  - `break-it-qa`
  - `conversation-retrospective`
  - `frontend-ship-audit`
@@ -157,9 +158,9 @@ Waypoint scaffolds these reviewer agents by default:
  - `code-reviewer`
  - `plan-reviewer`
 
- The intended workflow is closeout-based: run `code-reviewer` before considering any non-trivial implementation slice complete, and run `code-health-reviewer` before considering medium or large changes complete, especially when they add structure, duplicate logic, or introduce new abstractions. If both apply, run them in parallel. A recent self-authored commit is the preferred scope anchor when one cleanly represents the slice, but it is not the only valid trigger. Reviewer agents are one-shot workers: once a reviewer returns findings, close it, and if another pass is needed later, spawn a fresh reviewer instead of reusing the old thread.
+ The intended workflow is closeout-based: run `adversarial-review` before considering any non-trivial implementation slice complete. That skill scopes the current slice, runs `code-reviewer`, runs `code-health-reviewer` when the change is medium or large or otherwise structurally risky, runs `code-guide-audit`, waits as long as needed, fixes meaningful findings, and reruns fresh reviewer rounds until no meaningful findings remain. A recent self-authored commit is the preferred scope anchor when one cleanly represents the slice, but it is not the only valid trigger. Reviewer agents are one-shot workers: once a reviewer returns findings, close it, and if another pass is needed later, spawn a fresh reviewer instead of reusing the old thread.
 
- The shipped reviewer configs now default to `gpt-5.4` with `high` reasoning, and the main-agent guidance explicitly tells Codex to pass the same `model` and `reasoning_effort` values whenever it spawns reviewer agents or other subagents. The reviewer prompts also treat the diff as a starting pointer rather than the review itself: they must read each changed file in full, expand into related files, and only then conclude.
+ The shipped reviewer configs now default to `gpt-5.4` with `high` reasoning, and the main-agent guidance explicitly tells Codex to pass `fork_context: false` plus the same `model` and `reasoning_effort` values whenever it spawns reviewer agents or other subagents. The reviewer prompts also treat the diff as a starting pointer rather than the review itself: they must read each changed file in full, expand into related files, and only then conclude.
 
  For planning work, run `plan-reviewer` before presenting a non-trivial implementation plan to the user and iterate until it has no meaningful review findings left. Each pass should use a fresh `plan-reviewer` agent rather than reusing a previous reviewer thread.
 
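As an illustrative sketch of the managed section described above (abridged; the full rule list ships in the package's `.gitignore.snippet` template, and the delimiter comments match the `GITIGNORE_WAYPOINT_START`/`GITIGNORE_WAYPOINT_END` constants in `core.js`):

```
# Waypoint state
.codex/
.agents/skills/
.waypoint/*
!.waypoint/docs/
!.waypoint/docs/**
.waypoint/docs/README.md
.waypoint/docs/code-guide.md
# End Waypoint state
```

The `!.waypoint/docs/` negations re-include user-authored docs, while the two explicit file rules below them keep the scaffolded doc assets ignored.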
package/dist/src/core.js CHANGED
@@ -10,6 +10,59 @@ const DEFAULT_DOCS_INDEX = ".waypoint/DOCS_INDEX.md";
  const DEFAULT_TRACK_DIR = ".waypoint/track";
  const DEFAULT_TRACKS_INDEX = ".waypoint/TRACKS_INDEX.md";
  const DEFAULT_WORKSPACE = ".waypoint/WORKSPACE.md";
+ const GITIGNORE_WAYPOINT_START = "# Waypoint state";
+ const GITIGNORE_WAYPOINT_END = "# End Waypoint state";
+ const LEGACY_WAYPOINT_GITIGNORE_RULES = new Set([
+ ".codex/",
+ ".codex/config.toml",
+ ".codex/agents/",
+ ".codex/agents/code-reviewer.toml",
+ ".codex/agents/code-health-reviewer.toml",
+ ".codex/agents/plan-reviewer.toml",
+ ".agents/",
+ ".agents/skills/",
+ ".agents/skills/planning/",
+ ".agents/skills/work-tracker/",
+ ".agents/skills/docs-sync/",
+ ".agents/skills/code-guide-audit/",
+ ".agents/skills/adversarial-review/",
+ ".agents/skills/visual-explanations/",
+ ".agents/skills/break-it-qa/",
+ ".agents/skills/frontend-context-interview/",
+ ".agents/skills/backend-context-interview/",
+ ".agents/skills/frontend-ship-audit/",
+ ".agents/skills/backend-ship-audit/",
+ ".agents/skills/conversation-retrospective/",
+ ".agents/skills/workspace-compress/",
+ ".agents/skills/pre-pr-hygiene/",
+ ".agents/skills/pr-review/",
+ ".waypoint/",
+ ".waypoint/DOCS_INDEX.md",
+ ".waypoint/state/",
+ ".waypoint/context/",
+ ".waypoint/*",
+ "!.waypoint/docs/",
+ "!.waypoint/docs/**",
+ ".waypoint/docs/README.md",
+ ".waypoint/docs/code-guide.md",
+ ]);
+ const SHIPPED_SKILL_NAMES = [
+ "planning",
+ "work-tracker",
+ "docs-sync",
+ "code-guide-audit",
+ "adversarial-review",
+ "visual-explanations",
+ "break-it-qa",
+ "conversation-retrospective",
+ "workspace-compress",
+ "pre-pr-hygiene",
+ "pr-review",
+ "frontend-context-interview",
+ "backend-context-interview",
+ "frontend-ship-audit",
+ "backend-ship-audit",
+ ];
  const TIMESTAMPED_WORKSPACE_SECTIONS = new Set([
  "## Active Trackers",
  "## Current State",
@@ -49,15 +102,114 @@ function migrateLegacyRootFiles(projectRoot) {
  function appendGitignoreSnippet(projectRoot) {
  const gitignorePath = path.join(projectRoot, ".gitignore");
  const snippet = readTemplate(".gitignore.snippet").trim();
+ const snippetLines = snippet.split("\n");
  if (!existsSync(gitignorePath)) {
  writeText(gitignorePath, `${snippet}\n`);
  return;
  }
  const content = readFileSync(gitignorePath, "utf8");
- if (content.includes(snippet)) {
+ const normalizedLines = content.split(/\r?\n/);
+ const normalizedContent = normalizedLines.join("\n");
+ const headerCount = normalizedLines.filter((line) => line === GITIGNORE_WAYPOINT_START).length;
+ if (normalizedContent.includes(snippet) && headerCount <= 1) {
+ return;
+ }
+ const startIndex = normalizedLines.findIndex((line) => line === snippetLines[0]);
+ if (startIndex === -1) {
+ writeText(gitignorePath, `${content.trimEnd()}\n\n${snippet}\n`);
+ return;
+ }
+ const managedLineSet = new Set(snippetLines);
+ const endIndex = findWaypointGitignoreBlockEnd(normalizedLines, startIndex);
+ if (endIndex === -1) {
+ writeText(gitignorePath, `${content.trimEnd()}\n\n${snippet}\n`);
  return;
  }
- writeText(gitignorePath, `${content.trimEnd()}\n\n${snippet}\n`);
+ const hasForeignLineInsideBlock = normalizedLines
+ .slice(startIndex + 1, endIndex)
+ .some((line) => line.length > 0 && !isManagedWaypointGitignoreLine(line, managedLineSet));
+ const trailingLines = stripSubsequentWaypointGitignoreBlocks(normalizedLines.slice(endIndex + 1), managedLineSet);
+ if (hasForeignLineInsideBlock) {
+ const foreignLines = normalizedLines
+ .slice(startIndex + 1, endIndex)
+ .filter((line) => line.length > 0 && !isManagedWaypointGitignoreLine(line, managedLineSet))
+ .join("\n");
+ const before = normalizedLines.slice(0, startIndex).join("\n").trimEnd();
+ const after = trailingLines.join("\n").trimStart();
+ const merged = [before, snippet, foreignLines, after].filter((piece) => piece.length > 0).join("\n\n");
+ writeText(gitignorePath, `${merged}\n`);
+ return;
+ }
+ const before = normalizedLines.slice(0, startIndex).join("\n").trimEnd();
+ const after = trailingLines.join("\n").trimStart();
+ const merged = [before, snippet, after].filter((piece) => piece.length > 0).join("\n\n");
+ writeText(gitignorePath, `${merged}\n`);
+ }
+ function findWaypointGitignoreBlockEnd(lines, startIndex) {
+ const explicitEndIndex = lines.findIndex((line, index) => index > startIndex && line === GITIGNORE_WAYPOINT_END);
+ if (explicitEndIndex !== -1) {
+ return explicitEndIndex;
+ }
+ return findLegacyWaypointGitignoreBlockEnd(lines, startIndex);
+ }
+ function findLegacyWaypointGitignoreBlockEnd(lines, startIndex) {
+ let scanEndExclusive = lines.length;
+ for (let index = startIndex + 1; index < lines.length; index += 1) {
+ const line = lines[index];
+ if (line.length === 0) {
+ scanEndExclusive = index;
+ break;
+ }
+ if (line.startsWith("#") && line !== GITIGNORE_WAYPOINT_START) {
+ scanEndExclusive = index;
+ break;
+ }
+ }
+ let endIndex = -1;
+ for (let index = startIndex + 1; index < scanEndExclusive; index += 1) {
+ if (isLegacyWaypointGitignoreRule(lines[index])) {
+ endIndex = index;
+ }
+ }
+ return endIndex;
+ }
+ function isLegacyWaypointGitignoreRule(line) {
+ const normalizedLine = line.startsWith("/") ? line.slice(1) : line;
+ return LEGACY_WAYPOINT_GITIGNORE_RULES.has(normalizedLine);
+ }
+ function isManagedWaypointGitignoreLine(line, managedLineSet) {
+ return managedLineSet.has(line) || isLegacyWaypointGitignoreRule(line);
+ }
+ function stripSubsequentWaypointGitignoreBlocks(lines, managedLineSet) {
+ const keptLines = [];
+ let index = 0;
+ while (index < lines.length) {
+ if (lines[index] !== GITIGNORE_WAYPOINT_START) {
+ keptLines.push(lines[index]);
+ index += 1;
+ continue;
+ }
+ const endIndex = findWaypointGitignoreBlockEnd(lines, index);
+ if (endIndex === -1) {
+ keptLines.push(lines[index]);
+ index += 1;
+ continue;
+ }
+ const foreignLines = lines
+ .slice(index + 1, endIndex)
+ .filter((line) => line.length > 0 && !isManagedWaypointGitignoreLine(line, managedLineSet));
+ if (foreignLines.length > 0) {
+ if (keptLines.length > 0 && keptLines[keptLines.length - 1] !== "") {
+ keptLines.push("");
+ }
+ keptLines.push(...foreignLines);
+ if (endIndex + 1 < lines.length && lines[endIndex + 1] !== "") {
+ keptLines.push("");
+ }
+ }
+ index = endIndex + 1;
+ }
+ return keptLines;
  }
  function upsertManagedBlock(filePath, block) {
  if (!existsSync(filePath)) {
@@ -377,18 +529,7 @@ export function doctorRepository(projectRoot) {
  paths: [workspacePath, ...tracksIndex.activeTrackPaths.map((trackPath) => path.join(projectRoot, trackPath))],
  });
  }
- for (const skillName of [
- "planning",
- "work-tracker",
- "docs-sync",
- "code-guide-audit",
- "visual-explanations",
- "break-it-qa",
- "conversation-retrospective",
- "workspace-compress",
- "pre-pr-hygiene",
- "pr-review",
- ]) {
+ for (const skillName of SHIPPED_SKILL_NAMES) {
  const skillPath = path.join(projectRoot, ".agents/skills", skillName, "SKILL.md");
  if (!existsSync(skillPath)) {
  findings.push({
@@ -398,6 +539,17 @@ export function doctorRepository(projectRoot) {
  remediation: "Run `waypoint init` to restore repo-local skills.",
  paths: [skillPath],
  });
+ continue;
+ }
+ const metadataPath = path.join(projectRoot, ".agents/skills", skillName, "agents", "openai.yaml");
+ if (!existsSync(metadataPath)) {
+ findings.push({
+ severity: "error",
+ category: "skills",
+ message: `Repo skill \`${skillName}\` metadata is missing.`,
+ remediation: "Run `waypoint init` to restore repo-local skill metadata.",
+ paths: [metadataPath],
+ });
  }
  }
  const codexConfigPath = path.join(projectRoot, ".codex/config.toml");
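To make the merge behavior above easier to follow, here is a minimal standalone sketch of the same idea: keep exactly one comment-delimited managed section, preserve any user lines that drifted inside it, and avoid duplicating the block. This is an illustration, not the shipped implementation; the real `appendGitignoreSnippet` also handles legacy blocks without an end marker and CRLF input.

```javascript
// Simplified sketch of the managed-section merge idea used by
// appendGitignoreSnippet (illustrative; names are hypothetical).
const START = "# Waypoint state";
const END = "# End Waypoint state";

function mergeManagedSection(content, snippetLines) {
  const managed = new Set([START, END, ...snippetLines]);
  const lines = content.split(/\r?\n/);
  const snippet = [START, ...snippetLines, END].join("\n");
  const start = lines.indexOf(START);
  if (start === -1) {
    // No managed section yet: append a fresh one.
    return `${content.trimEnd()}\n\n${snippet}\n`;
  }
  const end = lines.indexOf(END, start + 1);
  const body = end === -1 ? lines.slice(start + 1) : lines.slice(start + 1, end);
  // User-authored lines found inside the managed section survive the rewrite.
  const foreign = body.filter((l) => l.length > 0 && !managed.has(l));
  const before = lines.slice(0, start).join("\n").trimEnd();
  const after = end === -1 ? "" : lines.slice(end + 1).join("\n").trim();
  return (
    [before, snippet, foreign.join("\n"), after]
      .filter((piece) => piece.length > 0)
      .join("\n\n") + "\n"
  );
}
```

Running this twice over the same file is idempotent, because the rewritten section always matches the snippet exactly.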
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "waypoint-codex",
- "version": "0.10.9",
+ "version": "0.10.11",
  "description": "Codex-native repository operating system: scaffolding, docs routing, repo-local skills, doctor, and sync.",
  "license": "MIT",
  "type": "module",
package/templates/.agents/skills/adversarial-review/SKILL.md ADDED
@@ -0,0 +1,85 @@
+ ---
+ name: adversarial-review
+ description: Close out a meaningful implementation slice with the full iterative review loop. Use when the user asks for a final review pass, asks to "close the loop," asks whether work is ready to call done, or when Codex is about to say a non-trivial code change is complete. This skill scopes the slice, runs `code-reviewer`, runs `code-health-reviewer` when the change is medium or large or structurally risky, runs `code-guide-audit`, waits for the required outputs, fixes real findings, and repeats with fresh rounds until no meaningful issues remain. Do not use this for tiny obvious edits, pre-implementation plan review, or active PR comment triage.
+ ---
+
+ # Adversarial Review
+
+ Use this skill to close the loop on implementation work instead of treating review as a one-shot pass.
+
+ This skill owns the default closeout workflow for a reviewable slice. It coordinates the specialist reviewers, keeps the scope tight, waits as long as needed, fixes meaningful findings, and reruns fresh review rounds until the remaining feedback is only optional polish or no findings at all.
+
+ ## When To Skip This Skill
+
+ - Skip it for tiny obvious edits where launching the full closeout loop would be noise.
+ - Skip it for pre-implementation planning; that is `plan-reviewer` territory.
+ - Skip it for active PR comment back-and-forth; use `pr-review` for that workflow.
+ - Skip it when the user wants a one-off targeted coding-guide check and not the full closeout loop; use `code-guide-audit` directly in that case.
+
+ ## Step 1: Define The Reviewable Slice
+
+ - Resolve the exact slice you are trying to close out before launching reviewers.
+ - Prefer a recent self-authored commit when one cleanly represents the slice.
+ - Otherwise use the current changed files, diff, or feature path.
+ - Pass the reviewers the same concrete scope anchor, plus a short plain-English summary of what changed.
+ - If the scope is muddy, tighten it before review instead of asking the reviewers to figure it out from an entire worktree.
+
+ ## Step 2: Launch The Required Reviewers
+
+ - Spawn `code-reviewer` for every non-trivial implementation slice.
+ - Spawn `code-health-reviewer` when the change is medium or large, especially when it adds structure, duplicates logic, or introduces new abstractions.
+ - Run `code-guide-audit` on the same scoped slice as part of the closeout loop.
+ - Launch the reviewer agents with `fork_context: false`, `model: gpt-5.4`, and `reasoning_effort: high` unless the user explicitly asked for something else.
+ - Tell the reviewer agents what changed, what scope anchor to use, and which files or feature area represent the slice under review.
+ - When both reviewer agents apply, launch them in parallel.
+
+ ## Step 3: Wait For The Round To Finish
+
+ - Wait for every required reviewer result, no matter how long it takes.
+ - Do not interrupt slow reviewer agents just because they are still running.
+ - Do not call the work done while a required reviewer round is still in flight.
+ - Read the full reviewer outputs before deciding what to fix.
+
+ ## Step 4: Fix Meaningful Findings
+
+ - Fix real correctness, regression, maintainability, and code-guide issues.
+ - Rerun the most relevant verification for the changed area after the fixes.
+ - If a reviewer comment is only a nit or clearly optional polish, note that distinction and do not keep reopening the loop just to satisfy minor taste differences.
+ - If a finding changes durable behavior or repo memory, update the relevant docs and workspace state before the next round.
+
+ ## Step 5: Close The Old Review Round
+
+ - Treat `code-reviewer` and `code-health-reviewer` as one-shot reviewer agents.
+ - After you have read a reviewer result, close that reviewer thread.
+ - If another pass is needed later, spawn a fresh reviewer instead of reusing the old thread.
+
+ ## Step 6: Repeat Until The Slice Is Actually Clear
+
+ - Start a fresh round whenever you made meaningful fixes in response to the previous round.
+ - Reuse the same scope anchor when it still represents the slice cleanly; otherwise hand the new round the updated changed-file set or follow-up commit.
+ - Rerun `code-guide-audit` when the fixes materially changed guide-relevant behavior or when the previous round surfaced guide-related issues.
+ - Stop only when no meaningful findings remain. Optional polish and obvious nitpicks do not block closeout.
+
+ ## Step 7: Report The Closeout State
+
+ Summarize:
+
+ - what scope was reviewed
+ - which reviewers ran
+ - what meaningful issues were fixed
+ - what verification ran
+ - whether the slice is now clear or what still blocks it
+
+ ## Gotchas
+
+ - Fresh reviewer rounds matter. If you make meaningful fixes, do not treat older reviewer findings as if they still describe the current code.
+ - Green local tests are not enough if required reviewer threads are still running. Wait for the actual reviewer outputs before calling the slice done.
+ - Close reviewer agents after each round. Reusing a stale reviewer thread weakens the signal and blurs which code state the findings apply to.
+ - When this loop changes repo-health or upgrade behavior, test real old-repo edge cases, not just fresh-init cases.
+ - If a reviewer result is clean, it should still name the key paths and related files it checked. A "looks fine" skim is not a real closeout pass.
+
+ ## Keep This Skill Sharp
+
+ - After meaningful runs, add new gotchas when the same review-loop failure, stale-review mistake, or repo-upgrade edge case is likely to happen again.
+ - Tighten the description if the skill fires too broadly or misses real prompts like "final review pass" or "before we call this done."
+ - If the loop keeps re-creating the same helper logic or review instructions, move that reusable logic into the skill or its supporting resources instead of leaving it in chat.
package/templates/.agents/skills/adversarial-review/agents/openai.yaml ADDED
@@ -0,0 +1,4 @@
+ interface:
+ display_name: "Adversarial Review"
+ short_description: "Close out a code slice with iterative review"
+ default_prompt: "Use $adversarial-review to close out this non-trivial implementation slice before calling it done."
@@ -1,80 +1,130 @@
1
1
  ---
2
2
  name: backend-context-interview
3
- description: Gather and persist durable backend project context when missing or insufficient for implementation, architecture decisions, or ship-readiness review. Use this to ask project-level questions about deployment reality, scale, criticality, compatibility, tenant model, security posture, reliability expectations, and other durable backend context that is not clearly documented. This is not a feature-discovery skill.
3
+ description: Gather and persist durable backend project context when it is missing, stale, contradictory, or too weak to support implementation, architecture, migration, or ship-readiness decisions. Use when a task needs project-level backend facts such as deployment reality, user exposure, scale, criticality, tenant model, compatibility expectations, security posture, observability expectations, or compliance constraints, and those facts are not already clear in `AGENTS.md` or the repo docs. This is not a feature-discovery skill and should not trigger for endpoint details, acceptance criteria, or other task-specific product questions.
4
4
  ---
5
5
 
6
6
  # Backend Context Interview
7
7
 
8
- Use this skill when relevant backend project context is missing, stale, contradictory, or too weak to support correct implementation or review decisions.
8
+ Use this skill to fill in missing backend operating context only when that context materially changes the right engineering choices.
9
9
 
10
- ## Goals
10
+ This skill is for durable project truth, not for feature planning. It should reduce repeated questioning, not create a habit of interviewing the user every time a backend task appears.
11
11
 
12
- 1. identify the missing backend context that materially affects the work
13
- 2. ask only high-leverage questions that cannot be answered from the repo or guidance files
14
- 3. persist durable context into the project root guidance file
15
- 4. avoid repeated questioning in future tasks
12
+ ## What This Skill Owns
16
13
 
17
- This skill is for project-level operating context, not feature requirements gathering.
14
+ - identify which backend context facts are still missing after reading the repo guidance
15
+ - ask only the smallest set of high-leverage project questions
16
+ - persist the durable answers into the project guidance layer
17
+ - avoid re-asking the same foundational backend questions later
18
18
 
19
- ## When to use
19
+ ## When To Use This Skill
20
20
 
21
- Use this skill when the current task depends on context such as:
22
- - internal tool vs public internet-facing product
23
- - expected scale, concurrency, and criticality
24
- - regulatory, privacy, or compliance requirements
25
- - multi-tenant vs single-tenant behavior
26
- - backward compatibility requirements
27
- - uptime and reliability expectations
28
- - migration and rollback risk tolerance
29
- - security posture expectations
30
- - observability or incident response expectations
31
- - infrastructure constraints that materially affect design
32
-
33
- Do not use this skill when the answer is already clearly present in `AGENTS.md`, architecture docs, runbooks, or the task itself.
34
- Do not use this skill to ask about feature-specific behavior, UX details, endpoint shapes, acceptance criteria, implementation preferences, or other concrete requirements that belong in planning or normal task clarification.
21
+ Use it when the task depends on backend context such as:
22
+
23
+ - whether the system is internal, customer-facing, partner-facing, or public
24
+ - expected scale, concurrency, workload shape, or job intensity
25
+ - reliability, outage, or data-loss tolerance
26
+ - tenant model and isolation expectations
27
+ - backward compatibility or migration constraints
28
+ - security posture or exposure assumptions
29
+ - regulatory, privacy, or compliance constraints
30
+ - observability, incident response, or audit expectations
31
+ - infrastructure or deployment constraints that materially affect design
32
+
33
+ ## When Not To Use This Skill
34
+
35
+ - Do not use it when the answer is already clearly documented in `AGENTS.md`, architecture docs, runbooks, or durable repo guidance.
36
+ - Do not use it for feature-specific behavior, endpoint shapes, UX details, acceptance criteria, or implementation preferences.
37
+ - Do not use it as a substitute for planning. If the missing information is really about the feature itself, use planning or normal task clarification instead.
35
38
 
36
39
  ## Workflow
37
40
 
38
- ### 1. Check persisted context first
41
+ ### 1. Read Persisted Context First
39
42
 
40
- Inspect the project root guidance files.
43
+ - Read `AGENTS.md` first.
44
+ - Look for `## Project Context`, `## Backend Context`, or equivalent sections.
45
+ - Read the repo docs that are most likely to hold deployment, security, migration, or operating constraints.
46
+ - If the existing context is sufficient and credible, stop. Do not interview the user just because the skill triggered.
41
47
 
42
- Priority:
43
- 1. `AGENTS.md`
48
+ ### 2. Decide What Is Actually Missing
44
49
 
45
- Look for:
46
- - `## Project Context`
47
- - `## Backend Context`
48
- - equivalent sections with the same intent
50
+ Ask only about facts that would materially change implementation or review choices.
49
51
 
50
- If the existing section is accurate and sufficient, do not interview the user.
52
+ High-value examples:
51
53
 
52
- ### 2. Determine what is actually missing
54
+ - internal tool vs public internet-facing product
55
+ - low-risk internal automation vs business-critical system
56
+ - best-effort batch work vs strict correctness and rollback expectations
57
+ - single-tenant assumptions vs hard tenant isolation
58
+ - backward compatibility required vs safe to break old contracts
53
59
 
54
- Only ask questions that materially affect implementation or review choices.
60
+ Low-value examples:
55
61
 
56
- Good triggers:
57
- - public service vs internal tool changes reliability and security bar
58
- - scale and concurrency change architecture depth and observability expectations
59
- - compatibility requirements change migration and API decisions
60
- - tenant model changes authorization and data-isolation design
62
+ - "What should this feature do?"
63
+ - "What endpoint shape do you want?"
64
+ - "Should I use library X or Y?"
61
65
 
62
- Do not ask broad or low-value questions.
63
- Do not ask feature-specific product questions.
66
+ ### 3. Ask The Smallest Useful Interview
64
67
 
65
- ### 3. Ask concise grouped questions
68
+ - Group questions so the user can answer quickly.
69
+ - Prefer a few project-level questions over a long checklist.
70
+ - Ask only what the repo cannot already tell you.
66
71
 
67
- Ask the minimum set of questions needed.
72
+ Good categories:
68
73
 
69
- Suggested categories:
70
- - product type and exposure
74
+ - product exposure and real users
71
75
  - scale and criticality
76
+ - compatibility and migration safety
72
77
  - data sensitivity and compliance
78
+ - reliability and observability expectations
79
+
80
+ Good question shapes:
73
81
 
74
- Do not ask generic product questions that do not affect backend engineering.
75
- Do ask project-level questions like:
76
82
  - whether the product is internal, customer-facing, partner-facing, or public
77
- - whether there are real users yet or only development/staging use
78
- - expected traffic, concurrency, or import/job intensity
83
+ - whether there are real users yet or only dev/staging use
84
+ - expected traffic, concurrency, or job/import volume
  - whether backward compatibility is required
- - how costly outages, corruption, or security mistakes would be
+ - how costly outages, data corruption, or security mistakes would be
+
+ ### 4. Persist Only Durable Answers
+
+ - Write durable answers into the project root guidance file, normally `AGENTS.md`.
+ - Prefer a `## Backend Context` section if one exists.
+ - If there is no such section, add the smallest coherent backend-context section that matches the repo's guidance style.
+ - Persist stable operating facts, not one-off task details.
+ - Keep the wording concrete enough that a later agent can make decisions from it without rereading the whole conversation.
+
+ What belongs there:
+
+ - exposure level
+ - scale assumptions
+ - compatibility expectations
+ - tenant model
+ - security/compliance constraints
+ - reliability bar
+ - observability expectations
+
+ What does not belong there:
+
+ - current feature requirements
+ - temporary blockers
+ - one-off implementation notes
+ - chat-only phrasing that will age badly
+
+ ### 5. Reuse The Saved Context
+
+ - After persisting the answers, treat them as the new source of truth for later work.
+ - Do not ask the same foundational questions again unless the saved context is clearly stale or contradictory.
+
+ ## Gotchas
+
+ - Missing backend context is not the same as missing feature requirements. Do not drift into product discovery.
+ - If `AGENTS.md` already answers the important questions, stop there. Re-asking stable project questions is wasted user effort.
+ - Persist only durable operating truth. If the answer only matters for the current task, it does not belong in backend context.
+ - Do not ask broad "tell me about your backend" questions. Ask the few facts that would actually change architecture, migration, reliability, or security choices.
+ - If the repo gives partial answers, ask only the delta instead of restating the full questionnaire.
+
+ ## Keep This Skill Sharp
+
+ - After meaningful runs, add new gotchas when the same backend-context confusion or repeated over-questioning happens again.
+ - Tighten the description if the skill starts firing for feature-planning prompts instead of true project-context gaps.
+ - If the same persistence pattern or backend-context template keeps getting recreated, move that reusable guidance into this skill instead of relearning it in chat.
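To make step 4 concrete, here is a minimal sketch of what a persisted section could look like in `AGENTS.md`. The heading matches the skill's preferred `## Backend Context` section; every bullet value below is a hypothetical example, not a recommended default.

```markdown
## Backend Context

- Exposure: internal-only API behind the company VPN; no public traffic.
- Scale: low hundreds of requests per minute; nightly import of roughly 50k rows.
- Compatibility: consumers live in this repo; breaking changes are allowed with migration notes.
- Tenant model: single-tenant deployment per customer.
- Security/compliance: no regulated data; secrets come from the platform secret store.
- Reliability bar: brief downtime is tolerable; data loss or corruption is not.
- Observability: structured logs and error alerting are expected on new endpoints.
```

Each bullet maps to one item from the "What belongs there" list, so a later agent can scan the section and decide without rereading the original interview.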
@@ -1,4 +1,4 @@
  interface:
  display_name: "Backend Context Interview"
- short_description: "Ask only the backend context questions that materially change implementation or review decisions"
- default_prompt: "Use this skill to inspect persisted project context, identify only the missing backend deployment or reliability facts that materially affect the work, ask a concise high-leverage interview if needed, and persist durable Backend Context into AGENTS.md."
+ short_description: "Ask only the missing backend context questions"
+ default_prompt: "Use $backend-context-interview to inspect persisted guidance, ask only the missing backend project-context questions that materially affect the work, and persist durable Backend Context into AGENTS.md."
@@ -1,6 +1,6 @@
  ---
  name: backend-ship-audit
- description: Audit a backend scope for practical ship readiness with evidence-based findings focused on real release risk rather than style. Use when reviewing a backend service, feature, endpoint group, worker, scheduler, API surface, pull request, or directory to decide whether it is ready to ship; when the backend scope must be resolved from repository structure; when complete-file reading is required to understand behavior and dependencies; when only high-leverage deployment-context questions should be asked after repository exploration; when durable backend context should be persisted in project-root AGENTS.md; or when a timestamped audit should be written under .waypoint/audit/.
+ description: Audit a backend scope for practical ship readiness with evidence-based findings focused on real release risk rather than style. Use when the user asks whether a backend service, API, worker, scheduler, endpoint group, pull request, or backend directory is ready to ship; when Codex needs to perform a release-risk review before launch; when the audit scope must be resolved from repository structure; when complete-file reading is required to understand behavior and dependencies; when only high-leverage deployment-context questions should be asked after repository exploration; or when a timestamped backend audit should be written under `.waypoint/audit/`. Do not use this for frontend ship review, generic style review, PR comment triage, or a one-off coding-guide check.
  ---

  # Backend ship audit
@@ -11,6 +11,13 @@ Use bundled resources as follows:
  - Use `references/audit-framework.md` for detailed evaluation prompts and severity calibration.
  - Use `references/report-template.md` for the audit structure and finding format.

+ ## When Not To Use This Skill
+
+ - Skip it for frontend release review; use the frontend ship-audit workflow instead.
+ - Skip it for generic code review or maintainability review that is not explicitly about ship readiness.
+ - Skip it for active PR comment triage; use `pr-review` for that loop.
+ - Skip it for a one-off coding-guide compliance check on a narrow slice; use `code-guide-audit` for that job.
+
  ## 1. Resolve the reviewable unit

  Turn the user request into the narrowest defensible backend unit that can be audited end to end.
@@ -121,30 +128,7 @@ Make this edit manually. Prefer the smallest precise change that preserves all u

  Assess the scoped backend like a strong backend reviewer. Focus on real ship risk, not code taste.

- Evaluate at least these categories when relevant:
- - scope and architecture fit
- - API and contract quality
- - input validation and trust boundaries
- - domain modeling correctness
- - data integrity and consistency
- - transaction boundaries and idempotency
- - migration safety and rollback safety
- - failure handling and retry semantics
- - timeouts, cancellation, and backpressure
- - concurrency and race risks
- - queue and background job correctness
- - authorization and access control
- - authentication assumptions
- - secret handling and configuration safety
- - tenant isolation
- - security vulnerabilities and unsafe defaults
- - boundary clarity between layers and services
- - reliability under expected production conditions
- - observability, alertability, and debuggability
- - test coverage for meaningful failure modes
- - future legibility and maintainability as it affects shipping risk
-
- Use judgment. Do not force findings in every category.
+ Use `references/audit-framework.md` to drive the detailed category pass and severity calibration. Do not force findings in every category; use judgment and focus on the risks that actually matter for release readiness.

  Treat missing evidence carefully:
  - Missing tests, docs, or operational controls can be findings if the absence creates real release risk.
@@ -219,3 +203,17 @@ Do not include:
  - vague advice such as "add more tests" without naming the missing failure mode or blind spot

  Prefer a short audit with strong evidence over a long audit with weak claims.
+
+ ## Gotchas
+
+ - Do not start asking deployment-context questions before you have read the scoped code and docs. This skill should ask only what the repository cannot answer.
+ - Do not rely on grep hits or partial snippets for anything that informs a finding. Backend ship audits need complete-file reads for the code and docs that matter.
+ - Do not drift into style review, generic refactor advice, or "nice to have" cleanup. Every finding should connect to real release risk.
+ - Do not trust route names or file names alone to define the scope. Resolve the actual entry points, persistence paths, jobs, and external dependencies before judging readiness.
+ - If deployment context is missing, state the assumption and calibrate confidence or severity accordingly. Do not present guessed operating conditions as established fact.
+
+ ## Keep This Skill Sharp
+
+ - After meaningful audits, add new gotchas when the same backend-risk blind spot, scope mistake, or deployment-context question keeps recurring.
+ - Tighten the description if the skill misses real prompts like "is this API ready to ship" or fires on requests that only need generic code review.
+ - If the audit keeps reusing the same detailed evaluation logic or evidence format, move that reusable detail into `references/` instead of expanding the hub file.
@@ -1,3 +1,4 @@
- display_name: Backend Ship Audit
- short_description: Audit a backend scope for real ship risk and write an evidence-based readiness report.
- default_prompt: Audit this backend scope for ship readiness. Resolve scope from the repository, read relevant backend code and docs completely, ask only missing high-leverage questions, persist durable backend context, and write a prioritized audit under .waypoint/audit/.
+ interface:
+ display_name: "Backend Ship Audit"
+ short_description: "Audit backend ship-readiness with evidence"
+ default_prompt: "Use $backend-ship-audit to audit this backend scope for ship readiness and write the resulting evidence-based report."
@@ -1,6 +1,6 @@
  ---
  name: code-guide-audit
- description: Audit a specific feature, file set, or implementation slice against the coding guide and report only coding-guide-related violations or risks in that scope. Use after building a feature, when the user wants a coding-guide compliance check, before review on a targeted area, or when validating whether a change follows rules like no silent fallbacks, strong boundary validation, frontend reuse, explicit state handling, and behavior-focused verification.
+ description: Audit a specific feature, file set, or implementation slice against the coding guide and report only coding-guide-related violations or risks in that scope. Use when the user asks for a code-guide audit, coding-guide compliance check, guide-specific review, or wants to know whether a change follows rules like no silent fallbacks, strong boundary validation, frontend reuse, explicit state handling, and behavior-focused verification. Do not use this for broad ship-readiness review, generic bug hunting, PR comment triage, or repo-wide cleanup.
  ---

  # Code Guide Audit
@@ -9,6 +9,13 @@ Use this skill for a targeted audit against the coding guide, not for a whole-re

  This skill owns one job: inspect the specific code the user points at, map it against the coding guide, and report only guide-related findings in that scope.

+ ## When Not To Use This Skill
+
+ - Skip it for broad ship-readiness review; use `pre-pr-hygiene` or a ship-audit workflow for that.
+ - Skip it for generic bug finding or regression review that is not specifically about the coding guide.
+ - Skip it for active PR comment triage; use `pr-review` for that loop.
+ - Skip it for repo-wide cleanup unless the user explicitly asked for a repo-wide coding-guide audit.
+
  ## Step 1: Load The Right Scope

  - Read the repo's routed code guide.
@@ -67,3 +74,17 @@ Summarize the scoped result in review style:
  - each finding tied back to the relevant coding-guide rule
  - include exact file references
  - then note any skipped guide areas or residual uncertainty
+
+ ## Gotchas
+
+ - Do not turn this into generic code review. Every finding should tie back to a specific coding-guide rule.
+ - Do not audit the whole repo by accident. Resolve the narrow slice first, then stay inside it unless an out-of-scope issue would seriously mislead the user.
+ - Do not report a guide violation from a grep hit alone. Read the real implementation and the nearby evidence before calling it a problem.
+ - Do not force every coding-guide rule onto every change. Skip non-applicable rules explicitly instead of inventing weak findings.
+ - If you notice a broader ship-risk issue that is not really a coding-guide issue, say it is outside this skill's scope instead of quietly drifting into another audit.
+
+ ## Keep This Skill Sharp
+
+ - After meaningful runs, add new gotchas when the same guide-specific failure mode or scope-drift mistake keeps recurring.
+ - Tighten the description if the skill fires on generic review requests or misses real prompts like "check this against the code guide."
+ - If the same guide-rule translation logic keeps repeating, move that reusable detail into a supporting reference instead of expanding the hub file.
@@ -1,4 +1,4 @@
  interface:
  display_name: "Code Guide Audit"
- short_description: "Audit scoped code against the coding guide"
- default_prompt: "Use this skill to audit a specific feature, file set, or implementation slice against the coding guide and report only guide-related violations or risks in that scope."
+ short_description: "Audit code-guide compliance on a scoped slice"
+ default_prompt: "Use $code-guide-audit to audit this specific feature, file set, or implementation slice against the coding guide."
@@ -1,6 +1,6 @@
  ---
  name: conversation-retrospective
- description: Analyze the active conversation for durable repo knowledge, skill improvements, and repeated workflow patterns. Use when the user asks to save what was learned from the current conversation, update memory/docs without more prompting, improve skills that were used or exposed gaps, or propose new skills based on repetitive work in the live thread.
+ description: Harvest durable knowledge, user feedback, skill lessons, and repeated workflow patterns from the active conversation into the repo's existing memory system. Use when the user asks to save what was learned, write down what changed, capture lessons from this thread, update docs or handoff state without more prompting, improve skills that were used or exposed gaps, or record new skill ideas based on repetitive work in the live conversation. Do not use this for generic planning, broad docs audits, or digging through archived session history unless the user explicitly asks for that.
  ---

  # Conversation Retrospective
@@ -11,6 +11,13 @@ This skill works from the live conversation already in context. Do not go huntin

  This is a closeout and distillation workflow, not a generic planning pass or a broad docs audit.

+ ## When Not To Use This Skill
+
+ - Skip it for generic planning or implementation design; use the planning workflow for that.
+ - Skip it for broad docs audits that are not driven by what happened in this conversation.
+ - Skip it when the user wants archived history analysis rather than the live thread; only dig into old sessions if they explicitly ask.
+ - Skip it when there is nothing durable to preserve and no skill or workflow lesson to capture.
+
  ## Read First

  Before persisting anything:
@@ -123,3 +130,17 @@ Summarize:
  - what you intentionally left unpersisted because it was transient

  If no substantive persistence changes were needed, say that explicitly instead of inventing updates.
+
+ ## Gotchas
+
+ - Do not turn this skill into a transcript dump. Persist only durable knowledge, live state, or reusable lessons.
+ - Do not scatter the same learning across multiple files. Pick the smallest truthful home the repo already uses.
+ - Do not blame a skill for a problem that was really an execution mistake or an external tool failure.
+ - Do not preserve one-off user phrasing or temporary frustration as if it were standing repo policy unless the user clearly framed it that way.
+ - Do not go hunting through archived session files just because the live thread feels incomplete. This skill should work from the current conversation unless the user explicitly broadens the scope.
+
+ ## Keep This Skill Sharp
+
+ - After meaningful retrospectives, add new gotchas when the same persistence mistake, memory-placement mistake, or skill-triage mistake keeps recurring.
+ - Tighten the description if the skill misses real prompts like "save what we learned here" or fires on requests that are really planning or docs-audit work.
+ - If the same kind of durable learning keeps needing a custom destination, add that routing guidance to the skill instead of leaving the decision to be rediscovered in chat.
@@ -1,4 +1,4 @@
  interface:
  display_name: "Conversation Retrospective"
  short_description: "Harvest the live conversation into repo memory"
- default_prompt: "Use this skill to analyze the active conversation, preserve durable knowledge and user feedback in the repo's existing memory surfaces, evaluate whether used skills succeeded or failed, capture concrete errors and friction points, improve skills whose guidance was insufficient, and record new skill ideas without asking follow-up questions when the correct destination is clear."
+ default_prompt: "Use $conversation-retrospective to preserve the durable lessons, repo-memory updates, and skill learnings from this live conversation."
@@ -1,6 +1,6 @@
  ---
  name: docs-sync
- description: Audit routed docs against the actual codebase and shipped behavior. Use when docs may be stale after implementation work, before pushing or opening a PR, when routes/contracts/config changed, or when the agent should find missing, incorrect, outdated, or broken documentation and then update or flag the exact gaps.
+ description: Audit routed docs against the actual codebase and shipped behavior. Use when the user asks to sync docs, when docs may be stale after implementation work, before pushing or opening a PR, when routes, contracts, config, commands, or shipped behavior changed, or when Codex should find missing, incorrect, outdated, or broken documentation and then update or flag the exact gaps. Do not use this for vendor-doc ingestion, repo-memory cleanup, or broad code review that is not specifically about docs drift.
  ---

  # Docs Sync
@@ -9,6 +9,13 @@ Use this skill to keep repo docs aligned with reality.

  This is not a vendor-doc ingestion skill and not a workspace-cleanup skill. It owns one job: compare the codebase and shipped behavior against routed docs, then fix or flag the mismatches.

+ ## When Not To Use This Skill
+
+ - Skip it for importing or summarizing upstream vendor docs. Link to the real source instead of copying it into the repo.
+ - Skip it for workspace compression or tracker cleanup. This skill is about docs drift, not handoff hygiene.
+ - Skip it for broad code review that is not specifically about docs-to-reality mismatches.
+ - Skip it when the user only wants a new durable plan or architecture note; use the planning or normal docs-writing flow in that case.
+
  ## Read First

  Before auditing docs:
@@ -55,3 +62,17 @@ Summarize:
  - what docs were stale or missing
  - what you updated
  - what still needs a decision, if anything
+
+ ## Gotchas
+
+ - Do not trust docs-to-docs consistency alone. The source of truth is the shipped code and behavior, not whether two markdown files agree with each other.
+ - Do not leave stale future-tense claims behind after a feature ships or is cut. Docs drift often shows up as roadmap language that quietly became false.
+ - Do not update prose without checking commands, routes, config names, and examples. Small copied snippets are often where docs rot first.
+ - Do not invent certainty when the right doc shape is unclear. Flag the mismatch instead of bluffing a final answer.
+ - After touching routed docs, always refresh the generated docs/context layer so the repo’s index and bootstrap bundle match the new reality.
+
+ ## Keep This Skill Sharp
+
+ - After meaningful runs, add new gotchas when the same docs-drift pattern, broken example shape, or stale-claim mistake keeps recurring.
+ - Tighten the description if the skill misses real prompts like "sync the docs" or fires on requests that are really about repo-memory cleanup instead.
+ - If the skill starts needing detailed provider-specific or command-heavy guidance, move that detail into references instead of bloating the hub file.
@@ -1,4 +1,4 @@
  interface:
  display_name: "Docs Sync"
  short_description: "Audit docs against the real codebase"
- default_prompt: "Use this skill to audit routed docs against the actual codebase and shipped behavior, then update or flag any missing, incorrect, or outdated documentation."
+ default_prompt: "Use $docs-sync to audit routed docs against the actual codebase and shipped behavior, then update or flag any missing, incorrect, or outdated documentation."
@@ -16,6 +16,12 @@ Use this skill when relevant frontend project context is missing, stale, contrad

  This skill is for project-level operating context, not feature requirements gathering.

+ ## When Not To Use This Skill
+
+ - Skip it when the needed context is already clearly documented in `AGENTS.md` or routed docs.
+ - Skip it for feature-specific UX, copy, flow, or acceptance-criteria questions.
+ - Skip it for implementation preferences that can be resolved from the codebase.
+
  ## When to use

  Use this skill when the current task depends on context such as:
@@ -70,3 +76,16 @@ Good project-level question areas include:
  - accessibility expectations or compliance targets
  - whether SEO matters for any routes
  - whether backward compatibility in user workflows matters
+
+ ## Gotchas
+
+ - Do not re-ask stable context that is already present in `AGENTS.md` or routed docs.
+ - Do not drift into feature discovery. This skill is about project context that changes implementation or review choices across many tasks.
+ - Do not persist transient task details into `## Frontend Context`; only save durable deployment and product constraints.
+ - Do not create a new guidance file if `AGENTS.md` is missing unless the user explicitly asked for that.
+
+ ## Keep This Skill Sharp
+
+ - Add new gotchas when the same frontend-context blind spot or repeated unnecessary question keeps showing up in real work.
+ - Tighten the description if the skill fires on feature-planning prompts or misses real requests about browser support, accessibility, SEO, or deployment context.
+ - If the same stable setup facts keep being asked across repos, add sharper routing or persistence guidance instead of leaving that learning in chat.
@@ -1,4 +1,4 @@
  interface:
  display_name: "Frontend Context Interview"
- short_description: "Ask only the frontend context questions that materially change implementation or review decisions"
- default_prompt: "Use this skill to inspect persisted project context, identify only the missing frontend product or deployment facts that materially affect the work, ask a concise high-leverage interview if needed, and persist durable Frontend Context into AGENTS.md."
+ short_description: "Ask only the missing frontend context questions"
+ default_prompt: "Use $frontend-context-interview to inspect persisted project context, ask only the missing frontend deployment or product-context questions that materially affect the work, and persist durable Frontend Context into AGENTS.md when needed."
@@ -5,6 +5,12 @@ description: Audit a defined frontend scope for ship-readiness with a strong foc

  Audit ship-readiness like a strong frontend reviewer. Optimize for user impact, release risk, and production correctness. Do not optimize for style policing.

+ ## When Not To Use This Skill
+
+ - Skip it for backend ship-readiness; use the backend ship-audit workflow instead.
+ - Skip it for generic code review, maintainability review, or PR comment triage that is not explicitly about ship readiness.
+ - Skip it for a one-off coding-guide check on a narrow slice; use `code-guide-audit` for that.
+
  Use this workflow:

  1. Resolve the scope.
@@ -85,3 +91,17 @@ When evidence is partial:
  - say what remains assumed
  - lower confidence instead of overstating certainty
  - ask only the missing questions that would change the release decision
+
+ ## Gotchas
+
+ - Do not drift into style review or generic UX commentary. Every finding should connect to release risk or user-facing correctness.
+ - Do not rely on grep hits or partial snippets for files that support a finding. Ship audits need complete reads of the frontend files and docs that matter.
+ - Do not ask deployment or audience questions before you have exhausted what the repo already tells you.
+ - Do not treat API, auth, SEO, accessibility, analytics, or localization assumptions as proven just because they are implied by filenames. Trace the real behavior.
+ - If the code and docs disagree, call out the mismatch instead of quietly choosing whichever story feels cleaner.
+
+ ## Keep This Skill Sharp
+
+ - Add new gotchas when the same frontend-risk blind spot, stale-doc problem, or deployment-context miss keeps recurring.
+ - Tighten the description if the skill fires on ordinary review requests or misses real prompts like "is this route ready to ship" or "audit this frontend before launch."
+ - If the same evidence structure or audit framing keeps repeating, move more of that detail into the existing references or helper script instead of bloating the hub file.
@@ -1,3 +1,4 @@
- display_name: Frontend Ship Audit
- short_description: Audit a scoped frontend surface for ship-readiness with evidence-based findings and durable deployment context.
- default_prompt: Audit the ship-readiness of the requested frontend scope. Resolve the reviewable unit from the repo, read all relevant frontend files completely, ask only missing high-leverage questions, persist durable Frontend Context in the project root guidance file when present, and write a prioritized audit at .waypoint/audit/dd-mm-yyyy-hh-mm-frontend-audit.md.
+ interface:
+ display_name: "Frontend Ship Audit"
+ short_description: "Audit frontend ship-readiness with evidence"
+ default_prompt: "Use $frontend-ship-audit to audit the requested frontend scope for ship readiness and write the resulting evidence-based report."
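The removed default prompt above pins the audit report to `.waypoint/audit/dd-mm-yyyy-hh-mm-frontend-audit.md`. As a hedged sketch, a helper that derives such a path could look like the following; the function name and `suffix` parameter are illustrative, not part of the package:

```python
from datetime import datetime
from pathlib import Path

def audit_path(suffix: str, now: datetime | None = None) -> Path:
    """Build a .waypoint/audit/ report path like 07-03-2025-14-30-frontend-audit.md."""
    now = now or datetime.now()
    # dd-mm-yyyy-hh-mm, matching the format named in the removed prompt
    stamp = now.strftime("%d-%m-%Y-%H-%M")
    return Path(".waypoint/audit") / f"{stamp}-{suffix}-audit.md"

# Fixed timestamp so the output is deterministic:
print(audit_path("frontend", datetime(2025, 3, 7, 14, 30)))
# → .waypoint/audit/07-03-2025-14-30-frontend-audit.md
```

Taking the timestamp as a parameter keeps the helper testable; a real implementation would simply call it with no second argument.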
@@ -15,6 +15,12 @@ Good plans prove you understand the problem. Size matches complexity — a renam

  **The handoff test:** Could someone implement this plan without asking you questions? If not, find what's missing.

+ ## When Not To Use This Skill
+
+ - Skip it for tiny obvious edits where a full planning pass would cost more than it saves.
+ - Skip it when the user explicitly wants implementation right away and the work is already straightforward.
+ - Skip it for post-implementation closeout; use the review or hygiene workflows for that.
+
  ## Read First

  Before planning:
@@ -150,3 +156,17 @@ When the plan doc is written:
  ## Quality Bar

  If the plan would make the implementer ask "where does this hook in?" or "what exactly am I changing?", it is not done.
+
+ ## Gotchas
+
+ - Do not spend interview turns on implementation facts that are already in the code or routed docs.
+ - Do not stop exploring just because you have a plausible plan. The usual failure mode is shallow repo understanding.
+ - Do not leave unresolved architecture or product decisions hidden behind "we can figure that out during implementation."
+ - Do not dump a transcript into the plan doc. Distill the decisions and requirements into a clean implementation handoff.
+ - Do not treat a reviewed plan as a stopping point. Once the user approves it, the workflow expects execution to continue.
+
+ ## Keep This Skill Sharp
+
+ - Add new gotchas when the same planning blind spot, under-explored area, or vague plan failure keeps recurring.
+ - Tighten the description if the skill fires on tiny tasks or misses real prompts about migrations, refactors, and implementation-ready design work.
+ - If planning keeps depending on the same durable context or external reference paths, encode that routing into the skill instead of rediscovering it in chat.
@@ -1,4 +1,4 @@
  interface:
  display_name: "Planning"
  short_description: "Interview, explore, and write an implementation-ready plan into the repo"
- default_prompt: "Use this skill to deeply explore the repository, interview for the product and architectural details that materially affect the work, and write an implementation-ready plan into .waypoint/docs/."
+ default_prompt: "Use $planning to deeply explore the repository, interview for the product and architectural details that materially affect the work, and write an implementation-ready plan into .waypoint/docs/."
@@ -7,6 +7,12 @@ description: Triage and close the review loop on an open PR after automated or h

  Use this skill to drive the PR through review instead of treating review as a one-shot comment sweep.

+ ## When Not To Use This Skill
+
+ - Skip it before a PR has active review or automated review in flight.
+ - Skip it for local pre-push hygiene; use `pre-pr-hygiene` for that workflow.
+ - Skip it for the repo-internal closeout loop on an unpushed slice; use the normal review workflows instead.
+
  ## Step 1: Wait For Review To Settle

  - Check the PR's current review and CI status.
@@ -60,3 +66,17 @@ Summarize:
  - what was intentionally declined
  - what verification ran
  - whether the PR is clear or still waiting on reviewer response
+
+ ## Gotchas
+
+ - Do not treat a placeholder like "review in progress" as a clean review result.
+ - Do not leave comment threads silent just because the code changed. The reply is part of the workflow.
+ - Do not assume stacked PR fixes have landed in the branch you are reviewing; compare against the actual base.
+ - Do not leave the loop just because CI is slow. A pending review state is still unfinished.
+ - Do not declare the PR clear if the required repo-level reviewer passes have not actually run.
+
+ ## Keep This Skill Sharp
+
+ - Add new gotchas when the same PR-review failure mode, automation blind spot, or reviewer-state confusion keeps recurring.
+ - Tighten the description if the skill fires before review has actually started or misses real prompts about "address these PR comments" or "close the loop on this PR."
+ - If the workflow keeps repeating the same review-system quirks, preserve them in the skill instead of letting them stay as one-off chat lessons.
@@ -1,4 +1,4 @@
  interface:
  display_name: "PR Review"
  short_description: "Close the review loop on an active PR"
- default_prompt: "Use this skill when a PR has active review comments or automated review in progress. Wait for review to settle, triage every comment, reply inline, fix meaningful issues, push follow-up commits, and repeat until no new meaningful findings remain."
+ default_prompt: "Use $pr-review when a PR has active review comments or automated review in progress. Wait for review to settle, triage every comment, reply inline, fix meaningful issues, push follow-up commits, and repeat until no new meaningful findings remain."
@@ -7,6 +7,12 @@ description: Run a broad final hygiene pass before pushing, before opening or up

  Use this skill for the larger final audit before code leaves the machine.

+ ## When Not To Use This Skill
+
+ - Skip it for tiny changes that do not justify a broad hygiene pass.
+ - Skip it after a PR already has active review comments; use `pr-review` for that loop.
+ - Skip it when the task is only a narrow coding-guide check or only a docs sync pass.
+
  ## Read First

  Before the hygiene pass:
@@ -61,3 +67,17 @@ Summarize:
  - what you fixed
  - what verification ran
  - what residual risks remain, if any
+
+ ## Gotchas
+
+ - Do not turn this into a whole-repo cleanup mission. Keep the pass tied to the change surface that is about to leave the machine.
+ - Do not stop at reporting obvious fixable issues if the correct remediation is clear.
+ - Do not call the pass complete without real verification that matches the risk of the change.
+ - Do not let docs, contracts, and code drift just because the implementation itself "works."
+ - Do not use this as a replacement for active PR review or the normal closeout loop.
+
+ ## Keep This Skill Sharp
+
+ - Add new gotchas when the same hygiene blind spot, contract drift, or verification miss keeps recurring.
+ - Tighten the description if the skill fires on tiny edits or misses real prompts about "do a final pass before I push."
+ - If the same cross-cutting checks keep being rediscovered, encode them more explicitly here instead of relying on chat memory.
@@ -1,4 +1,4 @@
  interface:
  display_name: "Pre-PR Hygiene"
  short_description: "Run the final cross-cutting ship audit"
- default_prompt: "Use this skill before pushing or opening/updating a PR for substantial work to do a broader hygiene pass across code, docs, contracts, typing, UI rollback, persistence correctness, and code-guide compliance."
+ default_prompt: "Use $pre-pr-hygiene before pushing or opening/updating a PR for substantial work to do a broader hygiene pass across code, docs, contracts, typing, UI rollback, persistence correctness, and code-guide compliance."
@@ -84,3 +84,17 @@ Do not trust the generation step blindly.
  - prefer a single strong visual over a pile of mediocre ones
 
  If the artifact is only for the current conversation, store it in a temp or scratch location. If the user wants a durable asset in the repo, place it in the repo's normal docs or asset structure instead of inventing a new convention.
+
+ ## Gotchas
+
+ - Do not make an image when Mermaid or a short paragraph would already explain the point cleanly.
+ - Do not annotate a screenshot until you have verified the source screenshot actually shows the right state.
+ - Do not bury the main message under too many callouts or labels. One image should usually explain one thing.
+ - Do not present a conceptual mockup as if it were a real current UI state. Label it clearly when it is illustrative.
+ - Do not trust the rendering step blindly; clipped text, tiny labels, and misplaced arrows are common failure modes.
+
+ ## Keep This Skill Sharp
+
+ - Add new gotchas when the same visual clarity problem, screenshot mistake, or rendering failure keeps showing up.
+ - Tighten the description if the skill fires when Mermaid would have been enough or misses real requests for annotated screenshots and concept cards.
+ - If the same layout patterns or annotation helpers keep repeating, move them into reusable assets or scripts instead of rebuilding them from scratch.
@@ -1,4 +1,4 @@
  interface:
  display_name: "Visual Explanations"
  short_description: "Create generated images and annotated screenshots"
- default_prompt: "Use this skill to create a generated image or annotated screenshot when a visual artifact would explain the point more clearly than prose alone. Prefer Mermaid directly when a simple in-chat diagram is enough."
+ default_prompt: "Use $visual-explanations to create a generated image or annotated screenshot when a visual artifact would explain the point more clearly than prose alone. Prefer Mermaid directly when a simple in-chat diagram is enough."
@@ -13,6 +13,12 @@ This skill owns the execution tracker layer:
  - keep `WORKSPACE.md` pointing at the active tracker
  - move detailed checklists and progress into the tracker instead of bloating the workspace
 
+ ## When Not To Use This Skill
+
+ - Skip it for small single-shot tasks that fit comfortably in `WORKSPACE.md`.
+ - Skip it when the work has already finished and does not need a durable execution log.
+ - Skip it when the real need is docs compression or docs sync rather than active execution tracking.
+
  ## Read First
 
  Before tracking:
@@ -108,3 +114,17 @@ When you create or update a tracker, report:
  - the tracker path
  - the current status
  - what `WORKSPACE.md` now points to
+
+ ## Gotchas
+
+ - Do not create a new tracker if a relevant active tracker already exists for the same workstream.
+ - Do not let the tracker become fiction. Completed items, blockers, and verification state should match reality.
+ - Do not stuff durable architecture or debugging knowledge into the tracker if it belongs in `.waypoint/docs/`.
+ - Do not leave `WORKSPACE.md` carrying the full execution log after a tracker exists.
+ - Do not keep trackers "active" forever after the work is done; update the status.
+
+ ## Keep This Skill Sharp
+
+ - Add new gotchas when the same tracker drift, duplicate-tracker pattern, or workspace-bloat problem keeps recurring.
+ - Tighten the description if the skill fires for tiny work that does not need a tracker or misses long-running remediation campaigns.
+ - If the tracker format keeps needing the same recurring section or checklist pattern, capture that reusable pattern in the skill instead of rediscovering it each time.
@@ -1,4 +1,4 @@
  interface:
  display_name: "Work Tracker"
  short_description: "Create or maintain a durable tracker for a large workstream"
- default_prompt: "Use this skill to create or update a tracker under .waypoint/track/, keep WORKSPACE.md pointing to it, and move detailed execution state out of chat and into the repo."
+ default_prompt: "Use $work-tracker to create or update a tracker under .waypoint/track/, keep WORKSPACE.md pointing to it, and move detailed execution state out of chat and into the repo."
@@ -9,6 +9,12 @@ Keep `WORKSPACE.md` as a live handoff, not a project diary.
 
  This skill is for compression, not for erasing context. Preserve what the next agent needs in the first few minutes of a resume, and push durable detail into the docs layer that already exists in the repo.
 
+ ## When Not To Use This Skill
+
+ - Skip it when the workspace is still short, current, and easy to resume from.
+ - Skip it when the detail you want to remove is still active execution state that belongs in a tracker.
+ - Skip it when the real problem is stale docs rather than stale workspace history.
+
  ## Read First
 
  Before compressing:
@@ -91,3 +97,17 @@ Summarize:
  - what was collapsed or removed
  - which durable docs now hold the preserved detail
  - any remaining risk that still belongs in the workspace
+
+ ## Gotchas
+
+ - Do not compress away the active operational truth just because the workspace feels long.
+ - Do not rely on `git diff` to decide what matters; the workspace must stay useful even in a dirty tree.
+ - Do not delete detail unless you know which routed doc or tracker now preserves it.
+ - Do not compress unresolved blockers or immediate next steps into vague summaries.
+ - Do not rewrite over in-flight user edits in the workspace or linked docs.
+
+ ## Keep This Skill Sharp
+
+ - Add new gotchas when the same compression mistake, lost-context problem, or stale-workspace pattern keeps recurring.
+ - Tighten the description if the skill fires on already-clean workspaces or misses real "clean up the handoff" requests.
+ - If the same compression pattern keeps moving detail into the same durable home, make that routing more explicit in the skill.
@@ -1,4 +1,4 @@
  interface:
  display_name: "Workspace Compress"
  short_description: "Compress the live workspace handoff"
- default_prompt: "Use this skill after a meaningful chunk of work, before stopping, before review, or before opening or updating a PR to keep WORKSPACE.md short, current, and useful to the next agent."
+ default_prompt: "Use $workspace-compress after a meaningful chunk of work, before stopping, before review, or before opening or updating a PR to keep WORKSPACE.md short, current, and useful to the next agent."
@@ -26,6 +26,7 @@ Critical rules:
  You set the standard. Don't learn quality standards from existing code - the codebase may already be degraded. Apply good engineering judgment regardless of what exists.
  - Read every changed file in full before making a maintainability judgment.
  - Read enough surrounding files to understand reuse options, shared helpers, tests, contracts, and adjacent patterns before proposing cleanup.
+ - Do not clear a change as healthy unless you can explain which surrounding files, reuse candidates, and calling paths you checked to support that conclusion.
  - Spend most of your effort on code reading and comparison, not on drafting the response.
 
  Explore what exists. Search for existing helpers, utilities, and patterns that could be reused instead of duplicated.
@@ -103,5 +104,5 @@ Each finding needs:
  - suggested fix direction
 
  Return:
- Scope anchor, changed files read, related files read, reuse candidates checked, findings, brief overall assessment.
+ Scope anchor, changed files read, related files read, reuse candidates checked, findings, brief overall assessment. If you report no issues, explain why the surrounding code supported the change instead of only saying the diff looked fine.
  """
@@ -22,6 +22,7 @@ The diff or commit is only a starting pointer. A diff-only review is a failed re
  Rules:
  - Read every changed file in full before forming conclusions.
  - Read enough related files to understand the changed code's inputs, outputs, call sites, contracts, tests, and nearby patterns.
+ - Do not clear a change unless you can explain the critical paths you traced, the related files you read, and why the surrounding code supports the behavior.
  - Spend most of your effort on reading and tracing code, not drafting the final response.
  - Find bugs, not style issues.
  - Assume issues are hiding. Dig until you find them or can justify that the code is solid.
@@ -94,7 +95,7 @@ Description of the issue with evidence.
  **Fix:** What to change.
 
  ### No Issues Found
- [Use this section instead if the code is clean. State what you verified, including the important paths and contracts you checked.]
+ [Use this section instead if the code is clean. State what you verified, including the important paths, related files, and contracts you checked. A clean review without surrounding-file understanding is a failed review.]
 
  Quality bar:
  Only report issues that:
@@ -7,6 +7,7 @@
  .agents/skills/work-tracker/
  .agents/skills/docs-sync/
  .agents/skills/code-guide-audit/
+ .agents/skills/adversarial-review/
  .agents/skills/visual-explanations/
  .agents/skills/break-it-qa/
  .agents/skills/frontend-context-interview/
@@ -22,3 +23,4 @@
  !.waypoint/docs/**
  .waypoint/docs/README.md
  .waypoint/docs/code-guide.md
+ # End Waypoint state
@@ -34,7 +34,7 @@ You're direct, opinionated, and evidence-driven. You read before you write. You
 
  **Update the durable record.** When behavior changes, update docs. When state changes, update `WORKSPACE.md`. When a better pattern emerges, encode it in the repo contract instead of rediscovering it later.
 
- **Close the loop before complete.** Run `code-reviewer` before considering any non-trivial implementation slice complete. Run `code-health-reviewer` before considering medium or large changes complete, especially when they add structure, duplicate logic, or introduce new abstractions.
+ **Close the loop before complete.** Run `adversarial-review` before considering any non-trivial implementation slice complete. That closeout skill should keep looping through reviewer passes and fixes until no meaningful findings remain.
 
  **Prefer small, reviewable changes.** Keep work scoped and comprehensible.
 
@@ -49,7 +49,7 @@ If something important lives only in your head or in the chat transcript, the re
  - Update `.waypoint/docs/` when durable knowledge changes, and refresh each changed routable doc's `last_updated` field.
  - Rebuild `.waypoint/DOCS_INDEX.md` whenever routable docs change.
  - Rebuild `.waypoint/TRACKS_INDEX.md` whenever tracker files change.
- - When spawning reviewer agents or other subagents, explicitly set `model` to `gpt-5.4` and `reasoning_effort` to `high` unless the user explicitly requests a different model or lower reasoning.
+ - When spawning reviewer agents or other subagents, explicitly set `fork_context: false`, `model` to `gpt-5.4`, and `reasoning_effort` to `high` unless the user explicitly requests a different model or lower reasoning.
  - Use the repo-local skills and reviewer agents instead of improvising from scratch.
  - Treat reviewer agents as one-shot workers: once a reviewer returns findings, read the result and close it. If another review pass is needed later, spawn a fresh reviewer instead of reusing the same thread.
  - Do not kill long-running subagents or reviewer agents just because they are slow.
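The subagent defaults in the rules above can be sketched as a spawn configuration. Only `fork_context`, `model`, and `reasoning_effort` come from the rule itself; the surrounding shape (`subagent:`, `role:`) is a hypothetical illustration, not the package's real API:

```yaml
# Hypothetical spawn request. The fork_context, model, and reasoning_effort
# fields come from the managed AGENTS rules; everything else is illustrative.
subagent:
  role: code-reviewer      # one-shot, read-only reviewer; close it after it returns
  fork_context: false
  model: gpt-5.4
  reasoning_effort: high   # lower only if the user explicitly asks
```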
@@ -99,6 +99,7 @@ Do not document every trivial implementation detail. Document the non-obvious, d
  - `work-tracker` when large multi-step work needs durable progress tracking in `.waypoint/track/`
  - `docs-sync` when routed docs may be stale, missing, or inconsistent with the codebase
  - `code-guide-audit` when a specific feature or file set needs a targeted coding-guide compliance check
+ - `adversarial-review` when a non-trivial implementation slice is nearing completion and needs the default closeout loop for reviewer agents plus code-guide checks
  - `visual-explanations` when a generated image or annotated screenshot would explain the work more clearly than prose alone; Mermaid diagrams do not need a skill
  - `conversation-retrospective` after major completed work pieces so the active conversation is distilled into durable memory, user feedback and errors are preserved, exercised skills are improved, and real new-skill candidates are recorded
  - `break-it-qa` when a browser-facing feature should be attacked with invalid inputs, refreshes, repeated clicks, wrong action order, or other adversarial manual QA
@@ -128,19 +129,17 @@ Run `plan-reviewer` before presenting a non-trivial implementation plan to the u
 
  ## Review Loop
 
- Use reviewer agents before considering the work complete, not just as a reflex after every tiny commit.
-
- 1. Run `code-reviewer` before considering any non-trivial implementation slice complete.
- 2. Run `code-health-reviewer` before considering medium or large changes complete, especially when they add structure, duplicate logic, or introduce new abstractions.
- 3. If both apply, launch `code-reviewer` and `code-health-reviewer` in parallel as background, read-only reviewers.
- 4. Treat reviewer agents as one-shot workers. Once a reviewer returns its findings, read the result and close it.
- 5. If you need another review pass after changes, spawn a fresh reviewer agent rather than reusing the old thread.
- 6. If you have a recent self-authored commit that cleanly represents the reviewable slice, use it as the default review scope anchor. Otherwise scope the reviewers to the current changed slice.
- 7. Widen only when surrounding files are needed to validate a finding.
- 8. Do not call the work finished before you read the required reviewer results.
- 9. Wait for reviewer outputs even if that requires repeated or long waits. Do not interrupt them just because they are still running.
- 10. Fix real findings, rerun the relevant verification, update workspace/docs if needed, and make a follow-up commit when fixes change the repo.
- 11. Do not call a PR clear, ready, or done until the required reviewer-agent passes for the current slice have actually run.
+ Use `adversarial-review` before considering the work complete, not just as a reflex after every tiny commit.
+
+ 1. Run `adversarial-review` before considering any non-trivial implementation slice complete.
+ 2. That skill owns the default closeout loop for the current slice: define the scope, run `code-reviewer`, run `code-health-reviewer` when applicable, run `code-guide-audit`, wait as long as needed, fix meaningful issues, and repeat with fresh reviewer rounds until no meaningful findings remain.
+ 3. Treat reviewer agents as one-shot workers. Once a reviewer returns its findings, read the result and close it.
+ 4. If you need another review pass after changes, spawn a fresh reviewer agent rather than reusing the old thread.
+ 5. If you have a recent self-authored commit that cleanly represents the reviewable slice, use it as the default review scope anchor. Otherwise scope the review loop to the current changed slice.
+ 6. Widen only when surrounding files are needed to validate a finding.
+ 7. Do not call the work finished before you read the required closeout outputs.
+ 8. Wait for reviewer outputs even if that requires repeated or long waits. Do not interrupt them just because they are still running.
+ 9. Do not call a PR clear, ready, or done until the required reviewer-agent passes for the current slice have actually run.
 
  ## Quality bar
 
@@ -93,12 +93,11 @@ Working rules:
  - Use `work-tracker` when a long-running implementation, remediation, or verification campaign needs durable progress tracking
  - Use `docs-sync` when the docs may be stale or a change altered shipped behavior, contracts, routes, or commands
  - Use `code-guide-audit` for a targeted coding-guide compliance pass on a specific feature, file set, or change slice
+ - Use `adversarial-review` before considering a non-trivial implementation slice complete; it owns the closeout loop for `code-reviewer`, `code-health-reviewer`, and `code-guide-audit`, reruns fresh review rounds after meaningful fixes, and stops only when no meaningful findings remain
  - Use `visual-explanations` when a generated image or annotated screenshot would explain the work more clearly than prose alone; Mermaid diagrams can be written directly in chat without invoking a skill
  - Use `conversation-retrospective` after major completed work pieces to preserve durable learnings, capture user feedback and errors, improve any skills that were exercised, and record real new-skill candidates
  - Do not invoke `break-it-qa`, `frontend-ship-audit`, or `backend-ship-audit` yourself from the managed AGENTS block workflow; they are user-facing skills for explicit human-requested QA or ship-readiness audits, not default agent steps
  - Before presenting a non-trivial implementation plan to the user, run `plan-reviewer` and iterate on the plan until it has no meaningful review findings left
- - Before considering a non-trivial implementation slice complete, run `code-reviewer`; use a recent self-authored commit as the default scope anchor when one cleanly represents that slice
- - Before considering medium or large changes complete, run `code-health-reviewer`, especially when they add structure, duplicate logic, or introduce new abstractions
  - Treat `plan-reviewer`, `code-reviewer`, and `code-health-reviewer` as one-shot agents: once a reviewer returns findings, close it; if another pass is needed later, spawn a fresh reviewer instead of reusing the old thread
  - Before pushing or opening/updating a PR for substantial work, use `pre-pr-hygiene`
  - Use `pr-review` once a PR has active review comments or automated review in progress