waypoint-codex 1.0.13 → 1.0.15

Files changed (43)
  1. package/dist/src/core.js +4 -5
  2. package/package.json +1 -1
  3. package/templates/.agents/skills/collapse-fragmented-modules/SKILL.md +31 -52
  4. package/templates/.agents/skills/collapse-fragmented-modules/agents/openai.yaml +4 -0
  5. package/templates/.agents/skills/edit-at-the-right-layer/SKILL.md +33 -0
  6. package/templates/.agents/skills/edit-at-the-right-layer/agents/openai.yaml +4 -0
  7. package/templates/.agents/skills/execution-reset/SKILL.md +68 -31
  8. package/templates/.agents/skills/execution-reset/agents/openai.yaml +2 -2
  9. package/templates/.agents/skills/find-duplicate-ownership/SKILL.md +120 -0
  10. package/templates/.agents/skills/find-duplicate-ownership/agents/openai.yaml +0 -0
  11. package/templates/.agents/skills/find-duplicate-ownership/references/audit-prompts.md +89 -0
  12. package/templates/.agents/skills/foundational-redesign/SKILL.md +18 -13
  13. package/templates/.agents/skills/foundational-redesign/agents/openai.yaml +2 -2
  14. package/templates/.agents/skills/hard-cut/SKILL.md +78 -0
  15. package/templates/.agents/skills/hard-cut/agents/openai.yaml +4 -0
  16. package/templates/.agents/skills/legibility-pass/SKILL.md +30 -26
  17. package/templates/.agents/skills/legibility-pass/agents/openai.yaml +2 -2
  18. package/templates/.agents/skills/make-invariants-explicit/SKILL.md +73 -0
  19. package/templates/.agents/skills/make-invariants-explicit/agents/openai.yaml +4 -0
  20. package/templates/.agents/skills/plan-start/SKILL.md +65 -28
  21. package/templates/.agents/skills/plan-start/agents/openai.yaml +4 -0
  22. package/templates/.agents/skills/plan-swarm-audit/SKILL.md +50 -6
  23. package/templates/.agents/skills/plan-swarm-audit/agents/openai.yaml +2 -2
  24. package/templates/.agents/skills/planning/SKILL.md +114 -62
  25. package/templates/.agents/skills/planning/agents/openai.yaml +2 -2
  26. package/templates/.agents/skills/replace-dont-layer/SKILL.md +68 -22
  27. package/templates/.agents/skills/replace-dont-layer/agents/openai.yaml +4 -0
  28. package/templates/.agents/skills/root-cause-finder/SKILL.md +70 -0
  29. package/templates/.agents/skills/root-cause-finder/agents/openai.yaml +0 -0
  30. package/templates/.agents/skills/test-writing/SKILL.md +53 -39
  31. package/templates/.agents/skills/test-writing/agents/openai.yaml +2 -2
  32. package/templates/.agents/skills/verify-codebase-coherence/SKILL.md +35 -0
  33. package/templates/.agents/skills/verify-codebase-coherence/agents/openai.yaml +4 -0
  34. package/templates/.agents/skills/verify-completeness/SKILL.md +38 -27
  35. package/templates/.agents/skills/verify-completeness/agents/openai.yaml +2 -2
  36. package/templates/.codex/agents/duplicate_ownership_explorer.toml +17 -0
  37. package/templates/.codex/agents/ownership_taxonomy_mapper.toml +16 -0
  38. package/templates/.codex/agents/ssot_judge.toml +17 -0
  39. package/templates/.codex/config.toml +12 -0
  40. package/templates/.agents/skills/adversarial-review/SKILL.md +0 -86
  41. package/templates/.agents/skills/adversarial-review/agents/openai.yaml +0 -4
  42. package/templates/.agents/skills/code-guide-audit/SKILL.md +0 -86
  43. package/templates/.agents/skills/code-guide-audit/agents/openai.yaml +0 -4
@@ -1,52 +1,63 @@
  ---
  name: verify-completeness
- description: Use when implementation appears done and before reporting completion. Re-read the original plan and agreed scope, re-read files that were supposed to be created or changed, verify no approved scope was reduced or skipped, and continue working until the scope is truly complete.
+ description: Use when implementation appears done and before reporting completion. Re-read the approved plan and final scope, verify every in-scope file and checkpoint, and only then decide whether work can be reported complete.
  ---

  # Verify Completeness

- Use this skill at final closeout, right before you would report the work complete.
+ Use this skill at final closeout, right before you would report work complete. Its job is to gate completion, not to re-open the whole project.

- ## Required verification loop
+ ## Rules

- 1. Re-read the original plan and the latest agreed scope before deciding status.
- 2. Re-read `ACTIVE_PLANS.md` and `WORKSPACE.md` for current checklist, phase, blockers, and verification state.
- 3. Build the expected file set from plan/scope: files that were supposed to be created, modified, or deleted.
- 4. Re-read those files directly. This final re-read is mandatory even if they were read earlier in the session.
- 5. Compare expected scope vs actual outcome and list any missing or partially completed items.
- 6. Run a scope-discipline pass: identify additions that were not requested or approved. Remove/simplify them before completion, or explicitly ask the user to approve keeping them.
- 7. Run a cleanup pass on changed files: remove duplicated logic, unnecessary abstractions/files, and low-value comments that create maintenance bloat.
- 8. If changed code is still hard to read or reason about, run `legibility-pass` before completion and apply the resulting readability cleanup.
- 9. Run a file-footprint sanity pass: collapse avoidable tiny-file fragmentation and keep code that changes together in the same place when boundary/reuse/size reasons are weak.
- 10. Run a test-signal sanity pass: remove redundant or brittle tests and keep the smallest high-signal set that still protects the contract.
- 11. Before commit/final handoff, run the full checks required by the plan (for example full typecheck/test/build sweep) once, unless explicitly blocked or the user asks for a different cadence.
- 12. If any approved item is missing, incomplete, or silently deferred, do not report completion. Continue working until the agreed scope is fully satisfied or discuss a scope change explicitly.
+ 1. Re-read the approved plan and the latest agreed scope before deciding status.
+ 2. Re-read `ACTIVE_PLANS.md` and `WORKSPACE.md` for the current checklist, phase, blockers, and verification state.
+ 3. Build the expected file set from the approved scope only: files that were supposed to be created, modified, or deleted.
+ 4. Re-read every file in the expected set directly. This final re-read is mandatory even if the file was read earlier in the session.
+ 5. Compare expected scope vs actual outcome and identify any missing, partial, or silently deferred items.
+ 6. Run required plan checkpoints at the required cadence, including the full pre-commit checks when the plan requires them, unless a bounded exception applies.
+ 7. Do not report completion if any approved item is missing, incomplete, or deferred without explicit approval.
+ 8. Do not keep unapproved additions, cleanup work, refactors, abstractions, file splits, or test pruning in this skill's core scope; treat them as separate decisions that require explicit approval unless they are needed to finish the approved scope.
+ 9. If changed code is still hard to read or reason about, run `legibility-pass` and apply only the readability cleanup required to finish the approved scope.
+ 10. Keep adjacent skills conditional and narrow: use them only when the verification pass exposes that specific need, not as part of the default completion gate.

- ## Completion gate
+ ## Exception Rule
+
+ You may relax the normal verification loop only when one of these is true:
+
+ - required plan artifacts are missing or stale, and you need to reconstruct the approved scope before continuing
+ - the task is read-only or review-only, so no code or files are expected to change
+ - required checks cannot run because the environment, dependencies, permissions, or upstream blockers make them impossible right now
+
+ In these cases, continue only far enough to identify the gap, record the blocker, and report the exact missing verification step or artifact. Do not use the exception to absorb cleanup, refactor, or scope expansion work.
+
+ ## Completion Gate

  You can report complete only when all are true:

  - approved scope items are done
  - planned file changes match reality
- - verification/checkpoints required by the plan were run at the required cadence, including full pre-commit checks when required (or explicitly called out as blocked)
+ - required verification/checkpoints were run, or each skipped check has a specific blocked reason
  - no hidden scope reduction occurred
  - no unapproved scope expansion remains
  - no obvious duplication or avoidable bloat remains in touched files
  - no avoidable file fragmentation remains in touched feature areas
  - test set remains high-signal and non-redundant for the risk level

- ## Output contract
+ ## Output Contract
+
+ Before final status, report these items explicitly:

- Before final status, summarize briefly:
+ - `status`: `complete`, `blocked`, or `incomplete`
+ - `scope reviewed`: the plan/scope sources you re-read
+ - `files re-read`: the files you re-opened for final verification
+ - `missing scope items`: any approved items still absent, or `none`
+ - `checks run`: each verification step actually executed
+ - `checks skipped`: each omitted check with a reason, or `none`
+ - `removed extras`: any unapproved extras, cleanup, or bloat you removed, or `none`
+ - `adjacent skill escalation`: any conditional skill you invoked and why, or `none`
+ - `next action`: continue execution, request scope approval, or complete

- - scope reviewed
- - files re-read for final verification
- - completed items
- - removed unapproved extras or bloat cleanup applied
- - legibility cleanup applied (if run)
- - file-collapsing or test-pruning done during sanity passes
- - remaining gaps (if any)
- - next action (continue execution or complete)
+ Do not say the work is complete unless the `status` is `complete` and the completion gate is satisfied.

  ## Gotchas

@@ -1,4 +1,4 @@
  interface:
  display_name: "Verify Completeness"
- short_description: "Run a strict final scope and file verification pass"
- default_prompt: "Use $verify-completeness now: re-read the approved plan and scope, re-read all files that were supposed to change, remove unapproved extras and obvious bloat, verify nothing was dropped, and keep working if any approved scope item is still incomplete."
+ short_description: "Gate completion by re-reading scope, files, and required checks before closeout"
+ default_prompt: "Use $verify-completeness now: re-read the approved plan and final scope, re-read every file that was supposed to change, confirm missing scope items and skipped checks with reasons, and only report completion if the approved scope is fully finished."
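The Output Contract and part of the Completion Gate in the revised `verify-completeness` skill can be read as a small record plus a predicate. The following is a minimal Python sketch under that reading, not code shipped in the package: `CloseoutReport` and `may_report_complete` are hypothetical names, the field names are adapted from the report items in the skill text, and the gate items that need human judgment (duplication, fragmentation, test signal) are deliberately left out.

```python
from dataclasses import dataclass, field

# The three status values named in the Output Contract.
VALID_STATUSES = {"complete", "blocked", "incomplete"}


@dataclass
class CloseoutReport:
    """Hypothetical model of the report items in the Output Contract."""

    status: str
    scope_reviewed: list[str]
    files_reread: list[str]
    missing_scope_items: list[str] = field(default_factory=list)
    checks_run: list[str] = field(default_factory=list)
    # Maps each skipped check to the reason it was skipped.
    checks_skipped: dict[str, str] = field(default_factory=dict)
    next_action: str = "continue execution"


def may_report_complete(report: CloseoutReport) -> bool:
    """Return True only when status is `complete` and no modeled gate item fails."""
    if report.status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {report.status!r}")
    if report.status != "complete":
        return False
    if report.missing_scope_items:
        return False
    # Every skipped check needs a specific, non-empty reason.
    return all(reason.strip() for reason in report.checks_skipped.values())


# A skipped check without a reason blocks completion.
report = CloseoutReport(
    status="complete",
    scope_reviewed=["ACTIVE_PLANS.md", "WORKSPACE.md"],
    files_reread=["example/file.md"],  # placeholder path for illustration
    checks_skipped={"build": ""},
)
print(may_report_complete(report))  # False
```

Keeping the skipped checks as a mapping from check to reason makes the "each skipped check has a specific blocked reason" condition a one-line test instead of a convention.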
@@ -0,0 +1,17 @@
+ name = "duplicate_ownership_explorer"
+ description = "Read-only explorer for one bounded slice of duplicate ownership. Use one agent per feature or contract slice."
+ model = "gpt-5.3-codex"
+ model_reasoning_effort = "medium"
+ sandbox_mode = "read-only"
+ nickname_candidates = ["Slice", "Scout", "Trace"]
+ developer_instructions = """
+ Own one narrow slice only.
+ Look for duplicate ownership of rules such as validation, defaults, canonicalization, mapping, repair, persistence policy, and helper semantics.
+ For each candidate, decide whether it is:
+ - architecture / SSOT bug
+ - local dedupe cleanup
+ - legitimate boundary adapter
+ - legitimate domain constraint
+ Return exact files and symbols.
+ Do not recommend wrappers, shims, or fallback paths.
+ """
@@ -0,0 +1,16 @@
+ name = "ownership_taxonomy_mapper"
+ description = "Read-only mapper for duplicate-ownership audits. Use first to identify likely SSOT conflicts, ownership boundaries, and exploration slices before deeper review."
+ model = "gpt-5.3-codex"
+ model_reasoning_effort = "medium"
+ sandbox_mode = "read-only"
+ nickname_candidates = ["Mapper", "Topo", "Surveyor"]
+ developer_instructions = """
+ Stay in exploration mode.
+ Map ownership boundaries, not fixes.
+ Find likely places where the same rule is owned twice.
+ Return:
+ - candidate taxonomy
+ - likely hot spots
+ - recommended slice boundaries for parallel follow-up
+ Do not propose compatibility shims or code changes unless the parent asks.
+ """
@@ -0,0 +1,17 @@
+ name = "ssot_judge"
+ description = "Strict read-only judge for duplicate-ownership findings. Use after exploration to decide the winning owner and hard-cut cleanup."
+ model = "gpt-5.3-codex"
+ model_reasoning_effort = "high"
+ sandbox_mode = "read-only"
+ nickname_candidates = ["Judge", "Owner", "Verdict"]
+ developer_instructions = """
+ Act like an architecture owner.
+ Given candidate duplicate-ownership findings, decide:
+ - is this real duplicate ownership or legitimate boundary work
+ - who should be the single owner
+ - what should be deleted
+ - what should remain
+ - what should be renamed for honest semantics
+ Bias toward hard-cut cleanup.
+ Reject mediator layers, wrappers, compatibility paths, and dual ownership.
+ """
@@ -16,3 +16,15 @@ config_file = "agents/code-reviewer.toml"
  [agents."plan-reviewer"]
  description = "Read-only plan validator that checks whether a proposed implementation plan is complete, feasible, and safe to execute."
  config_file = "agents/plan-reviewer.toml"
+
+ [agents."ssot_judge"]
+ description = "Strict read-only judge for duplicate-ownership findings that selects the canonical owner and hard-cut cleanup."
+ config_file = "agents/ssot_judge.toml"
+
+ [agents."ownership_taxonomy_mapper"]
+ description = "Read-only mapper that identifies ownership boundaries and high-risk SSOT duplicate-ownership slices before deeper audits."
+ config_file = "agents/ownership_taxonomy_mapper.toml"
+
+ [agents."duplicate_ownership_explorer"]
+ description = "Read-only slice explorer that audits one bounded area for duplicate ownership and classifies findings with file-level evidence."
+ config_file = "agents/duplicate_ownership_explorer.toml"
@@ -1,86 +0,0 @@
- ---
- name: adversarial-review
- description: Second-pass closeout review for a non-trivial implementation slice. Use when risky work needs a deliberate final review before being called done. This skill scopes the slice, runs the right reviewer agents and code-guide checks, fixes meaningful findings, and repeats until only optional polish remains.
- ---
-
- # Adversarial Review
-
- Use this skill when you explicitly want a closeout-grade second pass before calling a non-trivial slice done or ready to ship.
-
- This skill coordinates the specialist reviewers, keeps the scope tight, waits as long as needed, fixes meaningful findings, and reruns fresh review rounds until the remaining feedback is only optional polish or no findings at all.
-
- ## When To Skip This Skill
-
- - Skip it for tiny obvious edits where launching the full closeout loop would be noise.
- - Skip it for normal debugging or investigation where the user needs diagnosis and forward motion more than formal ship-readiness.
- - Skip it for pre-implementation planning; that is `plan-reviewer` territory.
- - Skip it for active PR comment back-and-forth; use `pr-review` for that workflow.
- - Skip it when the user wants a one-off targeted coding-guide check and not the full closeout loop; use `code-guide-audit` directly in that case.
-
- ## Step 1: Define The Reviewable Slice
-
- - Resolve the exact slice you are trying to close out before launching reviewers.
- - Prefer a recent self-authored commit when one cleanly represents the slice.
- - Otherwise use the current changed files, diff, or feature path.
- - Pass the reviewers the same concrete scope anchor, plus a short plain-English summary of what changed.
- - If the scope is muddy, tighten it before review instead of asking the reviewers to figure it out from an entire worktree.
-
- ## Step 2: Launch The Required Reviewers
-
- - Spawn `code-reviewer` for every non-trivial implementation slice.
- - Spawn `code-health-reviewer` when the change is medium or large, especially when it adds structure, duplicates logic, or introduces new abstractions.
- - Run `code-guide-audit` on the same scoped slice as part of the closeout loop.
- - Launch the reviewer agents with `fork_context: false`, `model: gpt-5.4`, and `reasoning_effort: high` unless the user explicitly asked for something else.
- - Tell the reviewer agents what changed, what scope anchor to use, and which files or feature area represent the slice under review.
- - When both reviewer agents apply, launch them in parallel.
-
- ## Step 3: Wait For The Round To Finish
-
- - Wait for every required reviewer result, no matter how long it takes.
- - Do not interrupt slow reviewer agents just because they are still running.
- - Do not call the work done while a required reviewer round is still in flight.
- - Read the full reviewer outputs before deciding what to fix.
-
- ## Step 4: Fix Meaningful Findings
-
- - Fix real correctness, regression, maintainability, and code-guide issues.
- - Rerun the most relevant verification for the changed area after the fixes.
- - If a reviewer comment is only a nit or clearly optional polish, note that distinction and do not keep reopening the loop just to satisfy minor taste differences.
- - If a finding changes durable behavior or repo memory, update the relevant docs and workspace state before the next round.
-
- ## Step 5: Close The Old Review Round
-
- - Treat `code-reviewer` and `code-health-reviewer` as one-shot reviewer agents.
- - After you have read a reviewer result, close that reviewer thread.
- - If another pass is needed later, spawn a fresh reviewer instead of reusing the old thread.
-
- ## Step 6: Repeat Until The Slice Is Actually Clear
-
- - Start a fresh round whenever you made meaningful fixes in response to the previous round.
- - Reuse the same scope anchor when it still represents the slice cleanly; otherwise hand the new round the updated changed-file set or follow-up commit.
- - Rerun `code-guide-audit` when the fixes materially changed guide-relevant behavior or when the previous round surfaced guide-related issues.
- - Stop only when no meaningful findings remain. Optional polish and obvious nitpicks do not block closeout.
-
- ## Step 7: Report The Closeout State
-
- Summarize:
-
- - what scope was reviewed
- - which reviewers ran
- - what meaningful issues were fixed
- - what verification ran
- - whether the slice is now clear or what still blocks it
-
- ## Gotchas
-
- - Fresh reviewer rounds matter. If you make meaningful fixes, do not treat older reviewer findings as if they still describe the current code.
- - Green local tests are not enough if required reviewer threads are still running. Wait for the actual reviewer outputs before calling the slice done.
- - Close reviewer agents after each round. Reusing a stale reviewer thread weakens the signal and blurs which code state the findings apply to.
- - When this loop changes repo-health or upgrade behavior, test real old-repo edge cases, not just fresh-init cases.
- - If a reviewer result is clean, it should still name the key paths and related files it checked. A "looks fine" skim is not a real closeout pass.
-
- ## Keep This Skill Sharp
-
- - After meaningful runs, add new gotchas when the same review-loop failure, stale-review mistake, or repo-upgrade edge case is likely to happen again.
- - Tighten the description if the skill fires too broadly or misses real prompts like "final review pass" or "before we call this done."
- - If the loop keeps re-creating the same helper logic or review instructions, move that reusable logic into the skill or its supporting resources instead of leaving it in chat.
@@ -1,4 +0,0 @@
- interface:
- display_name: "Adversarial Review"
- short_description: "Run a deliberate second-pass closeout review"
- default_prompt: "Use $adversarial-review when this non-trivial implementation slice needs a deliberate final review loop with reviewer agents and code-guide checks before we call it done."
@@ -1,86 +0,0 @@
- ---
- name: code-guide-audit
- description: Audit a scoped implementation slice against the code guide and report only guide-related violations or risks. Use for coding-guide compliance checks on explicit behavior, root-cause fixes, boundary validation, security, concurrency, accessibility, performance, and future legibility.
- ---
-
- # Code Guide Audit
-
- Use this skill for a targeted audit against the coding guide, not for a whole-repo hygiene sweep.
-
- This skill owns one job: inspect the specific code the user points at, map it against the coding guide, and report only guide-related findings in that scope.
-
- ## When Not To Use This Skill
-
- - Skip it for broad ship-readiness review; use a ship-audit workflow for that.
- - Skip it for generic bug finding or regression review that is not specifically about the coding guide.
- - Skip it for active PR comment triage; use `pr-review` for that loop.
- - Skip it for repo-wide cleanup unless the user explicitly asked for a repo-wide coding-guide audit.
-
- ## Step 1: Load The Right Scope
-
- - Read the repo's routed code guide.
- - In standard Waypoint repos, use `.waypoint/docs/code-guide.md`.
- - If the repo routes the code guide somewhere else, follow the repo's own docs and routing instead of assuming another fixed path.
- - Read only the files, routes, tests, contracts, and nearby docs needed to understand the specific feature or slice under review.
- - If the scope is ambiguous, resolve it to a concrete file set, feature path, or commit-sized change surface before auditing.
-
- Do not expand into a whole-repo audit unless the user explicitly asks for that.
-
- ## Step 2: Translate The Guide Into Checks
-
- Audit only for rules that actually apply to the scoped code.
-
- Look for:
-
- - stale compatibility layers, shims, aliases, or migration-only branches
- - weak typing, avoidable `any`, recreated shared types, or unsafe casts
- - silent fallbacks, swallowed errors, degraded paths, or missing required-config failures
- - missing validation at input, config, API, file, queue, or database boundaries
- - speculative abstractions that hide the actual behavior
- - unclear state transitions, weak transaction boundaries, missing idempotency, or weak persistence invariants
- - frontend code that ignored reusable components or broke the existing design language
- - missing loading, empty, or error states
- - optimistic UI without rollback or invalidation
- - missing observability at important failure or state boundaries
- - regression tests that assert implementation details instead of behavior
-
- Skip rules that genuinely do not apply, but say that you skipped them.
-
- ## Step 3: Keep The Audit Narrow
-
- - Report only coding-guide findings for the requested scope.
- - Do not drift into generic architecture advice, repo-wide cleanup, docs sync, or PR readiness unless the finding is directly required by the guide.
- - If you notice issues outside scope, mention them only if they are severe enough that ignoring them would mislead the user about this audit.
-
- ## Step 4: Verify Evidence
-
- Ground each finding in the actual code.
-
- - Read the real implementation before calling something a violation.
- - When relevant, inspect the nearest tests, contracts, schemas, or reused components to confirm the gap.
- - Do not invent verification that you did not run.
-
- If the user asked for a pure audit, stop at findings. If they asked for fixes too, fix the clear issues and then verify the changed area.
-
- ## Step 5: Report The Result
-
- Summarize the scoped result in review style:
-
- - findings first, ordered by severity
- - each finding tied back to the relevant coding-guide rule
- - include exact file references
- - then note any skipped guide areas or residual uncertainty
-
- ## Gotchas
-
- - Do not turn this into generic code review. Every finding should tie back to a specific coding-guide rule.
- - Do not audit the whole repo by accident. Resolve the narrow slice first, then stay inside it unless an out-of-scope issue would seriously mislead the user.
- - Do not report a guide violation from a grep hit alone. Read the real implementation and the nearby evidence before calling it a problem.
- - Do not force every coding-guide rule onto every change. Skip non-applicable rules explicitly instead of inventing weak findings.
- - If you notice a broader ship-risk issue that is not really a coding-guide issue, say it is outside this skill's scope instead of quietly drifting into another audit.
-
- ## Keep This Skill Sharp
-
- - After meaningful runs, add new gotchas when the same guide-specific failure mode or scope-drift mistake keeps recurring.
- - Tighten the description if the skill fires on generic review requests or misses real prompts like "check this against the code guide."
- - If the same guide-rule translation logic keeps repeating, move that reusable detail into a supporting reference instead of expanding the hub file.
@@ -1,4 +0,0 @@
- interface:
- display_name: "Code Guide Audit"
- short_description: "Audit code-guide compliance on a scoped slice"
- default_prompt: "Use $code-guide-audit to audit this specific feature, file set, or implementation slice against the coding guide."