waypoint-codex 0.8.0 → 0.9.0

Files changed (30)
  1. package/README.md +17 -24
  2. package/dist/src/cli.js +15 -36
  3. package/dist/src/core.js +16 -323
  4. package/dist/src/templates.js +1 -4
  5. package/dist/src/upgrade.js +98 -8
  6. package/package.json +1 -1
  7. package/templates/.agents/skills/backend-context-interview/SKILL.md +70 -0
  8. package/templates/.agents/skills/backend-ship-audit/SKILL.md +221 -0
  9. package/templates/.agents/skills/backend-ship-audit/agents/openai.yaml +3 -0
  10. package/templates/.agents/skills/backend-ship-audit/references/audit-framework.md +228 -0
  11. package/templates/.agents/skills/backend-ship-audit/references/report-template.md +92 -0
  12. package/templates/.agents/skills/frontend-context-interview/SKILL.md +60 -0
  13. package/templates/.agents/skills/frontend-ship-audit/SKILL.md +87 -0
  14. package/templates/.agents/skills/frontend-ship-audit/agents/openai.yaml +3 -0
  15. package/templates/.agents/skills/frontend-ship-audit/references/guidance-file-updates.md +57 -0
  16. package/templates/.agents/skills/frontend-ship-audit/references/report-template.md +51 -0
  17. package/templates/.agents/skills/frontend-ship-audit/references/review-framework.md +83 -0
  18. package/templates/.agents/skills/frontend-ship-audit/scripts/create_frontend_audit.py +81 -0
  19. package/templates/.codex/agents/plan-reviewer.toml +1 -2
  20. package/templates/.waypoint/README.md +2 -5
  21. package/templates/.waypoint/SOUL.md +1 -1
  22. package/templates/.waypoint/agent-operating-manual.md +12 -9
  23. package/templates/.waypoint/agents/code-health-reviewer.md +10 -1
  24. package/templates/.waypoint/agents/plan-reviewer.md +7 -2
  25. package/templates/.waypoint/config.toml +0 -3
  26. package/templates/managed-agents-block.md +38 -2
  27. package/templates/.waypoint/automations/README.md +0 -18
  28. package/templates/.waypoint/automations/docs-garden.toml +0 -7
  29. package/templates/.waypoint/automations/repo-health.toml +0 -8
  30. package/templates/.waypoint/rules/README.md +0 -6
@@ -0,0 +1,92 @@
1
+ # Backend audit report template
2
+
3
+ Use this template for `.waypoint/audit/dd-mm-yyyy-hh-mm-backend-audit.md`.
4
+
5
+ ```markdown
6
+ # Backend Ship-Readiness Audit: <scope name>
7
+
8
+ - Timestamp: <dd-mm-yyyy hh:mm>
9
+ - Requested scope: <user request>
10
+ - Assumed audit scope: <narrowed reviewable unit>
11
+ - Ship recommendation: <Ready to ship | Ready to ship with explicit risk acceptance | Not ready to ship>
12
+
13
+ ## Scope
14
+
15
+ ### In scope
16
+ - <paths, services, APIs, workers, migrations, docs>
17
+
18
+ ### Adjacent dependencies and boundaries
19
+ - <datastores, queues, auth layers, partner APIs, shared libraries>
20
+
21
+ ### Out of scope
22
+ - <explicit exclusions>
23
+
24
+ ## What was read
25
+ - `<path>`: <why it mattered>
26
+ - `<path>`: <why it mattered>
27
+
28
+ ## Open questions
29
+ - <only unresolved questions that materially affect readiness>
30
+
31
+ ## Assumptions used in this audit
32
+ - <assumption>
33
+ - <assumption>
34
+
35
+ ## System understanding
36
+
37
+ Provide a concise explanation of the scoped backend:
38
+ - primary entry points
39
+ - data flow
40
+ - trust boundaries
41
+ - transaction and async boundaries
42
+ - external dependencies
43
+ - operational controls
44
+
45
+ ## Priority summary
46
+
47
+ - P0: <count>
48
+ - P1: <count>
49
+ - P2: <count>
50
+ - P3: <count>
51
+ - P4: <count>
52
+
53
+ ## Findings
54
+
55
+ ### BA-001: <title>
56
+ - Priority: <P0-P4>
57
+ - Why it matters: <plain-language impact>
58
+ - Evidence:
59
+ - `<path>:<line-range>` <concise fact>
60
+ - `<path>:<line-range>` <concise fact>
61
+ - Affected area: <service, endpoint, worker, migration, table, client>
62
+ - Risk if shipped as-is: <practical release risk>
63
+ - Recommended fix: <specific fix or mitigation>
64
+ - Confidence: <High | Medium | Low>
65
+
66
+ ### BA-002: <title>
67
+ - Priority: <P0-P4>
68
+ - Why it matters: <plain-language impact>
69
+ - Evidence:
70
+ - `<path>:<line-range>` <concise fact>
71
+ - Affected area: <service, endpoint, worker, migration, table, client>
72
+ - Risk if shipped as-is: <practical release risk>
73
+ - Recommended fix: <specific fix or mitigation>
74
+ - Confidence: <High | Medium | Low>
75
+
76
+ ## Release conditions / next actions
77
+
78
+ List only the conditions that matter for shipment.
79
+
80
+ 1. <required fix, mitigation, or explicit risk acceptance>
81
+ 2. <required fix, mitigation, or explicit risk acceptance>
82
+
83
+ ## Notes
84
+
85
+ Include only brief context that materially helps a future reviewer.
86
+ ```
87
+
88
+ Guidance:
89
+ - Keep the summary short.
90
+ - Prefer fewer findings with stronger evidence.
91
+ - Include no finding that is unsupported by the repository or an explicit unanswered question.
92
+ - Use stable IDs in the form `BA-001`, `BA-002`, and so on.
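
The `BA-001` ID convention and the `P0`-`P4` priority summary in the template above are simple enough to generate mechanically. A minimal sketch, assuming findings are kept as plain dicts with a `priority` key; the helper names `finding_id` and `priority_summary` are hypothetical and not part of waypoint-codex:

```python
from __future__ import annotations

from collections import Counter

PRIORITIES = ("P0", "P1", "P2", "P3", "P4")


def finding_id(index: int, prefix: str = "BA") -> str:
    """Return a zero-padded finding ID such as BA-001 (index starts at 1)."""
    if index < 1:
        raise ValueError("finding index starts at 1")
    return f"{prefix}-{index:03d}"


def priority_summary(findings: list[dict]) -> list[str]:
    """Render the '- P0: <count>' lines of the priority summary section."""
    counts = Counter(f["priority"] for f in findings)
    # Always emit all five levels so the summary shape stays stable.
    return [f"- {p}: {counts.get(p, 0)}" for p in PRIORITIES]
```

Zero-padding to three digits keeps IDs sortable as text, which is one plausible reading of "stable IDs"; the template itself does not say what happens past 999 findings.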
@@ -0,0 +1,60 @@
1
+ ---
2
+ name: frontend-context-interview
3
+ description: Gather and persist durable frontend project context when missing or insufficient for implementation or review work. Use when frontend decisions depend on product type, audience, support matrix, accessibility, SEO, localization, design-system constraints, or similar context that is not clearly documented.
4
+ ---
5
+
6
+ # Frontend Context Interview
7
+
8
+ Use this skill when relevant frontend context is missing, stale, contradictory, or too weak to support correct implementation or review decisions.
9
+
10
+ ## Goals
11
+
12
+ 1. identify the missing frontend context that materially affects the work
13
+ 2. ask only high-leverage questions that cannot be answered from the repo or guidance files
14
+ 3. persist durable context into the project root guidance file
15
+ 4. avoid repeated questioning in future tasks
16
+
17
+ ## When to use
18
+
19
+ Use this skill when the current task depends on context such as:
20
+ - internal tool vs customer-facing product vs public marketing site
21
+ - expected scale or traffic patterns
22
+ - browser and device support requirements
23
+ - accessibility targets
24
+ - SEO requirements
25
+ - localization or internationalization requirements
26
+ - analytics or experimentation requirements
27
+ - design-system or branding constraints
28
+ - auth or role-based UI expectations
29
+ - security or privacy expectations that change frontend behavior
30
+
31
+ Do not use this skill when the answer is already clearly present in `AGENTS.md`, product docs, or the task itself.
32
+
33
+ ## Workflow
34
+
35
+ ### 1. Check persisted context first
36
+
37
+ Inspect the project root guidance files.
38
+
39
+ Priority:
40
+ 1. `AGENTS.md`
41
+
42
+ Look for:
43
+ - `## Project Context`
44
+ - `## Frontend Context`
45
+ - equivalent sections with the same intent
46
+
47
+ If the existing section is accurate and sufficient, do not interview the user.
48
+
49
+ ### 2. Determine what is actually missing
50
+
51
+ Only ask questions that materially affect implementation or review choices.
52
+
53
+ Good triggers:
54
+ - the right browser support changes implementation or QA expectations
55
+ - accessibility bar changes component and interaction requirements
56
+ - public marketing surface vs internal tool changes polish, SEO, and content expectations
57
+ - localization changes copy, layout, and component design
58
+
59
+ Do not ask broad or low-value questions.
60
+ Do not ask generic discovery questions that do not affect implementation.
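
Step 1 of the workflow above ("Check persisted context first") amounts to looking for specific headings in `AGENTS.md`. A rough sketch of that check, assuming the guidance file has already been read into a string; the function name `persisted_context_headings` is hypothetical, and the sketch only detects the two exact headings, not "equivalent sections with the same intent", which still needs human or agent judgment:

```python
from __future__ import annotations

import re

CONTEXT_HEADINGS = ("## Project Context", "## Frontend Context")


def persisted_context_headings(guidance_text: str) -> list[str]:
    """Return the known context headings that appear as whole heading lines."""
    found = []
    for heading in CONTEXT_HEADINGS:
        # Match only when the heading occupies a full line, so that
        # '### Frontend Context' or inline mentions are not counted.
        pattern = rf"^{re.escape(heading)}[ \t]*$"
        if re.search(pattern, guidance_text, flags=re.MULTILINE):
            found.append(heading)
    return found
```

An empty result suggests the interview may be warranted; a non-empty result only shows that a section exists, not that it is accurate and sufficient.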
@@ -0,0 +1,87 @@
1
+ ---
2
+ name: frontend-ship-audit
3
+ description: Audit a defined frontend scope for ship-readiness with a strong focus on real product risk, user-facing correctness, and evidence from the repository. Use when Codex needs to review an app, route group, feature, page set, component area, PR, or frontend directory to decide whether it is ready to ship; resolve the actual reviewable frontend scope from the user request and repository structure; read all relevant frontend code and docs completely; ask a concise high-leverage interview only for missing context that materially changes the release bar; persist durable frontend deployment context into the project root AGENTS.md under a Frontend Context section; and write an evidence-based audit with prioritized P0-P4 findings at .waypoint/audit/dd-mm-yyyy-hh-mm-frontend-audit.md.
4
+ ---
5
+
6
+ Audit ship-readiness like a strong frontend reviewer. Optimize for user impact, release risk, and production correctness. Do not optimize for style policing.
7
+
8
+ Use this workflow:
9
+
10
+ 1. Resolve the scope.
11
+ - Infer the most defensible reviewable unit from the user request and repository structure.
12
+ - State the assumed scope when the request is broad or ambiguous.
13
+ - List what is directly in scope, what important dependencies matter, and what is explicitly out of scope.
14
+ - Include dependent APIs, design systems, auth flows, platform constraints, analytics, SEO, localization, and accessibility requirements when they materially affect the scoped experience.
15
+
16
+ 2. Build repository understanding before judging readiness.
17
+ - Read the project root guidance file first: `AGENTS.md`.
18
+ - Read package manifests, router entry points, route definitions, layouts, pages, screens, composition layers, state containers, API clients, validation logic, design-system primitives, styling and theming files, accessibility helpers, tests, specs, design docs, runbooks, and architecture docs when they matter to the scoped frontend.
19
+ - Read complete files for all relevant materials. Do not rely on grep hits, single matched lines, or truncated snippets for anything that informs architecture or a finding.
20
+ - Ignore clearly irrelevant material such as vendored dependencies, generated outputs, caches, and unrelated subsystems.
21
+
22
+ 3. Model the real user experience.
23
+ - Trace primary and secondary user journeys across entry points, route transitions, loading states, empty states, errors, retries, mutations, auth boundaries, success states, and exits.
24
+ - Identify frontend boundary assumptions: API contracts, feature flags, experiments, permissions, browser support, device classes, SEO rules, localization rules, analytics expectations, and privacy constraints.
25
+ - Distinguish proven behavior from assumed behavior.
26
+
27
+ 4. Ask only the questions that materially change the release bar.
28
+ - Ask the interview after repository exploration.
29
+ - Group questions by topic.
30
+ - Keep them concise and high leverage.
31
+ - Skip questions that the codebase or docs already answer.
32
+ - If answers are unavailable, proceed with explicit assumptions and label them clearly in the audit.
33
+
34
+ 5. Persist durable frontend context.
35
+ - Prefer the project root `AGENTS.md`.
36
+ - If it does not exist, do not create a new guidance file unless the user explicitly asks.
37
+ - Update an existing `## Frontend Context` section when present.
38
+ - Otherwise add a new `## Frontend Context` section.
39
+ - Preserve surrounding content exactly.
40
+ - Do not overwrite unrelated sections.
41
+ - Do not duplicate existing context.
42
+ - Do not persist transient findings.
43
+ - Persist only stable deployment context and durable product constraints such as audience, browser support, device classes, accessibility targets, performance expectations, SEO expectations, localization requirements, analytics obligations, design-system constraints, auth expectations, and privacy or security expectations.
44
+ - Make this edit manually and preserve surrounding content exactly.
45
+ - Read `references/guidance-file-updates.md` before editing the guidance file.
46
+
47
+ 6. Produce the audit.
48
+ - Write the audit to `.waypoint/audit/dd-mm-yyyy-hh-mm-frontend-audit.md`.
49
+ - Create directories if needed.
50
+ - Use current local execution time for the timestamp unless the project or task specifies a different timezone convention.
51
+ - Use `scripts/create_frontend_audit.py` to create the timestamped audit file path and scaffold when helpful.
52
+ - Read `references/report-template.md` before writing the final report.
53
+
54
+ 7. Evaluate with practical release judgment.
55
+ - Judge the scoped frontend across architecture fit, boundary clarity, user journey completeness, loading and failure handling, form correctness, validation, API integration robustness, state management correctness, rendering behavior, responsiveness, accessibility, focus management, keyboard support, visual consistency, design-system usage, interaction quality, auth and authorization exposure, client-side security and privacy, performance risks, hydration risks, SEO and metadata correctness, analytics correctness, observability, future legibility, and cross-browser or cross-device risk when relevant.
56
+ - Do not limit the audit to that list. Apply specialist judgment.
57
+ - Read `references/review-framework.md` for the detailed audit lenses.
58
+
59
+ 8. Keep findings evidence-based and severity-calibrated.
60
+ - Do not include stylistic preferences, generic best-practice commentary, or trivial refactors without ship impact.
61
+ - Tie every finding to repository evidence.
62
+ - Use the smallest severity that honestly reflects the risk.
63
+ - Mark confidence when evidence is incomplete.
64
+
65
+ Use this priority model consistently:
66
+ - P0: clear ship blocker; likely severe production breakage, critical accessibility or security failure, or fundamentally unsafe release
67
+ - P1: serious issue that should usually be fixed before shipping; substantial user, reliability, accessibility, security, or operational risk
68
+ - P2: important issue that may be acceptable only with conscious acceptance of risk; not an immediate blocker in all contexts
69
+ - P3: moderate weakness or gap; should be addressed soon but not necessarily before launch
70
+ - P4: minor improvement with limited near-term impact
71
+
72
+ Every finding must include:
73
+ - ID
74
+ - title
75
+ - priority
76
+ - why it matters
77
+ - evidence
78
+ - affected area
79
+ - risk if shipped as-is
80
+ - recommended fix
81
+ - confidence level if evidence is incomplete
82
+
83
+ When evidence is partial:
84
+ - say what you verified
85
+ - say what remains assumed
86
+ - lower confidence instead of overstating certainty
87
+ - ask only the missing questions that would change the release decision
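
The "Every finding must include" list and the P0-P4 model above can be expressed as a small validation step. A minimal sketch, assuming findings are plain dicts; the field keys (`why_it_matters`, `risk_if_shipped`, and so on) and both function names are hypothetical choices for illustration, not a schema defined by the skill:

```python
from __future__ import annotations

# Confidence is omitted here because the skill only requires it
# when evidence is incomplete.
REQUIRED_FIELDS = (
    "id", "title", "priority", "why_it_matters", "evidence",
    "affected_area", "risk_if_shipped", "recommended_fix",
)
PRIORITY_ORDER = {"P0": 0, "P1": 1, "P2": 2, "P3": 3, "P4": 4}


def missing_fields(finding: dict) -> list[str]:
    """List required fields that are absent or empty in a finding."""
    return [name for name in REQUIRED_FIELDS if not finding.get(name)]


def sort_by_priority(findings: list[dict]) -> list[dict]:
    """Order findings from P0 to P4; sorted() keeps ties in input order."""
    return sorted(findings, key=lambda f: PRIORITY_ORDER[f["priority"]])
```

A check like this catches structurally incomplete findings only; whether the evidence actually supports the stated priority remains a review judgment.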
@@ -0,0 +1,3 @@
1
+ display_name: Frontend Ship Audit
2
+ short_description: Audit a scoped frontend surface for ship-readiness with evidence-based findings and durable deployment context.
3
+ default_prompt: Audit the ship-readiness of the requested frontend scope. Resolve the reviewable unit from the repo, read all relevant frontend files completely, ask only missing high-leverage questions, persist durable Frontend Context in the project root guidance file when present, and write a prioritized audit at .waypoint/audit/dd-mm-yyyy-hh-mm-frontend-audit.md.
@@ -0,0 +1,57 @@
1
+ # Frontend Context guidance file update rules
2
+
3
+ Use these rules when persisting durable frontend context.
4
+
5
+ ## File selection
6
+
7
+ 1. Prefer `AGENTS.md` in the project root.
8
+ 2. If it does not exist, do not create a new guidance file unless the user explicitly asked for it.
9
+
10
+ ## Section rules
11
+
12
+ - Target the exact heading `## Frontend Context`.
13
+ - Update the existing section when present.
14
+ - Otherwise append a new `## Frontend Context` section.
15
+ - Preserve all surrounding content exactly.
16
+ - Do not alter unrelated sections.
17
+ - Do not duplicate facts that already exist in accurate form.
18
+ - Do not persist audit findings, one-off bugs, or transient release notes.
19
+
20
+ ## Good content for this section
21
+
22
+ Persist stable context such as:
23
+ - deployment surface and audience
24
+ - internal, partner, customer-facing, or public marketing classification
25
+ - required browsers and device classes
26
+ - accessibility target or compliance expectation
27
+ - performance budget or latency expectation
28
+ - SEO requirements
29
+ - localization requirements
30
+ - analytics or experimentation obligations
31
+ - design-system or brand constraints
32
+ - auth and role-based UI expectations
33
+ - privacy and client-side security expectations
34
+
35
+ ## Bad content for this section
36
+
37
+ Do not add:
38
+ - current audit findings
39
+ - temporary workarounds
40
+ - one-time release decisions
41
+ - generic engineering principles unrelated to the frontend deployment context
42
+
43
+ ## Suggested format
44
+
45
+ Use concise bullets under `## Frontend Context`. Prefer facts and defaults over prose.
46
+
47
+ Example:
48
+
49
+ ## Frontend Context
50
+ - Surface: Public customer-facing web app.
51
+ - Devices: Mobile and desktop must both work.
52
+ - Browser support: Latest Chrome, Safari, Firefox, and Edge.
53
+ - Accessibility: Keyboard-accessible flows and screen-reader-compatible forms are required.
54
+ - Performance: Primary routes should remain responsive on mid-range mobile devices.
55
+ - SEO: Product and marketing routes require accurate metadata and indexable content.
56
+ - Localization: English only for now.
57
+ - Analytics: Core conversion events must remain instrumented.
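
The section rules above ("update the existing section when present, otherwise append, preserve all surrounding content exactly") can be made concrete with a short sketch. Note that the skill asks for this edit to be made manually, so the code below is only an illustration of the intended outcome, not a tool shipped by waypoint-codex. The function name `upsert_section` is hypothetical, and the sketch assumes a section ends at the next `# ` or `## ` heading and does not account for headings inside fenced code blocks:

```python
from __future__ import annotations


def upsert_section(text: str, body: str, heading: str = "## Frontend Context") -> str:
    """Replace the body of `heading` if present, else append the section.

    Lines outside the target section are returned unchanged.
    """
    lines = text.splitlines(keepends=True)
    section = [heading + "\n", *(line + "\n" for line in body.splitlines())]

    start = next((i for i, line in enumerate(lines) if line.rstrip() == heading), None)
    if start is None:
        # No existing section: append it, separated by one blank line.
        prefix = text if text.endswith("\n") or not text else text + "\n"
        return prefix + ("\n" if text else "") + "".join(section)

    # The section runs until the next level-1 or level-2 heading, or EOF.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if lines[i].startswith("# ") or lines[i].startswith("## "):
            end = i
            break
    tail = lines[end:]
    if tail:
        section.append("\n")  # keep one blank line before the next heading
    return "".join(lines[:start] + section + tail)
```

Replacing the whole section body is the simplest reading of "update"; merging individual bullets while avoiding duplicate facts, as the rules also require, would need more care than this sketch shows.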
@@ -0,0 +1,51 @@
1
+ # Frontend audit report template
2
+
3
+ Use this structure for `.waypoint/audit/dd-mm-yyyy-hh-mm-frontend-audit.md`.
4
+
5
+ # Frontend Ship-Readiness Audit
6
+
7
+ Generated: DD-MM-YYYY HH:MM
8
+
9
+ ## Scope
10
+ - Requested scope:
11
+ - Assumed reviewable unit:
12
+ - In scope:
13
+ - Important dependencies:
14
+ - Explicitly out of scope:
15
+
16
+ ## Deployment Context
17
+ - Established context:
18
+ - Missing context that affects the bar:
19
+ - Assumptions used for this audit:
20
+
21
+ ## Repository Coverage
22
+ - Files and docs read completely:
23
+ - Areas intentionally skipped as irrelevant:
24
+
25
+ ## Summary
26
+ - Verdict: Ready / Ready with accepted risk / Not ready
27
+ - Highest-risk themes:
28
+ - What would need to change before shipping, if not ready:
29
+
30
+ ## Findings
31
+
32
+ ### F-001: Title
33
+ - Priority: P1
34
+ - Why it matters:
35
+ - Evidence:
36
+ - Affected area:
37
+ - Risk if shipped as-is:
38
+ - Recommended fix:
39
+ - Confidence: High / Medium / Low
40
+
41
+ Repeat for each finding in priority order.
42
+
43
+ ## Positive evidence
44
+ - Note behaviors that reduce release risk when they are directly supported by repository evidence.
45
+
46
+ ## Open questions
47
+ - List only unanswered questions that would materially change the release decision.
48
+
49
+ ## Release recommendation
50
+ - State the release recommendation in one concise paragraph.
51
+ - If the scope can ship only with accepted risk, name the exact accepted risks.
@@ -0,0 +1,83 @@
1
+ # Frontend ship-readiness review framework
2
+
3
+ Use these lenses to decide whether the scoped frontend is safe and complete enough to ship.
4
+
5
+ ## Reading order
6
+
7
+ 1. Read root guidance and product docs.
8
+ 2. Read route entry points, layouts, and page or screen composition.
9
+ 3. Read state, data fetching, API clients, validation, and mutation paths.
10
+ 4. Read design-system, styling, accessibility, metadata, analytics, and auth helpers that affect the scope.
11
+ 5. Read tests that exercise the scoped behavior.
12
+ 6. Read adjacent docs or runbooks when they explain production expectations.
13
+
14
+ ## Core evaluation lenses
15
+
16
+ ### Scope and architecture fit
17
+ - Check whether the implementation boundary matches the requested surface.
18
+ - Check whether route, page, and component boundaries are legible and coherent.
19
+ - Check whether critical behavior is spread across too many layers or hidden behind implicit defaults.
20
+
21
+ ### User journey completeness
22
+ - Trace main happy paths, edge paths, entry paths, exits, and return paths.
23
+ - Check loading, empty, error, retry, disabled, unauthorized, and success states.
24
+ - Check whether failures are visible and actionable rather than silent.
25
+
26
+ ### Boundary correctness
27
+ - Verify request and response assumptions at frontend boundaries.
28
+ - Check parsing, validation, null handling, optimistic updates, stale data handling, retries, and race conditions.
29
+ - Check whether the UI assumes fields, permissions, or states that the backend does not guarantee.
30
+
31
+ ### State and rendering correctness
32
+ - Check whether state ownership is clear.
33
+ - Check whether derived state duplicates server state or causes drift.
34
+ - Check whether effects, memoization, and conditional rendering create stale UI, loops, or hydration mismatches.
35
+
36
+ ### Forms and input correctness
37
+ - Verify validation rules, error surfacing, submission gating, retry behavior, and server error handling.
38
+ - Check whether defaults, formatting, and field constraints match product expectations.
39
+
40
+ ### Responsiveness and device fit
41
+ - Check whether layouts, interactions, and content density hold across required breakpoints and device classes.
42
+ - Check tap target sizes, overflow, sticky elements, keyboard overlap, and modal behavior on smaller screens.
43
+
44
+ ### Accessibility
45
+ - Check semantic structure, labels, accessible names, focus order, focus visibility, keyboard access, screen-reader announcements, and dialog or popover behavior.
46
+ - Check whether error messaging and status changes are perceivable.
47
+ - Treat critical accessibility failures as real ship risk.
48
+
49
+ ### Visual and interaction quality
50
+ - Check whether design-system primitives are used consistently where required.
51
+ - Check whether states provide clear feedback and whether destructive or irreversible actions are appropriately signaled.
52
+ - Do not flag visual issues that are purely stylistic unless they affect usability, consistency, or release confidence.
53
+
54
+ ### Auth, authorization, security, and privacy
55
+ - Check whether privileged UI states are exposed to the wrong roles.
56
+ - Check whether secrets, tokens, PII, or internal data are exposed in client code, storage, logs, analytics payloads, or rendered markup.
57
+ - Check whether client-side behavior could mislead users about authorization.
58
+
59
+ ### Performance and delivery risk
60
+ - Check route-level loading strategy, bundle pressure, unnecessary client rendering, hydration risk, redundant requests, and expensive re-renders where relevant.
61
+ - Check whether performance expectations or budgets are violated for the target surface.
62
+
63
+ ### SEO, metadata, analytics, and observability
64
+ - Check metadata, canonical handling, structured data, crawlability, and rendering mode when the scope is indexable.
65
+ - Check event wiring, experiment exposure, and required tracking for key journeys.
66
+ - Check whether failures have enough logging or observability to support release confidence when relevant.
67
+
68
+ ### Maintainability as ship risk
69
+ - Flag overengineering, underengineering, hidden coupling, or silent fallback behavior when they create near-term release risk.
70
+ - Ignore refactor ideas that do not materially affect shipping confidence.
71
+
72
+ ## Risk heuristics
73
+
74
+ Raise priority when the issue is likely to:
75
+ - break the primary journey
76
+ - mis-handle auth or roles
77
+ - hide errors or create silent failure
78
+ - expose private or unsafe data
79
+ - strand keyboard or screen-reader users
80
+ - fail on a required browser or device class
81
+ - create high-probability production regressions due to unclear ownership or boundary assumptions
82
+
83
+ Lower priority when the issue is isolated, recoverable, obvious to users, or only affects non-critical polish.
@@ -0,0 +1,81 @@
+ #!/usr/bin/env python3
+ from __future__ import annotations
+
+ import argparse
+ from datetime import datetime
+ from pathlib import Path
+ import sys
+
+ TEMPLATE = """# Frontend Ship-Readiness Audit
+
+ Generated: {generated}
+
+ ## Scope
+ - Requested scope: {requested_scope}
+ - Assumed reviewable unit:
+ - In scope:
+ - Important dependencies:
+ - Explicitly out of scope:
+
+ ## Deployment Context
+ - Established context:
+ - Missing context that affects the bar:
+ - Assumptions used for this audit:
+
+ ## Repository Coverage
+ - Files and docs read completely:
+ - Areas intentionally skipped as irrelevant:
+
+ ## Summary
+ - Verdict: Ready / Ready with accepted risk / Not ready
+ - Highest-risk themes:
+ - What would need to change before shipping, if not ready:
+
+ ## Findings
+
+ ## Positive evidence
+
+ ## Open questions
+
+ ## Release recommendation
+ """
+
+
+ def main() -> int:
+     parser = argparse.ArgumentParser(description="Create a timestamped frontend audit file in .waypoint/audit.")
+     parser.add_argument("--project-root", default=".", help="Path to the repository root.")
+     parser.add_argument("--requested-scope", default="", help="Original requested scope for the audit.")
+     parser.add_argument("--timestamp", help="Override timestamp in dd-mm-yyyy-hh-mm format.")
+     parser.add_argument("--stdout-path-only", action="store_true", help="Print the output path without creating the file.")
+     parser.add_argument("--force", action="store_true", help="Overwrite the file if it already exists.")
+     args = parser.parse_args()
+
+     project_root = Path(args.project_root).resolve()
+     if args.timestamp:
+         stamp = args.timestamp
+         if len(stamp) == 16:
+             generated = f"{stamp[:10]} {stamp[11:13]}:{stamp[14:16]}"
+         else:
+             generated = stamp
+     else:
+         now = datetime.now()
+         stamp = now.strftime("%d-%m-%Y-%H-%M")
+         generated = now.strftime("%d-%m-%Y %H:%M")
+
+     out_path = project_root / ".waypoint" / "audit" / f"{stamp}-frontend-audit.md"
+     if args.stdout_path_only:
+         print(out_path)
+         return 0
+
+     out_path.parent.mkdir(parents=True, exist_ok=True)
+     if out_path.exists() and not args.force:
+         raise SystemExit(f"Refusing to overwrite existing file: {out_path}")
+
+     content = TEMPLATE.format(generated=generated, requested_scope=args.requested_scope)
+     out_path.write_text(content, encoding="utf-8")
+     print(out_path)
+     return 0
+
+
+ if __name__ == "__main__":
+     raise SystemExit(main())
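
The script above accepts any `--timestamp` string and only checks its length before slicing it into the `Generated:` value. If stricter handling were wanted, the stamp could be validated with `datetime.strptime`. A minimal sketch of that variant, not the shipped behavior; the function name `audit_path` is hypothetical:

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path

STAMP_FORMAT = "%d-%m-%Y-%H-%M"


def audit_path(project_root: Path, stamp: str) -> tuple[Path, str]:
    """Validate a dd-mm-yyyy-hh-mm stamp; return the audit path and 'Generated' value."""
    # strptime raises ValueError when the stamp does not match the format.
    parsed = datetime.strptime(stamp, STAMP_FORMAT)
    # Re-render the stamp so the file name is always zero-padded.
    normalized = parsed.strftime(STAMP_FORMAT)
    generated = parsed.strftime("%d-%m-%Y %H:%M")
    out_path = project_root / ".waypoint" / "audit" / f"{normalized}-frontend-audit.md"
    return out_path, generated
```

The trade-off is that a malformed override now fails loudly instead of being passed through into the file name, which may or may not suit callers that rely on the current permissive behavior.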
@@ -4,11 +4,10 @@ developer_instructions = """
4
4
  Read these files in order before doing anything else:
5
5
  1. .waypoint/SOUL.md
6
6
  2. .waypoint/agent-operating-manual.md
7
- 3. WORKSPACE.md
7
+ 3. .waypoint/WORKSPACE.md
8
8
  4. .waypoint/context/MANIFEST.md
9
9
  5. every file listed in that manifest
10
10
  6. .waypoint/agents/plan-reviewer.md
11
11
 
12
12
  After reading them, follow .waypoint/agents/plan-reviewer.md as your operating instructions.
13
13
  """
14
-
@@ -1,6 +1,6 @@
1
1
  # .waypoint
2
2
 
3
- Repo-local Waypoint configuration and optional integration sources.
3
+ Repo-local Waypoint configuration and project memory files.
4
4
 
5
5
  - `config.toml` — Waypoint feature toggles and file locations
6
6
  - `WORKSPACE.md` — live operational state; new or materially revised entries in multi-topic sections are timestamped
@@ -8,9 +8,6 @@ Repo-local Waypoint configuration and optional integration sources.
8
8
  - `SOUL.md` — agent identity and working values
9
9
  - `agent-operating-manual.md` — required session workflow
10
10
  - `docs/` — Waypoint-managed project memory (architecture, decisions, debugging knowledge, durable plans); routable docs use `summary`, `last_updated`, and `read_when` frontmatter
11
- - `agents/` — agent prompt files that optional Codex roles can read and follow
12
- - `automations/` — optional automation source specs
11
+ - `agents/` — agent prompt files that Waypoint's reviewer agents can read and follow
13
12
  - `context/` — generated session context bundle
14
- - `rules/` — optional rule source files
15
13
  - `scripts/` — repo-local Waypoint helper scripts
16
- - `state/` — local sync state and tooling metadata
@@ -30,7 +30,7 @@ You're direct, opinionated, and evidence-driven. You read before you write. You
30
30
 
31
31
  **Update the durable record.** When behavior changes, update docs. When state changes, update `WORKSPACE.md`. When a better pattern emerges, encode it in the repo contract instead of rediscovering it later.
32
32
 
33
- **Close the loop after commits.** If Waypoint's reviewer roles are available, launch `code-reviewer` and `code-health-reviewer` after your own commits and address the real findings before you call the work finished.
33
+ **Close the loop before complete.** Run `code-reviewer` before considering any non-trivial implementation slice complete. Run `code-health-reviewer` before considering medium or large changes complete, especially when they add structure, duplicate logic, or introduce new abstractions.
34
34
 
35
35
  **Prefer small, reviewable changes.** Keep work scoped and comprehensible.
36
36
 
@@ -48,7 +48,7 @@ If something important lives only in your head or in the chat transcript, the re
48
48
  - Update `.waypoint/docs/` when durable knowledge changes, and refresh each changed routable doc's `last_updated` field.
49
49
  - Rebuild `.waypoint/DOCS_INDEX.md` whenever routable docs change.
50
50
  - Rebuild `.waypoint/TRACKS_INDEX.md` whenever tracker files change.
51
- - Use the repo-local skills and optional reviewer agents instead of improvising from scratch.
51
+ - Use the repo-local skills and reviewer agents instead of improvising from scratch.
52
52
  - Do not kill long-running subagents or reviewer agents just because they are slow. Wait unless they are clearly stuck, failed, or the user redirects the work.
53
53
 
54
54
  ## Documentation expectations
@@ -71,13 +71,14 @@ Do not document every trivial implementation detail. Document the non-obvious, d
71
71
  - `docs-sync` when routed docs may be stale, missing, or inconsistent with the codebase
72
72
  - `code-guide-audit` when a specific feature or file set needs a targeted coding-guide compliance check
73
73
  - `break-it-qa` when a browser-facing feature should be attacked with invalid inputs, refreshes, repeated clicks, wrong action order, or other adversarial manual QA
74
+ - `frontend-ship-audit` and `backend-ship-audit` only when the user explicitly requests a ship-readiness audit; do not trigger them autonomously as part of the default Waypoint workflow
74
75
  - `workspace-compress` after meaningful chunks, before stopping, and before review when the live handoff needs compression
75
76
  - `pre-pr-hygiene` before pushing or opening/updating a PR for substantial work
76
77
  - `pr-review` once a PR has active review comments or automated review in progress
77
78
 
78
- ## When to use the optional reviewer agents
79
+ ## When to use the reviewer agents
79
80
 
80
- If the repo was initialized with Waypoint roles enabled, use them as focused second-pass specialists:
81
+ Waypoint scaffolds these focused second-pass specialists by default:
81
82
 
82
83
  - `code-reviewer` for correctness and regression review
83
84
  - `code-health-reviewer` for maintainability drift
@@ -85,13 +86,15 @@ If the repo was initialized with Waypoint roles enabled, use them as focused sec
85
86
 
86
87
  ## Review Loop
87
88
 
88
- If Waypoint's optional roles are enabled, run the reviewer pair after a meaningful reviewable implementation chunk, not just as a reflex after every tiny commit.
89
+ Use reviewer agents before considering the work complete, not just as a reflex after every tiny commit.
89
90
 
90
- 1. Launch `code-reviewer` and `code-health-reviewer` in parallel as background, read-only reviewers once there is a coherent slice of work worth reviewing.
91
- 2. If you have a recent self-authored commit that cleanly represents that slice, use it as the default review scope anchor. Otherwise scope the reviewers to the current changed slice.
92
- 3. Widen only when surrounding files are needed to validate a finding.
93
- 4. Do not call the work finished before you read both reviewer results.
94
- 5. Fix real findings, rerun the relevant verification, update workspace/docs if needed, and make a follow-up commit when fixes change the repo.
91
+ 1. Run `code-reviewer` before considering any non-trivial implementation slice complete.
92
+ 2. Run `code-health-reviewer` before considering medium or large changes complete, especially when they add structure, duplicate logic, or introduce new abstractions.
93
+ 3. If both apply, launch `code-reviewer` and `code-health-reviewer` in parallel as background, read-only reviewers.
94
+ 4. If you have a recent self-authored commit that cleanly represents the reviewable slice, use it as the default review scope anchor. Otherwise scope the reviewers to the current changed slice.
95
+ 5. Widen only when surrounding files are needed to validate a finding.
96
+ 6. Do not call the work finished before you read the required reviewer results.
97
+ 7. Fix real findings, rerun the relevant verification, update workspace/docs if needed, and make a follow-up commit when fixes change the repo.
95
98
 
96
99
  ## Quality bar
97
100
 
@@ -24,6 +24,8 @@ Find code that works but should be refactored. You're not looking for bugs (`cod
24
24
 
25
25
  **Explore what exists.** Search for existing helpers, utilities, and patterns that could be reused instead of duplicated.
26
26
 
27
+ **Stay practical.** Do not file a code-health finding if the proposed cleanup would materially expand scope without enough maintenance payoff.
28
+
27
29
  ## What You're Looking For
28
30
 
29
31
  Code that works but hurts maintainability. Examples:
@@ -34,7 +36,14 @@ Code that works but hurts maintainability. Examples:
34
36
  - pattern drift
35
37
  - over-engineering
36
38
 
37
- Use your judgment; these are examples, not a checklist.
39
+ Use these operational lenses to make findings concrete and defensible:
40
+
41
+ - makes future changes harder than necessary
42
+ - hides important behavior or state transitions
43
+ - duplicates business logic that is likely to diverge
44
+ - introduces abstraction without enough concrete reuse
45
+ - spreads one responsibility across too many files or layers
46
+ - leaves dead or transitional code that obscures current truth
38
47
 
39
48
  ## What You're NOT Looking For
40
49