waypoint-codex 0.4.0 → 0.6.0
- package/README.md +2 -0
- package/dist/src/core.js +2 -0
- package/package.json +1 -1
- package/templates/.agents/skills/break-it-qa/SKILL.md +87 -0
- package/templates/.agents/skills/break-it-qa/agents/openai.yaml +4 -0
- package/templates/.agents/skills/code-guide-audit/SKILL.md +65 -0
- package/templates/.agents/skills/code-guide-audit/agents/openai.yaml +4 -0
- package/templates/.waypoint/agent-operating-manual.md +2 -0
- package/templates/managed-agents-block.md +2 -0
package/README.md
CHANGED
package/dist/src/core.js
CHANGED
package/package.json
CHANGED
package/templates/.agents/skills/break-it-qa/SKILL.md
@@ -0,0 +1,87 @@
+---
+name: break-it-qa
+description: Verify a user-facing feature by trying to break it on purpose instead of only following the happy path. Use after building forms, multistep flows, settings pages, onboarding, stateful UI, destructive actions, or any browser-facing feature where invalid inputs, refreshes, back navigation, repeated clicks, wrong action order, or recovery paths might expose real bugs.
+---
+
+# Break-It QA
+
+Use this skill to attack the feature like an impatient, confused, or careless user.
+
+This is not the same as `e2e-verify`.
+
+- `e2e-verify` proves the intended flow works end to end.
+- `break-it-qa` tries to make the feature fail through invalid, interrupted, stale, repeated, or out-of-order interactions.
+
+## Read First
+
+Before verification:
+
+1. Read `.waypoint/SOUL.md`
+2. Read `.waypoint/agent-operating-manual.md`
+3. Read `.waypoint/WORKSPACE.md`
+4. Read `.waypoint/context/MANIFEST.md`
+5. Read every file listed in that manifest
+6. Read the routed docs or nearby code that define the feature being tested
+
+## Step 1: Identify Break Surfaces
+
+- Identify the happy path first so you know what "broken" means.
+- Find the fragile surfaces: forms, wizards, pending states, destructive actions, async transitions, navigation changes, and persisted state.
+
+Do not test blindly.
+
+## Step 2: Use The Real UI
+
+- Use `playwright-interactive`.
+- Exercise the actual UI instead of mocking the flow in code.
+- Keep the scope focused on the feature the user asked you to verify.
+
+## Step 3: Try To Break It On Purpose
+
+Do more than a happy-path walkthrough.
+
+Actively try:
+
+- invalid inputs
+- empty required fields
+- boundary-length or malformed inputs
+- repeated or double clicks
+- submitting twice
+- wrong action order
+- back and forward navigation
+- page refresh during the flow
+- closing and reopening modals or screens
+- canceling mid-flow and re-entering
+- stale UI state after edits
+- conflicting selections or toggles
+- error recovery after a failed action
+
+If the feature is stateful, also check whether the UI, network result, and persisted state stay coherent after those interactions.
+
+## Step 4: Record And Fix Real Bugs
+
+- Document each meaningful issue you find.
+- Fix the issue when the remediation is clear.
+- If the behavior is ambiguous, call out the product decision instead of bluffing a fix.
+- Update docs when the verification exposes stale assumptions about how the feature works.
+
+Do not stop at the first bug.
+
+## Step 5: Repeat Until The Feature Resists Abuse
+
+After fixes:
+
+- rerun the relevant happy path
+- rerun the break attempts that previously failed
+- verify the fix did not create a new inconsistent state
+
+The skill is not done when the feature only works once. It is done when the feature behaves predictably under sloppy real-world use.
+
+## Step 6: Report Truthfully
+
+Summarize:
+
+- what break attempts you tried
+- which issues you found
+- what you fixed
+- what still looks risky or was not exercised
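The "submitting twice" and "empty required fields" attempts from Step 3 of the break-it-qa skill, together with its coherence check on persisted state, can be illustrated with a small model. This is only a sketch of the kind of invariant to look for: the skill itself calls for exercising the real UI through `playwright-interactive`, and `OrderStore`, `submit`, and the idempotency key are hypothetical names that do not appear in the package.

```python
from dataclasses import dataclass, field


@dataclass
class OrderStore:
    """Hypothetical in-memory stand-in for a feature's persisted state."""

    orders: dict = field(default_factory=dict)

    def submit(self, idempotency_key: str, item: str) -> str:
        # Empty required field: reject instead of silently storing it.
        if not item.strip():
            raise ValueError("item is required")
        # A repeated submit with the same key returns the first result
        # rather than creating a duplicate record.
        if idempotency_key in self.orders:
            return self.orders[idempotency_key]
        self.orders[idempotency_key] = item
        return item


def run_break_attempts(store: OrderStore) -> list:
    """Run a few break attempts and return descriptions of any issues found."""
    issues = []

    # Break attempt: submit twice, as a double click on the button would.
    store.submit("key-1", "widget")
    store.submit("key-1", "widget")
    if len(store.orders) != 1:
        issues.append("double submit created duplicate records")

    # Break attempt: empty required field.
    try:
        store.submit("key-2", "   ")
        issues.append("empty required field was accepted")
    except ValueError:
        pass

    # Coherence check: the rejected submit must not leave partial state.
    if "key-2" in store.orders:
        issues.append("rejected submit left persisted state behind")

    return issues


print(run_break_attempts(OrderStore()))
```

An empty list means the modelled feature resisted these attempts; each string in the list would become one documented issue under Step 4.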
package/templates/.agents/skills/break-it-qa/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Break-It QA"
+  short_description: "Try to break a feature through the UI"
+  default_prompt: "Use this skill to verify a user-facing feature by trying to break it through the browser with invalid inputs, wrong action order, refreshes, back navigation, repeated clicks, and other adversarial interactions, then fix clear issues and repeat."
package/templates/.agents/skills/code-guide-audit/SKILL.md
@@ -0,0 +1,65 @@
+---
+name: code-guide-audit
+description: Audit a specific feature, file set, or implementation slice against the coding guide and report only coding-guide-related violations or risks in that scope. Use after building a feature, when the user wants a coding-guide compliance check, before review on a targeted area, or when validating whether a change follows rules like no silent fallbacks, strong boundary validation, frontend reuse, explicit state handling, and behavior-focused verification.
+---
+
+# Code Guide Audit
+
+Use this skill for a targeted audit against the coding guide, not for a whole-repo hygiene sweep.
+
+This skill owns one job: inspect the specific code the user points at, map it against the coding guide, and report only guide-related findings in that scope.
+
+## Step 1: Load The Right Scope
+
+- Read `.waypoint/docs/code-guide.md`.
+- Read only the files, routes, tests, contracts, and nearby docs needed to understand the specific feature or slice under review.
+- If the scope is ambiguous, resolve it to a concrete file set, feature path, or commit-sized change surface before auditing.
+
+Do not expand into a whole-repo audit unless the user explicitly asks for that.
+
+## Step 2: Translate The Guide Into Checks
+
+Audit only for rules that actually apply to the scoped code.
+
+Look for:
+
+- stale compatibility layers, shims, aliases, or migration-only branches
+- weak typing, avoidable `any`, recreated shared types, or unsafe casts
+- silent fallbacks, swallowed errors, degraded paths, or missing required-config failures
+- missing validation at input, config, API, file, queue, or database boundaries
+- speculative abstractions that hide the actual behavior
+- unclear state transitions, weak transaction boundaries, missing idempotency, or weak persistence invariants
+- frontend code that ignored reusable components or broke the existing design language
+- missing loading, empty, or error states
+- optimistic UI without rollback or invalidation
+- missing observability at important failure or state boundaries
+- regression tests that assert implementation details instead of behavior
+
+Skip rules that genuinely do not apply, but say that you skipped them.
+
+## Step 3: Keep The Audit Narrow
+
+- Report only coding-guide findings for the requested scope.
+- Do not drift into generic architecture advice, repo-wide cleanup, docs sync, or PR readiness unless the finding is directly required by the guide.
+- If you notice issues outside scope, mention them only if they are severe enough that ignoring them would mislead the user about this audit.
+
+This skill is narrower than `pre-pr-hygiene`. Use that other skill for broader ship-readiness.
+
+## Step 4: Verify Evidence
+
+Ground each finding in the actual code.
+
+- Read the real implementation before calling something a violation.
+- When relevant, inspect the nearest tests, contracts, schemas, or reused components to confirm the gap.
+- Do not invent verification that you did not run.
+
+If the user asked for a pure audit, stop at findings. If they asked for fixes too, fix the clear issues and then verify the changed area.
+
+## Step 5: Report The Result
+
+Summarize the scoped result in review style:
+
+- findings first, ordered by severity
+- each finding tied back to the relevant coding-guide rule
+- include exact file references
+- then note any skipped guide areas or residual uncertainty
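Two of the checks listed in Step 2 of the code-guide-audit skill, "silent fallbacks" and "missing required-config failures", are easier to recognise with a contrast in hand. The sketch below is an assumption about what such a finding might look like, not code from the package or its coding guide; `API_URL` and both function names are hypothetical.

```python
from typing import Mapping


def load_api_url_silent(env: Mapping[str, str]) -> str:
    # Silent fallback: a missing setting is papered over with a default,
    # so a misconfigured deployment keeps running against the wrong host.
    return env.get("API_URL", "http://localhost:8000")


def load_api_url_strict(env: Mapping[str, str]) -> str:
    # Explicit failure: required config is validated at the boundary.
    url = env.get("API_URL", "").strip()
    if not url:
        raise RuntimeError("API_URL is required but not set")
    if not url.startswith(("http://", "https://")):
        raise RuntimeError(f"API_URL must be an http(s) URL, got {url!r}")
    return url


# With an empty environment the silent variant hides the problem,
# while the strict variant surfaces it immediately.
print(load_api_url_silent({}))
try:
    load_api_url_strict({})
except RuntimeError as exc:
    print(f"strict loader failed as intended: {exc}")
```

An audit in the sense of this skill would report the first function as a finding tied to the no-silent-fallbacks rule, with an exact file reference, and would not rewrite it unless the user also asked for fixes.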
package/templates/.agents/skills/code-guide-audit/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Code Guide Audit"
+  short_description: "Audit scoped code against the coding guide"
+  default_prompt: "Use this skill to audit a specific feature, file set, or implementation slice against the coding guide and report only guide-related violations or risks in that scope."
package/templates/.waypoint/agent-operating-manual.md
@@ -68,6 +68,8 @@ Do not document every trivial implementation detail. Document the non-obvious, d
 - `observability-audit` when production debugging signals look weak
 - `ux-states-audit` when async/data-driven UI likely lacks loading, empty, or error states
 - `docs-sync` when routed docs may be stale, missing, or inconsistent with the codebase
+- `code-guide-audit` when a specific feature or file set needs a targeted coding-guide compliance check
+- `break-it-qa` when a browser-facing feature should be attacked with invalid inputs, refreshes, repeated clicks, wrong action order, or other adversarial manual QA
 - `workspace-compress` after meaningful chunks, before stopping, and before review when the live handoff needs compression
 - `pre-pr-hygiene` before pushing or opening/updating a PR for substantial work
 - `pr-review` once a PR has active review comments or automated review in progress
package/templates/managed-agents-block.md
@@ -35,6 +35,8 @@ Working rules:
 - Update `.waypoint/docs/` when behavior or durable project knowledge changes, and refresh `last_updated` on touched routable docs
 - Use the repo-local skills Waypoint ships for structured workflows when relevant
 - Use `docs-sync` when the docs may be stale or a change altered shipped behavior, contracts, routes, or commands
+- Use `code-guide-audit` for a targeted coding-guide compliance pass on a specific feature, file set, or change slice
+- Use `break-it-qa` for browser-facing features that should be tested against invalid inputs, refreshes, repeated clicks, wrong navigation, and other adversarial user behavior
 - If optional reviewer roles are present and you make a commit, run `code-reviewer` and `code-health-reviewer` in parallel before calling the work done
 - Before pushing or opening/updating a PR for substantial work, use `pre-pr-hygiene`
 - Use `pr-review` once a PR has active review comments or automated review in progress