waypoint-codex 0.6.0 → 0.6.1
- package/README.md +122 -19
- package/package.json +1 -1
- package/templates/.agents/skills/break-it-qa/SKILL.md +105 -8
package/README.md
CHANGED
@@ -2,58 +2,128 @@
 
 Waypoint is a docs-first repository operating system for Codex.
 
-It helps the next agent
+It helps the next agent understand your repo by making the important context live in the repo itself instead of disappearing into chat history.
 
-
-
-
+## Why people use it
+
+Most agent workflows break down the same way:
+
+- the next session starts half-blind
+- important project docs exist, but the agent does not know which ones matter
+- workspace notes turn into noisy append-only logs
+- repo conventions live in people's heads instead of files
+- review and cleanup happen inconsistently
+
+Waypoint gives you a lightweight repo contract that fixes those problems with explicit files, generated context, and a strong default skill set.
+
+## What Waypoint sets up
+
+Waypoint scaffolds a Codex-friendly repo structure built around a few core pieces:
+
+- `AGENTS.md` for the startup contract
+- `.waypoint/WORKSPACE.md` for live operational state
+- `.waypoint/docs/` for durable project memory
 - `.waypoint/DOCS_INDEX.md` for docs routing
--
+- `.waypoint/context/` for generated startup context
+- `.agents/skills/` for repo-local workflows like planning, audits, and QA
+
+The philosophy is simple:
+
+- less hidden runtime magic
+- more repo-local state
+- more markdown
+- better continuity for the next agent
+
+## Best fit
+
+Waypoint is most useful when you want:
+
+- multi-session continuity in a real repo
+- a durable docs and workspace structure for agents
+- stronger planning, review, QA, and closeout defaults
+- repo-local scaffolding instead of a bunch of global mystery behavior
+
+If you only use Codex for tiny one-off edits, Waypoint is probably unnecessary.
 
 ## Install
 
+Waypoint requires Node 20+.
+
 ```bash
 npm install -g waypoint-codex
 ```
 
-Or
+Or run it without a global install:
 
 ```bash
 npx waypoint-codex@latest --help
 ```
 
-##
+## Quick start
 
-Inside
+Inside the repo you want to prepare for Codex:
 
 ```bash
-waypoint init --with-
+waypoint init --with-roles --with-automations
 waypoint doctor
 ```
 
-That
+That gives you a repo that looks roughly like this:
 
 ```text
 repo/
 ├── AGENTS.md
-├── .agents/
+├── .agents/
+│   └── skills/
 └── .waypoint/
     ├── WORKSPACE.md
     ├── DOCS_INDEX.md
     ├── docs/
     ├── context/
+    ├── scripts/
     └── ...
 ```
 
+From there, start your Codex session in the repo and follow the generated bootstrap in `AGENTS.md`.
+
+## Common init modes
+
+### Minimal setup
+
+```bash
+waypoint init
+```
+
+### Full local workflow setup
+
+```bash
+waypoint init --with-roles --with-rules --with-automations
+```
+
+### App-friendly profile
+
+```bash
+waypoint init --app-friendly --with-roles --with-automations
+```
+
+Flags you can combine:
+
+- `--app-friendly`
+- `--with-roles`
+- `--with-rules`
+- `--with-automations`
+
 ## Main commands
 
 - `waypoint init` — scaffold or refresh the repo
-- `waypoint doctor` —
-- `waypoint sync` — rebuild
-- `waypoint upgrade` — update the
-- `waypoint import-legacy` —
+- `waypoint doctor` — validate health and report drift
+- `waypoint sync` — rebuild the docs index and sync optional user-home artifacts
+- `waypoint upgrade` — update the CLI and refresh the current repo using its saved config
+- `waypoint import-legacy` — analyze an older repo layout and produce an adoption report
 
-##
+## Built-in skills
+
+Waypoint ships a strong default skill pack for real coding work:
 
 - `planning`
 - `error-audit`
@@ -67,6 +137,8 @@ repo/
 - `pr-review`
 - `e2e-verify`
 
+These are repo-local, so the workflow travels with the project.
+
 ## Optional reviewer roles
 
 If you initialize with `--with-roles`, Waypoint scaffolds:
@@ -75,23 +147,54 @@ If you initialize with `--with-roles`, Waypoint scaffolds:
 - `code-reviewer`
 - `plan-reviewer`
 
-The intended workflow is post-commit:
+The intended workflow is post-commit: make your change, commit it, run the reviewers in parallel, fix real findings, then close out.
+
+## What makes it different
+
+Waypoint is not trying to hide everything behind hooks and background machinery.
+
+It is opinionated, but explicit:
 
-
+- state lives in files you can inspect
+- docs routing is generated, not guessed from memory
+- repo conventions are encoded in markdown
+- startup context is rebuilt on purpose
+- the repo remains the source of truth
+
+## Upgrading
+
+Recommended path:
 
 ```bash
 waypoint upgrade
 ```
 
-
+That updates the global CLI and refreshes the current repo using its existing Waypoint config.
+
+If you only want to update the CLI:
 
 ```bash
 waypoint upgrade --skip-repo-refresh
 ```
 
+## Importing an existing repo
+
+If you already have an older assistant setup or repo-memory system:
+
+```bash
+waypoint import-legacy /path/to/source-repo /path/to/new-repo --init-target
+```
+
+This generates an adoption report and helps separate durable docs from old runtime-specific scaffolding.
+
 ## Learn more
 
 - [Overview](docs/overview.md)
 - [Architecture](docs/architecture.md)
 - [Upgrading](docs/upgrading.md)
 - [Importing Existing Repositories](docs/importing-existing-repos.md)
+- [Releasing](docs/releasing.md)
+
+## License
+
+MIT. See [LICENSE](LICENSE).
package/package.json
CHANGED
package/templates/.agents/skills/break-it-qa/SKILL.md
CHANGED
@@ -12,7 +12,19 @@ This is not the same as `e2e-verify`.
 - `e2e-verify` proves the intended flow works end to end.
 - `break-it-qa` tries to make the feature fail through invalid, interrupted, stale, repeated, or out-of-order interactions.
 
-##
+## Step 1: Ask The Three Setup Questions
+
+Before testing, ask the user these questions if the answer is not already clear from context:
+
+- what exact feature or scope should this cover?
+- how many attack items should the break log reach before stopping?
+- should the skill stop at findings or also fix clear issues after they are found?
+
+Keep this intake short. These are the main user-controlled knobs for the skill.
+
+If the user does not specify a count, use a reasonable default such as `40`.
+
+## Step 2: Read First
 
 Before verification:
 
@@ -23,20 +35,94 @@ Before verification:
 5. Read every file listed in that manifest
 6. Read the routed docs or nearby code that define the feature being tested
 
-## Step
+## Step 3: Identify Break Surfaces
 
 - Identify the happy path first so you know what "broken" means.
 - Find the fragile surfaces: forms, wizards, pending states, destructive actions, async transitions, navigation changes, and persisted state.
+- For each major step or transition, ask explicit "What if...?" questions before testing. Examples:
+  - What if the user refreshes here?
+  - What if they go back now?
+  - What if they click twice?
+  - What if this input is empty, malformed, too long, or contradictory?
+  - What if this action succeeds in the UI but fails in persistence?
 
 Do not test blindly.
 
-## Step
+## Step 4: Create A Break Log
+
+Write or update a durable markdown log under `.waypoint/docs/`.
+
+- Prefer a focused path such as `.waypoint/docs/verification/<feature>-break-it-qa.md`.
+- If a routed verification doc already exists for this feature, update it instead of creating a competing file.
+- The log is part of the skill, not an optional extra.
+- Pre-generate the attack plan in this log before executing it. Do not improvise everything live.
+
+Use one item per attempted action. A good entry shape is:
+
+```markdown
+- [ ] What if the user refreshes on the confirmation step before the request finishes?
+  Step: confirmation
+  Category: navigation
+  Status: pending
+  Observed: not tried yet
+```
+
+Then update each item as you go:
+
+- `survived`
+- `broke`
+- `fixed`
+- `retested-survived`
+- `blocked`
+- `not-applicable`
+
+Every executed item must include:
+
+- `Step`
+- `Category`
+- `Status`
+- `Observed`
+
+If the user sets a target such as "make this file 150 items long before you stop," treat that as a hard stopping condition unless you hit a real blocker and explain why.
+
+Use consistent categories such as:
+
+- `navigation`
+- `input-validation`
+- `repeat-action`
+- `stale-state`
+- `error-recovery`
+- `destructive-action`
+- `permissions`
+- `async-state`
+- `persistence`
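A minimal sketch of how a break log in the entry shape above could be checked mechanically. This is not part of Waypoint or the skill. The names `parse_break_log`, `problems`, and `BreakItem` are hypothetical, and the parser assumes each item starts with a `- [ ]` or `- [x]` checkbox line followed by `Field: value` lines.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Status values named in the skill text, plus the initial "pending" from the entry shape.
KNOWN_STATUSES = {
    "pending", "survived", "broke", "fixed",
    "retested-survived", "blocked", "not-applicable",
}
REQUIRED_FIELDS = ("Step", "Category", "Status", "Observed")

ITEM_RE = re.compile(r"^- \[( |x|X)\] (.+)$")
FIELD_RE = re.compile(r"^\s*(Step|Category|Status|Observed):\s*(.*)$")


@dataclass
class BreakItem:
    question: str
    fields: dict = field(default_factory=dict)


def parse_break_log(text):
    """Collect checkbox items and the field lines that follow each one."""
    items = []
    for line in text.splitlines():
        item_match = ITEM_RE.match(line)
        if item_match:
            items.append(BreakItem(question=item_match.group(2).strip()))
            continue
        field_match = FIELD_RE.match(line)
        if field_match and items:
            items[-1].fields[field_match.group(1)] = field_match.group(2).strip()
    return items


def problems(item):
    """List missing required fields and unknown status values for one item."""
    issues = [f"missing {name}" for name in REQUIRED_FIELDS if not item.fields.get(name)]
    status = item.fields.get("Status")
    if status and status not in KNOWN_STATUSES:
        issues.append(f"unknown status {status!r}")
    return issues


# Hypothetical sample log; the second item deliberately lacks "Observed".
sample = """\
- [ ] What if the user refreshes on the confirmation step before the request finishes?
  Step: confirmation
  Category: navigation
  Status: pending
  Observed: not tried yet
- [x] What if they click submit twice?
  Step: submit
  Category: repeat-action
  Status: broke
"""

items = parse_break_log(sample)
status_counts = Counter(i.fields.get("Status", "missing") for i in items)
for i in items:
    print(i.question, problems(i))
```

A check like this only confirms that the four required fields are present and that the status is one of the listed values. It says nothing about whether the `Observed` note is meaningful.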
+
+## Step 5: Enforce Coverage Before Execution
+
+Before you start executing attacks:
+
+- pre-generate a meaningful attack list
+- spread it across the major flow steps
+- spread it across relevant categories
+- make sure the count is not satisfied by one repetitive corner of the feature
+
+Do not treat total item count alone as sufficient coverage.
+
+If the user asks for a large target such as `150`, ensure the log covers multiple steps and multiple categories instead of padding one surface.
+
+Anti-cheating rules:
+
+- no filler items
+- each attack must be meaningfully distinct
+- reworded duplicates do not count toward the target
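The coverage rules above could be approximated with a small script before execution starts. This is a rough sketch under stated assumptions, not the skill's actual mechanism: `coverage_report`, `normalize`, and the `min_steps` / `min_categories` thresholds are hypothetical, and the duplicate check is a crude word-set comparison that only catches trivially reordered wording.

```python
import re
from collections import Counter


def normalize(question):
    """Crude fingerprint: the set of lowercase words, ignoring order and punctuation."""
    return frozenset(re.findall(r"[a-z0-9]+", question.lower()))


def coverage_report(items, target, min_steps=2, min_categories=2):
    """items is a list of (question, step, category) tuples from the attack plan."""
    seen = set()
    distinct = []
    for question, step, category in items:
        key = (normalize(question), step)
        if key in seen:
            continue  # same words at the same step: treated as a reworded duplicate
        seen.add(key)
        distinct.append((question, step, category))

    steps = Counter(step for _, step, _ in distinct)
    categories = Counter(category for _, _, category in distinct)
    return {
        "distinct": len(distinct),
        "duplicates": len(items) - len(distinct),
        "steps": steps,
        "categories": categories,
        # Only distinct items count toward the requested target.
        "meets_target": len(distinct) >= target,
        # Item count alone is not coverage: require spread across steps and categories.
        "spread_ok": len(steps) >= min_steps and len(categories) >= min_categories,
    }


# Hypothetical plan; the second entry is a reworded copy of the first.
plan = [
    ("What if the user refreshes here?", "confirmation", "navigation"),
    ("What if here the user refreshes?", "confirmation", "navigation"),
    ("What if they click twice?", "submit", "repeat-action"),
]
report = coverage_report(plan, target=3)
print(report["distinct"], report["duplicates"], report["meets_target"], report["spread_ok"])
```

A word-set fingerprint will miss paraphrases that use different vocabulary, so a human or agent review of distinctness is still needed.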
+
+## Step 6: Use The Real UI
 
 - Use `playwright-interactive`.
 - Exercise the actual UI instead of mocking the flow in code.
 - Keep the scope focused on the feature the user asked you to verify.
 
-## Step
+## Step 7: Try To Break It On Purpose
 
 Do more than a happy-path walkthrough.
 
@@ -58,30 +144,41 @@ Actively try:
 
 If the feature is stateful, also check whether the UI, network result, and persisted state stay coherent after those interactions.
 
-
+As you test, keep expanding the break log with new "What if...?" cases that emerge from the flow. Do not rely on memory or chat-only notes.
+
+## Step 8: Record And Fix Real Bugs
 
 - Document each meaningful issue you find.
-- Fix the issue when the remediation is clear.
+- Fix the issue when the remediation is clear and the chosen mode includes fixes.
 - If the behavior is ambiguous, call out the product decision instead of bluffing a fix.
 - Update docs when the verification exposes stale assumptions about how the feature works.
+- Update the break log entry for each attempted action with what happened and whether the feature survived.
+- Require a short observed-result note for every executed item. "Worked" is too weak; capture what actually happened.
 
 Do not stop at the first bug.
 
-## Step
+## Step 9: Repeat Until The Feature Resists Abuse
 
 After fixes:
 
 - rerun the relevant happy path
 - rerun the break attempts that previously failed
+- rerun directly related attacks
+- rerun neighboring attacks that touch the same step, state transition, or failure surface
 - verify the fix did not create a new inconsistent state
+- keep adding and executing new "What if...?" items until the requested target coverage is reached
 
 The skill is not done when the feature only works once. It is done when the feature behaves predictably under sloppy real-world use.
 
-## Step
+## Step 10: Report Truthfully
 
 Summarize:
 
+- the path to the break log markdown file
+- how many attack items were recorded and exercised
+- how coverage was distributed across steps and categories
 - what break attempts you tried
 - which issues you found
 - what you fixed
+- a short systemic-risks summary describing recurring weakness patterns, not just individual bugs
 - what still looks risky or was not exercised
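The countable parts of the Step 10 summary could be derived from the break log entries, as in the sketch below. It makes several assumptions that the skill text does not state: `summarize` is a hypothetical helper, the log path uses a made-up `checkout` feature name, items with status `pending`, `blocked`, or `not-applicable` are treated as not exercised, and `fixed` or `retested-survived` are read as issues that were found and then fixed.

```python
from collections import Counter

# Assumption: these statuses mean the attack was not actually exercised.
NOT_EXERCISED = {"pending", "blocked", "not-applicable"}
# Assumption: any of these statuses means an issue was found at some point.
FOUND = {"broke", "fixed", "retested-survived"}
FIXED = {"fixed", "retested-survived"}


def fmt(counter):
    """Render counts in a stable, sorted order."""
    return ", ".join(f"{key}={count}" for key, count in sorted(counter.items()))


def summarize(log_path, entries):
    """entries is a list of dicts with Step, Category, and Status keys."""
    exercised = [e for e in entries if e["Status"] not in NOT_EXERCISED]
    by_step = Counter(e["Step"] for e in exercised)
    by_category = Counter(e["Category"] for e in exercised)
    return "\n".join([
        f"Break log: {log_path}",
        f"Items recorded: {len(entries)}; exercised: {len(exercised)}",
        f"By step: {fmt(by_step)}",
        f"By category: {fmt(by_category)}",
        f"Issues found: {sum(e['Status'] in FOUND for e in entries)}",
        f"Issues fixed: {sum(e['Status'] in FIXED for e in entries)}",
        f"Not exercised: {len(entries) - len(exercised)}",
    ])


# Hypothetical entries for illustration only.
entries = [
    {"Step": "confirmation", "Category": "navigation", "Status": "survived"},
    {"Step": "submit", "Category": "repeat-action", "Status": "fixed"},
    {"Step": "submit", "Category": "async-state", "Status": "pending"},
]
summary = summarize(".waypoint/docs/verification/checkout-break-it-qa.md", entries)
print(summary)
```

The systemic-risks summary and the description of what still looks risky are judgment calls, so they are left out of the sketch on purpose.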