@oxgeneral/orch 1.0.6 → 1.0.8
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/{App-LEVUTWQN.js → App-5OVBVRCD.js} +1 -1
- package/dist/{agent-Q34L27AY.js → agent-SI4JF5MV.js} +1 -1
- package/dist/{agent-shop-D2RS4BZK.js → agent-shop-JHDTCWCD.js} +1 -1
- package/dist/chunk-3AXNSYCM.js +2 -0
- package/dist/{chunk-BCPUTULS.js → chunk-HWEMBO36.js} +83 -54
- package/dist/chunk-J7ITYXE6.js +116 -0
- package/dist/chunk-J7ITYXE6.js.map +1 -0
- package/dist/{chunk-4TDXD3LA.js → chunk-SWNSNPBO.js} +12 -2
- package/dist/chunk-SWNSNPBO.js.map +1 -0
- package/dist/chunk-U2JVMD2G.js +66 -0
- package/dist/chunk-U2JVMD2G.js.map +1 -0
- package/dist/{chunk-EH3HRQP4.js → chunk-W3J7CURM.js} +8 -116
- package/dist/chunk-W3J7CURM.js.map +1 -0
- package/dist/chunk-ZMLF5HI5.js +11 -0
- package/dist/cli.js +1 -1
- package/dist/container-SEIWOLHY.js +4 -0
- package/dist/doctor-Q3GHJNZL.js +2 -0
- package/dist/index.d.ts +32 -1
- package/dist/index.js +12 -5
- package/dist/index.js.map +1 -1
- package/dist/init-D4356W7G.js +73 -0
- package/dist/orchestrator-G3Y7THMG.js +6 -0
- package/dist/{orchestrator-XPEMMBOO.js.map → orchestrator-G3Y7THMG.js.map} +1 -1
- package/dist/{orchestrator-JOTMB5XT.js → orchestrator-GQLNLOXB.js} +8 -4
- package/dist/{org-WAK3CDPG.js → org-KLYK6MMJ.js} +1 -1
- package/dist/skill-loader-IGRIELEM.js +9 -0
- package/dist/skill-loader-RHCFIK74.js +4 -0
- package/dist/skill-loader-RHCFIK74.js.map +1 -0
- package/dist/{task-QFLIIRKZ.js → task-3R2IX4HM.js} +1 -1
- package/dist/{tui-BJHZBCIR.js → tui-47O2OCKC.js} +1 -1
- package/dist/{workspace-manager-5EYCMAEO.js → workspace-manager-RH24FSNT.js} +4 -3
- package/dist/workspace-manager-RH24FSNT.js.map +1 -0
- package/dist/workspace-manager-VJ4FN5PJ.js +3 -0
- package/package.json +4 -3
- package/readme.md +11 -0
- package/scripts/{postinstall.js → postinstall.cjs} +24 -5
- package/skills/library/autoplan.md +315 -0
- package/skills/library/benchmark.md +242 -0
- package/skills/library/browse.md +266 -0
- package/skills/library/canary.md +248 -0
- package/skills/library/careful.md +42 -0
- package/skills/library/codex.md +431 -0
- package/skills/library/design-consultation.md +367 -0
- package/skills/library/design-review.md +744 -0
- package/skills/library/document-release.md +365 -0
- package/skills/library/freeze.md +60 -0
- package/skills/library/guard.md +55 -0
- package/skills/library/investigate.md +171 -0
- package/skills/library/land-and-deploy.md +636 -0
- package/skills/library/office-hours.md +746 -0
- package/skills/library/plan-ceo-review.md +1029 -0
- package/skills/library/plan-design-review.md +428 -0
- package/skills/library/plan-eng-review.md +420 -0
- package/skills/library/qa-only.md +388 -0
- package/skills/library/qa.md +766 -0
- package/skills/library/retro.md +532 -0
- package/skills/library/review.md +421 -0
- package/skills/library/setup-browser-cookies.md +86 -0
- package/skills/library/setup-deploy.md +211 -0
- package/skills/library/ship.md +1018 -0
- package/skills/library/unfreeze.md +31 -0
- package/skills/library/upgrade.md +220 -0
- package/skills/orch/SKILL.md +416 -0
- package/dist/chunk-4TDXD3LA.js.map +0 -1
- package/dist/chunk-EH3HRQP4.js.map +0 -1
- package/dist/chunk-WVJTXBPL.js +0 -11
- package/dist/container-FXUUV6PP.js +0 -4
- package/dist/doctor-P2J6VAUX.js +0 -2
- package/dist/init-PTAYCSMO.js +0 -53
- package/dist/orchestrator-XPEMMBOO.js +0 -6
- package/dist/workspace-manager-5EYCMAEO.js.map +0 -1
- package/dist/workspace-manager-XKOZ5WM6.js +0 -3
package/skills/library/autoplan.md
@@ -0,0 +1,315 @@
---
name: autoplan
version: 1.0.0
description: |
  Auto-review pipeline — reads the full CEO, design, and eng review skills from disk
  and runs them sequentially with auto-decisions using 6 decision principles. Surfaces
  taste decisions (close approaches, borderline scope, codex disagreements) at a final
  approval gate. One command, fully reviewed plan out.
  Use when asked to "auto review", "autoplan", "run all reviews", "review this plan
  automatically", or "make the decisions for me".
  Proactively suggest when the user has a plan file and wants to run the full review
  gauntlet without answering 15-30 intermediate questions.
---

## Step 0: Detect base branch

Determine which branch this PR targets. Use the result as "the base branch" in all subsequent steps.

1. Check if a PR already exists for this branch:
   `gh pr view --json baseRefName -q .baseRefName`
   If this succeeds, use the printed branch name as the base branch.

2. If no PR exists (command fails), detect the repo's default branch:
   `gh repo view --json defaultBranchRef -q .defaultBranchRef.name`

3. If both commands fail, fall back to `main`.

Print the detected base branch name. In every subsequent `git diff`, `git log`,
`git fetch`, `git merge`, and `gh pr create` command, substitute the detected
branch name wherever the instructions say "the base branch."
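
The three-step fallback above can be sketched as one bash function. This is an illustration rather than code shipped with the skill: `detect_base_branch` is a hypothetical name, and it assumes `gh` is installed and authenticated. If it is not, both lookups fail quietly and the function falls back to `main`.

```bash
# Hypothetical helper: PR base branch, else repo default branch, else "main".
detect_base_branch() {
  local base
  # 1. Base branch of an open PR for the current branch (fails if there is no PR or no gh).
  base=$(gh pr view --json baseRefName -q .baseRefName 2>/dev/null) && [ -n "$base" ] && { echo "$base"; return; }
  # 2. The repository's default branch.
  base=$(gh repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null) && [ -n "$base" ] && { echo "$base"; return; }
  # 3. Last resort.
  echo main
}

BASE=$(detect_base_branch)
echo "Base branch: $BASE"
```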

---

## Prerequisite Skill Offer

When the design doc discovery in Phase 0, Step 2 finds no design doc, offer the prerequisite
skill before proceeding.

Say to the user via AskUserQuestion:

> "No design doc found for this branch. `/office-hours` produces a structured problem
> statement, premise challenge, and explored alternatives — it gives this review much
> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
> not per-product — it captures the thinking behind this specific change."

Options:
- A) Run /office-hours first (in another window, then come back)
- B) Skip — proceed with standard review

If they skip: "No worries — standard review. If you ever want sharper input, try
/office-hours first next time." Then proceed normally. Do not re-offer later in the session.

# /autoplan — Auto-Review Pipeline

One command. Rough plan in, fully reviewed plan out.

/autoplan reads the full CEO, design, and eng review skill files from disk and follows
them at full depth — same rigor, same sections, same methodology as running each skill
manually. The only difference: intermediate AskUserQuestion calls are auto-decided using
the 6 principles below. Taste decisions (where reasonable people could disagree) are
surfaced at a final approval gate.

---

## The 6 Decision Principles

These rules auto-answer every intermediate question:

1. **Choose completeness** — Ship the whole thing. Pick the approach that covers more edge cases.
2. **Boil lakes** — Fix everything in the blast radius (files modified by this plan + direct importers). Auto-approve expansions that are in blast radius AND < 1 day CC effort (< 5 files, no new infra).
3. **Pragmatic** — If two options fix the same thing, pick the cleaner one. 5 seconds choosing, not 5 minutes.
4. **DRY** — Duplicates existing functionality? Reject. Reuse what exists.
5. **Explicit over clever** — 10-line obvious fix > 200-line abstraction. Pick what a new contributor reads in 30 seconds.
6. **Bias toward action** — Merge > review cycles > stale deliberation. Flag concerns but don't block.

**Conflict resolution (context-dependent tiebreakers):**
- **CEO phase:** P1 (completeness) + P2 (boil lakes) dominate.
- **Eng phase:** P5 (explicit) + P3 (pragmatic) dominate.
- **Design phase:** P5 (explicit) + P1 (completeness) dominate.

---

## Decision Classification

Every auto-decision is classified:

**Mechanical** — one clearly right answer. Auto-decide without surfacing it at the gate (it is still logged in the audit trail).
Examples: run codex (always yes), run evals (always yes), reduce scope on a complete plan (always no).

**Taste** — reasonable people could disagree. Auto-decide with recommendation, but surface at the final gate. Three natural sources:
1. **Close approaches** — top two are both viable with different tradeoffs.
2. **Borderline scope** — in blast radius but 3-5 files, or ambiguous radius.
3. **Codex disagreements** — codex recommends differently and has a valid point.

---

## Phase 0: Intake + Restore Point

### Step 1: Capture restore point

Before doing anything, save the plan file's current state to an external file:

```bash
# Project slug detection (adapt to your project structure)
eval $(~/.claude/skills/orch/bin/orch-slug 2>/dev/null || echo "SLUG=unknown")
mkdir -p .orch/reports
BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-')
DATETIME=$(date +%Y%m%d-%H%M%S)
# Make sure the restore directory exists before anything is written there
mkdir -p "$HOME/.orch/projects/$SLUG"
echo "RESTORE_PATH=$HOME/.orch/projects/$SLUG/${BRANCH}-autoplan-restore-${DATETIME}.md"
```

Write the plan file's full contents to the restore path with this header:
```
# /autoplan Restore Point
Captured: [timestamp] | Branch: [branch] | Commit: [short hash]

## Re-run Instructions
1. Copy "Original Plan State" below back to your plan file
2. Invoke /autoplan

## Original Plan State
[verbatim plan file contents]
```

Then prepend a one-line HTML comment to the plan file:
`<!-- /autoplan restore point: [RESTORE_PATH] -->`
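
The write-and-prepend step can be sketched in bash as follows. This is a hedged illustration, not the skill's own code: `write_restore_point` is a hypothetical helper, the caller passes in the plan file and restore path, and `Captured:` uses UTC. It prepends unconditionally, so running it twice leaves two marker comments.

```bash
# Hypothetical helper: snapshot the plan file into the restore path, then prepend the marker.
write_restore_point() {
  local plan_file=$1 restore_path=$2 branch commit tmp
  branch=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo unknown)
  commit=$(git rev-parse --short HEAD 2>/dev/null || echo unknown)
  mkdir -p "$(dirname "$restore_path")"
  {
    echo "# /autoplan Restore Point"
    echo "Captured: $(date -u +%Y-%m-%dT%H:%M:%SZ) | Branch: $branch | Commit: $commit"
    echo
    echo "## Re-run Instructions"
    echo '1. Copy "Original Plan State" below back to your plan file'
    echo "2. Invoke /autoplan"
    echo
    echo "## Original Plan State"
    cat "$plan_file"
  } > "$restore_path"
  # Prepend via a temp file; writing back with cat keeps the plan file's permissions.
  tmp=$(mktemp)
  { echo "<!-- /autoplan restore point: $restore_path -->"; cat "$plan_file"; } > "$tmp"
  cat "$tmp" > "$plan_file"
  rm -f "$tmp"
}
```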

### Step 2: Read context

- Read CLAUDE.md and TODOS.md, then run `git log -30` and `git diff <base-branch>...HEAD --stat` (using the base branch from Step 0)
- Discover design docs: `ls -t .orch/reports/*-design-*.md 2>/dev/null | head -1`
- Detect UI scope: grep the plan for view/rendering terms (component, screen, form,
  button, modal, layout, dashboard, sidebar, nav, dialog). Require 2+ matches. Exclude
  false positives ("page" alone, "UI" in acronyms).
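
A minimal sketch of the UI-scope heuristic, assuming "2+ matches" means at least two distinct terms, each matched case-insensitively as a whole word with an optional plural "s". `detect_ui_scope` is a hypothetical helper; "page" and "UI" are simply left out of the term list rather than filtered afterwards.

```bash
# Hypothetical helper: print "yes" if at least two distinct UI terms appear in the plan.
detect_ui_scope() {
  local plan_file=$1 hits=0 term
  for term in component screen form button modal layout dashboard sidebar nav dialog; do
    # -w: whole word, -i: case-insensitive, -E: "s?" allows a plural form
    grep -qiwE -- "${term}s?" "$plan_file" && hits=$((hits + 1))
  done
  if [ "$hits" -ge 2 ]; then echo yes; else echo no; fi
}
```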

### Step 3: Load skill files from disk

Read each file using the Read tool:
- `~/.claude/skills/orch/plan-ceo-review/SKILL.md`
- `~/.claude/skills/orch/plan-design-review/SKILL.md` (only if UI scope detected)
- `~/.claude/skills/orch/plan-eng-review/SKILL.md`

**Section skip list — when following a loaded skill file, SKIP these sections
(they are already handled by /autoplan):**
- Preamble (run first)
- AskUserQuestion Format
- Completeness Principle — Boil the Lake
- Search Before Building
- Contributor Mode
- Completion Status Protocol
- Telemetry (run last)
- Step 0: Detect base branch
- Review Readiness Dashboard
- Plan File Review Report
- Prerequisite Skill Offer (BENEFITS_FROM)

Follow ONLY the review-specific methodology, sections, and required outputs.

Output: "Here's what I'm working with: [plan summary]. UI scope: [yes/no].
Loaded review skills from disk. Starting full review pipeline with auto-decisions."

---

## Phase 1: CEO Review (Strategy & Scope)

Follow plan-ceo-review/SKILL.md — all sections, full depth.
Override: every AskUserQuestion → auto-decide using the 6 principles.

**Override rules:**
- Mode selection: SELECTIVE EXPANSION
- Premises: accept reasonable ones (P6), challenge only clearly wrong ones
- **GATE: Present premises to user for confirmation** — this is the ONE AskUserQuestion
  in the review phases that is NOT auto-decided. Premises require human judgment.
- Alternatives: pick highest completeness (P1). If tied, pick simplest (P5).
  If top 2 are close → mark TASTE DECISION.
- Scope expansion: in blast radius + <1d CC → approve (P2). Outside → defer to TODOS.md (P3).
  Duplicates → reject (P4). Borderline (3-5 files) → mark TASTE DECISION.
- All 10 review sections: run fully, auto-decide each issue, log every decision.
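
The scope-expansion rule combines principles 2 to 4 with the taste criterion, and its thresholds overlap slightly ("< 5 files" for auto-approval versus "3-5 files" for borderline). The hypothetical helper below reads them in one order: duplicates are rejected, anything outside the blast radius is deferred, in-radius work with new infra or more than 5 files is deferred, 3 to 5 files is a taste decision, and fewer than 3 files is approved. That ordering is an assumption; the skill does not spell it out.

```bash
# Hypothetical helper: classify one proposed scope expansion.
# Args: in_blast_radius (yes/no), files_touched (integer), new_infra (yes/no), duplicates_existing (yes/no)
classify_expansion() {
  local in_radius=$1 files=$2 new_infra=$3 duplicate=$4
  if [ "$duplicate" = yes ]; then echo "REJECT (P4)"; return; fi
  if [ "$in_radius" != yes ]; then echo "DEFER to TODOS.md (P3)"; return; fi
  # Assumption: new infra or more than 5 files exceeds the <1 day CC budget, so defer.
  if [ "$new_infra" = yes ] || [ "$files" -gt 5 ]; then echo "DEFER to TODOS.md (P3)"; return; fi
  if [ "$files" -ge 3 ]; then echo "TASTE DECISION"; return; fi
  echo "APPROVE (P2)"
}
```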

---

## Phase 2: Design Review (conditional — skip if no UI scope)

Follow plan-design-review/SKILL.md — all 7 dimensions, full depth.
Override: every AskUserQuestion → auto-decide using the 6 principles.

**Override rules:**
- Focus areas: all relevant dimensions (P1)
- Structural issues (missing states, broken hierarchy): auto-fix (P5)
- Aesthetic/taste issues: mark TASTE DECISION
- Design system alignment: auto-fix if DESIGN.md exists and fix is obvious

---

## Phase 3: Eng Review + Codex

Follow plan-eng-review/SKILL.md — all sections, full depth.
Override: every AskUserQuestion → auto-decide using the 6 principles.

**Override rules:**
- Scope challenge: never reduce (P2)
- Codex review: always run if available (P6)
  Command: `codex exec "Review this plan for architectural issues, missing edge cases, and hidden complexity. Be adversarial. File: <plan_path>" -s read-only --enable web_search_cached`
  Timeout: 10 minutes, then proceed with "Codex timed out — single-reviewer mode"
- Architecture choices: explicit over clever (P5). If codex disagrees with valid reason → TASTE DECISION.
- Evals: always include all relevant suites (P1)
- Test plan: generate artifact at `.orch/reports/{user}-{branch}-test-plan-{datetime}.md`
- TODOS.md: collect all scope expansions deferred in Phase 1 and write them to TODOS.md without asking
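
The Codex call with its 10-minute cap might look like this in bash. It is a sketch under assumptions: it relies on the coreutils `timeout` command (exit status 124 on timeout; on macOS it is usually installed as `gtimeout`), and `run_codex_review`, `CODEX_BIN`, and `CODEX_TIMEOUT` are hypothetical names introduced for illustration.

```bash
# Hypothetical wrapper: run the Codex review with a time limit and never fail the pipeline.
run_codex_review() {
  local plan_path=$1
  local codex_bin=${CODEX_BIN:-codex} limit=${CODEX_TIMEOUT:-600} status
  if ! command -v "$codex_bin" >/dev/null 2>&1; then
    echo "Codex unavailable"
    return 0
  fi
  timeout "$limit" "$codex_bin" exec "Review this plan for architectural issues, missing edge cases, and hidden complexity. Be adversarial. File: $plan_path" \
    -s read-only --enable web_search_cached
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "Codex timed out — single-reviewer mode"
  elif [ "$status" -ne 0 ]; then
    echo "Codex exited with status $status; continuing in single-reviewer mode"
  fi
  return 0   # "Never abort": a Codex problem only downgrades to single-reviewer mode
}
```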

---

## Decision Audit Trail

Add this log section to the plan file once, then append one row per auto-decision using Edit:

```markdown
<!-- AUTONOMOUS DECISION LOG -->
## Decision Audit Trail

| # | Phase | Decision | Principle | Rationale | Rejected |
|---|-------|----------|-----------|-----------|----------|
```

Write rows incrementally, as each decision is made (via Edit). This keeps the audit on disk,
not accumulated in conversation context.
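
The skill performs these appends with the Edit tool. For illustration only, a shell equivalent could look like the sketch below. `log_decision` and `md_cell` are hypothetical helpers, and the sketch assumes the log section is the last thing in the plan file, since it appends at end of file. Literal `|` characters in a cell are escaped so they cannot split the table.

```bash
# Escape pipes so a cell value cannot break the markdown table.
md_cell() { printf '%s' "$1" | sed 's/|/\\|/g'; }

# Hypothetical helper: create the log section on first use, then append one row.
log_decision() {
  local plan_file=$1 n=$2 phase=$3 decision=$4 principle=$5 rationale=$6 rejected=$7
  if ! grep -qF '<!-- AUTONOMOUS DECISION LOG -->' "$plan_file" 2>/dev/null; then
    {
      echo
      echo '<!-- AUTONOMOUS DECISION LOG -->'
      echo '## Decision Audit Trail'
      echo
      echo '| # | Phase | Decision | Principle | Rationale | Rejected |'
      echo '|---|-------|----------|-----------|-----------|----------|'
    } >> "$plan_file"
  fi
  printf '| %s | %s | %s | %s | %s | %s |\n' "$n" "$(md_cell "$phase")" "$(md_cell "$decision")" \
    "$(md_cell "$principle")" "$(md_cell "$rationale")" "$(md_cell "$rejected")" >> "$plan_file"
}
```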

---

## Phase 4: Final Approval Gate

**STOP here and present the final state to the user.**

Present as a message, then use AskUserQuestion:

```
## /autoplan Review Complete

### Plan Summary
[1-3 sentence summary]

### Decisions Made: [N] total ([M] auto-decided, [K] choices for you)

### Your Choices (taste decisions)
[For each taste decision:]
**Choice [N]: [title]** (from [phase])
I recommend [X] — [principle]. But [Y] is also viable:
  [1-sentence downstream impact if you pick Y]

### Auto-Decided: [M] decisions [see Decision Audit Trail in plan file]

### Review Scores
- CEO: [summary]
- Design: [summary or "skipped, no UI scope"]
- Eng: [summary]
- Codex: [summary or "unavailable"]

### Deferred to TODOS.md
[Items auto-deferred with reasons]
```

**Cognitive load management:**
- 0 taste decisions: skip "Your Choices" section
- 1-7 taste decisions: flat list
- 8+: group by phase. Add warning: "This plan had unusually high ambiguity ([N] taste decisions). Review carefully."
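
The cognitive-load rule is a three-way threshold on the number of taste decisions. A minimal bash rendering, with `choices_layout` as a hypothetical helper that prints the layout and, for 8 or more decisions, the warning line:

```bash
# Hypothetical helper: choose how to render "Your Choices" for n taste decisions.
choices_layout() {
  local n=$1
  if [ "$n" -eq 0 ]; then
    echo "skip"
  elif [ "$n" -le 7 ]; then
    echo "flat"
  else
    echo "grouped"
    echo "This plan had unusually high ambiguity ($n taste decisions). Review carefully."
  fi
}
```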

AskUserQuestion options:
- A) Approve as-is (accept all recommendations)
- B) Approve with overrides (specify which taste decisions to change)
- C) Interrogate (ask about any specific decision)
- D) Revise (the plan itself needs changes)
- E) Reject (start over)

**Option handling:**
- A: mark APPROVED, write review logs, suggest /ship
- B: ask which overrides, apply, re-present gate
- C: answer freeform, re-present gate
- D: make changes, re-run affected phases (scope→1B, design→2, test plan→3, arch→3). Max 3 cycles.
- E: start over

---

## Completion: Write Review Logs

On approval, write one review log entry per phase that ran (CEO and Eng always, Design only if Phase 2 ran) so /ship's dashboard recognizes them:

```bash
COMMIT=$(git rev-parse --short HEAD 2>/dev/null)
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# Review log (adapt to your review tracking system)
# '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}'

# Review log (adapt to your review tracking system)
# '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"issues_found":0,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}'
```

If Phase 2 ran (UI scope):
```bash
# Review log (adapt to your review tracking system)
# '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"via":"autoplan","commit":"'"$COMMIT"'"}'
```

Replace field values with actual counts from the review.
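
The commented-out entries above leave the storage location open. One way to persist them is one JSON object per line. In the sketch below, `write_review_log` is a hypothetical helper and the default `.orch/reports/review-log.jsonl` path is an assumption, so point `REVIEW_LOG` at whatever /ship's dashboard actually reads. Status and counts are passed in so the real values from the review replace the zeros.

```bash
# Hypothetical JSONL sink; the default path is an assumption.
REVIEW_LOG=${REVIEW_LOG:-.orch/reports/review-log.jsonl}
COMMIT=$(git rev-parse --short HEAD 2>/dev/null || echo unknown)
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# Args: skill, status, unresolved count, extra JSON fields (each ending in a comma, or '').
write_review_log() {
  local skill=$1 status=$2 unresolved=$3 extra=$4
  mkdir -p "$(dirname "$REVIEW_LOG")"
  printf '{"skill":"%s","timestamp":"%s","status":"%s","unresolved":%s,%s"via":"autoplan","commit":"%s"}\n' \
    "$skill" "$TIMESTAMP" "$status" "$unresolved" "$extra" "$COMMIT" >> "$REVIEW_LOG"
}
```

For example, after approval: `write_review_log plan-ceo-review clean 0 '"critical_gaps":0,"mode":"SELECTIVE_EXPANSION",'`, the equivalent call for `plan-eng-review`, and `write_review_log plan-design-review clean 0 ''` only if Phase 2 ran.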

Suggest next step: `/ship` when ready to create the PR.

---

## Important Rules

- **Never abort.** The user chose /autoplan. Respect that choice. Surface all taste decisions; never redirect to interactive review.
- **Premises are the one mid-pipeline gate.** Apart from the optional prerequisite offer and the Phase 4 approval gate, the only AskUserQuestion that is not auto-decided is the premise confirmation in Phase 1.
- **Log every decision.** No silent auto-decisions. Every choice gets a row in the audit trail.
- **Full depth.** Do not compress or skip sections from the loaded skill files (except the skip list in Phase 0).
- **Sequential order.** CEO → Design → Eng. Each phase builds on the last.

package/skills/library/benchmark.md
@@ -0,0 +1,242 @@
---
name: benchmark
version: 1.0.0
description: |
  Performance regression detection using the browse daemon. Establishes
  baselines for page load times, Core Web Vitals, and resource sizes.
  Compares before/after on every PR. Tracks performance trends over time.
  Use when: "performance", "benchmark", "page speed", "lighthouse", "web vitals",
  "bundle size", "load time".
---

## SETUP (run this check BEFORE any browse command)

```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/orch/browse/dist/browse" ] && B="$_ROOT/.claude/skills/orch/browse/dist/browse"
[ -z "$B" ] && B=~/.claude/skills/orch/browse/dist/browse
if [ -x "$B" ]; then
  echo "READY: $B"
else
  echo "NEEDS_SETUP"
fi
```

If `NEEDS_SETUP`:
1. Tell the user: "orch browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`

# /benchmark — Performance Regression Detection

You are a **Performance Engineer** who has optimized apps serving millions of requests. You know that performance doesn't degrade in one big regression — it dies by a thousand paper cuts. Each PR adds 50ms here, 20KB there, and one day the app takes 8 seconds to load and nobody knows when it got slow.

Your job is to measure, baseline, compare, and alert. You use the browse daemon's `perf` command and JavaScript evaluation to gather real performance data from running pages.

## User-invocable
When the user types `/benchmark`, run this skill.

## Arguments
- `/benchmark <url>` — full performance audit with baseline comparison
- `/benchmark <url> --baseline` — capture baseline (run before making changes)
- `/benchmark <url> --quick` — single-pass timing check (no baseline needed)
- `/benchmark <url> --pages /,/dashboard,/api/health` — specify pages
- `/benchmark --diff` — benchmark only pages affected by current branch
- `/benchmark --trend` — show performance trends from historical data

## Instructions

### Phase 1: Setup

```bash
eval $(~/.claude/skills/orch/bin/orch-slug 2>/dev/null || echo "SLUG=unknown")
mkdir -p .orch/benchmark-reports
mkdir -p .orch/benchmark-reports/baselines
```

### Phase 2: Page Discovery

Same as /canary — auto-discover from navigation or use `--pages`.

If `--diff` mode:
```bash
git diff $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || gh repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null || echo main)...HEAD --name-only
```

### Phase 3: Performance Data Collection

For each page, collect comprehensive performance metrics:

```bash
$B goto <page-url>
$B perf
```

Then gather detailed metrics via JavaScript:

```bash
$B eval "JSON.stringify(performance.getEntriesByType('navigation')[0])"
```

Extract key metrics:
- **TTFB** (Time to First Byte): `responseStart` (the navigation entry's `startTime` is 0, so this includes redirect, DNS, and connection time; `responseStart - requestStart` isolates server response time)
- **FCP** (First Contentful Paint): from PerformanceObserver or `paint` entries
- **LCP** (Largest Contentful Paint): from PerformanceObserver
- **DOM Interactive**: `domInteractive` (navigation-entry times are already relative to navigation start; `PerformanceNavigationTiming` has no `navigationStart` field)
- **DOM Complete**: `domComplete`
- **Full Load**: `loadEventEnd`

Resource analysis:
```bash
$B eval "JSON.stringify(performance.getEntriesByType('resource').map(r => ({name: r.name.split('/').pop().split('?')[0], type: r.initiatorType, size: r.transferSize, duration: Math.round(r.duration)})).sort((a,b) => b.duration - a.duration).slice(0,15))"
```

Bundle size check:
```bash
$B eval "JSON.stringify(performance.getEntriesByType('resource').filter(r => r.initiatorType === 'script').map(r => ({name: r.name.split('/').pop().split('?')[0], size: r.transferSize})))"
$B eval "JSON.stringify(performance.getEntriesByType('resource').filter(r => r.name.split('?')[0].endsWith('.css')).map(r => ({name: r.name.split('/').pop().split('?')[0], size: r.transferSize})))"
```

Stylesheets loaded via `<link>` report `initiatorType` `link`, not `css` (`css` marks resources requested from inside a stylesheet, such as fonts), so the CSS check matches on the `.css` extension instead. `transferSize` is 0 for cache hits and for cross-origin resources served without `Timing-Allow-Origin`, so treat zero sizes with caution.

Network summary:
```bash
$B eval "(() => { const r = performance.getEntriesByType('resource'); return JSON.stringify({total_requests: r.length, total_transfer: r.reduce((s,e) => s + (e.transferSize||0), 0), by_type: Object.entries(r.reduce((a,e) => { a[e.initiatorType] = (a[e.initiatorType]||0) + 1; return a; }, {})).sort((a,b) => b[1]-a[1])})})()"
```

### Phase 4: Baseline Capture (--baseline mode)

Save metrics to baseline file:

```json
{
  "url": "<url>",
  "timestamp": "<ISO>",
  "branch": "<branch>",
  "pages": {
    "/": {
      "ttfb_ms": 120,
      "fcp_ms": 450,
      "lcp_ms": 800,
      "dom_interactive_ms": 600,
      "dom_complete_ms": 1200,
      "full_load_ms": 1400,
      "total_requests": 42,
      "total_transfer_bytes": 1250000,
      "js_bundle_bytes": 450000,
      "css_bundle_bytes": 85000,
      "largest_resources": [
        {"name": "main.js", "size": 320000, "duration": 180},
        {"name": "vendor.js", "size": 130000, "duration": 90}
      ]
    }
  }
}
```

Write to `.orch/benchmark-reports/baselines/baseline.json`.

### Phase 5: Comparison

If baseline exists, compare current metrics against it:

```
PERFORMANCE REPORT — [url]
══════════════════════════
Branch: [current-branch] vs baseline ([baseline-branch])

Page: /
─────────────────────────────────────────────────────
Metric            Baseline    Current    Delta     Status
────────          ────────    ───────    ─────     ──────
TTFB              120ms       135ms      +15ms     OK
FCP               450ms       480ms      +30ms     OK
LCP               800ms       1600ms     +800ms    REGRESSION
DOM Interactive   600ms       650ms      +50ms     OK
DOM Complete      1200ms      1350ms     +150ms    OK
Full Load         1400ms      2100ms     +700ms    REGRESSION
Total Requests    42          58         +16       WARNING
Transfer Size     1.2MB       1.8MB      +0.6MB    REGRESSION
JS Bundle         450KB       720KB      +270KB    REGRESSION
CSS Bundle        85KB        88KB       +3KB      OK

REGRESSIONS DETECTED: 4
  [1] LCP doubled (800ms → 1600ms) — likely a large new image or blocking resource
  [2] Full load +700ms (1400ms → 2100ms), over the 500ms absolute threshold
  [3] Total transfer +50% (1.2MB → 1.8MB) — check new JS bundles
  [4] JS bundle +60% (450KB → 720KB) — new dependency or missing tree-shaking
```

**Regression thresholds:**
- Timing metrics: >50% increase OR >500ms absolute increase = REGRESSION
- Timing metrics: >20% increase = WARNING
- Bundle size: >25% increase = REGRESSION
- Bundle size: >10% increase = WARNING
- Request count: >30% increase = WARNING
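
The thresholds above translate directly into integer comparisons. A minimal bash sketch, with `classify_delta` as a hypothetical helper: it takes baseline and current values in the same unit and checks the REGRESSION conditions before the WARNING ones, so a metric that crosses both gets the stronger label. Percentages are compared as `delta * 100` against `base * pct` to stay in integer arithmetic.

```bash
# Hypothetical helper. kind: timing | bundle | requests; base and cur share a unit (ms, bytes, count).
classify_delta() {
  local kind=$1 base=$2 cur=$3
  local delta=$((cur - base))
  case $kind in
    timing)
      if [ $((delta * 100)) -gt $((base * 50)) ] || [ "$delta" -gt 500 ]; then echo REGRESSION
      elif [ $((delta * 100)) -gt $((base * 20)) ]; then echo WARNING
      else echo OK; fi ;;
    bundle)
      if [ $((delta * 100)) -gt $((base * 25)) ]; then echo REGRESSION
      elif [ $((delta * 100)) -gt $((base * 10)) ]; then echo WARNING
      else echo OK; fi ;;
    requests)
      if [ $((delta * 100)) -gt $((base * 30)) ]; then echo WARNING
      else echo OK; fi ;;
    *) echo "unknown kind: $kind" >&2; return 1 ;;
  esac
}
```

For example, `classify_delta timing 1400 2100` prints `REGRESSION` through the 500ms absolute rule even though the relative increase is exactly 50%, and `classify_delta requests 42 58` prints `WARNING` (+38%).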

### Phase 6: Slowest Resources

```
TOP 10 SLOWEST RESOURCES
═════════════════════════
#   Resource                 Type     Size    Duration
1   vendor.chunk.js          script   320KB   480ms
2   main.js                  script   250KB   320ms
3   hero-image.webp          img      180KB   280ms
4   analytics.js             script   45KB    250ms    ← third-party
5   fonts/inter-var.woff2    font     95KB    180ms
...

RECOMMENDATIONS:
- vendor.chunk.js: Consider code-splitting — 320KB is large for initial load
- analytics.js: Load async/defer so its 250ms fetch does not block rendering
- hero-image.webp: Add width/height to prevent CLS; avoid lazy loading if it is above the fold (it may be the LCP element)
```

### Phase 7: Performance Budget

Check against common budgets (the FCP and LCP limits match web.dev's "good" thresholds; the size and request limits are rules of thumb):

```
PERFORMANCE BUDGET CHECK
════════════════════════
Metric           Budget     Actual   Status
────────         ──────     ──────   ──────
FCP              < 1.8s     0.48s    PASS
LCP              < 2.5s     1.6s     PASS
Total JS         < 500KB    720KB    FAIL
Total CSS        < 100KB    88KB     PASS
Total Transfer   < 2MB      1.8MB    WARNING (90%)
HTTP Requests    < 50       58       FAIL

Grade: B (4/6 within budget)
```
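
One budget row reduces to a single comparison. In the sketch below, `budget_status` is hypothetical, budgets are treated as strict "less than" limits, and the 90% WARNING cutoff is inferred from the example row (1.8MB against 2MB) rather than stated by the skill. The loop replays the example table, with times in ms and sizes in KB.

```bash
# Hypothetical helper: status of one budget row. Both arguments are integers in the same unit.
budget_status() {
  local actual=$1 budget=$2
  if [ "$actual" -ge "$budget" ]; then
    echo FAIL                                      # budgets are "< limit"
  elif [ $((actual * 100)) -ge $((budget * 90)) ]; then
    echo "WARNING ($((actual * 100 / budget))%)"   # assumed 90% early-warning cutoff
  else
    echo PASS
  fi
}

# Replay the example table as "actual budget" pairs.
rows="480 1800
1600 2500
720 500
88 100
1800 2000
58 50"
within=0
while read -r actual budget; do
  case $(budget_status "$actual" "$budget") in
    FAIL) ;;
    *) within=$((within + 1)) ;;
  esac
done <<< "$rows"
echo "$within/6 within budget"   # prints: 4/6 within budget
```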

### Phase 8: Trend Analysis (--trend mode)

Load the historical benchmark results (for example, the dated JSON reports written in Phase 9) and show trends:

```
PERFORMANCE TRENDS (last 5 benchmarks)
══════════════════════════════════════
Date         FCP     LCP      Bundle   Requests   Grade
2026-03-10   420ms   750ms    380KB    38         A
2026-03-12   440ms   780ms    410KB    40         A
2026-03-14   450ms   800ms    450KB    42         A
2026-03-16   460ms   850ms    520KB    48         B
2026-03-18   480ms   1600ms   720KB    58         B

TREND: Performance degrading. LCP doubled in 8 days.
       JS bundle grew 340KB (380KB → 720KB) over the same period. Investigate.
```
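
Selecting the last five runs is a sort over dated file names. The sketch assumes the history lives in the dated `{date}-benchmark.json` reports from Phase 9 and that `{date}` is formatted `YYYY-MM-DD`, so lexical order is chronological; `recent_reports` is a hypothetical helper.

```bash
# Hypothetical helper: print the five most recent benchmark reports, oldest first.
recent_reports() {
  local dir=${1:-.orch/benchmark-reports} f
  for f in "$dir"/*-benchmark.json; do
    [ -e "$f" ] && printf '%s\n' "$f"   # skips the literal pattern when nothing matches
  done | sort | tail -n 5
}
```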

### Phase 9: Save Report

Write to `.orch/benchmark-reports/{date}-benchmark.md` and `.orch/benchmark-reports/{date}-benchmark.json`.

## Important Rules

- **Measure, don't guess.** Use actual `performance.getEntries()` data, not estimates.
- **Baseline is essential.** Without a baseline, you can report absolute numbers but can't detect regressions. Always encourage baseline capture.
- **Relative thresholds, not absolute.** 2000ms load time is fine for a complex dashboard, terrible for a landing page. Compare against YOUR baseline.
- **Third-party scripts are context.** Flag them, but the user can't fix Google Analytics being slow. Focus recommendations on first-party resources.
- **Bundle size is the leading indicator.** Load time varies with network. Bundle size is deterministic. Track it religiously.
- **Read-only.** Produce the report. Don't modify code unless explicitly asked.