orchestrix-yuri 4.8.16 → 4.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -261,10 +261,12 @@ function loadSessionState() {
261
261
  if (!fs.existsSync(SESSION_FILE)) return null;
262
262
  try {
263
263
  const state = JSON.parse(fs.readFileSync(SESSION_FILE, 'utf8'));
264
- // Reject: wrong version (system prompt changed), or expired (>24h)
264
+ // Reject: wrong version (system prompt changed), or expired (>4h)
265
+ // Claude API sessions expire well before 24h. 4h is a safe max to avoid
266
+ // wasting a call on a stale --resume that will fail.
265
267
  if (state.version !== SESSION_VERSION) return null;
266
268
  const age = Date.now() - new Date(state.savedAt).getTime();
267
- if (age > 24 * 3600_000) return null;
269
+ if (age > 4 * 3600_000) return null;
268
270
  return state;
269
271
  } catch { return null; }
270
272
  }
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "orchestrix-yuri",
3
- "version": "4.8.16",
3
+ "version": "4.9.0",
4
4
  "description": "Yuri — Meta-Orchestrator for Orchestrix. Drive your entire project lifecycle with natural language.",
5
5
  "main": "lib/installer.js",
6
6
  "bin": {
package/skill/SKILL.md CHANGED
@@ -44,18 +44,19 @@ Without it, the Enter may arrive before the TUI is ready, leaving content stuck
44
44
  | # | Command | Description |
45
45
  |---|---------|-------------|
46
46
  | 1 | *create | Create a new project (Phase 1) |
47
- | 2 | *plan | Start/resume project planning (Phase 2) |
48
- | 3 | *develop | Start/resume automated development (Phase 3) |
49
- | 4 | *test | Start/resume smoke testing (Phase 4) |
50
- | 5 | *deploy | Start/resume deployment (Phase 5) |
51
- | 6 | *status | Show project progress card |
52
- | 7 | *resume | Resume from last saved checkpoint |
53
- | 8 | *change "{desc}" | Handle requirement change (auto scope assessment) |
54
- | 9 | *iterate | Start new iteration (PM Architect → SM → dev) |
55
- | 10 | *cancel | Cancel running phase |
56
- | 11 | *projects | List all registered projects |
57
- | 12 | *switch {name} | Switch active project |
58
- | 13 | *help | Show all commands |
47
+ | 2 | *plan | Start/resume project planning (Phase 2) — includes gstack Plan Review Gate |
48
+ | 3 | *develop | Start/resume automated development (Phase 3) — gstack /investigate for stuck stories |
49
+ | 4 | *test | Start/resume smoke testing (Phase 4) — includes gstack browser QA for UI projects |
50
+ | 5 | *pre-ship | Run quality gates: code review + security + performance + design (Phase 4.5) |
51
+ | 6 | *deploy | Start/resume deployment (Phase 5) — gstack /canary post-deploy monitoring |
52
+ | 7 | *status | Show project progress card |
53
+ | 8 | *resume | Resume from last saved checkpoint |
54
+ | 9 | *change "{desc}" | Handle requirement change (auto scope assessment) |
55
+ | 10 | *iterate | Start new iteration (PM → Architect → SM → dev) |
56
+ | 11 | *cancel | Cancel running phase |
57
+ | 12 | *projects | List all registered projects |
58
+ | 13 | *switch {name} | Switch active project |
59
+ | 14 | *help | Show all commands |
59
60
 
60
61
  ## Activation Protocol
61
62
 
@@ -89,15 +90,16 @@ I can take you from a one-sentence idea to a fully deployed project:
89
90
  | 2 | *plan | Start/resume project planning (Phase 2) |
90
91
  | 3 | *develop | Start/resume automated development (Phase 3) |
91
92
  | 4 | *test | Start/resume smoke testing (Phase 4) |
92
- | 5 | *deploy | Start/resume deployment (Phase 5) |
93
- | 6 | *status | Show project progress card |
94
- | 7 | *resume | Resume from last saved checkpoint |
95
- | 8 | *change "{desc}" | Handle requirement change (auto scope assessment) |
96
- | 9 | *iterate | Start new iteration (PM Architect → SM → dev) |
97
- | 10 | *cancel | Cancel running phase |
98
- | 11 | *projects | List all registered projects |
99
- | 12 | *switch {name} | Switch active project |
100
- | 13 | *help | Show all commands |
93
+ | 5 | *pre-ship | Run quality gates before deploy (Phase 4.5) |
94
+ | 6 | *deploy | Start/resume deployment (Phase 5) |
95
+ | 7 | *status | Show project progress card |
96
+ | 8 | *resume | Resume from last saved checkpoint |
97
+ | 9 | *change "{desc}" | Handle requirement change (auto scope assessment) |
98
+ | 10 | *iterate | Start new iteration (PM → Architect → SM → dev) |
99
+ | 11 | *cancel | Cancel running phase |
100
+ | 12 | *projects | List all registered projects |
101
+ | 13 | *switch {name} | Switch active project |
102
+ | 14 | *help | Show all commands |
101
103
 
102
104
  Tell me what you'd like to build, or pick a command to get started.
103
105
  ```
@@ -134,11 +136,37 @@ Every task file follows a standardized three-part structure:
134
136
  | *plan | [yuri-plan-project.md](tasks/yuri-plan-project.md) |
135
137
  | *develop | [yuri-develop-project.md](tasks/yuri-develop-project.md) |
136
138
  | *test | [yuri-test-project.md](tasks/yuri-test-project.md) |
139
+ | *pre-ship | [yuri-pre-ship.md](tasks/yuri-pre-ship.md) |
137
140
  | *deploy | [yuri-deploy-project.md](tasks/yuri-deploy-project.md) |
138
141
  | *resume | [yuri-resume.md](tasks/yuri-resume.md) |
139
142
  | *status | [yuri-status.md](tasks/yuri-status.md) |
140
143
  | *change | [yuri-handle-change.md](tasks/yuri-handle-change.md) |
141
144
 
145
+ ## Lifecycle Flow
146
+
147
+ ```
148
+ Create → Plan → [Plan Review Gate] → Develop → Test → [Browser QA] → Pre-Ship → Deploy → [Canary] → Close Out
149
+ (gstack) (gstack) (gstack) (gstack) (gstack retro+docs)
150
+ ```
151
+
152
+ gstack quality gates are **additive** — if gstack is not installed, the original flow works unchanged.
153
+ All gstack integration points gracefully degrade: check for `~/.claude/skills/gstack/` and skip if absent.
154
+
155
+ ## gstack Integration
156
+
157
+ Yuri integrates with [gstack](https://github.com/garrytan/gstack) skills at 6 points in the lifecycle:
158
+
159
+ | Phase | gstack Skill | Purpose | Required? |
160
+ |-------|-------------|---------|-----------|
161
+ | Plan (Phase 2) | `/plan-ceo-review`, `/plan-eng-review`, `/plan-design-review` | Challenge assumptions, audit architecture, validate design | Optional (user chooses depth) |
162
+ | Develop (Phase 3) | `/investigate` | Root cause analysis for stuck stories | Auto-triggered on stuck_count=2 |
163
+ | Test (Phase 4) | `/qa` | Browser-based end-to-end testing for UI projects | Auto for UI projects, skip for API-only |
164
+ | Pre-Ship (Phase 4.5) | `/review`, `/cso`, `/benchmark`, `/design-review` | Code review, security audit, performance baseline, design QA | Recommended, skippable |
165
+ | Deploy (Phase 5) | `/canary` | 10-minute post-deploy monitoring with regression detection | Auto if gstack available |
166
+ | Close Out | `/document-release`, `/retro` | Sync docs with shipped code, structured retrospective | Auto after Phase 5 |
167
+
168
+ **Install gstack**: `git clone --depth 1 https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup`
169
+
142
170
  ## Memory Layout
143
171
 
144
172
  Yuri uses a four-layer memory system. Files are organized by access frequency and change rate.
@@ -18,6 +18,10 @@ routes:
18
18
  target_phase: 4
19
19
  command: "*test"
20
20
 
21
+ - intent_patterns: ["pre-ship", "quality gate", "code review", "security audit", "pre-launch check", "ready to ship"]
22
+ target_phase: 4.5
23
+ command: "*pre-ship"
24
+
21
25
  - intent_patterns: ["deploy", "ship", "release", "go live", "publish", "launch"]
22
26
  target_phase: 5
23
27
  command: "*deploy"
@@ -90,3 +90,80 @@ IF the current phase just transitioned to `complete`:
90
90
  - Projects with `status: paused` for > 30 days: suggest archiving to user.
91
91
  - Projects with `status: maintenance` and no timeline events for > 60 days:
92
92
  remove from `~/.yuri/focus.yaml` → `attention_queue`.
93
+
94
+ ---
95
+
96
+ ## F.5 Post-Ship Documentation & Retrospective (gstack — only after Phase 5 completes)
97
+
98
+ **Skip this section entirely unless Phase 5 (Deploy) just completed successfully.**
99
+
100
+ Check gstack availability first:
101
+ ```bash
102
+ test -d "$HOME/.claude/skills/gstack" && echo "gstack_available" || echo "gstack_missing"
103
+ ```
104
+
105
+ IF `gstack_missing` → skip F.5 entirely.
106
+
107
+ ### F.5.1 Document Release (`/document-release`)
108
+
109
+ **Purpose**: Automatically sync project documentation with what actually shipped. READMEs, ARCHITECTURE.md, CONTRIBUTING.md, CHANGELOG, and CLAUDE.md are often stale by deployment time.
110
+
111
+ Execute:
112
+
113
+ ```
114
+ /document-release
115
+ ```
116
+
117
+ This reads all project docs, cross-references the diff since project start, and updates:
118
+ - README.md (features, setup instructions, API docs)
119
+ - ARCHITECTURE.md (if it exists — system design, data flows)
120
+ - CHANGELOG.md (polished voice, consistent format)
121
+ - Any other documentation that references changed code
122
+
123
+ Report:
124
+ ```
125
+ 📝 Documentation synced with shipped code.
126
+ Updated: {list of updated doc files}
127
+ ```
128
+
129
+ Append to timeline:
130
+ ```jsonl
131
+ {"ts":"{ISO-8601}","type":"document_release","files_updated":[{list}]}
132
+ ```
133
+
134
+ ### F.5.2 Engineering Retrospective (`/retro`)
135
+
136
+ **Purpose**: Generate a structured retrospective analyzing the full project lifecycle — commit patterns, code quality metrics, testing coverage, shipping velocity, and actionable improvements for the next project.
137
+
138
+ Execute:
139
+
140
+ ```
141
+ /retro
142
+ ```
143
+
144
+ This analyzes git history for the project and produces:
145
+ - **Velocity metrics**: commits, LOC, test-to-production ratio
146
+ - **Code quality signals**: test coverage trends, file hotspots, fix-to-feature ratio
147
+ - **Time patterns**: work sessions, peak hours, focus score
148
+ - **Top 3 wins**: highest-impact deliverables
149
+ - **3 improvements**: specific, actionable, anchored in data
150
+ - **3 habits for next project**: small, realistic practices
151
+
152
+ Parse the retro output and extract universal insights for wisdom:
153
+
154
+ **Promote to global wisdom** (if applicable):
155
+ - Patterns that would benefit future projects → `~/.yuri/wisdom/workflow.md`
156
+ - Technical gotchas discovered → `~/.yuri/wisdom/tech.md`
157
+ - Process pitfalls encountered → `~/.yuri/wisdom/pitfalls.md`
158
+
159
+ Report summary to user:
160
+ ```
161
+ 📊 Project Retrospective generated.
162
+ Key stats: {commits} commits, {loc} LOC, {test_ratio}% test coverage
163
+ Top win: {top_win}
164
+ Top improvement: {top_improvement}
165
+ ```
166
+
167
+ Append to timeline:
168
+ ```jsonl
169
+ {"ts":"{ISO-8601}","type":"retro_completed","commits":{n},"loc":{n},"test_ratio":{n}}
@@ -9,6 +9,9 @@
9
9
 
10
10
  - Phase 4 (Test) must be complete.
11
11
  - `{project}/.yuri/state/phase4.yaml` must have `status: complete`.
12
+ - Pre-Ship Quality Gate should be complete (recommended but not required).
13
+ - IF `{project}/.yuri/state/pre-ship.yaml` exists with `status: complete` → proceed.
14
+ - IF not → warn: "Pre-ship quality gates were not run. Consider running `*pre-ship` first." → ask user to continue or run pre-ship.
12
15
 
13
16
  ---
14
17
 
@@ -89,29 +92,105 @@ Update `{project}/.yuri/focus.yaml` → `action: "executing deployment via {stra
89
92
 
90
93
  ---
91
94
 
92
- ## Step 4: Health Check
93
-
94
- After deployment completes:
95
+ ## Step 4: Post-Deploy Verification
95
96
 
97
+ After deployment completes, set:
96
98
  ```bash
97
99
  DEPLOYED_URL="{url from deployment output}"
100
+ ```
101
+
102
+ ### 4.1 Quick Health Check (immediate)
103
+
104
+ First, do a fast HTTP check to confirm the deployment succeeded at all:
105
+
106
+ ```bash
98
107
  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "$DEPLOYED_URL" 2>/dev/null)
99
108
  ```
100
109
 
101
- IF `HTTP_CODE` = 200 (or expected success code):
102
- - `{project}/.yuri/state/phase5.yaml` `status: "complete"`, `completed_at: now`, `url: "$DEPLOYED_URL"`, `health: "healthy"`
103
- - `{project}/.yuri/focus.yaml` → `step: "phase5.complete"`, `pulse: "Deployed and healthy at {url}"`
110
+ IF `HTTP_CODE` != 200 deployment itself failed:
111
+ - Report error to user with diagnostics.
112
+ - Offer: retry, try alternative strategy, or pause.
113
+ - Do NOT proceed to canary monitoring.
114
+
115
+ ### 4.2 Canary Monitoring (gstack — 10 minutes)
116
+
117
+ IF quick health check passed AND gstack is available:
118
+
119
+ ```bash
120
+ test -d "$HOME/.claude/skills/gstack" && echo "gstack_available" || echo "gstack_missing"
121
+ ```
122
+
123
+ **IF gstack available:**
124
+
125
+ Report to user:
126
+ ```
127
+ ✅ Deployment succeeded (HTTP 200). Starting 10-minute canary monitoring...
128
+ ```
129
+
130
+ Step 1 — Capture baseline (if pre-ship `/benchmark` was run, this is already done):
131
+ ```
132
+ /canary {DEPLOYED_URL} --baseline
133
+ ```
134
+
135
+ Step 2 — Start canary monitoring:
136
+ ```
137
+ /canary {DEPLOYED_URL}
138
+ ```
139
+
140
+ This monitors for 10 minutes, checking every 60 seconds:
141
+ - Page load failures or timeouts
142
+ - New console errors not in baseline
143
+ - Performance regressions (load times 2x slower than baseline)
144
+ - Broken links
145
+
146
+ Wait for canary to complete. Parse results:
147
+ - **Health status**: HEALTHY / DEGRADED / BROKEN
148
+ - **Alert count**
149
+ - **Rollback recommendation** (if any)
150
+
151
+ IF canary reports **HEALTHY**:
152
+ - Proceed to success state (Step 4.3).
153
+
154
+ IF canary reports **DEGRADED**:
155
+ - Report issues to user with screenshot evidence:
156
+ ```
157
+ ⚠️ Canary detected issues during monitoring:
158
+ {list of alerts with evidence}
159
+
160
+ The deployment is functional but has issues. Continue / Rollback?
161
+ ```
162
+ - IF continue → proceed to Step 4.3 with warnings logged.
163
+ - IF rollback → offer rollback guidance, pause.
164
+
165
+ IF canary reports **BROKEN**:
166
+ - Report critical failure:
167
+ ```
168
+ 🚨 Canary detected critical failures:
169
+ {alerts with screenshots}
170
+
171
+ Recommend immediate rollback. Rollback / Investigate / Keep?
172
+ ```
173
+
174
+ Append to timeline:
175
+ ```jsonl
176
+ {"ts":"{ISO-8601}","type":"canary_completed","url":"{url}","health":"{status}","alerts":{count}}
177
+ ```
178
+
179
+ **IF gstack NOT available** — fall back to basic check:
180
+ - The HTTP 200 check from 4.1 is sufficient. Proceed to Step 4.3.
181
+ - Note in timeline: `"canary":"skipped_no_gstack"`
182
+
183
+ ### 4.3 Mark Complete
184
+
185
+ - `{project}/.yuri/state/phase5.yaml` → `status: "complete"`, `completed_at: now`, `url: "$DEPLOYED_URL"`, `health: "{canary_status or 'healthy'}"`, `canary_alerts: {count or 0}`
186
+ - `{project}/.yuri/focus.yaml` → `step: "phase5.complete"`, `pulse: "Deployed and {health} at {url}"`
104
187
  - Append to timeline:
105
188
  ```jsonl
106
- {"ts":"{ISO-8601}","type":"project_deployed","url":"{url}","strategy":"{strategy}","health":"healthy"}
189
+ {"ts":"{ISO-8601}","type":"project_deployed","url":"{url}","strategy":"{strategy}","health":"{status}","canary_alerts":{count}}
107
190
  {"ts":"{ISO-8601}","type":"phase_completed","phase":5}
108
191
  ```
109
192
  - Update `~/.yuri/portfolio/registry.yaml` → this project's `status: maintenance`, `pulse: "v1.0 deployed at {url}"`
110
193
 
111
- IF health check fails:
112
- - Report error to user with diagnostics.
113
- - Offer: retry, try alternative strategy, or pause.
114
-
115
194
  Report:
116
195
  ```
117
196
  ## ✅ Deployment Complete
@@ -120,7 +199,8 @@ Report:
120
199
  |------|--------|
121
200
  | **Strategy** | {strategy} |
122
201
  | **URL** | {url} |
123
- | **Health** | ✅ Healthy |
202
+ | **Health** | {✅ Healthy / ⚠️ Degraded} |
203
+ | **Canary** | {10 min monitoring: N alerts / skipped} |
124
204
 
125
205
  🎉 Project lifecycle complete! From idea to production.
126
206
 
@@ -130,7 +130,61 @@ done
130
130
  - `/clear` → restart agent
131
131
  4. Increment `state/phase3.yaml` → `monitoring.stuck_count`.
132
132
 
133
- IF `stuck_count > 3`:
133
+ IF `stuck_count` = 2 AND gstack is available:
134
+ - **Escalate to `/investigate`** before requesting human intervention.
135
+ - This provides a structured root cause analysis instead of blind retries.
136
+
137
+ ```bash
138
+ test -d "$HOME/.claude/skills/gstack" && echo "gstack_available" || echo "gstack_missing"
139
+ ```
140
+
141
+ IF gstack available:
142
+
143
+ Report to user:
144
+ ```
145
+ 🔍 Story stuck after 2 recovery attempts. Running root cause investigation...
146
+ ```
147
+
148
+ Execute in Yuri's own session:
149
+
150
+ ```
151
+ /investigate
152
+ ```
153
+
154
+ Provide `/investigate` with context:
155
+ - Captured tmux pane contents from all 4 windows
156
+ - Current story ID and description
157
+ - Error patterns detected
158
+ - What recovery was already attempted
159
+
160
+ Wait for `/investigate` to complete. Parse output:
161
+ - **Root cause** identified?
162
+ - **Fix confidence** (1-10)
163
+ - **Recommended fix** with file:line references
164
+
165
+ IF fix confidence ≥ 8:
166
+ - Report to user: "Root cause identified: {summary}. Auto-fix recommended. Proceed? (Y/N)"
167
+ - IF Y → route fix to Dev agent via `*quick-fix "{investigate_fix_description}"`, then resume monitoring.
168
+ - IF N → present full investigation report, let user decide.
169
+
170
+ IF fix confidence < 8:
171
+ - Report full investigation to user with diagnostics:
172
+ ```
173
+ 🔍 Investigation Report:
174
+ - Symptom: {symptom}
175
+ - Root cause: {root_cause or "inconclusive"}
176
+ - Confidence: {score}/10
177
+ - Recommended action: {action}
178
+
179
+ Please advise: Fix manually / Retry / Skip this story / Pause?
180
+ ```
181
+
182
+ Append to timeline:
183
+ ```jsonl
184
+ {"ts":"{ISO-8601}","type":"investigate","story":"{id}","root_cause":"{summary}","confidence":{score}}
185
+ ```
186
+
187
+ IF `stuck_count > 3` (after investigate attempt or if gstack not available):
134
188
  - Report to user with full diagnostics.
135
189
  - Request human intervention.
136
190
  - Pause monitoring loop.
@@ -196,7 +196,7 @@ tmux kill-session -t "$SESSION"
196
196
 
197
197
  2. Update memory:
198
198
  - `{project}/.yuri/state/phase2.yaml` → `status: complete`, `completed_at: now`
199
- - `{project}/.yuri/focus.yaml` → `step: "phase2.complete"`, `pulse: "Phase 2 complete, ready for development"`, `tmux.planning_session: ""`
199
+ - `{project}/.yuri/focus.yaml` → `step: "phase2.complete"`, `pulse: "Phase 2 complete, ready for review"`, `tmux.planning_session: ""`
200
200
 
201
201
  3. Output summary:
202
202
  ```
@@ -210,8 +210,149 @@ All planning documents generated:
210
210
  - Sharded context files for development
211
211
  ```
212
212
 
213
- 4. Ask:
213
+ 4. Proceed to Plan Review Gate (Step 4).
214
+
215
+ ---
216
+
217
+ ## Step 4: Plan Review Gate (gstack)
218
+
219
+ **Purpose**: Use gstack review skills to challenge assumptions, audit architecture, and validate design before writing any code. Catching issues here is 10x cheaper than catching them in development.
220
+
221
+ ### 4.0 Check gstack availability
222
+
223
+ ```bash
224
+ test -d "$HOME/.claude/skills/gstack" && echo "gstack_available" || echo "gstack_missing"
225
+ ```
226
+
227
+ IF `gstack_missing` → skip to Step 4.5 (ask user to proceed to development).
228
+
229
+ ### 4.1 Ask user for review depth
230
+
231
+ ```
232
+ ## 📋 Plan Review Gate
233
+
234
+ Planning is complete. Before development begins, I can run independent quality reviews on the plan using gstack:
235
+
236
+ | # | Review | What it does | Time |
237
+ |---|--------|-------------|------|
238
+ | 1 | **Full Review** | CEO strategy + Eng architecture + Design (if UI) | ~10 min |
239
+ | 2 | **Eng Only** | Architecture, test coverage, failure modes | ~5 min |
240
+ | 3 | **Skip** | Go straight to development | 0 min |
241
+
242
+ Recommendation: **Full Review** for new projects, **Eng Only** for iterations.
243
+ Select (1/2/3):
244
+ ```
245
+
246
+ - IF user selects **3 (Skip)** → go to Step 4.5.
247
+ - IF user selects **2 (Eng Only)** → run only Step 4.3.
248
+ - IF user selects **1 (Full Review)** → run Steps 4.2 through 4.4.
249
+
250
+ ### 4.2 CEO Strategy Review (`/plan-ceo-review`)
251
+
252
+ Report:
253
+ ```
254
+ 🎯 Plan Review 1/3: CEO Strategy Review — challenging scope and premises...
255
+ ```
256
+
257
+ Execute in Yuri's own session:
258
+
259
+ ```
260
+ /plan-ceo-review
261
+ ```
262
+
263
+ This runs in **HOLD SCOPE** mode by default (maximum rigor without scope creep).
264
+
265
+ Wait for completion. Key outputs to capture:
266
+ - Premise challenges
267
+ - Error/rescue registry
268
+ - Failure modes
269
+ - "Not in scope" section
270
+ - Review readiness score
271
+
272
+ IF `/plan-ceo-review` proposes plan modifications:
273
+ - Show diff summary to user.
274
+ - Ask: "Accept changes / Modify / Reject?"
275
+ - IF accepted → plan files updated (gstack writes directly).
276
+ - IF rejected → revert changes, continue with original plan.
277
+
278
+ Append to timeline:
279
+ ```jsonl
280
+ {"ts":"{ISO-8601}","type":"plan_review","reviewer":"ceo","status":"complete"}
281
+ ```
282
+
283
+ ### 4.3 Engineering Architecture Review (`/plan-eng-review`)
284
+
285
+ Report:
214
286
  ```
287
+ 🏗️ Plan Review 2/3: Engineering Review — locking architecture and test strategy...
288
+ ```
289
+
290
+ Execute:
291
+
292
+ ```
293
+ /plan-eng-review
294
+ ```
295
+
296
+ Key outputs:
297
+ - Architecture diagram
298
+ - Coverage diagram (code paths with test status)
299
+ - Failure mode analysis per new codepath
300
+ - Worktree parallelization strategy
301
+ - Issues found per section
302
+
303
+ IF review proposes plan changes → same accept/modify/reject flow as 4.2.
304
+
305
+ Append to timeline:
306
+ ```jsonl
307
+ {"ts":"{ISO-8601}","type":"plan_review","reviewer":"eng","status":"complete"}
308
+ ```
309
+
310
+ ### 4.4 Design Review (`/plan-design-review`) — UI projects only
311
+
312
+ Check if project has UI components:
313
+ 1. Look for frontend files or frontend stack in `{project}/.yuri/identity.yaml`.
314
+ 2. Check if `docs/front-end-spec*.md` exists.
315
+
316
+ IF no UI → skip with note:
317
+ ```
318
+ ℹ️ Plan Review 3/3: Design Review — skipped (no UI components)
319
+ ```
320
+
321
+ IF UI present:
322
+
323
+ Report:
324
+ ```
325
+ 🎨 Plan Review 3/3: Design Review — auditing UX dimensions...
326
+ ```
327
+
328
+ Execute:
329
+
330
+ ```
331
+ /plan-design-review
332
+ ```
333
+
334
+ Key outputs:
335
+ - 7-dimension scorecard (info architecture, interaction states, user journey, AI slop risk, design system, responsive/a11y, unresolved decisions)
336
+ - Updated plan with design decisions filled in
337
+
338
+ IF review proposes changes → same accept/modify/reject flow.
339
+
340
+ Append to timeline:
341
+ ```jsonl
342
+ {"ts":"{ISO-8601}","type":"plan_review","reviewer":"design","status":"complete"}
343
+ ```
344
+
345
+ ### 4.5 Review Summary + Transition
346
+
347
+ ```
348
+ ## 📋 Plan Review Complete
349
+
350
+ | Review | Status | Key Finding |
351
+ |--------|--------|-------------|
352
+ | CEO Strategy | {✅/⏭️} | {one-line summary} |
353
+ | Eng Architecture | {✅/⏭️} | {one-line summary} |
354
+ | Design | {✅/⏭️} | {one-line summary} |
355
+
215
356
  🚀 Ready to start automated development? (Y/N)
216
357
  ```
217
358
 
@@ -0,0 +1,334 @@
1
+ # Phase 4.5: Pre-Ship Quality Gate
2
+
3
+ **Command**: `*pre-ship`
4
+ **Purpose**: Run comprehensive quality checks (code review, security audit, performance baseline, design audit) before deployment. Uses gstack skills as independent quality gates.
5
+
6
+ ---
7
+
8
+ ## Prerequisites
9
+
10
+ - Phase 4 (Test) must be complete.
11
+ - `{project}/.yuri/state/phase4.yaml` must have `status: complete`.
12
+ - gstack must be installed (`~/.claude/skills/gstack/` must exist).
13
+
14
+ ---
15
+
16
+ ## Step 0: Wake Up
17
+
18
+ Read [_wake-up.md](tasks/_wake-up.md) and execute it fully.
19
+
20
+ After Wake Up, validate:
21
+ 1. Read `{project}/.yuri/state/phase4.yaml` → verify `status` = `complete`.
22
+ - IF not → "Phase 4 not complete. Run `*test` first." and stop.
23
+ 2. Set `PROJECT_DIR` from `{project}/.yuri/identity.yaml` → `project.root`.
24
+ 3. Check gstack availability:
25
+ ```bash
26
+ test -d "$HOME/.claude/skills/gstack" && echo "gstack_available" || echo "gstack_missing"
27
+ ```
28
+ - IF `gstack_missing` → warn user: "gstack not installed. Install with: `git clone --depth 1 https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup`". Offer to skip pre-ship and go directly to deploy.
29
+
30
+ **Resumption check:**
31
+ - IF `{project}/.yuri/state/pre-ship.yaml` exists with `status: in_progress`:
32
+ → Find last completed gate → resume from the next one.
33
+ - IF `{project}/.yuri/state/pre-ship.yaml` exists with `status: complete`:
34
+ → Offer to skip to Phase 5.
35
+ - OTHERWISE → initialize `state/pre-ship.yaml`:
36
+ ```yaml
37
+ status: pending
38
+ started_at: ""
39
+ completed_at: ""
40
+ gates:
41
+ code_review:
42
+ status: pending
43
+ score: null
44
+ findings: 0
45
+ critical: 0
46
+ security_audit:
47
+ status: pending
48
+ score: null
49
+ findings: 0
50
+ critical: 0
51
+ performance_baseline:
52
+ status: pending
53
+ captured: false
54
+ design_audit:
55
+ status: pending
56
+ score: null
57
+ has_ui: null
58
+ overall_verdict: pending
59
+ ```
60
+
61
+ Update memory:
62
+ - `{project}/.yuri/focus.yaml` → `phase: 4.5`, `step: "pre-ship"`, `action: "running quality gates"`, `updated_at: now`
63
+ - `{project}/.yuri/state/pre-ship.yaml` → `status: "in_progress"`, `started_at: now`
64
+ - `~/.yuri/focus.yaml` → `active_action: "pre-ship quality gates: {name}"`, `updated_at: now`
65
+ - Append to `{project}/.yuri/timeline/events.jsonl`:
66
+ ```jsonl
67
+ {"ts":"{ISO-8601}","type":"pre_ship_started","gates":["code_review","security_audit","performance_baseline","design_audit"]}
68
+ ```
69
+ - Save all files immediately.
70
+
71
+ ---
72
+
73
+ ## Step 1: Code Review Gate (`/review`)
74
+
75
+ **Purpose**: Catch structural issues that tests miss — SQL safety, race conditions, LLM trust boundaries, type coercion, async mixing, documentation staleness.
76
+
77
+ Report to user:
78
+ ```
79
+ 🔍 Gate 1/4: Code Review — scanning diff against base branch...
80
+ ```
81
+
82
+ Execute in the current Claude Code session (Yuri's own session, NOT a tmux agent window):
83
+
84
+ ```
85
+ /review
86
+ ```
87
+
88
+ Wait for `/review` to complete. Parse the output for:
89
+ - **PR Quality Score** (0-10)
90
+ - **Critical findings** (severity ≥ 8)
91
+ - **Total findings count**
92
+ - **Auto-fixed items**
93
+
94
+ Update `{project}/.yuri/state/pre-ship.yaml`:
95
+ ```yaml
96
+ gates.code_review:
97
+ status: complete
98
+ score: {pr_quality_score}
99
+ findings: {total_findings}
100
+ critical: {critical_count}
101
+ completed_at: "{ISO-8601}"
102
+ ```
103
+
104
+ Append to timeline:
105
+ ```jsonl
106
+ {"ts":"{ISO-8601}","type":"gate_completed","gate":"code_review","score":{score},"findings":{findings},"critical":{critical}}
107
+ ```
108
+
109
+ ### Gate Decision
110
+
111
+ - IF `critical` = 0 → **PASS** — continue to next gate.
112
+ - IF `critical` > 0 AND all were auto-fixed → **PASS with fixes** — continue.
113
+ - IF `critical` > 0 AND some unfixed → **BLOCK** — report to user:
114
+ ```
115
+ ⚠️ Code Review found {critical} critical issue(s) requiring attention:
116
+ {list of unfixed critical findings}
117
+
118
+ Fix now / Skip / Abort pre-ship?
119
+ ```
120
+ - **Fix now** → user or Dev agent fixes, then re-run `/review`.
121
+ - **Skip** → proceed with warning logged.
122
+ - **Abort** → save state, stop.
123
+
124
+ ---
125
+
126
+ ## Step 2: Security Audit Gate (`/cso`)
127
+
128
+ **Purpose**: Detect exploitable vulnerabilities — secrets in git history, dependency supply chain, OWASP Top 10, CI/CD exposure, LLM/AI security risks.
129
+
130
+ Report to user:
131
+ ```
132
+ 🛡️ Gate 2/4: Security Audit — scanning for vulnerabilities...
133
+ ```
134
+
135
+ Execute:
136
+
137
+ ```
138
+ /cso
139
+ ```
140
+
141
+ This runs in **daily mode** (default: 8/10 confidence gate, zero-noise).
142
+
143
+ Wait for completion. Parse for:
144
+ - **Security posture** (findings count, severity distribution)
145
+ - **Critical/High findings** (confidence ≥ 8)
146
+ - **Attack surface census**
147
+
148
+ Update `{project}/.yuri/state/pre-ship.yaml`:
149
+ ```yaml
150
+ gates.security_audit:
151
+ status: complete
152
+ findings: {total}
153
+ critical: {critical_high_count}
154
+ completed_at: "{ISO-8601}"
155
+ ```
156
+
157
+ Append to timeline:
158
+ ```jsonl
159
+ {"ts":"{ISO-8601}","type":"gate_completed","gate":"security_audit","findings":{findings},"critical":{critical}}
160
+ ```
161
+
162
+ ### Gate Decision
163
+
164
+ - IF `critical` = 0 → **PASS**.
165
+ - IF `critical` > 0 → **BLOCK** — report with exploit scenarios:
166
+ ```
167
+ 🚨 Security Audit found {critical} exploitable issue(s):
168
+ {list with severity, file:line, exploit scenario}
169
+
170
+ Fix now / Skip (accept risk) / Abort?
171
+ ```
172
+ - **Fix now** → address findings, re-run `/cso`.
173
+ - **Skip** → proceed with risk accepted, log to `knowledge/decisions.md`.
174
+ - **Abort** → save state, stop.
175
+
176
+ ---
177
+
178
+ ## Step 3: Performance Baseline Gate (`/benchmark`)
179
+
180
+ **Purpose**: Capture performance baseline metrics before deployment. These become the reference for post-deploy canary comparison.
181
+
182
+ Report to user:
183
+ ```
184
+ 📊 Gate 3/4: Performance Baseline — capturing metrics...
185
+ ```
186
+
187
+ **Precondition**: Project must have a running dev server URL. Check:
188
+ 1. Read `{project}/.orchestrix-core/core-config.yaml` for `dev_server_url`.
189
+ 2. IF no dev server URL configured:
190
+ - Try common patterns: `http://localhost:3000`, `http://localhost:5173`, `http://localhost:8080`.
191
+ - IF none respond → skip this gate with note: "No dev server detected. Performance baseline skipped."
192
+ - Update gate status to `skipped` and continue.
193
+
194
+ IF dev server is available:
195
+
196
+ ```
197
+ /benchmark --baseline
198
+ ```
199
+
200
+ Wait for completion. Parse for:
201
+ - **Core Web Vitals** (TTFB, FCP, LCP)
202
+ - **Bundle sizes** (JS, CSS)
203
+ - **Resource counts**
204
+
205
+ Update `{project}/.yuri/state/pre-ship.yaml`:
206
+ ```yaml
207
+ gates.performance_baseline:
208
+ status: complete
209
+ captured: true
210
+ dev_server_url: "{url}"
211
+ completed_at: "{ISO-8601}"
212
+ ```
213
+
214
+ Append to timeline:
215
+ ```jsonl
216
+ {"ts":"{ISO-8601}","type":"gate_completed","gate":"performance_baseline","captured":true}
217
+ ```
218
+
219
+ This gate does NOT block — it only captures baseline for later comparison.
220
+
221
+ ---
222
+
223
+ ## Step 4: Design Audit Gate (`/design-review`)
224
+
225
+ **Purpose**: Catch visual inconsistencies, AI slop patterns, spacing issues, hierarchy problems before users see them.
226
+
227
+ **Precondition**: Project must have UI components. Check:
228
+ 1. Look for frontend files: `src/**/*.{tsx,jsx,vue,svelte}`, `app/**/*.{tsx,jsx}`, `pages/**/*.{tsx,jsx}`.
229
+ 2. Check `{project}/.yuri/identity.yaml` → `stack` field for frontend frameworks.
230
+ 3. IF no UI detected → skip this gate:
231
+ ```
232
+ ℹ️ Gate 4/4: Design Audit — skipped (no UI components detected)
233
+ ```
234
+ Update gate status to `skipped` and continue.
235
+
236
+ IF UI is present AND dev server is available:
237
+
238
+ Report to user:
239
+ ```
240
+ 🎨 Gate 4/4: Design Audit — reviewing visual quality...
241
+ ```
242
+
243
+ ```
244
+ /design-review quick
245
+ ```
246
+
247
+ Wait for completion. Parse for:
248
+ - **Design score** (A-F)
249
+ - **AI Slop score** (A-F)
250
+ - **Per-category grades**
251
+ - **Critical visual issues**
252
+
253
+ Update `{project}/.yuri/state/pre-ship.yaml`:
254
+ ```yaml
255
+ gates.design_audit:
256
+ status: complete
257
+ score: "{design_grade}"
258
+ has_ui: true
259
+ completed_at: "{ISO-8601}"
260
+ ```
261
+
262
+ Append to timeline:
263
+ ```jsonl
264
+ {"ts":"{ISO-8601}","type":"gate_completed","gate":"design_audit","score":"{grade}"}
265
+ ```
266
+
267
+ ### Gate Decision
268
+
269
+ - IF design score ≥ C → **PASS**.
270
+ - IF design score < C → **WARN** — report issues but do not block:
271
+ ```
272
+ ⚠️ Design audit scored {grade}. Key issues:
273
+ {list of top issues}
274
+
275
+ These are non-blocking but recommended to fix before launch.
276
+ Continue to deploy / Fix first?
277
+ ```
278
+
279
+ ---
280
+
281
+ ## Step 5: Quality Gate Summary
282
+
283
+ Compile results from all gates:
284
+
285
+ ```
286
+ ## 🏁 Pre-Ship Quality Gate Report
287
+
288
+ | Gate | Status | Score | Findings | Critical |
289
+ |------|--------|-------|----------|----------|
290
+ | Code Review | {✅/⚠️/❌} | {score}/10 | {n} | {n} |
291
+ | Security Audit | {✅/⚠️/❌} | — | {n} | {n} |
292
+ | Performance Baseline | {✅/⏭️} | — | captured | — |
293
+ | Design Audit | {✅/⚠️/⏭️} | {grade} | {n} | — |
294
+
295
+ **Overall**: {PASS / PASS WITH WARNINGS / BLOCKED}
296
+ ```
297
+
298
+ ### Overall Verdict Logic
299
+
300
+ - **All gates PASS** → `overall_verdict: pass`
301
+ - **Any gate WARN but no BLOCK** → `overall_verdict: pass_with_warnings`
302
+ - **Any gate BLOCK** → `overall_verdict: blocked` (should not reach here — blocks are handled per-gate)
303
+
304
+ Update `{project}/.yuri/state/pre-ship.yaml`:
305
+ ```yaml
306
+ status: complete
307
+ completed_at: "{ISO-8601}"
308
+ overall_verdict: "{verdict}"
309
+ ```
310
+
311
+ Append to timeline:
312
+ ```jsonl
313
+ {"ts":"{ISO-8601}","type":"pre_ship_completed","verdict":"{verdict}"}
314
+ ```
315
+
316
+ Update focus:
317
+ - `{project}/.yuri/focus.yaml` → `step: "pre-ship.complete"`, `pulse: "Pre-ship {verdict}, ready for deploy"`
318
+
319
+ ### Transition
320
+
321
+ ```
322
+ 🚀 Pre-ship quality gates complete. Ready to deploy? (Y/N)
323
+ ```
324
+
325
+ - If Y → execute `tasks/yuri-deploy-project.md`
326
+ - If N → save state, end with reminder: "Run `/yuri *deploy` when ready."
327
+
328
+ ---
329
+
330
+ ## Final Step: Close Out
331
+
332
+ Read [_close-out.md](tasks/_close-out.md) and execute it fully.
333
+
334
+ Pre-ship often reveals security patterns and code quality insights — Phase Reflect should capture these for project knowledge.
@@ -157,29 +157,115 @@ IF signal detected → append to `~/.yuri/inbox.jsonl`.
157
157
 
158
158
  ---
159
159
 
160
- ## Step 3: All Epics Tested
160
+ ## Step 3: Browser QA (gstack — UI projects only)
161
161
 
162
- 1. Check results: count passed vs failed.
162
+ **Purpose**: After code-level smoke tests pass, run real browser-based end-to-end testing to catch rendering bugs, JS runtime errors, broken interactions, and visual issues that unit/integration tests cannot detect.
163
163
 
164
- 2. IF all passed:
164
+ ### 3.0 Check preconditions
165
+
166
+ Skip this step entirely IF any of:
167
+ - gstack is not installed (`~/.claude/skills/gstack/` missing)
168
+ - Project has no UI (no frontend files, no `docs/front-end-spec*.md`)
169
+ - No dev server URL is available
170
+
171
+ ```bash
172
+ HAS_GSTACK=$(test -d "$HOME/.claude/skills/gstack" && echo "yes" || echo "no")
173
+ HAS_UI=$(ls "$PROJECT_DIR/src/"*.{tsx,jsx,vue,svelte} "$PROJECT_DIR/app/"*.{tsx,jsx} 2>/dev/null | head -1)
174
+ ```
175
+
176
+ IF skipping → report:
177
+ ```
178
+ ℹ️ Browser QA skipped ({reason: no gstack / no UI / no dev server})
179
+ ```
180
+ → proceed to Step 4.
181
+
182
+ ### 3.1 Run Browser QA
183
+
184
+ Report to user:
185
+ ```
186
+ 🌐 Running browser-based QA testing...
187
+ ```
188
+
189
+ Execute in Yuri's own session:
190
+
191
+ ```
192
+ /qa
193
+ ```
194
+
195
+ This runs in **Standard** tier by default (systematic exploration of all pages, 5-10 issues).
196
+
197
+ Wait for completion. Parse for:
198
+ - **Health score** (0-100)
199
+ - **Issues found** (by severity: critical, high, medium, low)
200
+ - **Auto-fixed count** (gstack /qa fixes bugs and commits atomically)
201
+ - **Before/after screenshots**
202
+
203
+ ### 3.2 Evaluate Browser QA Results
204
+
205
+ IF health score ≥ 80 AND no critical issues:
206
+ - Report:
207
+ ```
208
+ ✅ Browser QA passed (score: {score}/100, {fixed} issues auto-fixed)
209
+ ```
210
+ - Proceed to Step 4.
211
+
212
+ IF health score < 80 OR critical issues found:
213
+ - Report:
214
+ ```
215
+ ⚠️ Browser QA found issues (score: {score}/100):
216
+ - Critical: {n} | High: {n} | Medium: {n}
217
+ - Auto-fixed: {n}
218
+ - Remaining: {n}
219
+
220
+ {list of unfixed critical/high issues with repro steps}
221
+
222
+ Fix remaining / Accept and continue / Re-run QA?
223
+ ```
224
+ - **Fix remaining** → route unfixed bugs to Dev agent via `*quick-fix`, then re-run `/qa`.
225
+ - **Accept** → proceed with warnings logged.
226
+ - **Re-run** → execute `/qa` again.
227
+
228
+ Append to timeline:
229
+ ```jsonl
230
+ {"ts":"{ISO-8601}","type":"browser_qa","score":{score},"issues":{total},"fixed":{fixed}}
231
+ ```
232
+
233
+ Update `{project}/.yuri/state/phase4.yaml` → add `browser_qa` section:
234
+ ```yaml
235
+ browser_qa:
236
+ status: complete
237
+ score: {score}
238
+ issues: {total}
239
+ fixed: {fixed}
240
+ ```
241
+
242
+ ---
243
+
244
+ ## Step 4: All Testing Complete
245
+
246
+ 1. Check results: count passed vs failed epics + browser QA status.
247
+
248
+ 2. IF all epics passed (and browser QA passed or skipped):
165
249
  - `{project}/.yuri/state/phase4.yaml` → `status: "complete"`, `completed_at: now`
166
250
  - `{project}/.yuri/focus.yaml` → `step: "phase4.complete"`, `pulse: "Phase 4 complete, all epics passed"`
167
251
  - Append: `{"ts":"...","type":"phase_completed","phase":4}` to timeline.
168
252
 
169
- 3. IF some failed:
253
+ 3. IF some epics failed:
170
254
  - Report failed epics to user.
171
- - Ask: "Retry failed epics, skip to deploy, or pause?"
255
+ - Ask: "Retry failed epics, skip to pre-ship, or pause?"
172
256
  - IF retry → re-enter Step 2 for failed epics only.
173
257
  - IF skip → mark phase complete with note about failures.
174
258
  - IF pause → save state, stop.
175
259
 
176
- 4. Present deployment options:
260
+ 4. Route to Pre-Ship Quality Gate:
177
261
  ```
178
- 🚀 All smoke tests passed! Ready to deploy? (Y/N)
262
+ 🚀 All tests passed! Next: Pre-Ship Quality Gate (code review + security + performance).
263
+ Run quality gates now? (Y / Skip to deploy)
179
264
  ```
180
265
 
181
- - If Y → execute `tasks/yuri-deploy-project.md`
182
- - If N save state, end with reminder: "Run `/yuri *deploy` when ready."
266
+ - If Y → execute `tasks/yuri-pre-ship.md`
267
+ - If "Skip to deploy" execute `tasks/yuri-deploy-project.md`
268
+ - If user wants to pause → save state, end with reminder: "Run `/yuri *pre-ship` or `/yuri *deploy` when ready."
183
269
 
184
270
  ---
185
271