npm - orchestrix-yuri - Versions diffs - 4.8.16 → 4.9.0 - Mend

orchestrix-yuri 4.8.16 → 4.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10) hide show

package/lib/gateway/engine/claude-sdk.js +4 -2
package/package.json +1 -1
package/skill/SKILL.md +49 -21
package/skill/data/phase-routing.yaml +4 -0
package/skill/tasks/_close-out.md +77 -0
package/skill/tasks/yuri-deploy-project.md +92 -12
package/skill/tasks/yuri-develop-project.md +55 -1
package/skill/tasks/yuri-plan-project.md +143 -2
package/skill/tasks/yuri-pre-ship.md +334 -0
package/skill/tasks/yuri-test-project.md +95 -9

package/lib/gateway/engine/claude-sdk.js CHANGED Viewed

@@ -261,10 +261,12 @@ function loadSessionState() {
   if (!fs.existsSync(SESSION_FILE)) return null;
   try {
     const state = JSON.parse(fs.readFileSync(SESSION_FILE, 'utf8'));
-    // Reject: wrong version (system prompt changed), or expired (>24h)
+    // Reject: wrong version (system prompt changed), or expired (>4h)
+    // Claude API sessions expire well before 24h. 4h is a safe max to avoid
+    // wasting a call on a stale --resume that will fail.
     if (state.version !== SESSION_VERSION) return null;
     const age = Date.now() - new Date(state.savedAt).getTime();
-    if (age > 24 * 3600_000) return null;
+    if (age > 4 * 3600_000) return null;
     return state;
   } catch { return null; }
 }

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "orchestrix-yuri",
-  "version": "4.8.16",
+  "version": "4.9.0",
   "description": "Yuri — Meta-Orchestrator for Orchestrix. Drive your entire project lifecycle with natural language.",
   "main": "lib/installer.js",
   "bin": {

package/skill/SKILL.md CHANGED Viewed

@@ -44,18 +44,19 @@ Without it, the Enter may arrive before the TUI is ready, leaving content stuck
 | # | Command | Description |
 |---|---------|-------------|
 | 1 | *create | Create a new project (Phase 1) |
-| 2 | *plan | Start/resume project planning (Phase 2) |
-| 3 | *develop | Start/resume automated development (Phase 3) |
-| 4 | *test | Start/resume smoke testing (Phase 4) |
-| 5 | *deploy | Start/resume deployment (Phase 5) |
-| 6 | *status | Show project progress card |
-| 7 | *resume | Resume from last saved checkpoint |
-| 8 | *change "{desc}" | Handle requirement change (auto scope assessment) |
-| 9 | *iterate | Start new iteration (PM → Architect → SM → dev) |
-| 10 | *cancel | Cancel running phase |
-| 11 | *projects | List all registered projects |
-| 12 | *switch {name} | Switch active project |
-| 13 | *help | Show all commands |
+| 2 | *plan | Start/resume project planning (Phase 2) — includes gstack Plan Review Gate |
+| 3 | *develop | Start/resume automated development (Phase 3) — gstack /investigate for stuck stories |
+| 4 | *test | Start/resume smoke testing (Phase 4) — includes gstack browser QA for UI projects |
+| 5 | *pre-ship | Run quality gates: code review + security + performance + design (Phase 4.5) |
+| 6 | *deploy | Start/resume deployment (Phase 5) — gstack /canary post-deploy monitoring |
+| 7 | *status | Show project progress card |
+| 8 | *resume | Resume from last saved checkpoint |
+| 9 | *change "{desc}" | Handle requirement change (auto scope assessment) |
+| 10 | *iterate | Start new iteration (PM → Architect → SM → dev) |
+| 11 | *cancel | Cancel running phase |
+| 12 | *projects | List all registered projects |
+| 13 | *switch {name} | Switch active project |
+| 14 | *help | Show all commands |
 ## Activation Protocol
@@ -89,15 +90,16 @@ I can take you from a one-sentence idea to a fully deployed project:
 | 2 | *plan | Start/resume project planning (Phase 2) |
 | 3 | *develop | Start/resume automated development (Phase 3) |
 | 4 | *test | Start/resume smoke testing (Phase 4) |
-| 5 | *deploy | Start/resume deployment (Phase 5) |
-| 6 | *status | Show project progress card |
-| 7 | *resume | Resume from last saved checkpoint |
-| 8 | *change "{desc}" | Handle requirement change (auto scope assessment) |
-| 9 | *iterate | Start new iteration (PM → Architect → SM → dev) |
-| 10 | *cancel | Cancel running phase |
-| 11 | *projects | List all registered projects |
-| 12 | *switch {name} | Switch active project |
-| 13 | *help | Show all commands |
+| 5 | *pre-ship | Run quality gates before deploy (Phase 4.5) |
+| 6 | *deploy | Start/resume deployment (Phase 5) |
+| 7 | *status | Show project progress card |
+| 8 | *resume | Resume from last saved checkpoint |
+| 9 | *change "{desc}" | Handle requirement change (auto scope assessment) |
+| 10 | *iterate | Start new iteration (PM → Architect → SM → dev) |
+| 11 | *cancel | Cancel running phase |
+| 12 | *projects | List all registered projects |
+| 13 | *switch {name} | Switch active project |
+| 14 | *help | Show all commands |
 Tell me what you'd like to build, or pick a command to get started.
 ```
@@ -134,11 +136,37 @@ Every task file follows a standardized three-part structure:
 | *plan | [yuri-plan-project.md](tasks/yuri-plan-project.md) |
 | *develop | [yuri-develop-project.md](tasks/yuri-develop-project.md) |
 | *test | [yuri-test-project.md](tasks/yuri-test-project.md) |
+| *pre-ship | [yuri-pre-ship.md](tasks/yuri-pre-ship.md) |
 | *deploy | [yuri-deploy-project.md](tasks/yuri-deploy-project.md) |
 | *resume | [yuri-resume.md](tasks/yuri-resume.md) |
 | *status | [yuri-status.md](tasks/yuri-status.md) |
 | *change | [yuri-handle-change.md](tasks/yuri-handle-change.md) |
+## Lifecycle Flow
+```
+Create → Plan → [Plan Review Gate] → Develop → Test → [Browser QA] → Pre-Ship → Deploy → [Canary] → Close Out
+                  (gstack)                        (gstack)    (gstack)             (gstack)   (gstack retro+docs)
+```
+gstack quality gates are **additive** — if gstack is not installed, the original flow works unchanged.
+All gstack integration points gracefully degrade: check for `~/.claude/skills/gstack/` and skip if absent.
+## gstack Integration
+Yuri integrates with [gstack](https://github.com/garrytan/gstack) skills at 6 points in the lifecycle:
+| Phase | gstack Skill | Purpose | Required? |
+|-------|-------------|---------|-----------|
+| Plan (Phase 2) | `/plan-ceo-review`, `/plan-eng-review`, `/plan-design-review` | Challenge assumptions, audit architecture, validate design | Optional (user chooses depth) |
+| Develop (Phase 3) | `/investigate` | Root cause analysis for stuck stories | Auto-triggered on stuck_count=2 |
+| Test (Phase 4) | `/qa` | Browser-based end-to-end testing for UI projects | Auto for UI projects, skip for API-only |
+| Pre-Ship (Phase 4.5) | `/review`, `/cso`, `/benchmark`, `/design-review` | Code review, security audit, performance baseline, design QA | Recommended, skippable |
+| Deploy (Phase 5) | `/canary` | 10-minute post-deploy monitoring with regression detection | Auto if gstack available |
+| Close Out | `/document-release`, `/retro` | Sync docs with shipped code, structured retrospective | Auto after Phase 5 |
+**Install gstack**: `git clone --depth 1 https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup`
 ## Memory Layout
 Yuri uses a four-layer memory system. Files are organized by access frequency and change rate.

package/skill/data/phase-routing.yaml CHANGED Viewed

@@ -18,6 +18,10 @@ routes:
     target_phase: 4
     command: "*test"
+  - intent_patterns: ["pre-ship", "quality gate", "code review", "security audit", "pre-launch check", "ready to ship"]
+    target_phase: 4.5
+    command: "*pre-ship"
   - intent_patterns: ["deploy", "ship", "release", "go live", "publish", "launch"]
     target_phase: 5
     command: "*deploy"

package/skill/tasks/_close-out.md CHANGED Viewed

@@ -90,3 +90,80 @@ IF the current phase just transitioned to `complete`:
    - Projects with `status: paused` for > 30 days: suggest archiving to user.
    - Projects with `status: maintenance` and no timeline events for > 60 days:
      remove from `~/.yuri/focus.yaml` → `attention_queue`.
+---
+## F.5 Post-Ship Documentation & Retrospective (gstack — only after Phase 5 completes)
+**Skip this section entirely unless Phase 5 (Deploy) just completed successfully.**
+Check gstack availability first:
+```bash
+test -d "$HOME/.claude/skills/gstack" && echo "gstack_available" || echo "gstack_missing"
+```
+IF `gstack_missing` → skip F.5 entirely.
+### F.5.1 Document Release (`/document-release`)
+**Purpose**: Automatically sync project documentation with what actually shipped. READMEs, ARCHITECTURE.md, CONTRIBUTING.md, CHANGELOG, and CLAUDE.md are often stale by deployment time.
+Execute:
+```
+/document-release
+```
+This reads all project docs, cross-references the diff since project start, and updates:
+- README.md (features, setup instructions, API docs)
+- ARCHITECTURE.md (if it exists — system design, data flows)
+- CHANGELOG.md (polished voice, consistent format)
+- Any other documentation that references changed code
+Report:
+```
+📝 Documentation synced with shipped code.
+Updated: {list of updated doc files}
+```
+Append to timeline:
+```jsonl
+{"ts":"{ISO-8601}","type":"document_release","files_updated":[{list}]}
+```
+### F.5.2 Engineering Retrospective (`/retro`)
+**Purpose**: Generate a structured retrospective analyzing the full project lifecycle — commit patterns, code quality metrics, testing coverage, shipping velocity, and actionable improvements for the next project.
+Execute:
+```
+/retro
+```
+This analyzes git history for the project and produces:
+- **Velocity metrics**: commits, LOC, test-to-production ratio
+- **Code quality signals**: test coverage trends, file hotspots, fix-to-feature ratio
+- **Time patterns**: work sessions, peak hours, focus score
+- **Top 3 wins**: highest-impact deliverables
+- **3 improvements**: specific, actionable, anchored in data
+- **3 habits for next project**: small, realistic practices
+Parse the retro output and extract universal insights for wisdom:
+**Promote to global wisdom** (if applicable):
+- Patterns that would benefit future projects → `~/.yuri/wisdom/workflow.md`
+- Technical gotchas discovered → `~/.yuri/wisdom/tech.md`
+- Process pitfalls encountered → `~/.yuri/wisdom/pitfalls.md`
+Report summary to user:
+```
+📊 Project Retrospective generated.
+Key stats: {commits} commits, {loc} LOC, {test_ratio}% test coverage
+Top win: {top_win}
+Top improvement: {top_improvement}
+```
+Append to timeline:
+```jsonl
+{"ts":"{ISO-8601}","type":"retro_completed","commits":{n},"loc":{n},"test_ratio":{n}}

package/skill/tasks/yuri-deploy-project.md CHANGED Viewed

@@ -9,6 +9,9 @@
 - Phase 4 (Test) must be complete.
 - `{project}/.yuri/state/phase4.yaml` must have `status: complete`.
+- Pre-Ship Quality Gate should be complete (recommended but not required).
+  - IF `{project}/.yuri/state/pre-ship.yaml` exists with `status: complete` → proceed.
+  - IF not → warn: "Pre-ship quality gates were not run. Consider running `*pre-ship` first." → ask user to continue or run pre-ship.
 ---
@@ -89,29 +92,105 @@ Update `{project}/.yuri/focus.yaml` → `action: "executing deployment via {stra
 ---
-## Step 4: Health Check
-After deployment completes:
+## Step 4: Post-Deploy Verification
+After deployment completes, set:
 ```bash
 DEPLOYED_URL="{url from deployment output}"
+```
+### 4.1 Quick Health Check (immediate)
+First, do a fast HTTP check to confirm the deployment succeeded at all:
+```bash
 HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "$DEPLOYED_URL" 2>/dev/null)
 ```
-IF `HTTP_CODE` = 200 (or expected success code):
-- `{project}/.yuri/state/phase5.yaml` → `status: "complete"`, `completed_at: now`, `url: "$DEPLOYED_URL"`, `health: "healthy"`
-- `{project}/.yuri/focus.yaml` → `step: "phase5.complete"`, `pulse: "Deployed and healthy at {url}"`
+IF `HTTP_CODE` != 200 → deployment itself failed:
+- Report error to user with diagnostics.
+- Offer: retry, try alternative strategy, or pause.
+- Do NOT proceed to canary monitoring.
+### 4.2 Canary Monitoring (gstack — 10 minutes)
+IF quick health check passed AND gstack is available:
+```bash
+test -d "$HOME/.claude/skills/gstack" && echo "gstack_available" || echo "gstack_missing"
+```
+**IF gstack available:**
+Report to user:
+```
+✅ Deployment succeeded (HTTP 200). Starting 10-minute canary monitoring...
+```
+Step 1 — Capture baseline (if pre-ship `/benchmark` was run, this is already done):
+```
+/canary {DEPLOYED_URL} --baseline
+```
+Step 2 — Start canary monitoring:
+```
+/canary {DEPLOYED_URL}
+```
+This monitors for 10 minutes, checking every 60 seconds:
+- Page load failures or timeouts
+- New console errors not in baseline
+- Performance regressions (load times 2x slower than baseline)
+- Broken links
+Wait for canary to complete. Parse results:
+- **Health status**: HEALTHY / DEGRADED / BROKEN
+- **Alert count**
+- **Rollback recommendation** (if any)
+IF canary reports **HEALTHY**:
+- Proceed to success state (Step 4.3).
+IF canary reports **DEGRADED**:
+- Report issues to user with screenshot evidence:
+  ```
+  ⚠️ Canary detected issues during monitoring:
+  {list of alerts with evidence}
+  The deployment is functional but has issues. Continue / Rollback?
+  ```
+  - IF continue → proceed to Step 4.3 with warnings logged.
+  - IF rollback → offer rollback guidance, pause.
+IF canary reports **BROKEN**:
+- Report critical failure:
+  ```
+  🚨 Canary detected critical failures:
+  {alerts with screenshots}
+  Recommend immediate rollback. Rollback / Investigate / Keep?
+  ```
+Append to timeline:
+```jsonl
+{"ts":"{ISO-8601}","type":"canary_completed","url":"{url}","health":"{status}","alerts":{count}}
+```
+**IF gstack NOT available** — fall back to basic check:
+- The HTTP 200 check from 4.1 is sufficient. Proceed to Step 4.3.
+- Note in timeline: `"canary":"skipped_no_gstack"`
+### 4.3 Mark Complete
+- `{project}/.yuri/state/phase5.yaml` → `status: "complete"`, `completed_at: now`, `url: "$DEPLOYED_URL"`, `health: "{canary_status or 'healthy'}"`, `canary_alerts: {count or 0}`
+- `{project}/.yuri/focus.yaml` → `step: "phase5.complete"`, `pulse: "Deployed and {health} at {url}"`
 - Append to timeline:
   ```jsonl
-  {"ts":"{ISO-8601}","type":"project_deployed","url":"{url}","strategy":"{strategy}","health":"healthy"}
+  {"ts":"{ISO-8601}","type":"project_deployed","url":"{url}","strategy":"{strategy}","health":"{status}","canary_alerts":{count}}
   {"ts":"{ISO-8601}","type":"phase_completed","phase":5}
   ```
 - Update `~/.yuri/portfolio/registry.yaml` → this project's `status: maintenance`, `pulse: "v1.0 deployed at {url}"`
-IF health check fails:
-- Report error to user with diagnostics.
-- Offer: retry, try alternative strategy, or pause.
 Report:
 ```
 ## ✅ Deployment Complete
@@ -120,7 +199,8 @@ Report:
 |------|--------|
 | **Strategy** | {strategy} |
 | **URL** | {url} |
-| **Health** | ✅ Healthy |
+| **Health** | {✅ Healthy / ⚠️ Degraded} |
+| **Canary** | {10 min monitoring: N alerts / skipped} |
 🎉 Project lifecycle complete! From idea to production.

package/skill/tasks/yuri-develop-project.md CHANGED Viewed

@@ -130,7 +130,61 @@ done
    - `/clear` → restart agent
 4. Increment `state/phase3.yaml` → `monitoring.stuck_count`.
-IF `stuck_count > 3`:
+IF `stuck_count` = 2 AND gstack is available:
+- **Escalate to `/investigate`** before requesting human intervention.
+- This provides a structured root cause analysis instead of blind retries.
+```bash
+test -d "$HOME/.claude/skills/gstack" && echo "gstack_available" || echo "gstack_missing"
+```
+IF gstack available:
+Report to user:
+```
+🔍 Story stuck after 2 recovery attempts. Running root cause investigation...
+```
+Execute in Yuri's own session:
+```
+/investigate
+```
+Provide `/investigate` with context:
+- Captured tmux pane contents from all 4 windows
+- Current story ID and description
+- Error patterns detected
+- What recovery was already attempted
+Wait for `/investigate` to complete. Parse output:
+- **Root cause** identified?
+- **Fix confidence** (1-10)
+- **Recommended fix** with file:line references
+IF fix confidence ≥ 8:
+- Report to user: "Root cause identified: {summary}. Auto-fix recommended. Proceed? (Y/N)"
+- IF Y → route fix to Dev agent via `*quick-fix "{investigate_fix_description}"`, then resume monitoring.
+- IF N → present full investigation report, let user decide.
+IF fix confidence < 8:
+- Report full investigation to user with diagnostics:
+  ```
+  🔍 Investigation Report:
+  - Symptom: {symptom}
+  - Root cause: {root_cause or "inconclusive"}
+  - Confidence: {score}/10
+  - Recommended action: {action}
+  Please advise: Fix manually / Retry / Skip this story / Pause?
+  ```
+Append to timeline:
+```jsonl
+{"ts":"{ISO-8601}","type":"investigate","story":"{id}","root_cause":"{summary}","confidence":{score}}
+```
+IF `stuck_count > 3` (after investigate attempt or if gstack not available):
 - Report to user with full diagnostics.
 - Request human intervention.
 - Pause monitoring loop.

package/skill/tasks/yuri-plan-project.md CHANGED Viewed

@@ -196,7 +196,7 @@ tmux kill-session -t "$SESSION"
 2. Update memory:
 - `{project}/.yuri/state/phase2.yaml` → `status: complete`, `completed_at: now`
-- `{project}/.yuri/focus.yaml` → `step: "phase2.complete"`, `pulse: "Phase 2 complete, ready for development"`, `tmux.planning_session: ""`
+- `{project}/.yuri/focus.yaml` → `step: "phase2.complete"`, `pulse: "Phase 2 complete, ready for review"`, `tmux.planning_session: ""`
 3. Output summary:
 ```
@@ -210,8 +210,149 @@ All planning documents generated:
 - Sharded context files for development
 ```
-4. Ask:
+4. Proceed to Plan Review Gate (Step 4).
+---
+## Step 4: Plan Review Gate (gstack)
+**Purpose**: Use gstack review skills to challenge assumptions, audit architecture, and validate design before writing any code. Catching issues here is 10x cheaper than catching them in development.
+### 4.0 Check gstack availability
+```bash
+test -d "$HOME/.claude/skills/gstack" && echo "gstack_available" || echo "gstack_missing"
+```
+IF `gstack_missing` → skip to Step 4.5 (ask user to proceed to development).
+### 4.1 Ask user for review depth
+```
+## 📋 Plan Review Gate
+Planning is complete. Before development begins, I can run independent quality reviews on the plan using gstack:
+| # | Review | What it does | Time |
+|---|--------|-------------|------|
+| 1 | **Full Review** | CEO strategy + Eng architecture + Design (if UI) | ~10 min |
+| 2 | **Eng Only** | Architecture, test coverage, failure modes | ~5 min |
+| 3 | **Skip** | Go straight to development | 0 min |
+Recommendation: **Full Review** for new projects, **Eng Only** for iterations.
+Select (1/2/3):
+```
+- IF user selects **3 (Skip)** → go to Step 4.5.
+- IF user selects **2 (Eng Only)** → run only Step 4.3.
+- IF user selects **1 (Full Review)** → run Steps 4.2 through 4.4.
+### 4.2 CEO Strategy Review (`/plan-ceo-review`)
+Report:
+```
+🎯 Plan Review 1/3: CEO Strategy Review — challenging scope and premises...
+```
+Execute in Yuri's own session:
+```
+/plan-ceo-review
+```
+This runs in **HOLD SCOPE** mode by default (maximum rigor without scope creep).
+Wait for completion. Key outputs to capture:
+- Premise challenges
+- Error/rescue registry
+- Failure modes
+- "Not in scope" section
+- Review readiness score
+IF `/plan-ceo-review` proposes plan modifications:
+- Show diff summary to user.
+- Ask: "Accept changes / Modify / Reject?"
+- IF accepted → plan files updated (gstack writes directly).
+- IF rejected → revert changes, continue with original plan.
+Append to timeline:
+```jsonl
+{"ts":"{ISO-8601}","type":"plan_review","reviewer":"ceo","status":"complete"}
+```
+### 4.3 Engineering Architecture Review (`/plan-eng-review`)
+Report:
 ```
+🏗️ Plan Review 2/3: Engineering Review — locking architecture and test strategy...
+```
+Execute:
+```
+/plan-eng-review
+```
+Key outputs:
+- Architecture diagram
+- Coverage diagram (code paths with test status)
+- Failure mode analysis per new codepath
+- Worktree parallelization strategy
+- Issues found per section
+IF review proposes plan changes → same accept/modify/reject flow as 4.2.
+Append to timeline:
+```jsonl
+{"ts":"{ISO-8601}","type":"plan_review","reviewer":"eng","status":"complete"}
+```
+### 4.4 Design Review (`/plan-design-review`) — UI projects only
+Check if project has UI components:
+1. Look for frontend files or frontend stack in `{project}/.yuri/identity.yaml`.
+2. Check if `docs/front-end-spec*.md` exists.
+IF no UI → skip with note:
+```
+ℹ️ Plan Review 3/3: Design Review — skipped (no UI components)
+```
+IF UI present:
+Report:
+```
+🎨 Plan Review 3/3: Design Review — auditing UX dimensions...
+```
+Execute:
+```
+/plan-design-review
+```
+Key outputs:
+- 7-dimension scorecard (info architecture, interaction states, user journey, AI slop risk, design system, responsive/a11y, unresolved decisions)
+- Updated plan with design decisions filled in
+IF review proposes changes → same accept/modify/reject flow.
+Append to timeline:
+```jsonl
+{"ts":"{ISO-8601}","type":"plan_review","reviewer":"design","status":"complete"}
+```
+### 4.5 Review Summary + Transition
+```
+## 📋 Plan Review Complete
+| Review | Status | Key Finding |
+|--------|--------|-------------|
+| CEO Strategy | {✅/⏭️} | {one-line summary} |
+| Eng Architecture | {✅/⏭️} | {one-line summary} |
+| Design | {✅/⏭️} | {one-line summary} |
 🚀 Ready to start automated development? (Y/N)
 ```

package/skill/tasks/yuri-pre-ship.md ADDED Viewed

@@ -0,0 +1,334 @@
+# Phase 4.5: Pre-Ship Quality Gate
+**Command**: `*pre-ship`
+**Purpose**: Run comprehensive quality checks (code review, security audit, performance baseline, design audit) before deployment. Uses gstack skills as independent quality gates.
+---
+## Prerequisites
+- Phase 4 (Test) must be complete.
+- `{project}/.yuri/state/phase4.yaml` must have `status: complete`.
+- gstack must be installed (`~/.claude/skills/gstack/` must exist).
+---
+## Step 0: Wake Up
+Read [_wake-up.md](tasks/_wake-up.md) and execute it fully.
+After Wake Up, validate:
+1. Read `{project}/.yuri/state/phase4.yaml` → verify `status` = `complete`.
+   - IF not → "Phase 4 not complete. Run `*test` first." and stop.
+2. Set `PROJECT_DIR` from `{project}/.yuri/identity.yaml` → `project.root`.
+3. Check gstack availability:
+```bash
+test -d "$HOME/.claude/skills/gstack" && echo "gstack_available" || echo "gstack_missing"
+```
+   - IF `gstack_missing` → warn user: "gstack not installed. Install with: `git clone --depth 1 https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup`". Offer to skip pre-ship and go directly to deploy.
+**Resumption check:**
+- IF `{project}/.yuri/state/pre-ship.yaml` exists with `status: in_progress`:
+  → Find last completed gate → resume from the next one.
+- IF `{project}/.yuri/state/pre-ship.yaml` exists with `status: complete`:
+  → Offer to skip to Phase 5.
+- OTHERWISE → initialize `state/pre-ship.yaml`:
+  ```yaml
+  status: pending
+  started_at: ""
+  completed_at: ""
+  gates:
+    code_review:
+      status: pending
+      score: null
+      findings: 0
+      critical: 0
+    security_audit:
+      status: pending
+      score: null
+      findings: 0
+      critical: 0
+    performance_baseline:
+      status: pending
+      captured: false
+    design_audit:
+      status: pending
+      score: null
+      has_ui: null
+  overall_verdict: pending
+  ```
+Update memory:
+- `{project}/.yuri/focus.yaml` → `phase: 4.5`, `step: "pre-ship"`, `action: "running quality gates"`, `updated_at: now`
+- `{project}/.yuri/state/pre-ship.yaml` → `status: "in_progress"`, `started_at: now`
+- `~/.yuri/focus.yaml` → `active_action: "pre-ship quality gates: {name}"`, `updated_at: now`
+- Append to `{project}/.yuri/timeline/events.jsonl`:
+  ```jsonl
+  {"ts":"{ISO-8601}","type":"pre_ship_started","gates":["code_review","security_audit","performance_baseline","design_audit"]}
+  ```
+- Save all files immediately.
+---
+## Step 1: Code Review Gate (`/review`)
+**Purpose**: Catch structural issues that tests miss — SQL safety, race conditions, LLM trust boundaries, type coercion, async mixing, documentation staleness.
+Report to user:
+```
+🔍 Gate 1/4: Code Review — scanning diff against base branch...
+```
+Execute in the current Claude Code session (Yuri's own session, NOT a tmux agent window):
+```
+/review
+```
+Wait for `/review` to complete. Parse the output for:
+- **PR Quality Score** (0-10)
+- **Critical findings** (severity ≥ 8)
+- **Total findings count**
+- **Auto-fixed items**
+Update `{project}/.yuri/state/pre-ship.yaml`:
+```yaml
+gates.code_review:
+  status: complete
+  score: {pr_quality_score}
+  findings: {total_findings}
+  critical: {critical_count}
+  completed_at: "{ISO-8601}"
+```
+Append to timeline:
+```jsonl
+{"ts":"{ISO-8601}","type":"gate_completed","gate":"code_review","score":{score},"findings":{findings},"critical":{critical}}
+```
+### Gate Decision
+- IF `critical` = 0 → **PASS** — continue to next gate.
+- IF `critical` > 0 AND all were auto-fixed → **PASS with fixes** — continue.
+- IF `critical` > 0 AND some unfixed → **BLOCK** — report to user:
+  ```
+  ⚠️ Code Review found {critical} critical issue(s) requiring attention:
+  {list of unfixed critical findings}
+  Fix now / Skip / Abort pre-ship?
+  ```
+  - **Fix now** → user or Dev agent fixes, then re-run `/review`.
+  - **Skip** → proceed with warning logged.
+  - **Abort** → save state, stop.
+---
+## Step 2: Security Audit Gate (`/cso`)
+**Purpose**: Detect exploitable vulnerabilities — secrets in git history, dependency supply chain, OWASP Top 10, CI/CD exposure, LLM/AI security risks.
+Report to user:
+```
+🛡️ Gate 2/4: Security Audit — scanning for vulnerabilities...
+```
+Execute:
+```
+/cso
+```
+This runs in **daily mode** (default: 8/10 confidence gate, zero-noise).
+Wait for completion. Parse for:
+- **Security posture** (findings count, severity distribution)
+- **Critical/High findings** (confidence ≥ 8)
+- **Attack surface census**
+Update `{project}/.yuri/state/pre-ship.yaml`:
+```yaml
+gates.security_audit:
+  status: complete
+  findings: {total}
+  critical: {critical_high_count}
+  completed_at: "{ISO-8601}"
+```
+Append to timeline:
+```jsonl
+{"ts":"{ISO-8601}","type":"gate_completed","gate":"security_audit","findings":{findings},"critical":{critical}}
+```
+### Gate Decision
+- IF `critical` = 0 → **PASS**.
+- IF `critical` > 0 → **BLOCK** — report with exploit scenarios:
+  ```
+  🚨 Security Audit found {critical} exploitable issue(s):
+  {list with severity, file:line, exploit scenario}
+  Fix now / Skip (accept risk) / Abort?
+  ```
+  - **Fix now** → address findings, re-run `/cso`.
+  - **Skip** → proceed with risk accepted, log to `knowledge/decisions.md`.
+  - **Abort** → save state, stop.
+---
+## Step 3: Performance Baseline Gate (`/benchmark`)
+**Purpose**: Capture performance baseline metrics before deployment. These become the reference for post-deploy canary comparison.
+Report to user:
+```
+📊 Gate 3/4: Performance Baseline — capturing metrics...
+```
+**Precondition**: Project must have a running dev server URL. Check:
+1. Read `{project}/.orchestrix-core/core-config.yaml` for `dev_server_url`.
+2. IF no dev server URL configured:
+   - Try common patterns: `http://localhost:3000`, `http://localhost:5173`, `http://localhost:8080`.
+   - IF none respond → skip this gate with note: "No dev server detected. Performance baseline skipped."
+   - Update gate status to `skipped` and continue.
+IF dev server is available:
+```
+/benchmark --baseline
+```
+Wait for completion. Parse for:
+- **Core Web Vitals** (TTFB, FCP, LCP)
+- **Bundle sizes** (JS, CSS)
+- **Resource counts**
+Update `{project}/.yuri/state/pre-ship.yaml`:
+```yaml
+gates.performance_baseline:
+  status: complete
+  captured: true
+  dev_server_url: "{url}"
+  completed_at: "{ISO-8601}"
+```
+Append to timeline:
+```jsonl
+{"ts":"{ISO-8601}","type":"gate_completed","gate":"performance_baseline","captured":true}
+```
+This gate does NOT block — it only captures baseline for later comparison.
+---
+## Step 4: Design Audit Gate (`/design-review`)
+**Purpose**: Catch visual inconsistencies, AI slop patterns, spacing issues, hierarchy problems before users see them.
+**Precondition**: Project must have UI components. Check:
+1. Look for frontend files: `src/**/*.{tsx,jsx,vue,svelte}`, `app/**/*.{tsx,jsx}`, `pages/**/*.{tsx,jsx}`.
+2. Check `{project}/.yuri/identity.yaml` → `stack` field for frontend frameworks.
+3. IF no UI detected → skip this gate:
+   ```
+   ℹ️ Gate 4/4: Design Audit — skipped (no UI components detected)
+   ```
+   Update gate status to `skipped` and continue.
+IF UI is present AND dev server is available:
+Report to user:
+```
+🎨 Gate 4/4: Design Audit — reviewing visual quality...
+```
+```
+/design-review quick
+```
+Wait for completion. Parse for:
+- **Design score** (A-F)
+- **AI Slop score** (A-F)
+- **Per-category grades**
+- **Critical visual issues**
+Update `{project}/.yuri/state/pre-ship.yaml`:
+```yaml
+gates.design_audit:
+  status: complete
+  score: "{design_grade}"
+  has_ui: true
+  completed_at: "{ISO-8601}"
+```
+Append to timeline:
+```jsonl
+{"ts":"{ISO-8601}","type":"gate_completed","gate":"design_audit","score":"{grade}"}
+```
+### Gate Decision
+- IF design score ≥ C → **PASS**.
+- IF design score < C → **WARN** — report issues but do not block:
+  ```
+  ⚠️ Design audit scored {grade}. Key issues:
+  {list of top issues}
+  These are non-blocking but recommended to fix before launch.
+  Continue to deploy / Fix first?
+  ```
+---
+## Step 5: Quality Gate Summary
+Compile results from all gates:
+```
+## 🏁 Pre-Ship Quality Gate Report
+| Gate | Status | Score | Findings | Critical |
+|------|--------|-------|----------|----------|
+| Code Review | {✅/⚠️/❌} | {score}/10 | {n} | {n} |
+| Security Audit | {✅/⚠️/❌} | — | {n} | {n} |
+| Performance Baseline | {✅/⏭️} | — | captured | — |
+| Design Audit | {✅/⚠️/⏭️} | {grade} | {n} | — |
+**Overall**: {PASS / PASS WITH WARNINGS / BLOCKED}
+```
+### Overall Verdict Logic
+- **All gates PASS** → `overall_verdict: pass`
+- **Any gate WARN but no BLOCK** → `overall_verdict: pass_with_warnings`
+- **Any gate BLOCK** → `overall_verdict: blocked` (should not reach here — blocks are handled per-gate)
+Update `{project}/.yuri/state/pre-ship.yaml`:
+```yaml
+status: complete
+completed_at: "{ISO-8601}"
+overall_verdict: "{verdict}"
+```
+Append to timeline:
+```jsonl
+{"ts":"{ISO-8601}","type":"pre_ship_completed","verdict":"{verdict}"}
+```
+Update focus:
+- `{project}/.yuri/focus.yaml` → `step: "pre-ship.complete"`, `pulse: "Pre-ship {verdict}, ready for deploy"`
+### Transition
+```
+🚀 Pre-ship quality gates complete. Ready to deploy? (Y/N)
+```
+- If Y → execute `tasks/yuri-deploy-project.md`
+- If N → save state, end with reminder: "Run `/yuri *deploy` when ready."
+---
+## Final Step: Close Out
+Read [_close-out.md](tasks/_close-out.md) and execute it fully.
+Pre-ship often reveals security patterns and code quality insights — Phase Reflect should capture these for project knowledge.

package/skill/tasks/yuri-test-project.md CHANGED Viewed

@@ -157,29 +157,115 @@ IF signal detected → append to `~/.yuri/inbox.jsonl`.
 ---
-## Step 3: All Epics Tested
+## Step 3: Browser QA (gstack — UI projects only)
-1. Check results: count passed vs failed.
+**Purpose**: After code-level smoke tests pass, run real browser-based end-to-end testing to catch rendering bugs, JS runtime errors, broken interactions, and visual issues that unit/integration tests cannot detect.
-2. IF all passed:
+### 3.0 Check preconditions
+Skip this step entirely IF any of:
+- gstack is not installed (`~/.claude/skills/gstack/` missing)
+- Project has no UI (no frontend files, no `docs/front-end-spec*.md`)
+- No dev server URL is available
+```bash
+HAS_GSTACK=$(test -d "$HOME/.claude/skills/gstack" && echo "yes" || echo "no")
+HAS_UI=$(ls "$PROJECT_DIR/src/"*.{tsx,jsx,vue,svelte} "$PROJECT_DIR/app/"*.{tsx,jsx} 2>/dev/null | head -1)
+```
+IF skipping → report:
+```
+ℹ️ Browser QA skipped ({reason: no gstack / no UI / no dev server})
+```
+→ proceed to Step 4.
+### 3.1 Run Browser QA
+Report to user:
+```
+🌐 Running browser-based QA testing...
+```
+Execute in Yuri's own session:
+```
+/qa
+```
+This runs in **Standard** tier by default (systematic exploration of all pages, 5-10 issues).
+Wait for completion. Parse for:
+- **Health score** (0-100)
+- **Issues found** (by severity: critical, high, medium, low)
+- **Auto-fixed count** (gstack /qa fixes bugs and commits atomically)
+- **Before/after screenshots**
+### 3.2 Evaluate Browser QA Results
+IF health score ≥ 80 AND no critical issues:
+- Report:
+  ```
+  ✅ Browser QA passed (score: {score}/100, {fixed} issues auto-fixed)
+  ```
+- Proceed to Step 4.
+IF health score < 80 OR critical issues found:
+- Report:
+  ```
+  ⚠️ Browser QA found issues (score: {score}/100):
+  - Critical: {n} | High: {n} | Medium: {n}
+  - Auto-fixed: {n}
+  - Remaining: {n}
+  {list of unfixed critical/high issues with repro steps}
+  Fix remaining / Accept and continue / Re-run QA?
+  ```
+  - **Fix remaining** → route unfixed bugs to Dev agent via `*quick-fix`, then re-run `/qa`.
+  - **Accept** → proceed with warnings logged.
+  - **Re-run** → execute `/qa` again.
+Append to timeline:
+```jsonl
+{"ts":"{ISO-8601}","type":"browser_qa","score":{score},"issues":{total},"fixed":{fixed}}
+```
+Update `{project}/.yuri/state/phase4.yaml` → add `browser_qa` section:
+```yaml
+browser_qa:
+  status: complete
+  score: {score}
+  issues: {total}
+  fixed: {fixed}
+```
+---
+## Step 4: All Testing Complete
+1. Check results: count passed vs failed epics + browser QA status.
+2. IF all epics passed (and browser QA passed or skipped):
    - `{project}/.yuri/state/phase4.yaml` → `status: "complete"`, `completed_at: now`
    - `{project}/.yuri/focus.yaml` → `step: "phase4.complete"`, `pulse: "Phase 4 complete, all epics passed"`
    - Append: `{"ts":"...","type":"phase_completed","phase":4}` to timeline.
-3. IF some failed:
+3. IF some epics failed:
    - Report failed epics to user.
-   - Ask: "Retry failed epics, skip to deploy, or pause?"
+   - Ask: "Retry failed epics, skip to pre-ship, or pause?"
    - IF retry → re-enter Step 2 for failed epics only.
    - IF skip → mark phase complete with note about failures.
    - IF pause → save state, stop.
-4. Present deployment options:
+4. Route to Pre-Ship Quality Gate:
 ```
-🚀 All smoke tests passed! Ready to deploy? (Y/N)
+🚀 All tests passed! Next: Pre-Ship Quality Gate (code review + security + performance).
+Run quality gates now? (Y / Skip to deploy)
 ```
-- If Y → execute `tasks/yuri-deploy-project.md`
-- If N → save state, end with reminder: "Run `/yuri *deploy` when ready."
+- If Y → execute `tasks/yuri-pre-ship.md`
+- If "Skip to deploy" → execute `tasks/yuri-deploy-project.md`
+- If user wants to pause → save state, end with reminder: "Run `/yuri *pre-ship` or `/yuri *deploy` when ready."
 ---