npm - opengstack - Versions diffs - 0.14.0 → 0.14.2 - Mend

opengstack 0.14.0 → 0.14.2

Files changed (69)

package/AGENTS.md +4 -4
package/CLAUDE.md +127 -110
package/README.md +10 -5
package/SKILL.md +500 -70
package/bin/opengstack.js +69 -69
package/commands/autoplan.md +7 -9
package/commands/benchmark.md +84 -91
package/commands/browse.md +60 -64
package/commands/canary.md +7 -9
package/commands/careful.md +2 -2
package/commands/codex.md +7 -9
package/commands/connect-chrome.md +7 -9
package/commands/cso.md +7 -9
package/commands/design-consultation.md +7 -9
package/commands/design-review.md +7 -9
package/commands/design-shotgun.md +7 -9
package/commands/document-release.md +7 -9
package/commands/freeze.md +3 -3
package/commands/guard.md +4 -4
package/commands/investigate.md +7 -9
package/commands/land-and-deploy.md +7 -9
package/commands/office-hours.md +7 -9
package/commands/{gstack-upgrade.md → opengstack-upgrade.md} +64 -65
package/commands/plan-ceo-review.md +7 -9
package/commands/plan-design-review.md +7 -9
package/commands/plan-eng-review.md +7 -9
package/commands/qa-only.md +7 -9
package/commands/qa.md +7 -9
package/commands/retro.md +7 -9
package/commands/review.md +7 -9
package/commands/setup-browser-cookies.md +22 -26
package/commands/setup-deploy.md +7 -9
package/commands/ship.md +7 -9
package/commands/unfreeze.md +7 -7
package/docs/designs/CHROME_VS_CHROMIUM_EXPLORATION.md +9 -9
package/docs/designs/CONDUCTOR_CHROME_SIDEBAR_INTEGRATION.md +2 -2
package/docs/designs/CONDUCTOR_SESSION_API.md +16 -16
package/docs/designs/DESIGN_SHOTGUN.md +74 -74
package/docs/designs/DESIGN_TOOLS_V1.md +111 -111
package/docs/skills.md +483 -202
package/package.json +42 -43
package/scripts/analytics.ts +188 -0
package/scripts/dev-skill.ts +83 -0
package/scripts/discover-skills.ts +39 -0
package/scripts/eval-compare.ts +97 -0
package/scripts/eval-list.ts +117 -0
package/scripts/eval-select.ts +86 -0
package/scripts/eval-summary.ts +188 -0
package/scripts/eval-watch.ts +172 -0
package/scripts/gen-skill-docs.ts +473 -0
package/scripts/resolvers/browse.ts +129 -0
package/scripts/resolvers/codex-helpers.ts +133 -0
package/scripts/resolvers/composition.ts +48 -0
package/scripts/resolvers/confidence.ts +37 -0
package/scripts/resolvers/constants.ts +50 -0
package/scripts/resolvers/design.ts +950 -0
package/scripts/resolvers/index.ts +59 -0
package/scripts/resolvers/learnings.ts +96 -0
package/scripts/resolvers/preamble.ts +505 -0
package/scripts/resolvers/review.ts +884 -0
package/scripts/resolvers/testing.ts +573 -0
package/scripts/resolvers/types.ts +45 -0
package/scripts/resolvers/utility.ts +421 -0
package/scripts/skill-check.ts +190 -0
package/scripts/cleanup.py +0 -100
package/scripts/filter-skills.sh +0 -114
package/scripts/filter_skills.py +0 -164
package/scripts/install-commands.js +0 -45
package/scripts/install-skills.js +0 -60

package/docs/skills.md

@@ -1,6 +1,6 @@
 # Skill Deep Dives
-Detailed guides for every gstack skill — philosophy, workflow, and examples.
+Detailed guides for every opengstack skill — philosophy, workflow, and examples.
 | Skill | Your specialist | What they do |
 |-------|----------------|--------------|
@@ -12,14 +12,21 @@ Detailed guides for every gstack skill — philosophy, workflow, and examples.
 | [`/review`](#review) | **Staff Engineer** | Find the bugs that pass CI but blow up in production. Auto-fixes the obvious ones. Flags completeness gaps. |
 | [`/investigate`](#investigate) | **Debugger** | Systematic root-cause debugging. Iron Law: no fixes without investigation. Traces data flow, tests hypotheses, stops after 3 failed fixes. |
 | [`/design-review`](#design-review) | **Designer Who Codes** | Live-site visual audit + fix loop. 80-item audit, then fixes what it finds. Atomic commits, before/after screenshots. |
+| [`/design-shotgun`](#design-shotgun) | **Design Explorer** | Generate multiple AI design variants, open a comparison board in your browser, and iterate until you approve a direction. Taste memory biases toward your preferences. |
+| [`/design-html`](#design-html) | **Design Engineer** | Takes an approved mockup from `/design-shotgun` and generates production-quality Pretext-native HTML. Text reflows on resize, heights adjust to content. Smart API routing per design type. Framework detection for React/Svelte/Vue. |
 | [`/qa`](#qa) | **QA Lead** | Test your app, find bugs, fix them with atomic commits, re-verify. Auto-generates regression tests for every fix. |
 | [`/qa-only`](#qa) | **QA Reporter** | Same methodology as /qa but report only. Use when you want a pure bug report without code changes. |
 | [`/ship`](#ship) | **Release Engineer** | Sync main, run tests, audit coverage, push, open PR. Bootstraps test frameworks if you don't have one. One command. |
+| [`/land-and-deploy`](#land-and-deploy) | **Release Engineer** | Merge the PR, wait for CI and deploy, verify production health. One command from "approved" to "verified in production." |
+| [`/canary`](#canary) | **SRE** | Post-deploy monitoring loop. Watches for console errors, performance regressions, and page failures using the browse daemon. |
+| [`/benchmark`](#benchmark) | **Performance Engineer** | Baseline page load times, Core Web Vitals, and resource sizes. Compare before/after on every PR. Track trends over time. |
 | [`/cso`](#cso) | **Chief Security Officer** | OWASP Top 10 + STRIDE threat modeling security audit. Scans for injection, auth, crypto, and access control issues. |
 | [`/document-release`](#document-release) | **Technical Writer** | Update all project docs to match what you just shipped. Catches stale READMEs automatically. |
 | [`/retro`](#retro) | **Eng Manager** | Team-aware weekly retro. Per-person breakdowns, shipping streaks, test health trends, growth opportunities. |
 | [`/browse`](#browse) | **QA Engineer** | Give the agent eyes. Real Chromium browser, real clicks, real screenshots. ~100ms per command. |
 | [`/setup-browser-cookies`](#setup-browser-cookies) | **Session Manager** | Import cookies from your real browser (Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages. |
+| [`/autoplan`](#autoplan) | **Review Pipeline** | One command, fully reviewed plan. Runs CEO → design → eng review automatically with encoded decision principles. Surfaces only taste decisions for your approval. |
+| [`/learn`](#learn) | **Memory** | Manage what opengstack learned across sessions. Review, search, prune, and export project-specific patterns and preferences. |
 | | | |
 | **Multi-AI** | | |
 | [`/codex`](#codex) | **Second Opinion** | Independent review from OpenAI Codex CLI. Three modes: code review (pass/fail gate), adversarial challenge, and open consultation with session continuity. Cross-model analysis when both `/review` and `/codex` have run. |
@@ -29,7 +36,9 @@ Detailed guides for every gstack skill — philosophy, workflow, and examples.
 | [`/freeze`](#safety--guardrails) | **Edit Lock** | Restrict all file edits to a single directory. Blocks Edit and Write outside the boundary. Accident prevention for debugging. |
 | [`/guard`](#safety--guardrails) | **Full Safety** | Combines /careful + /freeze in one command. Maximum safety for prod work. |
 | [`/unfreeze`](#safety--guardrails) | **Unlock** | Remove the /freeze boundary, allowing edits everywhere again. |
-| [`/gstack-upgrade`](#gstack-upgrade) | **Self-Updater** | Upgrade gstack to the latest version. Detects global vs vendored install, syncs both, shows what changed. |
+| [`/connect-chrome`](#connect-chrome) | **Chrome Controller** | Launch your real Chrome controlled by opengstack with the Side Panel extension. Watch every action live. |
+| [`/setup-deploy`](#setup-deploy) | **Deploy Configurator** | One-time setup for `/land-and-deploy`. Detects your platform, production URL, and deploy commands. |
+| [`/opengstack-upgrade`](#opengstack-upgrade) | **Self-Updater** | Upgrade opengstack to the latest version. Detects global vs vendored install, syncs both, shows what changed. |
 ---
@@ -84,7 +93,7 @@ Recommends A because you learn from real usage. CRM data comes naturally in week
 ### The design doc
-Both modes end with a design doc written to `~/.gstack/projects/` — and that doc feeds directly into `/plan-ceo-review` and `/plan-eng-review`. The full lifecycle is now: `office-hours → plan → implement → review → QA → ship → retro`.
+Both modes end with a design doc written to `~/.opengstack/projects/` — and that doc feeds directly into `/plan-ceo-review` and `/plan-eng-review`. The full lifecycle is now: `office-hours → plan → implement → review → QA → ship → retro`.
 After the design doc is approved, `/office-hours` reflects on what it noticed about how you think — not generic praise, but specific callbacks to things you said during the session. The observations appear in the design doc too, so you re-encounter them when you re-read later.
@@ -138,7 +147,7 @@ It asks, **"what is the 10-star product hiding inside this request?"**
 - **HOLD SCOPE** — maximum rigor on the existing plan. No expansions surfaced.
 - **SCOPE REDUCTION** — find the minimum viable version. Cut everything else.
-Visions and decisions are persisted to `~/.gstack/projects/` so they survive beyond the conversation. Exceptional visions can be promoted to `docs/designs/` in your repo for the team.
+Visions and decisions are persisted to `~/.opengstack/projects/` so they survive beyond the conversation. Exceptional visions can be promoted to `docs/designs/` in your repo for the team.
 ---
@@ -203,23 +212,23 @@ Every review (CEO, Eng, Design) logs its result. At the end of each review, you
 ```
 +====================================================================+
-|                    REVIEW READINESS DASHBOARD                       |
+| REVIEW READINESS DASHBOARD |
 +====================================================================+
-| Review          | Runs | Last Run            | Status    | Required |
+| Review | Runs | Last Run | Status | Required |
 |-----------------|------|---------------------|-----------|----------|
-| Eng Review      |  1   | 2026-03-16 15:00    | CLEAR     | YES      |
-| CEO Review      |  1   | 2026-03-16 14:30    | CLEAR     | no       |
-| Design Review   |  0   | —                   | —         | no       |
+| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES |
+| CEO Review | 1 | 2026-03-16 14:30 | CLEAR | no |
+| Design Review | 0 | — | — | no |
 +--------------------------------------------------------------------+
-| VERDICT: CLEARED — Eng Review passed                                |
+| VERDICT: CLEARED — Eng Review passed |
 +====================================================================+
 ```
-Eng Review is the only required gate (disable with `gstack-config set skip_eng_review true`). CEO and Design are informational — recommended for product and UI changes respectively.
+Eng Review is the only required gate (disable with `opengstack-config set skip_eng_review true`). CEO and Design are informational — recommended for product and UI changes respectively.
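The gating rule in the line above can be sketched in TypeScript. This is a minimal illustration, not opengstack's implementation: the `ReviewRecord` shape, the `Status` values, and the `verdict` function are all hypothetical names introduced here. The only behavior taken from the doc is that required reviews can block, optional ones cannot, and `skip_eng_review` disables the Eng Review gate.

```typescript
// Hypothetical types for illustration; not opengstack's actual data model.
type Status = "CLEAR" | "ISSUES" | "NOT_RUN";

interface ReviewRecord {
  name: string;
  runs: number;
  status: Status;
  required: boolean;
}

// A plan is cleared when every required review has run and is CLEAR.
// Optional reviews (CEO, Design) never block the verdict.
function verdict(reviews: ReviewRecord[], skipEngReview = false): string {
  const blocking = reviews.filter(
    (r) => r.required && !(skipEngReview && r.name === "Eng Review"),
  );
  const failing = blocking.filter((r) => r.runs === 0 || r.status !== "CLEAR");
  return failing.length === 0
    ? "CLEARED"
    : `BLOCKED: ${failing.map((r) => r.name).join(", ")}`;
}

// Same rows as the dashboard example.
const dashboard: ReviewRecord[] = [
  { name: "Eng Review", runs: 1, status: "CLEAR", required: true },
  { name: "CEO Review", runs: 1, status: "CLEAR", required: false },
  { name: "Design Review", runs: 0, status: "NOT_RUN", required: false },
];

console.log(verdict(dashboard)); // prints "CLEARED"
```

The Design Review row has never run, yet the verdict is still cleared because that review is not required.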
 ### Plan-to-QA flow
-When `/plan-eng-review` finishes the test review section, it writes a test plan artifact to `~/.gstack/projects/`. When you later run `/qa`, it picks up that test plan automatically — your engineering review feeds directly into QA testing with no manual copy-paste.
+When `/plan-eng-review` finishes the test review section, it writes a test plan artifact to `~/.opengstack/projects/`. When you later run `/qa`, it picks up that test plan automatically — your engineering review feeds directly into QA testing with no manual copy-paste.
 ---
@@ -238,34 +247,34 @@ Seven passes over the plan: information architecture, interaction state coverage
 ### Example
 ```
-You:   /plan-design-review
+You: /plan-design-review
 Claude: Initial Design Rating: 4/10
-        "This plan describes a user dashboard but never specifies
-         what the user sees first. It says 'cards with icons' —
-         which looks like every SaaS template. It mentions zero
-         loading states, zero empty states, and no mobile behavior."
-        Pass 1 (Info Architecture): 3/10
-        "A 10 would define primary/secondary/tertiary content
-         hierarchy for every screen."
-        → Added information hierarchy section to plan
-        Pass 2 (Interaction States): 2/10
-        "The plan has 4 UI features but specifies 0 out of 20
-         interaction states (4 features × 5 states each)."
-        → Added interaction state table to plan
-        Pass 4 (AI Slop): 4/10
-        "The plan says 'clean, modern UI with cards and icons'
-         and 'hero section with gradient'. These are the top 2
-         AI-generated-looking patterns."
-        → Rewrote UI descriptions with specific, intentional alternatives
-        Overall: 4/10 → 8/10 after fixes
-        "Plan is design-complete. Run /design-review after
-         implementation for visual QA."
+ "This plan describes a user dashboard but never specifies
+ what the user sees first. It says 'cards with icons' —
+ which looks like every SaaS template. It mentions zero
+ loading states, zero empty states, and no mobile behavior."
+ Pass 1 (Info Architecture): 3/10
+ "A 10 would define primary/secondary/tertiary content
+ hierarchy for every screen."
+ → Added information hierarchy section to plan
+ Pass 2 (Interaction States): 2/10
+ "The plan has 4 UI features but specifies 0 out of 20
+ interaction states (4 features × 5 states each)."
+ → Added interaction state table to plan
+ Pass 4 (AI Slop): 4/10
+ "The plan says 'clean, modern UI with cards and icons'
+ and 'hero section with gradient'. These are the top 2
+ AI-generated-looking patterns."
+ → Rewrote UI descriptions with specific, intentional alternatives
+ Overall: 4/10 → 8/10 after fixes
+ "Plan is design-complete. Run /design-review after
+ implementation for visual QA."
 ```
 When you re-run it, sections already at 8+ get a quick pass. Sections below 8 get full treatment. For live-site visual audits post-implementation, use `/design-review`.
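The re-run rule above amounts to a threshold check per section. A small TypeScript sketch, assuming scores are numbers out of 10; `treatmentFor` and `planRerun` are hypothetical helper names, not part of opengstack:

```typescript
type Treatment = "quick pass" | "full treatment";

// Sections already at 8 or higher get a quick pass; anything lower gets full treatment.
function treatmentFor(score: number): Treatment {
  return score >= 8 ? "quick pass" : "full treatment";
}

// Plans a re-run from the scores recorded by the previous run.
function planRerun(scores: Record<string, number>): Record<string, Treatment> {
  const plan: Record<string, Treatment> = {};
  for (const [section, score] of Object.entries(scores)) {
    plan[section] = treatmentFor(score);
  }
  return plan;
}

console.log(planRerun({ "Info Architecture": 8, "Interaction States": 6 }));
```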
@@ -293,66 +302,66 @@ Then it writes `DESIGN.md` to your repo root — your project's design source of
 ### Example
 ```
-You:   /design-consultation
+You: /design-consultation
 Claude: From the README, this looks like a developer analytics dashboard
-        for engineering teams. Sound right? Tell me about the feeling
-        you want — and would you like me to look at what's out there
-        first so we can see the landscape?
+ for engineering teams. Sound right? Tell me about the feeling
+ you want — and would you like me to look at what's out there
+ first so we can see the landscape?
-You:    Yes, it's for eng managers. Show me what's out there.
+You: Yes, it's for eng managers. Show me what's out there.
 Claude: [browses Linear, Vercel, Datadog, Posthog — takes screenshots]
-        Here's the landscape. They all converge on clean sans-serifs
-        and muted blue-gray palettes. Linear stands out with dark-first
-        and tight spacing. The rest feel interchangeable.
+ Here's the landscape. They all converge on clean sans-serifs
+ and muted blue-gray palettes. Linear stands out with dark-first
+ and tight spacing. The rest feel interchangeable.
-        Here's my proposal — and where I'd play it safe vs. take risks:
+ Here's my proposal — and where I'd play it safe vs. take risks:
-        AESTHETIC: Industrial/Utilitarian — data-first, every pixel earns its place
-        TYPOGRAPHY: Geist (body + UI), Instrument Serif (display), JetBrains Mono (data)
-        COLOR: Restrained — slate neutrals, single teal accent (#0D9488)
-        SPACING: 4px base, compact density
-        LAYOUT: Grid-disciplined, 12-column
-        MOTION: Minimal-functional — only transitions that aid comprehension
+ AESTHETIC: Industrial/Utilitarian — data-first, every pixel earns its place
+ TYPOGRAPHY: Geist (body + UI), Instrument Serif (display), JetBrains Mono (data)
+ COLOR: Restrained — slate neutrals, single teal accent (#0D9488)
+ SPACING: 4px base, compact density
+ LAYOUT: Grid-disciplined, 12-column
+ MOTION: Minimal-functional — only transitions that aid comprehension
-        SAFE CHOICES (match category expectations):
-          Geist for body — your users already know this font from Vercel
-          Slate neutrals — warm grays feel right for data-heavy UIs
-          Grid-disciplined layout — dashboards need predictable alignment
+ SAFE CHOICES (match category expectations):
+ Geist for body — your users already know this font from Vercel
+ Slate neutrals — warm grays feel right for data-heavy UIs
+ Grid-disciplined layout — dashboards need predictable alignment
-        RISKS (where you'd stand out):
-          Instrument Serif for display headings — nobody in dev tools
-            uses a serif. It creates instant hierarchy contrast and
-            signals "we care about craft" in a space full of sans-serifs.
-          Single teal accent instead of blue — everyone else in this space uses blue.
-            Teal is close enough to feel trustworthy, different enough
-            to be recognizable.
-          4px compact density — tighter than Linear, says "we respect
-            your screen real estate."
+ RISKS (where you'd stand out):
+ Instrument Serif for display headings — nobody in dev tools
+ uses a serif. It creates instant hierarchy contrast and
+ signals "we care about craft" in a space full of sans-serifs.
+ Single teal accent instead of blue — everyone else in this space uses blue.
+ Teal is close enough to feel trustworthy, different enough
+ to be recognizable.
+ 4px compact density — tighter than Linear, says "we respect
+ your screen real estate."
-        The safe choices keep you literate. The risks are where your
-        product gets its own face. Which risks appeal to you?
+ The safe choices keep you literate. The risks are where your
+ product gets its own face. Which risks appeal to you?
-You:    Love the serif risk. Let's see it.
+You: Love the serif risk. Let's see it.
 Claude: Preview opened in your browser.
-        [/tmp/design-consultation-preview-1710000000.html]
+ [/tmp/design-consultation-preview-1710000000.html]
-        The page shows a realistic dashboard: sidebar nav, header with
-        avatar, stat cards, data table — all in your design system.
-        Instrument Serif at hero scale, Geist in body text, JetBrains
-        Mono in the data table. Teal accent on buttons and active states.
-        Light and dark mode toggle in the corner.
+ The page shows a realistic dashboard: sidebar nav, header with
+ avatar, stat cards, data table — all in your design system.
+ Instrument Serif at hero scale, Geist in body text, JetBrains
+ Mono in the data table. Teal accent on buttons and active states.
+ Light and dark mode toggle in the corner.
-        How does it feel? Want to adjust anything before I write DESIGN.md?
+ How does it feel? Want to adjust anything before I write DESIGN.md?
-You:    Ship it.
+You: Ship it.
 Claude: Wrote DESIGN.md (typography, color, spacing, layout, motion).
-        Updated CLAUDE.md with design system guidelines.
-        All future sessions will respect these choices.
+ Updated CLAUDE.md with design system guidelines.
+ All future sessions will respect these choices.
 ```
 ---
@@ -370,32 +379,134 @@ The self-regulation heuristic is tuned for design work — CSS-only changes get
 ### Example
 ```
-You:   /design-review https://myapp.com
+You: /design-review https://myapp.com
 Claude: [Runs full 80-item visual audit on the live site]
-        Design Score: C  |  AI Slop Score: D
-        12 findings (4 high, 5 medium, 3 polish)
+ Design Score: C | AI Slop Score: D
+ 12 findings (4 high, 5 medium, 3 polish)
+ Fixing 9 design issues...
+ style(design): FINDING-001 — replace 3-column icon grid with asymmetric layout
+ style(design): FINDING-002 — add heading scale 48/32/24/18/16
+ style(design): FINDING-003 — remove gradient hero, use bold typography
+ style(design): FINDING-004 — add second font for headings
+ style(design): FINDING-005 — vary border-radius by element role
+ style(design): FINDING-006 — left-align body text, reserve center for headings
+ style(design): FINDING-007 — add hover/focus states to all interactive elements
+ style(design): FINDING-008 — add prefers-reduced-motion media query
+ style(design): FINDING-009 — set max content width to 680px for body text
+ Final audit:
+ Design Score: C → B+ | AI Slop Score: D → A
+ 9 fixes applied (8 verified, 1 best-effort). 3 deferred.
+ [Report with before/after screenshots saved to .OpenGStack/design-reports/]
+```
+Nine commits, each touching one concern. The AI Slop score went from D to A because the three most recognizable patterns (gradient hero, 3-column grid, uniform radius) are gone.
+---
+## `/design-shotgun`
+This is my **design exploration mode**.
+You know the feeling. You have a feature, a page, a landing screen... and you're not sure what it should look like. You could describe it to Claude and get one answer. But one answer means one perspective, and design is a taste game. You need to see options.
+`/design-shotgun` generates 3 visual design variants using the GPT Image API, opens a comparison board in your browser, and waits for your feedback. You pick a direction, request changes, or ask for entirely new variants. The board supports remix, regenerate, and approval actions.
+### The loop
+1. You describe what you want (or point at an existing page)
+2. The skill reads your `DESIGN.md` for brand constraints (if it exists)
+3. It generates 3 distinct design variants as PNGs
+4. A comparison board opens in your browser with all 3 side-by-side
+5. You click "Approve" on the one you like, or give feedback for another round
+6. The approved variant saves to `~/.opengstack/projects/$SLUG/designs/` with an `approved.json`
+That `approved.json` is what `/design-html` reads. The design pipeline chains: shotgun picks the direction, design-html renders it as working code.
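The handoff location described above can be sketched as a path builder in TypeScript. This is an illustration only: `approvedJsonPath` is a hypothetical helper, the home directory is passed in explicitly instead of expanding `~`, and nothing is assumed about the contents of `approved.json`, whose schema the doc does not show.

```typescript
// Builds <home>/.opengstack/projects/<slug>/designs/approved.json,
// following the directory layout named in step 6 of the loop.
// POSIX-style separators are assumed for simplicity.
function approvedJsonPath(home: string, slug: string): string {
  return [home, ".opengstack", "projects", slug, "designs", "approved.json"].join("/");
}

console.log(approvedJsonPath("/home/dev", "myapp"));
// prints "/home/dev/.opengstack/projects/myapp/designs/approved.json"
```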
+### Taste memory
-        Fixing 9 design issues...
+The skill remembers your preferences across sessions. If you consistently prefer minimal designs over busy ones, it biases future generations. This isn't a setting you configure... it emerges from your approvals.
-        style(design): FINDING-001 — replace 3-column icon grid with asymmetric layout
-        style(design): FINDING-002 — add heading scale 48/32/24/18/16
-        style(design): FINDING-003 — remove gradient hero, use bold typography
-        style(design): FINDING-004 — add second font for headings
-        style(design): FINDING-005 — vary border-radius by element role
-        style(design): FINDING-006 — left-align body text, reserve center for headings
-        style(design): FINDING-007 — add hover/focus states to all interactive elements
-        style(design): FINDING-008 — add prefers-reduced-motion media query
-        style(design): FINDING-009 — set max content width to 680px for body text
+### Example
+```
+You: /design-shotgun — hero section for a developer tools landing page
+Claude: [Generates 3 variants]
+ Variant A: Bold typography, dark background, code snippet hero
+ Variant B: Split layout, product screenshot left, copy right
+ Variant C: Minimal, centered headline, gradient accent
+ [Opens comparison board at localhost:PORT]
-        Final audit:
-        Design Score: C → B+  |  AI Slop Score: D → A
-        9 fixes applied (8 verified, 1 best-effort). 3 deferred.
+You: [Clicks "Approve" on Variant A in the browser]
-        [Report with before/after screenshots saved to .gstack/design-reports/]
+Claude: Approved Variant A. Saved to ~/.opengstack/projects/myapp/designs/
+ Next: run /design-html to generate production HTML from this mockup.
 ```
-Nine commits, each touching one concern. The AI Slop score went from D to A because the three most recognizable patterns (gradient hero, 3-column grid, uniform radius) are gone.
+---
+## `/design-html`
+This is my **design-to-code mode**.
+Every AI code generation tool produces static CSS. Hardcoded heights. Text that overflows on resize. Breakpoints that snap instead of flowing. The output looks right at exactly one viewport size and breaks at every other.
+`/design-html` fixes this. It takes the approved mockup from `/design-shotgun` and generates HTML using [Pretext](https://github.com/chenglou/pretext) by Cheng Lou (ex-React core, Midjourney frontend). Pretext is a 15KB library that computes text layout without DOM measurement. Text reflows. Heights adjust to content. Cards size themselves. Chat bubbles shrinkwrap. All sub-millisecond, all dynamic.
+### Smart API routing
+Not every page needs the full Pretext engine. The skill reads the design and picks the right tools:
+- **Simple layouts** (landing, marketing): `prepare()` + `layout()` for resize-aware heights
+- **Card grids** (dashboard, listing): `prepare()` + `layout()` for self-sizing cards
+- **Chat UIs**: `walkLineRanges()` for tight-fit bubbles with zero wasted pixels
+- **Editorial layouts**: `layoutNextLine()` for text flowing around obstacles
+- **Complex editorial**: Full engine with `layoutWithLines()` for manual line rendering
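The routing list above is essentially a lookup table. A TypeScript sketch under stated assumptions: the `DesignType` labels and the `routeFor` helper are hypothetical, and the table only returns the Pretext function names quoted in the list. It does not call Pretext, and it says nothing about those functions' signatures.

```typescript
// Illustrative labels for the five layout categories in the list above.
type DesignType =
  | "simple"
  | "card-grid"
  | "chat"
  | "editorial"
  | "complex-editorial";

// Function names are copied from the doc; they are plain strings here.
const PRETEXT_ROUTES: Record<DesignType, string[]> = {
  simple: ["prepare", "layout"],
  "card-grid": ["prepare", "layout"],
  chat: ["walkLineRanges"],
  editorial: ["layoutNextLine"],
  "complex-editorial": ["layoutWithLines"],
};

function routeFor(design: DesignType): string[] {
  return PRETEXT_ROUTES[design];
}

console.log(routeFor("simple").join(" + ")); // prints "prepare + layout"
```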
+### The refinement loop
+1. Reads the approved mockup from `approved.json`
+2. Uses GPT-4o vision to extract implementation spec (colors, typography, layout)
+3. Generates self-contained HTML with Pretext inlined (15KB, zero network dependency)
+4. Spins up a live-reload server so you see changes instantly
+5. Screenshots at 3 viewports (mobile, tablet, desktop) to verify layout
+6. AskUserQuestion: what needs to change?
+7. Surgical edits via the Edit tool (not full regeneration)
+8. Repeat until you say "done"
+### Framework detection
+If your project uses React, Svelte, or Vue (detected from `package.json`), the skill offers to generate a framework component instead of vanilla HTML. Framework output uses `npm install @chenglou/pretext` instead of inline vendoring.
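Detection from `package.json` could look roughly like the TypeScript sketch below. It is an assumption about how such a check might work, not the skill's actual logic: `detectFramework` is a hypothetical function, it only inspects `dependencies` and `devDependencies`, and the React-then-Svelte-then-Vue precedence is an arbitrary choice made for the example.

```typescript
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

type Framework = "react" | "svelte" | "vue";

// Returns the first known framework listed in the manifest, or null for vanilla HTML.
function detectFramework(pkg: PackageJson): Framework | null {
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  const candidates: Framework[] = ["react", "svelte", "vue"];
  return candidates.find((name) => name in deps) ?? null;
}

console.log(detectFramework({ devDependencies: { svelte: "^4.0.0" } })); // prints "svelte"
console.log(detectFramework({ dependencies: { express: "^4.18.0" } })); // prints null
```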
+### Example
+```
+You: /design-html
+Claude: Found approved mockup: variant-A.png (hero section)
+ Design type: simple layout → using prepare() + layout()
+ [Generates HTML with Pretext, opens live-reload server]
+ [Screenshots at 375px, 768px, 1440px — all verified]
+ The HTML is live at localhost:3456. Resize your browser.
+ Text reflows dynamically. What needs to change?
+You: Make the headline bigger and add more spacing below the CTA
+Claude: [Edits font-size and margin, browser auto-reloads]
+ Updated. Two changes applied.
+You: Done
+Claude: Saved to ~/.opengstack/projects/myapp/designs/hero-20260330/finalized.html
+```
 ---
@@ -481,18 +592,18 @@ When `/qa` fixes a bug and verifies it, it automatically generates a regression
 ### Example
 ```
-You:   /qa https://staging.myapp.com
+You: /qa https://staging.myapp.com
 Claude: [Explores 12 pages, fills 3 forms, tests 2 flows]
-        QA Report: staging.myapp.com — Health Score: 72/100
+ QA Report: staging.myapp.com — Health Score: 72/100
-        Top 3 Issues:
-        1. CRITICAL: Checkout form submits with empty required fields
-        2. HIGH: Mobile nav menu doesn't close after selecting an item
-        3. MEDIUM: Dashboard chart overlaps sidebar below 1024px
+ Top 3 Issues:
+ 1. CRITICAL: Checkout form submits with empty required fields
+ 2. HIGH: Mobile nav menu doesn't close after selecting an item
+ 3. MEDIUM: Dashboard chart overlaps sidebar below 1024px
-        [Full report with screenshots saved to .gstack/qa-reports/]
+ [Full report with screenshots saved to .OpenGStack/qa-reports/]
 ```
 **Testing authenticated pages:** Use `/setup-browser-cookies` first to import your real browser sessions, then `/qa` can test pages behind login.
@@ -525,6 +636,82 @@ A lot of branches die when the interesting work is done and only the boring rele
 ---
+## `/land-and-deploy`
+This is my **deploy pipeline mode**.
+`/ship` creates the PR. `/land-and-deploy` finishes the job: merge, deploy, verify.
+It merges the PR, waits for CI, waits for the deploy to finish, then runs canary checks against production. One command from "approved" to "verified in production." If the deploy breaks, it tells you what failed and whether to rollback.
+First run on a new project triggers a dry-run walk-through so you can verify the pipeline before it does anything irreversible. After that, it trusts the config and runs straight through.
+### Setup
+Run `/setup-deploy` first. It detects your platform (Fly.io, Render, Vercel, Netlify, Heroku, GitHub Actions, or custom), discovers your production URL and health check endpoints, and writes the config to CLAUDE.md. One-time, 60 seconds.
+### Example
+```
+You: /land-and-deploy
+Claude: Merging PR #42...
+ CI: 3/3 checks passed
+ Deploy: Fly.io — deploying v2.1.0...
+ Health check: https://myapp.fly.dev/health → 200 OK
+ Canary: 5 pages checked, 0 console errors, p95 < 800ms
+ Production verified. v2.1.0 is live.
+```
+---
+## `/canary`
+This is my **post-deploy monitoring mode**.
+After deploy, `/canary` watches the live site for trouble. It loops through your key pages using the browse daemon, checking for console errors, performance regressions, page failures, and visual anomalies. Takes periodic screenshots and compares against pre-deploy baselines.
+Use it right after `/land-and-deploy`, or schedule it to run periodically after a risky deploy.
+```
+You: /canary https://myapp.com
+Claude: Monitoring 8 pages every 2 minutes...
+ Cycle 1: ✓ All pages healthy. p95: 340ms. 0 console errors.
+ Cycle 2: ✓ All pages healthy. p95: 380ms. 0 console errors.
+ Cycle 3: ⚠ /dashboard — new console error: "TypeError: Cannot read
+ property 'map' of undefined" at dashboard.js:142
+ Screenshot saved.
+ Alert: 1 new console error after 3 monitoring cycles.
+```
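The baseline comparison behind an alert like the one in Cycle 3 can be sketched as a set difference. This TypeScript snippet is illustrative only: `newConsoleErrors` is a hypothetical helper, and treating errors as plain message strings is an assumption, not something the doc specifies.

```typescript
// Returns console errors seen in the current cycle that were absent from the
// pre-deploy baseline, with duplicates removed and first-seen order preserved.
function newConsoleErrors(baseline: string[], current: string[]): string[] {
  const known = new Set(baseline);
  const fresh: string[] = [];
  for (const message of current) {
    if (!known.has(message)) {
      known.add(message); // also suppresses repeats within the same cycle
      fresh.push(message);
    }
  }
  return fresh;
}

const baseline = ["Deprecation warning: foo()"];
const cycle3 = [
  "Deprecation warning: foo()",
  "TypeError: Cannot read property 'map' of undefined",
];
console.log(newConsoleErrors(baseline, cycle3).length); // prints 1
```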
+---
+## `/benchmark`
+This is my **performance engineer mode**.
+`/benchmark` establishes performance baselines for your pages: load time, Core Web Vitals (LCP, CLS, INP), resource counts, and total transfer size. Run it before and after a PR to catch regressions.
+It uses the browse daemon for real Chromium measurements, not synthetic estimates. Multiple runs averaged. Results persist so you can track trends across PRs.
+```
+You: /benchmark https://myapp.com
+Claude: Benchmarking 5 pages (3 runs each)...
+ / load: 1.2s LCP: 0.9s CLS: 0.01 resources: 24 (890KB)
+ /dashboard load: 2.1s LCP: 1.8s CLS: 0.03 resources: 31 (1.4MB)
+ /settings load: 0.8s LCP: 0.6s CLS: 0.00 resources: 18 (420KB)
+ Baseline saved. Run again after changes to compare.
+```
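The "multiple runs averaged" and before/after comparison described above can be sketched as follows. Everything here is an assumption for illustration: the `RunMetrics` fields, the `average` and `regressions` helpers, and the 10% tolerance are hypothetical, not values taken from opengstack.

```typescript
interface RunMetrics {
  loadMs: number;
  lcpMs: number;
  cls: number;
}

const METRIC_KEYS: (keyof RunMetrics)[] = ["loadMs", "lcpMs", "cls"];

// Averages each metric across runs of the same page.
function average(runs: RunMetrics[]): RunMetrics {
  if (runs.length === 0) throw new Error("need at least one run");
  const total: RunMetrics = { loadMs: 0, lcpMs: 0, cls: 0 };
  for (const run of runs) {
    for (const key of METRIC_KEYS) total[key] += run[key];
  }
  for (const key of METRIC_KEYS) total[key] /= runs.length;
  return total;
}

// Lists metrics that got worse by more than `tolerance` (a fraction of the baseline).
function regressions(
  before: RunMetrics,
  after: RunMetrics,
  tolerance = 0.1,
): (keyof RunMetrics)[] {
  return METRIC_KEYS.filter((key) => after[key] > before[key] * (1 + tolerance));
}

const before = average([
  { loadMs: 1000, lcpMs: 800, cls: 0.01 },
  { loadMs: 1400, lcpMs: 1000, cls: 0.03 },
]);
const after: RunMetrics = { loadMs: 1500, lcpMs: 950, cls: 0.02 };
console.log(regressions(before, after)); // prints [ 'loadMs' ]
```

Load time moved from an averaged 1200 ms to 1500 ms, which exceeds the 10% tolerance, while LCP stayed within it.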
+---
 ## `/cso`
 This is my **Chief Security Officer**.
@@ -532,16 +719,16 @@ This is my **Chief Security Officer**.
 Run `/cso` on any codebase and it performs an OWASP Top 10 + STRIDE threat model audit. It scans for injection vulnerabilities, broken authentication, sensitive data exposure, XML external entities, broken access control, security misconfiguration, XSS, insecure deserialization, known-vulnerable components, and insufficient logging. Each finding includes severity, evidence, and a recommended fix.
 ```
-You:   /cso
+You: /cso
 Claude: Running OWASP Top 10 + STRIDE security audit...
-        CRITICAL: SQL injection in user search (app/models/user.rb:47)
-        HIGH: Session tokens stored in localStorage (app/frontend/auth.ts:12)
-        MEDIUM: Missing rate limiting on /api/login endpoint
-        LOW: X-Frame-Options header not set
+ CRITICAL: SQL injection in user search (app/models/user.rb:47)
+ HIGH: Session tokens stored in localStorage (app/frontend/auth.ts:12)
+ MEDIUM: Missing rate limiting on /api/login endpoint
+ LOW: X-Frame-Options header not set
-        4 findings across 12 files scanned. 1 critical, 1 high.
+ 4 findings across 12 files scanned. 1 critical, 1 high.
 ```
 ---
@@ -553,16 +740,16 @@ This is my **technical writer mode**.
 After `/ship` creates the PR but before it merges, `/document-release` reads every documentation file in the project and cross-references it against the diff. It updates file paths, command lists, project structure trees, and anything else that drifted. Risky or subjective changes get surfaced as questions — everything else is handled automatically.
 ```
-You:   /document-release
+You: /document-release
 Claude: Analyzing 21 files changed across 3 commits. Found 8 documentation files.
-        README.md: updated skill count from 9 to 10, added new skill to table
-        CLAUDE.md: added new directory to project structure
-        CONTRIBUTING.md: current — no changes needed
-        TODOS.md: marked 2 items complete, added 1 new item
+ README.md: updated skill count from 9 to 10, added new skill to table
+ CLAUDE.md: added new directory to project structure
+ CONTRIBUTING.md: current — no changes needed
+ TODOS.md: marked 2 items complete, added 1 new item
-        All docs updated and committed. PR body updated with doc diff.
+ All docs updated and committed. PR body updated with doc diff.
 ```
 It also polishes CHANGELOG voice (without ever overwriting entries), cleans up completed TODOS, checks cross-doc consistency, and asks about VERSION bumps only when appropriate.
@@ -582,27 +769,27 @@ It also tracks test health: total test files, tests added this period, regressio
 ### Example
 ```
-You:   /retro
+You: /retro
 Claude: Week of Mar 1: 47 commits (3 contributors), 3.2k LOC, 38% tests, 12 PRs, peak: 10pm | Streak: 47d
-        ## Your Week
-        32 commits, +2.4k LOC, 41% tests. Peak hours: 9-11pm.
-        Biggest ship: cookie import system (browser decryption + picker UI).
-        What you did well: shipped a complete feature with encryption, UI, and
-        18 unit tests in one focused push...
+ ## Your Week
+ 32 commits, +2.4k LOC, 41% tests. Peak hours: 9-11pm.
+ Biggest ship: cookie import system (browser decryption + picker UI).
+ What you did well: shipped a complete feature with encryption, UI, and
+ 18 unit tests in one focused push...
-        ## Team Breakdown
+ ## Team Breakdown
-        ### Alice
-        12 commits focused on app/services/. Every PR under 200 LOC — disciplined.
-        Opportunity: test ratio at 12% — worth investing before payment gets more complex.
+ ### Alice
+ 12 commits focused on app/services/. Every PR under 200 LOC — disciplined.
+ Opportunity: test ratio at 12% — worth investing before payment gets more complex.
-        ### Bob
-        3 commits — fixed the N+1 query on dashboard. Small but high-impact.
-        Opportunity: only 1 active day this week — check if blocked on anything.
+ ### Bob
+ 3 commits — fixed the N+1 query on dashboard. Small but high-impact.
+ Opportunity: only 1 active day this week — check if blocked on anything.
-        [Top 3 team wins, 3 things to improve, 3 habits for next week]
+ [Top 3 team wins, 3 things to improve, 3 habits for next week]
 ```
 It saves a JSON snapshot to `.context/retros/` so the next run can show trends.
@@ -620,30 +807,30 @@ It is a compiled binary that talks to a persistent Chromium daemon — built on
 ### Example
 ```
-You:   /browse staging.myapp.com — log in, test the signup flow, and check
-       every page I changed in this branch
+You: /browse staging.myapp.com — log in, test the signup flow, and check
+ every page I changed in this branch
 Claude: [18 tool calls, ~60 seconds]
-        > browse goto https://staging.myapp.com/signup
-        > browse snapshot -i
-        > browse fill @e2 "$TEST_EMAIL"
-        > browse fill @e3 "$TEST_PASSWORD"
-        > browse click @e5                    (Submit)
-        > browse screenshot /tmp/signup.png
-        > Read /tmp/signup.png
+ > browse goto https://staging.myapp.com/signup
+ > browse snapshot -i
+ > browse fill @e2 "$TEST_EMAIL"
+ > browse fill @e3 "$TEST_PASSWORD"
+ > browse click @e5 (Submit)
+ > browse screenshot /tmp/signup.png
+ > Read /tmp/signup.png
-        Signup works. Redirected to onboarding. Now checking changed pages.
+ Signup works. Redirected to onboarding. Now checking changed pages.
-        > browse goto https://staging.myapp.com/dashboard
-        > browse screenshot /tmp/dashboard.png
-        > Read /tmp/dashboard.png
-        > browse console
+ > browse goto https://staging.myapp.com/dashboard
+ > browse screenshot /tmp/dashboard.png
+ > Read /tmp/dashboard.png
+ > browse console
-        Dashboard loads. No console errors. Charts render with sample data.
+ Dashboard loads. No console errors. Charts render with sample data.
-        All 4 pages load correctly. No console errors. No broken layouts.
-        Signup → onboarding → dashboard flow works end to end.
+ All 4 pages load correctly. No console errors. No broken layouts.
+ Signup → onboarding → dashboard flow works end to end.
 ```
 18 tool calls, about a minute. Full QA pass. No browser opened.
@@ -657,19 +844,19 @@ When the headless browser gets stuck — CAPTCHA, MFA, complex auth — hand off
 ```
 Claude: I'm stuck on a CAPTCHA at the login page. Opening a visible
-        Chrome so you can solve it.
+ Chrome so you can solve it.
-        > browse handoff "Stuck on CAPTCHA at login page"
+ > browse handoff "Stuck on CAPTCHA at login page"
-        Chrome opened at https://app.example.com/login with all your
-        cookies and tabs intact. Solve the CAPTCHA and tell me when
-        you're done.
+ Chrome opened at https://app.example.com/login with all your
+ cookies and tabs intact. Solve the CAPTCHA and tell me when
+ you're done.
-You:    done
+You: done
 Claude: > browse resume
-        Got a fresh snapshot. Logged in successfully. Continuing QA.
+ Got a fresh snapshot. Logged in successfully. Continuing QA.
 ```
 The browser preserves all state (cookies, localStorage, tabs) across the handoff. After `resume`, the agent gets a fresh snapshot of wherever you left off. If the browse tool fails 3 times in a row, it automatically suggests using `handoff`.
@@ -689,14 +876,14 @@ Before `/qa` or `/browse` can test authenticated pages, they need cookies. Inste
 It auto-detects installed Chromium browsers (Comet, Chrome, Arc, Brave, Edge), decrypts cookies via the macOS Keychain, and loads them into the Playwright session. An interactive picker UI lets you choose exactly which domains to import — no cookie values are ever displayed.
 ```
-You:   /setup-browser-cookies
+You: /setup-browser-cookies
 Claude: Cookie picker opened — select the domains you want to import
-        in your browser, then tell me when you're done.
+ in your browser, then tell me when you're done.
-        [You pick github.com, myapp.com in the browser UI]
+ [You pick github.com, myapp.com in the browser UI]
-You:    done
+You: done
 Claude: Imported 2 domains (47 cookies). Session is ready.
 ```
@@ -704,13 +891,107 @@ Claude: Imported 2 domains (47 cookies). Session is ready.
 Or skip the UI entirely:
 ```
-You:   /setup-browser-cookies github.com
+You: /setup-browser-cookies github.com
 Claude: Imported 12 cookies for github.com from Comet.
 ```
 ---
+## `/autoplan`
+This is my **review autopilot mode**.
+Running `/plan-ceo-review`, then `/plan-design-review`, then `/plan-eng-review` individually means answering 15-30 intermediate questions. Each question is valuable, but sometimes you want the gauntlet to run without stopping for every decision.
+`/autoplan` reads all three review skills from disk and runs them sequentially: CEO → Design → Eng. It makes decisions automatically using six encoded principles (prefer completeness, match existing patterns, choose reversible options, prefer the option the user chose for similar past decisions, defer ambiguous items, and escalate security). Taste decisions (close approaches, borderline scope expansions, cross-model disagreements) get saved and presented at a final approval gate.
+One command, fully reviewed plan out.
+```
+You: /autoplan
+Claude: Running CEO review... [4 scope decisions auto-resolved]
+ Running design review... [3 design dimensions auto-scored]
+ Running eng review... [2 architecture decisions auto-resolved]
+ TASTE DECISIONS (need your input):
+ 1. Scope: Codex suggested adding search — borderline expansion. Add?
+ 2. Design: Two approaches scored within 1 point. Which feels right?
+ [Shows both options with context]
+You: 1) Yes, add search. 2) Option A.
+Claude: Plan complete. 9 decisions auto-resolved, 2 taste decisions approved.
+```
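One way to picture the split between auto-resolved and taste decisions is a margin check on option scores, as in the "within 1 point" example above. The sketch below is hypothetical: the decision structure, the scoring scale, and the `margin` parameter are assumptions, not the skill's real logic.

```python
def split_decisions(decisions, margin=1.0):
    """Separate decisions with a clear winner from close calls.

    Each decision is a dict with a "name" and an "options" mapping of
    option label -> score (a hypothetical structure for illustration).
    """
    auto_resolved, taste = [], []
    for decision in decisions:
        ranked = sorted(
            decision["options"].items(), key=lambda kv: kv[1], reverse=True
        )
        if len(ranked) >= 2 and ranked[0][1] - ranked[1][1] <= margin:
            # Top two options are too close to call: defer to the user.
            taste.append(decision["name"])
        else:
            # Clear winner: pick the highest-scoring option automatically.
            auto_resolved.append((decision["name"], ranked[0][0]))
    return auto_resolved, taste


decisions = [
    {"name": "layout", "options": {"A": 8.0, "B": 7.5}},
    {"name": "cache", "options": {"redis": 9.0, "memory": 5.0}},
]
auto_resolved, taste = split_decisions(decisions)
print("auto:", auto_resolved)
print("taste:", taste)
```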
+---
+## `/learn`
+This is my **institutional memory mode**.
+opengstack learns from every session. Patterns, pitfalls, preferences, and architectural decisions accumulate in `~/.opengstack/projects/$SLUG/learnings.jsonl`. Each learning has a confidence score, source attribution, and the files it references.
+`/learn` lets you see what opengstack has absorbed, search for specific patterns, prune stale entries (when referenced files no longer exist), and export learnings for team sharing. The real payoff is in the other skills: they automatically search learnings before making recommendations, and display "Prior learning applied" when a past insight is relevant.
+```
+You: /learn
+Claude: 23 learnings for this project (14 high confidence, 6 medium, 3 low)
+ Top patterns:
+ - [9/10] API responses always wrapped in { data, error } envelope
+ - [8/10] Tests use factory helpers in test/support/factories.ts
+ - [8/10] All DB queries go through repository pattern, never direct
+ 3 potentially stale (referenced files deleted):
+ - "auth middleware uses JWT" — auth/middleware.ts was deleted
+ [Prune these? Y/N]
+```
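The stale-entry check described above (a learning is stale when a file it references no longer exists) could look something like this. The JSONL field names (`insight`, `confidence`, `source`, `files`) are assumptions chosen for the example; the real `learnings.jsonl` schema is not shown in this document.

```python
import json
from pathlib import Path

# Hypothetical learnings.jsonl content: one JSON object per line.
SAMPLE = """\
{"insight": "auth middleware uses JWT", "confidence": 8, "source": "review", "files": ["auth/middleware.ts"]}
{"insight": "API responses wrapped in { data, error } envelope", "confidence": 9, "source": "qa", "files": []}
"""


def load_learnings(jsonl_text):
    """Parse JSON Lines text, skipping blank lines."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def find_stale(learnings, project_root):
    """Return learnings that reference at least one file missing under project_root."""
    root = Path(project_root)
    return [
        entry
        for entry in learnings
        if any(not (root / rel).exists() for rel in entry.get("files", []))
    ]


learnings = load_learnings(SAMPLE)
for entry in find_stale(learnings, "."):
    print("potentially stale:", entry["insight"])
```

Entries with no referenced files are never flagged, since there is nothing to check them against.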
+---
+## `/connect-chrome`
+This is my **co-presence mode**.
+`/browse` runs headless by default, so you don't see what the agent sees. `/connect-chrome` changes that. It launches your actual Chrome browser controlled by Playwright, with the opengstack Side Panel extension auto-loaded. You watch every action in real time, in the same window the agent is driving.
+A subtle green shimmer at the top edge tells you which Chrome window opengstack controls. All existing browse commands work unchanged. The Side Panel shows a live activity feed of every command and a chat sidebar where you can direct Claude with natural language instructions.
+```
+You: /connect-chrome
+Claude: Launched Chrome with Side Panel extension.
+ Green shimmer indicates the controlled window.
+ All $B commands now run in headed mode.
+ Type in the Side Panel to direct the browser agent.
+```
+---
+## `/setup-deploy`
+One-time deploy configuration. Run this before your first `/land-and-deploy`.
+It auto-detects your deploy platform (Fly.io, Render, Vercel, Netlify, Heroku, GitHub Actions, or custom) and discovers your production URL, health check endpoints, and deploy status commands. It writes everything to CLAUDE.md so all future deploys are automatic.
+```
+You: /setup-deploy
+Claude: Detected: Fly.io (fly.toml found)
+ Production URL: https://myapp.fly.dev
+ Health check: /health → expects 200
+ Deploy command: fly deploy
+ Status command: fly status
+ Written to CLAUDE.md. Run /land-and-deploy when ready.
+```
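Platform detection of this kind is typically driven by marker files in the project root. The sketch below is an assumption about how that could work, not the skill's implementation: only `fly.toml` appears in the transcript above, and the other marker names are the conventional config files for those platforms.

```python
# Marker file -> platform. Only fly.toml comes from the example above;
# the remaining entries are conventional file names, assumed for illustration.
MARKERS = [
    ("fly.toml", "Fly.io"),
    ("render.yaml", "Render"),
    ("vercel.json", "Vercel"),
    ("netlify.toml", "Netlify"),
    ("Procfile", "Heroku"),
]


def detect_platform(filenames):
    """Return the first platform whose marker file is present, else 'custom'."""
    names = set(filenames)
    for marker, platform in MARKERS:
        if marker in names:
            return platform
    return "custom"


print(detect_platform(["package.json", "fly.toml"]))
print(detect_platform(["package.json"]))
```

To run it against a real checkout, pass the root directory listing, for example `detect_platform(p.name for p in pathlib.Path(".").iterdir())`. A `Procfile` is used by more than one platform, so a real detector would want additional signals before settling on Heroku.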
+---
 ## `/codex`
 This is my **second opinion mode**.
@@ -730,20 +1011,20 @@ When `/review` catches bugs from Claude's perspective, `/codex` brings a complet
 When both `/review` (Claude) and `/codex` (OpenAI) have reviewed the same branch, you get a cross-model comparison: which findings overlap (high confidence), which are unique to Codex (different perspective), and which are unique to Claude. This is the "two doctors, same patient" approach to code review.
 ```
-You:   /codex review
+You: /codex review
 Claude: Running independent Codex review...
-        CODEX REVIEW: PASS (3 findings)
-        [P2] Race condition in payment handler — concurrent charges
-             can double-debit without advisory lock
-        [P3] Missing null check on user.email before downcase
-        [P3] Token comparison not using constant-time compare
+ CODEX REVIEW: PASS (3 findings)
+ [P2] Race condition in payment handler — concurrent charges
+ can double-debit without advisory lock
+ [P3] Missing null check on user.email before downcase
+ [P3] Token comparison not using constant-time compare
-        Cross-model analysis (vs /review):
-        OVERLAP: Race condition in payment handler (both caught it)
-        UNIQUE TO CODEX: Token comparison timing attack
-        UNIQUE TO CLAUDE: N+1 query in listing photos
+ Cross-model analysis (vs /review):
+ OVERLAP: Race condition in payment handler (both caught it)
+ UNIQUE TO CODEX: Token comparison timing attack
+ UNIQUE TO CLAUDE: N+1 query in listing photos
 ```
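The overlap / unique-to-each breakdown is, at its core, set arithmetic over the two reviewers' findings. A minimal sketch, assuming findings have already been normalized to comparable strings (in practice, matching two models' differently worded findings is the hard part and is not shown here):

```python
def compare_findings(claude_findings, codex_findings):
    """Split findings into those both reviewers caught and those unique to each."""
    claude, codex = set(claude_findings), set(codex_findings)
    return {
        "overlap": sorted(claude & codex),
        "unique_to_codex": sorted(codex - claude),
        "unique_to_claude": sorted(claude - codex),
    }


report = compare_findings(
    claude_findings=["race condition in payment handler", "N+1 query in listing photos"],
    codex_findings=["race condition in payment handler", "token comparison timing attack"],
)
for bucket, items in report.items():
    print(bucket, items)
```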
 ---
@@ -775,14 +1056,14 @@ Restrict all file edits to a single directory. When you're debugging a billing b
 `/investigate` activates this automatically — it detects the module being debugged and freezes edits to that directory.
 ```
-You:   /freeze src/billing
+You: /freeze src/billing
 Claude: Edits restricted to src/billing/. Run /unfreeze to remove.
-        [Later, Claude tries to edit src/auth/middleware.ts]
+ [Later, Claude tries to edit src/auth/middleware.ts]
 Claude: BLOCKED — Edit outside freeze boundary (src/billing/).
-        Skipping this change.
+ Skipping this change.
 ```
 Note: this blocks Edit and Write tools only. Bash commands like `sed` can still modify files outside the boundary — it's accident prevention, not a security sandbox.
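A boundary check like the one described can be sketched as a path-containment test. This is an illustrative assumption about the logic, not the hook's actual code; it normalizes `..` segments but does not resolve symlinks.

```python
import posixpath
from pathlib import PurePosixPath


def is_within_boundary(target, boundary):
    """Return True if `target` is the boundary directory or lies inside it.

    Paths are normalized first so that 'src/billing/../auth/x.ts' is not
    mistaken for a file under src/billing.
    """
    target_path = PurePosixPath(posixpath.normpath(target))
    boundary_path = PurePosixPath(posixpath.normpath(boundary))
    return target_path == boundary_path or boundary_path in target_path.parents


print(is_within_boundary("src/billing/invoice.ts", "src/billing"))
print(is_within_boundary("src/auth/middleware.ts", "src/billing"))
```

Comparing path components rather than string prefixes avoids treating `src/billing-old/` as part of `src/billing/`.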
@@ -797,27 +1078,27 @@ Remove the `/freeze` boundary, allowing edits everywhere again. The hooks stay r
 ---
-## `/gstack-upgrade`
+## `/opengstack-upgrade`
-Keep gstack current with one command. It detects your install type (global at `~/.claude/skills/gstack` vs vendored in your project at `.claude/skills/gstack`), runs the upgrade, syncs both copies if you have dual installs, and shows you what changed.
+Keep opengstack current with one command. It detects your install type (global at `~/.claude/skills/opengstack` vs vendored in your project at `.claude/skills/opengstack`), runs the upgrade, syncs both copies if you have dual installs, and shows you what changed.
 ```
-You:   /gstack-upgrade
+You: /opengstack-upgrade
 Claude: Current version: 0.7.4
-        Latest version: 0.8.2
+ Latest version: 0.8.2
-        What's new:
-        - Browse handoff for CAPTCHAs and auth walls
-        - /codex multi-AI second opinion
-        - /qa always uses browser now
-        - Safety skills: /careful, /freeze, /guard
-        - Proactive skill suggestions
+ What's new:
+ - Browse handoff for CAPTCHAs and auth walls
+ - /codex multi-AI second opinion
+ - /qa always uses browser now
+ - Safety skills: /careful, /freeze, /guard
+ - Proactive skill suggestions
-        Upgraded to 0.8.2. Both global and project installs synced.
+ Upgraded to 0.8.2. Both global and project installs synced.
 ```
-Set `auto_upgrade: true` in `~/.gstack/config.yaml` to skip the prompt entirely — gstack upgrades silently at the start of each session when a new version is available.
+Set `auto_upgrade: true` in `~/.opengstack/config.yaml` to skip the prompt entirely — opengstack upgrades silently at the start of each session when a new version is available.
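Deciding whether "a new version is available" comes down to comparing version numbers component by component rather than as strings. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` versions with no pre-release tags (the function names are hypothetical):

```python
def parse_version(version):
    """Turn '0.8.2' into (0, 8, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def upgrade_available(current, latest):
    """Return True when `latest` is strictly newer than `current`."""
    return parse_version(latest) > parse_version(current)


print(upgrade_available("0.7.4", "0.8.2"))
print(upgrade_available("0.9.0", "0.10.0"))
```

The second call shows why numeric comparison matters: as plain strings, "0.10.0" would sort before "0.9.0".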
 ---
@@ -827,13 +1108,13 @@ Set `auto_upgrade: true` in `~/.gstack/config.yaml` to skip the prompt entirely
 ### Setup
-Install Greptile on your GitHub repo at [greptile.com](https://greptile.com) — it takes about 30 seconds. Once it's reviewing your PRs, gstack picks up its comments automatically. No additional configuration.
+Install Greptile on your GitHub repo at [greptile.com](https://greptile.com) — it takes about 30 seconds. Once it's reviewing your PRs, opengstack picks up its comments automatically. No additional configuration.
 ### How it works
 The problem with any automated reviewer is triage. Greptile is good, but not every comment is a real issue. Some are false positives. Some flag things you already fixed three commits ago. Without a triage layer, the comments pile up and you start ignoring them — which defeats the purpose.
-gstack solves this. `/review` and `/ship` are now Greptile-aware. They read Greptile's comments, classify each one, and take action:
+opengstack solves this. `/review` and `/ship` are now Greptile-aware. They read Greptile's comments, classify each one, and take action:
 - **Valid issues** get added to the critical findings and fixed before shipping
 - **Already-fixed issues** get an auto-reply acknowledging the catch
@@ -843,38 +1124,38 @@ The result is a two-layer review: Greptile catches things asynchronously on the
 ### Learning from history
-Every false positive you confirm gets saved to `~/.gstack/greptile-history.md`. Future runs auto-skip known FP patterns for your codebase. And `/retro` tracks Greptile's batting average over time — so you can see whether the signal-to-noise ratio is improving.
+Every false positive you confirm gets saved to `~/.opengstack/greptile-history.md`. Future runs auto-skip known FP patterns for your codebase. And `/retro` tracks Greptile's batting average over time — so you can see whether the signal-to-noise ratio is improving.
 ### Example
 ```
-You:   /ship
+You: /ship
 Claude: [syncs main, runs tests, pre-landing review...]
-        Greptile found 3 comments on this PR:
+ Greptile found 3 comments on this PR:
-        [VALID] app/services/payment_service.rb:47 — Race condition:
-        concurrent charges can double-debit. Recommend DB-level advisory
-        lock around the charge block.
+ [VALID] app/services/payment_service.rb:47 — Race condition:
+ concurrent charges can double-debit. Recommend DB-level advisory
+ lock around the charge block.
-        [ALREADY FIXED] app/models/user.rb:12 — Missing null check
-        on email before downcase.
-        → Already fixed in commit abc1234. Auto-replying.
+ [ALREADY FIXED] app/models/user.rb:12 — Missing null check
+ on email before downcase.
+ → Already fixed in commit abc1234. Auto-replying.
-        [FALSE POSITIVE] lib/auth.rb:88 — "Token comparison should
-        use constant-time comparison." We already use
-        ActiveSupport::SecurityUtils.secure_compare here.
+ [FALSE POSITIVE] lib/auth.rb:88 — "Token comparison should
+ use constant-time comparison." We already use
+ ActiveSupport::SecurityUtils.secure_compare here.
-        Fixing the race condition in payment_service.rb...
+ Fixing the race condition in payment_service.rb...
-        The auth token comment is a false positive.
-        Options: A) Reply to Greptile explaining  B) Fix anyway  C) Ignore
+ The auth token comment is a false positive.
+ Options: A) Reply to Greptile explaining B) Fix anyway C) Ignore
-You:    A
+You: A
 Claude: Replied to Greptile. All tests pass.
-        PR: github.com/you/app/pull/42
+ PR: github.com/you/app/pull/42
 ```
 Three Greptile comments. One real fix. One auto-acknowledged. One false positive pushed back with a reply. Total extra time: about 30 seconds.