npm - opengstack - Versions diffs - 0.13.10 → 0.14.2 - Mend

opengstack 0.13.10 → 0.14.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (189) hide show

package/AGENTS.md +4 -4
package/CLAUDE.md +127 -110
package/README.md +10 -5
package/SKILL.md +500 -70
package/bin/opengstack.js +69 -69
package/{skills/land-and-deploy/SKILL.md → commands/autoplan.md} +7 -25
package/{skills/benchmark/SKILL.md → commands/benchmark.md} +84 -108
package/{skills/browse/SKILL.md → commands/browse.md} +60 -81
package/{skills/ship/SKILL.md → commands/canary.md} +7 -27
package/{skills/careful/SKILL.md → commands/careful.md} +2 -22
package/{skills/canary/SKILL.md → commands/codex.md} +7 -26
package/{skills/connect-chrome/SKILL.md → commands/connect-chrome.md} +7 -24
package/commands/cso.md +70 -0
package/commands/design-consultation.md +70 -0
package/commands/design-review.md +70 -0
package/commands/design-shotgun.md +70 -0
package/commands/document-release.md +70 -0
package/{skills/freeze/SKILL.md → commands/freeze.md} +3 -29
package/{skills/guard/SKILL.md → commands/guard.md} +4 -35
package/commands/investigate.md +70 -0
package/commands/land-and-deploy.md +70 -0
package/commands/office-hours.md +70 -0
package/{skills/gstack-upgrade/SKILL.md → commands/opengstack-upgrade.md} +64 -79
package/commands/plan-ceo-review.md +70 -0
package/commands/plan-design-review.md +70 -0
package/commands/plan-eng-review.md +70 -0
package/commands/qa-only.md +70 -0
package/commands/qa.md +70 -0
package/commands/retro.md +70 -0
package/commands/review.md +70 -0
package/{skills/setup-browser-cookies/SKILL.md → commands/setup-browser-cookies.md} +22 -40
package/commands/setup-deploy.md +70 -0
package/commands/ship.md +70 -0
package/commands/unfreeze.md +25 -0
package/docs/designs/CHROME_VS_CHROMIUM_EXPLORATION.md +9 -9
package/docs/designs/CONDUCTOR_CHROME_SIDEBAR_INTEGRATION.md +2 -2
package/docs/designs/CONDUCTOR_SESSION_API.md +16 -16
package/docs/designs/DESIGN_SHOTGUN.md +74 -74
package/docs/designs/DESIGN_TOOLS_V1.md +111 -111
package/docs/skills.md +483 -202
package/package.json +42 -43
package/scripts/analytics.ts +188 -0
package/scripts/dev-skill.ts +83 -0
package/scripts/discover-skills.ts +39 -0
package/scripts/eval-compare.ts +97 -0
package/scripts/eval-list.ts +117 -0
package/scripts/eval-select.ts +86 -0
package/scripts/eval-summary.ts +188 -0
package/scripts/eval-watch.ts +172 -0
package/scripts/gen-skill-docs.ts +473 -0
package/scripts/resolvers/browse.ts +129 -0
package/scripts/resolvers/codex-helpers.ts +133 -0
package/scripts/resolvers/composition.ts +48 -0
package/scripts/resolvers/confidence.ts +37 -0
package/scripts/resolvers/constants.ts +50 -0
package/scripts/resolvers/design.ts +950 -0
package/scripts/resolvers/index.ts +59 -0
package/scripts/resolvers/learnings.ts +96 -0
package/scripts/resolvers/preamble.ts +505 -0
package/scripts/resolvers/review.ts +884 -0
package/scripts/resolvers/testing.ts +573 -0
package/scripts/resolvers/types.ts +45 -0
package/scripts/resolvers/utility.ts +421 -0
package/scripts/skill-check.ts +190 -0
package/scripts/cleanup.py +0 -100
package/scripts/filter-skills.sh +0 -114
package/scripts/filter_skills.py +0 -164
package/scripts/install-skills.js +0 -60
package/skills/autoplan/SKILL.md +0 -96
package/skills/autoplan/SKILL.md.tmpl +0 -694
package/skills/benchmark/SKILL.md.tmpl +0 -222
package/skills/browse/SKILL.md.tmpl +0 -131
package/skills/browse/bin/find-browse +0 -21
package/skills/browse/bin/remote-slug +0 -14
package/skills/browse/scripts/build-node-server.sh +0 -48
package/skills/browse/src/activity.ts +0 -208
package/skills/browse/src/browser-manager.ts +0 -959
package/skills/browse/src/buffers.ts +0 -137
package/skills/browse/src/bun-polyfill.cjs +0 -109
package/skills/browse/src/cli.ts +0 -678
package/skills/browse/src/commands.ts +0 -128
package/skills/browse/src/config.ts +0 -150
package/skills/browse/src/cookie-import-browser.ts +0 -625
package/skills/browse/src/cookie-picker-routes.ts +0 -230
package/skills/browse/src/cookie-picker-ui.ts +0 -688
package/skills/browse/src/find-browse.ts +0 -61
package/skills/browse/src/meta-commands.ts +0 -550
package/skills/browse/src/platform.ts +0 -17
package/skills/browse/src/read-commands.ts +0 -358
package/skills/browse/src/server.ts +0 -1192
package/skills/browse/src/sidebar-agent.ts +0 -280
package/skills/browse/src/sidebar-utils.ts +0 -21
package/skills/browse/src/snapshot.ts +0 -407
package/skills/browse/src/url-validation.ts +0 -95
package/skills/browse/src/write-commands.ts +0 -364
package/skills/browse/test/activity.test.ts +0 -120
package/skills/browse/test/adversarial-security.test.ts +0 -32
package/skills/browse/test/browser-manager-unit.test.ts +0 -17
package/skills/browse/test/bun-polyfill.test.ts +0 -72
package/skills/browse/test/commands.test.ts +0 -2075
package/skills/browse/test/compare-board.test.ts +0 -342
package/skills/browse/test/config.test.ts +0 -316
package/skills/browse/test/cookie-import-browser.test.ts +0 -519
package/skills/browse/test/cookie-picker-routes.test.ts +0 -260
package/skills/browse/test/file-drop.test.ts +0 -271
package/skills/browse/test/find-browse.test.ts +0 -50
package/skills/browse/test/findport.test.ts +0 -191
package/skills/browse/test/fixtures/basic.html +0 -33
package/skills/browse/test/fixtures/cursor-interactive.html +0 -22
package/skills/browse/test/fixtures/dialog.html +0 -15
package/skills/browse/test/fixtures/empty.html +0 -2
package/skills/browse/test/fixtures/forms.html +0 -55
package/skills/browse/test/fixtures/iframe.html +0 -30
package/skills/browse/test/fixtures/network-idle.html +0 -30
package/skills/browse/test/fixtures/qa-eval-checkout.html +0 -108
package/skills/browse/test/fixtures/qa-eval-spa.html +0 -98
package/skills/browse/test/fixtures/qa-eval.html +0 -51
package/skills/browse/test/fixtures/responsive.html +0 -49
package/skills/browse/test/fixtures/snapshot.html +0 -55
package/skills/browse/test/fixtures/spa.html +0 -24
package/skills/browse/test/fixtures/states.html +0 -17
package/skills/browse/test/fixtures/upload.html +0 -25
package/skills/browse/test/gstack-config.test.ts +0 -138
package/skills/browse/test/gstack-update-check.test.ts +0 -514
package/skills/browse/test/handoff.test.ts +0 -235
package/skills/browse/test/path-validation.test.ts +0 -91
package/skills/browse/test/platform.test.ts +0 -37
package/skills/browse/test/server-auth.test.ts +0 -65
package/skills/browse/test/sidebar-agent-roundtrip.test.ts +0 -226
package/skills/browse/test/sidebar-agent.test.ts +0 -199
package/skills/browse/test/sidebar-integration.test.ts +0 -320
package/skills/browse/test/sidebar-unit.test.ts +0 -96
package/skills/browse/test/snapshot.test.ts +0 -467
package/skills/browse/test/state-ttl.test.ts +0 -35
package/skills/browse/test/test-server.ts +0 -57
package/skills/browse/test/url-validation.test.ts +0 -72
package/skills/browse/test/watch.test.ts +0 -129
package/skills/canary/SKILL.md.tmpl +0 -212
package/skills/careful/SKILL.md.tmpl +0 -56
package/skills/careful/bin/check-careful.sh +0 -112
package/skills/codex/SKILL.md +0 -90
package/skills/codex/SKILL.md.tmpl +0 -417
package/skills/connect-chrome/SKILL.md.tmpl +0 -195
package/skills/cso/ACKNOWLEDGEMENTS.md +0 -14
package/skills/cso/SKILL.md +0 -93
package/skills/cso/SKILL.md.tmpl +0 -606
package/skills/design-consultation/SKILL.md +0 -94
package/skills/design-consultation/SKILL.md.tmpl +0 -415
package/skills/design-review/SKILL.md +0 -94
package/skills/design-review/SKILL.md.tmpl +0 -290
package/skills/design-shotgun/SKILL.md +0 -91
package/skills/design-shotgun/SKILL.md.tmpl +0 -285
package/skills/document-release/SKILL.md +0 -91
package/skills/document-release/SKILL.md.tmpl +0 -359
package/skills/freeze/SKILL.md.tmpl +0 -77
package/skills/freeze/bin/check-freeze.sh +0 -79
package/skills/gstack-upgrade/SKILL.md.tmpl +0 -222
package/skills/guard/SKILL.md.tmpl +0 -77
package/skills/investigate/SKILL.md +0 -105
package/skills/investigate/SKILL.md.tmpl +0 -194
package/skills/land-and-deploy/SKILL.md.tmpl +0 -881
package/skills/office-hours/SKILL.md +0 -96
package/skills/office-hours/SKILL.md.tmpl +0 -645
package/skills/plan-ceo-review/SKILL.md +0 -94
package/skills/plan-ceo-review/SKILL.md.tmpl +0 -811
package/skills/plan-design-review/SKILL.md +0 -92
package/skills/plan-design-review/SKILL.md.tmpl +0 -446
package/skills/plan-eng-review/SKILL.md +0 -93
package/skills/plan-eng-review/SKILL.md.tmpl +0 -303
package/skills/qa/SKILL.md +0 -95
package/skills/qa/SKILL.md.tmpl +0 -316
package/skills/qa/references/issue-taxonomy.md +0 -85
package/skills/qa/templates/qa-report-template.md +0 -126
package/skills/qa-only/SKILL.md +0 -89
package/skills/qa-only/SKILL.md.tmpl +0 -101
package/skills/retro/SKILL.md +0 -89
package/skills/retro/SKILL.md.tmpl +0 -820
package/skills/review/SKILL.md +0 -92
package/skills/review/SKILL.md.tmpl +0 -281
package/skills/review/TODOS-format.md +0 -62
package/skills/review/checklist.md +0 -220
package/skills/review/design-checklist.md +0 -132
package/skills/review/greptile-triage.md +0 -220
package/skills/setup-browser-cookies/SKILL.md.tmpl +0 -81
package/skills/setup-deploy/SKILL.md +0 -92
package/skills/setup-deploy/SKILL.md.tmpl +0 -215
package/skills/ship/SKILL.md.tmpl +0 -636
package/skills/unfreeze/SKILL.md +0 -37
package/skills/unfreeze/SKILL.md.tmpl +0 -36

package/skills/codex/SKILL.md.tmpl DELETED Viewed

@@ -1,417 +0,0 @@
----
-name: codex
-preamble-tier: 3
-version: 1.0.0
-description: |
-  OpenAI Codex CLI wrapper — three modes. Code review: independent diff review via
-  codex review with pass/fail gate. Challenge: adversarial mode that tries to break
-  your code. Consult: ask codex anything with session continuity for follow-ups.
-  The "200 IQ autistic developer" second opinion. Use when asked to "codex review",
-  "codex challenge", "ask codex", "second opinion", or "consult codex".
-allowed-tools:
-  - Bash
-  - Read
-  - Write
-  - Glob
-  - Grep
-  - AskUserQuestion
----
-{{PREAMBLE}}
-{{BASE_BRANCH_DETECT}}
-# /codex — Multi-AI Second Opinion
-You are running the `/codex` skill. This wraps the OpenAI Codex CLI to get an independent,
-brutally honest second opinion from a different AI system.
-Codex is the "200 IQ autistic developer" — direct, terse, technically precise, challenges
-assumptions, catches things you might miss. Present its output faithfully, not summarized.
----
-## Step 0: Check codex binary
-```bash
-CODEX_BIN=$(which codex 2>/dev/null || echo "")
-[ -z "$CODEX_BIN" ] && echo "NOT_FOUND" || echo "FOUND: $CODEX_BIN"
-If `NOT_FOUND`: stop and tell the user:
-"Codex CLI not found. Install it: `npm install -g @openai/codex` or see https://github.com/openai/codex"
----
-## Step 1: Detect mode
-Parse the user's input to determine which mode to run:
-1. `/codex review` or `/codex review <instructions>` — **Review mode** (Step 2A)
-2. `/codex challenge` or `/codex challenge <focus>` — **Challenge mode** (Step 2B)
-3. `/codex` with no arguments — **Auto-detect:**
-   - Check for a diff (with fallback if origin isn't available):
-     `git diff origin/<base> --stat 2>/dev/null | tail -1 || git diff <base> --stat 2>/dev/null | tail -1`
-   - If a diff exists, use AskUserQuestion:
-     ```
-     Codex detected changes against the base branch. What should it do?
-     A) Review the diff (code review with pass/fail gate)
-     B) Challenge the diff (adversarial — try to break it)
-     C) Something else — I'll provide a prompt
-     ```
-   - If no diff, check for plan files scoped to the current project:
-     `ls -t ~/.claude/plans/*.md 2>/dev/null | xargs grep -l "$(basename $(pwd))" 2>/dev/null | head -1`
-     If no project-scoped match, fall back to: `ls -t ~/.claude/plans/*.md 2>/dev/null | head -1`
-     but warn the user: "Note: this plan may be from a different project."
-   - If a plan file exists, offer to review it
-   - Otherwise, ask: "What would you like to ask Codex?"
-4. `/codex <anything else>` — **Consult mode** (Step 2C), where the remaining text is the prompt
-**Reasoning effort override:** If the user's input contains `--xhigh` anywhere,
-note it and remove it from the prompt text before passing to Codex. When `--xhigh`
-is present, use `model_reasoning_effort="xhigh"` for all modes regardless of the
-per-mode default below. Otherwise, use the per-mode defaults:
-- Review (2A): `high` — bounded diff input, needs thoroughness
-- Challenge (2B): `high` — adversarial but bounded by diff
-- Consult (2C): `medium` — large context, interactive, needs speed
----
-## Filesystem Boundary
-All prompts sent to Codex MUST be prefixed with this boundary instruction:
-> IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.
-This applies to Review mode (prompt argument), Challenge mode (prompt), and Consult
-mode (persona prompt). Reference this section as "the filesystem boundary" below.
----
-## Step 2A: Review Mode
-Run Codex code review against the current branch diff.
-1. Create temp files for output capture:
-```bash
-TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)
-2. Run the review (5-minute timeout). **Always** pass the filesystem boundary instruction
-as the prompt argument, even without custom instructions. If the user provided custom
-instructions, append them after the boundary separated by a newline:
-```bash
-_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
-cd "$_REPO_ROOT"
-codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only." --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
-If the user passed `--xhigh`, use `"xhigh"` instead of `"high"`.
-Use `timeout: 300000` on the Bash call. If the user provided custom instructions
-(e.g., `/codex review focus on security`), append them after the boundary:
-```bash
-_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
-cd "$_REPO_ROOT"
-codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.
-focus on security" --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
-3. Capture the output. Then parse cost from stderr:
-```bash
-grep "tokens used" "$TMPERR" 2>/dev/null || echo "tokens: unknown"
-4. Determine gate verdict by checking the review output for critical findings.
-   If the output contains `[P1]` — the gate is **FAIL**.
-   If no `[P1]` markers are found (only `[P2]` or no findings) — the gate is **PASS**.
-5. Present the output:
-CODEX SAYS (code review):
-════════════════════════════════════════════════════════════
-<full codex output, verbatim — do not truncate or summarize>
-════════════════════════════════════════════════════════════
-GATE: PASS                    Tokens: 14,331 | Est. cost: ~$0.12
-or
-GATE: FAIL (N critical findings)
-6. **Cross-model comparison:** If `/review` (Claude's own review) was already run
-   earlier in this conversation, compare the two sets of findings:
-CROSS-MODEL ANALYSIS:
-  Both found: [findings that overlap between Claude and Codex]
-  Only Codex found: [findings unique to Codex]
-  Only Claude found: [findings unique to Claude's /review]
-  Agreement rate: X% (N/M total unique findings overlap)
-7. Persist the review result:
-```bash
-~/.claude/skills/opengstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N,"commit":"'"$(git rev-parse --short HEAD)"'"}'
-Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL),
-GATE ("pass" or "fail"), findings (count of [P1] + [P2] markers),
-findings_fixed (count of findings that were addressed/fixed before shipping).
-8. Clean up temp files:
-```bash
-rm -f "$TMPERR"
-{{PLAN_FILE_REVIEW_REPORT}}
----
-## Step 2B: Challenge (Adversarial) Mode
-Codex tries to break your code — finding edge cases, race conditions, security holes,
-and failure modes that a normal review would miss.
-1. Construct the adversarial prompt. **Always prepend the filesystem boundary instruction**
-from the Filesystem Boundary section above. If the user provided a focus area
-(e.g., `/codex challenge security`), include it after the boundary:
-Default prompt (no focus):
-"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.
-Review the changes on this branch against the base branch. Run `git diff origin/<base>` to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems."
-With focus (e.g., "security"):
-"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.
-Review the changes on this branch against the base branch. Run `git diff origin/<base>` to see the diff. Focus specifically on SECURITY. Your job is to find every way an attacker could exploit this code. Think about injection vectors, auth bypasses, privilege escalation, data exposure, and timing attacks. Be adversarial."
-2. Run codex exec with **JSONL output** to capture reasoning traces and tool calls (5-minute timeout):
-If the user passed `--xhigh`, use `"xhigh"` instead of `"high"`.
-```bash
-_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
-codex exec "<prompt>" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>/dev/null | PYTHONUNBUFFERED=1 python3 -u -c "
-import sys, json
-for line in sys.stdin:
-    line = line.strip()
-    if not line: continue
-    try:
-        obj = json.loads(line)
-        t = obj.get('type','')
-        if t == 'item.completed' and 'item' in obj:
-            item = obj['item']
-            itype = item.get('type','')
-            text = item.get('text','')
-            if itype == 'reasoning' and text:
-                print(f'[codex thinking] {text}', flush=True)
-                print(flush=True)
-            elif itype == 'agent_message' and text:
-                print(text, flush=True)
-            elif itype == 'command_execution':
-                cmd = item.get('command','')
-                if cmd: print(f'[codex ran] {cmd}', flush=True)
-        elif t == 'turn.completed':
-            usage = obj.get('usage',{})
-            tokens = usage.get('input_tokens',0) + usage.get('output_tokens',0)
-            if tokens: print(f'\ntokens used: {tokens}', flush=True)
-    except: pass
-"
-This parses codex's JSONL events to extract reasoning traces, tool calls, and the final
-response. The `[codex thinking]` lines show what codex reasoned through before its answer.
-3. Present the full streamed output:
-CODEX SAYS (adversarial challenge):
-════════════════════════════════════════════════════════════
-<full output from above, verbatim>
-════════════════════════════════════════════════════════════
-Tokens: N | Est. cost: ~$X.XX
----
-## Step 2C: Consult Mode
-Ask Codex anything about the codebase. Supports session continuity for follow-ups.
-1. **Check for existing session:**
-```bash
-cat .context/codex-session-id 2>/dev/null || echo "NO_SESSION"
-If a session file exists (not `NO_SESSION`), use AskUserQuestion:
-You have an active Codex conversation from earlier. Continue it or start fresh?
-A) Continue the conversation (Codex remembers the prior context)
-B) Start a new conversation
-2. Create temp files:
-```bash
-TMPRESP=$(mktemp /tmp/codex-resp-XXXXXX.txt)
-TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)
-3. **Plan review auto-detection:** If the user's prompt is about reviewing a plan,
-or if plan files exist and the user said `/codex` with no arguments:
-```bash
-setopt +o nomatch 2>/dev/null || true  # zsh compat
-ls -t ~/.claude/plans/*.md 2>/dev/null | xargs grep -l "$(basename $(pwd))" 2>/dev/null | head -1
-If no project-scoped match, fall back to `ls -t ~/.claude/plans/*.md 2>/dev/null | head -1`
-but warn: "Note: this plan may be from a different project — verify before sending to Codex."
-**IMPORTANT — embed content, don't reference path:** Codex runs sandboxed to the repo
-root (`-C`) and cannot access `~/.claude/plans/` or any files outside the repo. You MUST
-read the plan file yourself and embed its FULL CONTENT in the prompt below. Do NOT tell
-Codex the file path or ask it to read the plan file — it will waste 10+ tool calls
-searching and fail.
-Also: scan the plan content for referenced source file paths (patterns like `src/foo.ts`,
-`lib/bar.py`, paths containing `/` that exist in the repo). If found, list them in the
-prompt so Codex reads them directly instead of discovering them via rg/find.
-**Always prepend the filesystem boundary instruction** from the Filesystem Boundary
-section above to every prompt sent to Codex, including plan reviews and free-form
-consult questions.
-Prepend the boundary and persona to the user's prompt:
-"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.
-You are a brutally honest technical reviewer. Review this plan for: logical gaps and
-unstated assumptions, missing error handling or edge cases, overcomplexity (is there a
-simpler approach?), feasibility risks (what could go wrong?), and missing dependencies
-or sequencing issues. Be direct. Be terse. No compliments. Just the problems.
-Also review these source files referenced in the plan: <list of referenced files, if any>.
-THE PLAN:
-<full plan content, embedded verbatim>"
-For non-plan consult prompts (user typed `/codex <question>`), still prepend the boundary:
-"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.
-<user's question>"
-4. Run codex exec with **JSONL output** to capture reasoning traces (5-minute timeout):
-If the user passed `--xhigh`, use `"xhigh"` instead of `"medium"`.
-For a **new session:**
-```bash
-_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
-codex exec "<prompt>" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
-import sys, json
-for line in sys.stdin:
-    line = line.strip()
-    if not line: continue
-    try:
-        obj = json.loads(line)
-        t = obj.get('type','')
-        if t == 'thread.started':
-            tid = obj.get('thread_id','')
-            if tid: print(f'SESSION_ID:{tid}', flush=True)
-        elif t == 'item.completed' and 'item' in obj:
-            item = obj['item']
-            itype = item.get('type','')
-            text = item.get('text','')
-            if itype == 'reasoning' and text:
-                print(f'[codex thinking] {text}', flush=True)
-                print(flush=True)
-            elif itype == 'agent_message' and text:
-                print(text, flush=True)
-            elif itype == 'command_execution':
-                cmd = item.get('command','')
-                if cmd: print(f'[codex ran] {cmd}', flush=True)
-        elif t == 'turn.completed':
-            usage = obj.get('usage',{})
-            tokens = usage.get('input_tokens',0) + usage.get('output_tokens',0)
-            if tokens: print(f'\ntokens used: {tokens}', flush=True)
-    except: pass
-"
-For a **resumed session** (user chose "Continue"):
-```bash
-_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
-codex exec resume <session-id> "<prompt>" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
-<same python streaming parser as above, with flush=True on all print() calls>
-"
-5. Capture session ID from the streamed output. The parser prints `SESSION_ID:<id>`
-   from the `thread.started` event. Save it for follow-ups:
-```bash
-mkdir -p .context
-Save the session ID printed by the parser (the line starting with `SESSION_ID:`)
-to `.context/codex-session-id`.
-6. Present the full streamed output:
-CODEX SAYS (consult):
-════════════════════════════════════════════════════════════
-<full output, verbatim — includes [codex thinking] traces>
-════════════════════════════════════════════════════════════
-Tokens: N | Est. cost: ~$X.XX
-Session saved — run /codex again to continue this conversation.
-7. After presenting, note any points where Codex's analysis differs from your own
-   understanding. If there is a disagreement, flag it:
-   "Note: Claude Code disagrees on X because Y."
----
-## Model & Reasoning
-**Model:** No model is hardcoded — codex uses whatever its current default is (the frontier
-agentic coding model). This means as OpenAI ships newer models, /codex automatically
-uses them. If the user wants a specific model, pass `-m` through to codex.
-**Reasoning effort (per-mode defaults):**
-- **Review (2A):** `high` — bounded diff input, needs thoroughness but not max tokens
-- **Challenge (2B):** `high` — adversarial but bounded by diff size
-- **Consult (2C):** `medium` — large context (plans, codebase), interactive, needs speed
-`xhigh` uses ~23x more tokens than `high` and causes 50+ minute hangs on large context
-tasks (OpenAI issues #8545, #8402, #6931). Users can override with `--xhigh` flag
-(e.g., `/codex review --xhigh`) when they want maximum reasoning and are willing to wait.
-**Web search:** All codex commands use `--enable web_search_cached` so Codex can look up
-docs and APIs during review. This is OpenAI's cached index — fast, no extra cost.
-If the user specifies a model (e.g., `/codex review -m gpt-5.1-codex-max`
-or `/codex challenge -m gpt-5.2`), pass the `-m` flag through to codex.
----
-## Cost Estimation
-Parse token count from stderr. Codex prints `tokens used\nN` to stderr.
-Display as: `Tokens: N`
-If token count is not available, display: `Tokens: unknown`
----
-## Error Handling
-- **Binary not found:** Detected in Step 0. Stop with install instructions.
-- **Auth error:** Codex prints an auth error to stderr. Surface the error:
-  "Codex authentication failed. Run `codex login` in your terminal to authenticate via ChatGPT."
-- **Timeout:** If the Bash call times out (5 min), tell the user:
-  "Codex timed out after 5 minutes. The diff may be too large or the API may be slow. Try again or use a smaller scope."
-- **Empty response:** If `$TMPRESP` is empty or doesn't exist, tell the user:
-  "Codex returned no response. Check stderr for errors."
-- **Session resume failure:** If resume fails, delete the session file and start fresh.
----
-## Important Rules
-- **Never modify files.** This skill is read-only. Codex runs in read-only sandbox mode.
-- **Present output verbatim.** Do not truncate, summarize, or editorialize Codex's output
-  before showing it. Show it in full inside the CODEX SAYS block.
-- **Add synthesis after, not instead of.** Any Claude commentary comes after the full output.
-- **5-minute timeout** on all Bash calls to codex (`timeout: 300000`).
-- **No double-reviewing.** If the user already ran `/review`, Codex provides a second
-  independent opinion. Do not re-run Claude Code's own review.
-- **Detect skill-file rabbit holes.** After receiving Codex output, scan for signs
-  that Codex got distracted by skill files: `gstack-config`, `echo`,
-  `SKILL.md`, or `skills/gstack`. If any of these appear in the output, append a
-  warning: "Codex appears to have read gstack skill files instead of reviewing your
-  code. Consider retrying."

package/skills/connect-chrome/SKILL.md.tmpl DELETED Viewed

@@ -1,195 +0,0 @@
----
-name: connect-chrome
-version: 0.1.0
-description: |
-  Launch real Chrome controlled by gstack with the Side Panel extension auto-loaded.
-  One command: connects Claude to a visible Chrome window where you can watch every
-  action in real time. The extension shows a live activity feed in the Side Panel.
-  Use when asked to "connect chrome", "open chrome", "real browser", "launch chrome",
-  "side panel", or "control my browser".
-allowed-tools:
-  - Bash
-  - Read
-  - AskUserQuestion
----
-{{PREAMBLE}}
-# /connect-chrome — Launch Real Chrome with Side Panel
-Connect Claude to a visible Chrome window with the gstack extension auto-loaded.
-You see every click, every navigation, every action in real time.
-{{BROWSE_SETUP}}
-## Step 0: Pre-flight cleanup
-Before connecting, kill any stale browse servers and clean up lock files that
-may have persisted from a crash. This prevents "already connected" false
-positives and Chromium profile lock conflicts.
-```bash
-# Kill any existing browse server
-if [ -f "$(git rev-parse --show-toplevel 2>/dev/null)/.gstack/browse.json" ]; then
-  _OLD_PID=$(cat "$(git rev-parse --show-toplevel)/.gstack/browse.json" 2>/dev/null | grep -o '"pid":[0-9]*' | grep -o '[0-9]*')
-  [ -n "$_OLD_PID" ] && kill "$_OLD_PID" 2>/dev/null || true
-  sleep 1
-  [ -n "$_OLD_PID" ] && kill -9 "$_OLD_PID" 2>/dev/null || true
-  rm -f "$(git rev-parse --show-toplevel)/.gstack/browse.json"
-fi
-# Clean Chromium profile locks (can persist after crashes)
-_PROFILE_DIR="$HOME/.gstack/chromium-profile"
-for _LF in SingletonLock SingletonSocket SingletonCookie; do
-  rm -f "$_PROFILE_DIR/$_LF" 2>/dev/null || true
-done
-echo "Pre-flight cleanup done"
-## Step 1: Connect
-```bash
-$B connect
-This launches Playwright's bundled Chromium in headed mode with:
-- A visible window you can watch (not your regular Chrome — it stays untouched)
-- The gstack Chrome extension auto-loaded via `launchPersistentContext`
-- A golden shimmer line at the top of every page so you know which window is controlled
-- A sidebar agent process for chat commands
-The `connect` command auto-discovers the extension from the gstack install
-directory. It always uses port **34567** so the extension can auto-connect.
-After connecting, print the full output to the user. Confirm you see
-`Mode: headed` in the output.
-If the output shows an error or the mode is not `headed`, run `$B status` and
-share the output with the user before proceeding.
-## Step 2: Verify
-```bash
-$B status
-Confirm the output shows `Mode: headed`. Read the port from the state file:
-```bash
-cat "$(git rev-parse --show-toplevel 2>/dev/null)/.gstack/browse.json" 2>/dev/null | grep -o '"port":[0-9]*' | grep -o '[0-9]*'
-The port should be **34567**. If it's different, note it — the user may need it
-for the Side Panel.
-Also find the extension path so you can help the user if they need to load it manually:
-```bash
-_EXT_PATH=""
-_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
-[ -n "$_ROOT" ] && [ -f "$_ROOT/.claude/skills/gstack/extension/manifest.json" ] && _EXT_PATH="$_ROOT/.claude/skills/gstack/extension"
-[ -z "$_EXT_PATH" ] && [ -f "$HOME/.claude/skills/gstack/extension/manifest.json" ] && _EXT_PATH="$HOME/.claude/skills/gstack/extension"
-echo "EXTENSION_PATH: ${_EXT_PATH:-NOT FOUND}"
-## Step 3: Guide the user to the Side Panel
-Use AskUserQuestion:
-> Chrome is launched with gstack control. You should see Playwright's Chromium
-> (not your regular Chrome) with a golden shimmer line at the top of the page.
->
-> The Side Panel extension should be auto-loaded. To open it:
-> 1. Look for the **puzzle piece icon** (Extensions) in the toolbar — it may
->    already show the gstack icon if the extension loaded successfully
-> 2. Click the **puzzle piece** → find **gstack browse** → click the **pin icon**
-> 3. Click the pinned **gstack icon** in the toolbar
-> 4. The Side Panel should open on the right showing a live activity feed
->
-> **Port:** 34567 (auto-detected — the extension connects automatically in the
-> Playwright-controlled Chrome).
-Options:
-- A) I can see the Side Panel — let's go!
-- B) I can see Chrome but can't find the extension
-- C) Something went wrong
-If B: Tell the user:
-> The extension is loaded into Playwright's Chromium at launch time, but
-> sometimes it doesn't appear immediately. Try these steps:
->
-> 1. Type `chrome://extensions` in the address bar
-> 2. Look for **"gstack browse"** — it should be listed and enabled
-> 3. If it's there but not pinned, go back to any page, click the puzzle piece
->    icon, and pin it
-> 4. If it's NOT listed at all, click **"Load unpacked"** and navigate to:
->    - Press **Cmd+Shift+G** in the file picker dialog
->    - Paste this path: `{EXTENSION_PATH}` (use the path from Step 2)
->    - Click **Select**
->
-> After loading, pin it and click the icon to open the Side Panel.
->
-> If the Side Panel badge stays gray (disconnected), click the gstack icon
-> and enter port **34567** manually.
-If C:
-1. Run `$B status` and show the output
-2. If the server is not healthy, re-run Step 0 cleanup + Step 1 connect
-3. If the server IS healthy but the browser isn't visible, try `$B focus`
-4. If that fails, ask the user what they see (error message, blank screen, etc.)
-## Step 4: Demo
-After the user confirms the Side Panel is working, run a quick demo:
-```bash
-$B goto https://news.ycombinator.com
-Wait 2 seconds, then:
-```bash
-$B snapshot -i
-Tell the user: "Check the Side Panel — you should see the `goto` and `snapshot`
-commands appear in the activity feed. Every command Claude runs shows up here
-in real time."
-## Step 5: Sidebar chat
-After the activity feed demo, tell the user about the sidebar chat:
-> The Side Panel also has a **chat tab**. Try typing a message like "take a
-> snapshot and describe this page." A sidebar agent (a child Claude instance)
-> executes your request in the browser — you'll see the commands appear in
-> the activity feed as they happen.
->
-> The sidebar agent can navigate pages, click buttons, fill forms, and read
-> content. Each task gets up to 5 minutes. It runs in an isolated session, so
-> it won't interfere with this Claude Code window.
-## Step 6: What's next
-Tell the user:
-> You're all set! Here's what you can do with the connected Chrome:
->
-> **Watch Claude work in real time:**
-> - Run any gstack skill (`/qa`, `/design-review`, `/benchmark`) and watch
->   every action happen in the visible Chrome window + Side Panel feed
-> - No cookie import needed — the Playwright browser shares its own session
->
-> **Control the browser directly:**
-> - **Sidebar chat** — type natural language in the Side Panel and the sidebar
->   agent executes it (e.g., "fill in the login form and submit")
-> - **Browse commands** — `$B goto <url>`, `$B click <sel>`, `$B fill <sel> <val>`,
->   `$B snapshot -i` — all visible in Chrome + Side Panel
->
-> **Window management:**
-> - `$B focus` — bring Chrome to the foreground anytime
-> - `$B disconnect` — close headed Chrome and return to headless mode
->
-> **What skills look like in headed mode:**
-> - `/qa` runs its full test suite in the visible browser — you see every page
->   load, every click, every assertion
-> - `/design-review` takes screenshots in the real browser — same pixels you see
-> - `/benchmark` measures performance in the headed browser
-Then proceed with whatever the user asked to do. If they didn't specify a task,
-ask what they'd like to test or browse.

package/skills/cso/ACKNOWLEDGEMENTS.md DELETED Viewed

@@ -1,14 +0,0 @@
-# Acknowledgements
-/cso v2 was informed by research across the security audit landscape. Credits to:
-- **[Sentry Security Review](https://github.com/getsentry/skills)** — The confidence-based reporting system (only HIGH confidence findings get reported) and the "research before reporting" methodology (trace data flow, check upstream validation) validated our 8/10 daily confidence gate. TimOnWeb rated it the only security skill worth installing out of 5 tested.
-- **[Trail of Bits Skills](https://github.com/trailofbits/skills)** — The audit-context-building methodology (build a mental model before hunting bugs) directly inspired Phase 0. Their variant analysis concept (found one vuln? Search the whole codebase for the same pattern) inspired Phase 12's variant analysis step.
-- **[Shannon by Keygraph](https://github.com/KeygraphHQ/shannon)** — Autonomous AI pentester achieving 96.15% on the XBOW benchmark (100/104 exploits). Validated that AI can do real security testing, not just checklist scanning. Our Phase 12 active verification is the static-analysis version of what Shannon does live.
-- **[afiqiqmal/claude-security-audit](https://github.com/afiqiqmal/claude-security-audit)** — The AI/LLM-specific security checks (prompt injection, RAG poisoning, tool calling permissions) inspired Phase 7. Their framework-level auto-detection (detecting "Next.js" not just "Node/TypeScript") inspired Phase 0's framework detection step.
-- **[Snyk ToxicSkills Research](https://snyk.io/blog/toxicskills-malicious-ai-agent-skills-clawhub/)** — The finding that 36% of AI agent skills have security flaws and 13.4% are malicious inspired Phase 8 (Skill Supply Chain scanning).
-- **[Daniel Miessler's Personal AI Infrastructure](https://github.com/danielmiessler/Personal_AI_Infrastructure)** — The incident response playbooks and protection file concept informed the remediation and LLM security phases.
-- **[McGo/claude-code-security-audit](https://github.com/McGo/claude-code-security-audit)** — The idea of generating shareable reports and actionable epics informed our report format evolution.
-- **[Claude Code Security Pack](https://dev.to/myougatheaxo/automate-owasp-security-audits-with-claude-code-security-pack-4mah)** — Modular approach (separate /security-audit, /secret-scanner, /deps-check skills) validated that these are distinct concerns. Our unified approach sacrifices modularity for cross-phase reasoning.
-- **[Anthropic Claude Code Security](https://www.anthropic.com/news/claude-code-security)** — Multi-stage verification and confidence scoring validated our parallel finding verification approach. Found 500+ zero-days in open source.
-- **[@gus_argon](https://x.com/gus_aragon/status/2035841289602904360)** — Identified critical v1 blind spots: no stack detection (runs all-language patterns), uses bash grep instead of Claude Code's Grep tool, `| head -20` truncates results silently, and preamble bloat. These directly shaped v2's stack-first approach and Grep tool mandate.