npm - skill-codex - Versions diffs - 0.3.0 → 0.8.0 - Mend

skill-codex 0.3.0 → 0.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14) hide show

package/.claude-plugin/hooks/hooks.json +17 -0
package/.claude-plugin/marketplace.json +23 -0
package/.claude-plugin/plugin.json +16 -0
package/.mcp.json +9 -0
package/README.md +75 -15
package/commands/codex-do.md +4 -3
package/commands/codex-review.md +17 -6
package/dist/bin/skill-codex.js +1280 -17
package/dist/bin/skill-codex.js.map +1 -1
package/dist/index.js +439 -48
package/dist/index.js.map +1 -1
package/hooks/post-tool-use-review.mjs +79 -0
package/package.json +4 -1
package/skills/codex-bridge/SKILL.md +217 -0

package/.claude-plugin/hooks/hooks.json ADDED Viewed

@@ -0,0 +1,17 @@
+{
+  "description": "Suggests /codex-review after significant code changes.",
+  "hooks": {
+    "PostToolUse": [
+      {
+        "matcher": "Write|Edit|MultiEdit|NotebookEdit",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "node \"${CLAUDE_PLUGIN_ROOT}/hooks/post-tool-use-review.mjs\"",
+            "timeout": 30
+          }
+        ]
+      }
+    ]
+  }
+}

package/.claude-plugin/marketplace.json ADDED Viewed

@@ -0,0 +1,23 @@
+{
+  "name": "skill-codex",
+  "owner": {
+    "name": "Arystos",
+    "url": "https://github.com/Arystos"
+  },
+  "plugins": [
+    {
+      "name": "skill-codex",
+      "source": "./",
+      "description": "Use OpenAI Codex from Claude Code for code review, task delegation, and second opinions via MCP — with your Codex subscription, no API key. Windows-native, live progress.",
+      "version": "0.8.0",
+      "author": {
+        "name": "Arystos",
+        "url": "https://github.com/Arystos"
+      },
+      "homepage": "https://github.com/Arystos/skill-codex",
+      "license": "MIT",
+      "keywords": ["codex", "code-review", "mcp", "developer-tools"],
+      "category": "development"
+    }
+  ]
+}

package/.claude-plugin/plugin.json ADDED Viewed

@@ -0,0 +1,16 @@
+{
+  "name": "skill-codex",
+  "description": "Use OpenAI Codex from Claude Code for code review, task delegation, and second opinions via MCP — with your Codex subscription, no API key.",
+  "version": "0.8.0",
+  "author": {
+    "name": "Arystos",
+    "url": "https://github.com/Arystos"
+  },
+  "homepage": "https://github.com/Arystos/skill-codex",
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/Arystos/skill-codex.git"
+  },
+  "license": "MIT",
+  "keywords": ["codex", "code-review", "task-delegation", "mcp", "developer-tools", "llm-tools"]
+}

package/.mcp.json ADDED Viewed

@@ -0,0 +1,9 @@
+{
+  "mcpServers": {
+    "skill-codex": {
+      "command": "npx",
+      "args": ["-y", "skill-codex", "mcp"],
+      "env": {}
+    }
+  }
+}

package/README.md CHANGED Viewed

@@ -3,6 +3,9 @@
 [![npm version](https://img.shields.io/npm/v/skill-codex)](https://www.npmjs.com/package/skill-codex)
 [![npm downloads](https://img.shields.io/npm/dm/skill-codex)](https://www.npmjs.com/package/skill-codex)
 ![CI](https://github.com/Arystos/skill-codex/actions/workflows/ci.yml/badge.svg)
+<!-- After enabling the repo on codecov.io (see DISTRIBUTION.md), uncomment:
+[![codecov](https://codecov.io/gh/Arystos/skill-codex/graph/badge.svg)](https://codecov.io/gh/Arystos/skill-codex) -->
+![Coverage](https://img.shields.io/badge/coverage-80%25%2B%20enforced-brightgreen)
 ![Windows](https://img.shields.io/badge/Windows-0078D4?style=flat&logo=windows&logoColor=white)
 ![macOS](https://img.shields.io/badge/macOS-000000?style=flat&logo=apple&logoColor=white)
 ![Linux](https://img.shields.io/badge/Linux-FCC624?style=flat&logo=linux&logoColor=black)
@@ -12,6 +15,12 @@
 A cross-platform [Claude Code](https://code.claude.com) skill that integrates [OpenAI Codex CLI](https://github.com/openai/codex) for code review, task delegation, and consultation. Uses your existing Codex subscription -- no API key required.
+> **Two models rarely make the same mistake.** skill-codex lets Claude and Codex check each other's work -- Claude plans and builds fast, Codex reviews with fresh eyes -- from a single terminal, using the Codex subscription you already pay for. No API key, no second window, no copy-paste.
+<!-- Demo GIF: record a ~20s clip of `/codex-review` catching a real bug, save it to docs/demo.gif, then uncomment the next line:
+![skill-codex demo](docs/demo.gif)
+-->
 ## Why?
 Claude Code and Codex CLI have different strengths. Claude excels at reasoning, architecture, and complex refactors. Codex is fast, thorough, and great at focused execution and review. **skill-codex** lets them work together from a single terminal -- no second window, no copy-paste, no context loss.
@@ -19,10 +28,30 @@ Claude Code and Codex CLI have different strengths. Claude excels at reasoning,
 * **`/codex-review`** -- Have Codex review your current changes as a second reviewer
 * **`/codex-do`** -- Delegate well-scoped implementation tasks to Codex
 * **`/codex-consult`** -- Get a second opinion on architecture or design decisions
+* **`codex-bridge` agent skill** -- Auto-triggers on implementation/review/consult requests so Claude reaches for Codex without an explicit slash command
 * **Auto-review hook** -- Smart PostToolUse hook suggests review after significant changes
+* **Live progress** -- Streams Codex's activity (running commands, file edits, elapsed time) as MCP progress so long runs never look frozen; also mirrored to a tail-able log file (under the OS temp dir, path printed at run start -- never pollutes your repo)
 * **Subscription-first** -- Works with `codex login`, no `OPENAI_API_KEY` needed
 * **Edge case handling** -- Retry logic, timeout, anti-recursion, lock files, pre-flight checks
+### Why not just use Claude's own subagents?
+Because a subagent is the **same model**. When Claude reviews Claude, you inherit the same blind spots -- it's grading its own homework. A *different* model family was trained differently, so its mistakes don't correlate with Claude's: it catches what Claude is confidently wrong about, and Claude catches what Codex gets wrong. That uncorrelated, cross-model check is the entire point -- and skill-codex keeps Claude as the final judge, never blindly forwarding Codex's verdict.
+### How it compares
+| | skill-codex | Most Codex MCP bridges |
+|---|---|---|
+| Codex **subscription** auth (no `OPENAI_API_KEY`) | ✅ | ⚠️ often require an API key |
+| **Windows** verified in CI (not just "should work") | ✅ 9-way matrix (Windows/macOS/Linux × Node 18/20/22) | ❌ usually Linux-only CI |
+| **Live progress** so long runs never look frozen | ✅ MCP progress + tail-able log | ⚠️ varies |
+| **Slash commands** (`/codex-review`, `/codex-do`, `/codex-consult`) | ✅ | ⚠️ some |
+| **Auto-review hook** (PostToolUse) | ✅ | ❌ |
+| **Agent skill** (auto-triggers, no command needed) | ✅ | ❌ |
+| **Guardrails**: retry, timeout, anti-recursion, lock files | ✅ | ⚠️ partial |
+*A snapshot, not a leaderboard -- the Codex MCP space moves fast, so check each tool's current state. The point isn't "skill-codex wins everything"; it's that the operational details (subscription auth, real Windows support, never-frozen runs, guardrails) are where it focuses.*
 ## Prerequisites
 * [Node.js](https://nodejs.org) >= 18
@@ -49,55 +78,80 @@ The setup command:
 1. Registers the MCP server in your Claude Code config (`~/.claude.json`)
 2. Installs slash commands globally (`~/.claude/commands/`)
 3. Configures the auto-review PostToolUse hook
-4. Verifies everything works
+4. Installs the `codex-bridge` agent skill (`~/.claude/skills/`)
+5. Verifies everything works
 > **Tip:** Add `.skill-codex.lock` to your `.gitignore`
+### Install as a Claude Code plugin (alternative)
+Instead of `npx skill-codex setup`, you can install it as a plugin:
+```shell
+/plugin marketplace add Arystos/skill-codex
+/plugin install skill-codex@skill-codex
+```
+The plugin bundles the MCP server (launched via `npx -y skill-codex mcp`), the slash commands, the agent skill, and the auto-review hook. Restart Claude Code after installing.
 ## How It Works
 ```
 You in Claude Code
   |
-  |-- /codex-review        --> MCP tool --> codex exec (read-only) --> review findings
-  |-- /codex-do "task"     --> MCP tool --> codex exec --full-auto --> reviewed output
-  +-- /codex-consult "q"   --> MCP tool --> codex exec (read-only) --> synthesized opinion
+  |-- /codex-review        --> MCP tool --> codex exec --sandbox read-only       --> review findings
+  |-- /codex-do "task"     --> MCP tool --> codex exec --sandbox workspace-write --> reviewed output
+  +-- /codex-consult "q"   --> MCP tool --> codex exec --sandbox read-only       --> synthesized opinion
 ```
 The MCP server spawns `codex exec` as a subprocess, using your logged-in Codex session. Claude sees the output and critically evaluates it -- **Codex is treated as a peer, not an authority**.
+### Advanced: sandbox, sessions, model & review
+These `codex_exec` parameters are all optional -- omit them and Codex uses its defaults:
+* **`sandbox`** -- the explicit Codex sandbox policy (`read-only`, `workspace-write`, or `danger-full-access`), overriding the `mode` default. Use `danger-full-access` only when you understand the risk.
+* **`sessionId`** -- resume a previous Codex session for multi-round memory. Each response includes the session's thread id; pass it back so Codex retains context -- e.g. a follow-up review that checks whether *previously flagged* issues were fixed, instead of re-discovering them.
+* **`model`** -- pick the Codex model (e.g. `gpt-5.5`, `gpt-5.4`, `gpt-5.4-mini`). Route cheap tasks to a smaller model and escalate hard ones; omit to use your configured default.
+* **`reasoningEffort`** -- how hard Codex thinks: `minimal` | `low` | `medium` | `high` | `xhigh`. Omit for the model's default.
+* **`review`** -- run Codex's native diff-scoped reviewer (`codex exec review`) instead of a freeform prompt, optionally targeting a branch (`reviewBase`) or commit (`reviewCommit`). The prompt becomes optional focus instructions.
 ### Auto-review
 After significant code changes (3+ files, 100+ lines, security-related paths), the PostToolUse hook suggests running `/codex-review`. Trivial changes (docs-only, < 5 lines, whitespace) are skipped to preserve your Codex quota.
-### Example: `/codex-review`
+### Agent skill (auto-trigger)
-```
-> /codex-review
+The slash commands are explicit, on-demand entry points. The `codex-bridge` agent skill is the implicit one: it installs to `~/.claude/skills/` and Claude loads it automatically when your request matches a delegation, review, or consult pattern (e.g. "implement X", "generate tests for", "review this diff", "second opinion on this approach"). It maps the same three workflows -- delegate, review, consult -- onto the `codex_exec` tool, so you get Codex collaboration without remembering a command. Trivial tasks (< 50 lines, faster done directly) are explicitly out of scope.
-Reviewing uncommitted changes (3 files, ~85 lines)...
+### Example Output
-Codex found 2 issues:
+When Codex runs, the MCP tool returns a structured plain-text response:
+```
+[read-only │ D:\myproject │ 21538 tok in (15744 cached) → 129 out]
+  ✔ exec: powershell -Command 'git diff --cached'  (ok)
 CRITICAL — src/auth/login.ts:42
 Password comparison uses == instead of timing-safe comparison.
-Claude's assessment: Agree. Use crypto.timingSafeEqual() instead.
 MEDIUM — src/api/routes.ts:18
 Missing rate limiting on login endpoint.
-Claude's assessment: Agree. Add rate limiter middleware.
-Shall I fix these issues?
 ```
+The first line shows mode, working directory, and token usage. Activity lines show commands Codex executed (with status icons: ✔ ok, ✘ blocked/failed). The response content follows after a blank line.
 ## Configuration
 Environment variables (all optional):
 | Variable | Default | Description |
 |----------|---------|-------------|
-| `SKILL_CODEX_TIMEOUT_MS` | `600000` (10 min) | Subprocess timeout |
+| `SKILL_CODEX_TIMEOUT_MS` | `300000` (5 min) | Subprocess timeout |
 | `SKILL_CODEX_MAX_RETRIES` | `3` | Retry count for transient errors |
+| `SKILL_CODEX_LOG` | `<os-temp>/skill-codex/<workspace>.log` | Absolute path override for the live, tail-able per-run log |
 | `SKILL_CODEX_DEBUG` | -- | Enable debug logging to stderr |
+| `SKILL_CODEX_WINDOWS_SANDBOX` | `unelevated` | Windows only — Codex `windows.sandbox` mode (`unelevated`/`elevated`) |
 ### Smart Filter Thresholds
@@ -146,6 +200,8 @@ skill-codex/
 |-- hooks/
 |   |-- post-tool-use-review.sh   # Auto-review hook (macOS/Linux)
 |   +-- post-tool-use-review.ps1  # Auto-review hook (Windows)
+|-- skills/
+|   +-- codex-bridge/SKILL.md # Agent skill (auto-triggers delegation/review/consult)
 |-- setup/                    # npx skill-codex setup installer
 |-- bin/                      # CLI entry point
 +-- __tests__/                # vitest test suite
@@ -165,6 +221,7 @@ cd skill-codex
 npm install
 npm run build
 npm test
+npm run test:coverage   # enforces the same 80%+ gate as CI
 ```
 ## Troubleshooting
@@ -179,11 +236,14 @@ Restart Claude Code after running setup. The MCP server only loads on startup.
 Increase the timeout: `export SKILL_CODEX_TIMEOUT_MS=1200000` (20 min). Large codebases can take longer.
 **"Auth expired" but Codex works in another terminal**
-The MCP server runs in its own process. Run `codex login` and restart Claude Code.
+The MCP server runs in its own process. Run `codex login` and restart Claude Code. On Windows, the auth pre-check is skipped automatically (PowerShell profile errors can cause false negatives); auth is still verified when Codex actually runs.
 **Lock file blocking runs**
 If a previous run crashed, a stale `.skill-codex.lock` may remain. It auto-cleans after 15 minutes, or delete it manually.
+**Windows: "windows sandbox failed: spawn setup refresh" / Codex commands all blocked**
+Codex's default *elevated* Windows sandbox fails to spawn shells on many setups ([openai/codex#24098](https://github.com/openai/codex/issues/24098), [#24259](https://github.com/openai/codex/issues/24259)). skill-codex works around this by pinning `windows.sandbox=unelevated`, which spawns reliably. If your machine needs the elevated sandbox instead, set `SKILL_CODEX_WINDOWS_SANDBOX=elevated`.
 ## Inspired By
 * [Dunqing/claude-codex-bridge](https://github.com/Dunqing/claude-codex-bridge) -- retry logic, anti-recursion, output parsing

package/commands/codex-do.md CHANGED Viewed

@@ -10,7 +10,7 @@ You are delegating an implementation task to Codex. Follow these steps:
 2. **Evaluate if the task is suitable for delegation**:
    - **Good for Codex**: repetitive bulk changes, boilerplate generation, test writing for existing code, migration scripts, file format conversions, well-defined single-file edits
-   - **Keep for yourself**: architectural decisions, cross-module refactoring, tasks requiring deep conversation context, ambiguous requirements, tasks under 50 lines
+   - **Keep for yourself**: architectural decisions, cross-module refactoring, tasks requiring deep conversation context, ambiguous requirements, and trivial edits you'd finish faster yourself than you can write a spec for
    - If the task is poorly scoped, explain why and suggest how to break it down.
 3. **Prepare a precise, self-contained prompt** for Codex. Include:
@@ -25,7 +25,8 @@ You are delegating an implementation task to Codex. Follow these steps:
    - `requireGit`: true
 5. **Review Codex's output critically**:
-   - Run `git diff` to see exactly what Codex changed
+   - Run `git status --short` **first** to see ALL changes, including **newly-created files** (lines starting with `??`). Codex in full-auto mode often creates new files, and `git diff` alone is blind to untracked files.
+   - Run `git diff` for modifications to tracked files, and **read any new untracked files directly** to review their contents.
    - Check for introduced bugs, regressions, or style violations
    - Verify it matches the requested changes
    - Verify it doesn't modify files outside the requested scope
@@ -39,6 +40,6 @@ You are delegating an implementation task to Codex. Follow these steps:
 ## Important
 - Codex runs in **full-auto mode** — it CAN modify files in the workspace
-- Always `git diff` after Codex runs to verify what changed
+- Always run `git status --short` **and** `git diff` after Codex runs — `git status` catches new files that `git diff` misses
 - Codex is a **peer, not an authority** — review all output before accepting
 - For complex multi-file tasks, consider doing it yourself instead

package/commands/codex-review.md CHANGED Viewed

@@ -7,7 +7,7 @@ Review the current changes using Codex as a second reviewer.
 You are invoking Codex for a second-opinion code review. Follow these steps:
 1. **Determine what to review** based on `$ARGUMENTS`:
-   - If empty or "uncommitted": get unstaged changes via `git diff`. If empty, try staged changes via `git diff --cached`. If both empty, inform the user there are no changes to review.
+   - If empty or "uncommitted": first run `git status --short` to see ALL changes, including **new untracked files** (lines starting with `??`). Collect tracked modifications via `git diff` and staged changes via `git diff --cached`, **and** include the full contents of any new untracked files — read them directly, or use `git diff --no-index -- /dev/null <file>`. Plain `git diff` is blind to untracked files, so a review that skips this step will miss brand-new files entirely. If there are no changes at all, inform the user and stop.
    - If it looks like a branch name: get changes via `git diff <branch>...HEAD`
    - If it looks like a commit SHA: get changes via `git show <sha>`
@@ -20,16 +20,27 @@ You are invoking Codex for a second-opinion code review. Follow these steps:
    - `mode`: "exec"
    - `requireGit`: true
-4. **Present findings** to the user:
+   **Option B (native Codex review):** Default to the Claude-assembled diff above, since the user prefers Claude-led review. You may instead call `codex_exec` with `review: true` to use `codex exec review`; add `reviewBase` for a branch or `reviewCommit` for a SHA. The prompt is optional focus instructions.
+4. **Assign an overall verdict** from the findings (you are the final judge — weigh Codex's findings, don't just echo them):
+   - **BLOCKED** — one or more CRITICAL/HIGH issues that should be fixed before merge
+   - **WARNING** — only MEDIUM/LOW issues; safe to proceed with awareness
+   - **APPROVED** — no substantive issues
+   State it explicitly, e.g. `Verdict: BLOCKED — 2 CRITICAL, 1 MEDIUM`.
+5. **Present findings** to the user:
    - Group by severity (CRITICAL first)
-   - For each finding, add your own assessment: agree, disagree, or nuance
+   - For each finding, add your own assessment: agree, disagree (with reasoning), or add nuance
    - Note anything Codex missed that you think is important
    - Summarize actionable items at the end
-5. **If Codex found issues**, offer to fix them.
+6. **If BLOCKED or WARNING, offer a bounded fix loop** (only with the user's go-ahead for automated fixing):
+   - Fix the issues you agree are genuine. If you disagree with a finding, explain why instead of "fixing" it just to satisfy Codex.
+   - Re-run the review (call `codex_exec` again on the updated diff) so Codex confirms the fixes and checks for regressions.
+   - Repeat until the verdict is APPROVED or you have done **3 rounds**, then stop and summarize what remains. The hard cap keeps the loop from quietly burning your Codex quota.
 ## Important
 - Codex runs in **read-only mode** — it cannot modify files
-- Codex is a **peer, not an authority** — evaluate each finding critically
-- If the MCP tool returns an error, explain what happened and suggest remediation
+- Codex is a **peer, not an authority** — evaluate each finding critically and keep the final call yourself
+- **Fail soft:** if `codex_exec` errors (Codex not installed, auth expired, offline), say so and fall back to your own review — a missing Codex should degrade to Claude-only review, never block the user