skill-codex 0.3.0 → 0.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,17 @@
1
+ {
2
+ "description": "Suggests /codex-review after significant code changes.",
3
+ "hooks": {
4
+ "PostToolUse": [
5
+ {
6
+ "matcher": "Write|Edit|MultiEdit|NotebookEdit",
7
+ "hooks": [
8
+ {
9
+ "type": "command",
10
+ "command": "node \"${CLAUDE_PLUGIN_ROOT}/hooks/post-tool-use-review.mjs\"",
11
+ "timeout": 30
12
+ }
13
+ ]
14
+ }
15
+ ]
16
+ }
17
+ }
@@ -0,0 +1,23 @@
1
+ {
2
+ "name": "skill-codex",
3
+ "owner": {
4
+ "name": "Arystos",
5
+ "url": "https://github.com/Arystos"
6
+ },
7
+ "plugins": [
8
+ {
9
+ "name": "skill-codex",
10
+ "source": "./",
11
+ "description": "Use OpenAI Codex from Claude Code for code review, task delegation, and second opinions via MCP — with your Codex subscription, no API key. Windows-native, live progress.",
12
+ "version": "0.8.0",
13
+ "author": {
14
+ "name": "Arystos",
15
+ "url": "https://github.com/Arystos"
16
+ },
17
+ "homepage": "https://github.com/Arystos/skill-codex",
18
+ "license": "MIT",
19
+ "keywords": ["codex", "code-review", "mcp", "developer-tools"],
20
+ "category": "development"
21
+ }
22
+ ]
23
+ }
@@ -0,0 +1,16 @@
1
+ {
2
+ "name": "skill-codex",
3
+ "description": "Use OpenAI Codex from Claude Code for code review, task delegation, and second opinions via MCP — with your Codex subscription, no API key.",
4
+ "version": "0.8.0",
5
+ "author": {
6
+ "name": "Arystos",
7
+ "url": "https://github.com/Arystos"
8
+ },
9
+ "homepage": "https://github.com/Arystos/skill-codex",
10
+ "repository": {
11
+ "type": "git",
12
+ "url": "https://github.com/Arystos/skill-codex.git"
13
+ },
14
+ "license": "MIT",
15
+ "keywords": ["codex", "code-review", "task-delegation", "mcp", "developer-tools", "llm-tools"]
16
+ }
package/.mcp.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "mcpServers": {
3
+ "skill-codex": {
4
+ "command": "npx",
5
+ "args": ["-y", "skill-codex", "mcp"],
6
+ "env": {}
7
+ }
8
+ }
9
+ }
package/README.md CHANGED
@@ -3,6 +3,9 @@
3
3
  [![npm version](https://img.shields.io/npm/v/skill-codex)](https://www.npmjs.com/package/skill-codex)
4
4
  [![npm downloads](https://img.shields.io/npm/dm/skill-codex)](https://www.npmjs.com/package/skill-codex)
5
5
  ![CI](https://github.com/Arystos/skill-codex/actions/workflows/ci.yml/badge.svg)
6
+ <!-- After enabling the repo on codecov.io (see DISTRIBUTION.md), uncomment:
7
+ [![codecov](https://codecov.io/gh/Arystos/skill-codex/graph/badge.svg)](https://codecov.io/gh/Arystos/skill-codex) -->
8
+ ![Coverage](https://img.shields.io/badge/coverage-80%25%2B%20enforced-brightgreen)
6
9
  ![Windows](https://img.shields.io/badge/Windows-0078D4?style=flat&logo=windows&logoColor=white)
7
10
  ![macOS](https://img.shields.io/badge/macOS-000000?style=flat&logo=apple&logoColor=white)
8
11
  ![Linux](https://img.shields.io/badge/Linux-FCC624?style=flat&logo=linux&logoColor=black)
@@ -12,6 +15,12 @@
12
15
 
13
16
  A cross-platform [Claude Code](https://code.claude.com) skill that integrates [OpenAI Codex CLI](https://github.com/openai/codex) for code review, task delegation, and consultation. Uses your existing Codex subscription -- no API key required.
14
17
 
18
+ > **Two models rarely make the same mistake.** skill-codex lets Claude and Codex check each other's work -- Claude plans and builds fast, Codex reviews with fresh eyes -- from a single terminal, using the Codex subscription you already pay for. No API key, no second window, no copy-paste.
19
+
20
+ <!-- Demo GIF: record a ~20s clip of `/codex-review` catching a real bug, save it to docs/demo.gif, then uncomment the next line:
21
+ ![skill-codex demo](docs/demo.gif)
22
+ -->
23
+
15
24
  ## Why?
16
25
 
17
26
  Claude Code and Codex CLI have different strengths. Claude excels at reasoning, architecture, and complex refactors. Codex is fast, thorough, and great at focused execution and review. **skill-codex** lets them work together from a single terminal -- no second window, no copy-paste, no context loss.
@@ -19,10 +28,30 @@ Claude Code and Codex CLI have different strengths. Claude excels at reasoning,
19
28
  * **`/codex-review`** -- Have Codex review your current changes as a second reviewer
20
29
  * **`/codex-do`** -- Delegate well-scoped implementation tasks to Codex
21
30
  * **`/codex-consult`** -- Get a second opinion on architecture or design decisions
31
+ * **`codex-bridge` agent skill** -- Auto-triggers on implementation/review/consult requests so Claude reaches for Codex without an explicit slash command
22
32
  * **Auto-review hook** -- Smart PostToolUse hook suggests review after significant changes
33
+ * **Live progress** -- Streams Codex's activity (running commands, file edits, elapsed time) as MCP progress so long runs never look frozen; also mirrored to a tail-able log file (under the OS temp dir, path printed at run start -- never pollutes your repo)
23
34
  * **Subscription-first** -- Works with `codex login`, no `OPENAI_API_KEY` needed
24
35
  * **Edge case handling** -- Retry logic, timeout, anti-recursion, lock files, pre-flight checks
25
36
 
37
+ ### Why not just use Claude's own subagents?
38
+
39
+ Because a subagent is the **same model**. When Claude reviews Claude, you inherit the same blind spots -- it's grading its own homework. A *different* model family was trained differently, so its mistakes don't correlate with Claude's: it catches what Claude is confidently wrong about, and Claude catches what Codex gets wrong. That uncorrelated, cross-model check is the entire point -- and skill-codex keeps Claude as the final judge, never blindly forwarding Codex's verdict.
40
+
41
+ ### How it compares
42
+
43
+ | | skill-codex | Most Codex MCP bridges |
44
+ |---|---|---|
45
+ | Codex **subscription** auth (no `OPENAI_API_KEY`) | ✅ | ⚠️ often require an API key |
46
+ | **Windows** verified in CI (not just "should work") | ✅ 9-way matrix (Windows/macOS/Linux × Node 18/20/22) | ❌ usually Linux-only CI |
47
+ | **Live progress** so long runs never look frozen | ✅ MCP progress + tail-able log | ⚠️ varies |
48
+ | **Slash commands** (`/codex-review`, `/codex-do`, `/codex-consult`) | ✅ | ⚠️ some |
49
+ | **Auto-review hook** (PostToolUse) | ✅ | ❌ |
50
+ | **Agent skill** (auto-triggers, no command needed) | ✅ | ❌ |
51
+ | **Guardrails**: retry, timeout, anti-recursion, lock files | ✅ | ⚠️ partial |
52
+
53
+ *A snapshot, not a leaderboard -- the Codex MCP space moves fast, so check each tool's current state. The point isn't "skill-codex wins everything"; it's that the operational details (subscription auth, real Windows support, never-frozen runs, guardrails) are where it focuses.*
54
+
26
55
  ## Prerequisites
27
56
 
28
57
  * [Node.js](https://nodejs.org) >= 18
@@ -49,55 +78,80 @@ The setup command:
49
78
  1. Registers the MCP server in your Claude Code config (`~/.claude.json`)
50
79
  2. Installs slash commands globally (`~/.claude/commands/`)
51
80
  3. Configures the auto-review PostToolUse hook
52
- 4. Verifies everything works
81
+ 4. Installs the `codex-bridge` agent skill (`~/.claude/skills/`)
82
+ 5. Verifies everything works
53
83
 
54
84
  > **Tip:** Add `.skill-codex.lock` to your `.gitignore`
55
85
 
86
+ ### Install as a Claude Code plugin (alternative)
87
+
88
+ Instead of `npx skill-codex setup`, you can install it as a plugin:
89
+
90
+ ```shell
91
+ /plugin marketplace add Arystos/skill-codex
92
+ /plugin install skill-codex@skill-codex
93
+ ```
94
+
95
+ The plugin bundles the MCP server (launched via `npx -y skill-codex mcp`), the slash commands, the agent skill, and the auto-review hook. Restart Claude Code after installing.
96
+
56
97
  ## How It Works
57
98
 
58
99
  ```
59
100
  You in Claude Code
60
101
  |
61
- |-- /codex-review --> MCP tool --> codex exec (read-only) --> review findings
62
- |-- /codex-do "task" --> MCP tool --> codex exec --full-auto --> reviewed output
63
- +-- /codex-consult "q" --> MCP tool --> codex exec (read-only) --> synthesized opinion
102
+ |-- /codex-review --> MCP tool --> codex exec --sandbox read-only --> review findings
103
+ |-- /codex-do "task" --> MCP tool --> codex exec --sandbox workspace-write --> reviewed output
104
+ +-- /codex-consult "q" --> MCP tool --> codex exec --sandbox read-only --> synthesized opinion
64
105
  ```
65
106
 
66
107
  The MCP server spawns `codex exec` as a subprocess, using your logged-in Codex session. Claude sees the output and critically evaluates it -- **Codex is treated as a peer, not an authority**.
67
108
 
109
+ ### Advanced: sandbox, sessions, model & review
110
+
111
+ These `codex_exec` parameters are all optional -- omit them and Codex uses its defaults:
112
+
113
+ * **`sandbox`** -- the explicit Codex sandbox policy (`read-only`, `workspace-write`, or `danger-full-access`), overriding the `mode` default. Use `danger-full-access` only when you understand the risk.
114
+ * **`sessionId`** -- resume a previous Codex session for multi-round memory. Each response includes the session's thread id; pass it back so Codex retains context -- e.g. a follow-up review that checks whether *previously flagged* issues were fixed, instead of re-discovering them.
115
+ * **`model`** -- pick the Codex model (e.g. `gpt-5.5`, `gpt-5.4`, `gpt-5.4-mini`). Route cheap tasks to a smaller model and escalate hard ones; omit to use your configured default.
116
+ * **`reasoningEffort`** -- how hard Codex thinks: `minimal` | `low` | `medium` | `high` | `xhigh`. Omit for the model's default.
117
+ * **`review`** -- run Codex's native diff-scoped reviewer (`codex exec review`) instead of a freeform prompt, optionally targeting a branch (`reviewBase`) or commit (`reviewCommit`). The prompt becomes optional focus instructions.
118
+
68
119
  ### Auto-review
69
120
 
70
121
  After significant code changes (3+ files, 100+ lines, security-related paths), the PostToolUse hook suggests running `/codex-review`. Trivial changes (docs-only, < 5 lines, whitespace) are skipped to preserve your Codex quota.
71
122
 
72
- ### Example: `/codex-review`
123
+ ### Agent skill (auto-trigger)
73
124
 
74
- ```
75
- > /codex-review
125
+ The slash commands are explicit, on-demand entry points. The `codex-bridge` agent skill is the implicit one: it installs to `~/.claude/skills/` and Claude loads it automatically when your request matches a delegation, review, or consult pattern (e.g. "implement X", "generate tests for", "review this diff", "second opinion on this approach"). It maps the same three workflows -- delegate, review, consult -- onto the `codex_exec` tool, so you get Codex collaboration without remembering a command. Trivial tasks (< 50 lines, faster done directly) are explicitly out of scope.
76
126
 
77
- Reviewing uncommitted changes (3 files, ~85 lines)...
127
+ ### Example Output
78
128
 
79
- Codex found 2 issues:
129
+ When Codex runs, the MCP tool returns a structured plain-text response:
130
+
131
+ ```
132
+ [read-only │ D:\myproject │ 21538 tok in (15744 cached) → 129 out]
133
+ ✔ exec: powershell -Command 'git diff --cached' (ok)
80
134
 
81
135
  CRITICAL — src/auth/login.ts:42
82
136
  Password comparison uses == instead of timing-safe comparison.
83
- Claude's assessment: Agree. Use crypto.timingSafeEqual() instead.
84
137
 
85
138
  MEDIUM — src/api/routes.ts:18
86
139
  Missing rate limiting on login endpoint.
87
- Claude's assessment: Agree. Add rate limiter middleware.
88
-
89
- Shall I fix these issues?
90
140
  ```
91
141
 
142
+ The first line shows mode, working directory, and token usage. Activity lines show commands Codex executed (with status icons: ✔ ok, ✘ blocked/failed). The response content follows after a blank line.
143
+
92
144
  ## Configuration
93
145
 
94
146
  Environment variables (all optional):
95
147
 
96
148
  | Variable | Default | Description |
97
149
  |----------|---------|-------------|
98
- | `SKILL_CODEX_TIMEOUT_MS` | `600000` (10 min) | Subprocess timeout |
150
+ | `SKILL_CODEX_TIMEOUT_MS` | `300000` (5 min) | Subprocess timeout |
99
151
  | `SKILL_CODEX_MAX_RETRIES` | `3` | Retry count for transient errors |
152
+ | `SKILL_CODEX_LOG` | `<os-temp>/skill-codex/<workspace>.log` | Absolute path override for the live, tail-able per-run log |
100
153
  | `SKILL_CODEX_DEBUG` | -- | Enable debug logging to stderr |
154
+ | `SKILL_CODEX_WINDOWS_SANDBOX` | `unelevated` | Windows only — Codex `windows.sandbox` mode (`unelevated`/`elevated`) |
101
155
 
102
156
  ### Smart Filter Thresholds
103
157
 
@@ -146,6 +200,8 @@ skill-codex/
146
200
  |-- hooks/
147
201
  | |-- post-tool-use-review.sh # Auto-review hook (macOS/Linux)
148
202
  | +-- post-tool-use-review.ps1 # Auto-review hook (Windows)
203
+ |-- skills/
204
+ | +-- codex-bridge/SKILL.md # Agent skill (auto-triggers delegation/review/consult)
149
205
  |-- setup/ # npx skill-codex setup installer
150
206
  |-- bin/ # CLI entry point
151
207
  +-- __tests__/ # vitest test suite
@@ -165,6 +221,7 @@ cd skill-codex
165
221
  npm install
166
222
  npm run build
167
223
  npm test
224
+ npm run test:coverage # enforces the same 80%+ gate as CI
168
225
  ```
169
226
 
170
227
  ## Troubleshooting
@@ -179,11 +236,14 @@ Restart Claude Code after running setup. The MCP server only loads on startup.
179
236
  Increase the timeout: `export SKILL_CODEX_TIMEOUT_MS=1200000` (20 min). Large codebases can take longer.
180
237
 
181
238
  **"Auth expired" but Codex works in another terminal**
182
- The MCP server runs in its own process. Run `codex login` and restart Claude Code.
239
+ The MCP server runs in its own process. Run `codex login` and restart Claude Code. On Windows, the auth pre-check is skipped automatically (PowerShell profile errors can cause false negatives); auth is still verified when Codex actually runs.
183
240
 
184
241
  **Lock file blocking runs**
185
242
  If a previous run crashed, a stale `.skill-codex.lock` may remain. It auto-cleans after 15 minutes, or delete it manually.
186
243
 
244
+ **Windows: "windows sandbox failed: spawn setup refresh" / Codex commands all blocked**
245
+ Codex's default *elevated* Windows sandbox fails to spawn shells on many setups ([openai/codex#24098](https://github.com/openai/codex/issues/24098), [#24259](https://github.com/openai/codex/issues/24259)). skill-codex works around this by pinning `windows.sandbox=unelevated`, which spawns reliably. If your machine needs the elevated sandbox instead, set `SKILL_CODEX_WINDOWS_SANDBOX=elevated`.
246
+
187
247
  ## Inspired By
188
248
 
189
249
  * [Dunqing/claude-codex-bridge](https://github.com/Dunqing/claude-codex-bridge) -- retry logic, anti-recursion, output parsing
@@ -10,7 +10,7 @@ You are delegating an implementation task to Codex. Follow these steps:
10
10
 
11
11
  2. **Evaluate if the task is suitable for delegation**:
12
12
  - **Good for Codex**: repetitive bulk changes, boilerplate generation, test writing for existing code, migration scripts, file format conversions, well-defined single-file edits
13
- - **Keep for yourself**: architectural decisions, cross-module refactoring, tasks requiring deep conversation context, ambiguous requirements, tasks under 50 lines
13
+ - **Keep for yourself**: architectural decisions, cross-module refactoring, tasks requiring deep conversation context, ambiguous requirements, and trivial edits you'd finish faster yourself than you can write a spec for
14
14
  - If the task is poorly scoped, explain why and suggest how to break it down.
15
15
 
16
16
  3. **Prepare a precise, self-contained prompt** for Codex. Include:
@@ -25,7 +25,8 @@ You are delegating an implementation task to Codex. Follow these steps:
25
25
  - `requireGit`: true
26
26
 
27
27
  5. **Review Codex's output critically**:
28
- - Run `git diff` to see exactly what Codex changed
28
+ - Run `git status --short` **first** to see ALL changes, including **newly-created files** (lines starting with `??`). Codex in full-auto mode often creates new files, and `git diff` alone is blind to untracked files.
29
+ - Run `git diff` for modifications to tracked files, and **read any new untracked files directly** to review their contents.
29
30
  - Check for introduced bugs, regressions, or style violations
30
31
  - Verify it matches the requested changes
31
32
  - Verify it doesn't modify files outside the requested scope
@@ -39,6 +40,6 @@ You are delegating an implementation task to Codex. Follow these steps:
39
40
  ## Important
40
41
 
41
42
  - Codex runs in **full-auto mode** — it CAN modify files in the workspace
42
- - Always `git diff` after Codex runs to verify what changed
43
+ - Always run `git status --short` **and** `git diff` after Codex runs `git status` catches new files that `git diff` misses
43
44
  - Codex is a **peer, not an authority** — review all output before accepting
44
45
  - For complex multi-file tasks, consider doing it yourself instead
@@ -7,7 +7,7 @@ Review the current changes using Codex as a second reviewer.
7
7
  You are invoking Codex for a second-opinion code review. Follow these steps:
8
8
 
9
9
  1. **Determine what to review** based on `$ARGUMENTS`:
10
- - If empty or "uncommitted": get unstaged changes via `git diff`. If empty, try staged changes via `git diff --cached`. If both empty, inform the user there are no changes to review.
10
+ - If empty or "uncommitted": first run `git status --short` to see ALL changes, including **new untracked files** (lines starting with `??`). Collect tracked modifications via `git diff` and staged changes via `git diff --cached`, **and** include the full contents of any new untracked files — read them directly, or use `git diff --no-index -- /dev/null <file>`. Plain `git diff` is blind to untracked files, so a review that skips this step will miss brand-new files entirely. If there are no changes at all, inform the user and stop.
11
11
  - If it looks like a branch name: get changes via `git diff <branch>...HEAD`
12
12
  - If it looks like a commit SHA: get changes via `git show <sha>`
13
13
 
@@ -20,16 +20,27 @@ You are invoking Codex for a second-opinion code review. Follow these steps:
20
20
  - `mode`: "exec"
21
21
  - `requireGit`: true
22
22
 
23
- 4. **Present findings** to the user:
23
+ **Option B (native Codex review):** Default to the Claude-assembled diff above, since the user prefers Claude-led review. You may instead call `codex_exec` with `review: true` to use `codex exec review`; add `reviewBase` for a branch or `reviewCommit` for a SHA. The prompt is optional focus instructions.
24
+
25
+ 4. **Assign an overall verdict** from the findings (you are the final judge — weigh Codex's findings, don't just echo them):
26
+ - **BLOCKED** — one or more CRITICAL/HIGH issues that should be fixed before merge
27
+ - **WARNING** — only MEDIUM/LOW issues; safe to proceed with awareness
28
+ - **APPROVED** — no substantive issues
29
+ State it explicitly, e.g. `Verdict: BLOCKED — 2 CRITICAL, 1 MEDIUM`.
30
+
31
+ 5. **Present findings** to the user:
24
32
  - Group by severity (CRITICAL first)
25
- - For each finding, add your own assessment: agree, disagree, or nuance
33
+ - For each finding, add your own assessment: agree, disagree (with reasoning), or add nuance
26
34
  - Note anything Codex missed that you think is important
27
35
  - Summarize actionable items at the end
28
36
 
29
- 5. **If Codex found issues**, offer to fix them.
37
+ 6. **If BLOCKED or WARNING, offer a bounded fix loop** (only with the user's go-ahead for automated fixing):
38
+ - Fix the issues you agree are genuine. If you disagree with a finding, explain why instead of "fixing" it just to satisfy Codex.
39
+ - Re-run the review (call `codex_exec` again on the updated diff) so Codex confirms the fixes and checks for regressions.
40
+ - Repeat until the verdict is APPROVED or you have done **3 rounds**, then stop and summarize what remains. The hard cap keeps the loop from quietly burning your Codex quota.
30
41
 
31
42
  ## Important
32
43
 
33
44
  - Codex runs in **read-only mode** — it cannot modify files
34
- - Codex is a **peer, not an authority** — evaluate each finding critically
35
- - If the MCP tool returns an error, explain what happened and suggest remediation
45
+ - Codex is a **peer, not an authority** — evaluate each finding critically and keep the final call yourself
46
+ - **Fail soft:** if `codex_exec` errors (Codex not installed, auth expired, offline), say so and fall back to your own review — a missing Codex should degrade to Claude-only review, never block the user