nex-code 0.5.1 → 0.5.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +122 -42
- package/dist/benchmark.js +307 -301
- package/dist/nex-code.js +457 -451
- package/package.json +1 -1
package/README.md
CHANGED
@@ -60,13 +60,23 @@ On first launch, an interactive setup wizard guides you through provider and cre
 | Startup time | ~100ms | 1-4s |
 | Runtime deps | 2 | heavy |
 | Infra tools | SSH, Docker, K8s built-in | no |
+<<<<<<< Updated upstream
 
 **Smart model routing.** The built-in `/benchmark` tests all configured models across 62 tool-calling tasks in 5 categories and auto-routes to the best model per task type.
 
 **Phase-based execution.** Tasks run through Plan (analyze) -> Implement (code) -> Verify (test) phases, each with the optimal model. Auto-loops back on test failures.
 
-**45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review.
+**45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review. See [Tools](#tools) for the full list.
 
+=======
+
+**Smart model routing.** The built-in `/benchmark` tests all configured models across 62 tool-calling tasks in 5 categories and auto-routes to the best model per task type.
+
+**Phase-based execution.** Tasks run through Plan (analyze) -> Implement (code) -> Verify (test) phases, each with the optimal model. Auto-loops back on test failures.
+
+**45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review. See [Tools](#tools) for the full list.
+
+>>>>>>> Stashed changes
 **2 runtime dependencies** (`axios`, `dotenv`). Starts in ~100ms. No Python, no heavy runtime.
 
 ---
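The hunk above says `/benchmark` scores every configured model on 62 tool-calling tasks in 5 categories and routes each task type to the best model. Below is a minimal JavaScript sketch of that selection step. It assumes per-category scores are already available as a plain object. `buildRoutingTable`, `routeTask`, the category names, and the scores are hypothetical placeholders, not nex-code internals or benchmark results.

```javascript
// Hypothetical sketch: route each task category to its highest-scoring model.
// The input shape { model: { category: score } } is an assumption for illustration.
function buildRoutingTable(scores) {
  const table = new Map(); // category -> { model, score }
  for (const [model, byCategory] of Object.entries(scores)) {
    for (const [category, score] of Object.entries(byCategory)) {
      const current = table.get(category);
      // Strict ">" keeps the first model seen on a tie, so the result is deterministic.
      if (current === undefined || score > current.score) {
        table.set(category, { model, score });
      }
    }
  }
  return table;
}

// Fall back to a default model for categories that were never benchmarked.
function routeTask(table, category, fallbackModel) {
  const entry = table.get(category);
  return entry ? entry.model : fallbackModel;
}

// Placeholder scores for illustration only (not real benchmark output).
const table = buildRoutingTable({
  "model-a": { coding: 80, sysadmin: 60 },
  "model-b": { coding: 70, sysadmin: 75 },
});

console.log(routeTask(table, "coding", "model-a"));   // model-a
console.log(routeTask(table, "sysadmin", "model-a")); // model-b
console.log(routeTask(table, "frontend", "model-a")); // model-a (fallback)
```

A `Map` is used instead of a plain object so that a category name such as `constructor` cannot collide with inherited object properties.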
@@ -76,17 +86,42 @@ On first launch, an interactive setup wizard guides you through provider and cre
 Rankings from nex-code's own `/benchmark` — 62 tasks testing tool selection, argument validity, and schema compliance.
 
 <!-- nex-benchmark-start -->
-<!-- Updated: 2026-04-
+<!-- Updated: 2026-04-05 — run `/benchmark --discover` after new Ollama Cloud releases -->
 
-| Rank | Model | Score | Latency | Context | Best For |
+| Rank | Model | Score | Avg Latency | Context | Best For |
 |---|---|---|---|---|---|
-|
-|
-|
-|
-|
-
-
+| 🥇 | `qwen3-vl:235b` | **79** | 12.4s | 131K | Overall #1 — frontier tool selection, data + agentic tasks |
+| 🥈 | `qwen3-vl:235b-instruct` | 78.2 | 5.3s | 131K | Best latency/score balance — recommended default |
+| 🥉 | `nemotron-3-super` | 78.1 | 3.5s | 256K | — |
+| — | `rnj-1:8b` | 77.4 | 3.9s | 131K | — |
+| — | `mistral-large-3:675b` | 76.5 | 3.9s | 131K | — |
+| — | `gpt-oss:20b` | 76.5 | 1.9s | 131K | Fast small model, good overall score |
+| — | `qwen3-coder-next` | 75.7 | 2.2s | 256K | — |
+| — | `qwen3-next:80b` | 75.1 | 11.1s | 131K | — |
+| — | `ministral-3:8b` | 73.8 | 2.0s | 131K | Fastest strong model — 2.2s latency, 70+ score |
+| — | `deepseek-v3.1:671b` | 73.6 | 2.9s | 131K | — |
+| — | `devstral-2:123b` | 73.2 | 2.0s | 131K | Sysadmin + SSH tasks, reliable coding |
+| — | `kimi-k2:1t` | 72.2 | 5.6s | 256K | Large repos (>100K tokens) |
+| — | `ministral-3:3b` | 72 | 1.6s | 32K | — |
+| — | `devstral-small-2:24b` | 71.7 | 2.6s | 131K | Fast sub-agents, simple lookups |
+| — | `qwen3.5:397b` | 70.7 | 4.2s | 256K | — |
+| — | `qwen3-coder:480b` | 70.1 | 6.0s | 131K | Heavy coding sessions, large context |
+| — | `minimax-m2.1` | 69.9 | 3.0s | 200K | — |
+| — | `gemma4:31b` | 69.3 | 2.8s | ? | — |
+| — | `glm-4.7` | 69.1 | 5.3s | 131K | — |
+| — | `kimi-k2-thinking` | 69 | 3.1s | 256K | — |
+| — | `ministral-3:14b` | 68.8 | 2.0s | 131K | — |
+| — | `kimi-k2.5` | 68.7 | 3.4s | 256K | Large repos — faster than k2:1t |
+| — | `minimax-m2.7` | 68.4 | 5.5s | 200K | — |
+| — | `glm-4.6` | 67.8 | 4.7s | 131K | — |
+| — | `glm-5` | 67.4 | 5.0s | 131K | — |
+| — | `gpt-oss:120b` | 64.8 | 3.4s | 131K | — |
+| — | `nemotron-3-nano:30b` | 64.7 | 2.3s | 131K | — |
+| — | `minimax-m2.5` | 61.9 | 2.7s | 131K | Multi-agent, large context |
+| — | `minimax-m2` | 60.6 | 4.3s | 200K | — |
+
+> Rankings are nex-code-specific: tool name accuracy, argument validity, schema compliance.
+> Toolathon (Minimax SOTA) measures different task types — run `/benchmark --discover` after model releases.
 <!-- nex-benchmark-end -->
 
 **Recommended `.env`:**
@@ -111,7 +146,7 @@ OLLAMA_API_KEY=your-key # Ollama Cloud
 OPENAI_API_KEY=your-key # OpenAI
 ANTHROPIC_API_KEY=your-key # Anthropic
 GEMINI_API_KEY=your-key # Gemini
-PERPLEXITY_API_KEY=your-key # optional — grounded web search
+PERPLEXITY_API_KEY=your-key # optional — enables grounded web search
 
 DEFAULT_PROVIDER=ollama
 DEFAULT_MODEL=devstral-2:123b
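The hunk above sets `DEFAULT_MODEL=devstral-2:123b`. Elsewhere in this diff, models are selected with a `provider:model` spec such as `anthropic:claude-sonnet-4-6`. Ollama model tags also contain a colon, so splitting a spec on its first colon alone would misread `devstral-2:123b` as provider `devstral-2`.

Below is a hypothetical resolver that handles this case. It assumes the provider names are the four with API keys listed here (Ollama, OpenAI, Anthropic, Gemini), and that a bare model name falls back to `DEFAULT_PROVIDER`. It illustrates the ambiguity only; it is not how nex-code actually parses specs.

```javascript
// Hypothetical sketch of resolving a "provider:model" spec.
// KNOWN_PROVIDERS is an assumption based on the providers named in this README.
const KNOWN_PROVIDERS = new Set(["ollama", "openai", "anthropic", "gemini"]);

function parseModelSpec(spec, defaultProvider) {
  const idx = spec.indexOf(":");
  if (idx > 0) {
    const prefix = spec.slice(0, idx);
    // Treat the prefix as a provider only if it is a known one; otherwise the
    // colon belongs to an Ollama-style model tag such as "devstral-2:123b".
    if (KNOWN_PROVIDERS.has(prefix)) {
      return { provider: prefix, model: spec.slice(idx + 1) };
    }
  }
  return { provider: defaultProvider, model: spec };
}

console.log(JSON.stringify(parseModelSpec("anthropic:claude-sonnet-4-6", "ollama")));
// {"provider":"anthropic","model":"claude-sonnet-4-6"}
console.log(JSON.stringify(parseModelSpec("devstral-2:123b", "ollama")));
// {"provider":"ollama","model":"devstral-2:123b"}
console.log(JSON.stringify(parseModelSpec("ollama:qwen3-vl:235b", "openai")));
// {"provider":"ollama","model":"qwen3-vl:235b"}
```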
@@ -136,18 +171,20 @@ cp .env.example .env && npm link && npm run install-hooks
 > the /users endpoint returns 500 — find the bug and fix it
 ```
 
-
+### YOLO Mode
+
+Skip all confirmations — file changes, dangerous commands, and tool permissions are auto-approved. Auto-runs `caffeinate` on macOS.
 
 ```bash
 nex-code -yolo
 ```
 
-
+### Headless / Programmatic Mode
 
 ```bash
 nex-code --task "refactor src/index.js to async/await" --yolo
 nex-code --prompt-file /tmp/task.txt --yolo --json
-nex-code --daemon #
+nex-code --daemon # watch mode: fires tasks on file changes, git commits, or cron
 ```
 
 | Flag | Description |
@@ -156,33 +193,39 @@ nex-code --daemon # background watcher (reads .nex/daemon.json)
 | `--prompt-file <path>` | Read prompt from file |
 | `--yolo` | Skip all confirmations |
 | `--server` | JSON-lines IPC server (VS Code extension) |
-| `--daemon` | Background watcher (
+| `--daemon` | Background watcher (reads `.nex/daemon.json`) |
 | `--flatrate` | 100 turns, 6 parallel agents, 5 retries |
 | `--json` | JSON output to stdout |
 | `--max-turns <n>` | Override agentic loop limit |
-| `--model <spec>` | e.g. `anthropic:claude-sonnet-4-6` |
+| `--model <spec>` | Use specific model (e.g. `anthropic:claude-sonnet-4-6`) |
 | `--debug` | Show diagnostic messages |
+<<<<<<< Updated upstream
+
+### Vision / Screenshot
+
+=======
 
-
+### Vision / Screenshot
 
+>>>>>>> Stashed changes
 ```
 > /path/to/screenshot.png implement this UI in React
 > analyze https://example.com/mockup.png and implement it
-> what's wrong with the layout in my clipboard # macOS clipboard
+> what's wrong with the layout in my clipboard # macOS clipboard capture
 > screenshot localhost:3000 and review the navbar spacing
 ```
 
-Formats: PNG, JPG, GIF, WebP, BMP.
+Works with Anthropic, OpenAI, Gemini, and Ollama vision models. Formats: PNG, JPG, GIF, WebP, BMP.
 
 ---
 
 ## Providers & Models
 
 ```
-/model
-/model openai:gpt-4o
-/providers
-/fallback anthropic,openai
+/model # interactive picker
+/model openai:gpt-4o # switch directly
+/providers # list all
+/fallback anthropic,openai # auto-switch on failure
 ```
 
 | Provider | Models | Env Variable |
@@ -197,12 +240,13 @@ Formats: PNG, JPG, GIF, WebP, BMP. Works with all providers that support vision.
 
 ## Commands
 
-Type `/`
+Type `/` to see inline suggestions. Tab completion for slash commands and file paths.
 
 | Command | Description |
 |---|---|
 | `/help` | Full help |
-| `/model [spec]`
+| `/model [spec]` | Show/switch model |
+| `/providers` | List providers |
 | `/clear` | Clear conversation |
 | `/save` / `/load` / `/sessions` / `/resume` | Session management |
 | `/branches` / `/fork` / `/switch-branch` / `/goto` | Session tree navigation |
@@ -227,7 +271,7 @@ Type `/` for inline suggestions. Tab completion for commands and file paths.
 
 ## Tools
 
-45 built-in tools:
+45 built-in tools organized by category:
 
 **Core:** `bash`, `read_file`, `write_file`, `edit_file`, `patch_file`, `list_directory`, `search_files`, `glob`, `grep`
 
@@ -245,7 +289,7 @@ Type `/` for inline suggestions. Tab completion for commands and file paths.
 
 **Deploy:** `deploy`, `deployment_status`
 
-**Frontend:** `frontend_recon` — scans design tokens, layout,
+**Frontend:** `frontend_recon` — scans design tokens, layout, framework stack before any frontend work
 
 **Visual:** `visual_diff`, `responsive_sweep`, `visual_annotate`, `visual_watch`, `design_tokens`, `design_compare`
 
@@ -257,7 +301,7 @@ Additional tools via [MCP servers](#mcp) or [Skills](#skills).
 
 ### Multi-Agent Orchestrator
 
-Multi-goal prompts auto-decompose into parallel sub-agents
+Multi-goal prompts auto-decompose into parallel sub-agents. Up to 5 agents run simultaneously with file locking.
 
 ```bash
 nex-code --task "fix type errors in src/, add JSDoc to utils/, update CHANGELOG"
@@ -265,24 +309,28 @@ nex-code --task "fix type errors in src/, add JSDoc to utils/, update CHANGELOG"
 
 ### Autoresearch
 
-Autonomous optimization loops
+Autonomous optimization loops: edit -> experiment -> keep/revert, on a dedicated branch.
 
 ```
 /autoresearch reduce test runtime while maintaining correctness
-/ar-self-improve #
+/ar-self-improve # self-improvement using nex-code's benchmark
 ```
 
 ### Plan Mode
 
-Auto-activates for implementation tasks. Read-only analysis first, approve before writes. Hard-enforced tool restrictions
+Auto-activates for implementation tasks. Read-only analysis first, approve before writes. Hard-enforced tool restrictions.
 
 ### Daemon / Watch Mode
+<<<<<<< Updated upstream
+Background process that fires tasks on file changes, git commits, or cron schedule. Configured via `.nex/daemon.json`. Desktop and Matrix notifications.
+=======
 
 Background process that fires tasks on file changes, git commits, or cron schedule. Configured via `.nex/daemon.json`. Desktop and Matrix notifications.
 
+>>>>>>> Stashed changes
 ### Session Trees
 
-Navigate conversation history like git branches — fork, switch, goto, delete branches
+Navigate conversation history like git branches — fork, switch, goto, delete branches.
 
 ### Safety
 
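The Session Trees section in the hunk above describes navigating conversation history like git branches (fork, switch, goto, delete). One way to model this is a sketch that assumes:

- every message is a node with a parent pointer;
- each branch is a named pointer to a head node, much like a git ref.

The `SessionTree` class and its method names are hypothetical. They are not the API of the `session-tree.js` module listed in this diff.

```javascript
// Hypothetical sketch: conversation history as a tree of message nodes,
// with named branches pointing at head nodes (similar to git refs).
class SessionTree {
  constructor() {
    this.nodes = new Map();                    // id -> { id, parent, message }
    this.branches = new Map([["main", null]]); // branch name -> head node id
    this.current = "main";
    this.nextId = 1;
  }

  // Append a message to the current branch and advance its head.
  append(message) {
    const id = this.nextId++;
    this.nodes.set(id, { id, parent: this.branches.get(this.current), message });
    this.branches.set(this.current, id);
    return id;
  }

  // Create a branch starting at an existing node (default: current head) and switch to it.
  fork(name, fromId = this.branches.get(this.current)) {
    if (this.branches.has(name)) throw new Error(`branch exists: ${name}`);
    this.branches.set(name, fromId);
    this.current = name;
  }

  switchBranch(name) {
    if (!this.branches.has(name)) throw new Error(`no such branch: ${name}`);
    this.current = name;
  }

  // Move the current branch head back (or across) to any existing node.
  goto(id) {
    if (!this.nodes.has(id)) throw new Error(`no such node: ${id}`);
    this.branches.set(this.current, id);
  }

  deleteBranch(name) {
    if (name === this.current) throw new Error("cannot delete the active branch");
    this.branches.delete(name);
  }

  // Walk parent pointers from the branch head to rebuild its linear history.
  history(name = this.current) {
    const out = [];
    for (let id = this.branches.get(name); id != null; id = this.nodes.get(id).parent) {
      out.push(this.nodes.get(id).message);
    }
    return out.reverse();
  }
}

const tree = new SessionTree();
const first = tree.append("user: fix the 500 on /users");
tree.append("assistant: approach A");
tree.fork("try-b", first); // branch off before approach A
tree.append("assistant: approach B");

console.log(JSON.stringify(tree.history("main")));
// ["user: fix the 500 on /users","assistant: approach A"]
console.log(JSON.stringify(tree.history("try-b")));
// ["user: fix the 500 on /users","assistant: approach B"]
```

Messages are never copied between branches. A fork is just a new pointer into the shared tree, which keeps branching cheap.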
@@ -295,8 +343,6 @@ Navigate conversation history like git branches — fork, switch, goto, delete b
 
 Pre-push secret detection, audit logging (JSONL), persistent undo/redo, cost limits, auto plan mode.
 
-**Reporting vulnerabilities:** Email **security@schoensgibl.com** — not a public issue. 72h initial response.
-
 ### Open-Source Model Robustness
 
 - **5-layer argument parsing** — JSON, trailing fix, extraction, key repair, fence stripping
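The `5-layer argument parsing` bullet above names recovery steps for malformed tool-call arguments from open-source models. Below is a simplified sketch of such a cascade. It assumes each layer transforms the text a little further and that parsing is retried after every step. The layer order and regexes are illustrative guesses, not nex-code's implementation. The naive key-quoting regex can also misfire on string values that contain commas or colons.

```javascript
// Hypothetical sketch of layered recovery for malformed tool-call arguments.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false };
  }
}

const layers = [
  // Fence stripping: drop a leading run of three backticks (plus optional
  // language tag) and a trailing run of three backticks.
  (s) => s.replace(/^\s*`{3}[a-zA-Z]*\s*/, "").replace(/\s*`{3}\s*$/, ""),
  // Extraction: keep only the outermost {...} span, discarding surrounding prose.
  (s) => {
    const start = s.indexOf("{");
    const end = s.lastIndexOf("}");
    return start !== -1 && end > start ? s.slice(start, end + 1) : s;
  },
  // Trailing fix: remove commas directly before a closing brace or bracket.
  (s) => s.replace(/,\s*([}\]])/g, "$1"),
  // Key repair: quote bare object keys, e.g. {path: "a"} -> {"path": "a"}.
  (s) => s.replace(/([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)\s*:/g, '$1"$2":'),
];

function parseToolArgs(raw) {
  let text = raw;
  let result = tryParse(text); // layer 0: plain JSON
  for (const layer of layers) {
    if (result.ok) break;
    text = layer(text);
    result = tryParse(text);
  }
  if (!result.ok) throw new Error("could not parse tool arguments");
  return result.value;
}

console.log(JSON.stringify(parseToolArgs('Here you go: {path: "src/index.js", limit: 10,}')));
// {"path":"src/index.js","limit":10}
```

Each layer runs only when every earlier attempt has failed, so well-formed JSON never passes through the riskier regex rewrites.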
@@ -304,10 +350,37 @@ Pre-push secret detection, audit logging (JSONL), persistent undo/redo, cost lim
 - **Auto-fix engine** — path resolution, edit fuzzy matching (Levenshtein), bash error hints
 - **Tool tiers** — essential (5) / standard (21) / full (45), auto-selected per model capability
 - **Stale stream recovery** — progressive retry with context compression on stall
+<<<<<<< Updated upstream
+### Visual Development Tools
+Pixel-level before/after comparison, responsive sweeps (320-1920px), annotation overlays, design token extraction, and live-reload diff watching. Pure image tools work standalone; browser-based tools need Playwright.
+
+---
+
+## Extensibility
+
+### Skills
+
+Drop `.md` or `.js` files in `.nex/skills/` for project-specific knowledge, commands, and tools. Global skills in `~/.nex-code/skills/`. Install from git: `/install-skill user/repo`.
+
+### Plugins
+
+Custom tools and lifecycle hooks via `.nex/plugins/`. Events: `onToolResult`, `onModelResponse`, `onSessionStart`, `onSessionEnd`, `onFileChange`, `beforeToolExec`, `afterToolExec`.
+
+### MCP
+
+Connect external tool servers via [Model Context Protocol](https://modelcontextprotocol.io). Configure in `.nex/mcp.json` with env var interpolation.
+
+### Hooks
+
+Run custom scripts on CLI events (`pre-tool`, `post-tool`, `pre-commit`, `post-response`, `session-start`, `session-end`). Configure in `.nex/config.json` or `.nex/hooks/`.
+
+---
+
+=======
 
 ### Visual Development Tools
 
-Pixel-level before/after
+Pixel-level before/after comparison, responsive sweeps (320-1920px), annotation overlays, design token extraction, and live-reload diff watching. Pure image tools work standalone; browser-based tools need Playwright.
 
 ---
 
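The `Auto-fix engine` bullet in the hunk above mentions fuzzy edit matching based on Levenshtein distance. The sketch below assumes that an edit whose target text does not match exactly gets redirected to the closest line within a small edit budget. `findClosestLine` and the `maxDistance = 3` cutoff are hypothetical, and the real matcher may compare whole blocks rather than single lines.

```javascript
// Standard dynamic-programming Levenshtein distance (insert, delete and
// substitute each cost 1), keeping only two rows of the table in memory.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: index of the line closest to `target`, or -1 when even
// the best candidate needs more than `maxDistance` single-character edits.
function findClosestLine(lines, target, maxDistance = 3) {
  let best = { index: -1, distance: Infinity };
  lines.forEach((line, index) => {
    const distance = levenshtein(line.trim(), target.trim());
    if (distance < best.distance) best = { index, distance };
  });
  return best.distance <= maxDistance ? best : { index: -1, distance: best.distance };
}

const file = ["function add(a, b) {", "  return a + b;", "}"];
console.log(levenshtein("kitten", "sitting")); // 3
console.log(JSON.stringify(findClosestLine(file, "return a+b;")));
// {"index":1,"distance":2}
```

Trimming both sides before comparing keeps indentation differences from consuming the edit budget.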
@@ -331,13 +404,14 @@ Run custom scripts on CLI events (`pre-tool`, `post-tool`, `pre-commit`, `post-r
 
 ---
 
+>>>>>>> Stashed changes
 ## VS Code Extension
 
 Built-in sidebar chat panel (`vscode/`) with streaming output, collapsible tool cards, and native theme support. Spawns `nex-code --server` over JSON-lines IPC.
 
 ```bash
 cd vscode && npm install && npm run package
-# Cmd+Shift+P
+# Cmd+Shift+P -> Extensions: Install from VSIX...
 ```
 
 ---
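The VS Code hunk above notes that the extension spawns `nex-code --server` and talks to it over JSON-lines IPC: one JSON document per newline-terminated line. The message schema is not part of this diff, so this sketch covers only the framing. `JsonLinesDecoder`, `encodeMessage`, and the `type` fields are hypothetical names. A stream chunk can end mid-line, so the trailing partial line has to be buffered until its newline arrives.

```javascript
// Hypothetical sketch of JSON-lines framing over a text stream.
class JsonLinesDecoder {
  constructor() {
    this.buffer = "";
  }

  // Feed one chunk of text; returns the messages completed by this chunk.
  push(chunk) {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop(); // incomplete last line, or "" after a trailing newline
    return lines
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line));
  }
}

// Encoding side: one compact JSON object per line.
function encodeMessage(message) {
  return JSON.stringify(message) + "\n";
}

const decoder = new JsonLinesDecoder();
const wire = encodeMessage({ type: "chat", text: "hi" }) + encodeMessage({ type: "cancel" });

// Simulate the stream splitting the data at an arbitrary position.
console.log(decoder.push(wire.slice(0, 10)).length); // 0
console.log(JSON.stringify(decoder.push(wire.slice(10))));
// [{"type":"chat","text":"hi"},{"type":"cancel"}]
```

This framing works because `JSON.stringify` escapes newlines inside strings, so a serialized message never contains a raw line break. When reading a child process's output, call `child.stdout.setEncoding("utf8")` first so that multi-byte characters are not split across chunks.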
@@ -352,7 +426,11 @@ cli/
 tools/index.js # 45 tool definitions + auto-fix engine
 context-engine.js # Token management + 5-phase compression
 sub-agent.js # Parallel sub-agents with file locking
-
+<<<<<<< Updated upstream
+orchestrator.js # Multi-agent decompose -> execute -> synthesize
+=======
+orchestrator.js # Multi-agent decompose -> execute -> synthesize
+>>>>>>> Stashed changes
 session-tree.js # Session branching
 visual.js # Visual dev tools (pixelmatch-based)
 browser.js # Playwright browser agent
@@ -366,10 +444,10 @@ See [DEVELOPMENT.md](DEVELOPMENT.md) for full architecture details.
 ## Testing
 
 ```bash
-npm test
-npm run typecheck
-npm run benchmark:gate
-npm run benchmark:reallife
+npm test # 97 suites, 3920 tests
+npm run typecheck # TypeScript noEmit check
+npm run benchmark:gate # 7-task smoke test (blocks push on regression)
+npm run benchmark:reallife # 35 real-world tasks across 7 categories
 ```
 
 ---
@@ -379,10 +457,12 @@ npm run benchmark:reallife # 35 real-world tasks across 7 categories
 - Pre-push secret detection (API keys, private keys, hardcoded credentials)
 - Audit logging with automatic argument sanitization
 - Sensitive path blocking (`.ssh/`, `.aws/`, `.env`, credentials)
-- Shell injection protection via `execFileSync` with argument arrays
+- Shell injection protection via `execFileSync` with argument arrays
 - SSRF protection on `web_fetch`
 - MCP environment isolation
 
+**Reporting vulnerabilities:** Email **security@schoensgibl.com** (not a public issue). Allow 72h for initial response.
+
 ---
 
 ## License
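The security list above credits shell injection protection to `execFileSync` with argument arrays. Node's `child_process.execFileSync(file, args, options)` runs `file` directly rather than through a shell unless `shell: true` is passed. Shell metacharacters inside `args` therefore reach the child as literal text.

Below is a small self-contained demonstration, not nex-code's own code. It uses the running Node binary (`process.execPath`) as the child, so no other program is needed. The `hostile` value is illustrative.

```javascript
const { execFileSync } = require("child_process");

// A value that would be dangerous if interpolated into a shell command string.
const hostile = "notes.txt; echo INJECTED";

// Run Node itself as the child and have it write back its last argument.
// No shell is involved, so "; echo INJECTED" is never interpreted as a command.
const out = execFileSync(
  process.execPath,
  ["-e", "process.stdout.write(process.argv[process.argv.length - 1])", hostile],
  { encoding: "utf8" }
);

console.log(out === hostile); // true
```

By contrast, interpolating the same value into a command string for `execSync` would hand `; echo INJECTED` to the system shell as a second command.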