nex-code 0.5.4 → 0.5.6
- package/README.md +30 -82
- package/dist/benchmark.js +375 -331
- package/dist/nex-code.js +578 -526
- package/package.json +3 -3
package/README.md
@@ -60,7 +60,6 @@ On first launch, an interactive setup wizard guides you through provider and cre
 | Startup time | ~100ms | 1-4s |
 | Runtime deps | 2 | heavy |
 | Infra tools | SSH, Docker, K8s built-in | no |
-<<<<<<< Updated upstream
 
 **Smart model routing.** The built-in `/benchmark` tests all configured models across 62 tool-calling tasks in 5 categories and auto-routes to the best model per task type.
 
@@ -68,15 +67,6 @@ On first launch, an interactive setup wizard guides you through provider and cre
 
 **45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review. See [Tools](#tools) for the full list.
 
-=======
-
-**Smart model routing.** The built-in `/benchmark` tests all configured models across 62 tool-calling tasks in 5 categories and auto-routes to the best model per task type.
-
-**Phase-based execution.** Tasks run through Plan (analyze) -> Implement (code) -> Verify (test) phases, each with the optimal model. Auto-loops back on test failures.
-
-**45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review. See [Tools](#tools) for the full list.
-
->>>>>>> Stashed changes
 **2 runtime dependencies** (`axios`, `dotenv`). Starts in ~100ms. No Python, no heavy runtime.
 
 ---
@@ -86,39 +76,37 @@ On first launch, an interactive setup wizard guides you through provider and cre
 Rankings from nex-code's own `/benchmark` — 62 tasks testing tool selection, argument validity, and schema compliance.
 
 <!-- nex-benchmark-start -->
-<!-- Updated: 2026-04-
+<!-- Updated: 2026-04-09 — run `/benchmark --discover` after new Ollama Cloud releases -->
 
 | Rank | Model | Score | Avg Latency | Context | Best For |
 |---|---|---|---|---|---|
-| 🥇 | `qwen3-vl:235b` | **
-| 🥈 | `
-| 🥉 | `
-| — | `
-| — | `
-| — | `
-| — | `
-| — | `
-| — | `
-| — | `
-| — | `devstral-2:
-| — | `
-| — | `
-| — | `
-| — | `
-| — | `
-| — | `
-| — | `
-| — | `glm-
-| — | `
-| — | `
-| — | `
-| — | `
-| — | `
-| — | `
-| — | `
-| — | `
-| — | `minimax-m2.5` | 61.9 | 2.7s | 131K | Multi-agent, large context |
-| — | `minimax-m2` | 60.6 | 4.3s | 200K | — |
+| 🥇 | `qwen3-vl:235b` | **80.1** | 12.9s | 131K | Overall #1 — frontier tool selection, data + agentic tasks |
+| 🥈 | `rnj-1:8b` | 78.6 | 2.7s | 131K | — |
+| 🥉 | `qwen3-vl:235b-instruct` | 78.4 | 7.3s | 131K | Best latency/score balance — recommended default |
+| — | `nemotron-3-super` | 76.2 | 2.8s | 256K | — |
+| — | `deepseek-v3.1:671b` | 74.8 | 5.6s | 131K | — |
+| — | `qwen3-coder-next` | 74.5 | 2.9s | 256K | — |
+| — | `ministral-3:3b` | 73.6 | 2.4s | 32K | — |
+| — | `ministral-3:8b` | 72.6 | 1.9s | 131K | Fastest strong model — 2.2s latency, 70+ score |
+| — | `qwen3-next:80b` | 72.2 | 11.5s | 131K | — |
+| — | `mistral-large-3:675b` | 70.9 | 5.7s | 131K | — |
+| — | `devstral-small-2:24b` | 70.9 | 2.8s | 131K | Fast sub-agents, simple lookups |
+| — | `devstral-2:123b` | 70.9 | 4.0s | 131K | Sysadmin + SSH tasks, reliable coding |
+| — | `minimax-m2.1` | 70.7 | 4.3s | 200K | — |
+| — | `gpt-oss:20b` | 70.2 | 3.9s | 131K | Fast small model, good overall score |
+| — | `kimi-k2:1t` | 69.9 | 5.0s | 256K | Large repos (>100K tokens) |
+| — | `kimi-k2.5` | 69 | 5.8s | 256K | Large repos — faster than k2:1t |
+| — | `kimi-k2-thinking` | 69 | 4.0s | 256K | — |
+| — | `glm-5` | 69 | 7.2s | 131K | — |
+| — | `glm-5.1` | 68.8 | 9.7s | ? | — |
+| — | `gemma4:31b` | 68.7 | 3.3s | ? | — |
+| — | `minimax-m2.7` | 68.6 | 5.1s | 200K | — |
+| — | `nemotron-3-nano:30b` | 67.8 | 2.9s | 131K | — |
+| — | `ministral-3:14b` | 67.7 | 2.3s | 131K | — |
+| — | `qwen3-coder:480b` | 67.2 | 7.7s | 131K | Heavy coding sessions, large context |
+| — | `qwen3.5:397b` | 67.1 | 7.2s | 256K | — |
+| — | `glm-4.6` | 65.2 | 7.5s | 131K | — |
+| — | `gpt-oss:120b` | 64.6 | 3.7s | 131K | — |
 
 > Rankings are nex-code-specific: tool name accuracy, argument validity, schema compliance.
 > Toolathon (Minimax SOTA) measures different task types — run `/benchmark --discover` after model releases.
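The "Best For" notes in the added table trade score against latency. As a rough illustration of that trade-off, the sketch below picks the fastest model at or above a score threshold, using a few rows copied from the table. It is not nex-code's routing logic, which this diff does not show; `pickFastest` and the row shape are hypothetical.

```javascript
// A few rows copied from the updated benchmark table
// (score, average latency in seconds).
const rows = [
  { model: "qwen3-vl:235b", score: 80.1, latency: 12.9 },
  { model: "rnj-1:8b", score: 78.6, latency: 2.7 },
  { model: "qwen3-vl:235b-instruct", score: 78.4, latency: 7.3 },
  { model: "nemotron-3-super", score: 76.2, latency: 2.8 },
  { model: "ministral-3:8b", score: 72.6, latency: 1.9 },
];

// Hypothetical helper: return the lowest-latency row whose score is at
// least minScore, or null when no row qualifies.
function pickFastest(table, minScore) {
  const eligible = table.filter((r) => r.score >= minScore);
  if (eligible.length === 0) return null;
  return eligible.reduce((best, r) => (r.latency < best.latency ? r : best));
}

console.log(pickFastest(rows, 70).model); // ministral-3:8b
console.log(pickFastest(rows, 75).model); // rnj-1:8b
```

Raising the threshold from 70 to 75 moves the choice from `ministral-3:8b` to `rnj-1:8b`, which is the kind of shift the table's latency column makes visible.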
@@ -199,15 +187,11 @@ nex-code --daemon # watch mode: fires tasks on file changes, git commit
 | `--max-turns <n>` | Override agentic loop limit |
 | `--model <spec>` | Use specific model (e.g. `anthropic:claude-sonnet-4-6`) |
 | `--debug` | Show diagnostic messages |
-
-
-### Vision / Screenshot
-
-=======
+| `--gemini` | Local Gemini test mode (`gemini-3.1-pro-preview` by default, requires `GEMINI_API_KEY`) |
+| `--gemini-model <id>` | Pin a specific Gemini model (implies `--gemini`) |
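The two added rows state that `--gemini-model <id>` implies `--gemini`, that the default model is `gemini-3.1-pro-preview`, and that `GEMINI_API_KEY` is required. A minimal sketch of that resolution logic follows. `resolveGemini` is a hypothetical function written for illustration and is not taken from nex-code's source; the model id in the usage line is a placeholder.

```javascript
// Default model named in the added README row.
const DEFAULT_GEMINI_MODEL = "gemini-3.1-pro-preview";

// Hypothetical resolver: argv is an array of CLI arguments,
// env is a plain object of environment variables.
function resolveGemini(argv, env) {
  const pinIndex = argv.indexOf("--gemini-model");
  const pinned = pinIndex !== -1 ? argv[pinIndex + 1] : undefined;
  if (pinIndex !== -1 && (pinned === undefined || pinned.startsWith("--"))) {
    throw new Error("--gemini-model requires a model id");
  }
  // Per the README row, --gemini-model implies --gemini.
  const enabled = argv.includes("--gemini") || pinIndex !== -1;
  if (!enabled) return { enabled: false, model: null };
  if (!env.GEMINI_API_KEY) {
    throw new Error("GEMINI_API_KEY is required for Gemini test mode");
  }
  return { enabled: true, model: pinned || DEFAULT_GEMINI_MODEL };
}

// Placeholder model id, for illustration only.
console.log(resolveGemini(["--gemini-model", "some-model-id"], { GEMINI_API_KEY: "x" }));
```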
 
 ### Vision / Screenshot
 
->>>>>>> Stashed changes
 ```
 > /path/to/screenshot.png implement this UI in React
 > analyze https://example.com/mockup.png and implement it
@@ -321,13 +305,9 @@ Autonomous optimization loops: edit -> experiment -> keep/revert, on a dedicated
 Auto-activates for implementation tasks. Read-only analysis first, approve before writes. Hard-enforced tool restrictions.
 
 ### Daemon / Watch Mode
-<<<<<<< Updated upstream
-Background process that fires tasks on file changes, git commits, or cron schedule. Configured via `.nex/daemon.json`. Desktop and Matrix notifications.
-=======
 
 Background process that fires tasks on file changes, git commits, or cron schedule. Configured via `.nex/daemon.json`. Desktop and Matrix notifications.
 
->>>>>>> Stashed changes
 ### Session Trees
 
 Navigate conversation history like git branches — fork, switch, goto, delete branches.
@@ -350,33 +330,6 @@ Pre-push secret detection, audit logging (JSONL), persistent undo/redo, cost lim
 - **Auto-fix engine** — path resolution, edit fuzzy matching (Levenshtein), bash error hints
 - **Tool tiers** — essential (5) / standard (21) / full (45), auto-selected per model capability
 - **Stale stream recovery** — progressive retry with context compression on stall
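The auto-fix bullet above mentions fuzzy edit matching based on Levenshtein distance. The sketch below shows the classic dynamic-programming distance and one way it could be used to recover the intended target of an edit. It is an illustration under assumptions, not nex-code's implementation (the diff does not show it); `closestMatch` and its `maxDistance` cutoff are hypothetical.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: return the candidate closest to target, or null
// when even the best candidate is further away than maxDistance.
function closestMatch(target, candidates, maxDistance) {
  let best = null;
  let bestDistance = Infinity;
  for (const candidate of candidates) {
    const d = levenshtein(target, candidate);
    if (d < bestDistance) {
      best = candidate;
      bestDistance = d;
    }
  }
  return bestDistance <= maxDistance ? best : null;
}

console.log(levenshtein("kitten", "sitting")); // 3
console.log(closestMatch("const x = 1", ["const x = 1;", "let y = 2;"], 2)); // const x = 1;
```

A cutoff such as `maxDistance` keeps a fuzzy matcher from silently rewriting an unrelated line when nothing in the file is actually close to the requested text.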
-<<<<<<< Updated upstream
-### Visual Development Tools
-Pixel-level before/after comparison, responsive sweeps (320-1920px), annotation overlays, design token extraction, and live-reload diff watching. Pure image tools work standalone; browser-based tools need Playwright.
-
----
-
-## Extensibility
-
-### Skills
-
-Drop `.md` or `.js` files in `.nex/skills/` for project-specific knowledge, commands, and tools. Global skills in `~/.nex-code/skills/`. Install from git: `/install-skill user/repo`.
-
-### Plugins
-
-Custom tools and lifecycle hooks via `.nex/plugins/`. Events: `onToolResult`, `onModelResponse`, `onSessionStart`, `onSessionEnd`, `onFileChange`, `beforeToolExec`, `afterToolExec`.
-
-### MCP
-
-Connect external tool servers via [Model Context Protocol](https://modelcontextprotocol.io). Configure in `.nex/mcp.json` with env var interpolation.
-
-### Hooks
-
-Run custom scripts on CLI events (`pre-tool`, `post-tool`, `pre-commit`, `post-response`, `session-start`, `session-end`). Configure in `.nex/config.json` or `.nex/hooks/`.
-
----
-
-=======
 
 ### Visual Development Tools
 
@@ -404,7 +357,6 @@ Run custom scripts on CLI events (`pre-tool`, `post-tool`, `pre-commit`, `post-r
 
 ---
 
->>>>>>> Stashed changes
 ## VS Code Extension
 
 Built-in sidebar chat panel (`vscode/`) with streaming output, collapsible tool cards, and native theme support. Spawns `nex-code --server` over JSON-lines IPC.
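The README line above says the extension talks to `nex-code --server` over JSON-lines IPC, meaning one JSON value per newline-terminated line. Because a stream can deliver a line in several chunks, a reader has to buffer until it sees the newline. The sketch below shows that buffering step only; `createJsonLinesDecoder` is hypothetical, and the message fields in the usage lines are invented for illustration, since the diff does not show the actual protocol.

```javascript
// Hypothetical decoder: feed it raw text chunks and it calls onMessage
// once per complete, non-empty line.
function createJsonLinesDecoder(onMessage) {
  let buffer = "";
  return function push(chunk) {
    buffer += chunk;
    let newline;
    while ((newline = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line !== "") onMessage(JSON.parse(line));
    }
  };
}

// Usage: the first message arrives split across two chunks.
const received = [];
const push = createJsonLinesDecoder((msg) => received.push(msg));
push('{"type":"token","text":"hel');
push('lo"}\n{"type":"done"}\n');
console.log(received.length); // 2
```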
@@ -426,11 +378,7 @@ cli/
 tools/index.js # 45 tool definitions + auto-fix engine
 context-engine.js # Token management + 5-phase compression
 sub-agent.js # Parallel sub-agents with file locking
-<<<<<<< Updated upstream
-orchestrator.js # Multi-agent decompose -> execute -> synthesize
-=======
 orchestrator.js # Multi-agent decompose -> execute -> synthesize
->>>>>>> Stashed changes
 session-tree.js # Session branching
 visual.js # Visual dev tools (pixelmatch-based)
 browser.js # Playwright browser agent