nex-code 0.5.5 → 0.5.7
- package/README.md +46 -83
- package/dist/background-worker.js +1408 -0
- package/dist/benchmark.js +431 -381
- package/dist/nex-code.js +591 -536
- package/package.json +3 -3
package/README.md
CHANGED
|
@@ -22,7 +22,7 @@
|
|
|
22
22
|
<img src="https://img.shields.io/badge/Ollama_Cloud-supported-brightgreen.svg" alt="Ollama Cloud: supported">
|
|
23
23
|
<img src="https://img.shields.io/badge/node-%3E%3D18-brightgreen.svg" alt="Node >= 18">
|
|
24
24
|
<img src="https://img.shields.io/badge/dependencies-2-green.svg" alt="Dependencies: 2">
|
|
25
|
-
<img src="https://img.shields.io/badge/tests-
|
|
25
|
+
<img src="https://img.shields.io/badge/tests-3929-blue.svg" alt="Tests: 3920">
|
|
26
26
|
<img src="https://img.shields.io/badge/VS_Code-extension-007ACC.svg" alt="VS Code extension">
|
|
27
27
|
</p>
|
|
28
28
|
|
|
@@ -60,7 +60,6 @@ On first launch, an interactive setup wizard guides you through provider and cre
|
|
|
60
60
|
| Startup time | ~100ms | 1-4s |
|
|
61
61
|
| Runtime deps | 2 | heavy |
|
|
62
62
|
| Infra tools | SSH, Docker, K8s built-in | no |
|
|
63
|
-
<<<<<<< Updated upstream
|
|
64
63
|
|
|
65
64
|
**Smart model routing.** The built-in `/benchmark` tests all configured models across 62 tool-calling tasks in 5 categories and auto-routes to the best model per task type.
|
|
66
65
|
|
|
@@ -68,15 +67,6 @@ On first launch, an interactive setup wizard guides you through provider and cre
|
|
|
68
67
|
|
|
69
68
|
**45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review. See [Tools](#tools) for the full list.
|
|
70
69
|
|
|
71
|
-
=======
|
|
72
|
-
|
|
73
|
-
**Smart model routing.** The built-in `/benchmark` tests all configured models across 62 tool-calling tasks in 5 categories and auto-routes to the best model per task type.
|
|
74
|
-
|
|
75
|
-
**Phase-based execution.** Tasks run through Plan (analyze) -> Implement (code) -> Verify (test) phases, each with the optimal model. Auto-loops back on test failures.
|
|
76
|
-
|
|
77
|
-
**45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review. See [Tools](#tools) for the full list.
|
|
78
|
-
|
|
79
|
-
>>>>>>> Stashed changes
|
|
80
70
|
**2 runtime dependencies** (`axios`, `dotenv`). Starts in ~100ms. No Python, no heavy runtime.
|
|
81
71
|
|
|
82
72
|
---
|
|
@@ -86,39 +76,37 @@ On first launch, an interactive setup wizard guides you through provider and cre
|
|
|
86
76
|
Rankings from nex-code's own `/benchmark` — 62 tasks testing tool selection, argument validity, and schema compliance.
|
|
87
77
|
|
|
88
78
|
<!-- nex-benchmark-start -->
|
|
89
|
-
<!-- Updated: 2026-04-
|
|
79
|
+
<!-- Updated: 2026-04-09 — run `/benchmark --discover` after new Ollama Cloud releases -->
|
|
90
80
|
|
|
91
81
|
| Rank | Model | Score | Avg Latency | Context | Best For |
|
|
92
82
|
|---|---|---|---|---|---|
|
|
93
|
-
| 🥇 | `qwen3-vl:235b` | **
|
|
94
|
-
| 🥈 | `
|
|
95
|
-
| 🥉 | `
|
|
96
|
-
| — | `
|
|
97
|
-
| — | `
|
|
98
|
-
| — | `
|
|
99
|
-
| — | `
|
|
100
|
-
| — | `
|
|
101
|
-
| — | `
|
|
102
|
-
| — | `
|
|
103
|
-
| — | `devstral-2:
|
|
104
|
-
| — | `
|
|
105
|
-
| — | `
|
|
106
|
-
| — | `
|
|
107
|
-
| — | `
|
|
108
|
-
| — | `
|
|
109
|
-
| — | `
|
|
110
|
-
| — | `
|
|
111
|
-
| — | `glm-
|
|
112
|
-
| — | `
|
|
113
|
-
| — | `
|
|
114
|
-
| — | `
|
|
115
|
-
| — | `
|
|
116
|
-
| — | `
|
|
117
|
-
| — | `
|
|
118
|
-
| — | `
|
|
119
|
-
| — | `
|
|
120
|
-
| — | `minimax-m2.5` | 61.9 | 2.7s | 131K | Multi-agent, large context |
|
|
121
|
-
| — | `minimax-m2` | 60.6 | 4.3s | 200K | — |
|
|
83
|
+
| 🥇 | `qwen3-vl:235b` | **80.1** | 12.9s | 131K | Overall #1 — frontier tool selection, data + agentic tasks |
|
|
84
|
+
| 🥈 | `rnj-1:8b` | 78.6 | 2.7s | 131K | — |
|
|
85
|
+
| 🥉 | `qwen3-vl:235b-instruct` | 78.4 | 7.3s | 131K | Best latency/score balance — recommended default |
|
|
86
|
+
| — | `nemotron-3-super` | 76.2 | 2.8s | 256K | — |
|
|
87
|
+
| — | `deepseek-v3.1:671b` | 74.8 | 5.6s | 131K | — |
|
|
88
|
+
| — | `qwen3-coder-next` | 74.5 | 2.9s | 256K | — |
|
|
89
|
+
| — | `ministral-3:3b` | 73.6 | 2.4s | 32K | — |
|
|
90
|
+
| — | `ministral-3:8b` | 72.6 | 1.9s | 131K | Fastest strong model — 2.2s latency, 70+ score |
|
|
91
|
+
| — | `qwen3-next:80b` | 72.2 | 11.5s | 131K | — |
|
|
92
|
+
| — | `mistral-large-3:675b` | 70.9 | 5.7s | 131K | — |
|
|
93
|
+
| — | `devstral-small-2:24b` | 70.9 | 2.8s | 131K | Fast sub-agents, simple lookups |
|
|
94
|
+
| — | `devstral-2:123b` | 70.9 | 4.0s | 131K | Sysadmin + SSH tasks, reliable coding |
|
|
95
|
+
| — | `minimax-m2.1` | 70.7 | 4.3s | 200K | — |
|
|
96
|
+
| — | `gpt-oss:20b` | 70.2 | 3.9s | 131K | Fast small model, good overall score |
|
|
97
|
+
| — | `kimi-k2:1t` | 69.9 | 5.0s | 256K | Large repos (>100K tokens) |
|
|
98
|
+
| — | `kimi-k2.5` | 69 | 5.8s | 256K | Large repos — faster than k2:1t |
|
|
99
|
+
| — | `kimi-k2-thinking` | 69 | 4.0s | 256K | — |
|
|
100
|
+
| — | `glm-5` | 69 | 7.2s | 131K | — |
|
|
101
|
+
| — | `glm-5.1` | 68.8 | 9.7s | ? | — |
|
|
102
|
+
| — | `gemma4:31b` | 68.7 | 3.3s | ? | — |
|
|
103
|
+
| — | `minimax-m2.7` | 68.6 | 5.1s | 200K | — |
|
|
104
|
+
| — | `nemotron-3-nano:30b` | 67.8 | 2.9s | 131K | — |
|
|
105
|
+
| — | `ministral-3:14b` | 67.7 | 2.3s | 131K | — |
|
|
106
|
+
| — | `qwen3-coder:480b` | 67.2 | 7.7s | 131K | Heavy coding sessions, large context |
|
|
107
|
+
| — | `qwen3.5:397b` | 67.1 | 7.2s | 256K | — |
|
|
108
|
+
| — | `glm-4.6` | 65.2 | 7.5s | 131K | — |
|
|
109
|
+
| — | `gpt-oss:120b` | 64.6 | 3.7s | 131K | — |
|
|
122
110
|
|
|
123
111
|
> Rankings are nex-code-specific: tool name accuracy, argument validity, schema compliance.
|
|
124
112
|
> Toolathon (Minimax SOTA) measures different task types — run `/benchmark --discover` after model releases.
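One way to read the updated table is as input to a latency-bounded choice. The sketch below is illustrative only: `pickModel` and the `rankings` array are hypothetical names, not part of nex-code, and the README does not describe how `/benchmark` routing is actually implemented. The rows are copied from the 0.5.7 side of the table.

```javascript
// Hypothetical helper, not nex-code's routing logic.
// A few rows copied from the 0.5.7 benchmark table (score, average latency in seconds).
const rankings = [
  { model: "qwen3-vl:235b", score: 80.1, latencySec: 12.9 },
  { model: "rnj-1:8b", score: 78.6, latencySec: 2.7 },
  { model: "qwen3-vl:235b-instruct", score: 78.4, latencySec: 7.3 },
  { model: "nemotron-3-super", score: 76.2, latencySec: 2.8 },
  { model: "ministral-3:8b", score: 72.6, latencySec: 1.9 },
];

// Return the highest-scoring model whose average latency fits the budget,
// or null when no row qualifies.
function pickModel(rows, maxLatencySec) {
  const eligible = rows.filter((r) => r.latencySec <= maxLatencySec);
  if (eligible.length === 0) return null;
  return eligible.reduce((best, r) => (r.score > best.score ? r : best)).model;
}
```

With these rows, `pickModel(rankings, 3)` selects `rnj-1:8b`, while a looser budget such as `pickModel(rankings, 20)` selects `qwen3-vl:235b`.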
|
|
@@ -199,15 +187,11 @@ nex-code --daemon # watch mode: fires tasks on file changes, git commit
|
|
|
199
187
|
| `--max-turns <n>` | Override agentic loop limit |
|
|
200
188
|
| `--model <spec>` | Use specific model (e.g. `anthropic:claude-sonnet-4-6`) |
|
|
201
189
|
| `--debug` | Show diagnostic messages |
|
|
202
|
-
|
|
190
|
+
| `--gemini` | Local Gemini test mode (`gemini-3.1-pro-preview` by default, requires `GEMINI_API_KEY`) |
|
|
191
|
+
| `--gemini-model <id>` | Pin a specific Gemini model (implies `--gemini`) |
|
|
203
192
|
|
|
204
193
|
### Vision / Screenshot
|
|
205
194
|
|
|
206
|
-
=======
|
|
207
|
-
|
|
208
|
-
### Vision / Screenshot
|
|
209
|
-
|
|
210
|
-
>>>>>>> Stashed changes
|
|
211
195
|
```
|
|
212
196
|
> /path/to/screenshot.png implement this UI in React
|
|
213
197
|
> analyze https://example.com/mockup.png and implement it
|
|
@@ -307,6 +291,21 @@ Multi-goal prompts auto-decompose into parallel sub-agents. Up to 5 agents run s
|
|
|
307
291
|
nex-code --task "fix type errors in src/, add JSDoc to utils/, update CHANGELOG"
|
|
308
292
|
```
|
|
309
293
|
|
|
294
|
+
### Background Agents
|
|
295
|
+
|
|
296
|
+
Sub-agents can run non-blocking in isolated forked processes. The main agent continues working while background workers complete, then results are automatically injected into the conversation.
|
|
297
|
+
|
|
298
|
+
```
|
|
299
|
+
# The model decides when to use background:true — no extra syntax needed.
|
|
300
|
+
# Example: the model might run the linter in background while explaining code.
|
|
301
|
+
spawn_agents([
|
|
302
|
+
{ task: "run the linter and report errors", background: true },
|
|
303
|
+
{ task: "explain the auth module" } ← main agent answers this immediately
|
|
304
|
+
])
|
|
305
|
+
```
|
|
306
|
+
|
|
307
|
+
Background agents are shown in the spinner: `● Thinking [1 bg agent running]`. Results appear as `✓ Background agent done: …` when workers finish.
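The non-blocking pattern described above can be sketched with Node's `child_process` module. This is a minimal illustration under stated assumptions, not the contents of `background-worker.js`: `runInBackground` and `workerSource` are hypothetical names, and the sketch uses `spawn` with an IPC channel in place of `fork` so that it stays in a single file.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical worker body, run in a separate Node process via `node -e`.
// It waits for one task over IPC, replies, then closes the channel so it can exit.
const workerSource = `
  process.once("message", (task) => {
    process.send({ task, result: "done: " + task }, () => process.disconnect());
  });
`;

// Start a background worker for one task and resolve with its reply.
// The caller keeps running; the promise settles when the worker reports back.
function runInBackground(task) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ["-e", workerSource], {
      // The fourth entry opens the IPC channel used by send() and "message".
      stdio: ["ignore", "ignore", "inherit", "ipc"],
    });
    child.once("message", resolve);
    child.once("error", reject);
    child.send(task);
  });
}
```

A caller could start `runInBackground("run the linter")`, continue with other work, and handle the result in `.then(...)` once the worker replies, which mirrors the "results are injected when workers finish" behaviour the README describes.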
|
|
308
|
+
|
|
310
309
|
### Autoresearch
|
|
311
310
|
|
|
312
311
|
Autonomous optimization loops: edit -> experiment -> keep/revert, on a dedicated branch.
|
|
@@ -321,13 +320,9 @@ Autonomous optimization loops: edit -> experiment -> keep/revert, on a dedicated
|
|
|
321
320
|
Auto-activates for implementation tasks. Read-only analysis first, approve before writes. Hard-enforced tool restrictions.
|
|
322
321
|
|
|
323
322
|
### Daemon / Watch Mode
|
|
324
|
-
<<<<<<< Updated upstream
|
|
325
|
-
Background process that fires tasks on file changes, git commits, or cron schedule. Configured via `.nex/daemon.json`. Desktop and Matrix notifications.
|
|
326
|
-
=======
|
|
327
323
|
|
|
328
324
|
Background process that fires tasks on file changes, git commits, or cron schedule. Configured via `.nex/daemon.json`. Desktop and Matrix notifications.
|
|
329
325
|
|
|
330
|
-
>>>>>>> Stashed changes
|
|
331
326
|
### Session Trees
|
|
332
327
|
|
|
333
328
|
Navigate conversation history like git branches — fork, switch, goto, delete branches.
|
|
@@ -350,33 +345,6 @@ Pre-push secret detection, audit logging (JSONL), persistent undo/redo, cost lim
|
|
|
350
345
|
- **Auto-fix engine** — path resolution, edit fuzzy matching (Levenshtein), bash error hints
|
|
351
346
|
- **Tool tiers** — essential (5) / standard (21) / full (45), auto-selected per model capability
|
|
352
347
|
- **Stale stream recovery** — progressive retry with context compression on stall
|
|
353
|
-
<<<<<<< Updated upstream
|
|
354
|
-
### Visual Development Tools
|
|
355
|
-
Pixel-level before/after comparison, responsive sweeps (320-1920px), annotation overlays, design token extraction, and live-reload diff watching. Pure image tools work standalone; browser-based tools need Playwright.
|
|
356
|
-
|
|
357
|
-
---
|
|
358
|
-
|
|
359
|
-
## Extensibility
|
|
360
|
-
|
|
361
|
-
### Skills
|
|
362
|
-
|
|
363
|
-
Drop `.md` or `.js` files in `.nex/skills/` for project-specific knowledge, commands, and tools. Global skills in `~/.nex-code/skills/`. Install from git: `/install-skill user/repo`.
|
|
364
|
-
|
|
365
|
-
### Plugins
|
|
366
|
-
|
|
367
|
-
Custom tools and lifecycle hooks via `.nex/plugins/`. Events: `onToolResult`, `onModelResponse`, `onSessionStart`, `onSessionEnd`, `onFileChange`, `beforeToolExec`, `afterToolExec`.
|
|
368
|
-
|
|
369
|
-
### MCP
|
|
370
|
-
|
|
371
|
-
Connect external tool servers via [Model Context Protocol](https://modelcontextprotocol.io). Configure in `.nex/mcp.json` with env var interpolation.
|
|
372
|
-
|
|
373
|
-
### Hooks
|
|
374
|
-
|
|
375
|
-
Run custom scripts on CLI events (`pre-tool`, `post-tool`, `pre-commit`, `post-response`, `session-start`, `session-end`). Configure in `.nex/config.json` or `.nex/hooks/`.
|
|
376
|
-
|
|
377
|
-
---
|
|
378
|
-
|
|
379
|
-
=======
|
|
380
348
|
|
|
381
349
|
### Visual Development Tools
|
|
382
350
|
|
|
@@ -404,7 +372,6 @@ Run custom scripts on CLI events (`pre-tool`, `post-tool`, `pre-commit`, `post-r
|
|
|
404
372
|
|
|
405
373
|
---
|
|
406
374
|
|
|
407
|
-
>>>>>>> Stashed changes
|
|
408
375
|
## VS Code Extension
|
|
409
376
|
|
|
410
377
|
Built-in sidebar chat panel (`vscode/`) with streaming output, collapsible tool cards, and native theme support. Spawns `nex-code --server` over JSON-lines IPC.
|
|
@@ -426,11 +393,7 @@ cli/
|
|
|
426
393
|
tools/index.js # 45 tool definitions + auto-fix engine
|
|
427
394
|
context-engine.js # Token management + 5-phase compression
|
|
428
395
|
sub-agent.js # Parallel sub-agents with file locking
|
|
429
|
-
<<<<<<< Updated upstream
|
|
430
|
-
orchestrator.js # Multi-agent decompose -> execute -> synthesize
|
|
431
|
-
=======
|
|
432
396
|
orchestrator.js # Multi-agent decompose -> execute -> synthesize
|
|
433
|
-
>>>>>>> Stashed changes
|
|
434
397
|
session-tree.js # Session branching
|
|
435
398
|
visual.js # Visual dev tools (pixelmatch-based)
|
|
436
399
|
browser.js # Playwright browser agent
|