nex-code 0.5.5 → 0.5.7

package/README.md CHANGED
@@ -22,7 +22,7 @@
  <img src="https://img.shields.io/badge/Ollama_Cloud-supported-brightgreen.svg" alt="Ollama Cloud: supported">
  <img src="https://img.shields.io/badge/node-%3E%3D18-brightgreen.svg" alt="Node >= 18">
  <img src="https://img.shields.io/badge/dependencies-2-green.svg" alt="Dependencies: 2">
- <img src="https://img.shields.io/badge/tests-3920-blue.svg" alt="Tests: 3920">
+ <img src="https://img.shields.io/badge/tests-3929-blue.svg" alt="Tests: 3920">
  <img src="https://img.shields.io/badge/VS_Code-extension-007ACC.svg" alt="VS Code extension">
  </p>
 
@@ -60,7 +60,6 @@ On first launch, an interactive setup wizard guides you through provider and cre
  | Startup time | ~100ms | 1-4s |
  | Runtime deps | 2 | heavy |
  | Infra tools | SSH, Docker, K8s built-in | no |
- <<<<<<< Updated upstream
 
  **Smart model routing.** The built-in `/benchmark` tests all configured models across 62 tool-calling tasks in 5 categories and auto-routes to the best model per task type.
 
@@ -68,15 +67,6 @@ On first launch, an interactive setup wizard guides you through provider and cre
 
  **45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review. See [Tools](#tools) for the full list.
 
- =======
-
- **Smart model routing.** The built-in `/benchmark` tests all configured models across 62 tool-calling tasks in 5 categories and auto-routes to the best model per task type.
-
- **Phase-based execution.** Tasks run through Plan (analyze) -> Implement (code) -> Verify (test) phases, each with the optimal model. Auto-loops back on test failures.
-
- **45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review. See [Tools](#tools) for the full list.
-
- >>>>>>> Stashed changes
  **2 runtime dependencies** (`axios`, `dotenv`). Starts in ~100ms. No Python, no heavy runtime.
 
  ---
@@ -86,39 +76,37 @@ On first launch, an interactive setup wizard guides you through provider and cre
  Rankings from nex-code's own `/benchmark` — 62 tasks testing tool selection, argument validity, and schema compliance.
 
  <!-- nex-benchmark-start -->
- <!-- Updated: 2026-04-05 — run `/benchmark --discover` after new Ollama Cloud releases -->
+ <!-- Updated: 2026-04-09 — run `/benchmark --discover` after new Ollama Cloud releases -->
 
  | Rank | Model | Score | Avg Latency | Context | Best For |
  |---|---|---|---|---|---|
- | 🥇 | `qwen3-vl:235b` | **79** | 12.4s | 131K | Overall #1 — frontier tool selection, data + agentic tasks |
- | 🥈 | `qwen3-vl:235b-instruct` | 78.2 | 5.3s | 131K | Best latency/score balance recommended default |
- | 🥉 | `nemotron-3-super` | 78.1 | 3.5s | 256K | — |
- | — | `rnj-1:8b` | 77.4 | 3.9s | 131K | — |
- | — | `mistral-large-3:675b` | 76.5 | 3.9s | 131K | — |
- | — | `gpt-oss:20b` | 76.5 | 1.9s | 131K | Fast small model, good overall score |
- | — | `qwen3-coder-next` | 75.7 | 2.2s | 256K | — |
- | — | `qwen3-next:80b` | 75.1 | 11.1s | 131K | — |
- | — | `ministral-3:8b` | 73.8 | 2.0s | 131K | Fastest strong model 2.2s latency, 70+ score |
- | — | `deepseek-v3.1:671b` | 73.6 | 2.9s | 131K | — |
- | — | `devstral-2:123b` | 73.2 | 2.0s | 131K | Sysadmin + SSH tasks, reliable coding |
- | — | `kimi-k2:1t` | 72.2 | 5.6s | 256K | Large repos (>100K tokens) |
- | — | `ministral-3:3b` | 72 | 1.6s | 32K | — |
- | — | `devstral-small-2:24b` | 71.7 | 2.6s | 131K | Fast sub-agents, simple lookups |
- | — | `qwen3.5:397b` | 70.7 | 4.2s | 256K | |
- | — | `qwen3-coder:480b` | 70.1 | 6.0s | 131K | Heavy coding sessions, large context |
- | — | `minimax-m2.1` | 69.9 | 3.0s | 200K | — |
- | — | `gemma4:31b` | 69.3 | 2.8s | ? | — |
- | — | `glm-4.7` | 69.1 | 5.3s | 131K | — |
- | — | `kimi-k2-thinking` | 69 | 3.1s | 256K | — |
- | — | `ministral-3:14b` | 68.8 | 2.0s | 131K | — |
- | — | `kimi-k2.5` | 68.7 | 3.4s | 256K | Large repos faster than k2:1t |
- | — | `minimax-m2.7` | 68.4 | 5.5s | 200K | — |
- | — | `glm-4.6` | 67.8 | 4.7s | 131K | |
- | — | `glm-5` | 67.4 | 5.0s | 131K | — |
- | — | `gpt-oss:120b` | 64.8 | 3.4s | 131K | — |
- | — | `nemotron-3-nano:30b` | 64.7 | 2.3s | 131K | — |
- | — | `minimax-m2.5` | 61.9 | 2.7s | 131K | Multi-agent, large context |
- | — | `minimax-m2` | 60.6 | 4.3s | 200K | — |
+ | 🥇 | `qwen3-vl:235b` | **80.1** | 12.9s | 131K | Overall #1 — frontier tool selection, data + agentic tasks |
+ | 🥈 | `rnj-1:8b` | 78.6 | 2.7s | 131K | — |
+ | 🥉 | `qwen3-vl:235b-instruct` | 78.4 | 7.3s | 131K | Best latency/score balance recommended default |
+ | — | `nemotron-3-super` | 76.2 | 2.8s | 256K | — |
+ | — | `deepseek-v3.1:671b` | 74.8 | 5.6s | 131K | — |
+ | — | `qwen3-coder-next` | 74.5 | 2.9s | 256K | |
+ | — | `ministral-3:3b` | 73.6 | 2.4s | 32K | — |
+ | — | `ministral-3:8b` | 72.6 | 1.9s | 131K | Fastest strong model 2.2s latency, 70+ score |
+ | — | `qwen3-next:80b` | 72.2 | 11.5s | 131K | — |
+ | — | `mistral-large-3:675b` | 70.9 | 5.7s | 131K | — |
+ | — | `devstral-small-2:24b` | 70.9 | 2.8s | 131K | Fast sub-agents, simple lookups |
+ | — | `devstral-2:123b` | 70.9 | 4.0s | 131K | Sysadmin + SSH tasks, reliable coding |
+ | — | `minimax-m2.1` | 70.7 | 4.3s | 200K | — |
+ | — | `gpt-oss:20b` | 70.2 | 3.9s | 131K | Fast small model, good overall score |
+ | — | `kimi-k2:1t` | 69.9 | 5.0s | 256K | Large repos (>100K tokens) |
+ | — | `kimi-k2.5` | 69 | 5.8s | 256K | Large repos faster than k2:1t |
+ | — | `kimi-k2-thinking` | 69 | 4.0s | 256K | — |
+ | — | `glm-5` | 69 | 7.2s | 131K | — |
+ | — | `glm-5.1` | 68.8 | 9.7s | ? | — |
+ | — | `gemma4:31b` | 68.7 | 3.3s | ? | — |
+ | — | `minimax-m2.7` | 68.6 | 5.1s | 200K | — |
+ | — | `nemotron-3-nano:30b` | 67.8 | 2.9s | 131K | — |
+ | — | `ministral-3:14b` | 67.7 | 2.3s | 131K | — |
+ | — | `qwen3-coder:480b` | 67.2 | 7.7s | 131K | Heavy coding sessions, large context |
+ | — | `qwen3.5:397b` | 67.1 | 7.2s | 256K | — |
+ | — | `glm-4.6` | 65.2 | 7.5s | 131K | — |
+ | — | `gpt-oss:120b` | 64.6 | 3.7s | 131K | — |
 
  > Rankings are nex-code-specific: tool name accuracy, argument validity, schema compliance.
  > Toolathon (Minimax SOTA) measures different task types — run `/benchmark --discover` after model releases.
@@ -199,15 +187,11 @@ nex-code --daemon # watch mode: fires tasks on file changes, git commit
  | `--max-turns <n>` | Override agentic loop limit |
  | `--model <spec>` | Use specific model (e.g. `anthropic:claude-sonnet-4-6`) |
  | `--debug` | Show diagnostic messages |
- <<<<<<< Updated upstream
+ | `--gemini` | Local Gemini test mode (`gemini-3.1-pro-preview` by default, requires `GEMINI_API_KEY`) |
+ | `--gemini-model <id>` | Pin a specific Gemini model (implies `--gemini`) |
 
  ### Vision / Screenshot
 
- =======
-
- ### Vision / Screenshot
-
- >>>>>>> Stashed changes
  ```
  > /path/to/screenshot.png implement this UI in React
  > analyze https://example.com/mockup.png and implement it
@@ -307,6 +291,21 @@ Multi-goal prompts auto-decompose into parallel sub-agents. Up to 5 agents run s
  nex-code --task "fix type errors in src/, add JSDoc to utils/, update CHANGELOG"
  ```
 
+ ### Background Agents
+
+ Sub-agents can run non-blocking in isolated forked processes. The main agent continues working while background workers complete, then results are automatically injected into the conversation.
+
+ ```
+ # The model decides when to use background:true — no extra syntax needed.
+ # Example: the model might run the linter in background while explaining code.
+ spawn_agents([
+ { task: "run the linter and report errors", background: true },
+ { task: "explain the auth module" } ← main agent answers this immediately
+ ])
+ ```
+
+ Background agents are shown in the spinner: `● Thinking [1 bg agent running]`. Results appear as `✓ Background agent done: …` when workers finish.
+
  ### Autoresearch
 
  Autonomous optimization loops: edit -> experiment -> keep/revert, on a dedicated branch.
@@ -321,13 +320,9 @@ Autonomous optimization loops: edit -> experiment -> keep/revert, on a dedicated
  Auto-activates for implementation tasks. Read-only analysis first, approve before writes. Hard-enforced tool restrictions.
 
  ### Daemon / Watch Mode
- <<<<<<< Updated upstream
- Background process that fires tasks on file changes, git commits, or cron schedule. Configured via `.nex/daemon.json`. Desktop and Matrix notifications.
- =======
 
  Background process that fires tasks on file changes, git commits, or cron schedule. Configured via `.nex/daemon.json`. Desktop and Matrix notifications.
 
- >>>>>>> Stashed changes
  ### Session Trees
 
  Navigate conversation history like git branches — fork, switch, goto, delete branches.
@@ -350,33 +345,6 @@ Pre-push secret detection, audit logging (JSONL), persistent undo/redo, cost lim
  - **Auto-fix engine** — path resolution, edit fuzzy matching (Levenshtein), bash error hints
  - **Tool tiers** — essential (5) / standard (21) / full (45), auto-selected per model capability
  - **Stale stream recovery** — progressive retry with context compression on stall
- <<<<<<< Updated upstream
- ### Visual Development Tools
- Pixel-level before/after comparison, responsive sweeps (320-1920px), annotation overlays, design token extraction, and live-reload diff watching. Pure image tools work standalone; browser-based tools need Playwright.
-
- ---
-
- ## Extensibility
-
- ### Skills
-
- Drop `.md` or `.js` files in `.nex/skills/` for project-specific knowledge, commands, and tools. Global skills in `~/.nex-code/skills/`. Install from git: `/install-skill user/repo`.
-
- ### Plugins
-
- Custom tools and lifecycle hooks via `.nex/plugins/`. Events: `onToolResult`, `onModelResponse`, `onSessionStart`, `onSessionEnd`, `onFileChange`, `beforeToolExec`, `afterToolExec`.
-
- ### MCP
-
- Connect external tool servers via [Model Context Protocol](https://modelcontextprotocol.io). Configure in `.nex/mcp.json` with env var interpolation.
-
- ### Hooks
-
- Run custom scripts on CLI events (`pre-tool`, `post-tool`, `pre-commit`, `post-response`, `session-start`, `session-end`). Configure in `.nex/config.json` or `.nex/hooks/`.
-
- ---
-
- =======
 
  ### Visual Development Tools
 
@@ -404,7 +372,6 @@ Run custom scripts on CLI events (`pre-tool`, `post-tool`, `pre-commit`, `post-r
 
  ---
 
- >>>>>>> Stashed changes
  ## VS Code Extension
 
  Built-in sidebar chat panel (`vscode/`) with streaming output, collapsible tool cards, and native theme support. Spawns `nex-code --server` over JSON-lines IPC.
@@ -426,11 +393,7 @@ cli/
  tools/index.js # 45 tool definitions + auto-fix engine
  context-engine.js # Token management + 5-phase compression
  sub-agent.js # Parallel sub-agents with file locking
- <<<<<<< Updated upstream
- orchestrator.js # Multi-agent decompose -> execute -> synthesize
- =======
  orchestrator.js # Multi-agent decompose -> execute -> synthesize
- >>>>>>> Stashed changes
  session-tree.js # Session branching
  visual.js # Visual dev tools (pixelmatch-based)
  browser.js # Playwright browser agent