nex-code 0.5.5 → 0.5.6

package/README.md CHANGED
@@ -60,7 +60,6 @@ On first launch, an interactive setup wizard guides you through provider and cre
  | Startup time | ~100ms | 1-4s |
  | Runtime deps | 2 | heavy |
  | Infra tools | SSH, Docker, K8s built-in | no |
- <<<<<<< Updated upstream
 
  **Smart model routing.** The built-in `/benchmark` tests all configured models across 62 tool-calling tasks in 5 categories and auto-routes to the best model per task type.
 
@@ -68,15 +67,6 @@ On first launch, an interactive setup wizard guides you through provider and cre
 
  **45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review. See [Tools](#tools) for the full list.
 
- =======
-
- **Smart model routing.** The built-in `/benchmark` tests all configured models across 62 tool-calling tasks in 5 categories and auto-routes to the best model per task type.
-
- **Phase-based execution.** Tasks run through Plan (analyze) -> Implement (code) -> Verify (test) phases, each with the optimal model. Auto-loops back on test failures.
-
- **45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review. See [Tools](#tools) for the full list.
-
- >>>>>>> Stashed changes
  **2 runtime dependencies** (`axios`, `dotenv`). Starts in ~100ms. No Python, no heavy runtime.
 
  ---
@@ -86,39 +76,37 @@ On first launch, an interactive setup wizard guides you through provider and cre
  Rankings from nex-code's own `/benchmark` — 62 tasks testing tool selection, argument validity, and schema compliance.
 
  <!-- nex-benchmark-start -->
- <!-- Updated: 2026-04-05 — run `/benchmark --discover` after new Ollama Cloud releases -->
+ <!-- Updated: 2026-04-09 — run `/benchmark --discover` after new Ollama Cloud releases -->
 
  | Rank | Model | Score | Avg Latency | Context | Best For |
  |---|---|---|---|---|---|
- | 🥇 | `qwen3-vl:235b` | **79** | 12.4s | 131K | Overall #1 — frontier tool selection, data + agentic tasks |
- | 🥈 | `qwen3-vl:235b-instruct` | 78.2 | 5.3s | 131K | Best latency/score balance recommended default |
- | 🥉 | `nemotron-3-super` | 78.1 | 3.5s | 256K | — |
- | — | `rnj-1:8b` | 77.4 | 3.9s | 131K | — |
- | — | `mistral-large-3:675b` | 76.5 | 3.9s | 131K | — |
- | — | `gpt-oss:20b` | 76.5 | 1.9s | 131K | Fast small model, good overall score |
- | — | `qwen3-coder-next` | 75.7 | 2.2s | 256K | — |
- | — | `qwen3-next:80b` | 75.1 | 11.1s | 131K | — |
- | — | `ministral-3:8b` | 73.8 | 2.0s | 131K | Fastest strong model 2.2s latency, 70+ score |
- | — | `deepseek-v3.1:671b` | 73.6 | 2.9s | 131K | — |
- | — | `devstral-2:123b` | 73.2 | 2.0s | 131K | Sysadmin + SSH tasks, reliable coding |
- | — | `kimi-k2:1t` | 72.2 | 5.6s | 256K | Large repos (>100K tokens) |
- | — | `ministral-3:3b` | 72 | 1.6s | 32K | — |
- | — | `devstral-small-2:24b` | 71.7 | 2.6s | 131K | Fast sub-agents, simple lookups |
- | — | `qwen3.5:397b` | 70.7 | 4.2s | 256K | |
- | — | `qwen3-coder:480b` | 70.1 | 6.0s | 131K | Heavy coding sessions, large context |
- | — | `minimax-m2.1` | 69.9 | 3.0s | 200K | — |
- | — | `gemma4:31b` | 69.3 | 2.8s | ? | — |
- | — | `glm-4.7` | 69.1 | 5.3s | 131K | — |
- | — | `kimi-k2-thinking` | 69 | 3.1s | 256K | — |
- | — | `ministral-3:14b` | 68.8 | 2.0s | 131K | — |
- | — | `kimi-k2.5` | 68.7 | 3.4s | 256K | Large repos faster than k2:1t |
- | — | `minimax-m2.7` | 68.4 | 5.5s | 200K | — |
- | — | `glm-4.6` | 67.8 | 4.7s | 131K | |
- | — | `glm-5` | 67.4 | 5.0s | 131K | — |
- | — | `gpt-oss:120b` | 64.8 | 3.4s | 131K | — |
- | — | `nemotron-3-nano:30b` | 64.7 | 2.3s | 131K | — |
- | — | `minimax-m2.5` | 61.9 | 2.7s | 131K | Multi-agent, large context |
- | — | `minimax-m2` | 60.6 | 4.3s | 200K | — |
+ | 🥇 | `qwen3-vl:235b` | **80.1** | 12.9s | 131K | Overall #1 — frontier tool selection, data + agentic tasks |
+ | 🥈 | `rnj-1:8b` | 78.6 | 2.7s | 131K | — |
+ | 🥉 | `qwen3-vl:235b-instruct` | 78.4 | 7.3s | 131K | Best latency/score balance recommended default |
+ | — | `nemotron-3-super` | 76.2 | 2.8s | 256K | — |
+ | — | `deepseek-v3.1:671b` | 74.8 | 5.6s | 131K | — |
+ | — | `qwen3-coder-next` | 74.5 | 2.9s | 256K | |
+ | — | `ministral-3:3b` | 73.6 | 2.4s | 32K | — |
+ | — | `ministral-3:8b` | 72.6 | 1.9s | 131K | Fastest strong model 2.2s latency, 70+ score |
+ | — | `qwen3-next:80b` | 72.2 | 11.5s | 131K | — |
+ | — | `mistral-large-3:675b` | 70.9 | 5.7s | 131K | — |
+ | — | `devstral-small-2:24b` | 70.9 | 2.8s | 131K | Fast sub-agents, simple lookups |
+ | — | `devstral-2:123b` | 70.9 | 4.0s | 131K | Sysadmin + SSH tasks, reliable coding |
+ | — | `minimax-m2.1` | 70.7 | 4.3s | 200K | — |
+ | — | `gpt-oss:20b` | 70.2 | 3.9s | 131K | Fast small model, good overall score |
+ | — | `kimi-k2:1t` | 69.9 | 5.0s | 256K | Large repos (>100K tokens) |
+ | — | `kimi-k2.5` | 69 | 5.8s | 256K | Large repos faster than k2:1t |
+ | — | `kimi-k2-thinking` | 69 | 4.0s | 256K | — |
+ | — | `glm-5` | 69 | 7.2s | 131K | — |
+ | — | `glm-5.1` | 68.8 | 9.7s | ? | — |
+ | — | `gemma4:31b` | 68.7 | 3.3s | ? | — |
+ | — | `minimax-m2.7` | 68.6 | 5.1s | 200K | — |
+ | — | `nemotron-3-nano:30b` | 67.8 | 2.9s | 131K | — |
+ | — | `ministral-3:14b` | 67.7 | 2.3s | 131K | — |
+ | — | `qwen3-coder:480b` | 67.2 | 7.7s | 131K | Heavy coding sessions, large context |
+ | — | `qwen3.5:397b` | 67.1 | 7.2s | 256K | — |
+ | — | `glm-4.6` | 65.2 | 7.5s | 131K | — |
+ | — | `gpt-oss:120b` | 64.6 | 3.7s | 131K | — |
 
  > Rankings are nex-code-specific: tool name accuracy, argument validity, schema compliance.
  > Toolathon (Minimax SOTA) measures different task types — run `/benchmark --discover` after model releases.
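
To make the score/latency trade-off in the updated benchmark table concrete, here is a minimal JavaScript sketch that picks the highest-scoring model within a latency budget. The `pickModel` helper and its `maxLatencySec` parameter are hypothetical illustrations and are not nex-code's routing logic; the rows are copied from the first five entries of the 0.5.6 table.

```javascript
// Hypothetical helper for illustration only; not nex-code's actual router.
// Rows copied from the first five entries of the 0.5.6 benchmark table.
const models = [
  { name: "qwen3-vl:235b", score: 80.1, latencySec: 12.9 },
  { name: "rnj-1:8b", score: 78.6, latencySec: 2.7 },
  { name: "qwen3-vl:235b-instruct", score: 78.4, latencySec: 7.3 },
  { name: "nemotron-3-super", score: 76.2, latencySec: 2.8 },
  { name: "deepseek-v3.1:671b", score: 74.8, latencySec: 5.6 },
];

// Return the highest-scoring model whose average latency fits the budget,
// or null when no model qualifies.
function pickModel(rows, maxLatencySec) {
  const eligible = rows.filter((m) => m.latencySec <= maxLatencySec);
  if (eligible.length === 0) return null;
  return eligible.reduce((best, m) => (m.score > best.score ? m : best));
}

console.log(pickModel(models, 3.0).name); // rnj-1:8b
console.log(pickModel(models, 60).name);  // qwen3-vl:235b
```

With a 3-second budget the sketch selects `rnj-1:8b`, while the README still labels `qwen3-vl:235b-instruct` as the recommended default, so the real router may well weigh more than these two columns.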
@@ -199,15 +187,11 @@ nex-code --daemon # watch mode: fires tasks on file changes, git commit
  | `--max-turns <n>` | Override agentic loop limit |
  | `--model <spec>` | Use specific model (e.g. `anthropic:claude-sonnet-4-6`) |
  | `--debug` | Show diagnostic messages |
- <<<<<<< Updated upstream
-
- ### Vision / Screenshot
-
- =======
+ | `--gemini` | Local Gemini test mode (`gemini-3.1-pro-preview` by default, requires `GEMINI_API_KEY`) |
+ | `--gemini-model <id>` | Pin a specific Gemini model (implies `--gemini`) |
 
  ### Vision / Screenshot
 
- >>>>>>> Stashed changes
  ```
  > /path/to/screenshot.png implement this UI in React
  > analyze https://example.com/mockup.png and implement it
@@ -321,13 +305,9 @@ Autonomous optimization loops: edit -> experiment -> keep/revert, on a dedicated
  Auto-activates for implementation tasks. Read-only analysis first, approve before writes. Hard-enforced tool restrictions.
 
  ### Daemon / Watch Mode
- <<<<<<< Updated upstream
- Background process that fires tasks on file changes, git commits, or cron schedule. Configured via `.nex/daemon.json`. Desktop and Matrix notifications.
- =======
 
  Background process that fires tasks on file changes, git commits, or cron schedule. Configured via `.nex/daemon.json`. Desktop and Matrix notifications.
 
- >>>>>>> Stashed changes
  ### Session Trees
 
  Navigate conversation history like git branches — fork, switch, goto, delete branches.
@@ -350,33 +330,6 @@ Pre-push secret detection, audit logging (JSONL), persistent undo/redo, cost lim
  - **Auto-fix engine** — path resolution, edit fuzzy matching (Levenshtein), bash error hints
  - **Tool tiers** — essential (5) / standard (21) / full (45), auto-selected per model capability
  - **Stale stream recovery** — progressive retry with context compression on stall
- <<<<<<< Updated upstream
- ### Visual Development Tools
- Pixel-level before/after comparison, responsive sweeps (320-1920px), annotation overlays, design token extraction, and live-reload diff watching. Pure image tools work standalone; browser-based tools need Playwright.
-
- ---
-
- ## Extensibility
-
- ### Skills
-
- Drop `.md` or `.js` files in `.nex/skills/` for project-specific knowledge, commands, and tools. Global skills in `~/.nex-code/skills/`. Install from git: `/install-skill user/repo`.
-
- ### Plugins
-
- Custom tools and lifecycle hooks via `.nex/plugins/`. Events: `onToolResult`, `onModelResponse`, `onSessionStart`, `onSessionEnd`, `onFileChange`, `beforeToolExec`, `afterToolExec`.
-
- ### MCP
-
- Connect external tool servers via [Model Context Protocol](https://modelcontextprotocol.io). Configure in `.nex/mcp.json` with env var interpolation.
-
- ### Hooks
-
- Run custom scripts on CLI events (`pre-tool`, `post-tool`, `pre-commit`, `post-response`, `session-start`, `session-end`). Configure in `.nex/config.json` or `.nex/hooks/`.
-
- ---
-
- =======
 
  ### Visual Development Tools
 
@@ -404,7 +357,6 @@ Run custom scripts on CLI events (`pre-tool`, `post-tool`, `pre-commit`, `post-r
 
  ---
 
- >>>>>>> Stashed changes
  ## VS Code Extension
 
  Built-in sidebar chat panel (`vscode/`) with streaming output, collapsible tool cards, and native theme support. Spawns `nex-code --server` over JSON-lines IPC.
@@ -426,11 +378,7 @@ cli/
  tools/index.js # 45 tool definitions + auto-fix engine
  context-engine.js # Token management + 5-phase compression
  sub-agent.js # Parallel sub-agents with file locking
- <<<<<<< Updated upstream
- orchestrator.js # Multi-agent decompose -> execute -> synthesize
- =======
  orchestrator.js # Multi-agent decompose -> execute -> synthesize
- >>>>>>> Stashed changes
  session-tree.js # Session branching
  visual.js # Visual dev tools (pixelmatch-based)
  browser.js # Playwright browser agent