nex-code 0.4.19 → 0.4.21
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +35 -7
- package/dist/nex-code.js +554 -528
- package/package.json +4 -3
package/README.md
CHANGED
@@ -99,7 +99,7 @@ npm update -g nex-code
 
 **Open-model first.** Not locked to any single vendor. Tool tiers (`essential / standard / full`) adapt automatically to the model's capability level, so smaller models don't receive tool schemas they can't handle. A 5-layer auto-fix loop catches and retries malformed tool calls without user intervention.
 
-**Smart model routing.** The built-in `/benchmark` system tests all configured models against
+**Smart model routing.** The built-in `/benchmark` system tests all configured models against 56 real nex-code tool-calling tasks across 5 task categories. The results feed a routing table so nex-code can automatically switch to the best model for the detected task type:
 
 | Detected task | Routed model (example) |
 | ------------------------- | --------------------------- |
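The tier gating described above (smaller models receive fewer tool schemas) can be sketched as a simple capability filter. This is an illustrative sketch only: `TIERS`, `TOOLS`, and `selectTools` are hypothetical names, not nex-code's actual internals.

```javascript
// Hypothetical sketch of tool-tier gating; not nex-code's real implementation.
const TIERS = { essential: 1, standard: 2, full: 3 };

// Example tool registry: each tool declares the minimum tier that exposes it.
const TOOLS = [
  { name: "read_file", tier: "essential" },
  { name: "edit_file", tier: "standard" },
  { name: "spawn_subagent", tier: "full" },
];

// Only send a model the tool schemas at or below its capability tier,
// so smaller models never see schemas they can't handle.
function selectTools(modelTier) {
  const cap = TIERS[modelTier];
  return TOOLS.filter((t) => TIERS[t.tier] <= cap).map((t) => t.name);
}

console.log(selectTools("standard")); // a "standard" model is not offered "full"-tier tools
```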
@@ -109,6 +109,16 @@ npm update -g nex-code
 | Agentic swarms | `minimax-m2.7:cloud` |
 | General coding | `devstral-2:123b` (default) |
 
+**Phase-based execution.** On Ollama Cloud, each task automatically runs through three phases — each with the optimal model:
+
+| Phase | Purpose | Default model |
+| ------------- | --------------------------------- | ------------------------ |
+| **Plan**      | Analyze codebase, find root cause | `qwen3-coder:480b` |
+| **Implement** | Write code, edit files            | active model (default) |
+| **Verify**    | Run tests, check correctness      | `devstral-small-2:24b` |
+
+The verify phase catches incomplete work before reporting "done" — if tests fail, it loops back to implement automatically. Phase models are auto-updated by `/benchmark`. Disable with `NEX_PHASE_ROUTING=0`.
+
 **Built-in VS Code extension.** A sidebar chat panel with streaming output, collapsible tool cards, and native VS Code theme support — shipped in the same repo, no separate install.
 
 **Lightweight.** 2 runtime dependencies (`axios`, `dotenv`). Starts in ~100ms. No Python, no heavy runtime, no daemon process.
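The plan → implement → verify loop added in this hunk could look roughly like the following sketch. It is synchronous for brevity, and `runPhase` plus the shape of its return values are assumptions for illustration, not nex-code internals; only the phase names, default models, and the `NEX_PHASE_ROUTING=0` switch come from the README text.

```javascript
// Illustrative sketch of phase-based execution; not nex-code's real code.
// Default phase models as listed in the README table above.
const PHASE_MODELS = {
  plan: "qwen3-coder:480b",
  implement: null, // null = whatever model is currently active
  verify: "devstral-small-2:24b",
};

// runPhase(phase, input, model) is a hypothetical callback that executes
// one phase and, for "verify", returns an object like { ok: boolean }.
function runTask(task, runPhase, maxRetries = 2) {
  if (process.env.NEX_PHASE_ROUTING === "0") {
    return runPhase("implement", task, null); // routing disabled: single phase
  }
  const plan = runPhase("plan", task, PHASE_MODELS.plan);
  for (let i = 0; i <= maxRetries; i++) {
    const result = runPhase("implement", plan, PHASE_MODELS.implement);
    const check = runPhase("verify", result, PHASE_MODELS.verify);
    if (check.ok) return result; // only report "done" once verify passes
    // failed verification loops back to implement automatically
  }
  throw new Error("verify failed after " + (maxRetries + 1) + " attempts");
}
```

The point of the structure is that "done" is only ever reported on the path where verification passed; a failing verify re-enters the implement step rather than surfacing to the user.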
@@ -145,15 +155,33 @@ Rankings are based on nex-code's own `/benchmark` — 15 tool-calling tasks agai
 ### Flat-Rate / Pay-as-you-go
 
 <!-- nex-benchmark-start -->
-<!-- Updated: 2026-03-
+<!-- Updated: 2026-03-29 — run `/benchmark --discover` after new Ollama Cloud releases -->
 
 | Rank | Model | Score | Avg Latency | Context | Best For |
 |---|---|---|---|---|---|
-| 🥇 | `
-| 🥈 | `
-| 🥉 | `
-| — | `
-| — | `
+| 🥇 | `qwen3-vl:235b` | **77.1** | 14.4s | 131K | Overall #1 — frontier tool selection, data + agentic tasks |
+| 🥈 | `qwen3-vl:235b-instruct` | 76.3 | 6.5s | 131K | Best latency/score balance — recommended default |
+| 🥉 | `rnj-1:8b` | 74.0 | 3.7s | 131K | — |
+| — | `ministral-3:8b` | 73.1 | 2.3s | 131K | Fastest strong model — 2.3s latency, 70+ score |
+| — | `qwen3-coder-next` | 71.4 | 2.8s | 256K | — |
+| — | `qwen3-next:80b` | 70.6 | 11.6s | 131K | — |
+| — | `qwen3.5:397b` | 68.9 | 3.9s | 256K | — |
+| — | `minimax-m2.7` | 68.7 | 6.8s | 200K | — |
+| — | `glm-5` | 67.6 | 4.5s | 131K | — |
+| — | `devstral-2:123b` | 67.6 | 2.0s | 131K | Sysadmin + SSH tasks, reliable coding |
+| — | `glm-4.7` | 66.5 | 5.1s | 131K | — |
+| — | `kimi-k2-thinking` | 66.3 | 18.4s | 256K | — |
+| — | `ministral-3:14b` | 65.8 | 3.8s | 131K | — |
+| — | `devstral-small-2:24b` | 65.5 | 2.3s | 131K | Fast sub-agents, simple lookups |
+| — | `ministral-3:3b` | 65.4 | 2.2s | 32K | — |
+| — | `kimi-k2.5` | 65.2 | 3.5s | 256K | Large repos — faster than k2:1t |
+| — | `kimi-k2:1t` | 65.2 | 4.2s | 256K | Large repos (>100K tokens) |
+| — | `minimax-m2.1` | 64.2 | 5.4s | 200K | — |
+| — | `glm-4.6` | 63.9 | 4.9s | 131K | — |
+| — | `qwen3-coder:480b` | 63.2 | 14.1s | 131K | Heavy coding sessions, large context |
+| — | `nemotron-3-super` | 61.3 | 2.6s | 256K | — |
+| — | `gpt-oss:20b` | 60.9 | 2.5s | 131K | Fast small model, good overall score |
+| — | `mistral-large-3:675b` | 60.8 | 3.8s | 131K | — |
 
 > Rankings are nex-code-specific: tool name accuracy, argument validity, schema compliance.
 > Toolathon (Minimax SOTA) measures different task types — run `/benchmark --discover` after model releases.