nex-code 0.4.20 → 0.4.21
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +24 -6
- package/dist/nex-code.js +468 -457
- package/package.json +4 -3
package/README.md
CHANGED
@@ -155,15 +155,33 @@ Rankings are based on nex-code's own `/benchmark` — 15 tool-calling tasks agai
 ### Flat-Rate / Pay-as-you-go

 <!-- nex-benchmark-start -->
-<!-- Updated: 2026-03-
+<!-- Updated: 2026-03-29 — run `/benchmark --discover` after new Ollama Cloud releases -->

 | Rank | Model | Score | Avg Latency | Context | Best For |
 |---|---|---|---|---|---|
-| 🥇 | `
-| 🥈 | `
-| 🥉 | `
-| — | `
-| — | `
+| 🥇 | `qwen3-vl:235b` | **77.1** | 14.4s | 131K | Overall #1 — frontier tool selection, data + agentic tasks |
+| 🥈 | `qwen3-vl:235b-instruct` | 76.3 | 6.5s | 131K | Best latency/score balance — recommended default |
+| 🥉 | `rnj-1:8b` | 74.0 | 3.7s | 131K | — |
+| — | `ministral-3:8b` | 73.1 | 2.3s | 131K | Fastest strong model — 2.2s latency, 70+ score |
+| — | `qwen3-coder-next` | 71.4 | 2.8s | 256K | — |
+| — | `qwen3-next:80b` | 70.6 | 11.6s | 131K | — |
+| — | `qwen3.5:397b` | 68.9 | 3.9s | 256K | — |
+| — | `minimax-m2.7` | 68.7 | 6.8s | 200K | — |
+| — | `glm-5` | 67.6 | 4.5s | 131K | — |
+| — | `devstral-2:123b` | 67.6 | 2.0s | 131K | Sysadmin + SSH tasks, reliable coding |
+| — | `glm-4.7` | 66.5 | 5.1s | 131K | — |
+| — | `kimi-k2-thinking` | 66.3 | 18.4s | 256K | — |
+| — | `ministral-3:14b` | 65.8 | 3.8s | 131K | — |
+| — | `devstral-small-2:24b` | 65.5 | 2.3s | 131K | Fast sub-agents, simple lookups |
+| — | `ministral-3:3b` | 65.4 | 2.2s | 32K | — |
+| — | `kimi-k2.5` | 65.2 | 3.5s | 256K | Large repos — faster than k2:1t |
+| — | `kimi-k2:1t` | 65.2 | 4.2s | 256K | Large repos (>100K tokens) |
+| — | `minimax-m2.1` | 64.2 | 5.4s | 200K | — |
+| — | `glm-4.6` | 63.9 | 4.9s | 131K | — |
+| — | `qwen3-coder:480b` | 63.2 | 14.1s | 131K | Heavy coding sessions, large context |
+| — | `nemotron-3-super` | 61.3 | 2.6s | 256K | — |
+| — | `gpt-oss:20b` | 60.9 | 2.5s | 131K | Fast small model, good overall score |
+| — | `mistral-large-3:675b` | 60.8 | 3.8s | 131K | — |

 > Rankings are nex-code-specific: tool name accuracy, argument validity, schema compliance.
 > Toolathon (Minimax SOTA) measures different task types — run `/benchmark --discover` after model releases.