nex-code 0.5.10 → 0.5.12
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +44 -28
- package/dist/background-worker.js +511 -489
- package/dist/benchmark.js +559 -536
- package/dist/nex-code.js +899 -861
- package/package.json +1 -1
package/README.md
CHANGED
@@ -76,42 +76,58 @@ On first launch, an interactive setup wizard guides you through provider and cre
 Rankings from nex-code's own `/benchmark` — 62 tasks testing tool selection, argument validity, and schema compliance.

 <!-- nex-benchmark-start -->
-<!-- Updated: 2026-04-…
+<!-- Updated: 2026-04-12 — run `/benchmark --discover` after new Ollama Cloud releases -->

 | Rank | Model | Score | Avg Latency | Context | Best For |
 |---|---|---|---|---|---|
-[27 old table rows removed — their contents are truncated in this diff view; identifiable models include `qwen3-vl:235b` (🥇), `deepseek-v3.1:671b`, `qwen3-coder-next`, `ministral-3:8b`, `kimi-k2…`, and `minimax-m2.…`]
+| 🥇 | `qwen3-vl:235b` | **100** | 13.4s | 131K | Overall #1 — frontier tool selection, data + agentic tasks |
+| 🥈 | `qwen3-vl:235b-instruct` | 97.5 | 7.7s | 131K | Best latency/score balance — recommended default |
+| 🥉 | `glm-4.6` | 97.5 | 26.8s | 131K | — |
+| — | `qwen3-next:80b` | 97.2 | 8.0s | 131K | — |
+| — | `deepseek-v3.1:671b` | 94.5 | 3.1s | 131K | — |
+| — | `qwen3-coder-next` | 94.3 | 2.2s | 256K | — |
+| — | `qwen3.5:397b` | 94.3 | 4.2s | 256K | — |
+| — | `ministral-3:8b` | 94.3 | 1.6s | 131K | Fastest strong model — 2.2s latency, 70+ score |
+| — | `minimax-m2.7` | 92.9 | 4.7s | 200K | — |
+| — | `rnj-1:8b` | 92.2 | 2.1s | 131K | — |
+| — | `glm-5` | 91.7 | 3.6s | 131K | — |
+| — | `nemotron-3-super` | 91.4 | 1.7s | 256K | — |
+| — | `ministral-3:14b` | 91.2 | 1.5s | 131K | — |
+| — | `qwen3-coder:480b` | 91 | 8.3s | 131K | Heavy coding sessions, large context |
+| — | `glm-4.7` | 90.7 | 4.1s | 131K | — |
+| — | `devstral-2:123b` | 90.3 | 8.1s | 131K | Sysadmin + SSH tasks, reliable coding |
+| — | `kimi-k2:1t` | 90.3 | 3.7s | 256K | Large repos (>100K tokens) |
+| — | `minimax-m2` | 90 | 3.4s | 200K | — |
+| — | `devstral-small-2:24b` | 88.8 | 6.8s | 131K | Fast sub-agents, simple lookups |
+| — | `kimi-k2-thinking` | 88.7 | 4.3s | 256K | — |
+| — | `minimax-m2.1` | 88.1 | 2.5s | 200K | — |
+| — | `glm-5.1` | 87.2 | 5.0s | ? | — |
+| — | `kimi-k2.5` | 86.2 | 4.8s | 256K | Large repos — faster than k2:1t |
+| — | `gemma4:31b` | 85.2 | 4.8s | ? | — |
+| — | `minimax-m2.5` | 84.2 | 6.8s | 131K | Multi-agent, large context |
+| — | `gpt-oss:120b` | 83.9 | 2.8s | 131K | — |
+| — | `mistral-large-3:675b` | 82.5 | 7.0s | 131K | — |
+| — | `ministral-3:3b` | 82.4 | 1.3s | 32K | — |
+| — | `gpt-oss:20b` | 81.1 | 1.5s | 131K | Fast small model, good overall score |
+| — | `nemotron-3-nano:30b` | 78.3 | 2.3s | 131K | — |
+| — | `gemini-3-flash-preview` | 76.5 | 3.3s | 131K | — |
+| — | `deepseek-v3.2` | 65.4 | 14.3s | 131K | — |
+| — | `cogito-2.1:671b` | 65.2 | 3.4s | 131K | — |

 > Rankings are nex-code-specific: tool name accuracy, argument validity, schema compliance.
 > Toolathon (Minimax SOTA) measures different task types — run `/benchmark --discover` after model releases.
 <!-- nex-benchmark-end -->

+<!-- nex-routing-start -->
+<!-- Updated: 2026-04-12 -->
+
+**Model routing by task type** (auto-updated by `/benchmark --all`):
+
+| Category | Model | Score |
+|---|---|---|
+| coding | `new` | 90/100 |
+<!-- nex-routing-end -->
+
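The `<!-- nex-benchmark-start/end -->` and `<!-- nex-routing-start/end -->` comments above are marker pairs that let the benchmark command rewrite only its own README sections. nex-code's actual updater is not shown in this diff; as an illustration only, here is a minimal Python sketch of the technique (the function name, inputs, and sample README text are hypothetical):

```python
import re


def replace_marked_section(readme: str, name: str, new_body: str) -> str:
    """Replace the text between <!-- {name}-start --> and <!-- {name}-end -->,
    keeping the marker comments themselves so the section stays updatable."""
    pattern = re.compile(
        rf"(<!-- {re.escape(name)}-start -->\n).*?(<!-- {re.escape(name)}-end -->)",
        re.DOTALL,
    )
    # A callable replacement avoids backslash-escape surprises in new_body.
    return pattern.sub(lambda m: m.group(1) + new_body + "\n" + m.group(2), readme)


# Hypothetical README fragment with one marker-delimited section.
readme = (
    "# nex-code\n\n"
    "<!-- nex-routing-start -->\n"
    "| coding | `old-model` | 85/100 |\n"
    "<!-- nex-routing-end -->\n"
)
updated = replace_marked_section(readme, "nex-routing", "| coding | `new` | 90/100 |")
print(updated)
```

Because only the text between the markers is replaced, repeated `/benchmark --all` runs stay idempotent and never disturb the surrounding prose.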
 **Recommended `.env`:**

 ```env