nex-code 0.5.1 → 0.5.3

package/README.md CHANGED
@@ -60,13 +60,23 @@ On first launch, an interactive setup wizard guides you through provider and cre
60
60
  | Startup time | ~100ms | 1-4s |
61
61
  | Runtime deps | 2 | heavy |
62
62
  | Infra tools | SSH, Docker, K8s built-in | no |
63
+ <<<<<<< Updated upstream
63
64
 
64
65
  **Smart model routing.** The built-in `/benchmark` tests all configured models across 62 tool-calling tasks in 5 categories and auto-routes to the best model per task type.
65
66
 
66
67
  **Phase-based execution.** Tasks run through Plan (analyze) -> Implement (code) -> Verify (test) phases, each with the optimal model. Auto-loops back on test failures.
67
68
 
68
- **45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review.
69
+ **45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review. See [Tools](#tools) for the full list.
69
70
 
71
+ =======
72
+
73
+ **Smart model routing.** The built-in `/benchmark` tests all configured models across 62 tool-calling tasks in 5 categories and auto-routes to the best model per task type.
74
+
75
+ **Phase-based execution.** Tasks run through Plan (analyze) -> Implement (code) -> Verify (test) phases, each with the optimal model. Auto-loops back on test failures.
76
+
77
+ **45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review. See [Tools](#tools) for the full list.
78
+
79
+ >>>>>>> Stashed changes
70
80
  **2 runtime dependencies** (`axios`, `dotenv`). Starts in ~100ms. No Python, no heavy runtime.
71
81
 
72
82
  ---
@@ -76,17 +86,42 @@ On first launch, an interactive setup wizard guides you through provider and cre
76
86
  Rankings from nex-code's own `/benchmark` — 62 tasks testing tool selection, argument validity, and schema compliance.
77
87
 
78
88
  <!-- nex-benchmark-start -->
79
- <!-- Updated: 2026-04-01 — run `/benchmark --discover` after new Ollama Cloud releases -->
89
+ <!-- Updated: 2026-04-05 — run `/benchmark --discover` after new Ollama Cloud releases -->
80
90
 
81
- | Rank | Model | Score | Latency | Context | Best For |
91
+ | Rank | Model | Score | Avg Latency | Context | Best For |
82
92
  |---|---|---|---|---|---|
83
- | 1 | `qwen3-vl:235b-instruct` | **79.9** | 3.8s | 131K | Best latency/score balance |
84
- | 2 | `qwen3-vl:235b` | 79.4 | 12.3s | 131K | Frontier tool selection |
85
- | 3 | `qwen3-coder-next` | 74.9 | 1.7s | 256K | — |
86
- | 4 | `ministral-3:8b` | 74.2 | 1.2s | 131K | Fastest strong model |
87
- | 5 | `devstral-2:123b` | 69.9 | 1.6s | 131K | Sysadmin/SSH, reliable |
88
-
89
- > Run `/benchmark --discover` to detect new models and auto-update this table.
93
+ | 🥇 | `qwen3-vl:235b` | **79** | 12.4s | 131K | Overall #1 frontier tool selection, data + agentic tasks |
94
+ | 🥈 | `qwen3-vl:235b-instruct` | 78.2 | 5.3s | 131K | Best latency/score balance — recommended default |
95
+ | 🥉 | `nemotron-3-super` | 78.1 | 3.5s | 256K | — |
96
+ | | `rnj-1:8b` | 77.4 | 3.9s | 131K | |
97
+ | | `mistral-large-3:675b` | 76.5 | 3.9s | 131K | |
98
+ | — | `gpt-oss:20b` | 76.5 | 1.9s | 131K | Fast small model, good overall score |
99
+ | | `qwen3-coder-next` | 75.7 | 2.2s | 256K | — |
100
+ | — | `qwen3-next:80b` | 75.1 | 11.1s | 131K | — |
101
+ | — | `ministral-3:8b` | 73.8 | 2.0s | 131K | Fastest strong model — 2.2s latency, 70+ score |
102
+ | — | `deepseek-v3.1:671b` | 73.6 | 2.9s | 131K | — |
103
+ | — | `devstral-2:123b` | 73.2 | 2.0s | 131K | Sysadmin + SSH tasks, reliable coding |
104
+ | — | `kimi-k2:1t` | 72.2 | 5.6s | 256K | Large repos (>100K tokens) |
105
+ | — | `ministral-3:3b` | 72 | 1.6s | 32K | — |
106
+ | — | `devstral-small-2:24b` | 71.7 | 2.6s | 131K | Fast sub-agents, simple lookups |
107
+ | — | `qwen3.5:397b` | 70.7 | 4.2s | 256K | — |
108
+ | — | `qwen3-coder:480b` | 70.1 | 6.0s | 131K | Heavy coding sessions, large context |
109
+ | — | `minimax-m2.1` | 69.9 | 3.0s | 200K | — |
110
+ | — | `gemma4:31b` | 69.3 | 2.8s | ? | — |
111
+ | — | `glm-4.7` | 69.1 | 5.3s | 131K | — |
112
+ | — | `kimi-k2-thinking` | 69 | 3.1s | 256K | — |
113
+ | — | `ministral-3:14b` | 68.8 | 2.0s | 131K | — |
114
+ | — | `kimi-k2.5` | 68.7 | 3.4s | 256K | Large repos — faster than k2:1t |
115
+ | — | `minimax-m2.7` | 68.4 | 5.5s | 200K | — |
116
+ | — | `glm-4.6` | 67.8 | 4.7s | 131K | — |
117
+ | — | `glm-5` | 67.4 | 5.0s | 131K | — |
118
+ | — | `gpt-oss:120b` | 64.8 | 3.4s | 131K | — |
119
+ | — | `nemotron-3-nano:30b` | 64.7 | 2.3s | 131K | — |
120
+ | — | `minimax-m2.5` | 61.9 | 2.7s | 131K | Multi-agent, large context |
121
+ | — | `minimax-m2` | 60.6 | 4.3s | 200K | — |
122
+
123
+ > Rankings are nex-code-specific: tool name accuracy, argument validity, schema compliance.
124
+ > Toolathon (Minimax SOTA) measures different task types — run `/benchmark --discover` after model releases.
90
125
  <!-- nex-benchmark-end -->
91
126
 
92
127
  **Recommended `.env`:**
@@ -111,7 +146,7 @@ OLLAMA_API_KEY=your-key # Ollama Cloud
111
146
  OPENAI_API_KEY=your-key # OpenAI
112
147
  ANTHROPIC_API_KEY=your-key # Anthropic
113
148
  GEMINI_API_KEY=your-key # Gemini
114
- PERPLEXITY_API_KEY=your-key # optional — grounded web search
149
+ PERPLEXITY_API_KEY=your-key # optional — enables grounded web search
115
150
 
116
151
  DEFAULT_PROVIDER=ollama
117
152
  DEFAULT_MODEL=devstral-2:123b
@@ -136,18 +171,20 @@ cp .env.example .env && npm link && npm run install-hooks
136
171
  > the /users endpoint returns 500 — find the bug and fix it
137
172
  ```
138
173
 
139
- **YOLO mode** — skip all confirmations, auto-runs `caffeinate` on macOS:
174
+ ### YOLO Mode
175
+
176
+ Skip all confirmations — file changes, dangerous commands, and tool permissions are auto-approved. Auto-runs `caffeinate` on macOS.
140
177
 
141
178
  ```bash
142
179
  nex-code --yolo
143
180
  ```
144
181
 
145
- **Headless / Programmatic:**
182
+ ### Headless / Programmatic Mode
146
183
 
147
184
  ```bash
148
185
  nex-code --task "refactor src/index.js to async/await" --yolo
149
186
  nex-code --prompt-file /tmp/task.txt --yolo --json
150
- nex-code --daemon # background watcher (reads .nex/daemon.json)
187
+ nex-code --daemon # watch mode: fires tasks on file changes, git commits, or cron
151
188
  ```
152
189
 
153
190
  | Flag | Description |
@@ -156,33 +193,39 @@ nex-code --daemon # background watcher (reads .nex/daemon.json)
156
193
  | `--prompt-file <path>` | Read prompt from file |
157
194
  | `--yolo` | Skip all confirmations |
158
195
  | `--server` | JSON-lines IPC server (VS Code extension) |
159
- | `--daemon` | Background watcher (file changes, git commits, cron) |
196
+ | `--daemon` | Background watcher (reads `.nex/daemon.json`) |
160
197
  | `--flatrate` | 100 turns, 6 parallel agents, 5 retries |
161
198
  | `--json` | JSON output to stdout |
162
199
  | `--max-turns <n>` | Override agentic loop limit |
163
- | `--model <spec>` | e.g. `anthropic:claude-sonnet-4-6` |
200
+ | `--model <spec>` | Use specific model (e.g. `anthropic:claude-sonnet-4-6`) |
164
201
  | `--debug` | Show diagnostic messages |
202
+ <<<<<<< Updated upstream
203
+
204
+ ### Vision / Screenshot
205
+
206
+ =======
165
207
 
166
- **Vision / Screenshot:**
208
+ ### Vision / Screenshot
167
209
 
210
+ >>>>>>> Stashed changes
168
211
  ```
169
212
  > /path/to/screenshot.png implement this UI in React
170
213
  > analyze https://example.com/mockup.png and implement it
171
- > what's wrong with the layout in my clipboard # macOS clipboard
214
+ > what's wrong with the layout in my clipboard # macOS clipboard capture
172
215
  > screenshot localhost:3000 and review the navbar spacing
173
216
  ```
174
217
 
175
- Formats: PNG, JPG, GIF, WebP, BMP. Works with all providers that support vision.
218
+ Works with Anthropic, OpenAI, Gemini, and Ollama vision models. Formats: PNG, JPG, GIF, WebP, BMP.
176
219
 
177
220
  ---
178
221
 
179
222
  ## Providers & Models
180
223
 
181
224
  ```
182
- /model # interactive picker
183
- /model openai:gpt-4o # switch directly
184
- /providers # list all
185
- /fallback anthropic,openai # auto-switch on failure
225
+ /model # interactive picker
226
+ /model openai:gpt-4o # switch directly
227
+ /providers # list all
228
+ /fallback anthropic,openai # auto-switch on failure
186
229
  ```
187
230
 
188
231
  | Provider | Models | Env Variable |
@@ -197,12 +240,13 @@ Formats: PNG, JPG, GIF, WebP, BMP. Works with all providers that support vision.
197
240
 
198
241
  ## Commands
199
242
 
200
- Type `/` for inline suggestions. Tab completion for commands and file paths.
243
+ Type `/` to see inline suggestions. Tab completion for slash commands and file paths.
201
244
 
202
245
  | Command | Description |
203
246
  |---|---|
204
247
  | `/help` | Full help |
205
- | `/model [spec]` / `/providers` | Switch model / list all |
248
+ | `/model [spec]` | Show/switch model |
249
+ | `/providers` | List providers |
206
250
  | `/clear` | Clear conversation |
207
251
  | `/save` / `/load` / `/sessions` / `/resume` | Session management |
208
252
  | `/branches` / `/fork` / `/switch-branch` / `/goto` | Session tree navigation |
@@ -227,7 +271,7 @@ Type `/` for inline suggestions. Tab completion for commands and file paths.
227
271
 
228
272
  ## Tools
229
273
 
230
- 45 built-in tools:
274
+ 45 built-in tools organized by category:
231
275
 
232
276
  **Core:** `bash`, `read_file`, `write_file`, `edit_file`, `patch_file`, `list_directory`, `search_files`, `glob`, `grep`
233
277
 
@@ -245,7 +289,7 @@ Type `/` for inline suggestions. Tab completion for commands and file paths.
245
289
 
246
290
  **Deploy:** `deploy`, `deployment_status`
247
291
 
248
- **Frontend:** `frontend_recon` — scans design tokens, layout, and framework stack before any frontend work
292
+ **Frontend:** `frontend_recon` — scans design tokens, layout, framework stack before any frontend work
249
293
 
250
294
  **Visual:** `visual_diff`, `responsive_sweep`, `visual_annotate`, `visual_watch`, `design_tokens`, `design_compare`
251
295
 
@@ -257,7 +301,7 @@ Additional tools via [MCP servers](#mcp) or [Skills](#skills).
257
301
 
258
302
  ### Multi-Agent Orchestrator
259
303
 
260
- Multi-goal prompts auto-decompose into parallel sub-agents (up to 5 simultaneously, with file locking):
304
+ Multi-goal prompts auto-decompose into parallel sub-agents. Up to 5 agents run simultaneously with file locking.
261
305
 
262
306
  ```bash
263
307
  nex-code --task "fix type errors in src/, add JSDoc to utils/, update CHANGELOG"
@@ -265,24 +309,28 @@ nex-code --task "fix type errors in src/, add JSDoc to utils/, update CHANGELOG"
265
309
 
266
310
  ### Autoresearch
267
311
 
268
- Autonomous optimization loops: edit → experiment → keep/revert, on a dedicated git branch:
312
+ Autonomous optimization loops: edit -> experiment -> keep/revert, on a dedicated branch.
269
313
 
270
314
  ```
271
315
  /autoresearch reduce test runtime while maintaining correctness
272
- /ar-self-improve # uses nex-code's own benchmark as the fitness metric
316
+ /ar-self-improve # self-improvement using nex-code's benchmark
273
317
  ```
274
318
 
275
319
  ### Plan Mode
276
320
 
277
- Auto-activates for implementation tasks. Read-only analysis first, approve before writes. Hard-enforced tool restrictions during analysis phase.
321
+ Auto-activates for implementation tasks. Read-only analysis first, approve before writes. Hard-enforced tool restrictions.
278
322
 
279
323
  ### Daemon / Watch Mode
324
+ <<<<<<< Updated upstream
325
+ Background process that fires tasks on file changes, git commits, or cron schedule. Configured via `.nex/daemon.json`. Desktop and Matrix notifications.
326
+ =======
280
327
 
281
328
  Background process that fires tasks on file changes, git commits, or cron schedule. Configured via `.nex/daemon.json`. Desktop and Matrix notifications.
282
329
 
330
+ >>>>>>> Stashed changes
283
331
  ### Session Trees
284
332
 
285
- Navigate conversation history like git branches — fork, switch, goto, delete branches without losing prior attempts.
333
+ Navigate conversation history like git branches — fork, switch, goto, delete branches.
286
334
 
287
335
  ### Safety
288
336
 
@@ -295,8 +343,6 @@ Navigate conversation history like git branches — fork, switch, goto, delete b
295
343
 
296
344
  Pre-push secret detection, audit logging (JSONL), persistent undo/redo, cost limits, auto plan mode.
297
345
 
298
- **Reporting vulnerabilities:** Email **security@schoensgibl.com** — not a public issue. 72h initial response.
299
-
300
346
  ### Open-Source Model Robustness
301
347
 
302
348
  - **5-layer argument parsing** — JSON, trailing fix, extraction, key repair, fence stripping
@@ -304,10 +350,37 @@ Pre-push secret detection, audit logging (JSONL), persistent undo/redo, cost lim
304
350
  - **Auto-fix engine** — path resolution, edit fuzzy matching (Levenshtein), bash error hints
305
351
  - **Tool tiers** — essential (5) / standard (21) / full (45), auto-selected per model capability
306
352
  - **Stale stream recovery** — progressive retry with context compression on stall
353
+ <<<<<<< Updated upstream
354
+ ### Visual Development Tools
355
+ Pixel-level before/after comparison, responsive sweeps (320-1920px), annotation overlays, design token extraction, and live-reload diff watching. Pure image tools work standalone; browser-based tools need Playwright.
356
+
357
+ ---
358
+
359
+ ## Extensibility
360
+
361
+ ### Skills
362
+
363
+ Drop `.md` or `.js` files in `.nex/skills/` for project-specific knowledge, commands, and tools. Global skills in `~/.nex-code/skills/`. Install from git: `/install-skill user/repo`.
364
+
365
+ ### Plugins
366
+
367
+ Custom tools and lifecycle hooks via `.nex/plugins/`. Events: `onToolResult`, `onModelResponse`, `onSessionStart`, `onSessionEnd`, `onFileChange`, `beforeToolExec`, `afterToolExec`.
368
+
369
+ ### MCP
370
+
371
+ Connect external tool servers via [Model Context Protocol](https://modelcontextprotocol.io). Configure in `.nex/mcp.json` with env var interpolation.
372
+
373
+ ### Hooks
374
+
375
+ Run custom scripts on CLI events (`pre-tool`, `post-tool`, `pre-commit`, `post-response`, `session-start`, `session-end`). Configure in `.nex/config.json` or `.nex/hooks/`.
376
+
377
+ ---
378
+
379
+ =======
307
380
 
308
381
  ### Visual Development Tools
309
382
 
310
- Pixel-level before/after diffs, responsive sweeps (320–1920px), annotation overlays, design token extraction, and live-reload diff watching. Pure image tools work standalone; browser tools need Playwright.
383
+ Pixel-level before/after comparison, responsive sweeps (320-1920px), annotation overlays, design token extraction, and live-reload diff watching. Pure image tools work standalone; browser-based tools need Playwright.
311
384
 
312
385
  ---
313
386
 
@@ -331,13 +404,14 @@ Run custom scripts on CLI events (`pre-tool`, `post-tool`, `pre-commit`, `post-r
331
404
 
332
405
  ---
333
406
 
407
+ >>>>>>> Stashed changes
334
408
  ## VS Code Extension
335
409
 
336
410
  Built-in sidebar chat panel (`vscode/`) with streaming output, collapsible tool cards, and native theme support. Spawns `nex-code --server` over JSON-lines IPC.
337
411
 
338
412
  ```bash
339
413
  cd vscode && npm install && npm run package
340
- # Cmd+Shift+P → Extensions: Install from VSIX...
414
+ # Cmd+Shift+P -> Extensions: Install from VSIX...
341
415
  ```
342
416
 
343
417
  ---
@@ -352,7 +426,11 @@ cli/
352
426
  tools/index.js # 45 tool definitions + auto-fix engine
353
427
  context-engine.js # Token management + 5-phase compression
354
428
  sub-agent.js # Parallel sub-agents with file locking
355
- orchestrator.js # Multi-agent decompose → execute → synthesize
429
+ <<<<<<< Updated upstream
430
+ orchestrator.js # Multi-agent decompose -> execute -> synthesize
431
+ =======
432
+ orchestrator.js # Multi-agent decompose -> execute -> synthesize
433
+ >>>>>>> Stashed changes
356
434
  session-tree.js # Session branching
357
435
  visual.js # Visual dev tools (pixelmatch-based)
358
436
  browser.js # Playwright browser agent
@@ -366,10 +444,10 @@ See [DEVELOPMENT.md](DEVELOPMENT.md) for full architecture details.
366
444
  ## Testing
367
445
 
368
446
  ```bash
369
- npm test # 97 suites, 3920 tests
370
- npm run typecheck # TypeScript noEmit check
371
- npm run benchmark:gate # 7-task smoke test (blocks push on regression)
372
- npm run benchmark:reallife # 35 real-world tasks across 7 categories
447
+ npm test # 97 suites, 3920 tests
448
+ npm run typecheck # TypeScript noEmit check
449
+ npm run benchmark:gate # 7-task smoke test (blocks push on regression)
450
+ npm run benchmark:reallife # 35 real-world tasks across 7 categories
373
451
  ```
374
452
 
375
453
  ---
@@ -379,10 +457,12 @@ npm run benchmark:reallife # 35 real-world tasks across 7 categories
379
457
  - Pre-push secret detection (API keys, private keys, hardcoded credentials)
380
458
  - Audit logging with automatic argument sanitization
381
459
  - Sensitive path blocking (`.ssh/`, `.aws/`, `.env`, credentials)
382
- - Shell injection protection via `execFileSync` with argument arrays throughout
460
+ - Shell injection protection via `execFileSync` with argument arrays
383
461
  - SSRF protection on `web_fetch`
384
462
  - MCP environment isolation
385
463
 
464
+ **Reporting vulnerabilities:** Email **security@schoensgibl.com** (not a public issue). Allow 72h for initial response.
465
+
386
466
  ---
387
467
 
388
468
  ## License