nex-code 0.4.20 → 0.4.22

package/README.md CHANGED
@@ -22,7 +22,7 @@
22
22
  <img src="https://img.shields.io/badge/Ollama_Cloud-supported-brightgreen.svg" alt="Ollama Cloud: supported">
23
23
  <img src="https://img.shields.io/badge/node-%3E%3D18-brightgreen.svg" alt="Node >= 18">
24
24
  <img src="https://img.shields.io/badge/dependencies-2-green.svg" alt="Dependencies: 2">
25
- <img src="https://img.shields.io/badge/tests-3453-blue.svg" alt="Tests: 3453">
25
+ <img src="https://img.shields.io/badge/tests-3719-blue.svg" alt="Tests: 3719">
26
26
  <img src="https://img.shields.io/badge/VS_Code-extension-007ACC.svg" alt="VS Code extension">
27
27
  </p>
28
28
 
@@ -123,6 +123,10 @@ The verify phase catches incomplete work before reporting "done" — if tests fa
123
123
 
124
124
  **Lightweight.** 2 runtime dependencies (`axios`, `dotenv`). Starts in ~100ms. No Python, no heavy runtime, no daemon process.
125
125
 
126
+ **Server-aware from the first message.** When your prompt contains a URL whose domain matches a configured SSH profile (e.g. `jarvis.example.com` → profile `jarvis`), nex-code probes the server before responding — listing ports, running processes, and data directories. The model receives this topology before its first token, so it goes straight to `ssh_exec` instead of reading local files.
127
+
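The URL-to-profile matching described above can be pictured with a small sketch. The diff does not show how nex-code performs the match, so `matchProfiles`, the shape of the `profiles` object, and the two matching rules (exact host match, or profile name equal to the first hostname label) are assumptions made for illustration only.

```javascript
// Hypothetical sketch: find SSH profiles referenced by URLs in a prompt.
// `matchProfiles` and the `profiles` shape are illustrative, not nex-code APIs.
function matchProfiles(prompt, profiles) {
  // Collect anything that looks like an http(s) URL.
  const urls = prompt.match(/https?:\/\/[^\s)>"']+/g) || [];
  const matched = new Set();
  for (const raw of urls) {
    let host;
    try {
      host = new URL(raw).hostname.toLowerCase();
    } catch {
      continue; // skip strings that look like URLs but do not parse
    }
    const firstLabel = host.split(".")[0];
    for (const name of Object.keys(profiles)) {
      const profileHost = String(profiles[name].host || "").toLowerCase();
      // Assumed rules: the profile's host equals the URL host, or the
      // profile name equals the first label (jarvis.example.com -> jarvis).
      if (profileHost === host || name.toLowerCase() === firstLabel) {
        matched.add(name);
      }
    }
  }
  return [...matched];
}
```

A caller would then probe each matched profile before the first model request; that probing step is not sketched here.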
128
+ **Few-shot behavior injection.** On each session start, nex-code injects a short example of the correct tool sequence for the detected task type (sysadmin → check remote logs first; coding → read file before editing; data → explain before rewriting). Works across all models without fine-tuning. Customize with your own high-scoring sessions via `npm run extract-examples`.
129
+
126
130
  **Infrastructure tools built in:**
127
131
 
128
132
  - SSH server management (AlmaLinux, macOS, any Linux)
@@ -143,7 +147,7 @@ The verify phase catches incomplete work before reporting "done" — if tests fa
143
147
 
144
148
  **Extensible.** Plugin API (`registerTool` + lifecycle hooks), skill system (install from any git URL), MCP server support.
145
149
 
146
- **Tested.** 3453 tests, 79% coverage, CI on every push.
150
+ **Tested.** 3719 tests, 83% coverage, CI on every push.
147
151
 
148
152
  ---
149
153
 
@@ -155,15 +159,33 @@ Rankings are based on nex-code's own `/benchmark` — 15 tool-calling tasks agai
155
159
  ### Flat-Rate / Pay-as-you-go
156
160
 
157
161
  <!-- nex-benchmark-start -->
158
- <!-- Updated: 2026-03-26 — run `/benchmark --discover` after new Ollama Cloud releases -->
162
+ <!-- Updated: 2026-03-29 — run `/benchmark --discover` after new Ollama Cloud releases -->
159
163
 
160
164
  | Rank | Model | Score | Avg Latency | Context | Best For |
161
165
  |---|---|---|---|---|---|
162
- | 🥇 | `devstral-2:123b` | **82.5** | 1.7s | 131K | Default: fastest + most reliable tool selection |
163
- | 🥈 | `devstral-small-2:24b` | 75 | 3.1s | 131K | Fast sub-agents, simple lookups |
164
- | 🥉 | `qwen3-coder:480b` | 72.5 | 8.4s | 131K | Coding-heavy sessions, heavy sub-agents |
165
- | — | `kimi-k2:1t` | 67.5 | 6.6s | 256K | Large repos (>100K tokens) |
166
- | — | `minimax-m2.7:cloud` | 64.1 | 5.0s | 200K | Complex swarm / multi-agent sessions (Toolathon SOTA) |
166
+ | 🥇 | `qwen3-vl:235b` | **77.1** | 14.4s | 131K | Overall #1: frontier tool selection, data + agentic tasks |
167
+ | 🥈 | `qwen3-vl:235b-instruct` | 76.3 | 6.5s | 131K | Best latency/score balance, recommended default |
168
+ | 🥉 | `rnj-1:8b` | 74 | 3.7s | 131K | |
169
+ | — | `ministral-3:8b` | 73.1 | 2.3s | 131K | Fastest strong model: 2.2s latency, 70+ score |
170
+ | — | `qwen3-coder-next` | 71.4 | 2.8s | 256K | |
171
+ | — | `qwen3-next:80b` | 70.6 | 11.6s | 131K | — |
172
+ | — | `qwen3.5:397b` | 68.9 | 3.9s | 256K | — |
173
+ | — | `minimax-m2.7` | 68.7 | 6.8s | 200K | — |
174
+ | — | `glm-5` | 67.6 | 4.5s | 131K | — |
175
+ | — | `devstral-2:123b` | 67.6 | 2.0s | 131K | Sysadmin + SSH tasks, reliable coding |
176
+ | — | `glm-4.7` | 66.5 | 5.1s | 131K | — |
177
+ | — | `kimi-k2-thinking` | 66.3 | 18.4s | 256K | — |
178
+ | — | `ministral-3:14b` | 65.8 | 3.8s | 131K | — |
179
+ | — | `devstral-small-2:24b` | 65.5 | 2.3s | 131K | Fast sub-agents, simple lookups |
180
+ | — | `ministral-3:3b` | 65.4 | 2.2s | 32K | — |
181
+ | — | `kimi-k2.5` | 65.2 | 3.5s | 256K | Large repos — faster than k2:1t |
182
+ | — | `kimi-k2:1t` | 65.2 | 4.2s | 256K | Large repos (>100K tokens) |
183
+ | — | `minimax-m2.1` | 64.2 | 5.4s | 200K | — |
184
+ | — | `glm-4.6` | 63.9 | 4.9s | 131K | — |
185
+ | — | `qwen3-coder:480b` | 63.2 | 14.1s | 131K | Heavy coding sessions, large context |
186
+ | — | `nemotron-3-super` | 61.3 | 2.6s | 256K | — |
187
+ | — | `gpt-oss:20b` | 60.9 | 2.5s | 131K | Fast small model, good overall score |
188
+ | — | `mistral-large-3:675b` | 60.8 | 3.8s | 131K | — |
167
189
 
168
190
  > Rankings are nex-code-specific: tool name accuracy, argument validity, schema compliance.
169
191
  > Toolathon (Minimax SOTA) measures different task types — run `/benchmark --discover` after model releases.
@@ -451,6 +473,8 @@ Fallback chains let you auto-switch when a provider fails:
451
473
  /fallback anthropic,openai,local
452
474
  ```
453
475
 
476
+ **Wire Protocol Layer:** All 5 providers share 3 wire protocol implementations (OpenAI-compatible SSE, Anthropic Messages SSE, Ollama NDJSON). Stream parsing, tool call accumulation, and response normalization are handled by reusable `StreamParser` classes — eliminating duplicated protocol code across providers.
477
+
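To make the stream-parsing idea concrete, here is a minimal sketch of the NDJSON case, where each line of the response is one JSON object and a network chunk may end mid-line. The class name `NdjsonParser` and its `push` method are hypothetical; the diff does not show the actual `StreamParser` classes, and this sketch leaves out tool call accumulation and response normalization.

```javascript
// Hypothetical sketch of incremental NDJSON parsing (one JSON object per line).
// Not the actual nex-code StreamParser; names are illustrative.
class NdjsonParser {
  constructor() {
    this.buffer = ""; // holds a trailing partial line between chunks
  }

  // Feed a raw text chunk; returns the objects completed by this chunk.
  push(chunk) {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop(); // last piece may be an incomplete line
    const events = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed) events.push(JSON.parse(trimmed));
    }
    return events;
  }
}
```

Keeping the buffer inside the parser means each provider only forwards raw chunks, which is one plausible way to share protocol handling across providers.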
454
478
  ---
455
479
 
456
480
  ## Commands
@@ -473,6 +497,12 @@ Type `/` to see inline suggestions as you type. Tab completion is supported for
473
497
  | `/load <name>` | Load a saved session |
474
498
  | `/sessions` | List saved sessions |
475
499
  | `/resume` | Resume last session |
500
+ | `/branches` | Show session tree (all conversation branches) |
501
+ | `/timeline [n]` | Show message timeline of current branch |
502
+ | `/goto <index>` | Jump to a message index (truncates later messages) |
503
+ | `/fork [index] [name]` | Create a new branch at the given message index |
504
+ | `/switch-branch <name>` | Switch to a different conversation branch |
505
+ | `/delete-branch <name>` | Delete a conversation branch |
476
506
  | `/remember <text>` | Save a memory (persists across sessions) |
477
507
  | `/forget <key>` | Delete a memory |
478
508
  | `/memory` | Show all memories |
@@ -860,6 +890,37 @@ Only sessions from the last 24 hours are offered for auto-resume. Older autosave
860
890
 
861
891
  Sessions are stored in `.nex/sessions/` as JSON files. Auto-saves always write to `_autosave` (overwritten each turn). Writes are atomic — a temp file is written and renamed, so a crash mid-write never corrupts the saved state.
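The write-then-rename pattern mentioned above can be sketched with Node's `fs` module. The function name `saveAtomic` and the temp-file naming scheme are assumptions for illustration; the diff does not show how nex-code names its temp files.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical sketch: write JSON to a temp file in the same directory,
// then rename it over the target. A rename within one filesystem replaces
// the target in a single step, so readers never see a half-written file.
function saveAtomic(filePath, data) {
  const tmp = path.join(
    path.dirname(filePath),
    `.${path.basename(filePath)}.${process.pid}.tmp`
  );
  fs.writeFileSync(tmp, JSON.stringify(data, null, 2));
  fs.renameSync(tmp, filePath);
}
```

The temp file lives next to the target because a rename across filesystems is not atomic.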
862
892
 
893
+ ### Session Trees
894
+
895
+ Navigate your conversation history like git branches. Fork at any point, explore alternative approaches, and switch between branches:
896
+
897
+ ```
898
+ /timeline # see message indices
899
+ /fork 5 experiment # branch from message 5
900
+ /branches # see all branches
901
+ /switch-branch main # go back to main
902
+ /goto 3 # jump to message 3 (truncates later messages)
903
+ /delete-branch experiment
904
+ ```
905
+
906
+ This enables non-linear conversations: try an approach, and if it doesn't work, fork from an earlier point and try something different — without losing the original attempt.
907
+
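One way to picture the branch commands is as a map from branch name to message list. The sketch below is an assumption-laden model, not the nex-code implementation: the class name `SessionTree`, zero-based inclusive indices, and the choice that forking switches to the new branch are all illustrative.

```javascript
// Hypothetical model of session branches; not the nex-code data structure.
class SessionTree {
  constructor() {
    this.branches = new Map([["main", []]]);
    this.current = "main";
  }

  get messages() {
    return this.branches.get(this.current);
  }

  add(message) {
    this.messages.push(message);
  }

  // Copy messages 0..index into a new branch and make it active.
  fork(index, name) {
    if (this.branches.has(name)) throw new Error(`branch exists: ${name}`);
    this.branches.set(name, this.messages.slice(0, index + 1));
    this.current = name;
  }

  switchBranch(name) {
    if (!this.branches.has(name)) throw new Error(`no such branch: ${name}`);
    this.current = name;
  }

  // Keep messages 0..index on the active branch and drop the rest.
  goto(index) {
    const keep = Math.max(0, Math.min(this.messages.length, index + 1));
    this.messages.length = keep;
  }

  deleteBranch(name) {
    if (name === this.current) throw new Error("cannot delete the active branch");
    this.branches.delete(name);
  }
}
```

Because `fork` copies the prefix instead of sharing it, later edits on one branch cannot affect another, at the cost of duplicated messages.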
908
+ ### Autoresearch
909
+
910
+ Autonomous optimization loops inspired by Karpathy's autoresearch pattern. The agent edits code, runs experiments, logs results, and automatically keeps improvements or reverts failures:
911
+
912
+ ```
913
+ /autoresearch reduce test runtime while maintaining correctness
914
+ /autoresearch optimize bundle size under 500kb
915
+ ```
916
+
917
+ The agent follows a repeating cycle: **checkpoint** (git) -> **edit** -> **run experiment** -> **log result** -> **keep or revert**. All experiments are logged to `.nex/autoresearch/experiments.json` with metrics and trend tracking.
918
+
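The keep-or-revert cycle can be sketched as a plain loop. Everything here is hypothetical: `runExperiments`, the injected `checkpoint`, `revert`, and `measure` callbacks, and the assumption that a lower metric is better. In nex-code the checkpoint is a git operation and the log goes to `.nex/autoresearch/experiments.json`; this sketch keeps both in memory.

```javascript
// Hypothetical sketch of a checkpoint -> edit -> measure -> keep/revert loop.
// The callbacks are injected so the loop itself has no side effects of its own.
function runExperiments(steps, { checkpoint, revert, measure }) {
  const log = [];
  let best = measure(); // baseline before any edits
  for (const step of steps) {
    const ref = checkpoint(); // remember the state before the edit
    step.edit();
    const metric = measure();
    const kept = metric < best; // assumption: lower is better
    if (kept) {
      best = metric;
    } else {
      revert(ref); // roll back a change that did not improve the metric
    }
    log.push({ name: step.name, metric, kept });
  }
  return { best, log };
}
```

Passing the callbacks in makes the loop easy to exercise against an in-memory state before pointing it at a real repository.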
919
+ ```
920
+ /ar-status # show experiment history with trends
921
+ /ar-clear # reset experiment history
922
+ ```
923
+
863
924
  ### Memory
864
925
 
865
926
  Persistent project memory that survives across sessions:
@@ -1589,10 +1650,12 @@ npm test # Run all tests with coverage
1589
1650
  npm run test:watch # Watch mode
1590
1651
  ```
1591
1652
 
1592
- 83 test suites, 3453 tests, 79% statement / 71% branch coverage.
1653
+ 91 test suites, 3719 tests, 83% statement / 74% branch coverage.
1593
1654
 
1594
1655
  CI runs on GitHub Actions (Node 20 LTS).
1595
1656
 
1657
+ **Type checking:** `npm run typecheck` runs TypeScript in `noEmit` mode with `allowJs`. Core type definitions live in `types/index.d.ts` (Message, ToolCall, IProvider, IWireProtocol, Session, Skill, etc.). The codebase uses incremental TypeScript adoption — new modules can be written in `.ts` while existing `.js` files are gradually migrated.
1658
+
1596
1659
  ---
1597
1660
 
1598
1661
  ## Dependencies