nex-code 0.4.20 → 0.4.22
- package/README.md +72 -9
- package/dist/nex-code.js +533 -492
- package/examples/agentic.md +17 -0
- package/examples/coding.md +14 -0
- package/examples/data.md +14 -0
- package/examples/frontend.md +15 -0
- package/examples/skills/code-review.md +15 -0
- package/examples/skills/docker-deploy.js +17 -0
- package/examples/skills/git-workflow.md +23 -0
- package/examples/sysadmin.md +15 -0
- package/package.json +7 -2
package/README.md
CHANGED
|
@@ -22,7 +22,7 @@
|
|
|
22
22
|
<img src="https://img.shields.io/badge/Ollama_Cloud-supported-brightgreen.svg" alt="Ollama Cloud: supported">
|
|
23
23
|
<img src="https://img.shields.io/badge/node-%3E%3D18-brightgreen.svg" alt="Node >= 18">
|
|
24
24
|
<img src="https://img.shields.io/badge/dependencies-2-green.svg" alt="Dependencies: 2">
|
|
25
|
-
<img src="https://img.shields.io/badge/tests-
|
|
25
|
+
<img src="https://img.shields.io/badge/tests-3719-blue.svg" alt="Tests: 3719">
|
|
26
26
|
<img src="https://img.shields.io/badge/VS_Code-extension-007ACC.svg" alt="VS Code extension">
|
|
27
27
|
</p>
|
|
28
28
|
|
|
@@ -123,6 +123,10 @@ The verify phase catches incomplete work before reporting "done" — if tests fa
|
|
|
123
123
|
|
|
124
124
|
**Lightweight.** 2 runtime dependencies (`axios`, `dotenv`). Starts in ~100ms. No Python, no heavy runtime, no daemon process.
|
|
125
125
|
|
|
126
|
+
**Server-aware from the first message.** When your prompt contains a URL whose domain matches a configured SSH profile (e.g. `jarvis.example.com` → profile `jarvis`), nex-code probes the server before responding — listing ports, running processes, and data directories. The model receives this topology before its first token, so it goes straight to `ssh_exec` instead of reading local files.
|
|
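The domain-to-profile matching described above could look roughly like the sketch below. The function name `findSshProfile`, the `{ name, host }` profile shape, and the rule "first DNS label equals the profile name" are assumptions made for illustration; the README only states that a URL's domain is matched against configured SSH profiles.

```javascript
// Hypothetical helper: profile shape and matching rule are assumptions,
// not nex-code's actual implementation.
function findSshProfile(prompt, profiles) {
  // Pull anything that looks like an http(s) URL out of the prompt.
  const urls = prompt.match(/https?:\/\/[^\s)"'<>]+/g) || [];
  for (const raw of urls) {
    let host;
    try {
      host = new URL(raw).hostname.toLowerCase();
    } catch {
      continue; // skip strings that only looked like URLs
    }
    // Assumed rule: match on the first DNS label or on the configured host.
    const firstLabel = host.split(".")[0];
    const hit = profiles.find(
      (p) =>
        p.name.toLowerCase() === firstLabel ||
        (p.host || "").toLowerCase() === host
    );
    if (hit) return hit.name;
  }
  return null; // no URL in the prompt maps to a known profile
}
```

With a profile named `jarvis`, a prompt mentioning `https://jarvis.example.com/status` resolves to `jarvis`, and the server probe would then run before the first model call.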
127
|
+
|
|
128
|
+
**Few-shot behavior injection.** On each session start, nex-code injects a short example of the correct tool sequence for the detected task type (sysadmin → check remote logs first; coding → read file before editing; data → explain before rewriting). Works across all models without fine-tuning. Customize with your own high-scoring sessions via `npm run extract-examples`.
|
|
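As a rough illustration of few-shot injection, the sketch below places one example message after the system prompt, chosen by task type. The three task types come from the text above; the `FEW_SHOT_EXAMPLES` table, the message wording, and the `injectFewShot` name are hypothetical.

```javascript
// Hypothetical example table; the wording of each example is an assumption.
const FEW_SHOT_EXAMPLES = new Map([
  ["sysadmin", "Example: check the remote logs first, then propose a fix."],
  ["coding", "Example: read the file before editing it."],
  ["data", "Example: explain the query before rewriting it."],
]);

function injectFewShot(messages, taskType) {
  const example = FEW_SHOT_EXAMPLES.get(taskType);
  if (!example) return messages; // unknown task type: leave the session untouched
  const shot = { role: "system", content: example };
  // Keep an existing leading system prompt first, then add the example.
  const head = messages.length > 0 && messages[0].role === "system" ? 1 : 0;
  return [...messages.slice(0, head), shot, ...messages.slice(head)];
}
```

Because the example is plain message content, this approach does not depend on any particular model, which is consistent with the "works across all models" claim.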
129
|
+
|
|
126
130
|
**Infrastructure tools built in:**
|
|
127
131
|
|
|
128
132
|
- SSH server management (AlmaLinux, macOS, any Linux)
|
|
@@ -143,7 +147,7 @@ The verify phase catches incomplete work before reporting "done" — if tests fa
|
|
|
143
147
|
|
|
144
148
|
**Extensible.** Plugin API (`registerTool` + lifecycle hooks), skill system (install from any git URL), MCP server support.
|
|
145
149
|
|
|
146
|
-
**Tested.**
|
|
150
|
+
**Tested.** 3719 tests, 83% coverage, CI on every push.
|
|
147
151
|
|
|
148
152
|
---
|
|
149
153
|
|
|
@@ -155,15 +159,33 @@ Rankings are based on nex-code's own `/benchmark` — 15 tool-calling tasks agai
|
|
|
155
159
|
### Flat-Rate / Pay-as-you-go
|
|
156
160
|
|
|
157
161
|
<!-- nex-benchmark-start -->
|
|
158
|
-
<!-- Updated: 2026-03-
|
|
162
|
+
<!-- Updated: 2026-03-29 — run `/benchmark --discover` after new Ollama Cloud releases -->
|
|
159
163
|
|
|
160
164
|
| Rank | Model | Score | Avg Latency | Context | Best For |
|
|
161
165
|
|---|---|---|---|---|---|
|
|
162
|
-
| 🥇 | `
|
|
163
|
-
| 🥈 | `
|
|
164
|
-
| 🥉 | `
|
|
165
|
-
| — | `
|
|
166
|
-
| — | `
|
|
166
|
+
| 🥇 | `qwen3-vl:235b` | **77.1** | 14.4s | 131K | Overall #1 — frontier tool selection, data + agentic tasks |
|
|
167
|
+
| 🥈 | `qwen3-vl:235b-instruct` | 76.3 | 6.5s | 131K | Best latency/score balance — recommended default |
|
|
168
|
+
| 🥉 | `rnj-1:8b` | 74 | 3.7s | 131K | — |
|
|
169
|
+
| — | `ministral-3:8b` | 73.1 | 2.3s | 131K | Fastest strong model — 2.3s latency, 70+ score |
|
|
170
|
+
| — | `qwen3-coder-next` | 71.4 | 2.8s | 256K | — |
|
|
171
|
+
| — | `qwen3-next:80b` | 70.6 | 11.6s | 131K | — |
|
|
172
|
+
| — | `qwen3.5:397b` | 68.9 | 3.9s | 256K | — |
|
|
173
|
+
| — | `minimax-m2.7` | 68.7 | 6.8s | 200K | — |
|
|
174
|
+
| — | `glm-5` | 67.6 | 4.5s | 131K | — |
|
|
175
|
+
| — | `devstral-2:123b` | 67.6 | 2.0s | 131K | Sysadmin + SSH tasks, reliable coding |
|
|
176
|
+
| — | `glm-4.7` | 66.5 | 5.1s | 131K | — |
|
|
177
|
+
| — | `kimi-k2-thinking` | 66.3 | 18.4s | 256K | — |
|
|
178
|
+
| — | `ministral-3:14b` | 65.8 | 3.8s | 131K | — |
|
|
179
|
+
| — | `devstral-small-2:24b` | 65.5 | 2.3s | 131K | Fast sub-agents, simple lookups |
|
|
180
|
+
| — | `ministral-3:3b` | 65.4 | 2.2s | 32K | — |
|
|
181
|
+
| — | `kimi-k2.5` | 65.2 | 3.5s | 256K | Large repos — faster than k2:1t |
|
|
182
|
+
| — | `kimi-k2:1t` | 65.2 | 4.2s | 256K | Large repos (>100K tokens) |
|
|
183
|
+
| — | `minimax-m2.1` | 64.2 | 5.4s | 200K | — |
|
|
184
|
+
| — | `glm-4.6` | 63.9 | 4.9s | 131K | — |
|
|
185
|
+
| — | `qwen3-coder:480b` | 63.2 | 14.1s | 131K | Heavy coding sessions, large context |
|
|
186
|
+
| — | `nemotron-3-super` | 61.3 | 2.6s | 256K | — |
|
|
187
|
+
| — | `gpt-oss:20b` | 60.9 | 2.5s | 131K | Fast small model, good overall score |
|
|
188
|
+
| — | `mistral-large-3:675b` | 60.8 | 3.8s | 131K | — |
|
|
167
189
|
|
|
168
190
|
> Rankings are nex-code-specific: tool name accuracy, argument validity, schema compliance.
|
|
169
191
|
> Toolathon (Minimax SOTA) measures different task types — run `/benchmark --discover` after model releases.
|
|
@@ -451,6 +473,8 @@ Fallback chains let you auto-switch when a provider fails:
|
|
|
451
473
|
/fallback anthropic,openai,local
|
|
452
474
|
```
|
|
453
475
|
|
|
476
|
+
**Wire Protocol Layer:** All 5 providers share 3 wire protocol implementations (OpenAI-compatible SSE, Anthropic Messages SSE, Ollama NDJSON). Stream parsing, tool call accumulation, and response normalization are handled by reusable `StreamParser` classes — eliminating duplicated protocol code across providers.
|
|
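To make the idea of a reusable stream parser concrete, here is a minimal sketch for the NDJSON case. The class name `NdjsonStreamParser` and its callback interface are assumptions for illustration and are not taken from the nex-code source; the point is only that chunk boundaries need not align with line boundaries, so the parser has to buffer partial lines.

```javascript
// Hypothetical class: name and callback shape are assumptions.
class NdjsonStreamParser {
  constructor(onObject) {
    this.buffer = "";
    this.onObject = onObject;
  }

  // Feed a raw text chunk; emit every complete line as a parsed object.
  push(chunk) {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) {
      if (line.trim() !== "") this.onObject(JSON.parse(line));
    }
  }

  // Flush whatever is left when the stream ends.
  end() {
    if (this.buffer.trim() !== "") this.onObject(JSON.parse(this.buffer));
    this.buffer = "";
  }
}
```

A provider would call `push` for each network chunk and `end` when the response closes; tool call accumulation and response normalization would sit on top of the emitted objects.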
477
|
+
|
|
454
478
|
---
|
|
455
479
|
|
|
456
480
|
## Commands
|
|
@@ -473,6 +497,12 @@ Type `/` to see inline suggestions as you type. Tab completion is supported for
|
|
|
473
497
|
| `/load <name>` | Load a saved session |
|
|
474
498
|
| `/sessions` | List saved sessions |
|
|
475
499
|
| `/resume` | Resume last session |
|
|
500
|
+
| `/branches` | Show session tree (all conversation branches) |
|
|
501
|
+
| `/timeline [n]` | Show message timeline of current branch |
|
|
502
|
+
| `/goto <index>` | Jump to a message index (truncates later messages) |
|
|
503
|
+
| `/fork [index] [name]` | Create a new branch at the given message index |
|
|
504
|
+
| `/switch-branch <name>` | Switch to a different conversation branch |
|
|
505
|
+
| `/delete-branch <name>` | Delete a conversation branch |
|
|
476
506
|
| `/remember <text>` | Save a memory (persists across sessions) |
|
|
477
507
|
| `/forget <key>` | Delete a memory |
|
|
478
508
|
| `/memory` | Show all memories |
|
|
@@ -860,6 +890,37 @@ Only sessions from the last 24 hours are offered for auto-resume. Older autosave
|
|
|
860
890
|
|
|
861
891
|
Sessions are stored in `.nex/sessions/` as JSON files. Auto-saves always write to `_autosave` (overwritten each turn). Writes are atomic — a temp file is written and renamed, so a crash mid-write never corrupts the saved state.
|
|
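The write-then-rename pattern mentioned above can be sketched as follows. The helper name `writeJsonAtomic` and the temp-file naming scheme are assumptions; only the general technique (write a temp file, then rename it over the target) is stated in the README.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: the temp-file naming is an assumption.
function writeJsonAtomic(filePath, data) {
  // Keep the temp file in the same directory so the rename stays on one filesystem.
  const tmp = path.join(
    path.dirname(filePath),
    `.${path.basename(filePath)}.${process.pid}.tmp`
  );
  fs.writeFileSync(tmp, JSON.stringify(data, null, 2));
  // Replace the target in one step; readers see the old or the new file, never a partial one.
  fs.renameSync(tmp, filePath);
}
```

If the process dies during `writeFileSync`, only the temp file is incomplete and the previous save remains readable.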
862
892
|
|
|
893
|
+
### Session Trees
|
|
894
|
+
|
|
895
|
+
Navigate your conversation history like git branches. Fork at any point, explore alternative approaches, and switch between branches:
|
|
896
|
+
|
|
897
|
+
```
|
|
898
|
+
/timeline # see message indices
|
|
899
|
+
/fork 5 experiment # branch from message 5
|
|
900
|
+
/branches # see all branches
|
|
901
|
+
/switch-branch main # go back to main
|
|
902
|
+
/goto 3 # jump to message 3 (truncates later messages)
|
|
903
|
+
/delete-branch experiment
|
|
904
|
+
```
|
|
905
|
+
|
|
906
|
+
This enables non-linear conversations: try an approach, and if it doesn't work, fork from an earlier point and try something different — without losing the original attempt.
|
|
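One way to picture the branching behavior is the in-memory sketch below. The data model (a map of branch name to message array) and the function names are hypothetical, and it assumes that forking at index `n` keeps messages `0..n` inclusive; the real session format may differ.

```javascript
// Hypothetical in-memory model; nex-code's real session format may differ.
function createTree(messages) {
  return { current: "main", branches: new Map([["main", [...messages]]]) };
}

// Assumed semantics: the new branch keeps messages 0..index inclusive.
function fork(tree, index, name) {
  const source = tree.branches.get(tree.current);
  if (index < 0 || index >= source.length) throw new RangeError("index out of range");
  if (tree.branches.has(name)) throw new Error(`branch exists: ${name}`);
  tree.branches.set(name, source.slice(0, index + 1)); // copy, so the source branch is untouched
  tree.current = name;
}

// Truncates the current branch after the given message index.
function gotoMessage(tree, index) {
  const branch = tree.branches.get(tree.current);
  if (index < 0 || index >= branch.length) throw new RangeError("index out of range");
  branch.length = index + 1;
}

function switchBranch(tree, name) {
  if (!tree.branches.has(name)) throw new Error(`no such branch: ${name}`);
  tree.current = name;
}
```

Copying on fork is what preserves the original attempt: later edits on `experiment` never touch the messages stored under `main`.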
907
|
+
|
|
908
|
+
### Autoresearch
|
|
909
|
+
|
|
910
|
+
Autonomous optimization loops inspired by Karpathy's autoresearch pattern. The agent edits code, runs experiments, logs results, and automatically keeps improvements or reverts failures:
|
|
911
|
+
|
|
912
|
+
```
|
|
913
|
+
/autoresearch reduce test runtime while maintaining correctness
|
|
914
|
+
/autoresearch optimize bundle size under 500kb
|
|
915
|
+
```
|
|
916
|
+
|
|
917
|
+
The agent follows a repeating cycle: **checkpoint** (git) -> **edit** -> **run experiment** -> **log result** -> **keep or revert**. All experiments are logged to `.nex/autoresearch/experiments.json` with metrics and trend tracking.
|
|
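The cycle above can be sketched as a small loop. Everything here is hypothetical: `checkpoint`, `revert`, and `measure` are injected callbacks standing in for the git and experiment steps, and the sketch assumes a single numeric metric where lower is better.

```javascript
// Hypothetical loop: the callbacks stand in for git checkpoints and experiment runs.
function runExperiments(steps, { checkpoint, revert, measure }) {
  const log = [];
  let best = measure(); // baseline before any edit
  for (const step of steps) {
    checkpoint(); // save state so a failed edit can be undone
    step.edit();
    const metric = measure();
    const kept = metric < best; // assumed: lower is better
    if (kept) {
      best = metric;
    } else {
      revert(); // roll back to the checkpoint
    }
    log.push({ name: step.name, metric, kept });
  }
  return { best, log };
}
```

The returned `log` plays the role of the experiment history: each entry records the measured value and whether the change was kept.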
918
|
+
|
|
919
|
+
```
|
|
920
|
+
/ar-status # show experiment history with trends
|
|
921
|
+
/ar-clear # reset experiment history
|
|
922
|
+
```
|
|
923
|
+
|
|
863
924
|
### Memory
|
|
864
925
|
|
|
865
926
|
Persistent project memory that survives across sessions:
|
|
@@ -1589,10 +1650,12 @@ npm test # Run all tests with coverage
|
|
|
1589
1650
|
npm run test:watch # Watch mode
|
|
1590
1651
|
```
|
|
1591
1652
|
|
|
1592
|
-
|
|
1653
|
+
91 test suites, 3719 tests, 83% statement / 74% branch coverage.
|
|
1593
1654
|
|
|
1594
1655
|
CI runs on GitHub Actions (Node 20 LTS).
|
|
1595
1656
|
|
|
1657
|
+
**Type checking:** `npm run typecheck` runs TypeScript in `noEmit` mode with `allowJs`. Core type definitions live in `types/index.d.ts` (Message, ToolCall, IProvider, IWireProtocol, Session, Skill, etc.). The codebase uses incremental TypeScript adoption — new modules can be written in `.ts` while existing `.js` files are gradually migrated.
|
|
1658
|
+
|
|
1596
1659
|
---
|
|
1597
1660
|
|
|
1598
1661
|
## Dependencies
|