npm - nex-code - Versions diffs - 0.4.20 → 0.4.22 - Mend

nex-code 0.4.20 → 0.4.22

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/README.md +72 -9
package/dist/nex-code.js +533 -492
package/examples/agentic.md +17 -0
package/examples/coding.md +14 -0
package/examples/data.md +14 -0
package/examples/frontend.md +15 -0
package/examples/skills/code-review.md +15 -0
package/examples/skills/docker-deploy.js +17 -0
package/examples/skills/git-workflow.md +23 -0
package/examples/sysadmin.md +15 -0
package/package.json +7 -2

package/README.md CHANGED

@@ -22,7 +22,7 @@
   <img src="https://img.shields.io/badge/Ollama_Cloud-supported-brightgreen.svg" alt="Ollama Cloud: supported">
   <img src="https://img.shields.io/badge/node-%3E%3D18-brightgreen.svg" alt="Node >= 18">
   <img src="https://img.shields.io/badge/dependencies-2-green.svg" alt="Dependencies: 2">
-  <img src="https://img.shields.io/badge/tests-3453-blue.svg" alt="Tests: 3453">
+  <img src="https://img.shields.io/badge/tests-3719-blue.svg" alt="Tests: 3719">
   <img src="https://img.shields.io/badge/VS_Code-extension-007ACC.svg" alt="VS Code extension">
 </p>
@@ -123,6 +123,10 @@ The verify phase catches incomplete work before reporting "done" — if tests fa
 **Lightweight.** 2 runtime dependencies (`axios`, `dotenv`). Starts in ~100ms. No Python, no heavy runtime, no daemon process.
+**Server-aware from the first message.** When your prompt contains a URL whose domain matches a configured SSH profile (e.g. `jarvis.example.com` → profile `jarvis`), nex-code probes the server before responding — listing ports, running processes, and data directories. The model receives this topology before its first token, so it goes straight to `ssh_exec` instead of reading local files.
+**Few-shot behavior injection.** On each session start, nex-code injects a short example of the correct tool sequence for the detected task type (sysadmin → check remote logs first; coding → read file before editing; data → explain before rewriting). Works across all models without fine-tuning. Customize with your own high-scoring sessions via `npm run extract-examples`.
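Note on the "Server-aware" paragraph: the diff does not show how a URL is matched to an SSH profile. The sketch below is one plausible reading, assuming the profile name is compared against the leftmost label of the hostname, as in the `jarvis.example.com` → `jarvis` example. `hostnamesIn` and `matchProfile` are hypothetical names, not functions from the package.

```javascript
// Hypothetical sketch: the function names and the matching rule are assumptions,
// not code taken from nex-code.

/** Collect the hostnames of all http(s) URLs found in a prompt. */
function hostnamesIn(prompt) {
  const matches = prompt.match(/https?:\/\/[^\s)>"']+/g) || [];
  const hosts = [];
  for (const raw of matches) {
    try {
      hosts.push(new URL(raw).hostname.toLowerCase());
    } catch {
      // Skip strings that look like URLs but do not parse.
    }
  }
  return hosts;
}

/** Return the first profile whose name equals the leftmost hostname label, or null. */
function matchProfile(prompt, profileNames) {
  const known = new Set(profileNames.map((name) => name.toLowerCase()));
  for (const host of hostnamesIn(prompt)) {
    const firstLabel = host.split(".")[0];
    if (known.has(firstLabel)) return firstLabel;
  }
  return null;
}
```

Under these assumptions, `matchProfile("Why is https://jarvis.example.com/api slow?", ["jarvis", "atlas"])` yields `"jarvis"`, and a prompt without a matching URL yields `null`, so the server probe would be skipped.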
 **Infrastructure tools built in:**
 - SSH server management (AlmaLinux, macOS, any Linux)
@@ -143,7 +147,7 @@ The verify phase catches incomplete work before reporting "done" — if tests fa
 **Extensible.** Plugin API (`registerTool` + lifecycle hooks), skill system (install from any git URL), MCP server support.
-**Tested.** 3453 tests, 79% coverage, CI on every push.
+**Tested.** 3719 tests, 83% coverage, CI on every push.
 ---
@@ -155,15 +159,33 @@ Rankings are based on nex-code's own `/benchmark` — 15 tool-calling tasks agai
 ### Flat-Rate / Pay-as-you-go
 <!-- nex-benchmark-start -->
-<!-- Updated: 2026-03-26 — run `/benchmark --discover` after new Ollama Cloud releases -->
+<!-- Updated: 2026-03-29 — run `/benchmark --discover` after new Ollama Cloud releases -->
 | Rank | Model | Score | Avg Latency | Context | Best For |
 |---|---|---|---|---|---|
-| 🥇 | `devstral-2:123b` | **82.5** | 1.7s | 131K | Default — fastest + most reliable tool selection |
-| 🥈 | `devstral-small-2:24b` | 75 | 3.1s | 131K | Fast sub-agents, simple lookups |
-| 🥉 | `qwen3-coder:480b` | 72.5 | 8.4s | 131K | Coding-heavy sessions, heavy sub-agents |
-| — | `kimi-k2:1t` | 67.5 | 6.6s | 256K | Large repos (>100K tokens) |
-| — | `minimax-m2.7:cloud` | 64.1 | 5.0s | 200K | Complex swarm / multi-agent sessions (Toolathon SOTA) |
+| 🥇 | `qwen3-vl:235b` | **77.1** | 14.4s | 131K | Overall #1 — frontier tool selection, data + agentic tasks |
+| 🥈 | `qwen3-vl:235b-instruct` | 76.3 | 6.5s | 131K | Best latency/score balance — recommended default |
+| 🥉 | `rnj-1:8b` | 74 | 3.7s | 131K | — |
+| — | `ministral-3:8b` | 73.1 | 2.3s | 131K | Fastest strong model — 2.3s latency, 70+ score |
+| — | `qwen3-coder-next` | 71.4 | 2.8s | 256K | — |
+| — | `qwen3-next:80b` | 70.6 | 11.6s | 131K | — |
+| — | `qwen3.5:397b` | 68.9 | 3.9s | 256K | — |
+| — | `minimax-m2.7` | 68.7 | 6.8s | 200K | — |
+| — | `glm-5` | 67.6 | 4.5s | 131K | — |
+| — | `devstral-2:123b` | 67.6 | 2.0s | 131K | Sysadmin + SSH tasks, reliable coding |
+| — | `glm-4.7` | 66.5 | 5.1s | 131K | — |
+| — | `kimi-k2-thinking` | 66.3 | 18.4s | 256K | — |
+| — | `ministral-3:14b` | 65.8 | 3.8s | 131K | — |
+| — | `devstral-small-2:24b` | 65.5 | 2.3s | 131K | Fast sub-agents, simple lookups |
+| — | `ministral-3:3b` | 65.4 | 2.2s | 32K | — |
+| — | `kimi-k2.5` | 65.2 | 3.5s | 256K | Large repos — faster than k2:1t |
+| — | `kimi-k2:1t` | 65.2 | 4.2s | 256K | Large repos (>100K tokens) |
+| — | `minimax-m2.1` | 64.2 | 5.4s | 200K | — |
+| — | `glm-4.6` | 63.9 | 4.9s | 131K | — |
+| — | `qwen3-coder:480b` | 63.2 | 14.1s | 131K | Heavy coding sessions, large context |
+| — | `nemotron-3-super` | 61.3 | 2.6s | 256K | — |
+| — | `gpt-oss:20b` | 60.9 | 2.5s | 131K | Fast small model, good overall score |
+| — | `mistral-large-3:675b` | 60.8 | 3.8s | 131K | — |
 > Rankings are nex-code-specific: tool name accuracy, argument validity, schema compliance.
 > Toolathon (Minimax SOTA) measures different task types — run `/benchmark --discover` after model releases.
@@ -451,6 +473,8 @@ Fallback chains let you auto-switch when a provider fails:
 /fallback anthropic,openai,local
 ```
+**Wire Protocol Layer:** All 5 providers share 3 wire protocol implementations (OpenAI-compatible SSE, Anthropic Messages SSE, Ollama NDJSON). Stream parsing, tool call accumulation, and response normalization are handled by reusable `StreamParser` classes — eliminating duplicated protocol code across providers.
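Note on the "Wire Protocol Layer" paragraph: the `StreamParser` classes themselves are in the bundled `dist/nex-code.js` and are not reproduced here. As a rough illustration of the NDJSON case only, the sketch below buffers partial lines across chunks and emits one parsed object per complete line. `NdjsonParser` is a hypothetical name and is not claimed to match the package's implementation.

```javascript
// Hypothetical sketch of newline-delimited JSON parsing; not the package's StreamParser.
class NdjsonParser {
  constructor() {
    // Holds the incomplete trailing line between chunks.
    this.buffer = "";
  }

  /** Feed one text chunk; returns the JSON values completed by this chunk. */
  push(chunk) {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last piece has no terminating newline yet, so keep it for the next chunk.
    this.buffer = lines.pop();
    const values = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed !== "") values.push(JSON.parse(trimmed));
    }
    return values;
  }
}
```

The point of the buffer is that network chunks can split a JSON object anywhere. Feeding `'{"a":1}\n{"b"'` returns only the first object, and the second appears once `':2}\n'` arrives.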
 ---
 ## Commands
@@ -473,6 +497,12 @@ Type `/` to see inline suggestions as you type. Tab completion is supported for
 | `/load <name>`                                | Load a saved session                                                                                                                               |
 | `/sessions`                                   | List saved sessions                                                                                                                                |
 | `/resume`                                     | Resume last session                                                                                                                                |
+| `/branches`                                   | Show session tree (all conversation branches)                                                                                                      |
+| `/timeline [n]`                               | Show message timeline of current branch                                                                                                            |
+| `/goto <index>`                               | Jump to a message index (truncates later messages)                                                                                                 |
+| `/fork [index] [name]`                        | Create a new branch at the given message index                                                                                                     |
+| `/switch-branch <name>`                       | Switch to a different conversation branch                                                                                                          |
+| `/delete-branch <name>`                       | Delete a conversation branch                                                                                                                       |
 | `/remember <text>`                            | Save a memory (persists across sessions)                                                                                                           |
 | `/forget <key>`                               | Delete a memory                                                                                                                                    |
 | `/memory`                                     | Show all memories                                                                                                                                  |
@@ -860,6 +890,37 @@ Only sessions from the last 24 hours are offered for auto-resume. Older autosave
 Sessions are stored in `.nex/sessions/` as JSON files. Auto-saves always write to `_autosave` (overwritten each turn). Writes are atomic — a temp file is written and renamed, so a crash mid-write never corrupts the saved state.
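Note on the atomic-write sentence: writing to a temp file and renaming it over the target is a common pattern. A minimal sketch with Node's `fs` module could look like the following. `writeJsonAtomic` and the temp-file naming are assumptions for illustration, not the package's code.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: writes JSON to a sibling temp file, then renames it into place.
function writeJsonAtomic(filePath, data) {
  const dir = path.dirname(filePath);
  // Keep the temp file in the same directory so the rename stays on one filesystem.
  const tmpPath = path.join(dir, `.${path.basename(filePath)}.${process.pid}.tmp`);
  fs.writeFileSync(tmpPath, JSON.stringify(data, null, 2));
  // The rename swaps the new file in as a single step, so a reader sees either
  // the old content or the new content, not a partially written file.
  fs.renameSync(tmpPath, filePath);
}
```

If the process dies during `writeFileSync`, only the temp file is incomplete and the previous session file is left untouched.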
+### Session Trees
+Navigate your conversation history like git branches. Fork at any point, explore alternative approaches, and switch between branches:
+```
+/timeline            # see message indices
+/fork 5 experiment   # branch from message 5
+/branches            # see all branches
+/switch-branch main  # go back to main
+/goto 3              # jump to message 3 (truncates later messages)
+/delete-branch experiment
+```
+This enables non-linear conversations: try an approach, and if it doesn't work, fork from an earlier point and try something different — without losing the original attempt.
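Note on "Session Trees": the diff documents the commands but not the underlying data model. The sketch below is a hypothetical in-memory model of the same operations. It assumes a fork keeps messages 0 through the given index inclusive and switches to the new branch; the README does not state either detail.

```javascript
// Hypothetical in-memory model; nex-code's real storage format is not shown in the diff.
class SessionTree {
  constructor() {
    this.branches = new Map([["main", []]]);
    this.current = "main";
  }

  /** Messages of the branch that is currently checked out. */
  get messages() {
    return this.branches.get(this.current);
  }

  add(message) {
    this.messages.push(message);
  }

  /** Like `/fork <index> <name>`. Assumption: keeps messages 0..index inclusive. */
  fork(index, name) {
    if (this.branches.has(name)) throw new Error(`branch exists: ${name}`);
    this.branches.set(name, this.messages.slice(0, index + 1));
    this.current = name; // Assumption: forking also switches to the new branch.
  }

  /** Like `/switch-branch <name>`. */
  switchBranch(name) {
    if (!this.branches.has(name)) throw new Error(`no such branch: ${name}`);
    this.current = name;
  }

  /** Like `/goto <index>`: drops every message after the given index. */
  goto(index) {
    const keep = Math.max(0, Math.min(this.messages.length, index + 1));
    this.messages.length = keep;
  }

  /** Like `/delete-branch <name>`; refuses to delete the active branch. */
  deleteBranch(name) {
    if (name === this.current) throw new Error("cannot delete the active branch");
    this.branches.delete(name);
  }
}
```

Because `fork` copies the message prefix, messages added on the new branch leave the original branch unchanged, which is the "without losing the original attempt" property described above.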
+### Autoresearch
+Autonomous optimization loops inspired by Karpathy's autoresearch pattern. The agent edits code, runs experiments, logs results, and automatically keeps improvements or reverts failures:
+```
+/autoresearch reduce test runtime while maintaining correctness
+/autoresearch optimize bundle size under 500kb
+```
+The agent follows a repeating cycle: **checkpoint** (git) -> **edit** -> **run experiment** -> **log result** -> **keep or revert**. All experiments are logged to `.nex/autoresearch/experiments.json` with metrics and trend tracking.
+```
+/ar-status    # show experiment history with trends
+/ar-clear     # reset experiment history
+```
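Note on "Autoresearch": the cycle of checkpoint, edit, run experiment, log result, then keep or revert can be sketched as a small loop. In the hypothetical version below a plain variable stands in for the git checkpoint, `edits` are pure functions that propose a new state, and a lower metric counts as better. None of these names come from the package.

```javascript
// Hypothetical sketch of a keep-or-revert loop; all names are illustrative.
function autoresearch(initial, edits, measure) {
  let state = initial;
  let best = measure(state);
  const log = [];

  for (const [i, edit] of edits.entries()) {
    const checkpoint = state;            // checkpoint (stand-in for a git commit)
    const candidate = edit(checkpoint);  // edit
    const metric = measure(candidate);   // run experiment
    const kept = metric < best;          // lower is better in this sketch
    log.push({ experiment: i, metric, kept }); // log result
    if (kept) {
      state = candidate;                 // keep the improvement
      best = metric;
    }
    // Otherwise `state` still equals the checkpoint, which is the revert.
  }
  return { state, best, log };
}
```

Each log entry records the metric and whether the edit was kept. That is roughly the kind of history `/ar-status` is described as showing, although the actual schema of `experiments.json` is not given in the diff.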
 ### Memory
 Persistent project memory that survives across sessions:
@@ -1589,10 +1650,12 @@ npm test              # Run all tests with coverage
 npm run test:watch    # Watch mode
 ```
-83 test suites, 3453 tests, 79% statement / 71% branch coverage.
+91 test suites, 3719 tests, 83% statement / 74% branch coverage.
 CI runs on GitHub Actions (Node 20 LTS).
+**Type checking:** `npm run typecheck` runs TypeScript in `noEmit` mode with `allowJs`. Core type definitions live in `types/index.d.ts` (Message, ToolCall, IProvider, IWireProtocol, Session, Skill, etc.). The codebase uses incremental TypeScript adoption — new modules can be written in `.ts` while existing `.js` files are gradually migrated.
 ---
 ## Dependencies