npm: nex-code 0.5.1 → 0.5.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/README.md CHANGED

@@ -60,13 +60,23 @@ On first launch, an interactive setup wizard guides you through provider and cre
 | Startup time | ~100ms | 1-4s |
 | Runtime deps | 2 | heavy |
 | Infra tools | SSH, Docker, K8s built-in | no |
+<<<<<<< Updated upstream
 **Smart model routing.** The built-in `/benchmark` tests all configured models across 62 tool-calling tasks in 5 categories and auto-routes to the best model per task type.
 **Phase-based execution.** Tasks run through Plan (analyze) -> Implement (code) -> Verify (test) phases, each with the optimal model. Auto-loops back on test failures.
-**45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review.
+**45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review. See [Tools](#tools) for the full list.
+=======
+**Smart model routing.** The built-in `/benchmark` tests all configured models across 62 tool-calling tasks in 5 categories and auto-routes to the best model per task type.
+**Phase-based execution.** Tasks run through Plan (analyze) -> Implement (code) -> Verify (test) phases, each with the optimal model. Auto-loops back on test failures.
+**45 built-in tools** across file ops, git, SSH, Docker, Kubernetes, deploy, browser, GitHub Actions, and visual review. See [Tools](#tools) for the full list.
+>>>>>>> Stashed changes
 **2 runtime dependencies** (`axios`, `dotenv`). Starts in ~100ms. No Python, no heavy runtime.
 ---
@@ -76,17 +86,42 @@ On first launch, an interactive setup wizard guides you through provider and cre
 Rankings from nex-code's own `/benchmark` — 62 tasks testing tool selection, argument validity, and schema compliance.
 <!-- nex-benchmark-start -->
-<!-- Updated: 2026-04-01 — run `/benchmark --discover` after new Ollama Cloud releases -->
+<!-- Updated: 2026-04-05 — run `/benchmark --discover` after new Ollama Cloud releases -->
-| Rank | Model | Score | Latency | Context | Best For |
+| Rank | Model | Score | Avg Latency | Context | Best For |
 |---|---|---|---|---|---|
-| 1 | `qwen3-vl:235b-instruct` | **79.9** | 3.8s | 131K | Best latency/score balance |
-| 2 | `qwen3-vl:235b` | 79.4 | 12.3s | 131K | Frontier tool selection |
-| 3 | `qwen3-coder-next` | 74.9 | 1.7s | 256K | — |
-| 4 | `ministral-3:8b` | 74.2 | 1.2s | 131K | Fastest strong model |
-| 5 | `devstral-2:123b` | 69.9 | 1.6s | 131K | Sysadmin/SSH, reliable |
-> Run `/benchmark --discover` to detect new models and auto-update this table.
+| 🥇 | `qwen3-vl:235b` | **79** | 12.4s | 131K | Overall #1 — frontier tool selection, data + agentic tasks |
+| 🥈 | `qwen3-vl:235b-instruct` | 78.2 | 5.3s | 131K | Best latency/score balance — recommended default |
+| 🥉 | `nemotron-3-super` | 78.1 | 3.5s | 256K | — |
+| — | `rnj-1:8b` | 77.4 | 3.9s | 131K | — |
+| — | `mistral-large-3:675b` | 76.5 | 3.9s | 131K | — |
+| — | `gpt-oss:20b` | 76.5 | 1.9s | 131K | Fast small model, good overall score |
+| — | `qwen3-coder-next` | 75.7 | 2.2s | 256K | — |
+| — | `qwen3-next:80b` | 75.1 | 11.1s | 131K | — |
+| — | `ministral-3:8b` | 73.8 | 2.0s | 131K | Fastest strong model — 2.2s latency, 70+ score |
+| — | `deepseek-v3.1:671b` | 73.6 | 2.9s | 131K | — |
+| — | `devstral-2:123b` | 73.2 | 2.0s | 131K | Sysadmin + SSH tasks, reliable coding |
+| — | `kimi-k2:1t` | 72.2 | 5.6s | 256K | Large repos (>100K tokens) |
+| — | `ministral-3:3b` | 72 | 1.6s | 32K | — |
+| — | `devstral-small-2:24b` | 71.7 | 2.6s | 131K | Fast sub-agents, simple lookups |
+| — | `qwen3.5:397b` | 70.7 | 4.2s | 256K | — |
+| — | `qwen3-coder:480b` | 70.1 | 6.0s | 131K | Heavy coding sessions, large context |
+| — | `minimax-m2.1` | 69.9 | 3.0s | 200K | — |
+| — | `gemma4:31b` | 69.3 | 2.8s | ? | — |
+| — | `glm-4.7` | 69.1 | 5.3s | 131K | — |
+| — | `kimi-k2-thinking` | 69 | 3.1s | 256K | — |
+| — | `ministral-3:14b` | 68.8 | 2.0s | 131K | — |
+| — | `kimi-k2.5` | 68.7 | 3.4s | 256K | Large repos — faster than k2:1t |
+| — | `minimax-m2.7` | 68.4 | 5.5s | 200K | — |
+| — | `glm-4.6` | 67.8 | 4.7s | 131K | — |
+| — | `glm-5` | 67.4 | 5.0s | 131K | — |
+| — | `gpt-oss:120b` | 64.8 | 3.4s | 131K | — |
+| — | `nemotron-3-nano:30b` | 64.7 | 2.3s | 131K | — |
+| — | `minimax-m2.5` | 61.9 | 2.7s | 131K | Multi-agent, large context |
+| — | `minimax-m2` | 60.6 | 4.3s | 200K | — |
+> Rankings are nex-code-specific: tool name accuracy, argument validity, schema compliance.
+> Toolathon (Minimax SOTA) measures different task types — run `/benchmark --discover` after model releases.
 <!-- nex-benchmark-end -->
 **Recommended `.env`:**
@@ -111,7 +146,7 @@ OLLAMA_API_KEY=your-key       # Ollama Cloud
 OPENAI_API_KEY=your-key       # OpenAI
 ANTHROPIC_API_KEY=your-key    # Anthropic
 GEMINI_API_KEY=your-key       # Gemini
-PERPLEXITY_API_KEY=your-key   # optional — grounded web search
+PERPLEXITY_API_KEY=your-key   # optional — enables grounded web search
 DEFAULT_PROVIDER=ollama
 DEFAULT_MODEL=devstral-2:123b
@@ -136,18 +171,20 @@ cp .env.example .env && npm link && npm run install-hooks
 > the /users endpoint returns 500 — find the bug and fix it
 ```
-**YOLO mode** — skip all confirmations, auto-runs `caffeinate` on macOS:
+### YOLO Mode
+Skip all confirmations — file changes, dangerous commands, and tool permissions are auto-approved. Auto-runs `caffeinate` on macOS.
 ```bash
 nex-code -yolo
 ```
-**Headless / Programmatic:**
+### Headless / Programmatic Mode
 ```bash
 nex-code --task "refactor src/index.js to async/await" --yolo
 nex-code --prompt-file /tmp/task.txt --yolo --json
-nex-code --daemon          # background watcher (reads .nex/daemon.json)
+nex-code --daemon          # watch mode: fires tasks on file changes, git commits, or cron
 ```
 | Flag | Description |
@@ -156,33 +193,39 @@ nex-code --daemon          # background watcher (reads .nex/daemon.json)
 | `--prompt-file <path>` | Read prompt from file |
 | `--yolo` | Skip all confirmations |
 | `--server` | JSON-lines IPC server (VS Code extension) |
-| `--daemon` | Background watcher (file changes, git commits, cron) |
+| `--daemon` | Background watcher (reads `.nex/daemon.json`) |
 | `--flatrate` | 100 turns, 6 parallel agents, 5 retries |
 | `--json` | JSON output to stdout |
 | `--max-turns <n>` | Override agentic loop limit |
-| `--model <spec>` | e.g. `anthropic:claude-sonnet-4-6` |
+| `--model <spec>` | Use specific model (e.g. `anthropic:claude-sonnet-4-6`) |
 | `--debug` | Show diagnostic messages |
+<<<<<<< Updated upstream
+### Vision / Screenshot
+=======
-**Vision / Screenshot:**
+### Vision / Screenshot
+>>>>>>> Stashed changes
 ```
 > /path/to/screenshot.png implement this UI in React
 > analyze https://example.com/mockup.png and implement it
-> what's wrong with the layout in my clipboard    # macOS clipboard
+> what's wrong with the layout in my clipboard    # macOS clipboard capture
 > screenshot localhost:3000 and review the navbar spacing
 ```
-Formats: PNG, JPG, GIF, WebP, BMP. Works with all providers that support vision.
+Works with Anthropic, OpenAI, Gemini, and Ollama vision models. Formats: PNG, JPG, GIF, WebP, BMP.
 ---
 ## Providers & Models
 ```
-/model                      # interactive picker
-/model openai:gpt-4o        # switch directly
-/providers                  # list all
-/fallback anthropic,openai  # auto-switch on failure
+/model                         # interactive picker
+/model openai:gpt-4o           # switch directly
+/providers                     # list all
+/fallback anthropic,openai     # auto-switch on failure
 ```
 | Provider | Models | Env Variable |
@@ -197,12 +240,13 @@ Formats: PNG, JPG, GIF, WebP, BMP. Works with all providers that support vision.
 ## Commands
-Type `/` for inline suggestions. Tab completion for commands and file paths.
+Type `/` to see inline suggestions. Tab completion for slash commands and file paths.
 | Command | Description |
 |---|---|
 | `/help` | Full help |
-| `/model [spec]` / `/providers` | Switch model / list all |
+| `/model [spec]` | Show/switch model |
+| `/providers` | List providers |
 | `/clear` | Clear conversation |
 | `/save` / `/load` / `/sessions` / `/resume` | Session management |
 | `/branches` / `/fork` / `/switch-branch` / `/goto` | Session tree navigation |
@@ -227,7 +271,7 @@ Type `/` for inline suggestions. Tab completion for commands and file paths.
 ## Tools
-45 built-in tools:
+45 built-in tools organized by category:
 **Core:** `bash`, `read_file`, `write_file`, `edit_file`, `patch_file`, `list_directory`, `search_files`, `glob`, `grep`
@@ -245,7 +289,7 @@ Type `/` for inline suggestions. Tab completion for commands and file paths.
 **Deploy:** `deploy`, `deployment_status`
-**Frontend:** `frontend_recon` — scans design tokens, layout, and framework stack before any frontend work
+**Frontend:** `frontend_recon` — scans design tokens, layout, framework stack before any frontend work
 **Visual:** `visual_diff`, `responsive_sweep`, `visual_annotate`, `visual_watch`, `design_tokens`, `design_compare`
@@ -257,7 +301,7 @@ Additional tools via [MCP servers](#mcp) or [Skills](#skills).
 ### Multi-Agent Orchestrator
-Multi-goal prompts auto-decompose into parallel sub-agents (up to 5 simultaneously, with file locking):
+Multi-goal prompts auto-decompose into parallel sub-agents. Up to 5 agents run simultaneously with file locking.
 ```bash
 nex-code --task "fix type errors in src/, add JSDoc to utils/, update CHANGELOG"
@@ -265,24 +309,28 @@ nex-code --task "fix type errors in src/, add JSDoc to utils/, update CHANGELOG"
 ### Autoresearch
-Autonomous optimization loops — edit → experiment → keep/revert, on a dedicated git branch:
+Autonomous optimization loops: edit -> experiment -> keep/revert, on a dedicated branch.
 ```
 /autoresearch reduce test runtime while maintaining correctness
-/ar-self-improve          # uses nex-code's own benchmark as the fitness metric
+/ar-self-improve          # self-improvement using nex-code's benchmark
 ```
 ### Plan Mode
-Auto-activates for implementation tasks. Read-only analysis first, approve before writes. Hard-enforced tool restrictions during analysis phase.
+Auto-activates for implementation tasks. Read-only analysis first, approve before writes. Hard-enforced tool restrictions.
 ### Daemon / Watch Mode
+<<<<<<< Updated upstream
+Background process that fires tasks on file changes, git commits, or cron schedule. Configured via `.nex/daemon.json`. Desktop and Matrix notifications.
+=======
 Background process that fires tasks on file changes, git commits, or cron schedule. Configured via `.nex/daemon.json`. Desktop and Matrix notifications.
+>>>>>>> Stashed changes
 ### Session Trees
-Navigate conversation history like git branches — fork, switch, goto, delete branches without losing prior attempts.
+Navigate conversation history like git branches — fork, switch, goto, delete branches.
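One way to picture the git-like session navigation described above is a tree of messages with a movable head. The sketch below is purely illustrative: the `SessionTree` class and its `append`, `goto`, and `history` methods are hypothetical names, not nex-code's actual API or storage format.

```javascript
// Hypothetical sketch of a session tree; names are illustrative, not nex-code's API.
class SessionTree {
  constructor() {
    this.nodes = new Map(); // id -> { id, parentId, message }
    this.nextId = 1;
    this.head = null;       // id of the current position, null while empty
  }

  // Append a message as a child of the current head and move the head to it.
  append(message) {
    const id = this.nextId++;
    this.nodes.set(id, { id, parentId: this.head, message });
    this.head = id;
    return id;
  }

  // Move the head to an existing node; the next append starts a new branch there.
  goto(id) {
    if (!this.nodes.has(id)) throw new Error(`unknown node ${id}`);
    this.head = id;
  }

  // Messages from the root to the current head, i.e. the active conversation.
  history() {
    const path = [];
    for (let id = this.head; id !== null; id = this.nodes.get(id).parentId) {
      path.unshift(this.nodes.get(id).message);
    }
    return path;
  }
}
```

In this model, forking is just `goto` followed by `append`: earlier attempts stay in the map and can be revisited by moving the head back to them.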
 ### Safety
@@ -295,8 +343,6 @@ Navigate conversation history like git branches — fork, switch, goto, delete b
 Pre-push secret detection, audit logging (JSONL), persistent undo/redo, cost limits, auto plan mode.
-**Reporting vulnerabilities:** Email **security@schoensgibl.com** — not a public issue. 72h initial response.
 ### Open-Source Model Robustness
 - **5-layer argument parsing** — JSON, trailing fix, extraction, key repair, fence stripping
@@ -304,10 +350,37 @@ Pre-push secret detection, audit logging (JSONL), persistent undo/redo, cost lim
 - **Auto-fix engine** — path resolution, edit fuzzy matching (Levenshtein), bash error hints
 - **Tool tiers** — essential (5) / standard (21) / full (45), auto-selected per model capability
 - **Stale stream recovery** — progressive retry with context compression on stall
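The Levenshtein-based fuzzy matching named in the auto-fix bullet can be illustrated with a small sketch. This is not nex-code's implementation: `levenshtein` and `closestLine` are hypothetical helpers, and the idea of matching a failed edit target against individual file lines with a fixed distance threshold is an assumption made for illustration.

```javascript
// Illustrative sketch only: these helpers are hypothetical, not nex-code's code.

// Dynamic-programming Levenshtein distance using two rolling rows.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the line closest to `target`, or null if nothing is within `maxDistance`.
function closestLine(lines, target, maxDistance = 3) {
  let best = null;
  for (const [index, line] of lines.entries()) {
    const distance = levenshtein(line.trim(), target.trim());
    if (distance <= maxDistance && (best === null || distance < best.distance)) {
      best = { index, line, distance };
    }
  }
  return best;
}
```

A caller could use `closestLine(fileLines, requestedText)` to recover from a model that quotes a line with a small typo, and fall back to an error when the result is `null`.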
+<<<<<<< Updated upstream
+### Visual Development Tools
+Pixel-level before/after comparison, responsive sweeps (320-1920px), annotation overlays, design token extraction, and live-reload diff watching. Pure image tools work standalone; browser-based tools need Playwright.
+---
+## Extensibility
+### Skills
+Drop `.md` or `.js` files in `.nex/skills/` for project-specific knowledge, commands, and tools. Global skills in `~/.nex-code/skills/`. Install from git: `/install-skill user/repo`.
+### Plugins
+Custom tools and lifecycle hooks via `.nex/plugins/`. Events: `onToolResult`, `onModelResponse`, `onSessionStart`, `onSessionEnd`, `onFileChange`, `beforeToolExec`, `afterToolExec`.
+### MCP
+Connect external tool servers via [Model Context Protocol](https://modelcontextprotocol.io). Configure in `.nex/mcp.json` with env var interpolation.
+### Hooks
+Run custom scripts on CLI events (`pre-tool`, `post-tool`, `pre-commit`, `post-response`, `session-start`, `session-end`). Configure in `.nex/config.json` or `.nex/hooks/`.
+---
+=======
 ### Visual Development Tools
-Pixel-level before/after diffs, responsive sweeps (320–1920px), annotation overlays, design token extraction, and live-reload diff watching. Pure image tools work standalone; browser tools need Playwright.
+Pixel-level before/after comparison, responsive sweeps (320-1920px), annotation overlays, design token extraction, and live-reload diff watching. Pure image tools work standalone; browser-based tools need Playwright.
 ---
@@ -331,13 +404,14 @@ Run custom scripts on CLI events (`pre-tool`, `post-tool`, `pre-commit`, `post-r
 ---
+>>>>>>> Stashed changes
 ## VS Code Extension
 Built-in sidebar chat panel (`vscode/`) with streaming output, collapsible tool cards, and native theme support. Spawns `nex-code --server` over JSON-lines IPC.
 ```bash
 cd vscode && npm install && npm run package
-# Cmd+Shift+P → Extensions: Install from VSIX...
+# Cmd+Shift+P -> Extensions: Install from VSIX...
 ```
 ---
@@ -352,7 +426,11 @@ cli/
   tools/index.js         # 45 tool definitions + auto-fix engine
   context-engine.js      # Token management + 5-phase compression
   sub-agent.js           # Parallel sub-agents with file locking
-  orchestrator.js        # Multi-agent decompose → execute → synthesize
+<<<<<<< Updated upstream
+orchestrator.js        # Multi-agent decompose -> execute -> synthesize
+=======
+  orchestrator.js        # Multi-agent decompose -> execute -> synthesize
+>>>>>>> Stashed changes
   session-tree.js        # Session branching
   visual.js              # Visual dev tools (pixelmatch-based)
   browser.js             # Playwright browser agent
@@ -366,10 +444,10 @@ See [DEVELOPMENT.md](DEVELOPMENT.md) for full architecture details.
 ## Testing
 ```bash
-npm test                          # 97 suites, 3920 tests
-npm run typecheck                 # TypeScript noEmit check
-npm run benchmark:gate            # 7-task smoke test (blocks push on regression)
-npm run benchmark:reallife        # 35 real-world tasks across 7 categories
+npm test              # 97 suites, 3920 tests
+npm run typecheck     # TypeScript noEmit check
+npm run benchmark:gate        # 7-task smoke test (blocks push on regression)
+npm run benchmark:reallife    # 35 real-world tasks across 7 categories
 ```
 ---
@@ -379,10 +457,12 @@ npm run benchmark:reallife        # 35 real-world tasks across 7 categories
 - Pre-push secret detection (API keys, private keys, hardcoded credentials)
 - Audit logging with automatic argument sanitization
 - Sensitive path blocking (`.ssh/`, `.aws/`, `.env`, credentials)
-- Shell injection protection via `execFileSync` with argument arrays throughout
+- Shell injection protection via `execFileSync` with argument arrays
 - SSRF protection on `web_fetch`
 - MCP environment isolation
+**Reporting vulnerabilities:** Email **security@schoensgibl.com** (not a public issue). Allow 72h for initial response.
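The `execFileSync` bullet above refers to a standard Node.js pattern: arguments are passed as an array and no shell is spawned, so shell metacharacters in a value are never interpreted. The sketch below shows the pattern in isolation; `runSafe` is a hypothetical helper, not a function from nex-code.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical helper: run a program with its arguments passed as an array.
// No shell is involved, so metacharacters in `args` are passed through literally.
function runSafe(file, args) {
  return execFileSync(file, args, { encoding: 'utf8' });
}

// A hostile-looking value stays a single argument instead of becoming a second command.
const untrusted = 'hello; echo INJECTED';
const output = runSafe(process.execPath, [
  '-e',
  'process.stdout.write(process.argv[process.argv.length - 1])',
  untrusted,
]);
```

Here the child process simply echoes its last argument, so `output` holds the untrusted string unchanged; building one command string and running it through a shell would have executed `echo INJECTED` as a separate command.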
 ---
 ## License