npm - @vtstech/pi-model-test - Versions diffs - 1.0.4 → 1.0.5 - Mend

@vtstech/pi-model-test 1.0.4 → 1.0.5

This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between the two versions as they appear in the public registry.

Files changed (3)

package/README.md ADDED

@@ -0,0 +1,61 @@
+# @vtstech/pi-model-test
+Model benchmark extension for the [Pi Coding Agent](https://github.com/badlogic/pi-mono).
+Test any model for reasoning, tool usage, and instruction following — works with Ollama and cloud providers.
+## Install
+```bash
+pi install "npm:@vtstech/pi-model-test"
+```
+## Commands
+```bash
+/model-test                     Test current Pi model (auto-detects provider)
+/model-test qwen3:0.6b          Test a specific Ollama model
+/model-test --all               Test every Ollama model
+```
+## Test Suites
+### Ollama (6 tests)
+| Test | Scoring |
+|------|---------|
+| Reasoning (snail puzzle) | STRONG / MODERATE / WEAK / FAIL |
+| Thinking token support | SUPPORTED / NOT SUPPORTED |
+| Tool usage (native + text) | STRONG / MODERATE / WEAK / FAIL |
+| ReAct parsing | STRONG / MODERATE / WEAK / FAIL |
+| Instruction following (JSON) | STRONG / MODERATE / WEAK / FAIL |
+| Tool support detection | NATIVE / REACT / NONE |
+### Cloud Providers (4 tests)
+| Test | Scoring |
+|------|---------|
+| Connectivity | OK / FAIL |
+| Reasoning | STRONG / MODERATE / WEAK / FAIL |
+| Instruction following | STRONG / MODERATE / WEAK / FAIL |
+| Tool usage (function calling) | STRONG / MODERATE / WEAK / FAIL |
+## Features
+- Auto-detects Ollama vs cloud provider (OpenRouter, Anthropic, Google, OpenAI, Groq, DeepSeek, Mistral, xAI, Together, Fireworks, Cohere)
+- Automatic remote Ollama URL resolution
+- Timeout resilience with auto-retry on empty responses
+- Rate limit delay between tests (configurable)
+- Thinking model fallback (retries with `think: true`)
+- Tool support cache (`~/.pi/agent/cache/tool_support.json`)
+- JSON repair for truncated output
+- Tab-completion for model names
+## Links
+- [Full Documentation](https://github.com/VTSTech/pi-coding-agent#model-benchmark-model-testts)
+- [Changelog](https://github.com/VTSTech/pi-coding-agent/blob/main/CHANGELOG.md)
+## License
+MIT — [VTSTech](https://www.vts-tech.org)
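The README lists "JSON repair for truncated output" without showing how it works. Below is a minimal sketch of one common approach. It assumes the model's output was simply cut off mid-stream, tracks open strings, arrays and objects while scanning, and then appends whatever closers are missing. `repairTruncatedJson` is a hypothetical name, not the package's API. The sketch does not recover a truncated key or a cut-off literal such as `tru`.

```javascript
// Hypothetical sketch, not the package's implementation: best-effort repair of
// JSON that was cut off partway through (open strings, arrays, objects).
function repairTruncatedJson(text) {
  const closers = []; // closing characters still owed, innermost last
  let inString = false;
  let escaped = false;
  for (const ch of text) {
    if (inString) {
      if (escaped) escaped = false;           // character after a backslash
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      closers.push("}");
    } else if (ch === "[") {
      closers.push("]");
    } else if (ch === "}" || ch === "]") {
      closers.pop();
    }
  }
  let out = text;
  if (inString) {
    if (escaped) out = out.slice(0, -1); // drop a dangling backslash
    out += '"';                          // terminate the open string
  } else {
    out = out.replace(/,\s*$/, "");      // a trailing comma is invalid JSON
    if (/:\s*$/.test(out)) out += "null"; // a key with no value yet
  }
  return out + closers.reverse().join("");
}
```

For example, `repairTruncatedJson('{"score": 3, "tags": ["a", "b"')` yields a string that `JSON.parse` accepts, which could let a grader score a response that ran out of tokens instead of treating it as unparseable.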

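The Ollama suite reports tool support as NATIVE / REACT / NONE. A minimal sketch of how such a classification could work, assuming an Ollama-style `/api/chat` reply of the form `{ message: { content, tool_calls } }`: a native call appears in `message.tool_calls`, otherwise the text is checked for a ReAct block. The `Action:` / `Action Input:` layout is the common LangChain-style ReAct format and is an assumption here, as are the helper names `parseReAct` and `detectToolSupport` and the example tool `get_weather`. The README says results are cached in `~/.pi/agent/cache/tool_support.json`, but the cache format is not shown, so it is not sketched.

```javascript
// Hypothetical sketch of NATIVE / REACT / NONE detection.
// Assumes an Ollama-style reply: { message: { content, tool_calls } }.

// Parse a LangChain-style ReAct block: "Action: name" + "Action Input: {...}".
function parseReAct(text) {
  const action = /Action:\s*([A-Za-z0-9_.-]+)/.exec(text);
  if (!action) return null;
  const input = /Action Input:\s*(\{[\s\S]*\})/.exec(text);
  let args = {};
  if (input) {
    try {
      args = JSON.parse(input[1]);
    } catch {
      return null; // arguments present but not valid JSON
    }
  }
  return { name: action[1], args };
}

function detectToolSupport(reply, expectedTool) {
  // Native function calling: the tool shows up in message.tool_calls.
  const calls = reply?.message?.tool_calls ?? [];
  if (calls.some((c) => c.function?.name === expectedTool)) return "NATIVE";
  // Fallback: the model wrote a ReAct-style call in plain text.
  const parsed = parseReAct(reply?.message?.content ?? "");
  if (parsed && parsed.name === expectedTool) return "REACT";
  return "NONE";
}
```

Checking native calls first means a model that supports both styles is recorded as NATIVE, which is the cheaper path for an agent to use.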
package/model-test.js CHANGED

@@ -1167,7 +1167,7 @@ The JSON object must have exactly these 4 keys:
     }
   }
   const branding = [
-    `  \u26A1 Pi Model Benchmark v1.0.3`,
+    `  \u26A1 Pi Model Benchmark v1.0.5`,
     `  Written by VTSTech`,
     `  GitHub: https://github.com/VTSTech`,
     `  Website: www.vts-tech.org`
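The hunk context above comes from the instruction-following prompt ("The JSON object must have exactly these 4 keys:"), but the keys themselves are not shown in this diff. A minimal sketch of how a reply could be graded on the README's STRONG / MODERATE / WEAK / FAIL scale, with the required keys passed in as a parameter. The function name `gradeJsonKeys` and the tier rules are hypothetical, not the package's actual scoring.

```javascript
// Hypothetical grader for an "exactly these N keys" JSON instruction.
// Tier rules are illustrative only.
function gradeJsonKeys(raw, requiredKeys) {
  // Tolerate a ```json fence around the object, which models often add.
  const body = raw.trim().replace(/^```(?:json)?\s*/, "").replace(/\s*```$/, "");
  let obj;
  try {
    obj = JSON.parse(body);
  } catch {
    return "FAIL"; // not parseable JSON at all
  }
  if (obj === null || typeof obj !== "object" || Array.isArray(obj)) return "FAIL";
  const keys = Object.keys(obj);
  const present = requiredKeys.filter((k) => keys.includes(k)).length;
  const extra = keys.filter((k) => !requiredKeys.includes(k)).length;
  if (present === requiredKeys.length && extra === 0) return "STRONG"; // exact match
  if (present === requiredKeys.length) return "MODERATE"; // all keys, plus extras
  if (present > 0) return "WEAK"; // some required keys missing
  return "FAIL";
}
```

Passing the key list in, rather than hard-coding it, keeps the grader independent of the exact prompt wording.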

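The README mentions auto-retry on empty responses and a thinking-model fallback that "retries with `think: true`". A minimal sketch of that control flow, assuming an Ollama-style `/api/chat` request and reply; recent Ollama versions accept a `think` flag and return the reasoning in `message.thinking`. The `chat` function is injected so the logic can be exercised without a server, and `chatWithThinkingFallback` and its `retries` parameter are hypothetical names. Timeout handling is not shown.

```javascript
// Hypothetical sketch: `chat` is any async function that takes an Ollama-style
// /api/chat request and resolves to its reply ({ message: { content, thinking } }).
async function chatWithThinkingFallback(chat, request, retries = 1) {
  // First, retry the unchanged request while the answer comes back empty.
  for (let attempt = 0; attempt <= retries; attempt++) {
    const reply = await chat(request);
    if ((reply?.message?.content ?? "").trim() !== "") {
      return { reply, usedThinking: false };
    }
  }
  // Still empty: assume a thinking model and ask once more with think: true.
  const reply = await chat({ ...request, think: true });
  return { reply, usedThinking: true };
}
```

Retrying the plain request before switching modes keeps a transient empty reply from being misread as evidence that the model needs thinking enabled.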
package/package.json CHANGED

@@ -1,9 +1,9 @@
 {
   "name": "@vtstech/pi-model-test",
-  "version": "1.0.4",
+  "version": "1.0.5",
   "description": "Model benchmark/testing extension for Pi Coding Agent",
   "main": "model-test.js",
-  "keywords": ["pi-package", "pi-extensions"],
+  "keywords": ["pi-extensions"],
   "license": "MIT",
   "access": "public",
   "type": "module",
@@ -14,7 +14,7 @@
     "url": "https://github.com/VTSTech/pi-coding-agent"
   },
   "dependencies": {
-    "@vtstech/pi-shared": "1.0.4"
+    "@vtstech/pi-shared": "1.0.5"
   },
   "peerDependencies": {
     "@mariozechner/pi-coding-agent": ">=0.66"