@elvatis_com/elvatis-mcp 0.6.2 → 0.7.0

Files changed (2)
  1. package/README.md +28 -45
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -219,17 +219,31 @@ Prerequisites: `.env` configured, local LLM server running, OpenClaw server reac
 
  ## Benchmarks
 
- All benchmarks measured on a local development machine. Results will vary based on hardware, model size, and network latency.
+ See [BENCHMARKS.md](BENCHMARKS.md) for the full benchmark suite, methodology, and community contribution guide.
 
- ### Test Hardware
+ ### Reference Hardware
 
  | Component | Spec |
  |---|---|
- | CPU | AMD Threadripper 3960X (24 cores / 48 threads, 3.8 GHz base) |
+ | CPU | AMD Threadripper 3960X (24 cores / 48 threads) |
  | GPU | AMD Radeon RX 9070 XT Elite (16 GB GDDR6) |
- | RAM | 128 GB DDR4 ECC |
+ | RAM | 128 GB DDR4 |
  | OS | Windows 11 Pro |
- | LLM Server | LM Studio 0.3.x (llama.cpp backend) |
+ | Runtime | LM Studio + ROCm (`llama.cpp-win-x86_64-amd-rocm-avx2@2.8.0`) |
+
+ ### Local LLM Inference (LM Studio, ROCm GPU, `--gpu max`)
+
+ Median of 3 runs, `max_tokens=512`. Tasks: classify (1-word sentiment), extract (JSON), reason (arithmetic), code (Python function).
+
+ | Model | Params | classify | extract | reason | code |
+ |-------|--------|----------|---------|--------|------|
+ | Phi 4 Mini Reasoning | 3B | 3.9s | 5.1s | 8.6s | 8.5s |
+ | Deepseek R1 0528 Qwen3 | 8B | 2.5s | 7.7s | 13.3s | 7.3s |
+ | Qwen 3.5 9B | 9B | 4.9s | 11.2s | 6.6s | 11.4s |
+ | Phi 4 Reasoning Plus | 15B | 0.4s | 17.4s | 4.5s | 17.3s |
+ | **GPT-OSS 20B** | **20B** | **0.6s** | **0.7s** | **0.7s** | **3.1s** |
+
+ **GPU speedup vs CPU (Deepseek R1 8B):** classify 8.4x faster, extract 3.2x faster.
 
  ### Service Latency (system_status)
 
@@ -242,49 +256,18 @@ All benchmarks measured on a local development machine. Results will vary based
  | Codex CLI (version check) | 131-136 ms | CLI startup overhead |
  | Gemini CLI (version check) | 4,700-4,900 ms | CLI startup + auth check |
 
- ### Local LLM Inference (LM Studio, CPU-only)
-
- | Model | Task | Tokens | Time | Notes |
- |---|---|---|---|---|
- | Deepseek R1 Qwen3 8B | Sentiment classification | 343 (303 reasoning) | ~21s | Reasoning model, <think> tags stripped |
- | Deepseek R1 Qwen3 8B | JSON extraction | ~400 | ~25s | Structured output from natural language |
- | Deepseek R1 Qwen3 8B | Simple greeting | 151 (145 reasoning) | ~18s | Reasoning overhead even for trivial tasks |
-
- ### Prompt Splitting (prompt_split)
-
- | Strategy | Latency | Notes |
- |---|---|---|
- | Heuristic (keyword) | <1 ms | Instant, no LLM call |
- | Short-circuit (single domain) | <1 ms | Auto-detected, no LLM call |
- | Local LLM | 60s (fallback) | Deepseek R1 struggles with structured JSON output, falls back to heuristic |
- | Gemini | 5-15s | Best quality splitting (requires Gemini CLI) |
-
- ### Tool Operations
-
- | Operation | Latency | Notes |
- |---|---|---|
- | Memory search (SSH, single call) | 208-391 ms | grep across 90 days of daily logs |
- | Home light on/off | 48-84 ms | Direct HA REST API |
- | Cron job listing | ~300 ms | SSH + openclaw CLI |
- | File listing (SSH) | ~300 ms | Remote directory listing |
-
- ### Notes on Hardware
-
- This setup runs all local inference on **CPU only** (Threadripper 3960X, 48 threads). The Radeon RX 9070 XT is not yet utilized for LLM inference because:
-
- - LM Studio uses llama.cpp which requires ROCm or Vulkan for AMD GPUs
- - ROCm support for RDNA 4 (RX 9070 series) is still maturing
- - A Vulkan backend for llama.cpp is in development
+ ### prompt_split Accuracy (heuristic strategy)
 
- On newer or GPU-accelerated hardware, local LLM inference times would be significantly faster (likely 2-5x for the 8B model). Contributions benchmarking on different hardware are very welcome:
+ | Metric | Result |
+ |--------|--------|
+ | Pass rate | 6/10 (60%) |
+ | Task count accuracy | 7/10 (70%) |
+ | Avg agent match | 70% |
+ | Latency | <1ms (no LLM call) |
 
- - Apple Silicon (M2/M3/M4) with Metal acceleration
- - NVIDIA GPUs (RTX 3090/4090) with CUDA
- - AMD GPUs with ROCm (RDNA 3, MI300X)
- - Intel Arc GPUs with SYCL
- - ARM servers (Graviton, Ampere Altra)
+ The `auto` strategy (Gemini or local LLM) handles complex multi-step prompts with higher accuracy. See [BENCHMARKS.md](BENCHMARKS.md) for details.
 
- If you run benchmarks on your hardware, please open an issue or PR at [github.com/elvatis/elvatis-mcp](https://github.com/elvatis/elvatis-mcp) with your results.
+ > Want to contribute benchmarks from your hardware? See [BENCHMARKS.md](BENCHMARKS.md#community-contributions).
 
  ---
 
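The "median of 3 runs" methodology added to the README above can be sketched as a small timing harness. This is a minimal illustration, not code from the package: the endpoint comment reflects LM Studio's default OpenAI-compatible local server, and `classify_task` is a hypothetical stand-in so the sketch stays runnable offline.

```python
import statistics
import time

def median_latency(run_once, runs=3):
    """Time `run_once` several times and return the median latency in
    seconds, mirroring the 'median of 3 runs' benchmark methodology."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# A real benchmark would invoke the local model here, e.g. a POST to
# LM Studio's OpenAI-compatible endpoint (by default
# http://localhost:1234/v1/chat/completions) with max_tokens=512.
# A placeholder keeps this sketch self-contained:
def classify_task():
    time.sleep(0.01)  # stands in for the inference call

print(f"classify: {median_latency(classify_task):.2f}s")
```

Taking the median rather than the mean keeps a single cold-start or cache-warming run from skewing the reported figure.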
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@elvatis_com/elvatis-mcp",
- "version": "0.6.2",
+ "version": "0.7.0",
  "description": "MCP server for OpenClaw — expose smart home, memory, cron, and more to Claude Desktop, Cursor, Windsurf, and any MCP client",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",