@elvatis_com/elvatis-mcp 0.6.2 → 0.7.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +28 -45
- package/package.json +1 -1
package/README.md
CHANGED
```diff
@@ -219,17 +219,31 @@ Prerequisites: `.env` configured, local LLM server running, OpenClaw server reac
 
 ## Benchmarks
 
-
+See [BENCHMARKS.md](BENCHMARKS.md) for the full benchmark suite, methodology, and community contribution guide.
 
-###
+### Reference Hardware
 
 | Component | Spec |
 |---|---|
-| CPU | AMD Threadripper 3960X (24 cores / 48 threads
+| CPU | AMD Threadripper 3960X (24 cores / 48 threads) |
 | GPU | AMD Radeon RX 9070 XT Elite (16 GB GDDR6) |
-| RAM | 128 GB DDR4
+| RAM | 128 GB DDR4 |
 | OS | Windows 11 Pro |
-
+| Runtime | LM Studio + ROCm (`llama.cpp-win-x86_64-amd-rocm-avx2@2.8.0`) |
+
+### Local LLM Inference (LM Studio, ROCm GPU, `--gpu max`)
+
+Median of 3 runs, `max_tokens=512`. Tasks: classify (1-word sentiment), extract (JSON), reason (arithmetic), code (Python function).
+
+| Model | Params | classify | extract | reason | code |
+|-------|--------|----------|---------|--------|------|
+| Phi 4 Mini Reasoning | 3B | 3.9s | 5.1s | 8.6s | 8.5s |
+| Deepseek R1 0528 Qwen3 | 8B | 2.5s | 7.7s | 13.3s | 7.3s |
+| Qwen 3.5 9B | 9B | 4.9s | 11.2s | 6.6s | 11.4s |
+| Phi 4 Reasoning Plus | 15B | 0.4s | 17.4s | 4.5s | 17.3s |
+| **GPT-OSS 20B** | **20B** | **0.6s** | **0.7s** | **0.7s** | **3.1s** |
+
+**GPU speedup vs CPU (Deepseek R1 8B):** classify 8.4x faster, extract 3.2x faster.
 
 ### Service Latency (system_status)
 
```
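The new inference table reports the median of 3 timed chat completions per task with `max_tokens=512`. A minimal sketch of such a timing harness, assuming LM Studio's OpenAI-compatible server at `http://localhost:1234/v1` (the endpoint, model name, and helper names here are illustrative assumptions, not taken from the package):

```python
import json
import statistics
import time
import urllib.request

def median_latency(call, runs=3):
    """Time `call` `runs` times and return the median wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def make_request(prompt, model, base_url="http://localhost:1234/v1"):
    """Return a closure that sends one chat completion with max_tokens=512.
    Endpoint and model name are assumptions (LM Studio's default
    OpenAI-compatible server), not taken from the diff."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }).encode()
    def call():
        req = urllib.request.Request(
            base_url + "/chat/completions", data=body,
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req).read()
    return call

# Live usage would look like:
#   median_latency(make_request("Reply with one word: ...", model="gpt-oss-20b"))
# Demo with a stub in place of the network call:
print(f"{median_latency(lambda: sum(range(1000))):.6f}s")
```

Using `time.perf_counter` avoids clock adjustments skewing samples, and the median of three runs damps one-off outliers such as model load or cache warm-up.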
```diff
@@ -242,49 +256,18 @@ All benchmarks measured on a local development machine. Results will vary based
 | Codex CLI (version check) | 131-136 ms | CLI startup overhead |
 | Gemini CLI (version check) | 4,700-4,900 ms | CLI startup + auth check |
 
-###
-
-| Model | Task | Tokens | Time | Notes |
-|---|---|---|---|---|
-| Deepseek R1 Qwen3 8B | Sentiment classification | 343 (303 reasoning) | ~21s | Reasoning model, <think> tags stripped |
-| Deepseek R1 Qwen3 8B | JSON extraction | ~400 | ~25s | Structured output from natural language |
-| Deepseek R1 Qwen3 8B | Simple greeting | 151 (145 reasoning) | ~18s | Reasoning overhead even for trivial tasks |
-
-### Prompt Splitting (prompt_split)
-
-| Strategy | Latency | Notes |
-|---|---|---|
-| Heuristic (keyword) | <1 ms | Instant, no LLM call |
-| Short-circuit (single domain) | <1 ms | Auto-detected, no LLM call |
-| Local LLM | 60s (fallback) | Deepseek R1 struggles with structured JSON output, falls back to heuristic |
-| Gemini | 5-15s | Best quality splitting (requires Gemini CLI) |
-
-### Tool Operations
-
-| Operation | Latency | Notes |
-|---|---|---|
-| Memory search (SSH, single call) | 208-391 ms | grep across 90 days of daily logs |
-| Home light on/off | 48-84 ms | Direct HA REST API |
-| Cron job listing | ~300 ms | SSH + openclaw CLI |
-| File listing (SSH) | ~300 ms | Remote directory listing |
-
-### Notes on Hardware
-
-This setup runs all local inference on **CPU only** (Threadripper 3960X, 48 threads). The Radeon RX 9070 XT is not yet utilized for LLM inference because:
-
-- LM Studio uses llama.cpp which requires ROCm or Vulkan for AMD GPUs
-- ROCm support for RDNA 4 (RX 9070 series) is still maturing
-- A Vulkan backend for llama.cpp is in development
+### prompt_split Accuracy (heuristic strategy)
 
-
+| Metric | Result |
+|--------|--------|
+| Pass rate | 6/10 (60%) |
+| Task count accuracy | 7/10 (70%) |
+| Avg agent match | 70% |
+| Latency | <1ms (no LLM call) |
 
-
-- NVIDIA GPUs (RTX 3090/4090) with CUDA
-- AMD GPUs with ROCm (RDNA 3, MI300X)
-- Intel Arc GPUs with SYCL
-- ARM servers (Graviton, Ampere Altra)
+The `auto` strategy (Gemini or local LLM) handles complex multi-step prompts with higher accuracy. See [BENCHMARKS.md](BENCHMARKS.md) for details.
 
-
+> Want to contribute benchmarks from your hardware? See [BENCHMARKS.md](BENCHMARKS.md#community-contributions).
 
 ---
 
```
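Both the old and new README describe a sub-millisecond "heuristic" strategy for `prompt_split`: keyword matching with no LLM call, short-circuiting when only one domain is detected. A minimal illustrative sketch of that idea (the keyword lists, domain names, and function name are invented for illustration, not taken from the package):

```python
# Hypothetical keyword-based prompt splitter, illustrating the "heuristic"
# strategy benchmarked in the README: map trigger words to domains, emit one
# task per matched domain, and short-circuit when at most one domain matches.
DOMAIN_KEYWORDS = {
    "home": ["light", "thermostat", "switch"],
    "cron": ["cron", "schedule", "every day"],
    "memory": ["remember", "recall", "memory"],
}

def heuristic_split(prompt: str) -> list[dict]:
    text = prompt.lower()
    tasks = [
        {"domain": domain, "prompt": prompt}
        for domain, words in DOMAIN_KEYWORDS.items()
        if any(w in text for w in words)
    ]
    # Short-circuit: zero or one matched domain means no real splitting.
    return tasks or [{"domain": "general", "prompt": prompt}]

print(heuristic_split("Turn off the light and schedule a cron job"))
```

Pure string matching explains both the `<1 ms` latency and the 60% pass rate reported above: it never waits on a model, but it also cannot resolve prompts whose domains are implied rather than stated.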
package/package.json
CHANGED
```diff
@@ -1,6 +1,6 @@
 {
   "name": "@elvatis_com/elvatis-mcp",
-  "version": "0.6.2",
+  "version": "0.7.0",
   "description": "MCP server for OpenClaw — expose smart home, memory, cron, and more to Claude Desktop, Cursor, Windsurf, and any MCP client",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",
```