localclaw 0.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (49)
  1. localclaw-0.3.0/PKG-INFO +1035 -0
  2. localclaw-0.3.0/README.md +1001 -0
  3. localclaw-0.3.0/localclaw/__init__.py +120 -0
  4. localclaw-0.3.0/localclaw/__main__.py +27 -0
  5. localclaw-0.3.0/localclaw/acp_plugin.py +2344 -0
  6. localclaw-0.3.0/localclaw/bitnet_client.py +151 -0
  7. localclaw-0.3.0/localclaw/bitnet_setup.py +30 -0
  8. localclaw-0.3.0/localclaw/cli.py +1234 -0
  9. localclaw-0.3.0/localclaw/config.py +99 -0
  10. localclaw-0.3.0/localclaw/core/agent.py +1875 -0
  11. localclaw-0.3.0/localclaw/core/math_prompts.py +311 -0
  12. localclaw-0.3.0/localclaw/core/memory.py +191 -0
  13. localclaw-0.3.0/localclaw/core/ollama_client.py +273 -0
  14. localclaw-0.3.0/localclaw/core/orchestrator.py +191 -0
  15. localclaw-0.3.0/localclaw/core/orchestrator_enhanced.py +393 -0
  16. localclaw-0.3.0/localclaw/core/tools.py +275 -0
  17. localclaw-0.3.0/localclaw/model_discovery.py +341 -0
  18. localclaw-0.3.0/localclaw/skills/__init__.py +24 -0
  19. localclaw-0.3.0/localclaw/skills/acp/SKILL.md +309 -0
  20. localclaw-0.3.0/localclaw/skills/datetime/SKILL.md +25 -0
  21. localclaw-0.3.0/localclaw/skills/loader.py +445 -0
  22. localclaw-0.3.0/localclaw/skills/skill-creator/SKILL.md +111 -0
  23. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/__init__.py +0 -0
  24. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/aggregate_benchmark.py +401 -0
  25. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/generate_report.py +326 -0
  26. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/improve_description.py +247 -0
  27. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/init_skill.py +378 -0
  28. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/package_skill.py +139 -0
  29. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/quick_validate.py +159 -0
  30. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/run_eval.py +310 -0
  31. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/run_loop.py +328 -0
  32. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/security_scan.py +144 -0
  33. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/test_package_skill.py +160 -0
  34. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/test_quick_validate.py +72 -0
  35. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/utils.py +47 -0
  36. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/validate.py +147 -0
  37. localclaw-0.3.0/localclaw/skills/web_search/SKILL.md +138 -0
  38. localclaw-0.3.0/localclaw/tools/builtins.py +667 -0
  39. localclaw-0.3.0/localclaw.egg-info/PKG-INFO +1035 -0
  40. localclaw-0.3.0/localclaw.egg-info/SOURCES.txt +47 -0
  41. localclaw-0.3.0/localclaw.egg-info/dependency_links.txt +1 -0
  42. localclaw-0.3.0/localclaw.egg-info/entry_points.txt +5 -0
  43. localclaw-0.3.0/localclaw.egg-info/requires.txt +5 -0
  44. localclaw-0.3.0/localclaw.egg-info/top_level.txt +1 -0
  45. localclaw-0.3.0/pyproject.toml +85 -0
  46. localclaw-0.3.0/setup.cfg +4 -0
  47. localclaw-0.3.0/tests/test_acp_integration.py +210 -0
  48. localclaw-0.3.0/tests/test_acp_subagents.py +302 -0
  49. localclaw-0.3.0/tests/test_agent.py +266 -0
@@ -0,0 +1,1035 @@
1
+ Metadata-Version: 2.4
2
+ Name: localclaw
3
+ Version: 0.3.0
4
+ Summary: A minimal, hackable agentic framework for Ollama and BitNet - local-first AI agent toolkit
5
+ Author-email: VTSTech <veritas@vts-tech.org>
6
+ Maintainer-email: VTSTech <veritas@vts-tech.org>
7
+ License: MIT
8
+ Project-URL: Homepage, https://www.vts-tech.org
9
+ Project-URL: Documentation, https://github.com/VTSTech/LocalClaw#readme
10
+ Project-URL: Repository, https://github.com/VTSTech/LocalClaw
11
+ Project-URL: Issues, https://github.com/VTSTech/LocalClaw/issues
12
+ Project-URL: Changelog, https://github.com/VTSTech/LocalClaw/blob/main/CHANGELOG.md
13
+ Keywords: ai,agent,llm,ollama,bitnet,local-ai,agentic,tool-use,function-calling,cli
14
+ Classifier: Development Status :: 4 - Beta
15
+ Classifier: Environment :: Console
16
+ Classifier: Intended Audience :: Developers
17
+ Classifier: Intended Audience :: Science/Research
18
+ Classifier: License :: OSI Approved :: MIT License
19
+ Classifier: Operating System :: OS Independent
20
+ Classifier: Programming Language :: Python :: 3
21
+ Classifier: Programming Language :: Python :: 3.9
22
+ Classifier: Programming Language :: Python :: 3.10
23
+ Classifier: Programming Language :: Python :: 3.11
24
+ Classifier: Programming Language :: Python :: 3.12
25
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
26
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
27
+ Classifier: Topic :: Terminals
28
+ Requires-Python: >=3.9
29
+ Description-Content-Type: text/markdown
30
+ Provides-Extra: dev
31
+ Requires-Dist: pytest>=7.0; extra == "dev"
32
+ Requires-Dist: black>=23.0; extra == "dev"
33
+ Requires-Dist: ruff>=0.1.0; extra == "dev"
34
+
35
+ # 🦞 LocalClaw R03
36
+
37
+ A minimal, hackable agentic framework engineered to run **entirely locally** with [Ollama](https://ollama.com) or [BitNet](https://github.com/microsoft/BitNet).
38
+
39
+ Inspired by the architecture of OpenClaw, rebuilt from scratch for local-first operation.
40
+
41
+ **Written by [VTSTech](https://www.vts-tech.org)** · [GitHub](https://github.com/VTSTech/LocalClaw)
42
+
43
+ ---
44
+
45
+ ## Architecture
46
+
47
+ ```
48
+ localclaw/
49
+ ├── core/
50
+ │   ├── ollama_client.py          # Zero-dependency HTTP wrapper (stdlib urllib only)
51
+ │   ├── tools.py                  # Decorator-based tool registry + JSON schema generation
52
+ │   ├── memory.py                 # Sliding-window conversation memory with summarization
53
+ │   ├── agent.py                  # ReAct loop — native tool-call + text-fallback modes
54
+ │   └── orchestrator.py           # Multi-agent routing (router / pipeline / parallel)
55
+ ├── skills/
56
+ │   ├── loader.py                 # Agent Skills specification loader (progressive disclosure)
57
+ │   ├── skill-creator/            # OpenClaw skill-creator for generating new skills
58
+ │   ├── acp/                      # ACP (Agent Control Panel) skill
59
+ │   ├── datetime/                 # Datetime utilities skill
60
+ │   └── web_search/               # Web search skill
61
+ ├── tools/
62
+ │   └── builtins.py               # Ready-to-use tools: calculator, shell, file I/O, HTTP, REPL
63
+ ├── bitnet_client.py              # R03: BitNet backend client (Microsoft 1.58-bit quantization)
64
+ ├── bitnet_setup.py               # R03: BitNet setup/compilation helper
65
+ ├── acp_plugin.py                 # ACP integration for activity tracking and A2A messaging
66
+ ├── model_discovery.py            # R03: Dynamic model discovery for both backends
67
+ └── examples/
68
+     ├── 01_basic_agent.py         # Simple Q&A demo
69
+     ├── 02_tool_agent.py          # Tool calling demo
70
+     ├── 03_orchestrator.py        # Multi-agent routing demo
71
+     ├── 04_comprehensive_test.py  # Full test suite (supports BitNet)
72
+     ├── 04_comprehensive_test_acp.py # ACP-tracked version
73
+     ├── 05_tool_tests.py          # Tool-specific tests
74
+     ├── 06_interactive_chat.py    # Interactive CLI chat
75
+     ├── 07_model_comparison.py    # Compare models on 15 tests (3 per category)
76
+     ├── 07_model_comparison_acp.py # ACP-tracked version with model logging
77
+     ├── 08_robust_comparison.py   # Progress-saving comparison for unstable connections
78
+     ├── 08_robust_comparison_acp.py # ACP-tracked version with resumability
79
+     ├── 09_expanded_benchmark.py  # 25 tests across 8 categories
80
+     ├── 10_skills_demo.py         # Agent Skills system demo
81
+     └── 11_skill_creator_test.py  # Skill creation benchmark across models
82
+ ```
83
+
84
+ ### Test Scripts
85
+
86
+ ```
87
+ test.sh # Bash: Run all 11 examples (Linux/macOS/Colab)
88
+ test-quick.sh # Bash: Run 7 quick tests (skips benchmarks)
89
+ run.sh # Bash: Interactive menu for single example
90
+ test-bitnet.sh # Bash: Run BitNet benchmark tests
91
+ test.cmd # Batch: Run all 11 examples (Windows)
92
+ test-quick.cmd # Batch: Run 7 quick tests (Windows)
93
+ run.cmd # Batch: Interactive menu for single example (Windows)
94
+ test-bitnet.cmd # Batch: Run BitNet benchmark tests (Windows)
95
+ ```
96
+
97
+ ### Core design decisions
98
+
99
+ | Concern | Approach |
100
+ |---|---|
101
+ | **HTTP Client** | Zero external dependencies — uses Python stdlib `urllib` only |
102
+ | **Backends** | Ollama (default) or BitNet (R03) — switch via `--backend` flag |
103
+ | **Tool calling** | Native Ollama tool-call protocol when supported; automatic ReAct text-parsing fallback for other models |
104
+ | **Memory** | Sliding window — older turns are archived and optionally compressed via LLM summarization |
105
+ | **Tools** | Decorator-based, auto-generates JSON schemas from Python type hints |
106
+ | **Orchestration** | Router (LLM picks agent), Pipeline (chain), or Parallel (concurrent + merge) |
107
+ | **Streaming** | First-class via generator interface |
108
+ | **Error handling** | Automatic retry with exponential backoff for transient network/server errors |
109
+ | **Security** | Path validation, command blocklist, SSRF protection (R03) |
110
+
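The tool-registry row above (decorator-based, with JSON schemas derived from Python type hints) can be sketched roughly as follows. This is an illustration of the pattern, not LocalClaw's actual API: the `tool` decorator and `TOOLS` dict are hypothetical names.

```python
import inspect

TOOLS = {}  # hypothetical global registry, keyed by tool name

# Map common Python type hints to JSON-schema type strings
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool(fn):
    """Register `fn` as a tool, deriving a JSON schema from its signature."""
    props = {
        name: {"type": _JSON_TYPES.get(p.annotation, "string")}
        for name, p in inspect.signature(fn).parameters.items()
    }
    TOOLS[fn.__name__] = {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": props, "required": list(props)},
        "fn": fn,
    }
    return fn

@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    # Demo only; a production tool should sandbox evaluation far more strictly.
    return str(eval(expression, {"__builtins__": {}}, {}))
```

The stored schema dict is the shape a client would send to Ollama's tool-call API; the registry lets the agent dispatch by name when the model emits a tool call.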
111
+ ---
112
+
113
+ ## Installation
114
+
115
+ ### From PyPI (Recommended)
116
+
117
+ ```bash
118
+ pip install localclaw
119
+
120
+ # Or install from GitHub for the latest development version:
121
+ pip install git+https://github.com/VTSTech/LocalClaw.git
122
+ ```
123
+
124
+ ### From Source
125
+
126
+ ```bash
127
+ # Clone the repository
128
+ git clone https://github.com/VTSTech/LocalClaw.git
129
+ cd LocalClaw
130
+
131
+ # Install in development mode
132
+ pip install -e .
133
+ ```
134
+
135
+ ### No Installation Required
136
+
137
+ LocalClaw uses only the Python stdlib — no dependencies! You can also just copy the `localclaw` directory into your project:
138
+
139
+ ```bash
140
+ # Just copy and use
141
+ cp -r localclaw /path/to/your/project/
142
+ ```
143
+
144
+ ### Setup Ollama
145
+
146
+ ```bash
147
+ # Make sure Ollama is running:
148
+ ollama serve
149
+
150
+ # Pull a model:
151
+ ollama pull qwen2.5-coder:0.5b-instruct-q4_k_m
152
+ ```
153
+
154
+ ### Usage After Installation
155
+
156
+ ```bash
157
+ # Use the CLI command
158
+ localclaw chat --model llama3.1:8b
159
+
160
+ # Or use as a module
161
+ python -m localclaw chat --model llama3.1:8b
162
+
163
+ # Or in Python code
164
+ from localclaw import Agent
165
+ agent = Agent(model="llama3.1:8b")
166
+ ```
167
+
168
+ ### BitNet Backend (R03)
169
+
170
+ LocalClaw supports Microsoft's BitNet for 1.58-bit ternary weight models — highly efficient CPU inference.
171
+
172
+ #### Supported Models
173
+
174
+ | Model | Size | HuggingFace Repo |
175
+ |-------|------|------------------|
176
+ | **BitNet-b1.58-2B-4T** | ~0.4 GB | `microsoft/BitNet-b1.58-2B-4T` |
177
+ | **Falcon3-1B-Instruct** | ~1 GB | `tiiuae/Falcon3-1B-Instruct-1.58bit` |
178
+ | **Falcon3-3B-Instruct** | ~3 GB | `tiiuae/Falcon3-3B-Instruct-1.58bit` |
179
+ | **Falcon3-7B-Instruct** | ~7 GB | `tiiuae/Falcon3-7B-Instruct-1.58bit` |
180
+ | **Falcon3-10B-Instruct** | ~10 GB | `tiiuae/Falcon3-10B-Instruct-1.58bit` |
181
+
182
+ #### Setup (One Command with huggingface-cli)
183
+
184
+ BitNet's `setup_env.py` handles everything: downloading the model, converting it to GGUF, quantizing, and compiling the kernels.
185
+
186
+ ```bash
187
+ # Clone BitNet
188
+ git clone --recursive https://github.com/microsoft/BitNet.git
189
+ cd BitNet
190
+ pip install -r requirements.txt
191
+
192
+ # Download, convert, and prepare a model (choose one):
193
+ python setup_env.py --hf-repo microsoft/BitNet-b1.58-2B-4T -q i2_s # Recommended
194
+ python setup_env.py --hf-repo tiiuae/Falcon3-1B-Instruct-1.58bit -q i2_s # Smallest Falcon
195
+ python setup_env.py --hf-repo tiiuae/Falcon3-3B-Instruct-1.58bit -q i2_s # Best balance
196
+ python setup_env.py --hf-repo tiiuae/Falcon3-7B-Instruct-1.58bit -q i2_s # Most capable
197
+ ```
198
+
199
+ This automatically:
200
+ 1. Downloads the model from HuggingFace (safetensors format)
201
+ 2. Converts to GGUF format
202
+ 3. Quantizes to `i2_s` (1.58-bit ternary)
203
+ 4. Compiles optimized CPU kernels
204
+
205
+ #### Manual Download (wget)
206
+
207
+ If you prefer not to use huggingface-cli, download directly with wget:
208
+
209
+ ```bash
210
+ # Create model directory
211
+ mkdir -p models/Falcon3-1B-Instruct-1.58bit
212
+ cd models/Falcon3-1B-Instruct-1.58bit
213
+
214
+ # Download model files (~1.3GB for 1B, ~3.2GB for 3B, ~7.5GB for 7B)
215
+ wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/model.safetensors
216
+ wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/config.json
217
+ wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/tokenizer.json
218
+ wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/tokenizer_config.json
219
+ wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/special_tokens_map.json
220
+ wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/generation_config.json
221
+
222
+ # Or for BitNet-b1.58-2B-4T (~400MB):
223
+ mkdir -p models/BitNet-b1.58-2B-4T
224
+ cd models/BitNet-b1.58-2B-4T
225
+ wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/model.safetensors
226
+ wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/config.json
227
+ wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/tokenizer.json
228
+ wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/tokenizer_config.json
229
+ ```
230
+
231
+ Then run `setup_env.py`, pointing it at the downloaded model:
232
+
233
+ ```bash
234
+ cd ../.. # Back to BitNet root
235
+ python setup_env.py --model-dir models/Falcon3-1B-Instruct-1.58bit -q i2_s
236
+ ```
237
+
238
+ #### Model File Sizes
239
+
240
+ | Model | model.safetensors | Total Download |
241
+ |-------|-------------------|----------------|
242
+ | Falcon3-1B-Instruct | ~1.3 GB | ~1.4 GB |
243
+ | Falcon3-3B-Instruct | ~3.2 GB | ~3.4 GB |
244
+ | Falcon3-7B-Instruct | ~7.5 GB | ~7.8 GB |
245
+ | BitNet-b1.58-2B-4T | ~400 MB | ~500 MB |
246
+
247
+ #### Start the Server
248
+
249
+ ```bash
250
+ # Start BitNet server (separate terminal)
251
+ ./build/bin/llama-server -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf
252
+
253
+ # Or for Falcon models:
254
+ ./build/bin/llama-server -m models/Falcon3-1B-Instruct-1.58bit/ggml-model-i2_s.gguf
255
+ ```
256
+
257
+ #### Use with LocalClaw
258
+
259
+ ```bash
260
+ # Set BitNet URL (default: http://localhost:8080)
261
+ export BITNET_BASE_URL=http://localhost:8080
262
+
263
+ # Chat with BitNet backend
264
+ localclaw chat --backend bitnet --force-react
265
+
266
+ # With tools
267
+ localclaw chat --backend bitnet --force-react --tools calculator,shell
268
+ ```
269
+
270
+ > **Note**: BitNet models require `--force-react` because they don't support native tool calling.
271
+
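The ReAct text-parsing fallback referenced in the note works by scanning the model's plain-text reply for an action marker instead of relying on structured tool-call output. A minimal sketch, assuming a conventional `Action:` / `Action Input:` format (the exact markers LocalClaw's agent expects may differ):

```python
import json
import re

def parse_react(text: str):
    """Extract a tool call from ReAct-style model output, if present.

    Looks for lines like:
        Action: calculator
        Action Input: {"expression": "17 * 23"}
    Returns (tool_name, args) for a tool call, or None for a plain answer.
    """
    action = re.search(r"Action:\s*(\S+)", text)
    args = re.search(r"Action Input:\s*(\{.*?\})", text, re.DOTALL)
    if action and args:
        return action.group(1), json.loads(args.group(1))
    return None

reply = 'Thought: I should use the calculator.\nAction: calculator\nAction Input: {"expression": "17 * 23"}'
```

With this scheme any instruction-following model can drive tools, at the cost of occasional parse failures when the model deviates from the format.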
272
+ #### Colab Quick Start
273
+
274
+ ```bash
275
+ # Cell 1: Setup BitNet with Falcon3-1B (fastest option)
276
+ !git clone --recursive https://github.com/microsoft/BitNet.git
277
+ %cd BitNet
278
+ !pip install -r requirements.txt
279
+ !python setup_env.py --hf-repo tiiuae/Falcon3-1B-Instruct-1.58bit -q i2_s
280
+
281
+ # Cell 2: Start server in background
282
+ import subprocess, time
283
+ server = subprocess.Popen(
284
+ ['./build/bin/llama-server', '-m', 'models/Falcon3-1B-Instruct-1.58bit/ggml-model-i2_s.gguf', '--port', '8080'],
285
+ stdout=subprocess.PIPE, stderr=subprocess.PIPE
286
+ )
287
+ time.sleep(5) # Wait for server startup
288
+
289
+ # Cell 3: Clone and run LocalClaw
290
+ %cd /content
291
+ !git clone https://github.com/VTSTech/LocalClaw.git
292
+ %cd LocalClaw
293
+ !localclaw chat --backend bitnet --force-react
294
+ ```
295
+
296
+ #### Model Comparison
297
+
298
+ | Model | Speed | Quality | Best For |
299
+ |-------|-------|---------|----------|
300
+ | BitNet-b1.58-2B-4T | ⚡⚡⚡ | Good | Quick tasks, testing |
301
+ | Falcon3-1B-Instruct | ⚡⚡⚡ | Good | Fastest inference |
302
+ | Falcon3-3B-Instruct | ⚡⚡ | Better | Balanced performance |
303
+ | Falcon3-7B-Instruct | ⚡ | Best | Complex reasoning |
304
+
305
+ > **BitNet Benchmark Results**: BitNet-b1.58-2B-4T achieved **87%** on the LocalClaw benchmark — see the rankings table below.
306
+
307
+ ---
308
+
309
+ ## Quick start
310
+
311
+ ### 1. Single prompt
312
+
313
+ ```bash
314
+ # Simple Q&A
315
+ localclaw run "What is the capital of Japan?"
316
+
317
+ # With streaming output
318
+ localclaw run "Tell me a joke." --stream
319
+
320
+ # Specify a model
321
+ localclaw run "Explain quantum computing" -m llama3.2:3b
322
+ ```
323
+
324
+ ### 2. Interactive chat
325
+
326
+ ```bash
327
+ # Start interactive session
328
+ localclaw chat -m qwen2.5-coder:0.5b
329
+
330
+ # With tools enabled
331
+ localclaw chat -m llama3.1:8b --tools calculator,shell,read_file,write_file
332
+
333
+ # With skills loaded
334
+ localclaw chat -m llama3.2:3b --skills skill-creator --tools write_file,shell
335
+
336
+ # Fast mode (reduced context for speed)
337
+ localclaw chat -m qwen2.5-coder:0.5b --fast --verbose
338
+ ```
339
+
340
+ ### 3. Using BitNet backend
341
+
342
+ ```bash
343
+ # BitNet requires --force-react for tool support
344
+ localclaw chat --backend bitnet --force-react
345
+
346
+ # Run single prompt with BitNet
347
+ localclaw run "Calculate 17 * 23" --backend bitnet --tools calculator
348
+ ```
349
+
350
+ ### 4. With ACP tracking
351
+
352
+ ```bash
353
+ # Enable ACP for activity monitoring
354
+ localclaw chat -m qwen2.5-coder:0.5b --acp --tools shell,read_file,write_file
355
+
356
+ # Single prompt with ACP
357
+ localclaw run "What is 2+2?" --acp
358
+ ```
359
+
360
+ ---
361
+
362
+ ## CLI Commands
363
+
364
+ | Command | Description |
365
+ |---------|-------------|
366
+ | `run "prompt"` | Run single prompt and exit |
367
+ | `chat` | Interactive multi-turn conversation |
368
+ | `models` | List available Ollama models |
369
+ | `tools` | List built-in tools |
370
+ | `skills` | List available Agent Skills |
371
+
372
+ ### CLI Flags
373
+
374
+ | Flag | Description |
375
+ |------|-------------|
376
+ | `-m`, `--model` | Model name (default: qwen2.5-coder:0.5b) |
377
+ | `--tools` | Comma-separated tool list |
378
+ | `--skills` | Comma-separated skill list |
379
+ | `--backend` | `ollama` or `bitnet` |
380
+ | `--force-react` | Force ReAct text parsing |
381
+ | `--acp` | Enable ACP integration |
382
+ | `-v`, `--verbose` | Show tool calls and timing |
383
+ | `--debug` | Show detailed debug info |
384
+ | `--fast` | Preset: reduced context for speed |
385
+ | `--warmup` | Pre-load model before chat |
386
+ | `--stream` | Stream output token-by-token |
387
+ | `--temperature` | Sampling temperature (0.0-2.0) |
388
+ | `--num-ctx` | Context window size |
389
+ | `--num-predict` | Max output tokens |
390
+
391
+ ### Interactive Commands (in chat)
392
+
393
+ | Command | Description |
394
+ |---------|-------------|
395
+ | `/help` | Show available commands |
396
+ | `/status` | Show session status |
397
+ | `/tools` | List active tools |
398
+ | `/skills` | List active skills |
399
+ | `/reset` | Clear conversation history |
400
+ | `/undo` | Remove last exchange |
401
+ | `/retry` | Retry last message |
402
+ | `/a2a` | Process pending A2A messages |
403
+ | `/export` | Export to markdown |
404
+ | `exit` | End session |
405
+
406
+ ---
407
+
408
+ ## Built-in Tools
409
+
410
+ | Tool | Description |
411
+ |------|-------------|
412
+ | `calculator` | Evaluate math expressions |
413
+ | `python_repl` | Execute Python code |
414
+ | `shell` | Run shell commands |
415
+ | `read_file` | Read file contents |
416
+ | `write_file` | Write content to file |
417
+ | `list_directory` | List directory contents |
418
+ | `http_get` | HTTP GET request |
419
+ | `save_note` | Save a note to memory |
420
+ | `get_note` | Retrieve saved notes |
421
+
422
+ ```bash
423
+ # List all tools
424
+ localclaw tools
425
+
426
+ # Use specific tools
427
+ localclaw chat --tools calculator,python_repl,shell
428
+ ```
429
+
430
+ ---
431
+
432
+ ## Built-in Skills
433
+
434
+ | Skill | Description |
435
+ |-------|-------------|
436
+ | `skill-creator` | Generate new Agent Skills from requests |
437
+ | `datetime` | Date/time formatting and calculations |
438
+ | `web_search` | Web search capabilities |
439
+
440
+ ```bash
441
+ # List all skills
442
+ localclaw skills
443
+
444
+ # Use skills in chat
445
+ localclaw chat --skills skill-creator --tools write_file
446
+ ```
447
+
448
+ ---
449
+
450
+ ## Supported models (tool-calling)
451
+
452
+ The following model families support native tool calling in Ollama and are auto-detected:
453
+
454
+ **Meta Llama**: `llama3`, `llama3.1`, `llama3.2`, `llama3.3`, `llama3-groq-tool-use`
455
+
456
+ **Mistral AI**: `mistral`, `mixtral`, `mistral-nemo`, `mistral-small`, `mistral-large`, `codestral`, `ministral`
457
+
458
+ **Alibaba Qwen**: `qwen2`, `qwen2.5`, `qwen3`, `qwen35`, `qwen2.5-coder`, `qwen2-math`
459
+
460
+ **Cohere**: `command-r`, `command-r7b`
461
+
462
+ **DeepSeek**: `deepseek`, `deepseek-coder`, `deepseek-v2`, `deepseek-v3`
463
+
464
+ **Microsoft Phi**: `phi-3`, `phi3`, `phi-4`
465
+
466
+ **Google Gemma**: `functiongemma` (designed for function calling)
467
+
468
+ **Others**: `yi-`, `yi1.5`, `internlm2`, `internlm2.5`, `solar`, `glm4`, `chatglm`, `firefunction`, `hermes`, `nemotron`, `cogito`, `athene`
469
+
470
+ All other models fall back to **ReAct text-parsing** automatically.
471
+
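Auto-detection of this kind typically boils down to prefix matching on the model name. A sketch of the idea (the prefix list is abbreviated from the families above, and LocalClaw's actual detection logic may differ):

```python
# Abbreviated from the model families listed above; illustrative only.
TOOL_CAPABLE_PREFIXES = (
    "llama3", "mistral", "mixtral", "codestral", "ministral",
    "qwen2", "qwen3", "command-r", "deepseek", "phi3", "phi-3", "phi-4",
    "functiongemma", "firefunction", "hermes", "nemotron",
)

def supports_native_tools(model: str) -> bool:
    """True if the model family is known to speak Ollama's tool-call protocol."""
    base = model.split(":", 1)[0].lower()  # strip the tag: "llama3.1:8b" -> "llama3.1"
    return base.startswith(TOOL_CAPABLE_PREFIXES)
```

When this check fails, the agent would switch to the ReAct text-parsing fallback described earlier.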
472
+ ---
473
+
474
+ ## Tested Small Models (≤1.5B parameters)
475
+
476
+ The following models have been tested with a **15-test benchmark** (3 tests per category: Math, Reasoning, Knowledge, Calc Tool, Code). Prompts are optimized for small model comprehension.
477
+
478
+ ### Rankings (Updated)
479
+
480
+ | Rank | Model | Score | Time | Math | Reason | Know | Calc | Code |
481
+ |:----:|-------|------:|-----:|:----:|:------:|:----:|:----:|:----:|
482
+ | 🥇 | `qwen2.5-coder:0.5b-instruct-q4_k_m` | **14/15 (93%)** | ~80s | **3/3** | 2/3 | 2/3 | **3/3** | **3/3** |
483
+ | 🥈 | **`BitNet-b1.58-2B-4T`** (BitNet) | **13/15 (87%)** | ~394s | **3/3** | 2/3 | 2/3 | **3/3** | **3/3** |
484
+ | 🥉 | `granite3.1-moe:1b` | **12/15 (80%)** | ~60s | **3/3** | 2/3 | **3/3** | 1/3 | **3/3** |
485
+ | 4 | `llama3.2:1b` | **12/15 (80%)** | ~600s | **3/3** | 1/3 | 2/3 | **3/3** | **3/3** |
486
+ | 5 | `gemma3:270m` | 10/15 (67%) | ~75s | **3/3** | 1/3 | 1/3 | 2/3 | **3/3** |
487
+ | 6 | `qwen3:0.6b` | ~9/12 | ~130s | 2/3 | **3/3** | **3/3** | 0/3 | — |
488
+ | 7 | `granite4:350m` | 8/15 (53%) | ~97s | 2/3 | 1/3 | 2/3 | 0/3 | **3/3** |
489
+ | 8 | `qwen2.5:0.5b` | 10/15 (67%) | ~107s | 1/3 | **3/3** | **3/3** | 0/3 | **3/3** |
490
+ | 9 | `qwen2-math:1.5b` | 12/15 (80%) | ~611s | **3/3** | **3/3** | **3/3** | ❌ | **3/3** |
491
+ | 10 | `tinyllama:latest` | 9/15 (60%) | ~587s | 2/3 | 2/3 | **3/3** | 0/3 | 2/3 |
492
+ | 11 | `smollm:135m` | 7/15 (47%) | ~285s | 0/3 | 2/3 | 2/3 | 0/3 | **3/3** |
493
+ | 12 | `functiongemma:270m` | 1/15 (7%) | ~90s | 0/3 | 0/3 | 0/3 | 0/3 | 1/3 |
494
+
495
+ > **Note**: Scores vary between runs due to model non-determinism; `qwen2.5-coder:0.5b` achieved 100% in some runs.
496
+
497
+ ### Model Details
498
+
499
+ | Model | Params | Size | Speed | Tool Support | Notes |
500
+ |-------|--------|------|-------|--------------|-------|
501
+ | `qwen2.5-coder:0.5b` | 494M | ~400MB | ⚡ Fast | ✅ Native | **🏆 Best overall!** Excellent tool usage |
502
+ | **`BitNet-b1.58-2B-4T`** | **2B** | **~1.3GB** | **⚡ Medium** | **⚠️ ReAct** | **🥈 2nd place!** CPU-efficient ternary weights |
503
+ | `granite3.1-moe:1b` | 1B MoE | ~1.4GB | ⚡ Medium | ✅ Native | Strong knowledge, HTTP 500 on long context |
504
+ | `llama3.2:1b` | 1.2B | ~1.3GB | 🐢 Slow | ✅ Native | **128k context!** Thorough but slow |
505
+ | `gemma3:270m` | 270M | ~292MB | ⚡⚡ Fastest | ⚠️ ReAct JSON | Uses JSON ReAct format, Math & Code champion |
506
+ | `qwen3:0.6b` | 600M | ~523MB | ⚡ Medium | ⚠️ Text | Perfect reasoning but Calc returns empty |
507
+ | `granite4:350m` | 350M | ~708MB | ⚡ Fast | ❌ Refused | **Refuses calculator** - safety filter |
508
+ | `qwen2.5:0.5b` | 494M | ~398MB | ⚡ Fast | ⚠️ Text | **Reasoning & Knowledge champ**, Calc fails |
509
+ | `qwen2-math:1.5b` | 1.5B | ~935MB | 🐢 Slow | ❌ No tools | **4 perfect categories!** No tool support |
510
+ | `tinyllama:latest` | 1.1B | ~638MB | 🐢 Slow | ⚠️ Text | Older model, verbose, unstable |
511
+ | `smollm:135m` | 135M | ~92MB | ⚡ Fast | ❌ None | **Smallest** - hallucinates math (7×8=42!) |
512
+ | `functiongemma:270m` | 270M | ~301MB | ⚡ Fast | ❌ Broken | **Worst performer** - returns empty |
513
+
514
+ ### Category Champions
515
+
516
+ | Category | Champion | Score | Notes |
517
+ |----------|----------|-------|-------|
518
+ | **Math** | `qwen2.5-coder:0.5b`, `granite3.1-moe:1b`, `BitNet-b1.58-2B` | 3/3 | Also gemma3:270m |
519
+ | **Reasoning** | `qwen2.5:0.5b`, `qwen3:0.6b`, `qwen2-math` | 3/3 | Multiple tied |
520
+ | **Knowledge** | `granite3.1-moe:1b`, `qwen2-math` | 3/3 | Multiple tied at 3/3 |
521
+ | **Calc** | `qwen2.5-coder:0.5b`, `llama3.2:1b`, `BitNet-b1.58-2B` | 3/3 | 100% tool usage with ReAct |
522
+ | **Code** | Many models | 3/3 | Code generation is easy for small models! |
523
+
524
+ ### Test Categories
525
+
526
+ | Category | Tests | What it measures |
527
+ |----------|-------|------------------|
528
+ | **Math** | Multiply, Add, Divide | Basic arithmetic without tools |
529
+ | **Reasoning** | Apples, Sequence, Logic | Multi-step reasoning and deduction |
530
+ | **Knowledge** | Japan, France, Brazil capitals | World knowledge recall |
531
+ | **Calc** | Multiply, Divide, Power | Tool usage with calculator |
532
+ | **Code** | is_even, reverse, max_num | Python function generation |
533
+
534
+ ### Recommendations
535
+
536
+ | Use Case | Recommended Model | Why |
537
+ |----------|-------------------|-----|
538
+ | **General use** | `qwen2.5-coder:0.5b-instruct-q4_k_m` | Best all-around, fast, great tool usage |
539
+ | **Large context** | `llama3.2:1b` | **128k context window** - handles long conversations |
540
+ | **Math tasks** | `qwen2.5-coder:0.5b` or `qwen2-math:1.5b` | Perfect math scores |
541
+ | **Reasoning tasks** | `qwen2.5:0.5b` or `qwen3:0.6b` | Perfect reasoning |
542
+ | **Tool usage** | `qwen2.5-coder:0.5b` | Most reliable tool calling |
543
+ | **Fastest inference** | `gemma3:270m` | 270M params, fastest responses |
544
+ | **No tools needed** | `qwen2-math:1.5b` | 4/5 categories perfect (no Calc) |
545
+ | **Smallest footprint** | `smollm:135m` | 92MB - but expect hallucinations |
546
+
547
+ ### ⚠️ Models to Avoid
548
+
549
+ | Model | Issue |
550
+ |-------|-------|
551
+ | `functiongemma:270m` | Despite the name, terrible at function calling - returns empty or refuses |
552
+ | `smollm:135m` | Hallucinates wrong math (7×8=42), only 7/15 score |
553
+ | `granite4:350m` | Refuses calculator tools (safety filter) |
554
+
555
+ ### Known Issues with Small Models
556
+
557
+ 1. **Tool calling variations**:
558
+ - `granite4:350m`: Refuses calculator ("I'm sorry, but I can't assist with that")
559
+ - `functiongemma:270m`: Asks for clarification instead of using tools
560
+ - `qwen2.5:0.5b`, `qwen3:0.6b`: Returns empty responses on Calc tests
561
+ - `qwen2-math:1.5b`: HTTP 400 - doesn't support tool calling at all
562
+ 2. **Math hallucinations**: `smollm:135m` says "7×8=42", `tinyllama` says "7×8=45"
563
+ 3. **Power operator confusion**: `gemma3:270m` reads `2**10` as `2*10=20`
564
+ 4. **Reasoning failures**: Some models answer "8" for sequence "2,4,6,8,?" (repeat last)
565
+ 5. **Stability issues**:
566
+ - `granite3.1-moe:1b`: HTTP 500 crashes (server EOF)
567
+ - `tinyllama`, `qwen3:0.6b`: HTTP 524 timeouts
568
+ 6. **Empty responses**: `functiongemma:270m` returns empty strings on most tests
569
+
570
+ ---
571
+
572
+ ## Skills (Agent Skills Specification)
573
+
574
+ 🦞 LocalClaw R03 supports the **[Agent Skills](https://agentskills.io/)** specification for reusable instruction bundles.
575
+
576
+ ### Skill Structure
577
+
578
+ ```
579
+ skills/
580
+ └── my-skill/
581
+ β”œβ”€β”€ SKILL.md # Required: name, description, instructions
582
+ β”œβ”€β”€ scripts/ # Optional: executable scripts
583
+ β”œβ”€β”€ references/ # Optional: additional docs
584
+ └── assets/ # Optional: templates, images
585
+ ```
586
+
587
+ ### SKILL.md Format
588
+
589
+ ```yaml
590
+ ---
591
+ name: calculator
592
+ description: Perform mathematical calculations. Use when the user needs to compute expressions.
593
+ ---
594
+
595
+ # Calculator Skill
596
+
597
+ Instructions for the model on how to use this skill...
598
+ ```
599
+
600
+ ### Using Skills
601
+
602
+ ```bash
603
+ # Load skills via CLI
604
+ localclaw chat --skills skill-creator --tools write_file,shell
605
+
606
+ # Multiple skills
607
+ localclaw chat --skills datetime,web_search --tools calculator
608
+ ```
609
+
610
+ ### Progressive Disclosure
611
+
612
+ Skills follow a three-level loading system:
613
+
614
+ 1. **Metadata** (~100 tokens): `name` + `description` loaded at startup
615
+ 2. **Instructions** (<500 lines): Full `SKILL.md` body loaded when skill triggers
616
+ 3. **Resources** (as needed): Files in `scripts/`, `references/`, `assets/` loaded on demand
617
+
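Level 1 of the scheme above needs only the frontmatter, so a loader can parse the block between the `---` markers and defer the rest. A minimal stdlib-only sketch (the function names here are hypothetical; LocalClaw's actual loader lives in `localclaw/skills/loader.py` and may support richer YAML):

```python
from pathlib import Path

def load_skill_metadata(skill_dir: str) -> dict:
    """Level 1: parse only the YAML frontmatter of SKILL.md (~100 tokens).

    Simple 'key: value' parsing is enough for the required `name` and
    `description` fields; a full loader would use a real YAML parser.
    """
    text = Path(skill_dir, "SKILL.md").read_text(encoding="utf-8")
    _, frontmatter, _body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def load_skill_body(skill_dir: str) -> str:
    """Level 2: load the full instructions only once the skill triggers."""
    text = Path(skill_dir, "SKILL.md").read_text(encoding="utf-8")
    return text.split("---", 2)[2].strip()
```

Level 3 (scripts, references, assets) would then be read individually, only when the instructions ask for them.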
618
+ ### Built-in Skills
619
+
620
+ | Skill | Description |
621
+ |-------|-------------|
622
+ | `skill-creator` | OpenClaw's platform-agnostic skill generator. Creates new skills from user requests. |
623
+ | `datetime` | Date and time utilities for formatting, parsing, and calculations. |
624
+ | `web_search` | Web search capabilities for retrieving information from the internet. |
625
+
626
+ ---
627
+
628
+ ## Orchestrator modes
629
+
630
+ | Mode | Behaviour |
631
+ |---|---|
632
+ | `router` | A small routing LLM picks the best agent for each request |
633
+ | `pipeline` | Agents run sequentially — each receives the previous agent's output |
634
+ | `parallel` | All agents run concurrently; results are merged with attribution |
635
+
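The `pipeline` mode reduces to a simple fold over the agent list. A minimal sketch with plain callables standing in for agents (LocalClaw's real implementation is in `core/orchestrator.py` and wraps actual LLM-backed agents):

```python
def run_pipeline(agents, prompt: str) -> str:
    """Pipeline mode: chain agents so each receives the previous output."""
    output = prompt
    for agent in agents:
        output = agent(output)  # a real agent would call the LLM here
    return output

# Stand-in "agents" (plain callables) to show the data flow:
outline = lambda task: f"OUTLINE({task})"
draft = lambda outline_text: f"DRAFT({outline_text})"

result = run_pipeline([outline, draft], "write release notes")
```

`router` mode would instead ask a small LLM to pick one agent, and `parallel` mode would run all agents concurrently and merge their outputs with attribution.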
636
+ ---
637
+
638
+ ## Running the examples
639
+
640
+ ```bash
641
+ # Make sure Ollama is serving and you have a model pulled
642
+ ollama pull qwen2.5-coder:0.5b-instruct-q4_k_m
643
+
644
+ # Or use a remote Ollama instance by editing localclaw/core/ollama_client.py
645
+
646
+ # Quick test suite (recommended first run)
647
+ bash test-quick.sh # Linux/macOS/Colab
648
+ test-quick.cmd # Windows
649
+
650
+ # Full test suite (all 11 examples)
651
+ bash test.sh # Linux/macOS/Colab
652
+ test.cmd # Windows
653
+
654
+ # Interactive menu
655
+ bash run.sh # Linux/macOS/Colab
656
+ run.cmd # Windows
657
+
658
+ # Run individual examples
659
+ python examples/01_basic_agent.py
660
+ python examples/02_tool_agent.py
661
+ python examples/03_orchestrator.py
662
+ python examples/04_comprehensive_test.py
663
+ python examples/05_tool_tests.py
664
+ python examples/06_interactive_chat.py
665
+ python examples/07_model_comparison.py
666
+ python examples/08_robust_comparison.py
667
+ python examples/09_expanded_benchmark.py
668
+ python examples/10_skills_demo.py
669
+ python examples/11_skill_creator_test.py
670
+ ```
671
+
672
+ ---
673
+
674
+ ## ACP Integration (Agent Control Panel)
675
+
676
+ 🦞 LocalClaw R03 supports **[ACP (Agent Control Panel)](https://github.com/VTSTech/ACP-Agent-Control-Panel)** for centralized activity tracking, token monitoring, and multi-agent coordination.
677
+
678
+ ### What is ACP?
679
+
680
+ ACP is a monitoring and observability protocol for AI agents. Unlike communication protocols (MCP, A2A), ACP sits alongside your agents and provides:
681
+
682
+ - **Activity Tracking**: Real-time monitoring of all agent actions
683
+ - **Token Management**: Context window usage estimation per agent
684
+ - **Multi-Agent Coordination**: Track multiple agents in one session
685
+ - **STOP/Resume Control**: Emergency stop capability
686
+ - **Session Persistence**: State preserved across restarts
687
+
688
+ ### Enable ACP
689
+
690
+ ```bash
691
+ # Run with ACP tracking
692
+ localclaw chat --acp --tools shell,read_file,write_file -m qwen2.5-coder:0.5b
693
+
694
+ # Run single prompt with ACP
695
+ localclaw run --acp "What is 2+2?"
696
+ ```
697
+
698
### Configuration

Set your ACP server URL via environment variables:

```bash
# Local ACP
export ACP_URL="http://localhost:8766"

# Remote ACP (Cloudflare tunnel)
export ACP_URL="https://your-tunnel.trycloudflare.com"

# Credentials
export ACP_USER="admin"
export ACP_PASS="secret"
```

Or edit `localclaw/config.py` for persistent settings.

### What Gets Logged

| Activity | Description |
|----------|-------------|
| **Bootstrap** | Session start, identity establishment |
| **User messages** | All prompts sent to the model |
| **Assistant messages** | All model responses |
| **Tool calls** | Shell commands, file operations, etc. |
| **Tool results** | Outcomes from tool execution |

### Per-Agent Token Tracking

When multiple agents connect to the same ACP session:

```json
{
  "primary_agent": "Super Z",
  "agent_tokens": {
    "Super Z": 42000,
    "LocalClaw": 500
  },
  "other_agents_tokens": 500
}
```

- The first agent to connect becomes **primary** and owns the main context window
- Other agents are tracked separately in `agent_tokens`
- This prevents context pollution between agents

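The primary/secondary split can be modelled in a few lines. This is a simplified sketch of the accounting scheme, not the ACP server's actual implementation; `SessionTokens` is a hypothetical name:

```python
class SessionTokens:
    """Toy model of ACP's per-agent token accounting."""

    def __init__(self):
        self.primary_agent = None
        self.agent_tokens = {}

    def record(self, agent: str, tokens: int) -> None:
        # The first agent to report becomes primary and owns the main window.
        if self.primary_agent is None:
            self.primary_agent = agent
        self.agent_tokens[agent] = self.agent_tokens.get(agent, 0) + tokens

    def snapshot(self) -> dict:
        # Non-primary agents are summed into "other_agents_tokens".
        other = sum(n for a, n in self.agent_tokens.items() if a != self.primary_agent)
        return {
            "primary_agent": self.primary_agent,
            "agent_tokens": dict(self.agent_tokens),
            "other_agents_tokens": other,
        }
```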
### ACP Server

To run your own ACP server, see the [ACP Specification](https://github.com/VTSTech/ACP-Agent-Control-Panel):

```bash
# ACP is a single Python file
python VTSTech-GLMACP.py

# With a Cloudflare tunnel
GLMACP_TUNNEL=auto python VTSTech-GLMACP.py
```

---

## Remote Ollama Configuration

To use a remote Ollama instance (e.g., via a Cloudflare tunnel), set the environment variable:

```bash
# Local Ollama (default)
export OLLAMA_URL="http://localhost:11434"

# Remote Ollama (Cloudflare tunnel)
export OLLAMA_URL="https://your-tunnel.trycloudflare.com"
```

Or edit `localclaw/config.py` for persistent settings.

### Timeout Configuration

Configure via environment variables:

```bash
# Request timeout in seconds (default: 90s for Cloudflare tunnel compatibility)
export OLLAMA_TIMEOUT=90

# Max retry attempts for transient errors (default: 3)
export OLLAMA_MAX_RETRIES=3

# Initial retry delay in seconds (default: 5s, doubles each retry)
export OLLAMA_RETRY_DELAY=5
```

### Automatic Retry

LocalClaw automatically retries on transient errors with exponential backoff:

| Error Code | Description | Retry Behavior |
|------------|-------------|----------------|
| HTTP 524 | Cloudflare tunnel timeout | Retries up to 3 times |
| HTTP 502/503/504 | Server temporarily unavailable | Retries up to 3 times |
| HTTP 500 | Server error (model loading, memory pressure) | Retries up to 3 times |
| Timeout | Socket or connection timeout | Retries up to 3 times |

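With stdlib only, the retry policy sketches out roughly like this. It is illustrative: LocalClaw's real client lives in `localclaw/core/ollama_client.py`, and `request_with_retry` / `backoff_schedule` are hypothetical names:

```python
import time
import urllib.error
import urllib.request

# Transient status codes worth retrying, per the table above.
RETRYABLE = {500, 502, 503, 504, 524}

def backoff_schedule(max_retries: int = 3, delay: float = 5.0) -> list:
    """Delay before each retry: doubles every attempt (5s, 10s, 20s, ...)."""
    return [delay * (2 ** i) for i in range(max_retries)]

def request_with_retry(url: str, max_retries: int = 3, delay: float = 5.0,
                       timeout: float = 90.0) -> bytes:
    """GET `url`, retrying transient errors with exponential backoff."""
    # One initial attempt plus max_retries retries; the final pause is unused.
    for attempt, pause in enumerate(backoff_schedule(max_retries, delay) + [0.0]):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError as exc:
            if exc.code not in RETRYABLE or attempt == max_retries:
                raise
        except urllib.error.URLError:
            if attempt == max_retries:
                raise
        time.sleep(pause)
    raise AssertionError("unreachable")
```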
---

## Performance Optimization

### CLI Options for Speed

```bash
# Fast mode - reduces context and output for quicker responses
localclaw chat -m qwen2.5-coder:0.5b --fast --verbose

# Fine-tuned control
localclaw chat -m qwen2.5-coder:0.5b --num-ctx 2048 --num-predict 128

# Warm up the model before chat (useful for remote Ollama with cold starts)
localclaw chat -m qwen2.5-coder:0.5b --warmup --fast
```

| Option | Description | Speed Impact |
|--------|-------------|--------------|
| `--fast` | Preset: `num_ctx=2048`, `num_predict=256` | 🚀 Significant |
| `--num-ctx N` | Reduce context window (default varies by model) | 🚀 Significant |
| `--num-predict N` | Limit max output tokens | ⚡ Moderate |
| `--warmup` | Pre-load model before first chat | ⚡ Faster first response |

### Ollama Model Options

Control model behavior via CLI flags:

```bash
# Lower temperature = more deterministic
localclaw chat -m qwen2.5-coder:0.5b --temperature 0.1

# Smaller context = faster
localclaw chat -m qwen2.5-coder:0.5b --num-ctx 2048 --num-predict 128

# Combined for optimal speed
localclaw chat -m qwen2.5-coder:0.5b --fast --temperature 0.3
```

### Remote Ollama Tips

When using a **remote Ollama via Cloudflare tunnel**:

1. **Use the `--fast` flag** - reduces inference time significantly
2. **Use smaller models** - `qwen2.5-coder:0.5b` is the fastest
3. **Warm up the model** - the first request is slowest because the model must load
4. **Increase the timeout if needed**: `export OLLAMA_TIMEOUT=120`

```bash
# Recommended for remote Ollama
localclaw chat -m qwen2.5-coder:0.5b-instruct-q4_k_m \
  --fast --warmup --verbose \
  --tools python_repl
```

### Why Inference is Slow

| Factor | Impact | Solution |
|--------|--------|----------|
| **Model size** | Larger models = slower | Use smaller quantized models |
| **Context window** | More context = slower | Use `--num-ctx 2048` or smaller |
| **Output length** | More tokens = slower | Use `--num-predict 128` |
| **Remote connection** | Network latency | Use local Ollama if possible |
| **Cold start** | First load is slowest | Use `--warmup` flag |
| **GPU unavailable** | CPU inference is slow | Ensure GPU is configured |

---

## Recent Improvements

### R03: BitNet Backend

🦞 LocalClaw R03 adds **BitNet backend support** for running Microsoft's 1.58-bit quantized models:

- **New backend**: Switch between Ollama and BitNet via `--backend` flag
- **Zero-cost inference**: BitNet models run efficiently on CPU
- **Setup helper**: `bitnet_setup.py` handles cloning and compilation
- **Note**: BitNet requires ReAct fallback (no native tool support)

### R03: Enhanced Security

Built-in tools now have comprehensive security:

- **Path validation**: Restrict file access to allowed directories
- **Command blocklist**: Block dangerous commands (`rm`, `sudo`, `chmod`, etc.)
- **Pattern detection**: Detect dangerous shell patterns (pipes to bash, command substitution)
- **SSRF protection**: Block private IPs and cloud metadata endpoints in `http_get`
- **Configurable modes**: `strict`, `permissive`, or `disabled`

```bash
# Set security mode
export LOCALCLAW_SECURITY_MODE=strict
export LOCALCLAW_ALLOWED_PATHS=/home/user/projects:/tmp
export LOCALCLAW_BLOCKED_COMMANDS=rm,sudo,dd
```

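A minimal sketch of how such checks can work, assuming the env-var names above. The helper names and defaults are hypothetical, not LocalClaw's actual internals:

```python
import os
import os.path

# Defaults here are illustrative; the env vars mirror the ones above.
ALLOWED_PATHS = os.environ.get("LOCALCLAW_ALLOWED_PATHS", "/tmp").split(":")
BLOCKED_COMMANDS = set(os.environ.get("LOCALCLAW_BLOCKED_COMMANDS", "rm,sudo,dd").split(","))
DANGEROUS_PATTERNS = ("| bash", "| sh", "$(", "`")  # pipes to shells, substitution

def path_allowed(path: str) -> bool:
    """True if the normalised path sits under an allowed directory."""
    norm = os.path.abspath(path)
    return any(norm == root or norm.startswith(root.rstrip("/") + "/")
               for root in ALLOWED_PATHS)

def command_allowed(cmd: str) -> bool:
    """Reject blocklisted executables and dangerous shell patterns."""
    if any(pattern in cmd for pattern in DANGEROUS_PATTERNS):
        return False
    words = cmd.strip().split()
    return bool(words) and os.path.basename(words[0]) not in BLOCKED_COMMANDS
```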
### Zero Dependencies

🦞 LocalClaw R03 continues to use **only the Python stdlib**; no `pip install` required. The HTTP client uses `urllib` instead of `httpx`.

### Automatic Error Recovery

- **HTTP 524/502/503/504/500 retry**: Transient server errors are automatically retried with exponential backoff
- **Timeout retry**: Socket timeouts are retried automatically
- **Configurable via environment variables**: `OLLAMA_TIMEOUT`, `OLLAMA_MAX_RETRIES`, `OLLAMA_RETRY_DELAY`

### Small Model Support

🦞 LocalClaw R03 handles the quirks of small models (≤1.5B parameters):

- **Fuzzy tool name matching**: Hallucinated tool names like `calculate_expression` are automatically mapped to `calculator`
- **Argument auto-fixing**: Common wrong argument patterns are corrected (e.g., `{"base": 2, "exponent": 10}` → `{"expression": "2 ** 10"}`)
- **JSON response cleaning**: When models output tool schemas instead of text answers, LocalClaw falls back to tool results
- **Unicode normalization**: Accented characters are normalized for comparison (e.g., "Brasília" matches "brasilia")
- **ReAct text parsing**: Models without native tool support automatically fall back to text-based ReAct format

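The fuzzy-matching and normalization items can be sketched with stdlib `difflib` and `unicodedata`. This is a simplified illustration of the technique, not LocalClaw's exact code; the names are hypothetical:

```python
import difflib
import unicodedata
from typing import Optional

KNOWN_TOOLS = ["calculator", "shell", "read_file", "write_file", "python_repl"]

def resolve_tool_name(name: str) -> Optional[str]:
    """Map a hallucinated tool name to the closest known tool, if any."""
    matches = difflib.get_close_matches(name, KNOWN_TOOLS, n=1, cutoff=0.4)
    return matches[0] if matches else None

def normalize(text: str) -> str:
    """Strip accents and case so 'Brasília' compares equal to 'brasilia'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()
```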
### Optimized Test Prompts

Key insights for small-model prompt engineering:

1. **State the fact first**: "The capital of Japan is Tokyo. What is the capital of Japan?"
2. **Show the answer format**: "Answer: Tokyo" at the end
3. **Give calculation steps**: "10 minus 3 equals 7. Then 7 minus 2 equals 5."
4. **Be explicit with tools**: "Use calculator tool. Expression: 2 ** 10. Result: 1024"
5. **Guide code output**: "Start with: def is_even(n):"

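Applied together, the tips turn into prompt templates like these (illustrative helpers, not code from the repo):

```python
def knowledge_prompt(fact: str, question: str, answer: str) -> str:
    """Tips 1-2: state the fact first, then show the expected answer format."""
    return f"{fact} {question}\nAnswer: {answer}"

def tool_prompt(tool: str, expression: str, result) -> str:
    """Tip 4: name the tool, its input, and the expected result explicitly."""
    return f"Use {tool} tool. Expression: {expression}. Result: {result}"
```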
### New Examples

| Example | Description |
|---------|-------------|
| `07_model_comparison.py` | Benchmark 15 tests across models with category breakdown |
| `08_robust_comparison.py` | Progress-saving comparison for unstable connections |
| `09_expanded_benchmark.py` | 25 tests across 8 categories including tool chaining |
| `10_skills_demo.py` | Demonstrate the Agent Skills system with skill-creator |
| `11_skill_creator_test.py` | Benchmark skill creation across multiple small models |

### Test Categories (15 tests)

| Category | Tests | Description |
|----------|-------|-------------|
| Math | Multiply, Add, Divide | Basic arithmetic (no tools) |
| Reasoning | Apples, Sequence, Logic | Multi-step reasoning |
| Knowledge | Japan, France, Brazil | World knowledge |
| Calc | Multiply, Divide, Power | Calculator tool usage |
| Code | is_even, reverse, max_num | Python code generation |

---

## BitNet Benchmark Results

LocalClaw R03 has been tested with **Microsoft BitNet-b1.58-2B-4T**, a 2B-parameter model with 1.58-bit ternary weights designed for efficient CPU inference.

### Test Results Summary

| Test Suite | Score | Time | Notes |
|------------|-------|------|-------|
| **Model Comparison** (15 tests) | **13/15 (87%)** | 394s | 5 categories |
| **Robust Comparison** (22 tests) | **19/22 (86%)** | ~6min | Incremental save |
| **Comprehensive Test** (7 tests) | **6/7 (86%)** | ~90s | Basic + Reasoning + Code |

### Category Breakdown (Model Comparison - 15 tests)

| Category | Score | Pass Rate |
|----------|-------|-----------|
| **Math** | 3/3 | 100% ✅ |
| **Code** | 3/3 | 100% ✅ |
| **Calc (with tools)** | 3/3 | 100% ✅ |
| **Reasoning** | 2/3 | 67% |
| **Knowledge** | 2/3 | 67% |
| **Total** | **13/15** | **87%** |

### Failed Tests

| Test | Expected | Got | Category |
|------|----------|-----|----------|
| Apples (reasoning) | 5 | 7 | Reasoning |
| Brazil capital | Brasília | São Paulo | Knowledge |

### Performance Notes

| Metric | Value |
|--------|-------|
| **Avg response time** | 5-10s (simple), 100s+ (tool use) |
| **Tool calling** | ReAct fallback (no native support) |
| **Context window** | Default (model dependent) |
| **Inference** | CPU-efficient ternary weights |

### BitNet vs Ollama Small Models

| Rank | Model | Score | Params | Backend |
|:----:|-------|------:|-------:|---------|
| 🥇 | `qwen2.5-coder:0.5b-instruct-q4_k_m` | 14/15 (93%) | 494M | Ollama |
| 🥈 | **`BitNet-b1.58-2B-4T`** | **13/15 (87%)** | **2B** | **BitNet** |
| 🥉 | `granite3.1-moe:1b` | 12/15 (80%) | 1B MoE | Ollama |
| 4 | `llama3.2:1b` | 12/15 (80%) | 1.2B | Ollama |

> **Note**: BitNet uses 1.58-bit ternary weights, making it highly efficient for CPU inference despite having 2B parameters.

### BitNet Setup for Benchmarking

```bash
# 1. Clone and compile BitNet
python localclaw/bitnet_setup.py

# 2. Start the BitNet server
./build/bin/llama-server -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf

# 3. Run the benchmark
export LOCALCLAW_BACKEND=bitnet
python examples/07_model_comparison.py

# 4. Run with ACP tracking
export LOCALCLAW_BACKEND=bitnet
python examples/07_model_comparison_acp.py
```

### Observations

1. **Excellent for CPU-only systems**: ternary weights enable fast inference without a GPU
2. **Solid tool usage**: the ReAct fallback handles calculator tools reliably
3. **Strong code generation**: 100% pass rate on function-writing tasks
4. **Multi-step reasoning is a challenge**: the "apples" test requires tracking state across steps
5. **Knowledge gaps**: São Paulo is commonly mistaken for Brazil's capital

---

## About

**🦞 LocalClaw R03** is written and maintained by **VTSTech**.

- 🌐 Website: [https://www.vts-tech.org](https://www.vts-tech.org)
- 📦 GitHub: [https://github.com/VTSTech/LocalClaw](https://github.com/VTSTech/LocalClaw)
- 💻 More projects: [https://github.com/VTSTech](https://github.com/VTSTech)

---

> **Testing Status**: LocalClaw has been tested with both **Ollama** (11 small models) and **BitNet** (BitNet-b1.58-2B-4T) backends. BitNet achieved **87%** on the benchmark, making it the second-best performer overall. See the **Tested Small Models** and **BitNet Benchmark Results** sections for details.