localclaw 0.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (49)
  1. localclaw-0.3.0/PKG-INFO +1035 -0
  2. localclaw-0.3.0/README.md +1001 -0
  3. localclaw-0.3.0/localclaw/__init__.py +120 -0
  4. localclaw-0.3.0/localclaw/__main__.py +27 -0
  5. localclaw-0.3.0/localclaw/acp_plugin.py +2344 -0
  6. localclaw-0.3.0/localclaw/bitnet_client.py +151 -0
  7. localclaw-0.3.0/localclaw/bitnet_setup.py +30 -0
  8. localclaw-0.3.0/localclaw/cli.py +1234 -0
  9. localclaw-0.3.0/localclaw/config.py +99 -0
  10. localclaw-0.3.0/localclaw/core/agent.py +1875 -0
  11. localclaw-0.3.0/localclaw/core/math_prompts.py +311 -0
  12. localclaw-0.3.0/localclaw/core/memory.py +191 -0
  13. localclaw-0.3.0/localclaw/core/ollama_client.py +273 -0
  14. localclaw-0.3.0/localclaw/core/orchestrator.py +191 -0
  15. localclaw-0.3.0/localclaw/core/orchestrator_enhanced.py +393 -0
  16. localclaw-0.3.0/localclaw/core/tools.py +275 -0
  17. localclaw-0.3.0/localclaw/model_discovery.py +341 -0
  18. localclaw-0.3.0/localclaw/skills/__init__.py +24 -0
  19. localclaw-0.3.0/localclaw/skills/acp/SKILL.md +309 -0
  20. localclaw-0.3.0/localclaw/skills/datetime/SKILL.md +25 -0
  21. localclaw-0.3.0/localclaw/skills/loader.py +445 -0
  22. localclaw-0.3.0/localclaw/skills/skill-creator/SKILL.md +111 -0
  23. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/__init__.py +0 -0
  24. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/aggregate_benchmark.py +401 -0
  25. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/generate_report.py +326 -0
  26. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/improve_description.py +247 -0
  27. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/init_skill.py +378 -0
  28. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/package_skill.py +139 -0
  29. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/quick_validate.py +159 -0
  30. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/run_eval.py +310 -0
  31. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/run_loop.py +328 -0
  32. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/security_scan.py +144 -0
  33. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/test_package_skill.py +160 -0
  34. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/test_quick_validate.py +72 -0
  35. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/utils.py +47 -0
  36. localclaw-0.3.0/localclaw/skills/skill-creator/scripts/validate.py +147 -0
  37. localclaw-0.3.0/localclaw/skills/web_search/SKILL.md +138 -0
  38. localclaw-0.3.0/localclaw/tools/builtins.py +667 -0
  39. localclaw-0.3.0/localclaw.egg-info/PKG-INFO +1035 -0
  40. localclaw-0.3.0/localclaw.egg-info/SOURCES.txt +47 -0
  41. localclaw-0.3.0/localclaw.egg-info/dependency_links.txt +1 -0
  42. localclaw-0.3.0/localclaw.egg-info/entry_points.txt +5 -0
  43. localclaw-0.3.0/localclaw.egg-info/requires.txt +5 -0
  44. localclaw-0.3.0/localclaw.egg-info/top_level.txt +1 -0
  45. localclaw-0.3.0/pyproject.toml +85 -0
  46. localclaw-0.3.0/setup.cfg +4 -0
  47. localclaw-0.3.0/tests/test_acp_integration.py +210 -0
  48. localclaw-0.3.0/tests/test_acp_subagents.py +302 -0
  49. localclaw-0.3.0/tests/test_agent.py +266 -0
@@ -0,0 +1,1035 @@
1
+ Metadata-Version: 2.4
2
+ Name: localclaw
3
+ Version: 0.3.0
4
+ Summary: A minimal, hackable agentic framework for Ollama and BitNet - local-first AI agent toolkit
5
+ Author-email: VTSTech <veritas@vts-tech.org>
6
+ Maintainer-email: VTSTech <veritas@vts-tech.org>
7
+ License: MIT
8
+ Project-URL: Homepage, https://www.vts-tech.org
9
+ Project-URL: Documentation, https://github.com/VTSTech/LocalClaw#readme
10
+ Project-URL: Repository, https://github.com/VTSTech/LocalClaw
11
+ Project-URL: Issues, https://github.com/VTSTech/LocalClaw/issues
12
+ Project-URL: Changelog, https://github.com/VTSTech/LocalClaw/blob/main/CHANGELOG.md
13
+ Keywords: ai,agent,llm,ollama,bitnet,local-ai,agentic,tool-use,function-calling,cli
14
+ Classifier: Development Status :: 4 - Beta
15
+ Classifier: Environment :: Console
16
+ Classifier: Intended Audience :: Developers
17
+ Classifier: Intended Audience :: Science/Research
18
+ Classifier: License :: OSI Approved :: MIT License
19
+ Classifier: Operating System :: OS Independent
20
+ Classifier: Programming Language :: Python :: 3
21
+ Classifier: Programming Language :: Python :: 3.9
22
+ Classifier: Programming Language :: Python :: 3.10
23
+ Classifier: Programming Language :: Python :: 3.11
24
+ Classifier: Programming Language :: Python :: 3.12
25
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
26
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
27
+ Classifier: Topic :: Terminals
28
+ Requires-Python: >=3.9
29
+ Description-Content-Type: text/markdown
30
+ Provides-Extra: dev
31
+ Requires-Dist: pytest>=7.0; extra == "dev"
32
+ Requires-Dist: black>=23.0; extra == "dev"
33
+ Requires-Dist: ruff>=0.1.0; extra == "dev"
34
+
35
+ # 🦞 LocalClaw R03
36
+
37
+ A minimal, hackable agentic framework engineered to run **entirely locally** with [Ollama](https://ollama.com) or [BitNet](https://github.com/microsoft/BitNet).
38
+
39
+ Inspired by the architecture of OpenClaw, rebuilt from scratch for local-first operation.
40
+
41
+ **Written by [VTSTech](https://www.vts-tech.org)** · [GitHub](https://github.com/VTSTech/LocalClaw)
42
+
43
+ ---
44
+
45
+ ## Architecture
46
+
47
+ ```
48
+ localclaw/
49
+ ├── core/
50
+ │   ├── ollama_client.py          # Zero-dependency HTTP wrapper (stdlib urllib only)
51
+ │   ├── tools.py                  # Decorator-based tool registry + JSON schema generation
52
+ │   ├── memory.py                 # Sliding-window conversation memory with summarization
53
+ │   ├── agent.py                  # ReAct loop — native tool-call + text-fallback modes
54
+ │   └── orchestrator.py           # Multi-agent routing (router / pipeline / parallel)
55
+ ├── skills/
56
+ │   ├── loader.py                 # Agent Skills specification loader (progressive disclosure)
57
+ │   ├── skill-creator/            # OpenClaw skill-creator for generating new skills
58
+ │   ├── acp/                      # ACP (Agent Control Panel) skill
59
+ │   ├── datetime/                 # Datetime utilities skill
60
+ │   └── web_search/               # Web search skill
61
+ ├── tools/
62
+ │   └── builtins.py               # Ready-to-use tools: calculator, shell, file I/O, HTTP, REPL
63
+ ├── bitnet_client.py              # R03: BitNet backend client (Microsoft 1.58-bit quantization)
64
+ ├── bitnet_setup.py               # R03: BitNet setup/compilation helper
65
+ ├── acp_plugin.py                 # ACP integration for activity tracking and A2A messaging
66
+ ├── model_discovery.py            # R03: Dynamic model discovery for both backends
67
+ └── examples/
68
+     ├── 01_basic_agent.py         # Simple Q&A demo
69
+     ├── 02_tool_agent.py          # Tool calling demo
70
+     ├── 03_orchestrator.py        # Multi-agent routing demo
71
+     ├── 04_comprehensive_test.py  # Full test suite (supports BitNet)
72
+     ├── 04_comprehensive_test_acp.py # ACP-tracked version
73
+     ├── 05_tool_tests.py          # Tool-specific tests
74
+     ├── 06_interactive_chat.py    # Interactive CLI chat
75
+     ├── 07_model_comparison.py    # Compare models on 15 tests (3 per category)
76
+     ├── 07_model_comparison_acp.py # ACP-tracked version with model logging
77
+     ├── 08_robust_comparison.py   # Progress-saving comparison for unstable connections
78
+     ├── 08_robust_comparison_acp.py # ACP-tracked version with resumability
79
+     ├── 09_expanded_benchmark.py  # 25 tests across 8 categories
80
+     ├── 10_skills_demo.py         # Agent Skills system demo
81
+     └── 11_skill_creator_test.py  # Skill creation benchmark across models
82
+ ```
83
+
84
+ ### Test Scripts
85
+
86
+ ```
87
+ test.sh # Bash: Run all 11 examples (Linux/macOS/Colab)
88
+ test-quick.sh # Bash: Run 7 quick tests (skips benchmarks)
89
+ run.sh # Bash: Interactive menu for single example
90
+ test-bitnet.sh # Bash: Run BitNet benchmark tests
91
+ test.cmd # Batch: Run all 11 examples (Windows)
92
+ test-quick.cmd # Batch: Run 7 quick tests (Windows)
93
+ run.cmd # Batch: Interactive menu for single example (Windows)
94
+ test-bitnet.cmd # Batch: Run BitNet benchmark tests (Windows)
95
+ ```
96
+
97
+ ### Core design decisions
98
+
99
+ | Concern | Approach |
100
+ |---|---|
101
+ | **HTTP Client** | Zero external dependencies — uses Python stdlib `urllib` only |
102
+ | **Backends** | Ollama (default) or BitNet (R03) — switch via `--backend` flag |
103
+ | **Tool calling** | Native Ollama tool-call protocol when supported; automatic ReAct text-parsing fallback for other models |
104
+ | **Memory** | Sliding window — older turns are archived and optionally compressed via LLM summarization |
105
+ | **Tools** | Decorator-based, auto-generates JSON schemas from Python type hints |
106
+ | **Orchestration** | Router (LLM picks agent), Pipeline (chain), or Parallel (concurrent + merge) |
107
+ | **Streaming** | First-class via generator interface |
108
+ | **Error handling** | Automatic retry with exponential backoff for transient network/server errors |
109
+ | **Security** | Path validation, command blocklist, SSRF protection (R03) |
110
+
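The tool-registry row above (decorator-based, with JSON schemas derived from Python type hints) can be sketched roughly as follows. This is an illustration of the pattern, not LocalClaw's actual API: the `tool` decorator and `TOOLS` dict are hypothetical names.

```python
import inspect

TOOLS = {}  # hypothetical global registry, keyed by tool name

# Map common Python type hints to JSON-schema type strings
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool(fn):
    """Register `fn` as a tool, deriving a JSON schema from its signature."""
    props = {
        name: {"type": _JSON_TYPES.get(p.annotation, "string")}
        for name, p in inspect.signature(fn).parameters.items()
    }
    TOOLS[fn.__name__] = {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": props, "required": list(props)},
        "fn": fn,
    }
    return fn

@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    # Demo only; a production tool should sandbox evaluation far more strictly.
    return str(eval(expression, {"__builtins__": {}}, {}))
```

The stored schema dict is the shape a client would send to Ollama's tool-call API; the registry lets the agent dispatch by name when the model emits a tool call.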
111
+ ---
112
+
113
+ ## Installation
114
+
115
+ ### From PyPI (Recommended)
116
+
117
+ ```bash
118
+ pip install localclaw
119
+
120
+ # Or install from GitHub for the latest development version:
121
+ pip install git+https://github.com/VTSTech/LocalClaw.git
122
+ ```
123
+
124
+ ### From Source
125
+
126
+ ```bash
127
+ # Clone the repository
128
+ git clone https://github.com/VTSTech/LocalClaw.git
129
+ cd LocalClaw
130
+
131
+ # Install in development mode
132
+ pip install -e .
133
+ ```
134
+
135
+ ### No Installation Required
136
+
137
+ LocalClaw uses only the Python stdlib — no dependencies! You can also just copy the `localclaw` directory into your project:
138
+
139
+ ```bash
140
+ # Just copy and use
141
+ cp -r localclaw /path/to/your/project/
142
+ ```
143
+
144
+ ### Setup Ollama
145
+
146
+ ```bash
147
+ # Make sure Ollama is running:
148
+ ollama serve
149
+
150
+ # Pull a model:
151
+ ollama pull qwen2.5-coder:0.5b-instruct-q4_k_m
152
+ ```
153
+
154
+ ### Usage After Installation
155
+
156
+ ```bash
157
+ # Use the CLI command
158
+ localclaw chat --model llama3.1:8b
159
+
160
+ # Or use as a module
161
+ python -m localclaw chat --model llama3.1:8b
162
+
163
+ # Or in Python code
164
+ from localclaw import Agent
165
+ agent = Agent(model="llama3.1:8b")
166
+ ```
167
+
168
+ ### BitNet Backend (R03)
169
+
170
+ LocalClaw supports Microsoft's BitNet for 1.58-bit ternary weight models — highly efficient CPU inference.
171
+
172
+ #### Supported Models
173
+
174
+ | Model | Size | HuggingFace Repo |
175
+ |-------|------|------------------|
176
+ | **BitNet-b1.58-2B-4T** | ~0.4 GB | `microsoft/BitNet-b1.58-2B-4T` |
177
+ | **Falcon3-1B-Instruct** | ~1 GB | `tiiuae/Falcon3-1B-Instruct-1.58bit` |
178
+ | **Falcon3-3B-Instruct** | ~3 GB | `tiiuae/Falcon3-3B-Instruct-1.58bit` |
179
+ | **Falcon3-7B-Instruct** | ~7 GB | `tiiuae/Falcon3-7B-Instruct-1.58bit` |
180
+ | **Falcon3-10B-Instruct** | ~10 GB | `tiiuae/Falcon3-10B-Instruct-1.58bit` |
181
+
182
+ #### Setup (One Command with huggingface-cli)
183
+
184
+ BitNet's `setup_env.py` handles everything: downloading the model, converting it to GGUF, quantizing, and compiling the kernels.
185
+
186
+ ```bash
187
+ # Clone BitNet
188
+ git clone --recursive https://github.com/microsoft/BitNet.git
189
+ cd BitNet
190
+ pip install -r requirements.txt
191
+
192
+ # Download, convert, and prepare a model (choose one):
193
+ python setup_env.py --hf-repo microsoft/BitNet-b1.58-2B-4T -q i2_s # Recommended
194
+ python setup_env.py --hf-repo tiiuae/Falcon3-1B-Instruct-1.58bit -q i2_s # Smallest Falcon
195
+ python setup_env.py --hf-repo tiiuae/Falcon3-3B-Instruct-1.58bit -q i2_s # Best balance
196
+ python setup_env.py --hf-repo tiiuae/Falcon3-7B-Instruct-1.58bit -q i2_s # Most capable
197
+ ```
198
+
199
+ This automatically:
200
+ 1. Downloads the model from HuggingFace (safetensors format)
201
+ 2. Converts to GGUF format
202
+ 3. Quantizes to `i2_s` (1.58-bit ternary)
203
+ 4. Compiles optimized CPU kernels
204
+
205
+ #### Manual Download (wget)
206
+
207
+ If you prefer not to use huggingface-cli, download directly with wget:
208
+
209
+ ```bash
210
+ # Create model directory
211
+ mkdir -p models/Falcon3-1B-Instruct-1.58bit
212
+ cd models/Falcon3-1B-Instruct-1.58bit
213
+
214
+ # Download model files (~1.3GB for 1B, ~3.2GB for 3B, ~7.5GB for 7B)
215
+ wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/model.safetensors
216
+ wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/config.json
217
+ wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/tokenizer.json
218
+ wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/tokenizer_config.json
219
+ wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/special_tokens_map.json
220
+ wget https://huggingface.co/tiiuae/Falcon3-1B-Instruct-1.58bit/resolve/main/generation_config.json
221
+
222
+ # Or for BitNet-b1.58-2B-4T (~400MB):
223
+ mkdir -p models/BitNet-b1.58-2B-4T
224
+ cd models/BitNet-b1.58-2B-4T
225
+ wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/model.safetensors
226
+ wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/config.json
227
+ wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/tokenizer.json
228
+ wget https://huggingface.co/microsoft/BitNet-b1.58-2B-4T/resolve/main/tokenizer_config.json
229
+ ```
230
+
231
+ Then run `setup_env.py`, pointing it at the downloaded model:
232
+
233
+ ```bash
234
+ cd ../.. # Back to BitNet root
235
+ python setup_env.py --model-dir models/Falcon3-1B-Instruct-1.58bit -q i2_s
236
+ ```
237
+
238
+ #### Model File Sizes
239
+
240
+ | Model | model.safetensors | Total Download |
241
+ |-------|-------------------|----------------|
242
+ | Falcon3-1B-Instruct | ~1.3 GB | ~1.4 GB |
243
+ | Falcon3-3B-Instruct | ~3.2 GB | ~3.4 GB |
244
+ | Falcon3-7B-Instruct | ~7.5 GB | ~7.8 GB |
245
+ | BitNet-b1.58-2B-4T | ~400 MB | ~500 MB |
246
+
247
+ #### Start the Server
248
+
249
+ ```bash
250
+ # Start BitNet server (separate terminal)
251
+ ./build/bin/llama-server -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf
252
+
253
+ # Or for Falcon models:
254
+ ./build/bin/llama-server -m models/Falcon3-1B-Instruct-1.58bit/ggml-model-i2_s.gguf
255
+ ```
256
+
257
+ #### Use with LocalClaw
258
+
259
+ ```bash
260
+ # Set BitNet URL (default: http://localhost:8080)
261
+ export BITNET_BASE_URL=http://localhost:8080
262
+
263
+ # Chat with BitNet backend
264
+ localclaw chat --backend bitnet --force-react
265
+
266
+ # With tools
267
+ localclaw chat --backend bitnet --force-react --tools calculator,shell
268
+ ```
269
+
270
+ > **Note**: BitNet models require `--force-react` because they don't support native tool calling.
271
+
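The ReAct text-parsing fallback referenced in the note works by scanning the model's plain-text reply for an action marker instead of relying on structured tool-call output. A minimal sketch, assuming a conventional `Action:` / `Action Input:` format (the exact markers LocalClaw's agent expects may differ):

```python
import json
import re

def parse_react(text: str):
    """Extract a tool call from ReAct-style model output, if present.

    Looks for lines like:
        Action: calculator
        Action Input: {"expression": "17 * 23"}
    Returns (tool_name, args) for a tool call, or None for a plain answer.
    """
    action = re.search(r"Action:\s*(\S+)", text)
    args = re.search(r"Action Input:\s*(\{.*?\})", text, re.DOTALL)
    if action and args:
        return action.group(1), json.loads(args.group(1))
    return None

reply = 'Thought: I should use the calculator.\nAction: calculator\nAction Input: {"expression": "17 * 23"}'
```

With this scheme any instruction-following model can drive tools, at the cost of occasional parse failures when the model deviates from the format.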
272
+ #### Colab Quick Start
273
+
274
+ ```bash
275
+ # Cell 1: Setup BitNet with Falcon3-1B (fastest option)
276
+ !git clone --recursive https://github.com/microsoft/BitNet.git
277
+ %cd BitNet
278
+ !pip install -r requirements.txt
279
+ !python setup_env.py --hf-repo tiiuae/Falcon3-1B-Instruct-1.58bit -q i2_s
280
+
281
+ # Cell 2: Start server in background
282
+ import subprocess, time
283
+ server = subprocess.Popen(
284
+ ['./build/bin/llama-server', '-m', 'models/Falcon3-1B-Instruct-1.58bit/ggml-model-i2_s.gguf', '--port', '8080'],
285
+ stdout=subprocess.PIPE, stderr=subprocess.PIPE
286
+ )
287
+ time.sleep(5) # Wait for server startup
288
+
289
+ # Cell 3: Clone and run LocalClaw
290
+ %cd /content
291
+ !git clone https://github.com/VTSTech/LocalClaw.git
292
+ %cd LocalClaw
293
+ !localclaw chat --backend bitnet --force-react
294
+ ```
295
+
296
+ #### Model Comparison
297
+
298
+ | Model | Speed | Quality | Best For |
299
+ |-------|-------|---------|----------|
300
+ | BitNet-b1.58-2B-4T | ⚡⚡⚡ | Good | Quick tasks, testing |
301
+ | Falcon3-1B-Instruct | ⚡⚡⚡ | Good | Fastest inference |
302
+ | Falcon3-3B-Instruct | ⚡⚡ | Better | Balanced performance |
303
+ | Falcon3-7B-Instruct | ⚡ | Best | Complex reasoning |
304
+
305
+ > **BitNet Benchmark Results**: BitNet-b1.58-2B-4T achieved **87%** on the LocalClaw benchmark — see the rankings table below.
306
+
307
+ ---
308
+
309
+ ## Quick start
310
+
311
+ ### 1. Single prompt
312
+
313
+ ```bash
314
+ # Simple Q&A
315
+ localclaw run "What is the capital of Japan?"
316
+
317
+ # With streaming output
318
+ localclaw run "Tell me a joke." --stream
319
+
320
+ # Specify a model
321
+ localclaw run "Explain quantum computing" -m llama3.2:3b
322
+ ```
323
+
324
+ ### 2. Interactive chat
325
+
326
+ ```bash
327
+ # Start interactive session
328
+ localclaw chat -m qwen2.5-coder:0.5b
329
+
330
+ # With tools enabled
331
+ localclaw chat -m llama3.1:8b --tools calculator,shell,read_file,write_file
332
+
333
+ # With skills loaded
334
+ localclaw chat -m llama3.2:3b --skills skill-creator --tools write_file,shell
335
+
336
+ # Fast mode (reduced context for speed)
337
+ localclaw chat -m qwen2.5-coder:0.5b --fast --verbose
338
+ ```
339
+
340
+ ### 3. Using BitNet backend
341
+
342
+ ```bash
343
+ # BitNet requires --force-react for tool support
344
+ localclaw chat --backend bitnet --force-react
345
+
346
+ # Run single prompt with BitNet
347
+ localclaw run "Calculate 17 * 23" --backend bitnet --tools calculator
348
+ ```
349
+
350
+ ### 4. With ACP tracking
351
+
352
+ ```bash
353
+ # Enable ACP for activity monitoring
354
+ localclaw chat -m qwen2.5-coder:0.5b --acp --tools shell,read_file,write_file
355
+
356
+ # Single prompt with ACP
357
+ localclaw run "What is 2+2?" --acp
358
+ ```
359
+
360
+ ---
361
+
362
+ ## CLI Commands
363
+
364
+ | Command | Description |
365
+ |---------|-------------|
366
+ | `run "prompt"` | Run single prompt and exit |
367
+ | `chat` | Interactive multi-turn conversation |
368
+ | `models` | List available Ollama models |
369
+ | `tools` | List built-in tools |
370
+ | `skills` | List available Agent Skills |
371
+
372
+ ### CLI Flags
373
+
374
+ | Flag | Description |
375
+ |------|-------------|
376
+ | `-m`, `--model` | Model name (default: qwen2.5-coder:0.5b) |
377
+ | `--tools` | Comma-separated tool list |
378
+ | `--skills` | Comma-separated skill list |
379
+ | `--backend` | `ollama` or `bitnet` |
380
+ | `--force-react` | Force ReAct text parsing |
381
+ | `--acp` | Enable ACP integration |
382
+ | `-v`, `--verbose` | Show tool calls and timing |
383
+ | `--debug` | Show detailed debug info |
384
+ | `--fast` | Preset: reduced context for speed |
385
+ | `--warmup` | Pre-load model before chat |
386
+ | `--stream` | Stream output token-by-token |
387
+ | `--temperature` | Sampling temperature (0.0-2.0) |
388
+ | `--num-ctx` | Context window size |
389
+ | `--num-predict` | Max output tokens |
390
+
391
+ ### Interactive Commands (in chat)
392
+
393
+ | Command | Description |
394
+ |---------|-------------|
395
+ | `/help` | Show available commands |
396
+ | `/status` | Show session status |
397
+ | `/tools` | List active tools |
398
+ | `/skills` | List active skills |
399
+ | `/reset` | Clear conversation history |
400
+ | `/undo` | Remove last exchange |
401
+ | `/retry` | Retry last message |
402
+ | `/a2a` | Process pending A2A messages |
403
+ | `/export` | Export to markdown |
404
+ | `exit` | End session |
405
+
406
+ ---
407
+
408
+ ## Built-in Tools
409
+
410
+ | Tool | Description |
411
+ |------|-------------|
412
+ | `calculator` | Evaluate math expressions |
413
+ | `python_repl` | Execute Python code |
414
+ | `shell` | Run shell commands |
415
+ | `read_file` | Read file contents |
416
+ | `write_file` | Write content to file |
417
+ | `list_directory` | List directory contents |
418
+ | `http_get` | HTTP GET request |
419
+ | `save_note` | Save a note to memory |
420
+ | `get_note` | Retrieve saved notes |
421
+
422
+ ```bash
423
+ # List all tools
424
+ localclaw tools
425
+
426
+ # Use specific tools
427
+ localclaw chat --tools calculator,python_repl,shell
428
+ ```
429
+
430
+ ---
431
+
432
+ ## Built-in Skills
433
+
434
+ | Skill | Description |
435
+ |-------|-------------|
436
+ | `skill-creator` | Generate new Agent Skills from requests |
437
+ | `datetime` | Date/time formatting and calculations |
438
+ | `web_search` | Web search capabilities |
439
+
440
+ ```bash
441
+ # List all skills
442
+ localclaw skills
443
+
444
+ # Use skills in chat
445
+ localclaw chat --skills skill-creator --tools write_file
446
+ ```
447
+
448
+ ---
449
+
450
+ ## Supported models (tool-calling)
451
+
452
+ The following model families support native tool calling in Ollama and are auto-detected:
453
+
454
+ **Meta Llama**: `llama3`, `llama3.1`, `llama3.2`, `llama3.3`, `llama3-groq-tool-use`
455
+
456
+ **Mistral AI**: `mistral`, `mixtral`, `mistral-nemo`, `mistral-small`, `mistral-large`, `codestral`, `ministral`
457
+
458
+ **Alibaba Qwen**: `qwen2`, `qwen2.5`, `qwen3`, `qwen35`, `qwen2.5-coder`, `qwen2-math`
459
+
460
+ **Cohere**: `command-r`, `command-r7b`
461
+
462
+ **DeepSeek**: `deepseek`, `deepseek-coder`, `deepseek-v2`, `deepseek-v3`
463
+
464
+ **Microsoft Phi**: `phi-3`, `phi3`, `phi-4`
465
+
466
+ **Google Gemma**: `functiongemma` (designed for function calling)
467
+
468
+ **Others**: `yi-`, `yi1.5`, `internlm2`, `internlm2.5`, `solar`, `glm4`, `chatglm`, `firefunction`, `hermes`, `nemotron`, `cogito`, `athene`
469
+
470
+ All other models fall back to **ReAct text-parsing** automatically.
471
+
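Auto-detection of this kind typically boils down to prefix matching on the model name. A sketch of the idea (the prefix list is abbreviated from the families above, and LocalClaw's actual detection logic may differ):

```python
# Abbreviated from the model families listed above; illustrative only.
TOOL_CAPABLE_PREFIXES = (
    "llama3", "mistral", "mixtral", "codestral", "ministral",
    "qwen2", "qwen3", "command-r", "deepseek", "phi3", "phi-3", "phi-4",
    "functiongemma", "firefunction", "hermes", "nemotron",
)

def supports_native_tools(model: str) -> bool:
    """True if the model family is known to speak Ollama's tool-call protocol."""
    base = model.split(":", 1)[0].lower()  # strip the tag: "llama3.1:8b" -> "llama3.1"
    return base.startswith(TOOL_CAPABLE_PREFIXES)
```

When this check fails, the agent would switch to the ReAct text-parsing fallback described earlier.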
472
+ ---
473
+
474
+ ## Tested Small Models (≤1.5B parameters)
475
+
476
+ The following models have been tested with a **15-test benchmark** (3 tests per category: Math, Reasoning, Knowledge, Calc Tool, Code). Prompts are optimized for small model comprehension.
477
+
478
+ ### Rankings (Updated)
479
+
480
+ | Rank | Model | Score | Time | Math | Reason | Know | Calc | Code |
481
+ |:----:|-------|------:|-----:|:----:|:------:|:----:|:----:|:----:|
482
+ | 🥇 | `qwen2.5-coder:0.5b-instruct-q4_k_m` | **14/15 (93%)** | ~80s | **3/3** | 2/3 | 2/3 | **3/3** | **3/3** |
483
+ | 🥈 | **`BitNet-b1.58-2B-4T`** (BitNet) | **13/15 (87%)** | ~394s | **3/3** | 2/3 | 2/3 | **3/3** | **3/3** |
484
+ | 🥉 | `granite3.1-moe:1b` | **12/15 (80%)** | ~60s | **3/3** | 2/3 | **3/3** | 1/3 | **3/3** |
485
+ | 4 | `llama3.2:1b` | **12/15 (80%)** | ~600s | **3/3** | 1/3 | 2/3 | **3/3** | **3/3** |
486
+ | 5 | `gemma3:270m` | 10/15 (67%) | ~75s | **3/3** | 1/3 | 1/3 | 2/3 | **3/3** |
487
+ | 6 | `qwen3:0.6b` | ~9/12 | ~130s | 2/3 | **3/3** | **3/3** | 0/3 | — |
488
+ | 7 | `granite4:350m` | 8/15 (53%) | ~97s | 2/3 | 1/3 | 2/3 | 0/3 | **3/3** |
489
+ | 8 | `qwen2.5:0.5b` | 10/15 (67%) | ~107s | 1/3 | **3/3** | **3/3** | 0/3 | **3/3** |
490
+ | 9 | `qwen2-math:1.5b` | 12/15 (80%) | ~611s | **3/3** | **3/3** | **3/3** | ❌ | **3/3** |
491
+ | 10 | `tinyllama:latest` | 9/15 (60%) | ~587s | 2/3 | 2/3 | **3/3** | 0/3 | 2/3 |
492
+ | 11 | `smollm:135m` | 7/15 (47%) | ~285s | 0/3 | 2/3 | 2/3 | 0/3 | **3/3** |
493
+ | 12 | `functiongemma:270m` | 1/15 (7%) | ~90s | 0/3 | 0/3 | 0/3 | 0/3 | 1/3 |
494
+
495
+ > **Note**: Scores vary between runs due to model non-determinism; `qwen2.5-coder:0.5b` achieved 100% in some runs.
496
+
497
+ ### Model Details
498
+
499
+ | Model | Params | Size | Speed | Tool Support | Notes |
500
+ |-------|--------|------|-------|--------------|-------|
501
+ | `qwen2.5-coder:0.5b` | 494M | ~400MB | ⚡ Fast | ✅ Native | **🏆 Best overall!** Excellent tool usage |
502
+ | **`BitNet-b1.58-2B-4T`** | **2B** | **~1.3GB** | **⚡ Medium** | **⚠️ ReAct** | **🥈 2nd place!** CPU-efficient ternary weights |
503
+ | `granite3.1-moe:1b` | 1B MoE | ~1.4GB | ⚡ Medium | ✅ Native | Strong knowledge, HTTP 500 on long context |
504
+ | `llama3.2:1b` | 1.2B | ~1.3GB | 🐢 Slow | ✅ Native | **128k context!** Thorough but slow |
505
+ | `gemma3:270m` | 270M | ~292MB | ⚡⚡ Fastest | ⚠️ ReAct JSON | Uses JSON ReAct format, Math & Code champion |
506
+ | `qwen3:0.6b` | 600M | ~523MB | ⚡ Medium | ⚠️ Text | Perfect reasoning but Calc returns empty |
507
+ | `granite4:350m` | 350M | ~708MB | ⚡ Fast | ❌ Refused | **Refuses calculator** - safety filter |
508
+ | `qwen2.5:0.5b` | 494M | ~398MB | ⚡ Fast | ⚠️ Text | **Reasoning & Knowledge champ**, Calc fails |
509
+ | `qwen2-math:1.5b` | 1.5B | ~935MB | 🐢 Slow | ❌ No tools | **4 perfect categories!** No tool support |
510
+ | `tinyllama:latest` | 1.1B | ~638MB | 🐢 Slow | ⚠️ Text | Older model, verbose, unstable |
511
+ | `smollm:135m` | 135M | ~92MB | ⚡ Fast | ❌ None | **Smallest** - hallucinates math (7×8=42!) |
512
+ | `functiongemma:270m` | 270M | ~301MB | ⚡ Fast | ❌ Broken | **Worst performer** - returns empty |
513
+
514
+ ### Category Champions
515
+
516
+ | Category | Champion | Score | Notes |
517
+ |----------|----------|-------|-------|
518
+ | **Math** | `qwen2.5-coder:0.5b`, `granite3.1-moe:1b`, `BitNet-b1.58-2B` | 3/3 | Also gemma3:270m |
519
+ | **Reasoning** | `qwen2.5:0.5b`, `qwen3:0.6b`, `qwen2-math` | 3/3 | Multiple tied |
520
+ | **Knowledge** | `granite3.1-moe:1b`, `qwen2-math` | 3/3 | Multiple tied at 3/3 |
521
+ | **Calc** | `qwen2.5-coder:0.5b`, `llama3.2:1b`, `BitNet-b1.58-2B` | 3/3 | 100% tool usage with ReAct |
522
+ | **Code** | Many models | 3/3 | Code generation is easy for small models! |
523
+
524
+ ### Test Categories
525
+
526
+ | Category | Tests | What it measures |
527
+ |----------|-------|------------------|
528
+ | **Math** | Multiply, Add, Divide | Basic arithmetic without tools |
529
+ | **Reasoning** | Apples, Sequence, Logic | Multi-step reasoning and deduction |
530
+ | **Knowledge** | Japan, France, Brazil capitals | World knowledge recall |
531
+ | **Calc** | Multiply, Divide, Power | Tool usage with calculator |
532
+ | **Code** | is_even, reverse, max_num | Python function generation |
533
+
534
+ ### Recommendations
535
+
536
+ | Use Case | Recommended Model | Why |
537
+ |----------|-------------------|-----|
538
+ | **General use** | `qwen2.5-coder:0.5b-instruct-q4_k_m` | Best all-around, fast, great tool usage |
539
+ | **Large context** | `llama3.2:1b` | **128k context window** - handles long conversations |
540
+ | **Math tasks** | `qwen2.5-coder:0.5b` or `qwen2-math:1.5b` | Perfect math scores |
541
+ | **Reasoning tasks** | `qwen2.5:0.5b` or `qwen3:0.6b` | Perfect reasoning |
542
+ | **Tool usage** | `qwen2.5-coder:0.5b` | Most reliable tool calling |
543
+ | **Fastest inference** | `gemma3:270m` | 270M params, fastest responses |
544
+ | **No tools needed** | `qwen2-math:1.5b` | 4/5 categories perfect (no Calc) |
545
+ | **Smallest footprint** | `smollm:135m` | 92MB - but expect hallucinations |
546
+
547
+ ### ⚠️ Models to Avoid
548
+
549
+ | Model | Issue |
550
+ |-------|-------|
551
+ | `functiongemma:270m` | Despite the name, terrible at function calling - returns empty or refuses |
552
+ | `smollm:135m` | Hallucinates wrong math (7×8=42), only 7/15 score |
553
+ | `granite4:350m` | Refuses calculator tools (safety filter) |
554
+
555
+ ### Known Issues with Small Models
556
+
557
+ 1. **Tool calling variations**:
558
+ - `granite4:350m`: Refuses calculator ("I'm sorry, but I can't assist with that")
559
+ - `functiongemma:270m`: Asks for clarification instead of using tools
560
+ - `qwen2.5:0.5b`, `qwen3:0.6b`: Returns empty responses on Calc tests
561
+ - `qwen2-math:1.5b`: HTTP 400 - doesn't support tool calling at all
562
+ 2. **Math hallucinations**: `smollm:135m` says "7×8=42", `tinyllama` says "7×8=45"
563
+ 3. **Power operator confusion**: `gemma3:270m` reads `2**10` as `2*10=20`
564
+ 4. **Reasoning failures**: Some models answer "8" for sequence "2,4,6,8,?" (repeat last)
565
+ 5. **Stability issues**:
566
+ - `granite3.1-moe:1b`: HTTP 500 crashes (server EOF)
567
+ - `tinyllama`, `qwen3:0.6b`: HTTP 524 timeouts
568
+ 6. **Empty responses**: `functiongemma:270m` returns empty strings on most tests
569
+
570
+ ---
571
+
572
+ ## Skills (Agent Skills Specification)
573
+
574
+ 🦞 LocalClaw R03 supports the **[Agent Skills](https://agentskills.io/)** specification for reusable instruction bundles.
575
+
576
+ ### Skill Structure
577
+
578
+ ```
579
+ skills/
580
+ └── my-skill/
581
+ β”œβ”€β”€ SKILL.md # Required: name, description, instructions
582
+ β”œβ”€β”€ scripts/ # Optional: executable scripts
583
+ β”œβ”€β”€ references/ # Optional: additional docs
584
+ └── assets/ # Optional: templates, images
585
+ ```
586
+
587
+ ### SKILL.md Format
588
+
589
+ ```yaml
590
+ ---
591
+ name: calculator
592
+ description: Perform mathematical calculations. Use when the user needs to compute expressions.
593
+ ---
594
+
595
+ # Calculator Skill
596
+
597
+ Instructions for the model on how to use this skill...
598
+ ```
599
+
600
+ ### Using Skills
601
+
602
+ ```bash
603
+ # Load skills via CLI
604
+ localclaw chat --skills skill-creator --tools write_file,shell
605
+
606
+ # Multiple skills
607
+ localclaw chat --skills datetime,web_search --tools calculator
608
+ ```
609
+
610
+ ### Progressive Disclosure
611
+
612
+ Skills follow a three-level loading system:
613
+
614
+ 1. **Metadata** (~100 tokens): `name` + `description` loaded at startup
615
+ 2. **Instructions** (<500 lines): Full `SKILL.md` body loaded when skill triggers
616
+ 3. **Resources** (as needed): Files in `scripts/`, `references/`, `assets/` loaded on demand
617
+
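Level 1 of the scheme above needs only the frontmatter, so a loader can parse the block between the `---` markers and defer the rest. A minimal stdlib-only sketch (the function names here are hypothetical; LocalClaw's actual loader lives in `localclaw/skills/loader.py` and may support richer YAML):

```python
from pathlib import Path

def load_skill_metadata(skill_dir: str) -> dict:
    """Level 1: parse only the YAML frontmatter of SKILL.md (~100 tokens).

    Simple 'key: value' parsing is enough for the required `name` and
    `description` fields; a full loader would use a real YAML parser.
    """
    text = Path(skill_dir, "SKILL.md").read_text(encoding="utf-8")
    _, frontmatter, _body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def load_skill_body(skill_dir: str) -> str:
    """Level 2: load the full instructions only once the skill triggers."""
    text = Path(skill_dir, "SKILL.md").read_text(encoding="utf-8")
    return text.split("---", 2)[2].strip()
```

Level 3 (scripts, references, assets) would then be read individually, only when the instructions ask for them.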
618
+ ### Built-in Skills
619
+
620
+ | Skill | Description |
621
+ |-------|-------------|
622
+ | `skill-creator` | OpenClaw's platform-agnostic skill generator. Creates new skills from user requests. |
623
+ | `datetime` | Date and time utilities for formatting, parsing, and calculations. |
624
+ | `web_search` | Web search capabilities for retrieving information from the internet. |
625
+
626
+ ---
627
+
628
+ ## Orchestrator modes
629
+
630
+ | Mode | Behaviour |
631
+ |---|---|
632
+ | `router` | A small routing LLM picks the best agent for each request |
633
+ | `pipeline` | Agents run sequentially — each receives the previous agent's output |
634
+ | `parallel` | All agents run concurrently; results are merged with attribution |
635
+
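The `pipeline` mode reduces to a simple fold over the agent list. A minimal sketch with plain callables standing in for agents (LocalClaw's real implementation is in `core/orchestrator.py` and wraps actual LLM-backed agents):

```python
def run_pipeline(agents, prompt: str) -> str:
    """Pipeline mode: chain agents so each receives the previous output."""
    output = prompt
    for agent in agents:
        output = agent(output)  # a real agent would call the LLM here
    return output

# Stand-in "agents" (plain callables) to show the data flow:
outline = lambda task: f"OUTLINE({task})"
draft = lambda outline_text: f"DRAFT({outline_text})"

result = run_pipeline([outline, draft], "write release notes")
```

`router` mode would instead ask a small LLM to pick one agent, and `parallel` mode would run all agents concurrently and merge their outputs with attribution.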
636
+ ---
637
+
638
+ ## Running the examples
639
+
640
+ ```bash
641
+ # Make sure Ollama is serving and you have a model pulled
642
+ ollama pull qwen2.5-coder:0.5b-instruct-q4_k_m
643
+
644
+ # Or use a remote Ollama instance by editing localclaw/core/ollama_client.py
645
+
646
+ # Quick test suite (recommended first run)
647
+ bash test-quick.sh # Linux/macOS/Colab
648
+ test-quick.cmd # Windows
649
+
650
+ # Full test suite (all 11 examples)
651
+ bash test.sh # Linux/macOS/Colab
652
+ test.cmd # Windows
653
+
654
+ # Interactive menu
655
+ bash run.sh # Linux/macOS/Colab
656
+ run.cmd # Windows
657
+
658
+ # Run individual examples
659
+ python examples/01_basic_agent.py
660
+ python examples/02_tool_agent.py
661
+ python examples/03_orchestrator.py
662
+ python examples/04_comprehensive_test.py
663
+ python examples/05_tool_tests.py
664
+ python examples/06_interactive_chat.py
665
+ python examples/07_model_comparison.py
666
+ python examples/08_robust_comparison.py
667
+ python examples/09_expanded_benchmark.py
668
+ python examples/10_skills_demo.py
669
+ python examples/11_skill_creator_test.py
670
+ ```
671
+
672
+ ---
673
+
674
+ ## ACP Integration (Agent Control Panel)
675
+
676
+ 🦞 LocalClaw R03 supports **[ACP (Agent Control Panel)](https://github.com/VTSTech/ACP-Agent-Control-Panel)** for centralized activity tracking, token monitoring, and multi-agent coordination.
677
+
678
+ ### What is ACP?
679
+
680
+ ACP is a monitoring and observability protocol for AI agents. Unlike communication protocols (MCP, A2A), ACP sits alongside your agents and provides:
681
+
682
+ - **Activity Tracking**: Real-time monitoring of all agent actions
683
+ - **Token Management**: Context window usage estimation per agent
684
+ - **Multi-Agent Coordination**: Track multiple agents in one session
685
+ - **STOP/Resume Control**: Emergency stop capability
686
+ - **Session Persistence**: State preserved across restarts
687
+
688
+ ### Enable ACP
689
+
690
+ ```bash
691
+ # Run with ACP tracking
692
+ localclaw chat --acp --tools shell,read_file,write_file -m qwen2.5-coder:0.5b
693
+
694
+ # Run single prompt with ACP
695
+ localclaw run --acp "What is 2+2?"
696
+ ```
697
+
698
### Configuration

Set your ACP server URL via environment variables:

```bash
# Local ACP
export ACP_URL="http://localhost:8766"

# Remote ACP (Cloudflare tunnel)
export ACP_URL="https://your-tunnel.trycloudflare.com"

# Credentials
export ACP_USER="admin"
export ACP_PASS="secret"
```

Or edit `localclaw/config.py` for persistent settings.

### What Gets Logged

| Activity | Description |
|----------|-------------|
| **Bootstrap** | Session start, identity establishment |
| **User messages** | All prompts sent to the model |
| **Assistant messages** | All model responses |
| **Tool calls** | Shell commands, file operations, etc. |
| **Tool results** | Outcomes from tool execution |

### Per-Agent Token Tracking

When multiple agents connect to the same ACP session:

```json
{
  "primary_agent": "Super Z",
  "agent_tokens": {
    "Super Z": 42000,
    "LocalClaw": 500
  },
  "other_agents_tokens": 500
}
```

- The first agent to connect becomes **primary** and owns the main context window
- Other agents are tracked separately in `agent_tokens`
- This prevents context pollution between agents

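The primary/secondary split can be modelled in a few lines. This is a simplified sketch of the accounting scheme, not the ACP server's actual implementation; `SessionTokens` is a hypothetical name:

```python
class SessionTokens:
    """Toy model of ACP's per-agent token accounting."""

    def __init__(self):
        self.primary_agent = None
        self.agent_tokens = {}

    def record(self, agent: str, tokens: int) -> None:
        # The first agent to report becomes primary and owns the main window.
        if self.primary_agent is None:
            self.primary_agent = agent
        self.agent_tokens[agent] = self.agent_tokens.get(agent, 0) + tokens

    def snapshot(self) -> dict:
        # Non-primary agents are summed into "other_agents_tokens".
        other = sum(n for a, n in self.agent_tokens.items() if a != self.primary_agent)
        return {
            "primary_agent": self.primary_agent,
            "agent_tokens": dict(self.agent_tokens),
            "other_agents_tokens": other,
        }
```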
### ACP Server

To run your own ACP server, see the [ACP Specification](https://github.com/VTSTech/ACP-Agent-Control-Panel):

```bash
# ACP is a single Python file
python VTSTech-GLMACP.py

# With a Cloudflare tunnel
GLMACP_TUNNEL=auto python VTSTech-GLMACP.py
```

---

## Remote Ollama Configuration

To use a remote Ollama instance (e.g., via a Cloudflare tunnel), set the environment variable:

```bash
# Local Ollama (default)
export OLLAMA_URL="http://localhost:11434"

# Remote Ollama (Cloudflare tunnel)
export OLLAMA_URL="https://your-tunnel.trycloudflare.com"
```

Or edit `localclaw/config.py` for persistent settings.

### Timeout Configuration

Configure via environment variables:

```bash
# Request timeout in seconds (default: 90s for Cloudflare tunnel compatibility)
export OLLAMA_TIMEOUT=90

# Max retry attempts for transient errors (default: 3)
export OLLAMA_MAX_RETRIES=3

# Initial retry delay in seconds (default: 5s, doubles each retry)
export OLLAMA_RETRY_DELAY=5
```

### Automatic Retry

LocalClaw automatically retries on transient errors with exponential backoff:

| Error Code | Description | Retry Behavior |
|------------|-------------|----------------|
| HTTP 524 | Cloudflare tunnel timeout | Retries up to 3 times |
| HTTP 502/503/504 | Server temporarily unavailable | Retries up to 3 times |
| HTTP 500 | Server error (model loading, memory pressure) | Retries up to 3 times |
| Timeout | Socket or connection timeout | Retries up to 3 times |

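With stdlib only, the retry policy sketches out roughly like this. It is illustrative: LocalClaw's real client lives in `localclaw/core/ollama_client.py`, and `request_with_retry` / `backoff_schedule` are hypothetical names:

```python
import time
import urllib.error
import urllib.request

# Transient status codes worth retrying, per the table above.
RETRYABLE = {500, 502, 503, 504, 524}

def backoff_schedule(max_retries: int = 3, delay: float = 5.0) -> list:
    """Delay before each retry: doubles every attempt (5s, 10s, 20s, ...)."""
    return [delay * (2 ** i) for i in range(max_retries)]

def request_with_retry(url: str, max_retries: int = 3, delay: float = 5.0,
                       timeout: float = 90.0) -> bytes:
    """GET `url`, retrying transient errors with exponential backoff."""
    # One initial attempt plus max_retries retries; the final pause is unused.
    for attempt, pause in enumerate(backoff_schedule(max_retries, delay) + [0.0]):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError as exc:
            if exc.code not in RETRYABLE or attempt == max_retries:
                raise
        except urllib.error.URLError:
            if attempt == max_retries:
                raise
        time.sleep(pause)
    raise AssertionError("unreachable")
```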
---

## Performance Optimization

### CLI Options for Speed

```bash
# Fast mode - reduces context and output for quicker responses
localclaw chat -m qwen2.5-coder:0.5b --fast --verbose

# Fine-tuned control
localclaw chat -m qwen2.5-coder:0.5b --num-ctx 2048 --num-predict 128

# Warm up the model before chat (useful for remote Ollama with cold starts)
localclaw chat -m qwen2.5-coder:0.5b --warmup --fast
```

| Option | Description | Speed Impact |
|--------|-------------|--------------|
| `--fast` | Preset: `num_ctx=2048`, `num_predict=256` | 🚀 Significant |
| `--num-ctx N` | Reduce context window (default varies by model) | 🚀 Significant |
| `--num-predict N` | Limit max output tokens | ⚡ Moderate |
| `--warmup` | Pre-load model before first chat | ⚡ Faster first response |

### Ollama Model Options

Control model behavior via CLI flags:

```bash
# Lower temperature = more deterministic
localclaw chat -m qwen2.5-coder:0.5b --temperature 0.1

# Smaller context = faster
localclaw chat -m qwen2.5-coder:0.5b --num-ctx 2048 --num-predict 128

# Combined for optimal speed
localclaw chat -m qwen2.5-coder:0.5b --fast --temperature 0.3
```

### Remote Ollama Tips

When using a **remote Ollama via Cloudflare tunnel**:

1. **Use the `--fast` flag** - reduces inference time significantly
2. **Use smaller models** - `qwen2.5-coder:0.5b` is the fastest
3. **Warm up the model** - the first request is slowest because the model must load
4. **Increase the timeout if needed**: `export OLLAMA_TIMEOUT=120`

```bash
# Recommended for remote Ollama
localclaw chat -m qwen2.5-coder:0.5b-instruct-q4_k_m \
  --fast --warmup --verbose \
  --tools python_repl
```

### Why Inference is Slow

| Factor | Impact | Solution |
|--------|--------|----------|
| **Model size** | Larger models = slower | Use smaller quantized models |
| **Context window** | More context = slower | Use `--num-ctx 2048` or smaller |
| **Output length** | More tokens = slower | Use `--num-predict 128` |
| **Remote connection** | Network latency | Use local Ollama if possible |
| **Cold start** | First load is slowest | Use `--warmup` flag |
| **GPU unavailable** | CPU inference is slow | Ensure GPU is configured |

---

## Recent Improvements

### R03: BitNet Backend

🦞 LocalClaw R03 adds **BitNet backend support** for running Microsoft's 1.58-bit quantized models:

- **New backend**: Switch between Ollama and BitNet via `--backend` flag
- **Zero-cost inference**: BitNet models run efficiently on CPU
- **Setup helper**: `bitnet_setup.py` handles cloning and compilation
- **Note**: BitNet requires ReAct fallback (no native tool support)

### R03: Enhanced Security

Built-in tools now have comprehensive security:

- **Path validation**: Restrict file access to allowed directories
- **Command blocklist**: Block dangerous commands (`rm`, `sudo`, `chmod`, etc.)
- **Pattern detection**: Detect dangerous shell patterns (pipes to bash, command substitution)
- **SSRF protection**: Block private IPs and cloud metadata endpoints in `http_get`
- **Configurable modes**: `strict`, `permissive`, or `disabled`

```bash
# Set security mode
export LOCALCLAW_SECURITY_MODE=strict
export LOCALCLAW_ALLOWED_PATHS=/home/user/projects:/tmp
export LOCALCLAW_BLOCKED_COMMANDS=rm,sudo,dd
```

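A minimal sketch of how such checks can work, assuming the env-var names above. The helper names and defaults are hypothetical, not LocalClaw's actual internals:

```python
import os
import os.path

# Defaults here are illustrative; the env vars mirror the ones above.
ALLOWED_PATHS = os.environ.get("LOCALCLAW_ALLOWED_PATHS", "/tmp").split(":")
BLOCKED_COMMANDS = set(os.environ.get("LOCALCLAW_BLOCKED_COMMANDS", "rm,sudo,dd").split(","))
DANGEROUS_PATTERNS = ("| bash", "| sh", "$(", "`")  # pipes to shells, substitution

def path_allowed(path: str) -> bool:
    """True if the normalised path sits under an allowed directory."""
    norm = os.path.abspath(path)
    return any(norm == root or norm.startswith(root.rstrip("/") + "/")
               for root in ALLOWED_PATHS)

def command_allowed(cmd: str) -> bool:
    """Reject blocklisted executables and dangerous shell patterns."""
    if any(pattern in cmd for pattern in DANGEROUS_PATTERNS):
        return False
    words = cmd.strip().split()
    return bool(words) and os.path.basename(words[0]) not in BLOCKED_COMMANDS
```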
### Zero Dependencies

🦞 LocalClaw R03 continues to use **only the Python stdlib**; no `pip install` required. The HTTP client uses `urllib` instead of `httpx`.

### Automatic Error Recovery

- **HTTP 524/502/503/504/500 retry**: Transient server errors are automatically retried with exponential backoff
- **Timeout retry**: Socket timeouts are retried automatically
- **Configurable via environment variables**: `OLLAMA_TIMEOUT`, `OLLAMA_MAX_RETRIES`, `OLLAMA_RETRY_DELAY`

### Small Model Support

🦞 LocalClaw R03 handles the quirks of small models (≤1.5B parameters):

- **Fuzzy tool name matching**: Hallucinated tool names like `calculate_expression` are automatically mapped to `calculator`
- **Argument auto-fixing**: Common wrong argument patterns are corrected (e.g., `{"base": 2, "exponent": 10}` → `{"expression": "2 ** 10"}`)
- **JSON response cleaning**: When models output tool schemas instead of text answers, LocalClaw falls back to tool results
- **Unicode normalization**: Accented characters are normalized for comparison (e.g., "Brasília" matches "brasilia")
- **ReAct text parsing**: Models without native tool support automatically fall back to text-based ReAct format

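The fuzzy-matching and normalization items can be sketched with stdlib `difflib` and `unicodedata`. This is a simplified illustration of the technique, not LocalClaw's exact code; the names are hypothetical:

```python
import difflib
import unicodedata
from typing import Optional

KNOWN_TOOLS = ["calculator", "shell", "read_file", "write_file", "python_repl"]

def resolve_tool_name(name: str) -> Optional[str]:
    """Map a hallucinated tool name to the closest known tool, if any."""
    matches = difflib.get_close_matches(name, KNOWN_TOOLS, n=1, cutoff=0.4)
    return matches[0] if matches else None

def normalize(text: str) -> str:
    """Strip accents and case so 'Brasília' compares equal to 'brasilia'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()
```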
### Optimized Test Prompts

Key insights for small-model prompt engineering:

1. **State the fact first**: "The capital of Japan is Tokyo. What is the capital of Japan?"
2. **Show the answer format**: "Answer: Tokyo" at the end
3. **Give calculation steps**: "10 minus 3 equals 7. Then 7 minus 2 equals 5."
4. **Be explicit with tools**: "Use calculator tool. Expression: 2 ** 10. Result: 1024"
5. **Guide code output**: "Start with: def is_even(n):"

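Applied together, the tips turn into prompt templates like these (illustrative helpers, not code from the repo):

```python
def knowledge_prompt(fact: str, question: str, answer: str) -> str:
    """Tips 1-2: state the fact first, then show the expected answer format."""
    return f"{fact} {question}\nAnswer: {answer}"

def tool_prompt(tool: str, expression: str, result) -> str:
    """Tip 4: name the tool, its input, and the expected result explicitly."""
    return f"Use {tool} tool. Expression: {expression}. Result: {result}"
```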
### New Examples

| Example | Description |
|---------|-------------|
| `07_model_comparison.py` | Benchmark 15 tests across models with category breakdown |
| `08_robust_comparison.py` | Progress-saving comparison for unstable connections |
| `09_expanded_benchmark.py` | 25 tests across 8 categories including tool chaining |
| `10_skills_demo.py` | Demonstrate the Agent Skills system with skill-creator |
| `11_skill_creator_test.py` | Benchmark skill creation across multiple small models |

### Test Categories (15 tests)

| Category | Tests | Description |
|----------|-------|-------------|
| Math | Multiply, Add, Divide | Basic arithmetic (no tools) |
| Reasoning | Apples, Sequence, Logic | Multi-step reasoning |
| Knowledge | Japan, France, Brazil | World knowledge |
| Calc | Multiply, Divide, Power | Calculator tool usage |
| Code | is_even, reverse, max_num | Python code generation |

---

## BitNet Benchmark Results

LocalClaw R03 has been tested with **Microsoft BitNet-b1.58-2B-4T**, a 2B-parameter model with 1.58-bit ternary weights designed for efficient CPU inference.

### Test Results Summary

| Test Suite | Score | Time | Notes |
|------------|-------|------|-------|
| **Model Comparison** (15 tests) | **13/15 (87%)** | 394s | 5 categories |
| **Robust Comparison** (22 tests) | **19/22 (86%)** | ~6min | Incremental save |
| **Comprehensive Test** (7 tests) | **6/7 (86%)** | ~90s | Basic + Reasoning + Code |

### Category Breakdown (Model Comparison - 15 tests)

| Category | Score | Pass Rate |
|----------|-------|-----------|
| **Math** | 3/3 | 100% ✅ |
| **Code** | 3/3 | 100% ✅ |
| **Calc (with tools)** | 3/3 | 100% ✅ |
| **Reasoning** | 2/3 | 67% |
| **Knowledge** | 2/3 | 67% |
| **Total** | **13/15** | **87%** |

### Failed Tests

| Test | Expected | Got | Category |
|------|----------|-----|----------|
| Apples (reasoning) | 5 | 7 | Reasoning |
| Brazil capital | Brasília | São Paulo | Knowledge |

### Performance Notes

| Metric | Value |
|--------|-------|
| **Avg response time** | 5-10s (simple), 100s+ (tool use) |
| **Tool calling** | ReAct fallback (no native support) |
| **Context window** | Default (model dependent) |
| **Inference** | CPU-efficient ternary weights |

### BitNet vs Ollama Small Models

| Rank | Model | Score | Params | Backend |
|:----:|-------|------:|-------:|---------|
| 🥇 | `qwen2.5-coder:0.5b-instruct-q4_k_m` | 14/15 (93%) | 494M | Ollama |
| 🥈 | **`BitNet-b1.58-2B-4T`** | **13/15 (87%)** | **2B** | **BitNet** |
| 🥉 | `granite3.1-moe:1b` | 12/15 (80%) | 1B MoE | Ollama |
| 4 | `llama3.2:1b` | 12/15 (80%) | 1.2B | Ollama |

> **Note**: BitNet uses 1.58-bit ternary weights, making it highly efficient for CPU inference despite having 2B parameters.

### BitNet Setup for Benchmarking

```bash
# 1. Clone and compile BitNet
python localclaw/bitnet_setup.py

# 2. Start the BitNet server
./build/bin/llama-server -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf

# 3. Run the benchmark
export LOCALCLAW_BACKEND=bitnet
python examples/07_model_comparison.py

# 4. Run with ACP tracking
export LOCALCLAW_BACKEND=bitnet
python examples/07_model_comparison_acp.py
```

### Observations

1. **Excellent for CPU-only systems**: ternary weights enable fast inference without a GPU
2. **Solid tool usage**: the ReAct fallback handles calculator tools reliably
3. **Strong code generation**: 100% pass rate on function-writing tasks
4. **Multi-step reasoning is a challenge**: the "apples" test requires tracking state across steps
5. **Knowledge gaps**: São Paulo is commonly mistaken for Brazil's capital

---

## About

**🦞 LocalClaw R03** is written and maintained by **VTSTech**.

- 🌐 Website: [https://www.vts-tech.org](https://www.vts-tech.org)
- 📦 GitHub: [https://github.com/VTSTech/LocalClaw](https://github.com/VTSTech/LocalClaw)
- 💻 More projects: [https://github.com/VTSTech](https://github.com/VTSTech)

---

> **Testing Status**: LocalClaw has been tested with both **Ollama** (11 small models) and **BitNet** (BitNet-b1.58-2B-4T) backends. BitNet achieved **87%** on the benchmark, making it the second-best performer overall. See the **Tested Small Models** and **BitNet Benchmark Results** sections for details.