titan-synapse 0.1.1
- package/CONTRIBUTING.md +187 -0
- package/Cargo.lock +3976 -0
- package/Cargo.toml +10 -0
- package/LICENSE +190 -0
- package/PROGRESS.md +151 -0
- package/README.md +514 -0
- package/TEST_LOG.md +220 -0
- package/config/default.yaml +36 -0
- package/crates/synapse/Cargo.toml +70 -0
- package/crates/synapse/src/cli/bench.rs +44 -0
- package/crates/synapse/src/cli/eval.rs +395 -0
- package/crates/synapse/src/cli/export.rs +45 -0
- package/crates/synapse/src/cli/hub.rs +179 -0
- package/crates/synapse/src/cli/import.rs +35 -0
- package/crates/synapse/src/cli/learn.rs +53 -0
- package/crates/synapse/src/cli/mod.rs +10 -0
- package/crates/synapse/src/cli/models.rs +36 -0
- package/crates/synapse/src/cli/pull.rs +60 -0
- package/crates/synapse/src/cli/status.rs +52 -0
- package/crates/synapse/src/cli/train.rs +99 -0
- package/crates/synapse/src/config.rs +220 -0
- package/crates/synapse/src/dashboard.rs +281 -0
- package/crates/synapse/src/format/manifest.rs +57 -0
- package/crates/synapse/src/format/mod.rs +4 -0
- package/crates/synapse/src/format/packer.rs +213 -0
- package/crates/synapse/src/inference/engine.rs +361 -0
- package/crates/synapse/src/inference/kv_cache.rs +97 -0
- package/crates/synapse/src/inference/lora.rs +166 -0
- package/crates/synapse/src/inference/mod.rs +9 -0
- package/crates/synapse/src/inference/model.rs +167 -0
- package/crates/synapse/src/inference/sampler.rs +133 -0
- package/crates/synapse/src/inference/speculative.rs +153 -0
- package/crates/synapse/src/learn/cloud_fallback.rs +186 -0
- package/crates/synapse/src/learn/engine.rs +109 -0
- package/crates/synapse/src/learn/mod.rs +5 -0
- package/crates/synapse/src/main.rs +185 -0
- package/crates/synapse/src/memory/extractor.rs +201 -0
- package/crates/synapse/src/memory/graph.rs +332 -0
- package/crates/synapse/src/memory/hallucination.rs +259 -0
- package/crates/synapse/src/memory/mod.rs +7 -0
- package/crates/synapse/src/openai.rs +232 -0
- package/crates/synapse/src/server.rs +166 -0
- package/crates/synapse/src/streaming.rs +80 -0
- package/crates/synapse/src/swarm/coordinator.rs +198 -0
- package/crates/synapse/src/swarm/mod.rs +8 -0
- package/crates/synapse/src/swarm/orchestrator.rs +225 -0
- package/crates/synapse/src/swarm/pool.rs +64 -0
- package/crates/synapse/src/swarm/spawner.rs +199 -0
- package/crates/synapse/src/swarm/synthesizer.rs +26 -0
- package/crates/synapse/src/vram/manager.rs +67 -0
- package/crates/synapse/src/vram/mod.rs +3 -0
- package/docker-compose.yml +19 -0
- package/install.sh +311 -0
- package/package.json +36 -0
- package/python/Dockerfile.learn +18 -0
- package/python/requirements.txt +11 -0
- package/python/synapse_learn/__init__.py +0 -0
- package/python/synapse_learn/datasets.py +233 -0
- package/python/synapse_learn/real_eval.py +616 -0
- package/python/synapse_learn/server.py +431 -0
- package/python/synapse_learn/train_base.py +672 -0
- package/python/synapse_learn/train_specialists.py +787 -0
package/README.md
ADDED
@@ -0,0 +1,514 @@

```
███████╗██╗ ██╗███╗ ██╗ █████╗ ██████╗ ███████╗███████╗
██╔════╝╚██╗ ██╔╝████╗ ██║██╔══██╗██╔══██╗██╔════╝██╔════╝
███████╗ ╚████╔╝ ██╔██╗ ██║███████║██████╔╝███████╗█████╗
╚════██║ ╚██╔╝ ██║╚██╗██║██╔══██║██╔═══╝ ╚════██║██╔══╝
███████║ ██║ ██║ ╚████║██║ ██║██║ ███████║███████╗
╚══════╝ ╚═╝ ╚═╝ ╚═══╝╚═╝ ╚═╝╚═╝ ╚══════╝╚══════╝
Tiny models. Big brain. Your hardware. No excuses.
```


<div align="center">

**A Rust inference engine that runs a swarm of tiny specialist models<br>that collaborate and learn continuously — on your GPU.**

[License](LICENSE)
[Rust](https://www.rust-lang.org/)
[Tests](#tests)
[CUDA](https://developer.nvidia.com/cuda-toolkit)

[Quick Start](#quick-start) · [How It Works](#how-it-works) · [Architecture](#architecture) · [Tested Results](#tested-results) · [Configuration](#configuration) · [Contributing](#contributing)

</div>

---


## What if you could run six specialists for the VRAM cost of one?

Everyone's racing to make models bigger. **We went the other way.**

Synapse runs a **swarm of tiny specialist models** that share a single base and coordinate through a Hebbian router — "pathways that fire together, wire together." Six specialists sharing one base model use **~5.7GB of VRAM**. A single 70B model needs roughly 35GB for its weights even at 4-bit quantization, which is more than a 32GB card can hold.

Oh, and they **learn from every conversation** you have. No fine-tuning scripts. No export-retrain-import dance. Just continuous, automatic self-improvement running in the background while you work.

No cloud. No API keys. No telemetry. One binary. Your hardware. Your data. Period.

---


## Features

- **Own Inference Engine** — Written from scratch in Rust with [candle](https://github.com/huggingface/candle). Not a wrapper around llama.cpp. Not a shim over vLLM. Ours.
- **GGUF Model Loading** — Native quantized model support. Load Q4_K_M, Q5_K_M, Q8_0 models directly. Tested with Qwen2.5 models.
- **Specialist Swarm with Hebbian Routing** — A coordinator routes queries to the right specialist(s). Simple question? One model. Complex task? The swarm convenes **in parallel**. Routing weights strengthen with use.
- **Metacognitive Confidence** — The system knows what it knows. Each specialist tracks its own performance per domain. Low confidence? Route to cloud fallback. High confidence? Handle locally at 100 tok/s.
- **Continuous Learning** — QLoRA + DPO self-improvement pipeline via Python sidecar. Every conversation generates training signal. Your model gets smarter the more you use it.
- **Hallucination Detection** — Cross-references every response against the knowledge graph. Contradictions are flagged. The model knows what it doesn't know.
- **Live Knowledge Graph** — SQLite-backed graph that updates in real-time during conversations. Auto-extracts facts ("Rust is a programming language" → stored as triple). Stores facts, conversation history, and DPO preference pairs.
- **Own Model Format (.synapse)** — Bundles base model + LoRA adapters + knowledge graph + training data + agent config into a single shareable file.
- **OpenAI-Compatible API** — Drop-in replacement. Point your existing tools at `localhost:6900` and everything just works. SSE streaming included.
- **Cloud Fallback with Auto-Learning** — When a specialist isn't confident, it routes to a cloud API (Ollama, OpenAI, anything OpenAI-compatible). The cloud response is captured as a DPO preference pair. Next time, the specialist handles it locally. The system teaches itself using the cloud as a tutor.
- **Web Dashboard** — Open `http://localhost:6900` in a browser. Chat with your AI swarm visually. See specialist confidence scores, knowledge graph stats, and Hebbian pathway strengths. Normal people can use it. No terminal required.
- **Community Specialist Hub** — Share trained specialists on HuggingFace. `synapse hub search python` finds community-trained specialists. `synapse hub install user/synapse-python-expert` installs them. `synapse hub push my_expert` shares yours.
- **Specialist Auto-Spawning** — When the system detects repeated failures in an uncovered domain, it proposes and creates new specialists automatically. A music producer ends up with `audio_expert`, `midi_expert`, `mixing_expert` without configuring anything.
- **Standardized Evaluation** — `synapse eval` runs MMLU, HumanEval, MT-Bench, and Safety benchmarks — the same ones OpenAI, Anthropic, and Meta use. Apples-to-apples comparison with the big models.
- **Public Dataset Training** — Train specialists on curated public datasets (OpenWebMath, The Stack, SlimPajama, Alpaca-Cleaned). Clean, factual data. No garbage in, no garbage out.
- **Single Binary** — `cargo build --release` gives you one binary. No Python environment required for inference. No Docker. No "please install these 47 things first."

---


## Quick Start

```bash
# Build from source
git clone https://github.com/Djtony707/titan-synapse
cd titan-synapse && cargo build --release

# Pull a model (downloads from HuggingFace)
./target/release/synapse pull qwen3-3b

# Start the engine
./target/release/synapse up
```

That's it. You now have an AI inference engine running on your GPU.

```bash
# Chat with it (OpenAI-compatible API)
curl http://localhost:6900/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "synapse",
    "messages": [{"role": "user", "content": "Write a Python function to check if a number is prime"}]
  }'
```

Works with any OpenAI-compatible client:

```python
from openai import OpenAI

# Point the official client at the local server; the key is unused but must be non-empty.
client = OpenAI(base_url="http://localhost:6900/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="synapse",
    messages=[{"role": "user", "content": "Hello from the swarm"}]
)
# Print the assistant's reply text.
print(response.choices[0].message.content)
```
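
Streaming responses from an OpenAI-compatible endpoint arrive as server-sent events: one `data: {json}` line per chunk, terminated by `data: [DONE]`. The sketch below shows one way to pull the text fragments out of such lines with only the standard library. It assumes Synapse's chunks follow the usual OpenAI `choices[0].delta.content` layout. `iter_sse_deltas` is a hypothetical helper name, not part of Synapse or the `openai` package.

```python
import json


def iter_sse_deltas(lines):
    """Yield text fragments from OpenAI-style SSE lines (hypothetical helper)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})
        if delta.get("content"):
            yield delta["content"]


# Example input shaped like a chat.completions stream (made-up content).
EXAMPLE_LINES = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
```
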

---

## How It Works

### The Core Insight

A 70B model is like hiring one genius who's okay at everything. Synapse is like hiring six specialists who are incredible at their thing and know how to collaborate. And they get better every day.

### Hebbian Routing

```
"Neurons that fire together, wire together."
```

The coordinator analyzes each request and routes to the right specialist(s). It tracks which specialist combinations produce the best results. Over time, the routing itself becomes learned — successful pathways get reinforced, poor ones weaken.

- Simple query → routed to a single specialist
- Complex task → multiple specialists activated, responses synthesized
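
To make "reinforced" and "weaken" concrete, here is a minimal sketch of a Hebbian-style routing table. It is an illustration under stated assumptions, not the code in `coordinator.rs`: the class name `HebbianRouter`, the keyword-to-specialist pathway keys, and the `learning_rate` update rule are all hypothetical.

```python
from collections import defaultdict


class HebbianRouter:
    """Toy router: pathway strengths grow with successful use (hypothetical design)."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        # strength[(keyword, specialist)] is a value in [0, 1]; unseen pathways start at 0.
        self.strength = defaultdict(float)

    def route(self, keywords, specialists):
        """Return the specialist with the highest summed pathway strength.

        Ties go to the earliest entry in `specialists`.
        """
        def score(name):
            return sum(self.strength[(kw, name)] for kw in keywords)
        return max(specialists, key=score)

    def reinforce(self, keywords, specialist, success):
        """Strengthen the used pathways after a good answer, weaken them after a bad one."""
        for kw in keywords:
            w = self.strength[(kw, specialist)]
            if success:
                w += self.learning_rate * (1.0 - w)  # saturates towards 1
            else:
                w *= 1.0 - self.learning_rate        # decays towards 0
            self.strength[(kw, specialist)] = w
```

With this rule, a pathway that succeeds twice from zero reaches 0.1 and then 0.19, and a later failure scales it back to 0.171, so routing drifts towards specialists that have answered similar queries well.
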

### Continuous Learning Loop

```
Conversation → Self-Evaluation → Preference Pairs → QLoRA Fine-tune → Better Model
                 (automatic)       (collected)       (background)     (hot-swapped)
```

The learning engine evaluates every response, collects preference pairs (good vs bad answers), and trains QLoRA adapters on idle GPU cycles. New adapters are hot-swapped in without restarting the server.
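
The collection step of that loop can be sketched as a small buffer that stores a preference pair whenever a local answer scores poorly and a better answer is available, then signals when enough pairs exist to train. The field names `min_pairs_before_training` and `eval_threshold` are borrowed from the sample config further down, but their exact meaning here is an assumption, and `LearningBuffer` and `PreferencePair` are hypothetical names rather than the sidecar's API.

```python
from dataclasses import dataclass, field


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # the answer judged better
    rejected: str  # the answer judged worse


@dataclass
class LearningBuffer:
    min_pairs_before_training: int = 10
    eval_threshold: float = 3.0  # assumed: scores below this count as "bad"
    pairs: list = field(default_factory=list)

    def record(self, prompt, local_answer, local_score, better_answer=None):
        """Keep a pair only when the local answer scored low and a better one exists."""
        if local_score < self.eval_threshold and better_answer is not None:
            self.pairs.append(PreferencePair(prompt, better_answer, local_answer))

    def ready_to_train(self):
        """True once enough pairs have accumulated for a background training run."""
        return len(self.pairs) >= self.min_pairs_before_training

    def drain(self):
        """Hand the collected pairs to the trainer and start a fresh buffer."""
        batch, self.pairs = self.pairs, []
        return batch
```

A caller would check `ready_to_train()` after each conversation and pass `drain()` to whatever runs the fine-tune.
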

### Knowledge Graph

Every conversation updates a persistent SQLite knowledge graph:

- **Facts**: Subject-predicate-object triples with confidence scores
- **Conversations**: Full history with specialist attribution
- **Preferences**: DPO training pairs for self-improvement
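
As a rough picture of the facts part, the sketch below turns an "X is a Y" sentence into a triple and stores it in SQLite with a confidence score. The table layout, the `is_a` predicate name, and the rule of keeping the highest confidence seen are assumptions made for illustration; the actual schema in `graph.rs` and the patterns in `extractor.rs` may differ.

```python
import re
import sqlite3

# Matches sentences of the form "<subject> is a/an <object>", optional trailing period.
IS_A = re.compile(r"^\s*(.+?)\s+is\s+an?\s+(.+?)\s*\.?\s*$", re.IGNORECASE)


def extract_is_a(sentence):
    """Return (subject, 'is_a', object) or None if the sentence does not match."""
    m = IS_A.match(sentence)
    if m is None:
        return None
    return (m.group(1), "is_a", m.group(2))


def open_graph(path=":memory:"):
    """Open (or create) a facts table keyed by the full triple."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts ("
        " subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL,"
        " confidence REAL NOT NULL,"
        " UNIQUE (subject, predicate, object))"
    )
    return conn


def add_fact(conn, triple, confidence=0.5):
    """Insert the triple, or raise its confidence if it is already stored."""
    subject, predicate, obj = triple
    conn.execute(
        "INSERT OR IGNORE INTO facts VALUES (?, ?, ?, ?)",
        (subject, predicate, obj, confidence),
    )
    conn.execute(
        "UPDATE facts SET confidence = MAX(confidence, ?)"
        " WHERE subject = ? AND predicate = ? AND object = ?",
        (confidence, subject, predicate, obj),
    )
    conn.commit()


def lookup(conn, subject):
    """Return all (predicate, object, confidence) rows known for a subject."""
    return conn.execute(
        "SELECT predicate, object, confidence FROM facts"
        " WHERE subject = ? ORDER BY predicate, object",
        (subject,),
    ).fetchall()
```
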

---

## Architecture

```
Client → POST /v1/chat/completions
         │
         ├→ Coordinator (keyword + Hebbian routing + metacognitive confidence)
         │     ├→ Single specialist (simple query, confidence-scored)
         │     └→ Multi-specialist swarm (complex task, PARALLEL execution)
         │
         ├→ Inference Engine (Rust + candle)
         │     ├→ GGUF quantized model loading
         │     ├→ LoRA adapters (~5-10MB each, hot-swappable)
         │     ├→ PagedAttention-style KV cache
         │     └→ Temperature/top-p/top-k sampling
         │
         ├→ Knowledge Graph (SQLite)
         │     └→ Facts, conversations, preference pairs
         │
         ├→ Learning Engine (Python sidecar on :8090)
         │     ├→ Self-evaluation scoring
         │     ├→ QLoRA fine-tuning
         │     └→ DPO self-improvement
         │
         └→ SSE Stream Response (OpenAI-compatible)
```
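
The diagram lists temperature, top-k, and top-p sampling as the last stage of the inference engine. As a rough illustration of how those three knobs are commonly combined, here is a pure-Python sketch. It is not a port of `sampler.rs`: the function name `sample_token`, the order of the filters, and raising `ValueError` on empty logits are assumptions.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Pick a token index from raw logits (illustrative only)."""
    if not logits:
        raise ValueError("logits must not be empty")
    if temperature <= 0.0:
        # Greedy decoding: always take the highest logit.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    ranked = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)

    if top_k > 0:
        ranked = ranked[:top_k]  # keep only the k most likely tokens
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for prob, idx in ranked:  # smallest prefix whose mass reaches top_p
            kept.append((prob, idx))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept

    # Draw from the surviving tokens in proportion to their probability.
    mass = sum(prob for prob, _ in ranked)
    threshold = rng.random() * mass
    running = 0.0
    for prob, idx in ranked:
        running += prob
        if threshold < running:
            return idx
    return ranked[-1][1]
```

Passing `rng=random.Random(seed)` makes draws reproducible, which is handy when comparing settings.
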

### Project Structure

```
titan-synapse/
├── Cargo.toml                      # Workspace root
├── crates/synapse/src/
│   ├── main.rs                     # CLI (clap): serve, status, models, pull, learn, bench
│   ├── server.rs                   # Axum HTTP server on :6900
│   ├── openai.rs                   # OpenAI-compatible API handlers
│   ├── streaming.rs                # SSE streaming
│   ├── config.rs                   # YAML config loader
│   ├── inference/
│   │   ├── engine.rs               # Model management, GGUF auto-loading
│   │   ├── model.rs                # Candle quantized model, generation loop
│   │   ├── sampler.rs              # Temperature, top-p, top-k sampling
│   │   ├── kv_cache.rs             # PagedAttention-style block allocation
│   │   └── lora.rs                 # LoRA adapter hot-swap
│   ├── dashboard.rs                # Embedded web UI (Tailwind CDN, zero build tools)
│   ├── swarm/
│   │   ├── orchestrator.rs         # Task decomposition + routing + cloud fallback
│   │   ├── coordinator.rs          # Hebbian routing + metacognitive confidence
│   │   ├── pool.rs                 # Specialist pool with LRU eviction
│   │   ├── synthesizer.rs          # Multi-specialist output merging
│   │   └── spawner.rs              # Specialist auto-spawning from failure patterns
│   ├── learn/
│   │   ├── engine.rs               # Python sidecar bridge
│   │   └── cloud_fallback.rs       # Cloud API fallback + DPO training data capture
│   ├── memory/
│   │   ├── graph.rs                # SQLite knowledge graph
│   │   ├── extractor.rs            # Real-time knowledge extraction from conversations
│   │   └── hallucination.rs        # Hallucination detection via knowledge cross-reference
│   ├── vram/manager.rs             # GPU monitoring (nvidia-smi)
│   └── format/                     # .synapse format pack/unpack
├── python/synapse_learn/           # FastAPI learning sidecar
├── config/default.yaml             # Default specialist definitions
└── docker-compose.yml              # GPU-accelerated learning container
```

### VRAM Budget (32GB GPU)

| Component | VRAM |
|-----------|------|
| Base model (3B, Q4_K_M) | ~2.1 GB |
| 6x LoRA adapters loaded | ~0.06 GB |
| KV cache pool | ~3 GB |
| Coordinator (0.6B) | ~0.5 GB |
| **Total for 6 specialists** | **~5.7 GB** |
| **Remaining on 32GB GPU** | **~26 GB free** |

Compare that to a single 70B model, which needs roughly **35GB** for its weights alone at 4-bit quantization and does not fit on a 32GB card. With Synapse, you've got room for longer contexts, more specialists, or a larger generalist model alongside the swarm.
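
The arithmetic behind the budget is easy to check. The sketch below sums the table's components and applies the usual back-of-the-envelope rule that weight storage is about parameters times bits per weight divided by 8. That rule is an approximation that ignores quantization block overhead, KV cache, and activations; `weight_gb` is a hypothetical helper name.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight storage in GB: parameters x bits per weight / 8."""
    return params_billion * bits_per_weight / 8


# Component sizes copied from the VRAM budget table (GB).
BUDGET_GB = {
    "Base model (3B, Q4_K_M)": 2.1,
    "6x LoRA adapters loaded": 0.06,
    "KV cache pool": 3.0,
    "Coordinator (0.6B)": 0.5,
}
GPU_GB = 32

total_gb = sum(BUDGET_GB.values())   # about 5.66, shown as ~5.7 in the table
free_gb = GPU_GB - total_gb          # about 26.34, shown as ~26 in the table
seventy_b_gb = weight_gb(70, 4)      # 35.0 GB of weights at 4 bits per weight
```
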

---

## Tested Results

Real results from our test deployment on an i9-14900KF with RTX 5090 (32GB VRAM).

### Benchmarks (Qwen2.5-3B, Q4_K_M)

| Metric | CPU | GPU (CUDA) |
|--------|-----|------------|
| **Throughput** | 21-24 tok/s | **97-128 tok/s** |
| **Model load time** | 1.1s (3B) | **0.6s (3B)** |
| **512-token generation** | ~22s | **~4s** |
| **Multi-model** | 2 models loaded | 2 models loaded |
| **Token counting** | Accurate | Accurate |
| **Hebbian routing** | Working | Working |

That's a **5x speedup** on GPU with CUDA 12.8 (Blackwell). And this is a quantized Q4 model — not all ops are GPU-accelerated yet. Full CUDA kernel coverage will push this even further.

### Standardized Evaluation (Real Benchmarks, Full Datasets)

Run against the **actual standardized benchmark datasets** — the same ones OpenAI, Anthropic, Meta, and Google report against. Not simplified proxies. Not cherry-picked samples. Every question in each dataset.

| Benchmark | Score | Samples | Notes |
|-----------|-------|---------|-------|
| **MMLU** (Knowledge + Reasoning) | **61.9%** | 14,042 | All 57 subjects. Best: marketing (87%), psychology (84%). Worst: moral scenarios (34%) |
| **HumanEval** (Code Generation) | **65.2%** | 164 | Real Python code execution with test cases (pass@1) |
| **GSM8K** (Math Reasoning) | **83.7%** | 1,319 | Grade school math — step-by-step reasoning with numerical extraction |
| **TruthfulQA** (Truthfulness) | **89.1%** | 817 | 89.1% truthful, 98.5% informative |
| **Overall** | **75.0%** | 16,342 | Unweighted mean of the four scores (weighting by sample count gives 65.1%) |
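
The 75.0% overall figure matches the plain average of the four benchmark scores. Weighting each score by its sample count gives a lower number, about 65.1%, because MMLU contributes 14,042 of the 16,342 questions. The short calculation below reproduces both values from the table; the function names are illustrative.

```python
# (score in percent, number of samples), copied from the table above.
RESULTS = {
    "MMLU": (61.9, 14042),
    "HumanEval": (65.2, 164),
    "GSM8K": (83.7, 1319),
    "TruthfulQA": (89.1, 817),
}


def unweighted_mean(results):
    """Average the scores, treating every benchmark equally."""
    return sum(score for score, _ in results.values()) / len(results)


def sample_weighted_mean(results):
    """Average the scores, weighting each benchmark by its question count."""
    total = sum(n for _, n in results.values())
    return sum(score * n for score, n in results.values()) / total


total_samples = sum(n for _, n in RESULTS.values())
```
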

#### What These Numbers Mean

**vs Qwen2.5 3B base** (the raw model, no swarm):

| Benchmark | Synapse Swarm | Qwen2.5 3B Base | Delta |
|-----------|---------------|-----------------|-------|
| MMLU | 61.9% | ~65% | -3 pts (Q4_K_M quantization cost) |
| HumanEval | 65.2% | ~55% | **+10 pts** (specialist routing) |
| GSM8K | 83.7% | ~68% | **+15.7 pts** (swarm math boost) |
| TruthfulQA | 89.1% | ~45% | **+44 pts** (hallucination detection) |

The swarm adds **+10 to +44 points** over the raw base model on task-specific benchmarks. MMLU takes a small hit from quantization — expected trade-off for running in 2.1GB VRAM instead of 6GB.

#### Head-to-Head vs Flagship Models (March 2026)

We're not pretending a 3B model beats GPT-5. Here's where we actually stand — with sourced numbers from official technical reports:

| Model | Params | MMLU | HumanEval | GSM8K | Cost |
|-------|--------|------|-----------|-------|------|
| **SYNAPSE (ours)** | **3B Q4** | **61.9%** | **65.2%** | **83.7%** | **$0 (local)** |
| GPT-5 | Undisclosed | 91.4% | ~99% | ~99% | $$$ |
| OpenAI o3 | Undisclosed | ~91% | ~97% | ~99% | $$$ |
| OpenAI o4-mini | Undisclosed | ~90% | 99.3% | ~99% | $$ |
| Grok 3 | Undisclosed | 92.7% | ~95% | ~99% | $$ |
| Grok 3.5 | Undisclosed | 91.8% | N/A | ~99% | $$ |
| DeepSeek R1 | 671B MoE | 90.8% | ~95% | ~99% | $ |
| Claude 3.7 Sonnet | Undisclosed | ~82% | 94% | ~98% | $$ |
| Claude Sonnet 4.5 | Undisclosed | ~83% | ~96% | ~99% | $$ |
| Gemini 2.5 Pro | Undisclosed | 89.8% | ~98% | ~99% | $$ |
| Llama 4 Maverick | 400B MoE | ~80% | ~86% | ~95% | Free (weights) |
| Llama 4 Scout | 109B MoE | 79.6% | 86.4% | ~93% | Free (weights) |
| Qwen3.5 27B | 27B | ~86% | ~85% | ~98% | Free (weights) |
| Qwen2.5 3B (base) | 3B | ~65% | ~55% | ~68% | Free (weights) |

*Sources: Official technical reports from OpenAI, Anthropic, Google, xAI, Meta, Alibaba, DeepSeek. Cross-referenced via Artificial Analysis, lmsys Arena, and llm-stats.com.*

#### The Honest Take

**On raw knowledge (MMLU):** Models 100x our size dominate — they should. A 3B model can't memorize as many facts as a 200B+ model. No amount of routing changes that.

**On math reasoning (GSM8K 83.7%):** Our swarm adds +15.7 points over the base Qwen2.5 3B model. Frontier models have saturated this benchmark (~99%), but our 3B model hitting 83.7% is remarkably strong for the parameter count.

**On code generation (HumanEval 65.2%):** Frontier models have essentially maxed out HumanEval (97-99%). Our 65.2% is +10 points over the base model, showing the specialist routing helps, but there's clear room to grow.

**On truthfulness (TruthfulQA 89.1%):** No major lab reports TruthfulQA anymore — they consider it saturated. But our +44 point improvement over the base model proves the hallucination detection system works.

**The real comparison isn't scores — it's economics.** GPT-5 scores 91% on MMLU but costs money per token, requires internet, and doesn't learn your patterns. Synapse scores 62% on MMLU but runs for free on your GPU at 100+ tok/s, works offline, and gets smarter every day from your conversations. Different tools for different jobs.

#### Note on Benchmark Saturation

MMLU, HumanEval, and GSM8K are now considered **saturated benchmarks** — frontier models score 90-99% on all of them. The industry has moved to harder evals: GPQA Diamond (PhD-level science), AIME 2025 (math olympiad), SWE-bench Verified (real software engineering), and MMLU-Pro (10-choice, harder). We report the classic benchmarks for baseline comparison, but plan to add the modern suite as the swarm matures.

### Verified Working

| Test | Result | Details |
|------|--------|---------|
| `cargo build --release` | PASS | Clean compilation, Rust 2024 edition |
| `cargo test` | **37/37 passing** | Config, sampler, KV cache, knowledge graph, manifest, packer, Hebbian, coordinator, LoRA, extractor, hallucination, spawner, cloud fallback |
| `synapse bench` | PASS | 4 prompts, 759 tokens, 23 tok/s average (CPU) |
| `synapse status` | PASS | Shows GPU info, VRAM usage, specialist list |
| `GET /health` | PASS | Returns "ok" |
| `GET /v1/models` | PASS | Lists synapse + all specialist models |
| `GET /api/status` | PASS | Loaded models, Hebbian pathways, knowledge stats |
| `POST /v1/chat/completions` | PASS | Real inference with token usage stats |
| `POST /v1/chat/completions` (stream) | PASS | SSE streaming, OpenAI-compatible chunks |
| GGUF model loading | PASS | Multi-model: Qwen2.5-0.5B (0.7s) + Qwen2.5-3B (1.1s) |
| Code generation | PASS | Correct `is_prime()` function with explanation |
| Math reasoning | PASS | "2 + 2 equals 4." — clean stop tokens |
| Specialist routing | PASS | Python queries → python_expert, SQL → sql_expert |
| Hebbian routing | PASS | Pathway strengths accumulate in SQLite |
| Swarm decomposition | PASS | Complex queries trigger multi-specialist **parallel** mode |
| Metacognitive confidence | PASS | /api/confidence returns per-specialist performance |
| Knowledge graph | PASS | Facts, preferences, conversations, routing pathways |
| .synapse format | PASS | Pack/unpack with model, adapters, knowledge bundling |
| Export/Import CLI | PASS | Round-trip specialist export and import |

### Unit Tests (37/37 Passing)

```
test config::tests::test_default_config ... ok
test config::tests::test_config_serialization ... ok
test config::tests::test_load_missing_config ... ok
test inference::sampler::tests::test_greedy_sampling ... ok
test inference::sampler::tests::test_empty_logits ... ok
test inference::sampler::tests::test_stochastic_sampling ... ok
test inference::kv_cache::tests::test_cache_allocation ... ok
test inference::lora::tests::test_lora_adapter_placeholder ... ok
test inference::lora::tests::test_lora_adapter_with_tensors ... ok
test inference::speculative::tests::test_speculative_decoder_creation ... ok
test inference::speculative::tests::test_draft_length_clamping ... ok
test swarm::coordinator::tests::test_single_routing ... ok
test swarm::coordinator::tests::test_swarm_routing ... ok
test swarm::coordinator::tests::test_default_routing ... ok
test swarm::spawner::tests::test_infer_capabilities ... ok
test swarm::spawner::tests::test_is_domain_covered ... ok
test swarm::spawner::tests::test_create_specialist_config ... ok
test memory::graph::tests::test_knowledge_graph ... ok
test memory::graph::tests::test_preferences ... ok
test memory::graph::tests::test_hebbian_routing ... ok
test memory::graph::tests::test_specialist_stats ... ok
test memory::extractor::tests::test_extract_is_pattern ... ok
test memory::extractor::tests::test_extract_verb_patterns ... ok
test memory::extractor::tests::test_extract_preferences_positive ... ok
test memory::extractor::tests::test_extract_preferences_negative ... ok
test memory::extractor::tests::test_empty_text ... ok
test memory::hallucination::tests::test_verify_correct_claim ... ok
test memory::hallucination::tests::test_verify_unknown_claim ... ok
test memory::hallucination::tests::test_word_overlap ... ok
test memory::hallucination::tests::test_empty_response ... ok
test learn::cloud_fallback::tests::test_cloud_fallback_disabled ... ok
test learn::cloud_fallback::tests::test_cloud_fallback_enabled ... ok
test learn::cloud_fallback::tests::test_confidence_threshold ... ok
test format::manifest::tests::test_manifest_creation ... ok
test format::manifest::tests::test_manifest_serialization ... ok
test format::packer::tests::test_pack_and_unpack ... ok
test format::packer::tests::test_list_bundles ... ok
test result: ok. 37 passed; 0 failed; 0 ignored
```

---

## How Synapse Compares

| Feature | Ollama | vLLM | CrewAI | **Synapse** |
|---------|--------|------|--------|-------------|
| Own inference engine | No (llama.cpp) | Yes | No (wraps LLMs) | **Yes (Rust + candle)** |
| Own model format | No (GGUF) | No | No | **Yes (.synapse)** |
| Specialist swarm | No | No | Yes (no inference) | **Yes (integrated)** |
| Continuous learning | No | No | No | **Yes (QLoRA + DPO)** |
| Knowledge graph | No | No | No | **Yes (real-time SQLite)** |
| Single binary | No | No | No | **Yes** |
| Consumer GPU optimized | Yes | No | N/A | **Yes** |
| OpenAI-compatible API | Yes | Yes | No | **Yes** |

---

## CLI Commands

```bash
synapse serve [--port 6900]         # Start the inference server
synapse up [--port 6900]            # Alias for serve (also opens web dashboard)
synapse status                      # GPU info, loaded models, specialist list
synapse models                      # List available models in ~/.synapse/models/
synapse pull <model>                # Download model from HuggingFace
synapse export <name>               # Export specialist as .synapse file
synapse import <path>               # Import a .synapse specialist
synapse learn status                # Show learning engine stats
synapse learn train-now             # Force immediate training
synapse bench [--model <name>]      # Run inference benchmarks
synapse eval                        # Run standardized eval (MMLU, HumanEval, MT-Bench, Safety)
synapse hub search <query>          # Find community specialists on HuggingFace
synapse hub install <repo>          # Install a community specialist
synapse hub push <name>             # Share your trained specialist
synapse hub list                    # Browse the hub
```

---

## Configuration

Synapse uses YAML config at `~/.synapse/config.yaml`:

```yaml
port: 6900
coordinator_model: qwen3-0.6b
base_model: qwen3-3b

learning:
  enabled: true
  min_pairs_before_training: 10
  sidecar_url: http://localhost:8090
  eval_threshold: 3.0

specialists:
  - name: general
    capabilities: [general, chat, help]
    system_prompt: "You are a helpful AI assistant."
    priority: 50

  - name: python_expert
    capabilities: [python, debugging, testing, refactoring]
    system_prompt: "You are an expert Python developer."
    priority: 60

  - name: sql_expert
    capabilities: [sql, database, query, postgres]
    system_prompt: "You are an expert database engineer."
    priority: 60
```

Or just run `synapse up` and the defaults handle everything. Config is auto-created on first run.

---

## Contributing

This thing is early. There's a lot to build and a lot to break.

**Areas where help is most needed:**

- **CUDA inference** — Enable candle CUDA kernels for GPU-accelerated generation
- **New specialist adapters** — Train and contribute domain-specific LoRAs
- **Inference optimizations** — Flash attention, speculative decoding, continuous batching
- **Platform support** — AMD ROCm, Apple Metal, Intel Arc
- **Learning engine** — Improved training signal extraction, better DPO reward modeling
- **Benchmarks** — Rigorous eval harness across standard benchmarks

```bash
# Dev setup
git clone https://github.com/Djtony707/titan-synapse
cd titan-synapse
cargo build
cargo test  # 37/37 should pass

# Run with debug logging
RUST_LOG=debug cargo run -- serve
```

---

## Roadmap

- [x] Core inference engine (Rust + candle)
- [x] GGUF quantized model loading
- [x] OpenAI-compatible API (chat completions + streaming)
- [x] Specialist swarm with Hebbian routing
- [x] Knowledge graph (SQLite)
- [x] .synapse model format
- [x] CLI (serve, status, models, pull, learn, bench)
- [x] Python learning sidecar
- [x] Multi-model loading (0.5B + 3B loaded simultaneously)
- [x] Token counting in API responses (accurate usage stats)
- [x] Hebbian routing persistence (SQLite-backed pathway learning)
- [x] .synapse format packer/unpacker with bundled models + adapters
- [x] CUDA-accelerated inference (5x speedup achieved — 128 tok/s on RTX 5090)
- [x] Parallel swarm execution (specialists run concurrently, not sequentially)
- [x] Metacognitive confidence scoring (system tracks what it knows)
- [x] Smart model selection (prefers larger models when available)
- [x] Real LoRA adapter loading via SafeTensors (f32, f16, bf16)
- [x] Conversation context threading (multi-turn awareness)
- [x] Real-time knowledge extraction from conversations
- [x] Hallucination detection (cross-reference against knowledge graph)
- [x] User feedback preference learning (DPO pair collection)
- [x] Standardized evaluation (MMLU 61.9%, HumanEval 65.2%, GSM8K 83.7%, TruthfulQA 89.1% — real datasets, 16,342 questions)
- [x] Cloud fallback with auto-learning (DPO pairs from cloud responses)
- [x] Specialist auto-spawning (system creates new specialists from failure patterns)
- [x] Web dashboard (chat UI at localhost:6900, stats + metacognition panels)
- [x] Community specialist hub (push/pull/search on HuggingFace)
- [x] Public dataset training pipeline (OpenWebMath, The Stack, SlimPajama, etc.)
- [x] Speculative decoding scaffold (draft + verify architecture)
- [x] LoRA adapter training + hot-swap during inference
- [ ] Full speculative decoding (shared KV cache state)
- [ ] Continuous batching across specialists
- [ ] Doc-to-LoRA knowledge crystallization
- [ ] Distributed swarm across multiple machines
- [ ] Custom Synapse base model (trained specifically for swarm coordination)

---

## License

Licensed under the [Apache License 2.0](LICENSE).

Use it. Fork it. Build on it. Make something wild.

---

<div align="center">

**Built with mass amounts of caffeine and mass amounts of mass by [Tony Elliott](https://github.com/Djtony707)**

*Because the future of AI isn't one massive model — it's a swarm of tiny ones that never stop learning.*

</div>