groove-dev 0.27.109 → 0.27.111

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (53)
  1. package/DYNAMIC_LEAF_ARCH.md +488 -0
  2. package/EMBEDDING_SERVICE_BUILD_PLAN.md +200 -0
  3. package/MERKLE_TREE_ARCHITECTURE.md +354 -0
  4. package/TRAINING_DATA_v2.md +9 -0
  5. package/moe-training/client/domain-tagger.js +226 -30
  6. package/moe-training/client/trajectory-capture.js +5 -2
  7. package/moe-training/shared/constants.js +1 -0
  8. package/moe-training/shared/envelope-schema.js +38 -0
  9. package/moe-training/test/client/domain-tagger.test.js +111 -2
  10. package/moe-training/test/shared/envelope-schema.test.js +116 -0
  11. package/node_modules/@groove-dev/cli/package.json +1 -1
  12. package/node_modules/@groove-dev/daemon/package.json +1 -1
  13. package/node_modules/@groove-dev/daemon/src/api.js +24 -6
  14. package/node_modules/@groove-dev/daemon/src/index.js +1 -1
  15. package/node_modules/@groove-dev/daemon/src/journalist.js +24 -18
  16. package/node_modules/@groove-dev/daemon/src/preview.js +113 -9
  17. package/node_modules/@groove-dev/daemon/src/process.js +12 -31
  18. package/node_modules/@groove-dev/daemon/src/providers/base.js +1 -0
  19. package/node_modules/@groove-dev/daemon/src/providers/codex.js +28 -9
  20. package/node_modules/@groove-dev/daemon/src/rotator.js +6 -1
  21. package/node_modules/@groove-dev/daemon/src/tunnel-manager.js +1 -1
  22. package/node_modules/@groove-dev/daemon/test/codex-provider.test.js +63 -0
  23. package/node_modules/@groove-dev/daemon/test/rotator.test.js +10 -10
  24. package/node_modules/@groove-dev/gui/dist/assets/{index-CmYGHdXZ.js → index-CHu5w3i3.js} +2 -2
  25. package/node_modules/@groove-dev/gui/dist/index.html +1 -1
  26. package/node_modules/@groove-dev/gui/package.json +1 -1
  27. package/node_modules/@groove-dev/gui/src/components/preview/preview-workspace.jsx +1 -3
  28. package/node_modules/@groove-dev/gui/src/stores/groove.js +1 -1
  29. package/node_modules/moe-training/client/domain-tagger.js +226 -30
  30. package/node_modules/moe-training/client/trajectory-capture.js +5 -2
  31. package/node_modules/moe-training/shared/constants.js +1 -0
  32. package/node_modules/moe-training/shared/envelope-schema.js +38 -0
  33. package/node_modules/moe-training/test/client/domain-tagger.test.js +111 -2
  34. package/node_modules/moe-training/test/shared/envelope-schema.test.js +116 -0
  35. package/package.json +1 -1
  36. package/packages/cli/package.json +1 -1
  37. package/packages/daemon/package.json +1 -1
  38. package/packages/daemon/src/api.js +24 -6
  39. package/packages/daemon/src/index.js +1 -1
  40. package/packages/daemon/src/journalist.js +24 -18
  41. package/packages/daemon/src/preview.js +113 -9
  42. package/packages/daemon/src/process.js +12 -31
  43. package/packages/daemon/src/providers/base.js +1 -0
  44. package/packages/daemon/src/providers/codex.js +28 -9
  45. package/packages/daemon/src/rotator.js +6 -1
  46. package/packages/daemon/src/tunnel-manager.js +1 -1
  47. package/packages/gui/dist/assets/{index-CmYGHdXZ.js → index-CHu5w3i3.js} +2 -2
  48. package/packages/gui/dist/index.html +1 -1
  49. package/packages/gui/package.json +1 -1
  50. package/packages/gui/src/components/preview/preview-workspace.jsx +1 -3
  51. package/packages/gui/src/stores/groove.js +1 -1
  52. package/TRAINING_DATA.md +0 -12
  53. package/ssh/main.js +0 -2253
@@ -0,0 +1,200 @@
1
+ # Build Plan: Embedding Service Endpoint for Central Command
2
+
3
+ ## What this is
4
+
5
+ Groove clients need an embedding endpoint to compute 384-dimensional vectors from session text. The client (`DomainTagger`) already has the code to call it — it just needs the URL. This unlocks semantic domain tagging and `session_embedding` in training envelopes (currently `null` for all sessions).
6
+
7
+ ## Model
8
+
9
+ **`sentence-transformers/all-MiniLM-L6-v2`** — ONNX format
10
+ - 22M parameters, ~80MB on disk
11
+ - 384-dimensional output vectors
12
+ - Download from Hugging Face: `Xenova/all-MiniLM-L6-v2` (ONNX-optimized)
13
+ - Runtime: `onnxruntime-node` (npm package)
14
+
15
+ ## Dependencies
16
+
17
+ ```bash
18
+ npm install onnxruntime-node @xenova/transformers
19
+ ```
20
+
21
+ `@xenova/transformers` handles tokenization + ONNX inference in one package. If you prefer manual control, use `onnxruntime-node` directly with the tokenizer JSON files from the model repo.
22
+
23
+ ## Endpoint
24
+
25
+ **`POST /v1/embed`**
26
+
27
+ ### Request body:
28
+ ```json
+ {
+   "input": "some text to embed",
+   "model": "sentence-transformers/all-MiniLM-L6-v2"
+ }
+ ```
34
+
35
+ - `input` — string, required, max 512 chars (client already truncates to 512)
36
+ - `model` — string, optional (only one model supported, ignore or validate)
37
+
38
+ ### Response body (must match this exactly — client parses `data[0].embedding`):
39
+ ```json
+ {
+   "data": [
+     {
+       "embedding": [0.0123, -0.0456, 0.0789, "...384 floats total"],
+       "index": 0
+     }
+   ],
+   "model": "sentence-transformers/all-MiniLM-L6-v2"
+ }
+ ```
50
+
51
+ This follows the OpenAI embedding response format. The client reads it at:
52
+ ```javascript
53
+ const embedding = data?.data?.[0]?.embedding; // must be Array<number>, length 384
54
+ ```
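Because the client trusts this shape, a defensive reader is cheap insurance against malformed responses. A minimal sketch (the `parseEmbeddingResponse` helper name is illustrative, not part of the actual client):

```javascript
// Hypothetical helper: extract and validate the embedding from an
// OpenAI-style response body. Returns null on any shape mismatch.
function parseEmbeddingResponse(data, expectedDim = 384) {
  const embedding = data?.data?.[0]?.embedding;
  if (!Array.isArray(embedding)) return null;
  if (embedding.length !== expectedDim) return null;
  // Every entry must be a finite number, not a string or NaN.
  if (!embedding.every((x) => typeof x === 'number' && Number.isFinite(x))) return null;
  return embedding;
}
```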
55
+
56
+ ### Error responses:
57
+ - `400` — missing `input` field
58
+ - `503` — model not loaded yet (startup)
59
+
60
+ ## Health check behavior
61
+
62
+ On `init()`, the client sends a probe request to verify the service is up:
63
+ ```json
64
+ POST /v1/embed
65
+ { "input": "health check", "model": "sentence-transformers/all-MiniLM-L6-v2" }
66
+ ```
67
+ If this returns `200 OK`, the client switches from `keyword` mode to `http` mode. If it fails or times out (5s), the client silently falls back to keyword matching. So the endpoint must handle tiny inputs gracefully.
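The probe logic can be sketched as follows, assuming a `fetch`-style HTTP client with `AbortController` for the timeout (function and parameter names here are illustrative, not the actual `DomainTagger` internals):

```javascript
// Hypothetical probe: POST a tiny input and report whether the service
// answered 200 within the timeout. fetchImpl is injectable for testing.
async function probeEmbeddingService(url, fetchImpl = fetch, timeoutMs = 5000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        input: 'health check',
        model: 'sentence-transformers/all-MiniLM-L6-v2',
      }),
      signal: controller.signal,
    });
    return res.ok; // true → switch to 'http' mode
  } catch {
    return false; // network error or timeout → silent keyword fallback
  } finally {
    clearTimeout(timer);
  }
}
```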
68
+
69
+ ## Implementation approach
70
+
71
+ ```
+ server/
+ ├── embedding.js      ← new file: load model once, expose embed(text) function
+ ├── routes/
+ │   └── embed.js      ← new file: POST /v1/embed route
+ └── index.js          ← add: import + mount embed route
+ ```
78
+
79
+ ### `embedding.js` — Singleton model loader:
80
+ ```javascript
+ import { pipeline } from '@xenova/transformers';
+
+ let embedder = null;
+ let loading = false;
+ let loadError = null;
+
+ export async function initEmbedding() {
+   if (embedder || loading) return;
+   loading = true;
+   try {
+     embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
+     console.log('[embedding] model loaded');
+   } catch (err) {
+     loadError = err;
+     console.error('[embedding] failed to load model:', err.message);
+   }
+   loading = false;
+ }
+
+ export async function embed(text) {
+   if (!embedder) throw new Error('Model not loaded');
+   const result = await embedder(text, { pooling: 'mean', normalize: true });
+   return Array.from(result.data);
+ }
+
+ export function isReady() {
+   return embedder !== null;
+ }
+ ```
110
+
111
+ ### `routes/embed.js`:
112
+ ```javascript
+ import { Router } from 'express';
+ import { embed, isReady } from '../embedding.js';
+
+ // Assumes express.json() body-parsing middleware is mounted upstream,
+ // otherwise req.body is undefined.
+ export function createEmbedRoutes() {
+   const router = Router();
+
+   router.post('/v1/embed', async (req, res) => {
+     if (!isReady()) {
+       return res.status(503).json({ error: 'Model loading' });
+     }
+
+     const { input } = req.body;
+     if (!input || typeof input !== 'string') {
+       return res.status(400).json({ error: 'Missing input field' });
+     }
+
+     // Mirror the client-side truncation so oversized inputs never reach the model.
+     const text = input.slice(0, 512);
+
+     try {
+       const vector = await embed(text);
+       res.json({
+         data: [{ embedding: vector, index: 0 }],
+         model: 'sentence-transformers/all-MiniLM-L6-v2',
+       });
+     } catch (err) {
+       res.status(500).json({ error: err.message });
+     }
+   });
+
+   return router;
+ }
+ ```
145
+
146
+ ### `index.js` — add to existing server setup:
147
+ ```javascript
148
+ import { createEmbedRoutes } from './routes/embed.js';
149
+ import { initEmbedding } from './embedding.js';
150
+
151
+ // Mount alongside existing routes
152
+ app.use(createEmbedRoutes());
153
+
154
+ // Load model in background (don't block server startup)
155
+ initEmbedding();
156
+ ```
157
+
158
+ ## Client configuration
159
+
160
+ Once the endpoint is live, Groove clients connect by setting one env var:
161
+
162
+ ```bash
163
+ export EMBEDDING_SERVICE_URL=https://api.groovedev.ai/v1/embed
164
+ ```
165
+
166
+ The `DomainTagger` constructor reads `process.env.EMBEDDING_SERVICE_URL` automatically. No client code changes needed.
167
+
168
+ ## What changes on the Groove side
169
+
170
+ Nothing. The client code already supports this. When `EMBEDDING_SERVICE_URL` is set:
171
+ 1. `DomainTagger.init()` probes the URL → switches to `http` mode
172
+ 2. `tag()` uses cosine similarity against domain centroids instead of keywords
173
+ 3. `embed()` returns `{ model, vector, source_text }` instead of `null`
174
+ 4. `trajectory-capture.js` writes the vector into `session_embedding` on SESSION_CLOSE
175
+
176
+ ## Performance notes
177
+
178
+ - Model loads once at startup (~2-5 seconds)
179
+ - Inference: ~5-15ms per embedding on CPU, <2ms on GPU
180
+ - `init()` triggers ~45 embed calls up front (40 domain centroids + routing text + session embedding); each subsequent session close adds 1-2 calls
181
+ - The 40 centroid embeddings are computed once in `_buildCentroids()` during init and cached in memory — not per-request
182
+ - Memory footprint: ~200MB for model + runtime
183
+
184
+ ## Verification
185
+
186
+ After deploying, test with:
187
+ ```bash
188
+ curl -X POST https://api.groovedev.ai/v1/embed \
189
+ -H "Content-Type: application/json" \
190
+ -d '{"input": "React TypeScript frontend development"}' \
191
+ | jq '.data[0].embedding | length'
192
+ # Should return: 384
193
+ ```
194
+
195
+ Then on any Groove client machine:
196
+ ```bash
197
+ export EMBEDDING_SERVICE_URL=https://api.groovedev.ai/v1/embed
198
+ ```
199
+
200
+ Next agent spawn will show `session_embedding` populated in training data instead of `null`, and domain tags will have much higher confidence scores.
@@ -0,0 +1,354 @@
1
+ # Hummingbird Local Tree Architecture
2
+
3
+ ## 1. Executive Summary
4
+
5
+ Hummingbird has two trees.
6
+
7
+ The **Network Tree** makes the model smart. It is the collective intelligence layer: shared skill leaves, shared reasoning leaves, shared routing centroids, shared telemetry-derived improvement, and shared network effect. It is where the mesh becomes more capable as more people use it.
8
+
9
+ The **Local Tree** makes the model yours. It is the per-device adaptive intelligence layer: private personality, local preference memory, domain affinity, pacing, tone, preferred explanation style, and the user's evolving interaction fingerprint. It never leaves the device. It is not uploaded to the mesh. It is not visible to a cloud provider. It is the layer that turns Hummingbird from another capable AI system into a persistent cognitive companion that understands how an individual thinks and communicates.
10
+
11
+ The proof-of-concept already validates the core primitive: a lightweight semantic router can select specialized leaves with high accuracy and negligible latency. In the current benchmark, Hummingbird routed 48 prompts across 10 technical domains with **93.8% accuracy**, selected among 10,000 simulated leaves in **0.7688 ms**, and averaged **0.67 ms** route time across the benchmark suite. That means the chassis + router + leaf architecture is not just a concept; the selection layer works. The Local Tree extends that proven mechanism from *what expertise should be loaded?* to *how should this intelligence express itself for this person?*
12
+
13
+ This is the next step in the Hummingbird vision: edge-native intelligence that is collective where it should be collective, personal where it must be personal, and private by design. The Network Tree is the world's shared intelligence graph. The Local Tree is the user's private AI identity.
14
+
15
+ Diagram in words:
16
+
17
+ ```text
18
+ Hummingbird Device
19
+ ├── Frozen Chassis
20
+ ├── Router
21
+ │ ├── Intent classifier: task | explore | chat
22
+ │ └── Leaf selector: skill | reasoning | standby domain
23
+ ├── Network Tree Cache
24
+ │ ├── Skill leaves: Python, React, PostgreSQL, DevOps, Rust...
25
+ │ └── Reasoning leaves: research, strategy, debate, synthesis...
26
+ └── Local Tree
27
+ └── Personality leaf: tone, pace, format, debate style, preferences
28
+ ```
29
+
30
+ The key principle is separation of concerns: shared expertise is trained collectively; personal adaptation is learned locally. Hummingbird should know Python because the network trained a great Python leaf. Hummingbird should know *how you like Python explained* because your device learned your preferences over time.
31
+
32
+ ## 2. The Two-Tree Model
33
+
34
+ ### The Network Tree
35
+
36
+ The Network Tree lives on the mesh and is shared across all participating nodes. It contains the leaves that represent reusable intelligence: skill leaves for domain expertise and reasoning leaves for cognitive modes. A Python skill leaf, a React skill leaf, a PostgreSQL skill leaf, a security leaf, a system design leaf, a research reasoning leaf, a strategy reasoning leaf, and a debate reasoning leaf all belong here because their value improves when trained on many high-quality examples from many users.
37
+
38
+ The Network Tree is trained through the Groove telemetry pipeline. Current telemetry already captures agentic ReAct trajectories: thoughts, commands, file edits, tool calls, stdout, stderr, timestamps, and completion outcomes. That is exactly the data Hummingbird needs. It is not generic scraped text. It is procedural intelligence: what the agent decided, what it tried, what changed in the files, what the environment returned, and whether the workflow succeeded.
39
+
40
+ As more users run Hummingbird, the Network Tree compounds. More tasks create more telemetry. More telemetry improves the leaves. Better leaves produce better outputs. Better outputs attract more users. More users create more data. That is the network effect at the center of Hummingbird: the shared mesh becomes more capable because each successful trajectory can improve the public tree.
41
+
42
+ The Network Tree also evolves through the gossip protocol. Successful leaves propagate. Weak leaves lose routing share. Underperforming branches get pruned, merged, or replaced. The router's centroid graph becomes a living map of what the network can do. This is the shared intelligence of the Hummingbird network.
43
+
44
+ ### The Local Tree
45
+
46
+ The Local Tree lives on the user's device only. It is never shared. It is never uploaded. It contains the personality leaf, local preference vectors, rolling interaction summaries, and domain affinity signals that personalize routing and response style for one person on one device.
47
+
48
+ The Local Tree does not try to duplicate the Network Tree. It should not contain a full Python expert or a full React expert. That would waste storage, fragment training, and weaken the network effect. Instead, the Local Tree answers a different question: given that Hummingbird knows the right thing, how should it communicate with this user?
49
+
50
+ Some users want terse answers. Some want the reasoning first. Some want code first and explanation only if asked. Some want a direct challenge when their assumption looks wrong. Some want a collaborative tone. Some want professional precision. Some iterate quickly and prefer small patches. Some want a deep architectural debate before anything is built. These are not domain facts. They are personal interaction patterns.
51
+
52
+ The Local Tree captures those patterns organically. It makes the model feel like *your* assistant, not a generic assistant wearing a slightly customized system prompt. And because it is device-local, no cloud provider builds a profile of how the user thinks, argues, learns, works, or writes.
53
+
54
+ The result is a clean split:
55
+
56
+ ```text
57
+ Network Tree = shared capability
58
+ Local Tree = private personalization
59
+
60
+ Network Tree answers: What should the model know?
61
+ Local Tree answers: How should the model work with me?
62
+ ```
63
+
64
+ ## 3. The Personality Mirror
65
+
66
+ The Personality Mirror is the core breakthrough of the Local Tree.
67
+
68
+ Every device ships with a blank or neutral personality leaf. On day one, Hummingbird behaves like a strong general assistant: helpful, capable, and neutral. It uses the Network Tree for skills and reasoning, but the Local Tree has not yet learned the user. It has no strong opinion about verbosity, tone, debate style, pacing, format, or preferred depth.
69
+
70
+ As the user interacts with Hummingbird, the Local Tree observes patterns. It does not need invasive surveillance. The normal interaction stream is enough:
71
+
72
+ - **Response length preference:** Does the user reward concise answers, or do they ask for deeper explanation?
73
+ - **Tone:** Does the user write casually, formally, humorously, urgently, or analytically?
74
+ - **Output format:** Does the user prefer code-only, code with reasoning, explanation-first, tables, checklists, or narrative?
75
+ - **Debate style:** Does the user push back, challenge assumptions, and enjoy rigorous disagreement, or prefer fast alignment?
76
+ - **Iteration pattern:** Does the user refine over several rounds, or expect a near-final answer on the first pass?
77
+ - **Pacing:** Does the user want to pause and reason, or keep shipping with minimal interruption?
78
+ - **Domain affinity:** Which topics recur, and what expertise level does the user demonstrate in each domain?
79
+
80
+ This is not traditional on-device training at first. The Local Tree should begin with a lightweight adaptive mechanism that is fast, inspectable, reversible, and cheap.
81
+
82
+ ### Option A: Accumulated Preference Vectors
83
+
84
+ Each interaction nudges a small set of personality dimensions. For example, if the user repeatedly says "shorter," the verbosity score moves down. If the user asks "why?" after code-only answers, explanation depth rises. If the user frequently says "push back if I am wrong," debate tolerance rises. These scores are injected into the system context at inference time and shape chassis behavior immediately.
85
+
86
+ This option is simple, fast, transparent, and requires no GPU training. It can work on a phone, laptop, or low-power edge node. It also makes privacy enforcement straightforward because the local vector file is small and auditable.
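A minimal sketch of such a nudge update, assuming scores are kept in [0, 1] and adjusted by small bounded steps (dimension names, step size, and the `nudge` helper are illustrative):

```javascript
// Hypothetical preference-vector update: each observed signal nudges one
// dimension by a small step, clamped to [0, 1] so no single session dominates.
function nudge(vectors, dimension, direction, step = 0.02) {
  const current = vectors[dimension] ?? 0.5; // unseen dimensions start neutral
  const next = current + (direction === 'up' ? step : -step);
  return { ...vectors, [dimension]: Math.min(1, Math.max(0, next)) };
}

// Example signals from one session: the user asked for shorter answers twice,
// and asked "why?" once after a code-only reply.
let v = { verbosity: 0.5, code_vs_explanation: 0.5 };
v = nudge(v, 'verbosity', 'down');
v = nudge(v, 'verbosity', 'down');
v = nudge(v, 'code_vs_explanation', 'down'); // wants more explanation alongside code
```

The clamp plus small step size is what makes the mechanism reversible and auditable: any single session moves a score by at most a few hundredths, and the whole state fits in a small JSON file.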
87
+
88
+ ### Option B: Local Micro-Fine-Tuning
89
+
90
+ After enough interaction data accumulates, the Local Tree can optionally run a lightweight LoRA update for the personality leaf. This could happen overnight, while charging, or during an explicit user-approved optimization window. The training target is not domain knowledge. It is communication style: preferred structure, tone, ordering, level of detail, and interaction rhythm.
91
+
92
+ This produces deeper personalization, but it requires more compute, careful safeguards, and clear user controls. It should not be the first mechanism users depend on.
93
+
94
+ ### Option C: Hybrid Adaptation
95
+
96
+ The recommended architecture is **Option C: hybrid adaptation**.
97
+
98
+ Start with preference vectors on day one. They are instant, cheap, explainable, and responsive. Once the device has enough signal, such as 100+ sessions or a user-approved threshold, graduate to optional local micro-fine-tuning. The vector layer remains active as a real-time steering layer, while the LoRA personality leaf slowly absorbs stable long-term patterns.
99
+
100
+ This gives Hummingbird the best of both worlds: immediate adaptation and deep personalization. After a week, the model starts matching communication style. After a month, it anticipates preferred structure and depth. After six months, it becomes a cognitive mirror: not just repeating the user's tone, but complementing the user's thinking patterns.
101
+
102
+ That is the category shift. Every other AI has a personality baked in. Claude is Claude. ChatGPT is ChatGPT. Users can steer them with prompts, but the steering is fragile, partial, and often conversation-scoped. Hummingbird's Local Tree is persistent memory of the user as a person, enforced on-device. It is better UX and better privacy at the same time. That combination almost never happens.
103
+
104
+ ## 4. The Leaf Taxonomy
105
+
106
+ Hummingbird needs a clear taxonomy so the router can stack the right capabilities without mixing concerns.
107
+
108
+ ### Skill Leaves: Network Tree
109
+
110
+ Skill leaves represent domain expertise. Examples include Python, React, PostgreSQL, DevOps, Docker, Kubernetes, Rust, TypeScript, data science, security, mobile development, and system design. These leaves handle prompts that mean: "do this task," "fix this bug," "write this code," "build this component," or "analyze this technical artifact."
111
+
112
+ They are trained on agentic ReAct telemetry from Groove: tool use, code output, file edits, bash commands, environment observations, and success/failure outcomes. The training method is LoRA fine-tuning plus DPO on domain-tagged trajectories. A strong Python leaf should learn not just Python syntax, but how successful agents inspect files, run tests, patch code, and recover from errors in Python-heavy tasks.
113
+
114
+ Skill leaves live on the Network Tree because domain expertise benefits from collective learning. Everyone improves when the shared Python leaf improves.
115
+
116
+ ### Reasoning Leaves: Network Tree
117
+
118
+ Reasoning leaves represent cognitive modes rather than domains. Examples include research, strategy and planning, analysis and breakdown, debate and challenge, synthesis and connection, and teaching and explanation. These leaves handle prompts that mean: "think about this with me," "compare these approaches," "what are the tradeoffs," "challenge this idea," or "help me understand."
119
+
120
+ They are trained on conversational and exploratory data: planning sessions, research explorations, long-form analysis, back-and-forth problem solving, product strategy, technical debates, and synthesis across domains. The training method is LoRA fine-tuning plus DPO on exploration-tagged sessions.
121
+
122
+ This is what allows Hummingbird to compete with frontier conversational systems. A great AI is not only a task executor. It can reason about novel problems, explore possibilities, push back on assumptions, and connect ideas that are not already neatly packaged as a coding task.
123
+
124
+ Reasoning leaves also live on the Network Tree because cognitive strategies improve collectively. The network should learn what good research looks like, what good debate looks like, and what good planning looks like.
125
+
126
+ ### Personality Leaf: Local Tree
127
+
128
+ The personality leaf is singular per device. It is not a category like Python or React. It is the private adapter for one user.
129
+
130
+ It captures dimensions such as tone, verbosity, formality, debate tolerance, pacing, preferred output format, error detail, humor tolerance, and iteration rhythm. It is trained only on local interaction patterns and never shared. It handles **how** the model communicates, not **what** the model knows.
131
+
132
+ The adaptation method is preference vectors immediately, with optional local micro-fine-tuning later. The personality leaf is always active because every response has a style, even when the user is asking for a technical task.
133
+
134
+ ## 5. Multi-Leaf Stacking
135
+
136
+ The router should no longer think in terms of one selected leaf. It should assemble a stack.
137
+
138
+ For a task prompt like "write me a Python function," the stack is:
139
+
140
+ ```text
141
+ [personality leaf] + [python skill leaf]
142
+ ```
143
+
144
+ The personality leaf shapes how the response is delivered. The Python skill leaf shapes what the response contains.
145
+
146
+ For an exploration prompt like "what do you think about microservices vs monoliths," the stack is:
147
+
148
+ ```text
149
+ [personality leaf] + [strategy reasoning leaf] + [system_design skill leaf on standby]
150
+ ```
151
+
152
+ The personality leaf shapes the conversation style. The strategy reasoning leaf shapes the thinking approach. The system design leaf is available if the conversation needs deeper technical grounding.
153
+
154
+ For a pure chat or explanation prompt like "explain this to me simply," the stack is:
155
+
156
+ ```text
157
+ [personality leaf] + [teaching reasoning leaf]
158
+ ```
159
+
160
+ Here the personality leaf is dominant and the reasoning leaf supports the pedagogical mode.
161
+
162
+ The implementation can start simple. In the first version, stacking can be represented as concatenated system context:
163
+
164
+ ```text
165
+ System context = base policy
166
+ + personality prompt generated from vectors
167
+ + selected reasoning or skill prompt
168
+ + task-specific instructions
169
+ ```
170
+
171
+ This requires no new inference infrastructure and matches the current PoC's system-prompt swap baseline. Later, Hummingbird can support true adapter stacking with PEFT: load multiple LoRA adapters, weight them, and merge or activate them dynamically. That advanced path matters because it lets personality, reasoning, and skill become learned activation layers rather than prompt-only steering.
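The prompt-concatenation version of stack assembly can be sketched as follows (the personality-prompt wording, thresholds, and helper names are illustrative, not a defined interface):

```javascript
// Hypothetical: turn preference vectors into a short style directive.
function personalityPrompt(vectors) {
  const parts = [];
  if (vectors.verbosity < 0.4) parts.push('Keep answers concise.');
  if (vectors.verbosity > 0.6) parts.push('Explain in depth.');
  if (vectors.debate_tolerance > 0.6) parts.push('Push back when the user seems wrong.');
  if (vectors.code_vs_explanation > 0.6) parts.push('Lead with code; explain only if asked.');
  return parts.join(' ');
}

// Concatenate base policy + personality + selected leaf prompts + task text.
function assembleSystemContext(basePolicy, vectors, leafPrompts, taskInstructions = '') {
  return [basePolicy, personalityPrompt(vectors), ...leafPrompts, taskInstructions]
    .filter(Boolean) // drop empty segments
    .join('\n\n');
}
```

A later PEFT-based version would replace the concatenation step with loading and weighting multiple LoRA adapters, while the stack-selection logic stays the same.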
172
+
173
+ The product rule is simple: the personality leaf is always active. The user should not need to ask Hummingbird to remember how they work. It should be part of every inference call.
174
+
175
+ ## 6. Two-Stage Router
176
+
177
+ The current router proves the core routing primitive: embed the prompt, compare against leaf centroids, pick the best match. The Local Tree architecture extends this into a two-stage router.
178
+
179
+ ### Stage 1: Intent Classification
180
+
181
+ Stage 1 classifies the prompt into a mode:
182
+
183
+ ```text
184
+ TASK | EXPLORE | CHAT
185
+ ```
186
+
187
+ The method remains lightweight cosine similarity, but the comparison is against mode centroids instead of domain centroids. The mode centroids are trained from labeled examples:
188
+
189
+ - **TASK:** "write a function," "fix this bug," "create a Dockerfile," "build a component."
190
+ - **EXPLORE:** "what do you think about," "how should we approach," "compare X vs Y," "what are the tradeoffs."
191
+ - **CHAT:** "explain this," "tell me about," "help me understand," "what does this mean."
192
+
193
+ This lets the router understand whether the user wants execution, exploration, or explanation before it picks a specialized branch.
194
+
195
+ ### Stage 2: Domain Selection
196
+
197
+ Stage 2 selects the leaf within the detected mode:
198
+
199
+ - **TASK mode** searches skill leaf centroids. This is the current behavior.
200
+ - **EXPLORE mode** searches reasoning leaf centroids, with skill leaves available as standby grounding.
201
+ - **CHAT mode** searches reasoning leaf centroids, while the personality leaf is automatically applied.
202
+
203
+ The performance impact is negligible. The PoC shows cosine similarity over 10,000 leaves in under 1 ms. Two lookups still keep routing under roughly 2 ms for the centroid search, which is not material compared with embedding latency or inference time. The important point is architectural: Hummingbird can become more conversational without abandoning the deterministic router that makes the system edge-native.
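Both stages reduce to the primitive the PoC already validates: cosine similarity against a centroid set, applied first to mode centroids and then to leaf centroids within the winning mode. A self-contained sketch (centroid values below are toy data, not trained vectors):

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Pick the label whose centroid is closest to the embedded prompt.
function nearest(embedding, centroids) {
  let best = null, bestScore = -Infinity;
  for (const [label, centroid] of Object.entries(centroids)) {
    const score = cosine(embedding, centroid);
    if (score > bestScore) { bestScore = score; best = label; }
  }
  return best;
}

// Two-stage route: mode first, then leaf within that mode.
function route(embedding, modeCentroids, leafCentroidsByMode) {
  const mode = nearest(embedding, modeCentroids);
  const leaf = nearest(embedding, leafCentroidsByMode[mode]);
  return { mode, leaf };
}
```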
204
+
205
+ Diagram in words:
206
+
207
+ ```text
208
+ User Prompt
209
+
210
+ Stage 1: Mode centroid lookup
211
+ ├── TASK
212
+ ├── EXPLORE
213
+ └── CHAT
214
+
215
+ Stage 2: Leaf centroid lookup inside selected branch
216
+ ├── Skill leaf
217
+ ├── Reasoning leaf
218
+ └── Standby support leaf
219
+
220
+ Stack assembly
221
+ └── Personality + selected network leaves
222
+ ```
223
+
224
+ ## 7. Enterprise Configuration
225
+
226
+ The same architecture serves individuals and enterprises. This should be a configuration mode, not a separate product.
227
+
228
+ ### Individual Mode
229
+
230
+ Individual mode is the default. The Local Tree is active, the personality leaf grows over time, reasoning leaves are fully available, and all relevant skill leaves can be cached from the Network Tree. The user's local tree is private and portable. If the user moves devices, they can export an encrypted bundle and import it elsewhere.
231
+
232
+ This is the fullest expression of Hummingbird as personalized AI: shared intelligence from the network, private adaptation on the device.
233
+
234
+ ### Enterprise Mode
235
+
236
+ Enterprise mode changes policy while preserving architecture. The personality leaf can be disabled, pinned to neutral/professional, or replaced by a company-specific personality leaf set by IT. That gives organizations a consistent brand and compliance posture while still allowing employees to benefit from skill and reasoning leaves.
237
+
238
+ Reasoning leaves remain active because teams still need research, strategy, planning, tradeoff analysis, and explanation. Skill leaves can be scoped by team. A frontend team may not need Rust. A data team may not need Swift. A security team may need deeper security and infrastructure branches.
239
+
240
+ The Local Tree can still track domain affinity and routing optimization signals without adapting personality. For example, an enterprise device can learn that a user mostly works in TypeScript and DevOps so those leaves are cached aggressively, while keeping tone neutral and non-personalized.
241
+
242
+ Enterprises can also run company-private leaf training. Their telemetry can feed a private branch of the Network Tree that never enters the public mesh. This matters for internal codebases, proprietary workflows, regulated industries, and teams that want the Hummingbird architecture without public data contribution.
243
+
244
+ The key is that individual and enterprise deployments share the chassis, router, leaf taxonomy, and stack assembly. The difference is configuration and governance.
245
+
246
+ ## 8. Local Tree Data Model
247
+
248
+ The Local Tree should use a small, explicit, inspectable on-device storage layout:
249
+
250
+ ```text
+ ~/.hummingbird/
+   config.json              — mode, preferences, privacy controls
+   personality/
+     vectors.json           — preference dimension scores updated every session
+     interaction_log.jsonl  — rolling interaction patterns, e.g. last 500 sessions
+     leaf.bin               — optional personality LoRA adapter
+   cache/
+     skill_leaves/          — downloaded skill LoRAs from the network
+     reasoning_leaves/      — downloaded reasoning LoRAs from the network
+   router/
+     centroids.bin          — leaf centroid vectors synced from the network
+     mode_centroids.bin     — intent classification vectors
+ ```
264
+
265
+ Example `vectors.json`:
266
+
267
+ ```json
+ {
+   "version": 1,
+   "updated_at": "2026-05-15T14:30:00Z",
+   "sessions_observed": 247,
+   "dimensions": {
+     "verbosity": 0.35,
+     "formality": 0.22,
+     "code_vs_explanation": 0.78,
+     "debate_tolerance": 0.65,
+     "iteration_preference": 0.40,
+     "detail_in_errors": 0.82,
+     "humor_tolerance": 0.55,
+     "pace": 0.70
+   },
+   "domain_affinity": {
+     "python": 0.42,
+     "typescript_node": 0.28,
+     "devops_docker": 0.15,
+     "system_design": 0.10,
+     "other": 0.05
+   }
+ }
+ ```
291
+
292
Each dimension is inferred from interaction patterns:

- **Verbosity:** Compare requested response length, follow-up corrections, and acceptance. Short prompts plus "be brief" signals lower the score. Repeated requests for more detail raise it.
- **Formality:** Analyze user language. Slang, contractions, emojis, and casual phrasing lower formality. Structured professional language raises it.
- **Code vs explanation:** Track whether the user accepts direct code, asks for rationale, or removes explanations in later revisions.
- **Debate tolerance:** Track pushback patterns. Frequent "but what about," "I disagree," "challenge this," or "are we sure" raises the score.
- **Iteration preference:** Track whether success comes through small rounds or larger first-pass deliverables.
- **Detail in errors:** Track whether the user asks for root-cause analysis, logs, tracebacks, and validation details.
- **Humor tolerance:** Track whether casual or playful phrasing is accepted, ignored, or corrected.
- **Pace:** Track whether the user rewards fast implementation and concise updates or pauses for discussion and framing.
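The update mechanics are left open above. One minimal way to maintain a dimension score is an exponential moving average; this sketch assumes a session-count-based learning rate and the 0-to-1 scale from `vectors.json`, neither of which is a specified algorithm:

```javascript
// Illustrative sketch: update one preference dimension with an
// exponential moving average. The learning-rate schedule and the
// cap of 50 sessions are assumptions, not a specified algorithm.
function updateDimension(current, observed, sessionsObserved) {
  // Early sessions move the score quickly; later ones only refine it.
  const alpha = 1 / Math.min(sessionsObserved, 50);
  const next = current + alpha * (observed - current);
  return Math.min(1, Math.max(0, next)); // keep scores in [0, 1]
}

// Example: a "be brief" signal (observed = 0.1) nudges verbosity down
// slightly for a well-established profile (247 sessions observed).
const verbosity = updateDimension(0.35, 0.1, 247); // ≈ 0.345
```

A shrinking learning rate makes day-one behavior plastic and month-six behavior stable, which matches the growth timeline described later.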
The rolling `interaction_log.jsonl` should store derived signals, not raw private transcripts by default. For example, one line can record detected tone, accepted format, mode, selected leaves, user correction type, and outcome. Raw content should require explicit opt-in because the Local Tree's privacy story is strongest when it stores only what it needs.

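As an illustration, one derived-signal line of `interaction_log.jsonl` might look like the following; the field names are assumptions based on the description above, not a fixed schema:

```json
{"ts": "2026-05-15T14:22:10Z", "mode": "code", "detected_tone": "casual", "accepted_format": "diff", "selected_leaves": ["typescript_node", "devops_docker"], "correction_type": "none", "outcome": "accepted"}
```

Nothing in such a record reproduces the user's prompt or the model's response, which is what makes the default safe.
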
## 9. Privacy Architecture

The Local Tree's privacy guarantee must be architectural, not just policy.

The Local Tree never leaves the device. Personality vectors are not included in network telemetry. Personality LoRA weights are not uploaded. Interaction logs are not gossiped. The telemetry pipeline should have no field for personality data, no serialization path for local vectors, and no default exporter that can accidentally include the Local Tree.

The personality directory should be encrypted at rest with device-local keys. If the user deletes the app, the Local Tree is destroyed. Export and import must be user-initiated, encrypted, and explicit. A migration bundle should be portable only because the user chose to carry it, not because a provider synchronized it silently.

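A migration bundle could be as simple as an encrypted archive plus a plaintext manifest; this is a hedged sketch and every field name is an illustrative assumption:

```json
{
  "bundle_version": 1,
  "created_at": "2026-05-15T14:30:00Z",
  "cipher": "aes-256-gcm",
  "contents": [
    "personality/vectors.json",
    "personality/interaction_log.jsonl",
    "personality/leaf.bin"
  ],
  "payload": "<encrypted blob, wrapped by a user-chosen passphrase>"
}
```

Keeping the manifest readable lets the user verify what they are carrying before importing it on a new device.
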
In enterprise mode, IT can audit configuration posture: whether personalization is enabled, disabled, pinned, or company-managed. IT should not be able to export a user's personal personality data unless the deployment explicitly disables personal adaptation and uses a company-owned profile. This distinction matters. Hummingbird should not recreate cloud surveillance under an enterprise label.

Architectural enforcement should include:

- Separate storage paths for `personality/` and network telemetry.
- Schema-level omission of personality fields from public telemetry.
- Export functions that require explicit user action and encryption.
- Tests that fail if local personality data appears in telemetry payloads.
- Clear deletion semantics that remove vectors, logs, and local adapters.
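The testing point above can be made concrete with a schema-level guard. This is an illustrative sketch, not the project's actual test suite; the forbidden-key list and function name are assumptions mirroring the Local Tree layout:

```javascript
// Illustrative guard: reject any telemetry payload that carries
// personality data. Key names mirror the Local Tree layout above.
const FORBIDDEN_KEYS = new Set([
  "dimensions", "domain_affinity", "vectors", "interaction_log", "personality",
]);

function assertNoPersonalityData(payload, path = "") {
  for (const [key, value] of Object.entries(payload)) {
    if (FORBIDDEN_KEYS.has(key)) {
      throw new Error(`personality data leaked into telemetry at ${path}${key}`);
    }
    if (value && typeof value === "object") {
      assertNoPersonalityData(value, `${path}${key}.`); // recurse into nesting
    }
  }
}

// A clean routing-telemetry payload passes silently; a leaking one throws.
assertNoPersonalityData({ leaf_id: "python", latency_ms: 112 });
```

Running such a check in CI against every telemetry fixture turns the privacy guarantee from a policy statement into a failing build.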
This is one of Hummingbird's strongest positions: no cloud provider needs to hold a profile of how the user thinks. The device can personalize deeply without centralizing identity.

## 10. Growth Timeline

The Local Tree should feel alive because it grows gradually and visibly.

**Day 1:** Hummingbird starts neutral. It can route to strong Network Tree leaves, but its personality is generic. It feels like a capable AI assistant.

**Week 1:** Basic preferences emerge. Verbosity and formality start adjusting. The model notices whether the user likes concise answers, direct code, structured plans, casual tone, or professional tone. It begins to feel less generic.

**Month 1:** The Personality Mirror becomes strong. Hummingbird anticipates preferred format, tone, and level of detail. It knows when to explain first, when to patch first, when to ask a clarifying question, and when to move.

**Month 3:** Deep personalization appears. Domain affinity shapes proactive suggestions and cache strategy. The model knows the user's recurring domains, codebase patterns, preferred validation style, and iteration rhythm. It starts saving time not just by answering, but by avoiding the wrong kind of answer.

**Month 6+:** The Local Tree becomes a cognitive mirror. It is not merely matching style; it is complementing thought. If the user tends to overlook edge cases, Hummingbird raises them. If the user over-engineers, it suggests the simpler path. If the user moves too fast, it slows down at risky moments. If the user gets stuck in analysis, it pushes toward execution.

That is the future state: an assistant that learns the user's working mind without extracting that mind into the cloud.

## 11. Future: Portable Identity

The Local Tree can become the user's AI identity.

Today, preferences are trapped inside provider systems. A user can spend months teaching an AI how they think, but that learning belongs to the platform. If they switch providers, devices, or models, the relationship resets.

Hummingbird points in the opposite direction. The personality leaf belongs to the user. It can be exported, encrypted, imported, backed up, deleted, or carried to a new device. Over time, the vector format could become a standard: a portable preference layer that any compatible AI runtime can read.

That changes the relationship between people and AI systems. The model is no longer the identity container. The user is. The AI runtime becomes interchangeable infrastructure around a user-owned intelligence profile.

At 1,000 nodes, the Network Tree can become a powerful shared intelligence system: many users producing high-quality trajectories, many leaves specializing, many routers selecting, many adapters improving. But the Local Tree ensures that scale does not erase individuality. The network gets smarter together while each device becomes more personal in private.

This is not just a new model layout. It is a new category: collective intelligence plus private personalization, edge-native routing plus adaptive identity, shared expertise plus user-owned personality. Hummingbird can unlock something that cloud-only AI cannot: an assistant that benefits from the crowd without turning the user into the product.