@nxuss/lemma 0.3.3 → 0.4.0

Files changed (49)
  1. package/README.md +42 -330
  2. package/dashboard/README.md +351 -0
  3. package/dashboard/dist/assets/index-CIx8ECj8.css +1 -0
  4. package/dashboard/dist/assets/index-zTlIPJOp.js +478 -0
  5. package/dashboard/dist/assets/index-zTlIPJOp.js.map +1 -0
  6. package/dashboard/dist/index.html +14 -0
  7. package/dist/cjs/cloud/TenantCache.d.ts +1 -0
  8. package/dist/cjs/cloud/TenantCache.d.ts.map +1 -1
  9. package/dist/cjs/cloud/TenantCache.js +25 -3
  10. package/dist/cjs/cloud/TenantCache.js.map +1 -1
  11. package/dist/cjs/config/index.d.ts.map +1 -1
  12. package/dist/cjs/config/index.js +4 -0
  13. package/dist/cjs/config/index.js.map +1 -1
  14. package/dist/cjs/index.d.ts +3 -1
  15. package/dist/cjs/index.d.ts.map +1 -1
  16. package/dist/cjs/index.js +29 -0
  17. package/dist/cjs/index.js.map +1 -1
  18. package/dist/cjs/observability/IdeContextSync.d.ts +39 -0
  19. package/dist/cjs/observability/IdeContextSync.d.ts.map +1 -0
  20. package/dist/cjs/observability/IdeContextSync.js +169 -0
  21. package/dist/cjs/observability/IdeContextSync.js.map +1 -0
  22. package/dist/cjs/types/index.d.ts +11 -0
  23. package/dist/cjs/types/index.d.ts.map +1 -1
  24. package/dist/cjs/types/index.js.map +1 -1
  25. package/dist/esm/cloud/TenantCache.d.ts +1 -0
  26. package/dist/esm/cloud/TenantCache.d.ts.map +1 -1
  27. package/dist/esm/cloud/TenantCache.js +25 -3
  28. package/dist/esm/cloud/TenantCache.js.map +1 -1
  29. package/dist/esm/config/index.d.ts.map +1 -1
  30. package/dist/esm/config/index.js +4 -0
  31. package/dist/esm/config/index.js.map +1 -1
  32. package/dist/esm/index.d.ts +3 -1
  33. package/dist/esm/index.d.ts.map +1 -1
  34. package/dist/esm/index.js +29 -0
  35. package/dist/esm/index.js.map +1 -1
  36. package/dist/esm/observability/IdeContextSync.d.ts +39 -0
  37. package/dist/esm/observability/IdeContextSync.d.ts.map +1 -0
  38. package/dist/esm/observability/IdeContextSync.js +162 -0
  39. package/dist/esm/observability/IdeContextSync.js.map +1 -0
  40. package/dist/esm/types/index.d.ts +11 -0
  41. package/dist/esm/types/index.d.ts.map +1 -1
  42. package/dist/esm/types/index.js.map +1 -1
  43. package/lemma-proxy.cjs +66 -12
  44. package/package.json +15 -13
  45. package/src/cloud/CloudSyncClient.js +35 -0
  46. package/src/protocol/README.md +576 -0
  47. package/src/proxy/ComplexityRouter.js +37 -0
  48. package/src/security/SemanticScrubber.js +54 -0
  49. package/src/speculative/worker.js +96 -0
package/README.md CHANGED
@@ -1,366 +1,78 @@
1
- # Lemma v0.3.2
2
- > **The Universal AI Cache Proxy + Agent Orchestration Layer.**
1
+ # Lemma v0.4.0
2
+ > **The Intelligent AI Gateway: Privacy, Performance, and Precision for the Agentic Era.**
3
3
 
4
- Lemma is the open-core gateway for AI development. It sits between your tools (Cursor, VS Code, CLI Agents) and your models (OpenAI, Claude, Gemini, Ollama), providing a **shared semantic memory** that saves you 40-70% in API costs and makes your AI tools instant.
5
-
6
- ⚡ **Why Lemma?**
7
- - 💰 **Stop paying twice**: Lemma caches redundant queries semantically. "Fix this bug" and "Solve this error" return the same cached answer.
8
- - ⚡ **Instant responses**: 3ms cache hits vs 2000ms LLM calls.
9
- - 🤖 **Universal Gateway**: One endpoint for OpenAI, Anthropic, and Gemini.
10
- - 🐝 **Agent Swarms**: Orchestrate multiple agents with shared memory.
11
-
12
- ⚡ **Quick Start (IDE Proxy)**
13
-
14
- Install and launch the proxy to start saving on your API bills immediately.
15
-
16
- ```bash
17
- npm install -g @nxuss/lemma
18
- lemma start
19
-
20
- # Or to launch the full development stack (Proxy + Dashboard + Hub + Chroma):
21
- lemma start --stack
22
- ```
23
-
24
- **Configure your IDE:**
25
-
26
- - **Base URL:** `http://localhost:8080/v1`
27
- - **Gemini Base:** `http://localhost:8080/v1beta`
28
-
29
- 🆓 **Free Tier:** 300 queries/month + Exact Matching.
30
- 💎 **Pro:** Unlimited queries + Semantic Caching ($12/mo or $120/yr).
31
- ☁️ **Cloud:** Managed infrastructure (Coming Soon).
32
-
33
- 👉 [Get Lemma Pro](https://lemma.nxus.studio/upgrade)
34
-
35
- ### Option 2: Multi-Agent System
36
- For building coordinated AI agent systems:
37
-
38
- ```bash
39
- npm install @nxuss/lemma
40
- ```
41
- 👉 [Multi-Agent Guide](https://lemma.nxus.studio/docs/multi-agent)
4
+ Lemma is a high-performance orchestration layer that sits between your development environment and LLM providers. It transforms the way you build with AI by providing **Shared Semantic Memory**, **Autonomous Cost Optimization**, and **Runtime Context Synchronization**.
42
5
 
43
6
  ---
44
7
 
45
- ## The problem with AI development costs
46
- When you use AI assistants for development, you pay for every prompt — even when asking similar questions:
47
-
48
- 1. *"How to implement JWT in Express?"*
49
- 2. *"Explain JWT authentication in Node.js"* ← **Same answer, paid twice**
50
- 3. *"Show me JWT example for Express"* ← **Same answer, paid three times**
51
-
52
- Lemma Proxy intercepts these calls and returns cached responses for similar prompts in **3ms instead of 600ms**, saving you money and time.
8
+ ## Killer Features
53
9
 
54
- ## The problem with multi-agent systems
55
- When you run multiple AI agents in parallel, they don't share context. Agent A solves a problem. Agent B gets the same problem 10 minutes later and solves it again. You pay twice, wait twice, and get the same answer.
10
+ ### 🛡️ Privacy Firewall (Semantic Scrubber)
11
+ **Zero-Trust Prompts.** Stop leaking sensitive data. Lemma automatically detects API keys, PII, and credentials in your prompts, masking them with secure tokens before they reach the cloud. Responses are seamlessly reconstructed locally, ensuring your secrets never leave your machine.
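The mask-then-restore flow described above can be pictured with a minimal sketch. It is not the package's `SemanticScrubber`: the function names `scrub` and `restore`, the `<SECRET_n>` token format, and the two regular expressions are all assumptions chosen for illustration, and a real scrubber would need far broader detection.

```typescript
// Hypothetical sketch: none of these names come from the Lemma package.
type ScrubResult = { masked: string; vault: Map<string, string> };

// Illustrative patterns only (assumed shapes, deliberately incomplete).
const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9]{20,}/g,          // API keys shaped like "sk-..." (assumption)
  /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, // email addresses
];

// Replace every match with a placeholder token and remember the original locally.
function scrub(prompt: string): ScrubResult {
  const vault = new Map<string, string>();
  let counter = 0;
  let masked = prompt;
  for (const pattern of SECRET_PATTERNS) {
    masked = masked.replace(pattern, (match) => {
      const token = `<SECRET_${counter++}>`;
      vault.set(token, match);
      return token;
    });
  }
  return { masked, vault };
}

// Put the original values back into a response that echoes the placeholder tokens.
function restore(response: string, vault: Map<string, string>): string {
  let restored = response;
  vault.forEach((secret, token) => {
    restored = restored.split(token).join(secret);
  });
  return restored;
}
```

Only `masked` would be sent upstream in this sketch; the `vault` map stays in local memory, which is what lets the response be reconstructed without the secret ever leaving the machine.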
56
12
 
57
- At scale this compounds fast:
58
- **10 agents × 500 tasks/day × 70% overlap = 3,500 redundant LLM calls/day**
13
+ ### 🚦 Complexity Router (Cost-Optimizer)
14
+ **Intelligence Where it Matters.** Lemma analyzes the semantic complexity of every request. It autonomously routes lightweight tasks (like JSON formatting or translations) to hyper-efficient models like `gpt-4o-mini`, reserving premium models for high-reasoning challenges. Save up to 90% on simple tasks without losing quality.
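As a rough illustration of the routing idea, the sketch below scores a prompt and picks a model. The scoring signals (prompt length and a short keyword list), the 0.4 threshold, and the names `complexityScore` and `pickModel` are assumptions made for this example; the diff does not show how `ComplexityRouter.js` actually measures complexity. Only `gpt-4o-mini` is taken from the README text.

```typescript
// Hypothetical heuristic: the signals and threshold are assumptions, not Lemma's logic.
const REASONING_HINTS = ["prove", "debug", "architecture", "refactor", "why"];

// Score in [0, 1]: half from prompt length, half from the presence of a reasoning keyword.
function complexityScore(prompt: string): number {
  const lower = prompt.toLowerCase();
  const lengthScore = Math.min(prompt.length / 2000, 1);
  const hintScore = REASONING_HINTS.some((hint) => lower.includes(hint)) ? 1 : 0;
  return 0.5 * lengthScore + 0.5 * hintScore;
}

// Send low-scoring prompts to the small model and everything else to the caller's premium model.
function pickModel(prompt: string, premiumModel: string, threshold = 0.4): string {
  return complexityScore(prompt) >= threshold ? premiumModel : "gpt-4o-mini";
}
```

For instance, `pickModel('Format this JSON: {"a":1}', "premium-model")` falls below the threshold and selects the small model, while a prompt that mentions debugging is routed to the premium one. The name `"premium-model"` is a placeholder.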
59
15
 
60
- Lemma puts a shared semantic brain between your agents. When any agent solves something, every other agent gets that answer for free — even if they phrase the question differently.
16
+ ### 🧠 Telepathic Context Injector (Runtime Sync)
17
+ **Bridge the Gap Between Code and Execution.** Lemma synchronizes your application's live runtime state, exceptions, and memory mutations directly with your IDE’s consciousness. Your AI assistant gains immediate "situational awareness" of crashes and state changes, allowing for instant, context-aware debugging without manual intervention.
61
18
 
62
- ## How it works
63
-
64
- ```
65
- Agent A ──┐
66
- Agent B ──┤──► SubconsciousHub ──► Semantic Cache (ChromaDB + embeddings)
67
- Agent C ──┘ │
68
- └──► Agent Registry + Capability Routing
69
- ```
70
-
71
- 1. Agents connect via WebSocket and register their capabilities
72
- 2. Every task request hits the semantic cache first
73
- 3. On a miss, the hub routes to a capable agent and stores the result
74
- 4. On a hit, the response returns in **~20ms** — no agent invoked, no LLM called
75
-
76
- The cache is **semantic**, not exact. *"fibonacci up to n=10"* and *"compute fibonacci(10)"* resolve to the same cached answer.
19
+ ### Shared Semantic Cache
20
+ **Stop Paying for the Same Thought Twice.** Unlike traditional caches, Lemma understands meaning. It recognizes that *"Fix this CSS bug"* and *"Solve the styling error"* are functionally identical, returning instant (3ms) responses and saving 40-70% on total API expenditure.
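A semantic lookup of this kind usually compares embedding vectors rather than strings. The sketch below assumes cosine similarity over precomputed embeddings and reuses the 0.85 default threshold that the previous README documented (`SEMANTIC_THRESHOLD=0.85`). The `Entry` type and the function names are hypothetical, and producing the embeddings themselves (ChromaDB, an embedding model) is out of scope here.

```typescript
// Hypothetical in-memory stand-in for a semantic cache; not Lemma's implementation.
type Entry = { embedding: number[]; response: string };

// Cosine similarity of two equal-length vectors; returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the best cached response whose similarity reaches the threshold, else undefined (a miss).
function lookup(cache: Entry[], query: number[], threshold = 0.85): string | undefined {
  let best: Entry | undefined;
  let bestScore = threshold;
  for (const entry of cache) {
    const score = cosineSimilarity(entry.embedding, query);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best?.response;
}
```

A linear scan is fine for a sketch; a vector store would replace the loop with an approximate nearest-neighbour query.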
77
21
 
78
22
  ---
79
23
 
80
- ## Quick start
81
-
82
- ### 1. Install and setup dependencies
83
- ```bash
84
- npm install @nxuss/lemma
85
-
86
- # For semantic mode (optional, lightweight embeddings):
87
- npm install @xenova/transformers
88
-
89
- # For persistent storage with ChromaDB (optional):
90
- pip install chromadb
91
- chroma run --path ./chroma_data --port 8000
92
-
93
- # For ChromaDB embeddings (optional):
94
- ollama pull nomic-embed-text
95
- ```
96
-
97
- ### 2. Choose your mode
98
-
99
- #### Option A: Semantic Mode (Recommended) ⚡
100
- Zero external dependencies, true semantic matching:
101
-
102
- ```typescript
103
- import { Lemma } from '@nxuss/lemma/embed';
104
-
105
- const lemma = await Lemma.create({
106
- storage: 'semantic', // Uses transformers.js
107
- threshold: 0.85, // Similarity threshold
108
- });
109
-
110
- const cachedLLM = lemma.wrap(async (query: string) => {
111
- return await yourLLMCall(query);
112
- });
113
-
114
- await cachedLLM('weather in San Francisco'); // Calls LLM
115
- await cachedLLM('SF weather forecast'); // Cache HIT! ⚡
116
- await cachedLLM('San Francisco temperature'); // Cache HIT! ⚡
117
- ```
118
-
119
- #### Option B: Memory Mode (Fastest)
120
- Exact matching, zero dependencies:
121
-
122
- ```typescript
123
- const lemma = await Lemma.create({
124
- storage: 'memory', // Default, exact match only
125
- });
126
- ```
127
-
128
- #### Option C: Server Mode (Multi-Agent)
129
- For multi-agent orchestration:
130
-
131
- ```typescript
132
- import { SubconsciousHub } from '@nxuss/lemma';
133
-
134
- const hub = new SubconsciousHub({
135
- server: { port: 8080 }
136
- });
137
-
138
- await hub.start();
139
- console.log('WebSocket hub listening on ws://localhost:8080');
140
- ```
141
-
142
- ### 3. Connect agents (Server Mode)
143
- ```typescript
144
- import WebSocket from 'ws';
145
-
146
- const ws = new WebSocket('ws://localhost:8080');
147
-
148
- ws.on('open', () => {
149
- // Register with the hub
150
- ws.send(JSON.stringify({
151
- type: 'handshake',
152
- messageId: `msg-${Date.now()}`,
153
- timestamp: Date.now(),
154
- payload: {
155
- agentId: 'my-agent-001',
156
- capabilities: [{ name: 'code-generation', description: 'Writes code', version: '1.0.0' }],
157
- metadata: { version: '1.0.0' }
158
- }
159
- }));
160
- });
24
+ ## 🚀 Quick Start
161
25
 
162
- ws.on('message', (data) => {
163
- const msg = JSON.parse(data.toString());
26
+ Launch the Lemma ecosystem in seconds:
164
27
 
165
- if (msg.type === 'handshake_ack') {
166
- // Connected: send a task
167
- ws.send(JSON.stringify({
168
- type: 'task_request',
169
- messageId: `msg-${Date.now()}`,
170
- timestamp: Date.now(),
171
- payload: {
172
- taskId: `task-${Date.now()}`,
173
- taskType: 'general',
174
- description: 'Implement binary search in Python',
175
- requiredCapabilities: ['code-generation'],
176
- parameters: {}
177
- }
178
- }));
179
- }
180
-
181
- if (msg.type === 'task_response' || msg.type === 'TASK_RESPONSE') {
182
- const { cached, executionTime, result } = msg.payload;
183
- console.log(cached ? `⚡ Cache hit (${executionTime}ms)` : `🔄 Computed (${executionTime}ms)`);
184
- console.log(result);
185
- }
186
-
187
- if (msg.type === 'task_assign') {
188
- // Hub routed a task to us — process and respond
189
- const { taskId, description } = msg.payload;
190
- const result = yourLLM(description); // your actual LLM call
191
- ws.send(JSON.stringify({
192
- type: 'task_response',
193
- messageId: `msg-${Date.now()}`,
194
- timestamp: Date.now(),
195
- payload: { taskId, success: true, result, executionTime: 1200, tokensUsed: 800 }
196
- }));
197
- }
198
- });
28
+ ```bash
29
+ npm install -g @nxuss/lemma
30
+ lemma start --stack
199
31
  ```
200
32
 
201
- ### 4. See it in action
202
- When multiple agents request similar tasks, you'll see the cache working:
203
-
204
- ```
205
- [agent-001] 🔄 COMPUTED - Calculate fibonacci(10)... (1000ms)
206
- [agent-002] ⚡ CACHE HIT - compute the 10th fibonacci... (20ms)
207
- [agent-003] ⚡ CACHE HIT - fibonacci sequence up to n=10... (22ms)
208
- ```
209
- **Result: 100% cache hit rate after first computation. ~20ms responses. Zero duplicate LLM calls.**
33
+ **Point your IDE or SDK to Lemma:**
34
+ * **Base URL:** `http://localhost:8085/v1` (OpenAI / Anthropic Compatible)
35
+ * **Gemini Base:** `http://localhost:8085/v1beta`
36
+ * **Dashboard:** `http://localhost:3000`
210
37
 
211
38
  ---
212
39
 
213
- ## What's inside
40
+ ## 💎 Tier Comparison
214
41
 
215
- ### Embedded Mode: Zero-config semantic cache
216
- The simplest way to add semantic caching to any project:
42
+ | Feature | 🆓 Free (Open-Core) | 💎 Pro ($12/mo) |
43
+ | :--- | :--- | :--- |
44
+ | **Caching** | Exact Match | **Semantic Memory (ChromaDB)** |
45
+ | **Security** | Basic Logging | **Privacy Firewall (Auto-Masking)** |
46
+ | **Optimization** | Manual Routing | **Autonomous Complexity Router** |
47
+ | **IDE Sync** | Raw Log Stream | **Telepathic Context Injector** |
48
+ | **Continuity** | Local Only | **Cloud Sync (Team Memory)** |
49
+ | **Limits** | 300 requests/mo | **Unlimited Agentic Power** |
217
50
 
218
- ```typescript
219
- import { Lemma } from '@nxuss/lemma/embed';
51
+ ---
220
52
 
221
- const lemma = await Lemma.create({
222
- storage: 'semantic', // or 'memory', 'chroma', 'cloud'
223
- threshold: 0.85,
224
- ttl: 3600000, // 1 hour
225
- cleanupInterval: 60000, // Auto-cleanup every minute
226
- enableFallback: true, // Auto-fallback on failures
227
- });
53
+ ## 🛠️ Developer Integration
228
54
 
229
- // Wrap any async function
230
- const cached = lemma.wrap(yourExpensiveFunction);
55
+ ### Use as an Intelligent Proxy
56
+ Simply swap your OpenAI/Anthropic base URL in your favorite tools (Cursor, Copilot, AutoGPT). Lemma handles the caching, scrubbing, and routing transparently.
231
57
 
232
- // Use it
233
- const result = await cached('your input');
234
- console.log(result.fromCache); // true on cache hit
58
+ ```bash
59
+ # Point your configuration to:
60
+ http://localhost:8085/v1
235
61
  ```
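For SDK-less code, the same base-URL swap might look like the sketch below. It assumes the proxy serves the standard OpenAI-style `/chat/completions` path under the base URL and passes the `Authorization` header through; the README only states that the endpoint is "OpenAI / Anthropic Compatible", so both points are assumptions. `buildChatRequest` is a hypothetical helper.

```typescript
// Base URL taken from the README; everything else here is an illustrative assumption.
const LEMMA_BASE_URL = "http://localhost:8085/v1";

// Build the URL and request options for an OpenAI-style chat completion sent via the proxy.
function buildChatRequest(model: string, prompt: string, apiKey: string) {
  return {
    url: `${LEMMA_BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
    },
  };
}
```

With the proxy running, the request could be sent with `fetch(request.url, request.init)` on a runtime that provides a global `fetch`.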
236
62
 
237
- **Features:**
238
- - Semantic matching with lightweight embeddings (transformers.js)
239
- - Automatic TTL cleanup prevents memory leaks
240
- - Circuit breaker with automatic fallbacks (Cloud → Chroma → Memory)
241
- - Health monitoring with detailed metrics
242
- - Graceful shutdown with `lemma.stop()`
243
-
244
- **Storage options:**
245
- - `memory`: Exact match, zero dependencies, fastest
246
- - `semantic`: True semantic matching, lightweight (50MB)
247
- - `chroma`: Persistent semantic cache (requires ChromaDB)
248
- - `cloud`: Managed cache (requires API key)
249
-
250
- ### SubconsciousHub — the orchestration layer
251
- The core of Lemma. A WebSocket server that manages agent connections, routes tasks by capability, and maintains the shared semantic cache.
252
-
63
+ ### Use as a Multi-Agent Hub
64
+ For complex agent swarms that need a "Hive Mind":
253
65
  ```typescript
254
66
  import { SubconsciousHub } from '@nxuss/lemma';
255
-
256
67
  const hub = new SubconsciousHub({ server: { port: 8080 } });
257
68
  await hub.start();
258
69
  ```
259
70
 
260
- **What it handles:**
261
- - Agent registration and capability discovery
262
- - Semantic cache lookup before every task (ChromaDB + nomic-embed-text embeddings)
263
- - Task routing to capable agents on cache miss
264
- - Response storage for future cache hits
265
- - WebSocket heartbeat and connection lifecycle
266
- - Rate limiting and message sanitization
267
-
268
- ### Semantic cache — the shared memory
269
- Built on ChromaDB with Ollama embeddings. Catches paraphrases, not just exact matches.
270
-
271
- - *"fibonacci up to n=10"* ──► **cache hit** (similarity: 0.97)
272
- - *"compute the 10th fibonacci"* ──► **cache hit** (similarity: 0.91)
273
- - *"fib sequence, first 10 terms"* ──► **cache hit** (similarity: 0.88)
274
-
275
- Threshold is configurable (`SEMANTIC_THRESHOLD=0.85` by default).
276
-
277
- ### Consensus engine — multi-model voting
278
- For high-stakes decisions, route a query through multiple models and only return when they agree.
279
-
280
- ```typescript
281
- import { ConsensusEngine } from '@nxuss/lemma';
282
-
283
- const consensus = new ConsensusEngine({
284
- minModels: 3,
285
- minAgreement: 0.90,
286
- maxRounds: 3,
287
- });
288
-
289
- const result = await consensus.requestConsensus({
290
- query: 'Is this SQL query safe to run in production?',
291
- models: ['llama3', 'gpt-4', 'claude-3'],
292
- });
293
- // Returns only when 3 models agree ≥90%
294
- ```
295
- Supports Ollama (local), OpenAI, Anthropic, and Google models simultaneously.
296
-
297
71
  ---
298
72
 
299
- ## New in v0.2.0 🎉
300
-
301
- ### 1. Semantic Memory Backend
302
- True semantic caching without external dependencies:
303
- ```typescript
304
- const lemma = await Lemma.create({
305
- storage: 'semantic',
306
- embeddingModel: 'Xenova/all-MiniLM-L6-v2', // Lightweight!
307
- });
308
-
309
- // These all match semantically:
310
- await lemma.run('weather in SF', fetchWeather);
311
- await lemma.run('San Francisco weather', fetchWeather); // HIT!
312
- await lemma.run('SF temperature forecast', fetchWeather); // HIT!
313
- ```
314
-
315
- ### 2. Automatic TTL Cleanup
316
- No more memory leaks from expired entries:
317
- ```typescript
318
- const lemma = await Lemma.create({
319
- ttl: 3600000, // 1 hour expiry
320
- cleanupInterval: 60000, // Check every minute
321
- });
322
- ```
323
-
324
- ### 3. Circuit Breaker & Fallbacks
325
- Automatic resilience when backends fail:
326
- ```typescript
327
- const lemma = await Lemma.create({
328
- storage: 'cloud',
329
- enableFallback: true,
330
- maxRetries: 3,
331
- retryDelay: 1000,
332
- });
333
- ```
334
-
335
- ### 4. Enhanced Metrics & Health Monitoring
336
- ```typescript
337
- const metrics = lemma.getMetrics();
338
- // { hits: 150, misses: 50, hitRate: 0.75, ... }
339
- ```
340
-
341
- ### 5. Dual Module Support (ESM + CJS)
342
- Full support for both modern and legacy Node.js projects.
73
+ ## 📊 Dashboard
74
+ Monitor your savings, visualize agent connections, and inspect your semantic memory through the integrated real-time dashboard.
343
75
 
344
76
  ---
345
77
 
346
- ## Install
347
- ```bash
348
- npm install @nxuss/lemma
349
- ```
350
-
351
- ## Configuration
352
- ```bash
353
- # .env
354
- WS_PORT=8080
355
- CHROMA_HOST=http://localhost
356
- CHROMA_PORT=8000
357
- OLLAMA_HOST=http://localhost:11434
358
- OLLAMA_MODEL=nomic-embed-text
359
- SEMANTIC_THRESHOLD=0.85
360
- ```
361
-
362
- ## Contributing
363
- Contributions are welcome! Visit [lemma.nxus.studio](https://lemma.nxus.studio)
364
-
365
- ## License
366
- MIT © Nxus Studio
78
+ MIT © Nxus Studio | [Get Lemma Pro](https://lemma.nxus.studio/upgrade)