@nxuss/lemma 0.3.2 → 0.3.3
- package/README.md +34 -190
- package/package.json +10 -2
package/README.md
CHANGED
|
@@ -3,73 +3,62 @@
|
|
|
3
3
|
|
|
4
4
|
Lemma is the open-core gateway for AI development. It sits between your tools (Cursor, VS Code, CLI Agents) and your models (OpenAI, Claude, Gemini, Ollama), providing a **shared semantic memory** that saves you 40-70% in API costs and makes your AI tools instant.
|
|
5
5
|
|
|
6
|
-
|
|
6
|
+
⚡ **Why Lemma?**
|
|
7
7
|
- 💰 **Stop paying twice**: Lemma caches redundant queries semantically. "Fix this bug" and "Solve this error" return the same cached answer.
|
|
8
8
|
- ⚡ **Instant responses**: 3ms cache hits vs 2000ms LLM calls.
|
|
9
9
|
- 🤖 **Universal Gateway**: One endpoint for OpenAI, Anthropic, and Gemini.
|
|
10
10
|
- 🐝 **Agent Swarms**: Orchestrate multiple agents with shared memory.
|
|
11
11
|
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
## ⚡ Quick Start (IDE Proxy)
|
|
12
|
+
⚡ **Quick Start (IDE Proxy)**
|
|
15
13
|
|
|
16
14
|
Install and launch the proxy to start saving on your API bills immediately.
|
|
17
15
|
|
|
18
16
|
```bash
|
|
19
17
|
npm install -g @nxuss/lemma
|
|
20
18
|
lemma start
|
|
19
|
+
|
|
20
|
+
# Or to launch the full development stack (Proxy + Dashboard + Hub + Chroma):
|
|
21
|
+
lemma start --stack
|
|
21
22
|
```
|
|
22
23
|
|
|
23
24
|
**Configure your IDE:**
|
|
25
|
+
|
|
24
26
|
- **Base URL:** `http://localhost:8080/v1`
|
|
25
27
|
- **Gemini Base:** `http://localhost:8080/v1beta`
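
For tools without a base-URL setting, the same redirection can be done in code. The sketch below only builds an OpenAI-style chat request aimed at the proxy and does not send it. It assumes the proxy accepts the standard `/chat/completions` path under `/v1`, which the README implies but does not state. `buildChatRequest`, the API key, and the model name are placeholders for illustration.

```typescript
// Hypothetical helper: builds (but does not send) an OpenAI-style chat request
// aimed at the local Lemma proxy instead of the provider's own host.
interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildChatRequest(baseUrl: string, apiKey: string, prompt: string): ChatRequest {
  // Strip trailing slashes so the joined URL has exactly one separator.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: `${root}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "gpt-4o-mini", // placeholder model name
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

const proxyRequest = buildChatRequest("http://localhost:8080/v1", "sk-placeholder", "Fix this bug");
```

Passing `proxyRequest.url` and `proxyRequest.init` to `fetch` would send the call through the proxy, provided `lemma start` is running.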
|
|
26
28
|
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
|
|
29
|
+
🆓 **Free Tier:** 300 queries/month + Exact Matching.
|
|
30
|
+
💎 **Pro:** Unlimited queries + Semantic Caching ($12/mo or $120/yr).
|
|
31
|
+
☁️ **Cloud:** Managed infrastructure (Coming Soon).
|
|
30
32
|
|
|
31
|
-
👉
|
|
33
|
+
👉 [Get Lemma Pro](https://lemma.nxus.studio/upgrade)
|
|
32
34
|
|
|
33
35
|
### Option 2: Multi-Agent System
|
|
34
|
-
|
|
35
36
|
For building coordinated AI agent systems:
|
|
36
37
|
|
|
37
38
|
```bash
|
|
38
39
|
npm install @nxuss/lemma
|
|
39
40
|
```
|
|
40
|
-
|
|
41
|
-
👉 **[Multi-Agent Guide](#quick-start)**
|
|
41
|
+
👉 [Multi-Agent Guide](https://lemma.nxus.studio/docs/multi-agent)
|
|
42
42
|
|
|
43
43
|
---
|
|
44
44
|
|
|
45
45
|
## The problem with AI development costs
|
|
46
|
-
|
|
47
46
|
When you use AI assistants for development, you pay for every prompt — even when asking similar questions:
|
|
48
47
|
|
|
49
|
-
|
|
50
|
-
"
|
|
51
|
-
"
|
|
52
|
-
"Show me JWT example for Express" ← Same answer, paid three times
|
|
53
|
-
```
|
|
54
|
-
|
|
55
|
-
**Lemma Proxy** intercepts these calls and returns cached responses for similar prompts in 3ms instead of 600ms, saving you money and time.
|
|
48
|
+
1. *"How to implement JWT in Express?"*
|
|
49
|
+
2. *"Explain JWT authentication in Node.js"* ← **Same answer, paid twice**
|
|
50
|
+
3. *"Show me JWT example for Express"* ← **Same answer, paid three times**
|
|
56
51
|
|
|
57
|
-
|
|
52
|
+
Lemma Proxy intercepts these calls and returns cached responses for similar prompts in **3ms instead of 600ms**, saving you money and time.
|
|
58
53
|
|
|
59
54
|
## The problem with multi-agent systems
|
|
60
|
-
|
|
61
55
|
When you run multiple AI agents in parallel, they don't share context. Agent A solves a problem. Agent B gets the same problem 10 minutes later and solves it again. You pay twice, wait twice, and get the same answer.
|
|
62
56
|
|
|
63
57
|
At scale this compounds fast:
|
|
64
|
-
|
|
65
|
-
```
|
|
66
|
-
10 agents × 500 tasks/day × 70% overlap = 3,500 redundant LLM calls/day
|
|
67
|
-
```
|
|
58
|
+
**10 agents × 500 tasks/day × 70% overlap = 3,500 redundant LLM calls/day**
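
The figure above is plain multiplication, which makes it easy to plug in your own numbers. The helper name below is made up for illustration. The overlap is passed as an integer percentage to keep the arithmetic exact.

```typescript
// Redundant calls per day = agents x tasks per agent per day x overlap share.
// overlapPercent is an integer percentage so the arithmetic stays exact.
function redundantCallsPerDay(agents: number, tasksPerAgent: number, overlapPercent: number): number {
  return (agents * tasksPerAgent * overlapPercent) / 100;
}

const totalCalls = 10 * 500; // 5,000 task requests per day
const redundantCalls = redundantCallsPerDay(10, 500, 70); // 3,500 could be served from cache
const uniqueCalls = totalCalls - redundantCalls; // 1,500 still need an LLM call
```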
|
|
68
59
|
|
|
69
60
|
Lemma puts a shared semantic brain between your agents. When any agent solves something, every other agent gets that answer for free — even if they phrase the question differently.
|
|
70
61
|
|
|
71
|
-
---
|
|
72
|
-
|
|
73
62
|
## How it works
|
|
74
63
|
|
|
75
64
|
```
|
|
@@ -82,16 +71,15 @@ Agent C ──┘ │
|
|
|
82
71
|
1. Agents connect via WebSocket and register their capabilities
|
|
83
72
|
2. Every task request hits the semantic cache first
|
|
84
73
|
3. On a miss, the hub routes to a capable agent and stores the result
|
|
85
|
-
4. On a hit, the response returns in
|
|
74
|
+
4. On a hit, the response returns in **~20ms** — no agent invoked, no LLM called
|
|
86
75
|
|
|
87
|
-
The cache is semantic
|
|
76
|
+
The cache is **semantic**, not exact. *"fibonacci up to n=10"* and *"compute fibonacci(10)"* resolve to the same cached answer.
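
The four steps above can be sketched as a small in-process class. This is an illustration of the cache-first flow only, not Lemma's actual hub. `MiniHub`, `WorkerAgent`, and `toKey` are hypothetical names, agents answer synchronously for brevity, and `toKey` is a crude normalizer standing in for real embedding-based matching.

```typescript
// Hypothetical sketch of a cache-first request path (not Lemma's actual classes).
interface WorkerAgent {
  capability: string;
  handle: (task: string) => string; // a real agent would answer asynchronously over WebSocket
}

interface HubResponse {
  result: string;
  fromCache: boolean;
}

class MiniHub {
  private agents: WorkerAgent[] = [];
  private cache = new Map<string, string>();
  private toKey: (task: string) => string;
  agentCalls = 0;

  constructor(toKey: (task: string) => string) {
    this.toKey = toKey;
  }

  // Step 1: agents register their capabilities.
  register(agent: WorkerAgent): void {
    this.agents.push(agent);
  }

  request(capability: string, task: string): HubResponse {
    // Step 2: every request checks the cache first.
    const key = this.toKey(task);
    const cached = this.cache.get(key);
    if (cached !== undefined) {
      // Step 4: on a hit, no agent is invoked.
      return { result: cached, fromCache: true };
    }
    // Step 3: on a miss, route to a capable agent and store the result.
    const agent = this.agents.find((a) => a.capability === capability);
    if (agent === undefined) {
      throw new Error(`no agent registered for capability "${capability}"`);
    }
    this.agentCalls += 1;
    const result = agent.handle(task);
    this.cache.set(key, result);
    return { result, fromCache: false };
  }
}

const demoHub = new MiniHub((task) => task.trim().toLowerCase());
demoHub.register({ capability: "math", handle: () => "55" });
const firstAnswer = demoHub.request("math", "compute fibonacci(10)"); // miss, agent invoked
const secondAnswer = demoHub.request("math", "  Compute Fibonacci(10) "); // hit, served from cache
```

The normalizer here only catches differences in case and whitespace. Catching true paraphrases requires comparing embeddings, which is what the semantic backends are described as doing.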
|
|
88
77
|
|
|
89
78
|
---
|
|
90
79
|
|
|
91
80
|
## Quick start
|
|
92
81
|
|
|
93
82
|
### 1. Install and setup dependencies
|
|
94
|
-
|
|
95
83
|
```bash
|
|
96
84
|
npm install @nxuss/lemma
|
|
97
85
|
|
|
@@ -109,7 +97,6 @@ ollama pull nomic-embed-text
|
|
|
109
97
|
### 2. Choose your mode
|
|
110
98
|
|
|
111
99
|
#### Option A: Semantic Mode (Recommended) ⚡
|
|
112
|
-
|
|
113
100
|
Zero external dependencies, true semantic matching:
|
|
114
101
|
|
|
115
102
|
```typescript
|
|
@@ -130,7 +117,6 @@ await cachedLLM('San Francisco temperature'); // Cache HIT! ⚡
|
|
|
130
117
|
```
|
|
131
118
|
|
|
132
119
|
#### Option B: Memory Mode (Fastest)
|
|
133
|
-
|
|
134
120
|
Exact matching, zero dependencies:
|
|
135
121
|
|
|
136
122
|
```typescript
|
|
@@ -140,9 +126,6 @@ const lemma = await Lemma.create({
|
|
|
140
126
|
```
|
|
141
127
|
|
|
142
128
|
#### Option C: Server Mode (Multi-Agent)
|
|
143
|
-
|
|
144
|
-
#### Option C: Server Mode (Multi-Agent)
|
|
145
|
-
|
|
146
129
|
For multi-agent orchestration:
|
|
147
130
|
|
|
148
131
|
```typescript
|
|
@@ -157,7 +140,6 @@ console.log('WebSocket hub listening on ws://localhost:8080');
|
|
|
157
140
|
```
|
|
158
141
|
|
|
159
142
|
### 3. Connect agents (Server Mode)
|
|
160
|
-
|
|
161
143
|
```typescript
|
|
162
144
|
import WebSocket from 'ws';
|
|
163
145
|
|
|
@@ -217,7 +199,6 @@ ws.on('message', (data) => {
|
|
|
217
199
|
```
|
|
218
200
|
|
|
219
201
|
### 4. See it in action
|
|
220
|
-
|
|
221
202
|
When multiple agents request similar tasks, you'll see the cache working:
|
|
222
203
|
|
|
223
204
|
```
|
|
@@ -225,15 +206,13 @@ When multiple agents request similar tasks, you'll see the cache working:
|
|
|
225
206
|
[agent-002] ⚡ CACHE HIT - compute the 10th fibonacci... (20ms)
|
|
226
207
|
[agent-003] ⚡ CACHE HIT - fibonacci sequence up to n=10... (22ms)
|
|
227
208
|
```
|
|
228
|
-
|
|
229
|
-
**Result:** 100% cache hit rate after first computation. ~20ms responses. Zero duplicate LLM calls.
|
|
209
|
+
**Result: 100% cache hit rate after first computation. ~20ms responses. Zero duplicate LLM calls.**
|
|
230
210
|
|
|
231
211
|
---
|
|
232
212
|
|
|
233
213
|
## What's inside
|
|
234
214
|
|
|
235
215
|
### Embedded Mode — Zero-config semantic cache
|
|
236
|
-
|
|
237
216
|
The simplest way to add semantic caching to any project:
|
|
238
217
|
|
|
239
218
|
```typescript
|
|
@@ -256,11 +235,11 @@ console.log(result.fromCache); // true on cache hit
|
|
|
256
235
|
```
|
|
257
236
|
|
|
258
237
|
**Features:**
|
|
259
|
-
-
|
|
260
|
-
-
|
|
261
|
-
-
|
|
262
|
-
-
|
|
263
|
-
-
|
|
238
|
+
- Semantic matching with lightweight embeddings (transformers.js)
|
|
239
|
+
- Automatic TTL cleanup prevents memory leaks
|
|
240
|
+
- Circuit breaker with automatic fallbacks (Cloud → Chroma → Memory)
|
|
241
|
+
- Health monitoring with detailed metrics
|
|
242
|
+
- Graceful shutdown with `lemma.stop()`
|
|
264
243
|
|
|
265
244
|
**Storage options:**
|
|
266
245
|
- `memory`: Exact match, zero dependencies, fastest
|
|
@@ -269,7 +248,6 @@ console.log(result.fromCache); // true on cache hit
|
|
|
269
248
|
- `cloud`: Managed cache (requires API key)
|
|
270
249
|
|
|
271
250
|
### SubconsciousHub — the orchestration layer
|
|
272
|
-
|
|
273
251
|
The core of Lemma. A WebSocket server that manages agent connections, routes tasks by capability, and maintains the shared semantic cache.
|
|
274
252
|
|
|
275
253
|
```typescript
|
|
@@ -281,26 +259,22 @@ await hub.start();
|
|
|
281
259
|
|
|
282
260
|
**What it handles:**
|
|
283
261
|
- Agent registration and capability discovery
|
|
284
|
-
- Semantic cache lookup before every task (ChromaDB +
|
|
262
|
+
- Semantic cache lookup before every task (ChromaDB + nomic-embed-text embeddings)
|
|
285
263
|
- Task routing to capable agents on cache miss
|
|
286
264
|
- Response storage for future cache hits
|
|
287
265
|
- WebSocket heartbeat and connection lifecycle
|
|
288
266
|
- Rate limiting and message sanitization
|
|
289
267
|
|
|
290
268
|
### Semantic cache — the shared memory
|
|
291
|
-
|
|
292
269
|
Built on ChromaDB with Ollama embeddings. Catches paraphrases, not just exact matches.
|
|
293
270
|
|
|
294
|
-
|
|
295
|
-
"
|
|
296
|
-
"
|
|
297
|
-
"fib sequence, first 10 terms" ──► cache hit (similarity: 0.88)
|
|
298
|
-
```
|
|
271
|
+
- *"fibonacci up to n=10"* ──► **cache hit** (similarity: 0.97)
|
|
272
|
+
- *"compute the 10th fibonacci"* ──► **cache hit** (similarity: 0.91)
|
|
273
|
+
- *"fib sequence, first 10 terms"* ──► **cache hit** (similarity: 0.88)
|
|
299
274
|
|
|
300
275
|
Threshold is configurable (`SEMANTIC_THRESHOLD=0.85` by default).
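
To make the threshold concrete, here is a minimal sketch of a similarity lookup. It assumes the score is cosine similarity between embedding vectors, which the README does not confirm. The function names and the three-dimensional toy vectors are illustrative only; real embeddings have hundreds of dimensions.

```typescript
// Cosine similarity between two equal-length vectors, in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must have the same non-zero length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector matches nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface CachedAnswer {
  embedding: number[];
  response: string;
}

// Returns the most similar cached answer, or undefined when nothing reaches the threshold.
function findBestMatch(query: number[], entries: CachedAnswer[], threshold = 0.85): CachedAnswer | undefined {
  let best: CachedAnswer | undefined;
  let bestScore = threshold;
  for (const entry of entries) {
    const score = cosineSimilarity(query, entry.embedding);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best;
}

const cachedAnswers: CachedAnswer[] = [{ embedding: [1, 0, 0], response: "fibonacci answer" }];
const nearQuery = findBestMatch([0.9, 0.1, 0], cachedAnswers); // similarity about 0.99, a hit
const farQuery = findBestMatch([0.5, 0.5, 0.7], cachedAnswers); // similarity about 0.50, a miss
```

Raising the threshold trades hit rate for precision: fewer paraphrases match, but unrelated prompts are less likely to share an answer.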
|
|
301
276
|
|
|
302
277
|
### Consensus engine — multi-model voting
|
|
303
|
-
|
|
304
278
|
For high-stakes decisions, route a query through multiple models and only return when they agree.
|
|
305
279
|
|
|
306
280
|
```typescript
|
|
@@ -318,7 +292,6 @@ const result = await consensus.requestConsensus({
|
|
|
318
292
|
});
|
|
319
293
|
// Returns only when 3 models agree ≥90%
|
|
320
294
|
```
|
|
321
|
-
|
|
322
295
|
Supports Ollama (local), OpenAI, Anthropic, and Google models simultaneously.
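
How the engine measures agreement is not spelled out here. One simple reading, sketched below as an assumption, treats agreement as the share of models whose normalized answer equals the most common one. `tallyConsensus` is a hypothetical helper, not part of the `ConsensusEngine` API.

```typescript
interface ConsensusTally {
  agreed: boolean;
  agreement: number; // share of answers matching the most common one, 0..1
  answer?: string;   // present only when the threshold is met
}

// Counts normalized answers and checks whether the top answer's share meets minAgreement.
function tallyConsensus(answers: string[], minAgreement = 0.9): ConsensusTally {
  if (answers.length === 0) return { agreed: false, agreement: 0 };
  const counts = new Map<string, number>();
  for (const raw of answers) {
    const key = raw.trim().toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  let topAnswer = "";
  let topCount = 0;
  counts.forEach((count, key) => {
    if (count > topCount) {
      topAnswer = key;
      topCount = count;
    }
  });
  const agreement = topCount / answers.length;
  return agreement >= minAgreement
    ? { agreed: true, agreement, answer: topAnswer }
    : { agreed: false, agreement };
}

const unanimous = tallyConsensus(["42", "42 ", "42"]); // all three match
const split = tallyConsensus(["42", "42", "41"]);      // two of three, below 0.9
```

With three models and a 0.9 threshold, this reading only accepts a unanimous answer. Free-form text rarely matches exactly, so a production engine would more plausibly compare answers by semantic similarity.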
|
|
323
296
|
|
|
324
297
|
---
|
|
@@ -326,9 +299,7 @@ Supports Ollama (local), OpenAI, Anthropic, and Google models simultaneously.
|
|
|
326
299
|
## New in v0.2.0 🎉
|
|
327
300
|
|
|
328
301
|
### 1. Semantic Memory Backend
|
|
329
|
-
|
|
330
302
|
True semantic caching without external dependencies:
|
|
331
|
-
|
|
332
303
|
```typescript
|
|
333
304
|
const lemma = await Lemma.create({
|
|
334
305
|
storage: 'semantic',
|
|
@@ -342,120 +313,42 @@ await lemma.run('SF temperature forecast', fetchWeather); // HIT!
|
|
|
342
313
|
```
|
|
343
314
|
|
|
344
315
|
### 2. Automatic TTL Cleanup
|
|
345
|
-
|
|
346
316
|
No more memory leaks from expired entries:
|
|
347
|
-
|
|
348
317
|
```typescript
|
|
349
318
|
const lemma = await Lemma.create({
|
|
350
319
|
ttl: 3600000, // 1 hour expiry
|
|
351
320
|
cleanupInterval: 60000, // Check every minute
|
|
352
321
|
});
|
|
353
|
-
|
|
354
|
-
// Expired entries are automatically removed
|
|
355
|
-
// No manual cleanup needed!
|
|
356
322
|
```
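
The `ttl` and `cleanupInterval` options suggest a periodic sweep over stored entries. A minimal sketch of that idea follows. `TtlCache` is a hypothetical class, not Lemma's implementation, and the clock is injectable so expiry can be checked without waiting.

```typescript
// Hypothetical TTL cache: entries expire ttl milliseconds after being written.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttl: number;
  private now: () => number;
  evictedCount = 0;

  constructor(ttl: number, now: () => number = () => Date.now()) {
    this.ttl = ttl;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttl });
  }

  // Expired entries are treated as missing even before the next sweep runs.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined || entry.expiresAt <= this.now()) return undefined;
    return entry.value;
  }

  // Removes every expired entry and returns how many were dropped.
  sweep(): number {
    const current = this.now();
    let removed = 0;
    this.entries.forEach((entry, key) => {
      if (entry.expiresAt <= current) {
        this.entries.delete(key);
        removed += 1;
      }
    });
    this.evictedCount += removed;
    return removed;
  }

  get size(): number {
    return this.entries.size;
  }
}

let fakeClock = 0;
const ttlCache = new TtlCache<string>(3600000, () => fakeClock); // 1 hour expiry
ttlCache.set("sf weather", "sunny");
fakeClock = 3600000; // one hour later
const removedEntries = ttlCache.sweep();
```

In a long-running process, calling `sweep()` from a `setInterval` timer at the cleanup interval keeps memory bounded even for keys that are never read again.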
|
|
357
323
|
|
|
358
324
|
### 3. Circuit Breaker & Fallbacks
|
|
359
|
-
|
|
360
325
|
Automatic resilience when backends fail:
|
|
361
|
-
|
|
362
326
|
```typescript
|
|
363
327
|
const lemma = await Lemma.create({
|
|
364
328
|
storage: 'cloud',
|
|
365
|
-
enableFallback: true,
|
|
329
|
+
enableFallback: true,
|
|
366
330
|
maxRetries: 3,
|
|
367
331
|
retryDelay: 1000,
|
|
368
332
|
});
|
|
369
|
-
|
|
370
|
-
// If cloud fails → falls back to chroma
|
|
371
|
-
// If chroma fails → falls back to memory
|
|
372
|
-
// Automatic recovery when backend comes back
|
|
373
|
-
|
|
374
|
-
lemma.on('backend-degraded', ({ from, to }) => {
|
|
375
|
-
console.log(`Degraded: ${from} → ${to}`);
|
|
376
|
-
});
|
|
377
|
-
|
|
378
|
-
lemma.on('backend-recovered', ({ backend }) => {
|
|
379
|
-
console.log(`Recovered: ${backend}`);
|
|
380
|
-
});
|
|
381
333
|
```
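
The README's feature list describes the fallback order as Cloud → Chroma → Memory. The sketch below shows retry-then-fallback over an ordered list of backends. It is not a full circuit breaker: it keeps no open or closed state and skips `retryDelay`, and it is synchronous for brevity. `StorageBackend` and `FallbackChain` are hypothetical names.

```typescript
interface StorageBackend {
  label: string;
  get(key: string): string | undefined; // throws when the backend is unavailable
}

// Tries each backend in order, retrying up to maxRetries times before moving on.
class FallbackChain {
  private backends: StorageBackend[];
  private maxRetries: number;

  constructor(backends: StorageBackend[], maxRetries = 3) {
    this.backends = backends;
    this.maxRetries = maxRetries;
  }

  get(key: string): { value: string | undefined; servedBy: string } {
    for (const backend of this.backends) {
      for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
        try {
          return { value: backend.get(key), servedBy: backend.label };
        } catch {
          // Backend failed: retry, then fall through to the next backend.
        }
      }
    }
    throw new Error("all backends failed");
  }
}

let cloudAttempts = 0;
const failingCloud: StorageBackend = {
  label: "cloud",
  get: () => { cloudAttempts += 1; throw new Error("cloud unreachable"); },
};
const failingChroma: StorageBackend = {
  label: "chroma",
  get: () => { throw new Error("chroma unreachable"); },
};
const memoryStore = new Map<string, string>([["greeting", "hello"]]);
const memoryBackend: StorageBackend = { label: "memory", get: (key) => memoryStore.get(key) };

const fallbackChain = new FallbackChain([failingCloud, failingChroma, memoryBackend], 3);
const served = fallbackChain.get("greeting"); // cloud and chroma fail, memory answers
```

A real circuit breaker would also remember recent failures and stop calling a broken backend for a cool-down period, instead of retrying it on every request.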
|
|
382
334
|
|
|
383
335
|
### 4. Enhanced Metrics & Health Monitoring
|
|
384
|
-
|
|
385
336
|
```typescript
|
|
386
337
|
const metrics = lemma.getMetrics();
|
|
387
|
-
|
|
388
|
-
// {
|
|
389
|
-
// hits: 150,
|
|
390
|
-
// misses: 50,
|
|
391
|
-
// hitRate: 0.75,
|
|
392
|
-
// backendHealth: 'healthy',
|
|
393
|
-
// failureCount: 0,
|
|
394
|
-
// evictedCount: 23,
|
|
395
|
-
// lastCleanupAt: 1234567890
|
|
396
|
-
// }
|
|
397
|
-
|
|
398
|
-
const health = lemma.getBackendHealth();
|
|
399
|
-
console.log(health);
|
|
400
|
-
// {
|
|
401
|
-
// state: 'CLOSED',
|
|
402
|
-
// currentBackend: 'semantic',
|
|
403
|
-
// failureCount: 0,
|
|
404
|
-
// totalFailures: 0
|
|
405
|
-
// }
|
|
338
|
+
// { hits: 150, misses: 50, hitRate: 0.75, ... }
|
|
406
339
|
```
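
The `hitRate` in the sample output is hits divided by total lookups: 150 / (150 + 50) = 0.75. A minimal counter that derives it the same way might look like this. `CacheMetrics` is a hypothetical class, not the object returned by `getMetrics()`.

```typescript
// Hypothetical hit/miss counter with a derived hit rate.
class CacheMetrics {
  hits = 0;
  misses = 0;

  record(wasHit: boolean): void {
    if (wasHit) {
      this.hits += 1;
    } else {
      this.misses += 1;
    }
  }

  // Hit rate is hits / (hits + misses); 0 when nothing has been recorded yet.
  get hitRate(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}

const sampleMetrics = new CacheMetrics();
for (let i = 0; i < 150; i++) sampleMetrics.record(true);
for (let i = 0; i < 50; i++) sampleMetrics.record(false);
const sampleHitRate = sampleMetrics.hitRate; // 150 / 200
```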
|
|
407
340
|
|
|
408
341
|
### 5. Dual Module Support (ESM + CJS)
|
|
409
|
-
|
|
410
|
-
```typescript
|
|
411
|
-
// ESM
|
|
412
|
-
import { Lemma } from '@nxuss/lemma/embed';
|
|
413
|
-
import { ConsensusEngine } from '@nxuss/lemma/consensus';
|
|
414
|
-
import { SpeculativeEngine } from '@nxuss/lemma/speculative';
|
|
415
|
-
|
|
416
|
-
// CJS
|
|
417
|
-
const { Lemma } = require('@nxuss/lemma/embed');
|
|
418
|
-
const { ConsensusEngine } = require('@nxuss/lemma/consensus');
|
|
419
|
-
```
|
|
420
|
-
|
|
421
|
-
**New exports:**
|
|
422
|
-
- `@nxuss/lemma/consensus` - Multi-model consensus
|
|
423
|
-
- `@nxuss/lemma/speculative` - Speculative execution
|
|
424
|
-
- `@nxuss/lemma/security` - Security utilities
|
|
425
|
-
- `@nxuss/lemma/protocol` - IAP protocol
|
|
426
|
-
- `@nxuss/lemma/langchain` - LangChain SDK
|
|
427
|
-
- `@nxuss/lemma/crewai` - CrewAI SDK
|
|
428
|
-
|
|
429
|
-
See [MIGRATION_GUIDE.md](docs/MIGRATION_GUIDE.md) for upgrade instructions.
|
|
342
|
+
Full support for both modern and legacy Node.js projects.
|
|
430
343
|
|
|
431
344
|
---
|
|
432
345
|
|
|
433
346
|
## Install
|
|
434
|
-
|
|
435
347
|
```bash
|
|
436
348
|
npm install @nxuss/lemma
|
|
437
349
|
```
|
|
438
350
|
|
|
439
|
-
**Optional dependencies (install as needed):**
|
|
440
|
-
|
|
441
|
-
```bash
|
|
442
|
-
# For semantic mode (lightweight embeddings)
|
|
443
|
-
npm install @xenova/transformers
|
|
444
|
-
|
|
445
|
-
# For persistent storage with ChromaDB
|
|
446
|
-
pip install chromadb
|
|
447
|
-
chroma run --path ./chroma_data --port 8000
|
|
448
|
-
|
|
449
|
-
# For ChromaDB embeddings
|
|
450
|
-
ollama pull nomic-embed-text
|
|
451
|
-
```
|
|
452
|
-
|
|
453
|
-
**Zero dependencies required** for basic memory mode!
|
|
454
|
-
|
|
455
|
-
---
|
|
456
|
-
|
|
457
351
|
## Configuration
|
|
458
|
-
|
|
459
352
|
```bash
|
|
460
353
|
# .env
|
|
461
354
|
WS_PORT=8080
|
|
@@ -463,60 +356,11 @@ CHROMA_HOST=http://localhost
|
|
|
463
356
|
CHROMA_PORT=8000
|
|
464
357
|
OLLAMA_HOST=http://localhost:11434
|
|
465
358
|
OLLAMA_MODEL=nomic-embed-text
|
|
466
|
-
SEMANTIC_THRESHOLD=0.85
|
|
467
|
-
ENABLE_CACHING=true
|
|
468
|
-
AUTH_ENABLED=false # set true in production
|
|
359
|
+
SEMANTIC_THRESHOLD=0.85
|
|
469
360
|
```
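
For reference, the variables above could be read into a typed object with the listed values as defaults. This is a sketch of one way to do it, not Lemma's loader. `loadLemmaEnv` and the field names are hypothetical, and the function takes a plain record so it can be tried without a real environment.

```typescript
interface LemmaEnvConfig {
  wsPort: number;
  chromaUrl: string;
  ollamaHost: string;
  ollamaModel: string;
  semanticThreshold: number;
}

// Reads the .env variables shown above, falling back to the listed values when unset.
function loadLemmaEnv(env: Record<string, string | undefined>): LemmaEnvConfig {
  const readNumber = (key: string, fallback: number): number => {
    const raw = env[key];
    if (raw === undefined || raw.trim() === "") return fallback;
    const parsed = Number(raw);
    if (!Number.isFinite(parsed)) throw new Error(`${key} must be a number, got "${raw}"`);
    return parsed;
  };

  const semanticThreshold = readNumber("SEMANTIC_THRESHOLD", 0.85);
  if (semanticThreshold < 0 || semanticThreshold > 1) {
    throw new Error("SEMANTIC_THRESHOLD must be between 0 and 1");
  }

  const chromaHost = env["CHROMA_HOST"] ?? "http://localhost";
  return {
    wsPort: readNumber("WS_PORT", 8080),
    chromaUrl: `${chromaHost}:${readNumber("CHROMA_PORT", 8000)}`,
    ollamaHost: env["OLLAMA_HOST"] ?? "http://localhost:11434",
    ollamaModel: env["OLLAMA_MODEL"] ?? "nomic-embed-text",
    semanticThreshold,
  };
}

const defaultConfig = loadLemmaEnv({});
const stricterConfig = loadLemmaEnv({ SEMANTIC_THRESHOLD: "0.9", WS_PORT: "9090" });
```

In Node.js, passing `process.env` as the argument reads the real environment.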
|
|
470
361
|
|
|
471
|
-
---
|
|
472
|
-
|
|
473
|
-
## Examples & Documentation
|
|
474
|
-
|
|
475
|
-
For complete examples including:
|
|
476
|
-
- Single agent setup
|
|
477
|
-
- Multi-agent swarms
|
|
478
|
-
- Consensus voting
|
|
479
|
-
- Security & authentication
|
|
480
|
-
- LangChain/CrewAI integration
|
|
481
|
-
|
|
482
|
-
Visit [lemma.nxus.studio/docs](https://lemma.nxus.studio/docs)
|
|
483
|
-
|
|
484
|
-
---
|
|
485
|
-
|
|
486
|
-
## Who this is for
|
|
487
|
-
|
|
488
|
-
- Teams running **LangChain, CrewAI, or custom agent frameworks** who need shared memory across agents
|
|
489
|
-
- Systems where **multiple agents handle overlapping queries** — support bots, research pipelines, code assistants
|
|
490
|
-
- Anyone whose **LLM bill scales with agent count** rather than unique queries
|
|
491
|
-
|
|
492
|
-
Lemma is designed for multi-agent systems where coordination and shared memory provide immediate value.
|
|
493
|
-
|
|
494
|
-
---
|
|
495
|
-
|
|
496
|
-
## Production deployment
|
|
497
|
-
|
|
498
|
-
Lemma can be deployed to any Node.js hosting environment. For production setup guides including:
|
|
499
|
-
- Docker deployment
|
|
500
|
-
- API key management
|
|
501
|
-
- Security configuration
|
|
502
|
-
- Monitoring & observability
|
|
503
|
-
|
|
504
|
-
Visit [lemma.nxus.studio/docs/deployment](https://lemma.nxus.studio/docs/deployment)
|
|
505
|
-
|
|
506
|
-
---
|
|
507
|
-
|
|
508
|
-
## Cloud hosting (coming soon)
|
|
509
|
-
|
|
510
|
-
Managed Lemma instances with zero infrastructure setup. Check pricing and availability at [lemma.nxus.studio](https://lemma.nxus.studio)
|
|
511
|
-
|
|
512
|
-
---
|
|
513
|
-
|
|
514
362
|
## Contributing
|
|
515
|
-
|
|
516
|
-
Contributions are welcome! For development setup and guidelines, visit [lemma.nxus.studio](https://lemma.nxus.studio)
|
|
517
|
-
|
|
518
|
-
---
|
|
363
|
+
Contributions are welcome! Visit [lemma.nxus.studio](https://lemma.nxus.studio)
|
|
519
364
|
|
|
520
365
|
## License
|
|
521
|
-
|
|
522
|
-
MIT © [Nxus Studio](https://nxus.studio)
|
|
366
|
+
MIT © Nxus Studio
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@nxuss/lemma",
|
|
3
|
-
"version": "0.3.
|
|
3
|
+
"version": "0.3.3",
|
|
4
4
|
"description": "Semantic cache for AI apps — stop paying for the same LLM call twice",
|
|
5
5
|
"main": "./dist/cjs/index.js",
|
|
6
6
|
"module": "./dist/esm/index.js",
|
|
@@ -68,6 +68,14 @@
|
|
|
68
68
|
"types": "./dist/cjs/protocol/index.d.ts",
|
|
69
69
|
"default": "./dist/cjs/protocol/index.js"
|
|
70
70
|
}
|
|
71
|
+
},
|
|
72
|
+
"./langchain": {
|
|
73
|
+
"types": "./sdks/langchain/dist/index.d.ts",
|
|
74
|
+
"default": "./sdks/langchain/dist/index.js"
|
|
75
|
+
},
|
|
76
|
+
"./crewai": {
|
|
77
|
+
"types": "./sdks/crewai/dist/index.d.ts",
|
|
78
|
+
"default": "./sdks/crewai/dist/index.js"
|
|
71
79
|
}
|
|
72
80
|
},
|
|
73
81
|
"files": [
|
|
@@ -151,7 +159,7 @@
|
|
|
151
159
|
"homepage": "https://github.com/Nxusbets/lemma#readme",
|
|
152
160
|
"dependencies": {
|
|
153
161
|
"@chroma-core/default-embed": "^1.1.4",
|
|
154
|
-
"@nxuss/lemma": "^0.3.
|
|
162
|
+
"@nxuss/lemma": "^0.3.1",
|
|
155
163
|
"@types/cors": "^2.8.19",
|
|
156
164
|
"axios": "^1.6.0",
|
|
157
165
|
"commander": "^14.0.3",
|