@nxuss/lemma 0.3.1 → 0.3.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,75 +1,64 @@
- # Lemma v0.3.0
+ # Lemma v0.3.2
  > **The Universal AI Cache Proxy + Agent Orchestration Layer.**
 
  Lemma is the open-core gateway for AI development. It sits between your tools (Cursor, VS Code, CLI Agents) and your models (OpenAI, Claude, Gemini, Ollama), providing a **shared semantic memory** that saves you 40-70% in API costs and makes your AI tools instant.
 
- ### 🚀 Why Lemma?
+ ⚡ **Why Lemma?**
  - 💰 **Stop paying twice**: Lemma caches redundant queries semantically. "Fix this bug" and "Solve this error" return the same cached answer.
  - ⚡ **Instant responses**: 3ms cache hits vs 2000ms LLM calls.
  - 🤖 **Universal Gateway**: One endpoint for OpenAI, Anthropic, and Gemini.
  - 🐝 **Agent Swarms**: Orchestrate multiple agents with shared memory.
 
- ---
-
- ## ⚡ Quick Start (IDE Proxy)
+ ⚡ **Quick Start (IDE Proxy)**
 
  Install and launch the proxy to start saving on your API bills immediately.
 
  ```bash
  npm install -g @nxuss/lemma
- lemma-proxy start
+ lemma start
+
+ # Or to launch the full development stack (Proxy + Dashboard + Hub + Chroma):
+ lemma start --stack
  ```
 
  **Configure your IDE:**
+
  - **Base URL:** `http://localhost:8080/v1`
  - **Gemini Base:** `http://localhost:8080/v1beta`
 
- - 🆓 **Free Tier**: 300 queries/month + Exact Matching.
- - 💎 **Pro**: Unlimited queries + **Semantic Caching** ($12/mo or $120/yr).
- - ☁️ **Cloud**: Managed infrastructure (Coming Soon).
+ 🆓 **Free Tier:** 300 queries/month + Exact Matching.
+ 💎 **Pro:** Unlimited queries + Semantic Caching ($12/mo or $120/yr).
+ ☁️ **Cloud:** Managed infrastructure (Coming Soon).
 
- 👉 **[Get Lemma Pro](https://lemma.nxus.studio/upgrade)**
+ 👉 [Get Lemma Pro](https://lemma.nxus.studio/upgrade)
 
  ### Option 2: Multi-Agent System
-
  For building coordinated AI agent systems:
 
  ```bash
  npm install @nxuss/lemma
  ```
-
- 👉 **[Multi-Agent Guide](#quick-start)**
+ 👉 [Multi-Agent Guide](https://lemma.nxus.studio/docs/multi-agent)
 
  ---
 
  ## The problem with AI development costs
-
  When you use AI assistants for development, you pay for every prompt — even when asking similar questions:
 
- ```
- "How to implement JWT in Express?"
- "Explain JWT authentication in Node.js" ← Same answer, paid twice
- "Show me JWT example for Express" ← Same answer, paid three times
- ```
-
- **Lemma Proxy** intercepts these calls and returns cached responses for similar prompts in 3ms instead of 600ms, saving you money and time.
+ 1. *"How to implement JWT in Express?"*
+ 2. *"Explain JWT authentication in Node.js"* ← **Same answer, paid twice**
+ 3. *"Show me JWT example for Express"* ← **Same answer, paid three times**
 
- ---
+ Lemma Proxy intercepts these calls and returns cached responses for similar prompts in **3ms instead of 600ms**, saving you money and time.
 
  ## The problem with multi-agent systems
-
  When you run multiple AI agents in parallel, they don't share context. Agent A solves a problem. Agent B gets the same problem 10 minutes later and solves it again. You pay twice, wait twice, and get the same answer.
 
  At scale this compounds fast:
-
- ```
- 10 agents × 500 tasks/day × 70% overlap = 3,500 redundant LLM calls/day
- ```
+ **10 agents × 500 tasks/day × 70% overlap = 3,500 redundant LLM calls/day**
 
  Lemma puts a shared semantic brain between your agents. When any agent solves something, every other agent gets that answer for free — even if they phrase the question differently.
 
- ---
-
  ## How it works
 
  ```
@@ -82,16 +71,15 @@ Agent C ──┘ │
  1. Agents connect via WebSocket and register their capabilities
  2. Every task request hits the semantic cache first
  3. On a miss, the hub routes to a capable agent and stores the result
- 4. On a hit, the response returns in ~20ms — no agent invoked, no LLM called
+ 4. On a hit, the response returns in **~20ms** — no agent invoked, no LLM called
 
- The cache is semantic, not exact. "fibonacci up to n=10" and "compute fibonacci(10)" resolve to the same cached answer.
+ The cache is **semantic**, not exact. *"fibonacci up to n=10"* and *"compute fibonacci(10)"* resolve to the same cached answer.
 
  ---
 
  ## Quick start
 
  ### 1. Install and setup dependencies
-
  ```bash
  npm install @nxuss/lemma
 
@@ -109,7 +97,6 @@ ollama pull nomic-embed-text
  ### 2. Choose your mode
 
  #### Option A: Semantic Mode (Recommended) ⚡
-
  Zero external dependencies, true semantic matching:
 
  ```typescript
@@ -130,7 +117,6 @@ await cachedLLM('San Francisco temperature'); // Cache HIT! ⚡
  ```
 
  #### Option B: Memory Mode (Fastest)
-
  Exact matching, zero dependencies:
 
  ```typescript
@@ -140,9 +126,6 @@ const lemma = await Lemma.create({
  ```
 
  #### Option C: Server Mode (Multi-Agent)
-
- #### Option C: Server Mode (Multi-Agent)
-
  For multi-agent orchestration:
 
  ```typescript
@@ -157,7 +140,6 @@ console.log('WebSocket hub listening on ws://localhost:8080');
  ```
 
  ### 3. Connect agents (Server Mode)
-
  ```typescript
  import WebSocket from 'ws';
 
@@ -217,7 +199,6 @@ ws.on('message', (data) => {
  ```
 
  ### 4. See it in action
-
  When multiple agents request similar tasks, you'll see the cache working:
 
  ```
@@ -225,15 +206,13 @@ When multiple agents request similar tasks, you'll see the cache working:
  [agent-002] ⚡ CACHE HIT - compute the 10th fibonacci... (20ms)
  [agent-003] ⚡ CACHE HIT - fibonacci sequence up to n=10... (22ms)
  ```
-
- **Result:** 100% cache hit rate after first computation. ~20ms responses. Zero duplicate LLM calls.
+ **Result: 100% cache hit rate after first computation. ~20ms responses. Zero duplicate LLM calls.**
 
  ---
 
  ## What's inside
 
  ### Embedded Mode — Zero-config semantic cache
-
  The simplest way to add semantic caching to any project:
 
  ```typescript
@@ -256,11 +235,11 @@ console.log(result.fromCache); // true on cache hit
  ```
 
  **Features:**
- - **Semantic matching** with lightweight embeddings (transformers.js)
- - **Automatic TTL cleanup** prevents memory leaks
- - **Circuit breaker** with automatic fallbacks (Cloud → Chroma → Memory)
- - **Health monitoring** with detailed metrics
- - **Graceful shutdown** with `lemma.stop()`
+ - Semantic matching with lightweight embeddings (transformers.js)
+ - Automatic TTL cleanup prevents memory leaks
+ - Circuit breaker with automatic fallbacks (Cloud → Chroma → Memory)
+ - Health monitoring with detailed metrics
+ - Graceful shutdown with `lemma.stop()`
 
  **Storage options:**
  - `memory`: Exact match, zero dependencies, fastest
@@ -269,7 +248,6 @@ console.log(result.fromCache); // true on cache hit
  - `cloud`: Managed cache (requires API key)
 
  ### SubconsciousHub — the orchestration layer
-
  The core of Lemma. A WebSocket server that manages agent connections, routes tasks by capability, and maintains the shared semantic cache.
 
  ```typescript
@@ -281,26 +259,22 @@ await hub.start();
 
  **What it handles:**
  - Agent registration and capability discovery
- - Semantic cache lookup before every task (ChromaDB + `nomic-embed-text` embeddings)
+ - Semantic cache lookup before every task (ChromaDB + nomic-embed-text embeddings)
  - Task routing to capable agents on cache miss
  - Response storage for future cache hits
  - WebSocket heartbeat and connection lifecycle
  - Rate limiting and message sanitization
 
  ### Semantic cache — the shared memory
-
  Built on ChromaDB with Ollama embeddings. Catches paraphrases, not just exact matches.
 
- ```
- "fibonacci up to n=10" ──► cache hit (similarity: 0.97)
- "compute the 10th fibonacci" ──► cache hit (similarity: 0.91)
- "fib sequence, first 10 terms" ──► cache hit (similarity: 0.88)
- ```
+ - *"fibonacci up to n=10"* ──► **cache hit** (similarity: 0.97)
+ - *"compute the 10th fibonacci"* ──► **cache hit** (similarity: 0.91)
+ - *"fib sequence, first 10 terms"* ──► **cache hit** (similarity: 0.88)
 
  Threshold is configurable (`SEMANTIC_THRESHOLD=0.85` by default).
 
  ### Consensus engine — multi-model voting
-
  For high-stakes decisions, route a query through multiple models and only return when they agree.
 
  ```typescript
@@ -318,7 +292,6 @@ const result = await consensus.requestConsensus({
  });
  // Returns only when 3 models agree ≥90%
  ```
-
  Supports Ollama (local), OpenAI, Anthropic, and Google models simultaneously.
 
  ---
@@ -326,9 +299,7 @@ Supports Ollama (local), OpenAI, Anthropic, and Google models simultaneously.
  ## New in v0.2.0 🎉
 
  ### 1. Semantic Memory Backend
-
  True semantic caching without external dependencies:
-
  ```typescript
  const lemma = await Lemma.create({
  storage: 'semantic',
@@ -342,120 +313,42 @@ await lemma.run('SF temperature forecast', fetchWeather); // HIT!
  ```
 
  ### 2. Automatic TTL Cleanup
-
  No more memory leaks from expired entries:
-
  ```typescript
  const lemma = await Lemma.create({
  ttl: 3600000, // 1 hour expiry
  cleanupInterval: 60000, // Check every minute
  });
-
- // Expired entries are automatically removed
- // No manual cleanup needed!
  ```
 
  ### 3. Circuit Breaker & Fallbacks
-
  Automatic resilience when backends fail:
-
  ```typescript
  const lemma = await Lemma.create({
  storage: 'cloud',
- enableFallback: true, // Auto-fallback on failure
+ enableFallback: true,
  maxRetries: 3,
  retryDelay: 1000,
  });
-
- // If cloud fails → falls back to chroma
- // If chroma fails → falls back to memory
- // Automatic recovery when backend comes back
-
- lemma.on('backend-degraded', ({ from, to }) => {
- console.log(`Degraded: ${from} → ${to}`);
- });
-
- lemma.on('backend-recovered', ({ backend }) => {
- console.log(`Recovered: ${backend}`);
- });
  ```
 
  ### 4. Enhanced Metrics & Health Monitoring
-
  ```typescript
  const metrics = lemma.getMetrics();
- console.log(metrics);
- // {
- // hits: 150,
- // misses: 50,
- // hitRate: 0.75,
- // backendHealth: 'healthy',
- // failureCount: 0,
- // evictedCount: 23,
- // lastCleanupAt: 1234567890
- // }
-
- const health = lemma.getBackendHealth();
- console.log(health);
- // {
- // state: 'CLOSED',
- // currentBackend: 'semantic',
- // failureCount: 0,
- // totalFailures: 0
- // }
+ // { hits: 150, misses: 50, hitRate: 0.75, ... }
  ```
 
  ### 5. Dual Module Support (ESM + CJS)
-
- ```typescript
- // ESM
- import { Lemma } from '@nxuss/lemma/embed';
- import { ConsensusEngine } from '@nxuss/lemma/consensus';
- import { SpeculativeEngine } from '@nxuss/lemma/speculative';
-
- // CJS
- const { Lemma } = require('@nxuss/lemma/embed');
- const { ConsensusEngine } = require('@nxuss/lemma/consensus');
- ```
-
- **New exports:**
- - `@nxuss/lemma/consensus` - Multi-model consensus
- - `@nxuss/lemma/speculative` - Speculative execution
- - `@nxuss/lemma/security` - Security utilities
- - `@nxuss/lemma/protocol` - IAP protocol
- - `@nxuss/lemma/langchain` - LangChain SDK
- - `@nxuss/lemma/crewai` - CrewAI SDK
-
- See [MIGRATION_GUIDE.md](docs/MIGRATION_GUIDE.md) for upgrade instructions.
+ Full support for both modern and legacy Node.js projects.
 
  ---
 
  ## Install
-
  ```bash
  npm install @nxuss/lemma
  ```
 
- **Optional dependencies (install as needed):**
-
- ```bash
- # For semantic mode (lightweight embeddings)
- npm install @xenova/transformers
-
- # For persistent storage with ChromaDB
- pip install chromadb
- chroma run --path ./chroma_data --port 8000
-
- # For ChromaDB embeddings
- ollama pull nomic-embed-text
- ```
-
- **Zero dependencies required** for basic memory mode!
-
- ---
-
  ## Configuration
-
  ```bash
  # .env
  WS_PORT=8080
@@ -463,60 +356,11 @@ CHROMA_HOST=http://localhost
  CHROMA_PORT=8000
  OLLAMA_HOST=http://localhost:11434
  OLLAMA_MODEL=nomic-embed-text
- SEMANTIC_THRESHOLD=0.85 # similarity cutoff (0–1)
- ENABLE_CACHING=true
- AUTH_ENABLED=false # set true in production
+ SEMANTIC_THRESHOLD=0.85
  ```
 
- ---
-
- ## Examples & Documentation
-
- For complete examples including:
- - Single agent setup
- - Multi-agent swarms
- - Consensus voting
- - Security & authentication
- - LangChain/CrewAI integration
-
- Visit [lemma.nxus.studio/docs](https://lemma.nxus.studio/docs)
-
- ---
-
- ## Who this is for
-
- - Teams running **LangChain, CrewAI, or custom agent frameworks** who need shared memory across agents
- - Systems where **multiple agents handle overlapping queries** — support bots, research pipelines, code assistants
- - Anyone whose **LLM bill scales with agent count** rather than unique queries
-
- Lemma is designed for multi-agent systems where coordination and shared memory provide immediate value.
-
- ---
-
- ## Production deployment
-
- Lemma can be deployed to any Node.js hosting environment. For production setup guides including:
- - Docker deployment
- - API key management
- - Security configuration
- - Monitoring & observability
-
- Visit [lemma.nxus.studio/docs/deployment](https://lemma.nxus.studio/docs/deployment)
-
- ---
-
- ## Cloud hosting (coming soon)
-
- Managed Lemma instances with zero infrastructure setup. Check pricing and availability at [lemma.nxus.studio](https://lemma.nxus.studio)
-
- ---
-
  ## Contributing
-
- Contributions are welcome! For development setup and guidelines, visit [lemma.nxus.studio](https://lemma.nxus.studio)
-
- ---
+ Contributions are welcome! Visit [lemma.nxus.studio](https://lemma.nxus.studio)
 
  ## License
-
- MIT © [Nxus Studio](https://nxus.studio)
+ MIT © Nxus Studio
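The README in this release describes the shared cache as semantic rather than exact. A new prompt is served from the cache when its embedding is close enough to a stored one (`SEMANTIC_THRESHOLD=0.85` by default), and entries expire after `ttl` milliseconds. The TypeScript sketch below illustrates that lookup under a few assumptions. Embeddings arrive as plain number arrays, standing in for output from `nomic-embed-text` or transformers.js. Closeness is measured with cosine similarity. `ToySemanticCache` and `cosineSimilarity` are hypothetical names and are not part of the `@nxuss/lemma` API.

```typescript
// Hypothetical illustration only; not the @nxuss/lemma implementation.
// Callers supply embeddings as plain vectors (a real setup would get them
// from an embedding model such as nomic-embed-text).

type Entry = { embedding: number[]; response: string; expiresAt: number };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors never match
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class ToySemanticCache {
  private entries: Entry[] = [];
  private readonly threshold: number;
  private readonly ttlMs: number;

  // Defaults mirror the README: SEMANTIC_THRESHOLD=0.85 and ttl: 3600000.
  constructor(threshold = 0.85, ttlMs = 3_600_000) {
    this.threshold = threshold;
    this.ttlMs = ttlMs;
  }

  set(embedding: number[], response: string, now = Date.now()): void {
    this.entries.push({ embedding, response, expiresAt: now + this.ttlMs });
  }

  // Returns the most similar non-expired entry at or above the threshold.
  get(embedding: number[], now = Date.now()): string | undefined {
    let best: Entry | undefined;
    let bestScore = -Infinity;
    for (const entry of this.entries) {
      if (entry.expiresAt <= now) continue; // expired entries never hit
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score >= this.threshold && score > bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best?.response;
  }

  // Removes expired entries and returns how many were dropped; a timer
  // calling this every `cleanupInterval` ms would play the TTL-cleanup role.
  cleanup(now = Date.now()): number {
    const before = this.entries.length;
    this.entries = this.entries.filter((e) => e.expiresAt > now);
    return before - this.entries.length;
  }
}

const cache = new ToySemanticCache(0.85, 60_000);
cache.set([0.9, 0.1, 0.0], 'cached answer', 0);
console.log(cache.get([0.85, 0.2, 0.05], 1_000)); // cached answer
console.log(cache.get([0.0, 1.0, 0.0], 1_000)); // undefined
```

A linear scan keeps the sketch short and obviously correct. A vector store such as ChromaDB, which the README names as the hub's backend, replaces that scan with an indexed nearest-neighbour query.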
package/lemma-proxy.cjs CHANGED
@@ -1,7 +1,7 @@
  #!/usr/bin/env node
  'use strict';
  /**
- * Lemma Proxy v0.3.1 — Universal AI Cache CLI
+ * Lemma Proxy v0.3.2 — Universal AI Cache CLI
  * Commands: start, stop, stats, status, activate <key>
  */
 
@@ -79,7 +79,7 @@ async function validateKeyRemote(key) {
  const body = JSON.stringify({ key });
  const u = new URL(VALIDATE_URL);
  const req = https.request({ hostname: u.hostname, port: 443, path: u.pathname, method: 'POST',
- headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body), 'User-Agent': 'lemma-proxy/0.3.1' }
+ headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body), 'User-Agent': 'lemma-proxy/0.3.2' }
  }, res => {
  let d = '';
  res.on('data', c => d += c);
@@ -413,14 +413,14 @@ class LemmaServer {
  server.listen(this.port, () => {
  fs.writeFileSync(PID_FILE, String(process.pid));
  fs.writeFileSync(PORT_FILE, String(this.port));
- console.log(`\n🚀 Lemma Proxy v0.3.1\n📁 Project : ${this.projectName}\n🔌 Port : ${this.port}\n`);
+ console.log(`\n🚀 Lemma Proxy v0.3.2\n📁 Project : ${this.projectName}\n🔌 Port : ${this.port}\n`);
  });
  process.on('SIGTERM', () => server.close());
  }
  }
 
  // ── CLI ────────────────────────────────────────────────────────────────────────
- program.name('lemma').description('Lemma Proxy CLI').version('0.3.1');
+ program.name('lemma').description('Lemma Proxy CLI').version('0.3.2');
 
  program.command('start')
  .description('Start the proxy server')
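In `validateKeyRemote`, the request body is `JSON.stringify({ key })`, and its `Content-Length` header is computed with `Buffer.byteLength(body)` rather than `body.length`. The distinction matters because the header counts bytes, while a JavaScript string's `length` counts UTF-16 code units. The sketch below shows that request construction. It uses a placeholder validation URL, because the real `VALIDATE_URL` value is not shown in this diff. `buildValidateRequest` is a hypothetical helper that only builds the options and never sends them.

```typescript
import { Buffer } from 'node:buffer';

// Subset of the options the diff passes to https.request.
interface ValidateRequestOptions {
  hostname: string;
  port: number;
  path: string;
  method: 'POST';
  headers: Record<string, string | number>;
}

// Hypothetical helper: builds, but never sends, a request shaped like the
// one in validateKeyRemote. The version is a parameter instead of a literal.
function buildValidateRequest(
  key: string,
  validateUrl: string,
  version: string,
): { options: ValidateRequestOptions; body: string } {
  const body = JSON.stringify({ key });
  const u = new URL(validateUrl);
  const options: ValidateRequestOptions = {
    hostname: u.hostname,
    port: 443,
    path: u.pathname,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Content-Length is a byte count; body.length counts UTF-16 code units
      // and would under-report any non-ASCII character in the key.
      'Content-Length': Buffer.byteLength(body),
      'User-Agent': `lemma-proxy/${version}`,
    },
  };
  return { options, body };
}

// Placeholder URL; "\u00e9" is a two-byte character in UTF-8.
const { options, body } = buildValidateRequest('cl\u00e9', 'https://example.com/validate', '0.3.2');
console.log(body.length, options.headers['Content-Length']); // 13 14
```

Passing the version in as a parameter is one way to avoid the drift visible in this release. `package.json` moves to 0.3.3, while the banner, `User-Agent`, and `--version` strings in `lemma-proxy.cjs` say 0.3.2.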
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@nxuss/lemma",
- "version": "0.3.1",
+ "version": "0.3.3",
  "description": "Semantic cache for AI apps — stop paying for the same LLM call twice",
  "main": "./dist/cjs/index.js",
  "module": "./dist/esm/index.js",
@@ -68,6 +68,14 @@
  "types": "./dist/cjs/protocol/index.d.ts",
  "default": "./dist/cjs/protocol/index.js"
  }
+ },
+ "./langchain": {
+ "types": "./sdks/langchain/dist/index.d.ts",
+ "default": "./sdks/langchain/dist/index.js"
+ },
+ "./crewai": {
+ "types": "./sdks/crewai/dist/index.d.ts",
+ "default": "./sdks/crewai/dist/index.js"
  }
  },
  "files": [
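The `package.json` hunk adds `./langchain` and `./crewai` subpaths to the `exports` map. Each has a `types` target and a `default` target. The sketch below is a deliberately simplified model of how one such conditional entry is chosen. Condition keys are checked in object order, and the first active one wins, with `default` always matching. The sketch covers only the two new entries shown in the hunk. It ignores the rest of Node's resolution algorithm (subpath patterns, nested conditions, the full set of condition names). `resolveSubpath` is a hypothetical name.

```typescript
type ConditionalTarget = Record<string, string>;
type ExportsMap = Record<string, ConditionalTarget>;

// Partial map: only the entries added on the new side of the hunk.
const exportsMap: ExportsMap = {
  './langchain': {
    types: './sdks/langchain/dist/index.d.ts',
    default: './sdks/langchain/dist/index.js',
  },
  './crewai': {
    types: './sdks/crewai/dist/index.d.ts',
    default: './sdks/crewai/dist/index.js',
  },
};

// Simplified model: walk condition keys in object order and return the
// first one that is active; "default" always matches.
function resolveSubpath(
  map: ExportsMap,
  subpath: string,
  activeConditions: ReadonlySet<string>,
): string | undefined {
  const target = map[subpath];
  if (target === undefined) return undefined; // subpath is not exported
  for (const [condition, path] of Object.entries(target)) {
    if (condition === 'default' || activeConditions.has(condition)) return path;
  }
  return undefined;
}

console.log(resolveSubpath(exportsMap, './langchain', new Set<string>(['node', 'import'])));
// ./sdks/langchain/dist/index.js
```

Under this model, Node, which does not treat `types` as one of its conditions, falls through to the `.js` file. TypeScript checks `types` first and picks the declaration file.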