npm - @nxuss/lemma - Versions diffs - 0.3.2 → 0.3.3 - Mend

@nxuss/lemma 0.3.2 → 0.3.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (2)

package/README.md +34 -190
package/package.json +10 -2

package/README.md CHANGED

@@ -3,73 +3,62 @@
 Lemma is the open-core gateway for AI development. It sits between your tools (Cursor, VS Code, CLI Agents) and your models (OpenAI, Claude, Gemini, Ollama), providing a **shared semantic memory** that saves you 40-70% in API costs and makes your AI tools instant.
-### 🚀 Why Lemma?
+⚡ **Why Lemma?**
 - 💰 **Stop paying twice**: Lemma caches redundant queries semantically. "Fix this bug" and "Solve this error" return the same cached answer.
 - ⚡ **Instant responses**: 3ms cache hits vs 2000ms LLM calls.
 - 🤖 **Universal Gateway**: One endpoint for OpenAI, Anthropic, and Gemini.
 - 🐝 **Agent Swarms**: Orchestrate multiple agents with shared memory.
----
-## ⚡ Quick Start (IDE Proxy)
+⚡ **Quick Start (IDE Proxy)**
 Install and launch the proxy to start saving on your API bills immediately.
 ```bash
 npm install -g @nxuss/lemma
 lemma start
+# Or to launch the full development stack (Proxy + Dashboard + Hub + Chroma):
+lemma start --stack
 ```
 **Configure your IDE:**
 - **Base URL:** `http://localhost:8080/v1`
 - **Gemini Base:** `http://localhost:8080/v1beta`
-- 🆓 **Free Tier**: 300 queries/month + Exact Matching.
-- 💎 **Pro**: Unlimited queries + **Semantic Caching** ($12/mo or $120/yr).
-- ☁️ **Cloud**: Managed infrastructure (Coming Soon).
+🆓 **Free Tier:** 300 queries/month + Exact Matching.
+💎 **Pro:** Unlimited queries + Semantic Caching ($12/mo or $120/yr).
+☁️ **Cloud:** Managed infrastructure (Coming Soon).
-👉 **[Get Lemma Pro](https://lemma.nxus.studio/upgrade)**
+👉 [Get Lemma Pro](https://lemma.nxus.studio/upgrade)
 ### Option 2: Multi-Agent System
 For building coordinated AI agent systems:
 ```bash
 npm install @nxuss/lemma
 ```
-👉 **[Multi-Agent Guide](#quick-start)**
+👉 [Multi-Agent Guide](https://lemma.nxus.studio/docs/multi-agent)
 ---
 ## The problem with AI development costs
 When you use AI assistants for development, you pay for every prompt — even when asking similar questions:
-```
-"How to implement JWT in Express?"
-"Explain JWT authentication in Node.js"  ← Same answer, paid twice
-"Show me JWT example for Express"       ← Same answer, paid three times
-```
-**Lemma Proxy** intercepts these calls and returns cached responses for similar prompts in 3ms instead of 600ms, saving you money and time.
+1. *"How to implement JWT in Express?"*
+2. *"Explain JWT authentication in Node.js"*  ← **Same answer, paid twice**
+3. *"Show me JWT example for Express"*       ← **Same answer, paid three times**
----
+Lemma Proxy intercepts these calls and returns cached responses for similar prompts in **3ms instead of 600ms**, saving you money and time.
 ## The problem with multi-agent systems
 When you run multiple AI agents in parallel, they don't share context. Agent A solves a problem. Agent B gets the same problem 10 minutes later and solves it again. You pay twice, wait twice, and get the same answer.
 At scale this compounds fast:
-```
-10 agents × 500 tasks/day × 70% overlap = 3,500 redundant LLM calls/day
-```
+**10 agents × 500 tasks/day × 70% overlap = 3,500 redundant LLM calls/day**
 Lemma puts a shared semantic brain between your agents. When any agent solves something, every other agent gets that answer for free — even if they phrase the question differently.
----
 ## How it works
 ```
@@ -82,16 +71,15 @@ Agent C ──┘         │
 1. Agents connect via WebSocket and register their capabilities
 2. Every task request hits the semantic cache first
 3. On a miss, the hub routes to a capable agent and stores the result
-4. On a hit, the response returns in ~20ms — no agent invoked, no LLM called
+4. On a hit, the response returns in **~20ms** — no agent invoked, no LLM called
-The cache is semantic, not exact. "fibonacci up to n=10" and "compute fibonacci(10)" resolve to the same cached answer.
+The cache is **semantic**, not exact. *"fibonacci up to n=10"* and *"compute fibonacci(10)"* resolve to the same cached answer.
 ---
 ## Quick start
 ### 1. Install and setup dependencies
 ```bash
 npm install @nxuss/lemma
@@ -109,7 +97,6 @@ ollama pull nomic-embed-text
 ### 2. Choose your mode
 #### Option A: Semantic Mode (Recommended) ⚡
 Zero external dependencies, true semantic matching:
 ```typescript
@@ -130,7 +117,6 @@ await cachedLLM('San Francisco temperature');  // Cache HIT! ⚡
 ```
 #### Option B: Memory Mode (Fastest)
 Exact matching, zero dependencies:
 ```typescript
@@ -140,9 +126,6 @@ const lemma = await Lemma.create({
 ```
 #### Option C: Server Mode (Multi-Agent)
-#### Option C: Server Mode (Multi-Agent)
 For multi-agent orchestration:
 ```typescript
@@ -157,7 +140,6 @@ console.log('WebSocket hub listening on ws://localhost:8080');
 ```
 ### 3. Connect agents (Server Mode)
 ```typescript
 import WebSocket from 'ws';
@@ -217,7 +199,6 @@ ws.on('message', (data) => {
 ```
 ### 4. See it in action
 When multiple agents request similar tasks, you'll see the cache working:
 ```
@@ -225,15 +206,13 @@ When multiple agents request similar tasks, you'll see the cache working:
 [agent-002] ⚡ CACHE HIT - compute the 10th fibonacci... (20ms)
 [agent-003] ⚡ CACHE HIT - fibonacci sequence up to n=10... (22ms)
 ```
-**Result:** 100% cache hit rate after first computation. ~20ms responses. Zero duplicate LLM calls.
+**Result: 100% cache hit rate after first computation. ~20ms responses. Zero duplicate LLM calls.**
 ---
 ## What's inside
 ### Embedded Mode — Zero-config semantic cache
 The simplest way to add semantic caching to any project:
 ```typescript
@@ -256,11 +235,11 @@ console.log(result.fromCache); // true on cache hit
 ```
 **Features:**
-- **Semantic matching** with lightweight embeddings (transformers.js)
-- **Automatic TTL cleanup** prevents memory leaks
-- **Circuit breaker** with automatic fallbacks (Cloud → Chroma → Memory)
-- **Health monitoring** with detailed metrics
-- **Graceful shutdown** with `lemma.stop()`
+- Semantic matching with lightweight embeddings (transformers.js)
+- Automatic TTL cleanup prevents memory leaks
+- Circuit breaker with automatic fallbacks (Cloud → Chroma → Memory)
+- Health monitoring with detailed metrics
+- Graceful shutdown with `lemma.stop()`
 **Storage options:**
 - `memory`: Exact match, zero dependencies, fastest
@@ -269,7 +248,6 @@ console.log(result.fromCache); // true on cache hit
 - `cloud`: Managed cache (requires API key)
 ### SubconsciousHub — the orchestration layer
 The core of Lemma. A WebSocket server that manages agent connections, routes tasks by capability, and maintains the shared semantic cache.
 ```typescript
@@ -281,26 +259,22 @@ await hub.start();
 **What it handles:**
 - Agent registration and capability discovery
-- Semantic cache lookup before every task (ChromaDB + `nomic-embed-text` embeddings)
+- Semantic cache lookup before every task (ChromaDB + nomic-embed-text embeddings)
 - Task routing to capable agents on cache miss
 - Response storage for future cache hits
 - WebSocket heartbeat and connection lifecycle
 - Rate limiting and message sanitization
 ### Semantic cache — the shared memory
 Built on ChromaDB with Ollama embeddings. Catches paraphrases, not just exact matches.
-```
-"fibonacci up to n=10"          ──► cache hit (similarity: 0.97)
-"compute the 10th fibonacci"    ──► cache hit (similarity: 0.91)
-"fib sequence, first 10 terms"  ──► cache hit (similarity: 0.88)
-```
+- *"fibonacci up to n=10"*          ──► **cache hit** (similarity: 0.97)
+- *"compute the 10th fibonacci"*    ──► **cache hit** (similarity: 0.91)
+- *"fib sequence, first 10 terms"*  ──► **cache hit** (similarity: 0.88)
 Threshold is configurable (`SEMANTIC_THRESHOLD=0.85` by default).
 ### Consensus engine — multi-model voting
 For high-stakes decisions, route a query through multiple models and only return when they agree.
 ```typescript
@@ -318,7 +292,6 @@ const result = await consensus.requestConsensus({
 });
 // Returns only when 3 models agree ≥90%
 ```
 Supports Ollama (local), OpenAI, Anthropic, and Google models simultaneously.
 ---
@@ -326,9 +299,7 @@ Supports Ollama (local), OpenAI, Anthropic, and Google models simultaneously.
 ## New in v0.2.0 🎉
 ### 1. Semantic Memory Backend
 True semantic caching without external dependencies:
 ```typescript
 const lemma = await Lemma.create({
   storage: 'semantic',
@@ -342,120 +313,42 @@ await lemma.run('SF temperature forecast', fetchWeather); // HIT!
 ```
 ### 2. Automatic TTL Cleanup
 No more memory leaks from expired entries:
 ```typescript
 const lemma = await Lemma.create({
   ttl: 3600000,          // 1 hour expiry
   cleanupInterval: 60000, // Check every minute
 });
-// Expired entries are automatically removed
-// No manual cleanup needed!
 ```
 ### 3. Circuit Breaker & Fallbacks
 Automatic resilience when backends fail:
 ```typescript
 const lemma = await Lemma.create({
   storage: 'cloud',
-  enableFallback: true,  // Auto-fallback on failure
+  enableFallback: true,
   maxRetries: 3,
   retryDelay: 1000,
 });
-// If cloud fails → falls back to chroma
-// If chroma fails → falls back to memory
-// Automatic recovery when backend comes back
-lemma.on('backend-degraded', ({ from, to }) => {
-  console.log(`Degraded: ${from} → ${to}`);
-});
-lemma.on('backend-recovered', ({ backend }) => {
-  console.log(`Recovered: ${backend}`);
-});
 ```
 ### 4. Enhanced Metrics & Health Monitoring
 ```typescript
 const metrics = lemma.getMetrics();
-console.log(metrics);
-// {
-//   hits: 150,
-//   misses: 50,
-//   hitRate: 0.75,
-//   backendHealth: 'healthy',
-//   failureCount: 0,
-//   evictedCount: 23,
-//   lastCleanupAt: 1234567890
-// }
-const health = lemma.getBackendHealth();
-console.log(health);
-// {
-//   state: 'CLOSED',
-//   currentBackend: 'semantic',
-//   failureCount: 0,
-//   totalFailures: 0
-// }
+// { hits: 150, misses: 50, hitRate: 0.75, ... }
 ```
 ### 5. Dual Module Support (ESM + CJS)
-```typescript
-// ESM
-import { Lemma } from '@nxuss/lemma/embed';
-import { ConsensusEngine } from '@nxuss/lemma/consensus';
-import { SpeculativeEngine } from '@nxuss/lemma/speculative';
-// CJS
-const { Lemma } = require('@nxuss/lemma/embed');
-const { ConsensusEngine } = require('@nxuss/lemma/consensus');
-```
-**New exports:**
-- `@nxuss/lemma/consensus` - Multi-model consensus
-- `@nxuss/lemma/speculative` - Speculative execution
-- `@nxuss/lemma/security` - Security utilities
-- `@nxuss/lemma/protocol` - IAP protocol
-- `@nxuss/lemma/langchain` - LangChain SDK
-- `@nxuss/lemma/crewai` - CrewAI SDK
-See [MIGRATION_GUIDE.md](docs/MIGRATION_GUIDE.md) for upgrade instructions.
+Full support for both modern and legacy Node.js projects.
 ---
 ## Install
 ```bash
 npm install @nxuss/lemma
 ```
-**Optional dependencies (install as needed):**
-```bash
-# For semantic mode (lightweight embeddings)
-npm install @xenova/transformers
-# For persistent storage with ChromaDB
-pip install chromadb
-chroma run --path ./chroma_data --port 8000
-# For ChromaDB embeddings
-ollama pull nomic-embed-text
-```
-**Zero dependencies required** for basic memory mode!
----
 ## Configuration
 ```bash
 # .env
 WS_PORT=8080
@@ -463,60 +356,11 @@ CHROMA_HOST=http://localhost
 CHROMA_PORT=8000
 OLLAMA_HOST=http://localhost:11434
 OLLAMA_MODEL=nomic-embed-text
-SEMANTIC_THRESHOLD=0.85   # similarity cutoff (0–1)
-ENABLE_CACHING=true
-AUTH_ENABLED=false        # set true in production
+SEMANTIC_THRESHOLD=0.85
 ```
----
-## Examples & Documentation
-For complete examples including:
-- Single agent setup
-- Multi-agent swarms
-- Consensus voting
-- Security & authentication
-- LangChain/CrewAI integration
-Visit [lemma.nxus.studio/docs](https://lemma.nxus.studio/docs)
----
-## Who this is for
-- Teams running **LangChain, CrewAI, or custom agent frameworks** who need shared memory across agents
-- Systems where **multiple agents handle overlapping queries** — support bots, research pipelines, code assistants
-- Anyone whose **LLM bill scales with agent count** rather than unique queries
-Lemma is designed for multi-agent systems where coordination and shared memory provide immediate value.
----
-## Production deployment
-Lemma can be deployed to any Node.js hosting environment. For production setup guides including:
-- Docker deployment
-- API key management
-- Security configuration
-- Monitoring & observability
-Visit [lemma.nxus.studio/docs/deployment](https://lemma.nxus.studio/docs/deployment)
----
-## Cloud hosting (coming soon)
-Managed Lemma instances with zero infrastructure setup. Check pricing and availability at [lemma.nxus.studio](https://lemma.nxus.studio)
----
 ## Contributing
-Contributions are welcome! For development setup and guidelines, visit [lemma.nxus.studio](https://lemma.nxus.studio)
----
+Contributions are welcome! Visit [lemma.nxus.studio](https://lemma.nxus.studio)
 ## License
-MIT © [Nxus Studio](https://nxus.studio)
+MIT © Nxus Studio
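
The README text in this diff describes a cache that treats paraphrased prompts as hits once their similarity clears `SEMANTIC_THRESHOLD` (0.85 by default). The diff does not show how Lemma computes that score, so the following is only a minimal sketch of the general idea. It assumes cosine similarity over embedding vectors, and `Entry`, `cosineSimilarity`, and `lookup` are hypothetical names, not part of the `@nxuss/lemma` API.

```typescript
// Illustrative sketch only; not Lemma's implementation.
// Assumption: similarity is the cosine of the angle between embedding vectors.
type Entry = { embedding: number[]; answer: string };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as matching nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the most similar cached entry, or undefined when the best
// score falls below the threshold (a cache miss).
function lookup(
  query: number[],
  entries: Entry[],
  threshold = 0.85,
): Entry | undefined {
  let best: Entry | undefined;
  let bestScore = -Infinity;
  for (const entry of entries) {
    const score = cosineSimilarity(query, entry.embedding);
    if (score > bestScore) {
      bestScore = score;
      best = entry;
    }
  }
  return bestScore >= threshold ? best : undefined;
}
```

Under this sketch, raising the threshold trades hit rate for precision: fewer paraphrases match, but a returned answer is less likely to belong to a different question.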

package/package.json CHANGED
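
The `package.json` changes in this diff include moving the package's dependency on itself from `^0.3.2` to `^0.3.1`. Under npm's semver rules, a caret range on a `0.x` version allows only patch-level updates, so `^0.3.1` admits `>=0.3.1 <0.4.0`. The sketch below illustrates that rule for plain `major.minor.patch` strings. It is a simplification that ignores prerelease tags and build metadata, and `parse`, `compare`, `caretUpperBound`, and `satisfiesCaret` are hypothetical helpers, not npm's own implementation.

```typescript
// Simplified caret-range check; assumes plain "major.minor.patch" versions.
type Version = [number, number, number];

function parse(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`unsupported version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Exclusive upper bound of ^base: bump the left-most non-zero component.
function caretUpperBound([major, minor, patch]: Version): Version {
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

function satisfiesCaret(version: string, base: string): boolean {
  const v = parse(version);
  const b = parse(base);
  return compare(v, b) >= 0 && compare(v, caretUpperBound(b)) < 0;
}
```

By this rule the old range `^0.3.2` excludes `0.3.1`, while the new range `^0.3.1` accepts `0.3.1`, `0.3.2`, and `0.3.3`.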

@@ -1,6 +1,6 @@
 {
   "name": "@nxuss/lemma",
-  "version": "0.3.2",
+  "version": "0.3.3",
   "description": "Semantic cache for AI apps — stop paying for the same LLM call twice",
   "main": "./dist/cjs/index.js",
   "module": "./dist/esm/index.js",
@@ -68,6 +68,14 @@
         "types": "./dist/cjs/protocol/index.d.ts",
         "default": "./dist/cjs/protocol/index.js"
       }
+    },
+    "./langchain": {
+      "types": "./sdks/langchain/dist/index.d.ts",
+      "default": "./sdks/langchain/dist/index.js"
+    },
+    "./crewai": {
+      "types": "./sdks/crewai/dist/index.d.ts",
+      "default": "./sdks/crewai/dist/index.js"
     }
   },
   "files": [
@@ -151,7 +159,7 @@
   "homepage": "https://github.com/Nxusbets/lemma#readme",
   "dependencies": {
     "@chroma-core/default-embed": "^1.1.4",
-    "@nxuss/lemma": "^0.3.2",
+    "@nxuss/lemma": "^0.3.1",
     "@types/cors": "^2.8.19",
     "axios": "^1.6.0",
     "commander": "^14.0.3",