npm - @sparkleideas/embeddings - Versions diffs - 3.0.0-alpha.17 → 3.0.0-alpha.26 - Mend

@sparkleideas/embeddings 3.0.0-alpha.17 → 3.0.0-alpha.26

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

package/README.md +308 -17
package/package.json +19 -7
package/src/chunking.ts +351 -0
package/src/embedding-service.ts +477 -5
package/src/hyperbolic.ts +458 -0
package/src/index.ts +77 -0
package/src/neural-integration.ts +295 -0
package/src/normalization.ts +267 -0
package/src/persistent-cache.ts +410 -0
package/src/types.ts +61 -2
package/dist/__tests__/embedding-service.test.d.ts +0 -2
package/dist/__tests__/embedding-service.test.d.ts.map +0 -1
package/dist/__tests__/embedding-service.test.js +0 -98
package/dist/__tests__/embedding-service.test.js.map +0 -1
package/dist/embedding-service.d.ts +0 -113
package/dist/embedding-service.d.ts.map +0 -1
package/dist/embedding-service.js +0 -543
package/dist/embedding-service.js.map +0 -1
package/dist/index.d.ts +0 -15
package/dist/index.d.ts.map +0 -1
package/dist/index.js +0 -15
package/dist/index.js.map +0 -1
package/dist/types.d.ts +0 -178
package/dist/types.d.ts.map +0 -1
package/dist/types.js +0 -15
package/dist/types.js.map +0 -1
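The file list shows a new `src/chunking.ts` module (+351 lines). The README diff describes it as character, sentence, paragraph and token chunking with overlap. The following is a minimal character-window sketch that shows what "overlap" means in practice. `chunkByCharacters`, `reconstruct` and `roughTokenEstimate` are hypothetical helpers. They are not the package's `chunkText`, `reconstructFromChunks` or `estimateTokens`. The 4-characters-per-token ratio is a common heuristic assumed here.

```typescript
// Hypothetical helpers for illustration only; these are not the package's
// chunkText(), reconstructFromChunks() or estimateTokens().

interface CharChunk {
  text: string;
  start: number; // inclusive offset into the source text
  end: number; // exclusive offset into the source text
}

// Fixed-size character windows. Each window starts `overlap` characters
// before the previous one ended, so context at the boundary is repeated.
function chunkByCharacters(text: string, maxChunkSize: number, overlap: number): CharChunk[] {
  if (maxChunkSize <= 0) throw new Error("maxChunkSize must be positive");
  if (overlap < 0 || overlap >= maxChunkSize) {
    throw new Error("overlap must be in [0, maxChunkSize)");
  }
  const step = maxChunkSize - overlap; // distance between window starts
  const chunks: CharChunk[] = [];
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + maxChunkSize, text.length);
    chunks.push({ text: text.slice(start, end), start, end });
    if (end === text.length) break; // the last window reached the end
  }
  return chunks;
}

// Drop the repeated prefix of every chunk after the first.
// This is exact only for the fixed-width chunker above.
function reconstruct(chunks: CharChunk[], overlap: number): string {
  return chunks.map((c, i) => (i === 0 ? c.text : c.text.slice(overlap))).join("");
}

// Assumed heuristic: roughly 4 characters per token for English text.
function roughTokenEstimate(text: string): number {
  return Math.ceil(text.length / 4);
}

const doc = "abcdefghijklmnopqrstuvwxyz";
const chunks = chunkByCharacters(doc, 10, 3);
console.log(chunks.map((c) => c.text).join(" | ")); // abcdefghij | hijklmnopq | opqrstuvwx | vwxyz
console.log(reconstruct(chunks, 3) === doc); // true
```

Each window advances by `maxChunkSize - overlap` characters. A larger overlap therefore produces more chunks but keeps more shared context between neighbours.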

package/README.md CHANGED

@@ -4,26 +4,27 @@
 [![npm downloads](https://img.shields.io/npm/dm/@claude-flow/embeddings.svg)](https://www.npmjs.com/package/@claude-flow/embeddings)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![TypeScript](https://img.shields.io/badge/TypeScript-5.0+-blue.svg)](https://www.typescriptlang.org/)
-[![Performance](https://img.shields.io/badge/Performance-<100ms-brightgreen.svg)](https://github.com/ruvnet/claude-flow)
+[![Performance](https://img.shields.io/badge/Performance-<5ms-brightgreen.svg)](https://github.com/ruvnet/claude-flow)
-> High-performance embedding generation module for Claude Flow V3 - multi-provider support, LRU caching, batch processing, and similarity computation.
+> High-performance embedding generation module for Claude Flow V3 - multi-provider support with persistent caching, document chunking, normalization, hyperbolic embeddings, and neural substrate integration.
 ## Features
-- **Multiple Providers** - OpenAI, Transformers.js (local), and Mock for testing
-- **LRU Caching** - Intelligent caching with configurable size and hit rate tracking
+### Core Embedding
+- **Multiple Providers** - Agentic-Flow (ONNX), OpenAI, Transformers.js, and Mock
+- **Auto-Install** - Automatically installs agentic-flow when using `provider: 'auto'`
+- **Smart Fallback** - Graceful fallback chain: agentic-flow → transformers → mock
+- **LRU + Disk Caching** - In-memory LRU + SQLite persistent cache with TTL
 - **Batch Processing** - Efficient batch embedding with partial cache hits
 - **Similarity Functions** - Cosine, Euclidean, and dot product metrics
-- **Event System** - Observable embedding operations with event listeners
-- **Type-Safe** - Full TypeScript support with comprehensive type definitions
+- **75x Faster** - Agentic-flow ONNX is 75x faster than Transformers.js
-## Performance Targets
-| Operation | API Provider | Local Provider |
-|-----------|--------------|----------------|
-| Single embedding | <100ms | <50ms |
-| Batch (10 items) | <500ms | <200ms |
-| Cache hit | <1ms | <1ms |
+### Advanced Features (New in v3.0.0-alpha.11)
+- **Document Chunking** - Character, sentence, paragraph, and token-based chunking with overlap
+- **Multiple Normalization** - L2, L1, min-max, and z-score normalization
+- **Hyperbolic Embeddings** - Poincaré ball model for hierarchical representations
+- **Neural Substrate** - Semantic drift detection, memory physics, swarm coordination
+- **Persistent Cache** - SQLite-backed disk cache with LRU eviction and TTL
 ## Installation
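The feature list in this hunk names L2, L1, min-max and z-score normalization, alongside cosine and dot-product similarity. The following is a minimal sketch of those formulas. It assumes population variance for z-score and leaves zero or constant vectors unchanged or zeroed. The helper names (`l2`, `l1`, `minMax`, `zScore`, `dot`, `cosine`) are hypothetical, and the package's own exports may handle these edge cases differently.

```typescript
// Local sketches of four normalization schemes and two similarity metrics.
// Names are hypothetical; they are not the package's exports.

function l2(v: Float32Array): Float32Array {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  // Leave a zero vector unchanged instead of dividing by zero.
  return norm === 0 ? new Float32Array(v) : v.map((x) => x / norm);
}

function l1(v: Float32Array): Float32Array {
  const sum = v.reduce((s, x) => s + Math.abs(x), 0);
  return sum === 0 ? new Float32Array(v) : v.map((x) => x / sum);
}

function minMax(v: Float32Array): Float32Array {
  let min = Infinity;
  let max = -Infinity;
  for (const x of v) {
    min = Math.min(min, x);
    max = Math.max(max, x);
  }
  const range = max - min;
  // A constant vector has no spread; map it to all zeros.
  return range === 0 ? new Float32Array(v.length) : v.map((x) => (x - min) / range);
}

function zScore(v: Float32Array): Float32Array {
  const mean = v.reduce((s, x) => s + x, 0) / v.length;
  // Population variance (divide by n, not n - 1).
  const variance = v.reduce((s, x) => s + (x - mean) ** 2, 0) / v.length;
  const std = Math.sqrt(variance);
  return std === 0 ? new Float32Array(v.length) : v.map((x) => (x - mean) / std);
}

function dot(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

function cosine(a: Float32Array, b: Float32Array): number {
  const na = Math.sqrt(dot(a, a));
  const nb = Math.sqrt(dot(b, b));
  return na === 0 || nb === 0 ? 0 : dot(a, b) / (na * nb);
}

const e = new Float32Array([3, 4, 0]);
console.log(Array.from(l2(e))); // approximately [0.6, 0.8, 0]
console.log(Array.from(minMax(e)).join(", ")); // 0.75, 1, 0
```

After L2 normalization, the cosine similarity of two vectors equals their plain dot product. This is why L2 is the usual normalization before cosine-based search.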
@@ -66,19 +67,45 @@ const similarity = cosineSimilarity(
 console.log(`Similarity: ${similarity.toFixed(4)}`);
 ```
+## CLI Usage
+```bash
+# Generate embedding from CLI
+claude-flow embeddings embed "Your text here"
+# Batch embed from file
+claude-flow embeddings batch documents.txt -o embeddings.json
+# Similarity search
+claude-flow embeddings search "query" --index ./vectors
+# Initialize agentic-flow model
+claude-flow embeddings init --provider agentic-flow
+```
 ## API Reference
 ### Factory Functions
 ```typescript
-import { createEmbeddingService, getEmbedding } from '@claude-flow/embeddings';
+import {
+  createEmbeddingService,
+  createEmbeddingServiceAsync,
+  getEmbedding
+} from '@claude-flow/embeddings';
-// Create a service instance
+// Sync: Create with known provider
 const service = createEmbeddingService({
   provider: 'openai',
   apiKey: 'your-api-key',
   model: 'text-embedding-3-small',
-  cacheSize: 1000,
+});
+// Async: Auto-select best provider with fallback
+const autoService = await createEmbeddingServiceAsync({
+  provider: 'auto',       // agentic-flow → transformers → mock
+  autoInstall: true,      // Install agentic-flow if missing
+  fallback: 'transformers', // Custom fallback
 });
 // Quick one-off embedding
@@ -108,6 +135,22 @@ const result = await service.embed('Your text here');
 console.log('Tokens used:', result.usage?.totalTokens);
 ```
+### Agentic-Flow Provider (Fastest)
+```typescript
+import { AgenticFlowEmbeddingService } from '@claude-flow/embeddings';
+const service = new AgenticFlowEmbeddingService({
+  provider: 'agentic-flow',
+  modelId: 'default',     // Uses optimized ONNX model
+  cacheSize: 256,
+});
+// 75x faster than Transformers.js (3ms vs 233ms)
+const result = await service.embed('Your text here');
+console.log(`ONNX embedding in ${result.latencyMs}ms`);
+```
 ### Transformers.js Provider (Local)
 ```typescript
@@ -243,10 +286,17 @@ service.removeEventListener(listener);
 | Provider | Latency | Quality | Cost | Offline |
 |----------|---------|---------|------|---------|
+| **Agentic-Flow** | ~3ms | Good | Free | Yes |
 | **OpenAI** | ~50-100ms | Excellent | $0.02-0.13/1M tokens | No |
-| **Transformers.js** | ~20-50ms | Good | Free | Yes |
+| **Transformers.js** | ~230ms | Good | Free | Yes |
 | **Mock** | <1ms | N/A | Free | Yes |
+### Agentic-Flow (Recommended)
+| Model | Dimensions | Speed | Best For |
+|-------|------------|-------|----------|
+| `default` | 384 | 3ms | General purpose, fastest |
 ### OpenAI Models
 | Model | Dimensions | Max Tokens | Best For |
@@ -272,7 +322,9 @@ import type {
   EmbeddingConfig,
   OpenAIEmbeddingConfig,
   TransformersEmbeddingConfig,
+  AgenticFlowEmbeddingConfig,
   MockEmbeddingConfig,
+  AutoEmbeddingConfig,
   // Result types
   EmbeddingResult,
@@ -349,6 +401,245 @@ const queryResult = await embeddings.embed('Search query');
 const results = await index.search(new Float32Array(queryResult.embedding), 5);
 ```
+## Document Chunking
+Split long documents into overlapping chunks for embedding:
+```typescript
+import { chunkText, estimateTokens, reconstructFromChunks } from '@claude-flow/embeddings';
+// Chunk by sentence (default)
+const result = chunkText(longDocument, {
+  maxChunkSize: 512,
+  overlap: 50,
+  strategy: 'sentence',  // 'character' | 'sentence' | 'paragraph' | 'token'
+  minChunkSize: 100,
+});
+console.log('Chunks:', result.totalChunks);
+result.chunks.forEach((chunk, i) => {
+  console.log(`Chunk ${i}: ${chunk.length} chars, ~${chunk.tokenCount} tokens`);
+});
+// Estimate tokens
+const tokens = estimateTokens('Hello world');  // ~3 tokens
+// Reconstruct (approximate)
+const reconstructed = reconstructFromChunks(result.chunks);
+```
+## Normalization
+Normalize embeddings for consistent similarity computation:
+```typescript
+import {
+  l2Normalize,    // Unit vector (Euclidean norm = 1)
+  l1Normalize,    // Manhattan norm = 1
+  minMaxNormalize, // Values in [0, 1]
+  zScoreNormalize, // Mean 0, std 1
+  normalize,       // Generic with type option
+  l2Norm,
+  isNormalized,
+} from '@claude-flow/embeddings';
+const embedding = new Float32Array([3, 4, 0]);
+// L2 normalize (most common for cosine similarity)
+const l2 = l2Normalize(embedding);  // [0.6, 0.8, 0]
+console.log('L2 norm:', l2Norm(l2));  // 1.0
+// Check if already normalized
+console.log(isNormalized(l2));  // true
+console.log(isNormalized(embedding));  // false
+// Generic normalize with type
+const normalized = normalize(embedding, { type: 'l2' });
+```
+## Hyperbolic Embeddings (Poincaré Ball)
+Transform embeddings to hyperbolic space for better hierarchical representation:
+```typescript
+import {
+  euclideanToPoincare,
+  poincareToEuclidean,
+  hyperbolicDistance,
+  mobiusAdd,
+  isInPoincareBall,
+  batchEuclideanToPoincare,
+  hyperbolicCentroid,
+} from '@claude-flow/embeddings';
+// Convert Euclidean embedding to Poincaré ball
+const euclidean = new Float32Array([0.5, 0.3, 0.2]);
+const poincare = euclideanToPoincare(euclidean);
+// Check if point is in the ball
+console.log(isInPoincareBall(poincare));  // true
+// Round-trip conversion
+const back = poincareToEuclidean(poincare);
+// Hyperbolic distance (geodesic in Poincaré ball)
+const a = euclideanToPoincare(new Float32Array([0.1, 0.2, 0.1]));
+const b = euclideanToPoincare(new Float32Array([0.3, 0.1, 0.2]));
+const dist = hyperbolicDistance(a, b);
+// Möbius addition (hyperbolic "plus")
+const sum = mobiusAdd(a, b);
+// Batch conversion
+const embeddings = [vec1, vec2, vec3];
+const hyperbolic = batchEuclideanToPoincare(embeddings);
+// Hyperbolic centroid (Fréchet mean)
+const centroid = hyperbolicCentroid(hyperbolic);
+```
+### Why Hyperbolic?
+Hyperbolic space has natural properties for representing hierarchical data:
+- **Exponential growth** - Tree-like structures fit naturally
+- **Better hierarchy** - Parent-child relationships preserved
+- **Lower distortion** - Taxonomies represented with less error
+## Neural Substrate Integration
+Access agentic-flow's neural features for advanced embedding operations:
+```typescript
+import {
+  NeuralEmbeddingService,
+  createNeuralService,
+  isNeuralAvailable,
+  listEmbeddingModels,
+  downloadEmbeddingModel,
+} from '@claude-flow/embeddings';
+// Check if neural features are available
+const available = await isNeuralAvailable();
+// Create neural service
+const neural = createNeuralService({ dimension: 384 });
+await neural.init();
+if (neural.isAvailable()) {
+  // Semantic drift detection
+  await neural.setDriftBaseline('Initial context about the topic');
+  const drift = await neural.detectDrift('New input to check for drift');
+  console.log('Drift:', drift?.trend);  // 'stable' | 'drifting' | 'accelerating'
+  // Memory with interference detection
+  const stored = await neural.storeMemory('mem-1', 'Important information');
+  console.log('Interference:', stored?.interference);
+  // Recall by similarity
+  const memories = await neural.recallMemories('query', 5);
+  // Swarm coordination
+  await neural.addSwarmAgent('agent-1', 'researcher');
+  const coordination = await neural.coordinateSwarm('Analyze this task');
+  // Coherence checking
+  await neural.calibrateCoherence(['good output 1', 'good output 2']);
+  const coherence = await neural.checkCoherence('Output to check');
+  // Health status
+  const health = neural.health();
+  console.log('Memory count:', health?.memoryCount);
+}
+// List available ONNX models
+const models = await listEmbeddingModels();
+console.log(models);
+// [{ id: 'all-MiniLM-L6-v2', dimension: 384, size: '23MB', ... }]
+// Download model
+const path = await downloadEmbeddingModel('all-MiniLM-L6-v2', '.models');
+```
+## Persistent Disk Cache
+SQLite-backed persistent cache for embeddings:
+```typescript
+import { PersistentEmbeddingCache, isPersistentCacheAvailable } from '@claude-flow/embeddings';
+// Check if SQLite is available
+const hasSQLite = await isPersistentCacheAvailable();
+// Create persistent cache
+const cache = new PersistentEmbeddingCache({
+  dbPath: './embeddings.db',  // SQLite database path
+  maxSize: 10000,             // Max entries before LRU eviction
+  ttlMs: 7 * 24 * 60 * 60 * 1000,  // 7 day TTL
+});
+// Initialize
+await cache.init();
+// Store embedding
+await cache.set('my text', new Float32Array([0.1, 0.2, 0.3]));
+// Retrieve
+const embedding = await cache.get('my text');
+// Get stats
+const stats = await cache.getStats();
+console.log('Cache stats:', {
+  size: stats.totalEntries,
+  hitRate: stats.hitRate,
+  avgLatency: stats.avgLatencyMs,
+});
+// Close when done
+await cache.close();
+```
+### Enable in Embedding Service
+```typescript
+const service = createEmbeddingService({
+  provider: 'openai',
+  apiKey: process.env.OPENAI_API_KEY!,
+  persistentCache: {
+    enabled: true,
+    dbPath: './cache/embeddings.db',
+    maxSize: 50000,
+    ttlMs: 30 * 24 * 60 * 60 * 1000,  // 30 days
+  },
+  normalization: 'l2',  // Auto-normalize embeddings
+});
+```
+## CLI Commands (New)
+```bash
+# Document chunking
+claude-flow embeddings chunk document.txt --strategy sentence --max-size 512
+# Normalize embedding file
+claude-flow embeddings normalize embeddings.json --type l2 -o normalized.json
+# Convert to hyperbolic
+claude-flow embeddings hyperbolic embeddings.json -o poincare.json
+# Neural operations
+claude-flow embeddings neural drift --baseline "context" --input "check this"
+claude-flow embeddings neural store --id mem-1 --content "data"
+claude-flow embeddings neural recall "query" --top-k 5
+# List/download models
+claude-flow embeddings models list
+claude-flow embeddings models download all-MiniLM-L6-v2
+# Cache management
+claude-flow embeddings cache stats
+claude-flow embeddings cache clear --older-than 7d
+```
 ## Related Packages
 - [@claude-flow/memory](../memory) - HNSW indexing and vector storage
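The hyperbolic section of this README diff lists conversion into the Poincaré ball, geodesic distance and Möbius addition. The following sketch uses the standard closed forms for curvature -1. It assumes that `euclideanToPoincare` behaves like the exponential map at the origin. The package may instead use a different mapping, a configurable curvature, or `Float32Array` throughout. `toPoincare`, `fromPoincare`, `poincareDistance` and `mobius` are hypothetical names.

```typescript
// Poincaré-ball operations for curvature -1, using plain number[] vectors.
// Assumption: conversion is the exponential map at the origin (and its
// inverse, the log map); the package may implement this differently.

type Vec = number[];
const EPS = 1e-7; // keeps mapped points strictly inside the unit ball

const dotV = (a: Vec, b: Vec): number => a.reduce((s, x, i) => s + x * b[i], 0);
const normV = (a: Vec): number => Math.sqrt(dotV(a, a));
const scale = (a: Vec, k: number): Vec => a.map((x) => x * k);

// exp_0(v) = tanh(|v|) * v / |v|
function toPoincare(v: Vec): Vec {
  const n = normV(v);
  if (n === 0) return v.slice();
  const r = Math.min(Math.tanh(n), 1 - EPS);
  return scale(v, r / n);
}

// log_0(y) = artanh(|y|) * y / |y|, the inverse of toPoincare
function fromPoincare(y: Vec): Vec {
  const n = normV(y);
  if (n === 0) return y.slice();
  return scale(y, Math.atanh(Math.min(n, 1 - EPS)) / n);
}

function inBall(y: Vec): boolean {
  return normV(y) < 1;
}

// d(u, v) = arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))
function poincareDistance(u: Vec, v: Vec): number {
  const diff = u.map((x, i) => x - v[i]);
  const num = 2 * dotV(diff, diff);
  const den = (1 - dotV(u, u)) * (1 - dotV(v, v));
  return Math.acosh(1 + num / den);
}

// Möbius addition:
// u (+) v = ((1 + 2<u,v> + |v|^2) u + (1 - |u|^2) v) / (1 + 2<u,v> + |u|^2 |v|^2)
function mobius(u: Vec, v: Vec): Vec {
  const uv = dotV(u, v);
  const uu = dotV(u, u);
  const vv = dotV(v, v);
  const den = 1 + 2 * uv + uu * vv;
  return u.map((x, i) => ((1 + 2 * uv + vv) * x + (1 - uu) * v[i]) / den);
}

const p = toPoincare([0.5, 0.3, 0.2]);
console.log(inBall(p)); // true
console.log(fromPoincare(p)); // approximately [0.5, 0.3, 0.2]
console.log(poincareDistance([0, 0, 0], toPoincare([0.5, 0, 0]))); // approximately 1
```

The distance from the origin works out to `2 * artanh(|y|)`, which grows without bound as a point approaches the boundary. That extra room near the edge is what lets tree-like hierarchies embed with low distortion.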

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "@sparkleideas/embeddings",
-  "version": "3.0.0-alpha.17",
-  "description": "V3 Embedding Service - OpenAI, Transformers.js, and Mock providers",
+  "version": "3.0.0-alpha.26",
+  "description": "V3 Embedding Service - OpenAI, Transformers.js, Agentic-Flow (ONNX), Mock providers with hyperbolic embeddings, normalization, and chunking",
   "type": "module",
   "main": "./dist/index.js",
   "types": "./dist/index.d.ts",
@@ -29,20 +29,32 @@
     "vector",
     "similarity",
     "claude-flow",
-    "v3"
+    "v3",
+    "hyperbolic",
+    "poincare",
+    "normalization",
+    "chunking",
+    "neural-substrate"
   ],
   "author": "Claude Flow Team",
   "license": "MIT",
   "dependencies": {
-    "@xenova/transformers": "^2.17.0"
+    "@xenova/transformers": "^2.17.0",
+    "sql.js": "^1.13.0"
   },
   "devDependencies": {
+    "@types/node": "^20.10.0",
     "typescript": "^5.3.0",
-    "vitest": "^1.0.0",
-    "@types/node": "^20.10.0"
+    "vitest": "^4.0.16"
   },
   "peerDependencies": {
-    "@sparkleideas/shared": "*"
+    "@sparkleideas/shared": "*",
+    "@sparkleideas/agentic-flow": "*"
+  },
+  "peerDependenciesMeta": {
+    "agentic-flow": {
+      "optional": true
+    }
   },
   "engines": {
     "node": ">=20.0.0"