npm - @dooor-ai/cortexdb - Versions diffs - 0.1.1 → 0.1.2 - Mend

@dooor-ai/cortexdb 0.1.1 → 0.1.2

Files changed (2)

package/README.md +137 -60
package/package.json +1 -1

package/README.md CHANGED

@@ -1,15 +1,29 @@
 # CortexDB TypeScript SDK
-Official TypeScript/JavaScript client for CortexDB - Multi-modal RAG Platform.
+Official TypeScript/JavaScript client for CortexDB.
+## What is CortexDB?
+CortexDB is a multi-modal RAG (Retrieval Augmented Generation) platform that combines traditional database capabilities with vector search and advanced document processing. It enables you to:
+- Store structured and unstructured data in a unified database
+- Automatically extract text from documents (PDF, DOCX, XLSX) using Docling
+- Generate embeddings for semantic search using various providers (OpenAI, Gemini, etc.)
+- Perform hybrid search combining filters with vector similarity
+- Build RAG applications with automatic chunking and vectorization
+CortexDB handles the complex infrastructure of vector databases (Qdrant), object storage (MinIO), and traditional databases (PostgreSQL) behind a simple API.
 ## Features
-- Full TypeScript support with type definitions
-- Async/await API using native fetch
-- Semantic search with vector embeddings
-- Collection and record management
-- Custom error types for better debugging
-- Works with Node.js 18+, Deno, and modern browsers
+- **Multi-modal document processing**: Upload PDFs, DOCX, XLSX files and automatically extract text with OCR fallback
+- **Semantic search**: Vector-based search using embeddings from OpenAI, Gemini, or custom providers
+- **Automatic chunking**: Smart text splitting optimized for RAG applications
+- **Flexible schema**: Define collections with typed fields (string, number, boolean, file, array)
+- **Hybrid queries**: Combine exact filters with semantic search
+- **Storage control**: Choose where each field is stored (PostgreSQL, Qdrant, MinIO)
+- **Type-safe**: Full TypeScript support with comprehensive type definitions
+- **Modern API**: Async/await using native fetch (Node.js 18+)
 ## Installation
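Note on the "Automatic chunking" feature added in the hunk above: the diff does not say how CortexDB splits text, so the following is only a minimal sketch of one common approach, fixed-size chunks with a character overlap. The function name `chunkText` and its parameters are hypothetical and are not part of `@dooor-ai/cortexdb`.

```typescript
// Hypothetical helper for illustration; NOT part of the CortexDB SDK.
// Splits text into chunks of at most `chunkSize` characters, where each
// chunk repeats the last `overlap` characters of the previous one.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) {
    throw new RangeError('chunkSize must be positive');
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError('overlap must satisfy 0 <= overlap < chunkSize');
  }

  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text, so no trailing
    // chunk made purely of overlap is emitted.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const exampleChunks = chunkText('abcdefghij', 4, 1);
console.log(exampleChunks); // [ 'abcd', 'defg', 'ghij' ]
```

The overlap keeps a sentence that straddles a chunk boundary at least partly visible in both neighbouring chunks, which is the usual motivation for it in RAG pipelines. Real splitters typically cut on sentence or token boundaries rather than raw characters.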
@@ -39,28 +53,34 @@ async function main() {
     baseUrl: 'http://localhost:8000'
   });
-  // Create a collection
-  await client.collections.create('documents', [
-    { name: 'title', type: FieldType.STRING },
-    { name: 'content', type: FieldType.STRING, vectorize: true }
-  ]);
+  // Create a collection with vectorization enabled
+  await client.collections.create(
+    'documents',
+    [
+      { name: 'title', type: FieldType.STRING },
+      { name: 'content', type: FieldType.STRING, vectorize: true }
+    ],
+    'your-embedding-provider-id'  // Required when vectorize=true
+  );
   // Create a record
   const record = await client.records.create('documents', {
-    title: 'Hello World',
-    content: 'This is my first document'
+    title: 'Introduction to AI',
+    content: 'Artificial intelligence is transforming how we build software...'
   });
-  // Semantic search
+  // Semantic search - finds relevant content by meaning, not just keywords
   const results = await client.records.search(
     'documents',
-    'hello world',
+    'How is AI changing software development?',
     undefined,
     10
   );
   results.results.forEach(result => {
-    console.log(`Score: ${result.score.toFixed(4)} - ${result.record.data.title}`);
+    console.log(`Score: ${result.score.toFixed(4)}`);
+    console.log(`Title: ${result.record.data.title}`);
+    console.log(`Content: ${result.record.data.content}\n`);
   });
   await client.close();
@@ -81,7 +101,7 @@ const client = new CortexClient({
   baseUrl: 'http://localhost:8000'
 });
-// With API key
+// Production with API key
 const client = new CortexClient({
   baseUrl: 'https://api.cortexdb.com',
   apiKey: 'your-api-key'
@@ -96,44 +116,52 @@ const client = new CortexClient({
 ### Collections
+Collections define the schema for your data. Each collection can have multiple fields with different types and storage options.
 ```typescript
 import { FieldType, StoreLocation } from '@dooor-ai/cortexdb';
-// Create collection
-const collection = await client.collections.create('articles', [
-  {
-    name: 'title',
-    type: FieldType.STRING
-  },
-  {
-    name: 'content',
-    type: FieldType.STRING,
-    vectorize: true  // Enable semantic search
-  },
-  {
-    name: 'year',
-    type: FieldType.NUMBER,
-    store_in: [StoreLocation.POSTGRES, StoreLocation.QDRANT_PAYLOAD]
-  }
-]);
+// Create collection with vectorization
+const collection = await client.collections.create(
+  'articles',
+  [
+    {
+      name: 'title',
+      type: FieldType.STRING
+    },
+    {
+      name: 'content',
+      type: FieldType.STRING,
+      vectorize: true  // Enable semantic search on this field
+    },
+    {
+      name: 'year',
+      type: FieldType.NUMBER,
+      store_in: [StoreLocation.POSTGRES, StoreLocation.QDRANT_PAYLOAD]
+    }
+  ],
+  'embedding-provider-id'  // Required when any field has vectorize=true
+);
 // List collections
 const collections = await client.collections.list();
-// Get collection
+// Get collection schema
 const schema = await client.collections.get('articles');
-// Delete collection
+// Delete collection and all its records
 await client.collections.delete('articles');
 ```
 ### Records
+Records are the actual data stored in collections. They must match the collection schema.
 ```typescript
 // Create record
 const record = await client.records.create('articles', {
   title: 'Machine Learning Basics',
-  content: 'Introduction to ML concepts...',
+  content: 'Machine learning is a subset of AI that focuses on learning from data...',
   year: 2024
 });
@@ -148,7 +176,7 @@ const updated = await client.records.update('articles', record.id, {
 // Delete record
 await client.records.delete('articles', record.id);
-// List records
+// List records with pagination
 const results = await client.records.list('articles', {
   limit: 10,
   offset: 0
@@ -157,8 +185,10 @@ const results = await client.records.list('articles', {
 ### Semantic Search
+Semantic search finds records by meaning, not just exact keyword matches. It uses vector embeddings to understand context.
 ```typescript
-// Basic search
+// Basic semantic search
 const results = await client.records.search(
   'articles',
   'machine learning fundamentals',
@@ -166,7 +196,7 @@ const results = await client.records.search(
   10
 );
-// Search with filters
+// Search with filters - combine semantic search with exact matches
 const filteredResults = await client.records.search(
   'articles',
   'neural networks',
@@ -177,30 +207,55 @@ const filteredResults = await client.records.search(
   5
 );
-// Process results
+// Process results - ordered by relevance score
 filteredResults.results.forEach(result => {
-  console.log(`Score: ${result.score.toFixed(4)}`);
+  console.log(`Score: ${result.score.toFixed(4)}`);  // Higher = more relevant
   console.log(`Title: ${result.record.data.title}`);
   console.log(`Year: ${result.record.data.year}`);
 });
 ```
+### Working with Files
+CortexDB can process documents and automatically extract text for vectorization.
+```typescript
+// Create collection with file field
+await client.collections.create(
+  'documents',
+  [
+    { name: 'title', type: FieldType.STRING },
+    {
+      name: 'document',
+      type: FieldType.FILE,
+      vectorize: true  // Extract text and create embeddings
+    }
+  ],
+  'embedding-provider-id'
+);
+// Note: File upload support is currently available in the REST API
+// TypeScript SDK file upload will be added in a future version
+```
 ### Filter Operators
 ```typescript
-// Exact match
+// Exact match filters
 const results = await client.records.list('articles', {
   filters: {
     category: 'technology',
-    published: true
+    published: true,
+    year: 2024
   }
 });
-// Combine filters
+// Combine multiple filters
 const filtered = await client.records.list('articles', {
   filters: {
     year: 2024,
-    category: 'AI'
+    category: 'AI',
+    author: 'John Doe'
   },
   limit: 20
 });
@@ -208,12 +263,15 @@ const filtered = await client.records.list('articles', {
 ## Error Handling
+The SDK provides specific error types for different failure scenarios.
 ```typescript
 import {
   CortexDBError,
   CortexDBNotFoundError,
   CortexDBValidationError,
-  CortexDBConnectionError
+  CortexDBConnectionError,
+  CortexDBTimeoutError
 } from '@dooor-ai/cortexdb';
 try {
@@ -222,9 +280,11 @@ try {
   if (error instanceof CortexDBNotFoundError) {
     console.log('Record not found');
   } else if (error instanceof CortexDBValidationError) {
-    console.log('Validation error:', error.message);
+    console.log('Invalid data:', error.message);
   } else if (error instanceof CortexDBConnectionError) {
     console.log('Connection failed:', error.message);
+  } else if (error instanceof CortexDBTimeoutError) {
+    console.log('Request timed out:', error.message);
   } else if (error instanceof CortexDBError) {
     console.log('General error:', error.message);
   }
@@ -233,11 +293,11 @@ try {
 ## Examples
-Check the [`examples/`](./examples) directory for more usage examples:
+Check the [`examples/`](./examples) directory for complete working examples:
-- [`quickstart.ts`](./examples/quickstart.ts) - Walkthrough of SDK features
-- [`search.ts`](./examples/search.ts) - Semantic search with filters
-- [`basic.ts`](./examples/basic.ts) - Basic operations
+- [`quickstart.ts`](./examples/quickstart.ts) - Complete walkthrough of SDK features
+- [`search.ts`](./examples/search.ts) - Semantic search with filters and providers
+- [`basic.ts`](./examples/basic.ts) - Basic CRUD operations
 Run examples:
@@ -264,27 +324,44 @@ npm run build
 ### Scripts
 ```bash
-# Build
+# Build TypeScript
 npm run build
-# Watch mode
+# Build in watch mode
 npm run build:watch
-# Clean
+# Clean build artifacts
 npm run clean
-# Lint
+# Lint code
 npm run lint
-# Format
+# Format code
 npm run format
 ```
 ## Requirements
-- Node.js >= 18.0.0
-- CortexDB gateway running (local or remote)
+- Node.js >= 18.0.0 (for native fetch support)
+- CortexDB gateway running locally or remotely
+- Embedding provider configured (OpenAI, Gemini, etc.) if using vectorization
+## Architecture
+CortexDB integrates multiple technologies:
+- **PostgreSQL**: Stores structured data and metadata
+- **Qdrant**: Vector database for semantic search
+- **MinIO**: Object storage for files
+- **Docling**: Advanced document processing and text extraction
+The SDK abstracts this complexity into a simple, unified API.
 ## License
 MIT License - see [LICENSE](./LICENSE) for details.
+## Related
+- [CortexDB Python SDK](../python) - Python client for CortexDB
+- [CortexDB Documentation](../../docs) - Complete platform documentation
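
Note on the hybrid queries described in the README changes above ("combine exact filters with semantic search", results "ordered by relevance score"): the diff does not show how the server ranks results. As a rough in-memory sketch, assuming cosine similarity as the score and exact-equality filters, the idea looks like this. The `Doc` type and both functions are hypothetical and do not reflect the CortexDB or Qdrant implementation.

```typescript
// Hypothetical types and helpers for illustration only.
interface Doc {
  id: string;
  data: Record<string, string | number | boolean>;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep only docs matching every filter exactly, then rank by similarity
// to the query embedding (highest score first) and return the top `limit`.
function hybridSearch(
  docs: Doc[],
  queryEmbedding: number[],
  filters: Record<string, string | number | boolean>,
  limit: number
): { doc: Doc; score: number }[] {
  return docs
    .filter(doc =>
      Object.entries(filters).every(([key, value]) => doc.data[key] === value)
    )
    .map(doc => ({ doc, score: cosineSimilarity(doc.embedding, queryEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Toy data with made-up 2-dimensional embeddings.
const sampleDocs: Doc[] = [
  { id: 'a', data: { year: 2024 }, embedding: [1, 0] },
  { id: 'b', data: { year: 2023 }, embedding: [1, 0] },
  { id: 'c', data: { year: 2024 }, embedding: [0, 1] }
];

const hits = hybridSearch(sampleDocs, [1, 0], { year: 2024 }, 5);
hits.forEach(hit => console.log(hit.doc.id, hit.score.toFixed(4)));
```

Document `b` is the closest match by embedding but is dropped by the `year` filter, which is the behaviour the "Search with filters" example in the README relies on. A production vector database applies the filter inside the index rather than scanning every record.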

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@dooor-ai/cortexdb",
-  "version": "0.1.1",
+  "version": "0.1.2",
   "description": "Official TypeScript/JavaScript SDK for CortexDB - Multi-modal RAG Platform with advanced document processing",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",
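
Note on the `package.json` hunk above: the only change is the `version` field, where the third (patch) component goes from 1 to 2. A small sketch of how a review script might classify such a change is shown below. `parseVersion` and `classifyBump` are hypothetical helpers that handle only plain `MAJOR.MINOR.PATCH` strings, with no pre-release or build metadata.

```typescript
type Bump = 'major' | 'minor' | 'patch' | 'none';

// Parse a plain MAJOR.MINOR.PATCH string into its three numeric components.
function parseVersion(version: string): [number, number, number] {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Report the most significant component that differs between two versions.
// This does not distinguish an upgrade from a downgrade.
function classifyBump(from: string, to: string): Bump {
  const [fromMajor, fromMinor, fromPatch] = parseVersion(from);
  const [toMajor, toMinor, toPatch] = parseVersion(to);
  if (fromMajor !== toMajor) return 'major';
  if (fromMinor !== toMinor) return 'minor';
  if (fromPatch !== toPatch) return 'patch';
  return 'none';
}

const bump = classifyBump('0.1.1', '0.1.2');
console.log(bump); // patch
```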