npm - @dooor-ai/cortexdb - Versions diffs - 0.1.0 → 0.1.2 - Mend

@dooor-ai/cortexdb 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (2)

package/README.md +238 -173
package/package.json +1 -1

package/README.md CHANGED

@@ -1,34 +1,47 @@
-# CortexDB TypeScript/JavaScript SDK
+# CortexDB TypeScript SDK
-Official TypeScript/JavaScript client for **CortexDB** - A powerful multi-modal RAG (Retrieval Augmented Generation) platform with advanced document processing capabilities.
+Official TypeScript/JavaScript client for CortexDB.
-[![npm version](https://badge.fury.io/js/%40dooor-ai%2Fcortexdb.svg)](https://badge.fury.io/js/%40dooor-ai%2Fcortexdb)
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+## What is CortexDB?
+CortexDB is a multi-modal RAG (Retrieval Augmented Generation) platform that combines traditional database capabilities with vector search and advanced document processing. It enables you to:
+- Store structured and unstructured data in a unified database
+- Automatically extract text from documents (PDF, DOCX, XLSX) using Docling
+- Generate embeddings for semantic search using various providers (OpenAI, Gemini, etc.)
+- Perform hybrid search combining filters with vector similarity
+- Build RAG applications with automatic chunking and vectorization
+CortexDB handles the complex infrastructure of vector databases (Qdrant), object storage (MinIO), and traditional databases (PostgreSQL) behind a simple API.
 ## Features
-- 🚀 **Simple & Intuitive API** - Easy to use, TypeScript-first design
-- 🔍 **Semantic Search** - Vector-based semantic search with embeddings
-- 📄 **Document Processing** - Advanced PDF, DOCX, XLSX processing with Docling
-- 🎯 **Type Safe** - Full TypeScript support with type definitions
-- ⚡ **Async/Await** - Modern async/await API
-- 🛡️ **Error Handling** - Comprehensive error types for better debugging
-- 🔌 **Flexible** - Works with Node.js, Deno, and modern browsers
+- **Multi-modal document processing**: Upload PDFs, DOCX, XLSX files and automatically extract text with OCR fallback
+- **Semantic search**: Vector-based search using embeddings from OpenAI, Gemini, or custom providers
+- **Automatic chunking**: Smart text splitting optimized for RAG applications
+- **Flexible schema**: Define collections with typed fields (string, number, boolean, file, array)
+- **Hybrid queries**: Combine exact filters with semantic search
+- **Storage control**: Choose where each field is stored (PostgreSQL, Qdrant, MinIO)
+- **Type-safe**: Full TypeScript support with comprehensive type definitions
+- **Modern API**: Async/await using native fetch (Node.js 18+)
 ## Installation
 ```bash
 npm install @dooor-ai/cortexdb
-# or
+```
+Or with yarn:
+```bash
 yarn add @dooor-ai/cortexdb
-# or
-pnpm add @dooor-ai/cortexdb
 ```
-## Requirements
+Or with pnpm:
-- Node.js >= 18.0.0 (uses native `fetch`)
-- CortexDB gateway running (default: `http://localhost:8000`)
+```bash
+pnpm add @dooor-ai/cortexdb
+```
 ## Quick Start
@@ -36,267 +49,319 @@ pnpm add @dooor-ai/cortexdb
 import { CortexClient, FieldType } from '@dooor-ai/cortexdb';
 async function main() {
-  // Initialize client
   const client = new CortexClient({
-    baseUrl: 'http://localhost:8000',
-    // apiKey: 'YOUR_API_KEY', // Optional: if authentication is enabled
+    baseUrl: 'http://localhost:8000'
   });
-  // Check health
-  const isHealthy = await client.healthcheck();
-  console.log('CortexDB:', isHealthy ? 'Connected ✓' : 'Disconnected ✗');
-  // Create a collection
-  await client.collections.create('my_docs', [
-    { name: 'title', type: FieldType.STRING },
-    { name: 'content', type: FieldType.STRING },
-  ]);
-  // Create a record
-  const record = await client.records.create('my_docs', {
-    title: 'Hello World',
-    content: 'This is my first document',
-  });
-  console.log('Created:', record.id);
-  // Get the record
-  const fetched = await client.records.get('my_docs', record.id);
-  console.log('Fetched:', fetched.data);
-  // List all records
-  const results = await client.records.list('my_docs');
-  console.log('Total:', results.total);
-  // Clean up
-  await client.collections.delete('my_docs');
-  await client.close();
-}
-main();
-```
-## Semantic Search Example
-Enable semantic search by creating a collection with `vectorize: true` and an embedding provider:
-```typescript
-import { CortexClient, FieldType } from '@dooor-ai/cortexdb';
-async function semanticSearch() {
-  const client = new CortexClient({ baseUrl: 'http://localhost:8000' });
-  // Create collection with vectorization
+  // Create a collection with vectorization enabled
   await client.collections.create(
-    'knowledge_base',
+    'documents',
     [
       { name: 'title', type: FieldType.STRING },
-      { name: 'content', type: FieldType.STRING, vectorize: true },
+      { name: 'content', type: FieldType.STRING, vectorize: true }
     ],
-    'your-embedding-provider-id' // Required for vectorization
+    'your-embedding-provider-id'  // Required when vectorize=true
   );
-  // Add documents
-  await client.records.create('knowledge_base', {
-    title: 'Machine Learning',
-    content: 'ML is a branch of AI that focuses on learning from data.',
-  });
-  await client.records.create('knowledge_base', {
-    title: 'Deep Learning',
-    content: 'Deep learning uses neural networks with multiple layers.',
+  // Create a record
+  const record = await client.records.create('documents', {
+    title: 'Introduction to AI',
+    content: 'Artificial intelligence is transforming how we build software...'
   });
-  // Perform semantic search
+  // Semantic search - finds relevant content by meaning, not just keywords
   const results = await client.records.search(
-    'knowledge_base',
-    'What is artificial intelligence?',
-    undefined, // filters
-    5 // limit
+    'documents',
+    'How is AI changing software development?',
+    undefined,
+    10
   );
-  results.results.forEach((result) => {
-    console.log(`${result.record.data.title} (score: ${result.score})`);
+  results.results.forEach(result => {
+    console.log(`Score: ${result.score.toFixed(4)}`);
+    console.log(`Title: ${result.record.data.title}`);
+    console.log(`Content: ${result.record.data.content}\n`);
   });
   await client.close();
 }
+main();
 ```
-## API Reference
+## Usage
-### CortexClient
+### Initialize Client
 ```typescript
+import { CortexClient } from '@dooor-ai/cortexdb';
+// Local development
 const client = new CortexClient({
-  baseUrl?: string;        // Default: 'http://localhost:8000'
-  apiKey?: string;         // Optional API key
-  timeout?: number;        // Request timeout in ms (default: 30000)
+  baseUrl: 'http://localhost:8000'
 });
-```
-#### Methods
+// Production with API key
+const client = new CortexClient({
+  baseUrl: 'https://api.cortexdb.com',
+  apiKey: 'your-api-key'
+});
-- `health()` - Get health status
-- `healthcheck()` - Returns boolean health status
-- `close()` - Close the client
+// Custom timeout
+const client = new CortexClient({
+  baseUrl: 'http://localhost:8000',
+  timeout: 60000  // 60 seconds
+});
+```
-### Collections API
+### Collections
-```typescript
-// List all collections
-const collections = await client.collections.list();
+Collections define the schema for your data. Each collection can have multiple fields with different types and storage options.
-// Get a collection
-const collection = await client.collections.get('my_collection');
+```typescript
+import { FieldType, StoreLocation } from '@dooor-ai/cortexdb';
-// Create a collection
-await client.collections.create(
-  'my_collection',
+// Create collection with vectorization
+const collection = await client.collections.create(
+  'articles',
   [
-    { name: 'title', type: FieldType.STRING, required: true },
-    { name: 'content', type: FieldType.STRING, vectorize: true },
+    {
+      name: 'title',
+      type: FieldType.STRING
+    },
+    {
+      name: 'content',
+      type: FieldType.STRING,
+      vectorize: true  // Enable semantic search on this field
+    },
+    {
+      name: 'year',
+      type: FieldType.NUMBER,
+      store_in: [StoreLocation.POSTGRES, StoreLocation.QDRANT_PAYLOAD]
+    }
   ],
-  'embedding-provider-id' // Optional: required if vectorize is true
+  'embedding-provider-id'  // Required when any field has vectorize=true
 );
-// Update a collection
-await client.collections.update('my_collection', fields, embeddingProvider);
+// List collections
+const collections = await client.collections.list();
+// Get collection schema
+const schema = await client.collections.get('articles');
-// Delete a collection
-await client.collections.delete('my_collection');
+// Delete collection and all its records
+await client.collections.delete('articles');
 ```
-### Records API
+### Records
+Records are the actual data stored in collections. They must match the collection schema.
 ```typescript
-// Create a record
-const record = await client.records.create('collection_name', {
-  title: 'My Document',
-  content: 'Document content...',
+// Create record
+const record = await client.records.create('articles', {
+  title: 'Machine Learning Basics',
+  content: 'Machine learning is a subset of AI that focuses on learning from data...',
+  year: 2024
 });
-// Get a record
-const record = await client.records.get('collection_name', 'record-id');
+// Get record by ID
+const fetched = await client.records.get('articles', record.id);
-// List records with filters
-const results = await client.records.list('collection_name', {
-  filters: { year: 2024 },
-  limit: 10,
-  offset: 0,
+// Update record
+const updated = await client.records.update('articles', record.id, {
+  year: 2025
 });
-// Update a record
-await client.records.update('collection_name', 'record-id', {
-  title: 'Updated Title',
+// Delete record
+await client.records.delete('articles', record.id);
+// List records with pagination
+const results = await client.records.list('articles', {
+  limit: 10,
+  offset: 0
 });
+```
+### Semantic Search
-// Delete a record
-await client.records.delete('collection_name', 'record-id');
+Semantic search finds records by meaning, not just exact keyword matches. It uses vector embeddings to understand context.
-// Semantic search
+```typescript
+// Basic semantic search
 const results = await client.records.search(
-  'collection_name',
-  'search query',
-  { category: 'tech' }, // optional filters
-  10 // limit
+  'articles',
+  'machine learning fundamentals',
+  undefined,
+  10
 );
+// Search with filters - combine semantic search with exact matches
+const filteredResults = await client.records.search(
+  'articles',
+  'neural networks',
+  {
+    year: 2024,
+    category: 'AI'
+  },
+  5
+);
+// Process results - ordered by relevance score
+filteredResults.results.forEach(result => {
+  console.log(`Score: ${result.score.toFixed(4)}`);  // Higher = more relevant
+  console.log(`Title: ${result.record.data.title}`);
+  console.log(`Year: ${result.record.data.year}`);
+});
 ```
-## Field Types
+### Working with Files
+CortexDB can process documents and automatically extract text for vectorization.
 ```typescript
-import { FieldType, StoreLocation } from '@cortexdb/sdk';
-const field = {
-  name: 'my_field',
-  type: FieldType.STRING, // STRING, NUMBER, BOOLEAN, FILE, ARRAY
-  vectorize: true,        // Enable semantic search
-  required: false,        // Make field required
-  store_in: [             // Storage locations
-    StoreLocation.POSTGRES,
-    StoreLocation.QDRANT_PAYLOAD,
+// Create collection with file field
+await client.collections.create(
+  'documents',
+  [
+    { name: 'title', type: FieldType.STRING },
+    {
+      name: 'document',
+      type: FieldType.FILE,
+      vectorize: true  // Extract text and create embeddings
+    }
   ],
-};
+  'embedding-provider-id'
+);
+// Note: File upload support is currently available in the REST API
+// TypeScript SDK file upload will be added in a future version
+```
+### Filter Operators
+```typescript
+// Exact match filters
+const results = await client.records.list('articles', {
+  filters: {
+    category: 'technology',
+    published: true,
+    year: 2024
+  }
+});
+// Combine multiple filters
+const filtered = await client.records.list('articles', {
+  filters: {
+    year: 2024,
+    category: 'AI',
+    author: 'John Doe'
+  },
+  limit: 20
+});
 ```
 ## Error Handling
-The SDK provides specific error types for better error handling:
+The SDK provides specific error types for different failure scenarios.
 ```typescript
 import {
   CortexDBError,
-  CortexDBConnectionError,
-  CortexDBTimeoutError,
   CortexDBNotFoundError,
   CortexDBValidationError,
-  CortexDBAuthenticationError,
-  CortexDBPermissionError,
-  CortexDBServerError,
-} from '@cortexdb/sdk';
+  CortexDBConnectionError,
+  CortexDBTimeoutError
+} from '@dooor-ai/cortexdb';
 try {
-  await client.records.get('collection', 'invalid-id');
+  const record = await client.records.get('articles', 'invalid-id');
 } catch (error) {
   if (error instanceof CortexDBNotFoundError) {
     console.log('Record not found');
+  } else if (error instanceof CortexDBValidationError) {
+    console.log('Invalid data:', error.message);
   } else if (error instanceof CortexDBConnectionError) {
-    console.log('Connection failed');
+    console.log('Connection failed:', error.message);
   } else if (error instanceof CortexDBTimeoutError) {
-    console.log('Request timed out');
-  } else {
-    console.log('Unknown error:', error);
+    console.log('Request timed out:', error.message);
+  } else if (error instanceof CortexDBError) {
+    console.log('General error:', error.message);
   }
 }
 ```
 ## Examples
-Check out the [examples](./examples) directory for more:
+Check the [`examples/`](./examples) directory for complete working examples:
-- [quickstart.ts](./examples/quickstart.ts) - Complete quickstart guide
-- [search.ts](./examples/search.ts) - Semantic search with filters
-- [basic.ts](./examples/basic.ts) - Basic operations
+- [`quickstart.ts`](./examples/quickstart.ts) - Complete walkthrough of SDK features
+- [`search.ts`](./examples/search.ts) - Semantic search with filters and providers
+- [`basic.ts`](./examples/basic.ts) - Basic CRUD operations
 Run examples:
 ```bash
-cd examples
-npx ts-node -O '{"module":"commonjs"}' quickstart.ts
+npx ts-node -O '{"module":"commonjs"}' examples/quickstart.ts
 ```
 ## Development
+### Setup
 ```bash
+# Clone repository
+git clone https://github.com/yourusername/cortexdb
+cd cortexdb/clients/typescript
 # Install dependencies
 npm install
 # Build
 npm run build
+```
-# Watch mode
+### Scripts
+```bash
+# Build TypeScript
+npm run build
+# Build in watch mode
 npm run build:watch
-# Clean
+# Clean build artifacts
 npm run clean
+# Lint code
+npm run lint
+# Format code
+npm run format
 ```
-## Contributing
+## Requirements
-Contributions are welcome! Please feel free to submit a Pull Request.
+- Node.js >= 18.0.0 (for native fetch support)
+- CortexDB gateway running locally or remotely
+- Embedding provider configured (OpenAI, Gemini, etc.) if using vectorization
-## License
+## Architecture
+CortexDB integrates multiple technologies:
-MIT License - see [LICENSE](./LICENSE) file for details.
+- **PostgreSQL**: Stores structured data and metadata
+- **Qdrant**: Vector database for semantic search
+- **MinIO**: Object storage for files
+- **Docling**: Advanced document processing and text extraction
-## Support
+The SDK abstracts this complexity into a simple, unified API.
+## License
-- 📖 [Documentation](https://github.com/cortexdb/cortexdb)
-- 🐛 [Issue Tracker](https://github.com/cortexdb/cortexdb/issues)
-- 💬 [Discussions](https://github.com/cortexdb/cortexdb/discussions)
+MIT License - see [LICENSE](./LICENSE) for details.
 ## Related
 - [CortexDB Python SDK](../python) - Python client for CortexDB
-- [CortexDB Gateway](../../gateway) - CortexDB backend service
+- [CortexDB Documentation](../../docs) - Complete platform documentation
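
A note on the "Initialize Client" snippet added in 0.1.2: it declares `const client` three times in one block, so it reads as three alternatives rather than code to paste as-is. A minimal sketch of choosing one configuration at runtime follows. The option names (`baseUrl`, `apiKey`, `timeout`) and the defaults come from the README text in this diff; the environment variable names and the `resolveOptions` helper are hypothetical, and the diff does not confirm that 0.1.2 keeps the 0.1.0 defaults.

```typescript
// Option shape assumed from the fields shown in the README diff.
interface ClientOptions {
  baseUrl: string;
  apiKey?: string;
  timeout: number; // milliseconds
}

// Defaults as documented in the 0.1.0 README (removed lines in this diff).
const DEFAULTS = { baseUrl: 'http://localhost:8000', timeout: 30000 };

// Hypothetical helper: the CORTEXDB_* variable names are illustrative only.
function resolveOptions(env: Record<string, string | undefined>): ClientOptions {
  const timeout = Number(env['CORTEXDB_TIMEOUT_MS']);
  const options: ClientOptions = {
    baseUrl: env['CORTEXDB_URL'] ?? DEFAULTS.baseUrl,
    // Fall back to the default when the value is missing, non-numeric, or not positive.
    timeout: Number.isFinite(timeout) && timeout > 0 ? timeout : DEFAULTS.timeout,
  };
  const apiKey = env['CORTEXDB_API_KEY'];
  if (apiKey) {
    options.apiKey = apiKey; // only set when authentication is configured
  }
  return options;
}
```

The resulting object could then be passed to a single `new CortexClient(...)` call, for example `new CortexClient(resolveOptions(process.env))`, assuming the constructor accepts this shape as the README suggests.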

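A note on the revised "Error Handling" example: 0.1.2 replaces the final `else` branch with `else if (error instanceof CortexDBError)`, so an error that is not a `CortexDBError` is no longer logged by that snippet. The sketch below shows one way to keep the subclass-before-base ordering while rethrowing anything unrecognized. The classes are local stand-ins so the block runs without the package; it assumes, without confirmation from the diff, that the SDK's specific errors extend `CortexDBError`. `describeError` is a hypothetical helper.

```typescript
// Stand-in classes that mirror the names exported by the SDK (hierarchy assumed).
class CortexDBError extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name;
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
class CortexDBNotFoundError extends CortexDBError {}
class CortexDBTimeoutError extends CortexDBError {}

// Hypothetical helper: check specific subclasses first, the base class last.
function describeError(error: unknown): string {
  if (error instanceof CortexDBNotFoundError) {
    return 'not-found';
  }
  if (error instanceof CortexDBTimeoutError) {
    return 'timeout';
  }
  if (error instanceof CortexDBError) {
    return `cortexdb: ${error.message}`;
  }
  // Not an SDK error: rethrow instead of silently dropping it.
  throw error;
}
```

Checking the base class first would match every subclass and hide the specific branches, which is why the order matters here.
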
package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@dooor-ai/cortexdb",
-  "version": "0.1.0",
+  "version": "0.1.2",
   "description": "Official TypeScript/JavaScript SDK for CortexDB - Multi-modal RAG Platform with advanced document processing",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",