npm - voctar - Versions diffs - 0.1.0 → 0.1.2 - Mend

voctar 0.1.0 → 0.1.2

This diff compares the contents of the two package versions as published to the public registry. It is provided for informational purposes only.

Files changed (8)

package/README.md +4 -4
package/docs/API.md +10 -0
package/docs/EMBEDDINGS.md +109 -0
package/docs/README.md +1 -1
package/docs/STORAGE_BACKENDS.md +67 -52
package/package.json +9 -1
package/docs/CUSTOM_PROVIDERS.md +0 -101
/package/docs/assets/{vectar.png → voctar.png} +0 -0

package/README.md CHANGED

@@ -1,11 +1,11 @@
 <p align="center">
-  <img src="./docs/assets/vectar.png" alt="Voctar logo" width="180" />
+  <img src="https://github.com/marvinified/voctar/blob/e0ca3d3d1d609020e9139530aea9c8e60eca92ae/docs/assets/vectar.png" alt="Voctar logo" width="180" />
 </p>
 <h1 align="center">Voctar</h1>
 <p align="center">
-  Simple TypeScript library with RAG primitives for embeddings, chunking, storage, and semantic retrieval.
+  Simple TypeScript library with RAG primitives for embeddings, chunking, storage, and retrieval.
 </p>
 <p align="center">
@@ -16,11 +16,10 @@
 </p>
 ## Features
+- Simple primitives: `embed` and `search`
 - Supports multiple vector stores: SQLite, Qdrant, in-memory, or custom store providers
 - Automatic chunking for long documents with multiple strategies (`fixed`, `recursive`, `sentence`, `paragraph`, `semantic`)
 - Semantic search with score thresholds and metadata filtering
-- Simple primitives: `embed`, `search` and more
 - TypeScript-first.
 ## Quick Start
@@ -97,6 +96,7 @@ Each result includes:
 ## Documentation
 - [Docs Index](./docs/README.md)
+- [Embeddings](./docs/EMBEDDINGS.md)
 - [Storage Backends](./docs/STORAGE_BACKENDS.md)
 - [Chunking](./docs/CHUNKING.md)

package/docs/API.md CHANGED

@@ -280,6 +280,16 @@ type RuntimeEmbeddingConfig =
     };
 ```
+The built-in OpenAI provider defaults to:
+- `model`: `text-embedding-3-small`
+- `dimension`: `1536`
+- `maxRetries`: `3`
+Set `model` to any OpenAI embedding model supported by your OpenAI account. Set `dimension` when the model supports configurable embedding dimensions or when your vector store collection expects a specific dimension. A collection can only contain vectors with one dimension, so changing model or dimension usually requires a new collection.
+Use `{ type: 'custom', provider }` for local models, hosted non-OpenAI models, or any other embedding service. Custom providers must implement `EmbeddingProvider`.
 ### `RuntimeStoreConfig`
 ```typescript

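Note on the API.md change: because a collection can only hold vectors of one dimension, one way to avoid mixing vectors after a model or dimension change is to derive the collection name from both values. The sketch below is illustrative only. `collectionNameFor` and `EmbeddingSettings` are hypothetical names, not part of voctar, and the naming scheme is an assumption.

```typescript
// Hypothetical settings shape; mirrors the `model` and `dimension` options described above.
interface EmbeddingSettings {
  model: string;
  dimension: number;
}

// Hypothetical helper, not part of voctar: builds a collection name that changes
// whenever the embedding model or dimension changes.
function collectionNameFor(base: string, settings: EmbeddingSettings): string {
  const slug = settings.model
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '_') // collapse runs of non-alphanumeric characters
    .replace(/^_+|_+$/g, ''); // trim leading and trailing underscores
  return `${base}__${slug}__${settings.dimension}`;
}
```

With this scheme, switching models produces a new collection name, so old and new vectors end up in separate collections.
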
package/docs/EMBEDDINGS.md ADDED

@@ -0,0 +1,109 @@
+# Voctar Embeddings
+This guide covers embedding model configuration in Voctar.
+Voctar is config-first:
+- your app chooses the embedding provider,
+- your app reads env vars or secrets,
+- your app passes explicit config to `new Voctar(...)`.
+## Available Providers
+Voctar supports:
+- `openai`
+- `custom`
+## OpenAI Provider
+The built-in OpenAI provider is the default path for most apps.
+Defaults:
+- `model`: `text-embedding-3-small`
+- `dimension`: `1536`
+- `maxRetries`: `3`
+Example:
+```typescript
+import { Voctar } from 'voctar';
+const vector = new Voctar({
+  embedding: {
+    type: 'openai',
+    apiKey: process.env.OPENAI_API_KEY!,
+    model: 'text-embedding-3-small',
+    dimension: 1536,
+  },
+  store: {
+    type: 'sqlite',
+    path: './data/vector.db',
+  },
+});
+```
+You can pass any OpenAI embedding model supported by the OpenAI API. If the model supports configurable embedding dimensions, set `dimension` to the vector size you want to store.
+## Model and Dimension Notes
+The embedding dimension must match the vector store collection dimension. Existing collections cannot mix vectors with different dimensions, so changing `model` or `dimension` usually requires a new collection.
+Voctar uses the provider token limit to decide when documents should be chunked automatically. The built-in OpenAI provider uses:
+- `8192` tokens for `text-embedding-3-small` and `text-embedding-3-large`
+- `8191` tokens for `text-embedding-ada-002`
+- `8192` tokens for other OpenAI embedding model names
+## Custom Embedding Provider
+Use a custom embedding provider for local models, hosted non-OpenAI models, or any embedding service with your own client.
+Example:
+```typescript
+import { Voctar, type EmbeddingProvider } from 'voctar';
+class MyEmbeddingProvider implements EmbeddingProvider {
+  async embed(text: string): Promise<number[]> {
+    // Return one embedding vector for one text.
+    return [/* ... */];
+  }
+  async embedBatch(texts: string[]): Promise<number[][]> {
+    // Return one vector per input text in the same order.
+    return texts.map(() => [/* ... */]);
+  }
+  getDimension(): number {
+    return 1536;
+  }
+  getModelName(): string {
+    return 'my-embedding-model';
+  }
+  getTokenLimit(): number {
+    return 8192;
+  }
+}
+const vector = new Voctar({
+  embedding: {
+    type: 'custom',
+    provider: new MyEmbeddingProvider(),
+  },
+  store: {
+    type: 'sqlite',
+    path: './data/vector.db',
+  },
+});
+```
+Integration tips:
+- Keep `embedBatch()` output order stable with input order.
+- Ensure `getDimension()` matches vectors returned by `embed()` and `embedBatch()`.
+- Return a realistic `getTokenLimit()` so automatic chunking can split long documents before embedding.
+- Normalize errors with useful messages so callers can debug provider failures quickly.
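
Note on the integration tips above: a custom provider can check its own batch output before returning it. The sketch below is a minimal illustration. `validateBatch` is a hypothetical helper, not part of voctar. It catches count and dimension mismatches only, since ordering itself cannot be verified from the vectors alone.

```typescript
// Hypothetical helper, not part of voctar: checks a batch result before it is returned or stored.
function validateBatch(texts: string[], vectors: number[][], dimension: number): void {
  // One vector per input text is required for order to be preserved at all.
  if (vectors.length !== texts.length) {
    throw new Error(`expected ${texts.length} vectors, got ${vectors.length}`);
  }
  // Every vector must match the dimension the provider reports.
  vectors.forEach((vector, index) => {
    if (vector.length !== dimension) {
      throw new Error(`vector ${index} has ${vector.length} dimensions, expected ${dimension}`);
    }
  });
}
```

A provider could call this at the end of `embedBatch()`, passing the value it returns from `getDimension()`, so that mismatches surface as a clear error rather than as a store failure later.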

package/docs/README.md CHANGED

@@ -5,7 +5,7 @@ The canonical getting-started guide now lives in the root [`README.md`](../READM
 Use this folder for focused topics:
 - [API Reference](./API.md)
-- [Custom Providers](./CUSTOM_PROVIDERS.md)
+- [Embeddings](./EMBEDDINGS.md)
 - [Storage Backends](./STORAGE_BACKENDS.md)
 - [Chunking](./CHUNKING.md)

package/docs/STORAGE_BACKENDS.md CHANGED

@@ -2,12 +2,6 @@
 This guide covers the available storage backends in Voctar and when to use each one.
-Voctar is config-first:
-- your app chooses the backend,
-- your app reads env vars (if any),
-- your app passes explicit config to `new Vectar(...)`.
 ## Available Backends
 Voctar supports:
@@ -19,9 +13,9 @@ Voctar supports:
 ## Quick Selection Guide
+- Use `memory` for tests and short-lived demos only.
 - Use `sqlite` for local dev and simple production workloads.
 - Use `qdrant` for larger datasets, higher throughput, or multi-instance deployments.
-- Use `memory` for tests and short-lived demos only.
 - Use `custom` when integrating an internal or third-party vector store.
 ## SQLite Backend
@@ -70,24 +64,17 @@ store: {
 }
 ```
-## Qdrant Backend
+## In-Memory Backend
 Best for:
-- medium and large datasets,
-- high query volume,
-- distributed deployments.
-Pros:
-- purpose-built vector DB,
-- strong scale characteristics,
-- rich filtering support.
+- unit tests,
+- quick local examples.
 Trade-offs:
-- extra service to operate,
-- network hop adds operational complexity.
+- data is lost on restart,
+- unsuitable for production persistence.
 Example:
@@ -100,27 +87,29 @@ const vector = new Voctar({
     apiKey: process.env.OPENAI_API_KEY!,
   },
   store: {
-    type: 'qdrant',
-    url: process.env.QDRANT_URL!,
-    port: process.env.QDRANT_PORT ? Number(process.env.QDRANT_PORT) : 6333,
-    apiKey: process.env.QDRANT_API_KEY || undefined,
-    timeout: 30000,
-    checkCompatibility: false,
+    type: 'memory',
   },
 });
 ```
-## In-Memory Backend
+## Qdrant Backend
 Best for:
-- unit tests,
-- quick local examples.
+- medium and large datasets,
+- high query volume,
+- distributed deployments.
+Pros:
+- purpose-built vector DB,
+- strong scale characteristics,
+- rich filtering support.
 Trade-offs:
-- data is lost on restart,
-- unsuitable for production persistence.
+- extra service to operate,
+- network hop adds operational complexity.
 Example:
@@ -133,11 +122,17 @@ const vector = new Voctar({
     apiKey: process.env.OPENAI_API_KEY!,
   },
   store: {
-    type: 'memory',
+    type: 'qdrant',
+    url: process.env.QDRANT_URL!,
+    port: process.env.QDRANT_PORT ? Number(process.env.QDRANT_PORT) : 6333,
+    apiKey: process.env.QDRANT_API_KEY || undefined,
+    timeout: 30000,
+    checkCompatibility: false,
   },
 });
 ```
 ## Custom Backend
 Use this when you need full control over storage behavior.
@@ -161,29 +156,49 @@ const vector = new Voctar({
 });
 ```
-See [`CUSTOM_PROVIDERS.md`](./CUSTOM_PROVIDERS.md) for full interface details.
-## Environment Variable Pattern (App-Owned)
-Voctar does not auto-load env vars, but many apps use a selector like this:
-```bash
-VECTOR_STORE=sqlite  # sqlite | qdrant | memory
-SQLITE_PATH=./data/vector.db
-QDRANT_URL=http://localhost
-QDRANT_PORT=6333
-QDRANT_API_KEY=your_api_key
-```
-Then in app bootstrap:
+Full interface example:
 ```typescript
-const storeType = process.env.VECTOR_STORE ?? 'sqlite';
+import type {
+  CollectionConfig,
+  SearchOptions,
+  SearchResult,
+  VectorPoint,
+  VectorStoreProvider,
+} from 'voctar';
+export class MyVectorStoreProvider implements VectorStoreProvider {
+  async ensureCollection(name: string, dimension: number, config?: CollectionConfig): Promise<void> {
+    // Create collection/index if missing.
+  }
+  async upsert(collection: string, points: VectorPoint[]): Promise<void> {
+    // Insert or update vectors.
+  }
+  async search(collection: string, vector: number[], options: SearchOptions): Promise<SearchResult[]> {
+    // Return scored results in descending relevance.
+    return [];
+  }
+  async delete(collection: string, ids: string[]): Promise<void> {
+    // Delete matching IDs.
+  }
+  async deleteCollection(collection: string): Promise<void> {
+    // Drop collection/index.
+  }
+  async getIdsByFilter(collection: string, filter: Record<string, any>, limit?: number): Promise<string[]> {
+    // Return IDs that match filter.
+    return [];
+  }
+}
 ```
-## Migration and Operations Notes
+Integration tips:
-- Start with `sqlite` if you are early-stage.
-- Move to `qdrant` when dataset size, traffic, or deployment topology requires it.
-- Back up SQLite database files regularly.
-- For Qdrant, use snapshots/backups supported by your Qdrant setup.
+- Ensure `ensureCollection()` respects the embedding provider dimension.
+- Implement filter behavior consistently in `search()` and `getIdsByFilter()`.
+- Return search results in descending relevance order.
+- Normalize storage errors with useful messages so callers can debug quickly.
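
Note on the integration tips above: the "descending relevance" requirement for `search()` can be sketched with plain cosine similarity. This is an illustration under stated assumptions, not voctar's implementation. `StoredPoint`, `ScoredPoint`, `cosineSimilarity`, and `rankByCosine` are hypothetical names, and the real `VectorPoint` and `SearchResult` shapes are not shown in this diff.

```typescript
// Hypothetical shapes; the real VectorPoint/SearchResult types may differ.
interface StoredPoint {
  id: string;
  vector: number[];
}

interface ScoredPoint {
  id: string;
  score: number;
}

// Cosine similarity of two equal-length vectors; returns 0 if either has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores every point against the query and returns the top `limit`, highest score first.
function rankByCosine(query: number[], points: StoredPoint[], limit: number): ScoredPoint[] {
  return points
    .map((point) => ({ id: point.id, score: cosineSimilarity(query, point.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

A brute-force scan like this is only reasonable for small collections. A real backend would normally delegate ranking to its own index.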

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "voctar",
-  "version": "0.1.0",
+  "version": "0.1.2",
   "description": "TypeScript library with RAG primitives for vector embeddings, chunking, storing and retrieval.",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",
@@ -26,6 +26,14 @@
     "qdrant",
     "sqlite"
   ],
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/marvinified/voctar.git"
+  },
+  "bugs": {
+    "url": "https://github.com/marvinified/voctar/issues"
+  },
+  "homepage": "https://github.com/marvinified/voctar#readme",
   "license": "MIT",
   "engines": {
     "node": ">=18"

package/docs/CUSTOM_PROVIDERS.md DELETED

@@ -1,101 +0,0 @@
-# Custom Providers
-Voctar supports custom providers for embeddings and storage.
-## Use Custom Providers
-```typescript
-import { Voctar } from 'voctar';
-const vector = new Voctar({
-  embedding: {
-    type: 'custom',
-    provider: myEmbeddingProvider,
-  },
-  store: {
-    type: 'custom',
-    provider: myVectorStoreProvider,
-  },
-});
-```
-## Custom Embedding Provider
-Implement the `EmbeddingProvider` interface:
-```typescript
-import type { EmbeddingProvider } from 'voctar';
-export class MyEmbeddingProvider implements EmbeddingProvider {
-  async embed(text: string): Promise<number[]> {
-    // Return one embedding vector for one text
-    return [/* ... */];
-  }
-  async embedBatch(texts: string[]): Promise<number[][]> {
-    // Return one vector per input text (same order)
-    return texts.map(() => [/* ... */]);
-  }
-  getDimension(): number {
-    return 1536;
-  }
-  getModelName(): string {
-    return 'my-embedding-model';
-  }
-  getTokenLimit(): number {
-    return 8192;
-  }
-}
-```
-## Custom Store Provider
-Implement the `VectorStoreProvider` interface:
-```typescript
-import type {
-  VectorStoreProvider,
-  VectorPoint,
-  SearchOptions,
-  SearchResult,
-  CollectionConfig,
-} from 'voctar';
-export class MyVectorStoreProvider implements VectorStoreProvider {
-  async ensureCollection(name: string, dimension: number, config?: CollectionConfig): Promise<void> {
-    // Create collection/index if missing
-  }
-  async upsert(collection: string, points: VectorPoint[]): Promise<void> {
-    // Insert or update vectors
-  }
-  async search(collection: string, vector: number[], options: SearchOptions): Promise<SearchResult[]> {
-    // Return scored results in descending relevance
-    return [];
-  }
-  async delete(collection: string, ids: string[]): Promise<void> {
-    // Delete matching IDs
-  }
-  async deleteCollection(collection: string): Promise<void> {
-    // Drop collection/index
-  }
-  async getIdsByFilter(collection: string, filter: Record<string, any>, limit?: number): Promise<string[]> {
-    // Return IDs that match filter
-    return [];
-  }
-}
-```
-## Integration Tips
-- Keep `embedBatch()` order stable with input order.
-- Ensure `getDimension()` matches vectors returned by `embed()`/`embedBatch()`.
-- Normalize errors with useful messages so callers can debug quickly.
-- Implement filter behavior consistently in `search()` and `getIdsByFilter()`.

/package/docs/assets/{vectar.png → voctar.png} RENAMED

File without changes