vectra-js 0.9.6 → 0.9.8

This diff shows the changes between publicly released versions of the package, as they appear in their respective public registries. It is provided for informational purposes only.
package/README.md CHANGED
@@ -1,656 +1,511 @@
  # Vectra (Node.js)

- A production-ready, provider-agnostic Node.js SDK for End-to-End RAG (Retrieval-Augmented Generation) pipelines.
-
- ## Features
-
- * **Multi-Provider Support**: First-class support for **OpenAI**, **Gemini**, **Anthropic**, **OpenRouter**, and **Hugging Face**.
- * **Modular Vector Store**:
-   * **Prisma**: Use your existing PostgreSQL database with `pgvector`.
-   * **ChromaDB**: Native support for the open-source vector database.
-   * **Qdrant & Milvus**: Additional backends for portability.
-   * **Extensible**: Easily add others by extending the `VectorStore` class.
- * **Advanced Chunking**:
-   * **Recursive**: Smart splitting based on characters and separators.
-   * **Token-Aware**: Sentence/paragraph fallback and adaptive overlap based on local entropy.
-   * **Agentic**: Uses an LLM to split text into semantically complete propositions with JSON validation and dedupe.
- * **Advanced Retrieval Strategies**:
-   * **Naive**: Standard cosine similarity search.
-   * **HyDE (Hypothetical Document Embeddings)**: Generates a fake answer to the query and searches for that.
-   * **Multi-Query**: Generates multiple variations of the query to catch different phrasings.
-   * **Hybrid Search**: Combines semantic (pgvector) and lexical (FTS) results using **Reciprocal Rank Fusion (RRF)**.
-   * **MMR**: Diversifies results to reduce redundancy.
- * **Streaming**: Full support for token-by-token streaming responses.
- * **Reranking**: LLM-based reranking to re-order retrieved documents for maximum relevance.
- * **File Support**: Native parsing for PDF, DOCX, XLSX, TXT, and Markdown.
- * **Index Helpers**: ivfflat for pgvector, GIN FTS index, optional tsvector trigger.
- * **Embedding Cache**: SHA256 content-based cache to skip re-embedding.
- * **Batch Embeddings**: Gemini and OpenAI adapters support array inputs and dimension control.
- * **Metadata Enrichment**: Per-chunk summary, keywords, hypothetical questions; page and section mapping for PDFs/Markdown. Retrieval boosts matching keywords and uses summaries in prompts.
- * **Conversation Memory**: Built-in chat history management for context-aware multi-turn conversations.
- * **Production Evaluation**: Integrated evaluation module to measure RAG quality (Faithfulness, Relevance).
- * **Local LLMs**: First-class support for **Ollama** for local/offline development.
- * **Web Configuration UI**: Visual generator to create and validate your configuration file (`vectra webconfig`).
+ **Vectra** is a **production-grade, provider-agnostic Node.js SDK** for building **end-to-end Retrieval-Augmented Generation (RAG)** systems. It is designed for teams that need **flexibility, extensibility, correctness, and observability** across embeddings, vector databases, retrieval strategies, and LLM providers—without locking into a single vendor.
+
+ ![GitHub Release](https://img.shields.io/github/v/release/iamabhishek-n/vectra-js)
+ ![NPM Version](https://img.shields.io/npm/v/vectra-js)
+ ![NPM Downloads](https://img.shields.io/npm/dm/vectra-js)
+ [![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=iamabhishek-n_vectra-js&metric=alert_status)](https://sonarcloud.io/summary/new_code?id=iamabhishek-n_vectra-js)
+
+ If you find this project useful, consider supporting it:<br>
+ [![Star this project on GitHub](https://img.shields.io/github/stars/iamabhishek-n/vectra-js?style=social)](https://github.com/iamabhishek-n/vectra-js/stargazers)
+ [![Sponsor me on GitHub](https://img.shields.io/badge/Sponsor%20me%20on-GitHub-%23FFD43B?logo=github)](https://github.com/sponsors/iamabhishek-n)
+ [![Buy me a Coffee](https://img.shields.io/badge/Buy%20me%20a%20Coffee-%23FFDD00?logo=buy-me-a-coffee&logoColor=black)](https://www.buymeacoffee.com/iamabhishekn)
+
+ ## Table of Contents
+
+ * [1. Overview](#1-overview)
+ * [2. Design Goals & Philosophy](#2-design-goals--philosophy)
+ * [3. Feature Matrix](#3-feature-matrix)
+ * [4. Installation](#4-installation)
+ * [5. Quick Start](#5-quick-start)
+ * [6. Core Concepts](#6-core-concepts)
+   * [Providers](#providers)
+   * [Vector Stores](#vector-stores)
+   * [Chunking](#chunking)
+   * [Retrieval](#retrieval)
+   * [Reranking](#reranking)
+   * [Metadata Enrichment](#metadata-enrichment)
+   * [Query Planning & Grounding](#query-planning--grounding)
+   * [Conversation Memory](#conversation-memory)
+ * [7. Configuration Reference (Usage‑Driven)](#7-configuration-reference-usage-driven)
+ * [8. Ingestion Pipeline](#8-ingestion-pipeline)
+ * [9. Querying & Streaming](#9-querying--streaming)
+ * [10. Conversation Memory](#10-conversation-memory)
+ * [11. Evaluation & Quality Measurement](#11-evaluation--quality-measurement)
+ * [12. CLI](#12-cli)
+   * [Ingest & Query](#ingest--query)
+   * [WebConfig (Config Generator UI)](#webconfig-config-generator-ui)
+   * [Observability Dashboard](#observability-dashboard)
+ * [13. Observability & Callbacks](#13-observability--callbacks)
+ * [14. Database Schemas & Indexing](#14-database-schemas--indexing)
+ * [15. Extending Vectra](#15-extending-vectra)
+ * [16. Architecture Overview](#16-architecture-overview)
+ * [17. Development & Contribution Guide](#17-development--contribution-guide)
+ * [18. Production Best Practices](#18-production-best-practices)

  ---

- ## Installation
+ ## 1. Overview
+
+ Vectra provides a **fully modular RAG pipeline**:
+
+ ```
+ Load → Chunk → Embed → Store → Retrieve → Rerank → Plan → Ground → Generate → Stream
+ ```
+
+ <p align="center">
+ <img src="https://vectra.thenxtgenagents.com/vectraArch.png" alt="Vectra SDK Architecture" width="900">
+ </p>
+
+ <p align="center">
+ <em>Vectra SDK – End-to-End RAG Architecture</em>
+ </p>
+
+ Every stage is **explicitly configurable**, validated at runtime, and observable.
+
+ ### Key Characteristics
+
+ * Provider‑agnostic LLM & embedding layer
+ * Multiple vector backends (Postgres, Chroma, Qdrant, Milvus)
+ * Advanced retrieval strategies (HyDE, Multi‑Query, Hybrid RRF, MMR)
+ * Unified streaming interface
+ * Built‑in evaluation & observability
+ * CLI + SDK parity
+
+ ---
+
+ ## 2. Design Goals & Philosophy
+
+ ### Explicitness over Magic
+
+ Vectra avoids hidden defaults. Chunking, retrieval, grounding, memory, and generation behaviors are always explicit.
+
+ ### Production‑First
+
+ Index helpers, rate limiting, embedding cache, observability, and evaluation are first‑class features.
+
+ ### Provider Neutrality
+
+ Swapping OpenAI → Gemini → Anthropic → Ollama requires **no application code changes**, only a config change, as shown below.
+
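+ For example, switching the whole pipeline to local Ollama is a config‑only change (a sketch based on the 0.9.6 README's Ollama section; model names are illustrative):
+
+ ```js
+ // Same application code; only the provider config changes.
+ const config = {
+   embedding: { provider: ProviderType.OLLAMA, modelName: 'nomic-embed-text' },
+   llm: { provider: ProviderType.OLLAMA, modelName: 'llama3' } // baseUrl defaults to http://localhost:11434
+ };
+ ```
+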
+ ### Extensibility
+
+ Every major subsystem (providers, vector stores, callbacks) is interface‑driven.
+
+ ---
+
+ ## 3. Feature Matrix
+
+ ### Providers
+
+ * **Embeddings**: OpenAI, Gemini, Ollama, HuggingFace
+ * **Generation**: OpenAI, Gemini, Anthropic, Ollama, OpenRouter, HuggingFace
+ * **Streaming**: Unified async generator
+
+ ### Vector Stores
+
+ * PostgreSQL (Prisma + pgvector)
+ * PostgreSQL (native `pg` driver)
+ * ChromaDB
+ * Qdrant
+ * Milvus
+
+ ### Retrieval Strategies
+
+ * Naive cosine similarity
+ * HyDE (Hypothetical Document Embeddings)
+ * Multi‑Query expansion
+ * Hybrid semantic + lexical (RRF)
+ * MMR diversification
+
+ ---
+
+ ## 4. Installation
+
+ ### Library

  ```bash
- # Library (npm)
  npm install vectra-js @prisma/client
- npm install chromadb # optional: ChromaDB backend
-
- # Library (pnpm)
+ # or
  pnpm add vectra-js @prisma/client
- pnpm add chromadb # optional
-
- # CLI (global install)
- npm i -g vectra-js # or: pnpm add -g vectra-js
+ ```

- # CLI (no global install)
- # Uses local project bin if vectra-js is installed
- npx vectra ingest ./docs --config=./config.json
+ Optional backends:

- # CLI (one-off run with pnpm dlx)
- pnpm dlx vectra-js vectra query "What is our leave policy?" --config=./config.json --stream
+ ```bash
+ npm install chromadb
  ```

- ---
+ ### CLI

- ## Usage Guide
+ ```bash
+ npm i -g vectra-js
+ # or
+ pnpm add -g vectra-js
+ ```

- ### 1. Configuration
+ ---

- The SDK uses a strictly typed configuration object (validated with Zod).
+ ## 5. Quick Start

- ```javascript
- const { ProviderType, ChunkingStrategy, RetrievalStrategy } = require('vectra-js');
+ ```js
+ const { VectraClient, ProviderType } = require('vectra-js');

- const config = {
-   // 1. Embedding Provider
+ const client = new VectraClient({
    embedding: {
      provider: ProviderType.OPENAI,
      apiKey: process.env.OPENAI_API_KEY,
-     modelName: 'text-embedding-3-small',
-     dimensions: 1536 // Optional
+     modelName: 'text-embedding-3-small'
    },
-
-   // 2. LLM Provider (for Generation)
    llm: {
      provider: ProviderType.GEMINI,
      apiKey: process.env.GOOGLE_API_KEY,
      modelName: 'gemini-1.5-pro-latest'
    },
-
-   // 3. Database (Modular)
    database: {
-     type: 'prisma', // or 'chroma'
-     clientInstance: prismaClient, // Your instantiated DB client
-     tableName: 'Document', // Table or Collection name
-     columnMap: { // Map SDK fields to your DB columns
-       content: 'text',
-       vector: 'embedding',
-       metadata: 'meta'
-     }
-   },
-
-   // 4. Chunking (Optional)
-   chunking: {
-     strategy: ChunkingStrategy.RECURSIVE,
-     chunkSize: 1000,
-     chunkOverlap: 200
-   },
-
-   // 5. Retrieval (Optional)
-   retrieval: {
-     strategy: RetrievalStrategy.HYBRID, // Uses RRF
-     llmConfig: { /* Config for query rewriting LLM */ }
+     type: 'prisma',
+     clientInstance: prisma, // your instantiated PrismaClient
+     tableName: 'Document'
    }
- };
- ```
+ });

- ### Configuration Reference
-
- - Embedding
-   - `provider`: one of `ProviderType.OPENAI`, `ProviderType.GEMINI`
-   - `apiKey`: provider API key string
-   - `modelName`: embedding model identifier
-   - `dimensions`: number; ensures vector size matches DB `pgvector(n)`
- - LLM
-   - `provider`: `ProviderType.OPENAI` | `ProviderType.GEMINI` | `ProviderType.ANTHROPIC` | `ProviderType.OLLAMA`
-   - `apiKey`: provider API key string (optional for Ollama)
-   - `modelName`: generation model identifier
-   - `baseUrl`: optional custom URL (e.g., for Ollama)
-   - `temperature`: number; optional sampling temperature
-   - `maxTokens`: number; optional max output tokens
- - Memory
-   - `enabled`: boolean; toggle memory on/off (default: false)
-   - `type`: `'in-memory' | 'redis' | 'postgres'`
-   - `maxMessages`: number; number of recent messages to retain (default: 20)
-   - `redis`: `{ clientInstance, keyPrefix }` where `keyPrefix` defaults to `'vectra:chat:'`
-   - `postgres`: `{ clientInstance, tableName, columnMap }` where `tableName` defaults to `'ChatMessage'` and `columnMap` maps `{ sessionId, role, content, createdAt }`
- - Ingestion
-   - `rateLimitEnabled`: boolean; toggle rate limiting on/off (default: false)
-   - `concurrencyLimit`: number; max concurrent embedding requests when enabled (default: 5)
-   - `mode`: `'skip' | 'append' | 'replace'`; idempotency behavior (default: `'skip'`)
- - Database
-   - `type`: `prisma` | `chroma` | `qdrant` | `milvus`
-   - `clientInstance`: instantiated client for the chosen backend
-   - `tableName`: table/collection name (Postgres/Qdrant/Milvus)
-   - `columnMap`: maps SDK fields to DB columns
-     - `content`: text column name
-     - `vector`: embedding vector column name (for Postgres pgvector)
-     - `metadata`: JSON column name for per-chunk metadata
- - Chunking
-   - `strategy`: `ChunkingStrategy.RECURSIVE` | `ChunkingStrategy.AGENTIC`
-   - `chunkSize`: number; preferred chunk size (characters)
-   - `chunkOverlap`: number; overlap between adjacent chunks (characters)
-   - `separators`: array of string separators to split on (optional)
- - Retrieval
-   - `strategy`: `RetrievalStrategy.NAIVE` | `HYDE` | `MULTI_QUERY` | `HYBRID` | `MMR`
-   - `llmConfig`: optional LLM config for query rewriting (HyDE/Multi-Query)
-   - `mmrLambda`: 0..1 tradeoff between relevance and diversity (default: 0.5)
-   - `mmrFetchK`: candidate pool size for MMR (default: 20)
- - Reranking
-   - `enabled`: boolean; enable LLM-based reranking
-   - `topN`: number; final number of docs to keep (optional)
-   - `windowSize`: number; number of docs considered before reranking
-   - `llmConfig`: optional LLM config for the reranker
- - Metadata
-   - `enrichment`: boolean; generate `summary`, `keywords`, `hypothetical_questions`
- - Callbacks
-   - `callbacks`: array of handlers; use `LoggingCallbackHandler` or `StructuredLoggingCallbackHandler`
- - Observability
-   - `enabled`: boolean; enable SQLite-based observability (default: false)
-   - `sqlitePath`: string; path to SQLite database file (default: 'vectra-observability.db')
-   - `projectId`: string; project identifier for multi-project support (default: 'default')
-   - `trackMetrics`: boolean; track latency and other metrics
-   - `trackTraces`: boolean; track detailed workflow traces
-   - `sessionTracking`: boolean; track chat sessions
- - Index Helpers (Postgres + Prisma)
-   - `ensureIndexes()`: creates ivfflat and GIN FTS indexes and optional `tsvector` trigger
-
- ### 2. Initialization & Ingestion
-
- ```javascript
- const { VectraClient } = require('vectra-js');
- const client = new VectraClient(config);
-
- // Ingest a file (supports .pdf, .docx, .txt, .md, .xlsx)
- // This will: Load -> Chunk -> Embed -> Store
- await client.ingestDocuments('./documents/employee_handbook.pdf');
- // Ensure indexes (Postgres + Prisma)
- if (config.database.type === 'prisma' && client.vectorStore.ensureIndexes) {
-   await client.vectorStore.ensureIndexes();
- }
- // Enable metadata enrichment
- // metadata: { enrichment: true }
+ await client.ingestDocuments('./docs');
+ const res = await client.queryRAG('What is the vacation policy?');
+ console.log(res.answer);
  ```

- ### Document Management
  ---

- ```javascript
- // List recent documents (by metadata filter)
- const docs = await client.listDocuments({ filter: { docTitle: 'Employee Handbook' }, limit: 50 });
+ ## 6. Core Concepts

- // Delete by ids or metadata filter
- await client.deleteDocuments({ ids: docs.map(d => d.id) });
- // or:
- await client.deleteDocuments({ filter: { absolutePath: '/abs/path/to/file.pdf' } });
+ ### Providers

- // Update existing docs (requires backend upsert support)
- await client.updateDocuments([
-   { id: docs[0].id, content: 'Updated content', metadata: { docTitle: 'Employee Handbook' } }
- ]);
- ```
+ Providers implement embeddings, generation, or both. Vectra normalizes outputs and streaming across providers.

- ### 3. Querying (Standard)
+ ### Vector Stores

- ```javascript
- const response = await client.queryRAG("What is the vacation policy?");
+ Vector stores persist embeddings and metadata. They are fully swappable via config.

- console.log("Answer:", response.answer);
- console.log("Sources:", response.sources); // Metadata of retrieved chunks
- ```
+ ### Chunking

- ### 4. Querying (Streaming)
+ * **Recursive**: Character‑aware, separator‑aware splitting
+ * **Agentic**: LLM‑driven semantic propositions (best for policies, legal docs)

- Ideal for Chat UIs. Returns an Async Generator of unified chunks.
+ ### Retrieval
+
+ Controls recall vs. precision using multiple strategies.
+
+ ### Reranking
+
+ Optional LLM‑based reordering of retrieved chunks.
+
+ ### Metadata Enrichment

- ```javascript
- const stream = await client.queryRAG("Draft a welcome email...", null, true);
+ Optional per‑chunk summaries, keywords, and hypothetical questions generated at ingestion time.

- for await (const chunk of stream) {
-   process.stdout.write(chunk.delta || "");
+ ### Query Planning & Grounding
+
+ Controls how context is assembled and how strictly answers must be grounded in retrieved text; a config sketch follows.
+
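+ A minimal config sketch, using the `queryPlanning` and `grounding` options documented in the 0.9.6 README (values illustrative):
+
+ ```js
+ const config = {
+   // Token budget, summary preference, and citation metadata for context assembly
+   queryPlanning: { tokenBudget: 2048, preferSummariesBelow: 1024, includeCitations: true },
+   // Extractive snippets; strict: true limits answers to grounded snippets only
+   grounding: { enabled: true, strict: false, maxSnippets: 4 }
+ };
+ ```
+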
+ ### Conversation Memory
+
+ Persist multi‑turn chat history across sessions.
+
+ ---
+
+ ## 7. Configuration Reference (Usage‑Driven)
+
+ > All configuration is validated using **Zod** at runtime.
+
+ ### Embedding
+
+ ```js
+ embedding: {
+   provider: ProviderType.OPENAI,
+   apiKey: process.env.OPENAI_API_KEY,
+   modelName: 'text-embedding-3-small',
+   dimensions: 1536
  }
  ```

- ### 5. Conversation Memory
-
- Enable multi-turn conversations by configuring memory and passing a `sessionId`.
-
- ```javascript
- // In config (enable memory: default is off)
- const config = {
-   // ...
-   memory: { enabled: true, type: 'in-memory', maxMessages: 10 }
- };
-
- // Redis-backed memory
- const redis = /* your redis client instance */;
- const configRedis = {
-   // ...
-   memory: {
-     enabled: true,
-     type: 'redis',
-     redis: { clientInstance: redis, keyPrefix: 'vectra:chat:' },
-     maxMessages: 20
-   }
- };
-
- // Postgres-backed memory
- const prisma = /* your Prisma client instance */;
- const configPostgres = {
-   // ...
-   memory: {
-     enabled: true,
-     type: 'postgres',
-     postgres: {
-       clientInstance: prisma,
-       tableName: 'ChatMessage',
-       columnMap: { sessionId: 'sessionId', role: 'role', content: 'content', createdAt: 'createdAt' }
-     },
-     maxMessages: 20
-   }
- };
+ Set `dimensions` to match your `pgvector(n)` column and avoid runtime dimension mismatches.

- // In your app:
- const sessionId = 'user-123-session-abc';
- const response = await client.queryRAG("What is the refund policy?", null, false, sessionId);
- const followUp = await client.queryRAG("Does it apply to sale items?", null, false, sessionId);
+ ---
+
+ ### LLM
+
+ ```js
+ llm: {
+   provider: ProviderType.GEMINI,
+   apiKey: process.env.GOOGLE_API_KEY,
+   modelName: 'gemini-1.5-pro-latest',
+   temperature: 0.3,
+   maxTokens: 1024
+ }
  ```

- ### 6. Production Evaluation
+ Used for:

- Measure the quality of your RAG pipeline using the built-in evaluation module.
+ * Answer generation
+ * HyDE & Multi‑Query
+ * Agentic chunking
+ * Reranking

- ```javascript
- const testSet = [
-   {
-     question: "What is the capital of France?",
-     expectedGroundTruth: "Paris is the capital of France."
-   }
- ];
+ ---

- const results = await client.evaluate(testSet);
+ ### Database

- console.log(`Faithfulness: ${results.averageFaithfulness}`);
- console.log(`Relevance: ${results.averageRelevance}`);
+ ```js
+ database: {
+   type: 'prisma',
+   clientInstance: prisma,
+   tableName: 'Document'
+ }
  ```

+ Supports Prisma, Chroma, Qdrant, Milvus.
+
  ---

- ## Supported Providers & Backends
-
- | Feature | OpenAI | Gemini | Anthropic | Ollama | OpenRouter | HuggingFace |
- | :--- | :---: | :---: | :---: | :---: | :---: | :---: |
- | **Embeddings** | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ |
- | **Generation** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
- | **Streaming** | ✅ | ✅ | ✅ | ✅ | ✅ | ⚠️ |
-
- ### Ollama (Local)
- - Use Ollama for local, offline development.
- - Set `provider = ProviderType.OLLAMA`.
- - Default `baseUrl` is `http://localhost:11434`.
- ```javascript
- const config = {
-   embedding: { provider: ProviderType.OLLAMA, modelName: 'nomic-embed-text' },
-   llm: { provider: ProviderType.OLLAMA, modelName: 'llama3' }
- };
- ```
+ ### Chunking

- ### OpenRouter (Generation)
- - Use OpenRouter as a unified generation provider.
- - Set `llm.provider = ProviderType.OPENROUTER`, `llm.modelName` to a supported model (e.g., `openai/gpt-4o`).
- - Provide `OPENROUTER_API_KEY`; optional attribution via `OPENROUTER_REFERER`, `OPENROUTER_TITLE`.
- ```javascript
- const config = {
-   llm: {
-     provider: ProviderType.OPENROUTER,
-     apiKey: process.env.OPENROUTER_API_KEY,
-     modelName: 'openai/gpt-4o',
-     defaultHeaders: {
-       'HTTP-Referer': 'https://your.app',
-       'X-Title': 'Your App'
-     }
-   }
- };
+ ```js
+ chunking: {
+   strategy: ChunkingStrategy.RECURSIVE,
+   chunkSize: 1000,
+   chunkOverlap: 200
+ }
  ```

- ### Database Schemas
+ Agentic chunking:

- **Prisma (PostgreSQL)**
- ```prisma
- model Document {
-   id        String   @id @default(uuid())
-   content   String
-   metadata  Json
-   vector    Unsupported("vector")? // pgvector type
-   createdAt DateTime @default(now())
+ ```js
+ chunking: {
+   strategy: ChunkingStrategy.AGENTIC,
+   agenticLlm: {
+     provider: ProviderType.OPENAI,
+     apiKey: process.env.OPENAI_API_KEY,
+     modelName: 'gpt-4o-mini'
+   }
  }
  ```

- **ChromaDB**
- No schema required; collections are created automatically.
-
  ---

- ## API Reference
-
- ### `new VectraClient(config)`
- Creates a new client instance. Throws an error if config is invalid.
-
- ### `client.ingestDocuments(path: string): Promise<void>`
- Reads a file **or recursively iterates a directory**, chunks content, embeds, and saves to the configured DB.
- - If `path` is a file: Ingests that single file.
- - If `path` is a directory: Recursively finds all supported files and ingests them.
-
- ### `client.queryRAG(query: string, filter?: object, stream?: boolean)`
- Performs the RAG pipeline:
- 1. **Retrieval**: Fetches relevant docs using `config.retrieval.strategy`.
- 2. **Reranking**: (Optional) Re-orders docs using `config.reranking`.
- 3. **Generation**: Sends context + query to LLM.
-
- **Returns**:
- * If `stream=false` (default): `{ answer: string | object, sources: object[] }`
- * If `stream=true`: `AsyncGenerator<{ delta: string, finish_reason: string | null, usage: any | null }>`
-
- ### Advanced Configuration
-
- - Query Planning
-   - `queryPlanning.tokenBudget`: number; total token budget for context
-   - `queryPlanning.preferSummariesBelow`: number; prefer metadata summaries under this budget
-   - `queryPlanning.includeCitations`: boolean; include titles/sections/pages in context
- - Grounding
-   - `grounding.enabled`: boolean; enable extractive snippet grounding
-   - `grounding.strict`: boolean; use only grounded snippets when true
-   - `grounding.maxSnippets`: number; max snippets to include
- - Generation
-   - `generation.structuredOutput`: `'none' | 'citations'`; enable inline citations
-   - `generation.outputFormat`: `'text' | 'json'`; return JSON when set to `json`
- - Prompts
-   - `prompts.query`: string template using `{{context}}` and `{{question}}`
-   - `prompts.reranking`: optional template for reranker prompt
- - Tracing
-   - `tracing.enable`: boolean; enable provider/DB/pipeline span hooks
+ ### Retrieval

- ### CLI
+ ```js
+ retrieval: { strategy: RetrievalStrategy.HYBRID }
+ ```

- Quickly ingest and query to validate configurations.
+ HYBRID is recommended for production. HyDE and Multi‑Query additionally require `retrieval.llmConfig` for query rewriting, as shown below.
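+
+ A sketch for Multi‑Query, adapted from the 0.9.6 README's retrieval example (HyDE is configured the same way):
+
+ ```js
+ retrieval: {
+   strategy: RetrievalStrategy.MULTI_QUERY,
+   // LLM used to rewrite/expand the query before retrieval
+   llmConfig: { provider: ProviderType.OPENAI, apiKey: process.env.OPENAI_API_KEY, modelName: 'gpt-4o-mini' }
+ }
+ ```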

- ```bash
- vectra ingest ./docs --config=./nodejs-test/config.json
- vectra query "What is our leave policy?" --config=./nodejs-test/config.json --stream
- ```
+ ---
+
+ ### Reranking

- ### Ingestion Rate Limiting
- - Toggle ingestion rate limiting via `config.ingestion`.
- ```javascript
- const config = {
-   // ...
-   ingestion: { rateLimitEnabled: true, concurrencyLimit: 5 }
- };
+ ```js
+ reranking: {
+   enabled: true,
+   windowSize: 20,
+   topN: 5
+ }
  ```

  ---

- ## Extending
+ ### Memory

- ### Custom Vector Store
- Inherit from `VectorStore` class and implement `addDocuments` and `similaritySearch`.
+ ```js
+ memory: { enabled: true, type: 'in-memory', maxMessages: 20 }
+ ```

- ```javascript
- const { VectorStore } = require('vectra-js/interfaces');
+ Redis and Postgres are supported.
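+
+ A Redis‑backed sketch, taken from the 0.9.6 README's memory section (`redis` is your instantiated client):
+
+ ```js
+ memory: {
+   enabled: true,
+   type: 'redis',
+   redis: { clientInstance: redis, keyPrefix: 'vectra:chat:' },
+   maxMessages: 20
+ }
+ ```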

- class MyCustomDB extends VectorStore {
-   async addDocuments(docs) { ... }
-   async similaritySearch(vector, k) { ... }
+ ---
+
+ ### Observability
+
+ ```js
+ observability: {
+   enabled: true,
+   sqlitePath: 'vectra-observability.db'
  }
  ```

  ---

- ## Developer Guide
+ ## 8. Ingestion Pipeline

- ### Setup
- - Use `pnpm` for package management.
- - Node.js 18+ recommended.
- - Install with `pnpm install`.
- - Lint with `pnpm run lint`.
+ ```js
+ await client.ingestDocuments('./documents');
+ ```

- ### Environment
- - `OPENAI_API_KEY`, `GOOGLE_API_KEY`, `ANTHROPIC_API_KEY` for providers.
- - Database client instance configured under `config.database.clientInstance`.
+ Supports files or directories.

- ### Architecture
- - Pipeline: Load → Chunk → Embed → Store → Retrieve → Rerank → Plan → Ground → Generate → Stream.
- - Core client: `VectraClient` (library export).
- - Configuration: `VectraConfig` (validated schema).
- - Vector store interface: `VectorStore` (extend to add custom stores).
- - Callbacks: `StructuredLoggingCallbackHandler` and custom handler support.
+ Formats: PDF, DOCX, XLSX, TXT, Markdown
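+
+ Ingestion throughput and enrichment are controlled in config; a sketch using the options documented in the 0.9.6 README:
+
+ ```js
+ const config = {
+   // ...
+   ingestion: { rateLimitEnabled: true, concurrencyLimit: 5 }, // max concurrent embedding requests
+   metadata: { enrichment: true } // per-chunk summary, keywords, hypothetical questions
+ };
+ ```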

- ### Retrieval Strategies
- - Supports NAIVE, HYDE, MULTI_QUERY, HYBRID (RRF fusion built-in).
+ ---

- ### Query Planning & Grounding
- - Context assembly respects `queryPlanning` (token budget, summary preference, citations).
- - Snippet extraction controlled by `grounding` (strict mode and max snippets).
+ ## 9. Querying & Streaming

- ### Streaming Interface
- - Unified streaming shape `{ delta, finish_reason, usage }` across OpenAI, Gemini, Anthropic.
+ ```js
+ const res = await client.queryRAG('Refund policy?');
+ ```

- ### Adding a Provider
- - Implement `embedDocuments`, `embedQuery`, `generate`, `generateStream`.
- - Ensure streaming yields `{ delta, finish_reason, usage }`.
- - Wire via `llm.provider` in config.
+ Streaming:

- ### Adding a Vector Store
- - Extend `VectorStore`; implement `addDocuments`, `similaritySearch`, optionally `hybridSearch`.
- - Select via `database.type` in config.
+ ```js
+ const stream = await client.queryRAG('Draft email', null, true);
+ for await (const chunk of stream) process.stdout.write(chunk.delta || '');
+ ```
+
+ Each streamed chunk has the unified shape `{ delta, finish_reason, usage }` across providers.

- ### Callbacks & Observability
- - Available events: `onIngestStart`, `onIngestEnd`, `onIngestSummary`, `onChunkingStart`, `onEmbeddingStart`, `onRetrievalStart`, `onRetrievalEnd`, `onRerankingStart`, `onRerankingEnd`, `onGenerationStart`, `onGenerationEnd`, `onError`.
- - Extend `StructuredLoggingCallbackHandler` to add error codes and payload sizes.
+ ---

- ### CLI
- - Binary `vectra` included with the package.
- - Ingest: `vectra ingest <path> --config=./config.json`.
- - Query: `vectra query "<text>" --config=./config.json --stream`.
+ ## 10. Conversation Memory

- ### Coding Conventions
- - CommonJS modules, flat ESLint config.
- - Follow existing naming: `chunkIndex` in JS; use consistent casing.
+ Pass a `sessionId` as the fourth argument to `queryRAG` to maintain context across turns:
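+
+ ```js
+ // Adapted from the 0.9.6 README's memory example
+ const sessionId = 'user-123-session-abc';
+ const response = await client.queryRAG('What is the refund policy?', null, false, sessionId);
+ const followUp = await client.queryRAG('Does it apply to sale items?', null, false, sessionId);
+ ```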

  ---

- ## Feature Guide
+ ## 11. Evaluation & Quality Measurement

- ### Embeddings
- - Providers: `OPENAI`, `GEMINI`.
- - Configure dimensions to match DB `pgvector(n)` when applicable.
- - Example:
- ```javascript
- const config = {
-   embedding: {
-     provider: ProviderType.OPENAI,
-     apiKey: process.env.OPENAI_API_KEY,
-     modelName: 'text-embedding-3-small',
-     dimensions: 1536
-   },
-   // ...
- };
+ ```js
+ const results = await client.evaluate([{ question: 'Capital of France?', expectedGroundTruth: 'Paris' }]);
+ console.log(results.averageFaithfulness, results.averageRelevance);
  ```

- ### Generation
- - Providers: `OPENAI`, `GEMINI`, `ANTHROPIC`.
- - Options: `temperature`, `maxTokens`.
- - Structured output: set `generation.outputFormat = 'json'` and parse `answer`.
- ```javascript
- const config = {
-   llm: { provider: ProviderType.GEMINI, apiKey: process.env.GOOGLE_API_KEY, modelName: 'gemini-1.5-pro-latest', temperature: 0.3 },
-   generation: { outputFormat: 'json', structuredOutput: 'citations' }
- };
- const client = new VectraClient(config);
- const res = await client.queryRAG('Summarize our policy with citations.');
- console.log(res.answer); // JSON object or string on fallback
- ```
+ Metrics:

- - OpenRouter usage:
- ```javascript
- const config = {
-   llm: {
-     provider: ProviderType.OPENROUTER,
-     apiKey: process.env.OPENROUTER_API_KEY,
-     modelName: 'openai/gpt-4o',
-     defaultHeaders: { 'HTTP-Referer': 'https://your.app', 'X-Title': 'Your App' }
-   }
- };
- ```
+ * Faithfulness
+ * Relevance

- ### Chunking
- - Strategies: `RECURSIVE`, `AGENTIC`.
- - Agentic requires `chunking.agenticLlm` config.
- ```javascript
- const config = {
-   chunking: {
-     strategy: ChunkingStrategy.AGENTIC,
-     agenticLlm: { provider: ProviderType.OPENAI, apiKey: process.env.OPENAI_API_KEY, modelName: 'gpt-4o-mini' },
-     chunkSize: 1200,
-     chunkOverlap: 200
-   }
- };
- ```
+ ---

- ### Retrieval
- - Strategies: `NAIVE`, `HYDE`, `MULTI_QUERY`, `HYBRID`.
- - HYDE/MULTI_QUERY require `retrieval.llmConfig`.
- - Example:
- ```javascript
- const config = {
-   retrieval: {
-     strategy: RetrievalStrategy.MULTI_QUERY,
-     llmConfig: { provider: ProviderType.OPENAI, apiKey: process.env.OPENAI_API_KEY, modelName: 'gpt-4o-mini' }
-   }
- };
- ```
+ ## 12. CLI

- ### Reranking
- - Enable LLM-based reranking to reorder results.
- ```javascript
- const config = {
-   reranking: {
-     enabled: true,
-     topN: 5,
-     windowSize: 20,
-     llmConfig: { provider: ProviderType.ANTHROPIC, apiKey: process.env.ANTHROPIC_API_KEY, modelName: 'claude-3-haiku' }
-   }
- };
- ```
+ ### Ingest & Query

- ### Metadata Enrichment
- - Add summaries, keywords, hypothetical questions during ingestion.
- ```javascript
- const config = { metadata: { enrichment: true } };
- await client.ingestDocuments('./docs/handbook.pdf');
+ ```bash
+ vectra ingest ./docs --config=./config.json
+ vectra query "What is our leave policy?" --config=./config.json --stream
  ```

- ### Query Planning
- - Control context assembly with token budget and summary preference.
- ```javascript
- const config = {
-   queryPlanning: { tokenBudget: 2048, preferSummariesBelow: 1024, includeCitations: true }
- };
- ```
+ ---

- ### Answer Grounding
- - Inject extractive snippets; use `strict` to only allow grounded quotes.
- ```javascript
- const config = { grounding: { enabled: true, strict: false, maxSnippets: 4 } };
- ```
+ ### WebConfig (Config Generator UI)

- ### Prompts
- - Provide a custom query template using `{{context}}` and `{{question}}`.
- ```javascript
- const config = {
-   prompts: { query: 'Use only the following context to answer.\nContext:\n{{context}}\n\nQ: {{question}}' }
- };
+ ```bash
+ vectra webconfig
  ```

- ### Streaming
- - Unified async generator with chunks `{ delta, finish_reason, usage }`.
- ```javascript
- const stream = await client.queryRAG('Draft a welcome email', null, true);
- for await (const chunk of stream) process.stdout.write(chunk.delta || '');
- ```
+ **WebConfig** launches a local web UI that:

- ### Filters
- - Limit retrieval to metadata fields.
- ```javascript
- const res = await client.queryRAG('Vacation policy', { docTitle: 'Employee Handbook' });
- ```
+ * Guides you through building a valid `vectra.config.json`
+ * Validates all options interactively
+ * Prevents misconfiguration

- ### Callbacks
- - Hook into pipeline stages for logging/metrics.
- ```javascript
- const { StructuredLoggingCallbackHandler } = require('vectra-js/src/callbacks');
- const config = { callbacks: [ new StructuredLoggingCallbackHandler() ] };
+ This is ideal for:
+
+ * First‑time setup
+ * Non‑backend users
+ * Sharing configs across teams
+
+ ---
+
+ ### Observability Dashboard
+
+ ```bash
+ vectra dashboard
  ```

+ The **Observability Dashboard** is a local web UI backed by SQLite that visualizes:
+
+ * Ingestion latency
+ * Query latency
+ * Retrieval & generation traces
+ * Chat sessions
+
+ It helps you:
+
+ * Debug RAG quality issues
+ * Understand latency bottlenecks
+ * Monitor production‑like workloads
+
+ ---
+
+ ## 13. Observability & Callbacks
+
  ### Observability

- Built-in SQLite-based observability to track metrics, traces, and sessions.
-
- ```javascript
- const config = {
-   // ...
-   observability: {
-     enabled: true,
-     sqlitePath: 'vectra-observability.db',
-     projectId: 'my-project',
-     trackMetrics: true,
-     trackTraces: true,
-     sessionTracking: true
-   }
- };
- ```
+ Tracks metrics, traces, and sessions automatically when enabled.

- This tracks:
- - **Metrics**: Latency (ingest, query).
- - **Traces**: Detailed spans for retrieval, generation, and ingestion workflows.
- - **Sessions**: Chat session history and last query tracking.
+ ### Callbacks

- ### Vector Stores
- - Prisma (Postgres + pgvector), Chroma, Qdrant, Milvus.
- - Configure `database.type`, `tableName`, `columnMap`, `clientInstance`.
- ```javascript
- const config = {
-   database: {
-     type: 'prisma',
-     clientInstance: prismaClient,
-     tableName: 'Document',
-     columnMap: { content: 'content', vector: 'embedding', metadata: 'metadata' }
-   }
- };
+ Lifecycle hooks (attach a handler as shown below):
+
+ * Ingestion
+ * Chunking
+ * Embedding
+ * Retrieval
+ * Reranking
+ * Generation
+ * Errors
+
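+ A sketch adapted from the 0.9.6 README's callbacks example (the import path may differ in current releases):
+
+ ```js
+ const { StructuredLoggingCallbackHandler } = require('vectra-js/src/callbacks');
+
+ const config = {
+   // ...
+   callbacks: [new StructuredLoggingCallbackHandler()] // logs each pipeline stage
+ };
+ ```
+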
+ ---
+
+ ## 14. Database Schemas & Indexing
+
+ ```prisma
+ model Document {
+   id        String   @id @default(uuid())
+   content   String
+   metadata  Json
+   vector    Unsupported("vector")?
+   createdAt DateTime @default(now())
+ }
  ```
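+
+ For Postgres + Prisma, the index helper described in the 0.9.6 README creates the ivfflat and GIN FTS indexes (plus an optional `tsvector` trigger):
+
+ ```js
+ // Only the Prisma-backed store exposes ensureIndexes()
+ if (config.database.type === 'prisma' && client.vectorStore.ensureIndexes) {
+   await client.vectorStore.ensureIndexes();
+ }
+ ```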
- ### HuggingFace (Embeddings & Generation)
- - Use HuggingFace Inference API for embeddings and generation.
- - Set `provider = ProviderType.HUGGINGFACE`, `modelName` to a supported model (e.g., `sentence-transformers/all-MiniLM-L6-v2` for embeddings, `tiiuae/falcon-7b-instruct` for generation).
- - Provide `HUGGINGFACE_API_KEY`.
- ```javascript
- const config = {
-   embedding: { provider: ProviderType.HUGGINGFACE, apiKey: process.env.HUGGINGFACE_API_KEY, modelName: 'sentence-transformers/all-MiniLM-L6-v2' },
-   llm: { provider: ProviderType.HUGGINGFACE, apiKey: process.env.HUGGINGFACE_API_KEY, modelName: 'tiiuae/falcon-7b-instruct' }
- };
+
+ ---
+
+ ## 15. Extending Vectra
+
+ ### Custom Vector Store
+
+ ```js
+ const { VectorStore } = require('vectra-js/interfaces');
+
+ class MyStore extends VectorStore {
+   async addDocuments(docs) { /* persist chunks and embeddings */ }
+   async similaritySearch(vector, k) { /* return top-k matches */ }
+ }
  ```
+
+ ---
+
+ ## 16. Architecture Overview
+
+ * `VectraClient`: orchestrator
+ * Typed config schema
+ * Interface‑driven providers & stores
+ * Unified streaming abstraction
+
+ ---
+
+ ## 17. Development & Contribution Guide
+
+ * Node.js 18+
+ * pnpm recommended
+ * Lint: `pnpm run lint`
+
+ ---
+
+ ## 18. Production Best Practices
+
+ * Match embedding `dimensions` to your pgvector column
+ * Prefer HYBRID retrieval
+ * Enable observability in staging
+ * Evaluate before changing chunk sizes
+
+ ---
+
+ **Vectra scales cleanly from local prototypes to production‑grade RAG platforms.**