vectra-js 0.9.7 → 0.9.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,657 +1,511 @@
  # Vectra (Node.js)

- A production-ready, provider-agnostic Node.js SDK for End-to-End RAG (Retrieval-Augmented Generation) pipelines.
-
- ## Features
-
- * **Multi-Provider Support**: First-class support for **OpenAI**, **Gemini**, **Anthropic**, **OpenRouter**, and **Hugging Face**.
- * **Modular Vector Store**:
-   * **Prisma**: Use your existing PostgreSQL database with `pgvector` (via Prisma).
-   * **Native PostgreSQL**: Direct connection to PostgreSQL using `pg` driver (no ORM required).
-   * **ChromaDB**: Native support for the open-source vector database.
-   * **Qdrant & Milvus**: Additional backends for portability.
-   * **Extensible**: Easily add others by extending the `VectorStore` class.
- * **Advanced Chunking**:
-   * **Recursive**: Smart splitting based on characters and separators.
-   * **Token-Aware**: Sentence/paragraph fallback and adaptive overlap based on local entropy.
-   * **Agentic**: Uses an LLM to split text into semantically complete propositions with JSON validation and dedupe.
- * **Advanced Retrieval Strategies**:
-   * **Naive**: Standard cosine similarity search.
-   * **HyDE (Hypothetical Document Embeddings)**: Generates a fake answer to the query and searches for that.
-   * **Multi-Query**: Generates multiple variations of the query to catch different phrasings.
-   * **Hybrid Search**: Combines semantic (pgvector) and lexical (FTS) results using **Reciprocal Rank Fusion (RRF)**.
-   * **MMR**: Diversifies results to reduce redundancy.
- * **Streaming**: Full support for token-by-token streaming responses.
- * **Reranking**: LLM-based reranking to re-order retrieved documents for maximum relevance.
- * **File Support**: Native parsing for PDF, DOCX, XLSX, TXT, and Markdown.
- * **Index Helpers**: ivfflat for pgvector, GIN FTS index, optional tsvector trigger.
- * **Embedding Cache**: SHA256 content-based cache to skip re-embedding.
- * **Batch Embeddings**: Gemini and OpenAI adapters support array inputs and dimension control.
- * **Metadata Enrichment**: Per-chunk summary, keywords, hypothetical questions; page and section mapping for PDFs/Markdown. Retrieval boosts matching keywords and uses summaries in prompts.
- * **Conversation Memory**: Built-in chat history management for context-aware multi-turn conversations.
- * **Production Evaluation**: Integrated evaluation module to measure RAG quality (Faithfulness, Relevance).
- * **Local LLMs**: First-class support for **Ollama** for local/offline development.
- * **Web Configuration UI**: Visual generator to create and validate your configuration file (`vectra webconfig`).
+ **Vectra** is a **production-grade, provider-agnostic Node.js SDK** for building **end-to-end Retrieval-Augmented Generation (RAG)** systems. It is designed for teams that need **flexibility, extensibility, correctness, and observability** across embeddings, vector databases, retrieval strategies, and LLM providers—without locking into a single vendor.
+
+ ![GitHub Release](https://img.shields.io/github/v/release/iamabhishek-n/vectra-js)
+ ![NPM Version](https://img.shields.io/npm/v/vectra-js)
+ ![NPM Downloads](https://img.shields.io/npm/dm/vectra-js)
+ [![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=iamabhishek-n_vectra-js&metric=alert_status)](https://sonarcloud.io/summary/new_code?id=iamabhishek-n_vectra-js)
+
+ If you find this project useful, consider supporting it:<br>
+ [![Star this project on GitHub](https://img.shields.io/github/stars/iamabhishek-n/vectra-js?style=social)](https://github.com/iamabhishek-n/vectra-js/stargazers)
+ [![Sponsor me on GitHub](https://img.shields.io/badge/Sponsor%20me%20on-GitHub-%23FFD43B?logo=github)](https://github.com/sponsors/iamabhishek-n)
+ [![Buy me a Coffee](https://img.shields.io/badge/Buy%20me%20a%20Coffee-%23FFDD00?logo=buy-me-a-coffee&logoColor=black)](https://www.buymeacoffee.com/iamabhishekn)
+
+ ## Table of Contents
+
+ * [1. Overview](#1-overview)
+ * [2. Design Goals & Philosophy](#2-design-goals--philosophy)
+ * [3. Feature Matrix](#3-feature-matrix)
+ * [4. Installation](#4-installation)
+ * [5. Quick Start](#5-quick-start)
+ * [6. Core Concepts](#6-core-concepts)
+   * [Providers](#providers)
+   * [Vector Stores](#vector-stores)
+   * [Chunking](#chunking)
+   * [Retrieval](#retrieval)
+   * [Reranking](#reranking)
+   * [Metadata Enrichment](#metadata-enrichment)
+   * [Query Planning & Grounding](#query-planning--grounding)
+   * [Conversation Memory](#conversation-memory)
+ * [7. Configuration Reference (Usage‑Driven)](#7-configuration-reference-usage-driven)
+ * [8. Ingestion Pipeline](#8-ingestion-pipeline)
+ * [9. Querying & Streaming](#9-querying--streaming)
+ * [10. Conversation Memory](#10-conversation-memory)
+ * [11. Evaluation & Quality Measurement](#11-evaluation--quality-measurement)
+ * [12. CLI](#12-cli)
+   * [Ingest & Query](#ingest--query)
+   * [WebConfig (Config Generator UI)](#webconfig-config-generator-ui)
+   * [Observability Dashboard](#observability-dashboard)
+ * [13. Observability & Callbacks](#13-observability--callbacks)
+ * [14. Database Schemas & Indexing](#14-database-schemas--indexing)
+ * [15. Extending Vectra](#15-extending-vectra)
+ * [16. Architecture Overview](#16-architecture-overview)
+ * [17. Development & Contribution Guide](#17-development--contribution-guide)
+ * [18. Production Best Practices](#18-production-best-practices)

  ---

- ## Installation
+ ## 1. Overview
+
+ Vectra provides a **fully modular RAG pipeline**:
+
+ ```
+ Load → Chunk → Embed → Store → Retrieve → Rerank → Plan → Ground → Generate → Stream
+ ```
+ <p align="center">
+   <img src="https://vectra.thenxtgenagents.com/vectraArch.png" alt="Vectra SDK Architecture" width="900">
+ </p>
+
+ <p align="center">
+   <em>Vectra SDK – End-to-End RAG Architecture</em>
+ </p>
+
+ Every stage is **explicitly configurable**, validated at runtime, and observable.
+
+ ### Key Characteristics
+
+ * Provider‑agnostic LLM & embedding layer
+ * Multiple vector backends (Postgres, Chroma, Qdrant, Milvus)
+ * Advanced retrieval strategies (HyDE, Multi‑Query, Hybrid RRF, MMR)
+ * Unified streaming interface
+ * Built‑in evaluation & observability
+ * CLI + SDK parity
+
+ ---
+
+ ## 2. Design Goals & Philosophy
+
+ ### Explicitness over Magic
+
+ Vectra avoids hidden defaults. Chunking, retrieval, grounding, memory, and generation behavior are always explicit.
+
+ ### Production‑First
+
+ Index helpers, rate limiting, embedding cache, observability, and evaluation are first‑class features.
+
+ ### Provider Neutrality
+
+ Swapping OpenAI → Gemini → Anthropic → Ollama requires **no application code changes**.
+
+ ### Extensibility
+
+ Every major subsystem (providers, vector stores, callbacks) is interface‑driven.
+
+ ---
+
+ ## 3. Feature Matrix
+
+ ### Providers
+
+ * **Embeddings**: OpenAI, Gemini, Ollama, HuggingFace
+ * **Generation**: OpenAI, Gemini, Anthropic, Ollama, OpenRouter, HuggingFace
+ * **Streaming**: Unified async generator
+
+ ### Vector Stores
+
+ * PostgreSQL (Prisma + pgvector)
+ * PostgreSQL (native `pg` driver)
+ * ChromaDB
+ * Qdrant
+ * Milvus
+
+ ### Retrieval Strategies
+
+ * Naive cosine similarity
+ * HyDE (Hypothetical Document Embeddings)
+ * Multi‑Query expansion
+ * Hybrid semantic + lexical (RRF)
+ * MMR diversification
+
+ ---
+
+ ## 4. Installation
+
+ ### Library

  ```bash
- # Library (npm)
  npm install vectra-js @prisma/client
- npm install chromadb # optional: ChromaDB backend
-
- # Library (pnpm)
+ # or
  pnpm add vectra-js @prisma/client
- pnpm add chromadb # optional
-
- # CLI (global install)
- npm i -g vectra-js # or: pnpm add -g vectra-js
+ ```

- # CLI (no global install)
- # Uses local project bin if vectra-js is installed
- npx vectra ingest ./docs --config=./config.json
+ Optional backends:

- # CLI (one-off run with pnpm dlx)
- pnpm dlx vectra-js vectra query "What is our leave policy?" --config=./config.json --stream
+ ```bash
+ npm install chromadb
  ```

- ---
+ ### CLI

- ## Usage Guide
+ ```bash
+ npm i -g vectra-js
+ # or
+ pnpm add -g vectra-js
+ ```

- ### 1. Configuration
+ ---

- The SDK uses a strictly typed configuration object (validated with Zod).
+ ## 5. Quick Start

- ```javascript
- const { ProviderType, ChunkingStrategy, RetrievalStrategy } = require('vectra-js');
+ ```js
+ const { VectraClient, ProviderType } = require('vectra-js');
+ const { PrismaClient } = require('@prisma/client');
+ const prisma = new PrismaClient(); // your instantiated Prisma client

- const config = {
-   // 1. Embedding Provider
+ const client = new VectraClient({
    embedding: {
      provider: ProviderType.OPENAI,
      apiKey: process.env.OPENAI_API_KEY,
-     modelName: 'text-embedding-3-small',
-     dimensions: 1536 // Optional
+     modelName: 'text-embedding-3-small'
    },
-
-   // 2. LLM Provider (for Generation)
    llm: {
      provider: ProviderType.GEMINI,
      apiKey: process.env.GOOGLE_API_KEY,
      modelName: 'gemini-1.5-pro-latest'
    },
-
-   // 3. Database (Modular)
    database: {
-     type: 'prisma', // or 'chroma'
-     clientInstance: prismaClient, // Your instantiated DB client
-     tableName: 'Document', // Table or Collection name
-     columnMap: { // Map SDK fields to your DB columns
-       content: 'text',
-       vector: 'embedding',
-       metadata: 'meta'
-     }
-   },
-
-   // 4. Chunking (Optional)
-   chunking: {
-     strategy: ChunkingStrategy.RECURSIVE,
-     chunkSize: 1000,
-     chunkOverlap: 200
-   },
-
-   // 5. Retrieval (Optional)
-   retrieval: {
-     strategy: RetrievalStrategy.HYBRID, // Uses RRF
-     llmConfig: { /* Config for query rewriting LLM */ }
+     type: 'prisma',
+     clientInstance: prisma,
+     tableName: 'Document'
    }
- };
- ```
+ });

- ### Configuration Reference
-
- - Embedding
-   - `provider`: one of `ProviderType.OPENAI`, `ProviderType.GEMINI`
-   - `apiKey`: provider API key string
-   - `modelName`: embedding model identifier
-   - `dimensions`: number; ensures vector size matches DB `pgvector(n)`
- - LLM
-   - `provider`: `ProviderType.OPENAI` | `ProviderType.GEMINI` | `ProviderType.ANTHROPIC` | `ProviderType.OLLAMA`
-   - `apiKey`: provider API key string (optional for Ollama)
-   - `modelName`: generation model identifier
-   - `baseUrl`: optional custom URL (e.g., for Ollama)
-   - `temperature`: number; optional sampling temperature
-   - `maxTokens`: number; optional max output tokens
- - Memory
-   - `enabled`: boolean; toggle memory on/off (default: false)
-   - `type`: `'in-memory' | 'redis' | 'postgres'`
-   - `maxMessages`: number; number of recent messages to retain (default: 20)
-   - `redis`: `{ clientInstance, keyPrefix }` where `keyPrefix` defaults to `'vectra:chat:'`
-   - `postgres`: `{ clientInstance, tableName, columnMap }` where `tableName` defaults to `'ChatMessage'` and `columnMap` maps `{ sessionId, role, content, createdAt }`
- - Ingestion
-   - `rateLimitEnabled`: boolean; toggle rate limiting on/off (default: false)
-   - `concurrencyLimit`: number; max concurrent embedding requests when enabled (default: 5)
-   - `mode`: `'skip' | 'append' | 'replace'`; idempotency behavior (default: `'skip'`)
- - Database
-   - `type`: `prisma` | `chroma` | `qdrant` | `milvus`
-   - `clientInstance`: instantiated client for the chosen backend
-   - `tableName`: table/collection name (Postgres/Qdrant/Milvus)
-   - `columnMap`: maps SDK fields to DB columns
-     - `content`: text column name
-     - `vector`: embedding vector column name (for Postgres pgvector)
-     - `metadata`: JSON column name for per-chunk metadata
- - Chunking
-   - `strategy`: `ChunkingStrategy.RECURSIVE` | `ChunkingStrategy.AGENTIC`
-   - `chunkSize`: number; preferred chunk size (characters)
-   - `chunkOverlap`: number; overlap between adjacent chunks (characters)
-   - `separators`: array of string separators to split on (optional)
- - Retrieval
-   - `strategy`: `RetrievalStrategy.NAIVE` | `HYDE` | `MULTI_QUERY` | `HYBRID` | `MMR`
-   - `llmConfig`: optional LLM config for query rewriting (HyDE/Multi-Query)
-   - `mmrLambda`: 0..1 tradeoff between relevance and diversity (default: 0.5)
-   - `mmrFetchK`: candidate pool size for MMR (default: 20)
- - Reranking
-   - `enabled`: boolean; enable LLM-based reranking
-   - `topN`: number; final number of docs to keep (optional)
-   - `windowSize`: number; number of docs considered before reranking
-   - `llmConfig`: optional LLM config for the reranker
- - Metadata
-   - `enrichment`: boolean; generate `summary`, `keywords`, `hypothetical_questions`
- - Callbacks
-   - `callbacks`: array of handlers; use `LoggingCallbackHandler` or `StructuredLoggingCallbackHandler`
- - Observability
-   - `enabled`: boolean; enable SQLite-based observability (default: false)
-   - `sqlitePath`: string; path to SQLite database file (default: 'vectra-observability.db')
-   - `projectId`: string; project identifier for multi-project support (default: 'default')
-   - `trackMetrics`: boolean; track latency and other metrics
-   - `trackTraces`: boolean; track detailed workflow traces
-   - `sessionTracking`: boolean; track chat sessions
- - Index Helpers (Postgres + Prisma)
-   - `ensureIndexes()`: creates ivfflat and GIN FTS indexes and optional `tsvector` trigger
-
-
- ### 2. Initialization & Ingestion
-
- ```javascript
- const { VectraClient } = require('vectra-js');
- const client = new VectraClient(config);
-
- // Ingest a file (supports .pdf, .docx, .txt, .md, .xlsx)
- // This will: Load -> Chunk -> Embed -> Store
- await client.ingestDocuments('./documents/employee_handbook.pdf');
- // Ensure indexes (Postgres + Prisma)
- if (config.database.type === 'prisma' && client.vectorStore.ensureIndexes) {
-   await client.vectorStore.ensureIndexes();
- }
- // Enable metadata enrichment
- // metadata: { enrichment: true }
+ await client.ingestDocuments('./docs');
+ const res = await client.queryRAG('What is the vacation policy?');
+ console.log(res.answer);
  ```

- ### Document Management
+ ---

- ```javascript
- // List recent documents (by metadata filter)
- const docs = await client.listDocuments({ filter: { docTitle: 'Employee Handbook' }, limit: 50 });
+ ## 6. Core Concepts

- // Delete by ids or metadata filter
- await client.deleteDocuments({ ids: docs.map(d => d.id) });
- // or:
- await client.deleteDocuments({ filter: { absolutePath: '/abs/path/to/file.pdf' } });
+ ### Providers

- // Update existing docs (requires backend upsert support)
- await client.updateDocuments([
-   { id: docs[0].id, content: 'Updated content', metadata: { docTitle: 'Employee Handbook' } }
- ]);
- ```
+ Providers implement embeddings, generation, or both. Vectra normalizes outputs and streaming across providers.

- ### 3. Querying (Standard)
+ ### Vector Stores

- ```javascript
- const response = await client.queryRAG("What is the vacation policy?");
+ Vector stores persist embeddings and metadata. They are fully swappable via config.

- console.log("Answer:", response.answer);
- console.log("Sources:", response.sources); // Metadata of retrieved chunks
- ```
+ ### Chunking

- ### 4. Querying (Streaming)
+ * **Recursive**: Character‑aware, separator‑aware splitting
+ * **Agentic**: LLM‑driven semantic propositions (best for policies, legal docs)

- Ideal for Chat UIs. Returns an Async Generator of unified chunks.
+ ### Retrieval
+
+ Controls recall vs precision using multiple strategies.
+
+ ### Reranking
+
+ Optional LLM‑based reordering of retrieved chunks.
+
+ ### Metadata Enrichment

- ```javascript
- const stream = await client.queryRAG("Draft a welcome email...", null, true);
+ Optional per‑chunk summaries, keywords, and hypothetical questions generated at ingestion time.

- for await (const chunk of stream) {
-   process.stdout.write(chunk.delta || "");
+ ### Query Planning & Grounding
+
+ Controls how context is assembled and how strictly answers must be grounded in retrieved text.
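+
+ A minimal sketch of the corresponding options (illustrative values; fields as documented in the 0.9.7 advanced configuration):
+
+ ```js
+ queryPlanning: { tokenBudget: 2048, preferSummariesBelow: 1024, includeCitations: true },
+ grounding: { enabled: true, strict: false, maxSnippets: 4 }
+ ```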
+
+ ### Conversation Memory
+
+ Persist multi‑turn chat history across sessions.
+
+ ---
+
+ ## 7. Configuration Reference (Usage‑Driven)
+
+ > All configuration is validated using **Zod** at runtime.
+
+ ### Embedding
+
+ ```js
+ embedding: {
+   provider: ProviderType.OPENAI,
+   apiKey: process.env.OPENAI_API_KEY,
+   modelName: 'text-embedding-3-small',
+   dimensions: 1536
  }
  ```

- ### 5. Conversation Memory
-
- Enable multi-turn conversations by configuring memory and passing a `sessionId`.
-
- ```javascript
- // In config (enable memory: default is off)
- const config = {
-   // ...
-   memory: { enabled: true, type: 'in-memory', maxMessages: 10 }
- };
-
- // Redis-backed memory
- const redis = /* your redis client instance */;
- const configRedis = {
-   // ...
-   memory: {
-     enabled: true,
-     type: 'redis',
-     redis: { clientInstance: redis, keyPrefix: 'vectra:chat:' },
-     maxMessages: 20
-   }
- };
-
- // Postgres-backed memory
- const prisma = /* your Prisma client instance */;
- const configPostgres = {
-   // ...
-   memory: {
-     enabled: true,
-     type: 'postgres',
-     postgres: {
-       clientInstance: prisma,
-       tableName: 'ChatMessage',
-       columnMap: { sessionId: 'sessionId', role: 'role', content: 'content', createdAt: 'createdAt' }
-     },
-     maxMessages: 20
-   }
- };
+ Set `dimensions` when using pgvector so the vector size matches the column definition.

- // In your app:
- const sessionId = 'user-123-session-abc';
- const response = await client.queryRAG("What is the refund policy?", null, false, sessionId);
- const followUp = await client.queryRAG("Does it apply to sale items?", null, false, sessionId);
+ ---
+
+ ### LLM
+
+ ```js
+ llm: {
+   provider: ProviderType.GEMINI,
+   apiKey: process.env.GOOGLE_API_KEY,
+   modelName: 'gemini-1.5-pro-latest',
+   temperature: 0.3,
+   maxTokens: 1024
+ }
  ```

- ### 6. Production Evaluation
+ Used for:

- Measure the quality of your RAG pipeline using the built-in evaluation module.
+ * Answer generation
+ * HyDE & Multi‑Query
+ * Agentic chunking
+ * Reranking

- ```javascript
- const testSet = [
-   {
-     question: "What is the capital of France?",
-     expectedGroundTruth: "Paris is the capital of France."
-   }
- ];
+ ---

- const results = await client.evaluate(testSet);
+ ### Database

- console.log(`Faithfulness: ${results.averageFaithfulness}`);
- console.log(`Relevance: ${results.averageRelevance}`);
+ ```js
+ database: {
+   type: 'prisma',
+   clientInstance: prisma,
+   tableName: 'Document'
+ }
  ```

+ Supports Prisma, Chroma, Qdrant, Milvus.
+
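+ To map SDK fields onto existing column names, add a `columnMap` (a sketch; mapping fields as documented for 0.9.7):
+
+ ```js
+ database: {
+   type: 'prisma',
+   clientInstance: prisma,
+   tableName: 'Document',
+   columnMap: { content: 'text', vector: 'embedding', metadata: 'meta' }
+ }
+ ```
+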
  ---

- ## Supported Providers & Backends
-
- | Feature | OpenAI | Gemini | Anthropic | Ollama | OpenRouter | HuggingFace |
- | :--- | :---: | :---: | :---: | :---: | :---: | :---: |
- | **Embeddings** | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ |
- | **Generation** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
- | **Streaming** | ✅ | ✅ | ✅ | ✅ | ✅ | ⚠️ |
-
- ### Ollama (Local)
- - Use Ollama for local, offline development.
- - Set `provider = ProviderType.OLLAMA`.
- - Default `baseUrl` is `http://localhost:11434`.
- ```javascript
- const config = {
-   embedding: { provider: ProviderType.OLLAMA, modelName: 'nomic-embed-text' },
-   llm: { provider: ProviderType.OLLAMA, modelName: 'llama3' }
- };
- ```
+ ### Chunking

- ### OpenRouter (Generation)
- - Use OpenRouter as a unified generation provider.
- - Set `llm.provider = ProviderType.OPENROUTER`, `llm.modelName` to a supported model (e.g., `openai/gpt-4o`).
- - Provide `OPENROUTER_API_KEY`; optional attribution via `OPENROUTER_REFERER`, `OPENROUTER_TITLE`.
- ```javascript
- const config = {
-   llm: {
-     provider: ProviderType.OPENROUTER,
-     apiKey: process.env.OPENROUTER_API_KEY,
-     modelName: 'openai/gpt-4o',
-     defaultHeaders: {
-       'HTTP-Referer': 'https://your.app',
-       'X-Title': 'Your App'
-     }
-   }
- };
+ ```js
+ chunking: {
+   strategy: ChunkingStrategy.RECURSIVE,
+   chunkSize: 1000,
+   chunkOverlap: 200
+ }
  ```

- ### Database Schemas
+ Agentic chunking:

- **Prisma (PostgreSQL)**
- ```prisma
- model Document {
-   id        String   @id @default(uuid())
-   content   String
-   metadata  Json
-   vector    Unsupported("vector")? // pgvector type
-   createdAt DateTime @default(now())
+ ```js
+ chunking: {
+   strategy: ChunkingStrategy.AGENTIC,
+   agenticLlm: {
+     provider: ProviderType.OPENAI,
+     apiKey: process.env.OPENAI_API_KEY,
+     modelName: 'gpt-4o-mini'
+   }
  }
  ```

- **ChromaDB**
- No schema required; collections are created automatically.
-
  ---

- ## API Reference
-
- ### `new VectraClient(config)`
- Creates a new client instance. Throws error if config is invalid.
-
- ### `client.ingestDocuments(path: string): Promise<void>`
- Reads a file **or recursively iterates a directory**, chunks content, embeds, and saves to the configured DB.
- - If `path` is a file: Ingests that single file.
- - If `path` is a directory: Recursively finds all supported files and ingests them.
-
- ### `client.queryRAG(query: string, filter?: object, stream?: boolean)`
- Performs the RAG pipeline:
- 1. **Retrieval**: Fetches relevant docs using `config.retrieval.strategy`.
- 2. **Reranking**: (Optional) Re-orders docs using `config.reranking`.
- 3. **Generation**: Sends context + query to LLM.
-
- **Returns**:
- * If `stream=false` (default): `{ answer: string | object, sources: object[] }`
- * If `stream=true`: `AsyncGenerator<{ delta: string, finish_reason: string | null, usage: any | null }>`
-
- ### Advanced Configuration
-
- - Query Planning
-   - `queryPlanning.tokenBudget`: number; total token budget for context
-   - `queryPlanning.preferSummariesBelow`: number; prefer metadata summaries under this budget
-   - `queryPlanning.includeCitations`: boolean; include titles/sections/pages in context
- - Grounding
-   - `grounding.enabled`: boolean; enable extractive snippet grounding
-   - `grounding.strict`: boolean; use only grounded snippets when true
-   - `grounding.maxSnippets`: number; max snippets to include
- - Generation
-   - `generation.structuredOutput`: `'none' | 'citations'`; enable inline citations
-   - `generation.outputFormat`: `'text' | 'json'`; return JSON when set to `json`
- - Prompts
-   - `prompts.query`: string template using `{{context}}` and `{{question}}`
-   - `prompts.reranking`: optional template for reranker prompt
- - Tracing
-   - `tracing.enable`: boolean; enable provider/DB/pipeline span hooks
+ ### Retrieval

- ### CLI
+ ```js
+ retrieval: { strategy: RetrievalStrategy.HYBRID }
+ ```

- Quickly ingest and query to validate configurations.
+ HYBRID is recommended for production.
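+
+ HyDE and Multi‑Query additionally need an LLM for query rewriting; a minimal sketch (model choice illustrative):
+
+ ```js
+ retrieval: {
+   strategy: RetrievalStrategy.MULTI_QUERY,
+   llmConfig: { provider: ProviderType.OPENAI, apiKey: process.env.OPENAI_API_KEY, modelName: 'gpt-4o-mini' }
+ }
+ ```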

- ```bash
- vectra ingest ./docs --config=./nodejs-test/config.json
- vectra query "What is our leave policy?" --config=./nodejs-test/config.json --stream
- ```
+ ---
+
+ ### Reranking

- ### Ingestion Rate Limiting
- - Toggle ingestion rate limiting via `config.ingestion`.
- ```javascript
- const config = {
-   // ...
-   ingestion: { rateLimitEnabled: true, concurrencyLimit: 5 }
- };
+ ```js
+ reranking: {
+   enabled: true,
+   windowSize: 20,
+   topN: 5
+ }
  ```

  ---

- ## Extending
+ ### Memory

- ### Custom Vector Store
- Inherit from `VectorStore` class and implement `addDocuments` and `similaritySearch`.
+ ```js
+ memory: { enabled: true, type: 'in-memory', maxMessages: 20 }
+ ```

- ```javascript
- const { VectorStore } = require('vectra-js/interfaces');
+ Redis- and Postgres-backed memory are also supported; a sketch of both follows.
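+
+ Field names as documented for 0.9.7; `redis`/`prisma` are your own instantiated clients:
+
+ ```js
+ // Redis-backed
+ memory: {
+   enabled: true,
+   type: 'redis',
+   redis: { clientInstance: redis, keyPrefix: 'vectra:chat:' },
+   maxMessages: 20
+ }
+
+ // Postgres-backed
+ memory: {
+   enabled: true,
+   type: 'postgres',
+   postgres: { clientInstance: prisma, tableName: 'ChatMessage' },
+   maxMessages: 20
+ }
+ ```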

- class MyCustomDB extends VectorStore {
-   async addDocuments(docs) { ... }
-   async similaritySearch(vector, k) { ... }
+ ---
+
+ ### Observability
+
+ ```js
+ observability: {
+   enabled: true,
+   sqlitePath: 'vectra-observability.db'
  }
  ```

  ---

- ## Developer Guide
+ ## 8. Ingestion Pipeline

- ### Setup
- - Use `pnpm` for package management.
- - Node.js 18+ recommended.
- - Install with `pnpm install`.
- - Lint with `pnpm run lint`.
+ ```js
+ await client.ingestDocuments('./documents');
+ ```

- ### Environment
- - `OPENAI_API_KEY`, `GOOGLE_API_KEY`, `ANTHROPIC_API_KEY` for providers.
- - Database client instance configured under `config.database.clientInstance`.
+ Accepts a single file or a directory (ingested recursively).

- ### Architecture
- - Pipeline: Load → Chunk → Embed → Store → Retrieve → Rerank → Plan → Ground → Generate → Stream.
- - Core client: `VectraClient` (library export).
- - Configuration: `VectraConfig` (validated schema).
- - Vector store interface: `VectorStore` (extend to add custom stores).
- - Callbacks: `StructuredLoggingCallbackHandler` and custom handler support.
+ Supported formats: PDF, DOCX, XLSX, TXT, and Markdown.
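+
+ Ingestion behavior is tunable via config; a sketch (options as documented for 0.9.7):
+
+ ```js
+ ingestion: { rateLimitEnabled: true, concurrencyLimit: 5, mode: 'skip' }, // throttling & idempotency
+ metadata: { enrichment: true } // per-chunk summaries, keywords, hypothetical questions
+ ```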

- ### Retrieval Strategies
- - Supports NAIVE, HYDE, MULTI_QUERY, HYBRID (RRF fusion built-in).
+ ---

- ### Query Planning & Grounding
- - Context assembly respects `queryPlanning` (token budget, summary preference, citations).
- - Snippet extraction controlled by `grounding` (strict mode and max snippets).
+ ## 9. Querying & Streaming

- ### Streaming Interface
- - Unified streaming shape `{ delta, finish_reason, usage }` across OpenAI, Gemini, Anthropic.
+ ```js
+ const res = await client.queryRAG('Refund policy?');
+ ```

- ### Adding a Provider
- - Implement `embedDocuments`, `embedQuery`, `generate`, `generateStream`.
- - Ensure streaming yields `{ delta, finish_reason, usage }`.
- - Wire via `llm.provider` in config.
+ Streaming:

- ### Adding a Vector Store
- - Extend `VectorStore`; implement `addDocuments`, `similaritySearch`, optionally `hybridSearch`.
- - Select via `database.type` in config.
+ ```js
+ const stream = await client.queryRAG('Draft email', null, true);
+ for await (const chunk of stream) process.stdout.write(chunk.delta || '');
+ ```
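+
+ Streamed chunks use the unified shape `{ delta, finish_reason, usage }`. You can also restrict retrieval with a metadata filter (the second argument):
+
+ ```js
+ const res = await client.queryRAG('Vacation policy', { docTitle: 'Employee Handbook' });
+ ```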

- ### Callbacks & Observability
- - Available events: `onIngestStart`, `onIngestEnd`, `onIngestSummary`, `onChunkingStart`, `onEmbeddingStart`, `onRetrievalStart`, `onRetrievalEnd`, `onRerankingStart`, `onRerankingEnd`, `onGenerationStart`, `onGenerationEnd`, `onError`.
- - Extend `StructuredLoggingCallbackHandler` to add error codes and payload sizes.
+ ---

- ### CLI
- - Binary `vectra` included with the package.
- - Ingest: `vectra ingest <path> --config=./config.json`.
- - Query: `vectra query "<text>" --config=./config.json --stream`.
+ ## 10. Conversation Memory

- ### Coding Conventions
- - CommonJS modules, flat ESLint config.
- - Follow existing naming: `chunkIndex` in JS; use consistent casing.
+ Pass a `sessionId` to maintain context across turns.
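+
+ For example (the fourth argument; session id value illustrative):
+
+ ```js
+ const sessionId = 'user-123-session-abc';
+ await client.queryRAG('What is the refund policy?', null, false, sessionId);
+ await client.queryRAG('Does it apply to sale items?', null, false, sessionId);
+ ```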

  ---

- ## Feature Guide
+ ## 11. Evaluation & Quality Measurement

- ### Embeddings
- - Providers: `OPENAI`, `GEMINI`.
- - Configure dimensions to match DB `pgvector(n)` when applicable.
- - Example:
- ```javascript
- const config = {
-   embedding: {
-     provider: ProviderType.OPENAI,
-     apiKey: process.env.OPENAI_API_KEY,
-     modelName: 'text-embedding-3-small',
-     dimensions: 1536
-   },
-   // ...
- };
+ ```js
+ const results = await client.evaluate([
+   { question: 'Capital of France?', expectedGroundTruth: 'Paris' }
+ ]);
+ console.log(results.averageFaithfulness, results.averageRelevance);
  ```

- ### Generation
- - Providers: `OPENAI`, `GEMINI`, `ANTHROPIC`.
- - Options: `temperature`, `maxTokens`.
- - Structured output: set `generation.outputFormat = 'json'` and parse `answer`.
- ```javascript
- const config = {
-   llm: { provider: ProviderType.GEMINI, apiKey: process.env.GOOGLE_API_KEY, modelName: 'gemini-1.5-pro-latest', temperature: 0.3 },
-   generation: { outputFormat: 'json', structuredOutput: 'citations' }
- };
- const client = new VectraClient(config);
- const res = await client.queryRAG('Summarize our policy with citations.');
- console.log(res.answer); // JSON object or string on fallback
- ```
+ Metrics:

- - OpenRouter usage:
- ```javascript
- const config = {
-   llm: {
-     provider: ProviderType.OPENROUTER,
-     apiKey: process.env.OPENROUTER_API_KEY,
-     modelName: 'openai/gpt-4o',
-     defaultHeaders: { 'HTTP-Referer': 'https://your.app', 'X-Title': 'Your App' }
-   }
- };
- ```
+ * Faithfulness
+ * Relevance

- ### Chunking
- - Strategies: `RECURSIVE`, `AGENTIC`.
- - Agentic requires `chunking.agenticLlm` config.
- ```javascript
- const config = {
-   chunking: {
-     strategy: ChunkingStrategy.AGENTIC,
-     agenticLlm: { provider: ProviderType.OPENAI, apiKey: process.env.OPENAI_API_KEY, modelName: 'gpt-4o-mini' },
-     chunkSize: 1200,
-     chunkOverlap: 200
-   }
- };
- ```
+ ---

- ### Retrieval
- - Strategies: `NAIVE`, `HYDE`, `MULTI_QUERY`, `HYBRID`.
- - HYDE/MULTI_QUERY require `retrieval.llmConfig`.
- - Example:
- ```javascript
- const config = {
-   retrieval: {
-     strategy: RetrievalStrategy.MULTI_QUERY,
-     llmConfig: { provider: ProviderType.OPENAI, apiKey: process.env.OPENAI_API_KEY, modelName: 'gpt-4o-mini' }
-   }
- };
- ```
+ ## 12. CLI

- ### Reranking
- - Enable LLM-based reranking to reorder results.
- ```javascript
- const config = {
-   reranking: {
-     enabled: true,
-     topN: 5,
-     windowSize: 20,
-     llmConfig: { provider: ProviderType.ANTHROPIC, apiKey: process.env.ANTHROPIC_API_KEY, modelName: 'claude-3-haiku' }
-   }
- };
- ```
+ ### Ingest & Query

- ### Metadata Enrichment
- - Add summaries, keywords, hypothetical questions during ingestion.
- ```javascript
- const config = { metadata: { enrichment: true } };
- await client.ingestDocuments('./docs/handbook.pdf');
+ ```bash
+ vectra ingest ./docs --config=./config.json
+ vectra query "What is our leave policy?" --config=./config.json --stream
  ```

- ### Query Planning
- - Control context assembly with token budget and summary preference.
- ```javascript
- const config = {
-   queryPlanning: { tokenBudget: 2048, preferSummariesBelow: 1024, includeCitations: true }
- };
- ```
+ ---

- ### Answer Grounding
- - Inject extractive snippets; use `strict` to only allow grounded quotes.
- ```javascript
- const config = { grounding: { enabled: true, strict: false, maxSnippets: 4 } };
- ```
+ ### WebConfig (Config Generator UI)

- ### Prompts
- - Provide a custom query template using `{{context}}` and `{{question}}`.
- ```javascript
- const config = {
-   prompts: { query: 'Use only the following context to answer.\nContext:\n{{context}}\n\nQ: {{question}}' }
- };
+ ```bash
+ vectra webconfig
  ```

- ### Streaming
- - Unified async generator with chunks `{ delta, finish_reason, usage }`.
- ```javascript
- const stream = await client.queryRAG('Draft a welcome email', null, true);
- for await (const chunk of stream) process.stdout.write(chunk.delta || '');
- ```
+ **WebConfig** launches a local web UI that:

- ### Filters
- - Limit retrieval to metadata fields.
- ```javascript
- const res = await client.queryRAG('Vacation policy', { docTitle: 'Employee Handbook' });
- ```
+ * Guides you through building a valid `vectra.config.json`
+ * Validates all options interactively
+ * Prevents misconfiguration

- ### Callbacks
- - Hook into pipeline stages for logging/metrics.
- ```javascript
- const { StructuredLoggingCallbackHandler } = require('vectra-js/src/callbacks');
- const config = { callbacks: [ new StructuredLoggingCallbackHandler() ] };
+ This is ideal for:
+
+ * First‑time setup
+ * Non‑backend users
+ * Sharing configs across teams
+
+ ---
+
+ ### Observability Dashboard
+
+ ```bash
+ vectra dashboard
  ```

+ The **Observability Dashboard** is a local web UI backed by SQLite that visualizes:
+
+ * Ingestion latency
+ * Query latency
+ * Retrieval & generation traces
+ * Chat sessions
+
+ It helps you:
+
+ * Debug RAG quality issues
+ * Understand latency bottlenecks
+ * Monitor production‑like workloads
+
+ ---
+
+ ## 13. Observability & Callbacks
+
  ### Observability

- Built-in SQLite-based observability to track metrics, traces, and sessions.
-
- ```javascript
- const config = {
-   // ...
-   observability: {
-     enabled: true,
-     sqlitePath: 'vectra-observability.db',
-     projectId: 'my-project',
-     trackMetrics: true,
-     trackTraces: true,
-     sessionTracking: true
-   }
- };
- ```
+ Tracks metrics, traces, and sessions automatically when enabled.

- This tracks:
- - **Metrics**: Latency (ingest, query).
- - **Traces**: Detailed spans for retrieval, generation, and ingestion workflows.
- - **Sessions**: Chat session history and last query tracking.
+ ### Callbacks

- ### Vector Stores
- - Prisma (Postgres + pgvector), Chroma, Qdrant, Milvus.
- - Configure `database.type`, `tableName`, `columnMap`, `clientInstance`.
- ```javascript
- const config = {
-   database: {
-     type: 'prisma',
-     clientInstance: prismaClient,
-     tableName: 'Document',
-     columnMap: { content: 'content', vector: 'embedding', metadata: 'metadata' }
-   }
- };
+ Lifecycle hooks (a registration sketch follows this list):
+
+ * Ingestion
+ * Chunking
+ * Embedding
+ * Retrieval
+ * Reranking
+ * Generation
+ * Errors
+
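+ A minimal sketch of registering a handler (import path as in the 0.9.7 docs):
+
+ ```js
+ const { StructuredLoggingCallbackHandler } = require('vectra-js/src/callbacks');
+
+ const config = { callbacks: [new StructuredLoggingCallbackHandler()] };
+ ```
+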
+ ---
+
+ ## 14. Database Schemas & Indexing
+
+ ```prisma
+ model Document {
+   id        String   @id @default(uuid())
+   content   String
+   metadata  Json
+   vector    Unsupported("vector")?
+   createdAt DateTime @default(now())
+ }
  ```
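+
+ For Postgres + Prisma, the index helpers described for 0.9.7 create the ivfflat and GIN FTS indexes (plus an optional `tsvector` trigger); a sketch:
+
+ ```js
+ if (client.vectorStore.ensureIndexes) {
+   await client.vectorStore.ensureIndexes(); // ivfflat + GIN FTS indexes
+ }
+ ```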
- ### HuggingFace (Embeddings & Generation)
- - Use HuggingFace Inference API for embeddings and generation.
- - Set `provider = ProviderType.HUGGINGFACE`, `modelName` to a supported model (e.g., `sentence-transformers/all-MiniLM-L6-v2` for embeddings, `tiiuae/falcon-7b-instruct` for generation).
- - Provide `HUGGINGFACE_API_KEY`.
- ```javascript
- const config = {
-   embedding: { provider: ProviderType.HUGGINGFACE, apiKey: process.env.HUGGINGFACE_API_KEY, modelName: 'sentence-transformers/all-MiniLM-L6-v2' },
-   llm: { provider: ProviderType.HUGGINGFACE, apiKey: process.env.HUGGINGFACE_API_KEY, modelName: 'tiiuae/falcon-7b-instruct' }
- };
+
+ ---
+
+ ## 15. Extending Vectra
+
+ ### Custom Vector Store
+
+ ```js
+ const { VectorStore } = require('vectra-js/interfaces');
+
+ class MyStore extends VectorStore {
+   async addDocuments(docs) { /* persist chunks and vectors */ }
+   async similaritySearch(vector, k) { /* return the top-k matches */ }
+ }
  ```
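+
+ Custom providers follow the same pattern; the 0.9.7 developer guide lists the required methods. A sketch (class shape illustrative):
+
+ ```js
+ class MyProvider {
+   async embedDocuments(texts) { /* return one vector per text */ }
+   async embedQuery(text) { /* return a single vector */ }
+   async generate(prompt) { /* return the full answer */ }
+   async *generateStream(prompt) {
+     // streaming must yield the unified chunk shape
+     yield { delta: '', finish_reason: null, usage: null };
+   }
+ }
+ ```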
+
+ ---
+
+ ## 16. Architecture Overview
+
+ * `VectraClient`: orchestrator
+ * Typed config schema
+ * Interface‑driven providers & stores
+ * Unified streaming abstraction
+
+ ---
+
+ ## 17. Development & Contribution Guide
+
+ * Node.js 18+
+ * pnpm recommended
+ * Lint: `pnpm run lint`
+
+ ---
+
+ ## 18. Production Best Practices
+
+ * Match embedding dimensions to pgvector
+ * Prefer HYBRID retrieval
+ * Enable observability in staging
+ * Evaluate before changing chunk sizes
+
+ ---
+
+ **Vectra scales cleanly from local prototypes to production‑grade RAG platforms.**