@msbayindir/context-rag 1.0.0-beta.6 → 1.0.0-beta.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -21,6 +21,235 @@
  | 🔎 **Hybrid Search** | Semantic (vector) + Keyword (full-text) search combination |
  | 🐘 **PostgreSQL Native** | No external vector DB needed, uses pgvector |
  | ⚡ **Batch Processing** | Concurrent processing with automatic retry |
+ | 🛡️ **Enterprise Error Handling** | Correlation IDs, graceful degradation, structured logging |
+
+ ---
+
+ ## 🏗️ Architecture
+
+ ```mermaid
+ flowchart TB
+     subgraph Input
+         PDF[📄 PDF Document]
+     end
+
+     subgraph Discovery["🔍 Discovery Phase"]
+         DA[Discovery Agent]
+         DA --> |Analyzes| PS[Prompt Strategy]
+         DA --> |Suggests| CS[Chunk Types]
+     end
+
+     subgraph Ingestion["📥 Ingestion Pipeline"]
+         GF[Gemini Files API]
+         BP[Batch Processor]
+         SE[Structured Extraction]
+         CR[Contextual Retrieval]
+         VE[Vector Embedding]
+     end
+
+     subgraph Storage["🗄️ PostgreSQL"]
+         PG[(pgvector)]
+         FT[Full-Text Index]
+     end
+
+     subgraph Retrieval["🔎 Search & Retrieval"]
+         HS[Hybrid Search]
+         RR[Reranker]
+         RR --> |Gemini or Cohere| RS[Ranked Results]
+     end
+
+     PDF --> DA
+     PS --> GF
+     GF --> |Cached URI| BP
+     BP --> SE
+     SE --> CR
+     CR --> |"Adds Context"| VE
+     VE --> PG
+     VE --> FT
+
+     Query[🔍 Query] --> HS
+     HS --> PG
+     HS --> FT
+     PG & FT --> RR
+     RS --> Response[📋 Contextual Results]
+ ```
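
In the diagram, hybrid search queries both pgvector and the full-text index and hands both result sets to the reranker (`PG & FT --> RR`), but the README does not say how the two ranked lists are merged. One common technique is reciprocal rank fusion (RRF). The sketch below is a generic RRF implementation, not Context-RAG's code; the chunk IDs are made up, and `k = 60` is the constant conventionally used with RRF.

```typescript
// Generic reciprocal rank fusion (RRF): merge several ranked lists of IDs.
// Each list adds 1 / (k + rank) to the score of every ID it contains (rank is 1-based).
function reciprocalRankFusion(rankedLists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first; ties broken by ID so the output is deterministic.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([id]) => id);
}

// Hypothetical chunk IDs from each retriever, best match first.
const vectorHits = ['c3', 'c1', 'c7'];
const keywordHits = ['c1', 'c9', 'c3'];

console.log(reciprocalRankFusion([vectorHits, keywordHits])); // [ 'c1', 'c3', 'c9', 'c7' ]
```

Because RRF looks only at ranks, it avoids having to normalise cosine similarities against full-text relevance scores, which live on different scales.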
+
+ ---
+
+ ## 🤔 Why Contextual Retrieval?
+
+ > **Problem:** Traditional RAG systems lose context when chunking documents. A chunk saying *"The inhibitor blocks Complex IV"* is meaningless without knowing it's from the *"Electron Transport Chain"* section.
+
+ ### The Anthropic Research
+
+ [Anthropic's Contextual Retrieval](https://www.anthropic.com/news/contextual-retrieval) post reports that prepending a short, chunk-specific context to each chunk before embedding and BM25 indexing substantially reduces retrieval failures:
+
+ | Method | Top-20 Retrieval Failure Rate | Reduction vs. Baseline |
+ |--------|-------------------------------|------------------------|
+ | Traditional RAG | 5.7% | - |
+ | + BM25 Hybrid | 4.5% | 21% |
+ | + Contextual Retrieval | 2.9% | **49%** |
+ | + Contextual + Reranking | 1.9% | **67%** |
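
The percentages in the last column are relative drops in failure rate against the 5.7% Traditional RAG baseline, i.e. (baseline - rate) / baseline. This small sketch recomputes them from the table's own numbers:

```typescript
// Failure rates (in percent) taken from the table.
const baselineFailureRate = 5.7; // Traditional RAG

const methods: Array<[string, number]> = [
  ['+ BM25 Hybrid', 4.5],
  ['+ Contextual Retrieval', 2.9],
  ['+ Contextual + Reranking', 1.9],
];

// Relative reduction (baseline - rate) / baseline, rounded to a whole percentage.
function relativeReduction(baseline: number, rate: number): number {
  return Math.round(((baseline - rate) / baseline) * 100);
}

for (const [name, rate] of methods) {
  console.log(`${name}: ${relativeReduction(baselineFailureRate, rate)}% fewer failures`);
}
// + BM25 Hybrid: 21% fewer failures
// + Contextual Retrieval: 49% fewer failures
// + Contextual + Reranking: 67% fewer failures
```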
+
+ ### How Context-RAG Implements This
+
+ ```typescript
+ // Before: raw chunk (context is lost)
+ const rawChunk = {
+   content: "The inhibitor blocks Complex IV",
+   // Where is this from? Which document? Which section?
+ };
+
+ // After: contextual chunk (Context-RAG)
+ const contextualChunk = {
+   content: "The inhibitor blocks Complex IV",
+   contextText: "This chunk is from 'Biochemistry 101', Chapter 5: Electron Transport Chain. It describes how cyanide inhibits cytochrome c oxidase (Complex IV), stopping ATP synthesis.",
+   enrichedContent: "[CONTEXT] ... [CONTENT] The inhibitor blocks Complex IV",
+ };
+ ```
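
A minimal sketch of how an `enrichedContent` string like the one above could be assembled, assuming the `[CONTEXT] ... [CONTENT] ...` layout shown in that example; the library's exact format may differ. `ContextualChunk` and `buildContextualChunk` are hypothetical names, and the context text is hard-coded here, whereas in a contextual-retrieval pipeline it would be generated per chunk (for example by an LLM, as in Anthropic's approach).

```typescript
// Hypothetical shape mirroring the fields in the example above.
interface ContextualChunk {
  content: string;
  contextText: string;
  enrichedContent: string;
}

// Hypothetical helper: keep the raw text, and prepend the chunk-specific
// context in a separate field that can be embedded and indexed.
function buildContextualChunk(content: string, contextText: string): ContextualChunk {
  return {
    content,
    contextText,
    enrichedContent: `[CONTEXT] ${contextText} [CONTENT] ${content}`,
  };
}

const chunk = buildContextualChunk(
  'The inhibitor blocks Complex IV',
  "This chunk is from 'Biochemistry 101', Chapter 5: Electron Transport Chain.",
);
console.log(chunk.enrichedContent);
// [CONTEXT] This chunk is from 'Biochemistry 101', Chapter 5: Electron Transport Chain. [CONTENT] The inhibitor blocks Complex IV
```

Keeping `content` separate from `enrichedContent` lets search match on the enriched text while results can still show the original chunk.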
+
+ ---
+
+ ## 📋 Real-World Use Cases
+
+ ### 🏥 1. Medical Education (TUS/USMLE Prep)
+
+ **Scenario:** Turkish medical students preparing for the TUS (Turkey's medical specialty entrance exam) with 500+ page biochemistry PDFs.
+
+ ```typescript
+ // Assumes `prisma` (a Prisma client instance) and `pdfBuffer` (the PDF file contents) are already defined.
+ const rag = new ContextRAG({
+   prisma,
+   geminiApiKey: process.env.GEMINI_API_KEY,
+   ragEnhancement: {
+     approach: 'anthropic_contextual', // Enable contextual retrieval
+     strategy: 'llm',
+     model: 'gemini-2.5-flash',
+   },
+ });
+
+ // Discovery: AI analyzes the PDF and suggests an extraction strategy
+ const discovery = await rag.discover({ file: pdfBuffer, filename: 'biochemistry.pdf' });
+
+ // Ingest with the discovered strategy
+ await rag.ingest({
+   file: pdfBuffer,
+   filename: 'biochemistry.pdf',
+   promptConfig: discovery.promptConfig, // AI-suggested prompts
+ });
+
+ // Students can now ask contextual questions, here in Turkish
+ const results = await rag.search({
+   query: 'Siyanür hangi kompleksi inhibe eder?', // "Which complex does cyanide inhibit?"
+   mode: 'hybrid',
+   useReranking: true,
+ });
+ // Returns: "Complex IV (Cytochrome c oxidase)" with full chapter context
+ ```
+
+ ### ⚖️ 2. Legal Document Analysis
+
+ **Scenario:** Law firms processing contracts, regulations, and case law.
+
+ ```typescript
+ // Custom extraction for legal documents
+ await rag.ingest({
+   file: contractPdf,
+   filename: 'service-agreement.pdf',
+   customPrompt: `
+     Extract the following from this legal document:
+     - CLAUSE: Individual contract clauses with section numbers
+     - DEFINITION: Defined terms and their meanings
+     - OBLIGATION: Parties' obligations and deadlines
+     - LIABILITY: Liability limitations and indemnifications
+   `,
+ });
+
+ // Search with type filtering
+ const liabilityClauses = await rag.search({
+   query: 'limitation of liability for indirect damages',
+   filters: { chunkTypes: ['LIABILITY', 'CLAUSE'] },
+   useReranking: true,
+ });
+ ```
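
The chunk type names appear twice in the legal example: once in the prompt text and once in `filters.chunkTypes`. As a hypothetical sketch (not part of the package), defining the types once as a typed constant keeps the two in sync and turns a misspelled filter value into a compile-time error. `LEGAL_CHUNK_TYPES`, `LegalChunkType`, and `buildLegalPrompt` are illustrative names.

```typescript
// Single source of truth for the chunk types named in the legal prompt.
const LEGAL_CHUNK_TYPES = {
  CLAUSE: 'Individual contract clauses with section numbers',
  DEFINITION: 'Defined terms and their meanings',
  OBLIGATION: "Parties' obligations and deadlines",
  LIABILITY: 'Liability limitations and indemnifications',
} as const;

type LegalChunkType = keyof typeof LEGAL_CHUNK_TYPES;

// Hypothetical helper: render the types in the same "- TYPE: description" layout.
function buildLegalPrompt(): string {
  const lines = Object.entries(LEGAL_CHUNK_TYPES).map(
    ([type, description]) => `- ${type}: ${description}`,
  );
  return ['Extract the following from this legal document:', ...lines].join('\n');
}

// A typo such as 'LIABILTY' here would fail type-checking.
const liabilityFilter: LegalChunkType[] = ['LIABILITY', 'CLAUSE'];

console.log(buildLegalPrompt());
console.log(liabilityFilter.join(', '));
```

The generated string and `liabilityFilter` would then stand in for the literal `customPrompt` and `chunkTypes` values in the calls above.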
+
+ ### 🏢 3. Enterprise Knowledge Base
+
+ **Scenario:** Company onboarding with internal policies, procedures, and technical docs.
+
+ ```typescript
+ // Process multiple document types
+ // Assumes `docs` maps each filename to its file contents (e.g. a Record<string, Buffer>)
+ for (const doc of ['hr-policy.pdf', 'security-guidelines.pdf', 'api-docs.pdf']) {
+   const discovery = await rag.discover({ file: docs[doc], filename: doc });
+   await rag.ingest({
+     file: docs[doc],
+     filename: doc,
+     promptConfig: discovery.promptConfig,
+     experimentId: 'knowledge-base-v1', // Group related documents
+   });
+ }
+
+ // Employees search across all documents
+ const results = await rag.search({
+   query: 'What is the vacation policy for remote employees?',
+   mode: 'hybrid',
+ });
+ ```
+
+ ---
+
+ ## 🛡️ Enterprise Error Handling
+
+ Context-RAG implements production-grade error handling with full traceability:
+
+ ### Correlation IDs
+
+ Every operation is tracked with a unique correlation ID for debugging:
+
+ ```typescript
+ import { generateCorrelationId, setCorrelationId } from '@msbayindir/context-rag';
+
+ // Set a correlation ID for request tracing
+ const correlationId = generateCorrelationId(); // e.g. crag_1737470109_abc123
+ setCorrelationId(correlationId);
+
+ // All logs and errors now include this ID
+ // [2026-01-21T18:00:00.000Z] [INFO] Starting ingestion {"correlationId":"crag_1737470109_abc123"}
+ ```
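
The README does not show how `generateCorrelationId` and `setCorrelationId` work internally. As a rough, hypothetical sketch of the pattern in Node.js: generate IDs in the same `crag_<timestamp>_<suffix>` shape as the example (assuming the middle segment is Unix time in seconds) and scope each ID to one async call chain with `AsyncLocalStorage`, so concurrent requests cannot overwrite each other's ID. `newCorrelationId`, `runWithCorrelationId`, and `logInfo` are illustrative names, not package exports.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomBytes } from 'node:crypto';

// Holds the correlation ID for the current async call chain.
const correlationStore = new AsyncLocalStorage<string>();

// Hypothetical generator: "crag_" + Unix seconds + 6 random hex characters.
function newCorrelationId(): string {
  const seconds = Math.floor(Date.now() / 1000);
  return `crag_${seconds}_${randomBytes(3).toString('hex')}`;
}

// Run `fn` with `id` visible to everything it calls or awaits.
function runWithCorrelationId<T>(id: string, fn: () => T): T {
  return correlationStore.run(id, fn);
}

// Structured log line in the same shape as the README's example output.
function logInfo(message: string): string {
  const meta = JSON.stringify({ correlationId: correlationStore.getStore() });
  const line = `[${new Date().toISOString()}] [INFO] ${message} ${meta}`;
  console.log(line);
  return line;
}

runWithCorrelationId(newCorrelationId(), () => {
  logInfo('Starting ingestion');
});
```

Compared with a single module-level "current ID" variable, `AsyncLocalStorage` keeps each request's ID isolated even when several ingestions run at once.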
+
+ ### Custom Error Classes
+
+ ```typescript
+ import {
+   IngestionError,
+   RerankingError,
+   ConfigurationError,
+   RateLimitError
+ } from '@msbayindir/context-rag';
+
+ try {
+   await rag.ingest({ file: pdfBuffer, filename: 'doc.pdf' });
+ } catch (error) {
+   if (error instanceof RateLimitError) {
+     console.log(`Rate limited. Retry after ${error.retryAfterMs}ms`);
+     console.log(`Correlation ID: ${error.correlationId}`);
+   } else if (error instanceof IngestionError) {
+     console.log(`Ingestion failed at batch ${error.batchIndex}`);
+     console.log(`Retryable: ${error.retryable}`);
+   } else {
+     throw error; // Don't swallow unexpected errors
+   }
+ }
+ ```
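
The `retryAfterMs` and `retryable` fields suggest a simple retry policy. The following is a hypothetical sketch: `RateLimitErrorLike`, `IngestionErrorLike`, and `withRetry` are local stand-ins defined so the snippet runs on its own, not the package's classes. It waits `retryAfterMs` after a rate limit, retries other failures only when `retryable` is true, and gives up after `maxAttempts`.

```typescript
// Local stand-ins mirroring only the fields used in the example above.
class RateLimitErrorLike extends Error {
  readonly retryAfterMs: number;
  constructor(retryAfterMs: number) {
    super(`Rate limited; retry after ${retryAfterMs}ms`);
    this.retryAfterMs = retryAfterMs;
  }
}

class IngestionErrorLike extends Error {
  readonly batchIndex: number;
  readonly retryable: boolean;
  constructor(batchIndex: number, retryable: boolean) {
    super(`Ingestion failed at batch ${batchIndex}`);
    this.batchIndex = batchIndex;
    this.retryable = retryable;
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry rate-limited calls after retryAfterMs and retryable ingestion failures
// after a fixed delay; rethrow anything else, or the last error once attempts run out.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  fallbackDelayMs = 1000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const canRetry =
        error instanceof RateLimitErrorLike ||
        (error instanceof IngestionErrorLike && error.retryable);
      if (!canRetry || attempt >= maxAttempts) throw error;
      await sleep(error instanceof RateLimitErrorLike ? error.retryAfterMs : fallbackDelayMs);
    }
  }
}
```

In real code the `instanceof` checks would target the package's `RateLimitError` and `IngestionError` rather than these stand-ins.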
+
+ ### Health Check
+
+ ```typescript
+ const health = await rag.healthCheck();
+ // {
+ //   status: 'healthy',
+ //   database: true,
+ //   pgvector: true,
+ //   reranking: { enabled: true, provider: 'gemini', configured: true }
+ // }
+ ```
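
A health check like this is often wired to a readiness probe. In this hypothetical sketch the `HealthReport` type is inferred only from the sample output above (which shows only the `'healthy'` status value, so any other value is treated as unhealthy), and `toHttpStatus` is an illustrative helper, not part of the package.

```typescript
// Shape inferred from the sample healthCheck() output; the real type may have more fields.
interface HealthReport {
  status: string;
  database: boolean;
  pgvector: boolean;
  reranking: { enabled: boolean; provider: string; configured: boolean };
}

// 200 only when the report says healthy and both storage checks passed; otherwise 503.
function toHttpStatus(report: HealthReport): 200 | 503 {
  return report.status === 'healthy' && report.database && report.pgvector ? 200 : 503;
}

const sample: HealthReport = {
  status: 'healthy',
  database: true,
  pgvector: true,
  reranking: { enabled: true, provider: 'gemini', configured: true },
};

console.log(toHttpStatus(sample)); // 200
```

Reranking is deliberately left out of the decision here, on the assumption that search can still return unreranked results if the reranker is unavailable.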
 
  ---
 
package/dist/bin/cli.js CHANGED
File without changes