@vibe-agent-toolkit/vat-development-agents 0.1.13 → 0.1.14

@@ -0,0 +1,770 @@
1
+ ---
2
+ title: RAG Usage Guide
3
+ description: Practical examples for using the VAT RAG system in real-world scenarios
4
+ category: guide
5
+ tags: [rag, documentation, examples, configuration]
6
+ audience: intermediate
7
+ ---
8
+
9
+ # RAG Usage Guide
10
+
11
+ This guide shows how to index, query, and configure the VAT RAG system, with worked examples for agent integration, CI builds, content transforms, and document storage.
12
+
13
+ ---
14
+
15
+ ## Table of Contents
16
+
17
+ 1. [Quick Start](#quick-start)
18
+ 2. [Configuration Examples](#configuration-examples)
19
+ 3. [Agent Integration](#agent-integration)
20
+ 4. [Advanced Patterns](#advanced-patterns)
21
+ 5. [Content Transform](#content-transform)
22
+ 6. [Document Storage](#document-storage)
23
+ 7. [Example Projects](#example-projects)
24
+
25
+ ---
26
+
27
+ ## Quick Start
28
+
29
+ ### 1. Index Your Documentation
30
+
31
+ ```bash
32
+ # Index all markdown files in docs/
33
+ vat rag index docs/
34
+
35
+ # Output:
36
+ # status: success
37
+ # resourcesIndexed: 42
38
+ # chunksCreated: 156
39
+ # duration: 2134ms
40
+ ```
41
+
42
+ ### 2. Search Your Documentation
43
+
44
+ ```bash
45
+ # Ask a question in natural language
46
+ vat rag query "How do I configure agent tools?"
47
+
48
+ # Output shows relevant chunks:
49
+ # status: success
50
+ # chunks:
51
+ # - content: "Agent tools are configured in the spec.tools section..."
52
+ # filePath: docs/agent-configuration.md
53
+ # headingPath: Configuration > Tools
54
+ ```
55
+
56
+ ### 3. View Database Statistics
57
+
58
+ ```bash
59
+ vat rag stats
60
+
61
+ # Output:
62
+ # totalChunks: 156
63
+ # totalResources: 42
64
+ # embeddingModel: Xenova/all-MiniLM-L6-v2
65
+ # dbSizeBytes: 2458624
66
+ ```
67
+
68
+ ---
69
+
70
+ ## Configuration Examples
71
+
72
+ ### Example 1: Simple Project
73
+
74
+ **Use Case**: Single documentation directory, one RAG store
75
+
76
+ **vibe-agent-toolkit.config.yaml**:
77
+
78
+ ```yaml
79
+ version: 1
80
+
81
+ resources:
82
+   collections:
83
+     docs:
84
+       include:
85
+         - ./docs/**/*.md
86
+         - ./README.md
87
+
88
+ rag:
89
+   stores:
90
+     main:
91
+       db: ./.rag-db
92
+       resources: docs
93
+ ```
94
+
95
+ **Usage**:
96
+
97
+ ```bash
98
+ # Index uses config automatically
99
+ vat rag index
100
+
101
+ # Query uses default store
102
+ vat rag query "installation guide"
103
+ ```
104
+
105
+ ### Example 2: Multi-Language Documentation
106
+
107
+ **Use Case**: Separate RAG stores for different languages
108
+
109
+ **Config**:
110
+
111
+ ```yaml
112
+ version: 1
113
+
114
+ resources:
115
+   collections:
116
+     docs-en:
117
+       include:
118
+         - ./docs/en/**/*.md
119
+     docs-fr:
120
+       include:
121
+         - ./docs/fr/**/*.md
122
+     docs-es:
123
+       include:
124
+         - ./docs/es/**/*.md
125
+
126
+ rag:
127
+   stores:
128
+     en-rag:
129
+       db: ./dist/rag-en
130
+       resources: docs-en
131
+     fr-rag:
132
+       db: ./dist/rag-fr
133
+       resources: docs-fr
134
+     es-rag:
135
+       db: ./dist/rag-es
136
+       resources: docs-es
137
+ ```
138
+
139
+ **Usage**:
140
+
141
+ ```bash
142
+ # Index each language separately
143
+ vat rag index --db ./dist/rag-en docs/en/
144
+ vat rag index --db ./dist/rag-fr docs/fr/
145
+ vat rag index --db ./dist/rag-es docs/es/
146
+
147
+ # Query by language
148
+ vat rag query "installation" --db ./dist/rag-en
149
+ vat rag query "installation" --db ./dist/rag-fr
150
+ ```
151
+
152
+ ### Example 3: API Documentation + Examples
153
+
154
+ **Use Case**: Separate stores for API reference vs usage examples
155
+
156
+ **Config**:
157
+
158
+ ```yaml
159
+ version: 1
160
+
161
+ resources:
162
+   defaults:
163
+     exclude:
164
+       - '**/node_modules/**'
165
+       - '**/dist/**'
166
+
167
+   collections:
168
+     api-reference:
169
+       include:
170
+         - ./api-docs/**/*.md
171
+       metadata:
172
+         defaults:
173
+           type: api-reference
174
+           tags: [api, reference]
175
+
176
+     examples:
177
+       include:
178
+         - ./examples/**/*.{md,js,ts}
179
+       metadata:
180
+         defaults:
181
+           type: example
182
+           tags: [example, tutorial]
183
+
184
+ rag:
185
+   defaults:
186
+     embedding:
187
+       provider: openai
188
+       model: text-embedding-3-small
189
+     chunking:
190
+       targetSize: 256 # Smaller chunks for API docs
191
+
192
+   stores:
193
+     api-rag:
194
+       db: ./dist/api-rag
195
+       resources: api-reference
196
+
197
+     examples-rag:
198
+       db: ./dist/examples-rag
199
+       resources: examples
200
+       chunking:
201
+         targetSize: 512 # Larger chunks for examples
202
+ ```
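
The config above sets `chunking.targetSize` once under `rag.defaults` and again on the `examples-rag` store. The guide does not spell out how the two are combined, so the following is only a sketch under the assumption that store-level keys override the defaults one section at a time. The type names and `resolveStoreSettings` are hypothetical and are not part of the toolkit.

```typescript
// Hypothetical shapes mirroring the YAML keys above; not the toolkit's real types.
interface ChunkingConfig { targetSize?: number; paddingFactor?: number }
interface EmbeddingConfig { provider?: string; model?: string }
interface StoreSettings { embedding?: EmbeddingConfig; chunking?: ChunkingConfig }

// Assumption: a store's own keys win over rag.defaults, section by section.
function resolveStoreSettings(defaults: StoreSettings, store: StoreSettings): StoreSettings {
  return {
    embedding: { ...defaults.embedding, ...store.embedding },
    chunking: { ...defaults.chunking, ...store.chunking },
  };
}

const ragDefaults: StoreSettings = {
  embedding: { provider: 'openai', model: 'text-embedding-3-small' },
  chunking: { targetSize: 256 },
};

// api-rag declares no overrides; examples-rag overrides only the chunk size.
const apiRag = resolveStoreSettings(ragDefaults, {});
const examplesRag = resolveStoreSettings(ragDefaults, { chunking: { targetSize: 512 } });

console.log(apiRag.chunking?.targetSize);      // 256
console.log(examplesRag.chunking?.targetSize); // 512
console.log(examplesRag.embedding?.model);     // text-embedding-3-small
```

If the real merge is deeper or shallower than this, the effective settings for a store would differ, so it is worth confirming with `vat rag stats` after indexing.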
203
+
204
+ **Usage**:
205
+
206
+ ```bash
207
+ # Index both stores
208
+ vat rag index --db ./dist/api-rag api-docs/
209
+ vat rag index --db ./dist/examples-rag examples/
210
+
211
+ # Search API docs
212
+ vat rag query "authentication endpoint" --db ./dist/api-rag
213
+
214
+ # Search examples
215
+ vat rag query "authentication example" --db ./dist/examples-rag
216
+ ```
217
+
218
+ ### Example 4: Agent Development Project
219
+
220
+ **Use Case**: Agent toolkit with multiple agents and shared knowledge base
221
+
222
+ **Config**:
223
+
224
+ ```yaml
225
+ version: 1
226
+
227
+ resources:
228
+   defaults:
229
+     exclude:
230
+       - '**/node_modules/**'
231
+       - '**/dist/**'
232
+       - '**/.git/**'
233
+
234
+   collections:
235
+     toolkit-docs:
236
+       include:
237
+         - ./docs/**/*.md
238
+         - ./README.md
239
+     agent-guides:
240
+       include:
241
+         - ./guides/**/*.md
242
+     api-reference:
243
+       include:
244
+         - ./api/**/*.md
245
+
246
+ agents:
247
+   include:
248
+     - ./packages/*/agents/**
249
+     - ./agents/**
250
+
251
+ rag:
252
+   defaults:
253
+     embedding:
254
+       provider: transformers-js
255
+       model: Xenova/all-MiniLM-L6-v2
256
+     chunking:
257
+       targetSize: 512
258
+       paddingFactor: 0.9
259
+
260
+   stores:
261
+     agent-knowledge:
262
+       db: ./dist/agent-knowledge-rag
263
+       resources: toolkit-docs
264
+     guide-rag:
265
+       db: ./dist/guide-rag
266
+       resources: agent-guides
267
+     api-rag:
268
+       db: ./dist/api-rag
269
+       resources: api-reference
270
+       chunking:
271
+         targetSize: 256 # Smaller for API docs
272
+ ```
273
+
274
+ ---
275
+
276
+ ## Agent Integration
277
+
278
+ ### Example: Code Review Agent with RAG
279
+
280
+ **Agent Manifest** (`agent.yaml`):
281
+
282
+ ```yaml
283
+ apiVersion: v1
284
+ kind: Agent
285
+ metadata:
286
+   name: code-review-agent
287
+   version: 1.0.0
288
+   description: Reviews code against best practices documentation
289
+
290
+ spec:
291
+   llm:
292
+     provider: anthropic
293
+     model: claude-sonnet-4-20250514
294
+
295
+   prompts:
296
+     system:
297
+       $ref: ./prompts/system.md
298
+
299
+   tools:
300
+     - name: search_best_practices
301
+       type: rag
302
+       config:
303
+         dbPath: ../../dist/best-practices-rag
304
+       description: Search coding best practices documentation
305
+ ```
306
+
307
+ **System Prompt** (`prompts/system.md`):
308
+
309
+ ```markdown
310
+ You are a code review assistant. When reviewing code:
311
+
312
+ 1. Use the search_best_practices tool to find relevant documentation
313
+ 2. Compare code against documented best practices
314
+ 3. Provide specific, actionable feedback
315
+
316
+ Example tool usage:
317
+ - Query: "error handling best practices"
318
+ - Query: "async await patterns"
319
+ - Query: "security guidelines"
320
+ ```
321
+
322
+ **TypeScript Integration**:
323
+
324
+ ```typescript
325
+ import { loadAgentManifest } from '@vibe-agent-toolkit/agent-config';
326
+ import { createRAGProvider } from '@vibe-agent-toolkit/rag-lancedb';
327
+
328
+ const manifest = await loadAgentManifest('./code-review-agent');
329
+
330
+ // Create RAG provider for tool
331
+ const ragTool = manifest.spec.tools.find(t => t.type === 'rag');
332
+ const ragProvider = await createRAGProvider({
333
+ dbPath: ragTool.config.dbPath,
334
+ readonly: true,
335
+ });
336
+
337
+ // Agent uses RAG during review
338
+ async function reviewCode(code: string) {
339
+ // Search for relevant best practices
340
+ const practices = await ragProvider.query({
341
+ text: `best practices for: ${extractTopics(code)}`,
342
+ limit: 5,
343
+ });
344
+
345
+ // Use practices as context for LLM review
346
+ const review = await callLLM({
347
+ prompt: `Review this code using these best practices:\n\n${formatPractices(practices)}\n\nCode:\n${code}`,
348
+ });
349
+
350
+ return review;
351
+ }
352
+ ```
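
The integration snippet above calls `formatPractices` without defining it. A minimal sketch of such a helper follows, assuming the query result exposes a `chunks` array whose items carry the `content`, `filePath`, and `headingPath` fields shown in the CLI output earlier. The `Chunk` and `QueryResult` types are hypothetical stand-ins, not the toolkit's exported types.

```typescript
// Hypothetical chunk shape, based on the fields shown in the `vat rag query` output.
interface Chunk { content: string; filePath: string; headingPath?: string }
interface QueryResult { chunks: Chunk[] }

// Render each chunk as a numbered block labelled with its source,
// so the LLM (and a human reader) can tell where each passage came from.
function formatPractices(result: QueryResult): string {
  return result.chunks
    .map((chunk, i) => {
      const source = chunk.headingPath
        ? `${chunk.filePath} (${chunk.headingPath})`
        : chunk.filePath;
      return `[${i + 1}] ${source}\n${chunk.content.trim()}`;
    })
    .join('\n\n');
}

// Sample data for illustration only.
const sample: QueryResult = {
  chunks: [
    {
      content: 'Agent tools are configured in the spec.tools section...',
      filePath: 'docs/agent-configuration.md',
      headingPath: 'Configuration > Tools',
    },
    { content: '  Prefer async/await over raw promise chains.  ', filePath: 'docs/style.md' },
  ],
};

console.log(formatPractices(sample));
```

Numbering the passages makes it easy to ask the model to cite which passage supports each review comment.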
353
+
354
+ ### Example: Documentation Assistant
355
+
356
+ **Use Case**: Agent that answers questions about project documentation
357
+
358
+ **Agent Setup**:
359
+
360
+ ```yaml
361
+ # doc-assistant/agent.yaml
362
+ spec:
363
+   tools:
364
+     - name: search_docs
365
+       type: rag
366
+       config:
367
+         dbPath: ../dist/docs-rag
368
+       description: Search project documentation
369
+ ```
370
+
371
+ **Usage**:
372
+
373
+ ```typescript
374
+ // User asks question
375
+ const userQuestion = "How do I configure authentication?";
376
+
377
+ // Agent searches docs
378
+ const relevantDocs = await ragProvider.query({
379
+ text: userQuestion,
380
+ limit: 10,
381
+ });
382
+
383
+ // Agent synthesizes answer from chunks
384
+ const context = relevantDocs.chunks
385
+ .map(chunk => `${chunk.filePath}:\n${chunk.content}`)
386
+ .join('\n\n');
387
+
388
+ const answer = await llm.complete({
389
+ system: "Answer based on documentation provided.",
390
+ user: `Question: ${userQuestion}\n\nDocumentation:\n${context}`,
391
+ });
392
+ ```
393
+
394
+ ---
395
+
396
+ ## Advanced Patterns
397
+
398
+ ### Pattern 1: Hybrid Search (Vector + Keyword)
399
+
400
+ **Coming Soon**: Combine semantic search with exact keyword matching
401
+
402
+ ```typescript
403
+ // Proposed future API; not available yet
404
+ const results = await ragProvider.query({
405
+ text: "authentication",
406
+ filters: {
407
+ keywords: ["OAuth", "JWT"], // Must contain these keywords
408
+ filePath: "docs/security/**", // Only search security docs
409
+ },
410
+ });
411
+ ```
412
+
413
+ ### Pattern 2: Incremental Indexing
414
+
415
+ **Use Case**: Update only changed files
416
+
417
+ ```bash
418
+ # Initial index
419
+ vat rag index docs/
420
+
421
+ # Make changes to docs/api.md
422
+ # ...
423
+
424
+ # Re-index (skips unchanged files automatically)
425
+ vat rag index docs/
426
+
427
+ # Output:
428
+ # resourcesIndexed: 1 # Only api.md
429
+ # resourcesSkipped: 41 # All others unchanged
430
+ # chunksDeleted: 5 # Old chunks from api.md
431
+ # chunksCreated: 6 # New chunks from api.md
432
+ ```
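
One common way to skip unchanged files is to store a hash of each file's content and compare it on the next run; the Content Transform section of this guide mentions such a content hash. The sketch below illustrates the idea only. It uses an in-memory map and a small FNV-1a hash as a stand-in so that it needs no imports, and every name in it (`indexIncrementally`, `SourceFile`, `IndexSummary`) is hypothetical rather than the toolkit's implementation.

```typescript
// FNV-1a (32-bit): a small stand-in hash so the sketch needs no imports.
// A real indexer would more likely use a cryptographic hash such as SHA-256.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

interface SourceFile { filePath: string; content: string }
interface IndexSummary { resourcesIndexed: number; resourcesSkipped: number }

// `seen` maps file path -> hash of the content that was last indexed.
function indexIncrementally(files: SourceFile[], seen: Map<string, string>): IndexSummary {
  const summary: IndexSummary = { resourcesIndexed: 0, resourcesSkipped: 0 };
  for (const file of files) {
    const hash = fnv1a(file.content);
    if (seen.get(file.filePath) === hash) {
      summary.resourcesSkipped++; // unchanged since the last run
      continue;
    }
    // A real indexer would delete the file's old chunks and create new ones here.
    seen.set(file.filePath, hash);
    summary.resourcesIndexed++;
  }
  return summary;
}

const seen = new Map<string, string>();
const docs: SourceFile[] = [
  { filePath: 'docs/api.md', content: '# API v1' },
  { filePath: 'docs/guide.md', content: '# Guide' },
];

const firstRun = indexIncrementally(docs, seen);   // both files are new
docs[0] = { filePath: 'docs/api.md', content: '# API v2' };
const secondRun = indexIncrementally(docs, seen);  // only api.md changed

console.log(firstRun, secondRun);
```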
433
+
434
+ ### Pattern 3: Multi-Store Querying
435
+
436
+ **Use Case**: Search across multiple RAG stores
437
+
438
+ ```typescript
439
+ const stores = await Promise.all([ // createRAGProvider is async, so resolve the providers first
440
+   createRAGProvider({ dbPath: './dist/api-rag' }),
441
+   createRAGProvider({ dbPath: './dist/guides-rag' }),
442
+   createRAGProvider({ dbPath: './dist/examples-rag' }),
443
+ ]);
444
+
445
+ // Query all stores in parallel
446
+ const results = await Promise.all(
447
+   stores.map(store => store.query({ text: userQuestion, limit: 5 }))
448
+ );
449
+
450
+ // Merge and deduplicate results
451
+ const allChunks = results.flatMap(r => r.chunks);
452
+ const uniqueChunks = deduplicateByContentHash(allChunks);
453
+ const topResults = sortByScore(uniqueChunks).slice(0, 10);
454
+ ```
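
The snippet above relies on two helpers, `deduplicateByContentHash` and `sortByScore`, that the guide does not define. A minimal sketch of both follows. It assumes each chunk carries a numeric `score` where higher means more relevant, which is an assumption about the result shape, and it keys on the chunk text itself instead of computing a real hash. `ScoredChunk` is a hypothetical type.

```typescript
// Hypothetical chunk shape; `score` and "higher is better" are assumptions.
interface ScoredChunk { content: string; filePath: string; score: number }

// Keep the best-scoring copy of each distinct chunk text.
// The content string is used directly as the key in place of a content hash.
function deduplicateByContentHash(chunks: ScoredChunk[]): ScoredChunk[] {
  const best = new Map<string, ScoredChunk>();
  for (const chunk of chunks) {
    const existing = best.get(chunk.content);
    if (!existing || chunk.score > existing.score) {
      best.set(chunk.content, chunk);
    }
  }
  return Array.from(best.values());
}

// Return a new array sorted by descending score; the input is left untouched.
function sortByScore(chunks: ScoredChunk[]): ScoredChunk[] {
  return chunks.slice().sort((a, b) => b.score - a.score);
}

// Sample data: the same passage was indexed into two stores.
const merged: ScoredChunk[] = [
  { content: 'Use OAuth for user login.', filePath: 'api/auth.md', score: 0.82 },
  { content: 'Rotate API keys regularly.', filePath: 'guides/keys.md', score: 0.75 },
  { content: 'Use OAuth for user login.', filePath: 'guides/auth.md', score: 0.91 },
];

const ranked = sortByScore(deduplicateByContentHash(merged)).slice(0, 10);
console.log(ranked.map(c => c.filePath)); // [ 'guides/auth.md', 'guides/keys.md' ]
```

If the provider reports a distance (lower is better) rather than a similarity score, the comparisons in both helpers would need to be flipped. Scores from stores built with different embedding models are also not directly comparable.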
455
+
456
+ ### Pattern 4: CI/CD Integration
457
+
458
+ **Use Case**: Build RAG database during deployment
459
+
460
+ **GitHub Actions** (`.github/workflows/build-rag.yml`):
461
+
462
+ ```yaml
463
+ name: Build RAG Database
464
+
465
+ on:
466
+   push:
467
+     paths:
468
+       - 'docs/**'
469
+   workflow_dispatch:
470
+
471
+ jobs:
472
+   build:
473
+     runs-on: ubuntu-latest
474
+     steps:
475
+       - uses: actions/checkout@v4
476
+       - uses: oven-sh/setup-bun@v1
477
+
478
+       - name: Install dependencies
479
+         run: bun install
480
+
481
+       - name: Build RAG database
482
+         run: |
483
+           bun run vat rag clear
484
+           bun run vat rag index docs/
485
+
486
+       - name: Upload RAG database
487
+         uses: actions/upload-artifact@v4
488
+         with:
489
+           name: rag-database
490
+           path: .rag-db/
491
+
492
+       - name: Commit RAG database to dist/
493
+         if: github.ref == 'refs/heads/main'
494
+         run: |
495
+           mkdir -p dist && rm -rf dist/docs-rag && mv .rag-db dist/docs-rag
496
+           git config user.name github-actions
497
+           git config user.email github-actions@github.com
498
+           git add dist/docs-rag
499
+           git commit -m "chore: update RAG database"
500
+           git push
501
+ ```
502
+
503
+ ---
504
+
505
+ ## Content Transform
506
+
507
+ Content transforms rewrite markdown links before content is chunked, embedded, and persisted. This is useful when RAG-indexed content needs links rewritten for the consumer context -- for example, converting local file links to MCP resource URIs, or stripping links entirely for cleaner LLM context.
508
+
509
+ ### Programmatic API
510
+
511
+ Pass `contentTransform` when creating a RAG provider to apply link rewriting during indexing:
512
+
513
+ ```typescript
514
+ import { LanceDBRAGProvider } from '@vibe-agent-toolkit/rag-lancedb';
515
+ import type { ContentTransformOptions } from '@vibe-agent-toolkit/resources';
516
+
517
+ const contentTransform: ContentTransformOptions = {
518
+ linkRewriteRules: [
519
+ {
520
+ match: { type: 'local_file' },
521
+ template: '{{link.text}} (see: {{link.href}})',
522
+ },
523
+ {
524
+ match: { type: 'external' },
525
+ template: '[{{link.text}}]({{link.href}})', // Keep external links as-is
526
+ },
527
+ ],
528
+ };
529
+
530
+ const provider = await LanceDBRAGProvider.create({
531
+ dbPath: './dist/rag-db',
532
+ contentTransform,
533
+ });
534
+ ```
535
+
536
+ ### Template Variables
537
+
538
+ Templates use Mustache-style `{{variable}}` placeholders. The following variables are available:
539
+
540
+ | Variable | Description |
541
+ |----------|-------------|
542
+ | `link.text` | Link display text |
543
+ | `link.href` | Original href (without fragment) |
544
+ | `link.fragment` | Fragment portion including `#` (or empty string) |
545
+ | `link.type` | Link type: `local_file`, `anchor`, `external`, `email`, `unknown` |
546
+ | `link.resource.id` | Target resource ID (requires `resourceRegistry`) |
547
+ | `link.resource.filePath` | Target resource file path (requires `resourceRegistry`) |
548
+ | `link.resource.extension` | File extension (requires `resourceRegistry`) |
549
+ | `link.resource.mimeType` | Inferred MIME type (requires `resourceRegistry`) |
550
+ | `link.resource.frontmatter.*` | Frontmatter fields (requires `resourceRegistry`) |
551
+
552
+ ### Match Criteria
553
+
554
+ Rules are evaluated in order; the first matching rule wins. Each rule's `match` object supports:
555
+
556
+ - `type` -- link type(s) to match (e.g., `'local_file'`, `'external'`, or an array of types)
557
+ - `pattern` -- glob pattern(s) matched against the target resource's `filePath`
558
+ - `excludeResourceIds` -- resource IDs to exclude from matching
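
To make the "first matching rule wins" behavior and the `{{variable}}` substitution concrete, here is a simplified, self-contained sketch. It is not the toolkit's implementation: it handles only `type` matching and the four plain `link.*` variables, and the `Link`, `Rule`, `renderTemplate`, and `rewriteLink` names are hypothetical.

```typescript
// Hypothetical, simplified types; the real rule type supports more match options.
type LinkType = 'local_file' | 'anchor' | 'external' | 'email' | 'unknown';
interface Link { text: string; href: string; fragment: string; type: LinkType }
interface Rule { match: { type: LinkType | LinkType[] }; template: string }

// Replace {{link.x}} placeholders with the link's fields; unknown names become ''.
function renderTemplate(template: string, link: Link): string {
  const vars: Record<string, string> = {
    'link.text': link.text,
    'link.href': link.href,
    'link.fragment': link.fragment,
    'link.type': link.type,
  };
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match: string, name: string) => {
    const value = vars[name];
    return value === undefined ? '' : value;
  });
}

// Rules are tried in order; the first match wins.
// Returning null means "no rule matched, leave the link untouched".
function rewriteLink(link: Link, rules: Rule[]): string | null {
  for (const rule of rules) {
    const types = Array.isArray(rule.match.type) ? rule.match.type : [rule.match.type];
    if (types.indexOf(link.type) !== -1) {
      return renderTemplate(rule.template, link);
    }
  }
  return null;
}

const rules: Rule[] = [
  { match: { type: 'local_file' }, template: '{{link.text}} (see: {{link.href}})' },
  { match: { type: ['external', 'email'] }, template: '[{{link.text}}]({{link.href}})' },
];

const local: Link = { text: 'Setup guide', href: './setup.md', fragment: '', type: 'local_file' };
const anchor: Link = { text: 'Top', href: '', fragment: '#top', type: 'anchor' };

console.log(rewriteLink(local, rules));  // Setup guide (see: ./setup.md)
console.log(rewriteLink(anchor, rules)); // null
```

Because evaluation stops at the first match, more specific rules should be listed before broader ones.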
559
+
560
+ ### Advanced Example: Resource Registry
561
+
562
+ When a `resourceRegistry` is provided, templates can reference resolved resource metadata for richer rewriting:
563
+
564
+ ```typescript
565
+ import { transformContent, type LinkRewriteRule } from '@vibe-agent-toolkit/resources';
566
+
567
+ const rules: LinkRewriteRule[] = [
568
+ {
569
+ match: { type: 'local_file', pattern: 'docs/**/*.md' },
570
+ template: '{{link.text}} (resource: {{link.resource.id}}, type: {{link.resource.mimeType}})',
571
+ },
572
+ ];
573
+
574
+ // transformContent is also available as a standalone function
575
+ const transformed = transformContent(content, resource.links, {
576
+ linkRewriteRules: rules,
577
+ resourceRegistry: myRegistry,
578
+ });
579
+ ```
580
+
581
+ ### Key Behavior Notes
582
+
583
+ - Content hash is computed on **transformed** content, so changing transform rules triggers re-indexing.
584
+ - Links matching no rule are left untouched.
585
+ - When no `contentTransform` is provided, content is stored as-is (original behavior).
586
+
587
+ ---
588
+
589
+ ## Document Storage
590
+
591
+ By default, the RAG provider only stores chunked content for vector search. When `storeDocuments: true` is enabled, the full source document is also persisted in a separate `rag_documents` table. This enables the "search then retrieve" pattern -- find relevant chunks via vector search, then fetch the complete document for full context.
592
+
593
+ ### Programmatic API
594
+
595
+ ```typescript
596
+ const provider = await LanceDBRAGProvider.create({
597
+ dbPath: './dist/rag-db',
598
+ storeDocuments: true,
599
+ });
600
+
601
+ // Index resources (documents are stored automatically)
602
+ await provider.indexResources(resources);
603
+
604
+ // After finding relevant chunks via query...
605
+ const result = await provider.query({ text: 'authentication setup', limit: 5 });
606
+
607
+ // Retrieve full document for top result
608
+ const chunk = result.chunks[0];
609
+ if (chunk) {
610
+ const doc = await provider.getDocument(chunk.resourceId);
611
+ // doc.content — full document text
612
+ // doc.tokenCount — total tokens
613
+ // doc.totalChunks — number of chunks produced
614
+ // doc.metadata — frontmatter metadata
615
+ // doc.indexedAt — when it was indexed
616
+ }
617
+ ```
618
+
619
+ ### DocumentResult Fields
620
+
621
+ | Field | Type | Description |
622
+ |-------|------|-------------|
623
+ | `resourceId` | `string` | Source resource ID |
624
+ | `filePath` | `string` | Absolute file path |
625
+ | `content` | `string` | Full document content (transformed if applicable) |
626
+ | `contentHash` | `string` | SHA-256 hash of stored content |
627
+ | `tokenCount` | `number` | Token count of full document |
628
+ | `totalChunks` | `number` | Number of chunks produced |
629
+ | `indexedAt` | `Date` | When the document was indexed |
630
+ | `metadata` | `Record<string, unknown>` | Frontmatter metadata |
631
+
632
+ ### Combined Example: Content Transform + Document Storage
633
+
634
+ Both features compose naturally. When used together, the stored document content reflects the transformed output:
635
+
636
+ ```typescript
637
+ const provider = await LanceDBRAGProvider.create({
638
+ dbPath: './dist/rag-db',
639
+ storeDocuments: true,
640
+ contentTransform: {
641
+ linkRewriteRules: [
642
+ {
643
+ match: { type: 'local_file' },
644
+ template: '{{link.text}} (see: {{link.href}})',
645
+ },
646
+ ],
647
+ },
648
+ });
649
+
650
+ // Both chunks AND full documents will have transformed content
651
+ ```
652
+
653
+ ### Key Behavior Notes
654
+
655
+ - When `storeDocuments` is not enabled (default), `getDocument()` returns `null`.
656
+ - Documents are automatically updated/deleted when their resource is updated/deleted.
657
+ - Full document content reflects any `contentTransform` rules applied.
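
Several chunks in one result often come from the same document, and `getDocument()` can return `null`. A small helper can cover both cases for the "search then retrieve" pattern. The sketch below is an illustration with hypothetical types (`ChunkHit`, `StoredDocument`, `GetDocument`) and a fake in-memory lookup in place of a real provider.

```typescript
// Hypothetical minimal shapes; only the fields this sketch needs.
interface ChunkHit { resourceId: string; content: string }
interface StoredDocument { resourceId: string; content: string }
type GetDocument = (resourceId: string) => Promise<StoredDocument | null>;

// Fetch each distinct source document once, in first-hit order,
// and drop lookups that come back as null.
async function retrieveDocuments(
  chunks: ChunkHit[],
  getDocument: GetDocument,
): Promise<StoredDocument[]> {
  const ids = Array.from(new Set(chunks.map(chunk => chunk.resourceId)));
  const docs = await Promise.all(ids.map(id => getDocument(id)));
  return docs.filter((doc): doc is StoredDocument => doc !== null);
}

// Fake lookup standing in for a provider; it also counts how often it is called.
const stored = new Map<string, StoredDocument>([
  ['docs/auth.md', { resourceId: 'docs/auth.md', content: '# Authentication\n...' }],
]);
let lookups = 0;
const fakeGetDocument: GetDocument = async id => {
  lookups++;
  const doc = stored.get(id);
  return doc === undefined ? null : doc;
};

const hits: ChunkHit[] = [
  { resourceId: 'docs/auth.md', content: 'Configure OAuth...' },
  { resourceId: 'docs/auth.md', content: 'Token lifetime...' },
  { resourceId: 'docs/missing.md', content: 'Not stored' },
];

const retrieval = retrieveDocuments(hits, fakeGetDocument);
retrieval.then(docs => console.log(docs.map(d => d.resourceId), lookups));
```

With a real provider, the call would look like `retrieveDocuments(result.chunks, id => provider.getDocument(id))`, assuming the chunk objects expose `resourceId` as in the example above.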
658
+
659
+ ---
660
+
661
+ ## Example Projects
662
+
663
+ ### Example 1: Personal Knowledge Base
664
+
665
+ **Structure**:
666
+
667
+ ```
668
+ my-knowledge-base/
669
+ ├── notes/
670
+ │   ├── programming/
671
+ │   ├── productivity/
672
+ │   └── learning/
673
+ ├── bookmarks/
674
+ │   └── articles.md
675
+ ├── vibe-agent-toolkit.config.yaml
676
+ └── .rag-db/
677
+ ```
678
+
679
+ **Config**:
680
+
681
+ ```yaml
682
+ version: 1
683
+
684
+ resources:
685
+   collections:
686
+     all-notes:
687
+       include:
688
+         - ./notes/**/*.md
689
+         - ./bookmarks/**/*.md
690
+
691
+ rag:
692
+   stores:
693
+     knowledge:
694
+       db: ./.rag-db
695
+       resources: all-notes
696
+ ```
697
+
698
+ **Usage**:
699
+
700
+ ```bash
701
+ # Index all notes
702
+ vat rag index
703
+
704
+ # Search notes
705
+ vat rag query "how to use docker compose"
706
+ vat rag query "productivity tips"
707
+ vat rag query "typescript generics"
708
+ ```
709
+
710
+ ### Example 2: Team Documentation Portal
711
+
712
+ **Structure**:
713
+
714
+ ```
715
+ team-docs/
716
+ ├── onboarding/
717
+ ├── processes/
718
+ ├── technical/
719
+ ├── agents/
720
+ │   └── doc-assistant/
721
+ ├── vibe-agent-toolkit.config.yaml
722
+ └── dist/
723
+     └── docs-rag/
724
+ ```
725
+
726
+ **Config**:
727
+
728
+ ```yaml
729
+ version: 1
730
+
731
+ resources:
732
+   collections:
733
+     team-docs:
734
+       include:
735
+         - ./onboarding/**/*.md
736
+         - ./processes/**/*.md
737
+         - ./technical/**/*.md
738
+
739
+ agents:
740
+   include:
741
+     - ./agents/**
742
+
743
+ rag:
744
+   stores:
745
+     team-knowledge:
746
+       db: ./dist/docs-rag
747
+       resources: team-docs
748
+       embedding:
749
+         provider: openai
750
+         model: text-embedding-3-small
751
+ ```
752
+
753
+ **Deploy**:
754
+
755
+ ```bash
756
+ # Build RAG database
757
+ bun run vat rag index
758
+
759
+ # Deploy doc-assistant agent
760
+ bun run vat agent build doc-assistant
761
+ bun run vat agent install doc-assistant
762
+
763
+ # Team members query via agent
764
+ vat agent run doc-assistant "What's our code review process?"
765
+ ```
766
+
767
+ ---
768
+
769
+ **See Also**:
770
+ - [RAG Architecture](rag.md)