qualia-framework 2.1.0
- package/README.md +50 -0
- package/bin/cli.js +519 -0
- package/framework/agents/architecture-strategist.md +53 -0
- package/framework/agents/backend-agent.md +150 -0
- package/framework/agents/code-simplicity-reviewer.md +86 -0
- package/framework/agents/frontend-agent.md +111 -0
- package/framework/agents/kieran-typescript-reviewer.md +96 -0
- package/framework/agents/performance-oracle.md +111 -0
- package/framework/agents/qualia-codebase-mapper.md +760 -0
- package/framework/agents/qualia-debugger.md +1203 -0
- package/framework/agents/qualia-executor.md +881 -0
- package/framework/agents/qualia-integration-checker.md +423 -0
- package/framework/agents/qualia-phase-researcher.md +453 -0
- package/framework/agents/qualia-plan-checker.md +699 -0
- package/framework/agents/qualia-planner.md +1241 -0
- package/framework/agents/qualia-project-researcher.md +602 -0
- package/framework/agents/qualia-research-synthesizer.md +236 -0
- package/framework/agents/qualia-roadmapper.md +605 -0
- package/framework/agents/qualia-verifier.md +685 -0
- package/framework/agents/team-orchestrator.md +228 -0
- package/framework/agents/teams/full-stack-team.md +48 -0
- package/framework/agents/teams/optimize-team.md +53 -0
- package/framework/agents/teams/review-team.md +62 -0
- package/framework/agents/teams/ship-team.md +86 -0
- package/framework/agents/test-agent.md +182 -0
- package/framework/askpass.sh +2 -0
- package/framework/commands/design.md +53 -0
- package/framework/commands/quick-db.md +22 -0
- package/framework/config/retention.json +35 -0
- package/framework/core/PRINCIPLES.md +77 -0
- package/framework/hooks/auto-format.sh +45 -0
- package/framework/hooks/block-env-edit.sh +42 -0
- package/framework/hooks/branch-guard.sh +46 -0
- package/framework/hooks/confirm-delete.sh +56 -0
- package/framework/hooks/migration-validate.sh +68 -0
- package/framework/hooks/notification-speak.sh +15 -0
- package/framework/hooks/pre-commit.sh +80 -0
- package/framework/hooks/pre-compact.sh +55 -0
- package/framework/hooks/pre-deploy-gate.sh +151 -0
- package/framework/hooks/qualia-colors.sh +32 -0
- package/framework/hooks/retention-cleanup.sh +43 -0
- package/framework/hooks/save-session-state.sh +153 -0
- package/framework/hooks/session-context-loader.sh +28 -0
- package/framework/hooks/session-learn.sh +30 -0
- package/framework/knowledge/claudecode-bible.md +1384 -0
- package/framework/knowledge/client-prefs.md +22 -0
- package/framework/knowledge/common-fixes.md +25 -0
- package/framework/knowledge/deployment-map.md +35 -0
- package/framework/knowledge/email-signature.html +1 -0
- package/framework/knowledge/employees.md +8 -0
- package/framework/knowledge/learned-patterns.md +51 -0
- package/framework/knowledge/optimization-research-2026.md +137 -0
- package/framework/knowledge/qualia-context.md +67 -0
- package/framework/knowledge/supabase-patterns.md +50 -0
- package/framework/knowledge/voice-agent-patterns.md +46 -0
- package/framework/qualia-engine/VERSION +1 -0
- package/framework/qualia-engine/bin/qualia-tools.js +2160 -0
- package/framework/qualia-engine/bin/qualia-tools.test.js +1054 -0
- package/framework/qualia-engine/references/checkpoints.md +775 -0
- package/framework/qualia-engine/references/continuation-format.md +249 -0
- package/framework/qualia-engine/references/decimal-phase-calculation.md +65 -0
- package/framework/qualia-engine/references/design-quality.md +56 -0
- package/framework/qualia-engine/references/git-integration.md +254 -0
- package/framework/qualia-engine/references/git-planning-commit.md +50 -0
- package/framework/qualia-engine/references/model-profile-resolution.md +32 -0
- package/framework/qualia-engine/references/model-profiles.md +73 -0
- package/framework/qualia-engine/references/phase-argument-parsing.md +61 -0
- package/framework/qualia-engine/references/planning-config.md +195 -0
- package/framework/qualia-engine/references/questioning.md +141 -0
- package/framework/qualia-engine/references/tdd.md +263 -0
- package/framework/qualia-engine/references/ui-brand.md +160 -0
- package/framework/qualia-engine/references/verification-patterns.md +612 -0
- package/framework/qualia-engine/templates/DEBUG.md +159 -0
- package/framework/qualia-engine/templates/DESIGN.md +81 -0
- package/framework/qualia-engine/templates/UAT.md +247 -0
- package/framework/qualia-engine/templates/codebase/architecture.md +255 -0
- package/framework/qualia-engine/templates/codebase/concerns.md +310 -0
- package/framework/qualia-engine/templates/codebase/conventions.md +307 -0
- package/framework/qualia-engine/templates/codebase/integrations.md +280 -0
- package/framework/qualia-engine/templates/codebase/stack.md +186 -0
- package/framework/qualia-engine/templates/codebase/structure.md +285 -0
- package/framework/qualia-engine/templates/codebase/testing.md +480 -0
- package/framework/qualia-engine/templates/config.json +35 -0
- package/framework/qualia-engine/templates/context.md +283 -0
- package/framework/qualia-engine/templates/continue-here.md +78 -0
- package/framework/qualia-engine/templates/debug-subagent-prompt.md +91 -0
- package/framework/qualia-engine/templates/discovery.md +146 -0
- package/framework/qualia-engine/templates/milestone-archive.md +123 -0
- package/framework/qualia-engine/templates/milestone.md +115 -0
- package/framework/qualia-engine/templates/phase-prompt.md +567 -0
- package/framework/qualia-engine/templates/planner-subagent-prompt.md +117 -0
- package/framework/qualia-engine/templates/project.md +184 -0
- package/framework/qualia-engine/templates/projects/ai-agent.md +156 -0
- package/framework/qualia-engine/templates/projects/mobile-app.md +181 -0
- package/framework/qualia-engine/templates/projects/voice-agent.md +134 -0
- package/framework/qualia-engine/templates/projects/website.md +137 -0
- package/framework/qualia-engine/templates/requirements.md +231 -0
- package/framework/qualia-engine/templates/research-project/ARCHITECTURE.md +204 -0
- package/framework/qualia-engine/templates/research-project/FEATURES.md +147 -0
- package/framework/qualia-engine/templates/research-project/PITFALLS.md +200 -0
- package/framework/qualia-engine/templates/research-project/STACK.md +120 -0
- package/framework/qualia-engine/templates/research-project/SUMMARY.md +170 -0
- package/framework/qualia-engine/templates/research.md +552 -0
- package/framework/qualia-engine/templates/roadmap.md +202 -0
- package/framework/qualia-engine/templates/state.md +176 -0
- package/framework/qualia-engine/templates/summary-complex.md +59 -0
- package/framework/qualia-engine/templates/summary-minimal.md +41 -0
- package/framework/qualia-engine/templates/summary-standard.md +48 -0
- package/framework/qualia-engine/templates/summary.md +246 -0
- package/framework/qualia-engine/templates/user-setup.md +311 -0
- package/framework/qualia-engine/templates/verification-report.md +322 -0
- package/framework/qualia-engine/workflows/add-phase.md +179 -0
- package/framework/qualia-engine/workflows/add-todo.md +157 -0
- package/framework/qualia-engine/workflows/audit-milestone.md +241 -0
- package/framework/qualia-engine/workflows/check-todos.md +176 -0
- package/framework/qualia-engine/workflows/complete-milestone.md +858 -0
- package/framework/qualia-engine/workflows/diagnose-issues.md +219 -0
- package/framework/qualia-engine/workflows/discovery-phase.md +289 -0
- package/framework/qualia-engine/workflows/discuss-phase.md +534 -0
- package/framework/qualia-engine/workflows/execute-phase.md +559 -0
- package/framework/qualia-engine/workflows/execute-plan.md +438 -0
- package/framework/qualia-engine/workflows/help.md +470 -0
- package/framework/qualia-engine/workflows/insert-phase.md +220 -0
- package/framework/qualia-engine/workflows/list-phase-assumptions.md +178 -0
- package/framework/qualia-engine/workflows/map-codebase.md +327 -0
- package/framework/qualia-engine/workflows/new-milestone.md +363 -0
- package/framework/qualia-engine/workflows/new-project.md +1037 -0
- package/framework/qualia-engine/workflows/pause-work.md +122 -0
- package/framework/qualia-engine/workflows/plan-milestone-gaps.md +256 -0
- package/framework/qualia-engine/workflows/plan-phase.md +422 -0
- package/framework/qualia-engine/workflows/progress.md +354 -0
- package/framework/qualia-engine/workflows/quick.md +252 -0
- package/framework/qualia-engine/workflows/remove-phase.md +326 -0
- package/framework/qualia-engine/workflows/research-phase.md +74 -0
- package/framework/qualia-engine/workflows/resume-project.md +306 -0
- package/framework/qualia-engine/workflows/set-profile.md +80 -0
- package/framework/qualia-engine/workflows/settings.md +145 -0
- package/framework/qualia-engine/workflows/transition.md +556 -0
- package/framework/qualia-engine/workflows/update.md +197 -0
- package/framework/qualia-engine/workflows/verify-phase.md +195 -0
- package/framework/qualia-engine/workflows/verify-work.md +625 -0
- package/framework/rules/context7.md +11 -0
- package/framework/rules/deployment.md +29 -0
- package/framework/rules/frontend.md +33 -0
- package/framework/rules/security.md +12 -0
- package/framework/rules/speed.md +20 -0
- package/framework/scripts/__pycache__/say.cpython-314.pyc +0 -0
- package/framework/scripts/apply-retention.sh +120 -0
- package/framework/scripts/bootstrap-pop-os.sh +354 -0
- package/framework/scripts/claude-voice +13 -0
- package/framework/scripts/cleanup.sh +131 -0
- package/framework/scripts/cowork-mode.sh +141 -0
- package/framework/scripts/generate-project-claude-md.sh +153 -0
- package/framework/scripts/load-test-webhook.js +172 -0
- package/framework/scripts/say.py +236 -0
- package/framework/scripts/showcase-video-recorder/ffmpeg-builder.js +167 -0
- package/framework/scripts/showcase-video-recorder/playwright-helpers.js +216 -0
- package/framework/scripts/speak.py +55 -0
- package/framework/scripts/speak.sh +18 -0
- package/framework/scripts/status.sh +138 -0
- package/framework/scripts/sync-to-framework.sh +65 -0
- package/framework/scripts/voice-hotkey.py +227 -0
- package/framework/scripts/voice-input.sh +51 -0
- package/framework/skills/animate/SKILL.md +202 -0
- package/framework/skills/bolder/SKILL.md +144 -0
- package/framework/skills/browser-qa/SKILL.md +536 -0
- package/framework/skills/clarify/SKILL.md +179 -0
- package/framework/skills/colorize/SKILL.md +170 -0
- package/framework/skills/critique/SKILL.md +126 -0
- package/framework/skills/deep-research/SKILL.md +271 -0
- package/framework/skills/delight/SKILL.md +329 -0
- package/framework/skills/deploy/SKILL.md +261 -0
- package/framework/skills/deploy-verify/SKILL.md +377 -0
- package/framework/skills/deploy-verify/scripts/canary-check.sh +206 -0
- package/framework/skills/deploy-verify/scripts/check-console-errors.js +147 -0
- package/framework/skills/deploy-verify/scripts/check-cwv.js +139 -0
- package/framework/skills/deploy-verify/scripts/project-detect.sh +84 -0
- package/framework/skills/deploy-verify/scripts/verify.sh +548 -0
- package/framework/skills/design-quieter/SKILL.md +130 -0
- package/framework/skills/distill/SKILL.md +149 -0
- package/framework/skills/docs-lookup/SKILL.md +78 -0
- package/framework/skills/fcm-notifications/SKILL.md +125 -0
- package/framework/skills/financial-ledger/SKILL.md +1039 -0
- package/framework/skills/frontend-master/NOTICE.md +4 -0
- package/framework/skills/frontend-master/SKILL.md +127 -0
- package/framework/skills/frontend-master/reference/color-and-contrast.md +132 -0
- package/framework/skills/frontend-master/reference/interaction-design.md +123 -0
- package/framework/skills/frontend-master/reference/motion-design.md +99 -0
- package/framework/skills/frontend-master/reference/responsive-design.md +114 -0
- package/framework/skills/frontend-master/reference/spatial-design.md +100 -0
- package/framework/skills/frontend-master/reference/typography.md +131 -0
- package/framework/skills/frontend-master/reference/ux-writing.md +107 -0
- package/framework/skills/harden/SKILL.md +357 -0
- package/framework/skills/i18n-rtl/SKILL.md +752 -0
- package/framework/skills/learn/SKILL.md +71 -0
- package/framework/skills/memory/SKILL.md +50 -0
- package/framework/skills/mobile-expo/SKILL.md +864 -0
- package/framework/skills/mobile-expo/references/store-checklist.md +550 -0
- package/framework/skills/nestjs-backend/README.md +73 -0
- package/framework/skills/nestjs-backend/SKILL.md +446 -0
- package/framework/skills/nestjs-backend/references/templates.md +1173 -0
- package/framework/skills/normalize/SKILL.md +79 -0
- package/framework/skills/onboard/SKILL.md +242 -0
- package/framework/skills/polish/SKILL.md +209 -0
- package/framework/skills/pr/SKILL.md +66 -0
- package/framework/skills/qualia/SKILL.md +153 -0
- package/framework/skills/qualia-add-todo/SKILL.md +68 -0
- package/framework/skills/qualia-audit-milestone/SKILL.md +92 -0
- package/framework/skills/qualia-check-todos/SKILL.md +55 -0
- package/framework/skills/qualia-complete-milestone/SKILL.md +108 -0
- package/framework/skills/qualia-debug/SKILL.md +149 -0
- package/framework/skills/qualia-design/SKILL.md +203 -0
- package/framework/skills/qualia-discuss-phase/SKILL.md +72 -0
- package/framework/skills/qualia-execute-phase/SKILL.md +86 -0
- package/framework/skills/qualia-help/SKILL.md +67 -0
- package/framework/skills/qualia-idk/SKILL.md +352 -0
- package/framework/skills/qualia-list-phase-assumptions/SKILL.md +67 -0
- package/framework/skills/qualia-new-milestone/SKILL.md +72 -0
- package/framework/skills/qualia-new-project/SKILL.md +92 -0
- package/framework/skills/qualia-optimize/SKILL.md +417 -0
- package/framework/skills/qualia-pause-work/SKILL.md +96 -0
- package/framework/skills/qualia-plan-milestone-gaps/SKILL.md +57 -0
- package/framework/skills/qualia-plan-phase/SKILL.md +101 -0
- package/framework/skills/qualia-progress/SKILL.md +53 -0
- package/framework/skills/qualia-quick/SKILL.md +89 -0
- package/framework/skills/qualia-research-phase/SKILL.md +88 -0
- package/framework/skills/qualia-resume-work/SKILL.md +62 -0
- package/framework/skills/qualia-review/SKILL.md +263 -0
- package/framework/skills/qualia-start/SKILL.md +182 -0
- package/framework/skills/qualia-verify-work/SKILL.md +105 -0
- package/framework/skills/qualia-workflow/SKILL.md +130 -0
- package/framework/skills/rag/SKILL.md +750 -0
- package/framework/skills/responsive/SKILL.md +231 -0
- package/framework/skills/retro/SKILL.md +284 -0
- package/framework/skills/sakani-conventions/SKILL.md +136 -0
- package/framework/skills/sakani-conventions/evals/evals.json +23 -0
- package/framework/skills/sakani-conventions/references/entities.md +365 -0
- package/framework/skills/sakani-conventions/references/error-codes.md +95 -0
- package/framework/skills/seo-master/SKILL.md +490 -0
- package/framework/skills/seo-master/references/checklist.md +199 -0
- package/framework/skills/seo-master/references/structured-data.md +609 -0
- package/framework/skills/ship/SKILL.md +202 -0
- package/framework/skills/stack-researcher/SKILL.md +215 -0
- package/framework/skills/status/SKILL.md +154 -0
- package/framework/skills/status/scripts/health-check.sh +562 -0
- package/framework/skills/subscription-payments/SKILL.md +250 -0
- package/framework/skills/supabase/SKILL.md +973 -0
- package/framework/skills/supabase/references/templates.md +159 -0
- package/framework/skills/team/SKILL.md +67 -0
- package/framework/skills/test-runner/SKILL.md +202 -0
- package/framework/skills/voice-agent/SKILL.md +407 -0
- package/framework/skills/zoho-workflow/SKILL.md +51 -0
- package/framework/statusline-command.sh +117 -0
- package/package.json +24 -0
- package/profiles/fawzi.json +16 -0
- package/profiles/hasan.json +16 -0
- package/profiles/moayad.json +16 -0
- package/templates/CLAUDE-owner.md +52 -0
- package/templates/CLAUDE.md.hbs +58 -0
- package/templates/env.claude.template +12 -0
- package/templates/settings.json +141 -0
@@ -0,0 +1,750 @@
---
name: rag
description: Build production RAG (Retrieval Augmented Generation) systems with Supabase pgvector setup, document chunking, embedding pipelines, retrieval + reranking, and Claude generation with context injection. Full Next.js API route wiring.
tags: [rag, embeddings, pgvector, supabase, claude, ai, vector-search]
---

# RAG Builder

Build production-grade RAG systems on your stack: Supabase pgvector + Claude API + Next.js.

**Announce at start:** "Activating RAG builder. Let me set up your retrieval-augmented generation pipeline."

## Phase 1: Database Setup (Supabase pgvector)

### Enable pgvector Extension

```sql
-- Migration: supabase/migrations/YYYYMMDD_enable_pgvector.sql
CREATE EXTENSION IF NOT EXISTS vector WITH SCHEMA extensions;
```

### Documents Table

```sql
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  title TEXT NOT NULL,
  source_url TEXT,
  source_type TEXT NOT NULL DEFAULT 'manual', -- 'manual', 'web', 'pdf', 'api'
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
```

### Chunks Table (with embeddings)

```sql
CREATE TABLE document_chunks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
  content TEXT NOT NULL,
  chunk_index INT NOT NULL,
  token_count INT,
  embedding vector(1024), -- Voyage 4-lite default (adjust per provider, see Phase 3)
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  UNIQUE(document_id, chunk_index)
);

ALTER TABLE document_chunks ENABLE ROW LEVEL SECURITY;

-- HNSW index for fast similarity search (better than ivfflat for < 1M rows)
CREATE INDEX idx_chunks_embedding ON document_chunks
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- For filtering by document
CREATE INDEX idx_chunks_document ON document_chunks(document_id);
```

### Match Function (RPC)

```sql
CREATE OR REPLACE FUNCTION match_documents(
  query_embedding vector(1024),
  match_threshold FLOAT DEFAULT 0.7,
  match_count INT DEFAULT 5,
  filter_metadata JSONB DEFAULT '{}'
)
RETURNS TABLE (
  id UUID,
  document_id UUID,
  content TEXT,
  metadata JSONB,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    dc.id,
    dc.document_id,
    dc.content,
    dc.metadata,
    1 - (dc.embedding <=> query_embedding) AS similarity
  FROM document_chunks dc
  WHERE 1 - (dc.embedding <=> query_embedding) > match_threshold
    AND (filter_metadata = '{}' OR dc.metadata @> filter_metadata)
  ORDER BY dc.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;
```
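
To make the scoring in `match_documents` concrete: pgvector's `<=>` operator returns cosine distance, so `1 - distance` is cosine similarity. The following is a minimal in-memory sketch of the same filter, sort, and limit steps. The names `Row`, `cosineSimilarity`, and `matchInMemory` are hypothetical and exist only for illustration; they are not part of the skill or of Supabase.

```typescript
// Illustrative only: an in-memory analogue of the SQL match function.
type Row = { id: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function matchInMemory(
  rows: Row[],
  query: number[],
  matchThreshold = 0.7,
  matchCount = 5
): Array<{ id: string; similarity: number }> {
  return rows
    .map(r => ({ id: r.id, similarity: cosineSimilarity(r.embedding, query) }))
    .filter(r => r.similarity > matchThreshold) // same strict '>' as the SQL
    .sort((x, y) => y.similarity - x.similarity) // highest similarity first
    .slice(0, matchCount);
}

const rows: Row[] = [
  { id: 'a', embedding: [1, 0] },
  { id: 'b', embedding: [0, 1] },
  { id: 'c', embedding: [1, 1] },
];
const matches = matchInMemory(rows, [1, 0]);
```

With the default threshold of 0.7, the orthogonal vector `b` is filtered out, while `c` (similarity of roughly 0.707) only just passes. This is one reason the threshold is worth tuning per embedding model rather than treating 0.7 as universal.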

### Multi-tenant variant (add user_id or org_id scoping)

```sql
-- Add user_id to documents for RLS
ALTER TABLE documents ADD COLUMN user_id UUID REFERENCES auth.users(id);

CREATE POLICY "users_read_own_docs" ON documents
  FOR SELECT USING (user_id = auth.uid());

CREATE POLICY "users_read_own_chunks" ON document_chunks
  FOR SELECT USING (
    EXISTS (
      SELECT 1 FROM documents d
      WHERE d.id = document_chunks.document_id
        AND d.user_id = auth.uid()
    )
  );
```

## Phase 2: Document Chunking

### Chunking Strategy

Use **recursive character splitting** with overlap. This is the most reliable general-purpose strategy.

```typescript
// lib/rag/chunker.ts

interface ChunkOptions {
  maxTokens?: number;      // default 512
  overlapTokens?: number;  // default 50
  separators?: string[];
}

const DEFAULT_SEPARATORS = ['\n\n', '\n', '. ', ', ', ' ', ''];

export function chunkText(
  text: string,
  options: ChunkOptions = {}
): string[] {
  const {
    maxTokens = 512,
    overlapTokens = 50,
    separators = DEFAULT_SEPARATORS,
  } = options;

  // Rough token estimate: 1 token ~ 4 chars
  const maxChars = maxTokens * 4;
  const overlapChars = overlapTokens * 4;

  return recursiveSplit(text, maxChars, overlapChars, separators);
}

function recursiveSplit(
  text: string,
  maxChars: number,
  overlapChars: number,
  separators: string[]
): string[] {
  if (text.length <= maxChars) return [text.trim()].filter(Boolean);

  const sep = separators.find(s => text.includes(s)) ?? '';
  const parts = text.split(sep);
  const chunks: string[] = [];
  let current = '';

  for (const part of parts) {
    const candidate = current ? current + sep + part : part;
    if (candidate.length > maxChars && current) {
      chunks.push(current.trim());
      // Keep overlap from end of previous chunk
      const overlapText = current.slice(-overlapChars);
      current = overlapText + sep + part;
    } else {
      current = candidate;
    }
  }
  if (current.trim()) chunks.push(current.trim());

  // Recursively split any chunks that are still too large
  return chunks.flatMap(chunk =>
    chunk.length > maxChars
      ? recursiveSplit(chunk, maxChars, overlapChars, separators.slice(1))
      : [chunk]
  );
}

// Estimate token count (use tiktoken for exact counts if needed)
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```
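
As a rough sanity check on the sizing, the character budget implied by the chunker's defaults can be worked out directly. The sketch below reuses the same 4-characters-per-token heuristic; `charBudget` and `estimateChunkCount` are hypothetical helpers, and the count is only an approximation because the real chunker splits on separators rather than at fixed offsets.

```typescript
// Rough sizing only: the real chunker splits on separators, so actual counts vary.
const CHARS_PER_TOKEN = 4; // same heuristic the chunker uses

function charBudget(maxTokens = 512, overlapTokens = 50) {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  const overlapChars = overlapTokens * CHARS_PER_TOKEN;
  // Each new chunk advances by roughly (maxChars - overlapChars) characters.
  return { maxChars, overlapChars, stride: maxChars - overlapChars };
}

function estimateChunkCount(
  textLength: number,
  maxTokens = 512,
  overlapTokens = 50
): number {
  const { maxChars, overlapChars, stride } = charBudget(maxTokens, overlapTokens);
  if (stride <= 0) throw new Error('overlapTokens must be smaller than maxTokens');
  if (textLength <= maxChars) return textLength > 0 ? 1 : 0;
  return Math.ceil((textLength - overlapChars) / stride);
}

const budget = charBudget();                  // 2048 max chars, 200 overlap chars
const approxChunks = estimateChunkCount(10000); // roughly 6 chunks for 10,000 chars
```

The guard on `stride` also points at a constraint worth keeping in mind for the chunker itself: the overlap should stay well below the chunk size, otherwise chunks stop making forward progress.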

### Specialized chunkers

```typescript
// Markdown-aware chunking (respects headings)
export function chunkMarkdown(markdown: string, maxTokens = 512): string[] {
  const sections = markdown.split(/(?=^#{1,3} )/m);
  return sections.flatMap(section =>
    chunkText(section, { maxTokens, separators: ['\n\n', '\n', '. ', ' ', ''] })
  );
}

// Code-aware chunking (splits before top-level declarations)
export function chunkCode(code: string, maxTokens = 512): string[] {
  // Anchor to line starts (`^` with the `m` flag) so that `export function`
  // is not split between `export` and `function`, and so that indented
  // `const` declarations inside a function body do not start a new section.
  const declarationPattern = /^(?=(?:export\s+)?(?:async\s+)?(?:function|const|class)\s)/m;
  const sections = code.split(declarationPattern);
  return sections.flatMap(section =>
    chunkText(section, { maxTokens, separators: ['\n\n', '\n', ' ', ''] })
  );
}
```
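
The heading split relies on a zero-width lookahead, so each heading stays attached to the section it introduces and nothing is consumed by the split. A small illustration with a made-up `sample` string:

```typescript
// Illustrative: how the lookahead split keeps each heading with its section.
const sample = [
  'Intro text',
  '# Title',
  'Body',
  '## Sub',
  'More',
  '#### Deep', // four hashes: not matched by #{1,3}, so it stays in the "## Sub" section
  'Tail',
].join('\n');

// Same pattern as chunkMarkdown: split before any line starting with 1 to 3 '#' and a space.
const sections = sample.split(/(?=^#{1,3} )/m);
```

Here `sections` has three entries: the preamble, the `# Title` section, and the `## Sub` section, which also contains the `#### Deep` heading. Deeper headings are deliberately left inside their parent section; widen the quantifier if finer splits are wanted.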

## Embedding Model Landscape (March 2026)

Pick based on your needs:

| Model | Provider | Dims | Price/MTok | Best For |
|-------|----------|------|------------|----------|
| **voyage-4-large** | Voyage AI (MongoDB) | 1024 | ~$0.12 | Best retrieval quality (MoE arch) |
| **voyage-4** | Voyage AI | 1024 | ~$0.06 | Great quality, mid-size cost |
| **voyage-4-lite** | Voyage AI | 1024 | ~$0.02 | Production sweet spot (quality/cost) |
| **gemini-embedding-001** | Google | 3072 | Free tier / $0.01 | Highest MTEB score, free quota |
| **Gemini Embedding 2** | Google | 3072 | Preview | Multimodal (text+image+video+audio) |
| **text-embedding-3-small** | OpenAI | 1536 | $0.02 | Reliable, mature ecosystem |
| **text-embedding-3-large** | OpenAI | 3072 | $0.13 | Higher quality OpenAI option |
| **Cohere Embed v4** | Cohere | 1536 | $0.12 | Multimodal (text+image), 128k ctx |
| **Qwen3-Embedding-8B** | Qwen (open) | 4096 | Self-host | #1 MTEB multilingual, 32k ctx |
| **e5-small** | Microsoft (open) | 384 | Self-host | Fastest (<30ms), 100% Top-5 |
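
To turn the Price/MTok column into a budget figure, multiply the total token count by the per-million price. The sketch below uses the approximate prices from the table as placeholder inputs, not as authoritative quotes; `PRICE_PER_MTOK_USD` and `estimateEmbeddingCostUsd` are hypothetical names.

```typescript
// Prices are the approximate figures from the table, not authoritative quotes.
const PRICE_PER_MTOK_USD: Record<string, number> = {
  'voyage-4-large': 0.12,
  'voyage-4': 0.06,
  'voyage-4-lite': 0.02,
  'text-embedding-3-small': 0.02,
  'text-embedding-3-large': 0.13,
};

function estimateEmbeddingCostUsd(model: string, totalTokens: number): number {
  const price = PRICE_PER_MTOK_USD[model];
  if (price === undefined) throw new Error(`no price on file for ${model}`);
  return (totalTokens / 1_000_000) * price;
}

// Example corpus: 10,000 chunks of about 512 tokens each (5.12M tokens).
const totalTokens = 10000 * 512;
const liteCost = estimateEmbeddingCostUsd('voyage-4-lite', totalTokens);
const largeCost = estimateEmbeddingCostUsd('voyage-4-large', totalTokens);
```

For a corpus of this size the one-off indexing cost is about ten cents on the cheapest tier and about sixty cents on the most expensive listed Voyage tier, so retrieval quality and query latency are likely to matter more than indexing price at this scale.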

**Key insights:**
- **Voyage 4 series** has a shared embedding space: you can embed docs with `voyage-4-large` and query with `voyage-4-lite` (asymmetric retrieval, saves cost)
- **Google deprecated `text-embedding-004`** in Jan 2026; use `gemini-embedding-001` instead
- **Gemini Embedding 2** (preview) is multimodal: text, images, video, and audio share one vector space
- Most of the hosted models above support **Matryoshka embeddings**: reduce dims (e.g. 1024 -> 256) with minimal quality loss; check the model card before truncating
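
For models trained with Matryoshka representation learning, reducing dimensions client-side amounts to keeping the leading components and re-normalizing. A minimal sketch follows, with `truncateEmbedding` as a hypothetical helper. It assumes the model actually supports this kind of truncation; where a provider exposes its own dimension parameter, that is usually the safer route.

```typescript
// Sketch: keep the first `dims` components, then re-normalize to unit length.
// Only meaningful for models trained to support Matryoshka-style truncation.
function truncateEmbedding(embedding: number[], dims: number): number[] {
  if (dims <= 0 || dims > embedding.length) {
    throw new Error('invalid target dimension');
  }
  const head = embedding.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, v) => sum + v * v, 0));
  if (norm === 0) return head; // avoid dividing by zero for an all-zero prefix
  return head.map(v => v / norm);
}

const full = [0.6, 0.0, 0.8, 0.0]; // toy 4-dim unit vector
const reduced = truncateEmbedding(full, 2);
```

Whatever dimension is chosen must also be used for the `vector(N)` column and for every query embedding, since pgvector rejects comparisons between vectors of different sizes.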

### Recommended default: Voyage 4-lite (best value) or Gemini Embedding 001 (free tier)

## Phase 3: Embedding Pipeline

### Option A: Voyage AI (recommended for retrieval quality)

```typescript
// lib/rag/embeddings.ts

export async function generateEmbedding(
  text: string,
  inputType: 'query' | 'document' = 'query'
): Promise<number[]> {
  const response = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.VOYAGE_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'voyage-4-lite',    // 1024 dims, fast + cheap
      input: [text],
      input_type: inputType,     // 'document' when indexing, 'query' when searching
      output_dimension: 1024,    // Can reduce to 512/256 via Matryoshka
    }),
  });
  // Surface HTTP errors (bad key, rate limit) instead of failing on data.data below
  if (!response.ok) {
    throw new Error(`Voyage embeddings request failed: ${response.status}`);
  }
  const data = await response.json();
  return data.data[0].embedding;
}

export async function generateEmbeddings(
  texts: string[],
  inputType: 'query' | 'document' = 'document'
): Promise<number[][]> {
  // Voyage supports batch embedding
  const response = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.VOYAGE_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'voyage-4-lite',
      input: texts,
      input_type: inputType,
      output_dimension: 1024,
    }),
  });
  if (!response.ok) {
    throw new Error(`Voyage embeddings request failed: ${response.status}`);
  }
  const data = await response.json();
  return data.data.map((d: { embedding: number[] }) => d.embedding);
}
```

### Option B: Google Gemini Embedding (free tier, highest MTEB)

```typescript
// lib/rag/embeddings-gemini.ts
import { GoogleGenAI } from '@google/genai';

const genai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

export async function generateEmbedding(text: string): Promise<number[]> {
  const response = await genai.models.embedContent({
    model: 'gemini-embedding-001',
    contents: text,
    // Default 3072, can reduce to 1536/768. Must match the vector(N) column:
    // 1536 here, so change the vector(1024) definitions from Phase 1 accordingly.
    config: { outputDimensionality: 1536 },
  });
  return response.embeddings![0].values!;
}

export async function generateEmbeddings(texts: string[]): Promise<number[][]> {
  // One request per text, issued in parallel (watch rate limits on the free tier)
  const results = await Promise.all(
    texts.map(text =>
      genai.models.embedContent({
        model: 'gemini-embedding-001',
        contents: text,
        config: { outputDimensionality: 1536 },
      })
    )
  );
  return results.map(r => r.embeddings![0].values!);
}
```

### Option C: OpenAI (mature, wide ecosystem support)

```typescript
// lib/rag/embeddings-openai.ts
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small', // $0.02/MTok, 1536 dims
    input: text,
  });
  return response.data[0].embedding;
}

export async function generateEmbeddings(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts, // Up to 2048 inputs per batch
  });
  return response.data.map(d => d.embedding);
}
```

### Vector dimension by provider

```sql
-- Match your chosen provider:
embedding vector(1024)  -- Voyage 4 series (default 1024)
embedding vector(1536)  -- OpenAI text-embedding-3-small / Gemini (reduced) / Cohere Embed v4
embedding vector(3072)  -- OpenAI text-embedding-3-large / Gemini (full)
embedding vector(384)   -- e5-small (self-hosted, fastest)

-- Note: pgvector can only build HNSW or IVFFlat indexes on `vector` columns of up to
-- 2,000 dimensions. For 3072-dim models, request a reduced output dimension or index
-- a halfvec cast instead. Also change match_documents(query_embedding vector(N)) to match.
```
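
Because a dimension mismatch otherwise only surfaces as a database error at insert or query time, it can help to fail fast in application code. A minimal sketch, assuming a hypothetical `EXPECTED_DIMS` constant kept in sync with the column definition and a hypothetical `assertDimension` helper:

```typescript
// Hypothetical guard: fail fast when an embedding does not match the column size.
const EXPECTED_DIMS = 1024; // must equal N in `embedding vector(N)`

function assertDimension(
  embedding: number[],
  expected: number = EXPECTED_DIMS
): number[] {
  if (embedding.length !== expected) {
    throw new Error(
      `embedding has ${embedding.length} dims, column expects ${expected}`
    );
  }
  return embedding;
}

// A 1024-dim vector passes through unchanged.
const ok = assertDimension(new Array<number>(1024).fill(0));

// A 1536-dim vector (e.g. from a different provider) is rejected.
let mismatchMessage = '';
try {
  assertDimension(new Array<number>(1536).fill(0));
} catch (err) {
  mismatchMessage = (err as Error).message;
}
```

Calling a guard like this on provider responses, before rows are built, makes a provider switch that was not accompanied by a schema change show up as a clear application error.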
|
|
356
|
+
|
|
357
|
+
### Ingest Pipeline
|
|
358
|
+
|
|
359
|
+
```typescript
|
|
360
|
+
// lib/rag/ingest.ts
|
|
361
|
+
import { createClient } from '@/lib/supabase/server';
|
|
362
|
+
import { chunkText, estimateTokens } from './chunker';
|
|
363
|
+
import { generateEmbeddings } from './embeddings';
|
|
364
|
+
|
|
365
|
+
export async function ingestDocument(
|
|
366
|
+
title: string,
|
|
367
|
+
content: string,
|
|
368
|
+
metadata: Record<string, unknown> = {}
|
|
369
|
+
) {
|
|
370
|
+
const supabase = await createClient();
|
|
371
|
+
|
|
372
|
+
// 1. Create document record
|
|
373
|
+
const { data: doc, error: docError } = await supabase
|
|
374
|
+
.from('documents')
|
|
375
|
+
.insert({ title, metadata, source_type: 'manual' })
|
|
376
|
+
.select('id')
|
|
377
|
+
.single();
|
|
378
|
+
|
|
379
|
+
if (docError) throw docError;
|
|
380
|
+
|
|
381
|
+
// 2. Chunk the content
|
|
382
|
+
const chunks = chunkText(content, { maxTokens: 512, overlapTokens: 50 });
|
|
383
|
+
|
|
384
|
+
// 3. Generate embeddings in batches
|
|
385
|
+
const batchSize = 100;
|
|
386
|
+
for (let i = 0; i < chunks.length; i += batchSize) {
|
|
387
|
+
const batch = chunks.slice(i, i + batchSize);
|
|
388
|
+
const embeddings = await generateEmbeddings(batch);
|
|
389
|
+
|
|
390
|
+
// 4. Insert chunks with embeddings
|
|
391
|
+
const rows = batch.map((chunk, j) => ({
|
|
392
|
+
document_id: doc.id,
|
|
393
|
+
content: chunk,
|
|
394
|
+
chunk_index: i + j,
|
|
395
|
+
token_count: estimateTokens(chunk),
|
|
396
|
+
embedding: JSON.stringify(embeddings[j]),
|
|
397
|
+
}));
|
|
398
|
+
|
|
399
|
+
const { error } = await supabase.from('document_chunks').insert(rows);
|
|
400
|
+
if (error) throw error;
|
|
401
|
+
}
|
|
402
|
+
|
|
403
|
+
return doc.id;
|
|
404
|
+
}
|
|
405
|
+
```
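The batching loop in `ingestDocument` relies on `i + j` to keep `chunk_index` global across batches. The same idea can be isolated in a small pure helper, sketched here under the assumption that chunks are plain array items; `toIndexedBatches` is a hypothetical name and is not part of the pipeline above.

```typescript
// Hypothetical helper: splits items into batches and keeps each item's global index.
function toIndexedBatches<T>(
  items: T[],
  batchSize: number
): Array<Array<{ index: number; item: T }>> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches: Array<Array<{ index: number; item: T }>> = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(
      items
        .slice(start, start + batchSize)
        // start + offset mirrors the `i + j` expression used for chunk_index.
        .map((item, offset) => ({ index: start + offset, item }))
    );
  }
  return batches;
}
```

Keeping the index next to the item makes it harder to lose the ordering if batches are later processed concurrently.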
|
|
406
|
+
|
|
407
|
+
## Phase 4: Retrieval
|
|
408
|
+
|
|
409
|
+
### Basic Similarity Search
|
|
410
|
+
|
|
411
|
+
```typescript
|
|
412
|
+
// lib/rag/retrieve.ts
|
|
413
|
+
import { createClient } from '@/lib/supabase/server';
|
|
414
|
+
import { generateEmbedding } from './embeddings';
|
|
415
|
+
|
|
416
|
+
export async function retrieveContext(
|
|
417
|
+
query: string,
|
|
418
|
+
options: {
|
|
419
|
+
matchThreshold?: number;
|
|
420
|
+
matchCount?: number;
|
|
421
|
+
filterMetadata?: Record<string, unknown>;
|
|
422
|
+
} = {}
|
|
423
|
+
) {
|
|
424
|
+
const {
|
|
425
|
+
matchThreshold = 0.7,
|
|
426
|
+
matchCount = 5,
|
|
427
|
+
filterMetadata = {},
|
|
428
|
+
} = options;
|
|
429
|
+
|
|
430
|
+
const supabase = await createClient();
|
|
431
|
+
const queryEmbedding = await generateEmbedding(query);
|
|
432
|
+
|
|
433
|
+
const { data, error } = await supabase.rpc('match_documents', {
|
|
434
|
+
query_embedding: JSON.stringify(queryEmbedding),
|
|
435
|
+
match_threshold: matchThreshold,
|
|
436
|
+
match_count: matchCount,
|
|
437
|
+
filter_metadata: filterMetadata,
|
|
438
|
+
});
|
|
439
|
+
|
|
440
|
+
if (error) throw error;
|
|
441
|
+
return data as Array<{
|
|
442
|
+
id: string;
|
|
443
|
+
document_id: string;
|
|
444
|
+
content: string;
|
|
445
|
+
metadata: Record<string, unknown>;
|
|
446
|
+
similarity: number;
|
|
447
|
+
}>;
|
|
448
|
+
}
|
|
449
|
+
```
|
|
450
|
+
|
|
451
|
+
### Hybrid Search (vector + full-text)
|
|
452
|
+
|
|
453
|
+
```sql
|
|
454
|
+
-- Add full-text search column
|
|
455
|
+
ALTER TABLE document_chunks ADD COLUMN fts tsvector
|
|
456
|
+
GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
|
|
457
|
+
|
|
458
|
+
CREATE INDEX idx_chunks_fts ON document_chunks USING gin(fts);
|
|
459
|
+
|
|
460
|
+
-- Hybrid search function
|
|
461
|
+
CREATE OR REPLACE FUNCTION hybrid_search(
|
|
462
|
+
query_text TEXT,
|
|
463
|
+
query_embedding vector(1024), -- must match the dimension of your embedding column
|
|
464
|
+
match_count INT DEFAULT 5,
|
|
465
|
+
keyword_weight FLOAT DEFAULT 0.3,
|
|
466
|
+
semantic_weight FLOAT DEFAULT 0.7
|
|
467
|
+
)
|
|
468
|
+
RETURNS TABLE (
|
|
469
|
+
id UUID,
|
|
470
|
+
document_id UUID,
|
|
471
|
+
content TEXT,
|
|
472
|
+
metadata JSONB,
|
|
473
|
+
score FLOAT
|
|
474
|
+
)
|
|
475
|
+
LANGUAGE plpgsql
|
|
476
|
+
AS $$
|
|
477
|
+
BEGIN
|
|
478
|
+
RETURN QUERY
|
|
479
|
+
WITH semantic AS (
|
|
480
|
+
SELECT dc.id, 1 - (dc.embedding <=> query_embedding) AS sim
|
|
481
|
+
FROM document_chunks dc
|
|
482
|
+
ORDER BY dc.embedding <=> query_embedding
|
|
483
|
+
LIMIT match_count * 2
|
|
484
|
+
),
|
|
485
|
+
keyword AS (
|
|
486
|
+
SELECT dc.id, ts_rank(dc.fts, websearch_to_tsquery('english', query_text)) AS rank
|
|
487
|
+
FROM document_chunks dc
|
|
488
|
+
WHERE dc.fts @@ websearch_to_tsquery('english', query_text)
|
|
489
|
+
ORDER BY rank DESC
LIMIT match_count * 2
|
|
490
|
+
),
|
|
491
|
+
combined AS (
|
|
492
|
+
SELECT
|
|
493
|
+
COALESCE(s.id, k.id) AS chunk_id,
|
|
494
|
+
(COALESCE(s.sim, 0) * semantic_weight + COALESCE(k.rank, 0) * keyword_weight) AS combined_score
|
|
495
|
+
FROM semantic s
|
|
496
|
+
FULL OUTER JOIN keyword k ON s.id = k.id
|
|
497
|
+
)
|
|
498
|
+
SELECT dc.id, dc.document_id, dc.content, dc.metadata, c.combined_score AS score
|
|
499
|
+
FROM combined c
|
|
500
|
+
JOIN document_chunks dc ON dc.id = c.chunk_id
|
|
501
|
+
ORDER BY c.combined_score DESC
|
|
502
|
+
LIMIT match_count;
|
|
503
|
+
END;
|
|
504
|
+
$$;
|
|
505
|
+
```
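To make the `combined` CTE easier to reason about, here is a minimal in-memory sketch of the same weighted fusion. It assumes each list holds unique ids; `ScoredId` and `fuseScores` are hypothetical names, and the default weights mirror the SQL defaults.

```typescript
interface ScoredId {
  id: string;
  score: number;
}

// Weighted sum over the union of both result lists; a missing score counts as 0,
// which is what COALESCE(..., 0) does after the FULL OUTER JOIN.
function fuseScores(
  semantic: ScoredId[],
  keyword: ScoredId[],
  matchCount = 5,
  semanticWeight = 0.7,
  keywordWeight = 0.3
): ScoredId[] {
  const combined = new Map<string, number>();
  for (const s of semantic) {
    combined.set(s.id, s.score * semanticWeight);
  }
  for (const k of keyword) {
    combined.set(k.id, (combined.get(k.id) ?? 0) + k.score * keywordWeight);
  }
  return Array.from(combined, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, matchCount);
}
```

Because `ts_rank` values are not on the same scale as cosine similarity, the weights will likely need tuning against real queries.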
|
|
506
|
+
|
|
507
|
+
### Reranking (optional, improves quality)
|
|
508
|
+
|
|
509
|
+
```typescript
|
|
510
|
+
// lib/rag/rerank.ts
|
|
511
|
+
|
|
512
|
+
export async function rerankResults(
|
|
513
|
+
query: string,
|
|
514
|
+
results: Array<{ content: string; [key: string]: unknown }>
|
|
515
|
+
): Promise<typeof results> {
|
|
516
|
+
// Cohere Rerank API
|
|
517
|
+
const response = await fetch('https://api.cohere.com/v2/rerank', {
|
|
518
|
+
method: 'POST',
|
|
519
|
+
headers: {
|
|
520
|
+
'Content-Type': 'application/json',
|
|
521
|
+
'Authorization': `Bearer ${process.env.COHERE_API_KEY}`,
|
|
522
|
+
},
|
|
523
|
+
body: JSON.stringify({
|
|
524
|
+
model: 'rerank-v3.5',
|
|
525
|
+
query,
|
|
526
|
+
documents: results.map(r => r.content),
|
|
527
|
+
top_n: Math.min(results.length, 5),
|
|
528
|
+
}),
|
|
529
|
+
});
|
|
530
|
+
|
|
531
|
+
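if (!response.ok) {
throw new Error(`Cohere rerank request failed with status ${response.status}`);
}
const data = await response.json();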
|
|
532
|
+
return data.results.map((r: { index: number }) => results[r.index]);
|
|
533
|
+
}
|
|
534
|
+
```
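The rerank response also carries a `relevance_score` per result, which `rerankResults` discards. A small sketch of keeping that score and dropping weak hits might look like this; `applyRerank`, `rerankScore`, and the `minScore` cutoff are hypothetical additions, not part of the function above.

```typescript
interface RerankHit {
  index: number;
  relevance_score: number;
}

// Maps rerank hits back onto the original results, preserving the reranked order.
function applyRerank<T extends object>(
  results: T[],
  hits: RerankHit[],
  minScore = 0
): Array<T & { rerankScore: number }> {
  return hits
    // Ignore out-of-range indexes and hits below the (assumed) score cutoff.
    .filter(
      (hit) =>
        hit.index >= 0 && hit.index < results.length && hit.relevance_score >= minScore
    )
    .map((hit) => ({ ...results[hit.index], rerankScore: hit.relevance_score }));
}
```

A suitable `minScore` depends on the model and the corpus, so it is best chosen by inspecting scores on a few known-good and known-bad queries.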
|
|
535
|
+
|
|
536
|
+
## Phase 5: Generation (Claude)
|
|
537
|
+
|
|
538
|
+
### RAG Query with Claude
|
|
539
|
+
|
|
540
|
+
```typescript
|
|
541
|
+
// lib/rag/generate.ts
|
|
542
|
+
import Anthropic from '@anthropic-ai/sdk';
|
|
543
|
+
import { retrieveContext } from './retrieve';
|
|
544
|
+
|
|
545
|
+
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
|
|
546
|
+
|
|
547
|
+
export async function ragQuery(
|
|
548
|
+
userQuery: string,
|
|
549
|
+
options: {
|
|
550
|
+
systemPrompt?: string;
|
|
551
|
+
matchCount?: number;
|
|
552
|
+
matchThreshold?: number;
|
|
553
|
+
} = {}
|
|
554
|
+
) {
|
|
555
|
+
const {
|
|
556
|
+
systemPrompt = 'You are a helpful assistant. Answer questions based on the provided context. If the context does not contain the answer, say so clearly.',
|
|
557
|
+
matchCount = 5,
|
|
558
|
+
matchThreshold = 0.7,
|
|
559
|
+
} = options;
|
|
560
|
+
|
|
561
|
+
// 1. Retrieve relevant context
|
|
562
|
+
const context = await retrieveContext(userQuery, { matchCount, matchThreshold });
|
|
563
|
+
|
|
564
|
+
if (context.length === 0) {
|
|
565
|
+
return {
|
|
566
|
+
answer: 'I could not find relevant information to answer your question.',
|
|
567
|
+
sources: [],
|
|
568
|
+
};
|
|
569
|
+
}
|
|
570
|
+
|
|
571
|
+
// 2. Build context block
|
|
572
|
+
const contextBlock = context
|
|
573
|
+
.map((c, i) => `[Source ${i + 1}] (similarity: ${c.similarity.toFixed(3)})\n${c.content}`)
|
|
574
|
+
.join('\n\n---\n\n');
|
|
575
|
+
|
|
576
|
+
// 3. Generate with Claude
|
|
577
|
+
const message = await anthropic.messages.create({
|
|
578
|
+
model: 'claude-sonnet-4-6',
|
|
579
|
+
max_tokens: 1024,
|
|
580
|
+
system: `${systemPrompt}\n\n<context>\n${contextBlock}\n</context>`,
|
|
581
|
+
messages: [{ role: 'user', content: userQuery }],
|
|
582
|
+
});
|
|
583
|
+
|
|
584
|
+
const firstBlock = message.content[0];
const answer = firstBlock?.type === 'text' ? firstBlock.text : '';
|
|
585
|
+
|
|
586
|
+
return {
|
|
587
|
+
answer,
|
|
588
|
+
sources: context.map(c => ({
|
|
589
|
+
content: c.content.slice(0, 200),
|
|
590
|
+
similarity: c.similarity,
|
|
591
|
+
document_id: c.document_id,
|
|
592
|
+
})),
|
|
593
|
+
usage: message.usage,
|
|
594
|
+
};
|
|
595
|
+
}
|
|
596
|
+
```
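With larger `matchCount` values the context block can grow past what you want to spend on input tokens. One option is to cap it by size; the sketch below uses a character budget as a rough stand-in for tokens. `buildContextBlock` and `maxChars` are hypothetical, and the source format string matches the one used in `ragQuery`.

```typescript
interface RetrievedChunk {
  content: string;
  similarity: number;
}

// Adds sources in ranked order until the next one would exceed the character budget.
function buildContextBlock(chunks: RetrievedChunk[], maxChars = 8000): string {
  const separator = '\n\n---\n\n';
  const parts: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const part = `[Source ${parts.length + 1}] (similarity: ${chunk.similarity.toFixed(3)})\n${chunk.content}`;
    const cost = part.length + (parts.length > 0 ? separator.length : 0);
    if (used + cost > maxChars) break;
    parts.push(part);
    used += cost;
  }
  return parts.join(separator);
}
```

Stopping at the first chunk that does not fit keeps the highest-ranked sources intact instead of truncating one mid-sentence.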
|
|
597
|
+
|
|
598
|
+
### Streaming variant
|
|
599
|
+
|
|
600
|
+
```typescript
|
|
601
|
+
// lib/rag/generate-stream.ts
|
|
602
|
+
import Anthropic from '@anthropic-ai/sdk';
|
|
603
|
+
import { retrieveContext } from './retrieve';
|
|
604
|
+
|
|
605
|
+
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
|
|
606
|
+
|
|
607
|
+
export async function ragQueryStream(
|
|
608
|
+
userQuery: string,
|
|
609
|
+
systemPrompt = 'Answer based on the provided context.',
|
|
610
|
+
) {
|
|
611
|
+
const context = await retrieveContext(userQuery, { matchCount: 5 });
|
|
612
|
+
|
|
613
|
+
const contextBlock = context
|
|
614
|
+
.map((c, i) => `[Source ${i + 1}]\n${c.content}`)
|
|
615
|
+
.join('\n\n---\n\n');
|
|
616
|
+
|
|
617
|
+
return anthropic.messages.stream({
|
|
618
|
+
model: 'claude-sonnet-4-6',
|
|
619
|
+
max_tokens: 1024,
|
|
620
|
+
system: `${systemPrompt}\n\n<context>\n${contextBlock}\n</context>`,
|
|
621
|
+
messages: [{ role: 'user', content: userQuery }],
|
|
622
|
+
});
|
|
623
|
+
}
|
|
624
|
+
```
|
|
625
|
+
|
|
626
|
+
## Phase 6: Next.js API Routes
|
|
627
|
+
|
|
628
|
+
### Chat endpoint (streaming)
|
|
629
|
+
|
|
630
|
+
```typescript
|
|
631
|
+
// app/api/chat/route.ts
|
|
632
|
+
import { ragQueryStream } from '@/lib/rag/generate-stream';
|
|
633
|
+
|
|
634
|
+
export async function POST(req: Request) {
|
|
635
|
+
const { message } = await req.json();
|
|
636
|
+
|
|
637
|
+
if (!message || typeof message !== 'string') {
|
|
638
|
+
return Response.json({ error: 'Message is required' }, { status: 400 });
|
|
639
|
+
}
|
|
640
|
+
|
|
641
|
+
const stream = await ragQueryStream(message);
|
|
642
|
+
|
|
643
|
+
return new Response(stream.toReadableStream(), {
|
|
644
|
+
headers: { 'Content-Type': 'text/event-stream' },
|
|
645
|
+
});
|
|
646
|
+
}
|
|
647
|
+
```
|
|
648
|
+
|
|
649
|
+
### Ingest endpoint
|
|
650
|
+
|
|
651
|
+
```typescript
|
|
652
|
+
// app/api/ingest/route.ts
|
|
653
|
+
import { ingestDocument } from '@/lib/rag/ingest';
|
|
654
|
+
import { z } from 'zod';
|
|
655
|
+
|
|
656
|
+
const IngestSchema = z.object({
|
|
657
|
+
title: z.string().min(1),
|
|
658
|
+
content: z.string().min(1),
|
|
659
|
+
metadata: z.record(z.unknown()).optional(),
|
|
660
|
+
});
|
|
661
|
+
|
|
662
|
+
export async function POST(req: Request) {
|
|
663
|
+
const body = await req.json();
|
|
664
|
+
const parsed = IngestSchema.safeParse(body);
|
|
665
|
+
|
|
666
|
+
if (!parsed.success) {
|
|
667
|
+
return Response.json({ error: parsed.error.flatten() }, { status: 400 });
|
|
668
|
+
}
|
|
669
|
+
|
|
670
|
+
const docId = await ingestDocument(
|
|
671
|
+
parsed.data.title,
|
|
672
|
+
parsed.data.content,
|
|
673
|
+
parsed.data.metadata
|
|
674
|
+
);
|
|
675
|
+
|
|
676
|
+
return Response.json({ documentId: docId });
|
|
677
|
+
}
|
|
678
|
+
```
|
|
679
|
+
|
|
680
|
+
## Quick Start Checklist
|
|
681
|
+
|
|
682
|
+
When the user asks to build RAG, follow this order:
|
|
683
|
+
|
|
684
|
+
1. **Database**: Run pgvector migration (Phase 1)
|
|
685
|
+
2. **Chunker**: Create `lib/rag/chunker.ts` (Phase 2)
|
|
686
|
+
3. **Embeddings**: Create `lib/rag/embeddings.ts` with chosen provider (Phase 3)
|
|
687
|
+
4. **Ingest**: Create `lib/rag/ingest.ts` (Phase 3)
|
|
688
|
+
5. **Retrieve**: Create `lib/rag/retrieve.ts` (Phase 4)
|
|
689
|
+
6. **Generate**: Create `lib/rag/generate.ts` (Phase 5)
|
|
690
|
+
7. **API Routes**: Wire up endpoints (Phase 6)
|
|
691
|
+
8. **Test**: Ingest a sample doc, query it, verify results
|
|
692
|
+
|
|
693
|
+
## Key Decisions to Ask User
|
|
694
|
+
|
|
695
|
+
- **Embedding provider**: Voyage 4-lite (best value), Gemini Embedding 001 (free tier, highest MTEB), or OpenAI (mature ecosystem)?
|
|
696
|
+
- **Vector dimensions**: 1024 (Voyage 4), 1536 (OpenAI/Gemini reduced), 3072 (Gemini/OpenAI full)?
|
|
697
|
+
- **Hybrid search**: Pure vector or vector + full-text keyword? (hybrid recommended for production)
|
|
698
|
+
- **Reranking**: Add a rerank step? If so, pair Voyage embeddings with the Voyage reranker, or use Cohere rerank-v3.5?
|
|
699
|
+
- **Multi-tenant**: Scope documents per user/org?
|
|
700
|
+
- **Generation model**: Claude Sonnet 4.6 (fast + cheap) vs Opus 4.6 (highest quality)?
|
|
701
|
+
- **Multimodal**: Need image/video/audio embeddings? Use Gemini Embedding 2 or voyage-multimodal-3.5
|
|
702
|
+
- **Asymmetric retrieval**: The Voyage 4 shared embedding space lets you embed documents with the large model and queries with the lite model, which saves cost.
|
|
703
|
+
|
|
704
|
+
## Environment Variables Needed
|
|
705
|
+
|
|
706
|
+
```env
|
|
707
|
+
# Embeddings (pick one)
|
|
708
|
+
VOYAGE_API_KEY=pa-... # Voyage 4 series (recommended)
|
|
709
|
+
GEMINI_API_KEY=... # Google Gemini Embedding
|
|
710
|
+
OPENAI_API_KEY=sk-... # OpenAI text-embedding-3
|
|
711
|
+
|
|
712
|
+
# Generation
|
|
713
|
+
ANTHROPIC_API_KEY=sk-ant-...
|
|
714
|
+
|
|
715
|
+
# Optional: Reranking
|
|
716
|
+
COHERE_API_KEY=...
|
|
717
|
+
```
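A missing key otherwise shows up as a failed API call deep inside the pipeline, so a startup check can save debugging time. This is a minimal sketch assuming the variable names listed above; `findMissingEnv` is a hypothetical helper.

```typescript
// At least one embedding key is needed; which one depends on the chosen provider.
const EMBEDDING_KEYS = ['VOYAGE_API_KEY', 'GEMINI_API_KEY', 'OPENAI_API_KEY'];

// Returns a human-readable list of what is missing; an empty list means the config is usable.
function findMissingEnv(env: Record<string, string | undefined>): string[] {
  const missing: string[] = [];
  if (!EMBEDDING_KEYS.some((key) => Boolean(env[key]))) {
    missing.push(`one of ${EMBEDDING_KEYS.join(', ')}`);
  }
  if (!env['ANTHROPIC_API_KEY']) {
    missing.push('ANTHROPIC_API_KEY');
  }
  // COHERE_API_KEY is optional (reranking only), so it is not checked here.
  return missing;
}
```

In a Node.js process it could be called as `findMissingEnv(process.env)` during startup, throwing if the returned list is not empty.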
|
|
718
|
+
|
|
719
|
+
## Do You Need Pinecone?
|
|
720
|
+
|
|
721
|
+
**No.** Supabase pgvector handles everything for typical RAG workloads:
|
|
722
|
+
- HNSW indexes for fast similarity search
|
|
723
|
+
- Hybrid search (vector + full-text) via SQL
|
|
724
|
+
- RLS for multi-tenant isolation
|
|
725
|
+
- No extra service, no extra cost, no vendor lock-in
|
|
726
|
+
|
|
727
|
+
**When Pinecone makes sense** (rare for Qualia projects):
|
|
728
|
+
- 10M+ vectors where pgvector HNSW gets slow
|
|
729
|
+
- Need serverless auto-scaling with zero ops
|
|
730
|
+
- Multi-region replication requirements
|
|
731
|
+
- Already paying for Pinecone in another system
|
|
732
|
+
|
|
733
|
+
For your scale (< 1M vectors per project), Supabase pgvector is the right call.
|
|
734
|
+
|
|
735
|
+
## Performance Tips
|
|
736
|
+
|
|
737
|
+
- **Chunk size**: 512 tokens is a good starting point. Chunks that are too small retrieve noisy fragments; chunks that are too large dilute relevance.
|
|
738
|
+
- **Overlap**: 50-100 tokens prevents splitting context at chunk boundaries.
|
|
739
|
+
- **HNSW index**: Use `ef_construction=64, m=16` for < 1M rows. Increase for larger datasets.
|
|
740
|
+
- **Batch embeddings**: Always batch (up to 2048 per OpenAI call). Never embed one at a time.
|
|
741
|
+
- **Cache embeddings**: Store query embeddings for repeated queries.
|
|
742
|
+
- **Threshold tuning**: Start at 0.7, lower to 0.5 if too few results, raise to 0.8 if too noisy.
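
The "cache embeddings" tip above can be sketched as a wrapper around any single-text embedding function. This is an in-memory, per-process sketch with first-in-first-out eviction; `withEmbeddingCache`, `EmbedFn`, and `maxEntries` are hypothetical names, and a shared store would be needed across serverless instances.

```typescript
type EmbedFn = (text: string) => Promise<number[]>;

// Wraps an embedding function so repeated queries reuse the same pending or resolved promise.
function withEmbeddingCache(embed: EmbedFn, maxEntries = 1000): EmbedFn {
  const cache = new Map<string, Promise<number[]>>();
  return (text: string) => {
    const key = text.trim();
    const hit = cache.get(key);
    if (hit) return hit;
    const pending = embed(key).catch((err) => {
      cache.delete(key); // do not keep failed requests in the cache
      throw err;
    });
    if (cache.size >= maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest entry.
      const oldest = cache.keys().next().value;
      if (oldest !== undefined) cache.delete(oldest);
    }
    cache.set(key, pending);
    return pending;
  };
}
```

Storing the promise instead of the resolved vector also deduplicates identical queries that arrive while the first request is still in flight.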
|
|
743
|
+
|
|
744
|
+
## Trigger Phrases
|
|
745
|
+
|
|
746
|
+
- "build RAG" / "set up RAG" / "create RAG pipeline"
|
|
747
|
+
- "vector search" / "semantic search" / "pgvector"
|
|
748
|
+
- "embed documents" / "embedding pipeline"
|
|
749
|
+
- "knowledge base" / "document Q&A" / "chat with docs"
|
|
750
|
+
- "retrieval augmented generation"
|