yt-embeddings-strapi-plugin 0.0.1
- package/README.md +531 -0
- package/dist/_chunks/App-Cv1cdLAr.js +587 -0
- package/dist/_chunks/App-bN58O1bN.mjs +583 -0
- package/dist/_chunks/en-B4KWt_jN.js +4 -0
- package/dist/_chunks/en-Byx4XI2L.mjs +4 -0
- package/dist/_chunks/index-BAfBs5PQ.js +172 -0
- package/dist/_chunks/index-K6X5FM2O.mjs +173 -0
- package/dist/admin/index.js +4 -0
- package/dist/admin/index.mjs +5 -0
- package/dist/admin/src/components/Initializer.d.ts +5 -0
- package/dist/admin/src/components/PluginIcon.d.ts +2 -0
- package/dist/admin/src/components/custom/BackLink.d.ts +5 -0
- package/dist/admin/src/components/custom/ChatModal.d.ts +1 -0
- package/dist/admin/src/components/custom/EmbeddingsModal.d.ts +1 -0
- package/dist/admin/src/components/custom/EmbeddingsWidget.d.ts +1 -0
- package/dist/admin/src/components/custom/Illo.d.ts +1 -0
- package/dist/admin/src/components/custom/Markdown.d.ts +5 -0
- package/dist/admin/src/components/custom/RobotIcon.d.ts +6 -0
- package/dist/admin/src/index.d.ts +12 -0
- package/dist/admin/src/pages/App.d.ts +2 -0
- package/dist/admin/src/pages/EmbeddingDetails.d.ts +1 -0
- package/dist/admin/src/pages/HomePage.d.ts +1 -0
- package/dist/admin/src/pluginId.d.ts +1 -0
- package/dist/admin/src/utils/api.d.ts +81 -0
- package/dist/admin/src/utils/getTranslation.d.ts +2 -0
- package/dist/server/index.js +2220 -0
- package/dist/server/index.mjs +2203 -0
- package/dist/server/src/bootstrap.d.ts +5 -0
- package/dist/server/src/config/index.d.ts +38 -0
- package/dist/server/src/content-types/index.d.ts +2 -0
- package/dist/server/src/controllers/controller.d.ts +13 -0
- package/dist/server/src/controllers/index.d.ts +30 -0
- package/dist/server/src/controllers/mcp.d.ts +18 -0
- package/dist/server/src/controllers/yt-controller.d.ts +13 -0
- package/dist/server/src/destroy.d.ts +5 -0
- package/dist/server/src/index.d.ts +280 -0
- package/dist/server/src/mcp/index.d.ts +6 -0
- package/dist/server/src/mcp/schemas/index.d.ts +55 -0
- package/dist/server/src/mcp/server.d.ts +8 -0
- package/dist/server/src/mcp/tools/get-video-transcript-range.d.ts +33 -0
- package/dist/server/src/mcp/tools/get-yt-video-summary.d.ts +23 -0
- package/dist/server/src/mcp/tools/index.d.ts +38 -0
- package/dist/server/src/mcp/tools/list-yt-videos.d.ts +28 -0
- package/dist/server/src/mcp/tools/search-yt-knowledge.d.ts +51 -0
- package/dist/server/src/middlewares/index.d.ts +2 -0
- package/dist/server/src/migrations/002-yt-tables.d.ts +2 -0
- package/dist/server/src/plugin-manager.d.ts +81 -0
- package/dist/server/src/policies/index.d.ts +2 -0
- package/dist/server/src/register.d.ts +5 -0
- package/dist/server/src/routes/admin.d.ts +14 -0
- package/dist/server/src/routes/content-api.d.ts +20 -0
- package/dist/server/src/routes/index.d.ts +41 -0
- package/dist/server/src/services/ai-tools.d.ts +127 -0
- package/dist/server/src/services/index.d.ts +185 -0
- package/dist/server/src/services/yt-embeddings.d.ts +68 -0
- package/dist/server/src/services/yt-metadata.d.ts +12 -0
- package/dist/server/src/tools/get-video-transcript-range.d.ts +32 -0
- package/dist/server/src/tools/get-yt-video-summary.d.ts +36 -0
- package/dist/server/src/tools/index.d.ts +126 -0
- package/dist/server/src/tools/list-yt-videos.d.ts +25 -0
- package/dist/server/src/tools/search-yt-knowledge.d.ts +35 -0
- package/dist/server/src/utils/chunking.d.ts +44 -0
- package/dist/server/src/utils/preprocessing.d.ts +26 -0
- package/dist/server/src/utils/yt-chunker.d.ts +16 -0
- package/package.json +106 -0
package/README.md
# Strapi Content Embeddings

A Strapi v5 plugin that creates vector embeddings from your content using OpenAI and stores them in Neon PostgreSQL with pgvector. It enables semantic search, RAG (Retrieval-Augmented Generation) chat, and MCP (Model Context Protocol) integration for AI assistants such as Claude Desktop.

## Features

- **Vector Embeddings**: Generate embeddings from your content using OpenAI's embedding models
- **Neon PostgreSQL Storage**: Store embeddings in Neon DB with pgvector for efficient similarity search
- **RAG Chat Interface**: Built-in chat widget to ask questions about your content
- **MCP Server**: Expose your embeddings to AI assistants via Model Context Protocol
- **Content Manager Integration**: Create embeddings directly from any content type's edit view
- **Standalone Embeddings**: Create embeddings independent of content types
- **Multiple Embedding Models**: Support for OpenAI's text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002
- **Database Sync**: Sync embeddings from Neon DB to Strapi via the admin UI or API endpoints
- **Automatic Chunking**: Split large content into multiple embeddings with overlap for context preservation
- **Content Preprocessing**: Automatically strips HTML and Markdown formatting for cleaner embeddings

## Requirements

- Strapi v5.x
- Node.js 18+
- OpenAI API key
- Neon PostgreSQL database with the pgvector extension

## Installation

```bash
npm install strapi-content-embeddings
# or
yarn add strapi-content-embeddings
```

## Configuration

### 1. Enable the Plugin

Add the plugin to your `config/plugins.ts` (or `config/plugins.js`):

```typescript
export default ({ env }) => ({
  "strapi-content-embeddings": {
    enabled: true,
    config: {
      openAIApiKey: env("OPENAI_API_KEY"),
      neonConnectionString: env("NEON_CONNECTION_STRING"),
      // Optional: Choose embedding model (default: "text-embedding-3-small")
      embeddingModel: env("EMBEDDING_MODEL", "text-embedding-3-small"),
    },
  },
});
```

### 2. Set Environment Variables

Add the following to your `.env` file:

```bash
OPENAI_API_KEY=sk-your-openai-api-key
NEON_CONNECTION_STRING=postgresql://user:password@your-neon-host.neon.tech/dbname?sslmode=require
# Optional
EMBEDDING_MODEL=text-embedding-3-small
```

### 3. Get Your Neon Connection String

1. Sign up at [Neon](https://neon.tech)
2. Create a new project
3. Navigate to your project's **Connection Details**
4. Copy the connection string (it should look like `postgresql://user:password@ep-xxx.region.aws.neon.tech/dbname?sslmode=require`)

The plugin will automatically:

- Enable the pgvector extension
- Create the `embeddings_documents` table
- Set up HNSW indexes for fast similarity search

## MCP Integration

This plugin exposes an MCP (Model Context Protocol) server that allows AI assistants like Claude Desktop to search your embeddings.

### MCP Endpoint

```
POST /api/strapi-content-embeddings/mcp
```

### Available MCP Tools

| Tool | Description | Trigger |
|------|-------------|---------|
| `rag_query` | Ask questions and get AI-generated answers from your content | `/rag [question]` |
| `semantic_search` | Find semantically similar content | `/rag search [query]` |
| `list_embeddings` | List all stored embeddings | - |
| `get_embedding` | Get a specific embedding by ID | - |
| `create_embedding` | Create a new embedding | - |
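On the wire, MCP clients talk to this endpoint with JSON-RPC 2.0 messages, and a tool invocation uses the `tools/call` method with the tool's `name` and `arguments`. The sketch below only builds such a request body for `semantic_search`; the argument names `query` and `limit` are assumptions, so check the tool's real input schema (returned by `tools/list`) before relying on them. `buildToolCall` is a hypothetical helper, not part of the plugin.

```typescript
// Shape of a JSON-RPC 2.0 request as used by MCP for tool calls.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: builds an MCP `tools/call` request for a tool name and its arguments.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Hypothetical arguments: verify them against the schema reported by `tools/list`.
const request = buildToolCall(1, "semantic_search", { query: "What is Strapi?", limit: 4 });

// Body a client would POST to /api/strapi-content-embeddings/mcp
console.log(JSON.stringify(request, null, 2));
```

A real client sends this only after the MCP `initialize` handshake, with `Content-Type: application/json` and the same `Authorization: Bearer` token; if the server implements MCP's Streamable HTTP transport, it also expects `Accept: application/json, text/event-stream`. `mcp-remote` takes care of these details for Claude Desktop.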
### Claude Desktop Configuration

Add the following to your Claude Desktop config file (on macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "strapi-content-embeddings": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://your-strapi-url.com/api/strapi-content-embeddings/mcp",
        "--header",
        "Authorization: Bearer YOUR_STRAPI_API_TOKEN"
      ]
    }
  }
}
```

### Usage in Claude Desktop

Type `/rag` followed by your question to search your embeddings:

```
/rag What is Strapi?
/rag Who is Paul Bratslavsky?
```

## Available Embedding Models

| Model | Dimensions | Description |
|-------|------------|-------------|
| `text-embedding-3-small` | 1536 | Fast, cost-effective (default) |
| `text-embedding-3-large` | 3072 | Higher accuracy, more expensive |
| `text-embedding-ada-002` | 1536 | Legacy model |
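To fail fast on a misconfigured `EMBEDDING_MODEL`, or to know the vector size when provisioning storage yourself, the table above can be encoded as a lookup. This is a hypothetical helper (`EMBEDDING_DIMENSIONS`, `embeddingDimensions`), not part of the plugin's exported API.

```typescript
// Hypothetical lookup mirroring the "Available Embedding Models" table.
const EMBEDDING_DIMENSIONS = {
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
  "text-embedding-ada-002": 1536,
} as const;

type EmbeddingModel = keyof typeof EMBEDDING_DIMENSIONS;

function isSupportedModel(name: string): name is EmbeddingModel {
  return Object.prototype.hasOwnProperty.call(EMBEDDING_DIMENSIONS, name);
}

// Returns the vector size for a model name. Throws on unknown names so a typo
// in EMBEDDING_MODEL surfaces at startup instead of at query time.
function embeddingDimensions(name: string): number {
  if (!isSupportedModel(name)) {
    const supported = Object.keys(EMBEDDING_DIMENSIONS).join(", ");
    throw new Error(`Unsupported embedding model "${name}". Expected one of: ${supported}`);
  }
  return EMBEDDING_DIMENSIONS[name];
}

console.log(embeddingDimensions("text-embedding-3-large")); // 3072
```

Vectors from different models live in different spaces, even when their dimensions match (as with `text-embedding-3-small` and `text-embedding-ada-002`), so switching models generally means re-embedding existing content.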
## Usage

### Admin Panel

#### Create Embeddings Page

Navigate to **Content Embeddings** in the Strapi admin sidebar to:

- View all existing embeddings
- Create new standalone embeddings
- Delete embeddings
- Search and filter embeddings

#### Content Manager Integration

When editing any content type, you'll see an **Embeddings** panel in the right sidebar:

- **Create Embedding**: Generate an embedding from the current content
- **View Embedding**: Navigate to the embedding details
- **Update Embedding**: Update the embedding with current content changes

#### Chat Widget

Click the robot icon in the bottom-right corner to open the RAG chat interface:

- Ask questions about your embedded content
- View source documents used to generate answers
- Navigate to source embeddings

### Programmatic Usage

#### Create an Embedding

```typescript
const result = await strapi
  .plugin("strapi-content-embeddings")
  .service("embeddings")
  .createEmbedding({
    data: {
      title: "My Document",
      content: "This is the content to embed...",
      collectionType: "api::article.article", // optional
      fieldName: "content", // optional
      metadata: { customField: "value" }, // optional
    },
  });
```

#### Query Embeddings (RAG)

```typescript
const response = await strapi
  .plugin("strapi-content-embeddings")
  .service("embeddings")
  .queryEmbeddings("What is this document about?");

// response.text - The AI-generated answer
// response.sourceDocuments - The documents used for context
```

#### Similarity Search

```typescript
const documents = await strapi
  .plugin("strapi-content-embeddings")
  .service("embeddings")
  .similaritySearch("search query", 4); // returns top 4 similar documents
```

## API Endpoints

All endpoints require admin authentication.

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/strapi-content-embeddings/embeddings/create-embedding` | Create a new embedding |
| `PUT` | `/strapi-content-embeddings/embeddings/update-embedding/:id` | Update an existing embedding |
| `DELETE` | `/strapi-content-embeddings/embeddings/delete-embedding/:id` | Delete an embedding |
| `GET` | `/strapi-content-embeddings/embeddings/find` | List all embeddings |
| `GET` | `/strapi-content-embeddings/embeddings/find/:id` | Get a single embedding |
| `GET` | `/strapi-content-embeddings/embeddings/embeddings-query?query=...` | RAG query |

## Database Sync (Neon to Strapi)

The plugin provides endpoints to sync embeddings from Neon DB (the source of truth) to Strapi. These endpoints are designed to be triggered manually or via cron jobs.

### Sync Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET/POST` | `/api/strapi-content-embeddings/sync` | Sync embeddings from Neon to Strapi |
| `GET` | `/api/strapi-content-embeddings/sync/status` | Check sync status without making changes |

### Query Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `dryRun` | boolean | `false` | Preview changes without applying them |
| `removeOrphans` | boolean | `false` | Remove Strapi entries that don't exist in Neon |
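When triggering a sync from a script rather than `curl`, the query parameters above can be assembled with `URL` and `searchParams`, then sent with the `fetch` built into Node.js 18+. The sketch below is a hypothetical helper (`buildSyncRequest` and its option names are illustrative); it only builds the request so you can inspect it first, and it uses `POST`, which is one of the two methods the table lists for this endpoint.

```typescript
interface SyncOptions {
  dryRun?: boolean; // preview only, no changes applied
  removeOrphans?: boolean; // delete Strapi entries missing from Neon
}

interface SyncRequest {
  url: string;
  init: { method: "GET" | "POST"; headers: Record<string, string> };
}

// Hypothetical helper: builds (but does not send) a request to the sync endpoint.
function buildSyncRequest(baseUrl: string, apiToken: string, options: SyncOptions = {}): SyncRequest {
  // A path starting with "/" resolves against the origin of baseUrl.
  const url = new URL("/api/strapi-content-embeddings/sync", baseUrl);
  // Only send flags that are enabled; the endpoint defaults both to false.
  if (options.dryRun) url.searchParams.set("dryRun", "true");
  if (options.removeOrphans) url.searchParams.set("removeOrphans", "true");
  return {
    url: url.toString(),
    init: { method: "POST", headers: { Authorization: `Bearer ${apiToken}` } },
  };
}

const preview = buildSyncRequest("http://localhost:1337", "YOUR_API_TOKEN", { dryRun: true });
console.log(preview.init.method, preview.url);
// POST http://localhost:1337/api/strapi-content-embeddings/sync?dryRun=true

// Sending it (Node.js 18+, inside an async context):
// const res = await fetch(preview.url, preview.init);
// console.log(await res.json());
```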
### Usage Examples

**Check sync status:**

```bash
curl "http://localhost:1337/api/strapi-content-embeddings/sync/status" \
  -H "Authorization: Bearer YOUR_API_TOKEN"
```

Response:

```json
{
  "neonCount": 150,
  "strapiCount": 145,
  "inSync": false,
  "missingInStrapi": 5,
  "missingInNeon": 0,
  "contentDifferences": 2
}
```

**Dry run (preview changes):**

```bash
curl "http://localhost:1337/api/strapi-content-embeddings/sync?dryRun=true" \
  -H "Authorization: Bearer YOUR_API_TOKEN"
```

**Run sync:**

```bash
curl "http://localhost:1337/api/strapi-content-embeddings/sync" \
  -H "Authorization: Bearer YOUR_API_TOKEN"
```

**Sync and remove orphans:**

```bash
curl "http://localhost:1337/api/strapi-content-embeddings/sync?removeOrphans=true" \
  -H "Authorization: Bearer YOUR_API_TOKEN"
```

### Sync Response

```json
{
  "success": true,
  "timestamp": "2024-01-07T12:00:00.000Z",
  "neonCount": 150,
  "strapiCount": 150,
  "actions": {
    "created": 5,
    "updated": 2,
    "orphansRemoved": 0
  },
  "details": {
    "created": ["doc1 (Title 1)", "doc2 (Title 2)"],
    "updated": ["doc3 (Title 3)"],
    "orphansRemoved": []
  },
  "errors": []
}
```
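The counts in the status and sync responses read naturally as a set comparison keyed by document ID. The following is a hypothetical sketch of that comparison, assuming each side can be reduced to `{ id, content }` records; `planSync` is an illustrative name, and the plugin's actual matching rules (for example, which fields count as a content difference) may differ.

```typescript
interface DocRecord {
  id: string;
  content: string;
}

interface SyncPlan {
  toCreate: string[]; // present in Neon, missing in Strapi
  toUpdate: string[]; // present in both, but the content differs
  orphans: string[]; // present in Strapi, missing in Neon
}

// Compares Neon (source of truth) against Strapi by document ID.
function planSync(neon: DocRecord[], strapi: DocRecord[]): SyncPlan {
  const strapiById = new Map(strapi.map((doc) => [doc.id, doc] as const));
  const neonIds = new Set(neon.map((doc) => doc.id));
  const plan: SyncPlan = { toCreate: [], toUpdate: [], orphans: [] };

  for (const doc of neon) {
    const existing = strapiById.get(doc.id);
    if (!existing) plan.toCreate.push(doc.id);
    else if (existing.content !== doc.content) plan.toUpdate.push(doc.id);
  }
  for (const doc of strapi) {
    if (!neonIds.has(doc.id)) plan.orphans.push(doc.id);
  }
  return plan;
}

const plan = planSync(
  [
    { id: "doc1", content: "A" },
    { id: "doc2", content: "B" },
    { id: "doc3", content: "C (edited)" },
  ],
  [
    { id: "doc3", content: "C" },
    { id: "doc4", content: "D" },
  ],
);
// plan.toCreate = ["doc1", "doc2"], plan.toUpdate = ["doc3"], plan.orphans = ["doc4"]
console.log(plan);
```

Under this reading, `missingInStrapi`, `contentDifferences`, and `missingInNeon` correspond to the lengths of `toCreate`, `toUpdate`, and `orphans`, and orphans would only be deleted when `removeOrphans=true` is passed.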
### Cron Job Example

```bash
# Sync every hour. Crontab does not support backslash line continuation,
# so the whole entry must stay on a single line.
0 * * * * curl -s "https://your-strapi.com/api/strapi-content-embeddings/sync" -H "Authorization: Bearer YOUR_API_TOKEN" >> /var/log/embeddings-sync.log
```

## Content Chunking

For large content that exceeds the recommended size for embeddings (~4000 characters / ~1000 tokens), the plugin supports automatic chunking.

### How Chunking Works

1. **Smart Splitting**: Content is split at natural boundaries (paragraphs, sentences, words) to preserve meaning
2. **Overlap**: Chunks include overlapping content (default: 200 chars) to maintain context between chunks
3. **Metadata**: Each chunk stores metadata linking it to the original content and other chunks
4. **Titles**: Chunk titles include part numbers (e.g., "My Document [Part 1/3]")
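The splitting strategy above (natural boundaries, overlap, numbered titles) can be sketched in a few dozen lines, using the documented defaults of 4000 characters and 200 characters of overlap. This is an illustrative implementation under assumptions, not the plugin's code: `chunkText`, `chunkTitle`, the boundary priority (paragraph, then sentence, then word), and the rule of ignoring boundaries in the first half of a window are choices made for this example.

```typescript
interface TextChunk {
  text: string;
  startOffset: number; // inclusive index into the original string
  endOffset: number; // exclusive index into the original string
}

// Boundaries tried in order of preference: paragraph, sentence, word.
const BOUNDARIES = ["\n\n", ". ", " "];

// Picks an end index <= hardEnd, preferring the latest natural boundary in the
// second half of the window so that chunks do not become very small.
function findBreak(text: string, start: number, hardEnd: number): number {
  const segment = text.slice(start, hardEnd);
  const minLength = Math.floor(segment.length / 2);
  for (const boundary of BOUNDARIES) {
    const index = segment.lastIndexOf(boundary);
    if (index >= minLength) return start + index + boundary.length;
  }
  return hardEnd; // no usable boundary: hard cut at chunkSize
}

// Splits text into chunks of at most chunkSize characters; each chunk starts
// chunkOverlap characters before the previous one ended.
function chunkText(text: string, chunkSize = 4000, chunkOverlap = 200): TextChunk[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("Require chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
  }
  const chunks: TextChunk[] = [];
  let start = 0;
  while (start < text.length) {
    const hardEnd = Math.min(start + chunkSize, text.length);
    const end = hardEnd === text.length ? hardEnd : findBreak(text, start, hardEnd);
    chunks.push({ text: text.slice(start, end), startOffset: start, endOffset: end });
    if (end === text.length) break;
    // Step back by the overlap, but always move forward to guarantee termination.
    start = Math.max(end - chunkOverlap, start + 1);
  }
  return chunks;
}

// Formats titles like "My Document [Part 1/3]".
function chunkTitle(title: string, index: number, total: number): string {
  return `${title} [Part ${index + 1}/${total}]`;
}

const longText = "Lorem ipsum dolor sit amet. ".repeat(500);
const chunks = chunkText(longText, 4000, 200);
chunks.forEach((chunk, i) => {
  console.log(chunkTitle("My Long Document", i, chunks.length), chunk.startOffset, chunk.endOffset);
});
```

Overlap here is measured in characters, so a chunk may begin mid-word; aligning each start to the next word boundary is a possible refinement at the cost of a slightly variable overlap.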
### Configuration

Add chunking options to your plugin config:

```typescript
// config/plugins.ts
export default ({ env }) => ({
  "strapi-content-embeddings": {
    enabled: true,
    config: {
      openAIApiKey: env("OPENAI_API_KEY"),
      neonConnectionString: env("NEON_CONNECTION_STRING"),
      // Chunking options
      chunkSize: 4000, // Max characters per chunk (default: 4000)
      chunkOverlap: 200, // Overlap between chunks (default: 200)
      autoChunk: false, // Auto-chunk large content globally (default: false)
    },
  },
});
```

### Using Chunking

#### Via MCP Tool

```json
{
  "tool": "create_embedding",
  "arguments": {
    "title": "My Long Document",
    "content": "... very long content ...",
    "autoChunk": true
  }
}
```

#### Programmatic Usage

```typescript
// Create with automatic chunking
const result = await strapi
  .plugin("strapi-content-embeddings")
  .service("embeddings")
  .createChunkedEmbedding({
    data: {
      title: "My Long Document",
      content: "... very long content ...",
    },
  });

console.log(result);
// {
//   entity: { ... first chunk ... },
//   chunks: [ ... all chunks ... ],
//   totalChunks: 5,
//   wasChunked: true
// }

// Or use createEmbedding with autoChunk flag
const embedding = await strapi
  .plugin("strapi-content-embeddings")
  .service("embeddings")
  .createEmbedding({
    data: {
      title: "My Document",
      content: "... long content ...",
      autoChunk: true, // Enable chunking
    },
  });
```

### Chunk Metadata

Each chunk embedding includes metadata:

```json
{
  "isChunk": true,
  "chunkIndex": 0,
  "totalChunks": 5,
  "startOffset": 0,
  "endOffset": 4200,
  "originalTitle": "My Long Document",
  "parentDocumentId": "abc123",
  "estimatedTokens": 1050
}
```
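Given chunk offsets, a metadata object shaped like the example above can be derived mechanically. The sketch assumes `estimatedTokens` follows the common rule of thumb of about four characters per token, which matches the example's 4200 characters and 1050 tokens; the plugin's real estimator and the `buildChunkMetadata` name are assumptions.

```typescript
interface ChunkMetadata {
  isChunk: true;
  chunkIndex: number;
  totalChunks: number;
  startOffset: number;
  endOffset: number;
  originalTitle: string;
  parentDocumentId: string;
  estimatedTokens: number;
}

// Rough token estimate: about 4 characters per token for English text (assumption).
function estimateTokens(charCount: number): number {
  return Math.ceil(charCount / 4);
}

// Hypothetical helper producing one metadata object per chunk.
function buildChunkMetadata(
  offsets: Array<{ startOffset: number; endOffset: number }>,
  originalTitle: string,
  parentDocumentId: string,
): ChunkMetadata[] {
  return offsets.map(
    ({ startOffset, endOffset }, chunkIndex): ChunkMetadata => ({
      isChunk: true,
      chunkIndex,
      totalChunks: offsets.length,
      startOffset,
      endOffset,
      originalTitle,
      parentDocumentId,
      estimatedTokens: estimateTokens(endOffset - startOffset),
    }),
  );
}

const metadata = buildChunkMetadata(
  [
    { startOffset: 0, endOffset: 4200 },
    { startOffset: 4000, endOffset: 8000 },
  ],
  "My Long Document",
  "abc123",
);
console.log(metadata[0].estimatedTokens); // 1050
```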
## Content Preprocessing

The plugin automatically preprocesses content before creating embeddings to improve semantic search quality. This is enabled by default.

### What Gets Cleaned

- **HTML tags**: Stripped while preserving text content
- **Markdown syntax**: Headings (`#`), bold (`**`), italic (`*`), links, lists, code blocks
- **Whitespace**: Normalized (multiple spaces/newlines collapsed)

### Why Preprocess?

Raw Markdown/HTML formatting adds noise to embeddings without adding semantic meaning:

```
Input:  "## Features\n- **Fast** search\n- <b>Reliable</b>"
Output: "Features: Fast search. Reliable"
```

Both versions carry the same meaning, but the cleaned text produces a better embedding for search.
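A simplified, regex-based version of this cleanup is sketched below. It is an approximation with stated limits: `preprocessForEmbedding` is a hypothetical name, it does not reproduce the exact output shown above (it inserts no `:` or `.`, so the example becomes `Features Fast search Reliable`), and regexes are not a real HTML or Markdown parser, so text such as `a < b > c` or `5 * 3` gets mangled. A dedicated parser is the safer choice for arbitrary input.

```typescript
// Hypothetical, simplified cleanup applied to text before it is embedded.
function preprocessForEmbedding(input: string): string {
  let text = input;
  // Remove fenced-code delimiter lines (three backticks plus optional language), keep the code.
  text = text.replace(/`{3}[^\n]*\n?/g, "");
  // Strip HTML tags, keeping their inner text.
  text = text.replace(/<[^>]+>/g, " ");
  // Links and images: keep the visible label, drop the URL.
  text = text.replace(/!?\[([^\]]*)\]\([^)]*\)/g, "$1");
  // Heading, blockquote and list markers at the start of a line.
  text = text.replace(/^[ \t]{0,3}(#{1,6}|>|[-*+]|\d+\.)[ \t]+/gm, "");
  // Bold, italic and inline-code markers.
  text = text.replace(/\*\*|__|\*|`/g, "");
  // Collapse all runs of whitespace, including newlines, into single spaces.
  return text.replace(/\s+/g, " ").trim();
}

console.log(preprocessForEmbedding("## Features\n- **Fast** search\n- <b>Reliable</b>"));
// Features Fast search Reliable
```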
### Configuration

Preprocessing is enabled by default. To disable it:

```typescript
// config/plugins.ts
export default ({ env }) => ({
  "strapi-content-embeddings": {
    enabled: true,
    config: {
      openAIApiKey: env("OPENAI_API_KEY"),
      neonConnectionString: env("NEON_CONNECTION_STRING"),
      preprocessContent: false, // Disable preprocessing
    },
  },
});
```

> **Note**: The original content is always preserved in Strapi. Preprocessing only affects the text sent to OpenAI for embedding generation.

## Admin Sync UI

The plugin includes a built-in sync interface accessible from the admin panel. Click the **Sync** button in the Content Embeddings page header.

### Available Operations

| Operation | Description |
|-----------|-------------|
| **Check Status** | Compares the Neon and Strapi databases and shows counts and differences |
| **Sync from Neon** | Imports embeddings from Neon to Strapi (with a preview option) |
| **Recreate All** | Deletes all Neon embeddings and recreates them from Strapi data |

### Sync Workflow

1. Click the **Sync** button to open the sync modal
2. View the current sync status (Neon vs. Strapi counts)
3. Select the **Sync from Neon** operation
4. Click **Preview Sync** to see what changes would be made
5. Review the results (Created/Updated/Removed counts)
6. Click **Apply Changes** to execute the sync

### Sync Options

- **Dry Run**: Preview changes without applying them (enabled by default)
- **Remove Orphans**: Delete Strapi entries that don't exist in Neon

## How It Works

1. **Embedding Creation**: When you create an embedding, the content is sent to OpenAI's embedding API to generate a vector representation (1536 or 3072 dimensions depending on the model).

2. **Storage**: The embedding vector is stored in Neon PostgreSQL using the pgvector extension, along with the content and metadata.

3. **Similarity Search**: When querying, the search query is converted to an embedding and compared against stored embeddings using cosine similarity via pgvector's HNSW index.

4. **RAG Response**: For chat queries, the most relevant documents are retrieved and passed to GPT-4o-mini as context to generate an accurate response.
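Steps 3 and 4 can be illustrated in memory: score each stored vector against the query vector with cosine similarity, keep the top k, and join their text into a context block for the chat model. In the plugin this ranking happens inside PostgreSQL, where pgvector exposes cosine distance (1 - cosine similarity) as the `<=>` operator and the HNSW index makes the search approximate. The brute-force loop below, the `StoredDoc` shape, and the prompt wording are illustrative assumptions, and the tiny 3-dimensional vectors stand in for real 1536-dimension embeddings.

```typescript
interface StoredDoc {
  id: string;
  content: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same dimensions");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact (brute-force) top-k search; an HNSW index approximates this ranking.
function topK(query: number[], docs: StoredDoc[], k: number): StoredDoc[] {
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ doc }) => doc);
}

// Joins retrieved documents into a numbered context block (hypothetical prompt wording).
function buildRagPrompt(question: string, sources: StoredDoc[]): string {
  const context = sources.map((doc, i) => `[${i + 1}] ${doc.content}`).join("\n\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

const docs: StoredDoc[] = [
  { id: "a", content: "Strapi is a headless CMS.", embedding: [1, 0, 0] },
  { id: "b", content: "pgvector adds vector search to Postgres.", embedding: [0, 1, 0] },
  { id: "c", content: "Strapi plugins extend the admin panel.", embedding: [0.9, 0.1, 0] },
];
const best = topK([1, 0, 0], docs, 2);
console.log(best.map((d) => d.id).join(", ")); // a, c
console.log(buildRagPrompt("What is Strapi?", best));
```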
## Database Schema

The plugin creates an `embeddings_documents` table in your Neon database:

```sql
CREATE TABLE embeddings_documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content TEXT,
  metadata JSONB,
  embedding vector(1536) -- or 3072 for text-embedding-3-large
);
```

Indexes:

- HNSW index on `embedding` for fast similarity search
- GIN index on `metadata` for filtering

> **Note**: pgvector's HNSW index supports `vector` columns of up to 2,000 dimensions. If you use `text-embedding-3-large` (3072 dimensions), the column cannot be HNSW-indexed as a plain `vector(3072)`; indexing it as `halfvec` (supported up to 4,000 dimensions) is one workaround.

## Permissions

The plugin registers the following RBAC permissions:

- `plugin::strapi-content-embeddings.read` - View embeddings
- `plugin::strapi-content-embeddings.create` - Create embeddings
- `plugin::strapi-content-embeddings.update` - Update embeddings
- `plugin::strapi-content-embeddings.delete` - Delete embeddings
- `plugin::strapi-content-embeddings.chat` - Use the RAG chat feature

Configure these in **Settings > Roles** for each admin role.

## Troubleshooting

### Embeddings not being created

1. Check that `OPENAI_API_KEY` is set correctly
2. Check that `NEON_CONNECTION_STRING` is valid
3. Look for errors in the Strapi console

### Chat returns "cannot find the answer"

1. Ensure embeddings exist in the database
2. Try creating more specific content
3. Check that the same embedding model is used for creation and querying

### Connection errors

1. Verify that your Neon connection string includes `?sslmode=require`
2. Check that your Neon project is active (not paused)
3. Ensure the pgvector extension is enabled
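The first check can be automated before Strapi boots. This hypothetical helper (`checkNeonConnectionString`) parses `NEON_CONNECTION_STRING` with the WHATWG `URL` class that Node.js provides globally and reports likely problems. It checks only the format, not whether the project is active or pgvector is enabled, and it expects exactly `sslmode=require` as this README recommends, so stricter modes such as `verify-full` would also be flagged.

```typescript
// Returns a parsed URL, or null when the string is not a valid URL.
function tryParseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch (e) {
    return null;
  }
}

// Hypothetical pre-flight check: returns a list of problems (empty when none found).
function checkNeonConnectionString(value: string | undefined): string[] {
  if (!value) return ["NEON_CONNECTION_STRING is not set"];
  const url = tryParseUrl(value);
  if (!url) return ["NEON_CONNECTION_STRING is not a valid URL"];

  const problems: string[] = [];
  if (url.protocol !== "postgresql:" && url.protocol !== "postgres:") {
    problems.push(`Unexpected protocol "${url.protocol}" (expected postgresql:)`);
  }
  if (!url.username || !url.password) {
    problems.push("Missing user or password");
  }
  if (url.searchParams.get("sslmode") !== "require") {
    problems.push("Missing ?sslmode=require");
  }
  return problems;
}

// Reports the missing sslmode parameter for this example string.
console.log(checkNeonConnectionString("postgresql://user:password@ep-xxx.region.aws.neon.tech/dbname"));
```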
### MCP not connecting

1. Verify that the MCP endpoint URL is correct
2. Check that the Authorization header contains a valid Strapi API token
3. Ensure the plugin is properly configured and Strapi is running

## License

MIT

# yt-embeddings-strapi-plugin