strapi-content-embeddings 0.1.4 → 0.1.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +187 -0
- package/dist/_chunks/{App-4UwemHRe.mjs → App-C5NFY1UT.mjs} +287 -103
- package/dist/_chunks/{App-CnXhqiao.js → App-CA5bQnKQ.js} +286 -102
- package/dist/_chunks/{index-BWSiu_nE.mjs → index-CIpGvEcJ.mjs} +122 -104
- package/dist/_chunks/{index-BaPVw3mi.js → index-CVCA8dDp.js} +119 -101
- package/dist/admin/index.js +1 -1
- package/dist/admin/index.mjs +1 -1
- package/dist/admin/src/components/custom/EmbeddingsTable.d.ts +1 -1
- package/dist/admin/src/components/custom/MarkdownEditor.d.ts +1 -1
- package/dist/server/index.js +1137 -84
- package/dist/server/index.mjs +1137 -84
- package/dist/server/src/config/index.d.ts +9 -0
- package/dist/server/src/controllers/controller.d.ts +32 -0
- package/dist/server/src/controllers/index.d.ts +5 -0
- package/dist/server/src/index.d.ts +42 -2
- package/dist/server/src/mcp/tools/create-embedding.d.ts +6 -0
- package/dist/server/src/mcp/tools/index.d.ts +4 -0
- package/dist/server/src/plugin-manager.d.ts +32 -0
- package/dist/server/src/routes/content-api.d.ts +10 -0
- package/dist/server/src/routes/index.d.ts +10 -0
- package/dist/server/src/services/embeddings.d.ts +43 -2
- package/dist/server/src/services/index.d.ts +24 -2
- package/dist/server/src/services/sync.d.ts +71 -0
- package/dist/server/src/utils/chunking.d.ts +44 -0
- package/package.json +1 -1
package/README.md
CHANGED
@@ -11,6 +11,8 @@ A Strapi v5 plugin that creates vector embeddings from your content using OpenAI

- **Content Manager Integration**: Create embeddings directly from any content type's edit view
- **Standalone Embeddings**: Create embeddings independent of content types
- **Multiple Embedding Models**: Support for OpenAI's text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002
- **Database Sync**: Sync embeddings from Neon DB to Strapi with cron-compatible endpoints
- **Automatic Chunking**: Split large content into multiple embeddings with overlap for context preservation

## Requirements
@@ -206,6 +208,191 @@ All endpoints require admin authentication.

| `GET` | `/strapi-content-embeddings/embeddings/find/:id` | Get a single embedding |
| `GET` | `/strapi-content-embeddings/embeddings/embeddings-query?query=...` | RAG query |

## Database Sync (Neon to Strapi)

The plugin provides endpoints that sync embeddings from Neon DB (the source of truth) to Strapi. They are designed to be triggered manually or by a cron job. Unlike the admin endpoints above, these routes are served under `/api/` and are called with a Strapi API token, as in the examples below.

### Sync Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET/POST` | `/api/strapi-content-embeddings/sync` | Sync embeddings from Neon to Strapi |
| `GET` | `/api/strapi-content-embeddings/sync/status` | Check sync status without making changes |

### Query Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `dryRun` | boolean | `false` | Preview changes without applying them |
| `removeOrphans` | boolean | `false` | Remove Strapi entries that don't exist in Neon |
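If you trigger the sync from your own TypeScript code instead of `curl`, you can build the request from the paths and query parameters in the tables above. The following is a minimal sketch and is not part of the plugin. `buildSyncRequest`, `SyncOptions`, and `SyncRequest` are hypothetical names. The sketch also assumes Strapi is served at the root of `baseUrl`, because any path prefix in `baseUrl` is replaced by the absolute route path.

```typescript
// Hypothetical helper (not a plugin API): builds URL + headers for the sync routes.
interface SyncOptions {
  dryRun?: boolean; // preview changes without applying them
  removeOrphans?: boolean; // delete Strapi entries that no longer exist in Neon
}

interface SyncRequest {
  url: string;
  headers: Record<string, string>;
}

const SYNC_PATH = "/api/strapi-content-embeddings/sync";

function buildSyncRequest(
  baseUrl: string,
  apiToken: string,
  options: SyncOptions = {},
  statusOnly = false,
): SyncRequest {
  // Assumption: Strapi is mounted at the root of baseUrl.
  const url = new URL(statusOnly ? `${SYNC_PATH}/status` : SYNC_PATH, baseUrl);
  if (!statusOnly) {
    // Only send flags that are enabled; both default to false.
    if (options.dryRun) url.searchParams.set("dryRun", "true");
    if (options.removeOrphans) url.searchParams.set("removeOrphans", "true");
  }
  return {
    url: url.toString(),
    headers: { Authorization: `Bearer ${apiToken}` },
  };
}
```

Pass the result to any HTTP client. For example, in Node 18+, `fetch(req.url, { method: "POST", headers: req.headers })` triggers a sync, and the status route takes a plain `GET`.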

### Usage Examples

**Check sync status:**

```bash
curl "http://localhost:1337/api/strapi-content-embeddings/sync/status" \
  -H "Authorization: Bearer YOUR_API_TOKEN"
```

Response:

```json
{
  "neonCount": 150,
  "strapiCount": 145,
  "inSync": false,
  "missingInStrapi": 5,
  "missingInNeon": 0,
  "contentDifferences": 2
}
```
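A scheduler can read the status response before deciding whether to run a sync. The sketch below types the example response above and adds three hypothetical helpers: `needsSync`, `hasOrphans`, and `countsAreConsistent`. The field meanings are inferred from their names. The consistency check also assumes that both counts are taken over unique documents.

```typescript
// Shape of the /sync/status response, transcribed from the example above.
interface SyncStatus {
  neonCount: number;
  strapiCount: number;
  inSync: boolean;
  missingInStrapi: number;
  missingInNeon: number;
  contentDifferences: number;
}

// Hypothetical: run a sync only when the status reports any drift.
function needsSync(s: SyncStatus): boolean {
  return (
    !s.inSync ||
    s.missingInStrapi > 0 ||
    s.missingInNeon > 0 ||
    s.contentDifferences > 0
  );
}

// Entries present in Strapi but absent from Neon are only deleted when the
// sync is called with removeOrphans=true, so report them separately.
function hasOrphans(s: SyncStatus): boolean {
  return s.missingInNeon > 0;
}

// Assumption: both counts are over unique documents, so the count gap must
// equal the difference between the two "missing" figures.
function countsAreConsistent(s: SyncStatus): boolean {
  return s.neonCount - s.strapiCount === s.missingInStrapi - s.missingInNeon;
}
```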

**Dry run (preview changes):**

```bash
curl "http://localhost:1337/api/strapi-content-embeddings/sync?dryRun=true" \
  -H "Authorization: Bearer YOUR_API_TOKEN"
```

**Run sync:**

```bash
curl "http://localhost:1337/api/strapi-content-embeddings/sync" \
  -H "Authorization: Bearer YOUR_API_TOKEN"
```

**Sync and remove orphans:**

```bash
curl "http://localhost:1337/api/strapi-content-embeddings/sync?removeOrphans=true" \
  -H "Authorization: Bearer YOUR_API_TOKEN"
```

### Sync Response

```json
{
  "success": true,
  "timestamp": "2024-01-07T12:00:00.000Z",
  "neonCount": 150,
  "strapiCount": 150,
  "actions": {
    "created": 5,
    "updated": 2,
    "orphansRemoved": 0
  },
  "details": {
    "created": ["doc1 (Title 1)", "doc2 (Title 2)"],
    "updated": ["doc3 (Title 3)"],
    "orphansRemoved": []
  },
  "errors": []
}
```
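A cron wrapper is easier to monitor if it reduces the sync response to one log line and a pass/fail flag. The following is a minimal sketch that assumes the response shape shown above. `SyncResult` and `summarizeSync` are hypothetical names. `errors` is typed as `unknown[]` because the example does not show what its elements look like.

```typescript
// Shape of the /sync response, transcribed from the example above.
interface SyncResult {
  success: boolean;
  timestamp: string;
  neonCount: number;
  strapiCount: number;
  actions: { created: number; updated: number; orphansRemoved: number };
  details: { created: string[]; updated: string[]; orphansRemoved: string[] };
  errors: unknown[];
}

// Hypothetical: one log line plus a flag a wrapper could turn into an exit code.
function summarizeSync(result: SyncResult): { ok: boolean; line: string } {
  const { created, updated, orphansRemoved } = result.actions;
  // Treat any reported error as a failure, even if success is true.
  const ok = result.success && result.errors.length === 0;
  const line =
    `${result.timestamp} ${ok ? "OK" : "FAILED"} ` +
    `created=${created} updated=${updated} orphansRemoved=${orphansRemoved} ` +
    `errors=${result.errors.length}`;
  return { ok, line };
}
```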

### Cron Job Example

```bash
# Sync every hour (crontab has no line continuation, so the entry must stay on one line)
0 * * * * curl -s "https://your-strapi.com/api/strapi-content-embeddings/sync" -H "Authorization: Bearer YOUR_API_TOKEN" >> /var/log/embeddings-sync.log
```

## Content Chunking

For large content that exceeds the recommended size for embeddings (~4000 characters / ~1000 tokens), the plugin supports automatic chunking.

### How Chunking Works

1. **Smart Splitting**: Content is split at natural boundaries (paragraphs, sentences, words) to preserve meaning
2. **Overlap**: Chunks include overlapping content (default: 200 chars) to maintain context between chunks
3. **Metadata**: Each chunk stores metadata linking it to the original content and other chunks
4. **Titles**: Chunk titles include part numbers (e.g., "My Document [Part 1/3]")
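The splitting and overlap steps above can be illustrated with a short boundary-aware chunker. This is a sketch under stated assumptions, not the plugin's code. `chunkText`, `findCut`, and `TextChunk` are hypothetical names, and offsets are character indices. Two rules are choices made for this sketch: the boundary order (paragraph break, then sentence end, then space), and only cutting in the second half of a window.

```typescript
// Sketch only: boundary-aware chunking with overlap. Not the plugin's implementation.
interface TextChunk {
  text: string;
  startOffset: number; // inclusive character index into the original string
  endOffset: number; // exclusive character index into the original string
}

// Pick a cut inside text[start, limit). Prefer the last paragraph break, then
// the last sentence end, then the last space, but only in the second half of
// the window (to avoid tiny chunks). Otherwise cut hard at `limit`.
function findCut(text: string, start: number, limit: number): number {
  const segment = text.slice(start, limit);
  const minCut = Math.floor(segment.length / 2);
  for (const sep of ["\n\n", ". ", " "]) {
    const idx = segment.lastIndexOf(sep);
    if (idx >= minCut) return start + idx + sep.length;
  }
  return limit;
}

function chunkText(text: string, chunkSize = 4000, overlap = 200): TextChunk[] {
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be >= 0 and smaller than chunkSize");
  }
  if (text.length <= chunkSize) {
    return [{ text, startOffset: 0, endOffset: text.length }];
  }
  const chunks: TextChunk[] = [];
  let start = 0;
  while (start < text.length) {
    const limit = Math.min(start + chunkSize, text.length);
    // The final window needs no boundary search: it runs to the end of the text.
    const end = limit === text.length ? limit : findCut(text, start, limit);
    chunks.push({ text: text.slice(start, end), startOffset: start, endOffset: end });
    if (end === text.length) break;
    // Start the next chunk `overlap` characters early, but always make progress.
    start = Math.max(end - overlap, start + 1);
  }
  return chunks;
}
```

Each new chunk starts `overlap` characters before the previous cut, so it can begin mid-word. A production splitter might also snap that start position to a word boundary.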

### Configuration

Add chunking options to your plugin config:

```typescript
// config/plugins.ts
export default ({ env }) => ({
  "strapi-content-embeddings": {
    enabled: true,
    config: {
      openAIApiKey: env("OPENAI_API_KEY"),
      neonConnectionString: env("NEON_CONNECTION_STRING"),
      // Chunking options
      chunkSize: 4000, // Max characters per chunk (default: 4000)
      chunkOverlap: 200, // Overlap between chunks (default: 200)
      autoChunk: false, // Auto-chunk large content globally (default: false)
    },
  },
});
```
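The documented defaults imply a merge-and-validate step when the plugin config is read. The sketch below is hypothetical, and `resolveChunkingConfig` and `shouldChunk` are not plugin APIs. It makes two assumptions: a per-call `autoChunk` flag, as passed to `createEmbedding`, overrides the global setting, and content is only chunked when it is longer than `chunkSize`.

```typescript
// Sketch only: merges user config with the documented defaults and validates it.
interface ChunkingConfig {
  chunkSize: number; // max characters per chunk
  chunkOverlap: number; // characters shared by consecutive chunks
  autoChunk: boolean; // chunk large content globally
}

const DEFAULT_CHUNKING: ChunkingConfig = { chunkSize: 4000, chunkOverlap: 200, autoChunk: false };

function isWholeNumber(n: unknown): boolean {
  return typeof n === "number" && n % 1 === 0; // NaN and Infinity fail this check
}

function resolveChunkingConfig(user: Partial<ChunkingConfig> = {}): ChunkingConfig {
  const cfg: ChunkingConfig = {
    chunkSize: user.chunkSize ?? DEFAULT_CHUNKING.chunkSize,
    chunkOverlap: user.chunkOverlap ?? DEFAULT_CHUNKING.chunkOverlap,
    autoChunk: user.autoChunk ?? DEFAULT_CHUNKING.autoChunk,
  };
  if (!isWholeNumber(cfg.chunkSize) || cfg.chunkSize <= 0) {
    throw new Error(`chunkSize must be a positive integer, got ${cfg.chunkSize}`);
  }
  if (!isWholeNumber(cfg.chunkOverlap) || cfg.chunkOverlap < 0) {
    throw new Error(`chunkOverlap must be a non-negative integer, got ${cfg.chunkOverlap}`);
  }
  // An overlap as large as the chunk would stop a splitter from advancing.
  if (cfg.chunkOverlap >= cfg.chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  return cfg;
}

// Assumption: the per-call flag wins over the global setting.
function shouldChunk(content: string, cfg: ChunkingConfig, perCallAutoChunk?: boolean): boolean {
  const enabled = perCallAutoChunk ?? cfg.autoChunk;
  return enabled && content.length > cfg.chunkSize;
}
```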

### Using Chunking

#### Via MCP Tool

```json
{
  "tool": "create_embedding",
  "arguments": {
    "title": "My Long Document",
    "content": "... very long content ...",
    "autoChunk": true
  }
}
```

#### Programmatic Usage

```typescript
// Create with automatic chunking
const result = await strapi
  .plugin("strapi-content-embeddings")
  .service("embeddings")
  .createChunkedEmbedding({
    data: {
      title: "My Long Document",
      content: "... very long content ...",
    },
  });

console.log(result);
// {
//   entity: { ... first chunk ... },
//   chunks: [ ... all chunks ... ],
//   totalChunks: 5,
//   wasChunked: true
// }

// Or use createEmbedding with autoChunk flag
const embedding = await strapi
  .plugin("strapi-content-embeddings")
  .service("embeddings")
  .createEmbedding({
    data: {
      title: "My Document",
      content: "... long content ...",
      autoChunk: true, // Enable chunking
    },
  });
```

### Chunk Metadata

Each chunk embedding includes metadata:

```json
{
  "isChunk": true,
  "chunkIndex": 0,
  "totalChunks": 5,
  "startOffset": 0,
  "endOffset": 4200,
  "originalTitle": "My Long Document",
  "parentDocumentId": "abc123",
  "estimatedTokens": 1050
}
```
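You can produce the metadata fields and the `[Part i/n]` title format from a list of chunk offsets. The following is a minimal sketch with hypothetical names: `buildChunkRecords`, `estimateTokens`, `ChunkSpan`, and `ChunkRecord`. It assumes `estimatedTokens` follows a rule of thumb of about 4 characters per token, which matches the example above (4200 characters, 1050 tokens). The plugin's actual estimate may differ.

```typescript
// Sketch only: builds per-chunk titles and metadata with the field names shown above.
interface ChunkSpan {
  startOffset: number;
  endOffset: number;
}

interface ChunkMetadata {
  isChunk: boolean;
  chunkIndex: number; // 0-based position of this chunk
  totalChunks: number;
  startOffset: number;
  endOffset: number;
  originalTitle: string;
  parentDocumentId: string;
  estimatedTokens: number;
}

interface ChunkRecord {
  title: string;
  metadata: ChunkMetadata;
}

// Assumption: ~4 characters per token, a common rule of thumb for English text.
function estimateTokens(charCount: number): number {
  return Math.ceil(charCount / 4);
}

function buildChunkRecords(
  spans: ChunkSpan[],
  originalTitle: string,
  parentDocumentId: string,
): ChunkRecord[] {
  const totalChunks = spans.length;
  return spans.map((span, chunkIndex) => {
    const metadata: ChunkMetadata = {
      isChunk: true,
      chunkIndex,
      totalChunks,
      startOffset: span.startOffset,
      endOffset: span.endOffset,
      originalTitle,
      parentDocumentId,
      estimatedTokens: estimateTokens(span.endOffset - span.startOffset),
    };
    // Part numbers in titles are 1-based, while chunkIndex is 0-based.
    return { title: `${originalTitle} [Part ${chunkIndex + 1}/${totalChunks}]`, metadata };
  });
}
```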

## How It Works

1. **Embedding Creation**: When you create an embedding, the content is sent to OpenAI's embedding API to generate a vector representation (1536 or 3072 dimensions depending on the model).