strapi-content-embeddings 0.1.4 → 0.1.6

package/README.md CHANGED
@@ -11,6 +11,8 @@ A Strapi v5 plugin that creates vector embeddings from your content using OpenAI
  - **Content Manager Integration**: Create embeddings directly from any content type's edit view
  - **Standalone Embeddings**: Create embeddings independent of content types
  - **Multiple Embedding Models**: Support for OpenAI's text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002
+ - **Database Sync**: Sync embeddings from Neon DB to Strapi with cron-compatible endpoints
+ - **Automatic Chunking**: Split large content into multiple embeddings with overlap for context preservation
 
  ## Requirements
 
@@ -206,6 +208,191 @@ All endpoints require admin authentication.
  | `GET` | `/strapi-content-embeddings/embeddings/find/:id` | Get a single embedding |
  | `GET` | `/strapi-content-embeddings/embeddings/embeddings-query?query=...` | RAG query |
 
+ ## Database Sync (Neon to Strapi)
+
+ The plugin provides endpoints to sync embeddings from Neon DB (source of truth) to Strapi. These endpoints are designed to be triggered manually or via cron jobs.
+
+ ### Sync Endpoints
+
+ | Method | Endpoint | Description |
+ |--------|----------|-------------|
+ | `GET/POST` | `/api/strapi-content-embeddings/sync` | Sync embeddings from Neon to Strapi |
+ | `GET` | `/api/strapi-content-embeddings/sync/status` | Check sync status without making changes |
+
+ ### Query Parameters
+
+ | Parameter | Type | Default | Description |
+ |-----------|------|---------|-------------|
+ | `dryRun` | boolean | `false` | Preview changes without applying them |
+ | `removeOrphans` | boolean | `false` | Remove Strapi entries that don't exist in Neon |
+
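The two query parameters in the table above can be combined by a small client-side helper. The following TypeScript sketch is illustrative only and does not come from the package: `buildSyncUrl` and `SyncOptions` are hypothetical names, and the only details taken from the README are the endpoint path and the `dryRun` / `removeOrphans` parameter names.

```typescript
// Hypothetical options object mirroring the documented query parameters.
interface SyncOptions {
  dryRun?: boolean;
  removeOrphans?: boolean;
}

// Hypothetical helper: builds the sync URL for a given Strapi base URL.
function buildSyncUrl(baseUrl: string, options: SyncOptions = {}): string {
  const url = new URL("/api/strapi-content-embeddings/sync", baseUrl);
  // Both parameters are documented as defaulting to false,
  // so they are only added to the query string when set to true.
  if (options.dryRun) url.searchParams.set("dryRun", "true");
  if (options.removeOrphans) url.searchParams.set("removeOrphans", "true");
  return url.toString();
}
```

The resulting string can be passed to any HTTP client together with the `Authorization: Bearer ...` header used in the README's curl examples.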
+ ### Usage Examples
+
+ **Check sync status:**
+ ```bash
+ curl "http://localhost:1337/api/strapi-content-embeddings/sync/status" \
+   -H "Authorization: Bearer YOUR_API_TOKEN"
+ ```
+
+ Response:
+ ```json
+ {
+   "neonCount": 150,
+   "strapiCount": 145,
+   "inSync": false,
+   "missingInStrapi": 5,
+   "missingInNeon": 0,
+   "contentDifferences": 2
+ }
+ ```
+
+ **Dry run (preview changes):**
+ ```bash
+ curl "http://localhost:1337/api/strapi-content-embeddings/sync?dryRun=true" \
+   -H "Authorization: Bearer YOUR_API_TOKEN"
+ ```
+
+ **Run sync:**
+ ```bash
+ curl "http://localhost:1337/api/strapi-content-embeddings/sync" \
+   -H "Authorization: Bearer YOUR_API_TOKEN"
+ ```
+
+ **Sync and remove orphans:**
+ ```bash
+ curl "http://localhost:1337/api/strapi-content-embeddings/sync?removeOrphans=true" \
+   -H "Authorization: Bearer YOUR_API_TOKEN"
+ ```
+
+ ### Sync Response
+
+ ```json
+ {
+   "success": true,
+   "timestamp": "2024-01-07T12:00:00.000Z",
+   "neonCount": 150,
+   "strapiCount": 150,
+   "actions": {
+     "created": 5,
+     "updated": 2,
+     "orphansRemoved": 0
+   },
+   "details": {
+     "created": ["doc1 (Title 1)", "doc2 (Title 2)"],
+     "updated": ["doc3 (Title 3)"],
+     "orphansRemoved": []
+   },
+   "errors": []
+ }
+ ```
+
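For callers that consume this response from a script, the sample above suggests a shape along the following lines. This is a sketch inferred from the one example response, not a type exported by the package: `SyncResult` and `summarizeSync` are hypothetical names, and the element type of `errors` is unknown because the sample array is empty.

```typescript
// Shape inferred from the sample sync response (assumption, not an official type).
interface SyncResult {
  success: boolean;
  timestamp: string;
  neonCount: number;
  strapiCount: number;
  actions: { created: number; updated: number; orphansRemoved: number };
  details: { created: string[]; updated: string[]; orphansRemoved: string[] };
  errors: unknown[]; // element type is not shown in the sample
}

// Hypothetical helper: condenses a sync result into one line for a log file.
function summarizeSync(result: SyncResult): string {
  const { created, updated, orphansRemoved } = result.actions;
  // Treat the run as failed if the flag is false or any error was reported.
  const status = result.success && result.errors.length === 0 ? "ok" : "failed";
  return (
    `${result.timestamp} sync ${status}: created=${created} ` +
    `updated=${updated} orphansRemoved=${orphansRemoved} errors=${result.errors.length}`
  );
}
```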
+ ### Cron Job Example
+
+ ```bash
+ # Sync every hour
+ 0 * * * * curl -s "https://your-strapi.com/api/strapi-content-embeddings/sync" \
+   -H "Authorization: Bearer YOUR_API_TOKEN" >> /var/log/embeddings-sync.log
+ ```
+
+ ## Content Chunking
+
+ For large content that exceeds the recommended size for embeddings (~4000 characters / ~1000 tokens), the plugin supports automatic chunking.
+
+ ### How Chunking Works
+
+ 1. **Smart Splitting**: Content is split at natural boundaries (paragraphs, sentences, words) to preserve meaning
+ 2. **Overlap**: Chunks include overlapping content (default: 200 chars) to maintain context between chunks
+ 3. **Metadata**: Each chunk stores metadata linking it to the original content and other chunks
+ 4. **Titles**: Chunk titles include part numbers (e.g., "My Document [Part 1/3]")
+
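The splitting and overlap steps listed above can be sketched roughly as follows. This is a minimal TypeScript illustration of boundary-aware chunking with overlap, not the plugin's actual implementation: `chunkText`, `findBreak`, `chunkTitle`, and `TextChunk` are hypothetical names, the boundary heuristics are a simplification, and the sketch additionally requires the overlap to be at most half the chunk size so that every step moves forward.

```typescript
interface TextChunk {
  text: string;
  startOffset: number; // inclusive character offset into the original text
  endOffset: number;   // exclusive character offset into the original text
}

// Pick a cut position in (start, hardEnd], preferring natural boundaries.
function findBreak(text: string, start: number, hardEnd: number): number {
  const candidate = text.slice(start, hardEnd);
  // Try a paragraph break first, then a sentence end, then any space.
  for (const sep of ["\n\n", ". ", " "]) {
    const idx = candidate.lastIndexOf(sep);
    // Only accept boundaries in the second half to avoid very small chunks.
    if (idx > candidate.length / 2) return start + idx + sep.length;
  }
  return hardEnd; // no usable boundary: cut at the size limit
}

function chunkText(text: string, chunkSize = 4000, chunkOverlap = 200): TextChunk[] {
  if (chunkOverlap > chunkSize / 2) {
    throw new Error("this sketch requires chunkOverlap <= chunkSize / 2");
  }
  const chunks: TextChunk[] = [];
  let start = 0;
  while (start < text.length) {
    const hardEnd = Math.min(start + chunkSize, text.length);
    // The final chunk simply runs to the end of the text.
    const end = hardEnd === text.length ? hardEnd : findBreak(text, start, hardEnd);
    chunks.push({ text: text.slice(start, end), startOffset: start, endOffset: end });
    if (end === text.length) break;
    // Step back by the overlap so the next chunk repeats this chunk's tail.
    start = end - chunkOverlap;
  }
  return chunks;
}

// Title format taken from the README example: "My Document [Part 1/3]".
function chunkTitle(title: string, index: number, total: number): string {
  return `${title} [Part ${index + 1}/${total}]`;
}
```

With the default settings, a 10,000-character text without any whitespace is cut purely by size, while text containing paragraph or sentence breaks is cut at the last such break in the second half of each window.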
+ ### Configuration
+
+ Add chunking options to your plugin config:
+
+ ```typescript
+ // config/plugins.ts
+ export default ({ env }) => ({
+   "strapi-content-embeddings": {
+     enabled: true,
+     config: {
+       openAIApiKey: env("OPENAI_API_KEY"),
+       neonConnectionString: env("NEON_CONNECTION_STRING"),
+       // Chunking options
+       chunkSize: 4000, // Max characters per chunk (default: 4000)
+       chunkOverlap: 200, // Overlap between chunks (default: 200)
+       autoChunk: false, // Auto-chunk large content globally (default: false)
+     },
+   },
+ });
+ ```
+
+ ### Using Chunking
+
+ #### Via MCP Tool
+
+ ```json
+ {
+   "tool": "create_embedding",
+   "arguments": {
+     "title": "My Long Document",
+     "content": "... very long content ...",
+     "autoChunk": true
+   }
+ }
+ ```
+
+ #### Programmatic Usage
+
+ ```typescript
+ // Create with automatic chunking
+ const result = await strapi
+   .plugin("strapi-content-embeddings")
+   .service("embeddings")
+   .createChunkedEmbedding({
+     data: {
+       title: "My Long Document",
+       content: "... very long content ...",
+     },
+   });
+
+ console.log(result);
+ // {
+ //   entity: { ... first chunk ... },
+ //   chunks: [ ... all chunks ... ],
+ //   totalChunks: 5,
+ //   wasChunked: true
+ // }
+
+ // Or use createEmbedding with autoChunk flag
+ const embedding = await strapi
+   .plugin("strapi-content-embeddings")
+   .service("embeddings")
+   .createEmbedding({
+     data: {
+       title: "My Document",
+       content: "... long content ...",
+       autoChunk: true, // Enable chunking
+     },
+   });
+ ```
+
+ ### Chunk Metadata
+
+ Each chunk embedding includes metadata:
+
+ ```json
+ {
+   "isChunk": true,
+   "chunkIndex": 0,
+   "totalChunks": 5,
+   "startOffset": 0,
+   "endOffset": 4200,
+   "originalTitle": "My Long Document",
+   "parentDocumentId": "abc123",
+   "estimatedTokens": 1050
+ }
+ ```
+
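One possible use of this metadata is stitching overlapping chunks back into the original text. The TypeScript sketch below is an illustration under stated assumptions rather than anything the package provides: it assumes `startOffset` and `endOffset` are character offsets into the original content, that each chunk's stored content is the raw slice between them, and that a chunk record looks like `{ content, metadata }`. `ChunkRecord` and `reassemble` are hypothetical names; the metadata field names are copied from the sample.

```typescript
// Field names copied from the sample metadata above.
interface ChunkMetadata {
  isChunk: boolean;
  chunkIndex: number;
  totalChunks: number;
  startOffset: number;
  endOffset: number;
  originalTitle: string;
  parentDocumentId: string;
  estimatedTokens: number;
}

// Hypothetical record shape: chunk text plus its metadata.
interface ChunkRecord {
  content: string;
  metadata: ChunkMetadata;
}

// Rebuild the original text from the chunks of a single parent document.
function reassemble(chunks: ChunkRecord[]): string {
  // Sort a copy by chunkIndex so the caller's array is left untouched.
  const ordered = [...chunks].sort(
    (a, b) => a.metadata.chunkIndex - b.metadata.chunkIndex
  );
  let text = "";
  for (const { content, metadata } of ordered) {
    // Drop the leading part of this chunk that earlier chunks already covered.
    const alreadyCovered = Math.max(0, text.length - metadata.startOffset);
    text += content.slice(alreadyCovered);
  }
  return text;
}
```

Chunks from different documents would need to be grouped by `parentDocumentId` before calling a helper like this.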
  ## How It Works
 
  1. **Embedding Creation**: When you create an embedding, the content is sent to OpenAI's embedding API to generate a vector representation (1536 or 3072 dimensions depending on the model).