@kanopi/wp-ai-indexer 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.circleci/README.md +41 -0
- package/.circleci/config.yml +43 -0
- package/.circleci/install-dependencies.sh +319 -0
- package/.circleci/security-audit-npm.sh +264 -0
- package/.circleci/setup-node.sh +223 -0
- package/.circleci/test-package.sh +247 -0
- package/.eslintrc.js +52 -0
- package/.prettierrc +10 -0
- package/LICENSE +21 -0
- package/README.md +479 -0
- package/bin/wp-ai-indexer.js +25 -0
- package/package.json +71 -0
- package/tests/helpers/test-config.ts +36 -0
- package/tests/mocks/fixtures/settings.json +13 -0
- package/tests/mocks/fixtures/wordpress-posts.json +42 -0
- package/tests/mocks/openai.mock.ts +52 -0
- package/tests/mocks/pinecone.mock.ts +76 -0
- package/tests/setup.ts +20 -0
- package/vitest.config.ts +29 -0
package/README.md
ADDED
@@ -0,0 +1,479 @@
# @kanopi/wp-ai-indexer

Shared Node-based indexer for WordPress AI plugins (Chatbot & Search). This package provides a single, reusable indexer that creates embeddings from WordPress content and stores them in Pinecone for use by multiple plugins.

## Features

- 📚 **WordPress Integration**: Fetches content via WordPress REST API
- 🔐 **Secure Authentication**: Supports WordPress Application Passwords
- ✂️ **Deterministic Chunking**: Consistent content chunking with configurable size and overlap
- 🤖 **OpenAI Embeddings**: Creates embeddings using OpenAI models
- 📊 **Pinecone Storage**: Upserts vectors to Pinecone with comprehensive metadata
- 🌐 **Domain Filtering**: Multi-environment support with automatic domain isolation
- 🔄 **Retry Logic**: Automatic retries with exponential backoff
- ⚙️ **Schema Versioning**: Enforces index schema compatibility
- 🎯 **Configurable**: Highly configurable via WordPress settings endpoint
- 🚀 **CLI Interface**: Easy-to-use command-line interface

## Installation

```bash
npm install @kanopi/wp-ai-indexer
```

Or install globally:

```bash
npm install -g @kanopi/wp-ai-indexer
```

## Requirements

- Node.js >= 18.0.0
- WordPress site with REST API enabled
- WordPress Application Password (recommended for secure authentication)
- OpenAI API key
- Pinecone API key
- WP AI Assistant plugin (provides `/wp-json/ai-assistant/v1/indexer-settings` endpoint)
## Configuration

### Environment Variables

Create a `.env` file in your project root:

```bash
# Required
WP_API_BASE=https://your-wordpress-site.com
OPENAI_API_KEY=sk-...
PINECONE_API_KEY=...

# Authentication (recommended if REST API is restricted)
WP_API_USERNAME=your-admin-username
WP_API_PASSWORD=xxxx xxxx xxxx xxxx xxxx xxxx  # Application Password

# Optional
WP_AI_SETTINGS_URL=https://your-site.com/wp-json/ai-assistant/v1/indexer-settings
WP_AI_DEBUG=1
WP_AI_TIMEOUT_MS=30000
WP_AI_CONCURRENCY=2
WP_AI_NAMESPACE=
```
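As a rough illustration of how these variables might map onto the `IndexerConfig` fields used in the Programmatic Usage example, here is a minimal sketch. `loadConfigFromEnv`, `EnvConfig`, and the `concurrency` field are hypothetical names rather than exports of this package, and the fallback values simply mirror the sample `.env` values.

```typescript
// Hypothetical shape: field names follow the README's IndexerConfig example where
// one exists; `concurrency` is an assumed name.
interface EnvConfig {
  wpApiBase: string;
  openaiApiKey: string;
  pineconeApiKey: string;
  wpUsername?: string;
  wpPassword?: string;
  debug: boolean;
  timeout: number;
  concurrency: number;
}

type Env = Record<string, string | undefined>;

// Read a required variable or fail with a message that names it.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value.trim();
}

// Parse a positive integer, falling back to a default when unset or invalid.
function intVar(env: Env, name: string, fallback: number): number {
  const parsed = parseInt(env[name] ?? '', 10);
  return isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

function loadConfigFromEnv(env: Env): EnvConfig {
  return {
    wpApiBase: requireVar(env, 'WP_API_BASE').replace(/\/+$/, ''), // drop trailing slashes
    openaiApiKey: requireVar(env, 'OPENAI_API_KEY'),
    pineconeApiKey: requireVar(env, 'PINECONE_API_KEY'),
    wpUsername: env.WP_API_USERNAME,
    wpPassword: env.WP_API_PASSWORD,
    debug: env.WP_AI_DEBUG === '1',
    timeout: intVar(env, 'WP_AI_TIMEOUT_MS', 30000),
    concurrency: intVar(env, 'WP_AI_CONCURRENCY', 2),
  };
}
```

In a Node script this would typically be called as `loadConfigFromEnv(process.env)` once `dotenv` has loaded the `.env` file.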
### Authentication

The indexer supports WordPress Application Passwords for secure authentication:

1. **Create an Application Password** (WordPress 5.6+):
   ```bash
   wp user application-password create USERNAME "AI Indexer"
   ```
   This returns a password like: `xxxx xxxx xxxx xxxx xxxx xxxx`

2. **Use the Application Password**:
   - Set `WP_API_USERNAME` to your WordPress username
   - Set `WP_API_PASSWORD` to the Application Password (not your regular password)

3. **Security Benefits**:
   - ✅ Revocable without changing main password
   - ✅ Auditable (tracks last used date and IP)
   - ✅ Scoped per application
   - ✅ Compatible with security plugins (Solid Security, etc.)

**Note:** Application Passwords are required if your WordPress site uses security plugins that restrict REST API access.
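WordPress accepts Application Passwords over HTTP Basic authentication, so a client only needs to attach an `Authorization` header to each REST request. The sketch below shows one way to build that header; `buildAuthHeaders` is a hypothetical helper, not necessarily how this package does it, and the credentials should only ever be sent over HTTPS.

```typescript
// Hypothetical helper: returns the headers to attach to each WordPress REST request.
function buildAuthHeaders(username?: string, appPassword?: string): Record<string, string> {
  // Without both values, fall back to anonymous requests.
  if (!username || !appPassword) {
    return {};
  }
  // Basic auth is base64("username:password"). `btoa` is a global in Node.js 16+
  // and only accepts Latin-1 input, which is enough for this sketch.
  const token = btoa(`${username}:${appPassword}`);
  return { Authorization: `Basic ${token}` };
}
```

Returning an empty object when credentials are absent lets the same request code serve both public and restricted sites.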
### WordPress Settings Endpoint

The indexer fetches configuration from WordPress via the shared settings endpoint:

**Endpoint:** `GET /wp-json/ai-assistant/v1/indexer-settings`

**Required Response Fields:**

```json
{
  "schema_version": 1,
  "domain": "your-wordpress-site.com",
  "post_types": ["post", "page"],
  "post_types_exclude": ["attachment", "revision"],
  "auto_discover": false,
  "clean_deleted": false,
  "embedding_model": "text-embedding-3-small",
  "embedding_dimension": 1536,
  "chunk_size": 500,
  "chunk_overlap": 50,
  "pinecone_index_host": "your-index-host.pinecone.io",
  "pinecone_index_name": "your-index-name"
}
```

**Note:** The `domain` field is automatically extracted from `WP_API_BASE` and is used for multi-environment filtering.
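To make the `domain` note concrete, the sketch below derives a domain from a `WP_API_BASE` value using the standard `URL` parser. `extractDomain` is a hypothetical name, and whether the package keeps or strips details such as a `www.` prefix is not stated here, so treat the exact normalisation as an assumption.

```typescript
// Hypothetical helper: derive the `domain` metadata value from WP_API_BASE.
function extractDomain(wpApiBase: string): string {
  // `URL` throws a TypeError on malformed input, which surfaces config mistakes early.
  const { hostname } = new URL(wpApiBase);
  // `hostname` excludes the scheme, port, and path.
  return hostname.toLowerCase();
}
```

With this approach `https://yoursite.ddev.site:8443/wp-json` and `https://yoursite.ddev.site` map to the same domain value.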
## Usage

### WP-CLI Commands (WordPress Plugin Integration)

If you have the WP AI Assistant WordPress plugin installed, you can use WordPress-native commands:

```bash
# Index all content
wp ai-indexer index

# Clean deleted posts
wp ai-indexer clean

# Delete all vectors for domain
wp ai-indexer delete-all

# Show configuration
wp ai-indexer config

# Check system requirements
wp ai-indexer check
```

These commands automatically detect whether you have a local or global installation of the indexer package and use whichever is available.

### Direct CLI Usage

You can also run the indexer directly using npx or the global installation:

| Command | Description | Standard | DDEV |
|---------|-------------|----------|------|
| **Index all content** | Process and index all WordPress content | `npx wp-ai-indexer index` | `ddev exec "cd packages/wp-ai-indexer && npx wp-ai-indexer index"` |
| **Index with debug** | Index with verbose debug output | `npx wp-ai-indexer index --debug` | `ddev exec "cd packages/wp-ai-indexer && npx wp-ai-indexer index --debug"` |
| **Clean deleted** | Remove vectors for deleted posts (see note below) | `npx wp-ai-indexer clean` | `ddev exec "cd packages/wp-ai-indexer && npx wp-ai-indexer clean"` |
| **Delete all** | Delete all vectors for current domain (requires confirmation) | `npx wp-ai-indexer delete-all` | `ddev exec "cd packages/wp-ai-indexer && npx wp-ai-indexer delete-all"` |
| **Delete all (skip confirmation)** | Delete all vectors without confirmation prompt | `npx wp-ai-indexer delete-all --yes` | `ddev exec "cd packages/wp-ai-indexer && npx wp-ai-indexer delete-all --yes"` |
| **Show config** | Display current configuration and verify credentials | `npx wp-ai-indexer config` | `ddev exec "cd packages/wp-ai-indexer && npx wp-ai-indexer config"` |

**Note:** When running in DDEV, environment variables are automatically loaded from `.ddev/config.yaml`. For standard usage, set environment variables in `.env` or export them in your shell.

#### Clean vs Delete All

- **`clean`**: Attempts to identify and remove vectors for deleted posts. Due to Pinecone API limitations, this command currently provides guidance rather than automatic cleaning. The recommended approach is to use `delete-all` followed by `index`.

- **`delete-all`**: Removes ALL vectors for the current domain from the Pinecone index. Use this when you need to completely re-index your content. This command requires confirmation (type "DELETE") unless you use the `--yes` flag.

**Recommended workflow for complete re-indexing:**
```bash
# Delete all vectors for current domain
npx wp-ai-indexer delete-all

# Re-index all content
npx wp-ai-indexer index
```
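The confirmation rule for `delete-all` can be expressed as a small predicate. This is only a sketch of the documented behaviour: `isDeleteConfirmed` is a hypothetical name, and the exact-match, case-sensitive comparison is an assumption.

```typescript
// Hypothetical helper mirroring the documented behaviour of `delete-all`:
// proceed when --yes is set, otherwise only when the user typed exactly "DELETE".
function isDeleteConfirmed(typedInput: string | undefined, yesFlag: boolean): boolean {
  if (yesFlag) {
    return true;
  }
  // Surrounding whitespace (such as a trailing newline from the prompt) is ignored.
  return (typedInput ?? '').trim() === 'DELETE';
}
```

Keeping the check in a pure function makes the destructive path easy to unit test without touching Pinecone.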
### Programmatic Usage

```typescript
import { Indexer, IndexerConfig } from '@kanopi/wp-ai-indexer';

const config: IndexerConfig = {
  wpApiBase: 'https://your-wordpress-site.com',
  openaiApiKey: process.env.OPENAI_API_KEY!,
  pineconeApiKey: process.env.PINECONE_API_KEY!,
  wpUsername: process.env.WP_API_USERNAME, // Optional
  wpPassword: process.env.WP_API_PASSWORD, // Optional (Application Password)
  debug: true,
};

const indexer = new Indexer(config);
const result = await indexer.index();

console.log('Indexing complete:', result.stats);
```

## Index Schema

The indexer creates vectors with the following metadata:

### Required Metadata

- `post_id` (number): WordPress post ID
- `post_type` (string): WordPress post type
- `title` (string): Post title
- `url` (string): Permalink URL
- `chunk` (string): Text content of this chunk
- `domain` (string): WordPress site domain (for multi-environment filtering)
- `schema_version` (number): Schema version (currently 1)
- `post_date` (string): Post publish date (ISO 8601)
- `post_modified` (string): Post modified date (ISO 8601)
- `author_id` (number): Author ID
- `chunk_index` (number): Chunk index within post

### Optional Metadata

- `category_ids` (string): Comma-separated category IDs
- `tag_ids` (string): Comma-separated tag IDs
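The metadata fields listed here translate naturally into a TypeScript interface. The sketch below pairs that interface with a builder that maps a WordPress REST API post onto it; `VectorMetadata`, `WpPost`, and `buildMetadata` are hypothetical names, and the field mapping (for example `link` to `url`) is an assumption based on the standard REST post shape.

```typescript
// Metadata fields as listed in the schema; optional fields are comma-separated ID strings.
interface VectorMetadata {
  post_id: number;
  post_type: string;
  title: string;
  url: string;
  chunk: string;
  domain: string;
  schema_version: number;
  post_date: string;
  post_modified: string;
  author_id: number;
  chunk_index: number;
  category_ids?: string;
  tag_ids?: string;
}

// Subset of a WordPress REST API post object.
interface WpPost {
  id: number;
  type: string;
  date: string;
  modified: string;
  link: string;
  title: { rendered: string };
  author: number;
  categories?: number[];
  tags?: number[];
}

function buildMetadata(post: WpPost, chunk: string, chunkIndex: number, domain: string): VectorMetadata {
  const metadata: VectorMetadata = {
    post_id: post.id,
    post_type: post.type,
    title: post.title.rendered,
    url: post.link,
    chunk,
    domain,
    schema_version: 1,
    post_date: post.date,
    post_modified: post.modified,
    author_id: post.author,
    chunk_index: chunkIndex,
  };
  // Only attach taxonomy fields when the post actually has terms.
  if (post.categories && post.categories.length > 0) {
    metadata.category_ids = post.categories.join(',');
  }
  if (post.tags && post.tags.length > 0) {
    metadata.tag_ids = post.tags.join(',');
  }
  return metadata;
}
```

Storing the ID lists as comma-separated strings keeps every metadata value a flat string or number.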
### Domain Filtering

The `domain` field enables multi-environment support. The same Pinecone index can store vectors from multiple WordPress environments (development, staging, production) without collision.

**Example Query Filter:**
```javascript
{
  domain: { $eq: "your-production-site.com" }
}
```

This ensures queries from production only return production content, staging queries only return staging content, etc.
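A query-side helper might assemble this filter and optionally narrow it by post type. The sketch below is illustrative only: `buildQueryFilter` is a hypothetical name, and it relies on Pinecone's `$eq` and `$in` metadata operators, with multiple keys in one filter object combined as a logical AND.

```typescript
type FilterCondition = { $eq: string } | { $in: string[] };
type MetadataFilter = Record<string, FilterCondition>;

// Always constrain by domain; add a post_type constraint only when requested.
function buildQueryFilter(domain: string, postTypes: string[] = []): MetadataFilter {
  const filter: MetadataFilter = { domain: { $eq: domain } };
  if (postTypes.length > 0) {
    filter.post_type = { $in: postTypes };
  }
  return filter;
}
```

Making the domain argument mandatory means a caller cannot accidentally query across environments.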
## Architecture

The indexer is composed of modular components:

- **SettingsManager**: Fetches and validates WordPress settings
- **WordPressClient**: Fetches posts via REST API
- **Chunker**: Splits content into overlapping chunks
- **EmbeddingsManager**: Creates embeddings with OpenAI
- **PineconeManager**: Manages Pinecone vector operations
- **Indexer**: Orchestrates the full indexing pipeline
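To illustrate what splitting content into overlapping chunks means in practice, here is a minimal character-based sketch using the `chunk_size` and `chunk_overlap` settings. `chunkText` is a hypothetical function that cuts purely on character offsets; the package's `Chunker` may well respect word or sentence boundaries, so this is not a description of its implementation.

```typescript
// Split `text` into windows of `chunkSize` characters, where each window starts
// `chunkSize - chunkOverlap` characters after the previous one.
function chunkText(text: string, chunkSize = 500, chunkOverlap = 50): string[] {
  if (chunkSize <= 0) {
    throw new RangeError('chunkSize must be positive');
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError('chunkOverlap must be at least 0 and less than chunkSize');
  }
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end, so no chunk is a pure repeat of the overlap.
    if (start + chunkSize >= text.length) {
      break;
    }
  }
  return chunks;
}
```

Because the output depends only on the input text and the two settings, re-running the indexer on unchanged content yields the same chunks, which is what makes the chunking deterministic.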
## Integration

This indexer is designed to work with the **WP AI Assistant** plugin, which provides:

- **AI Chatbot**: RAG-based conversational AI using indexed content
- **Semantic Search**: AI-powered search using vector similarity
- **Domain Filtering**: Multi-environment support (development, staging, production)

The plugin and indexer share:
- The same Pinecone index
- The same indexer settings endpoint (`/wp-json/ai-assistant/v1/indexer-settings`)
- The same index schema (enforced by `schema_version`)
- Automatic domain-based filtering for multi-environment setups

## Schema Versioning

The indexer enforces schema compatibility:

- **Schema Version 1** (current):
  - Includes `domain` field for multi-environment filtering
  - All metadata fields listed above
  - Character-based chunking
  - OpenAI text-embedding-3-small with configurable dimensions
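A schema compatibility check can be as simple as comparing the `schema_version` returned by the settings endpoint against the versions a given indexer release understands. The sketch below assumes that only version 1 is supported; `SUPPORTED_SCHEMA_VERSIONS` and `assertSupportedSchema` are hypothetical names.

```typescript
// Versions this (hypothetical) indexer build knows how to write.
const SUPPORTED_SCHEMA_VERSIONS: readonly number[] = [1];

// Return the validated version, or throw with an actionable message.
function assertSupportedSchema(settings: { schema_version?: unknown }): number {
  const version = settings.schema_version;
  if (typeof version !== 'number' || version % 1 !== 0) {
    throw new Error('Settings response is missing an integer schema_version');
  }
  if (SUPPORTED_SCHEMA_VERSIONS.indexOf(version) === -1) {
    throw new Error(
      `Unsupported schema version: ${version} (supported: ${SUPPORTED_SCHEMA_VERSIONS.join(', ')})`
    );
  }
  return version;
}
```

Failing before any embeddings are created avoids writing vectors that the plugin cannot read.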
## Performance

- **Chunking**: ~1000 chunks/second
- **Embeddings**: Limited by OpenAI API rate limits (~3000 RPM; the actual limit depends on your OpenAI account tier)
- **Upserting**: Batched (100 vectors/batch) for optimal Pinecone performance
- **Memory**: Processes posts as a stream to minimize memory usage
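The batching described for upserts amounts to slicing the vector list into fixed-size groups. A minimal sketch, with `toBatches` as a hypothetical helper name and 100 as the default batch size taken from the figure quoted for upserting:

```typescript
// Split `items` into consecutive groups of at most `batchSize` elements.
function toBatches<T>(items: readonly T[], batchSize = 100): T[][] {
  if (batchSize <= 0 || batchSize % 1 !== 0) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Each batch would then be sent in its own upsert request, so a failure affects at most one batch rather than the whole run.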
## Error Handling

The indexer implements robust error handling:

- ✅ Automatic retries with exponential backoff
- ✅ Continues processing on individual post errors
- ✅ Detailed error reporting
- ✅ Graceful degradation
- ✅ Exit codes suitable for CI/CD
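For readers unfamiliar with the pattern, retrying with exponential backoff means doubling the wait between attempts up to some ceiling. The sketch below is a generic illustration, not this package's code: `backoffDelayMs`, `withRetry`, and the 500 ms base and 30 s cap are all assumed values, and a production version would usually add jitter and skip non-retryable errors such as a 401.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `operation`, retrying up to `maxRetries` times after a failure.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
  baseMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait only if another attempt will follow.
      if (attempt < maxRetries) {
        await sleep(backoffDelayMs(attempt, baseMs));
      }
    }
  }
  throw lastError;
}
```

With the defaults, the waits between attempts are 500 ms, 1 s, and 2 s before the final error is rethrown.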
## Development

### Build

```bash
npm run build
```

### Watch Mode

```bash
npm run watch
```

### Test Locally

```bash
# Set up .env file
cp .env.example .env
# Edit .env with your credentials

# Run indexer
npm run index
```

### DDEV Example

For local development with DDEV:

```bash
# Create Application Password
ddev exec "wp user application-password create admin 'AI Indexer' --porcelain"
# Returns: xxxx xxxx xxxx xxxx xxxx xxxx

# Set environment variables in .ddev/config.yaml:
# web_environment:
#   - WP_API_BASE=https://yoursite.ddev.site
#   - WP_API_USERNAME=admin
#   - WP_API_PASSWORD=xxxx xxxx xxxx xxxx xxxx xxxx
#   - OPENAI_API_KEY=sk-...
#   - PINECONE_API_KEY=...

# Restart DDEV
ddev restart

# Run indexer from inside DDEV container
ddev exec "cd packages/wp-ai-indexer && npx wp-ai-indexer index"
```
## Troubleshooting

### "Built files not found"

Run `npm run build` before using the CLI.

### "Request failed with status code 401"

Authentication is required:

1. Create a WordPress Application Password:
   ```bash
   wp user application-password create USERNAME "AI Indexer"
   ```

2. Set environment variables:
   ```bash
   WP_API_USERNAME=your-username
   WP_API_PASSWORD=xxxx xxxx xxxx xxxx xxxx xxxx
   ```

3. Verify credentials:
   ```bash
   npx wp-ai-indexer config
   ```

### "Failed to fetch settings"

- Ensure WordPress site is accessible
- Verify the settings endpoint exists: `/wp-json/ai-assistant/v1/indexer-settings`
- Check that WP AI Assistant plugin is activated
- If using authentication, verify credentials are correct

### "REST API access restricted"

If using a security plugin (Solid Security, Wordfence, etc.):

- ✅ Use Application Passwords (recommended)
- ✅ Security plugins typically allow Application Password authentication
- ❌ Don't disable REST API security globally

### "Unsupported schema version"

Update the indexer package to support the schema version returned by WordPress.

### "Vector dimension mismatch"

The embedding dimension must match your Pinecone index:

- Check Pinecone index dimension
- Update WordPress plugin settings to match
- Common dimensions: 1536 (text-embedding-3-small), 1024 (custom), 3072 (text-embedding-3-large)
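A pre-flight check along these lines can catch a dimension mismatch before any vectors are upserted. This is a sketch under stated assumptions: `DEFAULT_DIMENSIONS` and `findDimensionProblems` are hypothetical names, the table only covers the two models named in this section, and the index dimension is assumed to have been read from Pinecone separately.

```typescript
// Native output dimensions for the two models named in this section.
const DEFAULT_DIMENSIONS: Record<string, number> = {
  'text-embedding-3-small': 1536,
  'text-embedding-3-large': 3072,
};

// Return a list of human-readable problems; an empty list means the settings are consistent.
function findDimensionProblems(
  model: string,
  embeddingDimension: number,
  indexDimension: number
): string[] {
  const problems: string[] = [];
  if (embeddingDimension !== indexDimension) {
    problems.push(
      `embedding_dimension is ${embeddingDimension} but the Pinecone index dimension is ${indexDimension}`
    );
  }
  // A custom dimension may be smaller than the model's native size, but not larger.
  const max = DEFAULT_DIMENSIONS[model];
  if (max !== undefined && embeddingDimension > max) {
    problems.push(`${model} produces at most ${max} dimensions`);
  }
  return problems;
}
```

For example, a setting of 1536 against an index created with 1024 dimensions would report a single problem naming both numbers.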
### "Rate limit exceeded"

The indexer includes automatic retry logic. If you consistently hit rate limits:

- Reduce concurrency with `WP_AI_CONCURRENCY`
- Increase timeout with `WP_AI_TIMEOUT_MS`
## Testing

### Run Tests

```bash
# Run all tests
npm test

# Run with coverage
npm run test:coverage

# Run in watch mode
npm run test:watch

# Run unit tests only
npm run test:unit

# Run integration tests only
npm run test:integration
```

### Test Structure

- **Unit Tests**: Test individual components in isolation (`tests/unit/`)
- **Integration Tests**: Test component interactions (`tests/integration/`)
- **Fixtures**: Mock data for testing (`tests/mocks/`)
## Contributing

We welcome contributions! Here's how to get started:

### Development Setup

1. **Fork and Clone**
   ```bash
   git clone https://github.com/your-username/wp-ai-indexer.git
   cd wp-ai-indexer
   ```

2. **Install Dependencies**
   ```bash
   npm install
   ```

3. **Build**
   ```bash
   npm run build
   ```

4. **Run Tests**
   ```bash
   npm test
   ```

### Guidelines

- **Code Style**: Follow existing patterns, use ESLint and Prettier
  ```bash
  npm run lint
  npm run format
  ```

- **Tests**: Add tests for new features
  - Unit tests for individual functions
  - Integration tests for component interactions
  - Maintain or improve code coverage

- **Commits**: Use clear, descriptive commit messages
  - Format: `type: description`
  - Types: `feat`, `fix`, `docs`, `test`, `refactor`, `chore`
  - Example: `feat: add retry logic for embeddings API`

- **Pull Requests**:
  - Reference related issues
  - Provide clear description of changes
  - Ensure CI passes
  - Update documentation as needed

### Reporting Issues

When reporting bugs, please include:
- Node.js version (`node --version`)
- Package version (`npm list @kanopi/wp-ai-indexer`)
- WordPress version
- Error messages and stack traces
- Steps to reproduce

## License

MIT

## Support

For issues and questions:
- GitHub Issues: [kanopi/wp-ai-indexer](https://github.com/kanopi/wp-ai-indexer/issues)
- Documentation: See README.md
package/bin/wp-ai-indexer.js
ADDED
@@ -0,0 +1,25 @@
#!/usr/bin/env node

/**
 * CLI entry point for wp-ai-indexer
 */

// Try to load the CLI module
// This works regardless of how the script is executed (direct, npx, or symlink)
try {
  const { CLI } = require('../dist/cli');
  const cli = new CLI();
  cli.run(process.argv).catch((error) => {
    console.error('Unexpected error:', error);
    process.exit(1);
  });
} catch (error) {
  // If module not found, show helpful error
  if (error.code === 'MODULE_NOT_FOUND') {
    console.error('❌ Error: Built files not found. Please run "npm run build" first.');
    console.error(' Module:', error.message);
  } else {
    console.error('❌ Error loading CLI:', error.message);
  }
  process.exit(1);
}
package/package.json
ADDED
@@ -0,0 +1,71 @@
{
  "name": "@kanopi/wp-ai-indexer",
  "version": "1.0.0",
  "description": "Shared Node-based indexer for WordPress AI plugins (Chatbot & Search)",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "bin": {
    "wp-ai-indexer": "./bin/wp-ai-indexer.js"
  },
  "scripts": {
    "build": "tsc",
    "watch": "tsc --watch",
    "index": "node bin/wp-ai-indexer.js index",
    "clean": "node bin/wp-ai-indexer.js clean",
    "test": "vitest run",
    "test:watch": "vitest",
    "test:coverage": "vitest run --coverage",
    "test:unit": "vitest run tests/unit",
    "test:integration": "vitest run tests/integration",
    "test:ci": "vitest run --coverage --reporter=verbose",
    "lint": "eslint . --ext .ts",
    "lint:fix": "eslint . --ext .ts --fix",
    "format": "prettier --write \"src/**/*.ts\" \"tests/**/*.ts\"",
    "format:check": "prettier --check \"src/**/*.ts\" \"tests/**/*.ts\""
  },
  "repository": {
    "type": "git",
    "url": "https://github.com/kanopi/wp-ai-indexer.git"
  },
  "bugs": {
    "url": "https://github.com/kanopi/wp-ai-indexer/issues"
  },
  "homepage": "https://github.com/kanopi/wp-ai-indexer#readme",
  "keywords": [
    "wordpress",
    "ai",
    "indexer",
    "pinecone",
    "openai",
    "embeddings",
    "vector-search",
    "rag"
  ],
  "author": "Kanopi Studios",
  "license": "MIT",
  "dependencies": {
    "@pinecone-database/pinecone": "^2.0.0",
    "openai": "^4.20.0",
    "axios": "^1.6.0",
    "dotenv": "^16.3.0",
    "commander": "^11.1.0",
    "chalk": "^4.1.2",
    "ora": "^5.4.1"
  },
  "devDependencies": {
    "@types/node": "^20.10.0",
    "@vitest/coverage-v8": "^1.0.0",
    "nock": "^13.4.0",
    "typescript": "^5.3.0",
    "vitest": "^1.0.0",
    "@typescript-eslint/eslint-plugin": "^6.0.0",
    "@typescript-eslint/parser": "^6.0.0",
    "eslint": "^8.0.0",
    "eslint-config-prettier": "^9.0.0",
    "eslint-plugin-prettier": "^5.0.0",
    "prettier": "^3.0.0"
  },
  "engines": {
    "node": ">=18.0.0"
  }
}
package/tests/helpers/test-config.ts
ADDED
@@ -0,0 +1,36 @@
import { IndexerConfig, IndexerSettings } from '../../src/types';

/**
 * Test configuration for indexer
 */
export const createTestConfig = (overrides?: Partial<IndexerConfig>): IndexerConfig => {
  return {
    wpApiBase: 'https://test.example.com',
    openaiApiKey: 'test-openai-key',
    pineconeApiKey: 'test-pinecone-key',
    debug: false,
    timeout: 5000,
    ...overrides,
  };
};

/**
 * Test settings for indexer
 */
export const createTestSettings = (overrides?: Partial<IndexerSettings>): IndexerSettings => {
  return {
    schema_version: 1,
    post_types: ['post', 'page'],
    post_types_exclude: ['attachment', 'revision'],
    auto_discover: false,
    clean_deleted: false,
    embedding_model: 'text-embedding-3-small',
    embedding_dimension: 1536,
    chunk_size: 500,
    chunk_overlap: 50,
    pinecone_index_host: 'https://test-index.pinecone.io',
    pinecone_index_name: 'test-index',
    domain: 'test.example.com',
    ...overrides,
  };
};
package/tests/mocks/fixtures/settings.json
ADDED
@@ -0,0 +1,13 @@
{
  "schema_version": 1,
  "post_types": ["post", "page"],
  "post_types_exclude": ["attachment", "revision"],
  "auto_discover": false,
  "clean_deleted": false,
  "embedding_model": "text-embedding-3-small",
  "embedding_dimension": 1536,
  "chunk_size": 500,
  "chunk_overlap": 50,
  "pinecone_index_host": "https://test-index.pinecone.io",
  "pinecone_index_name": "test-index"
}
package/tests/mocks/fixtures/wordpress-posts.json
ADDED
@@ -0,0 +1,42 @@
[
  {
    "id": 1,
    "type": "post",
    "date": "2024-01-01T10:00:00",
    "modified": "2024-01-02T10:00:00",
    "slug": "test-post",
    "status": "publish",
    "link": "https://example.com/test-post",
    "title": {
      "rendered": "Test Post"
    },
    "content": {
      "rendered": "<p>This is test content for the post. It contains some text that will be chunked and embedded.</p>"
    },
    "excerpt": {
      "rendered": "<p>This is a test excerpt</p>"
    },
    "author": 1,
    "categories": [1, 2],
    "tags": [1, 2, 3]
  },
  {
    "id": 2,
    "type": "page",
    "date": "2024-01-03T10:00:00",
    "modified": "2024-01-03T10:00:00",
    "slug": "test-page",
    "status": "publish",
    "link": "https://example.com/test-page",
    "title": {
      "rendered": "Test Page"
    },
    "content": {
      "rendered": "<p>This is a test page with some content.</p>"
    },
    "excerpt": {
      "rendered": ""
    },
    "author": 1
  }
]