@memberjunction/ai-vector-dupe 5.21.0 → 5.22.0

package/README.md CHANGED
@@ -1,42 +1,66 @@
1
1
  # @memberjunction/ai-vector-dupe
2
2
 
3
- AI-powered duplicate record detection for MemberJunction entities. This package uses vector embeddings and similarity search to find potential duplicate records, track detection runs, and optionally auto-merge high-confidence matches.
3
+ <!-- Badges -->
4
+ <!-- [![npm version](https://img.shields.io/npm/v/@memberjunction/ai-vector-dupe)](https://www.npmjs.com/package/@memberjunction/ai-vector-dupe) -->
5
+ <!-- [![build](https://img.shields.io/github/actions/workflow/status/MemberJunction/MJ/ci.yml?branch=next)](https://github.com/MemberJunction/MJ/actions) -->
6
+
7
+ **AI-powered duplicate record detection for MemberJunction entities** -- finds, scores, tracks, and optionally auto-merges duplicate records using vector similarity, hybrid search (RRF), and optional reranking.
8
+
9
+ ---
4
10
 
5
11
  ## Architecture
6
12
 
7
- ```mermaid
8
- graph TD
9
- subgraph DupePkg["@memberjunction/ai-vector-dupe"]
10
- DRD["DuplicateRecordDetector"]
11
- VSB["VectorSyncBase"]
12
- ESC["EntitySyncConfig"]
13
- end
14
-
15
- subgraph Pipeline["Detection Pipeline"]
16
- LIST["Load Records<br/>from List"] --> VECT["Vectorize Records<br/>via Templates"]
17
- VECT --> EMBED["Generate<br/>Embeddings"]
18
- EMBED --> QUERY["Query Vector DB<br/>for Matches"]
19
- QUERY --> FILTER["Filter by<br/>Threshold"]
20
- FILTER --> TRACK["Track Results<br/>in Duplicate Runs"]
21
- TRACK --> MERGE["Auto-Merge<br/>Above Threshold"]
22
- end
23
-
24
- subgraph Dependencies["Key Dependencies"]
25
- VB["ai-vectors<br/>(VectorBase)"]
26
- SYNC["ai-vector-sync<br/>(EntityVectorSyncer)"]
27
- VDBB["ai-vectordb<br/>(VectorDBBase)"]
28
- AI["ai<br/>(BaseEmbeddings)"]
29
- end
30
-
31
- DRD -->|extends| VB
32
- DRD --> SYNC
33
- DRD --> VDBB
34
- DRD --> AI
35
-
36
- style DupePkg fill:#2d6a9f,stroke:#1a4971,color:#fff
37
- style Pipeline fill:#2d8659,stroke:#1a5c3a,color:#fff
38
- style Dependencies fill:#7c5295,stroke:#563a6b,color:#fff
39
13
  ```
14
+                      +--------------------------+
+                      | DuplicateRecordDetector  |
+                      |   (extends VectorBase)   |
+                      +-----+----------+---------+
+                            |          |
+           +----------------+          +----------------+
+           |                                            |
+ +---------v----------+                     +-----------v---------+
+ | GetDuplicateRecords|                     | CheckSingleRecord   |
+ | (list-based batch) |                     | (single record)     |
+ +--------+-----------+                     +-----------+---------+
+          |                                             |
+          +-------------------+-------------------------+
+                              |
+                 +------------v------------+
+                 |   Detection Pipeline    |
+                 +-------------------------+
+                 | 1. Validate Entity Doc  |
+                 | 2. Vectorize records    |
+                 | 3. Embed via AI model   |
+                 | 4. Query vector DB      |
+                 |    (hybrid if supported)|
+                 | 5. Filter self-matches  |
+                 | 6. Apply thresholds     |
+                 | 7. Persist match results|
+                 | 8. Auto-merge (optional)|
+                 +-------------------------+
+                              |
+           +------------------+------------------+
+           |                  |                  |
+ +---------v------+   +-------v--------+ +-------v--------+
+ | ai-vector-sync |   | ai-vectordb    | | ai (Embeddings)|
+ | (vectorizer,   |   | (VectorDBBase, | | (BaseEmbeddings|
+ |  templates)    |   |  hybrid query) | |  GetAIAPIKey)  |
+ +----------------+   +----------------+ +----------------+
49
+ ```
50
+
51
+ **Key dependencies:**
52
+
53
+ | Package | Role |
54
+ |---|---|
55
+ | `@memberjunction/ai` | Embedding model abstraction and API key resolution |
56
+ | `@memberjunction/ai-vectordb` | Vector database abstraction (query, hybrid search) |
57
+ | `@memberjunction/ai-vectors` | `VectorBase` base class with metadata and RunView helpers |
58
+ | `@memberjunction/ai-vector-sync` | `EntityVectorSyncer` for record vectorization, template parsing |
59
+ | `@memberjunction/core` | Core types: `PotentialDuplicateRequest`, `DuplicateDetectionOptions`, etc. |
60
+ | `@memberjunction/core-entities` | Generated entity classes for Duplicate Runs, Lists, Entity Documents |
61
+ | `@memberjunction/global` | `MJGlobal` class factory, `UUIDsEqual` |
62
+
63
+ ---
40
64
 
41
65
  ## Installation
42
66
 
@@ -44,252 +68,249 @@ graph TD
44
68
  npm install @memberjunction/ai-vector-dupe
45
69
  ```
46
70
 
47
- ## Overview
48
-
49
- The package provides the `DuplicateRecordDetector` class, which orchestrates a complete duplicate detection workflow:
50
-
51
- 1. Loads records from a MemberJunction List
52
- 2. Vectorizes them using a configured Entity Document template and embedding model
53
- 3. Queries the vector database for similarity matches
54
- 4. Filters results against configurable thresholds
55
- 5. Creates Duplicate Run, Duplicate Run Detail, and Duplicate Run Detail Match records for tracking
56
- 6. Optionally auto-merges records that exceed the absolute match threshold
57
-
58
- ## Duplicate Detection Flow
59
-
60
- ```mermaid
61
- sequenceDiagram
62
- participant Caller
63
- participant DRD as DuplicateRecordDetector
64
- participant EVS as EntityVectorSyncer
65
- participant Embed as Embedding Model
66
- participant VDB as Vector Database
67
- participant DB as MJ Database
68
-
69
- Caller->>DRD: getDuplicateRecords(request, user)
70
- DRD->>DB: Load Entity Document
71
- DRD->>EVS: VectorizeEntity (ensure all records are indexed)
72
- DRD->>DB: Load records from List
73
-
74
- loop For each record
75
- DRD->>Embed: Generate embedding from template
76
- DRD->>VDB: queryIndex (topK=5)
77
- VDB-->>DRD: Scored matches
78
- DRD->>DRD: Filter by PotentialMatchThreshold
79
- DRD->>DB: Create DuplicateRunDetailMatch records
80
- end
81
-
82
- DRD->>DRD: Check AbsoluteMatchThreshold
83
- DRD->>DB: Auto-merge high-confidence duplicates
84
- DRD-->>Caller: PotentialDuplicateResponse
85
- ```
71
+ ---
86
72
 
87
- ## Core Components
73
+ ## Quick Start
88
74
 
89
- ### DuplicateRecordDetector
75
+ ### List-Based Batch Detection
76
+
77
+ Detect duplicates across all records in an MJ List:
78
+
79
+ ```typescript
80
+ import { DuplicateRecordDetector } from '@memberjunction/ai-vector-dupe';
81
+ import { PotentialDuplicateRequest } from '@memberjunction/core';
82
+
83
+ const detector = new DuplicateRecordDetector();
84
+
85
+ const request: PotentialDuplicateRequest = {
86
+ ListID: 'your-list-uuid',
87
+ EntityID: 'your-entity-uuid',
88
+ EntityDocumentID: 'your-entity-document-uuid',
89
+ Options: {
90
+ TopK: 10,
91
+ OnProgress: (progress) => {
92
+ console.log(`[${progress.Phase}] ${progress.ProcessedRecords}/${progress.TotalRecords} -- ${progress.MatchesFound} matches`);
93
+ },
94
+ },
95
+ };
96
+
97
+ const response = await detector.GetDuplicateRecords(request, contextUser);
98
+
99
+ if (response.Status === 'Success') {
100
+ for (const result of response.PotentialDuplicateResult) {
101
+ console.log(`Record: ${result.RecordCompositeKey.ToString()}`);
102
+ for (const dupe of result.Duplicates) {
103
+ console.log(` Match: ${dupe.ToString()} (${(dupe.ProbabilityScore * 100).toFixed(1)}%)`);
104
+ }
105
+ }
106
+ }
107
+ ```
90
108
 
91
- The main class that extends `VectorBase` from `@memberjunction/ai-vectors`.
109
+ ### Single-Record Check
92
110
 
93
- **Key method:**
111
+ Check one record for duplicates without creating a list -- ideal for server hooks (e.g., fire-and-forget after record save):
94
112
 
95
113
  ```typescript
96
- getDuplicateRecords(
97
- params: PotentialDuplicateRequest,
98
- contextUser?: UserInfo
99
- ): Promise<PotentialDuplicateResponse>
114
+ import { DuplicateRecordDetector } from '@memberjunction/ai-vector-dupe';
115
+ import { CompositeKey } from '@memberjunction/core';
116
+
117
+ const detector = new DuplicateRecordDetector();
118
+
119
+ const recordKey = new CompositeKey([{ FieldName: 'ID', Value: 'record-uuid' }]);
120
+
121
+ const result = await detector.CheckSingleRecord(
122
+ 'your-entity-document-uuid',
123
+ recordKey,
124
+ { TopK: 5 },
125
+ contextUser
126
+ );
127
+
128
+ for (const dupe of result.Duplicates) {
129
+ console.log(`Potential duplicate: ${dupe.ToString()} (score: ${dupe.ProbabilityScore})`);
130
+ }
100
131
  ```
101
132
 
102
- **Parameters in `PotentialDuplicateRequest`:**
133
+ ---
103
134
 
104
- | Field | Type | Description |
105
- |---|---|---|
106
- | `ListID` | `string` | ID of the List containing records to check |
107
- | `EntityID` | `string` | ID of the entity type |
108
- | `EntityDocumentID` | `string` | ID of the Entity Document with vectorization template |
109
- | `Options.DuplicateRunID` | `string` (optional) | Resume an existing duplicate run |
135
+ ## DuplicateDetectionOptions Reference
136
+
137
+ Options are passed via the `Options` property on `PotentialDuplicateRequest`, or directly to `CheckSingleRecord`.
138
+
139
+ | Option | Type | Default | Description |
140
+ |---|---|---|---|
141
+ | `TopK` | `number` | `5` | Number of nearest neighbors to retrieve per record |
142
+ | `DuplicateRunID` | `string` | -- | Resume an existing duplicate run (batch mode only) |
143
+ | `KeywordSearchWeight` | `number` | `0.3` | Weight for keyword search in hybrid mode (0.0 = vector only, 1.0 = keyword only). Vector weight is `1.0 - KeywordSearchWeight`. |
144
+ | `FusionMethod` | `string` | `'rrf'` | Fusion method for hybrid search. Currently supports `'rrf'` (Reciprocal Rank Fusion). |
145
+ | `OnProgress` | `(progress: DuplicateDetectionProgress) => void` | -- | Callback for real-time progress reporting |
146
+
147
+ ### Thresholds (Configured on Entity Document)
110
148
 
111
- **Thresholds (configured on Entity Document):**
149
+ Thresholds are not part of `DuplicateDetectionOptions` -- they are configured on the `EntityDocument` record itself:
112
150
 
113
151
  | Threshold | Purpose |
114
152
  |---|---|
115
- | `PotentialMatchThreshold` | Minimum similarity score to report as potential duplicate |
116
- | `AbsoluteMatchThreshold` | Minimum similarity score for automatic record merge |
153
+ | `PotentialMatchThreshold` | Minimum similarity score to report a candidate as a potential duplicate |
154
+ | `AbsoluteMatchThreshold` | Minimum similarity score to trigger automatic record merge |
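
The two thresholds in the table above can be read as a two-level classifier over each candidate's similarity score. The sketch below is illustrative only: `classifyMatch`, `ThresholdConfig`, and `MatchOutcome` are hypothetical names that are not part of the package, the inclusive `>=` comparison is an assumption based on the word "minimum", and the values 0.80 and 0.95 are made up for the example.

```typescript
// Hypothetical helper, not part of @memberjunction/ai-vector-dupe.
type MatchOutcome = 'Ignore' | 'PotentialDuplicate' | 'AutoMerge';

interface ThresholdConfig {
  PotentialMatchThreshold: number; // minimum score to report a candidate
  AbsoluteMatchThreshold: number;  // minimum score to trigger an automatic merge
}

function classifyMatch(score: number, thresholds: ThresholdConfig): MatchOutcome {
  // Assumption: both thresholds are inclusive lower bounds.
  if (score >= thresholds.AbsoluteMatchThreshold) {
    return 'AutoMerge';
  }
  if (score >= thresholds.PotentialMatchThreshold) {
    return 'PotentialDuplicate';
  }
  return 'Ignore';
}

// Illustrative threshold values only.
const exampleThresholds: ThresholdConfig = {
  PotentialMatchThreshold: 0.8,
  AbsoluteMatchThreshold: 0.95,
};

for (const score of [0.62, 0.87, 0.97]) {
  console.log(score, classifyMatch(score, exampleThresholds));
}
```

Keeping `AbsoluteMatchThreshold` well above `PotentialMatchThreshold` leaves a band of scores that are reported for review but never merged automatically.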
117
155
 
118
- ### VectorSyncBase
156
+ ---
119
157
 
120
- A utility base class providing helper methods for vector synchronization operations:
158
+ ## Hybrid Search and Reciprocal Rank Fusion (RRF)
121
159
 
122
- - `parseStringTemplate(str, obj)` -- simple template variable substitution
123
- - `timer(ms)` -- async delay
124
- - `start()` / `end()` / `timeDiff()` -- execution timing
125
- - `saveJSONData(data, path)` -- JSON file output
160
+ When the configured vector database supports hybrid search (`VectorDBBase.SupportsHybridSearch === true`), the detector automatically combines **vector similarity** and **keyword search** for higher-quality results.
126
161
 
127
- ### EntitySyncConfig
162
+ ### How It Works
128
163
 
129
- Configuration type for entity synchronization scheduling:
164
+ 1. The record's template text is sent as both a vector embedding and a keyword query.
165
+ 2. The vector DB returns results from both retrieval methods.
166
+ 3. Results are fused using **Reciprocal Rank Fusion (RRF)**, a rank-based algorithm that is score-scale independent.
167
+
168
+ ### RRF Formula
130
169
 
131
- ```typescript
132
- type EntitySyncConfig = {
133
- EntityDocumentID: string; // Entity Document to use
134
- Interval: number; // Sync interval in seconds
135
- RunViewParams: RunViewParams; // View parameters for fetching
136
- IncludeInSync: boolean; // Whether to include in sync
137
- LastRunDate: string; // Last sync timestamp
138
- VectorIndexID: number; // Vector index ID
139
- VectorID: number; // Vector database ID
140
- };
141
170
  ```
171
+ FusedScore(d) = SUM_i [ 1 / (k + rank_i(d)) ]
172
+ ```
173
+
174
+ Where `rank_i(d)` is the 1-based rank of document `d` in list `i`, and `k` is a smoothing constant (default: 60).
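
As a worked illustration of the formula, here is a minimal sketch of rank-based fusion. It is not the package's `ComputeRRF` implementation: `rrfSketch` and `Candidate` are hypothetical names, and the sketch assumes each input list is already sorted best-first, so that a candidate's array position gives its rank.

```typescript
// Hypothetical sketch of Reciprocal Rank Fusion; not the package's ComputeRRF.
interface Candidate {
  ID: string;
  Score: number;
}

function rrfSketch(rankedLists: Candidate[][], k: number = 60): Candidate[] {
  const fused = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((candidate, index) => {
      const rank = index + 1; // 1-based rank, as in the formula
      const previous = fused.get(candidate.ID) ?? 0;
      fused.set(candidate.ID, previous + 1 / (k + rank));
    });
  }
  // Only ranks contribute; the original Score values are never read.
  return Array.from(fused.entries())
    .map(([ID, Score]) => ({ ID, Score }))
    .sort((a, b) => b.Score - a.Score);
}

const vectorList: Candidate[] = [
  { ID: 'rec-1', Score: 0.95 },
  { ID: 'rec-2', Score: 0.87 },
  { ID: 'rec-3', Score: 0.82 },
];
const keywordList: Candidate[] = [
  { ID: 'rec-2', Score: 12.5 },
  { ID: 'rec-4', Score: 10.1 },
  { ID: 'rec-1', Score: 8.3 },
];

// rec-2: 1/62 + 1/61, rec-1: 1/61 + 1/63, rec-4: 1/62, rec-3: 1/63
console.log(rrfSketch([vectorList, keywordList]).map((c) => c.ID).join(', '));
// prints "rec-2, rec-1, rec-4, rec-3"
```

Because only ranks enter the sum, the cosine-style scores in one list and the much larger keyword scores in the other need no normalization before fusion.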
142
175
 
143
- ## Usage
176
+ ### Using ComputeRRF Directly
144
177
 
145
- ### Basic Duplicate Detection
178
+ The `ComputeRRF` utility is exported for use in custom pipelines:
146
179
 
147
180
  ```typescript
148
- import { DuplicateRecordDetector } from '@memberjunction/ai-vector-dupe';
149
- import { PotentialDuplicateRequest, UserInfo } from '@memberjunction/core';
181
+ import { ComputeRRF, ScoredCandidate } from '@memberjunction/ai-vector-dupe';
182
+
183
+ const vectorResults: ScoredCandidate[] = [
184
+ { ID: 'rec-1', Score: 0.95 },
185
+ { ID: 'rec-2', Score: 0.87 },
186
+ { ID: 'rec-3', Score: 0.82 },
187
+ ];
188
+
189
+ const keywordResults: ScoredCandidate[] = [
190
+ { ID: 'rec-2', Score: 12.5 }, // Different scale -- RRF handles this
191
+ { ID: 'rec-4', Score: 10.1 },
192
+ { ID: 'rec-1', Score: 8.3 },
193
+ ];
194
+
195
+ const fused = ComputeRRF([vectorResults, keywordResults], 60);
196
+ // Results sorted by fused RRF score, score-scale independent
197
+ ```
150
198
 
151
- const detector = new DuplicateRecordDetector();
199
+ ### Tuning Hybrid Search
152
200
 
153
- const request: PotentialDuplicateRequest = {
154
- ListID: 'list-uuid',
155
- EntityID: 'entity-uuid',
156
- EntityDocumentID: 'doc-uuid'
157
- };
201
+ - **`KeywordSearchWeight = 0.0`**: Pure vector similarity (semantic matching).
202
+ - **`KeywordSearchWeight = 0.3`** (default): Slight keyword boost. Good for entities with distinctive names or codes.
203
+ - **`KeywordSearchWeight = 0.5`**: Equal weight. Useful when both semantic and lexical matches matter.
204
+ - **`KeywordSearchWeight = 1.0`**: Pure keyword search (not recommended for duplicate detection).
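
The text above does not say how `KeywordSearchWeight` enters the fusion step. One common approach, shown here purely as an assumption, is weighted RRF: each list's reciprocal-rank term is multiplied by that list's weight. `weightedRrfSketch` is a hypothetical function, not a package export, and the actual vector database provider may combine the weights differently.

```typescript
// Hypothetical weighted-RRF sketch; the real hybrid query may work differently.
function weightedRrfSketch(
  vectorIds: string[],   // IDs sorted best-first by vector similarity
  keywordIds: string[],  // IDs sorted best-first by keyword relevance
  keywordWeight: number = 0.3,
  k: number = 60
): Array<{ ID: string; Score: number }> {
  const vectorWeight = 1.0 - keywordWeight;
  const fused = new Map<string, number>();

  const accumulate = (ids: string[], weight: number): void => {
    ids.forEach((id, index) => {
      const rank = index + 1; // 1-based rank
      fused.set(id, (fused.get(id) ?? 0) + weight / (k + rank));
    });
  };

  accumulate(vectorIds, vectorWeight);
  accumulate(keywordIds, keywordWeight);

  return Array.from(fused.entries())
    .map(([ID, Score]) => ({ ID, Score }))
    .sort((a, b) => b.Score - a.Score);
}

const byVector = ['rec-1', 'rec-2', 'rec-3'];
const byKeyword = ['rec-2', 'rec-4', 'rec-1'];

// With the default weight of 0.3 the vector ranking dominates.
console.log(weightedRrfSketch(byVector, byKeyword, 0.3)[0].ID); // prints "rec-1"
// With equal weights, rec-2 (ranked 2nd and 1st) overtakes rec-1 (ranked 1st and 3rd).
console.log(weightedRrfSketch(byVector, byKeyword, 0.5)[0].ID); // prints "rec-2"
```

Under this assumption, a weight of 0.0 reduces the result to the vector ranking alone and 1.0 to the keyword ranking alone, which matches the endpoints described in the list above.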
158
205
 
159
- const response = await detector.getDuplicateRecords(request, currentUser);
206
+ ---
160
207
 
161
- if (response.Status === 'Success') {
162
- for (const result of response.PotentialDuplicateResult) {
163
- console.log(`Record: ${result.RecordCompositeKey.ToString()}`);
164
- for (const dupe of result.Duplicates) {
165
- console.log(` Match: ${dupe.ToString()} (${(dupe.ProbabilityScore * 100).toFixed(1)}%)`);
166
- }
167
- }
168
- }
169
- ```
208
+ ## Reranking
209
+
210
+ When MJ's `BaseReranker` / `RerankerService` is configured, the detector can apply a second-stage reranking pass after initial retrieval. Reranking uses a cross-encoder model to re-score candidates with higher precision than embedding-based similarity alone.
211
+
212
+ Reranking is especially effective when:
213
+ - Initial retrieval returns many borderline candidates
214
+ - Entity records have complex, multi-field structures
215
+ - You need to maximize precision at the cost of slightly higher latency
216
+
217
+ See the [Duplicate Detection Guide](docs/DUPLICATE_DETECTION_GUIDE.md#reranking-integration) for configuration details.
218
+
219
+ ---
170
220
 
171
- ### Resuming an Existing Run
221
+ ## Progress Reporting
222
+
223
+ The `OnProgress` callback fires at each phase of the pipeline:
172
224
 
173
225
  ```typescript
174
226
  const request: PotentialDuplicateRequest = {
175
- ListID: 'list-uuid',
176
- EntityID: 'entity-uuid',
177
- EntityDocumentID: 'doc-uuid',
227
+ // ...
178
228
  Options: {
179
- DuplicateRunID: 'existing-run-uuid'
180
- }
229
+ OnProgress: (progress) => {
230
+ const { Phase, TotalRecords, ProcessedRecords, MatchesFound, ElapsedMs } = progress;
231
+ const pct = TotalRecords > 0 ? ((ProcessedRecords / TotalRecords) * 100).toFixed(0) : '0';
232
+ console.log(`[${Phase}] ${pct}% -- ${MatchesFound} matches (${ElapsedMs}ms)`);
233
+ },
234
+ },
181
235
  };
182
-
183
- const response = await detector.getDuplicateRecords(request, currentUser);
184
236
  ```
185
237
 
186
- ## Database Entities Used
187
-
188
- The package reads from and writes to these MemberJunction entities:
189
-
190
- ```mermaid
191
- erDiagram
192
- DUPLICATE_RUN {
193
- string ID PK
194
- string EntityID
195
- string StartedByUserID
196
- datetime StartedAt
197
- datetime EndedAt
198
- string ProcessingStatus
199
- string ApprovalStatus
200
- string SourceListID
201
- }
238
+ ### Progress Phases
202
239
 
203
- DUPLICATE_RUN_DETAIL {
204
- string ID PK
205
- string DuplicateRunID FK
206
- string RecordID
207
- string MatchStatus
208
- string MergeStatus
209
- }
240
+ | Phase | Description |
241
+ |---|---|
242
+ | `Vectorizing` | Records are being vectorized via `EntityVectorSyncer` |
243
+ | `Embedding` | Template texts are being embedded via the AI model |
244
+ | `Querying` | Vector DB is being queried for each record |
245
+ | `Matching` | Results are being persisted and match records created |
246
+ | `Merging` | High-confidence matches are being auto-merged |
210
247
 
211
- DUPLICATE_RUN_DETAIL_MATCH {
212
- string ID PK
213
- string DuplicateRunDetailID FK
214
- string MatchRecordID
215
- float MatchProbability
216
- datetime MatchedAt
217
- string Action
218
- string ApprovalStatus
219
- string MergeStatus
220
- }
248
+ ### DuplicateDetectionProgress Shape
221
249
 
222
- LIST {
223
- string ID PK
224
- string Name
225
- string EntityID
226
- }
250
+ ```typescript
251
+ interface DuplicateDetectionProgress {
252
+ Phase: 'Vectorizing' | 'Embedding' | 'Querying' | 'Matching' | 'Merging';
253
+ TotalRecords: number;
254
+ ProcessedRecords: number;
255
+ MatchesFound: number;
256
+ CurrentRecordID?: string;
257
+ ElapsedMs: number;
258
+ }
259
+ ```
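
For large lists, a callback that logs on every invocation can be noisy. The sketch below wraps a log sink so that it emits a line only when the phase changes, after a given number of newly processed records, or when a phase completes. `makeThrottledProgress` and `ProgressLike` are hypothetical names; `ProgressLike` is a local copy of the shape shown above so the example is self-contained, and how often the detector actually invokes `OnProgress` is not specified here.

```typescript
// Local copy of the DuplicateDetectionProgress shape, for a self-contained sketch.
interface ProgressLike {
  Phase: 'Vectorizing' | 'Embedding' | 'Querying' | 'Matching' | 'Merging';
  TotalRecords: number;
  ProcessedRecords: number;
  MatchesFound: number;
  CurrentRecordID?: string;
  ElapsedMs: number;
}

// Hypothetical helper: returns a callback suitable for Options.OnProgress.
function makeThrottledProgress(
  sink: (line: string) => void,
  everyN: number = 100
): (progress: ProgressLike) => void {
  let lastPhase: string | undefined;
  let lastReported = 0;

  return (progress) => {
    const phaseChanged = progress.Phase !== lastPhase;
    const enoughNewRecords = progress.ProcessedRecords - lastReported >= everyN;
    const finished =
      progress.TotalRecords > 0 && progress.ProcessedRecords === progress.TotalRecords;
    if (!phaseChanged && !enoughNewRecords && !finished) {
      return; // skip this update
    }
    lastPhase = progress.Phase;
    lastReported = progress.ProcessedRecords;
    const pct =
      progress.TotalRecords > 0
        ? Math.round((progress.ProcessedRecords / progress.TotalRecords) * 100)
        : 0;
    sink(`[${progress.Phase}] ${pct}% -- ${progress.MatchesFound} matches (${progress.ElapsedMs}ms)`);
  };
}

// Usage sketch: pass the wrapped callback as Options.OnProgress.
const onProgress = makeThrottledProgress((line) => console.log(line), 100);
onProgress({ Phase: 'Querying', TotalRecords: 250, ProcessedRecords: 0, MatchesFound: 0, ElapsedMs: 0 });
```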
227
260
 
228
- LIST_DETAIL {
229
- string ID PK
230
- string ListID FK
231
- string RecordID
232
- }
261
+ ---
233
262
 
234
- ENTITY_DOCUMENT {
235
- string ID PK
236
- string EntityID
237
- string TemplateID
238
- string AIModelID
239
- string VectorDatabaseID
240
- float PotentialMatchThreshold
241
- float AbsoluteMatchThreshold
242
- }
263
+ ## API Reference Summary
243
264
 
244
- DUPLICATE_RUN ||--o{ DUPLICATE_RUN_DETAIL : contains
245
- DUPLICATE_RUN_DETAIL ||--o{ DUPLICATE_RUN_DETAIL_MATCH : has
246
- DUPLICATE_RUN }o--|| LIST : "source"
247
- LIST ||--o{ LIST_DETAIL : contains
248
- ```
265
+ ### DuplicateRecordDetector
249
266
 
250
- ## Environment Variables
267
+ | Method | Signature | Description |
268
+ |---|---|---|
269
+ | `GetDuplicateRecords` | `(params: PotentialDuplicateRequest, contextUser?: UserInfo) => Promise<PotentialDuplicateResponse>` | Run batch duplicate detection for all records in a list |
270
+ | `CheckSingleRecord` | `(EntityDocumentID: string, RecordID: CompositeKey, Options?: DuplicateDetectionOptions, ContextUser?: UserInfo) => Promise<PotentialDuplicateResult>` | Check a single record for duplicates |
271
+ | `ParseVectorMatches` | `(queryResponse: BaseResponse, sourceKey?: CompositeKey) => PotentialDuplicateResult` | Parse raw vector DB response into typed results |
251
272
 
252
- ```env
253
- # AI Model API Keys
254
- OPENAI_API_KEY=your-openai-key
255
- MISTRAL_API_KEY=your-mistral-key
273
+ ### ComputeRRF
256
274
 
257
- # Vector Database
258
- PINECONE_API_KEY=your-pinecone-key
259
- PINECONE_HOST=your-pinecone-host
260
- PINECONE_DEFAULT_INDEX=your-index-name
275
+ ```typescript
276
+ function ComputeRRF(rankedLists: ScoredCandidate[][], k?: number): ScoredCandidate[]
277
+ ```
278
+
279
+ Compute Reciprocal Rank Fusion across multiple ranked result lists. Returns candidates sorted by descending fused score.
261
280
 
262
- # Database Connection
263
- DB_HOST=your-sql-server
264
- DB_PORT=1433
265
- DB_USERNAME=your-username
266
- DB_PASSWORD=your-password
267
- DB_DATABASE=your-database
281
+ ### ScoredCandidate
268
282
 
269
- # User Context
270
- CURRENT_USER_EMAIL=user@example.com
283
+ ```typescript
284
+ interface ScoredCandidate {
285
+ ID: string;
286
+ Score: number;
287
+ Metadata?: Record<string, unknown>;
288
+ }
271
289
  ```
272
290
 
273
- ## Dependencies
291
+ ---
292
+
293
+ ## Database Entities
294
+
295
+ The package reads from and writes to these MJ entities:
274
296
 
275
- | Package | Purpose |
297
+ | Entity | Purpose |
276
298
  |---|---|
277
- | `@memberjunction/ai` | `BaseEmbeddings`, `GetAIAPIKey` |
278
- | `@memberjunction/ai-vectordb` | `VectorDBBase`, `BaseResponse` |
279
- | `@memberjunction/ai-vectors` | `VectorBase` base class |
280
- | `@memberjunction/ai-vectors-pinecone` | Pinecone implementation |
281
- | `@memberjunction/ai-vector-sync` | `EntityVectorSyncer`, `EntityDocumentTemplateParser` |
282
- | `@memberjunction/aiengine` | AI engine integration |
283
- | `@memberjunction/core` | Core MJ types and data access |
284
- | `@memberjunction/core-entities` | Entity type definitions |
285
- | `@memberjunction/global` | MJGlobal class factory |
286
-
287
- ## Limitations
288
-
289
- - Duplicate detection operates within a single entity type
290
- - Requires pre-configured Entity Documents with templates
291
- - Currently supports Pinecone as the vector database provider
292
- - Records must be added to a List before detection can run
299
+ | `MJ: Entity Documents` | Configuration: template, AI model, vector DB, thresholds |
300
+ | `MJ: Lists` / `MJ: List Details` | Source records to check for duplicates |
301
+ | `MJ: Duplicate Runs` | Tracks each detection run (status, timing) |
302
+ | `MJ: Duplicate Run Details` | Per-record tracking within a run |
303
+ | `MJ: Duplicate Run Detail Matches` | Individual match results with probability scores |
304
+
305
+ ---
306
+
307
+ ## Further Reading
308
+
309
+ - **[Duplicate Detection Guide](docs/DUPLICATE_DETECTION_GUIDE.md)** -- comprehensive developer guide covering end-to-end workflow, threshold tuning, hybrid search deep dive, performance optimization, and troubleshooting
310
+ - **[MemberJunction AI Vectors](../Core/README.md)** -- base vector infrastructure
311
+ - **[AI Vector Sync](../Sync/README.md)** -- entity vectorization and template parsing
312
+
313
+ ---
293
314
 
294
315
  ## Development
295
316
 
@@ -297,8 +318,11 @@ CURRENT_USER_EMAIL=user@example.com
297
318
  # Build
298
319
  npm run build
299
320
 
300
- # Development mode
301
- npm run start
321
+ # Run tests
322
+ npm run test
323
+
324
+ # Watch mode
325
+ npm run test:watch
302
326
  ```
303
327
 
304
328
  ## License