nx-mongo 3.3.0

@@ -0,0 +1,223 @@
1
+ # Improvement Plan for SimpleMongoHelper
2
+
3
+ This plan for future improvements draws on a review of popular MongoDB helper packages and common best practices.
4
+
5
+ ## Completed Features (v2.0.0)
6
+
7
+ ✅ Delete operations (deleteOne, deleteMany)
8
+ ✅ FindOne operation for single document queries
9
+ ✅ Count operations (countDocuments, estimatedDocumentCount)
10
+ ✅ Pagination support with limit, skip, and sort options
11
+ ✅ Aggregation pipeline support
12
+ ✅ Transaction support for multi-operation consistency
13
+ ✅ Connection retry logic with exponential backoff
14
+ ✅ Index management (createIndex, dropIndex, listIndexes)
15
+
16
+ ## Planned Improvements
17
+
18
+ ### Phase 1: Enhanced Query Features
19
+
20
+ #### 1.1 Query Builder (Inspired by mquery)
21
+ - **Priority:** Medium
22
+ - **Description:** Fluent query builder API for constructing complex queries
23
+ - **Benefits:** More readable and maintainable query code
24
+ - **Example:**
25
+ ```typescript
+ await helper.query('users')
+   .where('age').gt(18)
+   .where('active').equals(true)
+   .sort({ createdAt: -1 })
+   .limit(10)
+   .exec();
+ ```
33
+
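As a rough illustration of how such a builder could accumulate a filter, the sketch below collects `where` clauses into a plain MongoDB-style filter object. `QueryBuilder`, `QuerySpec` and `build()` are hypothetical names chosen for this example and are not part of the current helper API; `exec()` is left out because it would need a live connection.

```typescript
// Hypothetical sketch: a fluent builder that accumulates a MongoDB-style filter.
interface QuerySpec {
  filter: Record<string, unknown>;
  sort?: Record<string, 1 | -1>;
  limit?: number;
}

class QueryBuilder {
  private readonly spec: QuerySpec = { filter: {} };
  private currentField: string | null = null;

  // Select the field that the next comparison applies to.
  where(field: string): this {
    this.currentField = field;
    return this;
  }

  equals(value: unknown): this {
    this.spec.filter[this.requireField()] = value;
    return this;
  }

  gt(value: unknown): this {
    return this.addOperator('$gt', value);
  }

  lt(value: unknown): this {
    return this.addOperator('$lt', value);
  }

  sort(order: Record<string, 1 | -1>): this {
    this.spec.sort = order;
    return this;
  }

  limit(count: number): this {
    this.spec.limit = count;
    return this;
  }

  // Return the accumulated query instead of executing it.
  build(): QuerySpec {
    return this.spec;
  }

  // Merge operators on the same field, e.g. { age: { $gt: 18, $lt: 65 } }.
  private addOperator(operator: string, value: unknown): this {
    const field = this.requireField();
    const existing = this.spec.filter[field];
    const base = typeof existing === 'object' && existing !== null ? existing : {};
    this.spec.filter[field] = { ...base, [operator]: value };
    return this;
  }

  private requireField(): string {
    if (this.currentField === null) {
      throw new Error('Call where(field) before adding a comparison');
    }
    return this.currentField;
  }
}

const spec = new QueryBuilder()
  .where('age').gt(18)
  .where('active').equals(true)
  .sort({ createdAt: -1 })
  .limit(10)
  .build();
```

Keeping the builder as a thin layer over a plain filter object means the result can be handed to the existing load methods unchanged.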
34
+ #### 1.2 Advanced Filtering
35
+ - **Priority:** Low
36
+ - **Description:** Helper methods for common filter patterns
37
+ - **Features:**
38
+ - Date range queries
39
+ - Text search helpers
40
+ - Array operations helpers
41
+
42
+ #### 1.3 Bulk Operations
43
+ - **Priority:** High
44
+ - **Description:** Efficient bulk insert, update, and delete operations
45
+ - **Benefits:** Better performance for large datasets
46
+ - **Example:**
47
+ ```typescript
+ await helper.bulkInsert('users', usersArray);
+ await helper.bulkUpdate('users', updateOperations);
+ ```
51
+
52
+ ### Phase 2: Data Validation & Schema
53
+
54
+ #### 2.1 Schema Validation (Inspired by Mongoose)
55
+ - **Priority:** Medium
56
+ - **Description:** Optional schema validation using Zod or Joi
57
+ - **Benefits:** Type safety and data validation at runtime
58
+ - **Implementation:** Use Zod (MIT licensed) for validation
59
+ - **Example:**
60
+ ```typescript
+ import { z } from 'zod';
+
+ const userSchema = z.object({
+   name: z.string(),
+   email: z.string().email(),
+   age: z.number().min(0).max(150)
+ });
+
+ await helper.insert('users', userData, { schema: userSchema });
+ ```
69
+
70
+ #### 2.2 Type-Safe Collections
71
+ - **Priority:** Medium
72
+ - **Description:** Collection-specific type definitions
73
+ - **Benefits:** Better TypeScript inference and type safety
74
+
75
+ ### Phase 3: Performance & Caching
76
+
77
+ #### 3.1 Query Result Caching
78
+ - **Priority:** Medium
79
+ - **Description:** Optional caching layer using node-cache (MIT licensed)
80
+ - **Benefits:** Reduce database load for frequently accessed data
81
+ - **Features:**
82
+ - TTL-based cache
83
+ - Cache invalidation
84
+ - Configurable cache keys
85
+
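To make the three features above concrete, here is a minimal sketch of a TTL cache with prefix invalidation and a derived cache key. It uses a plain `Map` rather than node-cache so that it stays self-contained; `TtlCache`, `invalidate` and `cacheKey` are hypothetical names, not an existing API.

```typescript
// Hypothetical sketch of a TTL cache for query results, backed by a plain Map.
interface CacheEntry<V> {
  value: V;
  expiresAt: number;
}

class TtlCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so that expiry can be tested without real timers.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired entries are dropped lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Drop every entry whose key starts with the prefix, e.g. after a write to that collection.
  invalidate(prefix: string): void {
    this.entries.forEach((_entry, key) => {
      if (key.indexOf(prefix) === 0) this.entries.delete(key);
    });
  }
}

// Cache key: collection name plus the serialized query.
function cacheKey(collection: string, query: unknown): string {
  return `${collection}:${JSON.stringify(query)}`;
}
```

Because `JSON.stringify` preserves property order, two equivalent queries written with different key order produce different cache keys in this sketch; a fuller version would canonicalize the query first.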
86
+ #### 3.2 Connection Pooling Configuration
87
+ - **Priority:** Low
88
+ - **Description:** Expose MongoDB connection pool options
89
+ - **Benefits:** Better control over connection management
90
+
91
+ #### 3.3 Query Performance Monitoring
92
+ - **Priority:** Low
93
+ - **Description:** Optional query timing and performance metrics
94
+ - **Benefits:** Identify slow queries
95
+
96
+ ### Phase 4: Developer Experience
97
+
98
+ #### 4.1 Migration Support (Inspired by migrate-mongoose)
99
+ - **Priority:** Medium
100
+ - **Description:** Database migration framework
101
+ - **Benefits:** Version-controlled schema changes
102
+ - **Features:**
103
+ - Migration files
104
+ - Up/down migrations
105
+ - Migration history tracking
106
+
107
+ #### 4.2 Seeding Support (Inspired by mongo-seeding)
108
+ - **Priority:** Low
109
+ - **Description:** Database seeding utilities
110
+ - **Benefits:** Easy test data setup
111
+
112
+ #### 4.3 Logging & Debugging
113
+ - **Priority:** Low
114
+ - **Description:** Optional logging for operations
115
+ - **Features:**
116
+ - Query logging
117
+ - Error logging
118
+ - Performance logging
119
+ - **Implementation:** Use winston or pino (MIT licensed)
120
+
121
+ #### 4.4 Middleware/Hooks Support
122
+ - **Priority:** Low
123
+ - **Description:** Pre/post operation hooks
124
+ - **Example:**
125
+ ```typescript
+ helper.beforeInsert((data) => {
+   data.createdAt = new Date();
+ });
+ ```
130
+
131
+ ### Phase 5: Advanced Features
132
+
133
+ #### 5.1 Change Streams
134
+ - **Priority:** Low
135
+ - **Description:** Support for MongoDB change streams
136
+ - **Benefits:** Real-time data monitoring
137
+
138
+ #### 5.2 GridFS Support
139
+ - **Priority:** Low
140
+ - **Description:** File storage operations
141
+ - **Benefits:** Store large files in MongoDB
142
+
143
+ #### 5.3 Full-Text Search
144
+ - **Priority:** Low
145
+ - **Description:** Text search index and query helpers
146
+ - **Benefits:** Better search capabilities
147
+
148
+ ### Phase 6: Testing & Documentation
149
+
150
+ #### 6.1 Comprehensive Test Suite
151
+ - **Priority:** High
152
+ - **Description:** Unit tests, integration tests, and E2E tests
153
+ - **Tools:** Jest or Mocha (MIT licensed)
154
+
155
+ #### 6.2 API Documentation
156
+ - **Priority:** High
157
+ - **Description:** Auto-generated API docs using TypeDoc
158
+ - **Benefits:** Better developer experience
159
+
160
+ #### 6.3 Examples & Tutorials
161
+ - **Priority:** Medium
162
+ - **Description:** More examples and use cases in README
163
+ - **Topics:**
164
+ - Common patterns
165
+ - Best practices
166
+ - Performance tips
167
+
168
+ ### Phase 7: Package Improvements
169
+
170
+ #### 7.1 ESM Support
171
+ - **Priority:** Medium
172
+ - **Description:** Support for ES modules alongside CommonJS
173
+ - **Benefits:** Modern JavaScript support
174
+
175
+ #### 7.2 Tree Shaking
176
+ - **Priority:** Low
177
+ - **Description:** Optimize bundle size
178
+ - **Benefits:** Smaller package size for users
179
+
180
+ #### 7.3 Type Definitions
181
+ - **Priority:** High
182
+ - **Description:** Ensure all types are properly exported
183
+ - **Status:** Mostly complete, needs review
184
+
185
+ ## Recommended Next Steps
186
+
187
+ 1. **Immediate (v2.1.0):**
188
+ - Add bulk operations (high impact, relatively simple)
189
+ - Improve test coverage
190
+ - Add more examples to README
191
+
192
+ 2. **Short-term (v2.2.0):**
193
+ - Add query builder (if users request it)
194
+ - Add optional Zod validation
195
+ - Add connection pooling configuration
196
+
197
+ 3. **Medium-term (v3.0.0):**
198
+ - Add migration support
199
+ - Add caching layer
200
+ - Add comprehensive test suite
201
+
202
+ 4. **Long-term:**
203
+ - Add advanced features based on user feedback
204
+ - Consider breaking into smaller packages if needed
205
+
206
+ ## Dependencies to Consider
207
+
208
+ All packages below are MIT or Apache licensed:
209
+
210
+ - **Zod** (MIT) - Schema validation
211
+ - **node-cache** (MIT) - Caching layer
212
+ - **winston** (MIT) - Logging
213
+ - **Jest** (MIT) - Testing framework
214
+ - **TypeDoc** (Apache 2.0) - Documentation generation
215
+
216
+ ## Notes
217
+
218
+ - Keep the package lightweight - only add dependencies when necessary
219
+ - Maintain backward compatibility when possible
220
+ - Follow semantic versioning
221
+ - Gather user feedback before implementing major features
222
+ - Consider performance implications of new features
223
+
@@ -0,0 +1,460 @@
1
+ # Generic Instructions for Config-Driven Ref Mapping and Signature-Based Deduplication
2
+
3
+ ## Abstract
4
+
5
+ This document provides generic instructions for implementing a **config-driven collection mapping** and **signature-based deduplication** system for database operations. The solution enables applications to work with logical references (refs) instead of physical collection names, while automatically handling duplicate prevention through deterministic document signatures.
6
+
7
+ ## Core Concepts
8
+
9
+ ### 1. Reference-Based Abstraction
10
+
11
+ Instead of hardcoding collection names throughout an application, define a mapping configuration that associates logical reference names with physical database collections. This provides:
12
+
13
+ - **Flexibility**: Change collection names without modifying application code
14
+ - **Environment-specific mappings**: Different collections per environment (dev, staging, prod)
15
+ - **Query encapsulation**: Associate default queries with references
16
+ - **Separation of concerns**: Business logic uses refs, infrastructure uses collections
17
+
18
+ ### 2. Signature-Based Deduplication
19
+
20
+ Generate deterministic signatures for documents based on configurable key paths. Use these signatures to:
21
+
22
+ - **Prevent duplicates**: Automatically detect and handle duplicate documents
23
+ - **Upsert behavior**: Update existing documents or insert new ones based on signature matching
24
+ - **Business logic independence**: Duplicate detection based on business rules, not database IDs
25
+
26
+ ## Configuration Schema
27
+
28
+ ### Input Configuration
29
+
30
+ Define how to read data from collections:
31
+
32
+ ```typescript
+ import type { Filter } from 'mongodb';
+
+ interface InputConfig {
+   ref: string;          // Logical reference name (e.g., "topology", "vulnerabilities")
+   collection: string;   // Physical collection name (e.g., "topology-definition-neo-data")
+   query?: Filter<any>;  // Optional default query filter
+ }
+ ```
39
+
40
+ **Purpose**: Map logical references to physical collections with optional query filters.
41
+
42
+ **Example**:
43
+ ```json
+ {
+   "ref": "highSeverityVulns",
+   "collection": "vulnerabilities-data",
+   "query": { "severity": { "$in": ["high", "critical"] } }
+ }
+ ```
50
+
51
+ ### Output Configuration
52
+
53
+ Define how to write data to collections:
54
+
55
+ ```typescript
+ interface OutputConfig {
+   ref: string;                  // Logical reference name
+   collection: string;           // Physical collection name
+   keys?: string[];              // Dot-notation paths for signature generation
+   mode?: "append" | "replace";  // Write mode
+ }
+ ```
63
+
64
+ **Purpose**: Map logical references to physical collections with signature keys and write behavior.
65
+
66
+ **Example**:
67
+ ```json
+ {
+   "ref": "paths",
+   "collection": "paths-neo-data",
+   "keys": ["segments[]", "edges[].from", "edges[].to", "target_role"],
+   "mode": "append"
+ }
+ ```
75
+
76
+ ### Global Configuration
77
+
78
+ ```typescript
+ interface Config {
+   inputs: InputConfig[];
+   outputs: OutputConfig[];
+   output?: {
+     mode?: "append" | "replace";  // Global default mode
+   };
+ }
+ ```
87
+
88
+ ## Implementation Components
89
+
90
+ ### 1. Dot-Path Value Extraction
91
+
92
+ **Purpose**: Extract values from nested objects using dot-notation paths with array wildcard support.
93
+
94
+ **Requirements**:
95
+ - Support simple paths: `"meta.id"` → extracts `meta.id`
96
+ - Support array wildcards: `"segments[]"` → extracts all elements from `segments` array
97
+ - Support nested array access: `"edges[].from"` → extracts `from` from all `edges` elements
98
+ - Recursively flatten nested arrays
99
+ - Handle null/undefined gracefully (return empty array)
100
+
101
+ **Algorithm**:
102
+ 1. Parse path for array wildcards (`[]`)
103
+ 2. If wildcard at end: extract array, flatten recursively
104
+ 3. If wildcard in middle: extract from each array element, then continue path
105
+ 4. If simple path: traverse object properties
106
+ 5. Return array of extracted values
107
+
108
+ **Example Implementation Pattern**:
109
+ ```typescript
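+ function extractByPath(obj: any, path: string): any[] {
+   if (obj === null || obj === undefined) return [];
+   // Trailing wildcard: resolve the base path, then flatten nested arrays recursively
+   if (path.endsWith('[]')) {
+     return extractByPath(obj, path.slice(0, -2)).flat(Infinity);
+   }
+   // Wildcard in the middle: extract the rest of the path from each array element
+   const wildcardAt = path.indexOf('[].');
+   if (wildcardAt !== -1) {
+     const elements = extractByPath(obj, path.slice(0, wildcardAt + 2));
+     return elements.flatMap((element) => extractByPath(element, path.slice(wildcardAt + 3)));
+   }
+   // Simple path: walk the properties, giving up with [] if a step is missing
+   let value = obj;
+   for (const key of path.split('.')) {
+     if (value === null || value === undefined) return [];
+     value = value[key];
+   }
+   return value === null || value === undefined ? [] : [value];
+ }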
+ ```
122
+
123
+ ### 2. Value Normalization
124
+
125
+ **Purpose**: Convert values to consistent string representations for signature computation.
126
+
127
+ **Normalization Rules**:
128
+
129
+ | Type | Normalization |
+ |------|---------------|
+ | String | As-is |
+ | Number | `String(value)` |
+ | Boolean | `"true"` or `"false"` |
+ | Date | `value.toISOString()` (UTC) |
+ | Null/Undefined | `"null"` |
+ | Object | Sort keys recursively at every nesting level, then `JSON.stringify` |
+ | Array | Flatten → Normalize each → Deduplicate → Sort → `JSON.stringify` |
+
+ **Critical**: Objects must have sorted keys at every nesting level for consistent stringification. Passing `Object.keys(value).sort()` as the `JSON.stringify` replacer is not sufficient: an array replacer acts as a property allow-list at all depths, so nested keys that do not also appear at the top level are silently dropped.
140
+
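The normalization rules can be sketched as follows. The function names (`flattenDeep`, `sortKeysDeep`, `normalizeValue`) are illustrative assumptions, and types outside the table (for example `bigint`) are not handled. Object keys are sorted by rebuilding the object recursively, which avoids the `JSON.stringify` array-replacer behaviour of filtering nested properties.

```typescript
// Sketch of the normalization rules; function names are illustrative.

// Recursively flatten nested arrays into a single list.
function flattenDeep(values: unknown[]): unknown[] {
  const flat: unknown[] = [];
  for (const value of values) {
    if (Array.isArray(value)) flat.push(...flattenDeep(value));
    else flat.push(value);
  }
  return flat;
}

// Rebuild objects with keys in sorted order at every nesting level.
function sortKeysDeep(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => sortKeysDeep(item));
  if (value instanceof Date) return value.toISOString();
  if (value !== null && typeof value === 'object') {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      sorted[key] = sortKeysDeep(source[key]);
    }
    return sorted;
  }
  return value;
}

function normalizeValue(value: unknown): string {
  if (value === null || value === undefined) return 'null';
  if (typeof value === 'string') return value;
  if (typeof value === 'number' || typeof value === 'boolean') return String(value);
  if (value instanceof Date) return value.toISOString();
  if (Array.isArray(value)) {
    // Flatten, normalize each element, deduplicate, sort, then stringify.
    const normalized = flattenDeep(value).map((item) => normalizeValue(item));
    const unique = normalized.filter((item, index) => normalized.indexOf(item) === index);
    return JSON.stringify(unique.sort());
  }
  return JSON.stringify(sortKeysDeep(value));
}
```

One consequence of these rules is that the string `"null"` and a missing value normalize to the same text, so keys where that distinction matters may need a custom rule.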
141
+ ### 3. Signature Computation
142
+
143
+ **Purpose**: Generate deterministic hash signatures for documents.
144
+
145
+ **Algorithm**:
146
+ 1. **Extract**: For each key path, extract values using dot-path extraction
147
+ 2. **Normalize**: Normalize all extracted values per normalization rules
148
+ 3. **Deduplicate**: Remove duplicate normalized values per key
149
+ 4. **Sort**: Sort normalized values lexicographically per key
150
+ 5. **Canonical Map**: Create `{ key1: [values], key2: [values], ... }`
151
+ 6. **Sort Keys**: Sort map keys alphabetically
152
+ 7. **Stringify**: `JSON.stringify(canonicalMap)`
153
+ 8. **Hash**: Apply hash algorithm (SHA-256 recommended)
154
+ 9. **Return**: Hex string
155
+
156
+ **Properties**:
157
+ - **Deterministic**: Same document + same keys → same signature
158
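+ - **Collision-resistant**: Documents that differ in any selected key value → different signatures (with high probability); documents that differ only in fields outside `keys` share a signature by design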
159
+ - **Order-independent**: Array element order doesn't affect signature (due to sorting)
160
+
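A minimal sketch of the nine steps, assuming Node's built-in `crypto` module for SHA-256. To keep the block self-contained, the extraction and normalization steps are passed in as functions (`extract`, `normalize`); those parameter names and `computeSignature` itself are assumptions for illustration.

```typescript
import { createHash } from 'node:crypto';

// Stand-ins for components 1 and 2; both are injected so the sketch stays self-contained.
type ExtractFn = (doc: unknown, path: string) => unknown[];
type NormalizeFn = (value: unknown) => string;

function computeSignature(
  doc: unknown,
  keys: string[],
  extract: ExtractFn,
  normalize: NormalizeFn,
): string {
  const canonical: Record<string, string[]> = {};
  // Iterating over sorted key paths builds the canonical map in alphabetical key order.
  for (const key of keys.slice().sort()) {
    const normalized = extract(doc, key).map((value) => normalize(value));
    const unique = normalized.filter((value, index) => normalized.indexOf(value) === index);
    canonical[key] = unique.sort();
  }
  // SHA-256 over the canonical JSON string, returned as a 64-character hex digest.
  return createHash('sha256').update(JSON.stringify(canonical)).digest('hex');
}
```

Sorting both the key paths and the values per key is what makes the result independent of the order in which keys are configured and of array element order.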
161
+ ### 4. Index Management
162
+
163
+ **Purpose**: Ensure unique index exists on signature field for efficient lookups.
164
+
165
+ **Requirements**:
166
+ - **Idempotent**: Safe to call multiple times
167
+ - **Conflict handling**: If index exists with wrong options, drop and recreate
168
+ - **Field name**: Configurable (default: `_sig`)
169
+ - **Uniqueness**: Always unique (prevents duplicates)
170
+
171
+ **Algorithm**:
172
+ 1. List existing indexes on collection
173
+ 2. Find index on signature field
174
+ 3. If exists with correct options → return (no-op)
175
+ 4. If exists with wrong options → drop index
176
+ 5. Create index with correct options
177
+ 6. Return creation status and index name
178
+
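The algorithm above could look roughly like this. The sketch is written against a small structural interface (`IndexedCollection`, a hypothetical name) that declares only the three calls it needs, `indexes()`, `dropIndex()` and `createIndex()`, which the MongoDB Node.js driver's `Collection` also provides; the real driver types are richer than what is declared here.

```typescript
// Structural subset of a collection; only these three methods are assumed.
interface IndexInfo {
  name?: string;
  key: Record<string, unknown>;
  unique?: boolean;
}

interface IndexedCollection {
  indexes(): Promise<IndexInfo[]>;
  dropIndex(name: string): Promise<unknown>;
  createIndex(spec: Record<string, number>, options: { unique: boolean }): Promise<string>;
}

interface EnsureIndexResult {
  created: boolean;
  indexName: string;
}

async function ensureSignatureIndex(
  collection: IndexedCollection,
  field = '_sig',
): Promise<EnsureIndexResult> {
  // Steps 1-2: list indexes and find a single-field index on the signature field.
  const existing = (await collection.indexes()).find((index) => {
    const keys = Object.keys(index.key);
    return keys.length === 1 && keys[0] === field;
  });
  if (existing !== undefined && existing.name !== undefined) {
    // Step 3: already unique, nothing to do.
    if (existing.unique === true) return { created: false, indexName: existing.name };
    // Step 4: same field but wrong options, so drop it before recreating.
    await collection.dropIndex(existing.name);
  }
  // Steps 5-6: create the unique index and report its name.
  const indexName = await collection.createIndex({ [field]: 1 }, { unique: true });
  return { created: true, indexName };
}
```

Listing indexes on a collection that does not exist yet may be rejected by the server, so a fuller version would probably treat that case as "no indexes". Creating a unique index also fails if the collection already contains duplicate signature values.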
179
+ ### 5. Load by Reference
180
+
181
+ **Purpose**: Load data using logical reference name instead of collection name.
182
+
183
+ **Algorithm**:
184
+ 1. Look up reference in `config.inputs[]`
185
+ 2. If not found → error
186
+ 3. Get `collection` and `query` from config
187
+ 4. Execute query on collection (with optional pagination)
188
+ 5. Return results
189
+
190
+ **Features**:
191
+ - Automatic query application
192
+ - Pagination support
193
+ - Transaction session support
194
+
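The lookup part of this algorithm is independent of the database and can be sketched as a pure function. `resolveLoad`, `Page` and `ResolvedLoad` are hypothetical names, and the 1-based page numbering is an assumption of this example.

```typescript
// Hypothetical sketch: resolve a logical ref to the collection, filter and paging options to use.
interface InputConfig {
  ref: string;
  collection: string;
  query?: Record<string, unknown>;
}

interface Page {
  page: number; // 1-based page number (assumption)
  pageSize: number;
}

interface ResolvedLoad {
  collection: string;
  filter: Record<string, unknown>;
  skip?: number;
  limit?: number;
}

function resolveLoad(inputs: InputConfig[], ref: string, page?: Page): ResolvedLoad {
  const input = inputs.find((candidate) => candidate.ref === ref);
  if (input === undefined) {
    throw new Error(`No input is configured for ref "${ref}"`);
  }
  // A ref without a default query loads the whole collection.
  const resolved: ResolvedLoad = { collection: input.collection, filter: input.query ?? {} };
  if (page !== undefined) {
    resolved.skip = (page.page - 1) * page.pageSize;
    resolved.limit = page.pageSize;
  }
  return resolved;
}
```

The resolved values would then be passed to the existing query path, together with the optional transaction session.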
195
+ ### 6. Write by Reference
196
+
197
+ **Purpose**: Write data using logical reference with automatic deduplication.
198
+
199
+ **Algorithm**:
200
+
201
+ **If `mode === "replace"`**:
202
+ 1. Clear collection (within transaction if session provided)
203
+
204
+ **If `keys` specified** (signature-based):
205
+ 1. Ensure signature index exists
206
+ 2. For each document:
207
+ - Compute signature using `keys`
208
+ - Add `_sig` field to document
209
+ - Create upsert operation: `{ filter: { _sig: sig }, update: { $set: doc }, upsert: true }`
210
+ 3. Execute bulk upsert operations (batch size: 1000, ordered: false)
211
+ 4. Aggregate results: count inserted vs updated
212
+
213
+ **If `keys` not specified** (regular insert):
214
+ 1. Execute `insertMany` (ordered: false)
215
+ 2. Aggregate errors
216
+
217
+ **Error Handling**:
218
+ - **Never throw on first error**: Aggregate all errors
219
+ - Continue processing after errors
220
+ - Return structured result: `{ inserted, updated, errors[], indexCreated }`
221
+
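For the signature-based branch, the upsert operations and the inserted/updated counts might be built as in the sketch below. The operations use the `updateOne` wrapper that the MongoDB driver's `bulkWrite()` accepts. `buildUpsertOps` and `summarize` are hypothetical names, and removing `_id` before `$set` is an assumption of this example (it avoids trying to modify the immutable `_id` of a matched document), not something the algorithm above specifies.

```typescript
type Doc = Record<string, unknown>;

// Shape of one upsert entry in a bulkWrite() operations array.
interface UpsertOp {
  updateOne: {
    filter: Doc;
    update: { $set: Doc };
    upsert: true;
  };
}

function buildUpsertOps(
  docs: Doc[],
  signatureOf: (doc: Doc) => string,
  sigField = '_sig',
): UpsertOp[] {
  return docs.map((doc): UpsertOp => {
    const sig = signatureOf(doc);
    const body: Doc = { ...doc, [sigField]: sig };
    // Assumption: drop _id so that $set never tries to change the _id of a matched document.
    delete body['_id'];
    return { updateOne: { filter: { [sigField]: sig }, update: { $set: body }, upsert: true } };
  });
}

// Subset of a bulk write result that is needed for the summary.
interface BulkCounts {
  upsertedCount: number; // operations that inserted a new document
  matchedCount: number;  // operations that matched an existing document
}

function summarize(results: BulkCounts[]): { inserted: number; updated: number } {
  return results.reduce(
    (total, result) => ({
      inserted: total.inserted + result.upsertedCount,
      updated: total.updated + result.matchedCount,
    }),
    { inserted: 0, updated: 0 },
  );
}
```

Counting `matchedCount` as "updated" includes documents that matched but were already identical; whether those should be reported separately is a design choice.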
222
+ ## Write Modes
223
+
224
+ ### Append Mode (Default)
225
+
226
+ - **Behavior**: Add documents to collection
227
+ - **Deduplication**: If keys specified, upsert by signature (update if exists, insert if new)
228
+ - **Use case**: Incremental data updates, accumulating data over time
229
+
230
+ ### Replace Mode
231
+
232
+ - **Behavior**: Clear collection, then insert documents
233
+ - **Deduplication**: Still applies if keys specified (for the new batch)
234
+ - **Use case**: Full data refresh, replacing entire dataset
235
+
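Since a mode can be set per output and globally, the effective mode has to be resolved before writing. A small sketch, assuming the precedence "per-output, then global, then append" and the hypothetical name `resolveWriteMode`:

```typescript
type WriteMode = 'append' | 'replace';

interface OutputConfig {
  ref: string;
  collection: string;
  keys?: string[];
  mode?: WriteMode;
}

interface OutputsConfig {
  outputs: OutputConfig[];
  output?: { mode?: WriteMode };
}

// Per-output mode wins over the global default; append is the fallback.
function resolveWriteMode(config: OutputsConfig, ref: string): WriteMode {
  const output = config.outputs.find((candidate) => candidate.ref === ref);
  if (output === undefined) {
    throw new Error(`No output is configured for ref "${ref}"`);
  }
  const globalMode = config.output === undefined ? undefined : config.output.mode;
  return output.mode ?? globalMode ?? 'append';
}
```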
236
+ ## Transaction Support
237
+
238
+ All operations should support optional transaction sessions:
239
+
240
+ - **Load operations**: Use session in query execution
241
+ - **Write operations**: Use session in bulk operations
242
+ - **Replace mode**: Wrap clear + write in transaction
243
+ - **Error handling**: Rollback on transaction errors
244
+
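For replace mode, wrapping the clear and the write in one transaction could be sketched as below. `TxSession` and `ReplaceableCollection` are hypothetical structural stand-ins that declare only the calls used here (`withTransaction`, `deleteMany`, `insertMany`); the MongoDB driver offers methods with these names, but its real signatures are richer. Note that MongoDB transactions require a replica set or sharded cluster.

```typescript
// Structural stand-ins for a session and a collection; only the calls used here are declared.
interface TxSession {
  withTransaction(work: () => Promise<void>): Promise<unknown>;
}

interface ReplaceableCollection<D> {
  deleteMany(filter: Record<string, unknown>, options: { session: TxSession }): Promise<unknown>;
  insertMany(docs: D[], options: { session: TxSession; ordered: boolean }): Promise<unknown>;
}

async function replaceAll<D>(
  session: TxSession,
  collection: ReplaceableCollection<D>,
  docs: D[],
): Promise<void> {
  await session.withTransaction(async () => {
    // Clear and refill inside one transaction so a failed write also rolls back the clear.
    await collection.deleteMany({}, { session });
    // Skip the insert for an empty batch, since an empty bulk insert is rejected.
    if (docs.length > 0) {
      await collection.insertMany(docs, { session, ordered: false });
    }
  });
}
```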
245
+ ## Error Aggregation
246
+
247
+ **Principle**: Never fail fast on first error. Continue processing and aggregate all errors.
248
+
249
+ **Error Structure**:
250
+ ```typescript
+ interface ErrorInfo {
+   index: number;   // Document index in input array
+   error: Error;    // Error object
+   doc?: any;       // Optional: problematic document
+ }
+ ```
257
+
258
+ **Benefits**:
259
+ - Partial success: Some documents succeed even if others fail
260
+ - Better observability: See all failures at once
261
+ - Easier debugging: Know which documents failed
262
+
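The principle can be illustrated with a small generic helper that applies a step to every document and records failures instead of stopping. `processAll` and `Aggregated` are hypothetical names, and `ErrorInfo` is given a type parameter here purely for the example.

```typescript
interface ErrorInfo<T> {
  index: number; // Document index in the input array
  error: Error;
  doc?: T;
}

interface Aggregated<T, R> {
  results: R[];
  errors: ErrorInfo<T>[];
}

// Apply `handle` to every document, recording failures instead of stopping at the first one.
function processAll<T, R>(docs: T[], handle: (doc: T) => R): Aggregated<T, R> {
  const outcome: Aggregated<T, R> = { results: [], errors: [] };
  docs.forEach((doc, index) => {
    try {
      outcome.results.push(handle(doc));
    } catch (caught) {
      // Non-Error throwables are wrapped so that callers always receive an Error.
      const error = caught instanceof Error ? caught : new Error(String(caught));
      outcome.errors.push({ index, error, doc });
    }
  });
  return outcome;
}
```

A typical use would be computing signatures for a batch: documents whose signature cannot be computed end up in `errors` with their original index, while the rest proceed to the bulk write.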
263
+ ## Performance Considerations
264
+
265
+ ### Batch Processing
266
+
267
+ - Process documents in batches (recommended: 1000 per batch)
268
+ - Use unordered bulk operations (`ordered: false`) for better performance
269
+ - Continue processing even if a batch has errors
270
+
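Batching with "continue on error" semantics might be implemented along these lines. `chunk`, `runInBatches` and `BatchFailure` are hypothetical names; the `run` callback stands in for whatever executes one bulk operation.

```typescript
// Split items into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

interface BatchFailure {
  batch: number; // zero-based batch index
  error: Error;
}

// Run every batch in turn; a failing batch is recorded and the remaining batches still run.
async function runInBatches<T>(
  items: T[],
  size: number,
  run: (batch: T[]) => Promise<void>,
): Promise<BatchFailure[]> {
  const failures: BatchFailure[] = [];
  let index = 0;
  for (const batch of chunk(items, size)) {
    try {
      await run(batch);
    } catch (caught) {
      const error = caught instanceof Error ? caught : new Error(String(caught));
      failures.push({ batch: index, error });
    }
    index += 1;
  }
  return failures;
}
```

With a batch size of 1000, a list of 2500 operations would be executed as three batches of 1000, 1000 and 500.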
271
+ ### Index Management
272
+
273
+ - Create signature index once (idempotent operation)
274
+ - Index should be unique for duplicate prevention
275
+ - Index name typically auto-generated (e.g., `_sig_1`)
276
+
277
+ ### Large Arrays in Signatures
278
+
279
+ - **No automatic capping**: Arrays are processed fully
280
+ - **Performance impact**: Large arrays (1000+ elements) may slow signature computation
281
+ - **Future enhancement**: Consider optional `signatureLimit` parameter
282
+
283
+ ## Security Considerations
284
+
285
+ ### Index Creation Permissions
286
+
287
+ - Requires the `createIndex` privilege on the target collection; dropping a conflicting index additionally requires `dropIndex`
288
+ - Provide clear error messages if permission denied
289
+ - Document permission requirements
290
+
291
+ ### Signature Field
292
+
293
+ - Signature field (`_sig`) is added to documents automatically
294
+ - Applications should not manually set this field
295
+ - Field name is configurable but defaults to `_sig`
296
+
297
+ ## Testing Requirements
298
+
299
+ ### Unit Tests
300
+
301
+ 1. **Dot-path extraction**:
302
+ - Simple paths
303
+ - Array wildcards
304
+ - Nested arrays
305
+ - Null/undefined handling
306
+
307
+ 2. **Value normalization**:
308
+ - All data types
309
+ - Object key sorting
310
+ - Array flattening and sorting
311
+
312
+ 3. **Signature computation**:
313
+ - Deterministic (same input → same output)
314
+ - Uniqueness (different inputs → different outputs)
315
+ - Order independence
316
+
317
+ ### Integration Tests
318
+
319
+ 1. **Load by reference**:
320
+ - Valid ref lookup
321
+ - Invalid ref error
322
+ - Query application
323
+ - Pagination
324
+
325
+ 2. **Write by reference**:
326
+ - Append mode with deduplication
327
+ - Replace mode
328
+ - Error aggregation
329
+ - Transaction support
330
+
331
+ 3. **Index management**:
332
+ - Idempotent creation
333
+ - Conflict handling
334
+ - Permission errors
335
+
336
+ ### Performance Tests
337
+
338
+ 1. Large document arrays (1000+ documents)
339
+ 2. Large arrays in signatures (1000+ elements per key)
340
+ 3. Batch processing efficiency
341
+
342
+ ## Migration Guide
343
+
344
+ ### For Applications Using Direct Collection Names
345
+
346
+ **Before**:
347
+ ```typescript
+ const docs = await helper.loadCollection('topology-definition-neo-data', { type: 'active' });
+ await helper.insert('paths-neo-data', pathDocs);
+ ```
351
+
352
+ **After**:
353
+ ```typescript
+ // Define config once
+ const config = {
+   inputs: [
+     { ref: 'topology', collection: 'topology-definition-neo-data', query: { type: 'active' } }
+   ],
+   outputs: [
+     { ref: 'paths', collection: 'paths-neo-data', keys: ['segments[]', 'target_role'] }
+   ]
+ };
+
+ // Use refs in code
+ const docs = await helper.loadByRef('topology');
+ await helper.writeByRef('paths', pathDocs);
+ ```
368
+
369
+ ### Benefits
370
+
371
+ - **Flexibility**: Change collection names in config only
372
+ - **Automatic deduplication**: No manual duplicate checking
373
+ - **Query encapsulation**: Queries defined once in config
374
+ - **Type safety**: Ref names can be validated against the config (or typed as a union of the known refs)
375
+
376
+ ## Best Practices
377
+
378
+ ### 1. Choosing Signature Keys
379
+
380
+ - **Select business-critical fields**: Fields that define document uniqueness
381
+ - **Avoid volatile fields**: Don't include timestamps, auto-incrementing IDs
382
+ - **Include all relevant fields**: Leaving out a field that distinguishes two documents gives them the same signature, so they are treated as duplicates
383
+ - **Test signature uniqueness**: Verify different documents produce different signatures
384
+
385
+ ### 2. Write Mode Selection
386
+
387
+ - **Append mode**: For incremental updates, accumulating data
388
+ - **Replace mode**: For full refreshes, replacing entire datasets
389
+ - **Consider data size**: Replace mode may be slow for large collections
390
+
391
+ ### 3. Error Handling
392
+
393
+ - **Always check error array**: Even if `inserted > 0`, check for errors
394
+ - **Log errors**: Log failed documents for debugging
395
+ - **Retry logic**: Consider retrying failed documents
396
+
397
+ ### 4. Configuration Management
398
+
399
+ - **Environment-specific configs**: Different configs per environment
400
+ - **Version control**: Keep configs in version control
401
+ - **Validation**: Validate config structure at startup
402
+
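Startup validation could follow the same "aggregate, do not fail fast" principle as the write path. The sketch below returns a list of problems rather than throwing on the first one; `validateConfig`, `RefConfig` and `RawConfig` are hypothetical names, and the specific checks (non-empty names, unique refs, known modes, non-empty key paths) are assumptions about what is worth validating.

```typescript
// Hypothetical startup check that collects every problem instead of stopping at the first.
interface RefConfig {
  ref: string;
  collection: string;
  keys?: string[];
  mode?: string; // kept as string because the config may come from untyped JSON
}

interface RawConfig {
  inputs: RefConfig[];
  outputs: RefConfig[];
  output?: { mode?: string };
}

const KNOWN_MODES: string[] = ['append', 'replace'];

function validateConfig(config: RawConfig): string[] {
  const problems: string[] = [];
  const check = (section: string, entries: RefConfig[]): void => {
    const seenRefs: string[] = [];
    entries.forEach((entry, index) => {
      const where = `${section}[${index}]`;
      if (entry.ref.trim() === '') problems.push(`${where}: ref must not be empty`);
      if (entry.collection.trim() === '') problems.push(`${where}: collection must not be empty`);
      if (seenRefs.indexOf(entry.ref) !== -1) problems.push(`${where}: duplicate ref "${entry.ref}"`);
      seenRefs.push(entry.ref);
      if (entry.mode !== undefined && KNOWN_MODES.indexOf(entry.mode) === -1) {
        problems.push(`${where}: unknown mode "${entry.mode}"`);
      }
      if (entry.keys !== undefined && entry.keys.some((key) => key.trim() === '')) {
        problems.push(`${where}: keys must not contain empty paths`);
      }
    });
  };
  check('inputs', config.inputs);
  check('outputs', config.outputs);
  const globalMode = config.output === undefined ? undefined : config.output.mode;
  if (globalMode !== undefined && KNOWN_MODES.indexOf(globalMode) === -1) {
    problems.push(`output.mode: unknown mode "${globalMode}"`);
  }
  return problems;
}
```

An application would typically call this once at startup and refuse to start if the returned list is non-empty.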
403
+ ## API Design Principles
404
+
405
+ ### 1. Generic and Reusable
406
+
407
+ - All logic built into the helper/library
408
+ - Applications only pass refs and documents
409
+ - No application-specific code in helper
410
+
411
+ ### 2. Backward Compatible
412
+
413
+ - Existing APIs remain unchanged
414
+ - New features are additive
415
+ - Config is optional
416
+
417
+ ### 3. Explicit Configuration
418
+
419
+ - Configuration is explicit (not inferred)
420
+ - Clear separation between inputs and outputs
421
+ - Per-output overrides for global defaults
422
+
423
+ ### 4. Observable Results
424
+
425
+ - Return detailed results (inserted, updated, errors)
426
+ - Include metadata (index created, etc.)
427
+ - Never hide errors
428
+
429
+ ## Future Enhancements
430
+
431
+ ### Optional Features
432
+
433
+ 1. **Signature limit**: Cap array element counts for performance
434
+ 2. **Custom normalization**: Allow custom normalization functions
435
+ 3. **Signature validation**: Utilities to validate signature consistency
436
+ 4. **Config validation**: Schema validation for configuration
437
+ 5. **Metrics**: Performance metrics for operations
438
+
439
+ ### Considerations
440
+
441
+ - **Concurrency**: Document concurrent write behavior
442
+ - **Large datasets**: Consider streaming for very large datasets
443
+ - **Custom hash algorithms**: Support for additional algorithms beyond SHA-256
444
+
445
+ ## Conclusion
446
+
447
+ This solution provides a generic, reusable approach to:
448
+
449
+ 1. **Abstraction**: Work with logical references instead of physical collections
450
+ 2. **Deduplication**: Automatic duplicate prevention based on business logic
451
+ 3. **Flexibility**: Change collection names and queries without code changes
452
+ 4. **Observability**: Detailed results and error aggregation
453
+ 5. **Performance**: Efficient bulk operations with batching
454
+
455
+ The implementation is database-agnostic in concept, though specific examples use MongoDB syntax. The principles apply to any database system that supports:
456
+ - Indexed queries
457
+ - Bulk write operations
458
+ - Unique constraints
459
+ - Transaction support
460
+