nx-mongo 3.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/IMPROVEMENT_PLAN.md +223 -0
- package/PROVIDER_INSTRUCTIONS.md +460 -0
- package/README.md +1144 -0
- package/dist/simpleMongoHelper.d.ts +366 -0
- package/dist/simpleMongoHelper.d.ts.map +1 -0
- package/dist/simpleMongoHelper.js +1333 -0
- package/dist/simpleMongoHelper.js.map +1 -0
- package/dist/test.d.ts +2 -0
- package/dist/test.d.ts.map +1 -0
- package/dist/test.js +179 -0
- package/dist/test.js.map +1 -0
- package/package.json +41 -0
- package/src/simpleMongoHelper.ts +1660 -0
- package/src/test.ts +209 -0
- package/tsconfig.json +21 -0
@@ -0,0 +1,223 @@
# Improvement Plan for SimpleMongoHelper

Based on research of popular MongoDB helper packages and best practices, here's a comprehensive plan for future improvements.

## Completed Features (v2.0.0)

✅ Delete operations (deleteOne, deleteMany)
✅ FindOne operation for single document queries
✅ Count operations (countDocuments, estimatedDocumentCount)
✅ Pagination support with limit, skip, and sort options
✅ Aggregation pipeline support
✅ Transaction support for multi-operation consistency
✅ Connection retry logic with exponential backoff
✅ Index management (createIndex, dropIndex, listIndexes)

## Planned Improvements

### Phase 1: Enhanced Query Features

#### 1.1 Query Builder (Inspired by mquery)
- **Priority:** Medium
- **Description:** Fluent query builder API for constructing complex queries
- **Benefits:** More readable and maintainable query code
- **Example:**
```typescript
await helper.query('users')
  .where('age').gt(18)
  .where('active').equals(true)
  .sort({ createdAt: -1 })
  .limit(10)
  .exec();
```

#### 1.2 Advanced Filtering
- **Priority:** Low
- **Description:** Helper methods for common filter patterns
- **Features:**
  - Date range queries
  - Text search helpers
  - Array operations helpers

#### 1.3 Bulk Operations
- **Priority:** High
- **Description:** Efficient bulk insert, update, and delete operations
- **Benefits:** Better performance for large datasets
- **Example:**
```typescript
await helper.bulkInsert('users', usersArray);
await helper.bulkUpdate('users', updateOperations);
```

### Phase 2: Data Validation & Schema

#### 2.1 Schema Validation (Inspired by Mongoose)
- **Priority:** Medium
- **Description:** Optional schema validation using Zod or Joi
- **Benefits:** Type safety and data validation at runtime
- **Implementation:** Use Zod (MIT licensed) for validation
- **Example:**
```typescript
import { z } from 'zod';

const userSchema = z.object({
  name: z.string(),
  email: z.string().email(),
  age: z.number().min(0).max(150)
});

await helper.insert('users', userData, { schema: userSchema });
```

#### 2.2 Type-Safe Collections
- **Priority:** Medium
- **Description:** Collection-specific type definitions
- **Benefits:** Better TypeScript inference and type safety

### Phase 3: Performance & Caching

#### 3.1 Query Result Caching
- **Priority:** Medium
- **Description:** Optional caching layer using node-cache (MIT licensed)
- **Benefits:** Reduce database load for frequently accessed data
- **Features:**
  - TTL-based cache
  - Cache invalidation
  - Configurable cache keys

#### 3.2 Connection Pooling Configuration
- **Priority:** Low
- **Description:** Expose MongoDB connection pool options
- **Benefits:** Better control over connection management

#### 3.3 Query Performance Monitoring
- **Priority:** Low
- **Description:** Optional query timing and performance metrics
- **Benefits:** Identify slow queries

### Phase 4: Developer Experience

#### 4.1 Migration Support (Inspired by migrate-mongoose)
- **Priority:** Medium
- **Description:** Database migration framework
- **Benefits:** Version-controlled schema changes
- **Features:**
  - Migration files
  - Up/down migrations
  - Migration history tracking

#### 4.2 Seeding Support (Inspired by mongo-seeding)
- **Priority:** Low
- **Description:** Database seeding utilities
- **Benefits:** Easy test data setup

#### 4.3 Logging & Debugging
- **Priority:** Low
- **Description:** Optional logging for operations
- **Features:**
  - Query logging
  - Error logging
  - Performance logging
- **Implementation:** Use winston or pino (MIT licensed)

#### 4.4 Middleware/Hooks Support
- **Priority:** Low
- **Description:** Pre/post operation hooks
- **Example:**
```typescript
helper.beforeInsert((data) => {
  data.createdAt = new Date();
});
```

### Phase 5: Advanced Features

#### 5.1 Change Streams
- **Priority:** Low
- **Description:** Support for MongoDB change streams
- **Benefits:** Real-time data monitoring

#### 5.2 GridFS Support
- **Priority:** Low
- **Description:** File storage operations
- **Benefits:** Store large files in MongoDB

#### 5.3 Full-Text Search
- **Priority:** Low
- **Description:** Text search index and query helpers
- **Benefits:** Better search capabilities

### Phase 6: Testing & Documentation

#### 6.1 Comprehensive Test Suite
- **Priority:** High
- **Description:** Unit tests, integration tests, and E2E tests
- **Tools:** Jest or Mocha (MIT licensed)

#### 6.2 API Documentation
- **Priority:** High
- **Description:** Auto-generated API docs using TypeDoc
- **Benefits:** Better developer experience

#### 6.3 Examples & Tutorials
- **Priority:** Medium
- **Description:** More examples and use cases in README
- **Topics:**
  - Common patterns
  - Best practices
  - Performance tips

### Phase 7: Package Improvements

#### 7.1 ESM Support
- **Priority:** Medium
- **Description:** Support for ES modules alongside CommonJS
- **Benefits:** Modern JavaScript support

#### 7.2 Tree Shaking
- **Priority:** Low
- **Description:** Optimize bundle size
- **Benefits:** Smaller package size for users

#### 7.3 Type Definitions
- **Priority:** High
- **Description:** Ensure all types are properly exported
- **Status:** Mostly complete, needs review

## Recommended Next Steps

1. **Immediate (v2.1.0):**
   - Add bulk operations (high impact, relatively simple)
   - Improve test coverage
   - Add more examples to README

2. **Short-term (v2.2.0):**
   - Add query builder (if users request it)
   - Add optional Zod validation
   - Add connection pooling configuration

3. **Medium-term (v3.0.0):**
   - Add migration support
   - Add caching layer
   - Add comprehensive test suite

4. **Long-term:**
   - Add advanced features based on user feedback
   - Consider breaking into smaller packages if needed

## Dependencies to Consider

All packages below are MIT or Apache licensed:

- **Zod** (MIT) - Schema validation
- **node-cache** (MIT) - Caching layer
- **winston** (MIT) - Logging
- **Jest** (MIT) - Testing framework
- **TypeDoc** (Apache 2.0) - Documentation generation

## Notes

- Keep the package lightweight - only add dependencies when necessary
- Maintain backward compatibility when possible
- Follow semantic versioning
- Gather user feedback before implementing major features
- Consider performance implications of new features

@@ -0,0 +1,460 @@
# Generic Instructions for Config-Driven Ref Mapping and Signature-Based Deduplication

## Abstract

This document provides generic instructions for implementing a **config-driven collection mapping** and **signature-based deduplication** system for database operations. The solution enables applications to work with logical references (refs) instead of physical collection names, while automatically handling duplicate prevention through deterministic document signatures.

## Core Concepts

### 1. Reference-Based Abstraction

Instead of hardcoding collection names throughout an application, define a mapping configuration that associates logical reference names with physical database collections. This provides:

- **Flexibility**: Change collection names without modifying application code
- **Environment-specific mappings**: Different collections per environment (dev, staging, prod)
- **Query encapsulation**: Associate default queries with references
- **Separation of concerns**: Business logic uses refs, infrastructure uses collections

### 2. Signature-Based Deduplication

Generate deterministic signatures for documents based on configurable key paths. Use these signatures to:

- **Prevent duplicates**: Automatically detect and handle duplicate documents
- **Upsert behavior**: Update existing documents or insert new ones based on signature matching
- **Business logic independence**: Duplicate detection based on business rules, not database IDs

## Configuration Schema

### Input Configuration

Define how to read data from collections:

```typescript
interface InputConfig {
  ref: string;           // Logical reference name (e.g., "topology", "vulnerabilities")
  collection: string;    // Physical collection name (e.g., "topology-definition-neo-data")
  query?: Filter<any>;   // Optional default query filter
}
```

**Purpose**: Map logical references to physical collections with optional query filters.

**Example**:
```json
{
  "ref": "highSeverityVulns",
  "collection": "vulnerabilities-data",
  "query": { "severity": { "$in": ["high", "critical"] } }
}
```

### Output Configuration

Define how to write data to collections:

```typescript
interface OutputConfig {
  ref: string;                   // Logical reference name
  collection: string;            // Physical collection name
  keys?: string[];               // Dot-notation paths for signature generation
  mode?: "append" | "replace";   // Write mode
}
```

**Purpose**: Map logical references to physical collections with signature keys and write behavior.

**Example**:
```json
{
  "ref": "paths",
  "collection": "paths-neo-data",
  "keys": ["segments[]", "edges[].from", "edges[].to", "target_role"],
  "mode": "append"
}
```

### Global Configuration

```typescript
interface Config {
  inputs: InputConfig[];
  outputs: OutputConfig[];
  output?: {
    mode?: "append" | "replace";   // Global default mode
  };
}
```
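
The interaction between the per-output `mode` and the global `output.mode` can be sketched as a small lookup. This is only an illustration under stated assumptions: `resolveWriteMode` is a hypothetical helper name, the `reports` output is invented for the example, and the local `Config` type omits `inputs` for brevity. The precedence (per-output value, then global default, then `"append"`) follows the rules described elsewhere in this document.

```typescript
type WriteMode = "append" | "replace";

interface OutputConfig {
  ref: string;
  collection: string;
  keys?: string[];
  mode?: WriteMode;
}

// Trimmed-down config type for this sketch (inputs omitted on purpose).
interface Config {
  outputs: OutputConfig[];
  output?: { mode?: WriteMode };
}

// Hypothetical helper: the per-output mode wins, then the global default,
// then "append" as the final fallback.
function resolveWriteMode(config: Config, ref: string): WriteMode {
  const output = config.outputs.find((candidate) => candidate.ref === ref);
  if (!output) {
    throw new Error(`Unknown output ref: ${ref}`);
  }
  return output.mode ?? config.output?.mode ?? "append";
}

const exampleConfig: Config = {
  outputs: [
    { ref: "paths", collection: "paths-neo-data", mode: "append" },
    { ref: "reports", collection: "reports-data" }, // invented for illustration
  ],
  output: { mode: "replace" },
};

console.log(resolveWriteMode(exampleConfig, "paths"));
console.log(resolveWriteMode(exampleConfig, "reports"));
```

Keeping the fallback chain in one function means write code never has to inspect the global section itself.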

## Implementation Components

### 1. Dot-Path Value Extraction

**Purpose**: Extract values from nested objects using dot-notation paths with array wildcard support.

**Requirements**:
- Support simple paths: `"meta.id"` → extracts `meta.id`
- Support array wildcards: `"segments[]"` → extracts all elements from `segments` array
- Support nested array access: `"edges[].from"` → extracts `from` from all `edges` elements
- Recursively flatten nested arrays
- Handle null/undefined gracefully (return empty array)

**Algorithm**:
1. Parse path for array wildcards (`[]`)
2. If wildcard at end: extract array, flatten recursively
3. If wildcard in middle: extract from each array element, then continue path
4. If simple path: traverse object properties
5. Return array of extracted values

**Example Implementation Pattern**:
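```typescript
// Sketch of one way to complete the pattern; not necessarily the package's own code.
function extractByPath(obj: any, path: string): any[] {
  if (obj === null || obj === undefined) return [];
  if (path === '') return [obj];

  // Split off the first segment, e.g. "edges[]" from "edges[].from"
  const dot = path.indexOf('.');
  const head = dot === -1 ? path : path.slice(0, dot);
  const rest = dot === -1 ? '' : path.slice(dot + 1);

  // Handle array wildcards: flatten recursively, then continue the path per element
  if (head.endsWith('[]')) {
    const value = obj[head.slice(0, -2)];
    if (value === null || value === undefined) return [];
    const items = Array.isArray(value) ? value.flat(Infinity) : [value];
    return items.flatMap((item: any) => extractByPath(item, rest));
  }

  // Handle simple and nested paths: traverse one property and recurse
  return extractByPath(obj[head], rest);
}
```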

### 2. Value Normalization

**Purpose**: Convert values to consistent string representations for signature computation.

**Normalization Rules**:

| Type | Normalization |
|------|---------------|
| String | As-is |
| Number | `String(value)` |
| Boolean | `"true"` or `"false"` |
| Date | `value.toISOString()` (UTC) |
| Null/Undefined | `"null"` |
| Object | `JSON.stringify(value, Object.keys(value).sort())` (sorted keys) |
| Array | Flatten → Normalize each → Deduplicate → Sort → `JSON.stringify` |

**Critical**: Objects must have sorted keys for consistent stringification. Note that an array passed as the second argument to `JSON.stringify` also acts as a property allowlist at every nesting level, so the expression in the table is only safe for flat objects; nested objects need their keys sorted recursively.
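
The normalization rules above can be sketched as a single function. This is a minimal illustration, not the package's implementation: `normalizeValue` and `sortKeysDeep` are hypothetical names, and sorting object keys recursively is an assumption that goes one step beyond the one-level expression in the table.

```typescript
// Rebuild plain objects with keys in sorted order so JSON.stringify is stable.
// Dates nested inside objects are converted to ISO strings along the way.
function sortKeysDeep(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sortKeysDeep);
  if (value instanceof Date) return value.toISOString();
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      sorted[key] = sortKeysDeep(source[key]);
    }
    return sorted;
  }
  return value;
}

// Hypothetical normalizer following the table: one string per input value.
function normalizeValue(value: unknown): string {
  if (value === null || value === undefined) return "null";
  if (typeof value === "string") return value;
  if (typeof value === "number" || typeof value === "boolean") return String(value);
  if (value instanceof Date) return value.toISOString();
  if (Array.isArray(value)) {
    // Flatten, normalize each element, deduplicate, sort, then stringify.
    const normalized = (value as unknown[]).flat(Infinity).map(normalizeValue);
    return JSON.stringify(Array.from(new Set(normalized)).sort());
  }
  return JSON.stringify(sortKeysDeep(value));
}

console.log(normalizeValue({ b: 1, a: { d: 2, c: 3 } }));
console.log(normalizeValue([3, [1, 2], 1]));
```

Because array elements are normalized to strings before sorting, the sort order is lexicographic rather than numeric, which matches the "sort normalized values" wording of the rules.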

### 3. Signature Computation

**Purpose**: Generate deterministic hash signatures for documents.

**Algorithm**:
1. **Extract**: For each key path, extract values using dot-path extraction
2. **Normalize**: Normalize all extracted values per normalization rules
3. **Deduplicate**: Remove duplicate normalized values per key
4. **Sort**: Sort normalized values lexicographically per key
5. **Canonical Map**: Create `{ key1: [values], key2: [values], ... }`
6. **Sort Keys**: Sort map keys alphabetically
7. **Stringify**: `JSON.stringify(canonicalMap)`
8. **Hash**: Apply hash algorithm (SHA-256 recommended)
9. **Return**: Hex string

**Properties**:
- **Deterministic**: Same document + same keys → same signature
- **Collision-resistant**: Documents whose key values differ → different signatures (with high probability); documents that differ only in non-key fields share a signature by design
- **Order-independent**: Array element order doesn't affect signature (due to sorting)
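
Steps 3 to 9 of the algorithm above can be sketched with Node's built-in `crypto` module. The sketch assumes extraction and normalization have already happened, so `computeSignature` (a hypothetical name) receives a map from key path to normalized string values; that input shape is an assumption made to keep the example self-contained.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape: key path -> already-normalized string values.
type NormalizedValues = Record<string, string[]>;

function computeSignature(valuesByKey: NormalizedValues): string {
  const canonical: Record<string, string[]> = {};
  // Insert keys in sorted order so the order of keys in the config
  // cannot change the stringified form.
  for (const key of Object.keys(valuesByKey).sort()) {
    // Deduplicate, then sort lexicographically, per key.
    canonical[key] = Array.from(new Set(valuesByKey[key])).sort();
  }
  // Stringify the canonical map and hash it with SHA-256, returned as hex.
  return createHash("sha256").update(JSON.stringify(canonical)).digest("hex");
}

const first = computeSignature({ "segments[]": ["b", "a", "a"], target_role: ["admin"] });
const second = computeSignature({ target_role: ["admin"], "segments[]": ["a", "b"] });
console.log(first === second);
```

The two calls differ only in key order, element order, and a duplicated element, so they are expected to produce the same hex string, which is the order-independence property listed above.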

### 4. Index Management

**Purpose**: Ensure unique index exists on signature field for efficient lookups.

**Requirements**:
- **Idempotent**: Safe to call multiple times
- **Conflict handling**: If index exists with wrong options, drop and recreate
- **Field name**: Configurable (default: `_sig`)
- **Uniqueness**: Always unique (prevents duplicates)

**Algorithm**:
1. List existing indexes on collection
2. Find index on signature field
3. If exists with correct options → return (no-op)
4. If exists with wrong options → drop index
5. Create index with correct options
6. Return creation status and index name
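
The six steps above might look roughly like the following. To keep the sketch runnable without a database, it is written against a small local interface rather than the driver's own types; `IndexableCollection`, `IndexInfo` and `ensureSignatureIndex` are hypothetical names. The MongoDB Node.js driver's `Collection` exposes `indexes()`, `dropIndex()` and `createIndex()`, but real collections may need extra handling (for example when the collection does not exist yet), which is left out here.

```typescript
// Minimal structural view of the collection methods this sketch relies on.
interface IndexInfo {
  name?: string;
  key: Record<string, unknown>;
  unique?: boolean;
}

interface IndexableCollection {
  indexes(): Promise<IndexInfo[]>;
  dropIndex(name: string): Promise<unknown>;
  createIndex(key: Record<string, number>, options: { unique: boolean }): Promise<string>;
}

async function ensureSignatureIndex(
  collection: IndexableCollection,
  fieldName = "_sig",
): Promise<{ created: boolean; indexName: string }> {
  // Steps 1-2: list indexes and look for a single-field index on the signature field.
  const existing = (await collection.indexes()).find((index) => {
    const fields = Object.keys(index.key);
    return fields.length === 1 && fields[0] === fieldName;
  });

  // Step 3: a unique index is already in place, so this call is a no-op.
  if (existing && existing.unique === true) {
    return { created: false, indexName: existing.name ?? `${fieldName}_1` };
  }
  // Step 4: same field but wrong options, so drop it before recreating.
  if (existing && existing.name) {
    await collection.dropIndex(existing.name);
  }
  // Steps 5-6: create the unique index and report what happened.
  const indexName = await collection.createIndex({ [fieldName]: 1 }, { unique: true });
  return { created: true, indexName };
}
```

Calling the function twice in a row should create the index on the first call and return `created: false` on the second, which is the idempotence requirement listed above.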

### 5. Load by Reference

**Purpose**: Load data using logical reference name instead of collection name.

**Algorithm**:
1. Look up reference in `config.inputs[]`
2. If not found → error
3. Get `collection` and `query` from config
4. Execute query on collection (with optional pagination)
5. Return results

**Features**:
- Automatic query application
- Pagination support
- Transaction session support
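
The lookup part of this algorithm can be sketched independently of the database by injecting the function that performs the query. Everything named here is an assumption for illustration: the `Reader` callback and `FindOptions` type are hypothetical, and this standalone `loadByRef` takes the inputs array explicitly instead of being a method on a helper.

```typescript
interface InputConfig {
  ref: string;
  collection: string;
  query?: Record<string, unknown>; // stands in for the driver's Filter type
}

interface FindOptions {
  limit?: number;
  skip?: number;
}

// Hypothetical reader signature; a real helper would call the driver here.
type Reader = (
  collection: string,
  query: Record<string, unknown>,
  options: FindOptions,
) => Promise<unknown[]>;

async function loadByRef(
  inputs: InputConfig[],
  ref: string,
  read: Reader,
  options: FindOptions = {},
): Promise<unknown[]> {
  // Steps 1-2: resolve the ref or fail loudly.
  const input = inputs.find((candidate) => candidate.ref === ref);
  if (!input) {
    throw new Error(`Unknown input ref "${ref}"`);
  }
  // Steps 3-5: apply the configured query (or an empty filter) with pagination.
  return read(input.collection, input.query ?? {}, options);
}
```

Injecting the reader keeps the ref resolution testable without a live connection; a session could be threaded through `options` in the same way.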

### 6. Write by Reference

**Purpose**: Write data using logical reference with automatic deduplication.

**Algorithm**:

**If `mode === "replace"`**:
1. Clear collection (within transaction if session provided)

**If `keys` specified** (signature-based):
1. Ensure signature index exists
2. For each document:
   - Compute signature using `keys`
   - Add `_sig` field to document
   - Create upsert operation: `{ filter: { _sig: sig }, update: { $set: doc }, upsert: true }`
3. Execute bulk upsert operations (batch size: 1000, ordered: false)
4. Aggregate results: count inserted vs updated

**If `keys` not specified** (regular insert):
1. Execute `insertMany` (ordered: false)
2. Aggregate errors

**Error Handling**:
- **Never throw on first error**: Aggregate all errors
- Continue processing after errors
- Return structured result: `{ inserted, updated, errors[], indexCreated }`
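
The signature-based branch above boils down to building one upsert operation per document and splitting the operations into batches. The following sketch shows only that preparation step: `buildUpsertOperations` and `toBatches` are hypothetical helper names, and the signature function is passed in as a parameter. The `updateOne` wrapper follows the operation shape accepted by the MongoDB driver's `bulkWrite`.

```typescript
type Doc = Record<string, unknown>;

interface UpsertOperation {
  updateOne: { filter: Doc; update: { $set: Doc }; upsert: true };
}

// Build one upsert per document, keyed by its signature.
function buildUpsertOperations(
  docs: Doc[],
  computeSig: (doc: Doc) => string,
  sigField = "_sig",
): UpsertOperation[] {
  return docs.map((doc): UpsertOperation => {
    const sig = computeSig(doc);
    return {
      updateOne: {
        filter: { [sigField]: sig },
        // The signature is stored on the document itself.
        update: { $set: { ...doc, [sigField]: sig } },
        upsert: true,
      },
    };
  });
}

// Split any list into fixed-size batches (1000 is the size suggested above).
function toBatches<T>(items: T[], batchSize = 1000): T[][] {
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

const operations = buildUpsertOperations(
  [{ target_role: "admin" }, { target_role: "user" }],
  (doc) => `sig-${String(doc.target_role)}`, // placeholder signature for the example
);
console.log(toBatches(operations, 1).length);
```

Each batch would then be passed to a bulk write with `ordered: false`, so that one failing operation does not stop the rest of the batch.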

## Write Modes

### Append Mode (Default)

- **Behavior**: Add documents to collection
- **Deduplication**: If keys specified, upsert by signature (update if exists, insert if new)
- **Use case**: Incremental data updates, accumulating data over time

### Replace Mode

- **Behavior**: Clear collection, then insert documents
- **Deduplication**: Still applies if keys specified (for the new batch)
- **Use case**: Full data refresh, replacing entire dataset

## Transaction Support

All operations should support optional transaction sessions:

- **Load operations**: Use session in query execution
- **Write operations**: Use session in bulk operations
- **Replace mode**: Wrap clear + write in transaction
- **Error handling**: Rollback on transaction errors

## Error Aggregation

**Principle**: Never fail fast on first error. Continue processing and aggregate all errors.

**Error Structure**:
```typescript
interface ErrorInfo {
  index: number;    // Document index in input array
  error: Error;     // Error object
  doc?: any;        // Optional: problematic document
}
```

**Benefits**:
- Partial success: Some documents succeed even if others fail
- Better observability: See all failures at once
- Easier debugging: Know which documents failed
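
The "continue and aggregate" principle can be sketched as a loop that catches failures per batch instead of letting them abort the run. This is a simplified illustration: `writeAllBatches`, `BatchWriter` and `WriteSummary` are hypothetical names, and the sketch records one `ErrorInfo` entry per failed batch, whereas a fuller version would map the driver's per-operation error details to individual document indexes.

```typescript
interface ErrorInfo {
  index: number;
  error: Error;
  doc?: unknown;
}

interface WriteSummary {
  written: number;
  errors: ErrorInfo[];
}

// Hypothetical batch writer: resolves with the number of documents written, or throws.
type BatchWriter<T> = (batch: T[]) => Promise<number>;

async function writeAllBatches<T>(
  docs: T[],
  batchSize: number,
  writeBatch: BatchWriter<T>,
): Promise<WriteSummary> {
  const summary: WriteSummary = { written: 0, errors: [] };
  for (let start = 0; start < docs.length; start += batchSize) {
    const batch = docs.slice(start, start + batchSize);
    try {
      summary.written += await writeBatch(batch);
    } catch (cause) {
      const error = cause instanceof Error ? cause : new Error(String(cause));
      // Record the failure against the first index of the failed batch,
      // then keep going with the remaining batches.
      summary.errors.push({ index: start, error, doc: batch[0] });
    }
  }
  return summary;
}
```

Callers would inspect `errors` even when `written` is greater than zero, since partial success is the expected outcome of this design.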

## Performance Considerations

### Batch Processing

- Process documents in batches (recommended: 1000 per batch)
- Use unordered bulk operations (`ordered: false`) for better performance
- Continue processing even if a batch has errors

### Index Management

- Create signature index once (idempotent operation)
- Index should be unique for duplicate prevention
- Index name typically auto-generated (e.g., `_sig_1`)

### Large Arrays in Signatures

- **No automatic capping**: Arrays are processed fully
- **Performance impact**: Large arrays (1000+ elements) may slow signature computation
- **Future enhancement**: Consider optional `signatureLimit` parameter

## Security Considerations

### Index Creation Permissions

- Requires `createIndex` privilege
- Provide clear error messages if permission denied
- Document permission requirements

### Signature Field

- Signature field (`_sig`) is added to documents automatically
- Applications should not manually set this field
- Field name is configurable but defaults to `_sig`

## Testing Requirements

### Unit Tests

1. **Dot-path extraction**:
   - Simple paths
   - Array wildcards
   - Nested arrays
   - Null/undefined handling

2. **Value normalization**:
   - All data types
   - Object key sorting
   - Array flattening and sorting

3. **Signature computation**:
   - Deterministic (same input → same output)
   - Uniqueness (different inputs → different outputs)
   - Order independence

### Integration Tests

1. **Load by reference**:
   - Valid ref lookup
   - Invalid ref error
   - Query application
   - Pagination

2. **Write by reference**:
   - Append mode with deduplication
   - Replace mode
   - Error aggregation
   - Transaction support

3. **Index management**:
   - Idempotent creation
   - Conflict handling
   - Permission errors

### Performance Tests

1. Large document arrays (1000+ documents)
2. Large arrays in signatures (1000+ elements per key)
3. Batch processing efficiency

## Migration Guide

### For Applications Using Direct Collection Names

**Before**:
```typescript
const docs = await helper.loadCollection('topology-definition-neo-data', { type: 'active' });
await helper.insert('paths-neo-data', pathDocs);
```

**After**:
```typescript
// Define config once
const config = {
  inputs: [
    { ref: 'topology', collection: 'topology-definition-neo-data', query: { type: 'active' } }
  ],
  outputs: [
    { ref: 'paths', collection: 'paths-neo-data', keys: ['segments[]', 'target_role'] }
  ]
};

// Use refs in code
const docs = await helper.loadByRef('topology');
await helper.writeByRef('paths', pathDocs);
```

### Benefits

- **Flexibility**: Change collection names in config only
- **Automatic deduplication**: No manual duplicate checking
- **Query encapsulation**: Queries defined once in config
- **Type safety**: Ref names can be validated

## Best Practices

### 1. Choosing Signature Keys

- **Select business-critical fields**: Fields that define document uniqueness
- **Avoid volatile fields**: Don't include timestamps, auto-incrementing IDs
- **Include all relevant fields**: Missing fields may cause false duplicates
- **Test signature uniqueness**: Verify different documents produce different signatures

### 2. Write Mode Selection

- **Append mode**: For incremental updates, accumulating data
- **Replace mode**: For full refreshes, replacing entire datasets
- **Consider data size**: Replace mode may be slow for large collections

### 3. Error Handling

- **Always check error array**: Even if `inserted > 0`, check for errors
- **Log errors**: Log failed documents for debugging
- **Retry logic**: Consider retrying failed documents

### 4. Configuration Management

- **Environment-specific configs**: Different configs per environment
- **Version control**: Keep configs in version control
- **Validation**: Validate config structure at startup

## API Design Principles

### 1. Generic and Reusable

- All logic built into the helper/library
- Applications only pass refs and documents
- No application-specific code in helper

### 2. Backward Compatible

- Existing APIs remain unchanged
- New features are additive
- Config is optional

### 3. Explicit Configuration

- Configuration is explicit (not inferred)
- Clear separation between inputs and outputs
- Per-output overrides for global defaults

### 4. Observable Results

- Return detailed results (inserted, updated, errors)
- Include metadata (index created, etc.)
- Never hide errors

## Future Enhancements

### Optional Features

1. **Signature limit**: Cap array element counts for performance
2. **Custom normalization**: Allow custom normalization functions
3. **Signature validation**: Utilities to validate signature consistency
4. **Config validation**: Schema validation for configuration
5. **Metrics**: Performance metrics for operations

### Considerations

- **Concurrency**: Document concurrent write behavior
- **Large datasets**: Consider streaming for very large datasets
- **Custom hash algorithms**: Support for additional algorithms beyond SHA-256

## Conclusion

This solution provides a generic, reusable approach to:

1. **Abstraction**: Work with logical references instead of physical collections
2. **Deduplication**: Automatic duplicate prevention based on business logic
3. **Flexibility**: Change collection names and queries without code changes
4. **Observability**: Detailed results and error aggregation
5. **Performance**: Efficient bulk operations with batching

The implementation is database-agnostic in concept, though specific examples use MongoDB syntax. The principles apply to any database system that supports:
- Indexed queries
- Bulk write operations
- Unique constraints
- Transaction support