ml-cache 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +599 -0
- package/dist/client.d.ts +175 -0
- package/dist/client.d.ts.map +1 -0
- package/dist/client.js +411 -0
- package/dist/client.js.map +1 -0
- package/dist/constants.d.ts +184 -0
- package/dist/constants.d.ts.map +1 -0
- package/dist/constants.js +236 -0
- package/dist/constants.js.map +1 -0
- package/dist/esm/client.d.ts +175 -0
- package/dist/esm/client.d.ts.map +1 -0
- package/dist/esm/client.js +407 -0
- package/dist/esm/client.js.map +1 -0
- package/dist/esm/constants.d.ts +184 -0
- package/dist/esm/constants.d.ts.map +1 -0
- package/dist/esm/constants.js +233 -0
- package/dist/esm/constants.js.map +1 -0
- package/dist/esm/index.d.ts +58 -0
- package/dist/esm/index.d.ts.map +1 -0
- package/dist/esm/index.js.map +1 -0
- package/dist/esm/queue.d.ts +106 -0
- package/dist/esm/queue.d.ts.map +1 -0
- package/dist/esm/queue.js +207 -0
- package/dist/esm/queue.js.map +1 -0
- package/dist/esm/storage/glacier-storage.d.ts +77 -0
- package/dist/esm/storage/glacier-storage.d.ts.map +1 -0
- package/dist/esm/storage/glacier-storage.js +133 -0
- package/dist/esm/storage/glacier-storage.js.map +1 -0
- package/dist/esm/storage/index.d.ts +7 -0
- package/dist/esm/storage/index.d.ts.map +1 -0
- package/dist/esm/storage/index.js +7 -0
- package/dist/esm/storage/index.js.map +1 -0
- package/dist/esm/storage/s3-storage.d.ts +89 -0
- package/dist/esm/storage/s3-storage.d.ts.map +1 -0
- package/dist/esm/storage/s3-storage.js +161 -0
- package/dist/esm/storage/s3-storage.js.map +1 -0
- package/dist/esm/types/index.d.ts +337 -0
- package/dist/esm/types/index.d.ts.map +1 -0
- package/dist/esm/types/index.js +6 -0
- package/dist/esm/types/index.js.map +1 -0
- package/dist/esm/utils/helpers.d.ts +132 -0
- package/dist/esm/utils/helpers.d.ts.map +1 -0
- package/dist/esm/utils/helpers.js +198 -0
- package/dist/esm/utils/helpers.js.map +1 -0
- package/dist/esm/utils/index.d.ts +8 -0
- package/dist/esm/utils/index.d.ts.map +1 -0
- package/dist/esm/utils/index.js +8 -0
- package/dist/esm/utils/index.js.map +1 -0
- package/dist/esm/utils/logger.d.ts +110 -0
- package/dist/esm/utils/logger.d.ts.map +1 -0
- package/dist/esm/utils/logger.js +177 -0
- package/dist/esm/utils/logger.js.map +1 -0
- package/dist/esm/utils/validator.d.ts +77 -0
- package/dist/esm/utils/validator.d.ts.map +1 -0
- package/dist/esm/utils/validator.js +208 -0
- package/dist/esm/utils/validator.js.map +1 -0
- package/dist/index.d.ts +58 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +100 -0
- package/dist/index.js.map +1 -0
- package/dist/index.mjs +72 -0
- package/dist/queue.d.ts +106 -0
- package/dist/queue.d.ts.map +1 -0
- package/dist/queue.js +211 -0
- package/dist/queue.js.map +1 -0
- package/dist/storage/glacier-storage.d.ts +77 -0
- package/dist/storage/glacier-storage.d.ts.map +1 -0
- package/dist/storage/glacier-storage.js +137 -0
- package/dist/storage/glacier-storage.js.map +1 -0
- package/dist/storage/index.d.ts +7 -0
- package/dist/storage/index.d.ts.map +1 -0
- package/dist/storage/index.js +12 -0
- package/dist/storage/index.js.map +1 -0
- package/dist/storage/s3-storage.d.ts +89 -0
- package/dist/storage/s3-storage.d.ts.map +1 -0
- package/dist/storage/s3-storage.js +165 -0
- package/dist/storage/s3-storage.js.map +1 -0
- package/dist/types/index.d.ts +337 -0
- package/dist/types/index.d.ts.map +1 -0
- package/dist/types/index.js +7 -0
- package/dist/types/index.js.map +1 -0
- package/dist/utils/helpers.d.ts +132 -0
- package/dist/utils/helpers.d.ts.map +1 -0
- package/dist/utils/helpers.js +215 -0
- package/dist/utils/helpers.js.map +1 -0
- package/dist/utils/index.d.ts +8 -0
- package/dist/utils/index.d.ts.map +1 -0
- package/dist/utils/index.js +35 -0
- package/dist/utils/index.js.map +1 -0
- package/dist/utils/logger.d.ts +110 -0
- package/dist/utils/logger.d.ts.map +1 -0
- package/dist/utils/logger.js +181 -0
- package/dist/utils/logger.js.map +1 -0
- package/dist/utils/validator.d.ts +77 -0
- package/dist/utils/validator.d.ts.map +1 -0
- package/dist/utils/validator.js +221 -0
- package/dist/utils/validator.js.map +1 -0
- package/package.json +64 -0
package/LICENSE
ADDED
MIT License

Copyright (c) 2024 Nicolas Mondain

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md
ADDED
# ml-cache

**Store your business data today. Train your AI models tomorrow.**

[](https://www.npmjs.com/package/ml-cache)
[](https://opensource.org/licenses/MIT)
[](https://www.typescriptlang.org/)
[](https://nodejs.org/)

---

## The Problem

Machine learning is transforming every industry, but there's a catch: **you need massive amounts of quality data to train effective models**. Companies that start collecting data today will have a significant competitive advantage when:

- ML training costs continue to drop exponentially
- Your business grows and you need personalized AI features
- You want to build recommendation engines, fraud detection, or predictive analytics
- Custom models become essential for differentiation

**The data you're generating right now is invaluable for future AI/ML applications. Don't let it slip away.**

## The Solution

`ml-cache` is a lightweight TypeScript SDK that captures your business events and stores them in Amazon S3 Glacier, one of the lowest-cost cold storage options available. It's designed with a simple philosophy:

> **Collect everything now. Pay almost nothing. Train models when ready.**

### Why Cold Storage?

| Storage Type | Approx. cost per TB-month | Retrieval |
|--------------|---------------------------|-----------|
| S3 Standard | ~$23 | Instant |
| S3 Glacier | ~$4 | Minutes to hours |
| S3 Glacier Deep Archive | ~$1 | 12-48 hours |

These are approximate list prices and vary by region. For ML training data that you'll access months or years from now, cold storage is roughly **6x cheaper** (Glacier) to more than **20x cheaper** (Glacier Deep Archive) than S3 Standard.
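To show what the table means in dollars, here is a small estimator. It is a sketch and not part of `ml-cache`. `estimateStorageCost` is a hypothetical helper, and the prices are the approximate per-TB figures from the table. It ignores request fees, retrieval fees, and minimum storage durations, and AWS charges all of these as well.

```typescript
// Approximate storage prices in USD per TB-month, taken from the table above.
// Assumption: flat per-TB pricing; real AWS pricing is tiered and region-specific.
const PRICE_PER_TB_MONTH = {
  STANDARD: 23,
  GLACIER: 4,
  DEEP_ARCHIVE: 1,
} as const;

type StorageClass = keyof typeof PRICE_PER_TB_MONTH;

// Estimates the storage-only cost of keeping `totalTb` terabytes for `months` months.
function estimateStorageCost(totalTb: number, months: number, storageClass: StorageClass): number {
  if (totalTb < 0 || months < 0) {
    throw new RangeError('totalTb and months must be non-negative');
  }
  return totalTb * months * PRICE_PER_TB_MONTH[storageClass];
}

// Example: 5 TB kept for 2 years in each class.
for (const storageClass of Object.keys(PRICE_PER_TB_MONTH) as StorageClass[]) {
  console.log(storageClass, estimateStorageCost(5, 24, storageClass));
}
// STANDARD 2760
// GLACIER 480
// DEEP_ARCHIVE 120
```

Because the cost scales linearly with both volume and time, the ratio between classes stays the same at any scale. What changes the picture is retrieval: reading archived data back costs extra and takes time.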

---

## Features

- **Simple, Familiar API** — Inspired by analytics SDKs like RudderStack and Segment
- **Automatic Batching** — Efficiently groups events to minimize API calls
- **Smart Retry Logic** — Exponential backoff reduces data loss from transient failures
- **Type-Safe** — Full TypeScript support with comprehensive type definitions
- **Flexible Storage** — S3 Standard, Glacier, or Glacier Deep Archive
- **Rich Context** — Capture user, device, page, and campaign data
- **Zero Dependencies on Analytics** — Direct AWS integration, no middlemen
- **Production Ready** — Error handling, retries, and graceful shutdown

---

## Installation

```bash
npm install ml-cache
```

```bash
yarn add ml-cache
```

```bash
pnpm add ml-cache
```

---

## Quick Start

```typescript
import { MLCacheClient } from 'ml-cache';

// Initialize the client
const mlCache = new MLCacheClient({
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
  },
  s3: {
    bucket: 'my-ml-data-lake',
    region: 'us-east-1',
    storageClass: 'GLACIER', // Cost-effective cold storage
  },
  storageMode: 'S3',
  sourceApp: 'my-webapp',
  environment: 'production',
});

// Track business events
// (top-level await requires an ES module; otherwise wrap these calls in an async function)
await mlCache.track({
  eventType: 'purchase',
  properties: {
    productId: 'SKU-12345',
    productName: 'Premium Widget',
    price: 99.99,
    currency: 'USD',
    quantity: 2,
  },
  context: {
    user: {
      userId: 'user-789',
      traits: {
        plan: 'premium',
        signupDate: '2024-01-15',
      },
    },
  },
});

// Identify users
await mlCache.identify('user-789', {
  email: 'user@example.com',
  name: 'Jane Doe',
  company: 'Acme Corp',
});

// Track page views
await mlCache.page('Product Details', {
  url: '/products/SKU-12345',
  referrer: 'google.com',
});

// Graceful shutdown (flushes remaining events)
await mlCache.shutdown();
```

---

## Configuration

### Full Configuration Options

```typescript
import { MLCacheClient, type MLCacheConfig } from 'ml-cache';

const config: MLCacheConfig = {
  // Required: AWS Credentials
  credentials: {
    accessKeyId: 'AKIA...',
    secretAccessKey: '...',
    sessionToken: '...', // Optional: for temporary credentials
  },

  // S3 Configuration (required for S3 or S3_TO_GLACIER mode)
  s3: {
    bucket: 'my-ml-data-bucket',
    region: 'us-east-1',
    prefix: 'events/', // Optional: folder prefix for objects
    storageClass: 'GLACIER', // STANDARD, GLACIER, DEEP_ARCHIVE, etc.
  },

  // Glacier Configuration (required for GLACIER mode)
  glacier: {
    vaultName: 'my-ml-vault',
    region: 'us-east-1',
    accountId: '-', // Optional: defaults to current account
  },

  // Storage mode
  storageMode: 'S3', // 'S3' | 'GLACIER' | 'S3_TO_GLACIER'

  // Batching configuration
  batch: {
    enabled: true, // Enable event batching
    maxSize: 100, // Max events per batch
    maxWaitMs: 30000, // Flush every 30 seconds
  },

  // Retry configuration
  retry: {
    maxRetries: 3,
    initialDelayMs: 1000,
    maxDelayMs: 30000,
    exponentialBackoff: true,
  },

  // Logging configuration
  log: {
    level: 'info', // 'debug' | 'info' | 'warn' | 'error' | 'silent'
    enabled: true,
    customLogger: (level, message, data) => {
      // Your custom logging logic
    },
  },

  // Metadata
  sourceApp: 'my-application',
  environment: 'production',
  debug: false,
};

const client = new MLCacheClient(config);
```
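The `retry` options suggest a capped exponential backoff. This README does not show the package's retry code, so what follows is only a sketch of how `initialDelayMs`, `maxDelayMs`, and `exponentialBackoff` commonly fit together. `computeRetryDelay` and `withRetry` are hypothetical helpers. The sketch also assumes `maxRetries` counts retries after the first attempt, which gives four attempts in total for `maxRetries: 3`.

```typescript
// Mirrors the `retry` block of MLCacheConfig (field semantics are assumed).
interface RetryOptions {
  maxRetries: number;
  initialDelayMs: number;
  maxDelayMs: number;
  exponentialBackoff: boolean;
}

// Delay before retry number `attempt` (0-based): doubles each time when
// exponentialBackoff is on, and never exceeds maxDelayMs.
function computeRetryDelay(attempt: number, opts: RetryOptions): number {
  const base = opts.exponentialBackoff
    ? opts.initialDelayMs * 2 ** attempt
    : opts.initialDelayMs;
  return Math.min(base, opts.maxDelayMs);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `operation`, retrying up to opts.maxRetries times after the first failure.
async function withRetry<T>(operation: () => Promise<T>, opts: RetryOptions): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= opts.maxRetries) throw err; // retries exhausted: surface the last error
      await sleep(computeRetryDelay(attempt, opts));
    }
  }
}

const opts: RetryOptions = { maxRetries: 3, initialDelayMs: 1000, maxDelayMs: 30000, exponentialBackoff: true };
console.log([0, 1, 2, 3, 4, 5].map((n) => computeRetryDelay(n, opts)));
// delays: 1000, 2000, 4000, 8000, 16000, then 30000 (capped by maxDelayMs)
```

Production retry loops often add random jitter to each delay, so that many clients failing at the same moment do not all retry in lockstep. The sketch leaves jitter out to keep the delays predictable.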

### Storage Classes

Choose the right storage class for your needs:

| Storage Class | Use Case | Retrieval Time |
|---------------|----------|----------------|
| `STANDARD` | Frequent access, testing | Instant |
| `STANDARD_IA` | Infrequent access | Instant |
| `GLACIER` | **Recommended for ML data** | 1-5 minutes (expedited) to 5-12 hours (bulk) |
| `DEEP_ARCHIVE` | Rarely accessed, lowest cost | 12-48 hours |

Objects in `GLACIER` and `DEEP_ARCHIVE` must be restored before they can be read.

---

## Event Types

### Track Events

Capture any business event with rich properties:

```typescript
await mlCache.track({
  eventType: 'checkout_completed',
  properties: {
    orderId: 'ORD-123456',
    total: 299.97, // sum of items: 2 * 49.99 + 199.99
    items: [
      { sku: 'WIDGET-A', quantity: 2, price: 49.99 },
      { sku: 'WIDGET-B', quantity: 1, price: 199.99 },
    ],
    paymentMethod: 'credit_card',
    shippingMethod: 'express',
  },
  context: {
    user: { userId: 'user-123' },
    campaign: {
      source: 'google',
      medium: 'cpc',
      name: 'summer_sale',
    },
  },
});
```

### Identify Users

Build user profiles for personalization:

```typescript
await mlCache.identify('user-123', {
  email: 'user@example.com',
  name: 'John Doe',
  plan: 'enterprise',
  company: 'Acme Corp',
  employeeCount: 500,
  industry: 'technology',
});
```

### Page Views

Track navigation patterns:

```typescript
await mlCache.page('Pricing', {
  url: 'https://example.com/pricing',
  path: '/pricing',
  title: 'Pricing Plans - Example',
  referrer: 'https://google.com',
});
```

---

## Event Context

Enrich events with contextual data:

```typescript
await mlCache.track({
  eventType: 'feature_used',
  properties: { feature: 'dark_mode' },
  context: {
    // User context
    user: {
      userId: 'user-123',
      anonymousId: 'anon-456',
      traits: {
        plan: 'pro',
        role: 'admin',
      },
    },

    // Device context (navigator, window, and document exist only in browsers;
    // on a server, fill these fields from the incoming request instead)
    device: {
      userAgent: navigator.userAgent,
      deviceType: 'desktop',
      os: 'macOS',
      browser: 'Chrome',
      screenResolution: '1920x1080',
      locale: 'en-US',
      timezone: 'America/New_York',
    },

    // Page context
    page: {
      url: window.location.href,
      path: window.location.pathname,
      title: document.title,
      referrer: document.referrer,
    },

    // Campaign/UTM context
    campaign: {
      source: 'newsletter',
      medium: 'email',
      name: 'weekly_digest',
      content: 'cta_button',
    },

    // App context
    app: {
      name: 'MyApp',
      version: '2.1.0',
      build: '456',
    },

    // Custom context
    custom: {
      experimentId: 'exp-123',
      variant: 'B',
    },
  },
});
```
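The `campaign` fields line up with the standard UTM query parameters. Here is a minimal sketch that pulls them out of a landing-page URL with the built-in `URL` API, so the same code runs in a browser or in Node.js. It assumes a mapping the SDK may not apply itself: `utm_source` to `source`, `utm_medium` to `medium`, `utm_campaign` to `name`, and `utm_content` to `content`. `campaignFromUrl` is a hypothetical helper.

```typescript
// Campaign fields as used in the `campaign` context example above.
interface CampaignContext {
  source?: string;
  medium?: string;
  name?: string;
  content?: string;
}

// Assumed mapping from UTM query parameters to campaign context fields.
const UTM_TO_CAMPAIGN: Record<string, keyof CampaignContext> = {
  utm_source: 'source',
  utm_medium: 'medium',
  utm_campaign: 'name',
  utm_content: 'content',
};

// Returns the UTM-derived campaign context, or undefined when the URL has no UTM parameters.
function campaignFromUrl(href: string): CampaignContext | undefined {
  const params = new URL(href).searchParams;
  const campaign: CampaignContext = {};
  for (const [param, field] of Object.entries(UTM_TO_CAMPAIGN)) {
    const value = params.get(param);
    if (value) campaign[field] = value;
  }
  return Object.keys(campaign).length > 0 ? campaign : undefined;
}

console.log(
  campaignFromUrl('https://example.com/pricing?utm_source=newsletter&utm_medium=email&utm_campaign=weekly_digest'),
);
// logs an object with source 'newsletter', medium 'email', and name 'weekly_digest'
```

In a browser you would pass `window.location.href`. On a server, pass the full URL of the incoming request.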

---

## Callbacks & Monitoring

```typescript
// Monitor all tracked events
mlCache.onEvent((event) => {
  console.log('Event tracked:', event.eventType, event.eventId);
});

// Handle errors
mlCache.onError((error, event) => {
  console.error('Failed to store event:', error.message);
  // Optionally: send to error tracking service
});

// Monitor flushes
mlCache.onFlush((result) => {
  console.log(`Flushed ${result.eventCount} events`);
  if (result.failedEventIds.length > 0) {
    console.warn('Failed events:', result.failedEventIds);
  }
});

// Health check
const health = await mlCache.getHealth();
console.log('SDK Health:', health);
// {
//   healthy: true,
//   s3Connected: true,
//   glacierConnected: false,
//   queueSize: 5,
//   lastFlush: '2024-01-15T10:30:00.000Z',
// }
```
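In the example above, the `onFlush` result exposes `eventCount` and `failedEventIds`. The following sketch uses only those two fields to turn flush callbacks into running totals and a failure rate you could export to a metrics system. `FlushResultLike` and `FlushStats` are hypothetical names, and the package's real result type may carry more fields. The failure rate also assumes that `eventCount` includes the events that failed.

```typescript
// Only the fields used in the onFlush example above.
interface FlushResultLike {
  eventCount: number;
  failedEventIds: string[];
}

class FlushStats {
  private flushes = 0;
  private events = 0;
  private failed = 0;

  // Register with: mlCache.onFlush((result) => stats.record(result))
  record(result: FlushResultLike): void {
    this.flushes += 1;
    this.events += result.eventCount;
    this.failed += result.failedEventIds.length;
  }

  // failureRate assumes eventCount counts every event in the flush, failed ones included.
  snapshot() {
    return {
      flushes: this.flushes,
      events: this.events,
      failed: this.failed,
      failureRate: this.events === 0 ? 0 : this.failed / this.events,
    };
  }
}

const stats = new FlushStats();
stats.record({ eventCount: 100, failedEventIds: [] });
stats.record({ eventCount: 50, failedEventIds: ['evt-1', 'evt-2', 'evt-3'] });
console.log(stats.snapshot());
// flushes 2, events 150, failed 3, failureRate 0.02
```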

---

## Data Format

Events are stored in NDJSON (Newline Delimited JSON) format, which works well with:

- **Apache Spark** — Native NDJSON support
- **AWS Athena** — Query with SQL (objects in `GLACIER` or `DEEP_ARCHIVE` must be restored first)
- **Pandas** — `pd.read_json(file, lines=True)`
- **Any ML pipeline** — Simple line-by-line parsing
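Every line is a complete JSON document, so reading a batch back needs no special library. Here is a minimal sketch for the text of one downloaded batch file, after any Glacier restore. `toNdjson` and `parseNdjson` are hypothetical helpers. Skipping blank lines is an assumption about how the files end.

```typescript
// Serializes records as NDJSON: one JSON document per line, newline-terminated.
function toNdjson(records: unknown[]): string {
  return records.map((r) => JSON.stringify(r)).join('\n') + (records.length > 0 ? '\n' : '');
}

// Parses NDJSON text, skipping blank lines and reporting the line number of invalid JSON.
function parseNdjson<T = unknown>(text: string): T[] {
  const out: T[] = [];
  text.split(/\r?\n/).forEach((line, i) => {
    if (line.trim() === '') return;
    try {
      out.push(JSON.parse(line) as T);
    } catch {
      throw new SyntaxError(`Invalid JSON on line ${i + 1}`);
    }
  });
  return out;
}

const text = toNdjson([
  { eventType: 'purchase', properties: { amount: 99.99 } },
  { eventType: 'page_view', properties: {} },
]);
console.log(parseNdjson<{ eventType: string }>(text).map((e) => e.eventType));
// the two event types: 'purchase' and 'page_view'
```

Line-at-a-time parsing also means one corrupt record does not hide the rest. A pipeline can catch the error per line, skip the bad line, and log its number.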

### S3 Object Structure

```
s3://my-bucket/ml-cache-events/
├── 2024/
│   ├── 01/
│   │   ├── 15/
│   │   │   ├── 10/
│   │   │   │   ├── batch_1705312200_a1b2c3d4.ndjson
│   │   │   │   └── batch_1705312500_e5f6g7h8.ndjson
```
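The tree above suggests keys of the form `<prefix>/YYYY/MM/DD/HH/batch_<unix-seconds>_<suffix>.ndjson`. The sketch below builds keys in that form. Its assumptions come from the example, not from the package source: UTC date parts, a Unix timestamp in seconds, and a random suffix. `buildBatchKey` is hypothetical. Date-based prefixes like these make it easy to list or restore only the time range you need.

```typescript
const pad2 = (n: number): string => String(n).padStart(2, '0');

// Hypothetical key layout: <prefix>/YYYY/MM/DD/HH/batch_<unix-seconds>_<suffix>.ndjson (UTC).
function buildBatchKey(prefix: string, date: Date, suffix: string): string {
  const folder = [
    String(date.getUTCFullYear()),
    pad2(date.getUTCMonth() + 1), // getUTCMonth() is 0-based
    pad2(date.getUTCDate()),
    pad2(date.getUTCHours()),
  ].join('/');
  const unixSeconds = Math.floor(date.getTime() / 1000);
  // Normalize the prefix so '', 'events', and 'events/' all work.
  const base = prefix === '' || prefix.endsWith('/') ? prefix : `${prefix}/`;
  return `${base}${folder}/batch_${unixSeconds}_${suffix}.ndjson`;
}

console.log(buildBatchKey('ml-cache-events/', new Date('2024-01-15T10:30:00Z'), 'a1b2c3d4'));
// ml-cache-events/2024/01/15/10/batch_1705314600_a1b2c3d4.ndjson
```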

### Event Schema

Each event occupies a single line in the stored file; it is pretty-printed here for readability:

```json
{
  "eventId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "eventType": "purchase",
  "properties": {
    "productId": "SKU-123",
    "amount": 99.99
  },
  "context": {
    "user": { "userId": "user-456" }
  },
  "metadata": {
    "sdkVersion": "1.0.0",
    "sourceApp": "my-app",
    "environment": "production",
    "batchId": "batch_1705312200_a1b2c3d4"
  }
}
```
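To load these records into a typed pipeline, you can describe the schema and check each parsed line. This sketch assumes the field names in the example above. The package's own definitions in `dist/types/index.d.ts` are authoritative and may add fields or make some optional. `StoredEvent` and `isStoredEvent` are hypothetical names.

```typescript
// Shape inferred from the example record above.
interface StoredEvent {
  eventId: string;
  timestamp: string; // ISO 8601, e.g. "2024-01-15T10:30:00.000Z"
  eventType: string;
  properties: Record<string, unknown>;
  context: Record<string, unknown>;
  metadata: {
    sdkVersion: string;
    sourceApp: string;
    environment: string;
    batchId: string;
  };
}

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === 'object' && v !== null && !Array.isArray(v);

// Runtime guard for one parsed NDJSON line: checks required fields and a parseable timestamp.
function isStoredEvent(value: unknown): value is StoredEvent {
  if (!isObject(value)) return false;
  const { eventId, timestamp, eventType, properties, context, metadata } = value;
  if (typeof eventId !== 'string' || typeof eventType !== 'string') return false;
  if (typeof timestamp !== 'string' || Number.isNaN(Date.parse(timestamp))) return false;
  if (!isObject(properties) || !isObject(context) || !isObject(metadata)) return false;
  for (const key of ['sdkVersion', 'sourceApp', 'environment', 'batchId']) {
    if (typeof metadata[key] !== 'string') return false;
  }
  return true;
}

const exampleLine =
  '{"eventId":"550e8400-e29b-41d4-a716-446655440000","timestamp":"2024-01-15T10:30:00.000Z",' +
  '"eventType":"purchase","properties":{"productId":"SKU-123","amount":99.99},' +
  '"context":{"user":{"userId":"user-456"}},"metadata":{"sdkVersion":"1.0.0",' +
  '"sourceApp":"my-app","environment":"production","batchId":"batch_1705312200_a1b2c3d4"}}';
console.log(isStoredEvent(JSON.parse(exampleLine))); // true
```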

---

## AWS Setup

### IAM Policy

Create an IAM policy with minimal required permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
```

For Glacier mode, add this statement to the `Statement` array:

```json
{
  "Effect": "Allow",
  "Action": [
    "glacier:UploadArchive",
    "glacier:DescribeVault"
  ],
  "Resource": "arn:aws:glacier:*:*:vaults/your-vault-name"
}
```

### S3 Lifecycle Policy (Optional)

If you write objects with a warmer storage class such as `STANDARD` or `STANDARD_IA`, a lifecycle rule can transition them to colder storage automatically:

```json
{
  "Rules": [
    {
      "ID": "MLDataLifecycle",
      "Status": "Enabled",
      "Filter": { "Prefix": "ml-cache-events/" },
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ]
    }
  ]
}
```

---

## Best Practices

### 1. Capture Rich Context

The more context you capture now, the better your models will be:

```typescript
// Good: Rich context for future ML
await mlCache.track({
  eventType: 'product_viewed',
  properties: {
    productId: 'SKU-123',
    category: 'electronics',
    price: 299.99,
    inStock: true,
    viewDuration: 45,
    scrollDepth: 0.8,
  },
  context: {
    user: { userId: 'user-456', traits: { segment: 'high-value' } },
    page: { referrer: 'google.com', searchQuery: 'best headphones' },
    device: { deviceType: 'mobile', os: 'iOS' },
  },
});
```

### 2. Use Consistent Event Names

Establish naming conventions early:

```typescript
// Use snake_case for event types
'user_signed_up' // Good
'userSignedUp' // Avoid
'User Signed Up' // Avoid

// Be specific
'checkout_started' // Good
'checkout' // Too vague
```
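A convention is easier to keep when something checks it. Here is a sketch of a guard you could call before `track()`. `isValidEventName` and `assertEventName` are hypothetical. The regex checks only the snake_case rule, so whether a name is specific enough still needs human review.

```typescript
// snake_case: lowercase words separated by single underscores, starting with a letter.
const SNAKE_CASE_EVENT = /^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$/;

function isValidEventName(name: string): boolean {
  return SNAKE_CASE_EVENT.test(name);
}

// Hypothetical guard: returns the name unchanged, or throws so the mistake surfaces in development.
function assertEventName(name: string): string {
  if (!isValidEventName(name)) {
    throw new Error(`Event type "${name}" is not snake_case (e.g. "user_signed_up")`);
  }
  return name;
}

for (const name of ['user_signed_up', 'checkout_started', 'userSignedUp', 'User Signed Up']) {
  console.log(name, isValidEventName(name));
}
// user_signed_up true
// checkout_started true
// userSignedUp false
// User Signed Up false
```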

### 3. Graceful Shutdown

Always flush events before application exit:

```typescript
process.on('SIGTERM', async () => {
  await mlCache.shutdown();
  process.exit(0);
});
```
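The handler above covers only `SIGTERM`. Pressing Ctrl+C sends `SIGINT`, and a second signal can arrive while the first flush is still running. Here is a sketch of a guard that runs the shutdown only once, however many times it is triggered. `createShutdownHandler` is hypothetical. Its `shutdown` argument stands in for `() => mlCache.shutdown()`, and the exit code is injected so the logic can be tested without ending the process.

```typescript
// Hypothetical helper: every trigger after the first reuses the same in-flight shutdown.
function createShutdownHandler(
  shutdown: () => Promise<void>,
  exit: (code: number) => void,
): () => Promise<void> {
  let running: Promise<void> | undefined;
  return () => {
    if (running === undefined) {
      running = shutdown().then(
        () => exit(0),
        (err: unknown) => {
          console.error('Shutdown failed; queued events may not have been flushed:', err);
          exit(1);
        },
      );
    }
    return running;
  };
}

// Usage in Node.js, with mlCache from the examples above. Register with process.on (not once)
// so a second signal reaches this guard instead of Node's default terminate behavior.
// const handleSignal = createShutdownHandler(() => mlCache.shutdown(), (code) => process.exit(code));
// process.on('SIGTERM', handleSignal);
// process.on('SIGINT', handleSignal);
```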

### 4. Monitor Queue Size

Prevent memory issues in high-traffic scenarios:

```typescript
setInterval(() => {
  const queueSize = mlCache.getQueueSize();
  if (queueSize > 5000) {
    console.warn(`Queue size high: ${queueSize}`);
  }
}, 60000);
```
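How large the queue grows depends on when batches are flushed. The `batch` options (`maxSize`, `maxWaitMs`) suggest a flush when a batch is full or when the oldest queued event has waited long enough. The sketch below implements that rule with an injectable clock. It adds a hard cap that refuses new events rather than growing without bound. `BatchQueue` and `maxQueueSize` are hypothetical and do not describe how `ml-cache` is implemented.

```typescript
interface BatchOptions {
  maxSize: number;      // flush as soon as this many events are queued
  maxWaitMs: number;    // ...or once the oldest queued event is this old
  maxQueueSize: number; // hypothetical hard cap that bounds memory use
}

// Hypothetical in-memory batch queue; `now` is injectable so tests can fake time.
class BatchQueue<T> {
  private entries: { item: T; enqueuedAt: number }[] = [];
  private readonly opts: BatchOptions;
  private readonly now: () => number;

  constructor(opts: BatchOptions, now: () => number = Date.now) {
    this.opts = opts;
    this.now = now;
  }

  // Refuses the event (returns false) instead of growing past maxQueueSize.
  enqueue(item: T): boolean {
    if (this.entries.length >= this.opts.maxQueueSize) return false;
    this.entries.push({ item, enqueuedAt: this.now() });
    return true;
  }

  get size(): number {
    return this.entries.length;
  }

  // True when a full batch is queued or the oldest event has waited maxWaitMs.
  shouldFlush(): boolean {
    const oldest = this.entries[0];
    if (oldest === undefined) return false;
    return (
      this.entries.length >= this.opts.maxSize ||
      this.now() - oldest.enqueuedAt >= this.opts.maxWaitMs
    );
  }

  // Removes and returns up to maxSize events, oldest first, for one upload.
  takeBatch(): T[] {
    return this.entries.splice(0, this.opts.maxSize).map((entry) => entry.item);
  }
}

// Demo with a fake clock.
let fakeNow = 0;
const queue = new BatchQueue<string>({ maxSize: 2, maxWaitMs: 30000, maxQueueSize: 3 }, () => fakeNow);
queue.enqueue('a');
console.log(queue.shouldFlush()); // false: one event, just queued
fakeNow = 30000;
console.log(queue.shouldFlush()); // true: the oldest event has waited maxWaitMs
queue.enqueue('b');
console.log(queue.takeBatch(), queue.size); // the batch ['a', 'b'], with 0 events left
```

Refusing events bounds memory but drops data. A caller using this design should count or log each refusal so the loss shows up in monitoring.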

---

## Future ML Use Cases

The data you collect today can power tomorrow's AI features:

| Event Type | Future ML Application |
|------------|----------------------|
| `purchase` | Recommendation engine, demand forecasting |
| `page_view` | Content personalization, A/B test analysis |
| `search` | Search ranking, query understanding |
| `support_ticket` | Automated responses, sentiment analysis |
| `user_behavior` | Churn prediction, engagement scoring |
| `product_interaction` | Dynamic pricing, inventory optimization |

---

## API Reference

### MLCacheClient

| Method | Description |
|--------|-------------|
| `track(event)` | Track a custom event |
| `identify(userId, traits?)` | Identify a user with traits |
| `page(name, properties?)` | Track a page view |
| `flush()` | Manually flush the event queue |
| `getHealth()` | Get SDK health status |
| `getQueueSize()` | Get current queue size |
| `getVersion()` | Get SDK version |
| `shutdown()` | Gracefully shutdown the client |
| `onEvent(callback)` | Register event callback |
| `onError(callback)` | Register error callback |
| `onFlush(callback)` | Register flush callback |

---

## License

MIT

---

## Contributing

Contributions are welcome! Please read our contributing guidelines and submit pull requests to the GitHub repository.

---

<p align="center">
  <strong>Start collecting your ML training data today.</strong><br>
  <em>The best time to start was yesterday. The second best time is now.</em>
</p>