melaka 0.0.0

Files changed (49)
  1. package/CONTRIBUTING.md +347 -0
  2. package/LICENSE +21 -0
  3. package/README.md +57 -0
  4. package/docs/AI_PROVIDERS.md +343 -0
  5. package/docs/ARCHITECTURE.md +512 -0
  6. package/docs/CLI.md +438 -0
  7. package/docs/CONFIGURATION.md +453 -0
  8. package/docs/INTEGRATION.md +477 -0
  9. package/docs/ROADMAP.md +248 -0
  10. package/package.json +46 -0
  11. package/packages/ai/README.md +43 -0
  12. package/packages/ai/package.json +42 -0
  13. package/packages/ai/src/facade.ts +120 -0
  14. package/packages/ai/src/index.ts +34 -0
  15. package/packages/ai/src/prompt.ts +117 -0
  16. package/packages/ai/src/providers/gemini.ts +185 -0
  17. package/packages/ai/src/providers/index.ts +9 -0
  18. package/packages/ai/src/types.ts +134 -0
  19. package/packages/ai/tsconfig.json +19 -0
  20. package/packages/cli/README.md +70 -0
  21. package/packages/cli/package.json +44 -0
  22. package/packages/cli/src/cli.ts +30 -0
  23. package/packages/cli/src/commands/deploy.ts +115 -0
  24. package/packages/cli/src/commands/index.ts +9 -0
  25. package/packages/cli/src/commands/init.ts +107 -0
  26. package/packages/cli/src/commands/status.ts +73 -0
  27. package/packages/cli/src/commands/translate.ts +92 -0
  28. package/packages/cli/src/commands/validate.ts +69 -0
  29. package/packages/cli/tsconfig.json +19 -0
  30. package/packages/core/README.md +46 -0
  31. package/packages/core/package.json +50 -0
  32. package/packages/core/src/config.ts +241 -0
  33. package/packages/core/src/index.ts +111 -0
  34. package/packages/core/src/schema-generator.ts +263 -0
  35. package/packages/core/src/schemas.ts +126 -0
  36. package/packages/core/src/types.ts +481 -0
  37. package/packages/core/src/utils.ts +343 -0
  38. package/packages/core/tsconfig.json +19 -0
  39. package/packages/firestore/README.md +60 -0
  40. package/packages/firestore/package.json +48 -0
  41. package/packages/firestore/src/generator.ts +270 -0
  42. package/packages/firestore/src/i18n.ts +262 -0
  43. package/packages/firestore/src/index.ts +54 -0
  44. package/packages/firestore/src/processor.ts +245 -0
  45. package/packages/firestore/src/queue.ts +202 -0
  46. package/packages/firestore/src/task-handler.ts +164 -0
  47. package/packages/firestore/tsconfig.json +19 -0
  48. package/pnpm-workspace.yaml +2 -0
  49. package/turbo.json +31 -0
@@ -0,0 +1,512 @@
1
+ # Melaka Architecture
2
+
3
+ This document describes the system architecture for Melaka, an AI-powered localization SDK for Firebase Firestore.
4
+
5
+ ## Overview
6
+
7
+ Melaka is designed as a modular system with clear separation of concerns:
8
+
9
+ ```
10
+ ┌─────────────────────────────────────────────────────────────────────────────┐
11
+ │                               USER INTERFACE                                │
12
+ ├─────────────────────────────────────────────────────────────────────────────┤
13
+ │                                                                             │
14
+ │      melaka init    melaka deploy    melaka translate    melaka status      │
15
+ │                                                                             │
16
+ └──────────────────────────────────┬──────────────────────────────────────────┘
17
+                                    │
18
+                                    ▼
19
+ ┌─────────────────────────────────────────────────────────────────────────────┐
20
+ │                           CLI LAYER (@melaka/cli)                           │
21
+ ├─────────────────────────────────────────────────────────────────────────────┤
22
+ │                                                                             │
23
+ │    Config Loader   Trigger Generator   Migration Runner   Status Checker    │
24
+ │                                                                             │
25
+ └──────────────────────────────────┬──────────────────────────────────────────┘
26
+                                    │
27
+                     ┌──────────────┼──────────────┐
28
+                     ▼              ▼              ▼
29
+             ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
30
+             │ @melaka/core│ │@melaka/fire-│ │ @melaka/ai  │
31
+             │             │ │    store    │ │             │
32
+             │ - Config    │ │             │ │ - Gemini    │
33
+             │ - Types     │ │ - Triggers  │ │ - OpenAI    │
34
+             │ - Schemas   │ │ - i18n Ops  │ │ - Claude    │
35
+             │ - Utils     │ │ - Tasks     │ │ - Schema    │
36
+             └─────────────┘ └─────────────┘ └─────────────┘
37
+
38
+
39
+            ┌───────────────────────────────────────────────┐
40
+            │             FIREBASE / FIRESTORE              │
41
+            │                                               │
42
+            │ ┌──────────┐    ┌──────────┐    ┌──────────┐  │
43
+            │ │Collection│───▶│  i18n/   │───▶│  ms-MY   │  │
44
+            │ │   /doc   │    │          │    │  zh-CN   │  │
45
+            │ └──────────┘    └──────────┘    │  ja-JP   │  │
46
+            │                                 └──────────┘  │
47
+            └───────────────────────────────────────────────┘
48
+ ```
49
+
50
+ ## Package Structure
51
+
52
+ Melaka uses a monorepo structure with the following packages:
53
+
54
+ ```
55
+ melaka/
56
+ ├── packages/
57
+ │   ├── core/           # Config parsing, types, schemas, utilities
58
+ │   ├── firestore/      # Firestore adapter, triggers, i18n operations
59
+ │   ├── ai/             # AI provider adapters (Gemini, OpenAI, Claude)
60
+ │   └── cli/            # Command-line interface
61
+ ├── examples/
62
+ │   └── basic/          # Basic usage example
63
+ ├── docs/               # Documentation
64
+ └── scripts/            # Build and release scripts
65
+ ```
66
+
67
+ ### @melaka/core
68
+
69
+ The foundation package containing:
70
+
71
+ - **Configuration Types** — TypeScript interfaces for config files
72
+ - **Schema Definitions** — Zod schemas for validation
73
+ - **Field Type Detection** — Auto-detection of translatable vs non-translatable fields
74
+ - **Content Hashing** — SHA256 hashing for change detection
75
+ - **Glossary Management** — Shared terminology handling
76
+
77
+ ### @melaka/firestore
78
+
79
+ Firebase/Firestore-specific functionality:
80
+
81
+ - **Trigger Generator** — Creates `onDocumentWritten` triggers from config
82
+ - **i18n Operations** — Read/write to `/{doc}/i18n/{locale}` subcollections
83
+ - **Task Queue Integration** — Cloud Tasks for async translation
84
+ - **Batch Processing** — Efficient bulk translation with rate limiting
85
+ - **Collection Group Support** — Handle subcollection translation
86
+
87
+ ### @melaka/ai
88
+
89
+ AI provider adapters with unified interface:
90
+
91
+ - **Gemini Adapter** — Google's Gemini models via Genkit
92
+ - **OpenAI Adapter** — GPT-4 and GPT-3.5
93
+ - **Claude Adapter** — Anthropic's Claude models
94
+ - **Translation Facade** — Unified translation interface
95
+ - **Schema-Based Output** — Structured JSON output with Zod validation
96
+
97
+ ### @melaka/cli
98
+
99
+ Command-line interface:
100
+
101
+ - **`melaka init`** — Initialize config file in a project
102
+ - **`melaka deploy`** — Generate and deploy Firestore triggers
103
+ - **`melaka translate`** — Run manual translation for a collection
104
+ - **`melaka status`** — Check translation progress
105
+ - **`melaka retry`** — Retry failed translations
106
+
107
+ ---
108
+
109
+ ## Core Concepts
110
+
111
+ ### 1. i18n Subcollection Pattern
112
+
113
+ Translations are stored as subcollections under each document:
114
+
115
+ ```
116
+ /articles/article-123
117
+ /articles/article-123/i18n/ms-MY ← Malaysian translation
118
+ /articles/article-123/i18n/zh-CN ← Chinese translation
119
+ /articles/article-123/i18n/ja-JP ← Japanese translation
120
+ ```
121
+
122
+ Each i18n document contains:
123
+
124
+ ```typescript
125
+ {
126
+   // Translated fields
127
+   title: "Tajuk Artikel",
128
+   content: "Kandungan artikel...",
129
+
130
+   // Original non-translatable fields (copied)
131
+   author_ref: DocumentReference,
132
+   created_at: Timestamp,
133
+   view_count: 123,
134
+
135
+   // Melaka metadata
136
+   _melaka: {
137
+     source_hash: "abc123...", // Hash of source content
138
+     translated_at: Timestamp, // When translation occurred
139
+     model: "gemini-2.5-flash", // AI model used
140
+     status: "completed", // completed | failed | pending
141
+     reviewed: false, // Human review flag
142
+     error?: string // Error message if failed
143
+   }
144
+ }
145
+ ```
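As a rough illustration of the layout above, the sketch below types the `_melaka` metadata block and builds the path of the translation document for one locale. The names `MelakaMetadata` and `i18nPath` are hypothetical and are not taken from the package; `Date` stands in for a Firestore `Timestamp`.

```typescript
// Hypothetical shape of the `_melaka` metadata block described above.
// `Date` is a stand-in for Firestore's Timestamp type.
interface MelakaMetadata {
  source_hash: string;
  translated_at: Date;
  model: string;
  status: 'completed' | 'failed' | 'pending';
  reviewed: boolean;
  error?: string;
}

// Build the path of the translation document for one locale.
// Hypothetical helper: the package may build paths differently.
function i18nPath(docPath: string, locale: string): string {
  const trimmed = docPath.replace(/^\/+|\/+$/g, '');
  const segments = trimmed.split('/');
  // A Firestore document path has an even number of segments
  // (collection/doc, collection/doc/collection/doc, ...).
  if (trimmed === '' || segments.length % 2 !== 0) {
    throw new Error(`Not a document path: "${docPath}"`);
  }
  return `${trimmed}/i18n/${locale}`;
}

const exampleMetadata: MelakaMetadata = {
  source_hash: 'abc123',
  translated_at: new Date(),
  model: 'gemini-2.5-flash',
  status: 'completed',
  reviewed: false,
};

console.log(i18nPath('/articles/article-123', 'ms-MY'), exampleMetadata.status);
```

Rejecting odd-length paths keeps a collection path such as `articles` from being mistaken for a document.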
146
+
147
+ ### 2. Field Type Detection
148
+
149
+ Melaka automatically determines which fields to translate:
150
+
151
+ **Translatable (sent to AI):**
152
+ - `string` — Text content
153
+ - `string[]` — Arrays of text
154
+
155
+ **Non-Translatable (copied as-is):**
156
+ - `number`, `number[]` — Numeric values
157
+ - `boolean` — Boolean flags
158
+ - `object`, `object[]` — Complex objects
159
+ - `DocumentReference`, `DocumentReference[]` — Firestore references
160
+ - `Timestamp` — Date/time values
161
+ - `GeoPoint` — Location data
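The classification above can be sketched with plain JavaScript values. This is a minimal illustration, not the package's detection code: `classifyField`, `isTranslatableKind` and the `'unknown'` kind are hypothetical names, Firestore-specific types (`DocumentReference`, `Timestamp`, `GeoPoint`) are not modelled and would simply fall under `'object'`, and treating empty arrays as non-translatable is an assumption.

```typescript
type FieldKind =
  | 'string'
  | 'string[]'
  | 'number'
  | 'number[]'
  | 'boolean'
  | 'object'
  | 'object[]'
  | 'unknown';

// Classify a field value by its runtime type.
function classifyField(value: unknown): FieldKind {
  if (typeof value === 'string') return 'string';
  if (typeof value === 'number') return 'number';
  if (typeof value === 'boolean') return 'boolean';
  if (Array.isArray(value)) {
    // Assumption: an empty array carries no type information.
    if (value.length === 0) return 'unknown';
    if (value.every((item) => typeof item === 'string')) return 'string[]';
    if (value.every((item) => typeof item === 'number')) return 'number[]';
    return 'object[]';
  }
  if (value !== null && typeof value === 'object') return 'object';
  return 'unknown';
}

// Only text and arrays of text are sent to the AI provider.
function isTranslatableKind(kind: FieldKind): boolean {
  return kind === 'string' || kind === 'string[]';
}

console.log(classifyField('Tajuk'), classifyField([1, 2]), classifyField({ a: 1 }));
```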
162
+
163
+ ### 3. Change Detection
164
+
165
+ Before translating, Melaka creates a SHA256 hash of the source content:
166
+
167
+ ```typescript
168
+ const sourceHash = sha256(JSON.stringify(translatableContent));
169
+ ```
170
+
171
+ On subsequent updates:
172
+ 1. Compute new source hash
173
+ 2. Compare with stored `_melaka.source_hash`
174
+ 3. Skip translation if unchanged (unless `forceUpdate: true`)
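The comparison steps above can be sketched as follows. The document hashes `JSON.stringify` output directly; this sketch instead serializes with sorted keys so that field order cannot change the hash, which is a variation chosen for illustration and not confirmed package behaviour. `stableStringify`, `hashContent` and `needsTranslation` are hypothetical names.

```typescript
import { createHash } from 'node:crypto';

// Serialize with object keys sorted, so { a, b } and { b, a } produce the same string.
function stableStringify(value: unknown): string {
  if (value === undefined) return 'null';
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([key, inner]) => `${JSON.stringify(key)}:${stableStringify(inner)}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

// SHA-256 of the translatable content, as a hex string.
function hashContent(content: Record<string, unknown>): string {
  return createHash('sha256').update(stableStringify(content)).digest('hex');
}

// Translate when forced, when nothing is stored yet, or when the hash differs.
function needsTranslation(
  content: Record<string, unknown>,
  storedHash: string | undefined,
  forceUpdate = false,
): boolean {
  if (forceUpdate) return true;
  return hashContent(content) !== storedHash;
}

console.log(needsTranslation({ title: 'Hello' }, undefined));
```

Hashing only the translatable content means that changes to copied fields, such as a view counter, do not by themselves invalidate the stored hash.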
175
+
176
+ ### 4. Configuration-Driven
177
+
178
+ All behavior is defined in `melaka.config.ts`:
179
+
180
+ ```typescript
181
+ import { defineConfig } from 'melaka';
182
+
183
+ export default defineConfig({
184
+   // Target languages
185
+   languages: ['ms-MY', 'zh-CN'],
186
+
187
+   // AI provider settings
188
+   ai: {
189
+     provider: 'gemini',
190
+     model: 'gemini-2.5-flash',
191
+     temperature: 0.3,
192
+   },
193
+
194
+   // Collections to translate
195
+   collections: [
196
+     {
197
+       path: 'articles',
198
+       fields: ['title', 'content', 'summary'],
199
+       prompt: 'This is blog content. Keep the tone casual.',
200
+     },
201
+     {
202
+       path: 'products',
203
+       fields: ['name', 'description'],
204
+       glossary: {
205
+         'widget': 'widget', // Don't translate
206
+       },
207
+     },
208
+   ],
209
+
210
+   // Shared glossary
211
+   glossary: {
212
+     'company_name': 'Nama Syarikat',
213
+   },
214
+ });
215
+ ```
216
+
217
+ ---
218
+
219
+ ## Data Flow
220
+
221
+ ### Automatic Translation (Firestore Triggers)
222
+
223
+ ```
224
+ ┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
225
+ │   Client    │     │  Firestore  │     │   Trigger   │     │ Cloud Task  │
226
+ │    Write    │────▶│   Update    │────▶│    Fires    │────▶│  Enqueued   │
227
+ └─────────────┘     └─────────────┘     └─────────────┘     └──────┬──────┘
228
+                                                                    │
229
+                     ┌─────────────┐     ┌─────────────┐            │
230
+                     │    i18n     │     │     AI      │            │
231
+                     │   Updated   │◀────│ Translation │◀───────────┘
232
+                     └─────────────┘     └─────────────┘
233
+ ```
234
+
235
+ 1. Client creates/updates a document
236
+ 2. Firestore trigger (`onDocumentWritten`) fires
237
+ 3. Trigger enqueues a Cloud Task with document reference
238
+ 4. Task handler:
239
+    - Reads source document
240
+    - Separates translatable/non-translatable content
241
+    - Checks source hash for changes
242
+    - Calls AI translation API
243
+    - Writes to i18n subcollection
244
+
245
+ ### Manual Translation (CLI)
246
+
247
+ ```
248
+ ┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
249
+ │   melaka    │     │    Query    │     │   Enqueue   │     │   Process   │
250
+ │  translate  │────▶│ Collection  │────▶│    Tasks    │────▶│    Tasks    │
251
+ └─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
252
+ ```
253
+
254
+ 1. CLI queries all documents in a collection
255
+ 2. For each document, checks if translation needed
256
+ 3. Enqueues Cloud Tasks in batches
257
+ 4. Tasks process with rate limiting and retry logic
258
+
259
+ ---
260
+
261
+ ## Translation Process Detail
262
+
263
+ ### 1. Content Separation
264
+
265
+ ```typescript
266
+ function separateContent(doc: DocumentData, config: CollectionConfig) {
267
+   const translatable: Record<string, unknown> = {};
268
+   const nonTranslatable: Record<string, unknown> = {};
269
+
270
+   for (const [field, value] of Object.entries(doc)) {
271
+     const fieldType = detectFieldType(value);
272
+
273
+     if (isTranslatable(fieldType)) {
274
+       translatable[field] = value;
275
+     } else {
276
+       nonTranslatable[field] = value;
277
+     }
278
+   }
279
+
280
+   return { translatable, nonTranslatable };
281
+ }
282
+ ```
283
+
284
+ ### 2. Schema Generation
285
+
286
+ Dynamic Zod schemas are created based on field types:
287
+
288
+ ```typescript
289
+ function createTranslationSchema(translatable: Record<string, unknown>) {
290
+   const schemaFields: Record<string, ZodSchema> = {};
291
+
292
+   for (const [field, value] of Object.entries(translatable)) {
293
+     if (typeof value === 'string') {
294
+       schemaFields[field] = z.string();
295
+     } else if (Array.isArray(value) && value.every((item) => typeof item === 'string')) {
296
+       schemaFields[field] = z.array(z.string());
297
+     }
298
+   }
299
+
300
+   return z.object(schemaFields);
301
+ }
302
+ ```
303
+
304
+ ### 3. AI Translation
305
+
306
+ ```typescript
307
+ const prompt = `
308
+ Translate the following content from English to ${targetLanguage}.
309
+
310
+ Preserve:
311
+ - Exact structure and meaning
312
+ - Markdown formatting
313
+ - Proper nouns (names, brands)
314
+ - Numbers and indices
315
+
316
+ ${config.prompt || ''}
317
+
318
+ Glossary:
319
+ ${formatGlossary(config.glossary)}
320
+
321
+ Content:
322
+ ${JSON.stringify(translatableContent, null, 2)}
323
+ `;
324
+
325
+ const result = await ai.generate({
326
+   prompt,
327
+   output: { schema: translationSchema },
328
+ });
329
+ ```
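The prompt above calls `formatGlossary` without showing it. A minimal sketch of what such a helper might look like follows; the output format, the `(none)` placeholder, and the rule that a collection glossary overrides the shared glossary are all assumptions, and `mergeGlossaries` is a hypothetical name.

```typescript
type Glossary = Record<string, string>;

// Render a glossary as one "source -> target" line per term for the prompt.
// The exact format is an assumption; the document does not show it.
function formatGlossary(glossary: Glossary = {}): string {
  const entries = Object.entries(glossary);
  if (entries.length === 0) return '(none)';
  return entries
    .map(([term, translation]) => `- "${term}" → "${translation}"`)
    .join('\n');
}

// Assumption: collection-level entries take precedence over shared ones.
function mergeGlossaries(shared: Glossary = {}, collection: Glossary = {}): Glossary {
  return { ...shared, ...collection };
}

const merged = mergeGlossaries(
  { company_name: 'Nama Syarikat' },
  { widget: 'widget' },
);
console.log(formatGlossary(merged));
```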
330
+
331
+ ### 4. Content Merge & Save
332
+
333
+ ```typescript
334
+ const finalDoc = {
335
+   ...result.output, // Translated fields
336
+   ...nonTranslatableContent, // Copied fields
337
+   _melaka: {
338
+     source_hash: sourceHash,
339
+     translated_at: Timestamp.now(),
340
+     model: config.ai.model,
341
+     status: 'completed',
342
+     reviewed: false,
343
+   },
344
+ };
345
+
346
+ await doc.ref.collection('i18n').doc(targetLanguage).set(finalDoc);
347
+ ```
348
+
349
+ ---
350
+
351
+ ## Error Handling
352
+
353
+ ### Retry Strategy
354
+
355
+ Failed translations use exponential backoff:
356
+
357
+ ```typescript
358
+ {
359
+   retryConfig: {
360
+     maxAttempts: 3,
361
+     minBackoffSeconds: 60,
362
+     maxBackoffSeconds: 300,
363
+   }
364
+ }
365
+ ```
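To make the numbers above concrete, here is a simplified model of exponential backoff under that configuration. It assumes the delay doubles on each retry, is capped at `maxBackoffSeconds`, and that `maxAttempts` counts the first attempt. The real Cloud Tasks schedule has additional parameters that this sketch ignores, and `backoffSeconds` and `retryDelays` are hypothetical names.

```typescript
interface RetryConfig {
  maxAttempts: number;
  minBackoffSeconds: number;
  maxBackoffSeconds: number;
}

// Delay before the n-th retry (1-based): min * 2^(n-1), capped at max.
function backoffSeconds(retry: number, config: RetryConfig): number {
  const uncapped = config.minBackoffSeconds * 2 ** (retry - 1);
  return Math.min(config.maxBackoffSeconds, uncapped);
}

// Assumption: maxAttempts includes the first attempt, leaving maxAttempts - 1 retries.
function retryDelays(config: RetryConfig): number[] {
  const retries = Math.max(0, config.maxAttempts - 1);
  return Array.from({ length: retries }, (_, i) => backoffSeconds(i + 1, config));
}

console.log(retryDelays({ maxAttempts: 3, minBackoffSeconds: 60, maxBackoffSeconds: 300 }));
```

Under these assumptions the configuration shown yields two retries, after 60 and 120 seconds; the 300-second cap would only take effect from a fourth retry onward.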
366
+
367
+ ### Error Recording
368
+
369
+ Failed translations are recorded with error details:
370
+
371
+ ```typescript
372
+ {
373
+   _melaka: {
374
+     status: 'failed',
375
+     error: 'Rate limit exceeded',
376
+     translated_at: Timestamp.now(),
377
+     source_hash: sourceHash,
378
+   }
379
+ }
380
+ ```
381
+
382
+ ### Recovery
383
+
384
+ ```bash
385
+ # Retry all failed translations
386
+ melaka retry --collection articles --language ms-MY
387
+ ```
388
+
389
+ ---
390
+
391
+ ## Performance Considerations
392
+
393
+ ### Rate Limiting
394
+
395
+ - **Default:** 10 concurrent translation tasks
396
+ - **Staggered Execution:** Tasks scheduled with increasing delays
397
+ - **Batch Processing:** Documents processed in configurable batch sizes
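One way to picture staggered, batched scheduling is sketched below. It assumes tasks are grouped by the concurrency limit and that each later group is delayed by a fixed step; the 5-second step and the names `chunk` and `staggerDelaySeconds` are illustrative assumptions, not values or functions from the package.

```typescript
// Split items into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error('size must be at least 1');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Tasks in the same group of `concurrency` share a delay;
// each later group waits `stepSeconds` longer than the previous one.
function staggerDelaySeconds(index: number, concurrency = 10, stepSeconds = 5): number {
  return Math.floor(index / concurrency) * stepSeconds;
}

const docIds = Array.from({ length: 25 }, (_, i) => `doc-${i}`);
console.log(chunk(docIds, 10).map((batch) => batch.length));
console.log([0, 9, 10, 24].map((i) => staggerDelaySeconds(i)));
```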
398
+
399
+ ### Optimization
400
+
401
+ - **Content Hashing:** Skip unchanged documents
402
+ - **Field Filtering:** Only translate configured fields
403
+ - **Async Processing:** All translations run as Cloud Tasks
404
+
405
+ ### Monitoring
406
+
407
+ ```bash
408
+ # Check translation progress
409
+ melaka status --collection articles
410
+
411
+ # Output:
412
+ # articles → ms-MY
413
+ #   Total: 150
414
+ #   Completed: 142 (94.7%)
415
+ #   Failed: 3
416
+ #   Pending: 5
417
+ ```
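The figures in that output can be derived from per-document statuses roughly as follows. This is a sketch of the arithmetic only; `summarize` and `formatSummary` are hypothetical names, and how the CLI actually gathers statuses is not shown in this document.

```typescript
type TranslationStatus = 'completed' | 'failed' | 'pending';

interface StatusSummary {
  total: number;
  completed: number;
  failed: number;
  pending: number;
}

// Count documents per status.
function summarize(statuses: TranslationStatus[]): StatusSummary {
  const summary: StatusSummary = { total: statuses.length, completed: 0, failed: 0, pending: 0 };
  for (const status of statuses) {
    summary[status] += 1;
  }
  return summary;
}

// Render the summary in the same layout as the example output above.
function formatSummary(collection: string, locale: string, s: StatusSummary): string {
  const percent = s.total === 0 ? '0.0' : ((s.completed / s.total) * 100).toFixed(1);
  return [
    `${collection} → ${locale}`,
    `  Total: ${s.total}`,
    `  Completed: ${s.completed} (${percent}%)`,
    `  Failed: ${s.failed}`,
    `  Pending: ${s.pending}`,
  ].join('\n');
}

const sampleStatuses: TranslationStatus[] = [
  ...Array<TranslationStatus>(142).fill('completed'),
  ...Array<TranslationStatus>(3).fill('failed'),
  ...Array<TranslationStatus>(5).fill('pending'),
];
console.log(formatSummary('articles', 'ms-MY', summarize(sampleStatuses)));
```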
418
+
419
+ ---
420
+
421
+ ## Security
422
+
423
+ ### API Key Management
424
+
425
+ AI provider API keys should be stored as Firebase secrets:
426
+
427
+ ```bash
428
+ firebase functions:secrets:set GEMINI_API_KEY
429
+ firebase functions:secrets:set OPENAI_API_KEY
430
+ ```
431
+
432
+ Access in triggers:
433
+
434
+ ```typescript
435
+ import { defineSecret } from 'firebase-functions/params';
436
+ import { onTaskDispatched } from 'firebase-functions/v2/tasks';
437
+ const geminiApiKey = defineSecret('GEMINI_API_KEY');
438
+
439
+ export const translateTask = onTaskDispatched(
440
+   { secrets: [geminiApiKey] },
441
+   async (request) => {
442
+     // Use geminiApiKey.value()
443
+   }
444
+ );
445
+ ```
446
+
447
+ ### Firestore Rules
448
+
449
+ Example rules for i18n subcollections:
450
+
451
+ ```javascript
452
+ rules_version = '2';
453
+ service cloud.firestore {
454
+   match /databases/{database}/documents {
455
+     // Allow read access to translations
456
+     match /{collection}/{docId}/i18n/{locale} {
457
+       allow read: if true;
458
+       allow write: if false; // Only Cloud Functions can write
459
+     }
460
+   }
461
+ }
462
+ ```
463
+
464
+ ---
465
+
466
+ ## Future Considerations
467
+
468
+ ### Potential Features
469
+
470
+ - **Translation Memory** — Reuse translations across documents
471
+ - **Real-time Preview** — Preview translations before saving
472
+ - **Review Dashboard** — Web UI for human review workflow
473
+ - **Incremental Field Translation** — Translate only changed fields
474
+ - **Custom Validators** — User-defined validation rules
475
+ - **Webhook Notifications** — Notify external systems on completion
476
+
477
+ ### Database Adapters (Future)
478
+
479
+ The architecture is designed to support other databases:
480
+
481
+ ```
482
+ melaka/
483
+ ├── packages/
484
+ │   ├── firestore/      # Current
485
+ │   ├── supabase/       # Future
486
+ │   ├── mongodb/        # Future
487
+ │   └── planetscale/    # Future
488
+ ```
489
+
490
+ Each adapter would implement a common interface:
491
+
492
+ ```typescript
493
+ interface DatabaseAdapter {
494
+   getDocument(path: string): Promise<Document>;
495
+   setTranslation(path: string, locale: string, data: object): Promise<void>;
496
+   queryCollection(path: string): Promise<Document[]>;
497
+   onDocumentChange(path: string, handler: ChangeHandler): void;
498
+ }
499
+ ```
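To show how an adapter could satisfy this interface, here is a hypothetical in-memory implementation of the kind one might use in tests. The interface above leaves `Document` and `ChangeHandler` undefined, so the shapes below are assumptions: `SourceDocument` replaces `Document` to avoid clashing with the DOM type, and `onDocumentChange` is assumed to take a collection path. The `put` method is a test helper that is not part of the interface.

```typescript
// Assumed shapes; the document does not define them.
interface SourceDocument {
  path: string;
  data: Record<string, unknown>;
}
type ChangeHandler = (doc: SourceDocument) => void;

interface DatabaseAdapter {
  getDocument(path: string): Promise<SourceDocument>;
  setTranslation(path: string, locale: string, data: object): Promise<void>;
  queryCollection(path: string): Promise<SourceDocument[]>;
  onDocumentChange(path: string, handler: ChangeHandler): void;
}

class InMemoryAdapter implements DatabaseAdapter {
  readonly documents = new Map<string, Record<string, unknown>>();
  readonly translations = new Map<string, object>();
  private readonly handlers = new Map<string, ChangeHandler[]>();

  // Test helper (not in the interface): write a document and notify listeners.
  put(path: string, data: Record<string, unknown>): void {
    this.documents.set(path, data);
    const collection = path.split('/').slice(0, -1).join('/');
    for (const handler of this.handlers.get(collection) ?? []) {
      handler({ path, data });
    }
  }

  async getDocument(path: string): Promise<SourceDocument> {
    const data = this.documents.get(path);
    if (!data) throw new Error(`No document at ${path}`);
    return { path, data };
  }

  async setTranslation(path: string, locale: string, data: object): Promise<void> {
    this.translations.set(`${path}/i18n/${locale}`, data);
  }

  // Return only direct children of the collection, not nested subcollections.
  async queryCollection(path: string): Promise<SourceDocument[]> {
    const prefix = `${path}/`;
    return [...this.documents.entries()]
      .filter(([p]) => p.startsWith(prefix) && !p.slice(prefix.length).includes('/'))
      .map(([p, data]) => ({ path: p, data }));
  }

  onDocumentChange(path: string, handler: ChangeHandler): void {
    this.handlers.set(path, [...(this.handlers.get(path) ?? []), handler]);
  }
}
```

Keeping the translation pipeline behind an interface like this would let the same task handler run against Firestore in production and against an in-memory store in unit tests.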
500
+
501
+ ---
502
+
503
+ ## Summary
504
+
505
+ Melaka provides a complete solution for Firestore localization:
506
+
507
+ 1. **Declarative Configuration** — Simple config file defines all behavior
508
+ 2. **Automatic Translation** — Firestore triggers keep translations in sync
509
+ 3. **AI-Powered** — Leverages modern LLMs for quality translations
510
+ 4. **Production-Ready Pattern** — i18n subcollections proven in real apps
511
+ 5. **Developer Experience** — CLI for common operations
512
+ 6. **Extensible** — Modular architecture for future growth