@earth-app/collegedb 1.0.1

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 The Earth App
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,518 @@
1
+ # CollegeDB
2
+
3
+ > Cloudflare D1 Sharding Router
4
+
5
+ [![TypeScript](https://img.shields.io/badge/TypeScript-5.0+-blue.svg)](https://www.typescriptlang.org/)
6
+ [![Cloudflare Workers](https://img.shields.io/badge/cloudflare-workers-orange.svg)](https://workers.cloudflare.com/)
7
+ [![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
8
+
9
+ A TypeScript library for horizontal scaling of SQLite-style databases on Cloudflare using D1 and KV. CollegeDB simulates vertical scaling (one larger logical database) by routing each query to the correct D1 database instance using primary key mappings stored in Cloudflare KV.
10
+
11
+ ## Overview
12
+
13
+ CollegeDB provides a sharding layer on top of Cloudflare D1 databases, enabling you to:
14
+
15
+ - **Scale horizontally** across multiple D1 instances
16
+ - **Route queries automatically** based on primary keys
17
+ - **Maintain consistency** with KV-based mapping
18
+ - **Monitor and rebalance** shard distribution
19
+ - **Handle migrations** between shards seamlessly
20
+
21
+ ## 📦 Features
22
+
23
+ - **🔀 Automatic Query Routing**: Primary key → shard mapping using Cloudflare KV
24
+ - **🎯 Multiple Allocation Strategies**: Round-robin, random, or hash-based distribution
25
+ - **📊 Shard Coordination**: Durable Objects for allocation and statistics
26
+ - **🛠 Migration Support**: Move data between shards with zero downtime
27
+ - **🔄 Automatic Drop-in Replacement**: Zero-config integration with existing databases
28
+ - **🤖 Smart Migration Detection**: Automatically discovers and maps existing data
29
+ - **⚡ High Performance**: Optimized for Cloudflare Workers runtime
30
+ - **🔧 TypeScript First**: Full type safety and excellent DX
31
+
32
+ ## Installation
33
+
34
+ ```bash
35
+ bun add collegedb
36
+ # or
37
+ npm install collegedb
38
+ ```
39
+
40
+ ## Basic Usage
41
+
42
+ ```typescript
43
+ import { collegedb, createSchema, run, first } from 'collegedb';
44
+
45
+ // Initialize with your Cloudflare bindings (existing databases work automatically!)
46
+ collegedb(
47
+ {
48
+ kv: env.KV,
49
+ coordinator: env.ShardCoordinator,
50
+ shards: {
51
+ 'db-east': env['db-east'], // Can be existing DB with data
52
+ 'db-west': env['db-west'] // Can be existing DB with data
53
+ },
54
+ strategy: 'hash' // or 'round-robin', 'random'
55
+ },
56
+ async () => {
57
+ // Create schema on new shards only (existing shards auto-detected)
58
+ await createSchema(env['db-new-shard']);
59
+
60
+ // Insert data (automatically routed to appropriate shard)
61
+ await run('user-123', 'INSERT INTO users (id, name, email) VALUES (?, ?, ?)', ['user-123', 'Johnson', 'alice@example.com']);
62
+
63
+ // Query data (automatically routed to correct shard, works with existing data!)
64
+ const result = await first<User>('existing-user-456', 'SELECT * FROM users WHERE id = ?', ['existing-user-456']);
65
+
66
+ console.log(result); // User data from existing database
67
+ }
68
+ );
69
+ ```
70
+
71
+ ## Drop-in Replacement for Existing Databases
72
+
73
+ CollegeDB supports **seamless, automatic integration** with existing D1 databases that already contain data. Simply add your existing databases as shards in the configuration. CollegeDB will automatically detect existing data and create the necessary shard mappings **without requiring any manual migration steps**.
74
+
75
+ ### Requirements for Drop-in Replacement
76
+
77
+ 1. **Primary Keys**: All tables must have a primary key column (typically named `id`)
78
+ 2. **Schema Compatibility**: Tables should use standard SQLite data types
79
+ 3. **Access Permissions**: CollegeDB needs read/write access to existing databases
80
+ 4. **KV Namespace**: A Cloudflare KV namespace for storing shard mappings
81
+
82
+ ```typescript
83
+ import { collegedb, first, run } from 'collegedb';
84
+
85
+ // Add your existing databases as shards - that's it!
86
+ collegedb(
87
+ {
88
+ kv: env.KV,
89
+ shards: {
90
+ 'db-users': env.ExistingUserDB, // Your existing database with users
91
+ 'db-orders': env.ExistingOrderDB, // Your existing database with orders
92
+ 'db-new': env.NewDB // Optional new shard for growth
93
+ },
94
+ strategy: 'hash'
95
+ },
96
+ async () => {
97
+ // Existing data works immediately!
98
+ const existingUser = await first('user-from-old-db', 'SELECT * FROM users WHERE id = ?', ['user-from-old-db']);
99
+
100
+ // New data gets distributed automatically
101
+ await run('new-user-123', 'INSERT INTO users (id, name, email) VALUES (?, ?, ?)', ['new-user-123', 'New User', 'new@example.com']);
102
+ }
103
+ );
104
+ ```
105
+
106
+ **That's it!** No migration scripts, no manual mapping creation, no downtime. Your existing data is immediately accessible through CollegeDB's sharding system.
107
+
108
+ ### Manual Validation (Optional)
109
+
110
+ You can manually validate databases before integration if needed:
111
+
112
+ ```typescript
113
+ import { validateTableForSharding, listTables } from 'collegedb';
114
+
115
+ // Check database structure
116
+ const tables = await listTables(env.ExistingDB);
117
+ console.log('Found tables:', tables);
118
+
119
+ // Validate each table
120
+ for (const table of tables) {
121
+ const validation = await validateTableForSharding(env.ExistingDB, table);
122
+ if (validation.isValid) {
123
+ console.log(`✅ ${table}: ${validation.recordCount} records ready`);
124
+ } else {
125
+ console.log(`❌ ${table}: ${validation.issues.join(', ')}`);
126
+ }
127
+ }
128
+ ```
129
+
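A check like the one above has to find out whether a table has a usable primary key. The sketch below shows one way this could be done from the rows SQLite returns for `PRAGMA table_info(<table>)`. The helper names `primaryKeyColumns` and `checkTable` are hypothetical and are not part of CollegeDB, and treating composite keys as an issue is an assumption of this sketch, not documented behavior.

```typescript
// One row of `PRAGMA table_info(<table>)` output in SQLite.
interface TableInfoRow {
  cid: number;
  name: string;
  type: string;
  notnull: number;
  dflt_value: string | null;
  pk: number; // 0 if not part of the primary key, otherwise its 1-based position
}

// Hypothetical helper: returns primary key column names in key order.
function primaryKeyColumns(rows: TableInfoRow[]): string[] {
  return rows
    .filter((r) => r.pk > 0)
    .sort((a, b) => a.pk - b.pk)
    .map((r) => r.name);
}

// Hypothetical check: expects exactly one primary key column with the given name.
function checkTable(rows: TableInfoRow[], expected = 'id'): string[] {
  const issues: string[] = [];
  const pk = primaryKeyColumns(rows);
  if (pk.length === 0) {
    issues.push('no primary key column');
  } else if (pk.length > 1) {
    issues.push(`composite primary key: ${pk.join(', ')}`);
  } else if (pk[0] !== expected) {
    issues.push(`primary key is '${pk[0]}', expected '${expected}'`);
  }
  return issues;
}
```

An empty array from `checkTable` means the table passed this sketch's checks. For a table keyed by another column, pass that column name as the second argument.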
130
+ ### Manual Data Discovery (Optional)
131
+
132
+ If you want to inspect existing data before automatic migration:
133
+
134
+ ```typescript
135
+ import { discoverExistingPrimaryKeys } from 'collegedb';
136
+
137
+ // Discover all user IDs in existing users table
138
+ const userIds = await discoverExistingPrimaryKeys(env.ExistingDB, 'users');
139
+ console.log(`Found ${userIds.length} existing users`);
140
+
141
+ // Custom primary key column
142
+ const orderIds = await discoverExistingPrimaryKeys(env.ExistingDB, 'orders', 'order_id');
143
+ ```
144
+
145
+ ### Manual Integration (Optional)
146
+
147
+ For complete control over the integration process:
148
+
149
+ ```typescript
150
+ import { integrateExistingDatabase, KVShardMapper } from 'collegedb';
151
+
152
+ const mapper = new KVShardMapper(env.KV);
153
+
154
+ // Integrate your existing database
155
+ const result = await integrateExistingDatabase(
156
+ env.ExistingDB, // Your existing D1 database
157
+ 'db-primary', // Shard name for this database
158
+ mapper, // KV mapper instance
159
+ {
160
+ tables: ['users', 'posts', 'orders'], // Tables to integrate
161
+ primaryKeyColumn: 'id', // Primary key column name
162
+ strategy: 'hash', // Allocation strategy for future records
163
+ addShardMappingsTable: true, // Add CollegeDB metadata table
164
+ dryRun: false // Set true for testing
165
+ }
166
+ );
167
+
168
+ if (result.success) {
169
+ console.log(`✅ Integrated ${result.totalRecords} records from ${result.tablesProcessed} tables`);
170
+ } else {
171
+ console.error('Integration issues:', result.issues);
172
+ }
173
+ ```
174
+
175
+ After integration, initialize CollegeDB with your existing databases as shards:
176
+
177
+ ```typescript
178
+ import { initialize, first } from 'collegedb';
179
+
180
+ // Include existing databases as shards
181
+ initialize({
182
+ kv: env.KV,
183
+ coordinator: env.ShardCoordinator,
184
+ shards: {
185
+ 'db-primary': env.ExistingDB, // Your integrated existing database
186
+ 'db-secondary': env.AnotherExistingDB, // Another existing database
187
+ 'db-new': env.NewDB // Optional new shard for growth
188
+ },
189
+ strategy: 'hash'
190
+ });
191
+
192
+ // Existing data is now automatically routed!
193
+ const user = await first('existing-user-123', 'SELECT * FROM users WHERE id = ?', ['existing-user-123']);
194
+ ```
195
+
196
+ ### Complete Drop-in Example
197
+
198
+ The simplest possible integration - just add your existing databases:
199
+
200
+ ```typescript
201
+ import { initialize, first, run } from 'collegedb';
202
+
203
+ export default {
204
+ async fetch(request: Request, env: Env): Promise<Response> {
205
+ // Step 1: Initialize with existing databases (automatic migration happens here!)
206
+ initialize({
207
+ kv: env.KV,
208
+ shards: {
209
+ 'db-users': env.ExistingUserDB, // Your existing database with users
210
+ 'db-orders': env.ExistingOrderDB, // Your existing database with orders
211
+ 'db-new': env.NewDB // New shard for future growth
212
+ },
213
+ strategy: 'hash'
214
+ });
215
+
216
+ // Step 2: Use existing data immediately - no migration needed!
217
+ // Supports typed queries, inserts, updates, deletes, etc.
218
+ const existingUser = await first<User>('user-from-old-db', 'SELECT * FROM users WHERE id = ?', ['user-from-old-db']);
219
+
220
+ // Step 3: New data gets distributed automatically
221
+ await run('new-user-123', 'INSERT INTO users (id, name, email) VALUES (?, ?, ?)', ['new-user-123', 'New User', 'new@example.com']);
222
+
223
+ return new Response(
224
+ JSON.stringify({
225
+ existingUser,
226
+ message: 'Automatic drop-in replacement successful!'
227
+ })
228
+ );
229
+ }
230
+ };
231
+ ```
232
+
233
+ ### Manual Integration Example
234
+
235
+ If your tables use different primary key column names:
236
+
237
+ ```typescript
238
+ import { discoverExistingPrimaryKeys } from 'collegedb';
+
+ // For tables with custom primary key columns
239
+ const productIds = await discoverExistingPrimaryKeys(env.ProductDB, 'products', 'product_id');
240
+ const sessionIds = await discoverExistingPrimaryKeys(env.SessionDB, 'sessions', 'session_key');
241
+ ```
242
+
243
+ Integrate only specific tables from existing databases:
244
+
245
+ ```typescript
246
+ const result = await integrateExistingDatabase(env.ExistingDB, 'db-legacy', mapper, {
247
+ tables: ['users', 'orders'] // Only integrate these tables
248
+ // Skip 'temp_logs', 'cache_data', etc.
249
+ });
250
+ ```
251
+
252
+ Test integration without making changes:
253
+
254
+ ```typescript
255
+ const testResult = await integrateExistingDatabase(env.ExistingDB, 'db-test', mapper, {
256
+ dryRun: true // No actual mappings created
257
+ });
258
+
259
+ console.log(`Would process ${testResult.totalRecords} records from ${testResult.tablesProcessed} tables`);
260
+ ```
261
+
262
+ ### Performance Impact
263
+
264
+ - **One-time Setup**: Migration detection runs once per shard
265
+ - **Minimal Overhead**: Only scans table metadata and sample records
266
+ - **Cached Results**: Subsequent operations have no migration overhead
267
+ - **Async Processing**: Doesn't block application startup or queries
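
The "one-time setup" and "cached results" points can be pictured as a small in-memory cache keyed by shard name. The sketch below is illustrative only: `detectOnce` and `resetDetectionCache` are hypothetical names, and how CollegeDB actually stores its migration cache is not described here.

```typescript
// Hypothetical sketch of a "check once per shard" cache; names are illustrative.
const checkedShards = new Set<string>();

// Runs `detect` the first time a shard is seen and skips it afterwards.
// Returns true when detection actually ran.
function detectOnce(shard: string, detect: (shard: string) => void): boolean {
  if (checkedShards.has(shard)) return false; // already checked, no extra work
  detect(shard);
  checkedShards.add(shard);
  return true;
}

// Forgetting the cache forces detection to run again on the next call.
function resetDetectionCache(): void {
  checkedShards.clear();
}
```

With a cache like this, only the first operation against a shard pays for detection, and clearing the cache brings that cost back once.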
268
+
+ If the integration needs to be undone, the mappings or the migration cache can be cleared:
+
269
+ ```typescript
270
+ // Simple rollback - clear all mappings
271
+ import { KVShardMapper } from 'collegedb';
272
+ const mapper = new KVShardMapper(env.KV);
273
+ await mapper.clearAllMappings(); // Returns to pre-migration state
274
+
275
+ // Or clear cache to force re-detection
276
+ import { clearMigrationCache } from 'collegedb';
277
+ clearMigrationCache(); // Forces fresh migration check
278
+ ```
279
+
280
+ ## Troubleshooting
281
+
282
+ ### Tables without Primary Keys
283
+
284
+ ```typescript
285
+ // Error: Primary key column 'id' not found
286
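+ // Solution: SQLite cannot add a PRIMARY KEY with ALTER TABLE ... ADD COLUMN, so rebuild the table (the `name` column is illustrative)
+ await db.batch([
+ db.prepare(`CREATE TABLE legacy_table_new (id TEXT PRIMARY KEY, name TEXT)`),
+ db.prepare(`INSERT INTO legacy_table_new (id, name) SELECT CAST(rowid AS TEXT), name FROM legacy_table`),
+ db.prepare(`DROP TABLE legacy_table`),
+ db.prepare(`ALTER TABLE legacy_table_new RENAME TO legacy_table`)
+ ]);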
288
+ ```
289
+
290
+ ### Large Database Integration
291
+
292
+ ```typescript
293
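+ import { integrateExistingDatabase, listTables, KVShardMapper } from 'collegedb';
+
+ const mapper = new KVShardMapper(env.KV);
+
+ // For very large databases, integrate in batches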
294
+ const allTables = await listTables(env.LargeDB);
295
+ const batchSize = 2;
296
+
297
+ for (let i = 0; i < allTables.length; i += batchSize) {
298
+ const batch = allTables.slice(i, i + batchSize);
299
+ await integrateExistingDatabase(env.LargeDB, 'db-large', mapper, {
300
+ tables: batch
301
+ });
302
+ }
303
+ ```
304
+
305
+ ### Mixed Primary Key Column Names
306
+
307
+ ```typescript
308
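+ import { createMappingsForExistingKeys, discoverExistingPrimaryKeys, KVShardMapper } from 'collegedb';
+
+ const mapper = new KVShardMapper(env.KV);
+
+ // Handle different primary key column names per table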
309
+ const customIntegration = {
310
+ users: 'user_id',
311
+ orders: 'order_number',
312
+ products: 'sku'
313
+ };
314
+
315
+ for (const [table, pkColumn] of Object.entries(customIntegration)) {
316
+ const keys = await discoverExistingPrimaryKeys(env.DB, table, pkColumn);
317
+ await createMappingsForExistingKeys(keys, ['db-shard1'], 'hash', mapper);
318
+ }
319
+ ```
320
+
321
+ ## 📚 API Reference
322
+
323
+ | Function | Description | Parameters |
324
+ | ------------------------------ | -------------------------------------------- | ------------------------ |
325
+ | `collegedb(config, callback)` | Initialize CollegeDB, then run a callback | `CollegeDBConfig, ()=>T` |
326
+ | `initialize(config)` | Initialize CollegeDB with configuration | `CollegeDBConfig` |
327
+ | `createSchema(d1)` | Create database schema on a D1 instance | `D1Database` |
328
+ | `prepare(key, sql)` | Prepare a SQL statement for execution | `string, string` |
329
+ | `run(key, sql, bindings)` | Execute a SQL query with primary key routing | `string, string, any[]` |
330
+ | `first(key, sql, bindings)` | Execute a SQL query and return first result | `string, string, any[]` |
331
+ | `all(key, sql, bindings)` | Execute a SQL query and return all results | `string, string, any[]` |
332
+ | `reassignShard(key, newShard)` | Move primary key to different shard | `string, string` |
333
+ | `listKnownShards()` | Get list of available shards | `void` |
334
+ | `getShardStats()` | Get statistics for all shards | `void` |
335
+
336
+ ### Drop-in Replacement Functions
337
+
338
+ | Function | Description | Parameters |
339
+ | ----------------------------------------- | ------------------------------------------------------- | ------------------------------ |
340
+ | `autoDetectAndMigrate(d1, shard, config)` | **NEW**: Automatically detect and migrate existing data | `D1Database, string, config` |
341
+ | `checkMigrationNeeded(d1, shard, config)` | **NEW**: Check if database needs migration | `D1Database, string, config` |
342
+ | `validateTableForSharding(d1, table)` | Check if table is suitable for sharding | `D1Database, string` |
343
+ | `discoverExistingPrimaryKeys(d1, table)` | Find all primary keys in existing table | `D1Database, string, string?` |
344
+ | `integrateExistingDatabase(d1, shard)` | Complete drop-in integration of existing DB | `D1Database, string, mapper, options` |
345
+ | `createMappingsForExistingKeys(keys)` | Create shard mappings for existing keys | `string[], string[], strategy, mapper` |
346
+ | `listTables(d1)` | Get list of tables in database | `D1Database` |
347
+ | `clearMigrationCache()` | Clear automatic migration cache | `void` |
348
+
349
+ ## 🏗 Architecture
350
+
351
+ ```txt
352
+ ┌─────────────────────────────────────────────────────────────┐
353
+ │ Cloudflare Worker │
354
+ ├─────────────────────────────────────────────────────────────┤
355
+ │ CollegeDB Router │
356
+ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
357
+ │ │ KV │ │ Durable │ │ Query Router │ │
358
+ │ │ Mappings │ │ Objects │ │ │ │
359
+ │ │ │ │ (Optional) │ │ │ │
360
+ │ └─────────────┘ └─────────────┘ └─────────────────────┘ │
361
+ ├─────────────────────────────────────────────────────────────┤
362
+ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
363
+ │ │ D1 East │ │ D1 West │ │ D1 Central │ │
364
+ │ │ Shard │ │ Shard │ │ Shard │ │
365
+ │ │ │ │ │ │ (Optional) │ │
366
+ │ └─────────────┘ └─────────────┘ └─────────────────────┘ │
367
+ └─────────────────────────────────────────────────────────────┘
368
+ ```
369
+
370
+ ### Data Flow
371
+
372
+ 1. **Query Received**: Application sends query with primary key
373
+ 2. **Shard Resolution**: CollegeDB checks KV for existing mapping or allocates new shard
374
+ 3. **Query Execution**: SQL executed on appropriate D1 database
375
+ 4. **Response**: Results returned to application
376
+
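The four steps above can be sketched with in-memory stand-ins. In this sketch a `Map` plays the role of the KV mapping store and each shard is a plain in-memory table, not a D1 database. `SketchRouter` and its methods are hypothetical names used for illustration and are not CollegeDB's implementation.

```typescript
// Hypothetical stand-ins: a Map replaces the KV mapping store,
// and each shard is an in-memory table instead of a D1 database.
type Row = { id: string; name: string };
type Allocator = (key: string) => string;

class SketchRouter {
  private readonly mappings = new Map<string, string>(); // primary key -> shard name
  private readonly shards = new Map<string, Map<string, Row>>();
  private readonly allocate: Allocator;

  constructor(shardNames: string[], allocate: Allocator) {
    this.allocate = allocate;
    for (const name of shardNames) this.shards.set(name, new Map<string, Row>());
  }

  // Step 2: reuse the stored mapping, or allocate a shard and remember it.
  resolve(key: string): string {
    let shard = this.mappings.get(key);
    if (shard === undefined) {
      shard = this.allocate(key);
      this.mappings.set(key, shard);
    }
    return shard;
  }

  // Steps 3 and 4: run the operation on the resolved shard and return the result.
  insert(row: Row): void {
    this.table(this.resolve(row.id)).set(row.id, row);
  }

  find(id: string): Row | undefined {
    return this.table(this.resolve(id)).get(id);
  }

  private table(shard: string): Map<string, Row> {
    const t = this.shards.get(shard);
    if (!t) throw new Error(`unknown shard: ${shard}`);
    return t;
  }
}

// Step 1: the application supplies the primary key with every operation.
const router = new SketchRouter(['db-east', 'db-west'], (key) => (key.length % 2 === 0 ? 'db-east' : 'db-west'));
router.insert({ id: 'user-123', name: 'Alice' });
console.log(router.find('user-123'));
```

Because the mapping is stored on first use, later reads for the same key go to the same shard even if the allocation function would now choose differently.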
377
+ ### Shard Allocation Strategies
378
+
379
+ - **Hash**: Consistent hashing for deterministic shard selection
380
+ - **Round-Robin**: Evenly distribute new keys across shards
381
+ - **Random**: Random shard selection for load balancing
382
+
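As a rough illustration of how the three strategies differ, the sketch below picks a shard name for a new key. It assumes a plain FNV-1a hash with modulo for the hash strategy, which is simpler than a true consistent-hashing ring. `fnv1a` and `makeAllocator` are hypothetical helpers and not part of the CollegeDB API.

```typescript
// Illustrative only: these helpers are hypothetical and not part of the CollegeDB API.
type Strategy = 'hash' | 'round-robin' | 'random';

// FNV-1a 32-bit hash: small, deterministic, no dependencies.
function fnv1a(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash;
}

// Returns a function that picks a shard name for each new key.
function makeAllocator(strategy: Strategy, shards: string[]): (key: string) => string {
  if (shards.length === 0) throw new Error('at least one shard is required');
  let next = 0; // cursor used only by round-robin
  return (key: string): string => {
    switch (strategy) {
      case 'hash':
        return shards[fnv1a(key) % shards.length];
      case 'round-robin':
        return shards[next++ % shards.length];
      case 'random':
        return shards[Math.floor(Math.random() * shards.length)];
    }
  };
}

const pick = makeAllocator('hash', ['db-east', 'db-west']);
console.log(pick('user-123') === pick('user-123')); // same key, same shard
```

With hash-modulo, adding or removing a shard changes the result for many keys. Storing each key's shard after the first allocation keeps existing keys where they are regardless of the strategy used for new keys.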
383
+ ## 🌐 Cloudflare Setup
384
+
385
+ ### 1. Create D1 Databases
386
+
387
+ ```bash
388
+ # Create multiple D1 databases for sharding
389
+ wrangler d1 create collegedb-east
390
+ wrangler d1 create collegedb-west
391
+ wrangler d1 create collegedb-central
392
+ ```
393
+
394
+ ### 2. Create KV Namespace
395
+
396
+ ```bash
397
+ # Create KV namespace for shard mappings
398
+ wrangler kv namespace create "KV"
399
+ ```
400
+
401
+ ### 3. Configure wrangler.toml
402
+
403
+ ```toml
404
+ [[d1_databases]]
405
+ binding = "db-east"
406
+ database_name = "collegedb-east"
407
+ database_id = "your-database-id"
408
+
409
+ [[d1_databases]]
410
+ binding = "db-west"
411
+ database_name = "collegedb-west"
412
+ database_id = "your-database-id"
413
+
414
+ [[kv_namespaces]]
415
+ binding = "KV"
416
+ id = "your-kv-namespace-id"
417
+
418
+ [[durable_objects.bindings]]
419
+ name = "ShardCoordinator"
420
+ class_name = "ShardCoordinator"
421
+ ```
422
+
423
+ ### 4. Deploy
424
+
425
+ ```bash
426
+ # Deploy to Cloudflare Workers
427
+ wrangler deploy
428
+
429
+ # Deploy with environment
430
+ wrangler deploy --env production
431
+ ```
432
+
433
+ ## 📊 Monitoring and Maintenance
434
+
435
+ ### Shard Statistics
436
+
437
+ ```typescript
438
+ import { getShardStats, listKnownShards } from 'collegedb';
439
+
440
+ // Get detailed statistics
441
+ const stats = await getShardStats();
442
+ console.log(stats);
443
+ // [
444
+ // { binding: 'db-east', count: 1542 },
445
+ // { binding: 'db-west', count: 1458 }
446
+ // ]
447
+
448
+ // List available shards
449
+ const shards = await listKnownShards();
450
+ console.log(shards); // ['db-east', 'db-west']
451
+ ```
452
+
453
+ ### Shard Rebalancing
454
+
455
+ ```typescript
456
+ import { reassignShard } from 'collegedb';
457
+
458
+ // Move a primary key to a different shard
459
+ await reassignShard('user-123', 'db-west');
460
+ ```
461
+
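Moving a key safely usually involves more than changing its mapping, because the row itself has to exist on the target shard. The sketch below models one possible ordering with in-memory data. `moveKey` and `ShardData` are hypothetical, and whether `reassignShard` copies rows or only updates the mapping is not stated here, so this is a model of the idea and not a description of the library.

```typescript
// Hypothetical in-memory model: shard name -> (primary key -> row payload).
type ShardData = Map<string, Map<string, string>>;

// Moves one key: copy to the target first, then switch the mapping, then clean up.
function moveKey(data: ShardData, mappings: Map<string, string>, key: string, target: string): void {
  const source = mappings.get(key);
  if (source === undefined) throw new Error(`no mapping for ${key}`);
  if (source === target) return; // nothing to do

  const from = data.get(source);
  const to = data.get(target);
  if (!from || !to) throw new Error('unknown shard');

  const row = from.get(key);
  if (row !== undefined) to.set(key, row); // 1. copy so the target can serve reads
  mappings.set(key, target); // 2. route future queries to the target
  from.delete(key); // 3. remove the stale copy from the source
}
```

Copying before switching the mapping means a read that arrives mid-move still finds the row on whichever shard the mapping points to.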
462
+ ### Health Monitoring
463
+
464
+ Monitor your CollegeDB deployment by tracking:
465
+
466
+ - **Shard distribution balance**
467
+ - **Query latency per shard**
468
+ - **Error rates and failed queries**
469
+ - **KV operation metrics**
470
+
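Shard distribution balance can be reduced to a single number for alerting. The sketch below assumes stats shaped like the `getShardStats()` example output shown earlier (`binding` and `count`). `imbalanceRatio`, `isSkewed`, and the 1.25 threshold are hypothetical choices for illustration.

```typescript
// Shape taken from the getShardStats() example output.
interface ShardStat {
  binding: string;
  count: number;
}

// Ratio of the fullest shard to the mean shard size; 1.0 means perfectly even.
function imbalanceRatio(stats: ShardStat[]): number {
  if (stats.length === 0) return 1;
  const total = stats.reduce((sum, s) => sum + s.count, 0);
  if (total === 0) return 1;
  const mean = total / stats.length;
  const max = Math.max(...stats.map((s) => s.count));
  return max / mean;
}

// Hypothetical threshold: flag when one shard holds 25% more than its fair share.
function isSkewed(stats: ShardStat[], threshold = 1.25): boolean {
  return imbalanceRatio(stats) > threshold;
}
```

A ratio that keeps climbing over time suggests that some keys should be moved to a less loaded shard.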
471
+ ## 🔧 Advanced Configuration
472
+
473
+ ### Custom Allocation Strategy
474
+
475
+ ```typescript
476
+ initialize({
477
+ kv: env.KV,
478
+ shards: { 'db-east': env['db-east'], 'db-west': env['db-west'] },
479
+ strategy: 'hash' // Shard selection based on primary key hash
480
+ });
481
+ ```
482
+
483
+ ### Environment-Specific Setup
484
+
485
+ ```typescript
486
+ const config = {
487
+ kv: env.KV,
488
+ shards: env.NODE_ENV === 'production' ? { 'db-prod-1': env['db-prod-1'], 'db-prod-2': env['db-prod-2'] } : { 'db-dev': env['db-dev'] },
489
+ strategy: 'round-robin' // Shard selection is evenly distributed, regardless of size
490
+ };
491
+
492
+ initialize(config);
493
+ ```
494
+
495
+ ## 🤝 Contributing
496
+
497
+ 1. Fork the repository
498
+ 2. Create a feature branch: `git checkout -b feature/amazing-feature`
499
+ 3. Commit changes: `git commit -m 'Add amazing feature'`
500
+ 4. Push to branch: `git push origin feature/amazing-feature`
501
+ 5. Submit a pull request
502
+
503
+ ## 📝 License
504
+
505
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
506
+
507
+ ## 🔗 Links
508
+
509
+ - [Cloudflare D1 Documentation](https://developers.cloudflare.com/d1/)
510
+ - [Cloudflare KV Documentation](https://developers.cloudflare.com/kv/)
511
+ - [Cloudflare Workers Documentation](https://developers.cloudflare.com/workers/)
512
+ - [Durable Objects Documentation](https://developers.cloudflare.com/durable-objects/)
513
+
514
+ ## 🆘 Support
515
+
516
+ - 📖 [Documentation](https://earth-app.github.io/CollegeDB)
517
+ - 🐛 [Report Issues](https://github.com/earth-app/CollegeDB/issues)
518
+ - 💬 [Discussions](https://github.com/earth-app/CollegeDB/discussions)