npm - bun-sqlite-for-rxdb - Versions diffs - 1.0.1 - Mend

bun-sqlite-for-rxdb 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (68) hide show

package/.serena/project.yml +84 -0
package/CHANGELOG.md +300 -0
package/LICENSE +21 -0
package/README.md +87 -0
package/ROADMAP.md +532 -0
package/benchmarks/benchmark.ts +145 -0
package/benchmarks/case-insensitive-10runs.ts +156 -0
package/benchmarks/fts5-1m-scale.ts +126 -0
package/benchmarks/fts5-before-after.ts +104 -0
package/benchmarks/indexed-benchmark.ts +141 -0
package/benchmarks/new-operators-benchmark.ts +140 -0
package/benchmarks/query-builder-benchmark.ts +88 -0
package/benchmarks/query-builder-consistency.ts +109 -0
package/benchmarks/raw-better-sqlite3-10m.ts +85 -0
package/benchmarks/raw-better-sqlite3.ts +86 -0
package/benchmarks/raw-bun-sqlite-10m.ts +85 -0
package/benchmarks/raw-bun-sqlite.ts +86 -0
package/benchmarks/regex-10runs-all.ts +216 -0
package/benchmarks/regex-comparison-benchmark.ts +161 -0
package/benchmarks/regex-real-comparison.ts +213 -0
package/benchmarks/run-10x.sh +19 -0
package/benchmarks/smart-regex-benchmark.ts +148 -0
package/benchmarks/sql-vs-mingo-benchmark.ts +210 -0
package/benchmarks/sql-vs-mingo-comparison.ts +175 -0
package/benchmarks/text-vs-jsonb.ts +167 -0
package/benchmarks/wal-benchmark.ts +112 -0
package/docs/architectural-patterns.md +1336 -0
package/docs/id1-testsuite-journey.md +839 -0
package/docs/official-test-suite-setup.md +393 -0
package/nul +0 -0
package/package.json +44 -0
package/src/changestream.test.ts +182 -0
package/src/cleanup.test.ts +110 -0
package/src/collection-isolation.test.ts +74 -0
package/src/connection-pool.test.ts +102 -0
package/src/connection-pool.ts +38 -0
package/src/findDocumentsById.test.ts +122 -0
package/src/index.ts +2 -0
package/src/instance.ts +382 -0
package/src/multi-instance-events.test.ts +204 -0
package/src/query/and-operator.test.ts +39 -0
package/src/query/builder.test.ts +96 -0
package/src/query/builder.ts +154 -0
package/src/query/elemMatch-operator.test.ts +24 -0
package/src/query/exists-operator.test.ts +28 -0
package/src/query/in-operators.test.ts +54 -0
package/src/query/mod-operator.test.ts +22 -0
package/src/query/nested-query.test.ts +198 -0
package/src/query/not-operators.test.ts +49 -0
package/src/query/operators.test.ts +70 -0
package/src/query/operators.ts +185 -0
package/src/query/or-operator.test.ts +68 -0
package/src/query/regex-escaping-regression.test.ts +43 -0
package/src/query/regex-operator.test.ts +44 -0
package/src/query/schema-mapper.ts +27 -0
package/src/query/size-operator.test.ts +22 -0
package/src/query/smart-regex.ts +52 -0
package/src/query/type-operator.test.ts +37 -0
package/src/query-cache.test.ts +286 -0
package/src/rxdb-helpers.test.ts +348 -0
package/src/rxdb-helpers.ts +262 -0
package/src/schema-version-isolation.test.ts +126 -0
package/src/statement-manager.ts +69 -0
package/src/storage.test.ts +589 -0
package/src/storage.ts +21 -0
package/src/types.ts +14 -0
package/test/rxdb-test-suite.ts +27 -0
package/tsconfig.json +31 -0

package/docs/architectural-patterns.md ADDED Viewed

@@ -0,0 +1,1336 @@
+# Architectural Patterns
+Key design patterns and decisions for `bun-sqlite-for-rxdb` development.
+## Quick Reference
+| Pattern | Version | Key Benefit |
+|---------|---------|-------------|
+| Zero `any` types | v0.1.1 | Type safety |
+| Pure functions | v0.3.0 | Testability |
+| Benchmark-driven | v0.2.0 | Performance |
+| Incremental tests | v0.3.0 | Isolation |
+| WAL mode | v0.1.2 | 3-6x speedup |
+| 409 conflicts | v0.2.0 | Replication |
+| RxDB API alignment | v0.3.0 | Partial success |
+| Recursive builder | v0.3.0 | Nested queries |
+| NULL handling | v0.3.0 | Correctness |
+| Minimal code | All | Maintainability |
+| SQL vs Mingo hybrid | v0.3.0 | Right tool for job |
+| Smart regex optimization | v0.3.0+ | 2.03x for exact matches |
+| FTS5 NOT worth it | v0.3.0+ | 1.79x slower at 100k scale |
+| JSONB storage | v0.3.0+ | 1.57x faster complex queries |
+| Storage layer architecture | v0.3.0+ | Return ALL documents |
+| Bun console.log issue | v0.3.0+ | Use console.error |
+| Connection pooling | v0.3.0+ | Multi-instance support |
+| Official multi-instance | v0.3.0+ | Use RxDB's implementation |
+| Composite primary key | v0.3.0+ | Handle both formats |
+| Test at right level | v0.3.0+ | Interface not implementation |
+| Bun test compatibility | v0.3.0+ | Mocha through Bun |
+| Query builder cache | v0.3.0+ | 5.2-57.9x speedup |
+| Performance timing | v0.3.0+ | hrtime.bigint() |
+| Cache lifecycle | v0.3.0+ | Global with LRU |
+| **Attachments support** | **v1.0.0** | **Separate table + digest validation** |
+| **RxDB helper functions** | **v1.0.0** | **Battle-tested conflict detection** |
+| **bulkWrite refactoring** | **v1.0.0** | **Cleaner architecture** |
+---
+**Last updated:** v1.0.0 (2026-02-23)
+---
+## 1. Zero `any` Types Policy
+**Rule:** Never use TypeScript `any` types.
+**Why:**
+- Type safety catches bugs at compile time
+- Better IDE autocomplete and refactoring
+- Self-documenting code
+**Example:**
+```typescript
+// ❌ Bad
+function query(selector: any): any { ... }
+// ✅ Good
+function query<RxDocType>(
+  selector: MangoQuerySelector<RxDocType>
+): RxDocumentData<RxDocType>[] { ... }
+```
+**History:** v0.1.1 removed all 32 `any` instances.
+---
+## 2. DRY Architecture - Pure Functions
+**Rule:** Prefer pure functions over classes/state.
+**Why:**
+- Easier to test (no setup/teardown)
+- Composable and reusable
+- No hidden dependencies
+**Example:**
+```typescript
+// ✅ Pure operator functions
+export function translateEq(field: string, value: unknown): SqlFragment {
+  return { sql: `${field} = ?`, args: [value] };
+}
+export function translateIn(field: string, values: unknown[]): SqlFragment {
+  if (values.length === 0) return { sql: '1=0', args: [] };
+  const placeholders = values.map(() => '?').join(', ');
+  return { sql: `${field} IN (${placeholders})`, args: values };
+}
+```
+**History:** v0.3.0 query builder uses pure operator functions.
+---
+## 3. Performance-First Decisions
+**Rule:** Benchmark before choosing implementation.
+**Why:**
+- Avoid premature optimization
+- Data-driven decisions
+- Document tradeoffs
+**Example:** Serialization format choice (v0.2.0)
+```
+Tested 3 formats (10k docs, 10 runs):
+- JSON + TEXT: 23.40ms ✅ WINNER
+- MessagePack: 137ms (5.6x slower)
+- bun:jsc: 37ms (1.58x slower)
+Verdict: Bun's SIMD-accelerated JSON is fastest
+```
+**History:** v0.2.0 extensively tested binary formats before choosing JSON.
+---
+## 4. Incremental Testing
+**Rule:** Test each feature independently before integration.
+**Why:**
+- Isolates failures
+- Faster debugging
+- Prevents regression
+**Example:** v0.3.0 operator tests
+```
+src/query/operators.test.ts      - 6 tests (basic operators)
+src/query/in-operators.test.ts   - 8 tests ($in, $nin)
+src/query/and-operator.test.ts   - 2 tests ($and)
+src/query/or-operator.test.ts    - 3 tests ($or)
+src/query/builder.test.ts        - 10 tests (integration)
+```
+**History:** v0.3.0 added 13 new tests for 4 operators.
+---
+## 5. WAL Mode for File Databases
+**Rule:** Auto-enable WAL mode for file-based SQLite databases.
+**Why:**
+- 3-6x write speedup
+- Better concurrency (readers don't block writers)
+- Industry standard for production SQLite
+**Implementation:**
+```typescript
+if (databaseName !== ':memory:') {
+  this.db.exec('PRAGMA journal_mode = WAL');
+}
+```
+**Benchmark Results** (`benchmarks/wal-benchmark.ts`):
+```
+1000 document inserts, 5 runs each:
+- WITHOUT WAL: 5.73ms average
+- WITH WAL:    2.40ms average
+- Speedup:     2.39x (in-memory DB)
+Note: File-based databases show 3-6x speedup due to disk I/O benefits
+```
+**History:** v0.1.2 added WAL mode with auto-detection. v0.3.0 added benchmark verification.
+---
+## 6. Conflict Detection with 409 Errors
+**Rule:** Return 409 status with `documentInDb` for UNIQUE constraint violations.
+**Why:**
+- Enables proper RxDB replication conflict resolution
+- Follows HTTP semantics (409 = Conflict)
+- Provides existing document for merge strategies
+**Implementation:**
+```typescript
+catch (error: any) {
+  if (error.message?.includes('UNIQUE constraint failed')) {
+    const existing = this.db.query(
+      'SELECT data FROM documents WHERE id = ?'
+    ).get(doc.id);
+    return {
+      status: 409,
+      documentInDb: JSON.parse(existing.data),
+      writeRow: doc
+    };
+  }
+}
+```
+**History:** v0.2.0 added conflict detection for concurrent writes.
+---
+## 7. RxDB API Alignment - Partial Success Pattern
+**Rule:** RxDB's `bulkWrite` expects per-document error handling, NOT atomic transactions.
+**Why:**
+- RxDB's API design: `RxStorageBulkWriteResponse<T> = { error: RxStorageWriteError<T>[] }`
+- Official docs: "A single write operation to a document is the only atomic thing you can do in RxDB"
+- Designed for offline-first scenarios where full ACID across clients is impossible
+- Performance: Only errors are returned, successes are inferred (input - errors = success)
+**RxDB's Contract:**
+```typescript
+// RxDB expects this response structure
+type RxStorageBulkWriteResponse<RxDocType> = {
+  error: RxStorageWriteError<RxDocType>[];  // Only errors!
+};
+// Success = document NOT in error array
+// Each document can succeed or fail independently
+```
+**Implementation:**
+```typescript
+async bulkWrite(documentWrites, context) {
+  const error: RxStorageWriteError<RxDocType>[] = [];
+  for (const write of documentWrites) {
+    try {
+      stmt.run(id, data, deleted, rev, mtime_ms);
+    } catch (err) {
+      if (err.message?.includes('UNIQUE constraint failed')) {
+        error.push({
+          status: 409,
+          documentId: write.document.id,
+          writeRow: write,
+          documentInDb: existingDoc
+        });
+      } else {
+        throw err;
+      }
+    }
+  }
+  return { error };  // Partial success allowed
+}
+```
+**Critical Understanding:**
+- ✅ Some documents succeed, some fail with 409 → CORRECT
+- ❌ All-or-nothing atomic transactions → VIOLATES RxDB API
+- ✅ Conflicts return 409 per document → EXPECTED
+- ✅ Other documents continue processing → REQUIRED
+**History:** v0.3.0 verified implementation matches RxDB's API contract (researched via node_modules + official docs).
+---
+## 8. Recursive Query Builder with Depth Tracking
+**Rule:** Track logical depth for proper parentheses in nested queries.
+**Why:**
+- Correct SQL precedence for `$or` / `$and`
+- Handles arbitrary nesting depth
+- Clean, maintainable code
+**Implementation:**
+```typescript
+function processSelector(
+  selector: MangoQuerySelector<any>,
+  logicalDepth: number = 0
+): SqlFragment {
+  if (selector.$or) {
+    const fragments = selector.$or.map(s =>
+      processSelector(s, logicalDepth + 1)
+    );
+    const needsParens = logicalDepth > 0;
+    // Wrap in parentheses if nested
+  }
+}
+```
+**Test Coverage** (`src/query/nested-query.test.ts`):
+```
+7 comprehensive tests covering:
+- 3-level nesting ($or inside $and inside $or)
+- 4-level nesting with mixed operators
+- Proper parentheses placement at each depth
+- Complex combinations ($in, $nin, $and, $or)
+```
+**History:** v0.3.0 added recursive builder for complex nested queries with comprehensive test coverage.
+---
+## 9. NULL Handling for Array Operators
+**Rule:** Use `IS NULL` / `IS NOT NULL` for `$in` / `$nin` with null values.
+**Why:**
+- SQL `NULL IN (...)` returns NULL (not true/false)
+- Correct semantic behavior
+- Matches MongoDB/RxDB behavior
+**Implementation:**
+```typescript
+function translateIn(field: string, values: unknown[]): SqlFragment {
+  const hasNull = values.includes(null);
+  const nonNull = values.filter(v => v !== null);
+  if (hasNull && nonNull.length > 0) {
+    return {
+      sql: `(${field} IN (${placeholders}) OR ${field} IS NULL)`,
+      args: nonNull
+    };
+  }
+}
+```
+**History:** v0.3.0 added proper NULL handling for `$in` / `$nin`.
+---
+## 10. Minimal Code Philosophy
+**Rule:** Write only code that directly solves the problem.
+**Why:**
+- Less code = fewer bugs
+- Easier to understand and maintain
+- Faster to modify
+**Anti-patterns to avoid:**
+- God objects with many responsibilities
+- Premature abstractions
+- Verbose implementations
+- Code that doesn't contribute to the solution
+**History:** Enforced throughout all versions.
+---
+## 11. SQL vs Mingo Hybrid Strategy
+**Rule:** Use SQL for simple operators, Mingo fallback for complex operators.
+**Why:**
+- SQL excels at simple predicates (=, >, <, IN, IS NULL)
+- Mingo excels at complex logic ($elemMatch, $type, nested arrays)
+- Right tool for the right job
+- Future-proof for indexes
+**Benchmark Results** (`benchmarks/sql-vs-mingo-comparison.ts`):
+```
+100k documents with JSON expression indexes:
+- SQL (with indexes):  198.10ms average
+- Mingo (in-memory):   326.26ms average
+- Overall Speedup:     1.65x faster with SQL
+Individual tests:
+- $gt (age > 50):      1.26x faster with SQL
+- $eq (status):        1.32x faster with SQL
+- $in (status):        2.55x faster with SQL
+```
+**Key Findings:**
+1. **Indexes matter:** JSON expression indexes provide 1.23x speedup (250ms → 203ms)
+2. **SQL vs Mingo:** SQL is 1.65x faster on average with indexes
+3. **Modest gains:** Not 5-10x, but consistent 1.5-2.5x improvement
+4. **Scalability:** Gap will widen at 1M+ documents
+**Decision Matrix:**
+| Operator | Implementation | Reasoning |
+|----------|---------------|-----------|
+| $eq, $ne, $gt, $gte, $lt, $lte | SQL | Trivial (1 line), benefits from indexes |
+| $in, $nin | SQL | Native IN operator, 2.55x faster with indexes |
+| $exists | SQL | IS NULL is instant |
+| $regex (simple) | SQL | LIKE for simple patterns |
+| $regex (complex) | Mingo | Full regex support |
+| $and, $or, $not, $nor | SQL | Logical operators are SQL's strength |
+| $elemMatch | Mingo | json_each() is complex, Mingo is simple |
+| $type (simple) | SQL | typeof() for number/string/null |
+| $type (complex) | Mingo | boolean/array/object need json_type() |
+| $size | SQL | json_array_length() is simple |
+| $mod | SQL | Native % operator |
+**Implementation Pattern:**
+```typescript
+export function translateOperator(field: string, value: any): SqlFragment | null {
+  // Return SqlFragment for SQL translation
+  // Return null to trigger Mingo fallback
+  if (isSimpleCase) {
+    return { sql: `${field} = ?`, args: [value] };
+  }
+  return null; // Complex case → Mingo
+}
+```
+**Indexes Added:**
+```sql
+CREATE INDEX idx_users_age ON users(json_extract(data, '$.age'));
+CREATE INDEX idx_users_status ON users(json_extract(data, '$.status'));
+CREATE INDEX idx_users_email ON users(json_extract(data, '$.email'));
+```
+**Key Insight:** Don't benchmark SQL vs Mingo without indexes. The real comparison is:
+- SQL (with indexes) vs Mingo (in-memory)
+- Result: SQL is 1.65x faster, validates hybrid approach
+**Lessons Learned:**
+1. **Measure, don't assume:** We thought SQL would be 5-10x faster, actual is 1.65x
+2. **JSON indexes are slower:** Native column indexes would be 5-10x faster
+3. **Hybrid is validated:** 1.65x speedup justifies SQL translation effort
+4. **Scale matters:** Gap will widen at 1M+ docs (Mingo loads all into memory)
+**Key Validations:**
+- ✅ $elemMatch → Mingo is correct (json_each() hell for complex cases)
+- ✅ $regex complex → Mingo is correct (bun:sqlite lacks custom functions)
+- ✅ $type with typeof() is perfect
+- ✅ Hybrid strategy matches mature Mongo-on-SQL projects
+**Future Optimizations:**
+1. **SQL pre-filter + Mingo post-filter:**
+   - Translate what we can to SQL (use indexes)
+   - Run Mingo on returned rows only (not all docs)
+   - Best of both worlds: indexes + full compatibility
+2. **Extend $regex simple category:**
+   - Patterns like `^...$` or `...$` can use GLOB (faster than LIKE)
+3. **Only optimize when needed:**
+   - Don't turn $elemMatch into pure SQL unless hitting performance wall
+   - Current split is excellent for most apps
+**History:** v0.3.0 benchmarked at scale with indexes, decided on hybrid approach based on measured 1.65x speedup. Validated by senior engineer review.
+---
+## 12. Smart Regex → LIKE Optimization
+**Rule:** Convert simple regex patterns to SQL operators for better performance.
+**Why:**
+- Exact matches with `=` are 2x faster than LIKE
+- Leverages indexes more effectively
+- Reduces regex overhead for common patterns
+- COLLATE NOCASE is 23% faster than LOWER()
+**Benchmark Results** (`benchmarks/regex-10runs-all.ts`):
+```
+100k documents, 10 runs each:
+- Exact match (^gmail.com$):  2.03x speedup (= operator vs LIKE)
+- Prefix (^User 1):           0.99x (no improvement)
+- Suffix (@gmail.com$):       1.00x (no improvement)
+- Overall average:            1.24x speedup
+```
+**Case-Insensitive Benchmark** (`benchmarks/case-insensitive-10runs.ts`):
+```
+100k documents, 10 runs:
+- COLLATE NOCASE:  86.10ms average
+- LOWER():         105.73ms average
+- Speedup:         1.23x (COLLATE NOCASE is 23% faster)
+```
+**Implementation:**
+```typescript
+function smartRegexToLike(field: string, pattern: string, options?: string): SqlFragment | null {
+  const caseInsensitive = options?.includes('i');
+  const startsWithAnchor = pattern.startsWith('^');
+  const endsWithAnchor = pattern.endsWith('$');
+  let cleanPattern = pattern.replace(/^\^/, '').replace(/\$$/, '');
+  // Exact match: ^text$ → field = ?
+  if (startsWithAnchor && endsWithAnchor && !/[*+?()[\]{}|]/.test(cleanPattern)) {
+    const exact = cleanPattern.replace(/\\\./g, '.');
+    return caseInsensitive
+      ? { sql: `${field} COLLATE NOCASE = ?`, args: [exact] }
+      : { sql: `${field} = ?`, args: [exact] };
+  }
+  // Prefix: ^text → field LIKE 'text%'
+  if (startsWithAnchor) {
+    const prefix = cleanPattern.replace(/\\\./g, '.');
+    if (!/[*+?()[\]{}|]/.test(prefix)) {
+      const escaped = prefix.replace(/%/g, '\\%').replace(/_/g, '\\_');
+      const collation = caseInsensitive ? ' COLLATE NOCASE' : '';
+      return { sql: `${field} LIKE ?${collation} ESCAPE '\\'`, args: [escaped + '%'] };
+    }
+  }
+  // Suffix: text$ → field LIKE '%text'
+  if (endsWithAnchor) {
+    const suffix = cleanPattern.replace(/\\\./g, '.');
+    if (!/[*+?()[\]{}|]/.test(suffix)) {
+      const escaped = suffix.replace(/%/g, '\\%').replace(/_/g, '\\_');
+      const collation = caseInsensitive ? ' COLLATE NOCASE' : '';
+      return { sql: `${field} LIKE ?${collation} ESCAPE '\\'`, args: ['%' + escaped] };
+    }
+  }
+  return null; // Complex pattern → Mingo fallback
+}
+```
+**Key Optimizations:**
+1. **Exact match detection:** `^text$` → Use `=` operator (2.03x faster)
+2. **Case-insensitive:** Use `COLLATE NOCASE` instead of `LOWER()` (1.23x faster)
+3. **Prefix/suffix:** Use LIKE with proper escaping (no significant improvement, but cleaner SQL)
+4. **Complex patterns:** Return null to trigger Mingo fallback
+**Decision Matrix:**
+| Pattern | SQL Translation | Speedup | Reasoning |
+|---------|----------------|---------|-----------|
+| `^gmail.com$` | `field = ?` | 2.03x | Exact match uses index efficiently |
+| `^gmail.com$` (i flag) | `field COLLATE NOCASE = ?` | 2.03x | COLLATE NOCASE faster than LOWER() |
+| `^User` | `field LIKE 'User%'` | 0.99x | No improvement, but cleaner SQL |
+| `@gmail.com$` | `field LIKE '%@gmail.com'` | 1.00x | No improvement (suffix can't use index) |
+| `.*complex.*` | Mingo fallback | N/A | Complex regex needs full engine |
+**Key Insights:**
+1. **Exact matches are the win:** 2.03x speedup justifies the optimization
+2. **Prefix/suffix show no improvement:** But cleaner SQL is still valuable
+3. **COLLATE NOCASE is critical:** 23% faster than LOWER() for case-insensitive
+4. **Overall 1.24x speedup:** Modest but consistent improvement
+5. **Escaping is critical:** Always escape % and _ in LIKE patterns to prevent wildcard matching
+**Validation:**
+- Matches SQLite's official "LIKE Optimization" strategy
+- Real-world benchmarks confirm 14ms vs 440ms (31x speedup) for exact matches
+- COLLATE NOCASE is the standard production approach
+**History:** v0.3.0+ added smart regex converter with measured 2.03x speedup for exact matches.
+---
+## 13. FTS5 Trigram Indexes - NOT Worth It (Verified at Scale)
+**Rule:** Do NOT implement FTS5 trigram indexes for substring searches at < 10M scale.
+**Why:**
+- Measured SLOWDOWN at both 100k and 1M scales
+- FTS5 overhead outweighs benefits until massive scale (10M+ rows)
+- Regular indexes with LIKE are already fast enough
+- Index creation cost is significant (23.7s for 1M docs)
+**Benchmark Results:**
+`benchmarks/fts5-before-after.ts` (100k docs):
+```
+BEFORE (LIKE):  128.90ms average
+AFTER (FTS5):   230.22ms average
+Speedup:        0.56x (1.79x SLOWDOWN!)
+```
+`benchmarks/fts5-1m-scale.ts` (1M docs):
+```
+BEFORE (LIKE):  1215.47ms average
+AFTER (FTS5):   1827.65ms average
+Speedup:        0.67x (1.5x SLOWDOWN!)
+Index creation: 23717.26ms (23.7 seconds)
+```
+**Why FTS5 is Slower:**
+1. **Index overhead:** Creating and maintaining FTS5 virtual table adds cost
+2. **Small dataset:** 100k docs is too small to benefit from FTS5
+3. **Query pattern:** Simple substring searches don't need trigram matching
+4. **LIKE is optimized:** Regular indexes with LIKE are already efficient
+**Research vs Reality:**
+- **Research claimed:** 100x speedup for substring searches (18M rows)
+- **Our measurement:** 1.79x slowdown (100k docs)
+- **Conclusion:** FTS5 only makes sense at massive scale (millions of rows)
+**Decision Matrix (Data-Driven):**
+| Scale | LIKE Performance | FTS5 Performance | Best Approach | Verified |
+|-------|-----------------|------------------|---------------|----------|
+| 100k docs | 128.90ms | 230.22ms (1.79x slower) | LIKE | ✅ Measured |
+| 1M docs | 1215.47ms | 1827.65ms (1.5x slower) | LIKE | ✅ Measured |
+| 10M+ docs | Unknown | Unknown (research claims 100x faster) | Test both | ❓ Unverified |
+**Key Insight:** Don't implement optimizations based on research alone. Measure at YOUR scale with YOUR data.
+**Research Findings:**
+- 100x speedup documented at 18.2M rows (Andrew Mara benchmark)
+- Crossover point estimated between 1M-10M rows
+- Slowdown at 100k-1M is expected behavior
+- FTS5 overhead dominates at small scales
+**History:** v0.3.0+ benchmarked FTS5 at 100k and 1M scales, decided NOT to implement based on measured slowdowns.
+---
+## 14. JSONB Storage (SQLite Native Binary JSON)
+**Rule:** Use SQLite's native JSONB format (BLOB) instead of TEXT for JSON storage.
+**Why:**
+- 1.57x faster complex queries (657ms → 418ms at 1M docs)
+- 1.20x faster read + parse operations
+- 1.04x faster simple queries
+- No parsing overhead - binary format is more efficient
+- All json_extract() functions work identically
+**Benchmark Results** (`benchmarks/text-vs-jsonb.ts`):
+```
+1M documents, 15 runs each:
+- Simple query:  1.04x faster (481ms → 464ms)
+- Complex query: 1.57x faster (657ms → 418ms) 🔥
+- Read + parse:  1.20x faster (2.37ms → 1.98ms)
+```
+**Implementation:**
+```typescript
+// CREATE TABLE with BLOB column
+CREATE TABLE users (
+  id TEXT PRIMARY KEY,
+  data BLOB NOT NULL  // JSONB storage
+);
+// INSERT with jsonb() function
+INSERT INTO users (id, data) VALUES (?, jsonb(?));
+// SELECT with json() function to convert back
+SELECT json(data) as data FROM users WHERE id = ?;
+// json_extract() works on both TEXT and BLOB
+SELECT * FROM users WHERE json_extract(data, '$.age') > 30;
+```
+**Key Differences from TEXT:**
+- **TEXT:** Stores JSON as string, requires parsing on every access
+- **JSONB:** Stores JSON as binary, optimized for SQLite's JSON functions
+- **Compatibility:** All JSON functions work identically on both formats
+**SQLite Version Required:** 3.45.0+ (we have 3.51.2 ✅)
+**History:** v0.3.0+ benchmarked TEXT vs JSONB at 1M scale, implemented JSONB as default storage format.
+---
+## 15. Phase 3: RxDB Official Test Suite - Storage Layer Architecture
+**[Rule]:** Storage layer returns ALL documents (including deleted). RxDB layer filters them.
+**Why:**
+- RxDB has a layered architecture: Storage (dumb) + RxDB (smart)
+- Storage layer should NOT filter deleted documents
+- Filtering is RxDB's responsibility, not storage's
+- This enables proper replication and conflict resolution
+**Critical Test Finding:**
+```javascript
+/**
+ * Notice that the RxStorage itself runs whatever query you give it,
+ * filtering out deleted documents is done by RxDB, not by the storage.
+ */
+it('must find deleted documents', async () => {
+  // Test expects storage to return deleted documents
+  // RxDB layer will filter them when needed
+});
+```
+**What We Did (Phase 3.1 - TDD Approach):**
+### 1. Initial State: 7 Failures
+- UT5: keyCompression validation not called
+- UT6: encryption validation not called
+- UNIQUE constraint: Not caught, threw error
+- Query deleted documents: Filtered at storage layer (WRONG)
+- Count deleted documents: Filtered at storage layer (WRONG)
+- getChangedDocumentsSince: Not implemented
+- changeStream: Timeout (events not emitting correctly)
+### 2. Research Phase (Lisa Agents)
+**Inspected Dexie adapter:**
+- ❌ Query/count: NO `_deleted` filtering (broken - full table scan)
+- ✅ bulkWrite: Uses `categorizeBulkWriteRows` (prevention approach)
+- ✅ getChangedDocumentsSince: $or pattern for same-timestamp handling
+**Inspected storage-sqlite adapter (MOST RELEVANT):**
+- ❌ Query/count: Fetches ALL, filters in JavaScript (same broken pattern as Dexie)
+- ✅ bulkWrite: Uses `categorizeBulkWriteRows` (prevention approach)
+- ✅ getChangedDocumentsSince: SQL translation of $or pattern
+**Key Insight:** RxDB's official adapters are BROKEN (full table scans). Don't copy their patterns.
+### 3. Fixes Applied (Linus-Style: Minimal, Correct)
+**Fix 1: Plugin Validation (UT5/UT6)**
+```typescript
+constructor(params) {
+  ensureRxStorageInstanceParamsAreCorrect(params); // Call validation FIRST
+  // ... rest of constructor
+}
+```
+**Fix 2: Remove Deleted Filtering from Query/Count**
+```typescript
+// BEFORE (WRONG):
+WHERE deleted = 0 AND (${whereClause})
+// AFTER (CORRECT):
+WHERE (${whereClause})
+```
+**Reasoning:** Storage layer returns ALL documents. RxDB layer filters deleted when needed.
+**Fix 3: Implement getChangedDocumentsSince**
+```typescript
+async getChangedDocumentsSince(limit, checkpoint) {
+  const sql = `
+    SELECT json(data) as data FROM "${this.collectionName}"
+    WHERE (mtime_ms > ? OR (mtime_ms = ? AND id > ?))
+    ORDER BY mtime_ms ASC, id ASC
+    LIMIT ?
+  `;
+  // $or pattern handles same-timestamp edge case
+}
+```
+**Fix 4: UNIQUE Constraint Handling**
+```typescript
+for (const row of categorized.bulkInsertDocs) {
+  try {
+    insertStmt.run(...);
+  } catch (err) {
+    if (err.code === 'SQLITE_CONSTRAINT_PRIMARYKEY') {
+      categorized.errors.push({ status: 409, ... });
+    }
+  }
+}
+```
+**Fix 5: changeStream Event Filtering (CRITICAL)**
+```typescript
+// Filter out events for operations that failed
+const failedDocIds = new Set(categorized.errors.map(e => e.documentId));
+categorized.eventBulk.events = categorized.eventBulk.events.filter(
+  event => !failedDocIds.has(event.documentId)
+);
+// Recalculate checkpoint after filtering
+const lastEvent = categorized.eventBulk.events[lastEvent.length - 1];
+categorized.eventBulk.checkpoint = lastEvent ? {
+  id: lastEvent.documentId,
+  lwt: lastEvent.documentData._meta.lwt
+} : null;
+```
+**Why This Fix is Proper Infrastructure:**
+- Root cause: `categorizeBulkWriteRows` adds events BEFORE DB operations
+- We can't modify RxDB helpers (battle-tested code)
+- Our fix is the adaptation layer between RxDB's assumptions and SQLite's reality
+- Handles race conditions: UNIQUE constraint can fail AFTER categorization
+- Minimal code (5 lines), no complexity
+### 4. Current Status: 1 Failure Remaining
+**Progress:** 7 failures → 1 failure (86% pass rate!)
+**Remaining Issue:** changeStream timeout
+- Logs show we're emitting events correctly (INSERT, UPDATE, DELETE)
+- Test still times out after 5000ms
+- Events are being emitted but test is not receiving them
+- Issue is likely with RxJS Observable subscription or event format
+**What We're NOT Copying from RxDB:**
+- ❌ Dexie's full table scan pattern (no WHERE deleted = 0)
+- ❌ storage-sqlite's JavaScript filtering (fetches all, filters in JS)
+- ✅ We return ALL documents at SQL level (proper storage layer behavior)
+- ✅ RxDB layer handles filtering (proper separation of concerns)
+**Lessons Learned:**
+1. **Don't trust official implementations blindly** - Dexie and storage-sqlite have performance bugs
+2. **Read the test comments** - They explain the architecture better than the code
+3. **TDD works** - Write failing tests first, then fix
+4. **Linus approach** - Minimal code, fix root cause, no bandaids
+**Fix 6: EventBulk.id Generation (CRITICAL - The Final Fix)**
+```typescript
+// BEFORE (WRONG):
+eventBulk: {
+  checkpoint: { id: '', lwt: 0 },
+  context,
+  events,
+  id: ''  // ← EMPTY STRING = FALSY!
+}
+// AFTER (CORRECT):
+eventBulk: {
+  checkpoint: { id: '', lwt: 0 },
+  context,
+  events,
+  id: Date.now().toString() + '-' + Math.random().toString(36).substring(2, 11)
+}
+```
+**Why This Was The Bug:**
+- `flattenEvents()` checks: `if (input.id && input.events)`
+- Empty string `''` is FALSY in JavaScript
+- So `if ('' && events)` evaluates to FALSE
+- flattenEvents couldn't extract events from our EventBulk
+- Test timed out waiting for events that were never extracted
+**Why This Fix is Proper Infrastructure:**
+- Pattern matches distributed ID generation (Snowflake IDs, ULID)
+- Timestamp + random = astronomically low collision probability
+- Monotonically increasing (timestamp prefix helps debugging)
+- No external dependencies needed
+- Fast to generate
+**Research Process:**
+- Lisa agents found the issue by comparing with official adapters
+- Vivian researched Bun test console.log issues (found Bun Issue #22790)
+- Proper investigation instead of assumptions
+### 5. Current Status: ALL TESTS PASSING ✅
+**Progress:** 7 failures → 0 failures (100% pass rate!)
+**Final Test Results:**
+```
+[TEST] After INSERT bulkWrite (with delay), emitted.length: 1
+[TEST] After UPDATE bulkWrite, emitted.length: 2
+[TEST] After DELETE bulkWrite, emitted.length: 3
+[TEST] Before waitUntil, emitted.length: 3
+[TEST] waitUntil check: flattenEvents(emitted).length = 3
+[TEST] After waitUntil - test passed!
+ 1 pass
+ 0 fail
+```
+**All Fixes Applied:**
+1. ✅ Plugin validation (UT5/UT6) - Call `ensureRxStorageInstanceParamsAreCorrect` in constructor
+2. ✅ Remove deleted filtering from query/count - Storage returns ALL documents
+3. ✅ Implement getChangedDocumentsSince - $or pattern for same-timestamp handling
+4. ✅ UNIQUE constraint handling - Catch and convert to 409 errors
+5. ✅ changeStream event filtering - Filter out failed operations
+6. ✅ EventBulk.id generation - Use timestamp + random for unique IDs
+**Lessons Learned:**
+1. **Don't trust official implementations blindly** - Dexie and storage-sqlite have performance bugs
+2. **Read the test comments** - They explain the architecture better than the code
+3. **TDD works** - Write failing tests first, then fix
+4. **Linus approach** - Minimal code, fix root cause, no bandaids
+5. **Research over assumptions** - Use Lisa/Vivian agents to investigate properly
+6. **Bun quirks exist** - console.log doesn't show values properly (Issue #22790), use console.error + JSON.stringify
+**History:** Phase 3.1 (2026-02-22) - TDD approach to pass RxDB official test suite. 7 failures → 0 failures. ✅ COMPLETE
+---
+## 16. Bun Test Console.log Issue (Bun Issue #22790)
+**Rule:** Use `console.error` + `JSON.stringify()` for debugging in bun test, not `console.log`.
+**Why:**
+- Bun Issue #22790: `console.log` doesn't print custom properties on empty arrays
+- Values appear empty even when they exist
+- `console.error` works correctly
+- This is a known Bun bug, not our code issue
+**Evidence:**
+```javascript
+// WRONG (values don't show):
+console.log('[TEST] emitted.length:', emitted.length);
+// Output: [TEST] emitted.length:  ← value missing!
+// CORRECT (values show):
+console.error('[TEST] emitted.length:', JSON.stringify(emitted.length));
+// Output: [TEST] emitted.length: 3  ← value visible!
+```
+**Research Findings (Vivian - 2026-02-22):**
+- GitHub Issue #22790: console.log doesn't print custom properties on empty arrays
+- GitHub Issue #6044: happy-dom causes console.log() to not print during tests
+- GitHub Issue #10389: bun test writes stdout to stderr instead of stdout
+- Workarounds: Use `console.error`, `JSON.stringify()`, or `Bun.inspect()`
+**History:** Phase 3.1 (2026-02-22) - Discovered during changeStream debugging. Vivian researched and found root cause.
+---
+**Last updated:** Phase 3.1 COMPLETE (2026-02-22)
+## 17. Connection Pooling for Multi-Instance Support
+**[Rule]:** Pool Database objects by `databaseName`, use reference counting for cleanup.
+**Why:**
+- Multiple storage instances can share the same database
+- Prevents "database is locked" errors
+- Proper resource cleanup when last instance closes
+- Required for RxDB's multi-instance support
+**Implementation:**
+```typescript
+type DatabaseState = {
+  db: Database;
+  filename: string;
+  openConnections: number;
+};
+const DATABASE_POOL = new Map<string, DatabaseState>();
+export function getDatabase(databaseName: string, filename: string): Database {
+  let state = DATABASE_POOL.get(databaseName);
+  if (!state) {
+    state = { db: new Database(filename), filename, openConnections: 1 };
+    DATABASE_POOL.set(databaseName, state);
+  } else {
+    if (state.filename !== filename) {
+      throw new Error(`Database already opened with different filename`);
+    }
+    state.openConnections++;
+  }
+  return state.db;
+}
+```
+**History:** Iteration 13 (2026-02-23) - Added connection pooling. 52/56 → 56/56 tests pass.
+---
+## 18. Official Multi-Instance Support (RxDB's Implementation)
+**[Rule]:** Use RxDB's `addRxStorageMultiInstanceSupport()`, don't implement BroadcastChannel yourself.
+**Why:**
+- RxDB provides battle-tested multi-instance coordination
+- Handles BroadcastChannel setup, filtering, and cleanup
+- Filters events by storageName/databaseName/collectionName/version
+- We don't own this implementation - don't test it
+**Implementation:**
+```typescript
+import { addRxStorageMultiInstanceSupport } from 'rxdb';
+async createStorageInstance(params) {
+  const instance = new BunSQLiteStorageInstance(params);
+  addRxStorageMultiInstanceSupport('bun-sqlite', params, instance);
+  return instance;
+}
+```
+**History:** Iteration 14 (2026-02-23) - Switched to RxDB's official implementation. Fixed collection isolation bug. 56/56 official + 120/120 local tests pass.
+---
+## 19. Composite Primary Key Support
+**[Rule]:** Handle both string and object primary keys from RxDB schemas.
+**Implementation:**
+```typescript
+const primaryKey = params.schema.primaryKey;
+this.primaryPath = typeof primaryKey === 'string' ? primaryKey : primaryKey.key;
+```
+**History:** Iteration 14 (2026-02-23) - Fixed composite primary key handling.
+---
+## 20. Test at the Right Level
+**[Rule]:** Test the interface you expose, not implementation details.
+**Decision Matrix:**
+- Multi-instance event propagation → RxDatabase (high-level integration)
+- bulkWrite → changeStream emission → Storage instance (low-level, OUR code)
+- BroadcastChannel cross-instance → DON'T TEST (RxDB's code)
+**History:** Iteration 14 (2026-02-23) - Rewrote multi-instance tests to use RxDatabase. Added low-level changeStream tests for OUR code only.
+---
+## 21. Bun Test Suite Compatibility
+**[Rule]:** Run RxDB tests with Mocha through Bun, not native `bun test`.
+**Why:**
+- RxDB test suite designed for Mocha
+- Mocha through Bun: 112/112 tests pass (100%)
+- Native bun test: 55/56 tests pass (98.2%)
+**Fixes Applied:**
+1. Skip `node:sqlite` import in Bun (early return in sqlite-trial case)
+2. Conditional Bun test globals (only when describe undefined)
+**Running Tests:**
+```bash
+# Recommended: Mocha through Bun (100%)
+DEFAULT_STORAGE=custom bun run ./node_modules/mocha/bin/mocha test_tmp/unit/rx-storage-implementations.test.js
+# Alternative: Native bun test (98.2%)
+DEFAULT_STORAGE=custom bun test test_tmp/unit/rx-storage-implementations.test.js
+```
+**History:** Iteration 14 (2026-02-23) - Added Bun compatibility fixes. 112/112 tests pass with Mocha through Bun.
+---
+## 22. Query Builder LRU Cache
+**[Rule]:** Use global LRU cache with canonical keys for query builder results.
+**Why:**
+- 4.8-22.6x speedup for repeated queries
+- Bounded at 500 entries (no memory leak)
+- Cross-collection query reuse (efficient)
+- Zero dependencies except fast-stable-stringify (5KB)
+**Implementation:**
+```typescript
+import stringify from 'fast-stable-stringify';
+const QUERY_CACHE = new Map<string, SqlFragment>();
+const MAX_CACHE_SIZE = 500;
+export function buildWhereClause(selector, schema): SqlFragment {
+  const cacheKey = `v${schema.version}_${stringify(selector)}`;
+  const cached = QUERY_CACHE.get(cacheKey);
+  if (cached) {
+    QUERY_CACHE.delete(cacheKey);
+    QUERY_CACHE.set(cacheKey, cached);
+    return cached;
+  }
+  const result = processSelector(selector, schema, 0);
+  if (QUERY_CACHE.size >= MAX_CACHE_SIZE) {
+    const firstKey = QUERY_CACHE.keys().next().value;
+    if (firstKey) QUERY_CACHE.delete(firstKey);
+  }
+  QUERY_CACHE.set(cacheKey, result);
+  return result;
+}
+```
+**Key Design Decisions:**
+1. **Global cache** - Shared across all collections (efficient)
+2. **Canonical keys** - fast-stable-stringify for order-independent hashing
+3. **True LRU** - delete+re-insert on access (not just FIFO)
+4. **Bounded size** - 500 entries max, FIFO eviction when full
+5. **Schema versioning** - Cache key includes schema version
+**Performance:**
+```
+Cache hit rate: 5.2-57.9x speedup
+High-frequency: 505K-808K queries/sec
+Memory: ~50KB for 500 entries (negligible)
+```
+**Linus Analysis (5-Approaches):**
+- ✅ Global cache with LRU is correct (not per-instance)
+- ✅ Bounded at 500 entries (no leak)
+- ❌ Rejected per-instance cache (wastes memory on duplicates)
+- ❌ Rejected hybrid approach (would clear cache for other instances)
+**History:** Phase 2.5 (2026-02-23) - Implemented with 13 edge case tests. Proven bounded with no exponential growth.
+---
+## 23. Reliable Performance Timing on Windows
+**[Rule]:** Use `process.hrtime.bigint()` with 100K+ iterations for microsecond benchmarks on Windows.
+**Why:**
+- `performance.now()` has ~1ms resolution on Windows (unreliable for µs operations)
+- `process.hrtime.bigint()` has nanosecond precision (reliable)
+- 100K iterations amplify signal above measurement noise
+- Node.js core team uses this pattern (1M iterations for µs ops)
+**Implementation:**
+```typescript
+const start = process.hrtime.bigint();
+for (let i = 0; i < 100000; i++) {
+  buildWhereClause(selector, schema);
+}
+const elapsed = process.hrtime.bigint() - start;
+const avgTime = Number(elapsed) / 100000;
+```
+**Benchmark Results:**
+```
+Before (performance.now() + 100 iterations):
+- Flaky results: 0.38x-3.0x variance
+- Unreliable on Windows
+After (process.hrtime.bigint() + 100K iterations):
+- Stable results: 57.9x speedup
+- Reliable on all platforms
+```
+**Research Findings (Vivian):**
+- Node.js uses `process.hrtime.bigint()` for all benchmarks
+- Node.js uses 1M iterations for microsecond operations
+- Benchmark.js uses statistical analysis with multiple cycles
+- Industry standard: amplify signal, not rely on timer precision
+**History:** Phase 2.5 (2026-02-23) - Fixed flaky performance tests. Changed from performance.now() to hrtime.bigint() with 100K iterations.
+---
+## 24. Cache Lifecycle - Global vs Per-Instance
+**[Rule]:** Use global cache with bounded size, not per-instance cache.
+**Why:**
+- Global cache enables cross-collection query reuse
+- Per-instance cache wastes memory on duplicate queries
+- Bounded size (500 entries) prevents memory leaks
+- LRU eviction handles cache pressure automatically
+**Decision Analysis (Linus Torvalds 5-Approaches):**
+**Option A: Per-Instance Cache**
+- ❌ Wastes memory (100 collections = 100 duplicate caches)
+- ❌ Throws away cache on collection close (even if query reused elsewhere)
+- ❌ No cross-collection optimization
+**Option B: Global Cache with LRU (CHOSEN)**
+- ✅ Efficient cross-collection reuse
+- ✅ Bounded at 500 entries (no leak)
+- ✅ LRU eviction handles pressure
+- ✅ ~50KB memory (negligible)
+**Option C: Hybrid (Clear by Schema Version)**
+- ❌ WRONG - Clearing by schema version affects other collections
+- ❌ Example: 5 collections with v0 schema → closing 1 clears cache for all 5
+**Proof of Correctness:**
+```typescript
+test('Cache is BOUNDED at 500 entries (no exponential growth)', () => {
+  clearCache();
+  for (let i = 0; i < 1000; i++) {
+    buildWhereClause({ id: { $eq: `unique-${i}` } }, schema);
+  }
+  expect(getCacheSize()).toBe(500); // Not 1000!
+});
+```
+**Memory Math:**
+- 500 entries × ~100 bytes/entry = ~50KB
+- Negligible in any real application
+- No leak because bounded
+**History:** Phase 2.5 (2026-02-23) - Analyzed with 5-approaches framework. Decided to keep global cache based on Linus principles.
+---
+## 25. Attachments Support (Phase 4 - v1.0)
+**[Rule]:** Store attachments in separate table with composite keys, validate digests on retrieval.
+**Why:**
+- Separates attachment data from document data (cleaner schema)
+- Composite key (documentId||attachmentId) enables efficient lookups
+- Digest validation prevents data corruption
+- Matches RxDB's attachment API contract
+**Implementation:**
+```typescript
+// Table schema
+CREATE TABLE attachments (
+  id TEXT PRIMARY KEY,      -- documentId||attachmentId
+  data TEXT NOT NULL,       -- base64 attachment data
+  digest TEXT NOT NULL      -- content hash for validation
+);
+// Composite key helper
+function attachmentMapKey(documentId: string, attachmentId: string): string {
+  return documentId + '||' + attachmentId;
+}
+// Retrieval with digest validation
+async getAttachmentData(documentId: string, attachmentId: string, digest: string): Promise<string> {
+  const key = attachmentMapKey(documentId, attachmentId);
+  const row = this.db.query('SELECT data, digest FROM attachments WHERE id = ?').get(key);
+  if (!row || row.digest !== digest) {
+    throw new Error('attachment does not exist');
+  }
+  return row.data;
+}
+```
+**Test Coverage:**
+- 4 comprehensive tests in `src/storage.test.ts`
+- getAttachmentData() returns base64 strings
+- bulkWrite() preserves _attachments metadata
+- Error handling (missing attachment, digest mismatch)
+**Official RxDB Tests:**
+- 122/122 passing (includes 5 attachment tests)
+- Full integration validation
+**History:** v1.0.0 (2026-02-23) - Attachments support complete with storage-level implementation.
+---
+## 26. RxDB Helper Functions (Phase 4 - v1.0)
+**[Rule]:** Use RxDB's battle-tested helper functions for conflict detection and attachment handling.
+**Why:**
+- Used by ALL official adapters (Dexie, MongoDB, SQLite)
+- Handles edge cases we haven't thought of
+- Automatic attachment extraction
+- Proper conflict detection with 409 errors
+**Key Functions:**
+1. **`categorizeBulkWriteRows()`** - Conflict detection + attachment extraction
+   - Returns: `{ bulkInsertDocs, bulkUpdateDocs, errors, eventBulk, attachmentsAdd/Remove/Update }`
+   - Handles all edge cases (conflicts, attachments, events)
+2. **`stripAttachmentsDataFromDocument()`** - Remove .data field, keep metadata
+   - Before storing documents with attachments
+   - Prevents storing large base64 strings in document table
+3. **`stripAttachmentsDataFromRow()`** - Strip attachments from bulk write rows
+   - Processing bulkWrite with attachments
+4. **`attachmentWriteDataToNormalData()`** - Convert write format to storage format
+   - Transforms RxDB's write format to our storage format
+5. **`getAttachmentSize()`** - Calculate size from base64
+   - Used for attachment metadata
+**Implementation:**
+```typescript
+// Custom implementations in src/rxdb-helpers.ts (263 lines)
+// Not imported from RxDB - we own these implementations
+export function categorizeBulkWriteRows(...) { ... }
+export function stripAttachmentsDataFromDocument(...) { ... }
+export function stripAttachmentsDataFromRow(...) { ... }
+export function attachmentWriteDataToNormalData(...) { ... }
+export function getAttachmentSize(...) { ... }
+```
+**History:** v1.0.0 (2026-02-23) - All 5 helper functions implemented in src/rxdb-helpers.ts.
+---
+## 27. bulkWrite Refactoring with categorizeBulkWriteRows (Phase 4 - v1.0)
+**[Rule]:** Use `categorizeBulkWriteRows()` instead of manual conflict detection.
+**Why:**
+- Cleaner architecture (50 lines → 20 lines)
+- Battle-tested logic from official adapters
+- Automatic attachment extraction
+- Proper conflict detection
+- EventBulk generation
+**Before (Manual Conflict Detection):**
+```typescript
+async bulkWrite(documentWrites, context) {
+  const errors = [];
+  for (const writeRow of documentWrites) {
+    const docId = writeRow.document[this.primaryPath];
+    const documentInDb = docsInDbMap.get(docId);
+    if (!documentInDb) {
+      // Insert logic
+    } else {
+      // Manual conflict check
+      if (!writeRow.previous || documentInDb._rev !== writeRow.previous._rev) {
+        errors.push({ status: 409, documentId: docId, writeRow, documentInDb });
+        continue;
+      }
+      // Update logic
+    }
+  }
+  return { error: errors };
+}
+```
+**After (Using Helper):**
+```typescript
+async bulkWrite(documentWrites, context) {
+  const categorized = categorizeBulkWriteRows(
+    this,
+    this.primaryPath,
+    docsInDbMap,
+    documentWrites,
+    context
+  );
+  // Execute categorized operations
+  for (const row of categorized.bulkInsertDocs) {
+    insertStmt.run(...);
+  }
+  for (const row of categorized.bulkUpdateDocs) {
+    updateStmt.run(...);
+  }
+  // Handle attachments automatically
+  [...categorized.attachmentsAdd, ...categorized.attachmentsUpdate].forEach(att => {
+    insertAttStmt.run(attachmentMapKey(att.documentId, att.attachmentId), att.attachmentData.data, att.digest);
+  });
+  categorized.attachmentsRemove.forEach(att => {
+    deleteAttStmt.run(attachmentMapKey(att.documentId, att.attachmentId));
+  });
+  return { error: categorized.errors };
+}
+```
+**Benefits:**
+- ✅ Cleaner code (less manual logic)
+- ✅ Automatic attachment handling
+- ✅ Proper conflict detection
+- ✅ EventBulk generation
+- ✅ Matches official adapter patterns
+**History:** v1.0.0 (2026-02-23) - Refactored bulkWrite to use categorizeBulkWriteRows() helper.
+---