databonk 0.0.2 → 0.0.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55)
  1. package/README.md +116 -111
  2. package/build/release.d.ts +719 -0
  3. package/build/release.js +774 -0
  4. package/build/release.wasm +0 -0
  5. package/build/release.wasm.map +1 -0
  6. package/build/release.wat +22633 -0
  7. package/dist/dataframe.d.ts +82 -0
  8. package/dist/dataframe.d.ts.map +1 -0
  9. package/dist/dataframe.js +318 -0
  10. package/dist/dataframe.js.map +1 -0
  11. package/dist/index.d.ts +42 -19
  12. package/dist/index.d.ts.map +1 -1
  13. package/dist/index.js +37 -6166
  14. package/dist/index.js.map +1 -1
  15. package/dist/loader.d.ts +86 -0
  16. package/dist/loader.d.ts.map +1 -0
  17. package/dist/loader.js +147 -0
  18. package/dist/loader.js.map +1 -0
  19. package/dist/shared-memory.d.ts +64 -0
  20. package/dist/shared-memory.d.ts.map +1 -0
  21. package/dist/shared-memory.js +113 -0
  22. package/dist/shared-memory.js.map +1 -0
  23. package/package.json +30 -56
  24. package/dist/core/column.d.ts +0 -55
  25. package/dist/core/column.d.ts.map +0 -1
  26. package/dist/core/dataframe.d.ts +0 -70
  27. package/dist/core/dataframe.d.ts.map +0 -1
  28. package/dist/core/index-cache.d.ts +0 -44
  29. package/dist/core/index-cache.d.ts.map +0 -1
  30. package/dist/index.esm.js +0 -6153
  31. package/dist/index.esm.js.map +0 -1
  32. package/dist/io/csv.d.ts +0 -23
  33. package/dist/io/csv.d.ts.map +0 -1
  34. package/dist/operations/aggregation.d.ts +0 -23
  35. package/dist/operations/aggregation.d.ts.map +0 -1
  36. package/dist/operations/derive.d.ts +0 -38
  37. package/dist/operations/derive.d.ts.map +0 -1
  38. package/dist/operations/groupby.d.ts +0 -36
  39. package/dist/operations/groupby.d.ts.map +0 -1
  40. package/dist/operations/join.d.ts +0 -22
  41. package/dist/operations/join.d.ts.map +0 -1
  42. package/dist/operations/reshape.d.ts +0 -17
  43. package/dist/operations/reshape.d.ts.map +0 -1
  44. package/dist/utils/aggregation-engine.d.ts +0 -84
  45. package/dist/utils/aggregation-engine.d.ts.map +0 -1
  46. package/dist/utils/bitset.d.ts +0 -30
  47. package/dist/utils/bitset.d.ts.map +0 -1
  48. package/dist/utils/hash.d.ts +0 -79
  49. package/dist/utils/hash.d.ts.map +0 -1
  50. package/dist/utils/performance.d.ts +0 -44
  51. package/dist/utils/performance.d.ts.map +0 -1
  52. package/dist/utils/types.d.ts +0 -7
  53. package/dist/utils/types.d.ts.map +0 -1
  54. package/dist/validation/schema.d.ts +0 -73
  55. package/dist/validation/schema.d.ts.map +0 -1
package/README.md CHANGED
@@ -1,161 +1,166 @@
- # Databonk.js
+ # Databonk
 
- A lightweight, fast data frame library for JavaScript and TypeScript with built-in schema validation.
+ **WASM-powered DataFrame library with SIMD acceleration**
 
- ## Features
+ Databonk is a high-performance columnar DataFrame library built with AssemblyScript and WebAssembly, featuring SIMD-optimized operations and optional SharedArrayBuffer support for zero-copy data access.
 
- - **Lightweight**: Minimal dependencies, tree-shakeable modules
- - **Fast**: Columnar storage using TypedArrays for performance
- - **Simple**: Clean API for common data operations
- - **Flexible**: Works with regular arrays, TypedArrays, or Apache Arrow
- - **Schema Validation**: Built-in Zod integration for data validation
- - **Type Safe**: Full TypeScript support with inferred types
+ ## Key Features
+
+ - **14x faster** than JavaScript for aggregations (sum, mean, min, max)
+ - **SIMD acceleration** with 4-way parallel computation
+ - **Zero-copy access** to column data via SharedArrayBuffer
+ - **Full TypeScript support** with comprehensive type definitions
+ - **Memory efficient** columnar storage design
+ - **Fluent API** for method chaining
 
  ## Installation
 
  ```bash
- npm install databonk zod
+ npm install databonk
  ```
 
  ## Quick Start
 
- ```javascript
- import { DataFrame, SchemaValidator, CommonSchemas } from 'databonk';
-
- // Create a DataFrame
- const df = DataFrame.from({
-   name: ['Alice', 'Bob', 'Charlie'],
-   age: [25, 30, 35],
-   city: ['NYC', 'LA', 'Chicago']
- });
+ ```typescript
+ import { loadDatabonk, DatabonkDataFrame } from 'databonk';
 
- // Basic operations
- const adults = df.filter(row => row.age >= 30);
- const avgAge = df.column('age').mean();
- const grouped = df.groupBy(['city']).agg({ avgAge: 'mean' });
+ // Load the WASM module
+ const module = await loadDatabonk();
 
- // Schema validation
- const result = df.validate(CommonSchemas.person);
- console.log(`Valid rows: ${result.validRows}/${result.totalRows}`);
- ```
+ // Create a DataFrame from typed arrays
+ const df = await DatabonkDataFrame.fromTypedArrays(module, [
+   { name: 'id', data: new Int32Array([1, 2, 3, 4, 5]) },
+   { name: 'value', data: new Float32Array([10.5, 20.5, 30.5, 40.5, 50.5]) },
+ ]);
 
- ## Core Features
+ // Aggregations
+ console.log('Sum:', df.sum('value'));   // 152.5
+ console.log('Mean:', df.mean('value')); // 30.5
+ console.log('Min:', df.min('value'));   // 10.5
+ console.log('Max:', df.max('value'));   // 50.5
+ console.log('Rows:', df.rowCount);      // 5
 
- ### Data Operations
- - **Filtering & Selection**: Powerful row/column filtering with predicate functions
- - **Joins**: Inner, left, right, and outer joins with multiple keys
- - **Aggregations**: Sum, mean, count, min, max, std, variance with group-by support
- - **Reshaping**: Pivot, melt, transpose operations for data transformation
- - **Sorting**: Multi-column sorting with custom comparators
+ // Clean up when done
+ df.free();
+ ```
 
- ### Schema Validation
- - **Built-in Schemas**: Common patterns for users, products, transactions, coordinates
- - **Custom Validation**: Define your own schemas with Zod
- - **Data Cleaning**: Filter valid/invalid rows, transform data types
- - **Error Reporting**: Detailed validation errors with row/column information
+ ## Performance
 
- ### I/O Support
- - **CSV**: Read/write CSV files with automatic type inference
- - **Apache Arrow**: Optional integration for columnar data exchange
- - **Streaming**: Memory-efficient processing of large datasets
+ Benchmarks on 1 million rows (Float32):
 
- ## Examples
+ | Operation | WASM SIMD | JavaScript | Speedup |
+ |-----------|-----------|------------|---------|
+ | Sum       | ~0.3ms    | ~4.2ms     | **14x** |
+ | Min       | ~0.4ms    | ~4.8ms     | **12x** |
+ | Max       | ~0.4ms    | ~4.8ms     | **12x** |
+ | Mean      | ~0.3ms    | ~5.0ms     | **16x** |
 
- ### Schema Validation
+ ## API Overview
 
- ```javascript
- import { DataFrame, SchemaValidator } from 'databonk';
- import { z } from 'zod';
+ ### Module Loading
 
- // Define a custom schema
- const userSchema = SchemaValidator.define({
-   name: z.string().min(1),
-   age: z.number().int().min(0).max(150),
-   email: z.string().email(),
-   role: z.enum(['admin', 'user', 'guest'])
+ ```typescript
+ const module = await loadDatabonk({
+   wasmPath: './build/release.wasm', // Optional: custom WASM path
+   sharedMemory: true,               // Optional: enable SharedArrayBuffer
+   initialMemory: 256,               // Optional: initial memory pages (16MB default)
+   maximumMemory: 16384,             // Optional: max memory pages (1GB default)
  });
+ ```
+
+ ### DataFrame Creation
 
- const userData = [
-   { name: 'Alice', age: 25, email: 'alice@example.com', role: 'admin' },
-   { name: '', age: -5, email: 'invalid', role: 'unknown' } // Invalid
- ];
+ ```typescript
+ const df = await DatabonkDataFrame.fromTypedArrays(module, [
+   { name: 'int_col', data: new Int32Array([1, 2, 3]) },
+   { name: 'float_col', data: new Float32Array([1.5, 2.5, 3.5]) },
+   { name: 'double_col', data: new Float64Array([1.1, 2.2, 3.3]) },
+ ]);
+ ```
 
- const df = DataFrame.fromRows(userData);
+ ### Aggregations
 
- // Validate data
- const validation = df.validate(userSchema);
- console.log(`Errors: ${validation.errors.length}`);
+ ```typescript
+ df.sum('column');   // Sum of values
+ df.mean('column');  // Average
+ df.min('column');   // Minimum
+ df.max('column');   // Maximum
+ df.count('column'); // Count of values
+ ```
 
- // Filter valid rows
- const validUsers = df.filterValid(userSchema);
+ ### Column Arithmetic
 
- // Transform data with type coercion
- const cleanData = df.validateAndTransform(userSchema);
+ ```typescript
+ df.add('a', 'b', 'sum')           // sum = a + b
+   .sub('a', 'b', 'diff')          // diff = a - b
+   .scalarMul('a', 2.5, 'scaled'); // scaled = a * 2.5
  ```
 
- ### Advanced Data Operations
+ ### GroupBy
 
- ```javascript
- // Join operations
- const sales = DataFrame.fromRows([
-   { product_id: 1, quantity: 100, region: 'North' },
-   { product_id: 2, quantity: 150, region: 'South' }
- ]);
+ ```typescript
+ const grouped = df.groupBy('category', 256) // maxKey parameter
+   .sum('value'); // or .mean('value')
+ ```
 
- const products = DataFrame.fromRows([
-   { product_id: 1, name: 'Widget', price: 10.99 },
-   { product_id: 2, name: 'Gadget', price: 15.99 }
- ]);
+ ### Inner Join
+
+ ```typescript
+ const result = left.innerJoin(right, 'left_key', 'right_key');
+ ```
 
- const joined = sales.join(products, 'product_id', 'inner');
+ ### Zero-Copy Column Access
 
- // Group by with multiple aggregations
- const summary = joined
-   .groupBy(['region'])
-   .agg({
-     quantity: ['sum', 'mean'],
-     price: 'mean'
-   });
+ ```typescript
+ const view = df.getColumnView('value');
+ if (view) {
+   console.log(view.get(0));    // First value
+   console.log([...view]);      // Iterate
+   console.log(view.toArray()); // Copy to regular array
+ }
+ ```
 
- // Add calculated columns
- const withRevenue = joined.withColumn('revenue',
-   row => row.quantity * row.price
- );
+ ### Memory Management
 
- // Pivot tables
- const pivot = sales.pivot(['region'], 'product_id', 'quantity', 'sum');
+ ```typescript
+ df.free(); // Always free DataFrames when done
  ```
 
- ## Docker Development
+ ## Documentation
 
- ```bash
- # Build and start development environment
- make docker-dev
+ - [API Reference](./docs/api.md) - Full API documentation
+ - [Examples](./docs/examples.md) - Detailed code examples
 
- # Run tests in Docker
- make docker-test
+ ## Supported Column Types
 
- # Open shell in container
- make docker-shell
- ```
+ | Type    | TypedArray     | Use Case                       |
+ |---------|----------------|--------------------------------|
+ | Int32   | `Int32Array`   | Integer keys, IDs, counts      |
+ | Float32 | `Float32Array` | Standard floating-point values |
+ | Float64 | `Float64Array` | High-precision values          |
+
+ ## Current Limitations
+
+ - GroupBy currently supports single value column aggregation
+ - Join keys must be Int32 values
+ - String columns are supported for storage but not for operations
 
 
  ## Development
 
  ```bash
- # Local development
+ # Install dependencies
  npm install
- npm run build
+
+ # Build WASM module
+ npm run asbuild
+
+ # Run tests
  npm test
 
- # With Docker
- make setup
- make dev
+ # Run benchmarks
+ npm run benchmark
  ```
 
- ## Performance
+ ## License
 
- Databonk.js is designed for small to medium datasets (up to ~1M rows) with:
- - **Memory efficient**: Columnar storage with TypedArrays
- - **Fast operations**: Optimized algorithms for joins, aggregations
- - **Minimal overhead**: Zero-copy operations where possible
- - **Tree-shakeable**: Only import what you use
+ MIT
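
To make the storage model behind the README changes above concrete: the 0.0.4 aggregations (`sum`, `mean`) reduce one contiguous `TypedArray` per column rather than iterating an array of row objects. The sketch below reproduces that columnar reduction in plain JavaScript on the Quick Start's `value` column; it is an editor's illustration only, not the package's WASM/SIMD code path, and `sum` here is a local helper, not a databonk API.

```javascript
// Columnar storage: one contiguous TypedArray per column, as in the
// Quick Start's { name: 'value', data: ... } column. Float64Array is
// used here so the reductions below are exact.
const value = new Float64Array([10.5, 20.5, 30.5, 40.5, 50.5]);

// Scalar reduction over the column. The WASM build additionally
// processes four lanes at a time with SIMD, which this sketch omits.
function sum(col) {
  let acc = 0;
  for (let i = 0; i < col.length; i++) acc += col[i];
  return acc;
}

const total = sum(value);
const mean = total / value.length;
console.log(total, mean); // 152.5 30.5 -- matches the Quick Start output
```

The tight loop over a contiguous buffer is what makes the columnar layout fast even before SIMD: no per-row object allocation, and predictable, cache-friendly memory access.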