csvdb-cli 0.2.8__cp312-cp312-win_amd64.whl

@@ -0,0 +1,695 @@
1
+ Metadata-Version: 2.4
2
+ Name: csvdb-cli
3
+ Version: 0.2.8
4
+ Classifier: Programming Language :: Rust
5
+ Classifier: Programming Language :: Python :: Implementation :: CPython
6
+ Summary: CLI tool to convert between SQLite, DuckDB, CSV, and Parquet
7
+ Keywords: csv,sqlite,duckdb,parquet,database,cli
8
+ License-Expression: MIT
9
+ Requires-Python: >=3.8
10
+ Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
11
+
12
+ # csvdb
13
+
14
+ Version-control your relational data like code.
15
+
16
+ > **Note:** This is beta software. The API and file format may change. Use with caution in production.
17
+
18
+ SQLite and DuckDB files are binary — git can't diff them, reviewers can't read them, and merges are impossible. csvdb converts your database into a directory of plain-text CSV files + `schema.sql`, fully diffable and round-trip lossless. Convert back to SQLite, DuckDB, or Parquet when you need query performance.
19
+
20
+ ```diff
21
+ # git diff myapp.csvdb/rates.csv
22
+ "date","rate"
23
+ "2024-01-01","4.50"
24
+ -"2024-04-01","4.25"
25
+ +"2024-04-01","3.75"
26
+ +"2024-07-01","3.50"
27
+ ```
28
+
29
+ Every change is a readable, reviewable line in a PR. No binary blobs, no "file changed" with no context.
30
+
31
+ **Use cases:**
32
+ - Seed data and test fixtures committed alongside code
33
+ - Config and lookup tables reviewed in PRs before deploy
34
+ - CI integrity checks: `csvdb checksum data.csvdb/ | grep "$EXPECTED"`
35
+ - Migrating between SQLite, DuckDB, and Parquet without ETL scripts
36
+ - Manual edits in a spreadsheet or text editor, rebuild with one command
37
+ - Audit trail: `git blame` on any CSV row shows who changed it and when
38
+
39
+ ## Directory Layouts
40
+
41
+ A `.csvdb` directory contains:
42
+ ```
43
+ mydb.csvdb/
44
+ csvdb.toml # format version, export settings
45
+ schema.sql # CREATE TABLE, CREATE INDEX, CREATE VIEW
46
+ users.csv # one file per table
47
+ orders.csv
48
+ ```
49
+
50
+ A `.parquetdb` directory has the same structure with Parquet files instead of CSVs:
51
+ ```
52
+ mydb.parquetdb/
53
+ csvdb.toml # format version, export settings
54
+ schema.sql # CREATE TABLE, CREATE INDEX, CREATE VIEW
55
+ users.parquet # one file per table
56
+ orders.parquet
57
+ ```
58
+
59
+ The schema defines the structure. The data files hold the data. `csvdb.toml` records the format version and the settings used to produce the export.
60
+
61
+ ## Why csvdb
62
+
63
+ **CSV format** works with standard tools:
64
+ - Edit with any text editor or spreadsheet
65
+ - Diff and merge with git
66
+ - Process with awk, pandas, Excel
67
+
68
+ **SQLite/DuckDB format** provides fast access:
69
+ - Indexed lookups without scanning entire files
70
+ - Views for complex joins and computed columns
71
+ - Full SQL query support
72
+ - Single-file distribution
73
+
74
+ **Parquet format** provides columnar storage:
75
+ - Efficient compression and encoding
76
+ - Fast analytical queries
77
+ - Wide ecosystem support (Spark, pandas, DuckDB, etc.)
78
+ - Per-table `.parquet` files in a `.parquetdb` directory
79
+
80
+ csvdb lets you store data as CSV (human-readable, git-friendly) and convert to SQLite, DuckDB, or Parquet when you need query performance.
81
+
82
+ ## Installation
83
+
84
+ ```bash
85
+ # Rust (via cargo)
86
+ cargo install csvdb
87
+
88
+ # Python library (import csvdb)
89
+ pip install csvdb-py
90
+
91
+ # Standalone binary (via pip/pipx/uvx)
92
+ uvx csvdb-cli
93
+ ```
94
+
95
+ ## Quick Start
96
+
97
+ ```bash
98
+ # Convert an existing SQLite database to csvdb
99
+ csvdb to-csvdb mydb.sqlite
100
+ git add mydb.csvdb/
101
+ git commit -m "Track data in csvdb format"
102
+
103
+ # Edit data
104
+ vim mydb.csvdb/users.csv
105
+
106
+ # Rebuild database
107
+ csvdb to-sqlite mydb.csvdb/
108
+
109
+ # Or export to Parquet
110
+ csvdb to-parquetdb mydb.csvdb/
111
+ ```
112
+
113
+ ## Commands
114
+
115
+ ### init — Create csvdb from raw CSV files
116
+
117
+ ```bash
118
+ # From a directory of CSV files
119
+ csvdb init ./raw_csvs/
120
+
121
+ # From a single CSV file
122
+ csvdb init data.csv
123
+ ```
124
+
125
+ Creates a `.csvdb` directory by:
126
+ - Inferring schema from CSV headers and data types
127
+ - Detecting primary keys (columns named `id` or `<table>_id`)
128
+ - Detecting foreign keys (columns like `user_id` referencing `users.id`)
129
+ - Copying CSV files
130
+
131
+ Options:
132
+ - `--no-pk-detection` - Disable automatic primary key detection
133
+ - `--no-fk-detection` - Disable automatic foreign key detection
134
+
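The naming heuristics listed above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not csvdb's implementation: the function names `detect_primary_key` and `detect_foreign_key` are hypothetical, and the pluralization rule (`user` becomes `users` by appending `s`) is an assumption taken from the single `user_id` example.

```python
from typing import Dict, List, Optional, Tuple


def detect_primary_key(table: str, columns: List[str]) -> Optional[str]:
    """Return the first column named `id` or `<table>_id`, else None."""
    for candidate in ("id", f"{table}_id"):
        if candidate in columns:
            return candidate
    return None


def detect_foreign_key(
    column: str, tables: Dict[str, List[str]]
) -> Optional[Tuple[str, str]]:
    """Map a column like `user_id` to (`users`, `id`) if that table has an `id`."""
    if not column.endswith("_id"):
        return None
    # Assumed pluralization: strip `_id`, append `s` (user_id -> users).
    target = column[: -len("_id")] + "s"
    if "id" in tables.get(target, []):
        return (target, "id")
    return None
```

A real detector would also have to confirm that a candidate key column is unique in the data; the sketch covers the naming step only.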
135
+ ### to-csvdb — Export database to csvdb
136
+
137
+ ```bash
138
+ # From SQLite
139
+ csvdb to-csvdb mydb.sqlite
140
+
141
+ # From DuckDB
142
+ csvdb to-csvdb mydb.duckdb
143
+
144
+ # From Parquet
145
+ csvdb to-csvdb mydb.parquetdb/
146
+ csvdb to-csvdb single_table.parquet
147
+ ```
148
+
149
+ Creates `mydb.csvdb/` containing:
150
+ - `schema.sql` - table definitions, indexes, views
151
+ - `*.csv` - one file per table, sorted by primary key
152
+
153
+ Supports multiple input formats:
154
+ - **SQLite** (`.sqlite`, `.sqlite3`, `.db`)
155
+ - **DuckDB** (`.duckdb`)
156
+ - **parquetdb** (`.parquetdb` directory)
157
+ - **Parquet** (`.parquet` single file)
158
+
159
+ Options:
160
+ - `-o, --output <dir>` - Custom output directory
161
+ - `--order <mode>` - Row ordering mode (see below)
162
+ - `--null-mode <mode>` - NULL representation in CSV (see below)
163
+ - `--pipe` - Write to temp directory, output only path (for piping)
164
+
165
+ ### to-sqlite — Build SQLite database
166
+
167
+ ```bash
168
+ csvdb to-sqlite mydb.csvdb/
169
+ csvdb to-sqlite mydb.parquetdb/
170
+ ```
171
+
172
+ Creates `mydb.sqlite` from a csvdb or parquetdb directory.
173
+
174
+ Options:
175
+ - `--force` - Overwrite existing output file
176
+ - `--tables <list>` - Only include these tables (comma-separated)
177
+ - `--exclude <list>` - Exclude these tables (comma-separated)
178
+
179
+ ### to-duckdb — Build DuckDB database
180
+
181
+ ```bash
182
+ csvdb to-duckdb mydb.csvdb/
183
+ csvdb to-duckdb mydb.parquetdb/
184
+ ```
185
+
186
+ Creates `mydb.duckdb` from a csvdb or parquetdb directory.
187
+
188
+ Options:
189
+ - `--force` - Overwrite existing output file
190
+ - `--tables <list>` - Only include these tables (comma-separated)
191
+ - `--exclude <list>` - Exclude these tables (comma-separated)
192
+
193
+ ### to-parquetdb — Convert any format to Parquet
194
+
195
+ ```bash
196
+ # From SQLite
197
+ csvdb to-parquetdb mydb.sqlite
198
+
199
+ # From DuckDB
200
+ csvdb to-parquetdb mydb.duckdb
201
+
202
+ # From csvdb
203
+ csvdb to-parquetdb mydb.csvdb/
204
+
205
+ # From a single Parquet file
206
+ csvdb to-parquetdb users.parquet
207
+ ```
208
+
209
+ Creates `mydb.parquetdb/` containing:
210
+ - `schema.sql` - table definitions, indexes, views
211
+ - `csvdb.toml` - format version and export settings
212
+ - `*.parquet` - one Parquet file per table
213
+
214
+ Supports multiple input formats:
215
+ - **SQLite** (`.sqlite`, `.sqlite3`, `.db`)
216
+ - **DuckDB** (`.duckdb`)
217
+ - **csvdb** (`.csvdb` directory)
218
+ - **parquetdb** (`.parquetdb` directory)
219
+ - **Parquet** (`.parquet` single file)
220
+
221
+ Options:
222
+ - `-o, --output <dir>` - Custom output directory
223
+ - `--order <mode>` - Row ordering mode (see below)
224
+ - `--null-mode <mode>` - NULL representation (see below)
225
+ - `--pipe` - Write to temp directory, output only path (for piping)
226
+ - `--force` - Overwrite existing output directory
227
+ - `--tables <list>` - Only include these tables (comma-separated)
228
+ - `--exclude <list>` - Exclude these tables (comma-separated)
229
+
230
+ ### checksum — Verify data integrity
231
+
232
+ ```bash
233
+ csvdb checksum mydb.sqlite
234
+ csvdb checksum mydb.csvdb/
235
+ csvdb checksum mydb.duckdb
236
+ csvdb checksum mydb.parquetdb/
237
+ csvdb checksum users.parquet
238
+ ```
239
+
240
+ Computes a SHA-256 checksum of the database content. The checksum is:
241
+ - **Format-independent**: Same data produces same hash regardless of format
242
+ - **Deterministic**: Same data always produces same hash
243
+ - **Content-based**: Includes schema structure and all row data
244
+
245
+ Use checksums to verify roundtrip conversions:
246
+ ```bash
247
+ csvdb checksum original.sqlite # a1b2c3...
248
+ csvdb to-csvdb original.sqlite
249
+ csvdb to-duckdb original.csvdb/
250
+ csvdb checksum original.duckdb # a1b2c3... (same!)
251
+ csvdb to-parquetdb original.csvdb/
252
+ csvdb checksum original.parquetdb/ # a1b2c3... (same!)
253
+ ```
254
+
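To make the three properties above concrete, here is a minimal Python sketch of a content-based hash. It is not csvdb's checksum algorithm, and its digests will not match `csvdb checksum`. The function name `content_checksum`, the separator bytes, and the in-memory input shape are all assumptions chosen for illustration. It hashes the schema text plus every row, after sorting tables and rows so that storage order does not affect the result.

```python
import hashlib
from typing import Dict, List, Tuple


def content_checksum(schema: str, tables: Dict[str, List[Tuple[str, ...]]]) -> str:
    """SHA-256 over the schema text and all rows, independent of row order."""
    digest = hashlib.sha256()
    digest.update(schema.encode("utf-8"))
    for name in sorted(tables):  # stable table order
        digest.update(b"\x1d" + name.encode("utf-8"))
        for row in sorted(tables[name]):  # stable row order
            # Unit separators keep ("ab", "c") distinct from ("a", "bc").
            digest.update(b"\x1e" + "\x1f".join(row).encode("utf-8"))
    return digest.hexdigest()
```

Because the input is an abstract schema plus rows rather than a file, the same function could be fed from any storage format, which is the idea behind a format-independent checksum.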
255
+ ## Primary Key Requirement
256
+
257
+ By default, every table must have an explicit primary key. Rows are sorted by primary key when exporting to CSV. By enforcing a stable row order, csvdb guarantees that identical data always produces identical CSV files, making git diffs meaningful and noise-free.
258
+
259
+ ### Tables Without Primary Keys
260
+
261
+ For tables without a primary key (event logs, append-only tables), use the `--order` option:
262
+
263
+ ```bash
264
+ # Order by all columns (deterministic but may have issues with duplicates)
265
+ csvdb to-csvdb mydb.sqlite --order=all-columns
266
+
267
+ # Add a synthetic __csvdb_rowid column (best for event/log tables)
268
+ csvdb to-csvdb mydb.sqlite --order=add-synthetic-key
269
+ ```
270
+
271
+ #### Order Modes
272
+
273
+ | Mode | Description | Best For |
274
+ |------|-------------|----------|
275
+ | `pk` (default) | Order by primary key | Tables with natural keys |
276
+ | `all-columns` | Order by all columns | Reference tables without PK |
277
+ | `add-synthetic-key` | Add `__csvdb_rowid` column | Event logs, append-only data |
278
+
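The three order modes can be illustrated with a small Python sketch over in-memory rows. This is a simplified model, not csvdb's code: the function name `order_rows` is hypothetical, and both the position of the `__csvdb_rowid` column (first) and its numbering (starting at 1, in input order) are assumptions.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]


def order_rows(
    header: Sequence[str], rows: Sequence[Row], mode: str = "pk", pk_index: int = 0
) -> Tuple[List[str], List[Row]]:
    """Return (header, rows) arranged according to the chosen order mode."""
    if mode == "pk":
        # Sort by the primary key column only.
        return list(header), sorted(rows, key=lambda r: r[pk_index])
    if mode == "all-columns":
        # Sort by every column, left to right; duplicate rows stay adjacent.
        return list(header), sorted(rows)
    if mode == "add-synthetic-key":
        # Keep input order and prepend a sequential key (assumed to start at 1).
        keyed = [(str(i),) + tuple(r) for i, r in enumerate(rows, start=1)]
        return ["__csvdb_rowid"] + list(header), keyed
    raise ValueError(f"unknown order mode: {mode}")
```

With `all-columns`, two identical rows are indistinguishable after sorting, which is one reason a synthetic key suits event logs better.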
279
+ ## NULL Handling
280
+
281
+ CSV has no native NULL concept. csvdb uses explicit conventions to preserve NULLs across database roundtrips.
282
+
283
+ By default, CSV files use `\N` (PostgreSQL convention) to represent NULL values:
284
+
285
+ ```csv
286
+ "id","name","value"
287
+ "1","\N","42" # name is NULL
288
+ "2","","42" # name is empty string
289
+ "3","hello","\N" # value is NULL
290
+ ```
291
+
292
+ This preserves the distinction between NULL and empty string through roundtrips:
293
+ - **SQLite roundtrip**: NULL and empty string are fully preserved
294
+ - **DuckDB roundtrip**: NULL is preserved. **DuckDB limitation**: empty strings may become NULL due to a Rust driver limitation.
295
+
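If you read these files with your own tooling, the `\N` convention is easy to honor. The following Python sketch uses only the standard `csv` module; the helper name `read_rows` is hypothetical and the code is an illustration of the convention, not part of csvdb.

```python
import csv
import io
from typing import Dict, List, Optional

NULL_MARKER = r"\N"  # backslash followed by N, as written in the CSV


def read_rows(text: str) -> List[Dict[str, Optional[str]]]:
    """Parse CSV text, mapping the NULL marker to None and keeping "" as ""."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return [
        {col: (None if value == NULL_MARKER else value) for col, value in zip(header, row)}
        for row in reader
    ]


SAMPLE = (
    '"id","name","value"\n'
    '"1","\\N","42"\n'
    '"2","","42"\n'
    '"3","hello","\\N"\n'
)
rows = read_rows(SAMPLE)
```

Here `rows[0]["name"]` is `None` while `rows[1]["name"]` is an empty string, so the distinction survives parsing.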
296
+ ### --null-mode
297
+
298
+ | Mode | NULL representation | Lossless? | Use case |
299
+ |------|-------------------|-----------|----------|
300
+ | `marker` (default) | `\N` | Yes | Roundtrip-safe, distinguishes NULL from empty string |
301
+ | `empty` | empty string | No | Simpler CSV, but cannot distinguish NULL from `""` |
302
+ | `literal` | `NULL` | No | Human-readable, but cannot distinguish NULL from the string `"NULL"` |
303
+
304
+ ```bash
305
+ csvdb to-csvdb mydb.sqlite # default: \N marker
306
+ csvdb to-csvdb mydb.sqlite --null-mode=empty # empty string for NULL
307
+ csvdb to-csvdb mydb.sqlite --null-mode=literal # literal "NULL" string
308
+ ```
309
+
310
+ Lossy modes print a warning to stderr. Use `--pipe` to suppress warnings.
311
+
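Why `empty` and `literal` are lossy can be shown with a short Python sketch. The function `encode_null` is hypothetical and models only the table above: it maps a value to the text written to the CSV field under each mode.

```python
from typing import Optional

NULL_TEXT = {"marker": r"\N", "empty": "", "literal": "NULL"}


def encode_null(value: Optional[str], mode: str = "marker") -> str:
    """Return the CSV field text for a value under the given NULL mode."""
    if value is not None:
        return value
    return NULL_TEXT[mode]
```

Under `empty`, `encode_null(None, "empty")` and `encode_null("", "empty")` produce the same field, so a reader cannot tell them apart. The same collision happens under `literal` between NULL and the string `"NULL"`.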
312
+ ## CSV Dialect
313
+
314
+ csvdb produces a strict, deterministic CSV dialect:
315
+
316
+ | Property | Value |
317
+ |----------|-------|
318
+ | Encoding | UTF-8 |
319
+ | Delimiter | `,` (comma) |
320
+ | Quote character | `"` (double quote) |
321
+ | Quoting | Always — every field is quoted, including headers |
322
+ | Quote escaping | Doubled (`""`) per RFC 4180 |
323
+ | Record terminator | `\n` (LF), not CRLF |
324
+ | Header row | Always present as the first row |
325
+ | Row ordering | Sorted by primary key (deterministic) |
326
+ | NULL representation | Configurable via `--null-mode` (see above) |
327
+
328
+ This is mostly RFC 4180 compliant, with one deliberate deviation: line endings use LF instead of CRLF. This produces cleaner git diffs and avoids mixed-endings issues on Unix systems.
329
+
330
+ Newlines embedded within field values are preserved as-is inside quoted fields. The Rust `csv` crate handles quoting and escaping automatically.
331
+
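For scripts that generate or patch CSV files by hand, the dialect in the table above can be approximated with Python's standard `csv` module. This is a sketch under that assumption, with a hypothetical helper name `write_csv`; csvdb itself uses the Rust `csv` crate, and NULL handling and row sorting are left out here.

```python
import csv
import io
from typing import Iterable, Sequence


def write_csv(header: Sequence[str], rows: Iterable[Sequence[str]]) -> str:
    """Serialize rows with every field quoted, doubled quotes, and LF endings."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(header)  # the header row is quoted like any other row
    writer.writerows(rows)
    return buf.getvalue()


text = write_csv(["id", "note"], [["1", 'say "hi"'], ["2", "two\nlines"]])
```

The result quotes every field, doubles embedded quotes, and keeps the embedded newline inside its quoted field.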
332
+ See [FORMAT.md](FORMAT.md) for the full normative format specification.
333
+
334
+ ## Gotchas
335
+
336
+ Things that may surprise you on day one:
337
+
338
+ - **String-based sorting.** PK sort is lexicographic on strings, not numeric. `"10"` sorts before `"2"`. If you need numeric order, use a zero-padded string or an INTEGER primary key (integers sort correctly because shorter strings come first and same-length digit strings sort numerically).
339
+
340
+ - **Schema inference is limited.** `csvdb init` only infers three types: `INTEGER`, `REAL`, `TEXT`. It won't detect dates, booleans, or blobs. Edit `schema.sql` after init if you need richer types.
341
+
342
+ - **PK detection stops tracking at 100k values.** During `init`, uniqueness tracking for primary key candidates stops after 100,000 values. If the column was unique up to that point, it's still used as the PK.
343
+
344
+ - **Float precision in checksums.** Values are normalized to 10 decimal places for checksumming. `42.0` normalizes to `42` (integer-valued floats become integers). Very small precision differences across databases are absorbed.
345
+
346
+ - **DuckDB empty string limitation.** Empty strings in TEXT columns may become NULL when round-tripping through DuckDB due to a Rust driver limitation.
347
+
348
+ - **Blob values are hex strings in CSV.** BLOB data is stored as lowercase hex (e.g. `cafe`). It roundtrips correctly through SQLite and DuckDB.
349
+
350
+ - **No duplicate PK validation during CSV read.** Duplicate primary keys are not caught when reading CSV files. They will cause an error at database INSERT time.
351
+
352
+ - **DuckDB indexes are not exported.** Index metadata is not available from DuckDB sources. Indexes defined in a csvdb `schema.sql` are preserved when converting between csvdb and SQLite, but not when the source is DuckDB.
353
+
354
+ - **Views are not dependency-ordered.** Views are written in alphabetical order. If view A references view B, you may need to manually reorder them in `schema.sql`.
355
+
356
+ - **`__csvdb_rowid` is reserved.** The column name `__csvdb_rowid` is used by the `add-synthetic-key` order mode. Don't use it in your own schemas.
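The string-sorting gotcha is easy to reproduce. The Python sketch below compares plain lexicographic order, zero-padded keys, and the length-first rule described above for integer keys. It is an illustration only; the length-first key function assumes non-negative integers without leading zeros.

```python
keys = ["10", "2", "1"]

# Plain string comparison: "10" lands before "2".
lexicographic = sorted(keys)

# Zero-padding to a fixed width makes string order match numeric order.
zero_padded = sorted(k.zfill(3) for k in keys)

# Shorter digit strings first, then string order within the same length.
length_first = sorted(keys, key=lambda s: (len(s), s))
```

Here `lexicographic` is `['1', '10', '2']`, `zero_padded` is `['001', '002', '010']`, and `length_first` is `['1', '2', '10']`.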
357
+
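The float normalization mentioned in the list above can be approximated as follows. This Python sketch is an assumption about the general idea (fixed precision, then trimming trailing zeros), not csvdb's actual routine; the name `normalize_float` is hypothetical, and special values such as NaN or infinity are not handled.

```python
def normalize_float(value: float, places: int = 10) -> str:
    """Format to a fixed number of decimals, then drop trailing zeros and dot."""
    text = f"{value:.{places}f}"
    return text.rstrip("0").rstrip(".")
```

With this rule `normalize_float(42.0)` yields `"42"`, and `0.1 + 0.2` normalizes to the same text as `0.3`, which is how tiny representation differences get absorbed.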
358
+ ## Examples
359
+
360
+ The [`examples/`](examples/) directory contains ready-to-use examples:
361
+
362
+ - **`examples/store.csvdb/`** — A hand-written csvdb directory with two tables, an index, a view, and NULL values
363
+ - **`examples/raw-csvs/`** — Plain CSV files for demonstrating `csvdb init`
364
+
365
+ See [`examples/README.md`](examples/README.md) for usage instructions.
366
+
367
+ ## Workflows
368
+
369
+ ### Git-Tracked Data
370
+
371
+ Store data in git, rebuild databases as needed:
372
+
373
+ ```bash
374
+ # Initial setup: export existing database
375
+ csvdb to-csvdb production.sqlite
376
+ git add production.csvdb/
377
+ git commit -m "Initial data import"
378
+
379
+ # Daily workflow: edit CSVs, commit, rebuild
380
+ vim production.csvdb/users.csv
381
+ git add -p production.csvdb/
382
+ git commit -m "Update user records"
383
+ csvdb to-sqlite production.csvdb/
384
+ ```
385
+
386
+ ### Deploy to Production
387
+
388
+ Use csvdb as the source of truth. Track schema and data in git, export to SQLite for deployment:
389
+
390
+ ```bash
391
+ # Define your schema and seed data in csvdb format
392
+ mkdir -p myapp.csvdb
393
+ cat > myapp.csvdb/schema.sql <<'EOF'
394
+ CREATE TABLE config (
395
+ key TEXT PRIMARY KEY,
396
+ value TEXT NOT NULL
397
+ );
398
+ CREATE TABLE rates (
399
+ date TEXT NOT NULL,
400
+ rate REAL NOT NULL,
401
+ PRIMARY KEY (date)
402
+ );
403
+ EOF
404
+
405
+ # Edit data directly as CSV
406
+ cat > myapp.csvdb/config.csv <<'EOF'
407
+ key,value
408
+ app_name,MyApp
409
+ version,2.1
410
+ EOF
411
+
412
+ # Commit to git — schema and data are versioned together
413
+ git add myapp.csvdb/
414
+ git commit -m "Add rate config for Q1"
415
+
416
+ # Build SQLite for deployment
417
+ csvdb to-sqlite myapp.csvdb/
418
+ scp myapp.sqlite prod-server:/opt/myapp/data/
419
+ ```
420
+
421
+ Changes go through normal code review. `git diff` shows exactly which rows changed. Rollback is `git revert`.
422
+
423
+ ### Data Review via Pull Request
424
+
425
+ Treat data changes like code changes:
426
+
427
+ ```bash
428
+ git checkout -b update-q2-rates
429
+ # Edit the CSV
430
+ vim myapp.csvdb/rates.csv
431
+ git add myapp.csvdb/rates.csv
432
+ git commit -m "Update Q2 rates"
433
+ git push origin update-q2-rates
434
+ # Open PR — reviewers see the exact row-level diff
435
+ ```
436
+
437
+ Because CSVs are sorted by primary key, the diff contains only actual changes — no noise from row reordering.
438
+
439
+ ### Piping Commands
440
+
441
+ Use `--pipe` for one-liner conversions:
442
+
443
+ ```bash
444
+ # SQLite → DuckDB via pipe
445
+ csvdb to-csvdb mydb.sqlite --pipe | xargs csvdb to-duckdb
446
+
447
+ # SQLite → DuckDB via an intermediate parquetdb
448
+ csvdb to-parquetdb mydb.sqlite --pipe | xargs csvdb to-duckdb
449
+ ```
450
+
451
+ The `--pipe` flag:
452
+ - Writes to system temp directory
453
+ - Outputs only the path (no "Created:" prefix)
454
+ - Uses forward slashes for cross-platform compatibility
455
+
456
+ ### Database Migration
457
+
458
+ Convert between database formats:
459
+
460
+ ```bash
461
+ # SQLite to DuckDB
462
+ csvdb to-csvdb legacy.sqlite
463
+ csvdb to-duckdb legacy.csvdb/
464
+
465
+ # DuckDB to SQLite
466
+ csvdb to-csvdb analytics.duckdb
467
+ csvdb to-sqlite analytics.csvdb/
468
+
469
+ # SQLite to Parquet
470
+ csvdb to-parquetdb legacy.sqlite
471
+
472
+ # Parquet to SQLite
473
+ csvdb to-sqlite legacy.parquetdb/
474
+
475
+ # Verify no data loss
476
+ csvdb checksum legacy.sqlite
477
+ csvdb checksum legacy.duckdb
478
+ csvdb checksum legacy.parquetdb/
479
+ # Checksums match = data preserved
480
+ ```
481
+
482
+ ### Diff and Review Changes
483
+
484
+ Use git to review data changes:
485
+
486
+ ```bash
487
+ # See what changed
488
+ git diff production.csvdb/
489
+
490
+ # See changes to specific table
491
+ git diff production.csvdb/orders.csv
492
+
493
+ # Blame: who changed what
494
+ git blame production.csvdb/users.csv
495
+ ```
496
+
497
+ ### CI/CD Integration
498
+
499
+ Verify data integrity in CI:
500
+
501
+ ```bash
502
+ #!/bin/bash
503
+ set -e
504
+
505
+ # Rebuild from csvdb source
506
+ csvdb to-sqlite data.csvdb/
507
+
508
+ # Verify checksum matches expected
509
+ EXPECTED="a1b2c3d4..."
510
+ ACTUAL=$(csvdb checksum data.sqlite)
511
+ [ "$EXPECTED" = "$ACTUAL" ] || exit 1
512
+ ```
513
+
514
+ ## Python Bindings
515
+
516
+ csvdb provides native Python bindings via PyO3, giving you direct access to all csvdb functions without subprocess overhead.
517
+
518
+ ### Install
519
+
520
+ ```bash
521
+ pip install csvdb-py
522
+ ```
523
+
524
+ ### API
525
+
526
+ ```python
527
+ import csvdb
528
+
529
+ # Convert between formats
530
+ csvdb.to_csvdb("mydb.sqlite", force=True)
531
+ csvdb.to_sqlite("mydb.csvdb", force=True)
532
+ csvdb.to_duckdb("mydb.csvdb", force=True)
533
+ csvdb.to_parquetdb("mydb.csvdb", force=True)
534
+
535
+ # Incremental export (only re-exports changed tables)
536
+ result = csvdb.to_csvdb_incremental("mydb.sqlite")
537
+ # result: {"path": "...", "added": [...], "updated": [...], "unchanged": [...], "removed": [...]}
538
+
539
+ # Checksum (format-independent, deterministic)
540
+ digest = csvdb.checksum("mydb.csvdb")
541
+
542
+ # SQL queries (read-only, returns list of dicts)
543
+ rows = csvdb.sql("mydb.csvdb", "SELECT name, COUNT(*) AS n FROM users GROUP BY name")
544
+
545
+ # Diff two databases
546
+ has_diff = csvdb.diff("v1.csvdb", "v2.csvdb")
547
+
548
+ # Validate structure
549
+ info = csvdb.validate("mydb.csvdb")
550
+
551
+ # Initialize csvdb from raw CSV files
552
+ result = csvdb.init("./raw_csvs/")
553
+
554
+ # Selective export
555
+ csvdb.to_csvdb("mydb.sqlite", tables=["users", "orders"], force=True)
556
+ csvdb.to_csvdb("mydb.sqlite", exclude=["logs"], force=True)
557
+ ```
558
+
559
+ ### Development
560
+
561
+ ```bash
562
+ cd csvdb-python
563
+ uv sync
564
+ uv run maturin develop --release
565
+ uv run pytest
566
+ ```
567
+
568
+ ## Perl Bindings
569
+
570
+ csvdb provides Perl bindings via a C FFI shared library and `FFI::Platypus`.
571
+
572
+ ### Setup
573
+
574
+ ```bash
575
+ # Build the shared library
576
+ cargo build --release -p csvdb-ffi
577
+
578
+ # Install dependencies (macOS)
579
+ brew install cpanminus libffi
580
+ LDFLAGS="-L/opt/homebrew/opt/libffi/lib" \
581
+ CPPFLAGS="-I/opt/homebrew/opt/libffi/include" \
582
+ cpanm FFI::Platypus
583
+
584
+ # Install dependencies (Linux)
585
+ sudo apt-get install cpanminus libffi-dev
586
+ cpanm FFI::Platypus
587
+ ```
588
+
589
+ ### Running Examples
590
+
591
+ ```bash
592
+ perl -Iperl/lib perl/examples/basic_usage.pl
593
+ ```
594
+
595
+ ### API
596
+
597
+ ```perl
598
+ use Csvdb;
599
+
600
+ print Csvdb::version(), "\n";
601
+
602
+ # Convert between formats
603
+ my $csvdb_path = Csvdb::to_csvdb(input => "mydb.sqlite", force => 1);
604
+ my $sqlite_path = Csvdb::to_sqlite(input => "mydb.csvdb", force => 1);
605
+ my $duckdb_path = Csvdb::to_duckdb(input => "mydb.csvdb", force => 1);
606
+
607
+ # Checksum
608
+ my $hash = Csvdb::checksum(input => "mydb.csvdb");
609
+
610
+ # SQL query (returns CSV text)
611
+ my $csv = Csvdb::sql(path => "mydb.csvdb", query => "SELECT * FROM users");
612
+
613
+ # Diff (returns 0=identical, 1=differences)
614
+ my $rc = Csvdb::diff(left => "v1.csvdb", right => "v2.csvdb");
615
+
616
+ # Validate (returns 0=valid, 1=errors)
617
+ my $valid_rc = Csvdb::validate(input => "mydb.csvdb");
618
+ ```
619
+
620
+ ### Running Tests
621
+
622
+ ```bash
623
+ cargo build --release -p csvdb-ffi
624
+ prove perl/t/
625
+ ```
626
+
627
+ ## Project Structure
628
+
629
+ ```
630
+ csvdb/ # Core library + CLI binary
631
+ src/
632
+ main.rs # CLI (clap)
633
+ lib.rs
634
+ commands/
635
+ init.rs # CSV files -> csvdb (schema inference)
636
+ to_csv.rs # any format -> csvdb
637
+ to_sqlite.rs # any format -> SQLite
638
+ to_duckdb.rs # any format -> DuckDB
639
+ to_parquetdb.rs # any format -> parquetdb (Parquet)
640
+ checksum.rs # Format-independent checksums
641
+ validate.rs # Structural integrity checks
642
+ diff.rs # Compare two databases
643
+ sql.rs # Read-only SQL queries
644
+ core/
645
+ schema.rs # Parse/emit schema.sql, type normalization
646
+ table.rs # Row operations, PK handling
647
+ csv.rs # Deterministic CSV I/O
648
+ input.rs # Input format detection
649
+ csvdb-python/ # Python bindings (PyO3)
650
+ src/lib.rs
651
+ examples/
652
+ basic_usage.py
653
+ advanced_usage.py
654
+ csvdb-ffi/ # C FFI for Perl and other languages
655
+ src/lib.rs
656
+ perl/ # Perl module (FFI::Platypus)
657
+ lib/Csvdb.pm
658
+ examples/basic_usage.pl
659
+ tests/functional/ # Python functional tests
660
+ conftest.py
661
+ test_commands.py
662
+ test_performance.py
663
+ pyproject.toml
664
+ ```
665
+
666
+ ## Development
667
+
668
+ ```bash
669
+ cargo build -p csvdb
670
+ cargo run -p csvdb -- init ./raw_csvs/
671
+ cargo run -p csvdb -- to-csvdb mydb.sqlite
672
+ cargo run -p csvdb -- to-sqlite mydb.csvdb/
673
+ cargo run -p csvdb -- to-duckdb mydb.csvdb/
674
+ cargo run -p csvdb -- to-parquetdb mydb.sqlite
675
+ cargo run -p csvdb -- checksum mydb.sqlite
676
+ ```
677
+
678
+ ## Testing
679
+
680
+ ```bash
681
+ # Rust unit tests
682
+ cargo test
683
+
684
+ # Python functional tests (189 tests)
685
+ cd tests/functional
686
+ uv run pytest
687
+
688
+ # Cross-platform (avoids .venv collision)
689
+ uv run --isolated pytest
690
+ ```
691
+
692
+ ## License
693
+
694
+ MIT
695
+
@@ -0,0 +1,5 @@
1
+ csvdb_cli-0.2.8.data/scripts/csvdb.exe,sha256=KiL5jUbn8n_9WRvIXIVbpuIVht6ppRhJfpOWTH91L4w,32500736
2
+ csvdb_cli-0.2.8.dist-info/METADATA,sha256=y2MAc_SmpZNwZZ-mI3KO-K1n3TfkXazr_6jicVjgyMU,20434
3
+ csvdb_cli-0.2.8.dist-info/WHEEL,sha256=qKMSySxXiSk92mm8YkFo8xRrmPs5eXVkc3m4Z8OBMj4,97
4
+ csvdb_cli-0.2.8.dist-info/sboms/csvdb.cyclonedx.json,sha256=bh199DLh1yVxUrWKBcuYs68VZ_ck2qtT1B27B4XrwUI,414236
5
+ csvdb_cli-0.2.8.dist-info/RECORD,,
@@ -0,0 +1,4 @@
1
+ Wheel-Version: 1.0
2
+ Generator: maturin (1.12.0)
3
+ Root-Is-Purelib: false
4
+ Tag: cp312-cp312-win_amd64