csvdb-cli 0.2.8__cp312-cp312-win_amd64.whl
|
Binary file
|
|
@@ -0,0 +1,695 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: csvdb-cli
|
|
3
|
+
Version: 0.2.8
|
|
4
|
+
Classifier: Programming Language :: Rust
|
|
5
|
+
Classifier: Programming Language :: Python :: Implementation :: CPython
|
|
6
|
+
Summary: CLI tool to convert between SQLite, DuckDB, CSV, and Parquet
|
|
7
|
+
Keywords: csv,sqlite,duckdb,parquet,database,cli
|
|
8
|
+
License-Expression: MIT
|
|
9
|
+
Requires-Python: >=3.8
|
|
10
|
+
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
|
|
11
|
+
|
|
12
|
+
# csvdb
|
|
13
|
+
|
|
14
|
+
Version-control your relational data like code.
|
|
15
|
+
|
|
16
|
+
> **Note:** This is beta software. The API and file format may change. Use with caution in production.
|
|
17
|
+
|
|
18
|
+
SQLite and DuckDB files are binary — git can't diff them, reviewers can't read them, and merges are impossible. csvdb converts your database into a directory of plain-text CSV files + `schema.sql`, fully diffable and round-trip lossless. Convert back to SQLite, DuckDB, or Parquet when you need query performance.
|
|
19
|
+
|
|
20
|
+
```diff
|
|
21
|
+
# git diff myapp.csvdb/rates.csv
|
|
22
|
+
"date","rate"
|
|
23
|
+
"2024-01-01","4.50"
|
|
24
|
+
-"2024-04-01","4.25"
|
|
25
|
+
+"2024-04-01","3.75"
|
|
26
|
+
+"2024-07-01","3.50"
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
Every change is a readable, reviewable line in a PR. No binary blobs, no "file changed" with no context.
|
|
30
|
+
|
|
31
|
+
**Use cases:**
|
|
32
|
+
- Seed data and test fixtures committed alongside code
|
|
33
|
+
- Config and lookup tables reviewed in PRs before deploy
|
|
34
|
+
- CI integrity checks: `csvdb checksum data.csvdb/ | grep "$EXPECTED"`
|
|
35
|
+
- Migrating between SQLite, DuckDB, and Parquet without ETL scripts
|
|
36
|
+
- Manual edits in a spreadsheet or text editor, rebuild with one command
|
|
37
|
+
- Audit trail: `git blame` on any CSV row shows who changed it and when
|
|
38
|
+
|
|
39
|
+
## Directory Layouts
|
|
40
|
+
|
|
41
|
+
A `.csvdb` directory contains:
|
|
42
|
+
```
|
|
43
|
+
mydb.csvdb/
|
|
44
|
+
csvdb.toml # format version, export settings
|
|
45
|
+
schema.sql # CREATE TABLE, CREATE INDEX, CREATE VIEW
|
|
46
|
+
users.csv # one file per table
|
|
47
|
+
orders.csv
|
|
48
|
+
```
|
|
49
|
+
|
|
50
|
+
A `.parquetdb` directory has the same structure with Parquet files instead of CSVs:
|
|
51
|
+
```
|
|
52
|
+
mydb.parquetdb/
|
|
53
|
+
csvdb.toml # format version, export settings
|
|
54
|
+
schema.sql # CREATE TABLE, CREATE INDEX, CREATE VIEW
|
|
55
|
+
users.parquet # one file per table
|
|
56
|
+
orders.parquet
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
The schema defines the structure. The data files hold the data. `csvdb.toml` records the format version and the settings used to produce the export.
|
|
60
|
+
|
|
61
|
+
## Why csvdb
|
|
62
|
+
|
|
63
|
+
**CSV format** works with standard tools:
|
|
64
|
+
- Edit with any text editor or spreadsheet
|
|
65
|
+
- Diff and merge with git
|
|
66
|
+
- Process with awk, pandas, Excel
|
|
67
|
+
|
|
68
|
+
**SQLite/DuckDB format** provides fast access:
|
|
69
|
+
- Indexed lookups without scanning entire files
|
|
70
|
+
- Views for complex joins and computed columns
|
|
71
|
+
- Full SQL query support
|
|
72
|
+
- Single-file distribution
|
|
73
|
+
|
|
74
|
+
**Parquet format** provides columnar storage:
|
|
75
|
+
- Efficient compression and encoding
|
|
76
|
+
- Fast analytical queries
|
|
77
|
+
- Wide ecosystem support (Spark, pandas, DuckDB, etc.)
|
|
78
|
+
- Per-table `.parquet` files in a `.parquetdb` directory
|
|
79
|
+
|
|
80
|
+
csvdb lets you store data as CSV (human-readable, git-friendly) and convert to SQLite, DuckDB, or Parquet when you need query performance.
|
|
81
|
+
|
|
82
|
+
## Installation
|
|
83
|
+
|
|
84
|
+
```bash
|
|
85
|
+
# Rust (via cargo)
|
|
86
|
+
cargo install csvdb
|
|
87
|
+
|
|
88
|
+
# Python library (import csvdb)
|
|
89
|
+
pip install csvdb-py
|
|
90
|
+
|
|
91
|
+
# Standalone binary (via pip/pipx/uvx)
|
|
92
|
+
uvx csvdb-cli
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
## Quick Start
|
|
96
|
+
|
|
97
|
+
```bash
|
|
98
|
+
# Convert an existing SQLite database to csvdb
|
|
99
|
+
csvdb to-csvdb mydb.sqlite
|
|
100
|
+
git add mydb.csvdb/
|
|
101
|
+
git commit -m "Track data in csvdb format"
|
|
102
|
+
|
|
103
|
+
# Edit data
|
|
104
|
+
vim mydb.csvdb/users.csv
|
|
105
|
+
|
|
106
|
+
# Rebuild database
|
|
107
|
+
csvdb to-sqlite mydb.csvdb/
|
|
108
|
+
|
|
109
|
+
# Or export to Parquet
|
|
110
|
+
csvdb to-parquetdb mydb.csvdb/
|
|
111
|
+
```
|
|
112
|
+
|
|
113
|
+
## Commands
|
|
114
|
+
|
|
115
|
+
### init — Create csvdb from raw CSV files
|
|
116
|
+
|
|
117
|
+
```bash
|
|
118
|
+
# From a directory of CSV files
|
|
119
|
+
csvdb init ./raw_csvs/
|
|
120
|
+
|
|
121
|
+
# From a single CSV file
|
|
122
|
+
csvdb init data.csv
|
|
123
|
+
```
|
|
124
|
+
|
|
125
|
+
Creates a `.csvdb` directory by:
|
|
126
|
+
- Inferring schema from CSV headers and data types
|
|
127
|
+
- Detecting primary keys (columns named `id` or `<table>_id`)
|
|
128
|
+
- Detecting foreign keys (columns like `user_id` referencing `users.id`)
|
|
129
|
+
- Copying CSV files
|
|
130
|
+
|
|
131
|
+
Options:
|
|
132
|
+
- `--no-pk-detection` - Disable automatic primary key detection
|
|
133
|
+
- `--no-fk-detection` - Disable automatic foreign key detection
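
The inference steps listed for `init` can be sketched in Python. This is only an illustration of the naming heuristics described above, not csvdb's Rust implementation. The helper names `infer_type`, `guess_primary_key`, and `guess_foreign_keys` are hypothetical, and the naive pluralization (`user_id` pointing at `users`) is an assumption.

```python
def infer_type(values):
    """Pick INTEGER, REAL, or TEXT for a column of CSV strings."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "INTEGER"
    if all_parse(float):
        return "REAL"
    return "TEXT"


def guess_primary_key(table, columns):
    """Return the first column named `id` or `<table>_id`, if any."""
    for candidate in ("id", f"{table}_id"):
        if candidate in columns:
            return candidate
    return None


def guess_foreign_keys(table, columns, all_tables):
    """Map `<name>_id` columns to `<name>s.id` when that table exists."""
    foreign_keys = {}
    for column in columns:
        if column.endswith("_id") and column != f"{table}_id":
            target = column[: -len("_id")] + "s"  # assumed pluralization
            if target in all_tables:
                foreign_keys[column] = f"{target}.id"
    return foreign_keys
```

For an `orders` table with columns `id`, `user_id`, and `total`, this sketch picks `id` as the primary key and proposes `user_id -> users.id`, provided a `users` table exists.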
|
|
134
|
+
|
|
135
|
+
### to-csvdb — Export database to csvdb
|
|
136
|
+
|
|
137
|
+
```bash
|
|
138
|
+
# From SQLite
|
|
139
|
+
csvdb to-csvdb mydb.sqlite
|
|
140
|
+
|
|
141
|
+
# From DuckDB
|
|
142
|
+
csvdb to-csvdb mydb.duckdb
|
|
143
|
+
|
|
144
|
+
# From Parquet
|
|
145
|
+
csvdb to-csvdb mydb.parquetdb/
|
|
146
|
+
csvdb to-csvdb single_table.parquet
|
|
147
|
+
```
|
|
148
|
+
|
|
149
|
+
Creates `mydb.csvdb/` containing:
|
|
150
|
+
- `schema.sql` - table definitions, indexes, views
|
|
151
|
+
- `*.csv` - one file per table, sorted by primary key
|
|
152
|
+
|
|
153
|
+
Supports multiple input formats:
|
|
154
|
+
- **SQLite** (`.sqlite`, `.sqlite3`, `.db`)
|
|
155
|
+
- **DuckDB** (`.duckdb`)
|
|
156
|
+
- **parquetdb** (`.parquetdb` directory)
|
|
157
|
+
- **Parquet** (`.parquet` single file)
|
|
158
|
+
|
|
159
|
+
Options:
|
|
160
|
+
- `-o, --output <dir>` - Custom output directory
|
|
161
|
+
- `--order <mode>` - Row ordering mode (see below)
|
|
162
|
+
- `--null-mode <mode>` - NULL representation in CSV (see below)
|
|
163
|
+
- `--pipe` - Write to temp directory, output only path (for piping)
|
|
164
|
+
|
|
165
|
+
### to-sqlite — Build SQLite database
|
|
166
|
+
|
|
167
|
+
```bash
|
|
168
|
+
csvdb to-sqlite mydb.csvdb/
|
|
169
|
+
csvdb to-sqlite mydb.parquetdb/
|
|
170
|
+
```
|
|
171
|
+
|
|
172
|
+
Creates `mydb.sqlite` from a csvdb or parquetdb directory.
|
|
173
|
+
|
|
174
|
+
Options:
|
|
175
|
+
- `--force` - Overwrite existing output file
|
|
176
|
+
- `--tables <list>` - Only include these tables (comma-separated)
|
|
177
|
+
- `--exclude <list>` - Exclude these tables (comma-separated)
|
|
178
|
+
|
|
179
|
+
### to-duckdb — Build DuckDB database
|
|
180
|
+
|
|
181
|
+
```bash
|
|
182
|
+
csvdb to-duckdb mydb.csvdb/
|
|
183
|
+
csvdb to-duckdb mydb.parquetdb/
|
|
184
|
+
```
|
|
185
|
+
|
|
186
|
+
Creates `mydb.duckdb` from a csvdb or parquetdb directory.
|
|
187
|
+
|
|
188
|
+
Options:
|
|
189
|
+
- `--force` - Overwrite existing output file
|
|
190
|
+
- `--tables <list>` - Only include these tables (comma-separated)
|
|
191
|
+
- `--exclude <list>` - Exclude these tables (comma-separated)
|
|
192
|
+
|
|
193
|
+
### to-parquetdb — Convert any format to Parquet
|
|
194
|
+
|
|
195
|
+
```bash
|
|
196
|
+
# From SQLite
|
|
197
|
+
csvdb to-parquetdb mydb.sqlite
|
|
198
|
+
|
|
199
|
+
# From DuckDB
|
|
200
|
+
csvdb to-parquetdb mydb.duckdb
|
|
201
|
+
|
|
202
|
+
# From csvdb
|
|
203
|
+
csvdb to-parquetdb mydb.csvdb/
|
|
204
|
+
|
|
205
|
+
# From a single Parquet file
|
|
206
|
+
csvdb to-parquetdb users.parquet
|
|
207
|
+
```
|
|
208
|
+
|
|
209
|
+
Creates `mydb.parquetdb/` containing:
|
|
210
|
+
- `schema.sql` - table definitions, indexes, views
|
|
211
|
+
- `csvdb.toml` - format version and export settings
|
|
212
|
+
- `*.parquet` - one Parquet file per table
|
|
213
|
+
|
|
214
|
+
Supports multiple input formats:
|
|
215
|
+
- **SQLite** (`.sqlite`, `.sqlite3`, `.db`)
|
|
216
|
+
- **DuckDB** (`.duckdb`)
|
|
217
|
+
- **csvdb** (`.csvdb` directory)
|
|
218
|
+
- **parquetdb** (`.parquetdb` directory)
|
|
219
|
+
- **Parquet** (`.parquet` single file)
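
The extension list above maps naturally onto a lookup table. The sketch below is an assumption about how such detection could work (by suffix only, case-insensitive); `detect_format` and `SUFFIX_TO_FORMAT` are hypothetical names, and csvdb itself may inspect file contents as well.

```python
from pathlib import PurePosixPath

# Suffixes taken from the list of supported input formats.
SUFFIX_TO_FORMAT = {
    ".sqlite": "sqlite",
    ".sqlite3": "sqlite",
    ".db": "sqlite",
    ".duckdb": "duckdb",
    ".csvdb": "csvdb",
    ".parquetdb": "parquetdb",
    ".parquet": "parquet",
}


def detect_format(path):
    """Guess the input format from the path suffix (trailing slash ignored)."""
    suffix = PurePosixPath(path.rstrip("/")).suffix.lower()
    try:
        return SUFFIX_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"unrecognized input: {path}") from None
```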
|
|
220
|
+
|
|
221
|
+
Options:
|
|
222
|
+
- `-o, --output <dir>` - Custom output directory
|
|
223
|
+
- `--order <mode>` - Row ordering mode (see below)
|
|
224
|
+
- `--null-mode <mode>` - NULL representation (see below)
|
|
225
|
+
- `--pipe` - Write to temp directory, output only path (for piping)
|
|
226
|
+
- `--force` - Overwrite existing output directory
|
|
227
|
+
- `--tables <list>` - Only include these tables (comma-separated)
|
|
228
|
+
- `--exclude <list>` - Exclude these tables (comma-separated)
|
|
229
|
+
|
|
230
|
+
### checksum — Verify data integrity
|
|
231
|
+
|
|
232
|
+
```bash
|
|
233
|
+
csvdb checksum mydb.sqlite
|
|
234
|
+
csvdb checksum mydb.csvdb/
|
|
235
|
+
csvdb checksum mydb.duckdb
|
|
236
|
+
csvdb checksum mydb.parquetdb/
|
|
237
|
+
csvdb checksum users.parquet
|
|
238
|
+
```
|
|
239
|
+
|
|
240
|
+
Computes a SHA-256 checksum of the database content. The checksum is:
|
|
241
|
+
- **Format-independent**: Same data produces same hash regardless of format
|
|
242
|
+
- **Deterministic**: Same data always produces same hash
|
|
243
|
+
- **Content-based**: Includes schema structure and all row data
|
|
244
|
+
|
|
245
|
+
Use checksums to verify roundtrip conversions:
|
|
246
|
+
```bash
|
|
247
|
+
csvdb checksum original.sqlite # a1b2c3...
|
|
248
|
+
csvdb to-csvdb original.sqlite
|
|
249
|
+
csvdb to-duckdb original.csvdb/
|
|
250
|
+
csvdb checksum original.duckdb # a1b2c3... (same!)
|
|
251
|
+
csvdb to-parquetdb original.csvdb/
|
|
252
|
+
csvdb checksum original.parquetdb/ # a1b2c3... (same!)
|
|
253
|
+
```
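
To see why a content-based hash can be format-independent, here is a minimal Python sketch. It is not csvdb's checksum algorithm: it hashes only table names, column names, and row values, whereas csvdb also covers schema structure. The names `content_checksum` and `tables_from_sqlite` are hypothetical.

```python
import hashlib
import json
import sqlite3


def content_checksum(tables):
    """Hash {table: (columns, rows)} independently of where it came from."""
    digest = hashlib.sha256()
    for name in sorted(tables):
        columns, rows = tables[name]
        # Normalize values to text (None stays None) and sort the rows so
        # neither the storage format nor the row order affects the result.
        canonical_rows = sorted(
            json.dumps([None if v is None else str(v) for v in row])
            for row in rows
        )
        payload = json.dumps([name, list(columns), canonical_rows])
        digest.update(payload.encode("utf-8"))
    return digest.hexdigest()


def tables_from_sqlite(conn):
    """Read every table from an open sqlite3 connection."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )]
    tables = {}
    for name in names:
        cursor = conn.execute(f'SELECT * FROM "{name}"')
        columns = [col[0] for col in cursor.description]
        tables[name] = (columns, cursor.fetchall())
    return tables


# The same two rows, once in SQLite and once as parsed CSV in another order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO config VALUES (?, ?)",
    [("app_name", "MyApp"), ("version", "2.1")],
)
from_csv = {"config": (["key", "value"], [["version", "2.1"], ["app_name", "MyApp"]])}

same = content_checksum(tables_from_sqlite(conn)) == content_checksum(from_csv)
```

Both sources reduce to the same canonical form before hashing, so `same` is true here even though one side is a database and the other is a list of CSV rows.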
|
|
254
|
+
|
|
255
|
+
## Primary Key Requirement
|
|
256
|
+
|
|
257
|
+
By default, every table must have an explicit primary key. Rows are sorted by primary key when exporting to CSV. By enforcing a stable row order, csvdb guarantees that identical data always produces identical CSV files, making git diffs meaningful and noise-free.
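
The effect of a stable row order can be shown with Python's standard `csv` module. This is a sketch that approximates the export step, not csvdb's own writer; `export_csv` is a hypothetical helper.

```python
import csv
import io


def export_csv(columns, rows, pk):
    """Write rows sorted by the primary key column, every field quoted."""
    key_index = columns.index(pk)
    out = io.StringIO(newline="")
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(sorted(rows, key=lambda row: row[key_index]))
    return out.getvalue()


rows = [("2024-04-01", "3.75"), ("2024-01-01", "4.50")]
first = export_csv(["date", "rate"], rows, pk="date")
second = export_csv(["date", "rate"], list(reversed(rows)), pk="date")
```

`first` and `second` are byte-for-byte identical, so the order in which rows happened to be stored never shows up as a diff.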
|
|
258
|
+
|
|
259
|
+
### Tables Without Primary Keys
|
|
260
|
+
|
|
261
|
+
For tables without a primary key (event logs, append-only tables), use the `--order` option:
|
|
262
|
+
|
|
263
|
+
```bash
|
|
264
|
+
# Order by all columns (deterministic but may have issues with duplicates)
|
|
265
|
+
csvdb to-csvdb mydb.sqlite --order=all-columns
|
|
266
|
+
|
|
267
|
+
# Add a synthetic __csvdb_rowid column (best for event/log tables)
|
|
268
|
+
csvdb to-csvdb mydb.sqlite --order=add-synthetic-key
|
|
269
|
+
```
|
|
270
|
+
|
|
271
|
+
#### Order Modes
|
|
272
|
+
|
|
273
|
+
| Mode | Description | Best For |
|------|-------------|----------|
| `pk` (default) | Order by primary key | Tables with natural keys |
| `all-columns` | Order by all columns | Reference tables without PK |
| `add-synthetic-key` | Add `__csvdb_rowid` column | Event logs, append-only data |
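
The three modes can be sketched as a single Python function. This is an illustration only: `order_rows` is a hypothetical name, and the choices to number synthetic keys from 1 and to place `__csvdb_rowid` first are assumptions, not documented behavior.

```python
def order_rows(columns, rows, mode="pk", pk=None):
    """Return (columns, rows) arranged according to an --order mode."""
    if mode == "pk":
        if pk is None:
            raise ValueError("table has no primary key; pick another mode")
        key_index = columns.index(pk)
        return columns, sorted(rows, key=lambda row: row[key_index])
    if mode == "all-columns":
        # Duplicate rows sort next to each other and stay indistinguishable.
        return columns, sorted(rows)
    if mode == "add-synthetic-key":
        # Keep the original order and record it in an extra column.
        keyed = [(i, *row) for i, row in enumerate(rows, start=1)]
        return ["__csvdb_rowid", *columns], keyed
    raise ValueError(f"unknown order mode: {mode}")
```

With `all-columns`, two identical event rows collapse into adjacent duplicates, which is why the synthetic key is the better fit for append-only logs.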
|
|
278
|
+
|
|
279
|
+
## NULL Handling
|
|
280
|
+
|
|
281
|
+
CSV has no native NULL concept. csvdb uses explicit conventions to preserve NULLs across database roundtrips.
|
|
282
|
+
|
|
283
|
+
By default, CSV files use `\N` (PostgreSQL convention) to represent NULL values:
|
|
284
|
+
|
|
285
|
+
```csv
|
|
286
|
+
"id","name","value"
|
|
287
|
+
"1","\N","42" # name is NULL
|
|
288
|
+
"2","","42" # name is empty string
|
|
289
|
+
"3","hello","\N" # value is NULL
|
|
290
|
+
```
|
|
291
|
+
|
|
292
|
+
This preserves the distinction between NULL and empty string through roundtrips:
|
|
293
|
+
- **SQLite roundtrip**: NULL and empty string are fully preserved
|
|
294
|
+
- **DuckDB roundtrip**: NULL is preserved, but empty strings may become NULL due to a limitation in the Rust driver.
|
|
295
|
+
|
|
296
|
+
### --null-mode
|
|
297
|
+
|
|
298
|
+
| Mode | NULL representation | Lossless? | Use case |
|------|-------------------|-----------|----------|
| `marker` (default) | `\N` | Yes | Roundtrip-safe, distinguishes NULL from empty string |
| `empty` | empty string | No | Simpler CSV, but cannot distinguish NULL from `""` |
| `literal` | `NULL` | No | Human-readable, but cannot distinguish NULL from the string `"NULL"` |
|
|
303
|
+
|
|
304
|
+
```bash
|
|
305
|
+
csvdb to-csvdb mydb.sqlite # default: \N marker
|
|
306
|
+
csvdb to-csvdb mydb.sqlite --null-mode=empty # empty string for NULL
|
|
307
|
+
csvdb to-csvdb mydb.sqlite --null-mode=literal # literal "NULL" string
|
|
308
|
+
```
|
|
309
|
+
|
|
310
|
+
Lossy modes print a warning to stderr. Use `--pipe` to suppress warnings.
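
Why two of the modes are lossy becomes clear once encoding and decoding are written out. The following Python sketch models only the token substitution; `NULL_TOKENS`, `encode`, `decode`, and `roundtrips` are hypothetical names, and how csvdb treats a text value that is literally `\N` is not covered here.

```python
# One NULL token per --null-mode value, as listed in the table above.
NULL_TOKENS = {"marker": r"\N", "empty": "", "literal": "NULL"}


def encode(value, mode="marker"):
    """Turn a database value (None means NULL) into a CSV field."""
    return NULL_TOKENS[mode] if value is None else value


def decode(field, mode="marker"):
    """Turn a CSV field back into a database value."""
    return None if field == NULL_TOKENS[mode] else field


def roundtrips(values, mode):
    """True if every value survives encode followed by decode."""
    return [decode(encode(v, mode), mode) for v in values] == values


sample = [None, "", "NULL", "hello"]
results = {mode: roundtrips(sample, mode) for mode in NULL_TOKENS}
```

For this sample only `marker` survives: `empty` turns `""` into NULL, and `literal` turns the string `"NULL"` into NULL.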
|
|
311
|
+
|
|
312
|
+
## CSV Dialect
|
|
313
|
+
|
|
314
|
+
csvdb produces a strict, deterministic CSV dialect:
|
|
315
|
+
|
|
316
|
+
| Property | Value |
|----------|-------|
| Encoding | UTF-8 |
| Delimiter | `,` (comma) |
| Quote character | `"` (double quote) |
| Quoting | Always: every field is quoted, including headers |
| Quote escaping | Doubled (`""`) per RFC 4180 |
| Record terminator | `\n` (LF), not CRLF |
| Header row | Always present as the first row |
| Row ordering | Sorted by primary key (deterministic) |
| NULL representation | Configurable via `--null-mode` (see above) |
|
|
327
|
+
|
|
328
|
+
This is mostly RFC 4180 compliant, with one deliberate deviation: line endings use LF instead of CRLF. This produces cleaner git diffs and avoids mixed-endings issues on Unix systems.
|
|
329
|
+
|
|
330
|
+
Newlines embedded within field values are preserved as-is inside quoted fields. The Rust `csv` crate handles quoting and escaping automatically.
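
If you generate or parse these files from Python, the standard `csv` module can be configured to match the dialect described above. This is a sketch under that assumption; `write_rows` and `read_rows` are hypothetical helpers, not part of csvdb.

```python
import csv
import io


def write_rows(rows):
    """Quote every field, double embedded quotes, end records with LF."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def read_rows(text):
    """Parse the text back; newlines inside quoted fields are kept."""
    return list(csv.reader(io.StringIO(text, newline="")))


rows = [["id", "note"], ["1", 'say "hi"'], ["2", "line one\nline two"]]
text = write_rows(rows)
```

Reading `text` back with `read_rows` returns the original `rows`, including the embedded quote characters and the embedded newline.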
|
|
331
|
+
|
|
332
|
+
See [FORMAT.md](FORMAT.md) for the full normative format specification.
|
|
333
|
+
|
|
334
|
+
## Gotchas
|
|
335
|
+
|
|
336
|
+
Things that may surprise you on day one:
|
|
337
|
+
|
|
338
|
+
- **String-based sorting.** Primary keys are compared as strings, not numbers, so a TEXT key `"10"` sorts before `"2"`. If you need numeric order, use zero-padded strings or an INTEGER primary key. Integer keys sort correctly because shorter digit strings come first and same-length digit strings compare in numeric order.
|
|
339
|
+
|
|
340
|
+
- **Schema inference is limited.** `csvdb init` only infers three types: `INTEGER`, `REAL`, `TEXT`. It won't detect dates, booleans, or blobs. Edit `schema.sql` after init if you need richer types.
|
|
341
|
+
|
|
342
|
+
- **PK detection stops tracking at 100k values.** During `init`, uniqueness tracking for primary key candidates stops after 100,000 values. If the column was unique up to that point, it's still used as the PK.
|
|
343
|
+
|
|
344
|
+
- **Float precision in checksums.** Values are normalized to 10 decimal places for checksumming. `42.0` normalizes to `42` (integer-valued floats become integers). Very small precision differences across databases are absorbed.
|
|
345
|
+
|
|
346
|
+
- **DuckDB empty string limitation.** Empty strings in TEXT columns may become NULL when round-tripping through DuckDB due to a Rust driver limitation.
|
|
347
|
+
|
|
348
|
+
- **Blob values are hex strings in CSV.** BLOB data is stored as lowercase hex (e.g. `cafe`). It roundtrips correctly through SQLite and DuckDB.
|
|
349
|
+
|
|
350
|
+
- **No duplicate PK validation during CSV read.** Duplicate primary keys are not caught when reading CSV files. They will cause an error at database INSERT time.
|
|
351
|
+
|
|
352
|
+
- **DuckDB indexes are not exported.** Index metadata is not available from DuckDB sources. Indexes defined in a csvdb `schema.sql` are preserved when converting between csvdb and SQLite, but not when the source is DuckDB.
|
|
353
|
+
|
|
354
|
+
- **Views are not dependency-ordered.** Views are written in alphabetical order. If view A references view B, you may need to manually reorder them in `schema.sql`.
|
|
355
|
+
|
|
356
|
+
- **`__csvdb_rowid` is reserved.** The column name `__csvdb_rowid` is used by the `add-synthetic-key` order mode. Don't use it in your own schemas.
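
A few of these gotchas are easy to reproduce in plain Python. The sketch below only mirrors the behavior described in the list; `normalize_float` and `numeric_order` are hypothetical helpers, `normalize_float` handles finite values only, and csvdb's actual normalization code may differ in detail.

```python
def normalize_float(x, places=10):
    """Round to `places` decimals; integer-valued floats print as integers."""
    rounded = round(x, places)
    if rounded == int(rounded):
        return str(int(rounded))
    return repr(rounded)


def numeric_order(keys):
    """Sort non-negative digit strings: shorter first, then lexicographic."""
    return sorted(keys, key=lambda s: (len(s), s))


plain = sorted(["10", "2", "1"])           # lexicographic string order
padded = sorted(["10", "02", "01"])        # zero-padding restores numeric order
numeric = numeric_order(["10", "2", "1"])  # length-then-lexicographic order
blob_hex = b"\xca\xfe".hex()               # BLOB bytes as lowercase hex
```

Here `plain` puts `"10"` before `"2"`, while `padded` and `numeric` both come out in numeric order. `normalize_float(0.1 + 0.2)` and `normalize_float(0.3)` give the same text, which is how small precision differences are absorbed.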
|
|
357
|
+
|
|
358
|
+
## Examples
|
|
359
|
+
|
|
360
|
+
The [`examples/`](examples/) directory contains ready-to-use examples:
|
|
361
|
+
|
|
362
|
+
- **`examples/store.csvdb/`** — A hand-written csvdb directory with two tables, an index, a view, and NULL values
|
|
363
|
+
- **`examples/raw-csvs/`** — Plain CSV files for demonstrating `csvdb init`
|
|
364
|
+
|
|
365
|
+
See [`examples/README.md`](examples/README.md) for usage instructions.
|
|
366
|
+
|
|
367
|
+
## Workflows
|
|
368
|
+
|
|
369
|
+
### Git-Tracked Data
|
|
370
|
+
|
|
371
|
+
Store data in git, rebuild databases as needed:
|
|
372
|
+
|
|
373
|
+
```bash
|
|
374
|
+
# Initial setup: export existing database
|
|
375
|
+
csvdb to-csvdb production.sqlite
|
|
376
|
+
git add production.csvdb/
|
|
377
|
+
git commit -m "Initial data import"
|
|
378
|
+
|
|
379
|
+
# Daily workflow: edit CSVs, commit, rebuild
|
|
380
|
+
vim production.csvdb/users.csv
|
|
381
|
+
git add -p production.csvdb/
|
|
382
|
+
git commit -m "Update user records"
|
|
383
|
+
csvdb to-sqlite production.csvdb/
|
|
384
|
+
```
|
|
385
|
+
|
|
386
|
+
### Deploy to Production
|
|
387
|
+
|
|
388
|
+
Use csvdb as the source of truth. Track schema and data in git, export to SQLite for deployment:
|
|
389
|
+
|
|
390
|
+
```bash
|
|
391
|
+
# Define your schema and seed data in csvdb format
|
|
392
|
+
mkdir -p myapp.csvdb
|
|
393
|
+
cat > myapp.csvdb/schema.sql <<'EOF'
|
|
394
|
+
CREATE TABLE config (
|
|
395
|
+
key TEXT PRIMARY KEY,
|
|
396
|
+
value TEXT NOT NULL
|
|
397
|
+
);
|
|
398
|
+
CREATE TABLE rates (
|
|
399
|
+
date TEXT NOT NULL,
|
|
400
|
+
rate REAL NOT NULL,
|
|
401
|
+
PRIMARY KEY (date)
|
|
402
|
+
);
|
|
403
|
+
EOF
|
|
404
|
+
|
|
405
|
+
# Edit data directly as CSV
|
|
406
|
+
cat > myapp.csvdb/config.csv <<'EOF'
|
|
407
|
+
key,value
|
|
408
|
+
app_name,MyApp
|
|
409
|
+
version,2.1
|
|
410
|
+
EOF
|
|
411
|
+
|
|
412
|
+
# Commit to git — schema and data are versioned together
|
|
413
|
+
git add myapp.csvdb/
|
|
414
|
+
git commit -m "Add rate config for Q1"
|
|
415
|
+
|
|
416
|
+
# Build SQLite for deployment
|
|
417
|
+
csvdb to-sqlite myapp.csvdb/
|
|
418
|
+
scp myapp.sqlite prod-server:/opt/myapp/data/
|
|
419
|
+
```
|
|
420
|
+
|
|
421
|
+
Changes go through normal code review. `git diff` shows exactly which rows changed. Rollback is `git revert`.
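
Conceptually, the build step applies `schema.sql` and then loads each CSV into its table. The Python sketch below shows that idea with the standard `sqlite3` and `csv` modules, using the `config` table from the example. It is a stand-in for illustration, not how `csvdb to-sqlite` is implemented, and `load_table` is a hypothetical helper.

```python
import csv
import io
import sqlite3

SCHEMA = """
CREATE TABLE config (
    key TEXT PRIMARY KEY,
    value TEXT NOT NULL
);
"""

CONFIG_CSV = "key,value\napp_name,MyApp\nversion,2.1\n"


def load_table(conn, table, csv_text):
    """Insert every data row of a CSV (header first) into `table`."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})',
        reader,
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_table(conn, "config", CONFIG_CSV)
version = conn.execute(
    "SELECT value FROM config WHERE key = 'version'"
).fetchone()[0]
```

Loading a second row with an existing key raises `sqlite3.IntegrityError` in this sketch, which matches the note under Gotchas that duplicate primary keys surface at INSERT time.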
|
|
422
|
+
|
|
423
|
+
### Data Review via Pull Request
|
|
424
|
+
|
|
425
|
+
Treat data changes like code changes:
|
|
426
|
+
|
|
427
|
+
```bash
|
|
428
|
+
git checkout -b update-q2-rates
|
|
429
|
+
# Edit the CSV
|
|
430
|
+
vim myapp.csvdb/rates.csv
|
|
431
|
+
git add myapp.csvdb/rates.csv
|
|
432
|
+
git commit -m "Update Q2 rates"
|
|
433
|
+
git push origin update-q2-rates
|
|
434
|
+
# Open PR — reviewers see the exact row-level diff
|
|
435
|
+
```
|
|
436
|
+
|
|
437
|
+
Because CSVs are sorted by primary key, the diff contains only actual changes — no noise from row reordering.
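
The same row-level view can be reproduced outside git with Python's `difflib`, using the rates example from the top of this README. `changed_lines` is a hypothetical helper for this sketch.

```python
import difflib

old = [
    '"date","rate"',
    '"2024-01-01","4.50"',
    '"2024-04-01","4.25"',
]
new = [
    '"date","rate"',
    '"2024-01-01","4.50"',
    '"2024-04-01","3.75"',
    '"2024-07-01","3.50"',
]


def changed_lines(before, after):
    """Return only the added and removed lines of a unified diff."""
    diff = list(difflib.unified_diff(before, after, lineterm="", n=0))
    body = diff[2:]  # skip the '---' and '+++' file header lines
    return [line for line in body if line.startswith(("+", "-"))]


changes = changed_lines(old, new)
```

`changes` holds one removed line and two added lines. The unchanged header and the first data row do not appear.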
|
|
438
|
+
|
|
439
|
+
### Piping Commands
|
|
440
|
+
|
|
441
|
+
Use `--pipe` for one-liner conversions:
|
|
442
|
+
|
|
443
|
+
```bash
|
|
444
|
+
# SQLite → DuckDB via pipe
|
|
445
|
+
csvdb to-csvdb mydb.sqlite --pipe | xargs csvdb to-duckdb
|
|
446
|
+
|
|
447
|
+
# SQLite → DuckDB via an intermediate parquetdb
|
|
448
|
+
csvdb to-parquetdb mydb.sqlite --pipe | xargs csvdb to-duckdb
|
|
449
|
+
```
|
|
450
|
+
|
|
451
|
+
The `--pipe` flag:
|
|
452
|
+
- Writes to system temp directory
|
|
453
|
+
- Outputs only the path (no "Created:" prefix)
|
|
454
|
+
- Uses forward slashes for cross-platform compatibility
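
Forward slashes matter because tools such as `xargs` treat a backslash as an escape character, so a native Windows path would be mangled in a pipeline. The conversion itself is simple; this Python sketch shows it with `pathlib`, where `pipe_safe` is a hypothetical name and not part of csvdb.

```python
from pathlib import PureWindowsPath


def pipe_safe(path):
    """Render a path with forward slashes, whatever separators it used."""
    return PureWindowsPath(path).as_posix()


windows_style = pipe_safe(r"C:\Users\me\AppData\Local\Temp\mydb.csvdb")
unix_style = pipe_safe("/tmp/mydb.csvdb")
```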
|
|
455
|
+
|
|
456
|
+
### Database Migration
|
|
457
|
+
|
|
458
|
+
Convert between database formats:
|
|
459
|
+
|
|
460
|
+
```bash
|
|
461
|
+
# SQLite to DuckDB
|
|
462
|
+
csvdb to-csvdb legacy.sqlite
|
|
463
|
+
csvdb to-duckdb legacy.csvdb/
|
|
464
|
+
|
|
465
|
+
# DuckDB to SQLite
|
|
466
|
+
csvdb to-csvdb analytics.duckdb
|
|
467
|
+
csvdb to-sqlite analytics.csvdb/
|
|
468
|
+
|
|
469
|
+
# SQLite to Parquet
|
|
470
|
+
csvdb to-parquetdb legacy.sqlite
|
|
471
|
+
|
|
472
|
+
# Parquet to SQLite
|
|
473
|
+
csvdb to-sqlite legacy.parquetdb/
|
|
474
|
+
|
|
475
|
+
# Verify no data loss
|
|
476
|
+
csvdb checksum legacy.sqlite
|
|
477
|
+
csvdb checksum legacy.duckdb
|
|
478
|
+
csvdb checksum legacy.parquetdb/
|
|
479
|
+
# Checksums match = data preserved
|
|
480
|
+
```
|
|
481
|
+
|
|
482
|
+
### Diff and Review Changes
|
|
483
|
+
|
|
484
|
+
Use git to review data changes:
|
|
485
|
+
|
|
486
|
+
```bash
|
|
487
|
+
# See what changed
|
|
488
|
+
git diff production.csvdb/
|
|
489
|
+
|
|
490
|
+
# See changes to specific table
|
|
491
|
+
git diff production.csvdb/orders.csv
|
|
492
|
+
|
|
493
|
+
# Blame: who changed what
|
|
494
|
+
git blame production.csvdb/users.csv
|
|
495
|
+
```
|
|
496
|
+
|
|
497
|
+
### CI/CD Integration
|
|
498
|
+
|
|
499
|
+
Verify data integrity in CI:
|
|
500
|
+
|
|
501
|
+
```bash
|
|
502
|
+
#!/bin/bash
|
|
503
|
+
set -e
|
|
504
|
+
|
|
505
|
+
# Rebuild from csvdb source
|
|
506
|
+
csvdb to-sqlite data.csvdb/
|
|
507
|
+
|
|
508
|
+
# Verify checksum matches expected
|
|
509
|
+
EXPECTED="a1b2c3d4..."
|
|
510
|
+
ACTUAL=$(csvdb checksum data.sqlite)
|
|
511
|
+
[ "$EXPECTED" = "$ACTUAL" ] || exit 1
|
|
512
|
+
```
|
|
513
|
+
|
|
514
|
+
## Python Bindings
|
|
515
|
+
|
|
516
|
+
csvdb provides native Python bindings via PyO3, giving you direct access to all csvdb functions without subprocess overhead.
|
|
517
|
+
|
|
518
|
+
### Install
|
|
519
|
+
|
|
520
|
+
```bash
|
|
521
|
+
pip install csvdb-py
|
|
522
|
+
```
|
|
523
|
+
|
|
524
|
+
### API
|
|
525
|
+
|
|
526
|
+
```python
|
|
527
|
+
import csvdb
|
|
528
|
+
|
|
529
|
+
# Convert between formats
|
|
530
|
+
csvdb.to_csvdb("mydb.sqlite", force=True)
|
|
531
|
+
csvdb.to_sqlite("mydb.csvdb", force=True)
|
|
532
|
+
csvdb.to_duckdb("mydb.csvdb", force=True)
|
|
533
|
+
csvdb.to_parquetdb("mydb.csvdb", force=True)
|
|
534
|
+
|
|
535
|
+
# Incremental export (only re-exports changed tables)
|
|
536
|
+
result = csvdb.to_csvdb_incremental("mydb.sqlite")
|
|
537
|
+
# result: {"path": "...", "added": [...], "updated": [...], "unchanged": [...], "removed": [...]}
|
|
538
|
+
|
|
539
|
+
# Checksum (format-independent, deterministic)
|
|
540
|
+
digest = csvdb.checksum("mydb.csvdb")  # avoid shadowing the built-in hash()
|
|
541
|
+
|
|
542
|
+
# SQL queries (read-only, returns list of dicts)
|
|
543
|
+
rows = csvdb.sql("mydb.csvdb", "SELECT name, COUNT(*) AS n FROM users GROUP BY name")
|
|
544
|
+
|
|
545
|
+
# Diff two databases
|
|
546
|
+
has_diff = csvdb.diff("v1.csvdb", "v2.csvdb")
|
|
547
|
+
|
|
548
|
+
# Validate structure
|
|
549
|
+
info = csvdb.validate("mydb.csvdb")
|
|
550
|
+
|
|
551
|
+
# Initialize csvdb from raw CSV files
|
|
552
|
+
result = csvdb.init("./raw_csvs/")
|
|
553
|
+
|
|
554
|
+
# Selective export
|
|
555
|
+
csvdb.to_csvdb("mydb.sqlite", tables=["users", "orders"], force=True)
|
|
556
|
+
csvdb.to_csvdb("mydb.sqlite", exclude=["logs"], force=True)
|
|
557
|
+
```
|
|
558
|
+
|
|
559
|
+
### Development
|
|
560
|
+
|
|
561
|
+
```bash
|
|
562
|
+
cd csvdb-python
|
|
563
|
+
uv sync
|
|
564
|
+
uv run maturin develop --release
|
|
565
|
+
uv run pytest
|
|
566
|
+
```
|
|
567
|
+
|
|
568
|
+
## Perl Bindings
|
|
569
|
+
|
|
570
|
+
csvdb provides Perl bindings via a C FFI shared library and `FFI::Platypus`.
|
|
571
|
+
|
|
572
|
+
### Setup
|
|
573
|
+
|
|
574
|
+
```bash
|
|
575
|
+
# Build the shared library
|
|
576
|
+
cargo build --release -p csvdb-ffi
|
|
577
|
+
|
|
578
|
+
# Install dependencies (macOS)
|
|
579
|
+
brew install cpanminus libffi
|
|
580
|
+
LDFLAGS="-L/opt/homebrew/opt/libffi/lib" \
|
|
581
|
+
CPPFLAGS="-I/opt/homebrew/opt/libffi/include" \
|
|
582
|
+
cpanm FFI::Platypus
|
|
583
|
+
|
|
584
|
+
# Install dependencies (Linux)
|
|
585
|
+
sudo apt-get install cpanminus libffi-dev
|
|
586
|
+
cpanm FFI::Platypus
|
|
587
|
+
```
|
|
588
|
+
|
|
589
|
+
### Running Examples
|
|
590
|
+
|
|
591
|
+
```bash
|
|
592
|
+
perl -Iperl/lib perl/examples/basic_usage.pl
|
|
593
|
+
```
|
|
594
|
+
|
|
595
|
+
### API
|
|
596
|
+
|
|
597
|
+
```perl
|
|
598
|
+
use Csvdb;
|
|
599
|
+
|
|
600
|
+
print Csvdb::version(), "\n";
|
|
601
|
+
|
|
602
|
+
# Convert between formats
|
|
603
|
+
my $csvdb_path = Csvdb::to_csvdb(input => "mydb.sqlite", force => 1);
|
|
604
|
+
my $sqlite_path = Csvdb::to_sqlite(input => "mydb.csvdb", force => 1);
|
|
605
|
+
my $duckdb_path = Csvdb::to_duckdb(input => "mydb.csvdb", force => 1);
|
|
606
|
+
|
|
607
|
+
# Checksum
|
|
608
|
+
my $hash = Csvdb::checksum(input => "mydb.csvdb");
|
|
609
|
+
|
|
610
|
+
# SQL query (returns CSV text)
|
|
611
|
+
my $csv = Csvdb::sql(path => "mydb.csvdb", query => "SELECT * FROM users");
|
|
612
|
+
|
|
613
|
+
# Diff (returns 0=identical, 1=differences)
|
|
614
|
+
my $diff_rc = Csvdb::diff(left => "v1.csvdb", right => "v2.csvdb");
|
|
615
|
+
|
|
616
|
+
# Validate (returns 0=valid, 1=errors)
|
|
617
|
+
my $valid_rc = Csvdb::validate(input => "mydb.csvdb");
|
|
618
|
+
```
|
|
619
|
+
|
|
620
|
+
### Running Tests
|
|
621
|
+
|
|
622
|
+
```bash
|
|
623
|
+
cargo build --release -p csvdb-ffi
|
|
624
|
+
prove perl/t/
|
|
625
|
+
```
|
|
626
|
+
|
|
627
|
+
## Project Structure
|
|
628
|
+
|
|
629
|
+
```
|
|
630
|
+
csvdb/ # Core library + CLI binary
|
|
631
|
+
src/
|
|
632
|
+
main.rs # CLI (clap)
|
|
633
|
+
lib.rs
|
|
634
|
+
commands/
|
|
635
|
+
init.rs # CSV files -> csvdb (schema inference)
|
|
636
|
+
to_csv.rs # any format -> csvdb
|
|
637
|
+
to_sqlite.rs # any format -> SQLite
|
|
638
|
+
to_duckdb.rs # any format -> DuckDB
|
|
639
|
+
to_parquetdb.rs # any format -> parquetdb (Parquet)
|
|
640
|
+
checksum.rs # Format-independent checksums
|
|
641
|
+
validate.rs # Structural integrity checks
|
|
642
|
+
diff.rs # Compare two databases
|
|
643
|
+
sql.rs # Read-only SQL queries
|
|
644
|
+
core/
|
|
645
|
+
schema.rs # Parse/emit schema.sql, type normalization
|
|
646
|
+
table.rs # Row operations, PK handling
|
|
647
|
+
csv.rs # Deterministic CSV I/O
|
|
648
|
+
input.rs # Input format detection
|
|
649
|
+
csvdb-python/ # Python bindings (PyO3)
|
|
650
|
+
src/lib.rs
|
|
651
|
+
examples/
|
|
652
|
+
basic_usage.py
|
|
653
|
+
advanced_usage.py
|
|
654
|
+
csvdb-ffi/ # C FFI for Perl and other languages
|
|
655
|
+
src/lib.rs
|
|
656
|
+
perl/ # Perl module (FFI::Platypus)
|
|
657
|
+
lib/Csvdb.pm
|
|
658
|
+
examples/basic_usage.pl
|
|
659
|
+
tests/functional/ # Python functional tests
|
|
660
|
+
conftest.py
|
|
661
|
+
test_commands.py
|
|
662
|
+
test_performance.py
|
|
663
|
+
pyproject.toml
|
|
664
|
+
```
|
|
665
|
+
|
|
666
|
+
## Development
|
|
667
|
+
|
|
668
|
+
```bash
|
|
669
|
+
cargo build -p csvdb
|
|
670
|
+
cargo run -p csvdb -- init ./raw_csvs/
|
|
671
|
+
cargo run -p csvdb -- to-csvdb mydb.sqlite
|
|
672
|
+
cargo run -p csvdb -- to-sqlite mydb.csvdb/
|
|
673
|
+
cargo run -p csvdb -- to-duckdb mydb.csvdb/
|
|
674
|
+
cargo run -p csvdb -- to-parquetdb mydb.sqlite
|
|
675
|
+
cargo run -p csvdb -- checksum mydb.sqlite
|
|
676
|
+
```
|
|
677
|
+
|
|
678
|
+
## Testing
|
|
679
|
+
|
|
680
|
+
```bash
|
|
681
|
+
# Rust unit tests
|
|
682
|
+
cargo test
|
|
683
|
+
|
|
684
|
+
# Python functional tests (189 tests)
|
|
685
|
+
cd tests/functional
|
|
686
|
+
uv run pytest
|
|
687
|
+
|
|
688
|
+
# Cross-platform (avoids .venv collision)
|
|
689
|
+
uv run --isolated pytest
|
|
690
|
+
```
|
|
691
|
+
|
|
692
|
+
## License
|
|
693
|
+
|
|
694
|
+
MIT
|
|
695
|
+
|
|
@@ -0,0 +1,5 @@
|
|
|
1
|
+
csvdb_cli-0.2.8.data/scripts/csvdb.exe,sha256=KiL5jUbn8n_9WRvIXIVbpuIVht6ppRhJfpOWTH91L4w,32500736
|
|
2
|
+
csvdb_cli-0.2.8.dist-info/METADATA,sha256=y2MAc_SmpZNwZZ-mI3KO-K1n3TfkXazr_6jicVjgyMU,20434
|
|
3
|
+
csvdb_cli-0.2.8.dist-info/WHEEL,sha256=qKMSySxXiSk92mm8YkFo8xRrmPs5eXVkc3m4Z8OBMj4,97
|
|
4
|
+
csvdb_cli-0.2.8.dist-info/sboms/csvdb.cyclonedx.json,sha256=bh199DLh1yVxUrWKBcuYs68VZ_ck2qtT1B27B4XrwUI,414236
|
|
5
|
+
csvdb_cli-0.2.8.dist-info/RECORD,,
|