rustream 0.1.0__py3-none-win_amd64.whl

Binary file
@@ -0,0 +1,201 @@
+ Metadata-Version: 2.4
+ Name: rustream
+ Version: 0.1.0
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Environment :: Console
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Rust
+ Classifier: Topic :: Database
+ Summary: Fast Postgres → Parquet sync tool
+ Keywords: postgres,parquet,s3,sync,etl
+ License: MIT
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
+ Project-URL: Repository, https://github.com/kraftaa/rustream
+
+ # rustream
+
+ Fast Postgres to Parquet sync tool. Reads tables from Postgres, writes Parquet files to local disk or S3. Supports incremental sync via `updated_at` watermark tracking.
+
+ ## Installation
+
+ ### From PyPI
+
+ ```bash
+ pipx install rustream
+ # or
+ pip install rustream
+ ```
+
+ ### From source
+
+ ```bash
+ git clone https://github.com/kraftaa/rustream.git
+ cd rustream
+ cargo build --release
+ # binary is at target/release/rustream
+ ```
+
+ ### With maturin (local dev)
+
+ ```bash
+ pip install maturin
+ maturin develop --release
+ # now `rustream` is on your PATH
+ ```
+
+ ## Usage
+
+ ```bash
+ # Copy and edit the example config
+ cp config.example.yaml config.yaml
+
+ # Preview what will be synced (no files written)
+ rustream sync --config config.yaml --dry-run
+
+ # Run sync
+ rustream sync --config config.yaml
+ ```
+
+ Enable debug logging with `RUST_LOG`:
+
+ ```bash
+ RUST_LOG=rustream=debug rustream sync --config config.yaml
+ ```
+
+ ## Configuration
+
+ ### Specific tables (recommended)
+
+ ```yaml
+ postgres:
+   host: localhost
+   database: mydb
+   user: postgres
+   password: secret
+
+ output:
+   type: local
+   path: ./output
+
+ tables:
+   - name: users
+     incremental_column: updated_at
+     columns:  # optional: pick specific columns
+       - id
+       - email
+       - created_at
+       - updated_at
+
+   - name: orders
+     incremental_column: updated_at
+
+   - name: products  # no incremental_column = full sync every run
+ ```
+
+ ### All tables (auto-discover)
+
+ Omit `tables` to sync every table in the schema. Use `exclude` to skip some:
+
+ ```yaml
+ postgres:
+   host: localhost
+   database: mydb
+   user: postgres
+
+ output:
+   type: local
+   path: ./output
+
+ # schema: public  # default
+ exclude:
+   - schema_migrations
+   - ar_internal_metadata
+ ```
+
+ ### S3 output
+
+ ```yaml
+ output:
+   type: s3
+   bucket: my-data-lake
+   prefix: raw/postgres
+   region: us-east-1
+ ```
+
+ AWS credentials come from environment variables, `~/.aws/credentials`, or IAM role.
+
+ ### Config reference
+
+ | Field | Description |
+ |---|---|
+ | `postgres.host` | Postgres host |
+ | `postgres.port` | Postgres port (default: 5432) |
+ | `postgres.database` | Database name |
+ | `postgres.user` | Database user |
+ | `postgres.password` | Database password (optional) |
+ | `output.type` | `local` or `s3` |
+ | `output.path` | Local directory for Parquet files (when type=local) |
+ | `output.bucket` | S3 bucket (when type=s3) |
+ | `output.prefix` | S3 key prefix (when type=s3) |
+ | `output.region` | AWS region (when type=s3, optional) |
+ | `batch_size` | Rows per Parquet file (default: 10000) |
+ | `state_dir` | Directory for SQLite watermark state (default: `.rustream_state`) |
+ | `schema` | Schema to discover tables from (default: `public`) |
+ | `exclude` | List of table names to skip when using auto-discovery |
+ | `tables[].name` | Table name |
+ | `tables[].schema` | Schema name (default: `public`) |
+ | `tables[].columns` | Columns to sync (default: all) |
+ | `tables[].incremental_column` | Column for watermark-based incremental sync |
+ | `tables[].partition_by` | Partition output files: `date`, `month`, or `year` |
+
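The defaults listed in the config reference can be made explicit with a small helper. The following is an illustrative Python sketch, not rustream's own config loader: `apply_defaults` and the `*_DEFAULTS` dictionaries are hypothetical names, and the config is assumed to have already been parsed from YAML into a plain dict.

```python
import copy

# Default values copied from the config reference table.
TOP_LEVEL_DEFAULTS = {
    "batch_size": 10000,
    "state_dir": ".rustream_state",
    "schema": "public",
}
POSTGRES_DEFAULTS = {"port": 5432}
TABLE_DEFAULTS = {"schema": "public"}


def apply_defaults(config: dict) -> dict:
    """Return a copy of `config` with the documented defaults filled in.

    Hypothetical helper for illustration; the real loader may behave differently.
    """
    resolved = copy.deepcopy(config)
    for key, value in TOP_LEVEL_DEFAULTS.items():
        resolved.setdefault(key, value)
    postgres = resolved.setdefault("postgres", {})
    for key, value in POSTGRES_DEFAULTS.items():
        postgres.setdefault(key, value)
    # `tables` is optional: omitting it means auto-discovery.
    for table in resolved.get("tables", []):
        for key, value in TABLE_DEFAULTS.items():
            table.setdefault(key, value)
    return resolved


config = {
    "postgres": {"host": "localhost", "database": "mydb", "user": "postgres"},
    "output": {"type": "local", "path": "./output"},
    "tables": [{"name": "users", "incremental_column": "updated_at"}],
}
resolved = apply_defaults(config)
print(resolved["postgres"]["port"], resolved["batch_size"], resolved["state_dir"])
```

The input dict is deep-copied, so the caller's config is left untouched and the resolved view can be logged or compared against it.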
+ ## How it works
+
+ 1. Connects to Postgres and introspects each table's schema via `information_schema`
+ 2. Maps Postgres column types to Arrow types automatically
+ 3. Reads rows in batches, converting to Arrow RecordBatches
+ 4. Writes each batch as a Snappy-compressed Parquet file
+ 5. Tracks the high watermark (max value of `incremental_column`) in local SQLite
+ 6. On next run, only reads rows where `incremental_column > last_watermark`
+
+ Tables without `incremental_column` do a full sync every run.
+
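The watermark logic in steps 5 and 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not rustream's implementation: in-memory SQLite stands in for both the Postgres source and the local state store, and the `watermarks` table layout, the `sync` function, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for BOTH the Postgres source and the local
# watermark store; table and column names are illustrative assumptions.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER, updated_at TEXT)")
source.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "2024-01-01T00:00:00"), (2, "2024-01-02T00:00:00")],
)

state = sqlite3.connect(":memory:")
state.execute("CREATE TABLE watermarks (table_name TEXT PRIMARY KEY, value TEXT)")


def sync(table: str, column: str, batch_size: int = 10000) -> list:
    """Return batches of rows newer than the stored watermark, then advance it."""
    row = state.execute(
        "SELECT value FROM watermarks WHERE table_name = ?", (table,)
    ).fetchone()
    last = row[0] if row else None

    # Identifiers cannot be bound as parameters; a real tool must validate them.
    query = f"SELECT id, {column} FROM {table}"
    params = ()
    if last is not None:
        query += f" WHERE {column} > ?"
        params = (last,)
    query += f" ORDER BY {column}"

    cursor = source.execute(query, params)
    batches = []
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        batches.append(batch)  # each batch would become one Parquet file

    if batches:
        # ISO-8601 strings sort chronologically, so max() is the high watermark.
        high = max(r[1] for b in batches for r in b)
        state.execute("INSERT OR REPLACE INTO watermarks VALUES (?, ?)", (table, high))
    return batches


first = sync("users", "updated_at")   # no watermark yet: reads everything
second = sync("users", "updated_at")  # nothing changed: reads nothing
source.execute("INSERT INTO users VALUES (3, '2024-01-03T00:00:00')")
third = sync("users", "updated_at")   # reads only the new row
```

Because the comparison is strict (`>`), a row written later with exactly the watermark's timestamp would be skipped. That is a general property of this approach, so the incremental column should change on every update.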
+ ## Supported Postgres types
+
+ | Postgres | Arrow |
+ |---|---|
+ | `boolean` | Boolean |
+ | `smallint` | Int16 |
+ | `integer`, `serial` | Int32 |
+ | `bigint`, `bigserial` | Int64 |
+ | `real` | Float32 |
+ | `double precision` | Float64 |
+ | `numeric` / `decimal` | Utf8 (preserves precision) |
+ | `text`, `varchar`, `char` | Utf8 |
+ | `bytea` | Binary |
+ | `date` | Date32 |
+ | `timestamp` | Timestamp(Microsecond) |
+ | `timestamptz` | Timestamp(Microsecond, UTC) |
+ | `uuid` | Utf8 |
+ | `json`, `jsonb` | Utf8 |
+ | arrays | Utf8 (JSON serialized) |
+
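Several Postgres types in the table arrive as Utf8 strings, so a consumer may want to convert them back to richer values after reading the Parquet files. The sketch below shows one way to do that in Python with the standard library only. `DECODERS` and `decode` are hypothetical names, and the code assumes the strings use the usual text forms (a decimal literal, a canonical UUID, a JSON document); the exact output format is not confirmed by the README.

```python
import json
import uuid
from decimal import Decimal

# Decoders for Postgres types that the type table maps to Utf8.
# Assumption: values are written in their usual text representation.
DECODERS = {
    "numeric": Decimal,
    "decimal": Decimal,
    "uuid": uuid.UUID,
    "json": json.loads,
    "jsonb": json.loads,
    "array": json.loads,  # arrays are described as JSON serialized
}


def decode(pg_type: str, text):
    """Convert a Utf8 cell back to a richer Python value; None stays None."""
    if text is None:
        return None
    decoder = DECODERS.get(pg_type)
    return decoder(text) if decoder else text


price = decode("numeric", "12345678901234567890.123456789")
tags = decode("array", '["a", "b"]')
ident = decode("uuid", "123e4567-e89b-12d3-a456-426614174000")
name = decode("text", "plain strings pass through")
```

Parsing `numeric` into `Decimal` rather than `float` keeps every digit, which is the point of storing it as a string in the first place.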
+ ## Publishing
+
+ The project uses [maturin](https://github.com/PyO3/maturin) to package the Rust binary as a Python wheel (same approach as ruff, uv, etc). The CI workflow in `.github/workflows/release.yml` builds wheels for Linux, macOS, and Windows, then publishes to PyPI on tagged releases.
+
+ To publish manually:
+
+ ```bash
+ # Build wheels for current platform
+ maturin build --release
+
+ # Upload to PyPI (needs PYPI_API_TOKEN)
+ maturin publish
+ ```
+
+ ## License
+
+ MIT
+
@@ -0,0 +1,4 @@
+ rustream-0.1.0.data\scripts\rustream.exe,sha256=Zj23eK-y1VvsCK7WpEY6Xzsu28dUop8sZvXdGCH26fY,29388288
+ rustream-0.1.0.dist-info\METADATA,sha256=9pcWqF5bPtejATiztJ8NzmxIN6Aw1rxcNsPRZbQ8mDo,5242
+ rustream-0.1.0.dist-info\WHEEL,sha256=jsSEiVNsW1dJj5gDaReR40i7mhgBjWtms6nAD6EViXU,94
+ rustream-0.1.0.dist-info\RECORD,,
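Each RECORD line is a CSV row of path, hash, and size in bytes, where the hash is a SHA-256 digest encoded as URL-safe base64 with the trailing `=` padding removed. The RECORD file's own entry leaves the hash and size empty. A minimal Python sketch of reading and checking such entries follows; `record_hash`, `parse_record`, and `verify` are hypothetical helper names, and only two of the lines above are reproduced as sample input.

```python
import base64
import csv
import hashlib
import io


def record_hash(data: bytes) -> str:
    """Hash bytes in RECORD style: sha256, urlsafe base64, no '=' padding."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def parse_record(text: str) -> dict:
    """Map each path in a RECORD file to its (hash, size) pair; both may be empty."""
    entries = {}
    for path, hash_, size in csv.reader(io.StringIO(text)):
        entries[path] = (hash_, int(size) if size else None)
    return entries


record = (
    "rustream-0.1.0.dist-info\\WHEEL,"
    "sha256=jsSEiVNsW1dJj5gDaReR40i7mhgBjWtms6nAD6EViXU,94\n"
    "rustream-0.1.0.dist-info\\RECORD,,\n"
)
entries = parse_record(record)


def verify(path: str, data: bytes) -> bool:
    """Check file contents against the hash and size recorded for `path`."""
    expected_hash, expected_size = entries[path]
    return record_hash(data) == expected_hash and len(data) == expected_size
```

To check an installed or unpacked wheel, read each listed file in binary mode and pass its bytes to `verify`.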
@@ -0,0 +1,4 @@
+ Wheel-Version: 1.0
+ Generator: maturin (1.11.5)
+ Root-Is-Purelib: false
+ Tag: py3-none-win_amd64
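The WHEEL file uses `Key: value` header syntax, so Python's standard email parser can read it. The sketch below splits the `Tag` value into its three parts (Python tag, ABI tag, platform tag); the variable names are illustrative.

```python
from email.parser import Parser

WHEEL_TEXT = """\
Wheel-Version: 1.0
Generator: maturin (1.11.5)
Root-Is-Purelib: false
Tag: py3-none-win_amd64
"""

meta = Parser().parsestr(WHEEL_TEXT)

# A wheel may carry several Tag lines; get_all returns every one.
tags = meta.get_all("Tag")
python_tag, abi_tag, platform_tag = tags[0].split("-")
is_purelib = meta["Root-Is-Purelib"].lower() == "true"

print(python_tag, abi_tag, platform_tag, is_purelib)
```

Here `py3` and `none` indicate that the wheel is not tied to a specific Python minor version or ABI, which fits a package that ships a standalone `rustream.exe` under `scripts`, while `win_amd64` restricts it to 64-bit Windows.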