robin-sparkless 0.1.0__cp38-abi3-musllinux_1_2_x86_64.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- robin_sparkless/__init__.py +5 -0
- robin_sparkless/robin_sparkless.abi3.so +0 -0
- robin_sparkless-0.1.0.dist-info/METADATA +166 -0
- robin_sparkless-0.1.0.dist-info/RECORD +8 -0
- robin_sparkless-0.1.0.dist-info/WHEEL +4 -0
- robin_sparkless.libs/libexpat-7ae38be4.so.1 +0 -0
- robin_sparkless.libs/libm-f06f2ce1.so.6 +0 -0
- robin_sparkless.libs/libpython3-e1b05735.12.so.1.0 +0 -0
Binary file

@@ -0,0 +1,166 @@
Metadata-Version: 2.4
Name: robin-sparkless
Version: 0.1.0
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering
Summary: PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3
Author: Robin Sparkless contributors
License: MIT
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# Robin Sparkless

**PySpark-style DataFrames in Rust—no JVM.** A DataFrame library that mirrors PySpark’s API and semantics while using [Polars](https://www.pola.rs/) as the execution engine.

[](https://crates.io/crates/robin-sparkless)
[](https://docs.rs/robin-sparkless)
[](https://robin-sparkless.readthedocs.io/en/latest/)
[](LICENSE)

---

## Why Robin Sparkless?

- **Familiar API** — `SparkSession`, `DataFrame`, `Column`, and PySpark-like functions so you can reuse patterns without the JVM.
- **Polars under the hood** — Fast, native Rust execution with Polars for IO, expressions, and aggregations.
- **Rust-first, Python optional** — Use it as a Rust library or build the Python extension via PyO3 for a drop-in style API.
- **Sparkless backend target** — Designed to power [Sparkless](https://github.com/eddiethedean/sparkless) (the Python PySpark replacement) so Sparkless can run on this engine via PyO3.

---

## Features

| Area | What’s included |
|------|------------------|
| **Core** | `SparkSession`, `DataFrame`, `Column`; `filter`, `select`, `with_column`, `order_by`, `group_by`, joins |
| **IO** | CSV, Parquet, JSON via `SparkSession::read_*` |
| **Expressions** | `col()`, `lit()`, `when`/`then`/`otherwise`, `coalesce`, cast, type/conditional helpers |
| **Aggregates** | `count`, `sum`, `avg`, `min`, `max`, and more; multi-column groupBy |
| **Window** | `row_number`, `rank`, `dense_rank`, `lag`, `lead`, `first_value`, `last_value`, and others with `.over()` |
| **Arrays & maps** | `array_*`, `explode`, `create_map`, `map_keys`, `map_values`, and related functions |
| **Strings & JSON** | String functions (`upper`, `lower`, `substring`, `regexp_*`, etc.), `get_json_object`, `from_json`, `to_json` |
| **Datetime & math** | Date/time extractors and arithmetic, `year`/`month`/`day`, math (`sin`, `cos`, `sqrt`, `pow`, …) |
| **Optional SQL** | `spark.sql("SELECT ...")` with temp views (`createOrReplaceTempView`, `table`) — enable with `--features sql` |
| **Optional Delta** | `read_delta`, `read_delta_with_version`, `write_delta` — enable with `--features delta` |

Known differences from PySpark are documented in [docs/PYSPARK_DIFFERENCES.md](docs/PYSPARK_DIFFERENCES.md). Parity status and roadmap are in [docs/PARITY_STATUS.md](docs/PARITY_STATUS.md) and [docs/ROADMAP.md](docs/ROADMAP.md).

---

## Installation

### Rust

Add to your `Cargo.toml`:

```toml
[dependencies]
robin-sparkless = "0.1.0"
```

Optional features:

```toml
robin-sparkless = { version = "0.1.0", features = ["sql"] }   # spark.sql(), temp views
robin-sparkless = { version = "0.1.0", features = ["delta"] } # Delta Lake read/write
```

### Python (PyO3)

Build the Python extension with [maturin](https://www.maturin.rs/) (Rust + Python 3.8+):

```bash
pip install maturin
maturin develop --features pyo3
# With optional SQL and/or Delta:
maturin develop --features "pyo3,sql"
maturin develop --features "pyo3,delta"
maturin develop --features "pyo3,sql,delta"
```

Then use the `robin_sparkless` module; see [docs/PYTHON_API.md](docs/PYTHON_API.md).

---

## Quick start

### Rust

```rust
use robin_sparkless::{col, lit_i64, SparkSession};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let spark = SparkSession::builder().app_name("demo").get_or_create();

    // Create a DataFrame from rows (id, age, name)
    let df = spark.create_dataframe(
        vec![
            (1, 25, "Alice".to_string()),
            (2, 30, "Bob".to_string()),
            (3, 35, "Charlie".to_string()),
        ],
        vec!["id", "age", "name"],
    )?;

    // Filter and show
    let adults = df.filter(col("age").gt(lit_i64(26)))?;
    adults.show(Some(10))?;

    Ok(())
}
```

You can also wrap an existing Polars `DataFrame` with `DataFrame::from_polars(polars_df)`. See [docs/QUICKSTART.md](docs/QUICKSTART.md) for joins, window functions, and more.

### Python

```python
import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.create_dataframe([(1, 25, "Alice"), (2, 30, "Bob")], ["id", "age", "name"])
filtered = df.filter(rs.col("age").gt(rs.lit(26)))
print(filtered.collect())  # [{"id": 2, "age": 30, "name": "Bob"}]
```

---

## Development

**Prerequisites:** Rust (see [rust-toolchain.toml](rust-toolchain.toml)), and for Python tests: Python 3.8+, `maturin`, `pytest`.

| Command | Description |
|---------|-------------|
| `cargo build` | Build (Rust only) |
| `cargo build --features pyo3` | Build with Python extension |
| `cargo test` | Run Rust tests |
| `make test` | Run Rust + Python tests (creates venv, `maturin develop`, `pytest`) |
| `make check` | Format, clippy, audit, deny, tests |
| `cargo bench` | Benchmarks (robin-sparkless vs Polars) |
| `cargo doc --open` | Build and open API docs |

CI runs the same checks on push/PR (see [.github/workflows/ci.yml](.github/workflows/ci.yml)).

---

## Documentation

- [**Full documentation (Read the Docs)**](https://robin-sparkless.readthedocs.io/) — Quickstart, Python API, reference, and Sparkless integration (MkDocs)
- [**API reference (docs.rs)**](https://docs.rs/robin-sparkless) — Crate API
- [**QUICKSTART**](docs/QUICKSTART.md) — Build, usage, optional features, benchmarks
- [**ROADMAP**](docs/ROADMAP.md) — Development roadmap and Sparkless integration
- [**PYSPARK_DIFFERENCES**](docs/PYSPARK_DIFFERENCES.md) — Known divergences from PySpark
- [**RELEASING**](docs/RELEASING.md) — Releasing and publishing to crates.io

See also [CHANGELOG.md](CHANGELOG.md) for version history.

---

## License

MIT
@@ -0,0 +1,8 @@
robin_sparkless/__init__.py,sha256=h26i0kv9czksgzJ0zbFJvX-5IZV8hrWVEerFQOVG3fA,143
robin_sparkless/robin_sparkless.abi3.so,sha256=LgYbC0ck-hnzaNqG208BKlzGgIeRSgJf7Ilc-XbRpac,54156865
robin_sparkless-0.1.0.dist-info/METADATA,sha256=gyXRM6eaUer-zATnUJx5ByeY7f6JwhY48a9yiEMOvDU,6393
robin_sparkless-0.1.0.dist-info/WHEEL,sha256=YwiRh-q2ktXDNZBtnkQS-JaPjD72S6lWu37XiLW5Pm0,106
robin_sparkless.libs/libexpat-7ae38be4.so.1,sha256=KOWJvypweOLfupaM-rmzYULyN-QPKqkDH7psH23MMsU,178857
robin_sparkless.libs/libm-f06f2ce1.so.6,sha256=2wYyVUqIYZsl3KkeAe_0J6cSp8q3TVkTP0WHjBq0KLs,973721
robin_sparkless.libs/libpython3-e1b05735.12.so.1.0,sha256=neiMNBnO3c7o2TFchyHGNpBmozMk6Q4m2mjGa7w1AvY,9159497
robin_sparkless-0.1.0.dist-info/RECORD,,

Binary file

Binary file

Binary file