dataprof 0.4.70__cp311-cp311-win_amd64.whl → 0.4.77__cp311-cp311-win_amd64.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of dataprof might be problematic.
- dataprof/dataprof.cp311-win_amd64.pyd +0 -0
- {dataprof-0.4.70.dist-info → dataprof-0.4.77.dist-info}/METADATA +21 -15
- dataprof-0.4.77.dist-info/RECORD +6 -0
- {dataprof-0.4.70.dist-info → dataprof-0.4.77.dist-info}/WHEEL +1 -1
- dataprof-0.4.70.dist-info/RECORD +0 -6
- {dataprof-0.4.70.dist-info → dataprof-0.4.77.dist-info}/licenses/LICENSE +0 -0
Binary file
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: dataprof
-Version: 0.4.
+Version: 0.4.77
 Classifier: Development Status :: 4 - Beta
 Classifier: Intended Audience :: Developers
 Classifier: Intended Audience :: Science/Research
@@ -44,7 +44,6 @@ Project-URL: Issues, https://github.com/AndreaBozzo/dataprof/issues
 [](LICENSE)
 [](https://www.rust-lang.org)
 [](https://crates.io/crates/dataprof)
-[](https://csv-mlready-api.vercel.app)
 
 **DISCLAIMER FOR HUMAN READERS**
 
@@ -71,18 +70,6 @@ DataProf processes **all data locally** on your machine. Zero telemetry, zero ex
 
 **Complete transparency:** Every metric, calculation, and data point is documented with source code references for independent verification.
 
-## Try Online
-
-**No installation required!** Test dataprof instantly with our web interface:
-
-**[CSV Quality API →](https://csv-mlready-api.vercel.app)**
-
-- Drag & drop your CSV (up to 50MB)
-- Get comprehensive quality score in ~10 seconds
-- ISO 8000/25012 compliant metrics
-- Powered by dataprof v0.4.61 core engine
-- Embeddable badges for your README
-
 ## CI/CD Integration
 
 Automate data quality checks in your workflows with our GitHub Action:
@@ -122,6 +109,9 @@ Perfect for ensuring data quality in pipelines, validating data integrity, or ge
 # Comprehensive quality analysis
 dataprof analyze data.csv --detailed
 
+# Analyze Parquet files (requires --features parquet)
+dataprof analyze data.parquet --detailed
+
 # Windows example (from project root after cargo build --release)
 target\release\dataprof-cli.exe analyze data.csv --detailed
 ```
@@ -143,6 +133,12 @@ dataprof batch /data/folder --recursive --parallel
 # Generate HTML batch dashboard
 dataprof batch /data/folder --recursive --html batch_report.html
 
+# JSON export for CI/CD automation
+dataprof batch /data/folder --json batch_results.json --recursive
+
+# JSON output to stdout
+dataprof batch /data/folder --format json --recursive
+
 # With custom filter and progress
 dataprof batch /data/folder --filter "*.csv" --parallel --progress
 ```
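The hunk above adds a `--format json` mode that writes batch results to stdout. A minimal sketch of consuming that output from a CI script follows. It assumes `dataprof` is on `PATH`, that stdout holds a single JSON document, and that a failed run exits non-zero; none of these is confirmed by the diff. The helper names `build_batch_command` and `run_batch` are hypothetical, and no particular result schema is assumed.

```python
import json
import subprocess


def build_batch_command(folder, recursive=True):
    """Build the argv for the stdout-JSON batch invocation shown in the README diff."""
    cmd = ["dataprof", "batch", folder, "--format", "json"]
    if recursive:
        cmd.append("--recursive")
    return cmd


def run_batch(folder):
    """Run the batch command and decode its stdout.

    Assumption: stdout is exactly one JSON document. The structure of that
    document is not described in the diff, so it is returned undecorated.
    """
    proc = subprocess.run(
        build_batch_command(folder),
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit status
    )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    # Only executed when run as a script; requires dataprof to be installed.
    results = run_batch("/data/folder")
    print(type(results).__name__)
```

Keeping the argv construction separate from the subprocess call lets the command be checked in a test without the binary installed.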
@@ -257,10 +253,13 @@ cargo build --release
 # With Apache Arrow (columnar processing, ~90s compile)
 cargo build --release --features arrow
 
+# With Parquet support (requires arrow, ~95s compile)
+cargo build --release --features parquet
+
 # With database connectors
 cargo build --release --features postgres,mysql,sqlite
 
-# All features (full functionality, ~
+# All features (full functionality, ~130s compile)
 cargo build --release --all-features
 ```
 
@@ -271,6 +270,13 @@ cargo build --release --all-features
 - ❌ Small files (<10MB) - standard engine is faster
 - ❌ Mixed/messy data - streaming engine handles better
 
+**When to use Parquet?**
+- ✅ Analytics workloads with columnar data
+- ✅ Data lake architectures
+- ✅ Integration with Spark, Pandas, PyArrow
+- ✅ Efficient storage and compression
+- ✅ Type-safe schema preservation
+
 ### Common Development Tasks
 ```bash
 cargo test # Run all tests
@@ -0,0 +1,6 @@
+dataprof-0.4.77.dist-info/METADATA,sha256=3NQ0GGEihhNT4JavpEoi1JEDo8m5XarsQQG2gdO1JBA,11146
+dataprof-0.4.77.dist-info/WHEEL,sha256=bNaa2-XeaoMXnkzV391Sm2NgCjpJ3A2VmfN6ZUnNTZA,96
+dataprof-0.4.77.dist-info/licenses/LICENSE,sha256=pD_29Inf0TmerzrHuH-Lcu2GeD39lNK0_8bDJVkHjos,1090
+dataprof/__init__.py,sha256=84U5MpyP59z3koB4vbdsJg1XQSKYeTS1SC7b3VqwjfU,115
+dataprof/dataprof.cp311-win_amd64.pyd,sha256=bSnHi0lOUmA1wyMwIdqIJ7ntxWx7z1ifct6fmKShaz4,2178048
+dataprof-0.4.77.dist-info/RECORD,,
dataprof-0.4.70.dist-info/RECORD DELETED
@@ -1,6 +0,0 @@
-dataprof-0.4.70.dist-info/METADATA,sha256=Y66oad9ljoc0d-0aJdkHiLe05x9CVpH1_hR2h1ZSxwc,11044
-dataprof-0.4.70.dist-info/WHEEL,sha256=YCZ9Vxhf2aXNyfoR2QH-PPqnUr48Igr9zjgnGhp3xTc,96
-dataprof-0.4.70.dist-info/licenses/LICENSE,sha256=pD_29Inf0TmerzrHuH-Lcu2GeD39lNK0_8bDJVkHjos,1090
-dataprof/__init__.py,sha256=84U5MpyP59z3koB4vbdsJg1XQSKYeTS1SC7b3VqwjfU,115
-dataprof/dataprof.cp311-win_amd64.pyd,sha256=s5Rc1SqXksqgPVSr34brhdPXe93zDtyrzXs2Rtgzv3Y,2149888
-dataprof-0.4.70.dist-info/RECORD,,
File without changes
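The two RECORD listings in this diff use the standard wheel layout: each row is `path,sha256=<digest>,size`, where the digest is URL-safe base64 with the `=` padding removed, and the RECORD file's own row leaves the hash and size empty. The sketch below is illustrative rather than part of the package. It parses such rows and reports which paths present in both versions changed; the helper names `parse_record`, `record_hash`, and `changed_paths` are hypothetical, and the sample rows are copied from the listings above.

```python
import base64
import csv
import hashlib
import io


def parse_record(text):
    """Parse wheel RECORD text into {path: (hash_spec, size_or_None)}."""
    entries = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        path, hash_spec, size = row
        entries[path] = (hash_spec, int(size) if size else None)
    return entries


def record_hash(data):
    """Return the RECORD-style hash field for a file's raw bytes."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=")
    return "sha256=" + encoded.decode("ascii")


def changed_paths(old, new):
    """List paths present in both RECORDs whose hash or size differs."""
    shared = old.keys() & new.keys()
    return sorted(p for p in shared if old[p] != new[p])


OLD = """\
dataprof/__init__.py,sha256=84U5MpyP59z3koB4vbdsJg1XQSKYeTS1SC7b3VqwjfU,115
dataprof/dataprof.cp311-win_amd64.pyd,sha256=s5Rc1SqXksqgPVSr34brhdPXe93zDtyrzXs2Rtgzv3Y,2149888
dataprof-0.4.70.dist-info/RECORD,,
"""

NEW = """\
dataprof/__init__.py,sha256=84U5MpyP59z3koB4vbdsJg1XQSKYeTS1SC7b3VqwjfU,115
dataprof/dataprof.cp311-win_amd64.pyd,sha256=bSnHi0lOUmA1wyMwIdqIJ7ntxWx7z1ifct6fmKShaz4,2178048
dataprof-0.4.77.dist-info/RECORD,,
"""

old_entries = parse_record(OLD)
new_entries = parse_record(NEW)
changed = changed_paths(old_entries, new_entries)
print(changed)
```

To check an installed file against its row, pass the file's bytes to `record_hash` and compare the result with the second field. The `.dist-info` paths differ between versions because the directory name embeds the version, so only `dataprof/__init__.py` and the `.pyd` are shared; of those, only the compiled extension changed.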