eda-wizard 0.1.0__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 SMFRafin
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,7 @@
1
+ include LICENSE
2
+ include README.md
3
+ include pyproject.toml
4
+ recursive-include edawizard *.py
5
+ recursive-exclude * __pycache__
6
+ recursive-exclude * *.pyc
7
+ recursive-exclude * *.pyo
@@ -0,0 +1,254 @@
1
+ Metadata-Version: 2.4
2
+ Name: eda-wizard
3
+ Version: 0.1.0
4
+ Summary: Zero-config automated EDA: pass any folder, get charts, stats, and a full HTML report.
5
+ Author: PyEDA Contributors
6
+ License: MIT License
7
+
8
+ Copyright (c) 2025 SMFRafin
9
+
10
+ Permission is hereby granted, free of charge, to any person obtaining a copy
11
+ of this software and associated documentation files (the "Software"), to deal
12
+ in the Software without restriction, including without limitation the rights
13
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
14
+ copies of the Software, and to permit persons to whom the Software is
15
+ furnished to do so, subject to the following conditions:
16
+
17
+ The above copyright notice and this permission notice shall be included in all
18
+ copies or substantial portions of the Software.
19
+
20
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
23
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
24
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
25
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
26
+ SOFTWARE.
27
+
28
+ Project-URL: Homepage, https://github.com/SMFRafin/eda-wizard
29
+ Project-URL: Repository, https://github.com/SMFRafin/eda-wizard
30
+ Project-URL: Bug Tracker, https://github.com/SMFRafin/eda-wizard/issues
31
+ Keywords: eda,exploratory-data-analysis,data-science,automation,visualization,pandas,csv,video,audio,machine-learning
32
+ Classifier: Development Status :: 4 - Beta
33
+ Classifier: Intended Audience :: Developers
34
+ Classifier: Intended Audience :: Science/Research
35
+ Classifier: License :: OSI Approved :: MIT License
36
+ Classifier: Programming Language :: Python :: 3
37
+ Classifier: Programming Language :: Python :: 3.9
38
+ Classifier: Programming Language :: Python :: 3.10
39
+ Classifier: Programming Language :: Python :: 3.11
40
+ Classifier: Programming Language :: Python :: 3.12
41
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
42
+ Classifier: Topic :: Scientific/Engineering :: Visualization
43
+ Requires-Python: >=3.9
44
+ Description-Content-Type: text/markdown
45
+ License-File: LICENSE
46
+ Requires-Dist: pandas>=1.5
47
+ Requires-Dist: numpy>=1.23
48
+ Requires-Dist: matplotlib>=3.6
49
+ Requires-Dist: seaborn>=0.12
50
+ Requires-Dist: openpyxl>=3.0
51
+ Requires-Dist: xlrd>=2.0
52
+ Requires-Dist: pyarrow>=10.0
53
+ Provides-Extra: video
54
+ Requires-Dist: opencv-python-headless>=4.7; extra == "video"
55
+ Provides-Extra: audio
56
+ Requires-Dist: mutagen>=1.46; extra == "audio"
57
+ Provides-Extra: image
58
+ Requires-Dist: Pillow>=9.0; extra == "image"
59
+ Provides-Extra: media
60
+ Requires-Dist: opencv-python-headless>=4.7; extra == "media"
61
+ Requires-Dist: mutagen>=1.46; extra == "media"
62
+ Requires-Dist: Pillow>=9.0; extra == "media"
63
+ Provides-Extra: full
64
+ Requires-Dist: eda-wizard[media]; extra == "full"
65
+ Provides-Extra: dev
66
+ Requires-Dist: pytest>=7; extra == "dev"
67
+ Requires-Dist: pytest-cov; extra == "dev"
68
+ Requires-Dist: ruff; extra == "dev"
69
+ Requires-Dist: build; extra == "dev"
70
+ Requires-Dist: twine; extra == "dev"
71
+ Dynamic: license-file
72
+
73
+ # ⬡ EDA Wizard — Zero-Config Automated EDA
74
+
75
+ > Pass any folder. Get a full EDA report. No setup.
76
+
77
+ ```python
78
+ import edawizard
79
+ edawizard.eda("any/folder")
80
+ ```
81
+
82
+ ---
83
+
84
+ ## What it does
85
+
86
+ EDA Wizard recursively scans a folder, detects every supported file type, runs comprehensive exploratory data analysis on each one, and generates a **self-contained `report.html`** plus organized CSVs — all in one call.
87
+
88
+ ### Supported file types
89
+
90
+ | Category | Formats |
91
+ |----------|---------|
92
+ | **Tabular** | CSV, TSV, Excel (.xlsx/.xls), JSON, Parquet, Feather, Pickle |
93
+ | **Video** | MP4, AVI, MKV, MOV, WMV, FLV, WebM, MPEG, 3GP, TS … |
94
+ | **Audio** | MP3, WAV, FLAC, AAC, OGG, M4A, WMA, OPUS, AIFF … |
95
+ | **Image** | PNG, JPG, JPEG, WebP, TIFF, BMP, GIF |
96
+
97
+ ---
98
+
99
+ ## Installation
100
+
101
+ ### Core (tabular data only — zero extra deps):
102
+ ```bash
103
+ pip install eda-wizard
104
+ ```
105
+
106
+ ### With full media support (video + audio + image):
107
+ ```bash
108
+ pip install "eda-wizard[full]"
109
+ ```
110
+
111
+ ### Pick what you need:
112
+ ```bash
113
+ pip install "eda-wizard[video]" # adds opencv-python-headless
114
+ pip install "eda-wizard[audio]" # adds mutagen
115
+ pip install "eda-wizard[image]" # adds Pillow
116
+ ```
117
+
118
+ > **Video fallback:** If `opencv-python-headless` is not installed, EDA Wizard falls back to `ffprobe` (part of [FFmpeg](https://ffmpeg.org)) when it is available on your PATH.
119
+
120
+ ---
121
+
122
+ ## Usage
123
+
124
+ ```python
125
+ import edawizard
126
+
127
+ # Scan a folder — all supported files analysed automatically
128
+ edawizard.eda("path/to/any/folder")
129
+
130
+ # Custom output folder name
131
+ edawizard.eda("my_data", output_name="eda_results")
132
+
133
+ # Disable row sampling (analyse all rows, slower on huge files)
134
+ edawizard.eda("my_data", max_rows=0)
135
+
136
+ # Silent mode
137
+ edawizard.eda("my_data", verbose=False)
138
+ ```
139
+
140
+ ---
141
+
142
+ ## Output structure
143
+
144
+ ```
145
+ PyEDA_Output/
146
+ ├── report.html ← self-contained interactive report (open in browser)
147
+ ├── summary.csv ← overview of all files
148
+
149
+ ├── sales/ ← one folder per tabular file
150
+ │ ├── summary_stats.csv
151
+ │ ├── missing_values.csv
152
+ │ ├── correlation_matrix.csv
153
+ │ └── value_counts/
154
+ │ ├── region.csv
155
+ │ └── category.csv
156
+
157
+ ├── media_video/
158
+ │ └── metadata.csv ← FPS, resolution, duration, codec, bitrate per file
159
+
160
+ ├── media_audio/
161
+ │ └── metadata.csv ← sample rate, channels, bitrate, duration per file
162
+
163
+ └── media_image/
164
+ └── metadata.csv ← width, height, megapixels, mode, DPI per file
165
+ ```
166
+
167
+ ---
168
+
169
+ ## What's in the report
170
+
171
+ ### Tabular datasets
172
+ - Shape, memory usage, dtype breakdown
173
+ - Missing value counts + heatmap (per column)
174
+ - Duplicate row detection
175
+ - Descriptive statistics (mean, std, percentiles, skew, kurtosis)
176
+ - Outlier detection (IQR method) with fences
177
+ - Correlation matrix heatmap + top correlated pairs
178
+ - Distribution histograms + KDE per numeric column
179
+ - Box plots per numeric column
180
+ - Top-value bar charts per categorical column
181
+ - Zero / negative counts per numeric column
182
+
183
+ ### Video datasets
184
+ - Duration distribution + total hours
185
+ - FPS distribution (discrete bar or continuous histogram)
186
+ - Resolution scatter plot (width × height bubble map)
187
+ - Aspect ratio breakdown
188
+ - Bitrate distribution
189
+ - Codec distribution
190
+
191
+ ### Audio datasets
192
+ - Duration distribution + total hours
193
+ - Sample rate (Hz) distribution — discrete bar chart
194
+ - Channel layout (Mono / Stereo / Surround) pie chart
195
+ - Bitrate distribution
196
+ - Codec distribution
197
+
198
+ ### Image datasets
199
+ - Resolution scatter map
200
+ - Megapixel distribution
201
+ - Color mode breakdown (RGB, RGBA, Grayscale …)
202
+ - Aspect ratio breakdown
203
+ - File size distribution
204
+
205
+ ---
206
+
207
+ ## How media extraction works
208
+
209
+ ```
210
+ Video → opencv-python-headless → ffprobe (subprocess) → file size only
211
+ Audio → mutagen → ffprobe (subprocess) → file size only
212
+ Image → Pillow → file size only
213
+ ```
214
+
215
+ EDA Wizard degrades gracefully: if a library isn't installed, it tries the next option in the chain and prints a tip at startup.
216
+
217
+ ---
218
+
219
+ ## Requirements
220
+
221
+ | Package | Purpose | Optional? |
222
+ |---------|---------|-----------|
223
+ | `pandas` | Data loading & stats | No |
224
+ | `numpy` | Numerics | No |
225
+ | `matplotlib` | Chart rendering | No |
226
+ | `seaborn` | Chart styling | No |
227
+ | `openpyxl` / `xlrd` | Excel .xlsx / .xls | No |
228
+ | `pyarrow` | Parquet / Feather | No |
229
+ | `opencv-python-headless` | Video metadata | Yes (`pip install "eda-wizard[video]"`) |
230
+ | `mutagen` | Audio metadata | Yes (`pip install "eda-wizard[audio]"`) |
231
+ | `Pillow` | Image metadata | Yes (`pip install "eda-wizard[image]"`) |
232
+
233
+ ---
234
+
235
+ ## Contributing
236
+
237
+ ```bash
238
+ git clone https://github.com/SMFRafin/eda-wizard
239
+ cd eda-wizard
240
+ pip install -e ".[dev]"
241
+ pytest
242
+ ```
243
+
244
+ PRs welcome! Especially for:
245
+ - Additional file type support
246
+ - LLM-generated insight summaries
247
+ - More chart types
248
+ - Performance improvements for very large datasets
249
+
250
+ ---
251
+
252
+ ## License
253
+
254
+ MIT
@@ -0,0 +1,182 @@
1
+ # ⬡ EDA Wizard — Zero-Config Automated EDA
2
+
3
+ > Pass any folder. Get a full EDA report. No setup.
4
+
5
+ ```python
6
+ import edawizard
7
+ edawizard.eda("any/folder")
8
+ ```
9
+
10
+ ---
11
+
12
+ ## What it does
13
+
14
+ EDA Wizard recursively scans a folder, detects every supported file type, runs comprehensive exploratory data analysis on each one, and generates a **self-contained `report.html`** plus organized CSVs — all in one call.
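The recursive scan described above can be pictured with a short standalone sketch. It is not the package's implementation: the `CATEGORIES` map, the `scan_folder` name, and the `.pkl` extension for Pickle are assumptions made for illustration, and only a subset of the listed formats is included.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical extension map (assumption): a subset of the formats in the table.
CATEGORIES = {
    "tabular": {".csv", ".tsv", ".xlsx", ".xls", ".json", ".parquet", ".feather", ".pkl"},
    "video": {".mp4", ".avi", ".mkv", ".mov", ".webm"},
    "audio": {".mp3", ".wav", ".flac", ".ogg", ".m4a"},
    "image": {".png", ".jpg", ".jpeg", ".webp", ".tiff", ".bmp", ".gif"},
}


def scan_folder(root):
    """Group every recognised file under `root` by category."""
    found = defaultdict(list)
    # rglob("*") walks the whole tree; sorting keeps the result deterministic.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()  # match extensions case-insensitively
        for category, extensions in CATEGORIES.items():
            if suffix in extensions:
                found[category].append(path)
                break
    return dict(found)
```

Files with unrecognised extensions are simply skipped in this sketch, so a mixed folder yields only the categories that are actually present.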
15
+
16
+ ### Supported file types
17
+
18
+ | Category | Formats |
19
+ |----------|---------|
20
+ | **Tabular** | CSV, TSV, Excel (.xlsx/.xls), JSON, Parquet, Feather, Pickle |
21
+ | **Video** | MP4, AVI, MKV, MOV, WMV, FLV, WebM, MPEG, 3GP, TS … |
22
+ | **Audio** | MP3, WAV, FLAC, AAC, OGG, M4A, WMA, OPUS, AIFF … |
23
+ | **Image** | PNG, JPG, JPEG, WebP, TIFF, BMP, GIF |
24
+
25
+ ---
26
+
27
+ ## Installation
28
+
29
+ ### Core (tabular data only — zero extra deps):
30
+ ```bash
31
+ pip install eda-wizard
32
+ ```
33
+
34
+ ### With full media support (video + audio + image):
35
+ ```bash
36
+ pip install "eda-wizard[full]"
37
+ ```
38
+
39
+ ### Pick what you need:
40
+ ```bash
41
+ pip install "eda-wizard[video]" # adds opencv-python-headless
42
+ pip install "eda-wizard[audio]" # adds mutagen
43
+ pip install "eda-wizard[image]" # adds Pillow
44
+ ```
45
+
46
+ > **Video fallback:** If `opencv-python-headless` is not installed, EDA Wizard falls back to `ffprobe` (part of [FFmpeg](https://ffmpeg.org)) when it is available on your PATH.
47
+
48
+ ---
49
+
50
+ ## Usage
51
+
52
+ ```python
53
+ import edawizard
54
+
55
+ # Scan a folder — all supported files analysed automatically
56
+ edawizard.eda("path/to/any/folder")
57
+
58
+ # Custom output folder name
59
+ edawizard.eda("my_data", output_name="eda_results")
60
+
61
+ # Disable row sampling (analyse all rows, slower on huge files)
62
+ edawizard.eda("my_data", max_rows=0)
63
+
64
+ # Silent mode
65
+ edawizard.eda("my_data", verbose=False)
66
+ ```
67
+
68
+ ---
69
+
70
+ ## Output structure
71
+
72
+ ```
73
+ PyEDA_Output/
74
+ ├── report.html ← self-contained interactive report (open in browser)
75
+ ├── summary.csv ← overview of all files
76
+
77
+ ├── sales/ ← one folder per tabular file
78
+ │ ├── summary_stats.csv
79
+ │ ├── missing_values.csv
80
+ │ ├── correlation_matrix.csv
81
+ │ └── value_counts/
82
+ │ ├── region.csv
83
+ │ └── category.csv
84
+
85
+ ├── media_video/
86
+ │ └── metadata.csv ← FPS, resolution, duration, codec, bitrate per file
87
+
88
+ ├── media_audio/
89
+ │ └── metadata.csv ← sample rate, channels, bitrate, duration per file
90
+
91
+ └── media_image/
92
+ └── metadata.csv ← width, height, megapixels, mode, DPI per file
93
+ ```
94
+
95
+ ---
96
+
97
+ ## What's in the report
98
+
99
+ ### Tabular datasets
100
+ - Shape, memory usage, dtype breakdown
101
+ - Missing value counts + heatmap (per column)
102
+ - Duplicate row detection
103
+ - Descriptive statistics (mean, std, percentiles, skew, kurtosis)
104
+ - Outlier detection (IQR method) with fences
105
+ - Correlation matrix heatmap + top correlated pairs
106
+ - Distribution histograms + KDE per numeric column
107
+ - Box plots per numeric column
108
+ - Top-value bar charts per categorical column
109
+ - Zero / negative counts per numeric column
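For readers unfamiliar with the IQR method named in the list above, here is a minimal sketch of how fences are usually computed. The function names and the conventional multiplier `k=1.5` are assumptions; the README does not state which multiplier or quantile method the package uses. The standard library is used instead of pandas so the snippet runs on its own.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) for the IQR outlier rule."""
    # Quartiles with linear interpolation between the closest ranks.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]
```

For example, `find_outliers([10, 12, 11, 13, 12, 11, 100])` flags only `100`, because the fences for that sample are 8.75 and 14.75.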
110
+
111
+ ### Video datasets
112
+ - Duration distribution + total hours
113
+ - FPS distribution (discrete bar or continuous histogram)
114
+ - Resolution scatter plot (width × height bubble map)
115
+ - Aspect ratio breakdown
116
+ - Bitrate distribution
117
+ - Codec distribution
118
+
119
+ ### Audio datasets
120
+ - Duration distribution + total hours
121
+ - Sample rate (Hz) distribution — discrete bar chart
122
+ - Channel layout (Mono / Stereo / Surround) pie chart
123
+ - Bitrate distribution
124
+ - Codec distribution
125
+
126
+ ### Image datasets
127
+ - Resolution scatter map
128
+ - Megapixel distribution
129
+ - Color mode breakdown (RGB, RGBA, Grayscale …)
130
+ - Aspect ratio breakdown
131
+ - File size distribution
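The aspect ratio breakdown listed for video and image datasets can be derived from width and height alone. The sketch below shows one common way to do it by reducing the pair with the greatest common divisor; the function names are hypothetical, and the package may bucket ratios differently.

```python
from collections import Counter
from math import gcd


def aspect_ratio(width, height):
    """Reduce width:height to lowest terms, e.g. 1920x1080 becomes "16:9"."""
    if width <= 0 or height <= 0:
        return "unknown"  # guard against missing or corrupt metadata
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


def aspect_ratio_breakdown(sizes):
    """Count how many (width, height) pairs share each reduced ratio."""
    return dict(Counter(aspect_ratio(w, h) for w, h in sizes))
```

Exact reduction keeps unusual sizes distinct (1366x768 reduces to 683:384, not 16:9), so a real report would probably round to the nearest common ratio before counting.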
132
+
133
+ ---
134
+
135
+ ## How media extraction works
136
+
137
+ ```
138
+ Video → opencv-python-headless → ffprobe (subprocess) → file size only
139
+ Audio → mutagen → ffprobe (subprocess) → file size only
140
+ Image → Pillow → file size only
141
+ ```
142
+
143
+ EDA Wizard degrades gracefully: if a library isn't installed, it tries the next option in the chain and prints a tip at startup.
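The fallback chain above can be sketched as a simple availability check. This illustrates the pattern and is not the package's code: `pick_backend` and `BACKENDS` are hypothetical names, and the import names `cv2` and `PIL` are the usual module names for `opencv-python-headless` and `Pillow`.

```python
import importlib.util
import shutil


def pick_backend(module_names, cli_tools=()):
    """Return the first usable backend, or "file size only" if none is found."""
    # Prefer an importable Python library.
    for name in module_names:
        if importlib.util.find_spec(name) is not None:
            return name
    # Otherwise look for a command-line tool on PATH.
    for tool in cli_tools:
        if shutil.which(tool) is not None:
            return tool
    return "file size only"


# One chain per media type, mirroring the diagram above.
BACKENDS = {
    "video": pick_backend(["cv2"], ["ffprobe"]),
    "audio": pick_backend(["mutagen"], ["ffprobe"]),
    "image": pick_backend(["PIL"]),
}
```

Checking with `find_spec` avoids importing heavy libraries just to learn whether they are installed; the actual import can be deferred until a file of that type is encountered.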
144
+
145
+ ---
146
+
147
+ ## Requirements
148
+
149
+ | Package | Purpose | Optional? |
150
+ |---------|---------|-----------|
151
+ | `pandas` | Data loading & stats | No |
152
+ | `numpy` | Numerics | No |
153
+ | `matplotlib` | Chart rendering | No |
154
+ | `seaborn` | Chart styling | No |
155
+ | `openpyxl` / `xlrd` | Excel .xlsx / .xls | No |
156
+ | `pyarrow` | Parquet / Feather | No |
157
+ | `opencv-python-headless` | Video metadata | Yes (`pip install "eda-wizard[video]"`) |
158
+ | `mutagen` | Audio metadata | Yes (`pip install "eda-wizard[audio]"`) |
159
+ | `Pillow` | Image metadata | Yes (`pip install "eda-wizard[image]"`) |
160
+
161
+ ---
162
+
163
+ ## Contributing
164
+
165
+ ```bash
166
+ git clone https://github.com/SMFRafin/eda-wizard
167
+ cd eda-wizard
168
+ pip install -e ".[dev]"
169
+ pytest
170
+ ```
171
+
172
+ PRs welcome! Especially for:
173
+ - Additional file type support
174
+ - LLM-generated insight summaries
175
+ - More chart types
176
+ - Performance improvements for very large datasets
177
+
178
+ ---
179
+
180
+ ## License
181
+
182
+ MIT