eda-wizard 0.1.0__tar.gz
- eda_wizard-0.1.0/LICENSE +21 -0
- eda_wizard-0.1.0/MANIFEST.in +7 -0
- eda_wizard-0.1.0/PKG-INFO +254 -0
- eda_wizard-0.1.0/README.md +182 -0
- eda_wizard-0.1.0/eda_wizard.egg-info/PKG-INFO +254 -0
- eda_wizard-0.1.0/eda_wizard.egg-info/SOURCES.txt +26 -0
- eda_wizard-0.1.0/eda_wizard.egg-info/dependency_links.txt +1 -0
- eda_wizard-0.1.0/eda_wizard.egg-info/entry_points.txt +2 -0
- eda_wizard-0.1.0/eda_wizard.egg-info/requires.txt +31 -0
- eda_wizard-0.1.0/eda_wizard.egg-info/top_level.txt +1 -0
- eda_wizard-0.1.0/edawizard/__init__.py +20 -0
- eda_wizard-0.1.0/edawizard/__main__.py +79 -0
- eda_wizard-0.1.0/edawizard/analyzer.py +360 -0
- eda_wizard-0.1.0/edawizard/core.py +197 -0
- eda_wizard-0.1.0/edawizard/detector.py +104 -0
- eda_wizard-0.1.0/edawizard/media_analyzer.py +554 -0
- eda_wizard-0.1.0/edawizard/media_visualizer.py +416 -0
- eda_wizard-0.1.0/edawizard/py.typed +0 -0
- eda_wizard-0.1.0/edawizard/reporter.py +991 -0
- eda_wizard-0.1.0/edawizard/utils.py +54 -0
- eda_wizard-0.1.0/edawizard/visualizer.py +360 -0
- eda_wizard-0.1.0/pyproject.toml +76 -0
- eda_wizard-0.1.0/setup.cfg +4 -0
- eda_wizard-0.1.0/tests/test_analyzer.py +112 -0
- eda_wizard-0.1.0/tests/test_cli.py +51 -0
- eda_wizard-0.1.0/tests/test_detector.py +59 -0
- eda_wizard-0.1.0/tests/test_eda_integration.py +118 -0
- eda_wizard-0.1.0/tests/test_media_analyzer.py +75 -0
eda_wizard-0.1.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 SMFRafin

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,254 @@
Metadata-Version: 2.4
Name: eda-wizard
Version: 0.1.0
Summary: Zero-config automated EDA: pass any folder, get charts, stats, and a full HTML report.
Author: PyEDA Contributors
License: MIT License

Copyright (c) 2025 SMFRafin

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Project-URL: Homepage, https://github.com/SMFRafin/eda-wizard
Project-URL: Repository, https://github.com/SMFRafin/eda-wizard
Project-URL: Bug Tracker, https://github.com/SMFRafin/eda-wizard/issues
Keywords: eda,exploratory-data-analysis,data-science,automation,visualization,pandas,csv,video,audio,machine-learning
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Visualization
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.5
Requires-Dist: numpy>=1.23
Requires-Dist: matplotlib>=3.6
Requires-Dist: seaborn>=0.12
Requires-Dist: openpyxl>=3.0
Requires-Dist: xlrd>=2.0
Requires-Dist: pyarrow>=10.0
Provides-Extra: video
Requires-Dist: opencv-python-headless>=4.7; extra == "video"
Provides-Extra: audio
Requires-Dist: mutagen>=1.46; extra == "audio"
Provides-Extra: image
Requires-Dist: Pillow>=9.0; extra == "image"
Provides-Extra: media
Requires-Dist: opencv-python-headless>=4.7; extra == "media"
Requires-Dist: mutagen>=1.46; extra == "media"
Requires-Dist: Pillow>=9.0; extra == "media"
Provides-Extra: full
Requires-Dist: eda-wizard[media]; extra == "full"
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Dynamic: license-file

# ⬡ EDA Wizard — Zero-Config Automated EDA

> Pass any folder. Get a full EDA report. No setup.

```python
import edawizard
edawizard.eda("any/folder")
```

---

## What it does

EDA Wizard recursively scans a folder, detects every supported file type, runs exploratory data analysis on each file, and generates a **self-contained `report.html`** plus organized CSVs, all in one call.

### Supported file types

| Category | Formats |
|----------|---------|
| **Tabular** | CSV, TSV, Excel (.xlsx/.xls), JSON, Parquet, Feather, Pickle |
| **Video** | MP4, AVI, MKV, MOV, WMV, FLV, WebM, MPEG, 3GP, TS … |
| **Audio** | MP3, WAV, FLAC, AAC, OGG, M4A, WMA, OPUS, AIFF … |
| **Image** | PNG, JPG, JPEG, WebP, TIFF, BMP, GIF |
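
The package's `detector.py` is not shown in this diff, so the following is only a sketch of how a recursive scan could bucket files by extension, assuming detection is purely suffix-based. The extension map is hypothetical and built from the table above (rows ending in "…" may cover more formats), and `categorize` / `scan_folder` are illustrative names, not part of the `edawizard` API.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical extension map derived from the "Supported file types" table.
_EXTENSIONS = {
    "tabular": [".csv", ".tsv", ".xlsx", ".xls", ".json", ".parquet", ".feather", ".pkl", ".pickle"],
    "video": [".mp4", ".avi", ".mkv", ".mov", ".wmv", ".flv", ".webm", ".mpeg", ".3gp", ".ts"],
    "audio": [".mp3", ".wav", ".flac", ".aac", ".ogg", ".m4a", ".wma", ".opus", ".aiff"],
    "image": [".png", ".jpg", ".jpeg", ".webp", ".tiff", ".bmp", ".gif"],
}
CATEGORY_BY_EXT = {ext: cat for cat, exts in _EXTENSIONS.items() for ext in exts}


def categorize(path: Path) -> Optional[str]:
    """Map a file to 'tabular', 'video', 'audio', 'image', or None (unsupported)."""
    return CATEGORY_BY_EXT.get(path.suffix.lower())


def scan_folder(root: str) -> Dict[str, List[Path]]:
    """Recursively walk `root` and group supported files by category."""
    groups: Dict[str, List[Path]] = {}
    for path in sorted(Path(root).rglob("*")):
        category = categorize(path)
        # Skip directories and files whose extension is not in the map.
        if path.is_file() and category is not None:
            groups.setdefault(category, []).append(path)
    return groups
```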

---

## Installation

### Core (tabular data only, no optional media extras)
```bash
pip install eda-wizard
```

### With full media support (video + audio + image)
```bash
pip install "eda-wizard[full]"
```

### Pick what you need
```bash
pip install "eda-wizard[video]" # adds opencv-python-headless
pip install "eda-wizard[audio]" # adds mutagen
pip install "eda-wizard[image]" # adds Pillow
```

> **Video fallback:** If `opencv-python-headless` is not installed, EDA Wizard automatically falls back to `ffprobe` (part of [FFmpeg](https://ffmpeg.org)) when it is available on your PATH. Audio files use the same `ffprobe` fallback when `mutagen` is missing.
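
A sketch of the `ffprobe` fallback path, assuming the tool's JSON output is parsed. `ffprobe_json`, `parse_rate`, and `video_metadata` are hypothetical helpers, not functions from `media_analyzer.py`. The flags used (`-v error`, `-print_format json`, `-show_format`, `-show_streams`) are standard ffprobe options, and the keys read (`codec_type`, `codec_name`, `width`, `height`, `avg_frame_rate`, `r_frame_rate`, `format.duration`, `format.bit_rate`) follow ffprobe's JSON output. Frame rates arrive as fractions such as `30000/1001`, and numeric format fields arrive as strings.

```python
import json
import subprocess
from typing import Any, Dict, Optional


def parse_rate(rate: str) -> Optional[float]:
    """Convert an ffprobe rate such as '30000/1001' to FPS; '0/0' means unknown."""
    num, _, den = rate.partition("/")
    denominator = float(den) if den else 1.0
    return float(num) / denominator if denominator else None


def _to_float(value: Any) -> Optional[float]:
    """ffprobe reports duration and bit_rate as strings; missing values become None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def ffprobe_json(path: str) -> Dict[str, Any]:
    """Run ffprobe on one file and return its parsed JSON (ffprobe must be on PATH)."""
    cmd = ["ffprobe", "-v", "error", "-print_format", "json",
           "-show_format", "-show_streams", path]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def video_metadata(probe: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the fields charted in the video section from ffprobe's JSON."""
    streams = probe.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), {})
    fmt = probe.get("format", {})
    # Prefer the average rate; fall back to the nominal rate when it is 0/0.
    fps = (parse_rate(video.get("avg_frame_rate", "0/0"))
           or parse_rate(video.get("r_frame_rate", "0/0")))
    return {
        "duration_s": _to_float(fmt.get("duration")),
        "fps": fps,
        "width": video.get("width"),
        "height": video.get("height"),
        "codec": video.get("codec_name"),
        "bitrate_bps": _to_float(fmt.get("bit_rate")),
    }

# Usage (requires ffprobe): video_metadata(ffprobe_json("clip.mp4"))
```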

---

## Usage

```python
import edawizard

# Scan a folder: all supported files are analysed automatically
edawizard.eda("path/to/any/folder")

# Custom output folder name
edawizard.eda("my_data", output_name="eda_results")

# Disable row sampling (analyse all rows; slower on huge files)
edawizard.eda("my_data", max_rows=0)

# Silent mode (no console progress output)
edawizard.eda("my_data", verbose=False)
```
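
The `max_rows` comments imply that large tables are sampled before analysis and that `0` switches sampling off. A minimal pandas sketch of that behaviour follows; the helper name `maybe_sample`, the random (rather than head-of-file) sampling, and the fixed seed are assumptions, and the package's default `max_rows` value is not documented here.

```python
import pandas as pd


def maybe_sample(df: pd.DataFrame, max_rows: int, seed: int = 42) -> pd.DataFrame:
    """Return at most `max_rows` rows; `max_rows=0` disables sampling entirely."""
    if max_rows and len(df) > max_rows:
        # A fixed random_state keeps the sample (and so the report) reproducible;
        # sort_index restores the original row order after sampling.
        return df.sample(n=max_rows, random_state=seed).sort_index()
    return df
```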

---

## Output structure

```
PyEDA_Output/
├── report.html ← self-contained interactive report (open in browser)
├── summary.csv ← overview of all files
│
├── sales/ ← one folder per tabular file
│   ├── summary_stats.csv
│   ├── missing_values.csv
│   ├── correlation_matrix.csv
│   └── value_counts/
│       ├── region.csv
│       └── category.csv
│
├── media_video/
│   └── metadata.csv ← FPS, resolution, duration, codec, bitrate per file
│
├── media_audio/
│   └── metadata.csv ← sample rate, channels, bitrate, duration per file
│
└── media_image/
    └── metadata.csv ← width, height, megapixels, mode, DPI per file
```
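
The generated CSVs can be reloaded for further work. Below is a dependency-free sketch, assuming every CSV has a header row and UTF-8 encoding (the column names are not documented in this README). `load_output_csvs` is a hypothetical helper, and `pandas.read_csv` would serve equally well. With the default layout, `load_output_csvs("PyEDA_Output")["summary.csv"]` would return the overview rows.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_output_csvs(output_dir: str) -> Dict[str, List[Dict[str, str]]]:
    """Read every CSV under the output folder, keyed by its path relative to that folder."""
    root = Path(output_dir)
    tables: Dict[str, List[Dict[str, str]]] = {}
    for path in sorted(root.rglob("*.csv")):
        with path.open(newline="", encoding="utf-8") as fh:
            # Each row becomes a dict keyed by the CSV header; values stay strings.
            tables[path.relative_to(root).as_posix()] = list(csv.DictReader(fh))
    return tables
```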

---

## What's in the report

### Tabular datasets
- Shape, memory usage, dtype breakdown
- Missing value counts + heatmap (per column)
- Duplicate row detection
- Descriptive statistics (mean, std, percentiles, skew, kurtosis)
- Outlier detection (IQR method) with fences
- Correlation matrix heatmap + top correlated pairs
- Distribution histograms + KDE per numeric column
- Box plots per numeric column
- Top-value bar charts per categorical column
- Zero / negative counts per numeric column
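
The IQR outlier check listed above is conventionally Tukey's rule: values below Q1 - k*IQR or above Q3 + k*IQR are flagged. This sketch assumes the usual k = 1.5 (the package's multiplier is not stated) and linear-interpolated quartiles (`method="inclusive"`, which matches pandas' default `quantile` interpolation). The function names are illustrative.

```python
import statistics
from typing import List, Sequence, Tuple


def iqr_fences(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Return Tukey's (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Values lying strictly outside the fences."""
    low, high = iqr_fences(values, k)
    return [v for v in values if v < low or v > high]

# For [1, 2, ..., 9, 100]: Q1 = 3.25, Q3 = 7.75, fences (-3.5, 14.5), outliers [100]
```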

### Video datasets
- Duration distribution + total hours
- FPS distribution (discrete bar or continuous histogram)
- Resolution scatter plot (width × height bubble map)
- Aspect ratio breakdown
- Bitrate distribution
- Codec distribution
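
Aspect-ratio breakdowns and total-hours figures reduce to simple arithmetic. This sketch assumes ratios are reduced exactly by the greatest common divisor; with that approach a 1366×768 panel becomes `683:384`, so the actual report may instead snap to the nearest common ratio. The same helper would apply to the image section's aspect-ratio breakdown. All names here are hypothetical.

```python
from collections import Counter
from math import gcd
from typing import Iterable, Tuple


def aspect_ratio(width: int, height: int) -> str:
    """Reduce a resolution to its simplest W:H ratio, e.g. 1920x1080 -> '16:9'."""
    if width <= 0 or height <= 0:
        return "unknown"
    d = gcd(width, height)
    return f"{width // d}:{height // d}"


def aspect_breakdown(resolutions: Iterable[Tuple[int, int]]) -> Counter:
    """Count how many files share each aspect ratio."""
    return Counter(aspect_ratio(w, h) for w, h in resolutions)


def total_hours(durations_s: Iterable[float]) -> float:
    """Sum per-file durations (in seconds) and convert to hours."""
    return sum(durations_s) / 3600
```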

### Audio datasets
- Duration distribution + total hours
- Sample rate (Hz) distribution — discrete bar chart
- Channel layout (Mono / Stereo / Surround) pie chart
- Bitrate distribution
- Codec distribution
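
For audio, `mutagen.File()` returns `None` for unrecognised files and otherwise exposes stream properties on `.info`. This sketch assumes the pie chart buckets channel counts as 1 = Mono, 2 = Stereo, more = Surround (inferred from the labels above). `channel_layout` and `audio_metadata` are hypothetical names, and `getattr` guards are used because not every mutagen format class exposes every attribute. Codec detection is left out of this sketch.

```python
from typing import Any, Dict, Optional


def channel_layout(channels: Optional[int]) -> str:
    """Bucket a channel count into Mono / Stereo / Surround labels."""
    if not channels:
        return "Unknown"
    if channels == 1:
        return "Mono"
    if channels == 2:
        return "Stereo"
    return "Surround"


def audio_metadata(path: str) -> Optional[Dict[str, Any]]:
    """Read duration, sample rate, channels, and bitrate with mutagen."""
    import mutagen  # optional dependency, pulled in by the [audio] extra

    audio = mutagen.File(path)  # None when mutagen does not recognise the format
    info = getattr(audio, "info", None)
    if info is None:
        return None
    channels = getattr(info, "channels", None)
    return {
        "duration_s": getattr(info, "length", None),
        "sample_rate_hz": getattr(info, "sample_rate", None),
        "channels": channels,
        "layout": channel_layout(channels),
        "bitrate_bps": getattr(info, "bitrate", None),  # bits per second
    }
```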

### Image datasets
- Resolution scatter map
- Megapixel distribution
- Color mode breakdown (RGB, RGBA, Grayscale …)
- Aspect ratio breakdown
- File size distribution
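
Image metadata needs only header information, which Pillow reads lazily in `Image.open()` without decoding pixel data. This sketch assumes megapixels are computed as width × height / 10^6 and that DPI is taken from `img.info["dpi"]` when the file provides it. `image_record` and `image_metadata` are hypothetical names.

```python
import os
from typing import Any, Dict, Optional, Tuple


def image_record(width: int, height: int, mode: str,
                 dpi: Optional[Tuple[float, float]] = None,
                 size_bytes: Optional[int] = None) -> Dict[str, Any]:
    """Derive the per-image fields charted in the report."""
    return {
        "width": width,
        "height": height,
        "megapixels": round(width * height / 1_000_000, 2),
        "mode": mode,  # e.g. "RGB", "RGBA", "L" (grayscale)
        "dpi": dpi,
        "size_bytes": size_bytes,
    }


def image_metadata(path: str) -> Dict[str, Any]:
    """Read header-level metadata with Pillow (pixel data is not decoded)."""
    from PIL import Image  # optional dependency, pulled in by the [image] extra

    with Image.open(path) as img:
        width, height = img.size
        return image_record(width, height, img.mode,
                            img.info.get("dpi"), os.path.getsize(path))
```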

---

## How media extraction works

```
Video → opencv-python-headless → ffprobe (subprocess) → file size only
Audio → mutagen → ffprobe (subprocess) → file size only
Image → Pillow → file size only
```

EDA Wizard degrades gracefully: if a library isn't installed, it falls back to the next option in the chain above and prints a tip at startup.
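
The fallback chains in the diagram can be expressed as an ordered list of backends per media type. This sketch assumes availability is checked by module lookup (`importlib.util.find_spec`, which does not import the module) and by `shutil.which("ffprobe")`. The backend names, the `size-only` label, and the tip text are hypothetical.

```python
import importlib.util
import shutil
from typing import Callable, Dict, List, Tuple


def _has_module(name: str) -> bool:
    """True if a top-level module can be found (checked without importing it)."""
    return importlib.util.find_spec(name) is not None


def _has_ffprobe() -> bool:
    """True if an ffprobe executable is on PATH."""
    return shutil.which("ffprobe") is not None


# Ordered backends per media type, mirroring the diagram; anything else is "size-only".
CHAINS: Dict[str, List[Tuple[str, Callable[[], bool]]]] = {
    "video": [("opencv", lambda: _has_module("cv2")), ("ffprobe", _has_ffprobe)],
    "audio": [("mutagen", lambda: _has_module("mutagen")), ("ffprobe", _has_ffprobe)],
    "image": [("pillow", lambda: _has_module("PIL"))],
}


def pick_backend(kind: str) -> str:
    """Return the first usable backend for `kind`, or 'size-only' if none is."""
    for name, available in CHAINS[kind]:
        if available():
            return name
    return "size-only"


def startup_tips() -> List[str]:
    """Hypothetical tips for media types that would only get file sizes."""
    return [
        f'Install "eda-wizard[{kind}]" for richer {kind} metadata'
        for kind in CHAINS
        if pick_backend(kind) == "size-only"
    ]
```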

---

## Requirements

| Package | Purpose | Optional? |
|---------|---------|-----------|
| `pandas` | Data loading & stats | No |
| `numpy` | Numerics | No |
| `matplotlib` | Chart rendering | No |
| `seaborn` | Chart styling | No |
| `openpyxl` | Excel .xlsx | No |
| `xlrd` | Excel .xls | No |
| `pyarrow` | Parquet / Feather | No |
| `opencv-python-headless` | Video metadata | Yes (`pip install "eda-wizard[video]"`) |
| `mutagen` | Audio metadata | Yes (`pip install "eda-wizard[audio]"`) |
| `Pillow` | Image metadata | Yes (`pip install "eda-wizard[image]"`) |
| `ffprobe` (FFmpeg, system binary) | Video/audio metadata fallback | Yes (must be on your PATH) |

---

## Contributing

```bash
git clone https://github.com/SMFRafin/eda-wizard
cd eda-wizard
pip install -e ".[dev]"
pytest
```

PRs are welcome, especially for:
- Additional file type support
- LLM-generated insight summaries
- More chart types
- Performance improvements for very large datasets

---

## License

MIT