lazyanalyst 1.0.0__tar.gz
- lazyanalyst-1.0.0/CHANGELOG.md +179 -0
- lazyanalyst-1.0.0/LICENSE +21 -0
- lazyanalyst-1.0.0/MANIFEST.in +5 -0
- lazyanalyst-1.0.0/PKG-INFO +224 -0
- lazyanalyst-1.0.0/README.md +178 -0
- lazyanalyst-1.0.0/lazyanalyst/__init__.py +1 -0
- lazyanalyst-1.0.0/lazyanalyst/analyze.py +204 -0
- lazyanalyst-1.0.0/lazyanalyst/cleaner.py +89 -0
- lazyanalyst-1.0.0/lazyanalyst/dashboard.py +233 -0
- lazyanalyst-1.0.0/lazyanalyst/eda.py +72 -0
- lazyanalyst-1.0.0/lazyanalyst/features.py +67 -0
- lazyanalyst-1.0.0/lazyanalyst/insights.py +72 -0
- lazyanalyst-1.0.0/lazyanalyst/loader.py +49 -0
- lazyanalyst-1.0.0/lazyanalyst/quality.py +110 -0
- lazyanalyst-1.0.0/lazyanalyst/reporter.py +301 -0
- lazyanalyst-1.0.0/lazyanalyst/schema.py +76 -0
- lazyanalyst-1.0.0/lazyanalyst/stats.py +150 -0
- lazyanalyst-1.0.0/lazyanalyst/visualizer.py +175 -0
- lazyanalyst-1.0.0/lazyanalyst.egg-info/PKG-INFO +224 -0
- lazyanalyst-1.0.0/lazyanalyst.egg-info/SOURCES.txt +24 -0
- lazyanalyst-1.0.0/lazyanalyst.egg-info/dependency_links.txt +1 -0
- lazyanalyst-1.0.0/lazyanalyst.egg-info/requires.txt +16 -0
- lazyanalyst-1.0.0/lazyanalyst.egg-info/top_level.txt +1 -0
- lazyanalyst-1.0.0/pyproject.toml +70 -0
- lazyanalyst-1.0.0/setup.cfg +4 -0
- lazyanalyst-1.0.0/setup.py +40 -0
|
@@ -0,0 +1,179 @@
|
|
|
1
|
+
# Changelog
|
|
2
|
+
|
|
3
|
+
All notable changes to this project will be documented in this file.
|
|
4
|
+
|
|
5
|
+
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
|
|
6
|
+
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
7
|
+
|
|
8
|
+
## [1.0.0] - 2024-06-01
|
|
9
|
+
|
|
10
|
+
### Added
|
|
11
|
+
- Initial release of LazyAnalyst
|
|
12
|
+
- Full automated data analysis pipeline
|
|
13
|
+
- CSV and XLSX file support
|
|
14
|
+
- 11 integrated analysis modules
|
|
15
|
+
- Interactive HTML dashboard with Plotly
|
|
16
|
+
- Professional HTML report generation
|
|
17
|
+
- Automated statistical testing framework
|
|
18
|
+
- Data quality auditing and reporting
|
|
19
|
+
- Feature engineering capabilities
|
|
20
|
+
- Exploratory data analysis (EDA)
|
|
21
|
+
- Natural language insights generation
|
|
22
|
+
- Data cleaning and preprocessing automation
|
|
23
|
+
|
|
24
|
+
### Modules Included
|
|
25
|
+
1. **loader.py** — File loading and dtype inference
|
|
26
|
+
- Supports CSV (auto-detects encoding and delimiter)
|
|
27
|
+
- Supports XLSX (Excel workbooks)
|
|
28
|
+
- Automatic data type conversion
|
|
29
|
+
|
|
30
|
+
2. **schema.py** — Column type detection
|
|
31
|
+
- Detects: numerical, categorical, datetime, boolean, identifier
|
|
32
|
+
- Priority-based classification
|
|
33
|
+
|
|
34
|
+
3. **quality.py** — Data quality auditing
|
|
35
|
+
- Missing value analysis
|
|
36
|
+
- Duplicate detection
|
|
37
|
+
- Outlier flagging
|
|
38
|
+
- Quality score calculation
|
|
39
|
+
|
|
40
|
+
4. **cleaner.py** — Automated data cleaning
|
|
41
|
+
- Missing value imputation
|
|
42
|
+
- Duplicate removal
|
|
43
|
+
- Outlier handling
|
|
44
|
+
- Type conversion
|
|
45
|
+
|
|
46
|
+
5. **eda.py** — Exploratory data analysis
|
|
47
|
+
- Descriptive statistics
|
|
48
|
+
- Distribution analysis
|
|
49
|
+
- Correlation analysis
|
|
50
|
+
|
|
51
|
+
6. **visualizer.py** — Chart generation
|
|
52
|
+
- Histograms and distributions
|
|
53
|
+
- Bar charts for categories
|
|
54
|
+
- Scatter plots
|
|
55
|
+
- Correlation heatmaps
|
|
56
|
+
- PNG output with matplotlib/seaborn
|
|
57
|
+
|
|
58
|
+
7. **features.py** — Feature engineering
|
|
59
|
+
- Polynomial features
|
|
60
|
+
- Interaction terms
|
|
61
|
+
- Normalization/scaling
|
|
62
|
+
- Log transforms
|
|
63
|
+
|
|
64
|
+
8. **stats.py** — Statistical testing
|
|
65
|
+
- Pearson/Spearman correlation
|
|
66
|
+
- Independent T-tests
|
|
67
|
+
- ANOVA
|
|
68
|
+
- Chi-Square tests
|
|
69
|
+
- P-value calculation and interpretation
|
|
70
|
+
|
|
71
|
+
9. **insights.py** — Automated insights
|
|
72
|
+
- Correlation insights
|
|
73
|
+
- Group difference insights
|
|
74
|
+
- Data quality insights
|
|
75
|
+
- Distribution insights
|
|
76
|
+
- Natural language output
|
|
77
|
+
|
|
78
|
+
10. **dashboard.py** — Interactive dashboard
|
|
79
|
+
- Plotly-based visualizations
|
|
80
|
+
- Dark theme design
|
|
81
|
+
- 6 main sections (Overview, Quality, Distributions, Correlations, Tests, Insights)
|
|
82
|
+
- Self-contained HTML file
|
|
83
|
+
- No server required
|
|
84
|
+
|
|
85
|
+
11. **reporter.py** — HTML report generation
|
|
86
|
+
- Executive summary
|
|
87
|
+
- Dataset overview
|
|
88
|
+
- Data quality metrics
|
|
89
|
+
- EDA statistics
|
|
90
|
+
- Visualization embeddings
|
|
91
|
+
- Statistical results
|
|
92
|
+
- Actionable insights
|
|
93
|
+
- Professional styling with print support
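The "priority-based classification" listed for schema.py could work roughly as in the sketch below. This is an illustration only, not code from the package: the function name `classify_column`, the check order (boolean, datetime, identifier, numerical, categorical), the boolean token set, and the ISO date format are all assumptions.

```python
from datetime import datetime

# Hypothetical token set; the package's real rules are not shown in this diff.
BOOL_TOKENS = {"true", "false", "yes", "no", "0", "1"}


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def _is_date(text):
    try:
        datetime.strptime(text, "%Y-%m-%d")  # assumed format: ISO dates only
        return True
    except ValueError:
        return False


def classify_column(values):
    """Return the first type label whose rule matches, in priority order."""
    cells = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not cells:
        return "categorical"
    if all(c.lower() in BOOL_TOKENS for c in cells):
        return "boolean"
    if all(_is_date(c) for c in cells):
        return "datetime"
    all_unique = len(set(cells)) == len(cells) and len(cells) > 1
    if all_unique and not any(_is_number(c) for c in cells):
        return "identifier"
    if all(_is_number(c) for c in cells):
        return "numerical"
    return "categorical"


print(classify_column(["A-001", "A-002", "A-003"]))
```

Checking boolean before numerical means a column of `0`/`1` strings is labelled boolean rather than numerical. That ordering is what makes the scheme priority-based.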
|
|
94
|
+
|
|
95
|
+
### Features
|
|
96
|
+
- One-line API: `dp.analyze("file.csv")` (after `import lazyanalyst as dp`)
|
|
97
|
+
- Automatic error handling and resilience
|
|
98
|
+
- Progress reporting throughout pipeline
|
|
99
|
+
- Comprehensive output documentation
|
|
100
|
+
- Browser-based results viewing
|
|
101
|
+
- No manual configuration required
|
|
102
|
+
|
|
103
|
+
### Documentation
|
|
104
|
+
- Comprehensive README with quick start guide
|
|
105
|
+
- Installation instructions via pip
|
|
106
|
+
- Full API reference
|
|
107
|
+
- Example usage patterns
|
|
108
|
+
- Architecture documentation
|
|
109
|
+
- Contributing guidelines
|
|
110
|
+
|
|
111
|
+
### License
|
|
112
|
+
- MIT License for open distribution
|
|
113
|
+
- Free for personal and commercial use
|
|
114
|
+
|
|
115
|
+
---
|
|
116
|
+
|
|
117
|
+
## Release Notes
|
|
118
|
+
|
|
119
|
+
### v1.0.0 Highlights
|
|
120
|
+
- One-liner analysis: `dp.analyze("data.csv")`
|
|
121
|
+
- 11 fully integrated modules
|
|
122
|
+
- Professional HTML outputs (dashboard + report)
|
|
123
|
+
- Automatic statistical testing
|
|
124
|
+
- Natural language insights
|
|
125
|
+
- Beautiful dark theme interface
|
|
126
|
+
- Fast processing on moderately sized datasets
|
|
127
|
+
- Robust error handling
|
|
128
|
+
|
|
129
|
+
### What's Coming in v1.1
|
|
130
|
+
- Time series analysis module
|
|
131
|
+
- Anomaly detection
|
|
132
|
+
- Clustering analysis
|
|
133
|
+
- Advanced feature selection
|
|
134
|
+
- Custom analysis templates
|
|
135
|
+
- Export to PDF
|
|
136
|
+
- Interactive Jupyter notebook support
|
|
137
|
+
|
|
138
|
+
### Known Limitations
|
|
139
|
+
- Dashboard rendering requires a modern browser
|
|
140
|
+
- Large files (>1 GB) may require more memory
|
|
141
|
+
- Some advanced statistical tests are not included
|
|
142
|
+
- ML models use only sklearn basics
|
|
143
|
+
|
|
144
|
+
---
|
|
145
|
+
|
|
146
|
+
## Installation & Usage
|
|
147
|
+
|
|
148
|
+
### Install
|
|
149
|
+
```bash
|
|
150
|
+
pip install lazyanalyst
|
|
151
|
+
```
|
|
152
|
+
|
|
153
|
+
### Basic Usage
|
|
154
|
+
```python
|
|
155
|
+
import lazyanalyst as dp
|
|
156
|
+
|
|
157
|
+
result = dp.analyze("sales.csv")
|
|
158
|
+
result.dashboard() # View interactive dashboard
|
|
159
|
+
result.report() # View HTML report
|
|
160
|
+
```
|
|
161
|
+
|
|
162
|
+
### Supported Formats
|
|
163
|
+
- CSV (comma, semicolon, or tab-separated)
|
|
164
|
+
- XLSX (Excel workbooks)
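Delimiter detection for the three CSV variants above can be sketched with the standard library's `csv.Sniffer`. This is only an illustration of the technique; the diff does not show how loader.py detects delimiters, and `detect_delimiter` is a hypothetical helper name.

```python
import csv


def detect_delimiter(sample, candidates=",;\t", default=","):
    """Guess the delimiter of a CSV text sample, falling back to `default`.

    Hypothetical helper, not part of lazyanalyst.
    """
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer could not decide (for example, a single-column file).
        return default


semicolon_sample = "region;units;price\nnorth;3;9.5\nsouth;7;4.0\n"
print(repr(detect_delimiter(semicolon_sample)))
```

In practice the sample would be the first few kilobytes of the file. Restricting `delimiters` to the supported characters keeps the sniffer from picking an unrelated character.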
|
|
165
|
+
|
|
166
|
+
---
|
|
167
|
+
|
|
168
|
+
## Changelog Format
|
|
169
|
+
|
|
170
|
+
- **Added** for new features
|
|
171
|
+
- **Changed** for changes in existing functionality
|
|
172
|
+
- **Deprecated** for soon-to-be removed features
|
|
173
|
+
- **Removed** for now removed features
|
|
174
|
+
- **Fixed** for any bug fixes
|
|
175
|
+
- **Security** for security fix announcements
|
|
176
|
+
|
|
177
|
+
---
|
|
178
|
+
|
|
179
|
+
**For more information, visit the GitHub repository or PyPI page.**
|
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2024 LazyAnalyst Contributors
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
|
@@ -0,0 +1,224 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: lazyanalyst
|
|
3
|
+
Version: 1.0.0
|
|
4
|
+
Summary: Automated data analysis library with end-to-end analysis pipeline
|
|
5
|
+
Home-page: https://github.com/Tenali-Rama/lazyanalyst
|
|
6
|
+
Author: Tenali-Rama
|
|
7
|
+
Author-email: Tenali-Rama <tenalirama.krishna125@gmail.com>
|
|
8
|
+
License: MIT
|
|
9
|
+
Project-URL: Homepage, https://github.com/Tenali-Rama/lazyanalyst
|
|
10
|
+
Project-URL: Documentation, https://github.com/Tenali-Rama/lazyanalyst#readme
|
|
11
|
+
Project-URL: Repository, https://github.com/Tenali-Rama/lazyanalyst.git
|
|
12
|
+
Keywords: data-analysis,automation,pandas,visualization,statistics
|
|
13
|
+
Classifier: Programming Language :: Python :: 3
|
|
14
|
+
Classifier: Programming Language :: Python :: 3.8
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.9
|
|
16
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
17
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
18
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
19
|
+
Classifier: Operating System :: OS Independent
|
|
20
|
+
Classifier: Development Status :: 4 - Beta
|
|
21
|
+
Classifier: Intended Audience :: Developers
|
|
22
|
+
Classifier: Intended Audience :: Science/Research
|
|
23
|
+
Classifier: Topic :: Scientific/Engineering
|
|
24
|
+
Requires-Python: >=3.8
|
|
25
|
+
Description-Content-Type: text/markdown
|
|
26
|
+
License-File: LICENSE
|
|
27
|
+
Requires-Dist: pandas>=1.0.0
|
|
28
|
+
Requires-Dist: numpy>=1.19.0
|
|
29
|
+
Requires-Dist: scipy>=1.5.0
|
|
30
|
+
Requires-Dist: matplotlib>=3.3.0
|
|
31
|
+
Requires-Dist: seaborn>=0.11.0
|
|
32
|
+
Requires-Dist: plotly>=5.0.0
|
|
33
|
+
Requires-Dist: dash>=2.0.0
|
|
34
|
+
Requires-Dist: openpyxl>=3.0.0
|
|
35
|
+
Provides-Extra: dev
|
|
36
|
+
Requires-Dist: pytest>=6.0; extra == "dev"
|
|
37
|
+
Requires-Dist: black>=21.0; extra == "dev"
|
|
38
|
+
Requires-Dist: flake8>=3.9; extra == "dev"
|
|
39
|
+
Requires-Dist: mypy>=0.910; extra == "dev"
|
|
40
|
+
Requires-Dist: build>=0.7; extra == "dev"
|
|
41
|
+
Requires-Dist: twine>=3.4; extra == "dev"
|
|
42
|
+
Dynamic: author
|
|
43
|
+
Dynamic: home-page
|
|
44
|
+
Dynamic: license-file
|
|
45
|
+
Dynamic: requires-python
|
|
46
|
+
|
|
47
|
+
# LazyAnalyst v1.0.0
|
|
48
|
+
|
|
49
|
+
**Automated data analysis library for Python**
|
|
50
|
+
|
|
51
|
+
LazyAnalyst is an end-to-end data analysis library that automates everything you'd do manually with Pandas and NumPy. Load a dataset, run one line of code, and get a complete analysis with insights, visualizations, statistical tests, and a professional HTML report.
|
|
52
|
+
|
|
53
|
+
## Quick Start
|
|
54
|
+
|
|
55
|
+
```python
|
|
56
|
+
import lazyanalyst as dp
|
|
57
|
+
|
|
58
|
+
# Analyze any CSV or Excel file
|
|
59
|
+
result = dp.analyze("sales_data.csv")
|
|
60
|
+
|
|
61
|
+
# Open the interactive dashboard
|
|
62
|
+
result.dashboard()
|
|
63
|
+
|
|
64
|
+
# Or view the professional report
|
|
65
|
+
result.report()
|
|
66
|
+
```
|
|
67
|
+
|
|
68
|
+
That's it! LazyAnalyst handles:
|
|
69
|
+
- Automated data loading and type detection
|
|
70
|
+
- Data quality auditing and reporting
|
|
71
|
+
- Intelligent data cleaning
|
|
72
|
+
- Exploratory data analysis
|
|
73
|
+
- Statistical testing (Pearson, Spearman, T-Test, ANOVA, Chi-Square)
|
|
74
|
+
- Feature engineering
|
|
75
|
+
- Interactive Plotly dashboard
|
|
76
|
+
- Professional HTML report generation
|
|
77
|
+
- Automated insights and interpretations
|
|
78
|
+
|
|
79
|
+
## Installation
|
|
80
|
+
|
|
81
|
+
Install via pip:
|
|
82
|
+
|
|
83
|
+
```bash
|
|
84
|
+
pip install lazyanalyst
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
**Requirements:** Python 3.8+
|
|
88
|
+
|
|
89
|
+
Or install from source:
|
|
90
|
+
|
|
91
|
+
```bash
|
|
92
|
+
git clone https://github.com/Tenali-Rama/lazyanalyst.git
|
|
93
|
+
cd lazyanalyst
|
|
94
|
+
pip install -e .
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
## Features
|
|
98
|
+
|
|
99
|
+
### 1. **Automated Pipeline**
|
|
100
|
+
No configuration needed. Just provide a CSV or Excel file and LazyAnalyst handles the rest.
|
|
101
|
+
|
|
102
|
+
### 2. **Data Quality Auditing**
|
|
103
|
+
Automatically detects:
|
|
104
|
+
- Missing values
|
|
105
|
+
- Duplicate rows
|
|
106
|
+
- Outliers
|
|
107
|
+
- Data type inconsistencies
|
|
108
|
+
- Quality score calculation
|
|
109
|
+
|
|
110
|
+
### 3. **Intelligent Cleaning**
|
|
111
|
+
- Auto-detects and fixes common issues
|
|
112
|
+
- Handles missing values intelligently
|
|
113
|
+
- Removes duplicates
|
|
114
|
+
- Converts data types automatically
|
|
115
|
+
|
|
116
|
+
### 4. **Exploratory Data Analysis (EDA)**
|
|
117
|
+
- Summary statistics (mean, median, std, min, max)
|
|
118
|
+
- Distribution analysis
|
|
119
|
+
- Correlation detection
|
|
120
|
+
- Categorical value counts
|
|
121
|
+
|
|
122
|
+
### 5. **Statistical Testing**
|
|
123
|
+
Runs appropriate tests automatically:
|
|
124
|
+
- **Pearson/Spearman Correlation** for numerical relationships
|
|
125
|
+
- **Independent T-Test** for 2-group comparisons
|
|
126
|
+
- **ANOVA** for 3+ group comparisons
|
|
127
|
+
- **Chi-Square** for categorical relationships
|
|
128
|
+
|
|
129
|
+
### 6. **Feature Engineering**
|
|
130
|
+
- Polynomial features
|
|
131
|
+
- Interaction terms
|
|
132
|
+
- Scaled/normalized versions
|
|
133
|
+
- Log transforms for skewed data
|
|
134
|
+
|
|
135
|
+
### 7. **Visualizations**
|
|
136
|
+
Generates:
|
|
137
|
+
- Distribution histograms
|
|
138
|
+
- Categorical bar charts
|
|
139
|
+
- Correlation heatmaps
|
|
140
|
+
- Scatter plots for relationships
|
|
141
|
+
|
|
142
|
+
### 8. **Interactive Dashboard**
|
|
143
|
+
Beautiful, self-contained HTML dashboard with all analyses and charts.
|
|
144
|
+
|
|
145
|
+
### 9. **Professional Report**
|
|
146
|
+
Print-ready HTML report with executive summary, findings, and visualizations.
|
|
147
|
+
|
|
148
|
+
## Example Usage
|
|
149
|
+
|
|
150
|
+
### Basic Analysis
|
|
151
|
+
```python
|
|
152
|
+
import lazyanalyst as dp
|
|
153
|
+
|
|
154
|
+
result = dp.analyze("data.csv")
|
|
155
|
+
result.dashboard() # Open interactive dashboard
|
|
156
|
+
result.report() # Open HTML report
|
|
157
|
+
```
|
|
158
|
+
|
|
159
|
+
### With Options
|
|
160
|
+
```python
|
|
161
|
+
result = dp.analyze("data.xlsx", dashboard=True, report=True)
|
|
162
|
+
|
|
163
|
+
# Access cleaned data
|
|
164
|
+
cleaned_df = result.cleaned_data()
|
|
165
|
+
```
|
|
166
|
+
|
|
167
|
+
### Supported File Types
|
|
168
|
+
- CSV (auto-detects encoding and delimiter)
|
|
169
|
+
- XLSX (Excel workbooks)
|
|
170
|
+
|
|
171
|
+
## Output Files
|
|
172
|
+
|
|
173
|
+
LazyAnalyst creates an `outputs/` folder with:
|
|
174
|
+
|
|
175
|
+
- `cleaned_data.csv` — Your cleaned dataset
|
|
176
|
+
- `report.html` — Professional report
|
|
177
|
+
- `dashboard.html` — Interactive dashboard
|
|
178
|
+
- `insights.txt` — Text summary of insights
|
|
179
|
+
- `plots/` — All generated visualizations
|
|
180
|
+
|
|
181
|
+
## Architecture
|
|
182
|
+
|
|
183
|
+
LazyAnalyst consists of 11 integrated modules:
|
|
184
|
+
|
|
185
|
+
1. **loader.py** — File loading with auto type inference
|
|
186
|
+
2. **schema.py** — Column type detection
|
|
187
|
+
3. **quality.py** — Data quality auditing
|
|
188
|
+
4. **cleaner.py** — Automated data cleaning
|
|
189
|
+
5. **eda.py** — Exploratory data analysis
|
|
190
|
+
6. **visualizer.py** — Chart generation
|
|
191
|
+
7. **features.py** — Feature engineering
|
|
192
|
+
8. **stats.py** — Statistical testing
|
|
193
|
+
9. **insights.py** — Natural language insights
|
|
194
|
+
10. **dashboard.py** — Interactive dashboard generation
|
|
195
|
+
11. **reporter.py** — HTML report generation
|
|
196
|
+
|
|
197
|
+
## Documentation
|
|
198
|
+
|
|
199
|
+
Full documentation is available in the GitHub repository.
|
|
200
|
+
|
|
201
|
+
## License
|
|
202
|
+
|
|
203
|
+
MIT License - See LICENSE file for details
|
|
204
|
+
|
|
205
|
+
## Troubleshooting
|
|
206
|
+
|
|
207
|
+
### "FileNotFoundError"
|
|
208
|
+
- Check that the file path is correct
|
|
209
|
+
- Use an absolute path if the relative path doesn't work
|
|
210
|
+
|
|
211
|
+
### "ValueError: unsupported file type"
|
|
212
|
+
- Ensure the file extension is `.csv` or `.xlsx`
|
|
213
|
+
|
|
214
|
+
### Dashboard won't open
|
|
215
|
+
- Check that plotly and dash are installed
|
|
216
|
+
- Try opening `outputs/dashboard.html` directly in a browser
|
|
217
|
+
|
|
218
|
+
## Support
|
|
219
|
+
|
|
220
|
+
For questions or issues, check this README or the GitHub repository.
|
|
221
|
+
|
|
222
|
+
---
|
|
223
|
+
|
|
224
|
+
Transform your data analysis workflow with one line of code.
|
|
@@ -0,0 +1,178 @@
|
|
|
1
|
+
# LazyAnalyst v1.0.0
|
|
2
|
+
|
|
3
|
+
**Automated data analysis library for Python**
|
|
4
|
+
|
|
5
|
+
LazyAnalyst is an end-to-end data analysis library that automates everything you'd do manually with Pandas and NumPy. Load a dataset, run one line of code, and get a complete analysis with insights, visualizations, statistical tests, and a professional HTML report.
|
|
6
|
+
|
|
7
|
+
## Quick Start
|
|
8
|
+
|
|
9
|
+
```python
|
|
10
|
+
import lazyanalyst as dp
|
|
11
|
+
|
|
12
|
+
# Analyze any CSV or Excel file
|
|
13
|
+
result = dp.analyze("sales_data.csv")
|
|
14
|
+
|
|
15
|
+
# Open the interactive dashboard
|
|
16
|
+
result.dashboard()
|
|
17
|
+
|
|
18
|
+
# Or view the professional report
|
|
19
|
+
result.report()
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
That's it! LazyAnalyst handles:
|
|
23
|
+
- Automated data loading and type detection
|
|
24
|
+
- Data quality auditing and reporting
|
|
25
|
+
- Intelligent data cleaning
|
|
26
|
+
- Exploratory data analysis
|
|
27
|
+
- Statistical testing (Pearson, Spearman, T-Test, ANOVA, Chi-Square)
|
|
28
|
+
- Feature engineering
|
|
29
|
+
- Interactive Plotly dashboard
|
|
30
|
+
- Professional HTML report generation
|
|
31
|
+
- Automated insights and interpretations
|
|
32
|
+
|
|
33
|
+
## Installation
|
|
34
|
+
|
|
35
|
+
Install via pip:
|
|
36
|
+
|
|
37
|
+
```bash
|
|
38
|
+
pip install lazyanalyst
|
|
39
|
+
```
|
|
40
|
+
|
|
41
|
+
**Requirements:** Python 3.8+
|
|
42
|
+
|
|
43
|
+
Or install from source:
|
|
44
|
+
|
|
45
|
+
```bash
|
|
46
|
+
git clone https://github.com/Tenali-Rama/lazyanalyst.git
|
|
47
|
+
cd lazyanalyst
|
|
48
|
+
pip install -e .
|
|
49
|
+
```
|
|
50
|
+
|
|
51
|
+
## Features
|
|
52
|
+
|
|
53
|
+
### 1. **Automated Pipeline**
|
|
54
|
+
No configuration needed. Just provide a CSV or Excel file and LazyAnalyst handles the rest.
|
|
55
|
+
|
|
56
|
+
### 2. **Data Quality Auditing**
|
|
57
|
+
Automatically detects:
|
|
58
|
+
- Missing values
|
|
59
|
+
- Duplicate rows
|
|
60
|
+
- Outliers
|
|
61
|
+
- Data type inconsistencies
|
|
62
|
+
- Quality score calculation
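The first three checks above can be sketched with the standard library alone. This is a hedged illustration, not the package's quality.py: the helper names are hypothetical, and the 1.5 x IQR outlier rule is an assumed choice, since the README does not say which outlier method is used.

```python
from collections import Counter
from statistics import quantiles


def count_missing(rows, columns):
    """Count empty cells (None or "") per column."""
    return {c: sum(1 for r in rows if r.get(c) in (None, "")) for c in columns}


def count_duplicates(rows, columns):
    """Count rows that exactly repeat an earlier row."""
    seen = Counter(tuple(r.get(c) for c in columns) for r in rows)
    return sum(n - 1 for n in seen.values())


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


columns = ["id", "amount", "city"]
rows = [
    {"id": 1, "amount": 10, "city": "Pune"},
    {"id": 2, "amount": None, "city": "Delhi"},
    {"id": 1, "amount": 10, "city": "Pune"},
    {"id": 3, "amount": 12, "city": ""},
]
print(count_missing(rows, columns))
print(count_duplicates(rows, columns))
print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))
```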
|
|
63
|
+
|
|
64
|
+
### 3. **Intelligent Cleaning**
|
|
65
|
+
- Auto-detects and fixes common issues
|
|
66
|
+
- Handles missing values intelligently
|
|
67
|
+
- Removes duplicates
|
|
68
|
+
- Converts data types automatically
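As a rough illustration of two of these cleaning steps, the sketch below removes exact duplicate rows and fills missing numeric cells with the column median. Median imputation is an assumption made for the example; the README does not state which strategy cleaner.py applies, and both function names are hypothetical.

```python
from statistics import median


def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping the first occurrence."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent, hashable row key
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def impute_median(rows, column):
    """Fill None cells in `column` with the median of the present values."""
    present = [r[column] for r in rows if r[column] is not None]
    fill = median(present)  # raises StatisticsError if the column is all None
    return [
        dict(r, **{column: fill if r[column] is None else r[column]})
        for r in rows
    ]


raw = [
    {"a": 1, "b": None},
    {"a": 1, "b": None},
    {"a": 2, "b": 4.0},
    {"a": 3, "b": 6.0},
]
cleaned = impute_median(drop_duplicates(raw), "b")
print(cleaned)
```

Deduplicating before imputing keeps repeated rows from skewing the median.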
|
|
69
|
+
|
|
70
|
+
### 4. **Exploratory Data Analysis (EDA)**
|
|
71
|
+
- Summary statistics (mean, median, std, min, max)
|
|
72
|
+
- Distribution analysis
|
|
73
|
+
- Correlation detection
|
|
74
|
+
- Categorical value counts
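The summary statistics and correlation named above are simple to compute by hand, which can help when checking a generated report. The sketch below is a minimal standard-library version, not the package's eda.py. `summarize` and `pearson` are hypothetical names, and the sample standard deviation is an assumed convention.

```python
from math import sqrt
from statistics import mean, median, stdev


def summarize(values):
    """Return the five summary statistics listed in the README."""
    return {
        "mean": mean(values),
        "median": median(values),
        "std": stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


units = [1, 2, 3, 4]
revenue = [2, 4, 6, 8]
print(summarize(units))
print(round(pearson(units, revenue), 6))
```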
|
|
75
|
+
|
|
76
|
+
### 5. **Statistical Testing**
|
|
77
|
+
Runs appropriate tests automatically:
|
|
78
|
+
- **Pearson/Spearman Correlation** for numerical relationships
|
|
79
|
+
- **Independent T-Test** for 2-group comparisons
|
|
80
|
+
- **ANOVA** for 3+ group comparisons
|
|
81
|
+
- **Chi-Square** for categorical relationships
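The pairing of tests with column types above amounts to a small decision rule, sketched here. The function `choose_test`, its labels, and its return values are hypothetical; the diff does not show how stats.py actually selects a test.

```python
def choose_test(kind_a, kind_b, n_groups=None):
    """Pick a test name from two column kinds, following the README's list.

    `n_groups` is the number of distinct categories and is only needed when
    one column is numerical and the other categorical.
    """
    kinds = {kind_a, kind_b}
    if kinds == {"numerical"}:
        return "correlation"  # Pearson or Spearman
    if kinds == {"categorical"}:
        return "chi-square"
    if kinds == {"numerical", "categorical"}:
        if n_groups is None or n_groups < 2:
            return None  # nothing to compare
        return "t-test" if n_groups == 2 else "anova"
    return None  # other kinds (e.g. datetime) are out of scope here


print(choose_test("numerical", "categorical", n_groups=3))
```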
|
|
82
|
+
|
|
83
|
+
### 6. **Feature Engineering**
|
|
84
|
+
- Polynomial features
|
|
85
|
+
- Interaction terms
|
|
86
|
+
- Scaled/normalized versions
|
|
87
|
+
- Log transforms for skewed data
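To make these transforms concrete, here is a small standard-library sketch of three of them. It is illustrative only: the helper names are hypothetical, `log1p` is an assumed variant of the log transform (chosen because it accepts zeros), and min-max is an assumed form of scaling.

```python
from math import log1p


def log_transform(values):
    """Apply log(1 + x); every value must be greater than -1."""
    return [log1p(v) for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]


def interaction(xs, ys):
    """Element-wise product of two numeric columns."""
    return [x * y for x, y in zip(xs, ys)]


print(min_max_scale([2, 4, 6]))
print(interaction([1, 2], [3, 4]))
```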
|
|
88
|
+
|
|
89
|
+
### 7. **Visualizations**
|
|
90
|
+
Generates:
|
|
91
|
+
- Distribution histograms
|
|
92
|
+
- Categorical bar charts
|
|
93
|
+
- Correlation heatmaps
|
|
94
|
+
- Scatter plots for relationships
|
|
95
|
+
|
|
96
|
+
### 8. **Interactive Dashboard**
|
|
97
|
+
Beautiful, self-contained HTML dashboard with all analyses and charts.
|
|
98
|
+
|
|
99
|
+
### 9. **Professional Report**
|
|
100
|
+
Print-ready HTML report with executive summary, findings, and visualizations.
|
|
101
|
+
|
|
102
|
+
## Example Usage
|
|
103
|
+
|
|
104
|
+
### Basic Analysis
|
|
105
|
+
```python
|
|
106
|
+
import lazyanalyst as dp
|
|
107
|
+
|
|
108
|
+
result = dp.analyze("data.csv")
|
|
109
|
+
result.dashboard() # Open interactive dashboard
|
|
110
|
+
result.report() # Open HTML report
|
|
111
|
+
```
|
|
112
|
+
|
|
113
|
+
### With Options
|
|
114
|
+
```python
|
|
115
|
+
result = dp.analyze("data.xlsx", dashboard=True, report=True)
|
|
116
|
+
|
|
117
|
+
# Access cleaned data
|
|
118
|
+
cleaned_df = result.cleaned_data()
|
|
119
|
+
```
|
|
120
|
+
|
|
121
|
+
### Supported File Types
|
|
122
|
+
- CSV (auto-detects encoding and delimiter)
|
|
123
|
+
- XLSX (Excel workbooks)
|
|
124
|
+
|
|
125
|
+
## Output Files
|
|
126
|
+
|
|
127
|
+
LazyAnalyst creates an `outputs/` folder with:
|
|
128
|
+
|
|
129
|
+
- `cleaned_data.csv` — Your cleaned dataset
|
|
130
|
+
- `report.html` — Professional report
|
|
131
|
+
- `dashboard.html` — Interactive dashboard
|
|
132
|
+
- `insights.txt` — Text summary of insights
|
|
133
|
+
- `plots/` — All generated visualizations
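After a run, a quick way to confirm that everything was written is to check for the entries listed above. The sketch below uses only `pathlib`. The file names come from the README, while `missing_outputs` is a hypothetical helper and the default `outputs` location relative to the working directory is an assumption.

```python
from pathlib import Path

# Names taken from the README's "Output Files" list.
EXPECTED = ["cleaned_data.csv", "report.html", "dashboard.html", "insights.txt", "plots"]


def missing_outputs(folder="outputs"):
    """Return the expected output names that do not exist under `folder`."""
    base = Path(folder)
    return [name for name in EXPECTED if not (base / name).exists()]


print(missing_outputs())
```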
|
|
134
|
+
|
|
135
|
+
## Architecture
|
|
136
|
+
|
|
137
|
+
LazyAnalyst consists of 11 integrated modules:
|
|
138
|
+
|
|
139
|
+
1. **loader.py** — File loading with auto type inference
|
|
140
|
+
2. **schema.py** — Column type detection
|
|
141
|
+
3. **quality.py** — Data quality auditing
|
|
142
|
+
4. **cleaner.py** — Automated data cleaning
|
|
143
|
+
5. **eda.py** — Exploratory data analysis
|
|
144
|
+
6. **visualizer.py** — Chart generation
|
|
145
|
+
7. **features.py** — Feature engineering
|
|
146
|
+
8. **stats.py** — Statistical testing
|
|
147
|
+
9. **insights.py** — Natural language insights
|
|
148
|
+
10. **dashboard.py** — Interactive dashboard generation
|
|
149
|
+
11. **reporter.py** — HTML report generation
|
|
150
|
+
|
|
151
|
+
## Documentation
|
|
152
|
+
|
|
153
|
+
Full documentation is available in the GitHub repository.
|
|
154
|
+
|
|
155
|
+
## License
|
|
156
|
+
|
|
157
|
+
MIT License - See LICENSE file for details
|
|
158
|
+
|
|
159
|
+
## Troubleshooting
|
|
160
|
+
|
|
161
|
+
### "FileNotFoundError"
|
|
162
|
+
- Check that the file path is correct
|
|
163
|
+
- Use an absolute path if the relative path doesn't work
|
|
164
|
+
|
|
165
|
+
### "ValueError: unsupported file type"
|
|
166
|
+
- Ensure the file extension is `.csv` or `.xlsx`
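Both problems above can be caught before calling the library with a small pre-flight check. This is a sketch under assumptions: `check_input` is a hypothetical helper, and the exception types mirror the headings in this section. Whether lazyanalyst raises them with these exact messages is not confirmed by this diff.

```python
from pathlib import Path

SUPPORTED = {".csv", ".xlsx"}  # formats listed in the README


def check_input(path):
    """Resolve `path` and verify it exists and has a supported extension."""
    p = Path(path).expanduser().resolve()
    if not p.is_file():
        raise FileNotFoundError(f"No such file: {p}")
    if p.suffix.lower() not in SUPPORTED:
        raise ValueError(f"unsupported file type: {p.suffix or '(none)'}")
    return p


# Example (hypothetical file name):
# data_path = check_input("sales_data.csv")
```

Resolving to an absolute path first also makes any error message show exactly which location was checked.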
|
|
167
|
+
|
|
168
|
+
### Dashboard won't open
|
|
169
|
+
- Check that plotly and dash are installed
|
|
170
|
+
- Try opening `outputs/dashboard.html` directly in a browser
|
|
171
|
+
|
|
172
|
+
## Support
|
|
173
|
+
|
|
174
|
+
For questions or issues, check this README or the GitHub repository.
|
|
175
|
+
|
|
176
|
+
---
|
|
177
|
+
|
|
178
|
+
Transform your data analysis workflow with one line of code.
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
from .analyze import analyze, LazyAnalystResult
|