lazyanalyst 1.0.0__tar.gz

@@ -0,0 +1,179 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [1.0.0] - 2024-06-01
9
+
10
+ ### Added
11
+ - Initial release of LazyAnalyst
12
+ - Full automated data analysis pipeline
13
+ - CSV and XLSX file support
14
+ - 11 integrated analysis modules
15
+ - Interactive HTML dashboard with Plotly
16
+ - Professional HTML report generation
17
+ - Automated statistical testing framework
18
+ - Data quality auditing and reporting
19
+ - Feature engineering capabilities
20
+ - Exploratory data analysis (EDA)
21
+ - Natural language insights generation
22
+ - Data cleaning and preprocessing automation
23
+
24
+ ### Modules Included
25
+ 1. **loader.py** — File loading and dtype inference
26
+ - Supports CSV (auto-detects encoding and delimiter)
27
+ - Supports XLSX (Excel workbooks)
28
+ - Automatic data type conversion
29
+
30
+ 2. **schema.py** — Column type detection
31
+ - Detects: numerical, categorical, datetime, boolean, identifier
32
+ - Priority-based classification
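Priority-based classification can be pictured as a chain of checks where the first match wins. The sketch below is an illustration only: `classify_column`, the token set, and the order of the checks are assumptions, not the contents of `schema.py`, which this diff does not include.

```python
from datetime import datetime

# Hypothetical token set; the real module may recognise other spellings.
BOOLEAN_TOKENS = {"true", "false", "yes", "no", "0", "1"}


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def _is_iso_date(text):
    try:
        datetime.fromisoformat(text)
        return True
    except ValueError:
        return False


def classify_column(values):
    """Return the first matching type, checked in a fixed (assumed) priority order."""
    present = [v.strip() for v in values if v is not None and v.strip()]
    if not present:
        return "categorical"
    if all(v.lower() in BOOLEAN_TOKENS for v in present):
        return "boolean"
    if all(_is_number(v) for v in present):
        return "numerical"
    if all(_is_iso_date(v) for v in present):
        return "datetime"
    if len(set(present)) == len(present):
        # Every value is distinct, so treat the column as an identifier.
        return "identifier"
    return "categorical"
```

Order matters here: a column of `0`/`1` values satisfies both the boolean and the numerical check, and the earlier check decides the result.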
33
+
34
+ 3. **quality.py** — Data quality auditing
35
+ - Missing value analysis
36
+ - Duplicate detection
37
+ - Outlier flagging
38
+ - Quality score calculation
39
+
40
+ 4. **cleaner.py** — Automated data cleaning
41
+ - Missing value imputation
42
+ - Duplicate removal
43
+ - Outlier handling
44
+ - Type conversion
45
+
46
+ 5. **eda.py** — Exploratory data analysis
47
+ - Descriptive statistics
48
+ - Distribution analysis
49
+ - Correlation analysis
50
+
51
+ 6. **visualizer.py** — Chart generation
52
+ - Histograms and distributions
53
+ - Bar charts for categories
54
+ - Scatter plots
55
+ - Correlation heatmaps
56
+ - PNG output with matplotlib/seaborn
57
+
58
+ 7. **features.py** — Feature engineering
59
+ - Polynomial features
60
+ - Interaction terms
61
+ - Normalization/scaling
62
+ - Log transforms
63
+
64
+ 8. **stats.py** — Statistical testing
65
+ - Pearson/Spearman correlation
66
+ - Independent T-tests
67
+ - ANOVA
68
+ - Chi-Square tests
69
+ - P-value calculation and interpretation
70
+
71
+ 9. **insights.py** — Automated insights
72
+ - Correlation insights
73
+ - Group difference insights
74
+ - Data quality insights
75
+ - Distribution insights
76
+ - Natural language output
77
+
78
+ 10. **dashboard.py** — Interactive dashboard
79
+ - Plotly-based visualizations
80
+ - Dark theme design
81
+ - 6 main sections (Overview, Quality, Distributions, Correlations, Tests, Insights)
82
+ - Self-contained HTML file
83
+ - No server required
84
+
85
+ 11. **reporter.py** — HTML report generation
86
+ - Executive summary
87
+ - Dataset overview
88
+ - Data quality metrics
89
+ - EDA statistics
90
+ - Visualization embeddings
91
+ - Statistical results
92
+ - Actionable insights
93
+ - Professional styling with print support
94
+
95
+ ### Features
96
+ - One-line API: `dp.analyze("file.csv")`
97
+ - Automatic error handling and resilience
98
+ - Progress reporting throughout pipeline
99
+ - Comprehensive output documentation
100
+ - Browser-based results viewing
101
+ - No manual configuration required
102
+
103
+ ### Documentation
104
+ - Comprehensive README with quick start guide
105
+ - Installation instructions via pip
106
+ - Full API reference
107
+ - Example usage patterns
108
+ - Architecture documentation
109
+ - Contributing guidelines
110
+
111
+ ### License
112
+ - MIT License for open distribution
113
+ - Free for personal and commercial use
114
+
115
+ ---
116
+
117
+ ## Release Notes
118
+
119
+ ### v1.0.0 Highlights
120
+ - One-liner analysis: `dp.analyze("data.csv")`
121
+ - 11 fully integrated modules
122
+ - Professional HTML outputs (dashboard + report)
123
+ - Automatic statistical testing
124
+ - Natural language insights
125
+ - Beautiful dark theme interface
126
+ - Fast processing on moderately sized datasets
127
+ - Robust error handling
128
+
129
+ ### What's Coming in v1.1
130
+ - Time series analysis module
131
+ - Anomaly detection
132
+ - Clustering analysis
133
+ - Advanced feature selection
134
+ - Custom analysis templates
135
+ - Export to PDF
136
+ - Interactive Jupyter notebook support
137
+
138
+ ### Known Limitations
139
+ - Dashboard rendering requires a modern browser
140
+ - Large files (>1 GB) may require increased memory
141
+ - Some advanced statistical tests are not included
142
+ - ML models use only sklearn basics
143
+
144
+ ---
145
+
146
+ ## Installation & Usage
147
+
148
+ ### Install
149
+ ```bash
150
+ pip install lazyanalyst
151
+ ```
152
+
153
+ ### Basic Usage
154
+ ```python
155
+ import lazyanalyst as dp
156
+
157
+ result = dp.analyze("sales.csv")
158
+ result.dashboard() # View interactive dashboard
159
+ result.report() # View HTML report
160
+ ```
161
+
162
+ ### Supported Formats
163
+ - CSV (comma-, semicolon-, or tab-separated)
164
+ - XLSX (Excel workbooks)
165
+
166
+ ---
167
+
168
+ ## Changelog Format
169
+
170
+ - **Added** for new features
171
+ - **Changed** for changes in existing functionality
172
+ - **Deprecated** for soon-to-be removed features
173
+ - **Removed** for now removed features
174
+ - **Fixed** for any bug fixes
175
+ - **Security** for security fix announcements
176
+
177
+ ---
178
+
179
+ **For more information, visit the GitHub repository or PyPI page.**
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2024 LazyAnalyst Contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,5 @@
1
+ include README.md
2
+ include LICENSE
3
+ include CHANGELOG.md
4
+ recursive-include lazyanalyst *.py
5
+
@@ -0,0 +1,224 @@
1
+ Metadata-Version: 2.4
2
+ Name: lazyanalyst
3
+ Version: 1.0.0
4
+ Summary: Automated data analysis library with end-to-end analysis pipeline
5
+ Home-page: https://github.com/Tenali-Rama/lazyanalyst
6
+ Author: Tenali-Rama
7
+ Author-email: Tenali-Rama <tenalirama.krishna125@gmail.com>
8
+ License: MIT
9
+ Project-URL: Homepage, https://github.com/Tenali-Rama/lazyanalyst
10
+ Project-URL: Documentation, https://github.com/Tenali-Rama/lazyanalyst#readme
11
+ Project-URL: Repository, https://github.com/Tenali-Rama/lazyanalyst.git
12
+ Keywords: data-analysis,automation,pandas,visualization,statistics
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Programming Language :: Python :: 3.8
15
+ Classifier: Programming Language :: Python :: 3.9
16
+ Classifier: Programming Language :: Python :: 3.10
17
+ Classifier: Programming Language :: Python :: 3.11
18
+ Classifier: License :: OSI Approved :: MIT License
19
+ Classifier: Operating System :: OS Independent
20
+ Classifier: Development Status :: 4 - Beta
21
+ Classifier: Intended Audience :: Developers
22
+ Classifier: Intended Audience :: Science/Research
23
+ Classifier: Topic :: Scientific/Engineering
24
+ Requires-Python: >=3.8
25
+ Description-Content-Type: text/markdown
26
+ License-File: LICENSE
27
+ Requires-Dist: pandas>=1.0.0
28
+ Requires-Dist: numpy>=1.19.0
29
+ Requires-Dist: scipy>=1.5.0
30
+ Requires-Dist: matplotlib>=3.3.0
31
+ Requires-Dist: seaborn>=0.11.0
32
+ Requires-Dist: plotly>=5.0.0
33
+ Requires-Dist: dash>=2.0.0
34
+ Requires-Dist: openpyxl>=3.0.0
35
+ Provides-Extra: dev
36
+ Requires-Dist: pytest>=6.0; extra == "dev"
37
+ Requires-Dist: black>=21.0; extra == "dev"
38
+ Requires-Dist: flake8>=3.9; extra == "dev"
39
+ Requires-Dist: mypy>=0.910; extra == "dev"
40
+ Requires-Dist: build>=0.7; extra == "dev"
41
+ Requires-Dist: twine>=3.4; extra == "dev"
42
+ Dynamic: author
43
+ Dynamic: home-page
44
+ Dynamic: license-file
45
+ Dynamic: requires-python
46
+
47
+ # LazyAnalyst v1.0.0
48
+
49
+ **Automated data analysis library for Python**
50
+
51
+ LazyAnalyst is an end-to-end data analysis library that automates the routine steps you'd otherwise do manually with pandas and NumPy. Load a dataset, run one line of code, and get a complete analysis with insights, visualizations, statistical tests, and a professional HTML report.
52
+
53
+ ## Quick Start
54
+
55
+ ```python
56
+ import lazyanalyst as dp
57
+
58
+ # Analyze any CSV or Excel file
59
+ result = dp.analyze("sales_data.csv")
60
+
61
+ # Open the interactive dashboard
62
+ result.dashboard()
63
+
64
+ # Or view the professional report
65
+ result.report()
66
+ ```
67
+
68
+ That's it! LazyAnalyst handles:
69
+ - Automated data loading and type detection
70
+ - Data quality auditing and reporting
71
+ - Intelligent data cleaning
72
+ - Exploratory data analysis
73
+ - Statistical testing (Pearson, Spearman, t-test, ANOVA, Chi-Square)
74
+ - Feature engineering
75
+ - Interactive Plotly dashboard
76
+ - Professional HTML report generation
77
+ - Automated insights and interpretations
78
+
79
+ ## Installation
80
+
81
+ Install via pip:
82
+
83
+ ```bash
84
+ pip install lazyanalyst
85
+ ```
86
+
87
+ **Requirements:** Python 3.8+
88
+
89
+ Or install from source:
90
+
91
+ ```bash
92
+ git clone https://github.com/Tenali-Rama/lazyanalyst.git
93
+ cd lazyanalyst
94
+ pip install -e .
95
+ ```
96
+
97
+ ## Features
98
+
99
+ ### 1. **Automated Pipeline**
100
+ No configuration needed. Just provide a CSV or Excel file and LazyAnalyst handles the rest.
101
+
102
+ ### 2. **Data Quality Auditing**
103
+ Automatically audits the dataset and reports:
104
+ - Missing values
105
+ - Duplicate rows
106
+ - Outliers
107
+ - Data type inconsistencies
108
+ - Quality score calculation
109
+
110
+ ### 3. **Intelligent Cleaning**
111
+ - Auto-detects and fixes common issues
112
+ - Handles missing values intelligently
113
+ - Removes duplicates
114
+ - Converts data types automatically
115
+
116
+ ### 4. **Exploratory Data Analysis (EDA)**
117
+ - Summary statistics (mean, median, std, min, max)
118
+ - Distribution analysis
119
+ - Correlation detection
120
+ - Categorical value counts
121
+
122
+ ### 5. **Statistical Testing**
123
+ Runs appropriate tests automatically:
124
+ - **Pearson/Spearman Correlation** for numerical relationships
125
+ - **Independent T-Test** for 2-group comparisons
126
+ - **ANOVA** for 3+ group comparisons
127
+ - **Chi-Square** for categorical relationships
128
+
129
+ ### 6. **Feature Engineering**
130
+ - Polynomial features
131
+ - Interaction terms
132
+ - Scaled/normalized versions
133
+ - Log transforms for skewed data
134
+
135
+ ### 7. **Visualizations**
136
+ Generates:
137
+ - Distribution histograms
138
+ - Categorical bar charts
139
+ - Correlation heatmaps
140
+ - Scatter plots for relationships
141
+
142
+ ### 8. **Interactive Dashboard**
143
+ Beautiful, self-contained HTML dashboard with all analyses and charts.
144
+
145
+ ### 9. **Professional Report**
146
+ Print-friendly HTML report with executive summary, findings, and visualizations.
147
+
148
+ ## Example Usage
149
+
150
+ ### Basic Analysis
151
+ ```python
152
+ import lazyanalyst as dp
153
+
154
+ result = dp.analyze("data.csv")
155
+ result.dashboard() # Open interactive dashboard
156
+ result.report() # Open HTML report
157
+ ```
158
+
159
+ ### With Options
160
+ ```python
161
+ result = dp.analyze("data.xlsx", dashboard=True, report=True)
162
+
163
+ # Access cleaned data
164
+ cleaned_df = result.cleaned_data()
165
+ ```
166
+
167
+ ### Supported File Types
168
+ - CSV (auto-detects encoding and delimiter)
169
+ - XLSX (Excel workbooks)
170
+
171
+ ## Output Files
172
+
173
+ LazyAnalyst creates an `outputs/` folder with:
174
+
175
+ - `cleaned_data.csv` — Your cleaned dataset
176
+ - `report.html` — Professional report
177
+ - `dashboard.html` — Interactive dashboard
178
+ - `insights.txt` — Text summary of insights
179
+ - `plots/` — All generated visualizations
180
+
181
+ ## Architecture
182
+
183
+ LazyAnalyst consists of 11 integrated modules:
184
+
185
+ 1. **loader.py** — File loading with auto type inference
186
+ 2. **schema.py** — Column type detection
187
+ 3. **quality.py** — Data quality auditing
188
+ 4. **cleaner.py** — Automated data cleaning
189
+ 5. **eda.py** — Exploratory data analysis
190
+ 6. **visualizer.py** — Chart generation
191
+ 7. **features.py** — Feature engineering
192
+ 8. **stats.py** — Statistical testing
193
+ 9. **insights.py** — Natural language insights
194
+ 10. **dashboard.py** — Interactive dashboard generation
195
+ 11. **reporter.py** — HTML report generation
196
+
197
+ ## Documentation
198
+
199
+ Full documentation available in the GitHub repository.
200
+
201
+ ## License
202
+
203
+ MIT License - See LICENSE file for details
204
+
205
+ ## Troubleshooting
206
+
207
+ ### "FileNotFoundError"
208
+ - Check that the file path is correct
209
+ - Use an absolute path if the relative path doesn't work
210
+
211
+ ### "ValueError: unsupported file type"
212
+ - Ensure the file has a `.csv` or `.xlsx` extension
213
+
214
+ ### Dashboard won't open
215
+ - Check that plotly and dash are installed
216
+ - Try opening `dashboard.html` directly in a browser
217
+
218
+ ## Support
219
+
220
+ For questions or issues, check this README or the GitHub repository.
221
+
222
+ ---
223
+
224
+ Transform your data analysis workflow with one line of code.
@@ -0,0 +1,178 @@
1
+ # LazyAnalyst v1.0.0
2
+
3
+ **Automated data analysis library for Python**
4
+
5
+ LazyAnalyst is an end-to-end data analysis library that automates the routine steps you'd otherwise do manually with pandas and NumPy. Load a dataset, run one line of code, and get a complete analysis with insights, visualizations, statistical tests, and a professional HTML report.
6
+
7
+ ## Quick Start
8
+
9
+ ```python
10
+ import lazyanalyst as dp
11
+
12
+ # Analyze any CSV or Excel file
13
+ result = dp.analyze("sales_data.csv")
14
+
15
+ # Open the interactive dashboard
16
+ result.dashboard()
17
+
18
+ # Or view the professional report
19
+ result.report()
20
+ ```
21
+
22
+ That's it! LazyAnalyst handles:
23
+ - Automated data loading and type detection
24
+ - Data quality auditing and reporting
25
+ - Intelligent data cleaning
26
+ - Exploratory data analysis
27
+ - Statistical testing (Pearson, Spearman, t-test, ANOVA, Chi-Square)
28
+ - Feature engineering
29
+ - Interactive Plotly dashboard
30
+ - Professional HTML report generation
31
+ - Automated insights and interpretations
32
+
33
+ ## Installation
34
+
35
+ Install via pip:
36
+
37
+ ```bash
38
+ pip install lazyanalyst
39
+ ```
40
+
41
+ **Requirements:** Python 3.8+
42
+
43
+ Or install from source:
44
+
45
+ ```bash
46
+ git clone https://github.com/Tenali-Rama/lazyanalyst.git
47
+ cd lazyanalyst
48
+ pip install -e .
49
+ ```
50
+
51
+ ## Features
52
+
53
+ ### 1. **Automated Pipeline**
54
+ No configuration needed. Just provide a CSV or Excel file and LazyAnalyst handles the rest.
55
+
56
+ ### 2. **Data Quality Auditing**
57
+ Automatically audits the dataset and reports:
58
+ - Missing values
59
+ - Duplicate rows
60
+ - Outliers
61
+ - Data type inconsistencies
62
+ - Quality score calculation
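To make the audit items above concrete, here is a minimal standard-library sketch of the three countable checks. The function name `audit`, the row-of-dicts input, and the 1.5 × IQR outlier rule are assumptions chosen for illustration; the diff does not show how `quality.py` flags outliers or computes its quality score, so no score formula is attempted.

```python
import statistics


def audit(rows, column):
    """Count missing cells, repeated rows, and IQR outliers in one numeric column."""
    # A cell counts as missing when its value is None.
    missing = sum(value is None for row in rows for value in row.values())

    # A row is a duplicate when an identical row appeared earlier.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items(), key=lambda item: item[0]))
        if key in seen:
            duplicates += 1
        seen.add(key)

    # Flag values further than 1.5 * IQR outside the quartiles (assumed rule).
    values = [row[column] for row in rows if row[column] is not None]
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = 1.5 * (q3 - q1)
    outliers = [v for v in values if v < q1 - spread or v > q3 + spread]
    return {"missing": missing, "duplicates": duplicates, "outliers": outliers}


rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 12},
    {"id": 3, "amount": None},
    {"id": 4, "amount": 11},
    {"id": 4, "amount": 11},
    {"id": 5, "amount": 13},
    {"id": 6, "amount": 12},
    {"id": 7, "amount": 95},
]
summary = audit(rows, "amount")
```

With pandas, which the package metadata lists as a dependency, the same counts would usually come from `DataFrame.isna()` and `DataFrame.duplicated()`.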
63
+
64
+ ### 3. **Intelligent Cleaning**
65
+ - Auto-detects and fixes common issues
66
+ - Handles missing values intelligently
67
+ - Removes duplicates
68
+ - Converts data types automatically
69
+
70
+ ### 4. **Exploratory Data Analysis (EDA)**
71
+ - Summary statistics (mean, median, std, min, max)
72
+ - Distribution analysis
73
+ - Correlation detection
74
+ - Categorical value counts
75
+
76
+ ### 5. **Statistical Testing**
77
+ Runs appropriate tests automatically:
78
+ - **Pearson/Spearman Correlation** for numerical relationships
79
+ - **Independent T-Test** for 2-group comparisons
80
+ - **ANOVA** for 3+ group comparisons
81
+ - **Chi-Square** for categorical relationships
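The test-selection rules listed above amount to a small dispatch on column kinds and group count. The sketch below is hypothetical: `choose_test` is not part of the package's documented API, and it only returns the name of a `scipy.stats` function that fits each case (`pearsonr`, `ttest_ind`, `f_oneway`, `chi2_contingency`) rather than running it.

```python
def choose_test(kind_a, kind_b, n_groups=None):
    """Map a pair of column kinds to the name of a suitable scipy.stats function."""
    kinds = {kind_a, kind_b}
    if kinds == {"numerical"}:
        # Two numerical columns: correlation.
        return "pearsonr"
    if kinds == {"categorical"}:
        # Two categorical columns: test of independence on the contingency table.
        return "chi2_contingency"
    if kinds == {"numerical", "categorical"}:
        if n_groups is None or n_groups < 2:
            raise ValueError("need at least two groups to compare")
        # Two groups: independent t-test; three or more: one-way ANOVA.
        return "ttest_ind" if n_groups == 2 else "f_oneway"
    raise ValueError(f"no test defined for {kind_a!r} and {kind_b!r}")
```

Spearman correlation (`scipy.stats.spearmanr`) would be the rank-based alternative in the first branch; when the package prefers one over the other is not stated in the README.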
82
+
83
+ ### 6. **Feature Engineering**
84
+ - Polynomial features
85
+ - Interaction terms
86
+ - Scaled/normalized versions
87
+ - Log transforms for skewed data
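As an illustration of "log transforms for skewed data", the following sketch applies `log1p` only when a column is non-negative and its skewness exceeds a cutoff. The helper names and the cutoff of 1.0 are assumptions; the README does not say how `features.py` decides that a column is skewed.

```python
import math


def skewness(values):
    """Population skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def log_if_skewed(values, threshold=1.0):
    """Apply log1p when the column is non-negative and right-skewed past the threshold."""
    if min(values) >= 0 and skewness(values) > threshold:
        return [math.log1p(v) for v in values]
    return list(values)


skewed = [1, 2, 2, 3, 3, 4, 50]
transformed = log_if_skewed(skewed)
```

`log1p` is used instead of `log` so that zero values stay defined.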
88
+
89
+ ### 7. **Visualizations**
90
+ Generates:
91
+ - Distribution histograms
92
+ - Categorical bar charts
93
+ - Correlation heatmaps
94
+ - Scatter plots for relationships
95
+
96
+ ### 8. **Interactive Dashboard**
97
+ Beautiful, self-contained HTML dashboard with all analyses and charts.
98
+
99
+ ### 9. **Professional Report**
100
+ Print-friendly HTML report with executive summary, findings, and visualizations.
101
+
102
+ ## Example Usage
103
+
104
+ ### Basic Analysis
105
+ ```python
106
+ import lazyanalyst as dp
107
+
108
+ result = dp.analyze("data.csv")
109
+ result.dashboard() # Open interactive dashboard
110
+ result.report() # Open HTML report
111
+ ```
112
+
113
+ ### With Options
114
+ ```python
115
+ result = dp.analyze("data.xlsx", dashboard=True, report=True)
116
+
117
+ # Access cleaned data
118
+ cleaned_df = result.cleaned_data()
119
+ ```
120
+
121
+ ### Supported File Types
122
+ - CSV (auto-detects encoding and delimiter)
123
+ - XLSX (Excel workbooks)
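Delimiter auto-detection for CSV input can be sketched with the standard library's `csv.Sniffer`. This is one plausible approach, not necessarily what `loader.py` does; `sniff_delimiter` and the comma fallback are assumptions, and the candidate set follows the comma, semicolon, and tab delimiters named in the changelog.

```python
import csv
import io


def sniff_delimiter(sample, candidates=",;\t"):
    """Guess the delimiter of a CSV text sample, falling back to a comma."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer could not decide; assume the most common delimiter.
        return ","


sample = "region;units;price\nnorth;12;3.5\nsouth;7;4.0\n"
delimiter = sniff_delimiter(sample)
rows = list(csv.reader(io.StringIO(sample), delimiter=delimiter))
```

In practice the sample would be the first few kilobytes of the file rather than a literal string.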
124
+
125
+ ## Output Files
126
+
127
+ LazyAnalyst creates an `outputs/` folder with:
128
+
129
+ - `cleaned_data.csv` — Your cleaned dataset
130
+ - `report.html` — Professional report
131
+ - `dashboard.html` — Interactive dashboard
132
+ - `insights.txt` — Text summary of insights
133
+ - `plots/` — All generated visualizations
134
+
135
+ ## Architecture
136
+
137
+ LazyAnalyst consists of 11 integrated modules:
138
+
139
+ 1. **loader.py** — File loading with auto type inference
140
+ 2. **schema.py** — Column type detection
141
+ 3. **quality.py** — Data quality auditing
142
+ 4. **cleaner.py** — Automated data cleaning
143
+ 5. **eda.py** — Exploratory data analysis
144
+ 6. **visualizer.py** — Chart generation
145
+ 7. **features.py** — Feature engineering
146
+ 8. **stats.py** — Statistical testing
147
+ 9. **insights.py** — Natural language insights
148
+ 10. **dashboard.py** — Interactive dashboard generation
149
+ 11. **reporter.py** — HTML report generation
150
+
151
+ ## Documentation
152
+
153
+ Full documentation available in the GitHub repository.
154
+
155
+ ## License
156
+
157
+ MIT License - See LICENSE file for details
158
+
159
+ ## Troubleshooting
160
+
161
+ ### "FileNotFoundError"
162
+ - Check that the file path is correct
163
+ - Use an absolute path if the relative path doesn't work
164
+
165
+ ### "ValueError: unsupported file type"
166
+ - Ensure the file has a `.csv` or `.xlsx` extension
167
+
168
+ ### Dashboard won't open
169
+ - Check that plotly and dash are installed
170
+ - Try opening `dashboard.html` directly in a browser
171
+
172
+ ## Support
173
+
174
+ For questions or issues, check this README or the GitHub repository.
175
+
176
+ ---
177
+
178
+ Transform your data analysis workflow with one line of code.
@@ -0,0 +1 @@
1
+ from .analyze import analyze, LazyAnalystResult