kaizenstat 0.2.0__tar.gz → 0.2.2__tar.gz
- kaizenstat-0.2.2/PKG-INFO +172 -0
- kaizenstat-0.2.2/README.md +142 -0
- kaizenstat-0.2.2/kaizenstat/__init__.py +6 -0
- {kaizenstat-0.2.0 → kaizenstat-0.2.2}/kaizenstat/core.py +5 -0
- kaizenstat-0.2.2/kaizenstat.egg-info/PKG-INFO +172 -0
- {kaizenstat-0.2.0 → kaizenstat-0.2.2}/setup.py +1 -1
- kaizenstat-0.2.0/PKG-INFO +0 -176
- kaizenstat-0.2.0/README.md +0 -146
- kaizenstat-0.2.0/kaizenstat/__init__.py +0 -3
- kaizenstat-0.2.0/kaizenstat.egg-info/PKG-INFO +0 -176
- {kaizenstat-0.2.0 → kaizenstat-0.2.2}/kaizenstat/cli.py +0 -0
- {kaizenstat-0.2.0 → kaizenstat-0.2.2}/kaizenstat.egg-info/SOURCES.txt +0 -0
- {kaizenstat-0.2.0 → kaizenstat-0.2.2}/kaizenstat.egg-info/dependency_links.txt +0 -0
- {kaizenstat-0.2.0 → kaizenstat-0.2.2}/kaizenstat.egg-info/entry_points.txt +0 -0
- {kaizenstat-0.2.0 → kaizenstat-0.2.2}/kaizenstat.egg-info/requires.txt +0 -0
- {kaizenstat-0.2.0 → kaizenstat-0.2.2}/kaizenstat.egg-info/top_level.txt +0 -0
- {kaizenstat-0.2.0 → kaizenstat-0.2.2}/pyproject.toml +0 -0
- {kaizenstat-0.2.0 → kaizenstat-0.2.2}/setup.cfg +0 -0
@@ -0,0 +1,172 @@
+Metadata-Version: 2.4
+Name: kaizenstat
+Version: 0.2.2
+Summary: Zero-friction AutoML + Data Cleaning Toolkit
+Author: Masuddar Rahman
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+Requires-Dist: pandas
+Requires-Dist: numpy
+Requires-Dist: scikit-learn
+Requires-Dist: rich
+Requires-Dist: joblib
+Provides-Extra: ui
+Requires-Dist: streamlit; extra == "ui"
+Provides-Extra: gpu
+Requires-Dist: xgboost; extra == "gpu"
+Provides-Extra: fast
+Requires-Dist: polars; extra == "fast"
+Provides-Extra: all
+Requires-Dist: streamlit; extra == "all"
+Requires-Dist: xgboost; extra == "all"
+Requires-Dist: polars; extra == "all"
+Dynamic: author
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: provides-extra
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+
+# 🚀 KaizenStat
+
+[](https://pypi.org/project/kaizenstat/)
+[](https://opensource.org/licenses/MIT)
+[](https://www.python.org/downloads/)
+[](https://github.com/psf/black)
+
+**KaizenStat** is a zero-friction, production-grade AutoML, automated data cleaning, and model explanation engine. It allows you to audit datasets, repair data issues, benchmark models with hardware-aware optimization, export standalone pipeline code, and host web-based dashboards—all with a single command or Python import.
+
+---
+
+## 🎯 Core Philosophy
+
+* **Zero-Friction AutoML:** No complex configuration files. Pass your dataset, name your target, and KaizenStat does the rest.
+* **Production Crash-Proofing:** Automatically handles messy real-world data issues: high-cardinality ID columns, datetime parsing, missing inputs, class imbalance, and label encoding.
+* **Explainable AI:** Breaks open the "black box" by generating standalone, human-readable Python training code reproducing the best-found pipeline.
+* **Hybrid Interface:** 100% parity between CLI and Python API.
+
+---
+
+## 📦 Installation
+
+Install the core package with zero heavy external dependencies:
+
+```bash
+pip install kaizenstat
+```
+
+### Optional Drivers & Accelerators
+
+Tailor KaizenStat to your specific workload:
+
+```bash
+pip install kaizenstat[ui] # Install Streamlit for web dashboards
+pip install kaizenstat[gpu] # Install XGBoost with GPU/MPS support
+pip install kaizenstat[fast] # Install Polars for ultra-fast CSV parsing
+pip install kaizenstat[all] # Install all optional components
+```
+
+---
+
+## ⚔️ CLI & Python API Feature Matrix
+
+KaizenStat is designed around a single unified vocabulary. Every CLI command has a direct, equivalent function in the Python SDK.
+
+| Command | Python API | Purpose |
+| :--- | :--- | :--- |
+| `kz audit` | `KaizenStat.audit()` | 🔍 Runs a diagnostic sweep (missing values, duplicates, imbalance, dead features). |
+| `kz heal` | `KaizenStat.heal()` | 🩹 Clean, impute, parse datetimes, drop IDs, and encode string labels. |
+| `kz benchmark` | `KaizenStat.benchmark()` | 🚀 Automatically trains, optimizes, and ranks model pipelines. |
+| `kz auto` | `KaizenStat.auto()` | ⚡ Orchestrates the entire pipeline in sequence (Audit ➔ Heal ➔ Benchmark). |
+| `kz explain` | `KaizenStat.explain()` | 💬 Generates plain-English diagnostic summaries and model recommendations. |
+| `kz codegen` | `KaizenStat.codegen()` | 📝 Generates standalone, dependency-free Python code for the best model. |
+| `kz export-model` | `KaizenStat.save_model()` | 💾 Trains the top pipeline and saves it directly to a `.joblib` binary. |
+| `kz report` | `KaizenStat.report()` | 📊 Generates a beautiful, interactive HTML profiling report with Chart.js. |
+| `kz serve` | `KaizenStat.serve()` | 🌐 Launches a local web dashboard to explore the data and run predictions. |
+
+---
+
+## 💡 Quick Start Guide
+
+### 1. Python SDK Usage
+
+```python
+from kaizenstat import KaizenStat
+import pandas as pd
+
+# Load your dataset
+df = pd.read_csv("dataset.csv")
+
+# 1. Diagnose issues
+findings = KaizenStat.audit(df, target="target_column")
+
+# 2. Automatically heal dirty data
+clean_df = KaizenStat.heal(df, target="target_column")
+
+# 3. Benchmark models with cross-validation
+leaderboard = KaizenStat.benchmark(clean_df, target="target_column")
+
+# 4. Generate standalone code for reproduction
+KaizenStat.codegen("dataset.csv", target="target_column", output_path="reproduce.py")
+```
+
+### 2. Command Line Interface (CLI)
+
+```bash
+# Get quick help and list commands
+kz --help
+
+# Run the full pipeline in one command
+kz auto dataset.csv --target target_column
+
+# Repair a dataset and save the clean file
+kz heal dataset.csv --target target_column -o cleaned_dataset.csv
+
+# Launch a local Streamlit app to preview and test model performance
+kz serve dataset.csv --target target_column --port 8501
+```
+
+---
+
+## 🧠 Behind the Scenes: Core Engines
+
+### 1. Hardware-Aware Execution
+KaizenStat automatically checks your environment using `detect_device()`. It leverages CUDA on Nvidia GPUs and MPS on Apple Silicon (M1/M2/M3 Mac) to accelerate training when optional dependencies (like `xgboost`) are installed.
+
+### 2. Smart Model Selection
+The benchmarking engine adjusts its logic dynamically based on the dataset properties:
+* **Large Datasets (>100k rows):** Excludes slow estimators (like Gradient Boosting) on standard CPU hosts to prevent compute lockups.
+* **High-Cardinality Categoricals:** Optimizes feature preprocessors and prioritizes tree-based models (Random Forests, Gradient Boosting, XGBoost).
+* **Float Targets:** Detects values with a continuous numeric profile and switches the entire pipeline to regression mode automatically.
+
+### 3. Automatic Imbalance Correction
+During data healing, KaizenStat computes target ratios. If target class distribution has a skew larger than `65% / 35%`, it adjusts model parameters dynamically (e.g. setting `class_weight="balanced"` in scikit-learn estimators).
+
+---
+
+## 🛠 Developer Guide
+
+### Setting up a local workspace
+
+To contribute or run local enhancements:
+
+1. Clone the repository:
+```bash
+git clone https://github.com/masuddarrahaman/KaizenStat-Library.git
+cd KaizenStat-Library
+```
+2. Install the package in editable mode with all optional drivers:
+```bash
+pip install -e ".[all]"
+```
+3. Run tests or validation:
+```bash
+python3 -m unittest discover -s tests
+```
+
+---
+
+## 📄 License
+
+Distributed under the MIT License. See `LICENSE` for details.
@@ -0,0 +1,142 @@
+# 🚀 KaizenStat
+
+[](https://pypi.org/project/kaizenstat/)
+[](https://opensource.org/licenses/MIT)
+[](https://www.python.org/downloads/)
+[](https://github.com/psf/black)
+
+**KaizenStat** is a zero-friction, production-grade AutoML, automated data cleaning, and model explanation engine. It allows you to audit datasets, repair data issues, benchmark models with hardware-aware optimization, export standalone pipeline code, and host web-based dashboards—all with a single command or Python import.
+
+---
+
+## 🎯 Core Philosophy
+
+* **Zero-Friction AutoML:** No complex configuration files. Pass your dataset, name your target, and KaizenStat does the rest.
+* **Production Crash-Proofing:** Automatically handles messy real-world data issues: high-cardinality ID columns, datetime parsing, missing inputs, class imbalance, and label encoding.
+* **Explainable AI:** Breaks open the "black box" by generating standalone, human-readable Python training code reproducing the best-found pipeline.
+* **Hybrid Interface:** 100% parity between CLI and Python API.
+
+---
+
+## 📦 Installation
+
+Install the core package with zero heavy external dependencies:
+
+```bash
+pip install kaizenstat
+```
+
+### Optional Drivers & Accelerators
+
+Tailor KaizenStat to your specific workload:
+
+```bash
+pip install kaizenstat[ui] # Install Streamlit for web dashboards
+pip install kaizenstat[gpu] # Install XGBoost with GPU/MPS support
+pip install kaizenstat[fast] # Install Polars for ultra-fast CSV parsing
+pip install kaizenstat[all] # Install all optional components
+```
+
+---
+
+## ⚔️ CLI & Python API Feature Matrix
+
+KaizenStat is designed around a single unified vocabulary. Every CLI command has a direct, equivalent function in the Python SDK.
+
+| Command | Python API | Purpose |
+| :--- | :--- | :--- |
+| `kz audit` | `KaizenStat.audit()` | 🔍 Runs a diagnostic sweep (missing values, duplicates, imbalance, dead features). |
+| `kz heal` | `KaizenStat.heal()` | 🩹 Clean, impute, parse datetimes, drop IDs, and encode string labels. |
+| `kz benchmark` | `KaizenStat.benchmark()` | 🚀 Automatically trains, optimizes, and ranks model pipelines. |
+| `kz auto` | `KaizenStat.auto()` | ⚡ Orchestrates the entire pipeline in sequence (Audit ➔ Heal ➔ Benchmark). |
+| `kz explain` | `KaizenStat.explain()` | 💬 Generates plain-English diagnostic summaries and model recommendations. |
+| `kz codegen` | `KaizenStat.codegen()` | 📝 Generates standalone, dependency-free Python code for the best model. |
+| `kz export-model` | `KaizenStat.save_model()` | 💾 Trains the top pipeline and saves it directly to a `.joblib` binary. |
+| `kz report` | `KaizenStat.report()` | 📊 Generates a beautiful, interactive HTML profiling report with Chart.js. |
+| `kz serve` | `KaizenStat.serve()` | 🌐 Launches a local web dashboard to explore the data and run predictions. |
+
+---
+
+## 💡 Quick Start Guide
+
+### 1. Python SDK Usage
+
+```python
+from kaizenstat import KaizenStat
+import pandas as pd
+
+# Load your dataset
+df = pd.read_csv("dataset.csv")
+
+# 1. Diagnose issues
+findings = KaizenStat.audit(df, target="target_column")
+
+# 2. Automatically heal dirty data
+clean_df = KaizenStat.heal(df, target="target_column")
+
+# 3. Benchmark models with cross-validation
+leaderboard = KaizenStat.benchmark(clean_df, target="target_column")
+
+# 4. Generate standalone code for reproduction
+KaizenStat.codegen("dataset.csv", target="target_column", output_path="reproduce.py")
+```
+
+### 2. Command Line Interface (CLI)
+
+```bash
+# Get quick help and list commands
+kz --help
+
+# Run the full pipeline in one command
+kz auto dataset.csv --target target_column
+
+# Repair a dataset and save the clean file
+kz heal dataset.csv --target target_column -o cleaned_dataset.csv
+
+# Launch a local Streamlit app to preview and test model performance
+kz serve dataset.csv --target target_column --port 8501
+```
+
+---
+
+## 🧠 Behind the Scenes: Core Engines
+
+### 1. Hardware-Aware Execution
+KaizenStat automatically checks your environment using `detect_device()`. It leverages CUDA on Nvidia GPUs and MPS on Apple Silicon (M1/M2/M3 Mac) to accelerate training when optional dependencies (like `xgboost`) are installed.
+
+### 2. Smart Model Selection
+The benchmarking engine adjusts its logic dynamically based on the dataset properties:
+* **Large Datasets (>100k rows):** Excludes slow estimators (like Gradient Boosting) on standard CPU hosts to prevent compute lockups.
+* **High-Cardinality Categoricals:** Optimizes feature preprocessors and prioritizes tree-based models (Random Forests, Gradient Boosting, XGBoost).
+* **Float Targets:** Detects values with a continuous numeric profile and switches the entire pipeline to regression mode automatically.
+
+### 3. Automatic Imbalance Correction
+During data healing, KaizenStat computes target ratios. If target class distribution has a skew larger than `65% / 35%`, it adjusts model parameters dynamically (e.g. setting `class_weight="balanced"` in scikit-learn estimators).
+
+---
+
+## 🛠 Developer Guide
+
+### Setting up a local workspace
+
+To contribute or run local enhancements:
+
+1. Clone the repository:
+```bash
+git clone https://github.com/masuddarrahaman/KaizenStat-Library.git
+cd KaizenStat-Library
+```
+2. Install the package in editable mode with all optional drivers:
+```bash
+pip install -e ".[all]"
+```
+3. Run tests or validation:
+```bash
+python3 -m unittest discover -s tests
+```
+
+---
+
+## 📄 License
+
+Distributed under the MIT License. See `LICENSE` for details.
@@ -650,6 +650,11 @@ class KaizenStat:
         """
         df = DataEngine.load(data)
         data_path = data if isinstance(data, str) else "data.csv"
+        if isinstance(data, pd.DataFrame):
+            try:
+                data.to_csv("data.csv", index=False)
+            except Exception:
+                pass
 
         # Run pipeline to determine best model
         df_clean = KaizenStat.heal(df, target)
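Note on the hunk above: for a DataFrame input, the added lines write the frame to a fixed `data.csv` in the current working directory and silently ignore any write failure. The sketch below shows one possible alternative and is not kaizenstat code. `resolve_data_path` is a hypothetical helper name, and the sketch assumes only that the input is either a path string or an object with a pandas-style `to_csv(path, index=False)` method.

```python
import os
import tempfile
from typing import Any, Optional


def resolve_data_path(data: Any, directory: Optional[str] = None) -> str:
    """Return a CSV path for `data` (hypothetical helper, not kaizenstat API).

    A string is treated as a path and returned unchanged. Any object with a
    pandas-style ``to_csv`` method is written to a uniquely named temporary
    CSV, so an existing ``data.csv`` is never overwritten. Write errors
    propagate to the caller instead of being swallowed.
    """
    if isinstance(data, str):
        return data
    if not hasattr(data, "to_csv"):
        raise TypeError(f"unsupported input type: {type(data).__name__}")
    fd, path = tempfile.mkstemp(suffix=".csv", dir=directory)
    os.close(fd)  # to_csv reopens the file by path
    data.to_csv(path, index=False)
    return path
```

The caller owns the returned temporary file and would need to delete it once the generated script no longer depends on it.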
@@ -0,0 +1,172 @@
+Metadata-Version: 2.4
+Name: kaizenstat
+Version: 0.2.2
+Summary: Zero-friction AutoML + Data Cleaning Toolkit
+Author: Masuddar Rahman
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+Requires-Dist: pandas
+Requires-Dist: numpy
+Requires-Dist: scikit-learn
+Requires-Dist: rich
+Requires-Dist: joblib
+Provides-Extra: ui
+Requires-Dist: streamlit; extra == "ui"
+Provides-Extra: gpu
+Requires-Dist: xgboost; extra == "gpu"
+Provides-Extra: fast
+Requires-Dist: polars; extra == "fast"
+Provides-Extra: all
+Requires-Dist: streamlit; extra == "all"
+Requires-Dist: xgboost; extra == "all"
+Requires-Dist: polars; extra == "all"
+Dynamic: author
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: provides-extra
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+
+# 🚀 KaizenStat
+
+[](https://pypi.org/project/kaizenstat/)
+[](https://opensource.org/licenses/MIT)
+[](https://www.python.org/downloads/)
+[](https://github.com/psf/black)
+
+**KaizenStat** is a zero-friction, production-grade AutoML, automated data cleaning, and model explanation engine. It allows you to audit datasets, repair data issues, benchmark models with hardware-aware optimization, export standalone pipeline code, and host web-based dashboards—all with a single command or Python import.
+
+---
+
+## 🎯 Core Philosophy
+
+* **Zero-Friction AutoML:** No complex configuration files. Pass your dataset, name your target, and KaizenStat does the rest.
+* **Production Crash-Proofing:** Automatically handles messy real-world data issues: high-cardinality ID columns, datetime parsing, missing inputs, class imbalance, and label encoding.
+* **Explainable AI:** Breaks open the "black box" by generating standalone, human-readable Python training code reproducing the best-found pipeline.
+* **Hybrid Interface:** 100% parity between CLI and Python API.
+
+---
+
+## 📦 Installation
+
+Install the core package with zero heavy external dependencies:
+
+```bash
+pip install kaizenstat
+```
+
+### Optional Drivers & Accelerators
+
+Tailor KaizenStat to your specific workload:
+
+```bash
+pip install kaizenstat[ui] # Install Streamlit for web dashboards
+pip install kaizenstat[gpu] # Install XGBoost with GPU/MPS support
+pip install kaizenstat[fast] # Install Polars for ultra-fast CSV parsing
+pip install kaizenstat[all] # Install all optional components
+```
+
+---
+
+## ⚔️ CLI & Python API Feature Matrix
+
+KaizenStat is designed around a single unified vocabulary. Every CLI command has a direct, equivalent function in the Python SDK.
+
+| Command | Python API | Purpose |
+| :--- | :--- | :--- |
+| `kz audit` | `KaizenStat.audit()` | 🔍 Runs a diagnostic sweep (missing values, duplicates, imbalance, dead features). |
+| `kz heal` | `KaizenStat.heal()` | 🩹 Clean, impute, parse datetimes, drop IDs, and encode string labels. |
+| `kz benchmark` | `KaizenStat.benchmark()` | 🚀 Automatically trains, optimizes, and ranks model pipelines. |
+| `kz auto` | `KaizenStat.auto()` | ⚡ Orchestrates the entire pipeline in sequence (Audit ➔ Heal ➔ Benchmark). |
+| `kz explain` | `KaizenStat.explain()` | 💬 Generates plain-English diagnostic summaries and model recommendations. |
+| `kz codegen` | `KaizenStat.codegen()` | 📝 Generates standalone, dependency-free Python code for the best model. |
+| `kz export-model` | `KaizenStat.save_model()` | 💾 Trains the top pipeline and saves it directly to a `.joblib` binary. |
+| `kz report` | `KaizenStat.report()` | 📊 Generates a beautiful, interactive HTML profiling report with Chart.js. |
+| `kz serve` | `KaizenStat.serve()` | 🌐 Launches a local web dashboard to explore the data and run predictions. |
+
+---
+
+## 💡 Quick Start Guide
+
+### 1. Python SDK Usage
+
+```python
+from kaizenstat import KaizenStat
+import pandas as pd
+
+# Load your dataset
+df = pd.read_csv("dataset.csv")
+
+# 1. Diagnose issues
+findings = KaizenStat.audit(df, target="target_column")
+
+# 2. Automatically heal dirty data
+clean_df = KaizenStat.heal(df, target="target_column")
+
+# 3. Benchmark models with cross-validation
+leaderboard = KaizenStat.benchmark(clean_df, target="target_column")
+
+# 4. Generate standalone code for reproduction
+KaizenStat.codegen("dataset.csv", target="target_column", output_path="reproduce.py")
+```
+
+### 2. Command Line Interface (CLI)
+
+```bash
+# Get quick help and list commands
+kz --help
+
+# Run the full pipeline in one command
+kz auto dataset.csv --target target_column
+
+# Repair a dataset and save the clean file
+kz heal dataset.csv --target target_column -o cleaned_dataset.csv
+
+# Launch a local Streamlit app to preview and test model performance
+kz serve dataset.csv --target target_column --port 8501
+```
+
+---
+
+## 🧠 Behind the Scenes: Core Engines
+
+### 1. Hardware-Aware Execution
+KaizenStat automatically checks your environment using `detect_device()`. It leverages CUDA on Nvidia GPUs and MPS on Apple Silicon (M1/M2/M3 Mac) to accelerate training when optional dependencies (like `xgboost`) are installed.
+
+### 2. Smart Model Selection
+The benchmarking engine adjusts its logic dynamically based on the dataset properties:
+* **Large Datasets (>100k rows):** Excludes slow estimators (like Gradient Boosting) on standard CPU hosts to prevent compute lockups.
+* **High-Cardinality Categoricals:** Optimizes feature preprocessors and prioritizes tree-based models (Random Forests, Gradient Boosting, XGBoost).
+* **Float Targets:** Detects values with a continuous numeric profile and switches the entire pipeline to regression mode automatically.
+
+### 3. Automatic Imbalance Correction
+During data healing, KaizenStat computes target ratios. If target class distribution has a skew larger than `65% / 35%`, it adjusts model parameters dynamically (e.g. setting `class_weight="balanced"` in scikit-learn estimators).
+
+---
+
+## 🛠 Developer Guide
+
+### Setting up a local workspace
+
+To contribute or run local enhancements:
+
+1. Clone the repository:
+```bash
+git clone https://github.com/masuddarrahaman/KaizenStat-Library.git
+cd KaizenStat-Library
+```
+2. Install the package in editable mode with all optional drivers:
+```bash
+pip install -e ".[all]"
+```
+3. Run tests or validation:
+```bash
+python3 -m unittest discover -s tests
+```
+
+---
+
+## 📄 License
+
+Distributed under the MIT License. See `LICENSE` for details.
@@ -2,7 +2,7 @@ from setuptools import setup, find_packages
 
 setup(
     name="kaizenstat",
-    version="0.2.
+    version="0.2.2",
     author="Masuddar Rahman",
     description="Zero-friction AutoML + Data Cleaning Toolkit",
     long_description=open("README.md").read() if open("README.md") else "",
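Note on the unchanged `long_description` line in this hunk: `open("README.md")` either returns a file object, which is always truthy, or raises `FileNotFoundError`, so the `else ""` branch can never run, and neither file handle is explicitly closed. A minimal sketch of a fallback that does take effect, assuming a hypothetical helper `read_long_description` that is not present in the package:

```python
from pathlib import Path


def read_long_description(path: str = "README.md") -> str:
    """Return the README text, or an empty string if the file is missing.

    Hypothetical helper for illustration; the name is not part of kaizenstat.
    """
    readme = Path(path)
    # is_file() avoids the exception that a bare open() raises on a missing
    # file, and read_text() opens and closes the file in one call.
    return readme.read_text(encoding="utf-8") if readme.is_file() else ""
```

In `setup.py` this would be used as `long_description=read_long_description()`.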
kaizenstat-0.2.0/PKG-INFO
DELETED
@@ -1,176 +0,0 @@
-Metadata-Version: 2.4
-Name: kaizenstat
-Version: 0.2.0
-Summary: Zero-friction AutoML + Data Cleaning Toolkit
-Author: Masuddar Rahman
-Requires-Python: >=3.8
-Description-Content-Type: text/markdown
-Requires-Dist: pandas
-Requires-Dist: numpy
-Requires-Dist: scikit-learn
-Requires-Dist: rich
-Requires-Dist: joblib
-Provides-Extra: ui
-Requires-Dist: streamlit; extra == "ui"
-Provides-Extra: gpu
-Requires-Dist: xgboost; extra == "gpu"
-Provides-Extra: fast
-Requires-Dist: polars; extra == "fast"
-Provides-Extra: all
-Requires-Dist: streamlit; extra == "all"
-Requires-Dist: xgboost; extra == "all"
-Requires-Dist: polars; extra == "all"
-Dynamic: author
-Dynamic: description
-Dynamic: description-content-type
-Dynamic: provides-extra
-Dynamic: requires-dist
-Dynamic: requires-python
-Dynamic: summary
-
-# 🚀 KaizenStat
-
-[](https://pypi.org/project/kaizenstat/)
-[](https://opensource.org/licenses/MIT)
-[](https://www.python.org/downloads/)
-
-**KaizenStat** is a zero-friction data validation, automatic cleaning, and AutoML benchmarking toolkit. Diagnose datasets instantly, auto-repair issues, train baseline models, generate standalone Python code, and launch interactive dashboards — all in one command.
-
----
-
-## ✨ Features
-
-| Command | What it does |
-|---|---|
-| `kz audit` | 🔍 Diagnostic sweep — duplicates, NaNs, infs, ID columns, imbalance |
-| `kz heal` | 🩹 Auto-clean — impute, deduplicate, drop dead columns |
-| `kz benchmark` | 🚀 Train & rank ML models with cross-validation |
-| `kz auto` | ⚡ Full pipeline in one command (audit → heal → benchmark) |
-| `kz explain` | 💬 Plain-English summary of findings and recommendations |
-| `kz codegen` | 📝 Generate a standalone Python training script |
-| `kz export-model` | 💾 Train best model and save to `.joblib` |
-| `kz report` | 📊 Generate interactive HTML report with charts |
-| `kz serve` | 🌐 Launch interactive Streamlit web dashboard |
-
----
-
-## 📦 Installation
-
-```bash
-pip install kaizenstat
-```
-
-**Optional extras:**
-
-```bash
-pip install kaizenstat[ui] # + Streamlit dashboard
-pip install kaizenstat[gpu] # + XGBoost GPU support
-pip install kaizenstat[fast] # + Polars fast data loading
-pip install kaizenstat[all] # everything
-```
-
----
-
-## 🚀 Quick Start
-
-### Python API
-
-```python
-from kaizenstat import KaizenStat
-
-# Full pipeline in one call
-KaizenStat.auto("data.csv", target="price")
-
-# Or step-by-step
-import pandas as pd
-df = pd.read_csv("data.csv")
-
-KaizenStat.audit(df, target="price")
-df_clean = KaizenStat.heal(df, target="price")
-results = KaizenStat.benchmark(df_clean, target="price")
-```
-
-### 💬 Get a Plain-English Explanation
-
-```python
-KaizenStat.explain("data.csv", target="price")
-```
-
-### 📝 Generate Standalone Code
-
-```python
-KaizenStat.codegen("data.csv", target="price", output_path="deploy.py")
-```
-
-### 💾 Export & Load Models
-
-```python
-# Train + save
-KaizenStat.auto("data.csv", target="price")
-KaizenStat.save_model(path="model.joblib")
-
-# Load later
-pipeline = KaizenStat.load_model("model.joblib")
-predictions = pipeline.predict(new_data)
-```
-
-### 📊 Generate HTML Report
-
-```python
-KaizenStat.report("data.csv", target="price", output_path="report.html")
-```
-
-### 🌐 Launch Web Dashboard
-
-```python
-KaizenStat.serve("data.csv", target="price")
-```
-
----
-
-## 💻 CLI Usage
-
-```bash
-# Diagnostic sweep
-kz audit data.csv --target price
-
-# Auto-clean dataset
-kz heal data.csv --target price -o clean.csv
-
-# Train & rank models
-kz benchmark clean.csv --target price
-
-# Full pipeline
-kz auto data.csv --target price
-
-# Plain-English summary
-kz explain data.csv --target price
-
-# Generate standalone Python script
-kz codegen data.csv --target price -o deploy.py
-
-# Train best model and export
-kz export-model data.csv --target price -o model.joblib
-
-# Generate interactive HTML report
-kz report data.csv --target price -o report.html
-
-# Launch web dashboard
-kz serve data.csv --target price
-```
-
----
-
-## 🛠 Development
-
-```bash
-git clone https://github.com/yourusername/kaizenstat.git
-cd kaizenstat
-pip install -e ".[all]"
-```
-
----
-
-## 📄 License
-
-Distributed under the MIT License.
kaizenstat-0.2.0/README.md DELETED
@@ -1,146 +0,0 @@
-# 🚀 KaizenStat
-
-[](https://pypi.org/project/kaizenstat/)
-[](https://opensource.org/licenses/MIT)
-[](https://www.python.org/downloads/)
-
-**KaizenStat** is a zero-friction data validation, automatic cleaning, and AutoML benchmarking toolkit. Diagnose datasets instantly, auto-repair issues, train baseline models, generate standalone Python code, and launch interactive dashboards — all in one command.
-
----
-
-## ✨ Features
-
-| Command | What it does |
-|---|---|
-| `kz audit` | 🔍 Diagnostic sweep — duplicates, NaNs, infs, ID columns, imbalance |
-| `kz heal` | 🩹 Auto-clean — impute, deduplicate, drop dead columns |
-| `kz benchmark` | 🚀 Train & rank ML models with cross-validation |
-| `kz auto` | ⚡ Full pipeline in one command (audit → heal → benchmark) |
-| `kz explain` | 💬 Plain-English summary of findings and recommendations |
-| `kz codegen` | 📝 Generate a standalone Python training script |
-| `kz export-model` | 💾 Train best model and save to `.joblib` |
-| `kz report` | 📊 Generate interactive HTML report with charts |
-| `kz serve` | 🌐 Launch interactive Streamlit web dashboard |
-
----
-
-## 📦 Installation
-
-```bash
-pip install kaizenstat
-```
-
-**Optional extras:**
-
-```bash
-pip install kaizenstat[ui] # + Streamlit dashboard
-pip install kaizenstat[gpu] # + XGBoost GPU support
-pip install kaizenstat[fast] # + Polars fast data loading
-pip install kaizenstat[all] # everything
-```
-
----
-
-## 🚀 Quick Start
-
-### Python API
-
-```python
-from kaizenstat import KaizenStat
-
-# Full pipeline in one call
-KaizenStat.auto("data.csv", target="price")
-
-# Or step-by-step
-import pandas as pd
-df = pd.read_csv("data.csv")
-
-KaizenStat.audit(df, target="price")
-df_clean = KaizenStat.heal(df, target="price")
-results = KaizenStat.benchmark(df_clean, target="price")
-```
-
-### 💬 Get a Plain-English Explanation
-
-```python
-KaizenStat.explain("data.csv", target="price")
-```
-
-### 📝 Generate Standalone Code
-
-```python
-KaizenStat.codegen("data.csv", target="price", output_path="deploy.py")
-```
-
-### 💾 Export & Load Models
-
-```python
-# Train + save
-KaizenStat.auto("data.csv", target="price")
-KaizenStat.save_model(path="model.joblib")
-
-# Load later
-pipeline = KaizenStat.load_model("model.joblib")
-predictions = pipeline.predict(new_data)
-```
-
-### 📊 Generate HTML Report
-
-```python
-KaizenStat.report("data.csv", target="price", output_path="report.html")
-```
-
-### 🌐 Launch Web Dashboard
-
-```python
-KaizenStat.serve("data.csv", target="price")
-```
-
----
-
-## 💻 CLI Usage
-
-```bash
-# Diagnostic sweep
-kz audit data.csv --target price
-
-# Auto-clean dataset
-kz heal data.csv --target price -o clean.csv
-
-# Train & rank models
-kz benchmark clean.csv --target price
-
-# Full pipeline
-kz auto data.csv --target price
-
-# Plain-English summary
-kz explain data.csv --target price
-
-# Generate standalone Python script
-kz codegen data.csv --target price -o deploy.py
-
-# Train best model and export
-kz export-model data.csv --target price -o model.joblib
-
-# Generate interactive HTML report
-kz report data.csv --target price -o report.html
-
-# Launch web dashboard
-kz serve data.csv --target price
-```
-
----
-
-## 🛠 Development
-
-```bash
-git clone https://github.com/yourusername/kaizenstat.git
-cd kaizenstat
-pip install -e ".[all]"
-```
-
----
-
-## 📄 License
-
-Distributed under the MIT License.
@@ -1,176 +0,0 @@
-Metadata-Version: 2.4
-Name: kaizenstat
-Version: 0.2.0
-Summary: Zero-friction AutoML + Data Cleaning Toolkit
-Author: Masuddar Rahman
-Requires-Python: >=3.8
-Description-Content-Type: text/markdown
-Requires-Dist: pandas
-Requires-Dist: numpy
-Requires-Dist: scikit-learn
-Requires-Dist: rich
-Requires-Dist: joblib
-Provides-Extra: ui
-Requires-Dist: streamlit; extra == "ui"
-Provides-Extra: gpu
-Requires-Dist: xgboost; extra == "gpu"
-Provides-Extra: fast
-Requires-Dist: polars; extra == "fast"
-Provides-Extra: all
-Requires-Dist: streamlit; extra == "all"
-Requires-Dist: xgboost; extra == "all"
-Requires-Dist: polars; extra == "all"
-Dynamic: author
-Dynamic: description
-Dynamic: description-content-type
-Dynamic: provides-extra
-Dynamic: requires-dist
-Dynamic: requires-python
-Dynamic: summary
-
-# 🚀 KaizenStat
-
-[](https://pypi.org/project/kaizenstat/)
-[](https://opensource.org/licenses/MIT)
-[](https://www.python.org/downloads/)
-
-**KaizenStat** is a zero-friction data validation, automatic cleaning, and AutoML benchmarking toolkit. Diagnose datasets instantly, auto-repair issues, train baseline models, generate standalone Python code, and launch interactive dashboards — all in one command.
-
----
-
-## ✨ Features
-
-| Command | What it does |
-|---|---|
-| `kz audit` | 🔍 Diagnostic sweep — duplicates, NaNs, infs, ID columns, imbalance |
-| `kz heal` | 🩹 Auto-clean — impute, deduplicate, drop dead columns |
-| `kz benchmark` | 🚀 Train & rank ML models with cross-validation |
-| `kz auto` | ⚡ Full pipeline in one command (audit → heal → benchmark) |
-| `kz explain` | 💬 Plain-English summary of findings and recommendations |
-| `kz codegen` | 📝 Generate a standalone Python training script |
-| `kz export-model` | 💾 Train best model and save to `.joblib` |
-| `kz report` | 📊 Generate interactive HTML report with charts |
-| `kz serve` | 🌐 Launch interactive Streamlit web dashboard |
-
----
-
-## 📦 Installation
-
-```bash
-pip install kaizenstat
-```
-
-**Optional extras:**
-
-```bash
-pip install kaizenstat[ui] # + Streamlit dashboard
-pip install kaizenstat[gpu] # + XGBoost GPU support
-pip install kaizenstat[fast] # + Polars fast data loading
-pip install kaizenstat[all] # everything
-```
-
----
-
-## 🚀 Quick Start
-
-### Python API
-
-```python
-from kaizenstat import KaizenStat
-
-# Full pipeline in one call
-KaizenStat.auto("data.csv", target="price")
-
-# Or step-by-step
-import pandas as pd
-df = pd.read_csv("data.csv")
-
-KaizenStat.audit(df, target="price")
-df_clean = KaizenStat.heal(df, target="price")
-results = KaizenStat.benchmark(df_clean, target="price")
-```
-
-### 💬 Get a Plain-English Explanation
-
-```python
-KaizenStat.explain("data.csv", target="price")
-```
-
-### 📝 Generate Standalone Code
-
-```python
-KaizenStat.codegen("data.csv", target="price", output_path="deploy.py")
-```
-
-### 💾 Export & Load Models
-
-```python
-# Train + save
-KaizenStat.auto("data.csv", target="price")
-KaizenStat.save_model(path="model.joblib")
-
-# Load later
-pipeline = KaizenStat.load_model("model.joblib")
-predictions = pipeline.predict(new_data)
-```
-
-### 📊 Generate HTML Report
-
-```python
-KaizenStat.report("data.csv", target="price", output_path="report.html")
-```
-
-### 🌐 Launch Web Dashboard
-
-```python
-KaizenStat.serve("data.csv", target="price")
-```
-
----
-
-## 💻 CLI Usage
-
-```bash
-# Diagnostic sweep
-kz audit data.csv --target price
-
-# Auto-clean dataset
-kz heal data.csv --target price -o clean.csv
-
-# Train & rank models
-kz benchmark clean.csv --target price
-
-# Full pipeline
-kz auto data.csv --target price
-
-# Plain-English summary
-kz explain data.csv --target price
-
-# Generate standalone Python script
-kz codegen data.csv --target price -o deploy.py
-
-# Train best model and export
-kz export-model data.csv --target price -o model.joblib
-
-# Generate interactive HTML report
-kz report data.csv --target price -o report.html
-
-# Launch web dashboard
-kz serve data.csv --target price
-```
-
----
-
-## 🛠 Development
-
-```bash
-git clone https://github.com/yourusername/kaizenstat.git
-cd kaizenstat
-pip install -e ".[all]"
-```
-
----
-
-## 📄 License
-
-Distributed under the MIT License.
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes