kaizenstat 0.2.0__tar.gz → 0.2.2__tar.gz

@@ -0,0 +1,172 @@
1
+ Metadata-Version: 2.4
2
+ Name: kaizenstat
3
+ Version: 0.2.2
4
+ Summary: Zero-friction AutoML + Data Cleaning Toolkit
5
+ Author: Masuddar Rahman
6
+ Requires-Python: >=3.8
7
+ Description-Content-Type: text/markdown
8
+ Requires-Dist: pandas
9
+ Requires-Dist: numpy
10
+ Requires-Dist: scikit-learn
11
+ Requires-Dist: rich
12
+ Requires-Dist: joblib
13
+ Provides-Extra: ui
14
+ Requires-Dist: streamlit; extra == "ui"
15
+ Provides-Extra: gpu
16
+ Requires-Dist: xgboost; extra == "gpu"
17
+ Provides-Extra: fast
18
+ Requires-Dist: polars; extra == "fast"
19
+ Provides-Extra: all
20
+ Requires-Dist: streamlit; extra == "all"
21
+ Requires-Dist: xgboost; extra == "all"
22
+ Requires-Dist: polars; extra == "all"
23
+ Dynamic: author
24
+ Dynamic: description
25
+ Dynamic: description-content-type
26
+ Dynamic: provides-extra
27
+ Dynamic: requires-dist
28
+ Dynamic: requires-python
29
+ Dynamic: summary
30
+
31
+ # 🚀 KaizenStat
32
+
33
+ [![PyPI Version](https://img.shields.io/pypi/v/kaizenstat.svg?style=flat-square&color=blue)](https://pypi.org/project/kaizenstat/)
34
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](https://opensource.org/licenses/MIT)
35
+ [![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg?style=flat-square)](https://www.python.org/downloads/)
36
+ [![Code Style: Black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/psf/black)
37
+
38
+ **KaizenStat** is a zero-friction, production-grade AutoML, automated data cleaning, and model explanation engine. It allows you to audit datasets, repair data issues, benchmark models with hardware-aware optimization, export standalone pipeline code, and host web-based dashboards—all with a single command or Python import.
39
+
40
+ ---
41
+
42
+ ## 🎯 Core Philosophy
43
+
44
+ * **Zero-Friction AutoML:** No complex configuration files. Pass your dataset, name your target, and KaizenStat does the rest.
45
+ * **Production Crash-Proofing:** Automatically handles messy real-world data issues: high-cardinality ID columns, datetime parsing, missing inputs, class imbalance, and label encoding.
46
+ * **Explainable AI:** Breaks open the "black box" by generating standalone, human-readable Python training code reproducing the best-found pipeline.
47
+ * **Hybrid Interface:** 100% parity between CLI and Python API.
48
+
49
+ ---
50
+
51
+ ## 📦 Installation
52
+
53
+ Install the core package with zero heavy external dependencies:
54
+
55
+ ```bash
56
+ pip install kaizenstat
57
+ ```
58
+
59
+ ### Optional Drivers & Accelerators
60
+
61
+ Tailor KaizenStat to your specific workload:
62
+
63
+ ```bash
64
+ pip install kaizenstat[ui] # Install Streamlit for web dashboards
65
+ pip install kaizenstat[gpu] # Install XGBoost with GPU/MPS support
66
+ pip install kaizenstat[fast] # Install Polars for ultra-fast CSV parsing
67
+ pip install kaizenstat[all] # Install all optional components
68
+ ```
69
+
70
+ ---
71
+
72
+ ## ⚔️ CLI & Python API Feature Matrix
73
+
74
+ KaizenStat is designed around a single unified vocabulary. Every CLI command has a direct, equivalent function in the Python SDK.
75
+
76
+ | Command | Python API | Purpose |
77
+ | :--- | :--- | :--- |
78
+ | `kz audit` | `KaizenStat.audit()` | 🔍 Runs a diagnostic sweep (missing values, duplicates, imbalance, dead features). |
79
+ | `kz heal` | `KaizenStat.heal()` | 🩹 Clean, impute, parse datetimes, drop IDs, and encode string labels. |
80
+ | `kz benchmark` | `KaizenStat.benchmark()` | 🚀 Automatically trains, optimizes, and ranks model pipelines. |
81
+ | `kz auto` | `KaizenStat.auto()` | ⚡ Orchestrates the entire pipeline in sequence (Audit ➔ Heal ➔ Benchmark). |
82
+ | `kz explain` | `KaizenStat.explain()` | 💬 Generates plain-English diagnostic summaries and model recommendations. |
83
+ | `kz codegen` | `KaizenStat.codegen()` | 📝 Generates standalone, dependency-free Python code for the best model. |
84
+ | `kz export-model` | `KaizenStat.save_model()` | 💾 Trains the top pipeline and saves it directly to a `.joblib` binary. |
85
+ | `kz report` | `KaizenStat.report()` | 📊 Generates a beautiful, interactive HTML profiling report with Chart.js. |
86
+ | `kz serve` | `KaizenStat.serve()` | 🌐 Launches a local web dashboard to explore the data and run predictions. |
87
+
88
+ ---
89
+
90
+ ## 💡 Quick Start Guide
91
+
92
+ ### 1. Python SDK Usage
93
+
94
+ ```python
95
+ from kaizenstat import KaizenStat
96
+ import pandas as pd
97
+
98
+ # Load your dataset
99
+ df = pd.read_csv("dataset.csv")
100
+
101
+ # 1. Diagnose issues
102
+ findings = KaizenStat.audit(df, target="target_column")
103
+
104
+ # 2. Automatically heal dirty data
105
+ clean_df = KaizenStat.heal(df, target="target_column")
106
+
107
+ # 3. Benchmark models with cross-validation
108
+ leaderboard = KaizenStat.benchmark(clean_df, target="target_column")
109
+
110
+ # 4. Generate standalone code for reproduction
111
+ KaizenStat.codegen("dataset.csv", target="target_column", output_path="reproduce.py")
112
+ ```
113
+
114
+ ### 2. Command Line Interface (CLI)
115
+
116
+ ```bash
117
+ # Get quick help and list commands
118
+ kz --help
119
+
120
+ # Run the full pipeline in one command
121
+ kz auto dataset.csv --target target_column
122
+
123
+ # Repair a dataset and save the clean file
124
+ kz heal dataset.csv --target target_column -o cleaned_dataset.csv
125
+
126
+ # Launch a local Streamlit app to preview and test model performance
127
+ kz serve dataset.csv --target target_column --port 8501
128
+ ```
129
+
130
+ ---
131
+
132
+ ## 🧠 Behind the Scenes: Core Engines
133
+
134
+ ### 1. Hardware-Aware Execution
135
+ KaizenStat automatically checks your environment using `detect_device()`. It leverages CUDA on Nvidia GPUs and MPS on Apple Silicon (M1/M2/M3 Mac) to accelerate training when optional dependencies (like `xgboost`) are installed.
136
+
137
+ ### 2. Smart Model Selection
138
+ The benchmarking engine adjusts its logic dynamically based on the dataset properties:
139
+ * **Large Datasets (>100k rows):** Excludes slow estimators (like Gradient Boosting) on standard CPU hosts to prevent compute lockups.
140
+ * **High-Cardinality Categoricals:** Optimizes feature preprocessors and prioritizes tree-based models (Random Forests, Gradient Boosting, XGBoost).
141
+ * **Float Targets:** Detects values with a continuous numeric profile and switches the entire pipeline to regression mode automatically.
142
+
143
+ ### 3. Automatic Imbalance Correction
144
+ During data healing, KaizenStat computes target ratios. If target class distribution has a skew larger than `65% / 35%`, it adjusts model parameters dynamically (e.g. setting `class_weight="balanced"` in scikit-learn estimators).
145
+
146
+ ---
147
+
148
+ ## 🛠 Developer Guide
149
+
150
+ ### Setting up a local workspace
151
+
152
+ To contribute or run local enhancements:
153
+
154
+ 1. Clone the repository:
155
+ ```bash
156
+ git clone https://github.com/masuddarrahaman/KaizenStat-Library.git
157
+ cd KaizenStat-Library
158
+ ```
159
+ 2. Install the package in editable mode with all optional drivers:
160
+ ```bash
161
+ pip install -e ".[all]"
162
+ ```
163
+ 3. Run tests or validation:
164
+ ```bash
165
+ python3 -m unittest discover -s tests
166
+ ```
167
+
168
+ ---
169
+
170
+ ## 📄 License
171
+
172
+ Distributed under the MIT License. See `LICENSE` for details.
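
The "Automatic Imbalance Correction" paragraph in the README above describes a threshold rule on the target class distribution. A minimal sketch of such a rule follows, using only the standard library. `majority_share` and `suggest_class_weight` are hypothetical helper names, not part of KaizenStat's API, and the 0.65 cutoff is simply read off the README's `65% / 35%` figure; the package's real logic is not shown in this diff.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def majority_share(labels: Iterable[Hashable]) -> float:
    """Return the fraction of samples that belong to the most common class."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    return max(counts.values()) / total


def suggest_class_weight(
    labels: Iterable[Hashable], threshold: float = 0.65
) -> Optional[str]:
    """Return "balanced" when the majority class exceeds `threshold`, else None.

    Hypothetical helper: the threshold mirrors the README's 65% / 35% figure.
    """
    return "balanced" if majority_share(labels) > threshold else None
```

The returned value could be passed as `class_weight` to scikit-learn estimators that accept that parameter, such as `LogisticRegression` or `RandomForestClassifier`.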
@@ -0,0 +1,142 @@
1
+ # 🚀 KaizenStat
2
+
3
+ [![PyPI Version](https://img.shields.io/pypi/v/kaizenstat.svg?style=flat-square&color=blue)](https://pypi.org/project/kaizenstat/)
4
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](https://opensource.org/licenses/MIT)
5
+ [![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg?style=flat-square)](https://www.python.org/downloads/)
6
+ [![Code Style: Black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/psf/black)
7
+
8
+ **KaizenStat** is a zero-friction, production-grade AutoML, automated data cleaning, and model explanation engine. It allows you to audit datasets, repair data issues, benchmark models with hardware-aware optimization, export standalone pipeline code, and host web-based dashboards—all with a single command or Python import.
9
+
10
+ ---
11
+
12
+ ## 🎯 Core Philosophy
13
+
14
+ * **Zero-Friction AutoML:** No complex configuration files. Pass your dataset, name your target, and KaizenStat does the rest.
15
+ * **Production Crash-Proofing:** Automatically handles messy real-world data issues: high-cardinality ID columns, datetime parsing, missing inputs, class imbalance, and label encoding.
16
+ * **Explainable AI:** Breaks open the "black box" by generating standalone, human-readable Python training code reproducing the best-found pipeline.
17
+ * **Hybrid Interface:** 100% parity between CLI and Python API.
18
+
19
+ ---
20
+
21
+ ## 📦 Installation
22
+
23
+ Install the core package with zero heavy external dependencies:
24
+
25
+ ```bash
26
+ pip install kaizenstat
27
+ ```
28
+
29
+ ### Optional Drivers & Accelerators
30
+
31
+ Tailor KaizenStat to your specific workload:
32
+
33
+ ```bash
34
+ pip install kaizenstat[ui] # Install Streamlit for web dashboards
35
+ pip install kaizenstat[gpu] # Install XGBoost with GPU/MPS support
36
+ pip install kaizenstat[fast] # Install Polars for ultra-fast CSV parsing
37
+ pip install kaizenstat[all] # Install all optional components
38
+ ```
39
+
40
+ ---
41
+
42
+ ## ⚔️ CLI & Python API Feature Matrix
43
+
44
+ KaizenStat is designed around a single unified vocabulary. Every CLI command has a direct, equivalent function in the Python SDK.
45
+
46
+ | Command | Python API | Purpose |
47
+ | :--- | :--- | :--- |
48
+ | `kz audit` | `KaizenStat.audit()` | 🔍 Runs a diagnostic sweep (missing values, duplicates, imbalance, dead features). |
49
+ | `kz heal` | `KaizenStat.heal()` | 🩹 Clean, impute, parse datetimes, drop IDs, and encode string labels. |
50
+ | `kz benchmark` | `KaizenStat.benchmark()` | 🚀 Automatically trains, optimizes, and ranks model pipelines. |
51
+ | `kz auto` | `KaizenStat.auto()` | ⚡ Orchestrates the entire pipeline in sequence (Audit ➔ Heal ➔ Benchmark). |
52
+ | `kz explain` | `KaizenStat.explain()` | 💬 Generates plain-English diagnostic summaries and model recommendations. |
53
+ | `kz codegen` | `KaizenStat.codegen()` | 📝 Generates standalone, dependency-free Python code for the best model. |
54
+ | `kz export-model` | `KaizenStat.save_model()` | 💾 Trains the top pipeline and saves it directly to a `.joblib` binary. |
55
+ | `kz report` | `KaizenStat.report()` | 📊 Generates a beautiful, interactive HTML profiling report with Chart.js. |
56
+ | `kz serve` | `KaizenStat.serve()` | 🌐 Launches a local web dashboard to explore the data and run predictions. |
57
+
58
+ ---
59
+
60
+ ## 💡 Quick Start Guide
61
+
62
+ ### 1. Python SDK Usage
63
+
64
+ ```python
65
+ from kaizenstat import KaizenStat
66
+ import pandas as pd
67
+
68
+ # Load your dataset
69
+ df = pd.read_csv("dataset.csv")
70
+
71
+ # 1. Diagnose issues
72
+ findings = KaizenStat.audit(df, target="target_column")
73
+
74
+ # 2. Automatically heal dirty data
75
+ clean_df = KaizenStat.heal(df, target="target_column")
76
+
77
+ # 3. Benchmark models with cross-validation
78
+ leaderboard = KaizenStat.benchmark(clean_df, target="target_column")
79
+
80
+ # 4. Generate standalone code for reproduction
81
+ KaizenStat.codegen("dataset.csv", target="target_column", output_path="reproduce.py")
82
+ ```
83
+
84
+ ### 2. Command Line Interface (CLI)
85
+
86
+ ```bash
87
+ # Get quick help and list commands
88
+ kz --help
89
+
90
+ # Run the full pipeline in one command
91
+ kz auto dataset.csv --target target_column
92
+
93
+ # Repair a dataset and save the clean file
94
+ kz heal dataset.csv --target target_column -o cleaned_dataset.csv
95
+
96
+ # Launch a local Streamlit app to preview and test model performance
97
+ kz serve dataset.csv --target target_column --port 8501
98
+ ```
99
+
100
+ ---
101
+
102
+ ## 🧠 Behind the Scenes: Core Engines
103
+
104
+ ### 1. Hardware-Aware Execution
105
+ KaizenStat automatically checks your environment using `detect_device()`. It leverages CUDA on Nvidia GPUs and MPS on Apple Silicon (M1/M2/M3 Mac) to accelerate training when optional dependencies (like `xgboost`) are installed.
106
+
107
+ ### 2. Smart Model Selection
108
+ The benchmarking engine adjusts its logic dynamically based on the dataset properties:
109
+ * **Large Datasets (>100k rows):** Excludes slow estimators (like Gradient Boosting) on standard CPU hosts to prevent compute lockups.
110
+ * **High-Cardinality Categoricals:** Optimizes feature preprocessors and prioritizes tree-based models (Random Forests, Gradient Boosting, XGBoost).
111
+ * **Float Targets:** Detects values with a continuous numeric profile and switches the entire pipeline to regression mode automatically.
112
+
113
+ ### 3. Automatic Imbalance Correction
114
+ During data healing, KaizenStat computes target ratios. If target class distribution has a skew larger than `65% / 35%`, it adjusts model parameters dynamically (e.g. setting `class_weight="balanced"` in scikit-learn estimators).
115
+
116
+ ---
117
+
118
+ ## 🛠 Developer Guide
119
+
120
+ ### Setting up a local workspace
121
+
122
+ To contribute or run local enhancements:
123
+
124
+ 1. Clone the repository:
125
+ ```bash
126
+ git clone https://github.com/masuddarrahaman/KaizenStat-Library.git
127
+ cd KaizenStat-Library
128
+ ```
129
+ 2. Install the package in editable mode with all optional drivers:
130
+ ```bash
131
+ pip install -e ".[all]"
132
+ ```
133
+ 3. Run tests or validation:
134
+ ```bash
135
+ python3 -m unittest discover -s tests
136
+ ```
137
+
138
+ ---
139
+
140
+ ## 📄 License
141
+
142
+ Distributed under the MIT License. See `LICENSE` for details.
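
The "Smart Model Selection" bullets in the README above name two rules: switch to regression for continuous targets, and drop slow estimators on CPU hosts above 100k rows. The sketch below illustrates one way such rules could be written. Everything in it is an assumption made for illustration: the function names, the `max_classes` cutoff, and the candidate labels are hypothetical and are not taken from KaizenStat's source.

```python
import math
from typing import List, Sequence


def infer_task(target: Sequence[object], max_classes: int = 20) -> str:
    """Guess "regression" or "classification" from raw target values."""
    values = [v for v in target if v is not None]
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not numeric:
        return "classification"
    # A finite value with a fractional part suggests a continuous target.
    has_fraction = any(
        isinstance(v, float) and math.isfinite(v) and not v.is_integer()
        for v in values
    )
    # Many distinct numeric values also suggest a continuous target.
    if has_fraction or len(set(values)) > max_classes:
        return "regression"
    return "classification"


def select_candidates(
    n_rows: int, device: str = "cpu", large: int = 100_000
) -> List[str]:
    """Return candidate model labels, skipping a slow one on large CPU runs."""
    candidates = ["logistic_regression", "random_forest", "gradient_boosting"]
    if n_rows > large and device == "cpu":
        candidates.remove("gradient_boosting")
    return candidates
```

Under this heuristic a target such as `[1.0, 2.0, 3.0]` is still treated as classification, because every value is integral and there are few distinct values; a real implementation may draw that line differently.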
@@ -0,0 +1,6 @@
+ from .core import KaizenStat, DataEngine, detect_device
+
+ __version__ = "0.2.2"
+
+ __all__ = ["KaizenStat", "DataEngine", "detect_device", "__version__"]
+
@@ -650,6 +650,11 @@ class KaizenStat:
          """
          df = DataEngine.load(data)
          data_path = data if isinstance(data, str) else "data.csv"
+         if isinstance(data, pd.DataFrame):
+             try:
+                 data.to_csv("data.csv", index=False)
+             except Exception:
+                 pass

          # Run pipeline to determine best model
          df_clean = KaizenStat.heal(df, target)
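
The hunk above writes an in-memory DataFrame to a fixed `data.csv` in the current working directory and silently ignores any failure. That can overwrite an existing file of the same name and leaves `data_path` pointing at a file that may not have been written. One possible alternative is sketched below. `persist_frame` is a hypothetical helper, not part of KaizenStat; it assumes only that `data` exposes pandas' `to_csv(path, index=False)` method.

```python
import os
import tempfile
import warnings
from typing import Optional


def persist_frame(data, directory: Optional[str] = None) -> Optional[str]:
    """Write a DataFrame-like object to a unique CSV file and return its path.

    Returns None, after emitting a warning, if the file cannot be written.
    """
    try:
        # mkstemp picks a name that does not collide with existing files.
        fd, path = tempfile.mkstemp(
            suffix=".csv", prefix="kaizenstat_", dir=directory
        )
    except OSError as exc:
        warnings.warn(f"could not create a temporary CSV: {exc}")
        return None
    os.close(fd)  # to_csv reopens the file by path
    try:
        data.to_csv(path, index=False)
    except OSError as exc:
        os.remove(path)
        warnings.warn(f"could not persist input data: {exc}")
        return None
    return path
```

Catching `OSError` instead of a bare `Exception` keeps programming errors visible, and returning the path lets the caller decide what to record as `data_path`.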
@@ -0,0 +1,172 @@
1
+ Metadata-Version: 2.4
2
+ Name: kaizenstat
3
+ Version: 0.2.2
4
+ Summary: Zero-friction AutoML + Data Cleaning Toolkit
5
+ Author: Masuddar Rahman
6
+ Requires-Python: >=3.8
7
+ Description-Content-Type: text/markdown
8
+ Requires-Dist: pandas
9
+ Requires-Dist: numpy
10
+ Requires-Dist: scikit-learn
11
+ Requires-Dist: rich
12
+ Requires-Dist: joblib
13
+ Provides-Extra: ui
14
+ Requires-Dist: streamlit; extra == "ui"
15
+ Provides-Extra: gpu
16
+ Requires-Dist: xgboost; extra == "gpu"
17
+ Provides-Extra: fast
18
+ Requires-Dist: polars; extra == "fast"
19
+ Provides-Extra: all
20
+ Requires-Dist: streamlit; extra == "all"
21
+ Requires-Dist: xgboost; extra == "all"
22
+ Requires-Dist: polars; extra == "all"
23
+ Dynamic: author
24
+ Dynamic: description
25
+ Dynamic: description-content-type
26
+ Dynamic: provides-extra
27
+ Dynamic: requires-dist
28
+ Dynamic: requires-python
29
+ Dynamic: summary
30
+
31
+ # 🚀 KaizenStat
32
+
33
+ [![PyPI Version](https://img.shields.io/pypi/v/kaizenstat.svg?style=flat-square&color=blue)](https://pypi.org/project/kaizenstat/)
34
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](https://opensource.org/licenses/MIT)
35
+ [![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg?style=flat-square)](https://www.python.org/downloads/)
36
+ [![Code Style: Black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/psf/black)
37
+
38
+ **KaizenStat** is a zero-friction, production-grade AutoML, automated data cleaning, and model explanation engine. It allows you to audit datasets, repair data issues, benchmark models with hardware-aware optimization, export standalone pipeline code, and host web-based dashboards—all with a single command or Python import.
39
+
40
+ ---
41
+
42
+ ## 🎯 Core Philosophy
43
+
44
+ * **Zero-Friction AutoML:** No complex configuration files. Pass your dataset, name your target, and KaizenStat does the rest.
45
+ * **Production Crash-Proofing:** Automatically handles messy real-world data issues: high-cardinality ID columns, datetime parsing, missing inputs, class imbalance, and label encoding.
46
+ * **Explainable AI:** Breaks open the "black box" by generating standalone, human-readable Python training code reproducing the best-found pipeline.
47
+ * **Hybrid Interface:** 100% parity between CLI and Python API.
48
+
49
+ ---
50
+
51
+ ## 📦 Installation
52
+
53
+ Install the core package with zero heavy external dependencies:
54
+
55
+ ```bash
56
+ pip install kaizenstat
57
+ ```
58
+
59
+ ### Optional Drivers & Accelerators
60
+
61
+ Tailor KaizenStat to your specific workload:
62
+
63
+ ```bash
64
+ pip install kaizenstat[ui] # Install Streamlit for web dashboards
65
+ pip install kaizenstat[gpu] # Install XGBoost with GPU/MPS support
66
+ pip install kaizenstat[fast] # Install Polars for ultra-fast CSV parsing
67
+ pip install kaizenstat[all] # Install all optional components
68
+ ```
69
+
70
+ ---
71
+
72
+ ## ⚔️ CLI & Python API Feature Matrix
73
+
74
+ KaizenStat is designed around a single unified vocabulary. Every CLI command has a direct, equivalent function in the Python SDK.
75
+
76
+ | Command | Python API | Purpose |
77
+ | :--- | :--- | :--- |
78
+ | `kz audit` | `KaizenStat.audit()` | 🔍 Runs a diagnostic sweep (missing values, duplicates, imbalance, dead features). |
79
+ | `kz heal` | `KaizenStat.heal()` | 🩹 Clean, impute, parse datetimes, drop IDs, and encode string labels. |
80
+ | `kz benchmark` | `KaizenStat.benchmark()` | 🚀 Automatically trains, optimizes, and ranks model pipelines. |
81
+ | `kz auto` | `KaizenStat.auto()` | ⚡ Orchestrates the entire pipeline in sequence (Audit ➔ Heal ➔ Benchmark). |
82
+ | `kz explain` | `KaizenStat.explain()` | 💬 Generates plain-English diagnostic summaries and model recommendations. |
83
+ | `kz codegen` | `KaizenStat.codegen()` | 📝 Generates standalone, dependency-free Python code for the best model. |
84
+ | `kz export-model` | `KaizenStat.save_model()` | 💾 Trains the top pipeline and saves it directly to a `.joblib` binary. |
85
+ | `kz report` | `KaizenStat.report()` | 📊 Generates a beautiful, interactive HTML profiling report with Chart.js. |
86
+ | `kz serve` | `KaizenStat.serve()` | 🌐 Launches a local web dashboard to explore the data and run predictions. |
87
+
88
+ ---
89
+
90
+ ## 💡 Quick Start Guide
91
+
92
+ ### 1. Python SDK Usage
93
+
94
+ ```python
95
+ from kaizenstat import KaizenStat
96
+ import pandas as pd
97
+
98
+ # Load your dataset
99
+ df = pd.read_csv("dataset.csv")
100
+
101
+ # 1. Diagnose issues
102
+ findings = KaizenStat.audit(df, target="target_column")
103
+
104
+ # 2. Automatically heal dirty data
105
+ clean_df = KaizenStat.heal(df, target="target_column")
106
+
107
+ # 3. Benchmark models with cross-validation
108
+ leaderboard = KaizenStat.benchmark(clean_df, target="target_column")
109
+
110
+ # 4. Generate standalone code for reproduction
111
+ KaizenStat.codegen("dataset.csv", target="target_column", output_path="reproduce.py")
112
+ ```
113
+
114
+ ### 2. Command Line Interface (CLI)
115
+
116
+ ```bash
117
+ # Get quick help and list commands
118
+ kz --help
119
+
120
+ # Run the full pipeline in one command
121
+ kz auto dataset.csv --target target_column
122
+
123
+ # Repair a dataset and save the clean file
124
+ kz heal dataset.csv --target target_column -o cleaned_dataset.csv
125
+
126
+ # Launch a local Streamlit app to preview and test model performance
127
+ kz serve dataset.csv --target target_column --port 8501
128
+ ```
129
+
130
+ ---
131
+
132
+ ## 🧠 Behind the Scenes: Core Engines
133
+
134
+ ### 1. Hardware-Aware Execution
135
+ KaizenStat automatically checks your environment using `detect_device()`. It leverages CUDA on Nvidia GPUs and MPS on Apple Silicon (M1/M2/M3 Mac) to accelerate training when optional dependencies (like `xgboost`) are installed.
136
+
137
+ ### 2. Smart Model Selection
138
+ The benchmarking engine adjusts its logic dynamically based on the dataset properties:
139
+ * **Large Datasets (>100k rows):** Excludes slow estimators (like Gradient Boosting) on standard CPU hosts to prevent compute lockups.
140
+ * **High-Cardinality Categoricals:** Optimizes feature preprocessors and prioritizes tree-based models (Random Forests, Gradient Boosting, XGBoost).
141
+ * **Float Targets:** Detects values with a continuous numeric profile and switches the entire pipeline to regression mode automatically.
142
+
143
+ ### 3. Automatic Imbalance Correction
144
+ During data healing, KaizenStat computes target ratios. If target class distribution has a skew larger than `65% / 35%`, it adjusts model parameters dynamically (e.g. setting `class_weight="balanced"` in scikit-learn estimators).
145
+
146
+ ---
147
+
148
+ ## 🛠 Developer Guide
149
+
150
+ ### Setting up a local workspace
151
+
152
+ To contribute or run local enhancements:
153
+
154
+ 1. Clone the repository:
155
+ ```bash
156
+ git clone https://github.com/masuddarrahaman/KaizenStat-Library.git
157
+ cd KaizenStat-Library
158
+ ```
159
+ 2. Install the package in editable mode with all optional drivers:
160
+ ```bash
161
+ pip install -e ".[all]"
162
+ ```
163
+ 3. Run tests or validation:
164
+ ```bash
165
+ python3 -m unittest discover -s tests
166
+ ```
167
+
168
+ ---
169
+
170
+ ## 📄 License
171
+
172
+ Distributed under the MIT License. See `LICENSE` for details.
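
The "Hardware-Aware Execution" section in the README above refers to `detect_device()`, whose implementation does not appear in this diff. As a rough illustration of what environment detection can look like without third-party imports, here is a heuristic sketch. `guess_device` is a hypothetical name, and the checks are coarse hints only: they do not confirm that any particular library can actually use the device.

```python
import platform
import shutil


def guess_device() -> str:
    """Return "cuda", "mps", or "cpu" based on coarse environment hints."""
    # An NVIDIA driver installation normally puts nvidia-smi on PATH.
    if shutil.which("nvidia-smi") is not None:
        return "cuda"
    # Apple Silicon Macs running a native interpreter report Darwin/arm64.
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mps"
    return "cpu"
```

A production check would more likely ask the training library itself whether it can see an accelerator, since a driver can be present while the library was built without GPU support.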
@@ -2,7 +2,7 @@ from setuptools import setup, find_packages

  setup(
      name="kaizenstat",
-     version="0.2.0",
+     version="0.2.2",
      author="Masuddar Rahman",
      description="Zero-friction AutoML + Data Cleaning Toolkit",
      long_description=open("README.md").read() if open("README.md") else "",
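
The unchanged `long_description` line in the hunk above does not guard against a missing README: `open()` raises `FileNotFoundError` instead of returning a falsy value, so the `else ""` branch is never reached, and both file handles are left unclosed. A minimal sketch of a safer pattern follows. `read_long_description` is a hypothetical helper name, not something present in the package.

```python
from pathlib import Path


def read_long_description(path: str = "README.md") -> str:
    """Return the README text, or an empty string if the file is absent."""
    readme = Path(path)
    # read_text opens and closes the file itself, so no handle is leaked.
    return readme.read_text(encoding="utf-8") if readme.is_file() else ""
```

In `setup.py` this would be used as `long_description=read_long_description()`.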
kaizenstat-0.2.0/PKG-INFO DELETED
@@ -1,176 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: kaizenstat
3
- Version: 0.2.0
4
- Summary: Zero-friction AutoML + Data Cleaning Toolkit
5
- Author: Masuddar Rahman
6
- Requires-Python: >=3.8
7
- Description-Content-Type: text/markdown
8
- Requires-Dist: pandas
9
- Requires-Dist: numpy
10
- Requires-Dist: scikit-learn
11
- Requires-Dist: rich
12
- Requires-Dist: joblib
13
- Provides-Extra: ui
14
- Requires-Dist: streamlit; extra == "ui"
15
- Provides-Extra: gpu
16
- Requires-Dist: xgboost; extra == "gpu"
17
- Provides-Extra: fast
18
- Requires-Dist: polars; extra == "fast"
19
- Provides-Extra: all
20
- Requires-Dist: streamlit; extra == "all"
21
- Requires-Dist: xgboost; extra == "all"
22
- Requires-Dist: polars; extra == "all"
23
- Dynamic: author
24
- Dynamic: description
25
- Dynamic: description-content-type
26
- Dynamic: provides-extra
27
- Dynamic: requires-dist
28
- Dynamic: requires-python
29
- Dynamic: summary
30
-
31
- # 🚀 KaizenStat
32
-
33
- [![PyPI Version](https://img.shields.io/pypi/v/kaizenstat.svg)](https://pypi.org/project/kaizenstat/)
34
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
35
- [![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
36
-
37
- **KaizenStat** is a zero-friction data validation, automatic cleaning, and AutoML benchmarking toolkit. Diagnose datasets instantly, auto-repair issues, train baseline models, generate standalone Python code, and launch interactive dashboards — all in one command.
38
-
39
- ---
40
-
41
- ## ✨ Features
42
-
43
- | Command | What it does |
44
- |---|---|
45
- | `kz audit` | 🔍 Diagnostic sweep — duplicates, NaNs, infs, ID columns, imbalance |
46
- | `kz heal` | 🩹 Auto-clean — impute, deduplicate, drop dead columns |
47
- | `kz benchmark` | 🚀 Train & rank ML models with cross-validation |
48
- | `kz auto` | ⚡ Full pipeline in one command (audit → heal → benchmark) |
49
- | `kz explain` | 💬 Plain-English summary of findings and recommendations |
50
- | `kz codegen` | 📝 Generate a standalone Python training script |
51
- | `kz export-model` | 💾 Train best model and save to `.joblib` |
52
- | `kz report` | 📊 Generate interactive HTML report with charts |
53
- | `kz serve` | 🌐 Launch interactive Streamlit web dashboard |
54
-
55
- ---
56
-
57
- ## 📦 Installation
58
-
59
- ```bash
60
- pip install kaizenstat
61
- ```
62
-
63
- **Optional extras:**
64
-
65
- ```bash
66
- pip install kaizenstat[ui] # + Streamlit dashboard
67
- pip install kaizenstat[gpu] # + XGBoost GPU support
68
- pip install kaizenstat[fast] # + Polars fast data loading
69
- pip install kaizenstat[all] # everything
70
- ```
71
-
72
- ---
73
-
74
- ## 🚀 Quick Start
75
-
76
- ### Python API
77
-
78
- ```python
79
- from kaizenstat import KaizenStat
80
-
81
- # Full pipeline in one call
82
- KaizenStat.auto("data.csv", target="price")
83
-
84
- # Or step-by-step
85
- import pandas as pd
86
- df = pd.read_csv("data.csv")
87
-
88
- KaizenStat.audit(df, target="price")
89
- df_clean = KaizenStat.heal(df, target="price")
90
- results = KaizenStat.benchmark(df_clean, target="price")
91
- ```
92
-
93
- ### 💬 Get a Plain-English Explanation
94
-
95
- ```python
96
- KaizenStat.explain("data.csv", target="price")
97
- ```
98
-
99
- ### 📝 Generate Standalone Code
100
-
101
- ```python
102
- KaizenStat.codegen("data.csv", target="price", output_path="deploy.py")
103
- ```
104
-
105
- ### 💾 Export & Load Models
106
-
107
- ```python
108
- # Train + save
109
- KaizenStat.auto("data.csv", target="price")
110
- KaizenStat.save_model(path="model.joblib")
111
-
112
- # Load later
113
- pipeline = KaizenStat.load_model("model.joblib")
114
- predictions = pipeline.predict(new_data)
115
- ```
116
-
117
- ### 📊 Generate HTML Report
118
-
119
- ```python
120
- KaizenStat.report("data.csv", target="price", output_path="report.html")
121
- ```
122
-
123
- ### 🌐 Launch Web Dashboard
124
-
125
- ```python
126
- KaizenStat.serve("data.csv", target="price")
127
- ```
128
-
129
- ---
130
-
131
- ## 💻 CLI Usage
132
-
133
- ```bash
134
- # Diagnostic sweep
135
- kz audit data.csv --target price
136
-
137
- # Auto-clean dataset
138
- kz heal data.csv --target price -o clean.csv
139
-
140
- # Train & rank models
141
- kz benchmark clean.csv --target price
142
-
143
- # Full pipeline
144
- kz auto data.csv --target price
145
-
146
- # Plain-English summary
147
- kz explain data.csv --target price
148
-
149
- # Generate standalone Python script
150
- kz codegen data.csv --target price -o deploy.py
151
-
152
- # Train best model and export
153
- kz export-model data.csv --target price -o model.joblib
154
-
155
- # Generate interactive HTML report
156
- kz report data.csv --target price -o report.html
157
-
158
- # Launch web dashboard
159
- kz serve data.csv --target price
160
- ```
161
-
162
- ---
163
-
164
- ## 🛠 Development
165
-
166
- ```bash
167
- git clone https://github.com/yourusername/kaizenstat.git
168
- cd kaizenstat
169
- pip install -e ".[all]"
170
- ```
171
-
172
- ---
173
-
174
- ## 📄 License
175
-
176
- Distributed under the MIT License.
@@ -1,146 +0,0 @@
1
- # 🚀 KaizenStat
2
-
3
- [![PyPI Version](https://img.shields.io/pypi/v/kaizenstat.svg)](https://pypi.org/project/kaizenstat/)
4
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
5
- [![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
6
-
7
- **KaizenStat** is a zero-friction data validation, automatic cleaning, and AutoML benchmarking toolkit. Diagnose datasets instantly, auto-repair issues, train baseline models, generate standalone Python code, and launch interactive dashboards — all in one command.
8
-
9
- ---
10
-
11
- ## ✨ Features
12
-
13
- | Command | What it does |
14
- |---|---|
15
- | `kz audit` | 🔍 Diagnostic sweep — duplicates, NaNs, infs, ID columns, imbalance |
16
- | `kz heal` | 🩹 Auto-clean — impute, deduplicate, drop dead columns |
17
- | `kz benchmark` | 🚀 Train & rank ML models with cross-validation |
18
- | `kz auto` | ⚡ Full pipeline in one command (audit → heal → benchmark) |
19
- | `kz explain` | 💬 Plain-English summary of findings and recommendations |
20
- | `kz codegen` | 📝 Generate a standalone Python training script |
21
- | `kz export-model` | 💾 Train best model and save to `.joblib` |
22
- | `kz report` | 📊 Generate interactive HTML report with charts |
23
- | `kz serve` | 🌐 Launch interactive Streamlit web dashboard |
24
-
25
- ---
26
-
27
- ## 📦 Installation
28
-
29
- ```bash
30
- pip install kaizenstat
31
- ```
32
-
33
- **Optional extras:**
34
-
35
- ```bash
36
- pip install kaizenstat[ui] # + Streamlit dashboard
37
- pip install kaizenstat[gpu] # + XGBoost GPU support
38
- pip install kaizenstat[fast] # + Polars fast data loading
39
- pip install kaizenstat[all] # everything
40
- ```
41
-
42
- ---
43
-
44
- ## 🚀 Quick Start
45
-
46
- ### Python API
47
-
48
- ```python
49
- from kaizenstat import KaizenStat
50
-
51
- # Full pipeline in one call
52
- KaizenStat.auto("data.csv", target="price")
53
-
54
- # Or step-by-step
55
- import pandas as pd
56
- df = pd.read_csv("data.csv")
57
-
58
- KaizenStat.audit(df, target="price")
59
- df_clean = KaizenStat.heal(df, target="price")
60
- results = KaizenStat.benchmark(df_clean, target="price")
61
- ```
62
-
63
- ### 💬 Get a Plain-English Explanation
64
-
65
- ```python
66
- KaizenStat.explain("data.csv", target="price")
67
- ```
68
-
69
- ### 📝 Generate Standalone Code
70
-
71
- ```python
72
- KaizenStat.codegen("data.csv", target="price", output_path="deploy.py")
73
- ```
74
-
75
- ### 💾 Export & Load Models
76
-
77
- ```python
78
- # Train + save
79
- KaizenStat.auto("data.csv", target="price")
80
- KaizenStat.save_model(path="model.joblib")
81
-
82
- # Load later
83
- pipeline = KaizenStat.load_model("model.joblib")
84
- predictions = pipeline.predict(new_data)
85
- ```
86
-
87
- ### 📊 Generate HTML Report
88
-
89
- ```python
90
- KaizenStat.report("data.csv", target="price", output_path="report.html")
91
- ```
92
-
93
- ### 🌐 Launch Web Dashboard
94
-
95
- ```python
96
- KaizenStat.serve("data.csv", target="price")
97
- ```
98
-
99
- ---
100
-
101
- ## 💻 CLI Usage
102
-
103
- ```bash
104
- # Diagnostic sweep
105
- kz audit data.csv --target price
106
-
107
- # Auto-clean dataset
108
- kz heal data.csv --target price -o clean.csv
109
-
110
- # Train & rank models
111
- kz benchmark clean.csv --target price
112
-
113
- # Full pipeline
114
- kz auto data.csv --target price
115
-
116
- # Plain-English summary
117
- kz explain data.csv --target price
118
-
119
- # Generate standalone Python script
120
- kz codegen data.csv --target price -o deploy.py
121
-
122
- # Train best model and export
123
- kz export-model data.csv --target price -o model.joblib
124
-
125
- # Generate interactive HTML report
126
- kz report data.csv --target price -o report.html
127
-
128
- # Launch web dashboard
129
- kz serve data.csv --target price
130
- ```
131
-
132
- ---
133
-
134
- ## 🛠 Development
135
-
136
- ```bash
137
- git clone https://github.com/yourusername/kaizenstat.git
138
- cd kaizenstat
139
- pip install -e ".[all]"
140
- ```
141
-
142
- ---
143
-
144
- ## 📄 License
145
-
146
- Distributed under the MIT License.
@@ -1,3 +0,0 @@
1
- from .core import KaizenStat, DataEngine, detect_device
2
-
3
- __all__ = ["KaizenStat", "DataEngine", "detect_device"]
@@ -1,176 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: kaizenstat
3
- Version: 0.2.0
4
- Summary: Zero-friction AutoML + Data Cleaning Toolkit
5
- Author: Masuddar Rahman
6
- Requires-Python: >=3.8
7
- Description-Content-Type: text/markdown
8
- Requires-Dist: pandas
9
- Requires-Dist: numpy
10
- Requires-Dist: scikit-learn
11
- Requires-Dist: rich
12
- Requires-Dist: joblib
13
- Provides-Extra: ui
14
- Requires-Dist: streamlit; extra == "ui"
15
- Provides-Extra: gpu
16
- Requires-Dist: xgboost; extra == "gpu"
17
- Provides-Extra: fast
18
- Requires-Dist: polars; extra == "fast"
19
- Provides-Extra: all
20
- Requires-Dist: streamlit; extra == "all"
21
- Requires-Dist: xgboost; extra == "all"
22
- Requires-Dist: polars; extra == "all"
23
- Dynamic: author
24
- Dynamic: description
25
- Dynamic: description-content-type
26
- Dynamic: provides-extra
27
- Dynamic: requires-dist
28
- Dynamic: requires-python
29
- Dynamic: summary
30
-
31
- # 🚀 KaizenStat
32
-
33
- [![PyPI Version](https://img.shields.io/pypi/v/kaizenstat.svg)](https://pypi.org/project/kaizenstat/)
34
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
35
- [![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
36
-
37
- **KaizenStat** is a zero-friction data validation, automatic cleaning, and AutoML benchmarking toolkit. Diagnose datasets instantly, auto-repair issues, train baseline models, generate standalone Python code, and launch interactive dashboards — all in one command.
38
-
39
- ---
40
-
41
- ## ✨ Features
42
-
43
- | Command | What it does |
44
- |---|---|
45
- | `kz audit` | 🔍 Diagnostic sweep — duplicates, NaNs, infs, ID columns, imbalance |
46
- | `kz heal` | 🩹 Auto-clean — impute, deduplicate, drop dead columns |
47
- | `kz benchmark` | 🚀 Train & rank ML models with cross-validation |
48
- | `kz auto` | ⚡ Full pipeline in one command (audit → heal → benchmark) |
49
- | `kz explain` | 💬 Plain-English summary of findings and recommendations |
50
- | `kz codegen` | 📝 Generate a standalone Python training script |
51
- | `kz export-model` | 💾 Train best model and save to `.joblib` |
52
- | `kz report` | 📊 Generate interactive HTML report with charts |
53
- | `kz serve` | 🌐 Launch interactive Streamlit web dashboard |
54
-
55
- ---
56
-
57
- ## 📦 Installation
58
-
59
- ```bash
60
- pip install kaizenstat
61
- ```
62
-
63
- **Optional extras:**
64
-
65
- ```bash
66
- pip install "kaizenstat[ui]"    # + Streamlit dashboard
67
- pip install "kaizenstat[gpu]"   # + XGBoost GPU support
68
- pip install "kaizenstat[fast]"  # + Polars fast data loading
69
- pip install "kaizenstat[all]"   # everything
70
- ```
71
-
72
- ---
73
-
74
- ## 🚀 Quick Start
75
-
76
- ### Python API
77
-
78
- ```python
79
- from kaizenstat import KaizenStat
80
-
81
- # Full pipeline in one call
82
- KaizenStat.auto("data.csv", target="price")
83
-
84
- # Or step-by-step
85
- import pandas as pd
86
- df = pd.read_csv("data.csv")
87
-
88
- KaizenStat.audit(df, target="price")
89
- df_clean = KaizenStat.heal(df, target="price")
90
- results = KaizenStat.benchmark(df_clean, target="price")
91
- ```
92
-
93
- ### 💬 Get a Plain-English Explanation
94
-
95
- ```python
96
- KaizenStat.explain("data.csv", target="price")
97
- ```
98
-
99
- ### 📝 Generate Standalone Code
100
-
101
- ```python
102
- KaizenStat.codegen("data.csv", target="price", output_path="deploy.py")
103
- ```
104
-
105
- ### 💾 Export & Load Models
106
-
107
- ```python
108
- # Train + save
109
- KaizenStat.auto("data.csv", target="price")
110
- KaizenStat.save_model(path="model.joblib")
111
-
112
- # Load later
113
- pipeline = KaizenStat.load_model("model.joblib")
114
- predictions = pipeline.predict(new_data)  # new_data: rows to score, with the same feature columns as the training data
115
- ```
116
-
117
- ### 📊 Generate HTML Report
118
-
119
- ```python
120
- KaizenStat.report("data.csv", target="price", output_path="report.html")
121
- ```
122
-
123
- ### 🌐 Launch Web Dashboard
124
-
125
- ```python
126
- KaizenStat.serve("data.csv", target="price")
127
- ```
128
-
129
- ---
130
-
131
- ## 💻 CLI Usage
132
-
133
- ```bash
134
- # Diagnostic sweep
135
- kz audit data.csv --target price
136
-
137
- # Auto-clean dataset
138
- kz heal data.csv --target price -o clean.csv
139
-
140
- # Train & rank models
141
- kz benchmark clean.csv --target price
142
-
143
- # Full pipeline
144
- kz auto data.csv --target price
145
-
146
- # Plain-English summary
147
- kz explain data.csv --target price
148
-
149
- # Generate standalone Python script
150
- kz codegen data.csv --target price -o deploy.py
151
-
152
- # Train best model and export
153
- kz export-model data.csv --target price -o model.joblib
154
-
155
- # Generate interactive HTML report
156
- kz report data.csv --target price -o report.html
157
-
158
- # Launch web dashboard
159
- kz serve data.csv --target price
160
- ```
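The invocations above share one shape: a subcommand, an input path, a `--target` column, and for some commands an `-o` output path. As a minimal sketch of how a command-line interface with that shape could be wired using the standard-library `argparse` module (this is not KaizenStat's actual implementation; `build_parser`, the `--output` long form, and treating `--target` as required are assumptions made for illustration):

```python
import argparse

# Subcommand names and the commands that take -o are copied from the CLI
# examples above. build_parser, the --output long form, and the required
# --target flag are hypothetical choices, not the package's real code.
COMMANDS = ["audit", "heal", "benchmark", "auto", "explain",
            "codegen", "export-model", "report", "serve"]
COMMANDS_WITH_OUTPUT = {"heal", "codegen", "export-model", "report"}


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser that accepts the `kz` invocations shown above."""
    parser = argparse.ArgumentParser(prog="kz")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in COMMANDS:
        sub = subparsers.add_parser(name)
        sub.add_argument("path", help="input CSV file")
        sub.add_argument("--target", required=True, help="target column name")
        if name in COMMANDS_WITH_OUTPUT:
            sub.add_argument("-o", "--output", help="output file path")
    return parser


# Parse the equivalent of: kz heal data.csv --target price -o clean.csv
args = build_parser().parse_args(
    ["heal", "data.csv", "--target", "price", "-o", "clean.csv"]
)
print(args.command, args.path, args.target, args.output)
# prints: heal data.csv price clean.csv
```

Registering every subcommand in one loop keeps the shared arguments in a single place; a real tool would likely attach a handler function to each subparser as well.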
161
-
162
- ---
163
-
164
- ## 🛠 Development
165
-
166
- ```bash
167
- git clone https://github.com/yourusername/kaizenstat.git
168
- cd kaizenstat
169
- pip install -e ".[all]"
170
- ```
171
-
172
- ---
173
-
174
- ## 📄 License
175
-
176
- Distributed under the MIT License.
File without changes
File without changes
File without changes