kmds-modeling 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- kmds_modeling-0.1.0/MANIFEST.in +10 -0
- kmds_modeling-0.1.0/PKG-INFO +54 -0
- kmds_modeling-0.1.0/README.md +40 -0
- kmds_modeling-0.1.0/pyproject.toml +26 -0
- kmds_modeling-0.1.0/setup.cfg +4 -0
- kmds_modeling-0.1.0/src/kmds_modeling/__init__.py +10 -0
- kmds_modeling-0.1.0/src/kmds_modeling/cli.py +32 -0
- kmds_modeling-0.1.0/src/kmds_modeling/core/__init__.py +17 -0
- kmds_modeling-0.1.0/src/kmds_modeling/core/base.py +36 -0
- kmds_modeling-0.1.0/src/kmds_modeling/core/notebook_utils.py +48 -0
- kmds_modeling-0.1.0/src/kmds_modeling/core/path_coordinator.py +70 -0
- kmds_modeling-0.1.0/src/kmds_modeling/core/runner.py +123 -0
- kmds_modeling-0.1.0/src/kmds_modeling/examples/__init__.py +5 -0
- kmds_modeling-0.1.0/src/kmds_modeling/examples/example_candidate.py +21 -0
- kmds_modeling-0.1.0/src/kmds_modeling/examples/example_transformer.py +16 -0
- kmds_modeling-0.1.0/src/kmds_modeling.egg-info/PKG-INFO +54 -0
- kmds_modeling-0.1.0/src/kmds_modeling.egg-info/SOURCES.txt +19 -0
- kmds_modeling-0.1.0/src/kmds_modeling.egg-info/dependency_links.txt +1 -0
- kmds_modeling-0.1.0/src/kmds_modeling.egg-info/entry_points.txt +2 -0
- kmds_modeling-0.1.0/src/kmds_modeling.egg-info/requires.txt +7 -0
- kmds_modeling-0.1.0/src/kmds_modeling.egg-info/top_level.txt +1 -0
@@ -0,0 +1,54 @@
+Metadata-Version: 2.4
+Name: kmds-modeling
+Version: 0.1.0
+Summary: KMDS modeling pipeline package for KMDS lifecycle and model selection.
+Requires-Python: >=3.13
+Description-Content-Type: text/markdown
+Requires-Dist: kmds-featurization>=0.1.5
+Requires-Dist: click>=8.0
+Requires-Dist: pandas>=2.0
+Requires-Dist: numpy>=1.26
+Requires-Dist: scikit-learn>=1.4
+Requires-Dist: PyYAML>=6.0
+Requires-Dist: joblib>=1.3
+
+# KMDS Modeling
+
+`kmds-modeling` is a lightweight modeling package designed to work inside the KMDS ecosystem. It provides generic modeling infrastructure and pipeline utilities for KMDS-style workflows, while leaving domain-specific examples and workspace-specific implementations separate.
+
+## What this package provides
+- `src/kmds_modeling/core` — generic modeling package infrastructure
+- `src/kmds_modeling/core/path_coordinator.py` — workspace-rooted path resolution for KMDS modeling
+- `src/kmds_modeling/core/notebook_utils.py` — notebook-friendly workspace resolver
+- `src/kmds_modeling/cli.py` — installable CLI glue for evaluation and export
+- `models/sba_example` — an example SBA-specific modeling workflow kept outside the installed package
+
+## Intended usage
+This package is meant to be installed into a KMDS workspace and used against modeling artifacts generated by KMDS tools such as `kmds-featurization`. The package does not embed any domain-specific SBA implementation in the installable distribution.
+
+## Installation
+```bash
+pip install kmds-modeling
+```
+
+## CLI commands
+After installing, the package exposes the `kmds-modeling` CLI:
+
+```bash
+kmds-modeling evaluate --config /path/to/modeling_config.yaml
+kmds-modeling export --config /path/to/modeling_config.yaml
+```
+
+## Working directory and configuration
+KMDS modeling expects a `working_dir` and a `modeling_config.yaml` that defines the workspace layout. The package resolves paths using the `PathCoordinator` and writes modeling outputs into the workspace `models/` directory by default.
+
+## Example workflow
+1. Use KMDS featurization to generate `model_ready_numeric_data.csv` under `data/featurization/`.
+2. Create `modeling_config.yaml` with a `working_dir` pointing to your KMDS workspace.
+3. Run the package CLI to evaluate and export model artifacts.
+
+## Packaging note
+The published PyPI package should only contain the generic package code under `src/kmds_modeling/`. Workspace-specific examples such as `models/sba_example/` are intentionally kept outside the installable package source tree.
+
+## Contributing
+If you want to add another KMDS modeling example, put it under `models/<example_name>/` and leave the core package unchanged.
@@ -0,0 +1,40 @@
+# KMDS Modeling
+
+`kmds-modeling` is a lightweight modeling package designed to work inside the KMDS ecosystem. It provides generic modeling infrastructure and pipeline utilities for KMDS-style workflows, while leaving domain-specific examples and workspace-specific implementations separate.
+
+## What this package provides
+- `src/kmds_modeling/core` — generic modeling package infrastructure
+- `src/kmds_modeling/core/path_coordinator.py` — workspace-rooted path resolution for KMDS modeling
+- `src/kmds_modeling/core/notebook_utils.py` — notebook-friendly workspace resolver
+- `src/kmds_modeling/cli.py` — installable CLI glue for evaluation and export
+- `models/sba_example` — an example SBA-specific modeling workflow kept outside the installed package
+
+## Intended usage
+This package is meant to be installed into a KMDS workspace and used against modeling artifacts generated by KMDS tools such as `kmds-featurization`. The package does not embed any domain-specific SBA implementation in the installable distribution.
+
+## Installation
+```bash
+pip install kmds-modeling
+```
+
+## CLI commands
+After installing, the package exposes the `kmds-modeling` CLI:
+
+```bash
+kmds-modeling evaluate --config /path/to/modeling_config.yaml
+kmds-modeling export --config /path/to/modeling_config.yaml
+```
+
+## Working directory and configuration
+KMDS modeling expects a `working_dir` and a `modeling_config.yaml` that defines the workspace layout. The package resolves paths using the `PathCoordinator` and writes modeling outputs into the workspace `models/` directory by default.
+
+## Example workflow
+1. Use KMDS featurization to generate `model_ready_numeric_data.csv` under `data/featurization/`.
+2. Create `modeling_config.yaml` with a `working_dir` pointing to your KMDS workspace.
+3. Run the package CLI to evaluate and export model artifacts.
+
+## Packaging note
+The published PyPI package should only contain the generic package code under `src/kmds_modeling/`. Workspace-specific examples such as `models/sba_example/` are intentionally kept outside the installable package source tree.
+
+## Contributing
+If you want to add another KMDS modeling example, put it under `models/<example_name>/` and leave the core package unchanged.
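The README does not show a sample `modeling_config.yaml`. The sketch below collects the keys that `core/runner.py` and `core/path_coordinator.py` in this diff appear to read. It is written as a Python dictionary rather than YAML so that it needs only the standard library. Every value (paths, names, the metric label) is an illustrative assumption, and `missing_keys` is a hypothetical helper that is not part of the package.

```python
# Hypothetical config shape inferred from the keys read in runner.py and
# path_coordinator.py. All values are placeholders unless noted otherwise.
EXAMPLE_CONFIG = {
    "project": {
        "name": "demo_project",
        "experiment_version": "v1",
        "target_variable": "target",
    },
    "data": {
        "working_dir": "/path/to/kmds_workspace",  # falls back to the config's folder
        "index_column": "record_id",               # optional
    },
    # Optional top-level overrides read by PathCoordinator (its defaults shown).
    "model_ready_data_file": "model_ready_numeric_data.csv",
    "featurization_output_dir": "featurization",
    "modeling_output_dir": "models",
    "experiment_settings": {
        "primary_metric": "roc_auc",
        "cross_validation": {"splits": 5, "random_state": 42},
    },
    "candidates": [
        {
            "name": "baseline",
            "class_path": "kmds_modeling.examples.example_candidate.ExampleCandidate",
            "hyperparameters": {"strategy": "prior"},
        }
    ],
    "production_target": {
        "champion_candidate_name": "baseline",
        "export_directory": "/path/to/kmds_workspace/models",
    },
}

# Dotted key paths that the runner indexes directly (a KeyError if absent).
REQUIRED_KEYS = [
    "data",
    "project.name",
    "project.experiment_version",
    "project.target_variable",
    "experiment_settings.primary_metric",
    "experiment_settings.cross_validation.splits",
    "experiment_settings.cross_validation.random_state",
    "candidates",
    "production_target.champion_candidate_name",
    "production_target.export_directory",
]


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the dotted key paths from `required` that are absent in `config`."""
    missing = []
    for dotted in required:
        node = config
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]
    return missing


print(missing_keys(EXAMPLE_CONFIG))
```

Checking a loaded configuration this way before constructing the runner would surface a missing section as a readable list instead of a `KeyError` partway through evaluation.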
@@ -0,0 +1,26 @@
+[project]
+name = "kmds-modeling"
+version = "0.1.0"
+description = "KMDS modeling pipeline package for KMDS lifecycle and model selection."
+readme = "README.md"
+requires-python = ">=3.13"
+dependencies = [
+    "kmds-featurization>=0.1.5",
+    "click>=8.0",
+    "pandas>=2.0",
+    "numpy>=1.26",
+    "scikit-learn>=1.4",
+    "PyYAML>=6.0",
+    "joblib>=1.3",
+]
+
+[project.scripts]
+kmds-modeling = "kmds_modeling.cli:cli"
+
+[build-system]
+requires = ["setuptools>=65.0", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[tool.setuptools]
+package-dir = {"" = "src"}
+packages = { find = { where = ["src"] } }
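The `[project.scripts]` table maps the `kmds-modeling` command to the object reference `kmds_modeling.cli:cli`. As a minimal sketch of how such a `module:attribute` reference resolves, the snippet below uses the standard library's `importlib.metadata.EntryPoint`. It loads `json:dumps` as a stand-in target so that it runs without the package installed; the entry point name `demo` is made up for illustration.

```python
from importlib.metadata import EntryPoint

# Same "module:attribute" syntax as the kmds-modeling script entry, but pointing
# at a standard-library function so the sketch runs anywhere.
ep = EntryPoint(name="demo", value="json:dumps", group="console_scripts")

print(ep.module)  # module part of the reference: json
print(ep.attr)    # attribute part of the reference: dumps

func = ep.load()  # imports the module and returns the referenced object
print(func({"ok": True}))
```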
@@ -0,0 +1,32 @@
+import click
+from .core.runner import ExperimentRunner
+
+
+@click.group()
+def cli():
+    """KMDS Modeling CLI."""
+    pass
+
+
+@cli.command()
+@click.option("--config", required=True, type=click.Path(exists=True), help="Path to modeling_config.yaml")
+def evaluate(config):
+    """Run model evaluation for configured candidates."""
+    runner = ExperimentRunner(config)
+    click.echo("Starting candidate model evaluation...")
+    df = runner.run_evaluation()
+    click.echo("\n--- EXPERIMENT RESULTS LEADERBOARD ---")
+    click.echo(df.to_string(index=False))
+
+
+@cli.command()
+@click.option("--config", required=True, type=click.Path(exists=True), help="Path to modeling_config.yaml")
+def export(config):
+    """Export the selected champion model using the configured production target."""
+    runner = ExperimentRunner(config)
+    click.echo("Exporting the champion model artifacts...")
+    runner.export_champion()
+
+
+if __name__ == '__main__':
+    cli()
@@ -0,0 +1,17 @@
+from .path_coordinator import PathCoordinator
+from .notebook_utils import (
+    build_notebook_resolver,
+    get_modeling_artifact_paths,
+    load_model_ready_dataset,
+    load_workspace_config,
+    resolve_notebook_workspace_root,
+)
+
+__all__ = [
+    "PathCoordinator",
+    "build_notebook_resolver",
+    "get_modeling_artifact_paths",
+    "load_model_ready_dataset",
+    "load_workspace_config",
+    "resolve_notebook_workspace_root",
+]
@@ -0,0 +1,36 @@
+from abc import ABC, abstractmethod
+import pandas as pd
+import numpy as np
+
+class BaseFeatureTransformer(ABC):
+    """Interface for ad-hoc dataset-level transformations."""
+
+    @abstractmethod
+    def fit(self, X: pd.DataFrame, y: pd.Series = None):
+        """Fit internal parameters based on training data."""
+        pass
+
+    @abstractmethod
+    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
+        """Transforms dataset features while maintaining the exact input index."""
+        pass
+
+    def fit_transform(self, X: pd.DataFrame, y: pd.Series = None) -> pd.DataFrame:
+        return self.fit(X, y).transform(X)
+
+
+class BaseModelCandidate(ABC):
+    """Interface for uniform model orchestration."""
+
+    def __init__(self, hyperparameters: dict):
+        self.hyperparameters = hyperparameters
+
+    @abstractmethod
+    def fit(self, X_train: pd.DataFrame, y_train: pd.Series):
+        """Train the underlying model."""
+        pass
+
+    @abstractmethod
+    def predict_proba(self, X: pd.DataFrame) -> np.ndarray:
+        """Return class probabilities. Must return a 2D array [prob_class_0, prob_class_1]."""
+        pass
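`fit_transform` chains `self.fit(X, y).transform(X)`, so a concrete `fit` has to return `self`; a `fit` that returns `None` would fail with an `AttributeError`. The sketch below illustrates that contract with a pared-down stand-in for `BaseFeatureTransformer` that works on plain lists instead of DataFrames, so it needs no third-party imports. `MiniTransformer` and `MeanCenter` are hypothetical names, not classes from the package.

```python
from abc import ABC, abstractmethod


class MiniTransformer(ABC):
    """Pared-down stand-in for BaseFeatureTransformer (lists, not DataFrames)."""

    @abstractmethod
    def fit(self, X, y=None):
        ...

    @abstractmethod
    def transform(self, X):
        ...

    def fit_transform(self, X, y=None):
        # Same chaining as in base.py: relies on fit() returning self.
        return self.fit(X, y).transform(X)


class MeanCenter(MiniTransformer):
    """Subtracts the mean learned during fit()."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self  # required for fit_transform() chaining

    def transform(self, X):
        return [x - self.mean_ for x in X]


centered = MeanCenter().fit_transform([1.0, 2.0, 3.0])
print(centered)  # [-1.0, 0.0, 1.0]
```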
@@ -0,0 +1,48 @@
+import os
+from typing import Dict
+
+import pandas as pd
+import yaml
+
+from .path_coordinator import PathCoordinator
+
+
+def resolve_notebook_workspace_root(working_dir: str, config_name: str = "modeling_config.yaml") -> str:
+    working_dir = os.path.abspath(working_dir)
+    if not os.path.isdir(working_dir):
+        raise FileNotFoundError(f"Notebook directory does not exist: {working_dir}")
+
+    return working_dir
+
+
+def load_workspace_config(working_dir: str, config_name: str = "modeling_config.yaml") -> Dict:
+    workspace_root = resolve_notebook_workspace_root(working_dir, config_name=config_name)
+    config_path = os.path.join(workspace_root, config_name)
+    if not os.path.isfile(config_path):
+        raise FileNotFoundError(f"Modeling config not found at: {config_path}")
+
+    with open(config_path, "r", encoding="utf-8") as f:
+        return yaml.safe_load(f) or {}
+
+
+def build_notebook_resolver(working_dir: str, config_name: str = "modeling_config.yaml") -> PathCoordinator:
+    config = load_workspace_config(working_dir, config_name=config_name)
+    return PathCoordinator(working_dir=working_dir, config=config)
+
+
+def get_modeling_artifact_paths(resolver: PathCoordinator) -> Dict[str, str]:
+    return {
+        "model_ready_dataset_path": resolver.model_ready_dataset_path,
+        "model_weights_path": resolver.model_weights_path,
+        "feature_pipeline_path": resolver.feature_pipeline_path,
+        "calibrator_path": resolver.calibrator_path,
+        "metadata_path": resolver.metadata_path,
+        "active_scores_path": resolver.active_scores_path,
+    }
+
+
+def load_model_ready_dataset(resolver: PathCoordinator, **read_csv_kwargs) -> pd.DataFrame:
+    path = resolver.model_ready_dataset_path
+    if not os.path.isfile(path):
+        raise FileNotFoundError(f"Model-ready dataset not found at: {path}")
+    return pd.read_csv(path, **read_csv_kwargs)
@@ -0,0 +1,70 @@
+import os
+from typing import Any, Dict
+
+
+class PathCoordinator:
+    """Resolves KMDS modeling package paths from the workspace working directory."""
+
+    def __init__(self, working_dir: str, config: Dict[str, Any]):
+        self.working_dir = os.path.abspath(working_dir)
+        self.config = config or {}
+
+    def _remove_anchor_prefix(self, config_value: str, anchor: str) -> str:
+        if config_value.startswith(anchor + os.sep):
+            return config_value.replace(anchor + os.sep, "", 1)
+        return config_value
+
+    @property
+    def model_ready_data_file(self) -> str:
+        return self.config.get("model_ready_data_file", "model_ready_numeric_data.csv")
+
+    @property
+    def featurization_output_dir(self) -> str:
+        return self.config.get("featurization_output_dir", "featurization")
+
+    @property
+    def modeling_output_dir(self) -> str:
+        return self.config.get("modeling_output_dir", "models")
+
+    def _resolve_data_dir(self, config_value: str) -> str:
+        if os.path.isabs(config_value):
+            return config_value
+        config_value = self._remove_anchor_prefix(config_value, "data")
+        return os.path.join(self.working_dir, "data", config_value)
+
+    def _resolve_workspace_dir(self, config_value: str) -> str:
+        if os.path.isabs(config_value):
+            return config_value
+        return os.path.join(self.working_dir, config_value)
+
+    @property
+    def featurization_output_path(self) -> str:
+        return self._resolve_data_dir(self.featurization_output_dir)
+
+    @property
+    def modeling_output_path(self) -> str:
+        return self._resolve_workspace_dir(self.modeling_output_dir)
+
+    @property
+    def model_ready_dataset_path(self) -> str:
+        return os.path.join(self.featurization_output_path, self.model_ready_data_file)
+
+    @property
+    def model_weights_path(self) -> str:
+        return os.path.join(self.modeling_output_path, "model_weights.pkl")
+
+    @property
+    def feature_pipeline_path(self) -> str:
+        return os.path.join(self.modeling_output_path, "feature_pipeline.pkl")
+
+    @property
+    def calibrator_path(self) -> str:
+        return os.path.join(self.modeling_output_path, "calibrator.pkl")
+
+    @property
+    def metadata_path(self) -> str:
+        return os.path.join(self.modeling_output_path, "metadata.json")
+
+    @property
+    def active_scores_path(self) -> str:
+        return os.path.join(self.modeling_output_path, "active_set_scores.csv")
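The resolution rules in `PathCoordinator` come down to three cases: absolute values pass through unchanged, data directories are joined under `<working_dir>/data` after a leading `data/` anchor is stripped, and other workspace directories are joined directly under `working_dir`. A minimal standalone sketch of those rules follows. It uses `posixpath` so the results are the same on every platform (the package itself uses `os.path`), and the function names and the `/ws` workspace path are illustrative only.

```python
import posixpath


def resolve_data_dir(working_dir, value):
    """Data-dir rule: absolute passes through, else <working_dir>/data/<value>."""
    if posixpath.isabs(value):
        return value
    prefix = "data" + posixpath.sep
    if value.startswith(prefix):
        value = value[len(prefix):]  # strip a leading "data/" anchor once
    return posixpath.join(working_dir, "data", value)


def resolve_workspace_dir(working_dir, value):
    """Workspace-dir rule: absolute passes through, else <working_dir>/<value>."""
    if posixpath.isabs(value):
        return value
    return posixpath.join(working_dir, value)


print(resolve_data_dir("/ws", "featurization"))       # /ws/data/featurization
print(resolve_data_dir("/ws", "data/featurization"))  # /ws/data/featurization
print(resolve_data_dir("/ws", "/abs/features"))       # /abs/features
print(resolve_workspace_dir("/ws", "models"))         # /ws/models
```

Stripping the anchor means that `featurization` and `data/featurization` in the configuration both resolve to the same directory, rather than the second producing `data/data/featurization`.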
@@ -0,0 +1,123 @@
+import os
+import json
+import importlib
+from typing import Optional
+
+import joblib
+import numpy as np
+import pandas as pd
+import yaml
+from sklearn.metrics import f1_score, roc_auc_score
+from sklearn.model_selection import StratifiedKFold
+
+from .path_coordinator import PathCoordinator
+
+
+class ExperimentRunner:
+    def __init__(self, config_path: str):
+        self.config_path = os.path.abspath(config_path)
+        with open(self.config_path, "r") as f:
+            self.config = yaml.safe_load(f)
+
+        self.custom_transformers = []
+        self._load_data()
+
+    def _load_data(self):
+        data_cfg = self.config["data"]
+        working_dir = data_cfg.get("working_dir") or os.path.dirname(self.config_path)
+        self.path_coordinator = PathCoordinator(working_dir=working_dir, config=self.config)
+
+        df = pd.read_csv(self.path_coordinator.model_ready_dataset_path)
+
+        index_column = data_cfg.get("index_column")
+        if index_column:
+            if index_column in df.columns:
+                df.set_index(index_column, inplace=True)
+            else:
+                df.index.name = index_column
+
+        target = self.config["project"]["target_variable"]
+        self.y = df[target]
+        self.X = df.drop(columns=[target])
+
+    def register_transformer(self, transformer):
+        self.custom_transformers.append(transformer)
+
+    def _apply_transformers(self, X_train: pd.DataFrame, X_val: pd.DataFrame, y_train: pd.Series):
+        X_tr_fe = X_train.copy()
+        X_val_fe = X_val.copy()
+        for trans in self.custom_transformers:
+            X_tr_fe = trans.fit_transform(X_tr_fe, y_train)
+            X_val_fe = trans.transform(X_val_fe)
+        return X_tr_fe, X_val_fe
+
+    def _get_candidate_class(self, class_path: str):
+        module_name, class_name = class_path.rsplit(".", 1)
+        module = importlib.import_module(module_name)
+        return getattr(module, class_name)
+
+    def run_evaluation(self) -> pd.DataFrame:
+        cv_cfg = self.config["experiment_settings"]["cross_validation"]
+        skf = StratifiedKFold(
+            n_splits=cv_cfg["splits"],
+            shuffle=True,
+            random_state=cv_cfg["random_state"],
+        )
+
+        leaderboard = []
+
+        for model_cfg in self.config["candidates"]:
+            candidate_class = self._get_candidate_class(model_cfg["class_path"])
+            fold_auc, fold_f1 = [], []
+
+            for train_idx, val_idx in skf.split(self.X, self.y):
+                X_train, X_val = self.X.iloc[train_idx], self.X.iloc[val_idx]
+                y_train, y_val = self.y.iloc[train_idx], self.y.iloc[val_idx]
+
+                X_tr_fe, X_val_fe = self._apply_transformers(X_train, X_val, y_train)
+
+                model = candidate_class(model_cfg["hyperparameters"])
+                model.fit(X_tr_fe, y_train)
+
+                preds = model.predict_proba(X_val_fe)[:, 1]
+                fold_auc.append(roc_auc_score(y_val, preds))
+                fold_f1.append(f1_score(y_val, (preds >= 0.5).astype(int)))
+
+            leaderboard.append(
+                {
+                    "candidate_name": model_cfg["name"],
+                    "mean_roc_auc": float(np.mean(fold_auc)),
+                    "mean_f1": float(np.mean(fold_f1)),
+                }
+            )
+
+        return pd.DataFrame(leaderboard)
+
+    def export_champion(self):
+        prod_cfg = self.config["production_target"]
+        champ_name = prod_cfg["champion_candidate_name"]
+        model_cfg = next(c for c in self.config["candidates"] if c["name"] == champ_name)
+
+        X_final = self.X.copy()
+        for trans in self.custom_transformers:
+            X_final = trans.fit_transform(X_final, self.y)
+
+        candidate_class = self._get_candidate_class(model_cfg["class_path"])
+        model = candidate_class(model_cfg["hyperparameters"])
+        model.fit(X_final, self.y)
+
+        out_dir = prod_cfg["export_directory"]
+        os.makedirs(out_dir, exist_ok=True)
+
+        joblib.dump(model, os.path.join(out_dir, "model_weights.pkl"))
+        joblib.dump(self.custom_transformers, os.path.join(out_dir, "feature_pipeline.pkl"))
+
+        metadata = {
+            "model_name": self.config["project"]["name"],
+            "version": self.config["project"]["experiment_version"],
+            "features": list(X_final.columns),
+            "target": self.config["project"]["target_variable"],
+            "metrics": {"primary_metric": self.config["experiment_settings"]["primary_metric"]},
+        }
+        with open(os.path.join(out_dir, "metadata.json"), "w") as f:
+            json.dump(metadata, f, indent=4)
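In `run_evaluation`, each fold's positive-class probabilities are turned into hard labels with a fixed 0.5 cutoff before F1 is computed, and the leaderboard reports the mean over folds. As a rough illustration of that arithmetic only, the pure-Python sketch below computes F1 at a 0.5 threshold for two made-up folds. It does not use scikit-learn, and the data and the helper name `f1_at_threshold` are invented for the example.

```python
def f1_at_threshold(y_true, probs, threshold=0.5):
    """F1 for the positive class after thresholding probabilities."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 here when there is nothing to count.
    return 2 * tp / denom if denom else 0.0


# Two hypothetical validation folds: (true labels, predicted P(class 1)).
folds = [
    ([1, 0, 1, 0], [0.9, 0.4, 0.3, 0.2]),
    ([1, 1, 0, 0], [0.8, 0.6, 0.7, 0.1]),
]

fold_f1 = [f1_at_threshold(y, p) for y, p in folds]
mean_f1 = sum(fold_f1) / len(fold_f1)
print(round(mean_f1, 4))  # 0.7333
```

Because the cutoff is fixed, a candidate whose probabilities are poorly calibrated can rank well on ROC AUC and still score a low mean F1, which is worth keeping in mind when reading the two leaderboard columns side by side.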
@@ -0,0 +1,21 @@
+import numpy as np
+import pandas as pd
+from pandas import Series
+from sklearn.dummy import DummyClassifier
+from ..core.base import BaseModelCandidate
+
+
+class ExampleCandidate(BaseModelCandidate):
+    """A minimal candidate that uses a dummy classifier."""
+
+    def __init__(self, hyperparameters: dict):
+        super().__init__(hyperparameters)
+        strategy = hyperparameters.get("strategy", "prior")
+        self.model = DummyClassifier(strategy=strategy)
+
+    def fit(self, X_train: pd.DataFrame, y_train: Series):
+        self.model.fit(X_train, y_train)
+        return self
+
+    def predict_proba(self, X: pd.DataFrame) -> np.ndarray:
+        return self.model.predict_proba(X)
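A candidate such as `ExampleCandidate` is referenced from the configuration by a dotted `class_path`, which `ExperimentRunner._get_candidate_class` splits at the last dot and imports. A minimal sketch of the same lookup is shown below, demonstrated on the standard-library class `collections.OrderedDict` so that it runs without the package installed. The function name `load_class` is illustrative.

```python
import importlib


def load_class(class_path):
    """Import `pkg.module.ClassName` and return the class object."""
    module_name, class_name = class_path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


cls = load_class("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```

With the package installed, the same call with `kmds_modeling.examples.example_candidate.ExampleCandidate` would be expected to return the example class. A path without a dot raises `ValueError` at the unpacking step, and an unknown module raises `ModuleNotFoundError`.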
@@ -0,0 +1,16 @@
+import pandas as pd
+from pandas import Series
+from ..core.base import BaseFeatureTransformer
+
+
+class ExampleTransformer(BaseFeatureTransformer):
+    """A minimal sample transformer that preserves the input index."""
+
+    def fit(self, X: pd.DataFrame, y: Series = None):
+        self.feature_names_ = list(X.columns)
+        return self
+
+    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
+        transformed = X.copy()
+        transformed.index = X.index
+        return transformed
@@ -0,0 +1,19 @@
+MANIFEST.in
+README.md
+pyproject.toml
+src/kmds_modeling/__init__.py
+src/kmds_modeling/cli.py
+src/kmds_modeling.egg-info/PKG-INFO
+src/kmds_modeling.egg-info/SOURCES.txt
+src/kmds_modeling.egg-info/dependency_links.txt
+src/kmds_modeling.egg-info/entry_points.txt
+src/kmds_modeling.egg-info/requires.txt
+src/kmds_modeling.egg-info/top_level.txt
+src/kmds_modeling/core/__init__.py
+src/kmds_modeling/core/base.py
+src/kmds_modeling/core/notebook_utils.py
+src/kmds_modeling/core/path_coordinator.py
+src/kmds_modeling/core/runner.py
+src/kmds_modeling/examples/__init__.py
+src/kmds_modeling/examples/example_candidate.py
+src/kmds_modeling/examples/example_transformer.py
@@ -0,0 +1 @@
+
@@ -0,0 +1 @@
+kmds_modeling