mlobs 0.1.0 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
mlobs-0.1.0/.gitignore ADDED
@@ -0,0 +1,43 @@
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+
+ # Distribution / packaging
+ dist/
+ build/
+ *.egg-info/
+ .eggs/
+ *.egg
+
+ # Testing
+ .pytest_cache/
+ .coverage
+ coverage.xml
+ htmlcov/
+ .tox/
+
+ # Type checking
+ .mypy_cache/
+
+ # Linting
+ .ruff_cache/
+
+ # Virtual environments
+ .venv/
+ venv/
+ env/
+
+ # Editors
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+
+ # Environment files
+ .env
+ .env.*
+ !.env.example
+
+ # macOS
+ .DS_Store
mlobs-0.1.0/CHANGELOG.md ADDED
@@ -0,0 +1,29 @@
+ # Changelog
+
+ All notable changes to this project will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [Unreleased]
+
+ ## [0.1.0] - 2026-02-24
+
+ ### Added
+ - `NumericDriftDetector` — Kolmogorov-Smirnov two-sample test for numeric columns
+ - `CategoricalDriftDetector` — Pearson chi-squared test for categorical columns
+ - `PSIDriftDetector` — Population Stability Index with quantile-based binning
+ - `JSDriftDetector` — Jensen-Shannon Divergence detector
+ - `DriftReport` and `ColumnDriftResult` dataclasses with `to_dict` / `from_dict`
+   serialisation
+ - `PipelineLogger` — records per-column feature statistics at named pipeline steps
+ - `JSONFormatter` — handles numpy scalars, arrays, NaN → null
+ - Backend adapters for **pandas**, **polars** (DataFrame + LazyFrame), and
+   **pyarrow** (Table)
+ - `get_adapter()` factory with lazy imports for each backend
+ - `DataFrameAdapter` Protocol (PEP 544, runtime-checkable)
+ - PEP 561 `py.typed` marker for inline type annotations
+ - GitHub Actions CI matrix (Python 3.9–3.12) and PyPI publish workflow
+
+ [Unreleased]: https://github.com/Ahalya24/mlobs/compare/v0.1.0...HEAD
+ [0.1.0]: https://github.com/Ahalya24/mlobs/releases/tag/v0.1.0
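The CHANGELOG lists `PSIDriftDetector` as a Population Stability Index with quantile-based binning. Here is a minimal NumPy sketch of that technique. It assumes bin edges are the reference deciles and that empty bins are floored at `epsilon`, and it borrows the `n_bins=10, epsilon=1e-4` defaults from the README signature. The function name `psi_quantile` is hypothetical, and this is not mlobs' own implementation.

```python
import numpy as np


def psi_quantile(reference, current, n_bins=10, epsilon=1e-4):
    """Population Stability Index using bins cut at reference quantiles.

    Hypothetical helper illustrating the technique; not mlobs' own code.
    """
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    # Interior cut points at the reference quantiles (deciles when n_bins=10).
    # np.unique drops repeated edges, which occur when the reference has ties.
    probs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    cuts = np.unique(np.quantile(reference, probs))

    # Bin index for every value; values outside the reference range land in
    # the first or last bin, so no observation is dropped.
    n_out = len(cuts) + 1
    ref_counts = np.bincount(np.searchsorted(cuts, reference, side="right"), minlength=n_out)
    cur_counts = np.bincount(np.searchsorted(cuts, current, side="right"), minlength=n_out)

    # Proportions per bin, floored at epsilon so the log stays finite.
    ref_pct = np.maximum(ref_counts / ref_counts.sum(), epsilon)
    cur_pct = np.maximum(cur_counts / cur_counts.sum(), epsilon)

    # PSI = sum over bins of (cur - ref) * ln(cur / ref)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))
```

Cutting bins at reference quantiles gives each reference bin roughly equal mass, so no bin starts out nearly empty. Under these assumptions, an unchanged column scores exactly 0.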
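The `JSONFormatter` entry says it handles numpy scalars, arrays, and NaN to null. The standard `json` module cannot do this through `JSONEncoder.default` alone. The reason is that float NaN, including `np.float64` (a `float` subclass), is written natively as the non-standard token `NaN` before `default` is ever consulted. A hedged sketch that pre-converts the object tree instead is given here. `to_jsonable` and `dumps` are hypothetical names, and also mapping infinities to `null` is an assumption beyond the changelog.

```python
import json
import math

import numpy as np


def to_jsonable(obj):
    """Recursively convert numpy values to plain Python; NaN and +/-inf become None."""
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    if isinstance(obj, np.ndarray):
        # tolist() yields nested built-in lists/floats/ints, handled recursively
        return [to_jsonable(v) for v in obj.tolist()]
    if isinstance(obj, np.generic):
        # numpy scalars (np.float64, np.int64, np.bool_) -> built-in Python types
        obj = obj.item()
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    return obj


def dumps(obj, indent=2):
    # allow_nan=False makes json raise instead of emitting non-standard NaN
    return json.dumps(to_jsonable(obj), indent=indent, allow_nan=False)
```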
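The CHANGELOG also mentions a `get_adapter()` factory with lazy imports and a runtime-checkable `DataFrameAdapter` Protocol (PEP 544). The sketch below shows how those two ideas combine under stated assumptions. `ColumnAdapter`, `DictAdapter`, and `get_adapter_sketch` are hypothetical names, and the Protocol is trimmed to two of the methods the README lists. Dispatching on the top-level module of `type(df)` is one possible detection strategy, not necessarily the one mlobs uses.

```python
from typing import Any, Callable, Dict, List, Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class ColumnAdapter(Protocol):
    """Hypothetical, trimmed-down stand-in for a DataFrameAdapter-style Protocol."""

    def column_names(self, df: Any) -> List[str]: ...

    def to_numpy(self, df: Any, col: str) -> np.ndarray: ...


class DictAdapter:
    """Toy backend: a dict mapping column name -> list of values.

    It never subclasses ColumnAdapter; having the right methods is enough.
    """

    def column_names(self, df: Dict[str, list]) -> List[str]:
        return list(df)

    def to_numpy(self, df: Dict[str, list], col: str) -> np.ndarray:
        return np.asarray(df[col])


def _load_pandas_adapter() -> ColumnAdapter:
    import pandas as pd  # deferred: pandas is imported only for pandas input

    class PandasAdapter:
        def column_names(self, df: pd.DataFrame) -> List[str]:
            return [str(c) for c in df.columns]

        def to_numpy(self, df: pd.DataFrame, col: str) -> np.ndarray:
            return df[col].to_numpy()

    return PandasAdapter()


# Top-level module of the dataframe's type -> loader that performs the import.
_LOADERS: Dict[str, Callable[[], ColumnAdapter]] = {"pandas": _load_pandas_adapter}


def get_adapter_sketch(df: Any) -> ColumnAdapter:
    if isinstance(df, dict):
        adapter: Any = DictAdapter()
    else:
        # e.g. "pandas" for pandas.core.frame.DataFrame; nothing is imported yet
        backend = type(df).__module__.split(".")[0]
        loader = _LOADERS.get(backend)
        if loader is None:
            raise TypeError(f"unsupported dataframe type: {type(df).__name__}")
        adapter = loader()
    # isinstance() on a runtime-checkable Protocol only checks method presence
    if not isinstance(adapter, ColumnAdapter):
        raise TypeError("backend adapter does not satisfy ColumnAdapter")
    return adapter
```

Because the pandas import sits inside the loader, importing this module never pulls in pandas. The cost is paid only when a pandas object is actually passed. Note that `isinstance` against a runtime-checkable Protocol verifies that the methods exist, not that their signatures match.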
mlobs-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 mlobs contributors
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
mlobs-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,319 @@
+ Metadata-Version: 2.4
+ Name: mlobs
+ Version: 0.1.0
+ Summary: ML observability for structured dataframes: drift detection and pipeline logging
+ Project-URL: Homepage, https://github.com/Ahalya24/mlobs
+ Project-URL: Repository, https://github.com/Ahalya24/mlobs
+ Project-URL: Bug Tracker, https://github.com/Ahalya24/mlobs/issues
+ Project-URL: Changelog, https://github.com/Ahalya24/mlobs/blob/main/CHANGELOG.md
+ License: MIT License
+
+ Copyright (c) 2026 mlobs contributors
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+ License-File: LICENSE
+ Keywords: dataframes,drift,machine-learning,mlops,monitoring,observability
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: Science/Research
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: Typing :: Typed
+ Requires-Python: >=3.9
+ Requires-Dist: numpy>=1.21
+ Requires-Dist: scipy>=1.7
+ Provides-Extra: all
+ Requires-Dist: pandas>=1.3; extra == 'all'
+ Requires-Dist: polars>=0.18; extra == 'all'
+ Requires-Dist: pyarrow>=10.0; extra == 'all'
+ Provides-Extra: arrow
+ Requires-Dist: pyarrow>=10.0; extra == 'arrow'
+ Provides-Extra: dev
+ Requires-Dist: mypy>=1.5; extra == 'dev'
+ Requires-Dist: pandas>=1.3; extra == 'dev'
+ Requires-Dist: polars>=0.18; extra == 'dev'
+ Requires-Dist: pyarrow>=10.0; extra == 'dev'
+ Requires-Dist: pytest-cov>=4.1; extra == 'dev'
+ Requires-Dist: pytest>=7.4; extra == 'dev'
+ Requires-Dist: ruff>=0.1; extra == 'dev'
+ Provides-Extra: pandas
+ Requires-Dist: pandas>=1.3; extra == 'pandas'
+ Provides-Extra: polars
+ Requires-Dist: polars>=0.18; extra == 'polars'
+ Description-Content-Type: text/markdown
mlobs-0.1.0/README.md ADDED
@@ -0,0 +1,253 @@
+ # mlobs
+
+ **ML observability for structured dataframes.**
+
+ `mlobs` provides drift detection and pipeline step logging for pandas, polars,
+ and PyArrow dataframes — with no dependency on any external ML platform.
+
+ [![CI](https://github.com/Ahalya24/mlobs/actions/workflows/ci.yml/badge.svg)](https://github.com/Ahalya24/mlobs/actions/workflows/ci.yml)
+ [![PyPI version](https://img.shields.io/pypi/v/mlobs.svg)](https://pypi.org/project/mlobs/)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+ [![Python 3.9+](https://img.shields.io/pypi/pyversions/mlobs.svg)](https://pypi.org/project/mlobs/)
+
+ ---
+
+ ## Installation
+
+ ```bash
+ # Core (numpy + scipy — no backend adapters)
+ pip install mlobs
+
+ # With a specific backend
+ pip install "mlobs[pandas]"
+ pip install "mlobs[polars]"
+ pip install "mlobs[arrow]"
+
+ # All backends
+ pip install "mlobs[all]"
+
+ # All backends + dev tools (pytest, mypy, ruff)
+ pip install "mlobs[dev]"
+ ```
+
+ ---
+
+ ## Quick Start
+
+ ### Drift Detection
+
+ ```python
+ import pandas as pd
+ import numpy as np
+ from mlobs import NumericDriftDetector, CategoricalDriftDetector, DriftReport
+ from mlobs import get_adapter
+
+ # Simulate reference (training) and current (production) data
+ ref = pd.DataFrame({
+     "age": np.random.default_rng(0).normal(30, 5, 1000),
+     "income": np.random.default_rng(1).normal(50000, 10000, 1000),
+     "segment": np.random.default_rng(2).choice(["A", "B", "C"], 1000),
+ })
+ cur = pd.DataFrame({
+     "age": np.random.default_rng(0).normal(35, 5, 1000),  # shifted
+     "income": np.random.default_rng(1).normal(50000, 10000, 1000),
+     "segment": np.random.default_rng(3).choice(["B", "C", "D"], 1000),  # shifted
+ })
+
+ adapter = get_adapter(ref)
+
+ # Test each column with the appropriate detector
+ ks = NumericDriftDetector(p_value_threshold=0.05)
+ chi = CategoricalDriftDetector(p_value_threshold=0.05)
+
+ results = [
+     ks.detect(adapter.to_numpy(ref, "age"),
+               adapter.to_numpy(cur, "age"),
+               column_name="age"),
+     ks.detect(adapter.to_numpy(ref, "income"),
+               adapter.to_numpy(cur, "income"),
+               column_name="income"),
+     chi.detect(adapter.to_numpy(ref, "segment"),
+                adapter.to_numpy(cur, "segment"),
+                column_name="segment"),
+ ]
+
+ report = DriftReport(
+     reference_name="2024-Q1",
+     current_name="2024-Q2",
+     results=results,
+ )
+
+ print(report.summary)
+ # {'total_columns': 3, 'drifted': 2, 'not_drifted': 1}
+
+ print(report.drifted_columns)
+ # ['age', 'segment']
+
+ print(report.to_json())
+ ```
+
+ ### Pipeline Logging
+
+ ```python
+ import pandas as pd
+ from mlobs import PipelineLogger
+
+ logger = PipelineLogger(name="preprocessing")
+
+ raw = pd.read_csv("data.csv")
+ logger.log_step(raw, step_name="raw_input", metadata={"source": "data.csv"})
+
+ cleaned = raw.dropna()
+ logger.log_step(cleaned, step_name="after_drop_na")
+
+ scaled = cleaned.copy()
+ scaled["age"] = (scaled["age"] - scaled["age"].mean()) / scaled["age"].std()
+ logger.log_step(scaled, step_name="after_scaling", columns=["age"])
+
+ # Persist the log as JSON
+ logger.dump("pipeline_log.json")
+
+ # Or get as a Python dict / JSON string
+ d = logger.to_dict()
+ json_str = logger.to_json()
+ ```
+
+ ### Context Manager
+
+ ```python
+ from mlobs import PipelineLogger
+
+ with PipelineLogger(name="my_pipeline") as logger:
+     logger.log_step(df1, "step_1")
+     logger.log_step(df2, "step_2")
+
+ print(logger.records)
+ ```
+
+ ---
+
+ ## Supported Drift Tests
+
+ | Detector | Column type | Test | p-value |
+ |---|---|---|---|
+ | `NumericDriftDetector` | numeric | Kolmogorov-Smirnov two-sample | yes |
+ | `CategoricalDriftDetector` | categorical | Pearson chi-squared | yes |
+ | `PSIDriftDetector` | numeric / categorical | Population Stability Index | no |
+ | `JSDriftDetector` | numeric / categorical | Jensen-Shannon Divergence | no |
+
+ **PSI thresholds** (conventional):
+ - PSI < 0.10 → no significant change
+ - 0.10 ≤ PSI < 0.20 → moderate change
+ - PSI ≥ 0.20 → significant drift
+
+ ---
+
+ ## Multi-Backend Support
+
+ The same API works across all supported backends:
+
+ ```python
+ import polars as pl
+ import pyarrow as pa
+ from mlobs import get_adapter, NumericDriftDetector
+
+ # polars
+ df_pl = pl.read_parquet("data.parquet")
+ adapter = get_adapter(df_pl)
+ arr = adapter.to_numpy(df_pl, "age")
+
+ # pyarrow
+ table = pa.ipc.open_file("data.arrow").read_all()
+ adapter = get_adapter(table)
+ arr = adapter.to_numpy(table, "age")
+ ```
+
+ ---
+
+ ## API Reference
+
+ ### Drift Detection
+
+ ```python
+ from mlobs import (
+     NumericDriftDetector,
+     CategoricalDriftDetector,
+     PSIDriftDetector,
+     JSDriftDetector,
+     DriftReport,
+     ColumnDriftResult,
+ )
+ ```
+
+ **`NumericDriftDetector(p_value_threshold=0.05, alternative="two-sided")`**
+ - `.detect(reference, current, column_name="unknown") -> ColumnDriftResult`
+
+ **`CategoricalDriftDetector(p_value_threshold=0.05, min_expected_count=5.0)`**
+ - `.detect(reference, current, column_name="unknown") -> ColumnDriftResult`
+
+ **`PSIDriftDetector(threshold=0.20, n_bins=10, epsilon=1e-4)`**
+ - `.detect(reference, current, column_name="unknown", is_categorical=False) -> ColumnDriftResult`
+
+ **`JSDriftDetector(threshold=0.1, n_bins=10, epsilon=1e-4)`**
+ - `.detect(reference, current, column_name="unknown", is_categorical=False) -> ColumnDriftResult`
+
+ **`DriftReport`**
+ - `.summary -> dict` — `{total_columns, drifted, not_drifted}`
+ - `.drifted_columns -> list[str]`
+ - `.to_dict() -> dict`
+ - `.to_json(indent=2) -> str`
+ - `.from_dict(d) -> DriftReport` (classmethod)
+
+ ### Pipeline Logging
+
+ ```python
+ from mlobs import PipelineLogger, StepRecord, ColumnStats
+ ```
+
+ **`PipelineLogger(name="pipeline")`**
+ - `.log_step(df, step_name, columns=None, metadata=None) -> StepRecord`
+ - `.records -> list[StepRecord]`
+ - `.clear()`
+ - `.to_dict() -> dict`
+ - `.to_json(indent=2) -> str`
+ - `.dump(path)`
+
+ ### Adapters
+
+ ```python
+ from mlobs import get_adapter, DataFrameAdapter
+ ```
+
+ **`get_adapter(df) -> DataFrameAdapter`** — auto-detects backend
+
+ **`DataFrameAdapter`** Protocol methods (all take `df` as first arg):
+ - `shape(df) -> (n_rows, n_cols)`
+ - `column_names(df) -> list[str]`
+ - `is_numeric(df, col) -> bool`
+ - `to_numpy(df, col) -> np.ndarray`
+ - `compute_column_stats(df, col) -> ColumnStats`
+
+ ---
+
+ ## Contributing
+
+ ```bash
+ git clone https://github.com/Ahalya24/mlobs.git
+ cd mlobs
+ pip install -e ".[dev]"
+ pytest
+ ```
+
+ Run linting and type checking:
+
+ ```bash
+ ruff check src/ tests/
+ mypy src/mlobs
+ ```
+
+ ---
+
+ ## License
+
+ MIT — see [LICENSE](LICENSE).
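The README's PSI bands can be checked with a hand-sized worked example. This sketch applies the standard formula PSI = Σ (cur − ref) · ln(cur / ref) to four already-binned proportions, then maps the result onto the three bands. The helpers `classify_psi` and `psi_from_proportions` are hypothetical. Treating each band's lower edge as inclusive follows the ≤ / ≥ signs in the README list.

```python
import math


def classify_psi(psi: float) -> str:
    """Map a PSI value onto the conventional bands listed in the README."""
    if psi < 0.10:
        return "no significant change"
    if psi < 0.20:
        return "moderate change"
    return "significant drift"


def psi_from_proportions(ref_pct, cur_pct):
    # PSI = sum over bins of (cur - ref) * ln(cur / ref); proportions must be > 0
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_pct, cur_pct))


# Worked example: four equal reference bins vs. a skewed current distribution.
psi = psi_from_proportions([0.25, 0.25, 0.25, 0.25], [0.10, 0.20, 0.30, 0.40])
print(round(psi, 3), classify_psi(psi))  # 0.228 significant drift
```

The `PSIDriftDetector` default `threshold=0.20` coincides with the upper band edge. The README does not say whether the detector flags drift at exactly 0.20.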
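The README gives `JSDriftDetector(threshold=0.1, n_bins=10, epsilon=1e-4)` but not how the divergence is computed. A minimal NumPy sketch of a histogram-based Jensen-Shannon divergence follows. It assumes equal-width bins over the pooled range of both samples, additive `epsilon` smoothing, and base-2 logarithms, so the score lies in [0, 1]. The function name `js_divergence` is hypothetical. mlobs may bin or smooth differently, or report the square root (the JS distance) instead.

```python
import numpy as np


def js_divergence(reference, current, n_bins=10, epsilon=1e-4):
    """Jensen-Shannon divergence in bits (bounded in [0, 1]) between two samples."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    # Shared bin edges so both histograms are compared bin by bin.
    edges = np.histogram_bin_edges(np.concatenate([reference, current]), bins=n_bins)
    p, _ = np.histogram(reference, bins=edges)
    q, _ = np.histogram(current, bins=edges)

    # Smooth empty bins, then renormalise to probability vectors.
    p = (p + epsilon) / (p + epsilon).sum()
    q = (q + epsilon) / (q + epsilon).sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Kullback-Leibler divergence in bits; a and b are strictly positive
        return float(np.sum(a * np.log2(a / b)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Unlike PSI, JS divergence is symmetric and bounded. This makes a fixed threshold such as the README's default of 0.1 comparable across columns.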
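The README documents `DriftReport.summary`, `.drifted_columns`, `.to_dict()`, `.to_json(indent=2)` and the `from_dict` classmethod, with example output `{'total_columns': 3, 'drifted': 2, 'not_drifted': 1}`. Here is a dataclass sketch of how such a report can derive its summary from per-column results and round-trip through JSON. `ColumnResultSketch` and `DriftReportSketch` are hypothetical, as are the `test_name`, `statistic`, `p_value`, and `drifted` fields. Only `column_name`, `reference_name`, `current_name`, and `results` appear in the README.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ColumnResultSketch:
    column_name: str
    test_name: str
    statistic: float
    p_value: Optional[float]  # None for PSI / JSD, which report no p-value
    drifted: bool


@dataclass
class DriftReportSketch:
    reference_name: str
    current_name: str
    results: List[ColumnResultSketch] = field(default_factory=list)

    @property
    def drifted_columns(self) -> List[str]:
        return [r.column_name for r in self.results if r.drifted]

    @property
    def summary(self) -> Dict[str, int]:
        n_drifted = len(self.drifted_columns)
        return {
            "total_columns": len(self.results),
            "drifted": n_drifted,
            "not_drifted": len(self.results) - n_drifted,
        }

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)  # nested dataclasses become nested dicts

    def to_json(self, indent: int = 2) -> str:
        return json.dumps(self.to_dict(), indent=indent)

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "DriftReportSketch":
        return cls(
            reference_name=d["reference_name"],
            current_name=d["current_name"],
            results=[ColumnResultSketch(**r) for r in d["results"]],
        )
```

Computing `summary` from `results` on every access keeps the counts consistent with the per-column results, even after a `from_dict` round trip.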
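The README's `PipelineLogger` records statistics at named steps and can be used as a context manager. A stdlib-only toy version of that pattern is sketched here. It uses a dict-of-lists table in place of a real dataframe and records only row count, missing-value count, and numeric mean per column, because the real `ColumnStats` fields are not listed. `StepLoggerSketch` and `StepRecordSketch` are hypothetical names, and the `__exit__` behaviour is an assumption.

```python
import json
import math
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class StepRecordSketch:
    step_name: str
    n_rows: int
    column_stats: Dict[str, Dict[str, Any]]
    metadata: Dict[str, Any] = field(default_factory=dict)


def _is_missing(value: Any) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))


class StepLoggerSketch:
    """Toy step logger over dict-of-lists tables (column name -> values)."""

    def __init__(self, name: str = "pipeline") -> None:
        self.name = name
        self.records: List[StepRecordSketch] = []

    def __enter__(self) -> "StepLoggerSketch":
        return self

    def __exit__(self, *exc_info: Any) -> None:
        # Nothing to release; records stay readable after the with-block
        # and any exception raised inside it propagates unchanged.
        return None

    def log_step(
        self,
        table: Dict[str, list],
        step_name: str,
        columns: Optional[Sequence[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
    ) -> StepRecordSketch:
        cols = list(columns) if columns is not None else list(table)
        stats: Dict[str, Dict[str, Any]] = {}
        for col in cols:
            present = [v for v in table[col] if not _is_missing(v)]
            stats[col] = {"n_missing": len(table[col]) - len(present)}
            # Mean only for columns whose non-missing values are all numeric
            if present and all(isinstance(v, (int, float)) for v in present):
                stats[col]["mean"] = sum(present) / len(present)
        n_rows = len(next(iter(table.values()))) if table else 0
        record = StepRecordSketch(step_name, n_rows, stats, dict(metadata or {}))
        self.records.append(record)
        return record

    def to_json(self, indent: int = 2) -> str:
        steps = [asdict(r) for r in self.records]
        return json.dumps({"name": self.name, "steps": steps}, indent=indent)
```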