PyPI - collie-mlops - Versions diffs - 0.1.0b0__tar.gz - Mend

collie-mlops 0.1.0b0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of collie-mlops might be problematic.

Files changed (53)

collie_mlops-0.1.0b0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025 ChingHuanChiu
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

collie_mlops-0.1.0b0/MANIFEST.in ADDED

@@ -0,0 +1,15 @@
+include README.md
+include LICENSE
+include requirements.txt
+exclude pytest.ini
+recursive-include collie *.py
+recursive-include collie py.typed
+recursive-exclude tests *
+recursive-exclude example *
+recursive-exclude deploy *
+recursive-exclude __pycache__ *
+global-exclude *.pyc
+global-exclude *.pyo
+global-exclude .DS_Store

collie_mlops-0.1.0b0/PKG-INFO ADDED

@@ -0,0 +1,217 @@
+Metadata-Version: 2.4
+Name: collie-mlops
+Version: 0.1.0b0
+Summary: A Lightweight MLOps Framework for Machine Learning Workflows
+Home-page: https://github.com/ChingHuanChiu/collie
+Author: ChingHuanChiu
+Author-email: ChingHuanChiu <stevenchiou8@gmail.com>
+Maintainer-email: ChingHuanChiu <stevenchiou8@gmail.com>
+License: MIT
+Project-URL: Homepage, https://github.com/ChingHuanChiu/collie
+Project-URL: Documentation, https://github.com/ChingHuanChiu/collie/blob/main/README.md
+Project-URL: Repository, https://github.com/ChingHuanChiu/collie
+Project-URL: Bug Tracker, https://github.com/ChingHuanChiu/collie/issues
+Project-URL: Changelog, https://github.com/ChingHuanChiu/collie/blob/main/CHANGELOG.md
+Keywords: mlops,machine-learning,mlflow,pipeline,orchestration,deep-learning,experiment-tracking
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3 :: Only
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Classifier: Typing :: Typed
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: mlflow>=2.0.0
+Requires-Dist: pydantic>=2.0.0
+Requires-Dist: pandas>=1.3.0
+Requires-Dist: numpy<2.0.0,>=1.20.0
+Requires-Dist: scikit-learn>=1.0.0
+Requires-Dist: xgboost>=1.5.0
+Requires-Dist: torch>=1.9.0
+Requires-Dist: pytorch-lightning>=2.0.0
+Requires-Dist: lightgbm>=3.0.0
+Requires-Dist: transformers>=4.0.0
+Requires-Dist: sentence-transformers>=2.0.0
+Dynamic: license-file
+# Collie 🐕
+[![PyPI version](https://badge.fury.io/py/collie-mlops.svg)](https://badge.fury.io/py/collie-mlops)
+[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Documentation](https://img.shields.io/badge/docs-sphinx-blue.svg)](docs/_build/html/index.html)
+[![codecov](https://codecov.io/gh/ChingHuanChiu/collie/branch/main/graph/badge.svg)](https://codecov.io/gh/ChingHuanChiu/collie)
+A Lightweight MLOps Framework for Machine Learning Workflows
+## Overview
+Collie is a modern MLOps framework designed to streamline machine learning workflows by providing a component-based architecture integrated with MLflow. It enables data scientists and ML engineers to build, deploy, and manage ML pipelines with ease through modular components that handle different stages of the ML lifecycle.
+## Features
+- **Component-Based Architecture**: Modular design with specialized components for each ML workflow stage
+- **MLflow Integration**: Built-in experiment tracking, model registration, and deployment capabilities
+- **Pipeline Orchestration**: Seamless workflow management with event-driven architecture
+- **Model Management**: Automated model versioning, staging, and promotion
+- **Framework Agnostic**: Supports multiple ML frameworks (PyTorch, scikit-learn, XGBoost, LightGBM, Transformers)
+## Architecture
+Collie follows an event-driven architecture with the following core components:
+- **Transformer**: Data preprocessing and feature engineering
+- **Tuner**: Hyperparameter optimization
+- **Trainer**: Model training and validation
+- **Evaluator**: Model evaluation and comparison
+- **Pusher**: Model deployment and registration
+- **Orchestrator**: Workflow coordination and execution
+## Quick Start
+### Installation
+```bash
+pip install collie-mlops
+```
+This installs Collie together with all supported ML frameworks:
+- scikit-learn
+- PyTorch
+- XGBoost
+- LightGBM
+- Transformers (with Sentence Transformers)
+### Prerequisites
+- Python >= 3.10
+- MLflow tracking server (can be local or remote)
+## Components
+### Transformer
+Handles data preprocessing, feature engineering, and data validation.
+```python
+class CustomTransformer(Transformer):
+    def handle(self, event) -> Event:
+        # Process your data
+        processed_data = self.preprocess(raw_data)
+        return Event(payload=TransformerPayload(train_data=processed_data))
+```
+### Tuner
+Performs hyperparameter optimization using various strategies.
+```python
+class CustomTuner(Tuner):
+    def handle(self, event) -> Event:
+        # Optimize hyperparameters
+        best_params = self.optimize(search_space)
+        return Event(payload=TunerPayload(hyperparameters=best_params))
+```
+### Trainer
+Trains machine learning models with automatic experiment tracking.
+```python
+class CustomTrainer(Trainer):
+    def handle(self, event) -> Event:
+        # Train your model
+        model = self.train(data, hyperparameters)
+        return Event(payload=TrainerPayload(model=model))
+```
+### Evaluator
+Evaluates model performance and decides on deployment.
+```python
+class CustomEvaluator(Evaluator):
+    def handle(self, event) -> Event:
+        # Evaluate model performance
+        metrics = self.evaluate(model, test_data)
+        is_better = self.compare_with_production(metrics)
+        return Event(payload=EvaluatorPayload(
+            metrics=metrics,
+            is_better_than_production=is_better
+        ))
+```
+### Pusher
+Handles model deployment and registration.
+```python
+class CustomPusher(Pusher):
+    def handle(self, event) -> Event:
+        # Deploy model to production
+        model_uri = self.deploy(model)
+        return Event(payload=PusherPayload(model_uri=model_uri))
+```
+## Configuration
+### MLflow Setup
+Start the MLflow tracking server:
+```bash
+mlflow server \
+    --backend-store-uri sqlite:///mlflow.db \
+    --default-artifact-root ./mlruns \
+    --host 0.0.0.0 \
+    --port 5000
+```
+## Supported Frameworks
+Collie currently supports the following ML frameworks through its model flavor system:
+- **PyTorch**
+- **scikit-learn**
+- **XGBoost**
+- **LightGBM**
+- **Transformers**
+## Documentation
+See the [Getting Started guide](https://collie-mlops.readthedocs.io/en/latest/getting_started.html).
+## Roadmap
+- [ ] TensorFlow/Keras support
+- [ ] Model monitoring and drift detection
+- [ ] Integration with Airflow/Kubeflow
+- [ ] Integrate an LLM training/fine-tuning framework
+- [ ] Reduce heavy import and installation overhead
+## License
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+## Citation
+If you use Collie in your research, please cite:
+```bibtex
+@software{collie2025,
+  author = {ChingHuanChiu},
+  title = {Collie: A Lightweight MLOps Framework},
+  year = {2025},
+  url = {https://github.com/ChingHuanChiu/collie}
+}
+```
+---

collie_mlops-0.1.0b0/README.md ADDED

@@ -0,0 +1,172 @@
+# Collie 🐕
+[![PyPI version](https://badge.fury.io/py/collie-mlops.svg)](https://badge.fury.io/py/collie-mlops)
+[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Documentation](https://img.shields.io/badge/docs-sphinx-blue.svg)](docs/_build/html/index.html)
+[![codecov](https://codecov.io/gh/ChingHuanChiu/collie/branch/main/graph/badge.svg)](https://codecov.io/gh/ChingHuanChiu/collie)
+A Lightweight MLOps Framework for Machine Learning Workflows
+## Overview
+Collie is a modern MLOps framework designed to streamline machine learning workflows by providing a component-based architecture integrated with MLflow. It enables data scientists and ML engineers to build, deploy, and manage ML pipelines with ease through modular components that handle different stages of the ML lifecycle.
+## Features
+- **Component-Based Architecture**: Modular design with specialized components for each ML workflow stage
+- **MLflow Integration**: Built-in experiment tracking, model registration, and deployment capabilities
+- **Pipeline Orchestration**: Seamless workflow management with event-driven architecture
+- **Model Management**: Automated model versioning, staging, and promotion
+- **Framework Agnostic**: Supports multiple ML frameworks (PyTorch, scikit-learn, XGBoost, LightGBM, Transformers)
+## Architecture
+Collie follows an event-driven architecture with the following core components:
+- **Transformer**: Data preprocessing and feature engineering
+- **Tuner**: Hyperparameter optimization
+- **Trainer**: Model training and validation
+- **Evaluator**: Model evaluation and comparison
+- **Pusher**: Model deployment and registration
+- **Orchestrator**: Workflow coordination and execution
+## Quick Start
+### Installation
+```bash
+pip install collie-mlops
+```
+This installs Collie together with all supported ML frameworks:
+- scikit-learn
+- PyTorch
+- XGBoost
+- LightGBM
+- Transformers (with Sentence Transformers)
+### Prerequisites
+- Python >= 3.10
+- MLflow tracking server (can be local or remote)
+## Components
+### Transformer
+Handles data preprocessing, feature engineering, and data validation.
+```python
+class CustomTransformer(Transformer):
+    def handle(self, event) -> Event:
+        # Process your data
+        processed_data = self.preprocess(raw_data)
+        return Event(payload=TransformerPayload(train_data=processed_data))
+```
+### Tuner
+Performs hyperparameter optimization using various strategies.
+```python
+class CustomTuner(Tuner):
+    def handle(self, event) -> Event:
+        # Optimize hyperparameters
+        best_params = self.optimize(search_space)
+        return Event(payload=TunerPayload(hyperparameters=best_params))
+```
+### Trainer
+Trains machine learning models with automatic experiment tracking.
+```python
+class CustomTrainer(Trainer):
+    def handle(self, event) -> Event:
+        # Train your model
+        model = self.train(data, hyperparameters)
+        return Event(payload=TrainerPayload(model=model))
+```
+### Evaluator
+Evaluates model performance and decides on deployment.
+```python
+class CustomEvaluator(Evaluator):
+    def handle(self, event) -> Event:
+        # Evaluate model performance
+        metrics = self.evaluate(model, test_data)
+        is_better = self.compare_with_production(metrics)
+        return Event(payload=EvaluatorPayload(
+            metrics=metrics,
+            is_better_than_production=is_better
+        ))
+```
+### Pusher
+Handles model deployment and registration.
+```python
+class CustomPusher(Pusher):
+    def handle(self, event) -> Event:
+        # Deploy model to production
+        model_uri = self.deploy(model)
+        return Event(payload=PusherPayload(model_uri=model_uri))
+```
+## Configuration
+### MLflow Setup
+Start the MLflow tracking server:
+```bash
+mlflow server \
+    --backend-store-uri sqlite:///mlflow.db \
+    --default-artifact-root ./mlruns \
+    --host 0.0.0.0 \
+    --port 5000
+```
+## Supported Frameworks
+Collie currently supports the following ML frameworks through its model flavor system:
+- **PyTorch**
+- **scikit-learn**
+- **XGBoost**
+- **LightGBM**
+- **Transformers**
+## Documentation
+See the [Getting Started guide](https://collie-mlops.readthedocs.io/en/latest/getting_started.html).
+## Roadmap
+- [ ] TensorFlow/Keras support
+- [ ] Model monitoring and drift detection
+- [ ] Integration with Airflow/Kubeflow
+- [ ] Integrate an LLM training/fine-tuning framework
+- [ ] Reduce heavy import and installation overhead
+## License
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+## Citation
+If you use Collie in your research, please cite:
+```bibtex
+@software{collie2025,
+  author = {ChingHuanChiu},
+  title = {Collie: A Lightweight MLOps Framework},
+  year = {2025},
+  url = {https://github.com/ChingHuanChiu/collie}
+}
+```
+---
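The README's Architecture section describes components that each receive an event and return a new one, with an orchestrator chaining them. The following is a minimal sketch of that hand-off pattern only. It does not use Collie's classes: `Event`, `Scale`, `MeanModel`, and `run_pipeline` are hypothetical stand-ins invented for illustration, and the real `Event` and payload types may look quite different.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Protocol


@dataclass
class Event:
    """Hypothetical stand-in for an event: just a dictionary payload."""
    payload: Dict[str, Any] = field(default_factory=dict)


class Component(Protocol):
    """Anything with a handle(event) -> event method counts as a stage."""
    def handle(self, event: Event) -> Event: ...


class Scale:
    """Transformer-like stage: divides raw values by their maximum."""
    def handle(self, event: Event) -> Event:
        raw = event.payload["raw"]
        top = max(raw)
        scaled = [x / top for x in raw]
        return Event(payload={**event.payload, "train_data": scaled})


class MeanModel:
    """Trainer-like stage: 'trains' by storing the mean of the training data."""
    def handle(self, event: Event) -> Event:
        data = event.payload["train_data"]
        return Event(payload={**event.payload, "model": sum(data) / len(data)})


def run_pipeline(components: List[Component], first: Event) -> Event:
    """Orchestrator-like loop: each stage's output event feeds the next stage."""
    event = first
    for component in components:
        event = component.handle(event)
    return event


result = run_pipeline([Scale(), MeanModel()], Event(payload={"raw": [2, 4, 8]}))
print(result.payload["train_data"], result.payload["model"])
```

Each stage returns a fresh event rather than mutating its input, which keeps stages independent of one another. Whether Collie itself does this is not shown in the diff.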

collie_mlops-0.1.0b0/collie/__init__.py ADDED

@@ -0,0 +1,69 @@
+"""
+Collie - A Lightweight MLOps Framework for Machine Learning Workflows
+Collie provides a modular, event-driven architecture for building ML pipelines
+with deep MLflow integration.
+Quick Start:
+    >>> from collie import Transformer, Trainer, Orchestrator
+    >>> # Define your components
+    >>> orchestrator = Orchestrator(
+    ...     components=[MyTransformer(), MyTrainer()],
+    ...     tracking_uri="http://localhost:5000",
+    ...     registered_model_name="my_model"
+    ... )
+    >>> orchestrator.run()
+For more examples, see: https://github.com/ChingHuanChiu/collie
+"""
+__author__ = "ChingHuanChiu"
+__email__ = "stevenchiou8@gmail.com"
+__version__ = "0.1.0b0"
+# Import all main components for easy access
+from .contracts.event import Event, EventType, PipelineContext
+from .core.transform.transform import Transformer
+from .core.trainer.trainer import Trainer
+from .core.tuner.tuner import Tuner
+from .core.evaluator.evaluator import Evaluator
+from .core.pusher.pusher import Pusher
+from .core.orchestrator.orchestrator import Orchestrator
+# Import data models
+from .core.models import (
+    TransformerPayload,
+    TrainerPayload,
+    TunerPayload,
+    EvaluatorPayload,
+    PusherPayload,
+)
+# Import enums for configuration
+from .core.enums.ml_models import ModelFlavor, MLflowModelStage
+__all__ = [
+    # Core components - the main classes users interact with
+    "Transformer",
+    "Trainer",
+    "Tuner",
+    "Evaluator",
+    "Pusher",
+    "Orchestrator",
+    # Event system - for building custom workflows
+    "Event",
+    "EventType",
+    "PipelineContext",
+    # Payload models - for type-safe data passing
+    "TransformerPayload",
+    "TrainerPayload",
+    "TunerPayload",
+    "EvaluatorPayload",
+    "PusherPayload",
+    # Configuration enums
+    "ModelFlavor",
+    "MLflowModelStage",
+]

collie_mlops-0.1.0b0/collie/_common/__init__.py ADDED

File without changes

collie_mlops-0.1.0b0/collie/_common/decorator.py ADDED

@@ -0,0 +1,53 @@
+from typing import Tuple, List
+from functools import wraps
+def type_checker(
+    typing: Tuple[type, ...],
+    error_msg: str
+):
+    """
+    A decorator that checks the type of the output of a function.
+    Args:
+        typing (Tuple[type, ...]): A tuple of types to check against.
+        error_msg (str): The error message to be raised if the type does not match.
+    Raises:
+        TypeError: If the output of the function is not an instance of any of the given types.
+    """
+    def closure(func):
+        @wraps(func)
+        def wrapper(*arg, **kwarg):
+            result = func(*arg, **kwarg)
+            if not isinstance(result, typing):
+                raise TypeError(error_msg)
+            return result
+        return wrapper
+    return closure
+def dict_key_checker(keys: List[str]):
+    """
+    A decorator that checks the keys of the output of a function.
+    Args:
+        keys (List[str]): A list of keys to check against.
+    Raises:
+        TypeError: If the output of the function is not a dictionary.
+        KeyError: If the output of the function does not contain all the keys in the list.
+    """
+    def closure(func):
+        @wraps(func)
+        def wrapper(*arg, **kwarg):
+            result = func(*arg, **kwarg)
+            if not isinstance(result, dict):
+                raise TypeError("The output must be a dictionary.")
+            all_keys_exist = all(key in result for key in keys)
+            if not all_keys_exist:
+                raise KeyError(f"The following keys must all exist in the output: {keys}. Output: {result}")
+            return result
+        return wrapper
+    return closure
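A short usage sketch may help show what these two decorators do at call time. The decorator bodies below are restated from the diff above so the example runs on its own. The decorated functions `evaluate`, `incomplete`, and `broken_score` are hypothetical and do not appear in the package.

```python
from functools import wraps
from typing import List, Tuple


def type_checker(typing: Tuple[type, ...], error_msg: str):
    """Restated from decorator.py: validate the wrapped function's return type."""
    def closure(func):
        @wraps(func)
        def wrapper(*arg, **kwarg):
            result = func(*arg, **kwarg)
            if not isinstance(result, typing):
                raise TypeError(error_msg)
            return result
        return wrapper
    return closure


def dict_key_checker(keys: List[str]):
    """Restated from decorator.py: require a dict return value with given keys."""
    def closure(func):
        @wraps(func)
        def wrapper(*arg, **kwarg):
            result = func(*arg, **kwarg)
            if not isinstance(result, dict):
                raise TypeError("The output must be a dictionary.")
            if not all(key in result for key in keys):
                raise KeyError(f"The following keys must all exist in the output: {keys}. Output: {result}")
            return result
        return wrapper
    return closure


@dict_key_checker(["accuracy", "f1"])
def evaluate():
    # Extra keys are allowed; only the listed ones are required.
    return {"accuracy": 0.9, "f1": 0.8, "extra": 1}


@dict_key_checker(["accuracy", "f1"])
def incomplete():
    return {"accuracy": 0.9}


@type_checker((int, float), "score must be numeric")
def broken_score():
    return "0.9"  # a string, so the check fails after the function has run


metrics = evaluate()

errors = []
for fn in (incomplete, broken_score):
    try:
        fn()
    except (KeyError, TypeError) as exc:
        errors.append(type(exc).__name__)
print(metrics, errors)
```

Both checks run after the wrapped function returns, so any side effects of the function have already happened by the time an error is raised.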

collie_mlops-0.1.0b0/collie/_common/exceptions.py ADDED

@@ -0,0 +1,104 @@
+class CollieBaseException(Exception):
+    """Base exception for all Collie framework errors."""
+    def __init__(self, message: str, component: str = None, details: dict = None):
+        self.message = message
+        self.component = component or self.__class__.__name__.replace('Error', '')
+        self.details = details or {}
+        detailed_message = f"[{self.component}] {message}"
+        if self.details:
+            detailed_message += f" Details: {self.details}"
+        super().__init__(detailed_message)
+class MLflowConfigurationError(CollieBaseException):
+    """Raised when MLflow configuration is invalid."""
+    def __init__(self, message: str, config_param: str = None, **kwargs):
+        details = kwargs.get('details', {})
+        if config_param:
+            details['config_parameter'] = config_param
+        super().__init__(message, component="MLflow Config", details=details)
+class MLflowOperationError(CollieBaseException):
+    """Raised when MLflow operations fail."""
+    def __init__(self, message: str, operation: str = None, **kwargs):
+        details = kwargs.get('details', {})
+        if operation:
+            details['operation'] = operation
+        super().__init__(message, component="MLflow Operation", details=details)
+class OrchestratorError(CollieBaseException):
+    """Raised for errors in the orchestrator process."""
+    def __init__(self, message: str, pipeline_stage: str = None, **kwargs):
+        details = kwargs.get('details', {})
+        if pipeline_stage:
+            details['pipeline_stage'] = pipeline_stage
+        super().__init__(message, component="Orchestrator", details=details)
+class TransformerError(CollieBaseException):
+    """Raised when data transformation fails."""
+    def __init__(self, message: str, data_type: str = None, **kwargs):
+        details = kwargs.get('details', {})
+        if data_type:
+            details['data_type'] = data_type
+        super().__init__(message, component="Transformer", details=details)
+class TrainerError(CollieBaseException):
+    """Raised when model training fails."""
+    def __init__(self, message: str, model_type: str = None, **kwargs):
+        details = kwargs.get('details', {})
+        if model_type:
+            details['model_type'] = model_type
+        super().__init__(message, component="Trainer", details=details)
+class TunerError(CollieBaseException):
+    """Raised when hyperparameter tuning fails."""
+    def __init__(self, message: str, tuning_method: str = None, **kwargs):
+        details = kwargs.get('details', {})
+        if tuning_method:
+            details['tuning_method'] = tuning_method
+        super().__init__(message, component="Tuner", details=details)
+class EvaluatorError(CollieBaseException):
+    """Raised when model evaluation fails."""
+    def __init__(self, message: str, metric: str = None, **kwargs):
+        details = kwargs.get('details', {})
+        if metric:
+            details['metric'] = metric
+        super().__init__(message, component="Evaluator", details=details)
+class PusherError(CollieBaseException):
+    """Raised when model pushing/deployment fails."""
+    def __init__(self, message: str, deployment_target: str = None, **kwargs):
+        details = kwargs.get('details', {})
+        if deployment_target:
+            details['deployment_target'] = deployment_target
+        super().__init__(message, component="Pusher", details=details)
+class ModelFlavorError(CollieBaseException):
+    """Raised when model flavor operations fail."""
+    def __init__(self, message: str, flavor: str = None, **kwargs):
+        details = kwargs.get('details', {})
+        if flavor:
+            details['flavor'] = flavor
+        super().__init__(message, component="Model Flavor", details=details)
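To make the message format concrete, the sketch below restates `CollieBaseException` and `TrainerError` from the diff above (type hints and docstrings trimmed) and exercises them. The error text and the `shared` dictionary are made-up inputs. The second half illustrates a side effect that follows from `kwargs.get('details', {})`: a `details` dict passed by the caller is the same object that the subclass then writes into.

```python
class CollieBaseException(Exception):
    """Restated from exceptions.py."""
    def __init__(self, message, component=None, details=None):
        self.message = message
        self.component = component or self.__class__.__name__.replace('Error', '')
        self.details = details or {}
        detailed_message = f"[{self.component}] {message}"
        if self.details:
            detailed_message += f" Details: {self.details}"
        super().__init__(detailed_message)


class TrainerError(CollieBaseException):
    """Restated from exceptions.py."""
    def __init__(self, message, model_type=None, **kwargs):
        details = kwargs.get('details', {})
        if model_type:
            details['model_type'] = model_type
        super().__init__(message, component="Trainer", details=details)


# 1. The component tag and details are folded into the exception text.
err = TrainerError("Loss diverged", model_type="xgboost")
print(err)

# 2. Any framework error can be caught through the shared base class.
try:
    raise err
except CollieBaseException as exc:
    caught_component = exc.component

# 3. A caller-supplied details dict is mutated in place by the subclass.
shared = {"epoch": 3}
TrainerError("Loss diverged", model_type="xgboost", details=shared)
print(shared)
```

If that in-place mutation is undesirable, one possible adjustment is to copy first, for example `details = dict(kwargs.get('details') or {})`. This is a suggestion, not something the package does.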

collie_mlops-0.1.0b0/collie/_common/mlflow_model_io/__init__.py ADDED

File without changes

collie_mlops-0.1.0b0/collie/_common/mlflow_model_io/base_flavor_handler.py ADDED

@@ -0,0 +1,26 @@
+from abc import ABC, abstractmethod
+from typing import Any
+class FlavorHandler(ABC):
+    @abstractmethod
+    def can_handle(self, model: Any) -> bool:
+        raise NotImplementedError
+    @abstractmethod
+    def flavor(self):
+        raise NotImplementedError
+    @abstractmethod
+    def log_model(
+        self,
+        model: Any,
+        name: str,
+        **kwargs: Any
+    ) -> None:
+        raise NotImplementedError
+    @abstractmethod
+    def load_model(self, model_uri: str) -> Any:
+        raise NotImplementedError
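The abstract interface above suggests a dispatch pattern: ask each handler whether it `can_handle` a model, then delegate logging and loading to the first match. The sketch below illustrates that idea under stated assumptions. `DictFlavorHandler` and `pick_handler` are hypothetical, storage uses `pickle` in a temporary directory rather than MLflow, and treating `model_uri` as a local file path is a simplification. The package's real handlers are not shown in this part of the diff.

```python
import pickle
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, List


class FlavorHandler(ABC):
    """Restated from base_flavor_handler.py."""
    @abstractmethod
    def can_handle(self, model: Any) -> bool:
        raise NotImplementedError

    @abstractmethod
    def flavor(self):
        raise NotImplementedError

    @abstractmethod
    def log_model(self, model: Any, name: str, **kwargs: Any) -> None:
        raise NotImplementedError

    @abstractmethod
    def load_model(self, model_uri: str) -> Any:
        raise NotImplementedError


class DictFlavorHandler(FlavorHandler):
    """Hypothetical handler: treats plain dicts as 'models' and pickles them."""
    def __init__(self, root: Path) -> None:
        self.root = root

    def can_handle(self, model: Any) -> bool:
        return isinstance(model, dict)

    def flavor(self):
        return "dict"

    def log_model(self, model: Any, name: str, **kwargs: Any) -> None:
        # Assumption: 'logging' means writing a pickle file under root.
        (self.root / f"{name}.pkl").write_bytes(pickle.dumps(model))

    def load_model(self, model_uri: str) -> Any:
        # Assumption: the URI is a plain local path.
        return pickle.loads(Path(model_uri).read_bytes())


def pick_handler(model: Any, handlers: List[FlavorHandler]) -> FlavorHandler:
    """Return the first handler that accepts the model (hypothetical helper)."""
    for handler in handlers:
        if handler.can_handle(model):
            return handler
    raise TypeError(f"No flavor handler for {type(model).__name__}")


with tempfile.TemporaryDirectory() as tmp:
    model = {"weights": [1, 2]}
    handler = pick_handler(model, [DictFlavorHandler(Path(tmp))])
    handler.log_model(model, name="demo")
    restored = handler.load_model(str(Path(tmp) / "demo.pkl"))
print(handler.flavor(), restored)
```

Because every method on the base class is abstract, a subclass that omits any of the four cannot be instantiated, which surfaces incomplete handlers early.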