triggerflow 0.1.12.tar.gz → 0.2.1.tar.gz
- triggerflow-0.2.1/MANIFEST.in +2 -0
- triggerflow-0.2.1/PKG-INFO +97 -0
- triggerflow-0.2.1/README.md +77 -0
- triggerflow-0.2.1/pyproject.toml +49 -0
- triggerflow-0.2.1/src/trigger_dataset/core.py +88 -0
- triggerflow-0.2.1/src/trigger_loader/__init__.py +0 -0
- triggerflow-0.2.1/src/trigger_loader/cluster_manager.py +107 -0
- triggerflow-0.2.1/src/trigger_loader/loader.py +95 -0
- triggerflow-0.2.1/src/trigger_loader/processor.py +211 -0
- triggerflow-0.2.1/src/triggerflow/__init__.py +0 -0
- triggerflow-0.2.1/src/triggerflow/cli.py +122 -0
- {triggerflow-0.1.12 → triggerflow-0.2.1}/src/triggerflow/core.py +29 -9
- {triggerflow-0.1.12 → triggerflow-0.2.1}/src/triggerflow/mlflow_wrapper.py +54 -49
- triggerflow-0.2.1/src/triggerflow/starter/.gitignore +143 -0
- triggerflow-0.2.1/src/triggerflow/starter/README.md +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/cookiecutter.json +5 -0
- triggerflow-0.2.1/src/triggerflow/starter/prompts.yml +9 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/.dvcignore +3 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/.gitignore +143 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/.gitlab-ci.yml +56 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/README.md +29 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/conf/README.md +26 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/conf/base/catalog.yml +84 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/conf/base/parameters.yml +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/conf/base/parameters_compile.yml +14 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/conf/base/parameters_data_processing.yml +8 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/conf/base/parameters_load_data.yml +5 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/conf/base/parameters_model_training.yml +9 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/conf/base/parameters_model_validation.yml +5 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/conf/local/catalog.yml +84 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/conf/local/parameters.yml +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/conf/local/parameters_compile.yml +14 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/conf/local/parameters_data_processing.yml +8 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/conf/local/parameters_load_data.yml +5 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/conf/local/parameters_model_training.yml +9 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/conf/local/parameters_model_validation.yml +5 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/conf/logging.yml +43 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/data/01_raw/.gitkeep +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/data/01_raw/samples.json +15 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/data/01_raw/samples_dummy.json +26 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/data/02_loaded/.gitkeep +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/data/03_preprocessed/.gitkeep +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/data/04_models/.gitkeep +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/data/05_validation/.gitkeep +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/data/06_compile/.gitkeep +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/data/07_reporting/.gitkeep +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/dvc.yaml +7 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/environment.yml +21 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/pyproject.toml +50 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/__init__.py +3 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/__main__.py +25 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/datasets/any_object.py +20 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/datasets/base_dataset.py +137 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/datasets/meta_dataset.py +88 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/datasets/{{ cookiecutter.python_package }}_dataset.py +35 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/models/__init__.py +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/models/base_model.py +155 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/models/{{ cookiecutter.python_package }}_model.py +16 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipeline_registry.py +17 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/compile/__init__.py +10 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/compile/nodes.py +50 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/compile/pipeline.py +10 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/data_processing/__init__.py +10 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/data_processing/nodes.py +40 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/data_processing/pipeline.py +28 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/load_data/__init__.py +10 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/load_data/nodes.py +12 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/load_data/pipeline.py +20 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/model_training/__init__.py +10 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/model_training/nodes.py +31 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/model_training/pipeline.py +24 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/model_validation/__init__.py +10 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/model_validation/nodes.py +29 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/model_validation/pipeline.py +24 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/settings.py +46 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/utils/__init__.py +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/utils/metric.py +4 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/utils/plotting.py +598 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/tests/__init__.py +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/tests/pipelines/__init__.py +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/tests/pipelines/compile/__init__.py +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/tests/pipelines/compile/test_pipeline.py +9 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/tests/pipelines/data_processing/__init__.py +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/tests/pipelines/data_processing/test_pipeline.py +9 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/tests/pipelines/load_data/__init__.py +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/tests/pipelines/load_data/test_pipeline.py +9 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/tests/pipelines/model_training/__init__.py +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/tests/pipelines/model_training/test_pipeline.py +9 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/tests/pipelines/model_validation/__init__.py +0 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/tests/pipelines/model_validation/test_pipeline.py +9 -0
- triggerflow-0.2.1/src/triggerflow/starter/{{ cookiecutter.repo_name }}/tests/test_run.py +27 -0
- {triggerflow-0.1.12 → triggerflow-0.2.1}/src/triggerflow/templates/makefile +3 -3
- {triggerflow-0.1.12 → triggerflow-0.2.1}/src/triggerflow/templates/makefile_version +2 -2
- triggerflow-0.2.1/src/triggerflow/templates/model_template.cpp +60 -0
- {triggerflow-0.1.12 → triggerflow-0.2.1}/src/triggerflow/templates/scales.h +1 -1
- triggerflow-0.2.1/src/triggerflow.egg-info/PKG-INFO +97 -0
- triggerflow-0.2.1/src/triggerflow.egg-info/SOURCES.txt +103 -0
- triggerflow-0.2.1/src/triggerflow.egg-info/entry_points.txt +2 -0
- triggerflow-0.2.1/src/triggerflow.egg-info/requires.txt +11 -0
- triggerflow-0.2.1/src/triggerflow.egg-info/top_level.txt +3 -0
- {triggerflow-0.1.12 → triggerflow-0.2.1}/tests/test.py +16 -20
- triggerflow-0.2.1/tests/test_loader.py +103 -0
- triggerflow-0.1.12/PKG-INFO +0 -61
- triggerflow-0.1.12/README.md +0 -50
- triggerflow-0.1.12/pyproject.toml +0 -24
- triggerflow-0.1.12/src/triggerflow/templates/model_template.cpp +0 -59
- triggerflow-0.1.12/src/triggerflow.egg-info/PKG-INFO +0 -61
- triggerflow-0.1.12/src/triggerflow.egg-info/SOURCES.txt +0 -15
- triggerflow-0.1.12/src/triggerflow.egg-info/requires.txt +0 -1
- triggerflow-0.1.12/src/triggerflow.egg-info/top_level.txt +0 -1
- {triggerflow-0.1.12 → triggerflow-0.2.1}/setup.cfg +0 -0
- {triggerflow-0.1.12/src/triggerflow → triggerflow-0.2.1/src/trigger_dataset}/__init__.py +0 -0
- {triggerflow-0.1.12 → triggerflow-0.2.1}/src/triggerflow.egg-info/dependency_links.txt +0 -0
triggerflow-0.2.1/PKG-INFO
@@ -0,0 +1,97 @@
+Metadata-Version: 2.4
+Name: triggerflow
+Version: 0.2.1
+Summary: Utilities for ML models targeting hardware triggers
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+Requires-Dist: cookiecutter>=2.3
+Requires-Dist: PyYAML>=6
+Requires-Dist: Jinja2>=3
+Requires-Dist: mlflow>=2.0
+Requires-Dist: kedro==1.0.0
+Provides-Extra: dev
+Requires-Dist: pytest-cov~=3.0; extra == "dev"
+Requires-Dist: pytest-mock<2.0,>=1.7.1; extra == "dev"
+Requires-Dist: pytest~=7.2; extra == "dev"
+Requires-Dist: ruff~=0.1.8; extra == "dev"
+
+# Machine Learning for Hardware Triggers
+
+`triggerflow` provides a set of utilities for machine learning models targeting FPGA deployment.
+The `TriggerModel` class consolidates several machine learning frontends and compiler backends to construct a "trigger model". MLflow utilities are provided for logging, versioning, and loading trigger models.
+
+## Installation
+
+```bash
+pip install triggerflow
+```
+
+## Usage
+
+```python
+from triggerflow.core import TriggerModel
+
+triggerflow = TriggerModel(name="my-trigger-model", ml_backend="Keras", compiler="hls4ml", model=model, compiler_config=compiler_config)
+triggerflow()  # build the trigger model (invokes __call__)
+
+# then:
+output_software = triggerflow.software_predict(input_data)
+output_firmware = triggerflow.firmware_predict(input_data)
+output_qonnx = triggerflow.qonnx_predict(input_data)
+
+# save and load trigger models:
+triggerflow.save("triggerflow.tar.xz")
+
+# in a separate session:
+from triggerflow.core import TriggerModel
+triggerflow = TriggerModel.load("triggerflow.tar.xz")
+```
+
+## Logging with MLflow
+
+```python
+import mlflow
+from triggerflow.mlflow_wrapper import log_model
+
+mlflow.set_tracking_uri("https://ngt.cern.ch/models")
+experiment_id = mlflow.create_experiment("example-experiment")
+
+with mlflow.start_run(run_name="trial-v1", experiment_id=experiment_id):
+    log_model(triggerflow, registered_model_name="TriggerModel")
+```
+
+### Note: This package doesn't install ML or compiler dependencies, so it won't disrupt specific training environments or custom compilers. For a reference environment, see `environment.yml`.
+
+# Creating a kedro pipeline
+
+This repository also comes with a default kedro-based pipeline for trigger models.
+A new pipeline can be created as follows (NOTE: the pipeline name must not contain "-" or uppercase letters):
+
+```bash
+# Create a conda environment & activate it
+conda create -n triggerflow python=3.11
+conda activate triggerflow
+
+# Install triggerflow
+pip install triggerflow
+
+# Create a pipeline
+triggerflow new demo_pipeline
+
+# NOTE: since triggerflow doesn't install pipeline dependencies, create a
+# conda env from the pipeline's environment.yml; this file can be adapted
+# to the needs of the individual project
+cd demo_pipeline
+conda env update -n triggerflow --file environment.yml
+
+# Run Kedro
+kedro run
+```
triggerflow-0.2.1/README.md
@@ -0,0 +1,77 @@
+# Machine Learning for Hardware Triggers
+
+`triggerflow` provides a set of utilities for machine learning models targeting FPGA deployment.
+The `TriggerModel` class consolidates several machine learning frontends and compiler backends to construct a "trigger model". MLflow utilities are provided for logging, versioning, and loading trigger models.
+
+## Installation
+
+```bash
+pip install triggerflow
+```
+
+## Usage
+
+```python
+from triggerflow.core import TriggerModel
+
+triggerflow = TriggerModel(name="my-trigger-model", ml_backend="Keras", compiler="hls4ml", model=model, compiler_config=compiler_config)
+triggerflow()  # build the trigger model (invokes __call__)
+
+# then:
+output_software = triggerflow.software_predict(input_data)
+output_firmware = triggerflow.firmware_predict(input_data)
+output_qonnx = triggerflow.qonnx_predict(input_data)
+
+# save and load trigger models:
+triggerflow.save("triggerflow.tar.xz")
+
+# in a separate session:
+from triggerflow.core import TriggerModel
+triggerflow = TriggerModel.load("triggerflow.tar.xz")
+```
+
+## Logging with MLflow
+
+```python
+import mlflow
+from triggerflow.mlflow_wrapper import log_model
+
+mlflow.set_tracking_uri("https://ngt.cern.ch/models")
+experiment_id = mlflow.create_experiment("example-experiment")
+
+with mlflow.start_run(run_name="trial-v1", experiment_id=experiment_id):
+    log_model(triggerflow, registered_model_name="TriggerModel")
+```
+
+### Note: This package doesn't install ML or compiler dependencies, so it won't disrupt specific training environments or custom compilers. For a reference environment, see `environment.yml`.
+
+# Creating a kedro pipeline
+
+This repository also comes with a default kedro-based pipeline for trigger models.
+A new pipeline can be created as follows (NOTE: the pipeline name must not contain "-" or uppercase letters):
+
+```bash
+# Create a conda environment & activate it
+conda create -n triggerflow python=3.11
+conda activate triggerflow
+
+# Install triggerflow
+pip install triggerflow
+
+# Create a pipeline
+triggerflow new demo_pipeline
+
+# NOTE: since triggerflow doesn't install pipeline dependencies, create a
+# conda env from the pipeline's environment.yml; this file can be adapted
+# to the needs of the individual project
+cd demo_pipeline
+conda env update -n triggerflow --file environment.yml
+
+# Run Kedro
+kedro run
+```
triggerflow-0.2.1/pyproject.toml
@@ -0,0 +1,49 @@
+[build-system]
+requires = ["setuptools>=65.5", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "triggerflow"
+version = "0.2.1"
+description = "Utilities for ML models targeting hardware triggers"
+readme = "README.md"
+requires-python = ">=3.11"
+dependencies = [
+    "cookiecutter>=2.3",
+    "PyYAML>=6",
+    "Jinja2>=3",
+    "mlflow>=2.0",
+    "kedro==1.0.0",
+]
+classifiers = [
+    "Programming Language :: Python :: 3",
+    "License :: OSI Approved :: MIT License",
+    "Operating System :: OS Independent",
+]
+
+[project.optional-dependencies]
+dev = [
+    "pytest-cov~=3.0",
+    "pytest-mock>=1.7.1, <2.0",
+    "pytest~=7.2",
+    "ruff~=0.1.8",
+]
+
+[tool.setuptools]
+include-package-data = true
+
+[tool.setuptools.packages.find]
+where = ["src"]
+
+[tool.setuptools.package-data]
+triggerflow = ["starter/**", "starter/**/*"]
+
+[tool.ruff]
+line-length = 88
+show-fixes = true
+select = ["F", "W", "E", "I", "UP", "PL", "T201"]
+ignore = ["E501"]
+extend-exclude = ["src/triggerflow/starter"]
+
+# expose CLI entrypoint
+[project.scripts]
+triggerflow = "triggerflow.cli:main"
triggerflow-0.2.1/src/trigger_dataset/core.py
@@ -0,0 +1,88 @@
+import warnings
+from abc import ABC, abstractmethod
+from fnmatch import filter as fnmatch_filter
+
+import pandas as pd
+import uproot
+
+
+class TriggerDataset(ABC):
+    """
+    Abstract Base Class for loading data from ROOT files.
+
+    Users must inherit from this class and implement the abstract methods.
+    The core processing logic in `process_file` is fixed and cannot be overridden.
+    """
+
+    def __init__(self, output_format: str = "parquet"):
+        # Output format used by `process_file`: "parquet" or "csv".
+        self.output_format = output_format
+
+    @abstractmethod
+    def get_tree_name(self) -> str:
+        """
+        Return the name of the tree to read from the ROOT file.
+        """
+
+    @abstractmethod
+    def get_features(self) -> list[str]:
+        """
+        Return a list of branch names or patterns to keep from the dataset.
+        Accepts wildcards (e.g. Jet_*).
+        """
+
+    @abstractmethod
+    def get_cut(self) -> str | None:
+        """
+        Return a string representing the cuts to apply to the data.
+        """
+
+    @abstractmethod
+    def convert_to_pandas(self, data: dict) -> pd.DataFrame:
+        """
+        Convert the loaded data from a dictionary format to a pandas DataFrame.
+        """
+
+    def _resolve_branches(self, all_branches: list) -> list[str]:
+        """Internal method to resolve wildcard patterns."""
+        selected = []
+        for pattern in self.get_features():
+            matched = fnmatch_filter(all_branches, pattern)
+            if not matched:
+                warnings.warn(f"'{pattern}' did not match any branches.")
+            selected.extend(matched)
+        return sorted(set(selected))
+
+    def _save_to_parquet(self, df: pd.DataFrame, output_path: str):
+        """
+        Save the processed DataFrame to a Parquet file.
+        """
+        df.to_parquet(output_path)
+
+    def _save_to_csv(self, df: pd.DataFrame, output_path: str):
+        """
+        Save the processed DataFrame to a CSV file.
+        """
+        df.to_csv(output_path, index=False)
+
+    def process_file(self, file_path: str, out_file_path: str) -> pd.DataFrame:
+        """
+        Load and process a single ROOT file, saving the result in the
+        configured output format.
+        """
+        with uproot.open(file_path) as f:
+            tree = f[self.get_tree_name()]
+            all_branches = tree.keys()
+            branches_to_load = self._resolve_branches(all_branches)
+
+            if not branches_to_load:
+                return pd.DataFrame()
+
+            data = tree.arrays(branches_to_load, cut=self.get_cut(), how=dict)
+            df = self.convert_to_pandas(data)
+
+        if self.output_format == "parquet":
+            self._save_to_parquet(df, f"{out_file_path}.parquet")
+        elif self.output_format == "csv":
+            self._save_to_csv(df, f"{out_file_path}.csv")
+        else:
+            raise ValueError(f"Unsupported output format: {self.output_format}")
+
+        return df
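For illustration only, a minimal subclass of the ABC above might look like the sketch below; the tree name, branch patterns, and cut expression are hypothetical and depend entirely on the ROOT files at hand.

```python
import pandas as pd

from trigger_dataset.core import TriggerDataset


class MetDataset(TriggerDataset):
    """Hypothetical dataset keeping a few flat branches from an 'Events' tree."""

    def get_tree_name(self) -> str:
        return "Events"

    def get_features(self) -> list[str]:
        # Patterns are matched against the tree's branch names via fnmatch,
        # so wildcards such as "MET_*" are allowed.
        return ["event", "MET_*"]

    def get_cut(self) -> str | None:
        # Forwarded to uproot's `cut` argument; return None for no selection.
        return "MET_pt > 100"

    def convert_to_pandas(self, data: dict) -> pd.DataFrame:
        # `data` maps branch names to arrays; this assumes one value per
        # event (jagged branches would need explicit flattening).
        return pd.DataFrame({name: array.tolist() for name, array in data.items()})


# dataset = MetDataset()                            # default output_format="parquet"
# dataset.process_file("events.root", "out/met")    # writes out/met.parquet
```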
triggerflow-0.2.1/src/trigger_loader/__init__.py (file without changes)
triggerflow-0.2.1/src/trigger_loader/cluster_manager.py
@@ -0,0 +1,107 @@
+from __future__ import annotations
+
+import logging
+from typing import Any
+
+from dask.distributed import Client, LocalCluster
+from dask_cuda import LocalCUDACluster
+from dask_jobqueue import HTCondorCluster
+from dask_kubernetes import KubeCluster
+
+logger = logging.getLogger(__name__)
+
+
+class ClusterManager:
+    """Context manager to provision and tear down a Dask cluster.
+
+    Parameters
+    ----------
+    cluster_type : str
+        Backend to use ("local", "condor", "cuda", "kubernetes").
+    cluster_config : dict | None, optional
+        Keyword arguments forwarded to the specific cluster constructor.
+    jobs : int, optional
+        Desired number of jobs / workers (used for queue / scalable backends).
+    """
+
+    def __init__(
+        self,
+        cluster_type: str,
+        cluster_config: dict[str, Any] | None = None,
+        jobs: int = 1,
+    ) -> None:
+        if cluster_config is None:
+            cluster_config = {}
+        # Copy to avoid mutating the caller's dict accidentally.
+        self.cluster_config: dict[str, Any] = dict(cluster_config)
+        self.cluster_type: str = cluster_type
+        self.jobs: int = jobs
+
+        self.cluster: Any | None = None
+        self.client: Any | None = None
+
+    # ------------------------------------------------------------------
+    # Context manager protocol
+    # ------------------------------------------------------------------
+    def __enter__(self):  # -> distributed.Client (avoids importing the type eagerly)
+        self._start_cluster()
+        return self.client
+
+    def __exit__(self, exc_type, exc, tb) -> bool:  # noqa: D401 (simple)
+        self._close_cluster()
+        # Returning False propagates any exception (desired behavior).
+        return False
+
+    # ------------------------------------------------------------------
+    # Internal helpers
+    # ------------------------------------------------------------------
+    def _start_cluster(self) -> None:
+        ct = self.cluster_type.lower()
+
+        if ct == "local":
+            self.cluster = LocalCluster(**self.cluster_config)
+
+        elif ct == "condor":
+            self.cluster = HTCondorCluster(**self.cluster_config)
+            if self.jobs and self.jobs > 0:
+                # Scale to the requested number of jobs.
+                self.cluster.scale(jobs=self.jobs)
+
+        elif ct == "cuda":
+            self.cluster = LocalCUDACluster(**self.cluster_config)
+
+        elif ct == "kubernetes":
+            self.cluster = KubeCluster(**self.cluster_config)
+            if self.jobs and self.jobs > 0:
+                try:
+                    # Not all KubeCluster versions expose scale() identically.
+                    self.cluster.scale(self.jobs)
+                except Exception:
+                    pass  # Best effort; ignore if unsupported.
+
+        else:
+            raise ValueError(f"Unsupported cluster type: {self.cluster_type}")
+
+        self.client = Client(self.cluster)
+        dash = getattr(self.client, "dashboard_link", None)
+        if dash:
+            logger.info(f"Dask dashboard: {dash}")
+
+    def _close_cluster(self) -> None:
+        # Close the client first so tasks wind down before cluster termination.
+        if self.client is not None:
+            try:
+                self.client.close()
+            except Exception:
+                pass
+            finally:
+                self.client = None
+        if self.cluster is not None:
+            try:
+                self.cluster.close()
+            except Exception:
+                pass
+            finally:
+                self.cluster = None
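As a usage sketch (local backend, illustrative worker counts; the same pattern applies to the condor, cuda, and kubernetes backends), entering the context starts the cluster and yields a connected `dask.distributed.Client`:

```python
from trigger_loader.cluster_manager import ClusterManager

# The config dict is forwarded verbatim to the backend's constructor
# (here dask.distributed.LocalCluster).
with ClusterManager("local", {"n_workers": 2, "threads_per_worker": 1}) as client:
    result = client.submit(sum, range(10)).result()
    assert result == 45
# On exit the client is closed first, then the cluster, even on exceptions.

# For HTCondor, `jobs` drives cluster.scale(jobs=...); config values illustrative:
# with ClusterManager("condor", {"cores": 1, "memory": "2GB", "disk": "1GB"}, jobs=8) as client:
#     ...
```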
triggerflow-0.2.1/src/trigger_loader/loader.py
@@ -0,0 +1,95 @@
+import json
+import logging
+import platform
+import time
+import uuid
+from collections.abc import Callable
+
+import awkward as ak
+import coffea
+from coffea import processor
+from coffea.nanoevents import NanoAODSchema
+
+from .cluster_manager import ClusterManager
+from .processor import TriggerProcessor
+
+logger = logging.getLogger(__name__)
+
+
+class TriggerLoader:
+    def __init__(
+        self,
+        sample_json: str,
+        transform: Callable,
+        output_path: str,
+    ):
+        self.transform = transform
+        self.fileset = self._load_sample_json(sample_json)
+        self.output_path = output_path
+        self.run_uuid = str(uuid.uuid4())
+
+    def _build_processor(self):
+        run_meta = {
+            "run_uuid": self.run_uuid,
+            "fileset_size": sum(len(v) if isinstance(v, list) else 1 for v in self.fileset.values()),
+            "coffea_version": coffea.__version__,
+            "awkward_version": ak.__version__,
+            "python_version": platform.python_version(),
+        }
+
+        return TriggerProcessor(
+            output_path=self.output_path,
+            transform=self.transform,
+            compression="zstd",
+            add_uuid=False,
+            run_uuid=self.run_uuid,
+            run_metadata=run_meta,
+        )
+
+    def _load_sample_json(self, sample_json: str) -> dict:
+        with open(sample_json) as f:
+            return json.load(f)
+
+    def _write_run_metadata_file(self, path: str, duration_s: float | None = None):
+        meta_path = f"{path}/run_metadata.json"
+        data = {
+            "run_uuid": self.run_uuid,
+            "duration_seconds": duration_s,
+        }
+        with open(meta_path, "w") as f:
+            json.dump(data, f, indent=2)
+
+    def _run(self, runner: processor.Runner, label: str):
+        logger.info(f"Starting processing ({label})...")
+        start = time.time()
+        proc = self._build_processor()
+        acc = runner(
+            self.fileset,
+            treename="Events",
+            processor_instance=proc,
+        )
+        elapsed = time.time() - start
+        self._write_run_metadata_file(self.output_path, elapsed)
+        logger.info(f"Finished in {elapsed:.2f}s (run_uuid={self.run_uuid})")
+        return acc
+
+    def run_distributed(self, cluster_type: str, cluster_config: dict,
+                        chunksize: int = 100_000, jobs: int = 1):
+        """
+        Run processing on a Dask cluster managed by ClusterManager.
+        """
+        with ClusterManager(cluster_type, cluster_config, jobs) as client:
+            executor = processor.DaskExecutor(client=client)
+            runner = processor.Runner(
+                executor=executor,
+                schema=NanoAODSchema,
+                chunksize=chunksize,
+            )
+            return self._run(runner, f"Distributed ({cluster_type})")
+
+    def run_local(self, num_workers: int = 4, chunksize: int = 100_000):
+        """
+        Run processing locally using a multiprocessing executor.
+        """
+        executor = processor.FuturesExecutor(workers=num_workers)
+        runner = processor.Runner(
+            executor=executor,
+            schema=NanoAODSchema,
+            chunksize=chunksize,
+        )
+        return self._run(runner, f"Local ({num_workers} workers)")
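Putting the loader together, a hypothetical end-to-end run could look like the following sketch; the fileset layout (dataset name mapped to a list of ROOT files, as consumed by `coffea.processor.Runner`) and the semantics of the `transform` callable are assumptions, not part of the package's documented API.

```python
import awkward as ak

from trigger_loader.loader import TriggerLoader


def keep_dijet_events(events):
    """Hypothetical transform applied to each chunk of NanoAOD events."""
    return events[ak.num(events.Jet) >= 2]


# samples.json is assumed to map dataset names to ROOT file lists, e.g.
# {"ttbar": ["root://xrootd.example.org//store/ttbar_0.root"]}
loader = TriggerLoader(
    sample_json="samples.json",
    transform=keep_dijet_events,
    output_path="output",
)

# Local multiprocessing run; also writes output/run_metadata.json.
loader.run_local(num_workers=4, chunksize=100_000)

# Or on a Dask cluster (HTCondor shown; config values illustrative):
# loader.run_distributed("condor", {"cores": 1, "memory": "2GB", "disk": "1GB"}, jobs=8)
```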