csa-common-lib 2.0.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- csa_common_lib-2.0.0/PKG-INFO +6 -0
- csa_common_lib-2.0.0/README.md +61 -0
- csa_common_lib-2.0.0/csa_common_lib/__init__.py +15 -0
- csa_common_lib-2.0.0/csa_common_lib/classes/__init__.py +6 -0
- csa_common_lib-2.0.0/csa_common_lib/classes/prediction_options.py +253 -0
- csa_common_lib-2.0.0/csa_common_lib/classes/prediction_receipt.py +86 -0
- csa_common_lib-2.0.0/csa_common_lib/classes/prediction_results.py +82 -0
- csa_common_lib-2.0.0/csa_common_lib/enum_types/__init__.py +2 -0
- csa_common_lib-2.0.0/csa_common_lib/enum_types/exit_flags.py +49 -0
- csa_common_lib-2.0.0/csa_common_lib/enum_types/functions.py +32 -0
- csa_common_lib-2.0.0/csa_common_lib/enum_types/job_types.py +36 -0
- csa_common_lib-2.0.0/csa_common_lib/enum_types/results.py +47 -0
- csa_common_lib-2.0.0/csa_common_lib/helpers/__init__.py +0 -0
- csa_common_lib-2.0.0/csa_common_lib/helpers/_arrays.py +7 -0
- csa_common_lib-2.0.0/csa_common_lib/helpers/_conversions.py +128 -0
- csa_common_lib-2.0.0/csa_common_lib/helpers/_os.py +120 -0
- csa_common_lib-2.0.0/csa_common_lib/helpers/_vault.py +134 -0
- csa_common_lib-2.0.0/csa_common_lib/toolbox/__init__.py +0 -0
- csa_common_lib-2.0.0/csa_common_lib/toolbox/_notifier.py +173 -0
- csa_common_lib-2.0.0/csa_common_lib/toolbox/_validate.py +395 -0
- csa_common_lib-2.0.0/csa_common_lib/toolbox/classes/__init__.py +0 -0
- csa_common_lib-2.0.0/csa_common_lib/toolbox/classes/utilities.py +57 -0
- csa_common_lib-2.0.0/csa_common_lib/toolbox/concurrency/__init__.py +0 -0
- csa_common_lib-2.0.0/csa_common_lib/toolbox/concurrency/parallel_executor.py +111 -0
- csa_common_lib-2.0.0/csa_common_lib/toolbox/concurrency/parallel_helpers.py +119 -0
- csa_common_lib-2.0.0/csa_common_lib/toolbox/database/__init__.py +0 -0
- csa_common_lib-2.0.0/csa_common_lib/toolbox/database/information.py +68 -0
- csa_common_lib-2.0.0/csa_common_lib/toolbox/npz/__init__.py +0 -0
- csa_common_lib-2.0.0/csa_common_lib/toolbox/npz/io_operations.py +91 -0
- csa_common_lib-2.0.0/csa_common_lib.egg-info/PKG-INFO +6 -0
- csa_common_lib-2.0.0/csa_common_lib.egg-info/SOURCES.txt +34 -0
- csa_common_lib-2.0.0/csa_common_lib.egg-info/dependency_links.txt +1 -0
- csa_common_lib-2.0.0/csa_common_lib.egg-info/requires.txt +6 -0
- csa_common_lib-2.0.0/csa_common_lib.egg-info/top_level.txt +1 -0
- csa_common_lib-2.0.0/setup.cfg +4 -0
- csa_common_lib-2.0.0/setup.py +33 -0
|
@@ -0,0 +1,6 @@
|
|
|
1
|
+
Metadata-Version: 2.1
|
|
2
|
+
Name: csa_common_lib
|
|
3
|
+
Version: 2.0.0
|
|
4
|
+
Summary: csa_common_lib is a shared library designed to provide utility modules, class definitions, enumerations, and helper functions for the CSA Prediction Engine Python client. It standardizes and simplifies complex operations across different parts of the CSA Prediction Engine.
|
|
5
|
+
Author: Cambridge Sports Analytics
|
|
6
|
+
Author-email: prediction@csanalytics.io
|
|
@@ -0,0 +1,61 @@
|
|
|
1
|
+
# PSR_LIBRARY
|
|
2
|
+
|
|
3
|
+
The PSR Library is the core framework for calculating relevance-based prediction models and managing interactions with AWS Lambda functions. It includes several modules ranging from core math functions to end-user APIs and internal tools to support the operation of Cambridge Sports Analytics' prediction models.
|
|
4
|
+
|
|
5
|
+
## Repository Structure
|
|
6
|
+
|
|
7
|
+
The repository is structured into several key components:
|
|
8
|
+
|
|
9
|
+
### Key Directories
|
|
10
|
+
|
|
11
|
+
- **`_aws_layers/`**: Contains the AWS Lambda layers needed for dependencies such as Python packages to be used in Lambda functions.
|
|
12
|
+
|
|
13
|
+
- **`csa_common_lib/`**: A collection of common utilities and helper functions that can be reused across different modules in the repository, including shared classes, validation utilities, and enumerations.
|
|
14
|
+
|
|
15
|
+
- **`csanalytics/`**: The package intended for the end-user interaction, primarily to interface with Cambridge Sports Analytics' prediction engine API. It provides functions to call predictive models, fetch results, and handle user data.
|
|
16
|
+
|
|
17
|
+
- **`csanalytics_local/`**: Internal tools used for development and infrastructure management, meant for use by CSA engineers.
|
|
18
|
+
|
|
19
|
+
- **`lambda_functions/`**: Contains various AWS Lambda functions that handle job processing, submission, and results retrieval. These functions facilitate interaction with the PSR models through serverless operations.
|
|
20
|
+
|
|
21
|
+
- **`accessid_usage_handler/`**: Lambda function for handling access ID usage.
|
|
22
|
+
- **`filter_response/`**: Lambda function for filtering responses from job results.
|
|
23
|
+
- **`get_accessid_usage/`**: Retrieves access ID usage statistics.
|
|
24
|
+
- **`get_apikey_usage/`**: Retrieves API key usage statistics.
|
|
25
|
+
- **`get_job_results/`**: Fetches results of a job from the server.
|
|
26
|
+
- **`post_job/`**: Submits a job to the PSR prediction engine.
|
|
27
|
+
- **`process_job/`**: Processes a job and manages its state.
|
|
28
|
+
- **`start_state_machine_psr/`**: Starts an AWS Step Functions state machine to manage long-running jobs.
|
|
29
|
+
|
|
30
|
+
- **`psr/`**: The main library where core mathematical modules are implemented. This is the heart of the PSR system, containing the math and algorithms for relevance-based predictions.
|
|
31
|
+
|
|
32
|
+
- **`psr_lambda/`**: Helper functions and utilities for managing AWS Lambda functions specifically for PSR-related tasks. This package helps with the deployment and orchestration of Lambda functions for running predictions.
|
|
33
|
+
|
|
34
|
+
## Getting Started
|
|
35
|
+
|
|
36
|
+
### Cloning the Repository
|
|
37
|
+
To get started, clone this repository using:
|
|
38
|
+
|
|
39
|
+
```bash
|
|
40
|
+
git clone https://github.com/CambridgeSportsAnalytics/PSR_LIBRARY.git
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
### Setting Up Your Environment
|
|
44
|
+
|
|
45
|
+
The repository includes various packages and functions that require dependencies for AWS Lambda and Python packages. Make sure you have the necessary Python environment set up, and install required dependencies using:
|
|
46
|
+
|
|
47
|
+
```bash
|
|
48
|
+
pip install -r requirements.txt
|
|
49
|
+
```
|
|
50
|
+
|
|
51
|
+
### Running Lambda Functions
|
|
52
|
+
|
|
53
|
+
To interact with AWS Lambda functions, you can navigate to the lambda_functions/ directory and deploy the functions using the AWS CLI or your preferred deployment method (e.g., AWS SAM or Serverless Framework).
|
|
54
|
+
|
|
55
|
+
## Documentation
|
|
56
|
+
|
|
57
|
+
For full documentation on each module and function, refer to the inline docstrings and module-specific README files located within each subdirectory. You can contact Cel Kulasekaran or Logan Waien for technical inquiries.
|
|
58
|
+
|
|
59
|
+
## License
|
|
60
|
+
|
|
61
|
+
(c) 2023 - 2024 Cambridge Sports Analytics, LLC. All rights reserved.
|
|
@@ -0,0 +1,15 @@
|
|
|
1
|
+
"""CSA Common Library
|
|
2
|
+
Description of module should go here.
|
|
3
|
+
|
|
4
|
+
|
|
5
|
+
"""
|
|
6
|
+
|
|
7
|
+
|
|
8
|
+
# Options classes available for Optimal Variable Grid prediction,
|
|
9
|
+
# Max Fit prediction, and relevance-based prediction.
|
|
10
|
+
from .classes.prediction_options import GridOptions
|
|
11
|
+
from .classes.prediction_options import MaxFitOptions
|
|
12
|
+
from .classes.prediction_options import PredictionOptions
|
|
13
|
+
|
|
14
|
+
from .classes.prediction_results import PredictionResults
|
|
15
|
+
from .classes.prediction_receipt import PredictionReceipt
|
|
@@ -0,0 +1,253 @@
|
|
|
1
|
+
import copy
|
|
2
|
+
import numpy as np
|
|
3
|
+
|
|
4
|
+
|
|
5
|
+
class PredictionOptions:
    """A configurable options class for relevance-based predictions, including
    predict, maxfit, and grid models. This class provides a comprehensive
    list of all possible input parameters, ensuring flexibility across
    different prediction models. While some parameters are shared across
    inherited models, setting an unused option for a specific model
    will have no effect, ensuring compatibility and ease of use.

    threshold : float or ndarray [1-by-T], optional (default=[0])
        Evaluation threshold to determine whether observations will be
        included or excluded from the censor function in the
        partial-sample regression. If threshold = None, the model
        will evaluate across thresholds from [0, 0.90) in 0.10 increments.
    is_threshold_percent : bool, optional (default=True)
        Specify whether threshold is in percentage (decimal) units.
    most_eval : bool, optional (default=True)
        Specify the direction of the censor evaluation of the threshold.
        True: [eval_type] score > threshold
        False: [eval_type] score < threshold
    eval_type : str, optional (default="both")
        Specify evaluation censor type, relevance, similarity, or both.
    adj_fit_multiplier : str, optional (default='K')
        Adjusted fit multiplier. Specify either 'log', 'K', or '1'.
    cov_inv : ndarray [K-by-K], optional (default=None)
        Inverse covariance matrix, specify for speed.

    Returns
    -------
    PredictionOptions
        Options class to organize and persist parameters used in the
        prediction models.

    Raises
    ------
    AttributeError
        When attempting to set or get an attribute that does not
        exist in the options dictionary.
    """

    def __init__(self, **kwargs):
        # Default option values; subclasses extend this dictionary.
        self.options = {
            'threshold': [0],
            'is_threshold_percent': True,
            'most_eval': True,
            'eval_type': 'both',
            'adj_fit_multiplier': 'K',
            'cov_inv': None,
        }

        # Update the options dictionary with any provided kwargs
        self.options.update(kwargs)

    def __getattr__(self, name):
        # __getattr__ is only invoked when normal attribute lookup has
        # already failed, so instance __dict__ entries never reach this
        # point; only the options dictionary needs to be consulted.
        if 'options' in self.__dict__ and name in self.__dict__['options']:
            return self.__dict__['options'][name]

        # Raise an AttributeError if the attribute is not found
        raise AttributeError(f"'PredictionOptions' object has no attribute '{name}'")

    def __setattr__(self, name, value):
        # 'options' itself is a real instance attribute; every known
        # option key is routed into the options dictionary instead.
        if name == "options":
            super().__setattr__(name, value)
        elif 'options' in self.__dict__ and name in self.options:
            self.options[name] = value
        else:
            raise AttributeError(f"'PredictionOptions' object has no attribute '{name}'")

    def display(self):
        """Print each option key/value pair on its own line."""
        for key, value in self.options.items():
            print(f"{key}: {value}")

    def init_from_dict(self, inputs):
        """Accepts a dictionary of inputs and updates this
        PredictionOptions object with all passed optional values.
        Essentially, this is an update method.

        Args:
            inputs (dict): Intakes a dictionary of inputs deconstructed
                in an AWS Lambda function.

        Returns:
            None: the object is updated in place. Non-passed options
            remain at their default setting.
        """

        # Route updates through the normal setattr path so known option
        # keys are written into self.options. (Previously this used
        # super().__setattr__, which shadowed keys on the instance
        # __dict__ and left self.options / display() stale.)
        for key, value in inputs.items():
            if hasattr(self, key):
                setattr(self, key, value)

    def clone_with(self, **kwargs):
        """Returns a clone of this PredictionOptions object with
        user-specified attribute overwrites (via key/value pairs).

        Args:
            key/value pair (attr/value): Attributes to overwrite in
                the cloned object.

        Returns:
            PredictionOptions: cloned PredictionOptions obj
        """

        # Create a new instance of the same (sub)class to avoid a
        # recursive loop in copy.deepcopy() on the full object.
        new_copy = self.__class__()

        # Deep-copy the options dictionary so the clone does not share
        # mutable state with the original. (Previously the dict object
        # itself was shared, so mutating the clone mutated the source.)
        new_copy.options = copy.deepcopy(self.options)

        # Overwrite attributes with passed parameters
        for key, value in kwargs.items():
            setattr(new_copy, key, value)

        return new_copy
|
136
|
+
class MaxFitOptions(PredictionOptions):
    """
    MaxFitOptions Class:
    Inherits from PredictionOptions and adds options specific to
    max fit problems.

    threshold : not applicable
        Max fit solves for the optimal threshold that maximizes the
        fit (or adjusted fit) value.
    threshold_range : tuple or ndarray
        Min/max range for evaluating the maxfit threshold, by default
        (0, 0.20, 0.50, 0.80). If an ndarray is passed in, max fit
        evaluates over the specified threshold values in the ndarray.
    stepsize : float, optional (default=0.20)
        Stepsize to evaluate the range of thresholds to solve for max
        fit. Decreasing stepsize will increase the grid resolution.
    most_eval : bool, optional (default=True)
        Specify the direction of threshold evaluation on the censor
        score. The censor score is determined by eval_type.
        True: censor score > threshold
        False: censor score < threshold
    eval_type : str, optional (default="both")
        Specify censor threshold type, relevance, similarity, or both.
    cov_inv : ndarray [K-by-K], optional (default=None)
        Inverse covariance matrix, specify for speed.
    objective : str, optional (default="adjusted_fit")
        Objective function to optimize, either fit or adjusted_fit.

    Returns
    -------
    MaxFitOptions
        Options class to organize and persist parameters used for the
        maximum fit prediction model.

    Raises
    ------
    AttributeError
        When attempting to set or get an attribute that does not
        exist in the options dictionary.
    """

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

        # Maxfit-specific defaults layered on top of the base options.
        maxfit_defaults = {
            'threshold': None,
            'threshold_range': np.array((0, 0.20, 0.50, 0.80), dtype='float32'),
            'stepsize': 0.20,
            'objective': 'adjusted_fit',
        }
        self.options.update(maxfit_defaults)

        # Caller-supplied kwargs take precedence over the defaults above.
        self.options.update(kwargs)
+
|
|
190
|
+
class GridOptions(MaxFitOptions):
    """
    GridOptions Class:
    Inherits from MaxFitOptions and adds grid-search-specific options.

    threshold_range : tuple or ndarray
        Min/max range for evaluating the maxfit threshold, by default
        (0, 0.20, 0.50, 0.80). If an ndarray is passed in, max fit
        evaluates over the specified threshold values in the ndarray.
    stepsize : float, optional (default=0.20)
        Stepsize to evaluate the range of thresholds to solve for max
        fit. Decreasing stepsize will increase the granularity of the
        search. Not applicable if threshold_range is an ndarray.
    most_eval : bool, optional (default=True)
        Specify the direction of threshold evaluation on the censor
        score. The censor score is determined by eval_type.
        True: censor score > threshold
        False: censor score < threshold
    eval_type : str, optional (default="both")
        Specify censor threshold type, relevance, similarity, or both.
    cov_inv : ndarray [K-by-K], optional (default=None)
        Inverse covariance matrix, specify for speed.
    objective : str, optional (default="adjusted_fit")
        Objective function to optimize, either fit or adjusted_fit.
    attribute_combi : ndarray [Q-by-K], optional (default=None)
        Matrix of binary row vectors to indicate variable choices.
        Each row is a combination of variables to evaluate. If not
        specified, the function will evaluate all possible combinations.
    max_iter : int, optional (default=1_000_000)
        Maximum number of grid cells to evaluate. Since this is O(n^K)
        computational time, we suggest balancing computation time
        and memory with the maximum number of cells to evaluate.
    k : int, optional (default=1)
        Lower bound for the number of variables to include for any
        combination Q, by default 1.
    _is_retain_all_grid_objects : boolean, optional (default=False)
        Saves and returns the weights grid for all censors; this is
        the largest matrix in yhat_details. Typically set to True for
        audit or deep research and development purposes.

    Returns
    -------
    GridOptions
        Options class to organize and persist parameters used for the
        grid (and grid singularity) prediction model.

    Raises
    ------
    AttributeError
        When attempting to set or get an attribute that does not
        exist in the options dictionary.
    """

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

        # Grid-specific defaults layered on top of the maxfit options.
        grid_defaults = {
            'attribute_combi': None,
            'max_iter': 1_000_000,
            'k': 1,
            # True retains memory-expensive objects for audits or deep R&D.
            '_is_retain_all_grid_objects': False,
        }
        self.options.update(grid_defaults)

        # Caller-supplied kwargs take precedence over the defaults above.
        self.options.update(kwargs)
|
@@ -0,0 +1,86 @@
|
|
|
1
|
+
import numpy as np
|
|
2
|
+
import pickle
|
|
3
|
+
import json
|
|
4
|
+
import uuid
|
|
5
|
+
from datetime import datetime
|
|
6
|
+
from csa_common_lib.helpers._conversions import convert_ndarray_to_list
|
|
7
|
+
from csa_common_lib.helpers._os import is_valid_path, calc_crc64
|
|
8
|
+
|
|
9
|
+
class PredictionReceipt:
    """Saves and organizes input dimensions, prediction durations,
    timestamps, input options and more. This is meant to assist in
    the validation process of prediction results.

    Returns
    -------
    PredictionReceipt
        Receipt class to store and persist information that is relevant
        to cross-checking prediction requests.

    Raises
    ------
    AttributeError
        When attempting to set or get an attribute that does not
        exist in the receipt dictionary.
    """

    def __init__(self, model_type, y, X, theta, options, yhat,
                 prediction_duration=None, seed: int = -1):
        self.prediction_id = str(uuid.uuid4())  # Unique id for the prediction request
        self.timestamp = str(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))  # Timestamp of the receipt
        # Guard the default of None: round(None, 3) raises TypeError.
        # Time to run a prediction (in seconds), or None if not supplied.
        self.prediction_duration = (round(prediction_duration, 3)
                                    if prediction_duration is not None else None)
        self.model_type = str(model_type)       # Prediction model that was run
        self.X_dim = X.shape                    # Save input dimensions
        self.y_dim = y.shape                    # Save input dimensions
        self.theta_dim = theta.shape            # Save input dimensions
        self.options = convert_ndarray_to_list(options.options)  # Save input options
        self.yhat = yhat                        # Save output info
        self.y_checksum = calc_crc64(pickle.dumps(y))          # convert y to bytes and get checksum
        self.X_checksum = calc_crc64(pickle.dumps(X))          # convert X to bytes and get checksum
        self.theta_checksum = calc_crc64(pickle.dumps(theta))  # convert theta to bytes and get checksum
        self.seed = seed  # User provided (if applicable). Otherwise defaults to -1

    def display(self, detail: bool = False):
        """Displays basic validation info. Excludes lengthy results objects
        unless detail=True.
        """
        attributes = dir(self)

        # If user does not request a detailed display(), remove input
        # options and the yhat array from the listing.
        if detail is False:
            remove_attributes = ['options', 'yhat']
            attributes = [attr for attr in attributes if attr not in remove_attributes]

        # Print out a menu of accessible attributes in the receipt
        for attr in attributes:
            if not attr.startswith('__') and not callable(getattr(self, attr)):
                print(f"{attr}: {getattr(self, attr)}")

    def save_receipt(self, path: str = '', file_name: str = None):
        """Saves prediction receipt as a .json file.

        Parameters
        ----------
        path : str, optional
            Directory prefix for the output file (default: current dir).
        file_name : str, optional
            Output file name (without extension). Defaults to the
            receipt timestamp with filesystem-safe separators.
        """

        # Convert timestamp to filename if not supplied
        if file_name is None:
            file_name = self.timestamp.replace(" ", "_").replace(":", "-")

        # Validate that the user supplied a valid path before saving .json
        try:
            if path != '':
                is_valid_path(path)
        except (FileNotFoundError, PermissionError) as e:
            print(f"Error: {e}")

        # Build a JSON-serializable copy of the receipt, converting any
        # ndarrays to lists. (Previously this mutated the receipt's own
        # attributes in place as a side effect of saving.)
        obj_dict = {}
        for key, value in self.__dict__.items():
            obj_dict[key] = value.tolist() if isinstance(value, np.ndarray) else value

        # Save to a JSON file
        with open(f'{path}{file_name}.json', 'w') as json_file:
            json.dump(obj_dict, json_file)
|
@@ -0,0 +1,82 @@
|
|
|
1
|
+
import numpy as np
|
|
2
|
+
|
|
3
|
+
class PredictionResults:
    """Stores an array of dictionaries containing prediction results
    and flattens specific keys (output_details' attributes)
    into their respective per-key lists.

    Returns
    -------
    PredictionResults
        Flattened array of dictionaries by keys (output_details attributes)

    Raises
    ------
    TypeError
        If items in raw_data are not dictionaries.
    """

    def __init__(self, results):
        self.raw_data = results
        self._initialize_attributes()

        # Compute weights concentration (std dev of each weight vector)
        # and add to class. The getattr guard is required: 'weights'
        # only exists when the results contained a 'weights' key;
        # previously this line raised AttributeError when the key was
        # absent or raw_data was empty.
        self.weights_concentration = [np.std(row)
                                      for row in getattr(self, 'weights', [])]

    def _initialize_attributes(self):
        """Flatten each key of the first result dict into a list
        attribute on the instance (one entry per result)."""

        if not self.raw_data:
            return

        # Normalize a single result dict into a one-element list.
        if isinstance(self.raw_data, dict):
            self.raw_data = [self.raw_data]

        first_item = self.raw_data[0]
        if not isinstance(first_item, dict):
            raise TypeError("PredictionResults: Items in raw_data must be dictionaries")

        # Pull results keys that we want to capture from the first item.
        allowed_keys = list(first_item.keys())

        for key in allowed_keys:
            values = []
            for item in self.raw_data:
                if key in item:
                    value = item[key]
                    # Collapse 1x1 ndarrays to plain scalars for convenience.
                    if isinstance(value, np.ndarray) and value.shape == (1, 1):
                        value = value[0][0]
                    values.append(value)
            setattr(self, key, values)

    def attributes(self):
        """Display a list of accessible attributes of the class

        Returns
        -------
        list
            List of accessible attributes of the class.
        """

        attribute_list = [key for key in self.__dict__.keys() if not key.startswith('__')]
        return attribute_list

    def display(self):
        """Display key-value pairs of all accessible attributes of the class.
        """
        for attr in dir(self):
            if not attr.startswith('__') and not callable(getattr(self, attr)):
                print(f"{attr}: {getattr(self, attr)}")

    def __repr__(self):
        """Displays a list of all accessible attributes in the class
        """
        # Guard: with empty raw_data there are no result keys to list.
        if not self.raw_data:
            return "\nResults:\n--------- \n\n--------- "
        attributes = "\n".join(f"- {key}" for key in self.raw_data[0].keys())
        return f"\nResults:\n--------- \n{attributes}\n--------- "
|
@@ -0,0 +1,49 @@
|
|
|
1
|
+
from enum import Enum
|
|
2
|
+
|
|
3
|
+
# http return codes
|
|
4
|
+
# from http import HTTPStatus
|
|
5
|
+
|
|
6
|
+
class AccessIDStatus(Enum):
    """Enumeration of Access ID status flags.

    Each member is a ``(code, message)`` tuple: ``int()`` / ``float()``
    expose the numeric code, while ``str()`` yields the human-readable
    message.
    """
    VALID = (0, 'Access ID verified.')
    EXPIRED = (1, 'Access ID expired.')
    INVALID = (2, 'Invalid Access ID or Key.')

    def __str__(self):
        _, message = self.value
        return message

    def __int__(self):
        code, _ = self.value
        return code

    def __float__(self):
        return float(int(self))
|
+
|
|
28
|
+
class UserTokenStatus(Enum):
    """Enumeration of user token status flags.

    Each member is a ``(code, message)`` tuple: ``int()`` / ``float()``
    expose the numeric code, while ``str()`` yields the human-readable
    message.
    """
    VALID = (0, 'Token verified.')
    INVALID = (1, 'Invalid token.')
    EXPIRED_ACCESS = (2, 'Expired token.')
    MAX_TOKEN = (3, 'Invalid token: Maximum number of tokens reached.')
    NON_EXISTENT = (4, 'Token does not exist.')

    def __str__(self):
        _, message = self.value
        return message

    def __int__(self):
        code, _ = self.value
        return code

    def __float__(self):
        return float(int(self))
|
@@ -0,0 +1,32 @@
|
|
|
1
|
+
from enum import Enum
|
|
2
|
+
|
|
3
|
+
|
|
4
|
+
class PSRFunction(Enum):
    """Enumeration of PSR (Partial Sample Regression) library function
    types.

    Each member is a ``(code, label)`` tuple: ``int()`` / ``float()``
    expose the numeric code, while ``str()`` yields the label.
    """
    PSR = (0, 'psr')
    MAXFIT = (1, 'maxfit')
    GRID = (2, 'grid')
    GRID_SINGULARITY = (3, 'grid_singularity')
    RELEVANCE = (4, 'relevance')
    SIMILARITY = (5, 'similarity')
    INFORMATIVENESS = (6, 'informativeness')
    FIT = (7, 'fit')
    ADJUSTED_FIT = (8, 'adjusted_fit')
    ASYMMETRY = (9, 'asymmetry')
    CO_OCCURENCE = (10, 'co-occurence')

    def __str__(self):
        _, label = self.value
        return label

    def __int__(self):
        code, _ = self.value
        return code

    def __float__(self):
        return float(int(self))
|
@@ -0,0 +1,36 @@
|
|
|
1
|
+
from enum import Enum
|
|
2
|
+
|
|
3
|
+
|
|
4
|
+
class JobType(Enum):
    """Enumeration of prediction task types.

    Provides a list of task types for different prediction job modes.
    Each task type is a ``(identifier, label)`` tuple.

    Attributes
    ----------
    SINGLE : tuple
        A single prediction task type, identifier 0.
    MULTI_Y : tuple
        A multi-y prediction task type, identifier 1.
    MULTI_THETA : tuple
        A multi-theta prediction task type, identifier 2.
    """

    SINGLE = (0, 'single')
    MULTI_Y = (1, 'multi_y')
    MULTI_THETA = (2, 'multi_theta')

    def __str__(self):
        """Returns the string label of the task type."""
        _, label = self.value
        return label

    def __int__(self):
        """Returns the numerical identifier of the task type as an integer."""
        identifier, _ = self.value
        return identifier

    def __float__(self):
        """Returns the numerical identifier of the task type as a float."""
        return float(int(self))
|
@@ -0,0 +1,47 @@
|
|
|
1
|
+
from enum import Enum
|
|
2
|
+
|
|
3
|
+
# 'weights': weights,
|
|
4
|
+
# 'relevance': r,
|
|
5
|
+
# 'similarity': simlr_x,
|
|
6
|
+
# 'info_x': info_x,
|
|
7
|
+
# 'info_theta': info_theta,
|
|
8
|
+
# 'include': include,
|
|
9
|
+
# 'lambda_sq': verify_row_vector(lambda_sq),
|
|
10
|
+
# 'n': verify_row_vector(n),
|
|
11
|
+
# 'phi': verify_row_vector(phi),
|
|
12
|
+
# 'r_star': verify_row_vector(r_star),
|
|
13
|
+
# 'r_star_percent': verify_row_vector(r_star_percent/100),
|
|
14
|
+
# 'most_eval': repmat(np.array([most_eval]), 1, r_star.size)
|
|
15
|
+
|
|
16
|
+
class PSRResult(Enum):
    """Enumeration of PSR (Partial Sample Regression) library result
    types.

    Each member is a ``(code, label)`` tuple: ``int()`` / ``float()``
    expose the numeric code, while ``str()`` yields the label.
    """
    YHAT = (0, 'y_hat')
    FIT = (1, 'fit')
    WEIGHTS = (2, 'weights')
    RELEVANCE = (3, 'relevance')
    SIMILARITY = (4, 'similarity')
    INFO_X = (5, 'info_X')
    INFO_THETA = (6, 'info_theta')
    INCLUDE = (7, 'include')
    LAMBDA_SQ = (8, 'lambda_sq')
    N = (9, 'n')
    PHI = (10, 'phi')
    R_STAR = (11, 'r_star')
    R_STAR_PERCENT = (12, 'r_star_percent')
    ALL = (13, 'all')

    def __str__(self):
        _, label = self.value
        return label

    def __int__(self):
        code, _ = self.value
        return code

    def __float__(self):
        return float(int(self))
|
47
|
+
|