autogluon.timeseries 1.0.1b20240405__py3-none-any.whl → 1.0.1b20240407__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release: this version of autogluon.timeseries might be problematic.

Files changed (20)
  1. autogluon/timeseries/learner.py +70 -1
  2. autogluon/timeseries/models/abstract/abstract_timeseries_model.py +14 -4
  3. autogluon/timeseries/models/autogluon_tabular/mlforecast.py +7 -1
  4. autogluon/timeseries/models/chronos/model.py +2 -1
  5. autogluon/timeseries/models/gluonts/abstract_gluonts.py +213 -63
  6. autogluon/timeseries/models/gluonts/torch/models.py +13 -0
  7. autogluon/timeseries/models/multi_window/multi_window_model.py +12 -0
  8. autogluon/timeseries/predictor.py +146 -12
  9. autogluon/timeseries/trainer/abstract_trainer.py +161 -8
  10. autogluon/timeseries/utils/features.py +118 -2
  11. autogluon/timeseries/version.py +1 -1
  12. {autogluon.timeseries-1.0.1b20240405.dist-info → autogluon.timeseries-1.0.1b20240407.dist-info}/METADATA +5 -5
  13. {autogluon.timeseries-1.0.1b20240405.dist-info → autogluon.timeseries-1.0.1b20240407.dist-info}/RECORD +20 -20
  14. /autogluon.timeseries-1.0.1b20240405-py3.8-nspkg.pth → /autogluon.timeseries-1.0.1b20240407-py3.8-nspkg.pth +0 -0
  15. {autogluon.timeseries-1.0.1b20240405.dist-info → autogluon.timeseries-1.0.1b20240407.dist-info}/LICENSE +0 -0
  16. {autogluon.timeseries-1.0.1b20240405.dist-info → autogluon.timeseries-1.0.1b20240407.dist-info}/NOTICE +0 -0
  17. {autogluon.timeseries-1.0.1b20240405.dist-info → autogluon.timeseries-1.0.1b20240407.dist-info}/WHEEL +0 -0
  18. {autogluon.timeseries-1.0.1b20240405.dist-info → autogluon.timeseries-1.0.1b20240407.dist-info}/namespace_packages.txt +0 -0
  19. {autogluon.timeseries-1.0.1b20240405.dist-info → autogluon.timeseries-1.0.1b20240407.dist-info}/top_level.txt +0 -0
  20. {autogluon.timeseries-1.0.1b20240405.dist-info → autogluon.timeseries-1.0.1b20240407.dist-info}/zip-safe +0 -0
@@ -451,8 +451,8 @@ class TimeSeriesPredictor(TimeSeriesPredictorDeprecatedMixin):
 
  data.static_features["store_id"] = data.static_features["store_id"].astype("category")
 
- If provided data is an instance of pandas DataFrame, AutoGluon will attempt to automatically convert it
- to a ``TimeSeriesDataFrame``.
+ If provided data is a path or a pandas.DataFrame, AutoGluon will attempt to automatically convert it to a
+ ``TimeSeriesDataFrame``.
 
  tuning_data : Union[TimeSeriesDataFrame, pd.DataFrame, Path, str], optional
  Data reserved for model selection and hyperparameter tuning, rather than training individual models. Also
@@ -472,8 +472,8 @@ class TimeSeriesPredictor(TimeSeriesPredictorDeprecatedMixin):
  If ``train_data`` has past covariates or static features, ``tuning_data`` must have also include them (with
  same columns names and dtypes).
 
- If provided data is an instance of pandas DataFrame, AutoGluon will attempt to automatically convert it
- to a ``TimeSeriesDataFrame``.
+ If provided data is a path or a pandas.DataFrame, AutoGluon will attempt to automatically convert it to a
+ ``TimeSeriesDataFrame``.
 
  time_limit : int, optional
  Approximately how long :meth:`~autogluon.timeseries.TimeSeriesPredictor.fit` will run (wall-clock time in
@@ -570,7 +570,7 @@ class TimeSeriesPredictor(TimeSeriesPredictorDeprecatedMixin):
  Valid preset values:
 
  * "auto": Performs HPO via bayesian optimization search on GluonTS-backed neural forecasting models and
- random search on other models using local scheduler.
+ random search on other models using local scheduler.
  * "random": Performs HPO via random search.
 
  You can also provide a dict to specify searchers and schedulers
@@ -855,8 +855,11 @@ class TimeSeriesPredictor(TimeSeriesPredictorDeprecatedMixin):
  Parameters
  ----------
  data : Union[TimeSeriesDataFrame, pd.DataFrame, Path, str]
- The data to evaluate the best model on. The last ``prediction_length`` time steps of the data set, for each
- item, will be held out for prediction and forecast accuracy will be calculated on these time steps.
+ The data to evaluate the best model on. The last ``prediction_length`` time steps of each time series in
+ ``data`` will be held out for prediction and forecast accuracy will be calculated on these time steps.
+
+ Must include both historic and future data (i.e., length of all time series in ``data`` must be at least
+ ``prediction_length + 1``).
 
  If ``known_covariates_names`` were specified when creating the predictor, ``data`` must include the columns
  listed in ``known_covariates_names`` with the covariates values aligned with the target time series.
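
A minimal sketch of the length requirement described in this hunk, using hypothetical item and column names: each series passed to ``evaluate()`` must contain at least ``prediction_length + 1`` time steps so that the final ``prediction_length`` steps can be held out for scoring.

import numpy as np
import pandas as pd
from autogluon.timeseries import TimeSeriesDataFrame

prediction_length = 7
df = pd.DataFrame({
    "item_id": "A",
    "timestamp": pd.date_range("2024-01-01", periods=prediction_length + 1, freq="D"),
    "target": np.arange(prediction_length + 1, dtype=float),
})
data = TimeSeriesDataFrame.from_data_frame(df, id_column="item_id", timestamp_column="timestamp")
# predictor.evaluate(data) would hold out the last 7 steps of each series and score forecasts on them.
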
@@ -893,6 +896,137 @@ class TimeSeriesPredictor(TimeSeriesPredictorDeprecatedMixin):
  logger.info(json.dumps(scores_dict, indent=4))
  return scores_dict
 
+ def feature_importance(
+ self,
+ data: Optional[Union[TimeSeriesDataFrame, pd.DataFrame, Path, str]] = None,
+ model: Optional[str] = None,
+ metric: Optional[Union[str, TimeSeriesScorer]] = None,
+ features: Optional[List[str]] = None,
+ time_limit: Optional[float] = None,
+ method: Literal["naive", "permutation"] = "permutation",
+ subsample_size: int = 50,
+ num_iterations: Optional[int] = None,
+ random_seed: Optional[int] = 123,
+ relative_scores: bool = False,
+ include_confidence_band: bool = True,
+ confidence_level: float = 0.99,
+ ):
+ """
+ Calculates feature importance scores for the given model via replacing each feature by a shuffled version of the same feature
+ (also known as permutation feature importance) or by assigning a constant value representing the median or mode of the feature,
+ and computing the relative decrease in the model's predictive performance.
+
+ A feature's importance score represents the performance drop that results when the model makes predictions on a perturbed copy
+ of the data where this feature's values have been randomly shuffled across rows. A feature score of 0.01 would indicate that the
+ predictive performance dropped by 0.01 when the feature was randomly shuffled or replaced. The higher the score a feature has,
+ the more important it is to the model's performance.
+
+ If a feature has a negative score, this means that the feature is likely harmful to the final model, and a model trained with
+ the feature removed would be expected to achieve a better predictive performance. Note that calculating feature importance can
+ be a computationally expensive process, particularly if the model uses many features. In many cases, this can take longer than
+ the original model training. Roughly, this will equal to the number of features in the data multiplied by ``num_iterations``
+ (or, 1 when ``method="naive"``) and time taken when ``evaluate()`` is called on a dataset with ``subsample_size``.
+
+ Parameters
+ ----------
+ data : TimeSeriesDataFrame, pd.DataFrame, Path or str, optional
+ The data to evaluate feature importances on. The last ``prediction_length`` time steps of the data set, for each
+ item, will be held out for prediction and forecast accuracy will be calculated on these time steps.
+ More accurate feature importances will be obtained from new data that was held-out during ``fit()``.
+
+ If ``known_covariates_names`` were specified when creating the predictor, ``data`` must include the columns
+ listed in ``known_covariates_names`` with the covariates values aligned with the target time series.
+ This data must contain the label column with the same column name as specified during ``fit()``.
+
+ If ``train_data`` used to train the predictor contained past covariates or static features, then ``data``
+ must also include them (with same column names and dtypes).
+
+ If provided data is an instance of pandas DataFrame, AutoGluon will attempt to automatically convert it
+ to a ``TimeSeriesDataFrame``. If str or Path is passed, ``data`` will be loaded using the str value as the file path.
+
+ If ``data`` is not provided, then validation (tuning) data provided during training (or the held out data used for
+ validation if ``tuning_data`` was not explicitly provided ``fit()``) will be used.
+ model : str, optional
+ Name of the model that you would like to evaluate. By default, the best model during training
+ (with highest validation score) will be used.
+ metric : str or TimeSeriesScorer, optional
+ Metric to be used for computing feature importance. If None, the ``eval_metric`` specified during initialization of
+ the ``TimeSeriesPredictor`` will be used.
+ features : List[str], optional
+ List of feature names that feature importances are calculated for and returned. By default, all feature importances
+ will be returned.
+ method : {"permutation", "naive"}, default = "permutation"
+ Method to be used for computing feature importance.
+
+ * ``naive``: computes feature importance by replacing the values of each feature by a constant value and computing
+ feature importances as the relative improvement in the evaluation metric. The constant value is the median for
+ real-valued features and the mode for categorical features, for both covariates and static features, obtained from the
+ feature values in ``data`` provided.
+ * ``permutation``: computes feature importance by naively shuffling the values of the feature across different items
+ and time steps. Each feature is shuffled for ``num_iterations`` times and feature importances are computed as the
+ relative improvement in the evaluation metric. Refer to https://explained.ai/rf-importance/ for an explanation of
+ permutation importance.
+
+ subsample_size : int, default = 50
+ The number of items to sample from `data` when computing feature importance. Larger values increase the accuracy of
+ the feature importance scores. Runtime linearly scales with `subsample_size`.
+ time_limit : float, optional
+ Time in seconds to limit the calculation of feature importance. If None, feature importance will calculate without early stopping.
+ If ``method="permutation"``, a minimum of 1 full shuffle set will always be evaluated. If a shuffle set evaluation takes longer than
+ ``time_limit``, the method will take the length of a shuffle set evaluation to return regardless of the `time_limit`.
+ num_iterations : int, optional
+ The number of different iterations of the data that are evaluated. If ``method="permutation"``, this will be interpreted
+ as the number of shuffle sets (equivalent to ``num_shuffle_sets`` in :meth:`TabularPredictor.feature_importance`). If ``method="naive"``, the
+ constant replacement approach is repeated for ``num_iterations`` times, and a different subsample of data (of size ``subsample_size``) will
+ be taken in each iteration.
+ Default is 1 for ``method="naive"`` and 5 for ``method="permutation"``. The value will be ignored if ``method="naive"`` and the subsample
+ size is greater than the number of items in ``data`` as additional iterations will be redundant.
+ Larger values will increase the quality of the importance evaluation.
+ It is generally recommended to increase ``subsample_size`` before increasing ``num_iterations``.
+ Runtime scales linearly with ``num_iterations``.
+ random_seed : int or None, default = 123
+ If provided, fixes the seed of the random number generator for all models. This guarantees reproducible
+ results for feature importance.
+ relative_scores : bool, default = False
+ By default, this method will return expected average *absolute* improvement in the eval metric due to the feature. If True, then
+ the statistics will be computed over the *relative* (percentage) improvements.
+ include_confidence_band: bool, default = True
+ If True, returned DataFrame will include two additional columns specifying confidence interval for the true underlying importance value of
+ each feature. Increasing ``subsample_size`` and ``num_iterations`` will tighten the confidence interval.
+ confidence_level: float, default = 0.99
+ This argument is only considered when ``include_confidence_band=True``, and can be used to specify the confidence level used
+ for constructing confidence intervals. For example, if ``confidence_level`` is set to 0.99, then the returned DataFrame will include
+ columns ``p99_high`` and ``p99_low`` which indicates that the true feature importance will be between ``p99_high`` and ``p99_low`` 99% of
+ the time (99% confidence interval). More generally, if ``confidence_level`` = 0.XX, then the columns containing the XX% confidence interval
+ will be named ``pXX_high`` and ``pXX_low``.
+
+ Returns
+ -------
+ :class:`pd.DataFrame` of feature importance scores with 2 columns:
+ index: The feature name.
+ 'importance': The estimated feature importance score.
+ 'stddev': The standard deviation of the feature importance score. If NaN, then not enough ``num_iterations`` were used.
+ """
+ if data is not None:
+ data = self._check_and_prepare_data_frame(data)
+ self._check_data_for_evaluation(data)
+
+ fi_df = self._learner.get_feature_importance(
+ data=data,
+ model=model,
+ metric=metric,
+ features=features,
+ time_limit=time_limit,
+ method=method,
+ subsample_size=subsample_size,
+ num_iterations=num_iterations,
+ random_seed=random_seed,
+ relative_scores=relative_scores,
+ include_confidence_band=include_confidence_band,
+ confidence_level=confidence_level,
+ )
+ return fi_df
+
  @classmethod
  def _load_version_file(cls, path: str) -> str:
  version_file_path = os.path.join(path, cls._predictor_version_file_name)
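
The hunk above documents the new ``TimeSeriesPredictor.feature_importance`` API. A minimal usage sketch; the dataset, column names, and preset below are illustrative assumptions and not part of the diff:

import numpy as np
import pandas as pd
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Hypothetical long-format data with one known covariate ("promotion").
df = pd.DataFrame({
    "item_id": np.repeat(["A", "B"], 60),
    "timestamp": np.tile(pd.date_range("2024-01-01", periods=60, freq="D"), 2),
    "promotion": np.random.default_rng(0).integers(0, 2, 120).astype(float),
    "target": np.random.default_rng(1).normal(size=120),
})
train_data = TimeSeriesDataFrame.from_data_frame(df, id_column="item_id", timestamp_column="timestamp")

predictor = TimeSeriesPredictor(
    prediction_length=7, known_covariates_names=["promotion"]
).fit(train_data, presets="fast_training")

# Permutation importance; returns a DataFrame indexed by feature with "importance",
# "stdev", "n" and (by default) p99_low / p99_high confidence-band columns.
fi = predictor.feature_importance(train_data, method="permutation", num_iterations=5, subsample_size=2)
print(fi)
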
@@ -1048,8 +1182,8 @@ class TimeSeriesPredictor(TimeSeriesPredictorDeprecatedMixin):
  Parameters
  ----------
  data : Union[TimeSeriesDataFrame, pd.DataFrame, Path, str], optional
- dataset used for additional evaluation. If not provided, the validation set used during training will be
- used.
+ dataset used for additional evaluation. Must include both historic and future data (i.e., length of all
+ time series in ``data`` must be at least ``prediction_length + 1``).
 
  If ``known_covariates_names`` were specified when creating the predictor, ``data`` must include the columns
  listed in ``known_covariates_names`` with the covariates values aligned with the target time series.
@@ -1057,8 +1191,8 @@ class TimeSeriesPredictor(TimeSeriesPredictorDeprecatedMixin):
  If ``train_data`` used to train the predictor contained past covariates or static features, then ``data``
  must also include them (with same column names and dtypes).
 
- If provided data is an instance of pandas DataFrame, AutoGluon will attempt to automatically convert it
- to a ``TimeSeriesDataFrame``.
+ If provided data is a path or a pandas.DataFrame, AutoGluon will attempt to automatically convert it to a
+ ``TimeSeriesDataFrame``.
 
  display : bool, default = False
  If True, the leaderboard DataFrame will be printed.
@@ -1227,7 +1361,7 @@ class TimeSeriesPredictor(TimeSeriesPredictorDeprecatedMixin):
  }
 
  past_data, known_covariates = test_data.get_model_inputs_for_scoring(
- prediction_length=self.prediction_length, known_covariates_names=trainer.metadata.known_covariates_real
+ prediction_length=self.prediction_length, known_covariates_names=trainer.metadata.known_covariates
  )
  pred_proba_dict_test: Dict[str, TimeSeriesDataFrame] = trainer.get_model_pred_dict(
  base_models, data=past_data, known_covariates=known_covariates
@@ -23,7 +23,11 @@ from autogluon.timeseries.models.abstract import AbstractTimeSeriesModel
  from autogluon.timeseries.models.ensemble import AbstractTimeSeriesEnsembleModel, TimeSeriesGreedyEnsemble
  from autogluon.timeseries.models.presets import contains_searchspace
  from autogluon.timeseries.splitter import AbstractWindowSplitter, ExpandingWindowSplitter
- from autogluon.timeseries.utils.features import CovariateMetadata
+ from autogluon.timeseries.utils.features import (
+ ConstantReplacementFeatureImportanceTransform,
+ CovariateMetadata,
+ PermutationFeatureImportanceTransform,
+ )
  from autogluon.timeseries.utils.warning_filters import disable_tqdm
 
  logger = logging.getLogger("autogluon.timeseries.trainer")
@@ -242,6 +246,9 @@ class SimpleAbstractTrainer:
  class AbstractTimeSeriesTrainer(SimpleAbstractTrainer):
  _cached_predictions_filename = "cached_predictions.pkl"
 
+ max_rel_importance_score: float = 1e5
+ eps_abs_importance_score: float = 1e-5
+
  def __init__(
  self,
  path: str,
@@ -763,7 +770,7 @@ class AbstractTimeSeriesTrainer(SimpleAbstractTrainer):
 
  if data is not None:
  past_data, known_covariates = data.get_model_inputs_for_scoring(
- prediction_length=self.prediction_length, known_covariates_names=self.metadata.known_covariates_real
+ prediction_length=self.prediction_length, known_covariates_names=self.metadata.known_covariates
  )
  logger.info(
  "Additional data provided, testing on additional data. Resulting leaderboard "
@@ -849,7 +856,9 @@ class AbstractTimeSeriesTrainer(SimpleAbstractTrainer):
  unpersisted_models.append(model)
  return unpersisted_models
 
- def _get_model_for_prediction(self, model: Optional[Union[str, AbstractTimeSeriesModel]] = None) -> str:
+ def _get_model_for_prediction(
+ self, model: Optional[Union[str, AbstractTimeSeriesModel]] = None, verbose: bool = True
+ ) -> str:
  """Given an optional identifier or model object, return the name of the model with which to predict.
 
  If the model is not provided, this method will default to the best model according to the validation score.
@@ -858,10 +867,11 @@ class AbstractTimeSeriesTrainer(SimpleAbstractTrainer):
  if self.model_best is None:
  best_model_name: str = self.get_model_best()
  self.model_best = best_model_name
- logger.info(
- f"Model not specified in predict, will default to the model with the "
- f"best validation score: {self.model_best}",
- )
+ if verbose:
+ logger.info(
+ f"Model not specified in predict, will default to the model with the "
+ f"best validation score: {self.model_best}",
+ )
  return self.model_best
  else:
  if isinstance(model, AbstractTimeSeriesModel):
@@ -923,7 +933,7 @@ class AbstractTimeSeriesTrainer(SimpleAbstractTrainer):
  use_cache: bool = True,
  ) -> Dict[str, float]:
  past_data, known_covariates = data.get_model_inputs_for_scoring(
- prediction_length=self.prediction_length, known_covariates_names=self.metadata.known_covariates_real
+ prediction_length=self.prediction_length, known_covariates_names=self.metadata.known_covariates
  )
  predictions = self.predict(data=past_data, known_covariates=known_covariates, model=model, use_cache=use_cache)
  if not isinstance(metrics, list): # a single metric is provided
@@ -936,6 +946,149 @@ class AbstractTimeSeriesTrainer(SimpleAbstractTrainer):
  )
  return scores_dict
 
+ def get_feature_importance(
+ self,
+ data: TimeSeriesDataFrame,
+ features: List[str],
+ model: Optional[Union[str, AbstractTimeSeriesModel]] = None,
+ metric: Optional[Union[str, TimeSeriesScorer]] = None,
+ time_limit: Optional[float] = None,
+ method: Literal["naive", "permutation"] = "permutation",
+ subsample_size: int = 50,
+ num_iterations: int = 1,
+ random_seed: Optional[int] = None,
+ relative_scores: bool = False,
+ include_confidence_band: bool = True,
+ confidence_level: float = 0.99,
+ ) -> pd.DataFrame:
+ assert method in ["naive", "permutation"], f"Invalid feature importance method {method}."
+ metric = check_get_evaluation_metric(metric) if metric is not None else self.eval_metric
+
+ logger.info("Computing feature importance")
+
+ # seed everything if random_seed is provided
+ if random_seed is not None:
+ seed_everything(random_seed)
+
+ # start timer and cap subsample size if it's greater than the number of items in the provided data set
+ time_start = time.time()
+ if subsample_size > data.num_items:
+ logger.info(
+ f"Subsample_size {subsample_size} is larger than the number of items in the data and will be ignored"
+ )
+ subsample_size = data.num_items
+
+ # set default number of iterations and cap iterations if the number of items in the data is smaller
+ # than the subsample size for the naive method
+ num_iterations = num_iterations or (5 if method == "permutation" else 1)
+ if method == "naive" and data.num_items <= subsample_size:
+ num_iterations = 1
+
+ # initialize the importance transform
+ importance_transform_type = {
+ "permutation": PermutationFeatureImportanceTransform,
+ "naive": ConstantReplacementFeatureImportanceTransform,
+ }.get(method)
+ importance_transform = importance_transform_type(
+ covariate_metadata=self.metadata,
+ prediction_length=self.prediction_length,
+ random_seed=random_seed,
+ )
+
+ # if model is not provided, use the best model according to the validation score
+ model = self._get_model_for_prediction(model, verbose=False)
+
+ # persist trainer to speed up repeated inference
+ persisted_models = self.persist(model_names=[model], with_ancestors=True)
+
+ importance_samples = defaultdict(list)
+ for n in range(num_iterations):
+ if subsample_size < data.num_items:
+ item_ids_sampled = data.item_ids.to_series().sample(subsample_size) # noqa
+ data_sample = data.query("item_id in @item_ids_sampled")
+ else:
+ data_sample = data
+
+ base_score = self.evaluate(data=data_sample, model=model, metrics=metric, use_cache=False)[metric.name]
+
+ for feature in features:
+ # override importance for unused features
+ if not self._model_uses_feature(model, feature):
+ continue
+ else:
+ data_sample_replaced = importance_transform.transform(data_sample, feature_name=feature)
+ score = self.evaluate(data=data_sample_replaced, model=model, metrics=metric, use_cache=False)[
+ metric.name
+ ]
+
+ importance = base_score - score
+ if relative_scores:
+ importance /= np.abs(base_score - self.eps_abs_importance_score)
+ importance = min(self.max_rel_importance_score, importance)
+
+ importance_samples[feature].append(importance)
+
+ if time_limit is not None and time.time() - time_start > time_limit:
+ logger.info(f"Time limit reached, stopping feature importance computation after {n} iterations")
+ break
+
+ self.unpersist(model_names=persisted_models)
+
+ importance_df = (
+ (
+ pd.DataFrame(importance_samples)
+ .agg(["mean", "std", "count"])
+ .T.rename(columns={"mean": "importance", "std": "stdev", "count": "n"})
+ )
+ if len(importance_samples) > 0
+ else pd.DataFrame(columns=["importance", "stdev", "n"])
+ )
+
+ if include_confidence_band:
+ importance_df = self._add_ci_to_feature_importance(importance_df, confidence_level=confidence_level)
+
+ return importance_df
+
+ def _model_uses_feature(self, model: Optional[Union[str, AbstractTimeSeriesModel]], feature: str) -> bool:
+ """Check if the given model uses the given feature."""
+ models_with_ancestors = set(self.get_minimum_model_set(model))
+
+ if feature in self.metadata.static_features:
+ return any(self.load_model(m).supports_static_features for m in models_with_ancestors)
+ elif feature in self.metadata.known_covariates:
+ return any(self.load_model(m).supports_known_covariates for m in models_with_ancestors)
+ elif feature in self.metadata.past_covariates:
+ return any(self.load_model(m).supports_past_covariates for m in models_with_ancestors)
+
+ return False
+
+ def _add_ci_to_feature_importance(
+ self, importance_df: pd.DataFrame, confidence_level: float = 0.99
+ ) -> pd.DataFrame:
+ """Add confidence intervals to the feature importance."""
+ import scipy.stats
+
+ if confidence_level <= 0.5 or confidence_level >= 1.0:
+ raise ValueError("confidence_level must lie between 0.5 and 1.0")
+ ci_str = "{:.0f}".format(confidence_level * 100)
+
+ alpha = 1 - confidence_level
+ importance_df[f"p{ci_str}_low"] = np.nan
+ importance_df[f"p{ci_str}_high"] = np.nan
+
+ for i in importance_df.index:
+ r = importance_df.loc[i]
+ importance, stdev, n = r["importance"], r["stdev"], r["n"]
+ if np.isnan(importance) or np.isnan(stdev) or np.isnan(n) or n <= 1:
+ continue
+
+ t_crit = scipy.stats.t.ppf(1 - alpha / 2, df=n - 1)
+
+ importance_df.loc[i, f"p{ci_str}_low"] = importance - t_crit * stdev / np.sqrt(n)
+ importance_df.loc[i, f"p{ci_str}_high"] = importance + t_crit * stdev / np.sqrt(n)
+
+ return importance_df
+
  def _predict_model(
  self,
  model: Union[str, AbstractTimeSeriesModel],
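
The confidence band added by ``_add_ci_to_feature_importance`` above is a plain Student-t interval over the per-iteration importance samples. A standalone sketch of the same calculation; the sample values are made up:

import numpy as np
import scipy.stats

samples = np.array([0.012, 0.018, 0.009, 0.021, 0.015])  # importances from 5 shuffle sets
importance, stdev, n = samples.mean(), samples.std(ddof=1), len(samples)  # pandas .agg("std") also uses ddof=1

confidence_level = 0.99
t_crit = scipy.stats.t.ppf(1 - (1 - confidence_level) / 2, df=n - 1)
half_width = t_crit * stdev / np.sqrt(n)
print(importance - half_width, importance + half_width)  # p99_low, p99_high
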
@@ -1,8 +1,9 @@
  import logging
  import reprlib
  from dataclasses import dataclass, field
- from typing import List, Optional, Tuple
+ from typing import Any, List, Literal, Optional, Tuple
 
+ import numpy as np
  import pandas as pd
 
  from autogluon.common.features.types import R_FLOAT, R_INT
@@ -12,7 +13,7 @@ from autogluon.features.generators import (
  IdentityFeatureGenerator,
  PipelineFeatureGenerator,
  )
- from autogluon.timeseries import TimeSeriesDataFrame
+ from autogluon.timeseries.dataset.ts_dataframe import ITEMID, TimeSeriesDataFrame
 
  logger = logging.getLogger(__name__)
 
@@ -28,6 +29,10 @@ class CovariateMetadata:
  past_covariates_real: List[str] = field(default_factory=list)
  past_covariates_cat: List[str] = field(default_factory=list)
 
+ @property
+ def static_features(self) -> List[str]:
+ return self.static_features_cat + self.static_features_real
+
  @property
  def known_covariates(self) -> List[str]:
  return self.known_covariates_cat + self.known_covariates_real
@@ -48,6 +53,18 @@ class CovariateMetadata:
  def covariates_cat(self) -> List[str]:
  return self.known_covariates_cat + self.past_covariates_cat
 
+ @property
+ def real_features(self) -> List[str]:
+ return self.static_features_real + self.covariates_real
+
+ @property
+ def cat_features(self) -> List[str]:
+ return self.static_features_cat + self.covariates_cat
+
+ @property
+ def all_features(self) -> List[str]:
+ return self.static_features + self.covariates
+
 
  class ContinuousAndCategoricalFeatureGenerator(PipelineFeatureGenerator):
  """Generates categorical and continuous features for time series models.
@@ -284,3 +301,102 @@ class TimeSeriesFeatureGenerator:
  raise ValueError(
  f"{len(missing_columns)} columns are missing from {data_frame_name}: {reprlib.repr(missing_columns.to_list())}"
  )
+
+
+ class AbstractFeatureImportanceTransform:
+ """Abstract class for transforms that replace a given feature with dummy or shuffled values,
+ for use in feature importance operations.
+ """
+
+ def __init__(
+ self,
+ covariate_metadata: CovariateMetadata,
+ prediction_length: int,
+ **kwargs,
+ ):
+ self.covariate_metadata: CovariateMetadata = covariate_metadata
+ self.prediction_length: int = prediction_length
+
+ def _transform_series(self, data: pd.Series, is_categorical: bool, **kwargs) -> TimeSeriesDataFrame:
+ """Transforms a series with the same index as the pandas DataFrame"""
+ raise NotImplementedError
+
+ def transform(self, data: TimeSeriesDataFrame, feature_name: str, **kwargs) -> TimeSeriesDataFrame:
+ if feature_name not in self.covariate_metadata.all_features:
+ raise ValueError(f"Target feature {feature_name} not found in covariate metadata")
+
+ # feature transform works on a shallow copy of the main time series data frame
+ # but a deep copy of the static features.
+ data = data.copy(deep=False)
+
+ is_categorical = feature_name in self.covariate_metadata.cat_features
+
+ if feature_name in self.covariate_metadata.past_covariates:
+ # we'll have to work on the history of the data alone
+ data[feature_name] = data[feature_name].copy()
+ feature_data = data[feature_name].groupby(level=ITEMID, sort=False).head(-self.prediction_length)
+ data[feature_name].update(self._transform_series(feature_data, is_categorical=is_categorical))
+ elif feature_name in self.covariate_metadata.static_features:
+ feature_data = data.static_features[feature_name].copy()
+ feature_data.reset_index(drop=True, inplace=True)
+ data.static_features[feature_name] = self._transform_static_series(
+ feature_data, is_categorical=is_categorical
+ )
+ else: # known covariates
+ data[feature_name] = self._transform_series(data[feature_name], is_categorical=is_categorical)
+
+ return data
+
+
+ class PermutationFeatureImportanceTransform(AbstractFeatureImportanceTransform):
+ """Naively shuffles a given feature."""
+
+ def __init__(
+ self,
+ covariate_metadata: CovariateMetadata,
+ prediction_length: int,
+ random_seed: Optional[int] = None,
+ shuffle_type: Literal["itemwise", "naive"] = "itemwise",
+ **kwargs,
+ ):
+ super().__init__(covariate_metadata, prediction_length, **kwargs)
+ self.shuffle_type = shuffle_type
+ self.random_seed = random_seed
+
+ def _transform_static_series(self, feature_data: pd.Series, is_categorical: bool) -> Any:
+ return feature_data.sample(frac=1, random_state=self.random_seed).values
+
+ def _transform_series(self, feature_data: pd.Series, is_categorical: bool) -> pd.Series:
+ # set random state once to shuffle 'independently' for different items
+ rng = np.random.RandomState(self.random_seed)
+
+ if self.shuffle_type == "itemwise":
+ return feature_data.groupby(level=ITEMID, sort=False).transform(
+ lambda x: x.sample(frac=1, random_state=rng).values
+ )
+ elif self.shuffle_type == "naive":
+ return pd.Series(feature_data.sample(frac=1, random_state=rng).values, index=feature_data.index)
+
+
+ class ConstantReplacementFeatureImportanceTransform(AbstractFeatureImportanceTransform):
+ """Replaces a target feature with the median if it's a real-valued feature, and the mode if it's a
+ categorical feature."""
+
+ def __init__(
+ self,
+ covariate_metadata: CovariateMetadata,
+ prediction_length: int,
+ real_value_aggregation: Literal["mean", "median"] = "mean",
+ **kwargs,
+ ):
+ super().__init__(covariate_metadata, prediction_length, **kwargs)
+ self.real_value_aggregation = real_value_aggregation
+
+ def _transform_static_series(self, feature_data: pd.Series, is_categorical: bool) -> Any:
+ return feature_data.mode()[0] if is_categorical else feature_data.agg(self.real_value_aggregation)
+
+ def _transform_series(self, feature_data: pd.Series, is_categorical: bool) -> pd.Series:
+ if is_categorical:
+ return feature_data.groupby(level=ITEMID, sort=False).transform(lambda x: x.mode()[0])
+ else:
+ return feature_data.groupby(level=ITEMID, sort=False).transform(self.real_value_aggregation)
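
The ``itemwise`` shuffle used by ``PermutationFeatureImportanceTransform`` above permutes a covariate within each item so values never leak across series. A standalone sketch with plain pandas, using a toy index and values:

import numpy as np
import pandas as pd

# Toy covariate column indexed by (item_id, timestamp), as in a TimeSeriesDataFrame.
index = pd.MultiIndex.from_product(
    [["A", "B"], pd.date_range("2024-01-01", periods=4, freq="D")], names=["item_id", "timestamp"]
)
feature = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0], index=index)

rng = np.random.RandomState(123)
shuffled = feature.groupby(level="item_id", sort=False).transform(
    lambda x: x.sample(frac=1, random_state=rng).values
)
print(shuffled)  # values are reordered within item "A" and within item "B", never across items
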
@@ -1,3 +1,3 @@
  """This is the autogluon version file."""
- __version__ = '1.0.1b20240405'
+ __version__ = '1.0.1b20240407'
  __lite__ = False
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: autogluon.timeseries
- Version: 1.0.1b20240405
+ Version: 1.0.1b20240407
  Summary: AutoML for Image, Text, and Tabular Data
  Home-page: https://github.com/autogluon/autogluon
  Author: AutoGluon Community
@@ -52,12 +52,12 @@ Requires-Dist: utilsforecast <0.0.11,>=0.0.10
  Requires-Dist: tqdm <5,>=4.38
  Requires-Dist: orjson ~=3.9
  Requires-Dist: tensorboard <3,>=2.9
- Requires-Dist: autogluon.core[raytune] ==1.0.1b20240405
- Requires-Dist: autogluon.common ==1.0.1b20240405
- Requires-Dist: autogluon.tabular[catboost,lightgbm,xgboost] ==1.0.1b20240405
+ Requires-Dist: autogluon.core[raytune] ==1.0.1b20240407
+ Requires-Dist: autogluon.common ==1.0.1b20240407
+ Requires-Dist: autogluon.tabular[catboost,lightgbm,xgboost] ==1.0.1b20240407
  Provides-Extra: all
- Requires-Dist: optimum[nncf,openvino] <1.18,>=1.17 ; extra == 'all'
  Requires-Dist: optimum[onnxruntime] <1.18,>=1.17 ; extra == 'all'
+ Requires-Dist: optimum[nncf,openvino] <1.18,>=1.17 ; extra == 'all'
  Provides-Extra: chronos-onnx
  Requires-Dist: optimum[onnxruntime] <1.18,>=1.17 ; extra == 'chronos-onnx'
  Provides-Extra: chronos-openvino