anomaly-pipeline 0.1.27__py3-none-any.whl → 0.1.61__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,275 @@
+ Metadata-Version: 2.4
+ Name: anomaly_pipeline
+ Version: 0.1.61
+ Summary: Ensemble framework for detecting outliers in grouped time-series data
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Intended Audience :: Science/Research
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ Requires-Dist: pandas
+ Requires-Dist: numpy<2
+ Requires-Dist: joblib
+ Requires-Dist: prophet
+ Requires-Dist: scikit-learn
+ Requires-Dist: google-cloud-bigquery
+ Requires-Dist: google-cloud-storage
+ Requires-Dist: statsmodels
+ Requires-Dist: plotly
+ Requires-Dist: pandas-gbq
+ Requires-Dist: gcsfs
+ Dynamic: classifier
+ Dynamic: description
+ Dynamic: description-content-type
+ Dynamic: requires-dist
+ Dynamic: requires-python
+ Dynamic: summary
+
+ # anomaly-pipeline
+
+ anomaly-pipeline is an ensemble framework for detecting outliers in grouped time-series data. It automates the entire workflow, from data cleaning and calendar interpolation through running eight different detection algorithms to generating visual diagnostic reports.
+
+ ## Key Capabilities
+
+ - Ensemble Scoring: Combines 8 models (statistical + ML) to produce a robust Anomaly_Score and a final is_Anomaly consensus.
+ - Hierarchical Processing: Natively handles grouped data (e.g., detecting anomalies per Region, Product, or Channel).
+ - Automated Preprocessing: Fills missing dates via linear interpolation and automatically filters out low-quality groups.
+ - Parallel Execution: Leverages joblib for multi-core processing of large datasets.
+ - Visual Analytics: Generates pie charts, stacked bar plots, and detailed group-level time-series breakdowns.
+
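The calendar-interpolation step described above can be sketched with plain pandas (a minimal illustration only; the package's actual preprocessing in `Preprocessing.py` may differ in details):

```python
import pandas as pd

# Toy weekly series (W-MON) with one missing week (2024-01-15)
df = pd.DataFrame(
    {"timestamp": pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-22"]),
     "sales": [100.0, 110.0, 130.0]}
)

# Reindex onto a complete weekly calendar, then fill gaps linearly
full = df.set_index("timestamp").asfreq("W-MON")
full["sales"] = full["sales"].interpolate(method="linear")
print(full)  # the missing week is filled with 120.0
```

The same idea generalizes to any pandas frequency string passed as `freq` (e.g. `"D"` or `"M"`).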
+ ## Included Models
+
+ The pipeline utilizes an ensemble of the following methodologies:
+
+ - Statistical: Percentile (5th/95th), Standard Deviation (SD), Median Absolute Deviation (MAD), and Interquartile Range (IQR).
+
+ - Time-Series Specific: EWMA (Exponentially Weighted Moving Average) and FB Prophet (walk-forward validation).
+
+ - Machine Learning: Isolation Forest (general & time-series optimized) and DBSCAN.
+
+ ## Detailed Functionality
+
+ - Robust Input Validation: Clear error messaging for missing parameters or incorrect data types.
+
+ - Quality Control: Automatically generates a Success Report and an Exclusion Report (identifying groups dropped due to low history or heavy interpolation).
+
+ - Visual Suite: Automated rendering of Pie Charts (Summary), Stacked Bars (Distribution), and Top-5 Anomaly Heatmaps.
+
+ ## 🚀 Quick Start
+
+ ```bash
+ pip install anomaly-pipeline
+ ```
+
+ ```python
+ import pandas as pd
+ from anomaly_pipeline import timeseries_anomaly_detection
+
+ # Load your data
+ df = pd.read_csv("your_data.csv")
+
+ # Run the pipeline
+ anomaly_df, success_report, exclusion_report = timeseries_anomaly_detection(
+     master_data=df,
+     group_columns=['category', 'region'],
+     variable='sales',
+     date_column='timestamp',
+     freq='W-MON',
+     eval_period=1  # Evaluate the most recent record
+ )
+ ```
+ ## 📊 Visualizing Results & Deep Dives
+
+ If a specific group shows a high anomaly rate, use the evaluation_info function to render detailed diagnostic plots for that group.
+
+ ```python
+ from anomaly_pipeline import evaluation_info
+
+ # Define the group values to inspect (must match the order in group_columns)
+ group_columns = ['category', 'region']
+ group_values = ['appliances', 'TX']
+ variable = 'sales'
+ date_column = 'timestamp'
+
+ # Filter the results for this group
+ mask = anomaly_df[group_columns].eq(group_values).all(axis=1)
+ group_df = anomaly_df[mask]
+
+ # Generate detailed diagnostic plots
+ evaluation_info(group_df,
+                 group_columns,
+                 variable,
+                 date_column,
+                 eval_period=1
+ )
+ ```
+
+ The Evaluation Dashboard provides:
+
+ - Model Breakdown: Individual charts for FB Prophet, EWMA, and Isolation Forest with confidence intervals.
+
+ - Ensemble View: A summary highlighting where multiple models overlap.
+
+ - Statistical Thresholds: Visual markers for IQR, MAD, percentile, and SD limits.
+
+ ## Input Parameters
+
+ ### Mandatory
+
+ master_data: Input DataFrame containing variables, dates, and group identifiers.
+
+ group_columns: A list of column names defining the granularity of the time series. Ex: for store-level sales data, to find anomalous sales values per store, set group_columns = ["store"].
+
+ variable: The column name containing the time-series value being analyzed. Ex: for sales it is "sales"; for ad requests it is "ad_requests".
+
+ date_column: The column name containing the timestamp.
+
+ ### Optional (with defaults)
+
+ freq: Pandas frequency string for calendar interpolation. Default: "W-MON" (weekly, starting Monday). For monthly data use "M"; for daily data use "D".
+
+ min_records: Minimum history required per group. Default is None; if None, it is derived from freq (one year of history plus eval_period). Ex: if freq is weekly and eval_period is 1, min_records = 52 + 1 = 53.
+
+ max_records: Maximum history to retain per group. Default is None; if a value N is provided, only the most recent N records are kept.
+
+ contamination (float): Expected proportion of outliers in the data (0 to 0.5). Defaults to 0.03.
+
+ random_state (int): Seed for reproducibility in stochastic models. Defaults to 42.
+
+ alpha (float): Smoothing factor for trend calculations. Defaults to 0.3.
+
+ sigma (float): Standard-deviation multiplier for thresholding. Defaults to 1.5.
+
+ eval_period: The number of trailing records in each group to evaluate for anomalies. Defaults to 1.
+
+ prophet_CI (float): The confidence level for the Prophet prediction interval (0 to 1). Defaults to 0.9.
+
+ mad_scale_factor (float): A constant used to make the MAD comparable to the standard deviation. Default is 0.6745.
+
+ mad_threshold (float): The sensitivity dial: how many adjusted MADs a data point must be away from the median to be flagged as an anomaly. Default is 2.
+
+ ## Output Columns
+
+ All output values are computed at the group_columns level.
+
+ MIN_value
+ The minimum historical "variable" value. Fixed for train data; varies for test data (computed over history up to t−1).
+ ________________________________________
+ MAX_value
+ The maximum historical "variable" value. Fixed for train data; varies for test data (computed over history up to t−1).
+ ________________________________________
+ Percentile_low / Percentile_high
+ The 5th and 95th percentile "variable" values.
+ Used to detect unusually low or unusually high "variable" values. Fixed for train data; varies for test data (computed over history up to t−1).
+ ________________________________________
+ Percentile_anomaly
+ Flags based on percentile limits:
+ • Low → value < Percentile_low
+ • High → value > Percentile_high
+ • None → within the range
+ ________________________________________
+
+ Mean / SD (Standard Deviation)
+ The average "variable" and its standard deviation based on historical data. Fixed for train data; varies for test data (computed over history up to t−1).
+ ________________________________________
+ SD2_low / SD2_high
+ Two-standard-deviation control limits:
+ • SD2_low = mean − 2×SD (floored at 0)
+ • SD2_high = mean + 2×SD
+ ________________________________________
+ SD_anomaly
+ Flags based on SD2 limits:
+ • Low → value < SD2_low
+ • High → value > SD2_high
+ • None → within the range
+ ________________________________________
+ Median / MAD (Median Absolute Deviation)
+ Median of "variable" and the median of absolute deviations from the median. Fixed for train data; varies for test data (computed over history up to t−1).
+ Used for robust anomaly detection when data contains outliers.
+ ________________________________________
+ MAD_low / MAD_high
+ MAD-based limits:
+ • MAD_low = median − 2 × MAD / 0.6745 (floored at 0)
+ • MAD_high = median + 2 × MAD / 0.6745
+ ________________________________________
+ MAD_anomaly
+ Flags based on MAD limits:
+ • Low → value < MAD_low
+ • High → value > MAD_high
+ • None → within the range
+ ________________________________________
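A worked numeric sketch of the MAD limits above, using the default mad_threshold = 2 and mad_scale_factor = 0.6745 (illustrative only; the sample data is made up):

```python
import numpy as np

x = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 40.0])

median = np.median(x)                        # 12.0
mad = np.median(np.abs(x - median))          # 1.0
adjusted_mad = mad / 0.6745                  # ≈ 1.4826, comparable to an SD

mad_low = max(median - 2 * adjusted_mad, 0)  # floored at 0
mad_high = median + 2 * adjusted_mad

print(mad_low, mad_high)  # 40.0 falls above mad_high → flagged High
```

Note how the single extreme value (40.0) barely moves the median or the MAD, which is exactly why the MAD limits stay useful when the history itself contains outliers.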
+ Q1 / Q3 / IQR (Interquartile Range)
+ • Q1: 25th percentile
+ • Q3: 75th percentile
+ • IQR = Q3 − Q1
+ Used to detect unusually low or high "variable" values.
+ ________________________________________
+ IQR_low / IQR_high
+ IQR-based limits:
+ • IQR_low = Q1 − 1.5 × IQR (floored at 0)
+ • IQR_high = Q3 + 1.5 × IQR
+ ________________________________________
+ IQR_anomaly
+ Flags based on IQR limits:
+ • Low → value < IQR_low
+ • High → value > IQR_high
+ • None → within the range
+ ________________________________________
+ is_Percentile_anomaly / is_SD_anomaly / is_MAD_anomaly / is_IQR_anomaly
+ Boolean indicators stating whether each method classified the value as an anomaly (low or high).
+ ________________________________________
+ Alpha
+ Smoothing factor used in EWMA. Higher values give more weight to recent observations.
+ ________________________________________
+ EWMA_forecast
+ Expected value estimated using the EWMA model.
+ ________________________________________
+ EWMA_STD
+ Rolling standard deviation of residuals around the EWMA forecast.
+ ________________________________________
+ EWMA_high
+ Upper anomaly threshold (EWMA_forecast + sigma × EWMA_STD).
+ ________________________________________
+ EWMA_low
+ Lower anomaly threshold (EWMA_forecast − sigma × EWMA_STD).
+ ________________________________________
+ Is_EWMA_anomaly
+ Boolean flag indicating whether the observed value falls outside the EWMA bounds.
+ ________________________________________
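The EWMA columns above can be sketched with pandas `ewm` (a minimal illustration; the one-step-ahead shift and the rolling window length are assumptions, not the package's exact implementation in `ewma.py`):

```python
import pandas as pd

alpha, sigma = 0.3, 1.5  # package defaults
s = pd.Series([100.0, 101.0, 100.0, 101.0, 100.0, 150.0])

# One-step-ahead EWMA forecast: smooth, then shift by one period
ewma_forecast = s.ewm(alpha=alpha, adjust=False).mean().shift(1)

# Rolling standard deviation of residuals around the forecast
resid = s - ewma_forecast
ewma_std = resid.rolling(window=3, min_periods=2).std()

ewma_high = ewma_forecast + sigma * ewma_std
ewma_low = ewma_forecast - sigma * ewma_std

is_ewma_anomaly = (s > ewma_high) | (s < ewma_low)
print(is_ewma_anomaly.iloc[-1])  # True: the jump to 150 breaches the upper bound
```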
+ FB_forecast
+ Expected value estimated using the Prophet model.
+ ________________________________________
+ FB_low
+ Lower confidence bound of the Prophet forecast.
+ ________________________________________
+ FB_high
+ Upper confidence bound of the Prophet forecast.
+ ________________________________________
+ FB_residual
+ Difference between the observed value and the Prophet forecast.
+ ________________________________________
+ FB_anomaly
+ Raw anomaly indicator based on Prophet confidence bounds.
+ ________________________________________
+ Is_FB_anomaly
+ Boolean flag indicating a Prophet-detected anomaly.
+ ________________________________________
+ isolation_forest_score
+ Score from the Isolation Forest model indicating anomaly severity. Typical range: −0.5 to +0.5.
+ • Higher scores = more normal
+ • Lower scores = more anomalous
+ ________________________________________
+ is_IsoForest_anomaly
+ Boolean flag based on Isolation Forest model output:
+ • True → model predicts anomaly (prediction = −1)
+ • False → model predicts normal (prediction = 1)
+ ________________________________________
+ dbscan_score
+ Cluster label or distance score produced by DBSCAN (−1 indicates noise/anomaly).
+ ________________________________________
+ is_DBSCAN_anomaly
+ Boolean flag indicating a DBSCAN-detected anomaly.
+ ________________________________________
+ Anomaly_Votes
+ Count of anomaly-detection methods that agree a point is anomalous. Ranges from 0 to 8.
+ ________________________________________
+ is_Anomaly
+ Final ensemble decision:
+ • True → value flagged anomalous by 4 or more methods
+ • False → fewer than 4 methods indicate anomaly
+
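The consensus logic behind Anomaly_Votes and is_Anomaly can be illustrated with a minimal sketch (the boolean column names match the output table above; the sample votes are made up):

```python
import pandas as pd

# Hypothetical per-row boolean votes from the 8 detectors
votes = pd.DataFrame({
    "is_Percentile_anomaly": [True, False],
    "is_SD_anomaly":         [True, False],
    "is_MAD_anomaly":        [True, False],
    "is_IQR_anomaly":        [True, True],
    "Is_EWMA_anomaly":       [False, True],
    "Is_FB_anomaly":         [True, False],
    "is_IsoForest_anomaly":  [False, False],
    "is_DBSCAN_anomaly":     [False, True],
})

votes["Anomaly_Votes"] = votes.sum(axis=1)         # 0..8 methods agreeing
votes["is_Anomaly"] = votes["Anomaly_Votes"] >= 4  # consensus threshold
print(votes[["Anomaly_Votes", "is_Anomaly"]])      # 5 votes → True, 3 votes → False
```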
@@ -0,0 +1,24 @@
+ anomaly_pipeline/__init__.py,sha256=tuToyXjdvPfTX_xDghxGZ78ZdcHmG16JWUMgS-Sv5jU,3568
+ anomaly_pipeline/main.py,sha256=0dVdNB5yu11rAxQtJtF1XLF7KzcC6ez5NOo3ZGfklqg,7973
+ anomaly_pipeline/pipeline.py,sha256=94zsCAVKvjeScV2dbYucezAZ22jUZIFyq0ztVaWEDUo,15131
+ anomaly_pipeline/helpers/DB_scan.py,sha256=aPel8hjoG5Am_T0OuFTlOqfxTgaa2aY-7tztGhX54eI,15884
+ anomaly_pipeline/helpers/IQR.py,sha256=VlYU6Yf-4KQmVroLvzwd220jn5BUNJEchsVE4_KxKm4,2824
+ anomaly_pipeline/helpers/MAD.py,sha256=qSzdQMGkd-ynFSqoOydg76YWHdSqWM2e7mSi9QSawcY,5821
+ anomaly_pipeline/helpers/Preprocessing.py,sha256=WiW7WjeKXxipyuA1vW4kpZbGQn7h68Rfm3aWHCHW6Hs,14165
+ anomaly_pipeline/helpers/STD.py,sha256=uVG3lU1k65TbKpNQWHS_rjsjfP9QFVeS23_GDeksagY,4448
+ anomaly_pipeline/helpers/__init__.py,sha256=LGVuwJR2Bx-xh5pdatp7Riiv9NpetsqBkABX9m9xyUc,364
+ anomaly_pipeline/helpers/baseline.py,sha256=h9t_LWcAw17P9qmoRQMceukGzOOr-gFLuHfVbipQB7M,3824
+ anomaly_pipeline/helpers/cluster_functions.py,sha256=Nhk2YdKVynrKywEILg_5B2xD4zrCZ_ICWw3oOdTDHuA,13040
+ anomaly_pipeline/helpers/evaluation_info.py,sha256=yzhvpQiMCP0f1Njrn0he6KKlqRQMDnNEVo5U_2H31jU,6531
+ anomaly_pipeline/helpers/evaluation_plots.py,sha256=DcQqA2DNEjhDpUTj_-lpmw1rMYIcnulNU3ASmewE1cA,46110
+ anomaly_pipeline/helpers/ewma.py,sha256=TRrshZWS06EA7H5vutA-GbO2BG9c0MFxWvcPC86uzE8,8586
+ anomaly_pipeline/helpers/fb_prophet.py,sha256=Z12LsLl1lP6-urP422awEUCuHvHz6tZmRsOeEqLSbGY,10387
+ anomaly_pipeline/helpers/help_anomaly.py,sha256=fIPVgrvfgUZ49AAc6e_b7InPOzVhWsF4lsmfl3lxtds,41173
+ anomaly_pipeline/helpers/iso_forest_general.py,sha256=UAEt41lGGt5MhqYyOB7_8e7kRGT5HijdJX5WA9SMAhU,2427
+ anomaly_pipeline/helpers/iso_forest_timeseries.py,sha256=Q-chpPmO4FkRBKRaZhIQdl-xISHfyFelmDC9V5_8PIQ,14562
+ anomaly_pipeline/helpers/percentile.py,sha256=IGk9DrlIrf-rKOQnIS72-cP9meRfAP6NAZv1UIktm9k,5436
+ anomaly_pipeline-0.1.61.dist-info/METADATA,sha256=75RO3V9iOX78_hxU4l_RfV7uo9g04FNMH_fh37CndSk,11669
+ anomaly_pipeline-0.1.61.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+ anomaly_pipeline-0.1.61.dist-info/entry_points.txt,sha256=c7aMFN_VdyQk_gKp9S2-bz4AF3eBActUectAElnEdMo,92
+ anomaly_pipeline-0.1.61.dist-info/top_level.txt,sha256=3QhrLt05iNbxIQhnAA0vmIkRQje4Hc_STGY_Tukx3Vg,17
+ anomaly_pipeline-0.1.61.dist-info/RECORD,,
@@ -1,15 +0,0 @@
- Metadata-Version: 2.4
- Name: anomaly_pipeline
- Version: 0.1.27
- Requires-Dist: pandas
- Requires-Dist: numpy<2
- Requires-Dist: joblib
- Requires-Dist: prophet
- Requires-Dist: scikit-learn
- Requires-Dist: google-cloud-bigquery
- Requires-Dist: google-cloud-storage
- Requires-Dist: statsmodels
- Requires-Dist: plotly
- Requires-Dist: pandas-gbq
- Requires-Dist: gcsfs
- Dynamic: requires-dist
@@ -1,24 +0,0 @@
- anomaly_pipeline/__init__.py,sha256=ED-UPADjbdS8xjK41KmWVYcFIn6q_cN-SwBx-dRI-nM,77
- anomaly_pipeline/main.py,sha256=khiatXxr01XYHB8SrIfyTnlaCu008MA6ORGiI_2Tjr4,2925
- anomaly_pipeline/pipeline.py,sha256=3Lf9b0Vok-mqWDLhhZeN9emgx5i30stPrU8XOmKpmEw,11204
- anomaly_pipeline/helpers/DB_scan.py,sha256=80PLlubpcwY6dOUx5rm569hvFlGNa1rtvjs74US9oIk,8134
- anomaly_pipeline/helpers/IQR.py,sha256=VlYU6Yf-4KQmVroLvzwd220jn5BUNJEchsVE4_KxKm4,2824
- anomaly_pipeline/helpers/MAD.py,sha256=XDG8r9o1JNi7YZ2NKwNzqmu_Oyz2OPP2rThCuw8WZhs,3377
- anomaly_pipeline/helpers/Preprocessing.py,sha256=VsAohcAW1wTKDdNAF1xNF4j4I2gyZ8rOC1HjyK0NpGk,3933
- anomaly_pipeline/helpers/STD.py,sha256=SZ1UaS_Aa5ay6qWNzKpBXpQIloUuPlliOrfr7yHba4k,2769
- anomaly_pipeline/helpers/__init__.py,sha256=aDAAxiNAusL4rwcn9XbkUIApp3i02UXolB_CWvbbY_0,32
- anomaly_pipeline/helpers/baseline.py,sha256=h9t_LWcAw17P9qmoRQMceukGzOOr-gFLuHfVbipQB7M,3824
- anomaly_pipeline/helpers/cluster_functions.py,sha256=Nhk2YdKVynrKywEILg_5B2xD4zrCZ_ICWw3oOdTDHuA,13040
- anomaly_pipeline/helpers/evaluation_info.py,sha256=SXa1LkznNQXTOcFCbryRmRJMSNC_Fa2CU-HhFnyTIKY,6219
- anomaly_pipeline/helpers/evaluation_plots.py,sha256=xfyVlE7B4E376EL4AF8A4T5kUfqzPShGOSy548psT6M,21230
- anomaly_pipeline/helpers/ewma.py,sha256=YprdcvR17EQ4X9pJo5OusaD3jNaaoHvQLHRHHt25CGk,3562
- anomaly_pipeline/helpers/fb_prophet.py,sha256=-ivBIgMBPT4DG-hbGXPMB1-aiEBfLw2LQvy6eXKzELQ,3182
- anomaly_pipeline/helpers/help_info.py,sha256=QuRd206KQ8etRnlODH9Ek_zmXUvHSBwVQtukqf0iKSc,37012
- anomaly_pipeline/helpers/iso_forest_general.py,sha256=nonZl2wcLyHe0E50mqQUw_IB3tuMochmZKQNd0xMFVk,2350
- anomaly_pipeline/helpers/iso_forest_timeseries.py,sha256=SWf6g0mwLohIRdMvGfMCAcfWi5FPPokiV7dM8Un5qpE,5900
- anomaly_pipeline/helpers/percentile.py,sha256=eLk0PgY7m7z7VKTLfXg8ykKii0ciAJvlGOYXpv84mOE,2523
- anomaly_pipeline-0.1.27.dist-info/METADATA,sha256=YIIJMpsDchA8L2Jp0T4wBXpxwcL5r-UiJ35gLP6BRCs,371
- anomaly_pipeline-0.1.27.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
- anomaly_pipeline-0.1.27.dist-info/entry_points.txt,sha256=c7aMFN_VdyQk_gKp9S2-bz4AF3eBActUectAElnEdMo,92
- anomaly_pipeline-0.1.27.dist-info/top_level.txt,sha256=3QhrLt05iNbxIQhnAA0vmIkRQje4Hc_STGY_Tukx3Vg,17
- anomaly_pipeline-0.1.27.dist-info/RECORD,,