seq-hybrid-detector 0.1.0__tar.gz (PyPI)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

seq_hybrid_detector-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

seq_hybrid_detector-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,61 @@
+Metadata-Version: 2.4
+Name: seq_hybrid_detector
+Version: 0.1.0
+Summary: A sequential hybrid anomaly detection framework combining PyTorch GRUs and Isolation Forests.
+Author-email: Your Name <your.email@example.com>
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: numpy>=1.20.0
+Requires-Dist: pandas>=1.3.0
+Requires-Dist: torch>=2.0.0
+Requires-Dist: scikit-learn>=1.0.0
+Requires-Dist: joblib>=1.1.0
+Requires-Dist: matplotlib>=3.4.0
+Requires-Dist: seaborn>=0.11.0
+Dynamic: license-file
+# seq_hybrid_detector
+`seq_hybrid_detector` is a small Python package scaffold for sequence hybrid detection workflows.
+## What is included
+The repository currently provides a clean `src/` layout with starter modules for loading data, defining core models, and wiring a simple pipeline.
+## Project layout
+```text
+seq_hybrid_detector/
+├── LICENSE
+├── README.md
+├── pyproject.toml
+└── src/
+	└── seq_hybrid_detector/
+		├── __init__.py
+		├── data_engine.py
+		├── models.py
+		└── pipeline.py
+```
+## Installation
+From the project root, install the package in editable mode during development:
+```bash
+pip install -e .
+```
+## Modules
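+- `data_engine.py` loads the CSV datasets, drops metadata identifier columns, encodes categorical columns, and removes highly correlated and zero-variance features.
+- `models.py` defines `GRUFeatureExtractor`, a GRU-based sequence autoencoder implemented as a PyTorch `nn.Module`.
+- `pipeline.py` provides `CascadedIoTDetector`, which trains the GRU backbone, fits an Isolation Forest on fused features, computes anomaly scores, and exports or loads the pipeline.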
+## Status
+This is a scaffold, not a finished detector. The modules are in place for you to extend with real preprocessing, model training, and scoring logic.

seq_hybrid_detector-0.1.0/README.md ADDED

@@ -0,0 +1,40 @@
+# seq_hybrid_detector
+`seq_hybrid_detector` is a small Python package scaffold for sequence hybrid detection workflows.
+## What is included
+The repository currently provides a clean `src/` layout with starter modules for loading data, defining core models, and wiring a simple pipeline.
+## Project layout
+```text
+seq_hybrid_detector/
+├── LICENSE
+├── README.md
+├── pyproject.toml
+└── src/
+	└── seq_hybrid_detector/
+		├── __init__.py
+		├── data_engine.py
+		├── models.py
+		└── pipeline.py
+```
+## Installation
+From the project root, install the package in editable mode during development:
+```bash
+pip install -e .
+```
+## Modules
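+- `data_engine.py` loads the CSV datasets, drops metadata identifier columns, encodes categorical columns, and removes highly correlated and zero-variance features.
+- `models.py` defines `GRUFeatureExtractor`, a GRU-based sequence autoencoder implemented as a PyTorch `nn.Module`.
+- `pipeline.py` provides `CascadedIoTDetector`, which trains the GRU backbone, fits an Isolation Forest on fused features, computes anomaly scores, and exports or loads the pipeline.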
+## Status
+This is a scaffold, not a finished detector. The modules are in place for you to extend with real preprocessing, model training, and scoring logic.

seq_hybrid_detector-0.1.0/pyproject.toml ADDED

@@ -0,0 +1,31 @@
+[build-system]
+requires = ["setuptools>=61.0.0", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "seq_hybrid_detector"
+version = "0.1.0"
+authors = [
+  { name="Your Name", email="your.email@example.com" },
+]
+description = "A sequential hybrid anomaly detection framework combining PyTorch GRUs and Isolation Forests."
+readme = "README.md"
+requires-python = ">=3.8"
+classifiers = [
+    "Programming Language :: Python :: 3",
+    "License :: OSI Approved :: MIT License",
+    "Operating System :: OS Independent",
+    "Topic :: Scientific/Engineering :: Artificial Intelligence"
+]
+dependencies = [
+    "numpy>=1.20.0",
+    "pandas>=1.3.0",
+    "torch>=2.0.0",
+    "scikit-learn>=1.0.0",
+    "joblib>=1.1.0",
+    "matplotlib>=3.4.0",
+    "seaborn>=0.11.0"
+]
+[tool.setuptools.packages.find]
+where = ["src"]

seq_hybrid_detector-0.1.0/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

seq_hybrid_detector-0.1.0/src/seq_hybrid_detector/__init__.py ADDED

@@ -0,0 +1,6 @@
+from .data_engine import load_and_clean_data
+from .models import GRUFeatureExtractor
+from .pipeline import CascadedIoTDetector
+__version__ = "0.1.0"
+__all__ = ["load_and_clean_data", "GRUFeatureExtractor", "CascadedIoTDetector"]

seq_hybrid_detector-0.1.0/src/seq_hybrid_detector/data_engine.py ADDED

@@ -0,0 +1,58 @@
+import os
+import pandas as pd
+import numpy as np
+def load_and_clean_data(data_path='./datasets/', files=None, samples_per_file=10000, correlation_threshold=0.85):
+    """
+    Loads raw tabular datasets, applies metadata filtering, handles encoding,
+    and drops highly correlated variables.
+    """
+    if files is None:
+        files = {
+            'b5': 'benign_samples_5sec.csv',
+            'b10': 'benign_samples_10sec.csv'
+        }
+    dfs = []
+    for fname in files.values():
+        path = os.path.join(data_path, fname)
+        if not os.path.exists(path):
+            # Missing files are skipped silently; an error is raised below only if none load.
+            continue
+        temp_df = pd.read_csv(path)
+        temp_df = temp_df.sample(n=min(samples_per_file, len(temp_df)), random_state=42)
+        temp_df["label"] = 0
+        dfs.append(temp_df)
+    if not dfs:
+        raise FileNotFoundError(f"No valid dataset files found within: {data_path}")
+    full_df = pd.concat(dfs, ignore_index=True).sample(frac=1, random_state=42).reset_index(drop=True)
+    # Metadata Filtering Block
+    drop_cols = ['device_name', 'device_mac', 'device_id', 'ip', 'timestamp']
+    existing_drop_cols = [col for col in drop_cols if col in full_df.columns]
+    if existing_drop_cols:
+        print(f">>> Dropping metadata identifiers: {existing_drop_cols}")
+        full_df = full_df.drop(columns=existing_drop_cols)
+    # Encode non-numeric/categorical variables
+    for col in full_df.select_dtypes(include=['object', 'category']).columns:
+        if col != "label":
+            full_df[col] = full_df[col].astype("category").cat.codes
+    # Correlation Filtering Block
+    feature_df = full_df.drop(columns=["label"], errors="ignore")
+    corr_matrix = feature_df.corr().abs()
+    upper_tri = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
+    to_drop = [column for column in upper_tri.columns if any(upper_tri[column] > correlation_threshold)]
+    if to_drop:
+        print(f">>> Dropping highly correlated redundant features (|r| > {correlation_threshold}): {to_drop}")
+        full_df = full_df.drop(columns=to_drop)
+    # Drop zero variance columns
+    feature_cols = [col for col in full_df.columns if col != "label"]
+    non_zero_var_cols = [col for col in feature_cols if (full_df[col] != full_df[col].iloc[0]).any()]
+    full_df = full_df[non_zero_var_cols + ["label"]]
+    return full_df
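
The correlation filtering block in `load_and_clean_data` can be illustrated in isolation. The sketch below is a NumPy-only approximation, not the package's code: the helper name `correlated_columns_to_drop` and the toy columns are hypothetical. It keeps only the strict upper triangle of the absolute correlation matrix, so a column is flagged only when it correlates with an earlier column, and the first member of each correlated pair survives.

```python
import numpy as np


def correlated_columns_to_drop(columns, names, threshold=0.85):
    """Flag each column whose |r| with any earlier column exceeds threshold.

    `columns` is a list of equal-length numeric sequences, one per feature.
    (Hypothetical helper; mirrors the upper-triangle idea, not the package API.)
    """
    # np.corrcoef treats each row as one variable by default.
    corr = np.abs(np.corrcoef(np.asarray(columns, dtype=float)))
    # Zero out the diagonal and lower triangle so only pairs (i, j) with i < j remain.
    upper = np.triu(corr, k=1)
    return [names[j] for j in range(len(names)) if (upper[:, j] > threshold).any()]


a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]   # exact multiple of a, so |r| = 1
c = [5, 1, 4, 2, 3]    # weakly correlated with a and b (r = -0.3)
dropped = correlated_columns_to_drop([a, b, c], ["a", "b", "c"])
print(dropped)  # ['b']
```

Because the check only looks at earlier columns, the result depends on column order: reordering the features can change which member of a correlated pair is dropped.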

seq_hybrid_detector-0.1.0/src/seq_hybrid_detector/models.py ADDED

@@ -0,0 +1,19 @@
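+import torch
+import torch.nn as nn
+class GRUFeatureExtractor(nn.Module):
+    """GRU autoencoder: encodes a window into its final hidden state and decodes it back."""
+    def __init__(self, input_dim, hidden_dim=32, num_layers=1):
+        super().__init__()
+        self.gru = nn.GRU(
+            input_size=input_dim,
+            hidden_size=hidden_dim,
+            num_layers=num_layers,
+            batch_first=True
+        )
+        # Maps the final hidden state back to the input feature space.
+        self.decoder = nn.Linear(hidden_dim, input_dim)
+    def forward(self, x):
+        # x: (batch, seq_len, input_dim); h_n: (num_layers, batch, hidden_dim)
+        _, h_n = self.gru(x)
+        # Final hidden state of the last layer: (batch, hidden_dim)
+        hidden_context = h_n[-1]
+        # Decode one vector per sequence and repeat it across all time steps.
+        reconstructed = self.decoder(hidden_context).unsqueeze(1).repeat(1, x.size(1), 1)
+        return reconstructed, hidden_context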
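
`forward` decodes a single vector per sequence from the final hidden state and repeats it across all time steps. The NumPy sketch below is a shape-only stand-in for `unsqueeze(1).repeat(1, T, 1)`; the array names and sizes are hypothetical, and no GRU is involved.

```python
import numpy as np

batch, window, input_dim = 4, 8, 3  # illustrative sizes only

# Stand-in for self.decoder(hidden_context): one decoded vector per sequence.
decoded = np.arange(batch * input_dim, dtype=float).reshape(batch, input_dim)

# NumPy equivalent of tensor.unsqueeze(1).repeat(1, window, 1).
reconstructed = np.repeat(decoded[:, None, :], window, axis=1)

print(reconstructed.shape)  # (4, 8, 3)
# Every time step of a sequence carries the same decoded vector.
print(bool((reconstructed == decoded[:, None, :]).all()))  # True
```

Since the reconstruction is constant over time, the decoded vector that minimizes the MSE for a given window is that window's per-feature mean. The hidden state is therefore likely to summarize a window's average level more than its step-by-step shape.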

seq_hybrid_detector-0.1.0/src/seq_hybrid_detector/pipeline.py ADDED

@@ -0,0 +1,121 @@
+import os
+import joblib
+import torch
+import torch.nn as nn
+import numpy as np
+from torch.utils.data import DataLoader, TensorDataset
+from sklearn.ensemble import IsolationForest
+from .models import GRUFeatureExtractor
+class CascadedIoTDetector:
+    def __init__(self, input_dim, hidden_dim=32, num_layers=1, window_size=8, contamination=0.01, n_estimators=300):
+        self.input_dim = input_dim
+        self.window_size = window_size
+        self.hidden_dim = hidden_dim
+        self.num_layers = num_layers
+        self.feature_extractor = GRUFeatureExtractor(input_dim, hidden_dim, num_layers)
+        self.anomaly_classifier = IsolationForest(
+            n_estimators=n_estimators,
+            max_samples=0.8,
+            contamination=contamination,
+            random_state=42,
+            n_jobs=-1
+        )
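+    def create_sliding_windows(self, data):
+        # Build overlapping windows of shape (n_windows, window_size, n_features).
+        if len(data) < self.window_size:
+            raise ValueError(f"Need at least {self.window_size} rows to build a window, got {len(data)}.")
+        sequences = []
+        for i in range(len(data) - self.window_size + 1):
+            sequences.append(data[i : i + self.window_size])
+        return np.array(sequences)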
+    def fit_backbone(self, X_train_scaled, epochs=20, batch_size=256, lr=0.002):
+        X_seq = self.create_sliding_windows(X_train_scaled)
+        dataset = TensorDataset(torch.FloatTensor(X_seq))
+        dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
+        optimizer = torch.optim.RMSprop(self.feature_extractor.parameters(), lr=lr, alpha=0.99, eps=1e-08)
+        criterion = nn.MSELoss()
+        print(">>> Beginning GRU Feature Extractor Training Pipeline...")
+        self.feature_extractor.train()
+        for epoch in range(epochs):
+            total_loss = 0
+            for batch in dataloader:
+                inputs = batch[0]
+                optimizer.zero_grad()
+                reconstructed, _ = self.feature_extractor(inputs)
+                loss = criterion(reconstructed, inputs)
+                loss.backward()
+                optimizer.step()
+                total_loss += loss.item() * inputs.size(0)
+            if (epoch + 1) % 5 == 0 or epoch == 0:
+                print(f"    Epoch {epoch+1:02d}/{epochs:02d} | Sequence Reconstruction Loss: {total_loss/len(dataset):.6f}")
+    def extract_fused_features(self, X_scaled):
+        X_seq = self.create_sliding_windows(X_scaled)
+        self.feature_extractor.eval()
+        with torch.no_grad():
+            _, hidden_embeddings = self.feature_extractor(torch.FloatTensor(X_seq))
+            hidden_embeddings = hidden_embeddings.numpy()
+        point_features = X_scaled[self.window_size - 1 :]
+        fused_features = np.hstack((point_features, hidden_embeddings))
+        return fused_features, hidden_embeddings
+    def fit_ensemble(self, X_train_scaled):
+        print("\n>>> Extracting Combined Temporal-Point Feature Ensembles...")
+        fused_train, _ = self.extract_fused_features(X_train_scaled)
+        print(">>> Fitting Isolation Forest Engine...")
+        self.anomaly_classifier.fit(fused_train)
+    def compute_anomaly_scores(self, X_test_scaled):
+        fused_features, hidden_embeddings = self.extract_fused_features(X_test_scaled)
+        scores = -self.anomaly_classifier.decision_function(fused_features)
+        pad_length = self.window_size - 1
+        padded_scores = np.concatenate([np.repeat(scores[0], pad_length), scores])
+        return padded_scores, hidden_embeddings
+    def export_model(self, export_dir="exported_pipeline"):
+        """
+        Saves both the PyTorch weights and the Sklearn Isolation Forest model.
+        """
+        os.makedirs(export_dir, exist_ok=True)
+        # Save structural metadata config
+        config = {
+            "input_dim": self.input_dim,
+            "hidden_dim": self.hidden_dim,
+            "num_layers": self.num_layers,
+            "window_size": self.window_size
+        }
+        joblib.dump(config, os.path.join(export_dir, "config.pkl"))
+        # Save Backbones
+        torch.save(self.feature_extractor.state_dict(), os.path.join(export_dir, "gru_backbone.pt"))
+        joblib.dump(self.anomaly_classifier, os.path.join(export_dir, "isolation_forest.pkl"))
+        print(f">>> Successfully exported full model pipeline components to '{export_dir}/'")
+    @classmethod
+    def load_model(cls, export_dir="exported_pipeline"):
+        """
+        Loads components and reconstructs an instance of CascadedIoTDetector.
+        """
+        config = joblib.load(os.path.join(export_dir, "config.pkl"))
+        # Instantiate object with saved dimensions
+        instance = cls(
+            input_dim=config["input_dim"],
+            hidden_dim=config["hidden_dim"],
+            num_layers=config["num_layers"],
+            window_size=config["window_size"]
+        )
+        # Load states
+        # map_location="cpu" lets weights saved on a GPU load on CPU-only machines.
+        instance.feature_extractor.load_state_dict(torch.load(os.path.join(export_dir, "gru_backbone.pt"), map_location="cpu"))
+        instance.feature_extractor.eval()
+        instance.anomaly_classifier = joblib.load(os.path.join(export_dir, "isolation_forest.pkl"))
+        print(f">>> Successfully loaded framework artifacts from '{export_dir}/'")
+        return instance
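
The index bookkeeping in `create_sliding_windows`, `extract_fused_features`, and `compute_anomaly_scores` is easy to get wrong, so a small NumPy sketch may help. It uses a toy array and a made-up per-window score (the window mean) in place of the GRU embedding and Isolation Forest; `sliding_windows` and `pad_scores` are hypothetical names. It shows that `n` rows yield `n - window_size + 1` windows, that window `k` ends at row `k + window_size - 1`, and that repeating the first score `window_size - 1` times restores one score per input row.

```python
import numpy as np


def sliding_windows(data, window_size):
    """Stack overlapping windows into shape (n - window_size + 1, window_size, n_features)."""
    n = len(data)
    if n < window_size:
        raise ValueError("not enough rows for a single window")
    return np.stack([data[i : i + window_size] for i in range(n - window_size + 1)])


def pad_scores(scores, window_size):
    """Left-pad with the first score so the output has one entry per original row."""
    return np.concatenate([np.repeat(scores[0], window_size - 1), scores])


window_size = 3
data = np.arange(10, dtype=float).reshape(5, 2)  # 5 rows, 2 features

windows = sliding_windows(data, window_size)
print(windows.shape)  # (3, 3, 2)

# Point features aligned with each window are the rows where the windows end.
point_features = data[window_size - 1 :]
assert np.array_equal(point_features, windows[:, -1, :])

# Placeholder score per window (NOT the package's scoring): the window mean.
scores = windows.mean(axis=(1, 2))
padded = pad_scores(scores, window_size)
print(padded)  # [2.5 2.5 2.5 4.5 6.5]
```

With this padding, the first `window_size - 1` rows inherit the score of the first full window rather than receiving a score of their own.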

seq_hybrid_detector-0.1.0/src/seq_hybrid_detector.egg-info/PKG-INFO ADDED

@@ -0,0 +1,61 @@
+Metadata-Version: 2.4
+Name: seq_hybrid_detector
+Version: 0.1.0
+Summary: A sequential hybrid anomaly detection framework combining PyTorch GRUs and Isolation Forests.
+Author-email: Your Name <your.email@example.com>
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: numpy>=1.20.0
+Requires-Dist: pandas>=1.3.0
+Requires-Dist: torch>=2.0.0
+Requires-Dist: scikit-learn>=1.0.0
+Requires-Dist: joblib>=1.1.0
+Requires-Dist: matplotlib>=3.4.0
+Requires-Dist: seaborn>=0.11.0
+Dynamic: license-file
+# seq_hybrid_detector
+`seq_hybrid_detector` is a small Python package scaffold for sequence hybrid detection workflows.
+## What is included
+The repository currently provides a clean `src/` layout with starter modules for loading data, defining core models, and wiring a simple pipeline.
+## Project layout
+```text
+seq_hybrid_detector/
+├── LICENSE
+├── README.md
+├── pyproject.toml
+└── src/
+	└── seq_hybrid_detector/
+		├── __init__.py
+		├── data_engine.py
+		├── models.py
+		└── pipeline.py
+```
+## Installation
+From the project root, install the package in editable mode during development:
+```bash
+pip install -e .
+```
+## Modules
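+- `data_engine.py` loads the CSV datasets, drops metadata identifier columns, encodes categorical columns, and removes highly correlated and zero-variance features.
+- `models.py` defines `GRUFeatureExtractor`, a GRU-based sequence autoencoder implemented as a PyTorch `nn.Module`.
+- `pipeline.py` provides `CascadedIoTDetector`, which trains the GRU backbone, fits an Isolation Forest on fused features, computes anomaly scores, and exports or loads the pipeline.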
+## Status
+This is a scaffold, not a finished detector. The modules are in place for you to extend with real preprocessing, model training, and scoring logic.

seq_hybrid_detector-0.1.0/src/seq_hybrid_detector.egg-info/SOURCES.txt ADDED

@@ -0,0 +1,12 @@
+LICENSE
+README.md
+pyproject.toml
+src/seq_hybrid_detector/__init__.py
+src/seq_hybrid_detector/data_engine.py
+src/seq_hybrid_detector/models.py
+src/seq_hybrid_detector/pipeline.py
+src/seq_hybrid_detector.egg-info/PKG-INFO
+src/seq_hybrid_detector.egg-info/SOURCES.txt
+src/seq_hybrid_detector.egg-info/dependency_links.txt
+src/seq_hybrid_detector.egg-info/requires.txt
+src/seq_hybrid_detector.egg-info/top_level.txt

seq_hybrid_detector-0.1.0/src/seq_hybrid_detector.egg-info/dependency_links.txt ADDED

@@ -0,0 +1 @@
+

seq_hybrid_detector-0.1.0/src/seq_hybrid_detector.egg-info/requires.txt ADDED

@@ -0,0 +1,7 @@
+numpy>=1.20.0
+pandas>=1.3.0
+torch>=2.0.0
+scikit-learn>=1.0.0
+joblib>=1.1.0
+matplotlib>=3.4.0
+seaborn>=0.11.0

seq_hybrid_detector-0.1.0/src/seq_hybrid_detector.egg-info/top_level.txt ADDED

@@ -0,0 +1 @@
+seq_hybrid_detector