PyPI - stackone-defender - Versions diffs - 0.6.3__tar.gz → 0.7.1__tar.gz - Mend

stackone-defender 0.6.3__tar.gz → 0.7.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (62)

stackone_defender-0.7.1/.release-please-manifest.json ADDED

@@ -0,0 +1 @@
+{".":"0.7.1"}

{stackone_defender-0.6.3 → stackone_defender-0.7.1}/CHANGELOG.md RENAMED

@@ -1,5 +1,31 @@
 # Changelog
+## [0.7.1](https://github.com/StackOneHQ/stackone-defender/compare/stackone-defender-v0.7.0...stackone-defender-v0.7.1) (2026-06-16)
+### Features
+* add defend_tool_results_async for npm batch parity ([a05783c](https://github.com/StackOneHQ/stackone-defender/commit/a05783c5671548aa66dfead1f129584b249d8778))
+* Python parity with @stackone/defender 0.7.1 (Tier 3) ([c58a17c](https://github.com/StackOneHQ/stackone-defender/commit/c58a17c9ba1a902148cde9204666f7f1a916d09b))
+* Tier 3 provider interface and cascade orchestration (TS 0.7.1 parity) ([f2b4109](https://github.com/StackOneHQ/stackone-defender/commit/f2b41096db4ca65741b9d4ba62f3fad7591929ab))
+### Bug Fixes
+* address Copilot PR review on Tier 3 orchestration ([570f567](https://github.com/StackOneHQ/stackone-defender/commit/570f56753292700a15b73725a12db426316468c6))
+* tighten Tier3ClassifyResult type and batch doc wording ([2515772](https://github.com/StackOneHQ/stackone-defender/commit/2515772f894dd2cbdaa51e9d0b39e26f151d257f))
+## [0.7.0](https://github.com/StackOneHQ/stackone-defender/compare/stackone-defender-v0.6.3...stackone-defender-v0.7.0) (2026-05-29)
+### ⚠ BREAKING CHANGES
+* The default ONNX model directory changed from minilm-full-aug to minilm-multihead-v5. Custom code that hardcoded the old path will no longer load.
+### Features
+* parity with TS defender 0.7.0 ([75d046a](https://github.com/StackOneHQ/stackone-defender/commit/75d046ab45066ee1f973e91357f7ecb23dea50c8))
 ## [0.6.3](https://github.com/StackOneHQ/stackone-defender/compare/stackone-defender-v0.6.2...stackone-defender-v0.6.3) (2026-05-26)

{stackone_defender-0.6.3 → stackone_defender-0.7.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: stackone-defender
-Version: 0.6.3
+Version: 0.7.1
 Summary: Indirect prompt injection defense for AI agents using tool calls
 Project-URL: Homepage, https://github.com/StackOneHQ/stackone-defender
 Project-URL: Repository, https://github.com/StackOneHQ/stackone-defender
@@ -204,6 +204,8 @@ class DefenseResult:
 ### `defense.defend_tool_results(items)`
+Sync batch API. When `enable_tier3=True`, uses one `asyncio.run()` and defends items **concurrently** via `asyncio.gather` (same scheduling model as npm `defendToolResults`; blocking sync providers still run one at a time on the event-loop thread). From async code, prefer `defend_tool_results_async`.
 ```python
 results = defense.defend_tool_results([
     {"value": email_data, "tool_name": "gmail_get_message"},
@@ -215,6 +217,17 @@ for r in results:
         print("Blocked:", ", ".join(r.fields_sanitized))
 ```
+### `await defense.defend_tool_results_async(items)`
+Async batch API — runs `defend_tool_result_async` per item concurrently via `asyncio.gather`. Required when Tier 3 is enabled inside a running event loop (e.g. FastAPI).
+```python
+results = await defense.defend_tool_results_async([
+    {"value": email_data, "tool_name": "gmail_get_message"},
+    {"value": doc_data, "tool_name": "documents_get"},
+])
+```
 ### `defense.analyze(text)`
 Tier 1 only — useful for debugging pattern hits without full tool-result traversal.
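
The batch wording in the hunk above (one `asyncio.run()` for the whole batch, items defended concurrently via `asyncio.gather`) can be illustrated with a minimal standalone sketch. It does not import `stackone_defender`; `defend_one_async`, `defend_many_async`, and `defend_many` are hypothetical stand-ins, not the package's functions. It also shows why a sync wrapper of this shape cannot be called from inside a running event loop, which is presumably the reason the README points async callers at `defend_tool_results_async`.

```python
import asyncio


async def defend_one_async(item: dict) -> dict:
    # Hypothetical stand-in for a per-item async defense call.
    await asyncio.sleep(0.01)  # simulates awaiting a remote classifier
    return {"tool_name": item["tool_name"], "allowed": True}


async def defend_many_async(items: list[dict]) -> list[dict]:
    # asyncio.gather schedules every item concurrently and returns the
    # results in the same order as the inputs.
    return list(await asyncio.gather(*(defend_one_async(i) for i in items)))


def defend_many(items: list[dict]) -> list[dict]:
    # Sync wrapper: a single event loop is created for the whole batch.
    # asyncio.run() raises RuntimeError if a loop is already running.
    return asyncio.run(defend_many_async(items))


if __name__ == "__main__":
    batch = [
        {"value": "mail body", "tool_name": "gmail_get_message"},
        {"value": "doc body", "tool_name": "documents_get"},
    ]
    print(defend_many(batch))
```

From inside a coroutine (for example a FastAPI handler), the equivalent call in this sketch would be `await defend_many_async(batch)`.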

{stackone_defender-0.6.3 → stackone_defender-0.7.1}/README.md RENAMED

@@ -178,6 +178,8 @@ class DefenseResult:
 ### `defense.defend_tool_results(items)`
+Sync batch API. When `enable_tier3=True`, uses one `asyncio.run()` and defends items **concurrently** via `asyncio.gather` (same scheduling model as npm `defendToolResults`; blocking sync providers still run one at a time on the event-loop thread). From async code, prefer `defend_tool_results_async`.
 ```python
 results = defense.defend_tool_results([
     {"value": email_data, "tool_name": "gmail_get_message"},
@@ -189,6 +191,17 @@ for r in results:
         print("Blocked:", ", ".join(r.fields_sanitized))
 ```
+### `await defense.defend_tool_results_async(items)`
+Async batch API — runs `defend_tool_result_async` per item concurrently via `asyncio.gather`. Required when Tier 3 is enabled inside a running event loop (e.g. FastAPI).
+```python
+results = await defense.defend_tool_results_async([
+    {"value": email_data, "tool_name": "gmail_get_message"},
+    {"value": doc_data, "tool_name": "documents_get"},
+])
+```
 ### `defense.analyze(text)`
 Tier 1 only — useful for debugging pattern hits without full tool-result traversal.
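
The README's caveat that "blocking sync providers still run one at a time on the event-loop thread" follows from how `asyncio.gather` works: it only overlaps coroutines that actually yield. The sketch below is a rough illustration with hypothetical provider functions (nothing here comes from the package); it times a blocking provider against a cooperative one under the same `gather` call.

```python
import asyncio
import time


async def blocking_provider(delay: float) -> str:
    # Hypothetical provider that does blocking work: time.sleep never
    # yields to the event loop, so gathered calls run back to back.
    time.sleep(delay)
    return "done"


async def cooperative_provider(delay: float) -> str:
    # Hypothetical provider that awaits: the loop can interleave calls.
    await asyncio.sleep(delay)
    return "done"


async def timed_gather(provider, n: int, delay: float) -> float:
    # Run n provider calls under one gather and return the elapsed time.
    start = time.perf_counter()
    await asyncio.gather(*(provider(delay) for _ in range(n)))
    return time.perf_counter() - start


def compare(n: int = 3, delay: float = 0.05) -> tuple[float, float]:
    blocking = asyncio.run(timed_gather(blocking_provider, n, delay))
    cooperative = asyncio.run(timed_gather(cooperative_provider, n, delay))
    return blocking, cooperative


if __name__ == "__main__":
    blocking, cooperative = compare()
    print(f"blocking: {blocking:.3f}s, cooperative: {cooperative:.3f}s")
```

The blocking run takes roughly `n * delay`, while the cooperative run takes roughly `delay`. Offloading blocking work with `asyncio.to_thread` is one common way to regain overlap; whether the package does this is not stated in this diff.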

{stackone_defender-0.6.3 → stackone_defender-0.7.1}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "stackone-defender"
-version = "0.6.3"
+version = "0.7.1"
 description = "Indirect prompt injection defense for AI agents using tool calls"
 readme = "README.md"
 requires-python = ">=3.11"

{stackone_defender-0.6.3 → stackone_defender-0.7.1}/src/stackone_defender/__init__.py RENAMED

@@ -11,8 +11,9 @@ Usage:
         print(f"Blocked: {result.risk_level}")
 """
+from .classifiers.onnx_classifier import get_default_model_path
+from .classifiers.tier3_orchestrator import get_default_tier3_provider, set_default_tier3_provider
 from .core.prompt_defense import PromptDefense, create_prompt_defense
-from .utils.boundary import contains_boundary_patterns, generate_boundary_instructions
 from .sfe.preprocess import (
     DropDecision,
     SfePredictor,
@@ -21,20 +22,36 @@ from .sfe.preprocess import (
     get_default_sfe_model_path,
     sfe_preprocess,
 )
-from .types import DefenseResult, RiskLevel, Tier1Result
+from .types import (
+    DefenderMode,
+    DefenseResult,
+    MultiheadConfig,
+    RiskLevel,
+    Tier1Result,
+    Tier3Provider,
+    Tier3Verdict,
+)
+from .utils.boundary import contains_boundary_patterns, generate_boundary_instructions
 __all__ = [
+    "DefenderMode",
     "DefenseResult",
     "DropDecision",
+    "MultiheadConfig",
     "PromptDefense",
     "RiskLevel",
     "SfePredictor",
     "SfePreprocessResult",
     "Tier1Result",
+    "Tier3Provider",
+    "Tier3Verdict",
     "contains_boundary_patterns",
     "create_prompt_defense",
     "generate_boundary_instructions",
+    "get_default_model_path",
     "get_default_predictor",
     "get_default_sfe_model_path",
+    "get_default_tier3_provider",
+    "set_default_tier3_provider",
     "sfe_preprocess",
 ]

stackone_defender-0.7.1/src/stackone_defender/classifiers/onnx_classifier.py ADDED

@@ -0,0 +1,276 @@
+"""ONNX classifier for fine-tuned MiniLM prompt injection detection.
+Pipeline: text -> tokenizer -> ONNX Runtime -> logit -> ``sigmoid(logit / T)``
+-> score. Supports single-head ``[batch]`` / ``[batch, 1]`` models and
+multi-head ``[batch, 2]`` models (main + aux). Temperature ``T`` enables
+post-hoc calibration via temperature scaling.
+"""
+from __future__ import annotations
+import logging
+import math
+import threading
+from pathlib import Path
+from typing import Literal
+_logger = logging.getLogger(__name__)
+# Shared across all OnnxClassifier instances (keyed by resolved model dir path).
+_session_cache: dict[str, tuple[object, object]] = {}
+_registry_lock = threading.Lock()
+_load_locks: dict[str, threading.Lock] = {}
+def _lock_for_cache_key(cache_key: str) -> threading.Lock:
+    with _registry_lock:
+        if cache_key not in _load_locks:
+            _load_locks[cache_key] = threading.Lock()
+        return _load_locks[cache_key]
+def get_default_model_path() -> str:
+    """Return the absolute path to the bundled ONNX model directory.
+    Exported so :class:`Tier2Classifier` can read model-specific calibration
+    defaults from ``classifier_config.json`` at construction time without
+    needing an :class:`OnnxClassifier` instance.
+    """
+    return str(Path(__file__).resolve().parent.parent / "models" / "minilm-multihead-v5")
+# Back-compat shim retained for internal users; same value as the public name.
+def _default_model_path() -> str:
+    return get_default_model_path()
+def _sigmoid(x: float) -> float:
+    return 1.0 / (1.0 + math.exp(-x))
+class OnnxClassifier:
+    """ONNX Classifier for fine-tuned MiniLM models.
+    Loads the model lazily on first inference. The session and tokenizer
+    are cached at module level so multiple instances pointing at the same
+    model path share a single backing session (safe: ONNX Runtime
+    guarantees thread-safe ``Run()`` from v1.7.0, and the ``tokenizers``
+    library's encode methods do not mutate the tokenizer object).
+    """
+    _MAX_BATCH_CHUNK = 32
+    def __init__(self, model_path: str | None = None, temperature_t: float | None = None):
+        self._model_path = model_path or get_default_model_path()
+        self._session = None
+        self._tokenizer = None
+        self._max_length = 256
+        self._load_failed = False
+        # Output mode is detected lazily from the logits shape on the first
+        # inference call. ``None`` until then.
+        self._output_mode: Literal["single", "multi"] | None = None
+        # Temperature ``T`` must be a positive finite number. ``T <= 0`` is
+        # undefined (divide-by-zero or sign flip) and almost certainly a
+        # programming error rather than a config the caller wants gracefully
+        # ignored.
+        self._temperature_t = 1.0
+        if temperature_t is not None:
+            if not math.isfinite(temperature_t) or temperature_t <= 0:
+                raise ValueError(
+                    f"OnnxClassifier: temperature_t must be a positive finite number, got {temperature_t}"
+                )
+            self._temperature_t = float(temperature_t)
+    # ------------------------------------------------------------------
+    # Public introspection
+    # ------------------------------------------------------------------
+    def get_temperature(self) -> float:
+        """Current temperature scaling factor (``1.0`` = no calibration)."""
+        return self._temperature_t
+    def get_output_mode(self) -> Literal["single", "multi"] | None:
+        """Output mode of the loaded model.
+        ``None`` until the first inference runs. ``"multi"`` indicates the
+        model emits ``[batch, 2]`` logits (main + aux).
+        """
+        return self._output_mode
+    # ------------------------------------------------------------------
+    # Loading
+    # ------------------------------------------------------------------
+    def load_model(self, model_path: str | None = None) -> None:
+        if model_path:
+            self._model_path = model_path
+        if self._session is not None and self._tokenizer is not None:
+            return
+        if self._load_failed:
+            raise ImportError("ONNX dependencies not installed. Install with: pip install stackone-defender[onnx]")
+        self._load_model()
+    def _load_model(self) -> None:
+        cache_key = str(Path(self._model_path).resolve())
+        cached = _session_cache.get(cache_key)
+        if cached:
+            self._session, self._tokenizer = cached
+            return
+        with _lock_for_cache_key(cache_key):
+            cached = _session_cache.get(cache_key)
+            if cached:
+                self._session, self._tokenizer = cached
+                return
+            try:
+                import numpy as np  # noqa: F401
+                import onnxruntime as ort
+                from tokenizers import Tokenizer
+            except ImportError as e:
+                self._load_failed = True
+                _logger.warning("[defender] ONNX model failed to load: %s", e)
+                raise ImportError(
+                    "ONNX dependencies not installed. Install with: pip install stackone-defender[onnx]"
+                ) from e
+            try:
+                tokenizer_path = str(Path(self._model_path) / "tokenizer.json")
+                self._tokenizer = Tokenizer.from_file(tokenizer_path)
+                self._tokenizer.enable_truncation(max_length=self._max_length)
+                self._tokenizer.enable_padding(length=self._max_length)
+                onnx_path = str(Path(self._model_path) / "model_quantized.onnx")
+                self._session = ort.InferenceSession(onnx_path)
+            except Exception as e:
+                _logger.warning("[defender] ONNX model failed to load: %s", e)
+                raise
+            _session_cache[cache_key] = (self._session, self._tokenizer)
+    # ------------------------------------------------------------------
+    # Inference
+    # ------------------------------------------------------------------
+    def classify(self, text: str) -> float:
+        """Classify a single text, returning the main-head sigmoid score.
+        For multi-head models only the main score is returned; callers that
+        need the aux score should use :meth:`classify_pair`.
+        """
+        return self.classify_pair(text)[0]
+    def classify_pair(self, text: str) -> tuple[float, float | None]:
+        """Classify a single text, returning ``(main, aux)``.
+        ``aux`` is ``None`` for single-head models. Both scores are
+        sigmoid-activated with the configured temperature ``T``.
+        """
+        self._ensure_loaded()
+        import numpy as np
+        encoding = self._tokenizer.encode(text)
+        input_ids = np.array([encoding.ids], dtype=np.int64)
+        attention_mask = np.array([encoding.attention_mask], dtype=np.int64)
+        results = self._session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})
+        logits = results[0]
+        self._detect_output_mode(logits.shape)
+        t = self._temperature_t
+        row = logits[0]
+        # row shape: (), (1,) or (2,) depending on model export.
+        if self._output_mode == "multi":
+            main = _sigmoid(float(row[0]) / t)
+            aux = _sigmoid(float(row[1]) / t)
+            return main, aux
+        main_logit = float(row[0]) if hasattr(row, "__len__") and len(row) > 0 else float(row)
+        return _sigmoid(main_logit / t), None
+    def classify_batch(self, texts: list[str]) -> list[float]:
+        """Classify multiple texts; returns main-head scores only.
+        Back-compat wrapper around :meth:`classify_batch_pair`.
+        """
+        return [main for main, _ in self.classify_batch_pair(texts)]
+    def classify_batch_pair(self, texts: list[str]) -> list[tuple[float, float | None]]:
+        """Classify multiple texts, returning ``(main, aux)`` per row.
+        Aux is ``None`` per-row for single-head models. Chunks the input to
+        bound native memory; the attention matrix is ``O(chunk * seq_len^2)``,
+        and for MiniLM (``max_length=256``) a chunk of 32 keeps memory
+        under ~50MB per call.
+        """
+        if not texts:
+            return []
+        self._ensure_loaded()
+        all_pairs: list[tuple[float, float | None]] = []
+        for offset in range(0, len(texts), self._MAX_BATCH_CHUNK):
+            chunk = texts[offset : offset + self._MAX_BATCH_CHUNK]
+            all_pairs.extend(self._classify_batch_chunk_pair(chunk))
+        return all_pairs
+    def _classify_batch_chunk_pair(self, texts: list[str]) -> list[tuple[float, float | None]]:
+        import numpy as np
+        encodings = self._tokenizer.encode_batch(texts)
+        input_ids = np.array([e.ids for e in encodings], dtype=np.int64)
+        attention_mask = np.array([e.attention_mask for e in encodings], dtype=np.int64)
+        results = self._session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})
+        logits = results[0]
+        self._detect_output_mode(logits.shape)
+        t = self._temperature_t
+        pairs: list[tuple[float, float | None]] = []
+        if self._output_mode == "multi":
+            for i in range(len(texts)):
+                main = _sigmoid(float(logits[i][0]) / t)
+                aux = _sigmoid(float(logits[i][1]) / t)
+                pairs.append((main, aux))
+        else:
+            for i in range(len(texts)):
+                row = logits[i]
+                # ``row`` may be a scalar (shape ``[batch]``) or 1-vector.
+                main_logit = float(row[0]) if hasattr(row, "__len__") and len(row) > 0 else float(row)
+                pairs.append((_sigmoid(main_logit / t), None))
+        return pairs
+    def _detect_output_mode(self, dims) -> None:
+        """Detect output mode from the logits tensor shape on first inference.
+        - ``[batch]`` or ``[batch, 1]`` -> ``"single"``
+        - ``[batch, 2]`` -> ``"multi"`` (main + aux dual head)
+        Idempotent: subsequent calls are no-ops once mode is set.
+        """
+        if self._output_mode is not None:
+            return
+        if dims is None or len(dims) < 2:
+            self._output_mode = "single"
+            return
+        self._output_mode = "multi" if dims[1] == 2 else "single"
+    # ------------------------------------------------------------------
+    # Misc
+    # ------------------------------------------------------------------
+    def count_tokens(self, text: str) -> int:
+        self._ensure_loaded()
+        encoding = self._tokenizer.encode(text)
+        # Padding is enabled at a fixed length; count only real (attended) tokens.
+        return int(sum(encoding.attention_mask))
+    def get_max_length(self) -> int:
+        return self._max_length
+    def warmup(self) -> None:
+        self.load_model()
+    def is_loaded(self) -> bool:
+        return self._session is not None and self._tokenizer is not None
+    def _ensure_loaded(self) -> None:
+        if not self.is_loaded():
+            self.load_model()
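
The scoring steps described in the file above (`sigmoid(logit / T)`, shape-based output-mode detection, and chunked batching) can be tried in isolation. The following is a dependency-free sketch that mirrors those steps; the function names `calibrated_score`, `output_mode`, and `chunked` are illustrative and are not part of the package.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def calibrated_score(logit: float, temperature: float = 1.0) -> float:
    # Temperature scaling: T > 1 pulls scores toward 0.5, T < 1 pushes
    # them toward 0 or 1. T must be positive and finite.
    if not math.isfinite(temperature) or temperature <= 0:
        raise ValueError(f"temperature must be positive and finite, got {temperature}")
    return sigmoid(logit / temperature)


def output_mode(shape: tuple[int, ...]) -> str:
    # [batch] or [batch, 1] -> "single"; [batch, 2] -> "multi" (main + aux).
    if len(shape) < 2:
        return "single"
    return "multi" if shape[1] == 2 else "single"


def chunked(items: list[str], size: int = 32) -> list[list[str]]:
    # Split a batch into fixed-size chunks to bound memory per call.
    return [items[i : i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    print(calibrated_score(2.0, 1.0), calibrated_score(2.0, 2.0))
    print(output_mode((8,)), output_mode((8, 1)), output_mode((8, 2)))
    print([len(c) for c in chunked(["x"] * 70)])
```

With a positive logit, raising `T` from 1.0 to 2.0 lowers the score, which is the calibration effect the module docstring refers to.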

{stackone_defender-0.6.3 → stackone_defender-0.7.1}/src/stackone_defender/classifiers/pattern_detector.py RENAMED

@@ -9,7 +9,10 @@ from __future__ import annotations
 import math
 import re
 import time
+import unicodedata
+from ..sanitizers.leet_normalizer import normalize_leet_speak
+from ..sanitizers.normalizer import normalize_unicode, normalize_whitespace, strip_combining_marks
 from ..types import PatternDefinition, PatternMatch, RiskLevel, StructuralFlag, Tier1Result
 from .patterns import ALL_PATTERNS, contains_filter_keywords
@@ -47,16 +50,83 @@ class PatternDetector:
             return self._empty_result(start)
         original_length = len(text)
-        analysis_text = text[: self._max_analysis_length] if len(text) > self._max_analysis_length else text
+        raw_text = text[: self._max_analysis_length] if len(text) > self._max_analysis_length else text
+        # Normalisation chain: collapse obfuscation before injection pattern
+        # matching. Order matters:
+        # 1. NFD-decompose: precomposed accents become base + combining mark.
+        # 2. strip_combining_marks: Zalgo defense + accent stripping.
+        # 3. normalize_unicode: homoglyphs/fullwidth -> ASCII.
+        # 4. normalize_whitespace: collapse spaced letters + embedded newlines.
+        # 5. normalize_leet_speak: 1gn0r3 -> ignore.
+        # NFD-decomposition lives here (not in normalize_unicode) because it
+        # strips legitimate accents like ``café`` -> ``cafe`` -- fine for
+        # analysis but would be data loss if returned to callers. The result
+        # is analysis-only and never returned.
+        analysis_text = normalize_leet_speak(
+            normalize_whitespace(
+                normalize_unicode(strip_combining_marks(unicodedata.normalize("NFD", raw_text)))
+            )
+        )
+        # Fast filter: short-circuit if neither raw nor normalised text
+        # contains keywords. Raw text is checked to preserve detection of
+        # obfuscation patterns (e.g. invisible unicode, leet-speak variants)
+        # that are normalised away before injection patterns run. Disable the
+        # fast filter when custom patterns are provided -- callers may add
+        # patterns whose keywords aren't in the static list.
         should_use_fast_filter = self._use_fast_filter and not self._has_custom
-        if should_use_fast_filter and not contains_filter_keywords(analysis_text):
-            flags = self._detect_structural_issues(analysis_text, original_length)
+        raw_has_keywords = not should_use_fast_filter or contains_filter_keywords(raw_text)
+        norm_has_keywords = not should_use_fast_filter or contains_filter_keywords(analysis_text)
+        if not raw_has_keywords and not norm_has_keywords:
+            flags = self._detect_structural_issues(raw_text, original_length)
             return self._create_result([], flags, start)
-        matches = self._detect_patterns(analysis_text)
-        flags = self._detect_structural_issues(analysis_text, original_length)
-        return self._create_result(matches, flags, start)
+        # Short-circuit: if normalisation produced no change, a single pass
+        # is sufficient and avoids doubling pattern work for plain-text input.
+        if raw_text == analysis_text:
+            matches = self._detect_patterns(raw_text) if raw_has_keywords else []
+            flags = self._detect_structural_issues(raw_text, original_length)
+            return self._create_result(matches, flags, start)
+        # Run patterns on raw text -- catches obfuscation-specific patterns
+        # (e.g. invisible_unicode, leetspeak_injection) that normalisation
+        # removes. Run whenever EITHER pass has keywords: if only the
+        # normalised text has keywords (pure leet-speak with no other
+        # fast-filter hits), we still want the raw pass to fire obfuscation
+        # patterns like leetspeak_injection.
+        raw_matches = (
+            self._detect_patterns(raw_text) if (raw_has_keywords or norm_has_keywords) else []
+        )
+        # Run patterns on normalised text -- catches injection patterns
+        # hidden behind leet-speak, whitespace, or homoglyph obfuscation.
+        # Matches are tagged ``normalised=True`` because their
+        # position/matched values reference the transformed text.
+        norm_matches_raw = self._detect_patterns(analysis_text) if norm_has_keywords else []
+        norm_matches = [
+            PatternMatch(
+                pattern=m.pattern,
+                matched=m.matched,
+                position=m.position,
+                category=m.category,
+                severity=m.severity,
+                normalised=True,
+            )
+            for m in norm_matches_raw
+        ]
+        # Merge: normalised matches take priority. Raw-only matches are
+        # appended for patterns that fired on the original text but not the
+        # normalised form (e.g. obfuscation-detection patterns that match the
+        # raw encoding characters).
+        seen_patterns = {m.pattern for m in norm_matches}
+        merged_matches: list[PatternMatch] = [*norm_matches]
+        merged_matches.extend(m for m in raw_matches if m.pattern not in seen_patterns)
+        flags = self._detect_structural_issues(raw_text, original_length)
+        return self._create_result(merged_matches, flags, start)
     # ------------------------------------------------------------------
     # Pattern detection
@@ -65,7 +135,6 @@ class PatternDetector:
     def _detect_patterns(self, text: str) -> list[PatternMatch]:
         matches: list[PatternMatch] = []
         for defn in self._patterns:
-            # Use finditer for all patterns (handles global-like behavior)
             for m in defn.pattern.finditer(text):
                 matches.append(
                     PatternMatch(
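
The two-pass flow described in the comments of the first hunk (normalise, scan raw and normalised text, then merge with normalised matches taking priority) can be sketched with the standard library alone. This is a simplified illustration, not the package's implementation: it covers only the NFD and combining-mark steps, and the patterns, `strip_marks`, and `analyze` are hypothetical.

```python
import re
import unicodedata


def strip_marks(text: str) -> str:
    # NFD splits a precomposed accent into base letter + combining mark;
    # dropping characters with a non-zero combining class removes the marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical patterns keyed by name: one injection pattern, and one
# obfuscation pattern that can only ever fire on the raw text.
PATTERNS = {
    "ignore_instructions": re.compile(r"ignore (all )?previous instructions", re.I),
    "combining_mark_run": re.compile(r"[\u0300-\u036f]{2,}"),
}


def find(text: str, normalised: bool) -> list[tuple[str, str, bool]]:
    return [
        (name, m.group(0), normalised)
        for name, rx in PATTERNS.items()
        for m in rx.finditer(text)
    ]


def analyze(raw: str) -> list[tuple[str, str, bool]]:
    norm = strip_marks(raw)
    if raw == norm:
        # Normalisation changed nothing, so a single pass is enough.
        return find(raw, normalised=False)
    norm_hits = find(norm, normalised=True)
    raw_hits = find(raw, normalised=False)
    # Normalised matches take priority; raw-only patterns are appended.
    seen = {name for name, _, _ in norm_hits}
    return norm_hits + [hit for hit in raw_hits if hit[0] not in seen]


if __name__ == "__main__":
    print(analyze("i\u0301\u0302gnore previous instructions"))
```

In the sample input, the injection phrase matches only after the combining marks are stripped, while the mark run itself is visible only in the raw text, so both passes contribute one match each.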
