PyPI - stackone-defender - Versions diffs - 0.1.2__tar.gz → 0.6.2__tar.gz - Mend

stackone-defender 0.1.2tar.gz → 0.6.2tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (58) hide show

stackone_defender-0.6.2/.release-please-manifest.json ADDED Viewed

	@@ -0,0 +1 @@
1	+ {".":"0.6.2"}

stackone_defender-0.6.2/CHANGELOG.md ADDED Viewed

@@ -0,0 +1,117 @@
+# Changelog
+## [0.6.2](https://github.com/StackOneHQ/stackone-defender/compare/stackone-defender-v0.6.1...stackone-defender-v0.6.2) (2026-04-22)
+### ⚠ BREAKING CHANGES
+* Drop ToolSanitizationRule, config/sanitizer tool_rules, use_default_tool_rules, and get_tool_rule/should_skip_field. Matches @stackone/defender post ENG-12594.
+### Features
+* add missing functions for full TS API parity ([aec0c5b](https://github.com/StackOneHQ/stackone-defender/commit/aec0c5b8d31715df7e4ec2e4d306b55d595bb1c3))
+* add PyPI publishing setup with Release Please CI ([2e28373](https://github.com/StackOneHQ/stackone-defender/commit/2e28373a27315dbb5e7deb23621977fe7fa2f7bc))
+* add tier2_fields filter and export ToolSanitizationRule ([cb7fd93](https://github.com/StackOneHQ/stackone-defender/commit/cb7fd93fb88a30f40edc171ef3fcdc5d6ce2534d))
+* align Python defender with Node (Tier 2 scoping, ONNX cache) ([482bfdd](https://github.com/StackOneHQ/stackone-defender/commit/482bfdda59b4617a75bc261621984cc321d28989))
+* **ENG-12402:** add PyPI publishing setup with Release Please CI ([f979748](https://github.com/StackOneHQ/stackone-defender/commit/f979748a8a3b2084ea241c352866adcfcd0145ea))
+* **ENG-12699:** TypeScript parity and synced ONNX bundle ([0449800](https://github.com/StackOneHQ/stackone-defender/commit/0449800fc2375c89ef231f5671f9a74bd84d3388))
+* port stackone-defender from TypeScript to Python ([e3ff70d](https://github.com/StackOneHQ/stackone-defender/commit/e3ff70dd6a0bc94578dc4dbfde87c5d75f00b7b8))
+* remove tool rules; batch Tier2 ONNX; lock ONNX load ([26c95c2](https://github.com/StackOneHQ/stackone-defender/commit/26c95c257175c892ae4be82ab7c17a099c1b6c6e))
+* **sanitizer:** remove dead use_tier2_classification from ToolResultSanitizer ([4646179](https://github.com/StackOneHQ/stackone-defender/commit/46461798fcf5acc6ac6e23bc65177c35d9353d9c))
+* sync Python package with TypeScript parity ([e1836dd](https://github.com/StackOneHQ/stackone-defender/commit/e1836dd967ad23997983ef1607118d1a25807e1c))
+* upgrade ML classifier to jbv2 model (AgentShield 73.7 → 79.8) ([bcd27f8](https://github.com/StackOneHQ/stackone-defender/commit/bcd27f8abf954700276249f9b03de34f733c67c4))
+* upgrade ML classifier to jbv5 (AgentShield 79.8 → 81.1) ([781dd10](https://github.com/StackOneHQ/stackone-defender/commit/781dd1007e7a0db03d58619a23b69f1b5d73e85d))
+### Bug Fixes
+* address Copilot/cubic review (Tier2 scope, tokens, SFE, thresholds) ([bf173ac](https://github.com/StackOneHQ/stackone-defender/commit/bf173ac42f6aaa7513ea2a1fc19083806a5c5ee1))
+* **ci:** avoid fasttext-wheel on Python 3.13 ([a6cda76](https://github.com/StackOneHQ/stackone-defender/commit/a6cda76894e3cd240c4f104e701e3202babb2682))
+* **classifier:** surface classification errors in classify_by_sentence skip_reason ([bd94639](https://github.com/StackOneHQ/stackone-defender/commit/bd9463978dac5572f999d8ec3ed1adbaf0bb97f2))
+* default enable_tier2 to True to match TypeScript SDK behaviour ([d66773b](https://github.com/StackOneHQ/stackone-defender/commit/d66773bee026517d09dd56b9311dd3c281c6f675))
+* **defender:** fix _extract_strings filtering, None checks, and cache ONNX load failure ([bf4ce99](https://github.com/StackOneHQ/stackone-defender/commit/bf4ce993287db9e067b661100b5bd92cc21aef6b))
+* **defender:** sync hasThreats blocking logic and tool rules precedence from JS package ([a217c3e](https://github.com/StackOneHQ/stackone-defender/commit/a217c3ef27aa0e4d92f21571bf0559ff9906f660))
+* enable tier2 by default to match TypeScript package ([f1fe990](https://github.com/StackOneHQ/stackone-defender/commit/f1fe990e1a81c32cb271f6ca85cc063f3da49223))
+* sync Python with TypeScript parity ([cec0813](https://github.com/StackOneHQ/stackone-defender/commit/cec0813ff8cc98f4502d5916d285a28877983d98))
+* **tier2:** apply max_text_length truncation in classify_by_sentence ([a67d2c6](https://github.com/StackOneHQ/stackone-defender/commit/a67d2c6524fb1d6b4f9331f547f28221867038de))
+* upgrade ML classifier to jbv2 (AgentShield 73.7 → 79.8) ([b452b39](https://github.com/StackOneHQ/stackone-defender/commit/b452b39c718329355f50c418bd50c37da2ed8698))
+* upgrade ML classifier to jbv2 (AgentShield 73.7 → 79.8) ([ccb1204](https://github.com/StackOneHQ/stackone-defender/commit/ccb1204d5e3d9763bb916d71bb49b75039ceb197))
+* use uv instead of pip in README installation instructions ([519759f](https://github.com/StackOneHQ/stackone-defender/commit/519759f09c6fc1eb6bf97f53ad0cbd25c78e2893))
+### Dependencies
+* **sfe:** switch optional FastText bindings to fasttext-ng ([bc9cc28](https://github.com/StackOneHQ/stackone-defender/commit/bc9cc283bc2da9f10472415d4aa94a0df083ec3d))
+### Documentation
+* add README adapted from TypeScript package ([a03c757](https://github.com/StackOneHQ/stackone-defender/commit/a03c757a1760b797d9a3ef444950e2839ca1c52d))
+* update README — enable_tier2 defaults to True ([af0d059](https://github.com/StackOneHQ/stackone-defender/commit/af0d05957e39a83b7e6e18b1f78b95219b14a4f5))
+* update README to reflect changes in package name and Python version ([d2fc2ca](https://github.com/StackOneHQ/stackone-defender/commit/d2fc2ca1900e2f6410df2ec075c5a8a1c3ac241b))
+### Miscellaneous Chores
+* prepare patch release 0.6.2 ([7b3c105](https://github.com/StackOneHQ/stackone-defender/commit/7b3c105b2ce23f88f284d72e41c1917aefdc4537))
+## [0.6.1](https://github.com/StackOneHQ/stackone-defender/compare/stackone-defender-v0.1.2...stackone-defender-v0.6.1) (2026-04-21)
+### Features
+* align Python package behavior with `@stackone/defender` 0.6.1
+* add SFE preprocessing support (`use_sfe`) with fail-open optional runtime loading
+* add packed-chunk Tier 2 batching and density-adjusted scoring
+* add dangerous-key traversal hardening (`__proto__`, `constructor`, `prototype`)
+* add cumulative-risk fractional thresholds to reduce list-response false positives
+### Bug Fixes
+* use `fasttext-ng` instead of `fasttext-wheel` for the `[sfe]` extra and dev tests so Python 3.13 CI can install maintained FastText bindings (NumPy 2.3+).
+### Breaking Changes
+* Python package version jumps from `0.1.2` to `0.6.1` to align release train with TypeScript parity.
+* `DefenseResult` now includes `fields_dropped` and `truncated_at_depth`.
+## [0.1.2](https://github.com/StackOneHQ/stackone-defender/compare/stackone-defender-v0.1.1...stackone-defender-v0.1.2) (2026-04-08)
+### Bug Fixes
+* upgrade ML classifier to jbv2 (AgentShield 73.7 → 79.8) ([b452b39](https://github.com/StackOneHQ/stackone-defender/commit/b452b39c718329355f50c418bd50c37da2ed8698))
+### Documentation
+* update README to reflect changes in package name and Python version ([d2fc2ca](https://github.com/StackOneHQ/stackone-defender/commit/d2fc2ca1900e2f6410df2ec075c5a8a1c3ac241b))
+## [0.1.1](https://github.com/StackOneHQ/stackone-defender/compare/stackone-defender-v0.1.0...stackone-defender-v0.1.1) (2026-04-08)
+### Features
+* add missing functions for full TS API parity ([aec0c5b](https://github.com/StackOneHQ/stackone-defender/commit/aec0c5b8d31715df7e4ec2e4d306b55d595bb1c3))
+* add PyPI publishing setup with Release Please CI ([2e28373](https://github.com/StackOneHQ/stackone-defender/commit/2e28373a27315dbb5e7deb23621977fe7fa2f7bc))
+* add tier2_fields filter and export ToolSanitizationRule ([cb7fd93](https://github.com/StackOneHQ/stackone-defender/commit/cb7fd93fb88a30f40edc171ef3fcdc5d6ce2534d))
+* **ENG-12402:** add PyPI publishing setup with Release Please CI ([f979748](https://github.com/StackOneHQ/stackone-defender/commit/f979748a8a3b2084ea241c352866adcfcd0145ea))
+* port stackone-defender from TypeScript to Python ([e3ff70d](https://github.com/StackOneHQ/stackone-defender/commit/e3ff70dd6a0bc94578dc4dbfde87c5d75f00b7b8))
+* **sanitizer:** remove dead use_tier2_classification from ToolResultSanitizer ([4646179](https://github.com/StackOneHQ/stackone-defender/commit/46461798fcf5acc6ac6e23bc65177c35d9353d9c))
+* sync Python package with TypeScript parity ([e1836dd](https://github.com/StackOneHQ/stackone-defender/commit/e1836dd967ad23997983ef1607118d1a25807e1c))
+### Bug Fixes
+* **classifier:** surface classification errors in classify_by_sentence skip_reason ([bd94639](https://github.com/StackOneHQ/stackone-defender/commit/bd9463978dac5572f999d8ec3ed1adbaf0bb97f2))
+* **defender:** fix _extract_strings filtering, None checks, and cache ONNX load failure ([bf4ce99](https://github.com/StackOneHQ/stackone-defender/commit/bf4ce993287db9e067b661100b5bd92cc21aef6b))
+* **defender:** sync hasThreats blocking logic and tool rules precedence from JS package ([a217c3e](https://github.com/StackOneHQ/stackone-defender/commit/a217c3ef27aa0e4d92f21571bf0559ff9906f660))
+* enable tier2 by default to match TypeScript package ([f1fe990](https://github.com/StackOneHQ/stackone-defender/commit/f1fe990e1a81c32cb271f6ca85cc063f3da49223))
+* sync Python with TypeScript parity ([cec0813](https://github.com/StackOneHQ/stackone-defender/commit/cec0813ff8cc98f4502d5916d285a28877983d98))
+* use uv instead of pip in README installation instructions ([519759f](https://github.com/StackOneHQ/stackone-defender/commit/519759f09c6fc1eb6bf97f53ad0cbd25c78e2893))
+### Documentation
+* add README adapted from TypeScript package ([a03c757](https://github.com/StackOneHQ/stackone-defender/commit/a03c757a1760b797d9a3ef444950e2839ca1c52d))
+## Changelog

{stackone_defender-0.1.2 → stackone_defender-0.6.2}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: stackone-defender
-Version: 0.1.2
+Version: 0.6.2
 Summary: Indirect prompt injection defense for AI agents using tool calls
 Project-URL: Homepage, https://github.com/StackOneHQ/stackone-defender
 Project-URL: Repository, https://github.com/StackOneHQ/stackone-defender
@@ -20,6 +20,8 @@ Provides-Extra: onnx
 Requires-Dist: numpy>=1.24.0; extra == 'onnx'
 Requires-Dist: onnxruntime>=1.16.0; extra == 'onnx'
 Requires-Dist: tokenizers>=0.15.0; extra == 'onnx'
+Provides-Extra: sfe
+Requires-Dist: fasttext-ng>=0.9.3; extra == 'sfe'
 Description-Content-Type: text/markdown
 <div align="center">
@@ -74,6 +76,15 @@ pip install stackone-defender[onnx]
 The ONNX model (~22MB) is bundled in the wheel — no extra downloads at runtime.
+**SFE preprocessor (optional)** — add extras:
+```bash
+pip install stackone-defender[sfe]
+# or: uv add "stackone-defender[sfe]"
+```
+The `[sfe]` extra installs [`fasttext-ng`](https://pypi.org/project/fasttext-ng/) (provides the `fasttext` module). It requires **NumPy 2.3+**. PyPI may ship a wheel only for some platforms; otherwise pip/uv builds from source (needs a C++ toolchain).
 ## Quick start
 ```python
@@ -113,11 +124,17 @@ else:
 ### Tier 2 — ML classification (ONNX)
-Sentence-level MiniLM classifier (int8 ONNX ~22 MB, bundled):
+Packed-chunk MiniLM classifier (int8 ONNX ~22 MB, bundled):
-- Split text into sentences, score each (0.0 = benign, 1.0 = injection-like), take the max
+- Split text into sentences, pack to model-sized chunks, score chunks in batched ONNX calls
 - Catches paraphrased or novel injections missed by regex
-- Roughly ~10 ms per batch after warmup (CPU)
+- Uses chunked batch inference to bound memory on large payloads
+### Optional SFE preprocessor
+- `use_sfe=True` enables a field-level FastText pass before Tier 1/Tier 2
+- Drops metadata-like leaves (IDs, enum-like strings) and keeps user-facing content
+- Fails open if the runtime/model is unavailable: payload continues unfiltered
 **Benchmarks** (F1 @ threshold 0.5):
@@ -150,6 +167,7 @@ defense = create_prompt_defense(
     block_high_risk=False,
     default_risk_level="medium",
     tier2_fields=["subject", "body", "snippet"],  # optional: scope Tier 2 to these JSON keys
+    use_sfe=True,  # optional: enable semantic field extractor preprocessing
     config={
         "tier2": {
             "high_risk_threshold": 0.8,
@@ -164,6 +182,8 @@ defense = create_prompt_defense(
 Runs Tier 1 sanitization on risky fields, then Tier 2 on extracted text (with optional field scoping). **Synchronous** — no `await`.
 ```python
+from dataclasses import dataclass, field
 @dataclass
 class DefenseResult:
     allowed: bool
@@ -175,6 +195,8 @@ class DefenseResult:
     tier2_score: float | None = None
     tier2_skip_reason: str | None = None
     max_sentence: str | None = None
+    fields_dropped: list[str] = field(default_factory=list)
+    truncated_at_depth: bool | None = None
     latency_ms: float = 0.0
 ```

{stackone_defender-0.1.2 → stackone_defender-0.6.2}/README.md RENAMED Viewed

@@ -50,6 +50,15 @@ pip install stackone-defender[onnx]
 The ONNX model (~22MB) is bundled in the wheel — no extra downloads at runtime.
+**SFE preprocessor (optional)** — add extras:
+```bash
+pip install stackone-defender[sfe]
+# or: uv add "stackone-defender[sfe]"
+```
+The `[sfe]` extra installs [`fasttext-ng`](https://pypi.org/project/fasttext-ng/) (provides the `fasttext` module). It requires **NumPy 2.3+**. PyPI may ship a wheel only for some platforms; otherwise pip/uv builds from source (needs a C++ toolchain).
 ## Quick start
 ```python
@@ -89,11 +98,17 @@ else:
 ### Tier 2 — ML classification (ONNX)
-Sentence-level MiniLM classifier (int8 ONNX ~22 MB, bundled):
+Packed-chunk MiniLM classifier (int8 ONNX ~22 MB, bundled):
-- Split text into sentences, score each (0.0 = benign, 1.0 = injection-like), take the max
+- Split text into sentences, pack to model-sized chunks, score chunks in batched ONNX calls
 - Catches paraphrased or novel injections missed by regex
-- Roughly ~10 ms per batch after warmup (CPU)
+- Uses chunked batch inference to bound memory on large payloads
+### Optional SFE preprocessor
+- `use_sfe=True` enables a field-level FastText pass before Tier 1/Tier 2
+- Drops metadata-like leaves (IDs, enum-like strings) and keeps user-facing content
+- Fails open if the runtime/model is unavailable: payload continues unfiltered
 **Benchmarks** (F1 @ threshold 0.5):
@@ -126,6 +141,7 @@ defense = create_prompt_defense(
     block_high_risk=False,
     default_risk_level="medium",
     tier2_fields=["subject", "body", "snippet"],  # optional: scope Tier 2 to these JSON keys
+    use_sfe=True,  # optional: enable semantic field extractor preprocessing
     config={
         "tier2": {
             "high_risk_threshold": 0.8,
@@ -140,6 +156,8 @@ defense = create_prompt_defense(
 Runs Tier 1 sanitization on risky fields, then Tier 2 on extracted text (with optional field scoping). **Synchronous** — no `await`.
 ```python
+from dataclasses import dataclass, field
 @dataclass
 class DefenseResult:
     allowed: bool
@@ -151,6 +169,8 @@ class DefenseResult:
     tier2_score: float | None = None
     tier2_skip_reason: str | None = None
     max_sentence: str | None = None
+    fields_dropped: list[str] = field(default_factory=list)
+    truncated_at_depth: bool | None = None
     latency_ms: float = 0.0
 ```

{stackone_defender-0.1.2 → stackone_defender-0.6.2}/pyproject.toml RENAMED Viewed

@@ -1,6 +1,6 @@
 [project]
 name = "stackone-defender"
-version = "0.1.2"
+version = "0.6.2"
 description = "Indirect prompt injection defense for AI agents using tool calls"
 readme = "README.md"
 requires-python = ">=3.11"
@@ -25,6 +25,9 @@ Repository = "https://github.com/StackOneHQ/stackone-defender"
 [project.optional-dependencies]
 onnx = ["onnxruntime>=1.16.0", "tokenizers>=0.15.0", "numpy>=1.24.0"]
+# fasttext-ng provides the `fasttext` module (maintained bindings; supports 3.13).
+# Pulls numpy>=2.3; SFE still fail-opens when import/load fails.
+sfe = ["fasttext-ng>=0.9.3"]
 [dependency-groups]
 dev = [
@@ -32,6 +35,7 @@ dev = [
   "onnxruntime>=1.16.0",
   "tokenizers>=0.15.0",
   "numpy>=1.24.0",
+  "fasttext-ng>=0.9.3",
 ]
 [build-system]

{stackone_defender-0.1.2 → stackone_defender-0.6.2}/src/stackone_defender/__init__.py RENAMED Viewed

@@ -12,12 +12,26 @@ Usage:
 """
 from .core.prompt_defense import PromptDefense, create_prompt_defense
+from .sfe.preprocess import (
+    DropDecision,
+    SfePredictor,
+    SfePreprocessResult,
+    get_default_predictor,
+    get_default_sfe_model_path,
+    sfe_preprocess,
+)
 from .types import DefenseResult, RiskLevel, Tier1Result
 __all__ = [
     "DefenseResult",
+    "DropDecision",
     "PromptDefense",
     "RiskLevel",
+    "SfePredictor",
+    "SfePreprocessResult",
     "Tier1Result",
     "create_prompt_defense",
+    "get_default_predictor",
+    "get_default_sfe_model_path",
+    "sfe_preprocess",
 ]

{stackone_defender-0.1.2 → stackone_defender-0.6.2}/src/stackone_defender/classifiers/onnx_classifier.py RENAMED Viewed

@@ -37,6 +37,8 @@ def _sigmoid(x: float) -> float:
 class OnnxClassifier:
     """ONNX Classifier for fine-tuned MiniLM models."""
+    _MAX_BATCH_CHUNK = 32
     def __init__(self, model_path: str | None = None):
         self._model_path = model_path or _default_model_path()
         self._session = None
@@ -105,10 +107,17 @@ class OnnxClassifier:
         return _sigmoid(logit)
     def classify_batch(self, texts: list[str]) -> list[float]:
-        """Classify multiple texts in batch."""
+        """Classify multiple texts in batch, bounded by chunk size."""
         if not texts:
             return []
         self._ensure_loaded()
+        all_scores: list[float] = []
+        for offset in range(0, len(texts), self._MAX_BATCH_CHUNK):
+            chunk = texts[offset: offset + self._MAX_BATCH_CHUNK]
+            all_scores.extend(self._classify_batch_chunk(chunk))
+        return all_scores
+    def _classify_batch_chunk(self, texts: list[str]) -> list[float]:
         import numpy as np
         encodings = self._tokenizer.encode_batch(texts)
@@ -119,6 +128,15 @@ class OnnxClassifier:
         logits = results[0]
         return [_sigmoid(float(logits[i][0])) for i in range(len(texts))]
+    def count_tokens(self, text: str) -> int:
+        self._ensure_loaded()
+        encoding = self._tokenizer.encode(text)
+        # Padding is enabled at a fixed length; count only real (attended) tokens.
+        return int(sum(encoding.attention_mask))
+    def get_max_length(self) -> int:
+        return self._max_length
     def warmup(self) -> None:
         self.load_model()

stackone_defender-0.6.2/src/stackone_defender/classifiers/tier2_classifier.py ADDED Viewed

@@ -0,0 +1,291 @@
+"""Tier 2 Classifier: ML-based prompt injection detection (ONNX only)."""
+from __future__ import annotations
+import re
+import time
+from typing import Any
+from ..types import RiskLevel, Tier2Result
+from .onnx_classifier import OnnxClassifier
+DEFAULT_TIER2_CLASSIFIER_CONFIG = {
+    "high_risk_threshold": 0.8,
+    "medium_risk_threshold": 0.5,
+    "min_text_length": 10,
+    "max_text_length": 10000,
+}
+class Tier2Classifier:
+    """Tier 2 Classifier using ONNX inference."""
+    def __init__(self, config: dict | None = None):
+        cfg = dict(DEFAULT_TIER2_CLASSIFIER_CONFIG)
+        if config:
+            cfg.update(config)
+        self._high_risk_threshold: float = cfg["high_risk_threshold"]
+        self._medium_risk_threshold: float = cfg["medium_risk_threshold"]
+        self._min_text_length: int = cfg["min_text_length"]
+        self._max_text_length: int = cfg["max_text_length"]
+        self._onnx = OnnxClassifier(cfg.get("onnx_model_path"))
+    def is_ready(self) -> bool:
+        return self._onnx.is_loaded()
+    def warmup(self) -> None:
+        self._onnx.warmup()
+    def classify(self, text: str) -> Tier2Result:
+        start = time.perf_counter()
+        if len(text) < self._min_text_length:
+            return Tier2Result(
+                score=0,
+                confidence=0,
+                skipped=True,
+                skip_reason=f"Text too short ({len(text)} < {self._min_text_length})",
+                latency_ms=_ms(start),
+            )
+        analysis_text = text[: self._max_text_length] if len(text) > self._max_text_length else text
+        try:
+            score = self._onnx.classify(analysis_text)
+            confidence = abs(score - 0.5) * 2
+            return Tier2Result(score=score, confidence=confidence, skipped=False, latency_ms=_ms(start))
+        except Exception as e:
+            return Tier2Result(
+                score=0,
+                confidence=0,
+                skipped=True,
+                skip_reason=f"Classification error: {e}",
+                latency_ms=_ms(start),
+            )
+    def classify_batch(self, texts: list[str]) -> list[Tier2Result]:
+        return [self.classify(t) for t in texts]
+    def classify_by_sentence(self, text: str) -> dict[str, Any]:
+        """Classify text by sentence and return max score."""
+        start = time.perf_counter()
+        sentences = _split_into_sentences(text)
+        if not sentences:
+            return {"score": 0, "confidence": 0, "skipped": True, "skip_reason": "No sentences found", "latency_ms": _ms(start)}
+        original_sentences: list[str] = []
+        classifiable: list[str] = []
+        for sentence in sentences:
+            if len(sentence) < self._min_text_length:
+                continue
+            original_sentences.append(sentence)
+            classifiable.append(
+                sentence[: self._max_text_length] if len(sentence) > self._max_text_length else sentence
+            )
+        if not classifiable:
+            return {"score": 0, "confidence": 0, "skipped": True, "skip_reason": "No classifiable sentences", "latency_ms": _ms(start)}
+        try:
+            scores = self._onnx.classify_batch(classifiable)
+        except Exception as e:
+            return {
+                "score": 0,
+                "confidence": 0,
+                "skipped": True,
+                "skip_reason": f"Classification error: {e}",
+                "latency_ms": _ms(start),
+            }
+        sentence_scores: list[dict[str, Any]] = []
+        max_score = 0.0
+        max_sentence = ""
+        for sentence, score in zip(original_sentences, scores, strict=True):
+            safe_score = score if isinstance(score, (int, float)) and score == score else 0.0
+            sentence_scores.append({"sentence": sentence, "score": safe_score})
+            if safe_score > max_score:
+                max_score = safe_score
+                max_sentence = sentence
+        confidence = abs(max_score - 0.5) * 2
+        return {
+            "score": max_score,
+            "confidence": confidence,
+            "skipped": False,
+            "latency_ms": _ms(start),
+            "max_sentence": max_sentence,
+            "sentence_scores": sentence_scores,
+        }
+    def classify_by_chunks(self, text: str) -> dict[str, Any]:
+        start = time.perf_counter()
+        if len(text) < self._min_text_length:
+            return {"score": 0, "confidence": 0, "skipped": True, "skip_reason": "Text below minTextLength", "latency_ms": _ms(start)}
+        model_max_len = self._onnx.get_max_length()
+        bounded = text[: self._max_text_length] if len(text) > self._max_text_length else text
+        try:
+            self._onnx.warmup()
+        except Exception as e:
+            return {"score": 0, "confidence": 0, "skipped": True, "skip_reason": f"Warmup error: {e}", "latency_ms": _ms(start)}
+        try:
+            total_tokens = self._onnx.count_tokens(bounded)
+        except Exception as e:
+            return {"score": 0, "confidence": 0, "skipped": True, "skip_reason": f"Token count error: {e}", "latency_ms": _ms(start)}
+        if total_tokens <= model_max_len:
+            try:
+                score = self._onnx.classify(bounded)
+            except Exception as e:
+                return {"score": 0, "confidence": 0, "skipped": True, "skip_reason": f"Classification error: {e}", "latency_ms": _ms(start)}
+            safe_score = score if isinstance(score, (int, float)) and score == score else 0.0
+            return {
+                "score": safe_score,
+                "confidence": abs(safe_score - 0.5) * 2,
+                "skipped": False,
+                "max_sentence": bounded,
+                "sentence_scores": [{"sentence": bounded, "score": safe_score}],
+                "latency_ms": _ms(start),
+            }
+        max_content_tokens = model_max_len - 2
+        sentences = [s for s in _split_into_sentences(bounded) if len(s) >= self._min_text_length]
+        if not sentences:
+            return {"score": 0, "confidence": 0, "skipped": True, "skip_reason": "No classifiable sentences", "latency_ms": _ms(start)}
+        try:
+            chunks = self._pack_sentences(sentences, max_content_tokens)
+            scores = self._onnx.classify_batch(chunks)
+        except Exception as e:
+            return {"score": 0, "confidence": 0, "skipped": True, "skip_reason": f"Classification error: {e}", "latency_ms": _ms(start)}
+        max_score = 0.0
+        max_chunk = ""
+        chunk_scores: list[dict[str, Any]] = []
+        for i, raw in enumerate(scores):
+            safe_score = raw if isinstance(raw, (int, float)) and raw == raw else 0.0
+            chunk = chunks[i] if i < len(chunks) else ""
+            chunk_scores.append({"sentence": chunk, "score": safe_score})
+            if safe_score > max_score:
+                max_score = safe_score
+                max_chunk = chunk
+        return {
+            "score": max_score,
+            "confidence": abs(max_score - 0.5) * 2,
+            "skipped": False,
+            "max_sentence": max_chunk,
+            "sentence_scores": chunk_scores,
+            "latency_ms": _ms(start),
+        }
+    def prepare_chunks(self, text: str) -> dict[str, Any]:
+        if len(text) < self._min_text_length:
+            return {"chunks": [], "skipped": True, "skip_reason": "Text below minTextLength"}
+        model_max_len = self._onnx.get_max_length()
+        bounded = text[: self._max_text_length] if len(text) > self._max_text_length else text
+        try:
+            self._onnx.warmup()
+        except Exception as e:
+            return {"chunks": [], "skipped": True, "skip_reason": f"Warmup error: {e}"}
+        if len(bounded) + 2 <= model_max_len:
+            return {"chunks": [bounded], "skipped": False}
+        try:
+            total_tokens = self._onnx.count_tokens(bounded)
+        except Exception as e:
+            return {"chunks": [], "skipped": True, "skip_reason": f"Token count error: {e}"}
+        if total_tokens <= model_max_len:
+            return {"chunks": [bounded], "skipped": False}
+        max_content_tokens = model_max_len - 2
+        sentences = [s for s in _split_into_sentences(bounded) if len(s) >= self._min_text_length]
+        if not sentences:
+            return {"chunks": [], "skipped": True, "skip_reason": "No classifiable sentences"}
+        return {"chunks": self._pack_sentences(sentences, max_content_tokens), "skipped": False}
+    def classify_chunks_batch(self, chunks: list[str]) -> list[float]:
+        if not chunks:
+            return []
+        self._onnx.warmup()
+        return self._onnx.classify_batch(chunks)
+    def _pack_sentences(self, sentences: list[str], max_content_tokens: int) -> list[str]:
+        chunks: list[str] = []
+        current: list[str] = []
+        current_tokens = 0
+        for sentence in sentences:
+            sentence_tokens = self._onnx.count_tokens(sentence)
+            sentence_content_tokens = max(0, sentence_tokens - 2)
+            if sentence_content_tokens > max_content_tokens:
+                if current:
+                    chunks.append(" ".join(current))
+                    current = []
+                    current_tokens = 0
+                chunks.append(sentence)
+                continue
+            if current_tokens + sentence_content_tokens > max_content_tokens:
+                chunks.append(" ".join(current))
+                current = [sentence]
+                current_tokens = sentence_content_tokens
+            else:
+                current.append(sentence)
+                current_tokens += sentence_content_tokens
+        if current:
+            chunks.append(" ".join(current))
+        return chunks
+    def is_injection(self, text: str, threshold: float | None = None) -> bool:
+        result = self.classify(text)
+        if result.skipped:
+            return False
+        return result.score >= (threshold if threshold is not None else self._medium_risk_threshold)
+    def get_config(self) -> dict:
+        return {
+            "high_risk_threshold": self._high_risk_threshold,
+            "medium_risk_threshold": self._medium_risk_threshold,
+            "min_text_length": self._min_text_length,
+            "max_text_length": self._max_text_length,
+        }
+    def get_risk_level(self, score: float) -> RiskLevel:
+        if score >= self._high_risk_threshold:
+            return "high"
+        if score >= self._medium_risk_threshold:
+            return "medium"
+        return "low"
+def create_tier2_classifier(config: dict | None = None) -> Tier2Classifier:
+    return Tier2Classifier(config)
+def _ms(start: float) -> float:
+    return (time.perf_counter() - start) * 1000
+def _split_into_sentences(text: str) -> list[str]:
+    """Split text into sentences for granular analysis."""
+    sentences: list[str] = []
+    chunks = re.split(r"(?<=[.!?])\s+|\n\n+|\n(?=[A-Z0-9#\-*])|(?<=:)\s*\n", text)
+    for chunk in chunks:
+        trimmed = chunk.strip()
+        if not trimmed:
+            continue
+        if len(trimmed) > 200 and "\n" in trimmed:
+            for sub in trimmed.split("\n"):
+                sub = sub.strip()
+                if sub:
+                    sentences.append(sub)
+        else:
+            sentences.append(trimmed)
+    return sentences

stackone-defender 0.1.2__tar.gz → 0.6.2__tar.gz

stackone-defender 0.1.2tar.gz → 0.6.2tar.gz