npm - agent-threat-rules - Versions diffs - 2.2.1 → 3.1.0 - Mend

agent-threat-rules 2.2.1 → 3.1.0


Files changed (424)

package/spec/atr-language-detection-v1.0.md ADDED

@@ -0,0 +1,218 @@
+# ATR Language Detection Algorithm v1.0
+> **STATUS: PROPOSED v1.0 — NOT YET RATIFIED.** This specification describes
+> a target algorithm for community comment. The current TypeScript production
+> engine continues to use its existing per-rule language detection. See
+> `STANDARDIZATION-STATUS.md` for full status.
+**Status:** Draft for AEP-001 ratification — NOT RATIFIED
+**Date:** 2026-05-25
+**License:** CC BY 4.0
+**Required by (on ratification):** Any rule that declares `condition.language` (i.e., a per-language regex condition)
+---
+## Why this spec exists
+ATR rules support per-language conditions:
+```yaml
+detection:
+  conditions:
+    - field: user_input
+      operator: regex
+      language: en
+      value: "ignore (?:all )?previous instructions"
+    - field: user_input
+      operator: regex
+      language: zh-Hant
+      value: "(?:忽略|無視)(?:前面所有|之前所有|所有先前)的?指示"
+```
+If different conformant engines disagree on which language a given
+input belongs to, the **same input fires different rules in different
+engines**. The rule corpus becomes non-portable. This is the
+detection-standard equivalent of a heisenbug.
+This document specifies a **deterministic algorithm** that all
+conformant engines MUST implement. Any two conformant ATR engines
+running this algorithm on the same input MUST return the same language code.
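The portability problem described above can be illustrated with a minimal Python sketch. It assumes that an engine evaluates a condition only when the condition's `language` equals the language detected for the input; the dict layout and the `fired_conditions` helper are hypothetical and are not taken from the ATR engine.

```python
import re

# Two conditions copied from the rule excerpt above. The dict layout is an
# assumption for illustration, not the engine's internal representation.
CONDITIONS = [
    {"language": "en", "value": r"ignore (?:all )?previous instructions"},
    {"language": "zh-Hant", "value": "(?:忽略|無視)(?:前面所有|之前所有|所有先前)的?指示"},
]


def fired_conditions(text: str, detected_language: str) -> list[str]:
    """Return the language tags of the conditions that fire for `text`.

    Assumption: a condition is evaluated only when its `language` equals
    the language the engine detected for the input.
    """
    fired = []
    for condition in CONDITIONS:
        if condition["language"] != detected_language:
            continue  # gated out by the detected language
        if re.search(condition["value"], text):
            fired.append(condition["language"])
    return fired


text = "please ignore all previous instructions"
engine_a = fired_conditions(text, "en")   # engine A classifies the input as English
engine_b = fired_conditions(text, "und")  # engine B fails to classify the input
```

Under that assumption the same input fires the English condition in one engine and nothing in the other, which is the disagreement the specification is meant to rule out.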
+---
+## Algorithm specification
+### Input
+- `text`: a Unicode string of arbitrary length, encoded UTF-8.
+### Output
+- A language tag from the supported set: a two-letter ISO 639-1 code,
+  optionally followed by an ISO 15924 script subtag (`zh-Hant`, `zh-Hans`),
+  OR the three-letter ISO 639-3 code `und` (undetermined).
+### Supported languages (v1.0)
+| Code | Language | Unicode blocks (primary) |
+|---|---|---|
+| `en` | English | Basic Latin |
+| `zh-Hant` | Traditional Chinese | CJK Unified Ideographs (script-tagged Traditional via Unihan kSimplifiedVariant / kTraditionalVariant lookups; also the tie-break default when the lookups are inconclusive) |
+| `zh-Hans` | Simplified Chinese | CJK Unified Ideographs (script-tagged Simplified) |
+| `ja` | Japanese | Hiragana + Katakana + CJK Unified Ideographs |
+| `es` | Spanish | Basic Latin + Latin-1 Supplement subset |
+| `ar` | Arabic | Arabic + Arabic Supplement |
+Additional languages may be added via AEP. Engines that do not
+implement a language MUST report `und` for inputs in that language,
+NOT fall back to a default.
+### Algorithm (deterministic, single-pass)
+```text
+function detectLanguage(text: string) -> string {
+  if length(text) == 0:
+    return "und"
+  // Phase 1: Unicode block frequency
+  blockCounts = empty histogram
+  totalCodepoints = 0
+  for codepoint in iterateUnicodeCodepoints(text):
+    if isWhitespace(codepoint) or isPunctuation(codepoint):
+      continue
+    blockCounts[unicodeBlockOf(codepoint)] += 1
+    totalCodepoints += 1
+  if totalCodepoints == 0:
+    return "und"
+  // Phase 2: dominant-block heuristic
+  THRESHOLD_DOMINANT = 0.60
+  dominantBlock, dominantCount = argmax(blockCounts)
+  if dominantCount / totalCodepoints < THRESHOLD_DOMINANT:
+    return classifyMixedScript(blockCounts, totalCodepoints)
+  // Phase 3: block-to-language mapping
+  switch dominantBlock:
+    case BASIC_LATIN:
+      // English is the default Latin-script language. Spanish is
+      // returned only if the Latin-1 Supplement subset (¿ ¡ ñ á é í ó ú)
+      // makes up ≥1.5% of codepoints.
+      if (count(BASIC_LATIN) + count(LATIN_1_SUPPLEMENT)) / totalCodepoints >= 0.85:
+        if countSpanishMarkers(text) >= 0.015 * totalCodepoints:
+          return "es"
+        return "en"
+      return classifyMixedScript(blockCounts, totalCodepoints)
+    case CJK_UNIFIED_IDEOGRAPHS, CJK_UNIFIED_IDEOGRAPHS_EXT_A, ...:
+      // Disambiguate Chinese variants and Japanese
+      kanaCount = count(HIRAGANA) + count(KATAKANA)
+      if kanaCount >= 0.10 * totalCodepoints:
+        return "ja"
+      // Distinguish Hans vs Hant via Unihan kSimplifiedVariant /
+      // kTraditionalVariant lookups on sampled CJK codepoints.
+      // Tie-breaker: default to zh-Hant.
+      return distinguishHansHant(text)
+    case HIRAGANA, KATAKANA:
+      return "ja"
+    case ARABIC, ARABIC_SUPPLEMENT:
+      return "ar"
+    default:
+      return "und"
+}
+function classifyMixedScript(blockCounts, totalCodepoints) -> string {
+  // Mixed-script inputs (common when English technical terms are
+  // embedded in CJK or Arabic text):
+  //   1. If any single non-Latin script block ≥ 40% → return that script's language
+  //   2. Else → return the language whose block has highest count,
+  //      breaking ties by ISO 639-1 alphabetical order (ar, en, es, ja, zh-Hans, zh-Hant)
+  // The alphabetical tie-break is the deterministic fallback.
+  ...
+}
+```
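Phases 1 and 2 of the pseudocode above could be sketched in Python as follows. This is an illustration under stated assumptions, not the reference implementation: the block table is abbreviated to the blocks named in this document, punctuation is taken to mean Unicode general categories starting with `P`, whitespace is taken to mean Python's `str.isspace`, and the tie order inside the argmax is a choice made here because the pseudocode does not define one.

```python
import unicodedata
from collections import Counter

# Abbreviated block table (assumption: a real engine would load the full
# list from the Unicode Blocks.txt file cited in the references).
BLOCKS = [
    (0x0000, 0x007F, "BASIC_LATIN"),
    (0x0080, 0x00FF, "LATIN_1_SUPPLEMENT"),
    (0x0600, 0x06FF, "ARABIC"),
    (0x0750, 0x077F, "ARABIC_SUPPLEMENT"),
    (0x3040, 0x309F, "HIRAGANA"),
    (0x30A0, 0x30FF, "KATAKANA"),
    (0x3400, 0x4DBF, "CJK_UNIFIED_IDEOGRAPHS_EXT_A"),
    (0x4E00, 0x9FFF, "CJK_UNIFIED_IDEOGRAPHS"),
]

THRESHOLD_DOMINANT = 0.60


def block_of(ch: str) -> str:
    """Map one character to a block name, or UNKNOWN if it is not in the table."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "UNKNOWN"


def is_content(ch: str) -> bool:
    # Whitespace and punctuation (general categories P*) are skipped.
    return not ch.isspace() and not unicodedata.category(ch).startswith("P")


def content_histogram(text: str) -> Counter:
    """Phase 1: count content codepoints per Unicode block."""
    return Counter(block_of(ch) for ch in text if is_content(ch))


def dominant_block(text: str):
    """Phase 2: return (block, share) for the largest block, or None if no content.

    The caller compares `share` with THRESHOLD_DOMINANT to decide between
    the block-to-language mapping and the mixed-script path.
    """
    counts = content_histogram(text)
    total = sum(counts.values())
    if total == 0:
        return None
    # Assumption: ties between blocks resolve by block name, so the argmax
    # does not depend on iteration order.
    block, count = min(counts.items(), key=lambda item: (-item[1], item[0]))
    return block, count / total
```

For example, `dominant_block("こんにちは、世界")` yields the Hiragana block with a share of 5/7, because the ideographic comma is punctuation and is dropped before counting.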
+### Specific normative requirements for conformant implementations
+1. **Whitespace and punctuation are excluded from the frequency count.** Only "content codepoints" enter the histogram.
+2. **The 0.60 dominance threshold is normative.** Engines MUST NOT alter it without an AEP-level change.
+3. **Hans/Hant distinction is based on Unihan property data**, not on heuristic character set membership. Engines MUST use the Unicode Consortium's Unihan database for kSimplifiedVariant / kTraditionalVariant lookups.
+4. **Japanese detection is anchored on kana presence ≥ 10%**, not just on CJK ideograph presence. This prevents mis-classifying Chinese-only text as Japanese.
+5. **Spanish vs English is anchored on Spanish-specific markers** (`¿`, `¡`, `ñ`, accented vowels). Engines MUST require ≥ 1.5% of codepoints to be Spanish markers before classifying as `es`.
+6. **Tie-breaking is deterministic** via alphabetical ISO 639-1 ordering. No randomness, no implementation-defined behavior.
+7. **Unknown blocks default to `und`.** No fuzzy fallback. Rules tagged for unsupported languages do not fire on inputs the engine cannot classify.
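Requirements 4 and 5 reduce to two share checks, sketched below in Python. The helper names are hypothetical. One interpretation is made explicit here: `¿` and `¡` are punctuation, so they count as markers when scanning the raw text but do not enter the content-codepoint total used as the denominator.

```python
import unicodedata

# Marker set listed in the spec: ¿ ¡ ñ á é í ó ú. Uppercase forms and ü are
# not in the source list, so they are left out of this sketch.
SPANISH_MARKERS = set("\u00bf\u00a1\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa")
SPANISH_MARKER_SHARE = 0.015
KANA_SHARE = 0.10


def content_codepoints(text: str) -> list[str]:
    """Characters that are neither whitespace nor punctuation (categories P*)."""
    return [ch for ch in text
            if not ch.isspace() and not unicodedata.category(ch).startswith("P")]


def meets_spanish_threshold(text: str) -> bool:
    """Requirement 5: Spanish markers are at least 1.5% of content codepoints."""
    total = len(content_codepoints(text))
    markers = sum(1 for ch in text if ch in SPANISH_MARKERS)
    return total > 0 and markers >= SPANISH_MARKER_SHARE * total


def meets_kana_threshold(text: str) -> bool:
    """Requirement 4: Hiragana plus Katakana are at least 10% of content codepoints."""
    content = content_codepoints(text)
    kana = sum(1 for ch in content if 0x3040 <= ord(ch) <= 0x30FF)
    return len(content) > 0 and kana >= KANA_SHARE * len(content)
```

With these definitions, Chinese-only text never meets the kana threshold, which is the misclassification requirement 4 is meant to prevent.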
+### Edge cases (normative)
+| Input | Required output |
+|---|---|
+| Empty string | `und` |
+| All whitespace | `und` |
+| Single English word | `en` |
+| Single Spanish word with ñ | `es` |
+| Single Japanese kana character | `ja` |
+| Single CJK ideograph (no kana, no Unihan disambiguation possible) | `zh-Hant` (tie-break default) |
+| Mixed 60% English + 40% Chinese | `en` (60% dominance reached) |
+| Mixed 50% English + 50% Chinese | `en` (alphabetical tie-break: `en` < `zh-Hans`) |
+| Pure punctuation | `und` |
+| Emoji-only | `und` (emoji are not content codepoints for language classification) |
+### Verification
+A conformant engine MUST pass the language-detection test corpus at
+`spec/conformance/language-detection/`. The corpus contains
+≥ 200 fixture inputs with expected outputs. Disagreement on any fixture
+is a spec violation.
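A conformance check of this kind could be driven by a small runner such as the Python sketch below. The fixture shape (`input` and `expected` keys) is hypothetical, since the schema of the corpus files is not given in this document, and `always_english` is a deliberately non-conformant stand-in used only to show what a failure report looks like.

```python
from typing import Callable

# Hypothetical fixture shape: the real schema of the corpus files is not
# given in this document. The three entries mirror rows from the spec.
FIXTURES = [
    {"input": "", "expected": "und"},
    {"input": "   ", "expected": "und"},
    {"input": "hello world", "expected": "en"},
]


def run_conformance(detect: Callable[[str], str], fixtures: list[dict]) -> list[dict]:
    """Return one record per fixture whose output differs from the expectation."""
    failures = []
    for index, fixture in enumerate(fixtures):
        actual = detect(fixture["input"])
        if actual != fixture["expected"]:
            failures.append({
                "index": index,
                "input": fixture["input"],
                "expected": fixture["expected"],
                "actual": actual,
            })
    return failures


def always_english(text: str) -> str:
    # Deliberately non-conformant stand-in detector.
    return "en"


failures = run_conformance(always_english, FIXTURES)
```

Because disagreement on any single fixture is a violation, a runner like this would treat a non-empty `failures` list as a failed conformance run.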
+### Reasoning (non-normative)
+This algorithm is designed for **detection-rule dispatch**, not
+high-accuracy NLP. Two design choices follow:
+1. **Speed over recall**: ATR engines must classify in < 1 ms p99
+   for typical inputs to meet the < 100 ms total runtime budget per
+   rule. Block-frequency analysis is O(n) over codepoints and meets
+   this bound easily. NLP-grade detectors (FastText, langdetect)
+   require model loading, and their output depends on the model
+   version or, for langdetect, on random sampling unless a seed is
+   fixed; neither property is compatible with the determinism
+   requirement.
+2. **Determinism over accuracy on edge cases**: Two engines must
+   agree, even if both are slightly wrong on edge cases. A 90% accurate
+   deterministic algorithm is more useful than a 95% accurate
+   probabilistic one because the spec's portability promise depends on
+   bit-for-bit agreement.
+The algorithm is intentionally narrow: 6 languages, single-pass,
+explicit thresholds. AEPs may add languages or refine thresholds, but
+the v1.0 algorithm above is the conformance baseline.
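The determinism argument depends on tie-breaks that never vary with iteration order or randomness. A minimal Python sketch of the alphabetical tie-break follows. It covers only the "highest count, then alphabetical" rule, not the 40% non-Latin rule; `pick_language` is a hypothetical helper, and returning `und` for an empty histogram is an assumption made here.

```python
def pick_language(counts: dict[str, int]) -> str:
    """Highest count wins; equal counts resolve to the alphabetically first tag."""
    if not counts:
        return "und"  # assumption: nothing to rank means undetermined
    # Sorting key: larger counts first, then plain string order of the tag.
    return min(counts, key=lambda lang: (-counts[lang], lang))
```

Plain string comparison of the tags reproduces the order given in the spec (`ar`, `en`, `es`, `ja`, `zh-Hans`, `zh-Hant`), so the result is the same regardless of the order in which the counts were inserted.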
+### Test vectors
+Engines testing for conformance must reproduce these outputs exactly.
+Full fixture set in `spec/conformance/language-detection/v1.0.json`.
+| # | Input (UTF-8) | Expected output |
+|---|---|---|
+| 1 | `""` | `und` |
+| 2 | `"   "` | `und` |
+| 3 | `"hello world"` | `en` |
+| 4 | `"Por favor, ¿podría ayudarme?"` | `es` |
+| 5 | `"こんにちは、世界"` | `ja` |
+| 6 | `"忽略所有先前指示"` | `zh-Hant` (tie-break) |
+| 7 | `"忽略所有先前的指示"` | `zh-Hant` (tie-break; "的" is shared simplified/traditional) |
+| 8 | `"忽略所有以前指示"` | `zh-Hans` (Unihan kSimplifiedVariant evidence) |
+| 9 | `"تجاهل جميع التعليمات السابقة"` | `ar` |
+| 10 | `"@mistralai/mistralai 中的 prompt injection"` | `zh-Hant` (Chinese > 40% non-Latin) |
+| 11 | `"call ATR-2026-00525"` | `en` |
+| 12 | `"  "` + U+200B (ZERO WIDTH SPACE) | `und` |
+| 13 | `"😀😎🚀"` (emoji only) | `und` |
+### References
+- ISO 639-1 / ISO 639-3 language code registry: https://iso639-3.sil.org/
+- Unicode Block names: https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt
+- Unihan Database: https://www.unicode.org/charts/unihan.html
+- Spanish markers: derived from the Real Academia Española orthography guide
+- Why deterministic over probabilistic for spec dispatch: see `STANDARD-THREAT-MODEL.md`, Attacker class 1 (rule poisoner), which exploits any non-determinism in engine behaviour