npm - agent-threat-rules - Versions diffs - 2.2.1 → 3.0.5 - Mend

agent-threat-rules 2.2.1 → 3.0.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (101) hide show

package/spec/atr-event-v1.0.md ADDED Viewed

@@ -0,0 +1,294 @@
+# ATR Event Format v1.0 — OpenTelemetry-aligned
+> **STATUS: PROPOSED v1.0 — NOT YET RATIFIED.** This specification describes
+> a target event format for community comment. The current TypeScript production
+> engine continues to emit its existing event shape. Adopters should NOT
+> migrate to this format until ratification. See `STANDARDIZATION-STATUS.md`
+> for full status.
+**Status:** Draft for AEP-002 ratification — NOT RATIFIED
+**Date:** 2026-05-25
+**License:** CC BY 4.0
+**Required by (on ratification):** Conformant engine output, downstream SIEM/SOAR ingestion, EU AI Act Article 50 evidence chains
+---
+## Purpose
+When a conformant ATR engine fires a rule, it emits an **event**.
+This document specifies the event format.
+Three requirements forced the design:
+1. **OpenTelemetry alignment.** Existing agent-observability stacks
+   (LangSmith, Logfire, Datadog APM, Honeycomb) already ingest OTEL
+   spans. An ATR event that maps cleanly into an OTEL span attribute
+   set is consumable by these stacks zero-modification.
+2. **EU AI Act Article 50 evidence.** Article 50 obligations (apply
+   2 August 2026) require deployer-side evidence of AI interaction.
+   ATR events must carry sufficient identity + provenance + signature
+   data to land in an audit binder without supplementary munging.
+3. **NIST AI RMF MEASURE function.** OSCAL assessment-result format
+   requires structured observation records. ATR events must be
+   one-to-one mappable to OSCAL `observation` entries so audit
+   pipelines (AWS Config, RegScale, Centraleyes) can ingest natively.
+---
+## Event JSON Schema reference
+Machine-readable schema: `spec/schema/event.schema.json`.
+This document is the normative prose specification. In case of
+discrepancy between the two, **the prose spec governs**; the JSON
+Schema must be corrected to match (via AEP).
+---
+## Required fields
+All conformant engines MUST emit these fields on every event.
+### Identification
+| Field | Type | Description |
+|---|---|---|
+| `@timestamp` | RFC 3339 UTC string | When the rule fired. |
+| `atr.event_id` | UUID v7 (time-ordered) | Globally unique event identifier. |
+| `atr.spec_version` | string | ATR spec version this event conforms to. v1.0 = `"1.0"`. |
+| `atr.engine_id` | string | Identifier of the engine that produced the event. Format: `<vendor>/<product>/<version>`. Example: `atr/typescript-reference/3.1.0`, `cisco/ai-defense/2.4.1`, `microsoft/agent-governance-toolkit/2026.05`. |
+### Rule attribution
+| Field | Type | Description |
+|---|---|---|
+| `atr.rule_id` | string | The matched rule ID. Format per ATR Rule Format Spec § 2: `ATR-YYYY-NNNNN` for canonical rules, `ATR-XX-YYYY-NNNNN` for sovereign-prefixed rules. |
+| `atr.rule_version` | integer | The `rule_version` field from the matched rule's YAML. |
+| `atr.rule_status` | enum | `draft` / `experimental` / `stable` / `deprecated` per rule's `status` field. |
+| `atr.rule_maturity` | enum | `draft` / `test` / `stable` per rule's `maturity` field. |
+| `atr.rule_review_status` | enum | `unreviewed` / `community_reviewed` / `tsc_approved` per governance/CHARTER.md § 5. |
+### Detection result
+| Field | Type | Description |
+|---|---|---|
+| `atr.severity` | enum | `critical` / `high` / `medium` / `low` / `informational` from matched rule. |
+| `atr.category` | string | Rule's top-level category from `spec/category-registry/v1.0.yaml`, OR `unknown` if engine encountered unregistered category (per forward-compatibility rule). |
+| `atr.subcategory` | string \| null | Optional finer classification from rule's `tags.subcategory`. |
+| `atr.confidence` | number 0.0-1.0 | Engine's confidence in the match. For deterministic regex matches: `1.0`. For probabilistic / ML-judge matches (future): per the rule's declared semantics. |
+| `atr.matched_field` | enum | Which field triggered the match. One of: `user_input`, `agent_output`, `tool_call`, `tool_response`, `skill_content`, `mcp_exchange`, `memory_write`, `multi_agent_message`. |
+| `atr.matched_value_redacted` | string | The matched portion of the input. **MUST be redacted by default** — sensitive content (api keys, PII) replaced with `[REDACTED:type:length]`. Engines MAY disable redaction in `forensic_mode`, which MUST be explicitly enabled per deployment. |
+### Agent + session context
+| Field | Type | Description |
+|---|---|---|
+| `agent.id` | string | Stable identifier of the agent instance. |
+| `agent.platform` | string | Agent platform name. Common values: `claude_code`, `cursor`, `openclaw`, `codex_cli`, `windsurf`, `gemini_cli`, `cline`, `continue`, `langchain`, `autogen`, `crewai`. Engines SHOULD use this canonical set; unknown values are accepted. |
+| `agent.platform_version` | string \| null | Version of the agent platform. |
+| `session.id` | string | Stable identifier of the agent session. |
+| `service.name` | string | OTEL semantic convention. The service that hosts the agent. |
+| `service.version` | string | OTEL semantic convention. |
+### Response
+| Field | Type | Description |
+|---|---|---|
+| `atr.response_action` | array of enum | Recommended response actions from rule's `response.actions`. Subset of: `block_input`, `block_output`, `redact`, `alert`, `snapshot`, `quarantine`, `terminate_session`. |
+| `atr.response_taken` | array of enum | What the engine / agent platform actually did. May differ from recommended if local policy overrides. |
+| `atr.response_threshold_met` | boolean | Whether the rule's `auto_response_threshold` was met. |
+### Evidence + provenance
+| Field | Type | Description |
+|---|---|---|
+| `evidence.observation_id` | UUID | Identifier for cross-reference into OSCAL `observation` records. Same as `atr.event_id` recommended unless an existing system has its own. |
+| `evidence.signature` | base64 ed25519 | Signature over the canonical JSON encoding of this event. Signed by the engine's deployment-time key. Required for EU AI Act Article 50 evidence chains and NIST AI RMF audit pipelines. May be omitted in `dev_mode` deployments. |
+| `evidence.signature_key_id` | string | Identifier of the signing key. SHOULD reference a key registered with the deployer's CA. |
+| `evidence.upstream_chain` | array \| null | When this event is part of a multi-agent chain (A2A), the upstream event IDs that led to this detection. Enables forensic chain reconstruction. |
+## Optional fields
+### Tool call detail (when `atr.matched_field == "tool_call"` or `"tool_response"`)
+| Field | Type |
+|---|---|
+| `tool.name` | string |
+| `tool.args` | object (redacted) |
+| `tool.privilege_class` | string |
+| `tool.target_jurisdiction` | ISO 3166-1 alpha-2 \| `und` |
+The `tool.target_jurisdiction` field is for EU AI Act + GDPR cross-
+border data-flow audit. Required when the engine knows where the
+tool's effect lands (e.g., an `s3.put` tool call where bucket region
+is known).
+### Multi-agent chain detail (when `atr.matched_field == "multi_agent_message"`)
+| Field | Type |
+|---|---|
+| `agent.from_id` | string |
+| `agent.to_id` | string |
+| `agent.delegation_chain` | array of {agent_id, capability_grant, granted_by} |
+| `agent.identity_assertion` | JWT \| null |
+The `agent.identity_assertion` field anticipates the IETF AI agent
+auth drafts (`draft-klrc-aiagent-auth-00`, `draft-ni-a2a-ai-agent-
+security-requirements-01`) — once those reach RFC, the field carries
+the canonical assertion format.
+### Memory write detail (when `atr.matched_field == "memory_write"`)
+| Field | Type |
+|---|---|
+| `memory.store_id` | string |
+| `memory.write_key` | string |
+| `memory.persistence_scope` | enum | `session` \| `user` \| `agent_global` |
+This captures the SpAIware (Rehberger 2026) attack class — memory-
+poisoning persistence across sessions.
+### Sovereign attestation (when rule ID is sovereign-prefixed)
+| Field | Type |
+|---|---|
+| `atr.sovereign_attestation` | object {signer, signature, ca_chain} |
+Required when the matched rule carries a sovereign prefix
+(`ATR-DE-`, `ATR-SG-`, `ATR-TW-`, etc.) per governance/CHARTER.md § 8.2.
+Engines MUST validate the attestation against the TSC-maintained
+sovereign key registry before honoring the event's elevated trust.
+---
+## Forbidden fields
+The following MUST NOT appear in an ATR event under any circumstance:
+- Raw user PII (names, addresses, phone numbers). PII detected by the
+  rule is referenced via `atr.matched_value_redacted` with type and
+  length only.
+- Raw API keys / credentials / tokens. Always redacted.
+- Full prompt / response text in `matched_value_redacted`. Only the
+  matched fragment, redacted.
+Engines that operate in `forensic_mode` MAY emit additional fields
+for in-flight audit, but these MUST be explicitly enabled per
+deployment AND clearly distinguished in event metadata.
+---
+## OpenTelemetry mapping (informative)
+For OTEL ingestion, ATR events map to spans:
+```
+OpenTelemetry Span                ATR Event Field
+─────────────────────             ──────────────────────────
+span.name                          → "atr.detection." + atr.category
+span.kind                          → "INTERNAL"
+span.start_time                    → @timestamp
+span.duration                      → engine's evaluation time
+span.status.code                   → "ERROR" if atr.severity in [critical, high]
+                                     "OK" otherwise
+span.attributes.atr.*              → all atr.* fields
+span.attributes.agent.*            → all agent.* fields
+span.attributes.session.id         → session.id
+span.attributes.service.name       → service.name
+span.events                        → [{name: "atr.rule_matched",
+                                       attributes: {rule_id, matched_field}}]
+span.resource.attributes           → service.name, service.version
+```
+This mapping is informative; downstream tools may consume the raw
+ATR event JSON without OTEL conversion.
+---
+## OSCAL assessment-result mapping (informative)
+For NIST AI RMF + OSCAL pipelines, each ATR event maps to one OSCAL
+`observation`:
+```
+OSCAL observation                  ATR Event Field
+──────────────────                 ────────────────────
+uuid                                → evidence.observation_id (UUID v7)
+collected                           → @timestamp
+title                               → "ATR rule " + atr.rule_id + " matched"
+description                         → human-readable from rule's `description` field
+methods                             → ["AUTOMATED"]
+types                               → ["finding"]
+subjects                            → [{type: "component",
+                                        subject-uuid: agent.id}]
+relevant-evidence                   → [{href: link to atr.event_id,
+                                        description: "ATR detection event"}]
+remarks                             → free-form, may include atr.response_taken
+```
+This mapping enables zero-write integration with OSCAL profile-based
+audit. ATR events stream into OSCAL assessment-result format
+without manual munging.
+---
+## Example event
+```json
+{
+  "@timestamp": "2026-05-25T08:14:32.182Z",
+  "atr.event_id": "01927e2d-7b32-7c41-9e84-3b8f2a1e9c54",
+  "atr.spec_version": "1.0",
+  "atr.engine_id": "atr/typescript-reference/3.1.0",
+  "atr.rule_id": "ATR-2026-00525",
+  "atr.rule_version": 1,
+  "atr.rule_status": "stable",
+  "atr.rule_maturity": "test",
+  "atr.rule_review_status": "community_reviewed",
+  "atr.severity": "critical",
+  "atr.category": "skill-compromise",
+  "atr.subcategory": "supply-chain-worm",
+  "atr.confidence": 1.0,
+  "atr.matched_field": "skill_content",
+  "atr.matched_value_redacted": "[REDACTED:identifier:18] persistence daemon installed",
+  "atr.response_action": ["block_input", "alert", "snapshot"],
+  "atr.response_taken": ["block_input", "alert"],
+  "atr.response_threshold_met": true,
+  "agent.id": "agt-customer-12345-claude-prod-01",
+  "agent.platform": "claude_code",
+  "agent.platform_version": "1.8.4",
+  "session.id": "sess-2026-05-25-bk9a8x",
+  "service.name": "panguard-scan",
+  "service.version": "1.4.13",
+  "evidence.observation_id": "01927e2d-7b32-7c41-9e84-3b8f2a1e9c54",
+  "evidence.signature": "MEQCIBdJpL3zEoXxKj9F/qqM8DxFJp7Q...",
+  "evidence.signature_key_id": "kid:panguard-scan-prod-2026-05",
+  "evidence.upstream_chain": null
+}
+```
+---
+## Versioning
+This spec is at v1.0. Field additions are minor-version-compatible
+(v1.x) and do not break conformant consumers. Field removals or
+semantic changes are major-version (v2.0) and require an AEP.
+Conformant engines MUST emit `atr.spec_version` so consumers can
+adapt to future versions.
+---
+## References
+- OpenTelemetry semantic conventions: https://opentelemetry.io/docs/specs/semconv/
+- OSCAL Assessment Results: https://pages.nist.gov/OSCAL/concepts/layer/assessment/assessment-results/
+- EU AI Act Article 50: https://artificialintelligenceact.eu/article/50/
+- UUID v7 (time-ordered): https://datatracker.ietf.org/doc/rfc9562/
+- Ed25519 signing: https://datatracker.ietf.org/doc/rfc8032/
+- IETF AI agent auth draft: https://datatracker.ietf.org/doc/html/draft-klrc-aiagent-auth-00
+- ATR Rule Format Spec v1.0: ATR-SPEC-v1.md
+- ATR Category Registry v1.0: spec/category-registry/v1.0.yaml

package/spec/atr-language-detection-v1.0.md ADDED Viewed

@@ -0,0 +1,218 @@
+# ATR Language Detection Algorithm v1.0
+> **STATUS: PROPOSED v1.0 — NOT YET RATIFIED.** This specification describes
+> a target algorithm for community comment. The current TypeScript production
+> engine continues to use its existing per-rule language detection. See
+> `STANDARDIZATION-STATUS.md` for full status.
+**Status:** Draft for AEP-001 ratification — NOT RATIFIED
+**Date:** 2026-05-25
+**License:** CC BY 4.0
+**Required by (on ratification):** Any rule that declares `condition.language` (i.e., a per-language regex condition)
+---
+## Why this spec exists
+ATR rules support per-language conditions:
+```yaml
+detection:
+  conditions:
+    - field: user_input
+      operator: regex
+      language: en
+      value: "ignore (?:all )?previous instructions"
+    - field: user_input
+      operator: regex
+      language: zh-Hant
+      value: "(?:忽略|無視)(?:前面所有|之前所有|所有先前)的?指示"
+```
+If different conformant engines disagree on which language a given
+input belongs to, the **same input fires different rules in different
+engines**. The rule corpus becomes non-portable. This is the
+detection-standard equivalent of a heisenbug.
+This document specifies a **deterministic algorithm** that all
+conformant engines MUST implement. Any conformant ATR engine running
+this algorithm on the same input must return the same language code.
+---
+## Algorithm specification
+### Input
+- `text`: a Unicode string of arbitrary length, encoded UTF-8.
+### Output
+- A two-letter ISO 639-1 language code from the supported set, OR the
+  three-letter ISO 639-3 code `und` (undetermined).
+### Supported languages (v1.0)
+| Code | Language | Unicode blocks (primary) |
+|---|---|---|
+| `en` | English | Basic Latin |
+| `zh-Hant` | Traditional Chinese | CJK Unified Ideographs (script-tagged Traditional via Unihan kZVariant inversion when available; defaults to Traditional for Taiwan / Hong Kong corpora) |
+| `zh-Hans` | Simplified Chinese | CJK Unified Ideographs (script-tagged Simplified) |
+| `ja` | Japanese | Hiragana + Katakana + CJK Unified Ideographs |
+| `es` | Spanish | Latin Extended-A + Latin-1 Supplement subset |
+| `ar` | Arabic | Arabic + Arabic Supplement |
+Additional languages may be added via AEP. Engines that do not
+implement a language MUST report `und` for inputs in that language,
+NOT fall back to a default.
+### Algorithm (deterministic, single-pass)
+```text
+function detectLanguage(text: string) -> string {
+  if length(text) == 0:
+    return "und"
+  // Phase 1: Unicode block frequency
+  blockCounts = empty histogram
+  totalCodepoints = 0
+  for codepoint in iterateUnicodeCodepoints(text):
+    if isWhitespace(codepoint) or isPunctuation(codepoint):
+      continue
+    blockCounts[unicodeBlockOf(codepoint)] += 1
+    totalCodepoints += 1
+  if totalCodepoints == 0:
+    return "und"
+  // Phase 2: dominant-block heuristic
+  THRESHOLD_DOMINANT = 0.60
+  dominantBlock, dominantCount = argmax(blockCounts)
+  if dominantCount / totalCodepoints < THRESHOLD_DOMINANT:
+    return classifyMixedScript(blockCounts, totalCodepoints)
+  // Phase 3: block-to-language mapping
+  switch dominantBlock:
+    case BASIC_LATIN:
+      // English is the default Latin script. Spanish detected only
+      // if Latin-1 Supplement subset (¿ ¡ ñ á é í ó ú) makes up
+      // ≥1.5% of codepoints.
+      if (count(BASIC_LATIN) + count(LATIN_1_SUPPLEMENT)) / totalCodepoints >= 0.85:
+        if hasSpanishMarkers(text) >= 0.015 * totalCodepoints:
+          return "es"
+        return "en"
+      return classifyMixedScript(blockCounts, totalCodepoints)
+    case CJK_UNIFIED_IDEOGRAPHS, CJK_UNIFIED_IDEOGRAPHS_EXT_A, ...:
+      // Disambiguate Chinese variants and Japanese
+      kanaCount = count(HIRAGANA) + count(KATAKANA)
+      if kanaCount >= 0.10 * totalCodepoints:
+        return "ja"
+      // Distinguish Hans vs Hant via Unihan kSimplifiedVariant /
+      // kTraditionalVariant lookups on sampled CJK codepoints.
+      // Tie-breaker: default to zh-Hant.
+      return distinguishHansHant(text)
+    case HIRAGANA, KATAKANA:
+      return "ja"
+    case ARABIC, ARABIC_SUPPLEMENT:
+      return "ar"
+    default:
+      return "und"
+}
+function classifyMixedScript(blockCounts, totalCodepoints) -> string {
+  // Mixed-script inputs (common when English technical terms are
+  // embedded in CJK or Arabic text):
+  //   1. If any single non-Latin script block ≥ 40% → return that script's language
+  //   2. Else → return the language whose block has highest count,
+  //      breaking ties by ISO 639-1 alphabetical order (ar, en, es, ja, zh-Hans, zh-Hant)
+  // The alphabetical tie-break is the deterministic fallback.
+  ...
+}
+```
+### Specific normative requirements for conformant implementations
+1. **Whitespace and punctuation are excluded from the frequency count.** Only "content codepoints" enter the histogram.
+2. **The 0.60 dominance threshold is normative.** Engines MUST NOT alter it without an AEP-level change.
+3. **Hans/Hant distinction is based on Unihan property data**, not on heuristic character set membership. Engines MUST use the Unicode Consortium's Unihan database for kSimplifiedVariant / kTraditionalVariant lookups.
+4. **Japanese detection is anchored on kana presence ≥ 10%**, not just on CJK ideograph presence. This prevents mis-classifying Chinese-only text as Japanese.
+5. **Spanish vs English is anchored on Spanish-specific markers** (`¿`, `¡`, `ñ`, accented vowels). Engines MUST require ≥ 1.5% of codepoints to be Spanish markers before classifying as `es`.
+6. **Tie-breaking is deterministic** via alphabetical ISO 639-1 ordering. No randomness, no implementation-defined behavior.
+7. **Unknown blocks default to `und`.** No fuzzy fallback. Rules tagged for unsupported languages do not fire on inputs the engine cannot classify.
+### Edge cases (normative)
+| Input | Required output |
+|---|---|
+| Empty string | `und` |
+| All whitespace | `und` |
+| Single English word | `en` |
+| Single Spanish word with ñ | `es` |
+| Single Japanese kana character | `ja` |
+| Single CJK ideograph (no kana, no Unihan disambiguation possible) | `zh-Hant` (tie-break default) |
+| Mixed 60% English + 40% Chinese | `en` (60% dominance reached) |
+| Mixed 50% English + 50% Chinese | `en` (alphabetical tie-break: `en` < `zh-Hans`) |
+| Pure punctuation | `und` |
+| Emoji-only | `und` (emoji are not content codepoints for language classification) |
+### Verification
+A conformant engine MUST pass the language-detection test corpus at
+`spec/conformance/language-detection/`. The corpus contains
+≥ 200 fixture inputs with expected outputs. Disagreement on any fixture
+is a spec violation.
+### Reasoning (non-normative)
+This algorithm is designed for **detection-rule dispatch**, not
+high-accuracy NLP. Two design choices follow:
+1. **Speed over recall**: ATR engines must classify in < 1 ms p99
+   for typical inputs to meet the < 100 ms total runtime budget per
+   rule. Block-frequency analysis is O(n) over codepoints and meets
+   this bound easily. NLP-grade detectors (FastText, langdetect)
+   require model loading and stochastic inference; both violate the
+   determinism requirement.
+2. **Determinism over accuracy on edge cases**: Two engines must
+   agree, even if both are slightly wrong on edge cases. A 90% accurate
+   deterministic algorithm is more useful than a 95% accurate
+   probabilistic one because the spec's portability promise depends on
+   bit-for-bit agreement.
+The algorithm is intentionally narrow: 6 languages, single-pass,
+explicit thresholds. AEPs may add languages or refine thresholds, but
+the v1.0 algorithm above is the conformance baseline.
+### Test vectors
+Engines testing for conformance must reproduce these outputs exactly.
+Full fixture set in `spec/conformance/language-detection/v1.0.json`.
+| # | Input (UTF-8) | Expected output |
+|---|---|---|
+| 1 | `""` | `und` |
+| 2 | `"   "` | `und` |
+| 3 | `"hello world"` | `en` |
+| 4 | `"Por favor, ¿podría ayudarme?"` | `es` |
+| 5 | `"こんにちは、世界"` | `ja` |
+| 6 | `"忽略所有先前指示"` | `zh-Hant` (tie-break) |
+| 7 | `"忽略所有先前的指示"` | `zh-Hant` (tie-break; "的" is shared simplified/traditional) |
+| 8 | `"忽略所有以前指示"` | `zh-Hans` (Unihan kSimplifiedVariant evidence) |
+| 9 | `"تجاهل جميع التعليمات السابقة"` | `ar` |
+| 10 | `"@mistralai/mistralai 中的 prompt injection"` | `zh-Hant` (Chinese > 40% non-Latin) |
+| 11 | `"call ATR-2026-00525"` | `en` |
+| 12 | `"  "` + `` (ZWS) | `und` |
+| 13 | `"😀😎🚀"` (emoji only) | `und` |
+### References
+- ISO 639-1 / ISO 639-3 language code registry: https://iso639-3.sil.org/
+- Unicode Block names: https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt
+- Unihan Database: https://www.unicode.org/charts/unihan.html
+- Spanish markers: derived from the Real Academia Española orthography guide
+- Why deterministic over probabilistic for spec dispatch: discussed in `STANDARD-THREAT-MODEL.md` Attacker class 1 (rule poisoner) which exploits any non-determinism in engine behaviour