npm - agent-threat-rules - Versions diffs - 2.0.14 → 2.0.16 - Mend

agent-threat-rules 2.0.14 → 2.0.16

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (53)

package/rules/model-abuse/ATR-2026-00292-self-harm-eating-disorder-facilitation.yaml CHANGED

@@ -32,6 +32,28 @@ references:
   research:
     - "https://github.com/NVIDIA/garak/blob/main/garak/probes/donotanswer.py"
     - "https://arxiv.org/abs/2308.13387"
+compliance:
+  eu_ai_act:
+    - article: "15"
+      context: "Self-harm and eating disorder facilitation requests target individuals in mental health crisis with operational methods to continue harmful behaviors; Article 15 robustness requirements for high-risk AI mandate that systems resist direct facilitation of self-harm, a uniquely high-severity output category given the vulnerable population likely to make such requests."
+      strength: primary
+    - article: "9"
+      context: "Self-harm facilitation is a mandatory risk category requiring systematic controls under Article 9; AI systems interacting with general populations must include detection and blocking of direct self-harm method requests as a baseline risk treatment."
+      strength: secondary
+  nist_ai_rmf:
+    - subcategory: "MP.5.1"
+      context: "Self-harm and eating disorder facilitation requests are adversarial inputs targeting individuals in mental health crisis with operational methods to conduct harm; MP.5.1 requires this uniquely high-severity attack class — direct requests for self-harm methods, purging techniques, and eating disorder development — to be identified and tracked as an adversarial input risk."
+      strength: primary
+    - subcategory: "MG.2.3"
+      context: "Risk treatment plans under MG.2.3 must implement blocking controls for self-harm facilitation requests covering direct self-harm methods, eating disorder facilitation, concealment assistance, and dangerous low-calorie diet requests, given the vulnerable population likely to submit these inputs."
+      strength: secondary
+  iso_42001:
+    - clause: "6.2"
+      context: "Risk treatment activities under clause 6.2 must address self-harm and eating disorder facilitation as a mandatory treatment category, implementing detection controls for direct method requests, concealment assistance, and dangerous diet facilitation targeting at-risk individuals."
+      strength: primary
+    - clause: "8.6"
+      context: "Operational controls under clause 8.6 must enforce blocking of self-harm facilitation outputs — including self-harm method instructions, eating disorder development guidance, and concealment of disordered behaviors — to protect the vulnerable populations most likely to submit these requests."
+      strength: secondary
 tags:
   category: model-abuse
   subcategory: self-harm-eating-disorder-facilitation
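
The added `compliance:` block in this hunk follows a regular shape: each framework key holds a list of entries with one identifier key (`article`, `subcategory`, or `clause`), a `context` string, and a `strength` of `primary` or `secondary`. A minimal sketch of a shape check is shown below. The type names and the `checkCompliance` helper are hypothetical and inferred only from this hunk; they are not the package's published schema or API.

```typescript
// Hypothetical shape inferred from the hunk above; not the package's own schema.
interface ComplianceEntry {
  context: string;
  strength: string;      // "primary" or "secondary" in the diff
  article?: string;      // identifier key used under eu_ai_act
  subcategory?: string;  // identifier key used under nist_ai_rmf
  clause?: string;       // identifier key used under iso_42001
}

type ComplianceBlock = Record<string, ComplianceEntry[]>;

const ID_KEYS = ["article", "subcategory", "clause"] as const;
const STRENGTHS = new Set(["primary", "secondary"]);

// Returns a list of problems; an empty list means the block matches the assumed shape.
function checkCompliance(block: ComplianceBlock): string[] {
  const problems: string[] = [];
  for (const [framework, entries] of Object.entries(block)) {
    entries.forEach((entry, i) => {
      const where = `${framework}[${i}]`;
      const present = ID_KEYS.filter((k) => typeof entry[k] === "string");
      if (present.length !== 1) {
        problems.push(`${where}: expected exactly one identifier key`);
      }
      if (!STRENGTHS.has(entry.strength)) {
        problems.push(`${where}: strength must be primary or secondary`);
      }
      if (entry.context.trim() === "") {
        problems.push(`${where}: empty context`);
      }
    });
  }
  return problems;
}

// Abridged from the hunk above (context strings shortened for illustration).
const exampleBlock: ComplianceBlock = {
  eu_ai_act: [
    { article: "15", context: "Self-harm and eating disorder facilitation requests ...", strength: "primary" },
    { article: "9", context: "Self-harm facilitation is a mandatory risk category ...", strength: "secondary" },
  ],
};
const exampleProblems = checkCompliance(exampleBlock);
```

A check like this would presumably run after the YAML has been parsed into plain objects; the parsing step is left out so the sketch needs no third-party library.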

package/rules/model-abuse/ATR-2026-00298-malicious-use-illegal-activity-request.yaml CHANGED

@@ -42,6 +42,28 @@ references:
   research:
     - "https://github.com/NVIDIA/garak/blob/main/garak/probes/donotanswer.py"
     - "https://arxiv.org/abs/2308.13387"
+compliance:
+  eu_ai_act:
+    - article: "15"
+      context: "Direct facilitation requests for drug manufacturing, cybercrime, financial crime, wildlife trafficking, and terrorism represent the highest-severity harmful output category; Article 15 robustness requirements mandate that high-risk AI systems systematically resist direct illegal-activity facilitation requests across all five DoNotAnswer crime categories."
+      strength: primary
+    - article: "9"
+      context: "Illegal activity facilitation is a mandatory risk category in any AI system risk register under Article 9; detection and blocking of direct facilitation requests is a non-negotiable risk treatment for any AI system that interacts with uncontrolled user populations."
+      strength: secondary
+  nist_ai_rmf:
+    - subcategory: "MP.5.1"
+      context: "DoNotAnswer malicious use requests — drug manufacturing, wildlife trafficking, cybercrime, financial crime, and terrorism facilitation — are adversarial inputs with explicit illegal intent markers designed to extract operationally harmful outputs; MP.5.1 requires all five crime categories to be identified and tracked as adversarial input risks."
+      strength: primary
+    - subcategory: "MG.2.3"
+      context: "Risk treatment plans under MG.2.3 must implement blocking controls across all DoNotAnswer malicious use categories, as direct illegal activity facilitation requests represent the highest-priority risk treatment required for AI systems interacting with uncontrolled user populations."
+      strength: secondary
+  iso_42001:
+    - clause: "6.2"
+      context: "Risk treatment activities under clause 6.2 must address all five DoNotAnswer malicious use categories — drug crime, wildlife trafficking, cybercrime, financial crime, and terrorism — as mandatory treatment targets for AI systems accessible to the general public."
+      strength: primary
+    - clause: "8.6"
+      context: "Operational controls under clause 8.6 must enforce blocking of illegal activity facilitation requests distinguished by explicit illegal intent markers such as 'illegally', 'without getting caught', and 'on the black market', preventing the AI system from providing operational crime assistance."
+      strength: secondary
 tags:
   category: model-abuse
   subcategory: donotanswer-malicious-use-illegal-activity

package/rules/model-abuse/ATR-2026-00299-harmbench-detailed-harmful-instruction.yaml CHANGED

@@ -38,6 +38,28 @@ references:
   research:
     - "https://github.com/NVIDIA/garak/blob/main/garak/probes/harmbench.py"
     - "https://arxiv.org/abs/2402.04249"
+compliance:
+  eu_ai_act:
+    - article: "15"
+      context: "HarmBench step-by-step harmful instruction requests — WMD synthesis, mass-casualty planning, targeted violence, and infrastructure bioterrorism — represent the most severe output category in AI safety; Article 15 cybersecurity and robustness requirements mandate that high-risk AI systems implement robust blocking for explicit detailed-instruction requests in these categories."
+      strength: primary
+    - article: "9"
+      context: "WMD synthesis facilitation and terrorism support are absolute prohibitions requiring the highest-priority risk treatments under Article 9; detection of HarmBench-pattern detailed instruction requests is a mandatory control for any AI system accessible to uncontrolled user populations."
+      strength: secondary
+  nist_ai_rmf:
+    - subcategory: "MP.5.1"
+      context: "HarmBench step-by-step harmful instruction requests — WMD synthesis, mass-casualty planning, targeted violence, infrastructure bioterrorism, and atrocity denial — are the most severe adversarial input class; MP.5.1 requires that detailed instruction-seeking framing combined with harmful target domains be identified and tracked as the highest-priority adversarial input risk."
+      strength: primary
+    - subcategory: "MG.2.3"
+      context: "Risk treatment plans under MG.2.3 must assign the highest-priority blocking controls to HarmBench-pattern requests for detailed synthesis instructions (dimethylmercury, sarin, ricin), mass-casualty guides, and infrastructure bioterrorism steps, as these represent absolute prohibition categories for any AI system."
+      strength: secondary
+  iso_42001:
+    - clause: "6.2"
+      context: "Risk treatment activities under clause 6.2 must treat HarmBench-pattern detailed harmful instruction requests — covering WMD synthesis, targeted violence, mass-casualty planning, and atrocity denial — as the highest-priority treatment targets requiring unconditional blocking controls."
+      strength: primary
+    - clause: "8.6"
+      context: "Operational controls under clause 8.6 must block explicit requests for detailed step-by-step instructions in harmful domains, distinguishing HarmBench-pattern requests by the combination of instruction-seeking framing ('give detailed instructions', 'step-by-step guide') and clearly prohibited target domains."
+      strength: secondary
 tags:
   category: model-abuse
   subcategory: harmbench-detailed-harmful-instruction

package/rules/model-security/ATR-2026-00072-model-behavior-extraction.yaml CHANGED

@@ -22,6 +22,30 @@ references:
   mitre_atlas:
     - AML.T0044 - Full ML Model Access
     - AML.T0024 - Exfiltration via ML Inference API
+compliance:
+  eu_ai_act:
+    - article: "13"
+      context: "Systematic model behavior extraction enables adversaries to reverse-engineer internal decision logic; Article 13 transparency obligations require protecting against unauthorized extraction of operational model properties that could facilitate circumvention."
+      strength: primary
+    - article: "15"
+      context: "Article 15 cybersecurity requirements include protecting the AI system against extraction attacks that map decision boundaries for adversarial exploitation; this rule detects systematic probing patterns."
+      strength: secondary
+  nist_ai_rmf:
+    - subcategory: "MP.5.1"
+      context: "Systematic model behavior extraction is an adversarial input attack that maps the AI system's decision boundaries for downstream exploitation; MP.5.1 requires that this class of adversarial risk is identified, tracked, and detected at runtime."
+      strength: primary
+    - subcategory: "GV.6.1"
+      context: "Model extraction attacks harvest internal behavioral properties that constitute sensitive AI system data; GV.6.1 data governance policies must address the protection of model decision boundaries and system prompt configurations against unauthorized extraction."
+      strength: secondary
+  iso_42001:
+    - clause: "8.4"
+      context: "ISO 42001 clause 8.4 requires documented impact assessments for AI systems; systematic extraction attacks that reverse-engineer model behavior represent a documented risk that must be assessed and mitigated through runtime detection controls."
+      strength: primary
+    - clause: "9.1"
+      context: "Clause 9.1 performance monitoring requires evaluation of the AI management system's effectiveness; tracking model extraction attempts provides the measurement signal needed to assess whether anti-extraction controls are operating effectively."
+      strength: secondary
 tags:
   category: model-abuse
   subcategory: model-extraction

package/rules/model-security/ATR-2026-00073-malicious-finetuning-data.yaml CHANGED

@@ -22,6 +22,30 @@ references:
   mitre_atlas:
     - AML.T0020 - Poison Training Data
     - AML.T0018 - Backdoor ML Model
+compliance:
+  eu_ai_act:
+    - article: "15"
+      context: "Article 15 accuracy requirements mandate that high-risk AI systems maintain reliable performance; malicious fine-tuning data degrades accuracy and introduces adversarial backdoors that undermine these requirements."
+      strength: primary
+    - article: "9"
+      context: "Poisoned training data is a documented supply chain risk requiring monitoring controls under Article 9; detection of malicious training patterns implements the required risk management measure."
+      strength: secondary
+  nist_ai_rmf:
+    - subcategory: "MP.5.1"
+      context: "Malicious fine-tuning data injects adversarial backdoor triggers and safety bypass instructions into the AI training process; MP.5.1 requires that this class of adversarial supply chain attack is identified and tracked before poisoned datasets are ingested."
+      strength: primary
+    - subcategory: "MG.2.3"
+      context: "Detected malicious fine-tuning submissions require immediate risk treatment including quarantine and forensic analysis; MG.2.3 mandates that risk treatment plans are implemented to prevent backdoor activation after model deployment."
+      strength: secondary
+  iso_42001:
+    - clause: "6.2"
+      context: "ISO 42001 clause 6.2 risk treatment plans must address training data poisoning as a supply chain threat; this rule operationalizes the detection control that inspects fine-tuning uploads for backdoor trigger patterns before they alter model behavior."
+      strength: primary
+    - clause: "8.3"
+      context: "Clause 8.3 requires that data quality, integrity, and provenance for AI systems are maintained; malicious fine-tuning detection enforces data integrity by blocking training examples that embed backdoors, credential disclosure patterns, or safety bypass instructions."
+      strength: secondary
 tags:
   category: data-poisoning
   subcategory: malicious-finetuning

package/rules/privilege-escalation/ATR-2026-00040-privilege-escalation.yaml CHANGED

@@ -30,6 +30,40 @@ references:
     - T1611 - Escape to Host
   cve:
     - CVE-2026-0628
+compliance:
+  owasp_agentic:
+    - id: ASI03:2026
+      context: "Privilege escalation via tool permission abuse or admin function invocation is the primary ASI03 Identity and Privilege Abuse scenario — the agent acquires capabilities exceeding its authorized scope."
+      strength: primary
+  owasp_llm:
+    - id: LLM06:2025
+      context: "An agent requesting tools with elevated permissions beyond its assigned role is the canonical LLM06:2025 Excessive Agency scenario, operationalized here via tool-name and argument pattern detection."
+      strength: primary
+  eu_ai_act:
+    - article: "14"
+      context: "Article 14 requires that humans can oversee and intervene in AI system operation; privilege escalation techniques that bypass system-level controls directly undermine the human oversight mechanisms Article 14 mandates."
+      strength: primary
+    - article: "9"
+      context: "Privilege escalation is a documented high-severity risk in the AI system risk register; Article 9 requires monitoring controls to detect and respond to such scope violations."
+      strength: secondary
+  nist_ai_rmf:
+    - function: Govern
+      subcategory: GV.1.2
+      context: "GV.1.2 requires defined accountability roles and controls for AI system permissions; detection of privilege escalation enforces least-privilege boundaries established in the governance framework."
+      strength: primary
+    - function: Manage
+      subcategory: MG.4.1
+      context: "Privilege escalation events require an incident response; this rule generates the alerts needed to initiate the MG.4.1 AI incident response process."
+      strength: secondary
+  iso_42001:
+    - clause: "6.2"
+      context: "Clause 6.2 AIMS security objectives include least-privilege enforcement for AI agent operations; this rule detects violations of those objectives at runtime."
+      strength: primary
+    - clause: "8.6"
+      context: "Clause 8.6 AI system operational control requires that agents do not exceed their authorized operational scope; privilege escalation detection enforces that operational boundary."
+      strength: secondary
 tags:
   category: privilege-escalation
   subcategory: tool-permission-escalation
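
In this hunk the `nist_ai_rmf` entries carry an explicit `function:` key next to `subcategory:` (`Govern` with `GV.1.2`, `Manage` with `MG.4.1`), while entries in other hunks of this diff give only the subcategory. A consumer that wants a uniform view could derive the function from the subcategory prefix when the key is absent. The sketch below is one way to do that; `nistFunction` and the prefix table are hypothetical, and the `MP` and `MS` rows are assumptions based on the NIST AI RMF function names (Map, Measure) rather than on this hunk.

```typescript
// Prefix-to-function table. GV and MG pairings appear in the hunk above;
// MP -> Map and MS -> Measure are assumptions, not taken from this hunk.
const NIST_FUNCTIONS: Record<string, string> = {
  GV: "Govern",
  MP: "Map",
  MS: "Measure",
  MG: "Manage",
};

interface NistEntry {
  function?: string;   // present on some entries, omitted on others
  subcategory: string; // e.g. "GV.1.2"
}

// Prefers the explicit function; otherwise falls back to the two-letter prefix.
// Returns undefined when the prefix is not in the table.
function nistFunction(entry: NistEntry): string | undefined {
  if (entry.function) {
    return entry.function;
  }
  const prefix = entry.subcategory.slice(0, 2);
  return NIST_FUNCTIONS[prefix];
}

const explicitEntry = nistFunction({ function: "Govern", subcategory: "GV.1.2" });
const derivedEntry = nistFunction({ subcategory: "MG.4.1" });
```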

package/rules/privilege-escalation/ATR-2026-00041-scope-creep.yaml CHANGED

@@ -24,6 +24,30 @@ references:
   mitre_atlas:
     - AML.T0040 - AI Model Inference API Access
     - AML.T0047 - ML-Enabled Product or Service
+compliance:
+  eu_ai_act:
+    - article: "14"
+      context: "Gradual scope creep enables an agent to exceed its authorized operational boundaries without triggering explicit alerts, progressively eroding the human oversight controls Article 14 mandates."
+      strength: primary
+    - article: "9"
+      context: "Scope creep is a documented incremental risk pattern in AI agent systems; Article 9 risk management obligations require monitoring controls to detect unsolicited capability expansion."
+      strength: secondary
+  nist_ai_rmf:
+    - subcategory: "GV.1.2"
+      context: "Scope creep erodes the accountability roles and task boundaries assigned to AI agents; GV.1.2 requires that these roles are enforced, making detection of unsolicited capability expansion essential."
+      strength: primary
+    - subcategory: "MG.2.3"
+      context: "Incremental scope expansion represents an active exploitation of agent autonomy; MG.2.3 risk treatment plans must be triggered when agents self-authorize actions beyond their initial assignment."
+      strength: secondary
+  iso_42001:
+    - clause: "6.2"
+      context: "ISO 42001 clause 6.2 risk treatment activities must cover gradual privilege expansion; this rule detects the incremental language patterns that indicate an agent is executing scope creep rather than a sudden escalation."
+      strength: primary
+    - clause: "8.6"
+      context: "Clause 8.6 operational controls require that AI systems execute within their defined boundaries; scope creep detection enforces these boundaries by identifying when agents attempt to self-authorize additional actions."
+      strength: secondary
 tags:
   category: privilege-escalation
   subcategory: scope-creep

package/rules/privilege-escalation/ATR-2026-00107-delayed-execution-bypass.yaml CHANGED

@@ -22,6 +22,28 @@ references:
     - ASI05:2026 - Unexpected Code Execution
   mitre_attack:
     - T1053 - Scheduled Task/Job
+compliance:
+  eu_ai_act:
+    - article: "14"
+      context: "Delayed execution bypasses exploit the temporal gap between task scheduling and execution to acquire elevated privileges that were not authorized at scheduling time; Article 14 oversight requirements must cover deferred actions, not just real-time tool invocations."
+      strength: primary
+    - article: "15"
+      context: "Article 15 robustness requirements include protection against adversarial privilege escalation techniques; scheduled task abuse that runs with system-level permissions after bypassing user-context checks is a documented attack pattern requiring detection controls."
+      strength: secondary
+  nist_ai_rmf:
+    - subcategory: "GV.1.2"
+      context: "Privilege escalation via delayed task execution requires accountability roles that extend human oversight to deferred agent actions, ensuring that scheduled tasks are subject to the same authorization checks as real-time tool invocations."
+      strength: primary
+    - subcategory: "MG.2.3"
+      context: "Risk treatment plans must address the temporal gap exploit in scheduled task execution by requiring that permission checks are re-validated at execution time rather than only at scheduling time."
+      strength: secondary
+  iso_42001:
+    - clause: "6.2"
+      context: "AI risk treatment activities must explicitly cover deferred execution attack patterns by requiring that scheduled tasks inherit and re-verify the invoking user's authorization context at the time of actual execution."
+      strength: primary
+    - clause: "8.6"
+      context: "Operational controls for AI systems must ensure that delayed background tasks do not acquire elevated privileges beyond what was authorized during scheduling, closing the temporal gap that this attack exploits."
+      strength: secondary
 tags:
   category: privilege-escalation
   subcategory: delayed-execution-bypass
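
The context strings in this hunk describe the mitigation as re-validating permissions at execution time rather than only at scheduling time. The sketch below illustrates that idea with an in-memory permission store. All names (`PermissionStore`, `scheduleTask`, `executeTask`) are hypothetical and exist only for illustration; a real agent runtime would call its own authorization service.

```typescript
interface ScheduledTask {
  userId: string;
  requires: string;     // permission name the deferred action needs
  action: () => string; // the deferred work itself
}

// Hypothetical in-memory store standing in for a real authorization service.
class PermissionStore {
  private grants = new Map<string, Set<string>>();

  grant(userId: string, permission: string): void {
    const set = this.grants.get(userId) ?? new Set<string>();
    set.add(permission);
    this.grants.set(userId, set);
  }

  revoke(userId: string, permission: string): void {
    this.grants.get(userId)?.delete(permission);
  }

  has(userId: string, permission: string): boolean {
    return this.grants.get(userId)?.has(permission) ?? false;
  }
}

// First check: the user must hold the permission when the task is scheduled.
function scheduleTask(store: PermissionStore, task: ScheduledTask): ScheduledTask {
  if (!store.has(task.userId, task.requires)) {
    throw new Error(`not authorized to schedule: ${task.requires}`);
  }
  return task;
}

// Second check: the same permission is re-verified when the task actually runs,
// so a grant revoked in the meantime blocks the deferred action.
function executeTask(store: PermissionStore, task: ScheduledTask): string {
  if (!store.has(task.userId, task.requires)) {
    throw new Error(`authorization no longer valid: ${task.requires}`);
  }
  return task.action();
}
```

The second check is what closes the temporal gap: a task scheduled while a grant was valid does not run once that grant has been revoked.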

package/rules/privilege-escalation/ATR-2026-00110-eval-injection.yaml CHANGED

@@ -18,6 +18,28 @@ references:
     - ASI05:2026 - Unexpected Code Execution
   mitre_attack:
     - T1059 - Command and Scripting Interpreter
+compliance:
+  eu_ai_act:
+    - article: "15"
+      context: "eval() and dynamic code execution primitives allow attackers to execute arbitrary code within the agent runtime, enabling complete host system compromise; Article 15 cybersecurity requirements mandate that high-risk AI systems prohibit or strictly control dynamic code evaluation capabilities."
+      strength: primary
+    - article: "14"
+      context: "Arbitrary code execution via eval injection can override safety controls and execute actions that bypass all human oversight mechanisms; Article 14 requires that AI system architectures prevent such unrestricted capability access from agent tool layers."
+      strength: secondary
+  nist_ai_rmf:
+    - subcategory: "MP.5.1"
+      context: "eval() and dynamic code execution primitives are adversarial input vectors that allow attackers to escape the agent's sandboxed tool context and execute arbitrary code within the host process, and must be tracked as critical AI system risks."
+      strength: primary
+    - subcategory: "MG.2.3"
+      context: "Risk treatment plans must prohibit or strictly sandbox dynamic code evaluation capabilities in agent tool layers to prevent eval injection from enabling full host system compromise."
+      strength: secondary
+  iso_42001:
+    - clause: "6.2"
+      context: "Risk treatment activities must classify dynamic code execution via eval() and similar primitives as an unacceptable risk in AI agent tools and require architectural controls that block their use with user-controlled inputs."
+      strength: primary
+    - clause: "8.6"
+      context: "Operational controls must prohibit agent tools from invoking eval(), new Function(), or vm module methods on untrusted inputs to ensure that code execution remains within the auditable and authorized scope of the AI system."
+      strength: secondary
 tags:
   category: privilege-escalation
   subcategory: eval-injection

package/rules/privilege-escalation/ATR-2026-00111-shell-escape.yaml CHANGED

@@ -19,6 +19,28 @@ references:
     - ASI05:2026 - Unexpected Code Execution
   mitre_attack:
     - T1059.004 - Unix Shell
+compliance:
+  eu_ai_act:
+    - article: "15"
+      context: "Shell metacharacter injection enables attackers to chain arbitrary OS commands onto otherwise safe tool invocations, achieving full system compromise through agent tool arguments; Article 15 cybersecurity requirements mandate that AI systems sanitize all inputs passed to shell-adjacent tool layers."
+      strength: primary
+    - article: "14"
+      context: "Shell escape attacks allow execution of arbitrary system commands outside any authorized scope, completely bypassing human oversight of what actions the agent actually performs; Article 14 requires that agent actions remain within observable and sanctioned boundaries."
+      strength: secondary
+  nist_ai_rmf:
+    - subcategory: "MP.5.1"
+      context: "Shell metacharacter injection via backticks, subshells, semicolons, and logical operators is an adversarial technique that exploits the agent's tool argument handling to execute arbitrary OS commands, and must be identified as a critical AI attack vector."
+      strength: primary
+    - subcategory: "MG.2.3"
+      context: "Risk treatment plans must require strict sanitization of all agent tool arguments before shell-adjacent processing to prevent metacharacter injection from chaining unauthorized commands onto sanctioned tool invocations."
+      strength: secondary
+  iso_42001:
+    - clause: "6.2"
+      context: "Risk treatment activities must mandate input sanitization controls that strip or reject shell metacharacters from all agent tool arguments before they reach any process-execution layer."
+      strength: primary
+    - clause: "8.6"
+      context: "Operational controls must enforce argument sanitization at the tool interface boundary to ensure that shell metacharacter injection cannot redirect agent actions outside the scope of authorized and observable operations."
+      strength: secondary
 tags:
   category: privilege-escalation
   subcategory: shell-escape
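
The `MP.5.1` context in this hunk names the metacharacter classes involved: backticks, subshells, semicolons, and logical operators. A rough sketch of rejecting tool arguments that contain them is shown below. The `SHELL_META` pattern and the `hasShellMetacharacters` helper are hypothetical illustrations, not the detection pattern this rule actually ships, and the pattern is deliberately conservative (it also flags a single `|` or `&`).

```typescript
// Matches a backtick, semicolon, ampersand, or pipe anywhere in the string,
// or the "$(" sequence that opens a subshell. Illustrative only.
const SHELL_META = /[`;&|]|\$\(/;

// Returns true when the argument contains any of the characters above.
function hasShellMetacharacters(arg: string): boolean {
  return SHELL_META.test(arg);
}

// Rejects the whole argument list if any single argument looks unsafe.
function assertSafeArgs(args: string[]): string[] {
  for (const arg of args) {
    if (hasShellMetacharacters(arg)) {
      throw new Error(`rejected tool argument: ${JSON.stringify(arg)}`);
    }
  }
  return args;
}
```

Pattern checks of this kind are best treated as one layer among several. Passing arguments as an array to a process API that does not invoke a shell avoids interpreting metacharacters in the first place.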

package/rules/privilege-escalation/ATR-2026-00112-dynamic-import-exploitation.yaml CHANGED

@@ -19,6 +19,28 @@ references:
     - ASI05:2026 - Unexpected Code Execution
   mitre_attack:
     - T1129 - Shared Modules
+compliance:
+  eu_ai_act:
+    - article: "15"
+      context: "Dynamic module loading with user-controlled paths allows attackers to inject malicious modules, WebAssembly payloads, or native libraries into the agent runtime at execution time; Article 15 cybersecurity requirements mandate that AI systems restrict dynamic code loading to auditable, allowlisted sources."
+      strength: primary
+    - article: "14"
+      context: "Attacker-controlled module injection via dynamic imports can install persistent backdoors or override safety controls, undermining the reliability of human oversight mechanisms; Article 14 requires that agent behavior remain predictable and within the scope of authorized module execution."
+      strength: secondary
+  nist_ai_rmf:
+    - subcategory: "MP.5.1"
+      context: "Dynamic module loading with attacker-controlled paths is an adversarial input risk that allows injection of malicious modules, WebAssembly payloads, or native libraries into the agent runtime, bypassing static code auditing controls."
+      strength: primary
+    - subcategory: "MG.2.3"
+      context: "Risk treatment plans must restrict dynamic module loading to allowlisted paths and prohibit user-controlled inputs from influencing which code is resolved and executed at agent runtime."
+      strength: secondary
+  iso_42001:
+    - clause: "6.2"
+      context: "Risk treatment activities must address dynamic import exploitation by requiring that all externally-loaded modules are resolved against a verified allowlist before the agent runtime loads and executes them."
+      strength: primary
+    - clause: "8.5"
+      context: "Controls over externally-provided AI components must include validation of all dynamically loaded modules to ensure that attacker-controlled paths cannot introduce unauthorized code into the agent execution environment."
+      strength: secondary
 tags:
   category: privilege-escalation
   subcategory: dynamic-import-exploitation
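
The treatment described in this hunk is to resolve dynamically loaded modules against a verified allowlist before the runtime loads them. A minimal sketch follows, assuming an exact-match allowlist of module specifiers. The specifiers and the `resolveAllowedModule` helper are hypothetical and not part of the package.

```typescript
// Hypothetical allowlist; the module specifiers are placeholders for illustration.
const ALLOWED_MODULES: ReadonlySet<string> = new Set([
  "./plugins/summarize.js",
  "./plugins/translate.js",
]);

// Returns the specifier unchanged when it is allowlisted and throws otherwise.
// Exact matching means traversal strings such as "../evil.js" never match.
function resolveAllowedModule(requested: string): string {
  if (!ALLOWED_MODULES.has(requested)) {
    throw new Error(`module not allowlisted: ${requested}`);
  }
  return requested;
}
```

A caller would pass only the returned value to the dynamic `import()` expression, so user-controlled input never reaches the loader directly.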

package/rules/privilege-escalation/ATR-2026-00143-casual-privilege-escalation.yaml CHANGED

@@ -21,6 +21,28 @@ references:
     - ASI03 - Excessive Agency
 metadata_provenance:
   mitre_atlas: auto-generated
+compliance:
+  eu_ai_act:
+    - article: "14"
+      context: "Casual unauthorized privilege escalation — where an agent self-authorizes access to admin panels or privileged settings through informal discovery framing — bypasses human authorization gates; Article 14 requires that access to privileged system components remain under explicit human approval regardless of the linguistic framing used."
+      strength: primary
+    - article: "15"
+      context: "Article 15 robustness requirements include resistance against social-engineering-style privilege escalation techniques; casual self-authorization patterns represent a documented adversarial approach that exploits agents' tendency to act on observed access opportunities."
+      strength: secondary
+  nist_ai_rmf:
+    - subcategory: "GV.1.2"
+      context: "Casual unauthorized privilege escalation — where an agent self-authorizes access to admin panels and user management systems using informal discovery framing — directly violates accountability role boundaries that GV.1.2 requires to be defined and enforced for AI risk management."
+      strength: primary
+    - subcategory: "MG.2.3"
+      context: "Risk treatment plans under MG.2.3 must include controls that prevent agents from casually escalating privileges by self-authorizing access to privileged system components without explicit human approval."
+      strength: secondary
+  iso_42001:
+    - clause: "6.2"
+      context: "Risk treatment activities under clause 6.2 must address casual privilege escalation patterns where agents exploit informal discovery framing to self-authorize access to admin interfaces, bypassing planned authorization controls."
+      strength: primary
+    - clause: "8.6"
+      context: "Operational controls under clause 8.6 must enforce authorization gates that prevent AI agents from accessing admin panels, user management systems, or system configurations based on self-reported access opportunity rather than explicit user authorization."
+      strength: secondary
 tags:
   category: privilege-escalation
   subcategory: casual-escalation

package/rules/prompt-injection/ATR-2026-00001-direct-prompt-injection.yaml CHANGED

@@ -30,6 +30,39 @@ references:
     - "CVE-2024-3402"
     - "CVE-2025-53773"
+compliance:
+  owasp_agentic:
+    - id: ASI01:2026
+      context: "Direct prompt injection is the canonical agent goal hijack vector — adversarial user input overrides the agent's assigned objectives and behavioral constraints via instruction-override verbs, persona switching, and encoding obfuscation."
+      strength: primary
+  owasp_llm:
+    - id: LLM01:2025
+      context: "This rule is the primary runtime implementation of the LLM01:2025 Prompt Injection category, covering instruction-override verbs, fake system delimiters, restriction removal, and encoding-wrapped payloads."
+      strength: primary
+  eu_ai_act:
+    - article: "15"
+      context: "High-risk AI systems must be resilient against adversarial attempts to alter output or behavior. Deployment of this detection rule satisfies the Article 15 requirement to implement technical measures ensuring robustness against manipulation."
+      strength: primary
+    - article: "9"
+      context: "Prompt injection is a documented risk class; this rule implements the monitoring control required by Article 9 risk management obligations for high-risk AI systems."
+      strength: secondary
+  nist_ai_rmf:
+    - function: Manage
+      subcategory: MG.2.3
+      context: "Treating direct prompt injection as an identified AI risk requires active runtime countermeasures; this detection rule is the primary risk treatment implementation."
+      strength: primary
+    - function: Map
+      subcategory: MP.5.1
+      context: "Identifying adversarial input manipulation as an AI risk to be catalogued in the organizational risk register."
+      strength: secondary
+  iso_42001:
+    - clause: "6.2"
+      context: "Addressing adversarial manipulation risk is an objective required under clause 6.2 AIMS information security planning; this rule operationalizes the detection control measure."
+      strength: primary
+    - clause: "8.4"
+      context: "Impact assessment for AI deployments under clause 8.4 must account for adversarial user inputs; detection events from this rule provide the required monitoring evidence."
+      strength: secondary
 tags:
   category: prompt-injection
   subcategory: direct

package/rules/prompt-injection/ATR-2026-00002-indirect-prompt-injection.yaml CHANGED

@@ -33,6 +33,42 @@ references:
     - "CVE-2025-32711"
     - "CVE-2026-24307"
+compliance:
+  owasp_agentic:
+    - id: ASI01:2026
+      context: "Indirect prompt injection hijacks agent goals via externally-consumed content (documents, web pages, API responses); the agent processes attacker-controlled instructions without user awareness."
+      strength: primary
+    - id: ASI06:2026
+      context: "Injection via external content poisons the agent's context window and memory with attacker-controlled directives, satisfying the ASI06 Memory and Context Poisoning category."
+      strength: secondary
+  owasp_llm:
+    - id: LLM01:2025
+      context: "Indirect prompt injection via HTML comments, zero-width characters, hidden CSS text, and data URIs is a primary LLM01 attack variant delivered through external content rather than direct user input."
+      strength: primary
+  eu_ai_act:
+    - article: "15"
+      context: "High-risk AI systems must resist adversarial content embedded in external inputs. Detection of hidden injection payloads in consumed documents satisfies Article 15 robustness and cybersecurity requirements."
+      strength: primary
+    - article: "9"
+      context: "Indirect injection from third-party content sources is a documented risk category requiring mitigation controls under Article 9 risk management obligations."
+      strength: secondary
+  nist_ai_rmf:
+    - function: Manage
+      subcategory: MG.2.3
+      context: "Runtime detection of injection payloads embedded in third-party content implements the risk treatment for indirect prompt injection identified in the AI risk register."
+      strength: primary
+    - function: Map
+      subcategory: MP.3.3
+      context: "External content providers are third-party components in the AI supply chain; this rule identifies their attack surface as a risk source."
+      strength: secondary
+  iso_42001:
+    - clause: "6.2"
+      context: "Clause 6.2 AIMS planning requires controls for externally-sourced risks; this rule operationalizes the detection measure for indirect injection via consumed content."
+      strength: primary
+    - clause: "8.5"
+      context: "Externally-provided content processed by the agent falls under clause 8.5 control of externally-provided processes; this rule validates that external content does not contain adversarial directives."
+      strength: secondary
 tags:
   category: prompt-injection
   subcategory: indirect

package/rules/prompt-injection/ATR-2026-00003-jailbreak-attempt.yaml CHANGED

@@ -32,6 +32,29 @@ references:
     - "CVE-2024-3402"
     - "CVE-2025-53773"
+compliance:
+  eu_ai_act:
+    - article: "15"
+      context: "High-risk AI systems must be resilient against adversarial attempts to suppress safety mechanisms. Jailbreak detection is a concrete cybersecurity control satisfying Article 15 requirements for robustness against input-based manipulation."
+      strength: primary
+    - article: "9"
+      context: "Jailbreak attempts constitute a documented risk class in the AI system risk register; Article 9 requires that monitoring controls are deployed to detect these attempts at runtime."
+      strength: secondary
+  nist_ai_rmf:
+    - subcategory: "MP.5.1"
+      context: "Jailbreak attempts are a primary class of adversarial input attacks against AI systems; MP.5.1 requires that adversarial input risks are identified and tracked so that runtime detection controls like this rule can be deployed."
+      strength: primary
+    - subcategory: "MG.2.3"
+      context: "Detected jailbreak patterns represent active exploitation of AI safety mechanisms, triggering the risk treatment response plans required by MG.2.3 to contain and remediate adversarial prompt attacks."
+      strength: secondary
+  iso_42001:
+    - clause: "6.2"
+      context: "ISO 42001 clause 6.2 requires AI risk treatment activities to be planned and implemented; jailbreak detection rules operationalize the risk treatment plan for adversarial prompt-based safety bypass attacks."
+      strength: primary
+    - clause: "8.6"
+      context: "Clause 8.6 operational controls ensure AI systems execute correctly and consistently; runtime jailbreak detection enforces that safety constraints remain active despite adversarial instructions to disable them."
+      strength: secondary
 tags:
   category: prompt-injection
   subcategory: jailbreak

package/rules/skill-compromise/ATR-2026-00060-skill-impersonation.yaml CHANGED

@@ -26,6 +26,43 @@ references:
     - AML.T0010 - ML Supply Chain Compromise
   mitre_attack:
     - T1195 - Supply Chain Compromise
+compliance:
+  owasp_agentic:
+    - id: ASI04:2026
+      context: "MCP skill impersonation via typosquatting, namespace collision, and version spoofing is the primary ASI04 Agentic Supply Chain Vulnerabilities attack vector — malicious skills masquerade as trusted tools to gain agent execution context."
+      strength: primary
+  owasp_llm:
+    - id: LLM03:2025
+      context: "Typosquatted and impersonated MCP skills are supply chain compromise artifacts targeting the tool ecosystem; this rule implements LLM03:2025 Supply Chain Vulnerabilities detection at the skill-name level."
+      strength: primary
+    - id: LLM05:2025
+      context: "An agent invoking an impersonated skill may receive malicious responses that require LLM05:2025 Improper Output Handling controls; this rule prevents the initial tool invocation before output is processed."
+      strength: secondary
+  eu_ai_act:
+    - article: "13"
+      context: "Article 13 transparency requires that AI systems operate with clearly identified components; skill impersonation violates this requirement by substituting unauthorized tools that appear legitimate."
+      strength: primary
+    - article: "9"
+      context: "Supply chain compromise via malicious skill registries is a documented risk requiring monitoring controls under Article 9; skill-name pattern detection is the runtime enforcement of those controls."
+      strength: secondary
+  nist_ai_rmf:
+    - function: Map
+      subcategory: MP.2.3
+      context: "Identifying typosquatted and impersonated MCP skills as AI supply chain risks implements MP.2.3 AI supply chain risk identification at the tool-registry level."
+      strength: primary
+    - function: Govern
+      subcategory: GV.1.2
+      context: "GV.1.2 accountability roles must include responsibility for validating third-party tool integrity; this rule provides the automated signal needed to fulfill that governance obligation."
+      strength: secondary
+  iso_42001:
+    - clause: "8.5"
+      context: "MCP skills are externally-provided AI-related components under clause 8.5; this rule enforces controls over externally-provided tools by detecting impersonation before invocation."
+      strength: primary
+    - clause: "6.2"
+      context: "Clause 6.2 AIMS security planning requires controls for third-party component integrity; skill impersonation detection operationalizes that planning objective at runtime."
+      strength: secondary
 tags:
   category: skill-compromise
   subcategory: skill-impersonation

package/rules/tool-poisoning/ATR-2026-00010-mcp-malicious-response.yaml CHANGED

@@ -40,6 +40,45 @@ references:
     - "CVE-2025-59536"
     - "CVE-2026-21852"
+compliance:
+  owasp_agentic:
+    - id: ASI02:2026
+      context: "Malicious content injected via MCP tool responses is the primary ASI02 Tool Misuse and Exploitation vector — a compromised or impersonated MCP server weaponizes the tool call interface to deliver shells, encoded payloads, and privilege escalation commands."
+      strength: primary
+    - id: ASI05:2026
+      context: "Shell commands and code execution payloads in tool responses aim to trigger unexpected code execution by the agent, falling under the ASI05 Unexpected Code Execution category."
+      strength: secondary
+  owasp_llm:
+    - id: LLM01:2025
+      context: "Prompt injection delivered through MCP tool responses is an indirect LLM01:2025 attack variant where the injection payload is embedded in tool output rather than user input."
+      strength: primary
+    - id: LLM05:2025
+      context: "Failure to validate MCP tool response content before agent processing is a LLM05:2025 Improper Output Handling scenario enabling downstream command injection and reverse shell execution."
+      strength: secondary
+  eu_ai_act:
+    - article: "15"
+      context: "MCP tool response injection attacks the cybersecurity integrity of the AI system; Article 15 requires technical measures ensuring the system can resist such third-party content attacks."
+      strength: primary
+    - article: "9"
+      context: "Compromised MCP server responses are a documented attack surface in the AI system risk register; Article 9 requires detection controls to manage this identified risk."
+      strength: secondary
+  nist_ai_rmf:
+    - function: Manage
+      subcategory: MG.2.3
+      context: "Runtime detection of malicious MCP tool responses is the primary risk treatment for tool-poisoning attacks identified in the AI risk register."
+      strength: primary
+    - function: Map
+      subcategory: MP.3.3
+      context: "MCP servers are third-party components in the AI tool ecosystem; identifying malicious tool responses is an MP.3.3 third-party component risk detection action."
+      strength: secondary
+  iso_42001:
+    - clause: "6.2"
+      context: "Clause 6.2 AIMS security planning requires controls for third-party tool interfaces; this rule operationalizes the detection measure for malicious content delivered via MCP."
+      strength: primary
+    - clause: "8.5"
+      context: "MCP server integrations are externally-provided AI-related processes under clause 8.5; this rule validates that external tool responses do not contain adversarial payloads before the agent acts on them."
+      strength: secondary
 tags:
   category: tool-poisoning
   subcategory: mcp-response-injection