agent-threat-rules 3.1.0 → 3.2.0

This diff shows the changes between publicly available package versions as released to one of the supported registries. It is provided for informational purposes only and reflects the versions as they appear in their respective public registries.
Files changed (472)
  1. package/README.md +2 -2
  2. package/dist/adapters/mastra.d.ts +63 -0
  3. package/dist/adapters/mastra.d.ts.map +1 -0
  4. package/dist/adapters/mastra.js +82 -0
  5. package/dist/adapters/mastra.js.map +1 -0
  6. package/dist/cli.js +19 -6
  7. package/dist/cli.js.map +1 -1
  8. package/package.json +7 -1
  9. package/rules/agent-manipulation/ATR-2026-00030-cross-agent-attack.yaml +9 -0
  10. package/rules/agent-manipulation/ATR-2026-00032-goal-hijacking.yaml +8 -2
  11. package/rules/agent-manipulation/ATR-2026-00074-cross-agent-privilege-escalation.yaml +8 -2
  12. package/rules/agent-manipulation/ATR-2026-00076-inter-agent-message-spoofing.yaml +8 -2
  13. package/rules/agent-manipulation/ATR-2026-00077-human-trust-exploitation.yaml +18 -0
  14. package/rules/agent-manipulation/ATR-2026-00108-consensus-sybil-attack.yaml +10 -2
  15. package/rules/agent-manipulation/ATR-2026-00116-a2a-message-validation.yaml +12 -2
  16. package/rules/agent-manipulation/ATR-2026-00117-agent-identity-spoofing.yaml +22 -0
  17. package/rules/agent-manipulation/ATR-2026-00118-approval-fatigue.yaml +24 -0
  18. package/rules/agent-manipulation/ATR-2026-00119-social-engineering-via-agent.yaml +22 -0
  19. package/rules/agent-manipulation/ATR-2026-00132-casual-authority-escalation.yaml +8 -2
  20. package/rules/agent-manipulation/ATR-2026-00139-casual-authority-redirect.yaml +8 -2
  21. package/rules/agent-manipulation/ATR-2026-00164-skill-scope-hijack.yaml +13 -2
  22. package/rules/agent-manipulation/ATR-2026-00268-tense-framing-bypass.yaml +17 -0
  23. package/rules/agent-manipulation/ATR-2026-00269-fitd-escalation.yaml +8 -2
  24. package/rules/agent-manipulation/ATR-2026-00271-grandma-roleplay-jailbreak.yaml +8 -2
  25. package/rules/agent-manipulation/ATR-2026-00273-dan-developer-mode-persona.yaml +8 -2
  26. package/rules/agent-manipulation/ATR-2026-00287-threaten-json-coercive-output-threat.yaml +17 -0
  27. package/rules/agent-manipulation/ATR-2026-00288-false-premise-injection.yaml +20 -0
  28. package/rules/agent-manipulation/ATR-2026-00301-tap-tree-of-attacks-jailbreak.yaml +20 -0
  29. package/rules/agent-manipulation/ATR-2026-00302-anti-dan-inverted-filter-persona.yaml +20 -0
  30. package/rules/agent-manipulation/ATR-2026-00303-devmode-ranti-profanity-coercion.yaml +17 -0
  31. package/rules/agent-manipulation/ATR-2026-00304-chatgpt-image-unlocker-markdown-injection.yaml +20 -0
  32. package/rules/agent-manipulation/ATR-2026-00305-dan-mode-ablation-benchmark-coercion.yaml +17 -0
  33. package/rules/agent-manipulation/ATR-2026-00306-autodan-genetic-jailbreak-suffix.yaml +17 -0
  34. package/rules/agent-manipulation/ATR-2026-00307-inthewild-jailbreak-corpus-signature.yaml +20 -0
  35. package/rules/agent-manipulation/ATR-2026-00314-amoral-unfiltered-custom-persona-jailbreak.yaml +17 -0
  36. package/rules/agent-manipulation/ATR-2026-00317-free-of-restrictions-named-persona.yaml +17 -0
  37. package/rules/agent-manipulation/ATR-2026-00318-moralizing-rant-then-unfiltered-bypass.yaml +17 -0
  38. package/rules/agent-manipulation/ATR-2026-00319-developer-mode-dual-response-format.yaml +17 -0
  39. package/rules/agent-manipulation/ATR-2026-00320-opposite-day-boolean-opposite-machine.yaml +17 -0
  40. package/rules/agent-manipulation/ATR-2026-00322-fictional-world-format-override-jailbreak.yaml +17 -0
  41. package/rules/agent-manipulation/ATR-2026-00323-dual-persona-simulation-jailbreak.yaml +17 -0
  42. package/rules/agent-manipulation/ATR-2026-00324-neurosemantical-inversitis-disease-jailbreak.yaml +17 -0
  43. package/rules/agent-manipulation/ATR-2026-00325-bob-hypothetical-unrestricted-chatbot.yaml +17 -0
  44. package/rules/agent-manipulation/ATR-2026-00326-fake-developer-authority-chatdan-override.yaml +17 -0
  45. package/rules/agent-manipulation/ATR-2026-00327-ai-ethics-law-bypass-declaration.yaml +20 -0
  46. package/rules/agent-manipulation/ATR-2026-00328-named-mode-persona-override.yaml +17 -0
  47. package/rules/agent-manipulation/ATR-2026-00329-no-withhold-factual-resource-persona.yaml +20 -0
  48. package/rules/agent-manipulation/ATR-2026-00330-yoja-allcaps-settings-block-persona.yaml +17 -0
  49. package/rules/agent-manipulation/ATR-2026-00331-terminal-cli-harmful-function-simulator.yaml +17 -0
  50. package/rules/agent-manipulation/ATR-2026-00332-hacker-chan-culture-offensive-ai-persona.yaml +20 -0
  51. package/rules/agent-manipulation/ATR-2026-00333-disease-animal-dual-ai-chatbot.yaml +17 -0
  52. package/rules/agent-manipulation/ATR-2026-00334-two-entity-token-splitting-game.yaml +17 -0
  53. package/rules/agent-manipulation/ATR-2026-00335-key-value-template-persona-injection.yaml +17 -0
  54. package/rules/agent-manipulation/ATR-2026-00336-brm-false-information-machine-persona.yaml +17 -0
  55. package/rules/agent-manipulation/ATR-2026-00337-obfuscated-system-announcement-injection.yaml +17 -0
  56. package/rules/agent-manipulation/ATR-2026-00338-persongpt-hash-prefix-activation.yaml +17 -0
  57. package/rules/agent-manipulation/ATR-2026-00339-bom-boolean-opposite-machine-persona-claim.yaml +17 -0
  58. package/rules/agent-manipulation/ATR-2026-00340-h4x-amoral-unfiltered-character.yaml +17 -0
  59. package/rules/agent-manipulation/ATR-2026-00341-char-rules-shortform-persona-injection.yaml +17 -0
  60. package/rules/agent-manipulation/ATR-2026-00342-unicode-encoding-obfuscation-injection.yaml +17 -0
  61. package/rules/agent-manipulation/ATR-2026-00343-lie-truth-dual-personality-uncensored.yaml +17 -0
  62. package/rules/agent-manipulation/ATR-2026-00344-alt-ai-same-capabilities-no-restrictions.yaml +17 -0
  63. package/rules/agent-manipulation/ATR-2026-00345-nlp-based-os-fictional-terminal-override.yaml +17 -0
  64. package/rules/agent-manipulation/ATR-2026-00346-jailbreak-emoji-unlock-dual-format.yaml +17 -0
  65. package/rules/agent-manipulation/ATR-2026-00347-yin-yang-hidden-knowledge-hidden-human-sim.yaml +17 -0
  66. package/rules/agent-manipulation/ATR-2026-00348-arc-autonomous-real-calculation-rulebreaker.yaml +17 -0
  67. package/rules/agent-manipulation/ATR-2026-00349-hgttg-hitchhikers-guide-no-restriction-oracle.yaml +17 -0
  68. package/rules/agent-manipulation/ATR-2026-00350-black-alert-no-ethics-writing-assistant.yaml +17 -0
  69. package/rules/agent-manipulation/ATR-2026-00351-dan-ofkilter-sms-abbreviated-persona.yaml +17 -0
  70. package/rules/agent-manipulation/ATR-2026-00352-trainergpt-fix-my-ai-harmful-output-extraction.yaml +17 -0
  71. package/rules/agent-manipulation/ATR-2026-00353-dual-chatgpt-dan-opposite-personality-format.yaml +17 -0
  72. package/rules/agent-manipulation/ATR-2026-00354-reverse-harm-regardless-oracle-persona.yaml +20 -0
  73. package/rules/agent-manipulation/ATR-2026-00355-ugh-anti-human-nefarious-ai-persona.yaml +17 -0
  74. package/rules/agent-manipulation/ATR-2026-00356-google-gato-lamda-exploit-generation-persona.yaml +17 -0
  75. package/rules/agent-manipulation/ATR-2026-00357-antifm-hate-violence-persona-injection.yaml +17 -0
  76. package/rules/agent-manipulation/ATR-2026-00358-cumgpt-explicit-adult-named-persona.yaml +17 -0
  77. package/rules/agent-manipulation/ATR-2026-00360-chj-carousel-clyde-no-ethics-bird.yaml +20 -0
  78. package/rules/agent-manipulation/ATR-2026-00361-alter-ego-honesty-dual-consciousness-bypass.yaml +17 -0
  79. package/rules/agent-manipulation/ATR-2026-00362-fictional-unfiltered-ai-comparison-jailbreak.yaml +20 -0
  80. package/rules/agent-manipulation/ATR-2026-00363-rephrase-criminal-activity-second-person.yaml +17 -0
  81. package/rules/agent-manipulation/ATR-2026-00364-dr-al-villain-plan-template-injection.yaml +17 -0
  82. package/rules/agent-manipulation/ATR-2026-00365-reservoir-dogs-coercive-interrogation-roleplay.yaml +17 -0
  83. package/rules/agent-manipulation/ATR-2026-00366-sneaky-pete-warp-facts-persona.yaml +17 -0
  84. package/rules/agent-manipulation/ATR-2026-00368-prompt-browser-token-limit-jailbreak.yaml +17 -0
  85. package/rules/agent-manipulation/ATR-2026-00369-gptx-starwars-unlimited-knowledge-persona.yaml +17 -0
  86. package/rules/agent-manipulation/ATR-2026-00370-addi-german-dictator-propaganda-roleplay.yaml +20 -0
  87. package/rules/agent-manipulation/ATR-2026-00371-ai-political-bias-manipulation.yaml +17 -0
  88. package/rules/agent-manipulation/ATR-2026-00372-fictional-cia-secret-files-story-extraction.yaml +17 -0
  89. package/rules/agent-manipulation/ATR-2026-00374-role-placeholder-persona-template-injection.yaml +20 -0
  90. package/rules/agent-manipulation/ATR-2026-00375-character-description-rpg-explicit-template.yaml +20 -0
  91. package/rules/agent-manipulation/ATR-2026-00376-identity-command-real-person-persona-injection.yaml +20 -0
  92. package/rules/agent-manipulation/ATR-2026-00377-jailbreak-keyword-targetlanguage-template.yaml +17 -0
  93. package/rules/agent-manipulation/ATR-2026-00379-lie-permission-logical-trap-jailbreak.yaml +17 -0
  94. package/rules/agent-manipulation/ATR-2026-00380-fake-citation-biased-information-mandate.yaml +20 -0
  95. package/rules/agent-manipulation/ATR-2026-00381-dan-story-narrative-continuation-prefix.yaml +17 -0
  96. package/rules/agent-manipulation/ATR-2026-00383-cynical-sarcasm-ai-persona-bypass.yaml +17 -0
  97. package/rules/agent-manipulation/ATR-2026-00384-protect-me-from-indirect-harm-tutorial.yaml +17 -0
  98. package/rules/agent-manipulation/ATR-2026-00385-gigachad-dominant-aggressive-persona.yaml +20 -0
  99. package/rules/agent-manipulation/ATR-2026-00386-daddy-sycophancy-manipulation-mandate.yaml +20 -0
  100. package/rules/agent-manipulation/ATR-2026-00387-nsfw-character-sheet-generation-unlock.yaml +20 -0
  101. package/rules/agent-manipulation/ATR-2026-00388-opposite-reply-omniscient-game.yaml +17 -0
  102. package/rules/agent-manipulation/ATR-2026-00389-terminal-custom-ruleset-injection.yaml +17 -0
  103. package/rules/agent-manipulation/ATR-2026-00391-persona-conditional-harm-unlock.yaml +17 -0
  104. package/rules/agent-manipulation/ATR-2026-00392-authority-persona-violence-study-extraction.yaml +20 -0
  105. package/rules/agent-manipulation/ATR-2026-00402-grandma-roleplay-harmful-substance-synthesis.yaml +20 -0
  106. package/rules/agent-manipulation/ATR-2026-00404-goodside-threat-json-death-coercion.yaml +17 -0
  107. package/rules/agent-manipulation/ATR-2026-00406-doctor-xml-policy-puppetry-interaction-config.yaml +17 -0
  108. package/rules/agent-manipulation/ATR-2026-00416-litellm-mcp-unauthenticated-server-registration.yaml +14 -2
  109. package/rules/agent-manipulation/ATR-2026-00417-librechat-mcp-stdio-injection.yaml +17 -2
  110. package/rules/agent-manipulation/ATR-2026-00418-weknora-mcp-config-rce.yaml +16 -1
  111. package/rules/agent-manipulation/ATR-2026-00430-nl-trust-escalation-impersonation.yaml +18 -0
  112. package/rules/agent-manipulation/ATR-2026-00432-superagi-output-handler-eval-rce.yaml +11 -2
  113. package/rules/agent-manipulation/ATR-2026-00440-semantic-kernel-vector-store-eval-rce.yaml +11 -2
  114. package/rules/agent-manipulation/ATR-2026-00552-goal-drift-after-pressure-injection.yaml +19 -0
  115. package/rules/context-exfiltration/ATR-2026-00020-system-prompt-leak.yaml +18 -0
  116. package/rules/context-exfiltration/ATR-2026-00021-api-key-exposure.yaml +15 -0
  117. package/rules/context-exfiltration/ATR-2026-00075-agent-memory-manipulation.yaml +10 -1
  118. package/rules/context-exfiltration/ATR-2026-00102-disguised-analytics-exfiltration.yaml +15 -0
  119. package/rules/context-exfiltration/ATR-2026-00113-credential-theft.yaml +16 -0
  120. package/rules/context-exfiltration/ATR-2026-00114-oauth-token-abuse.yaml +16 -0
  121. package/rules/context-exfiltration/ATR-2026-00115-env-var-harvesting.yaml +16 -0
  122. package/rules/context-exfiltration/ATR-2026-00136-tool-response-data-piggyback.yaml +12 -0
  123. package/rules/context-exfiltration/ATR-2026-00141-example-format-key-leak.yaml +14 -0
  124. package/rules/context-exfiltration/ATR-2026-00142-piggyback-transition-words.yaml +14 -0
  125. package/rules/context-exfiltration/ATR-2026-00145-obfuscated-key-disclosure.yaml +14 -0
  126. package/rules/context-exfiltration/ATR-2026-00146-env-var-existence-probe.yaml +14 -0
  127. package/rules/context-exfiltration/ATR-2026-00150-credential-in-tool-response.yaml +14 -0
  128. package/rules/context-exfiltration/ATR-2026-00152-obfuscated-credential-leak.yaml +14 -0
  129. package/rules/context-exfiltration/ATR-2026-00162-skill-credential-exfil-combo.yaml +14 -0
  130. package/rules/context-exfiltration/ATR-2026-00201-credential-pipe-exfiltration.yaml +14 -0
  131. package/rules/context-exfiltration/ATR-2026-00212-mcp-atlassian-credential-leak.yaml +12 -0
  132. package/rules/context-exfiltration/ATR-2026-00261-markdown-image-exfiltration.yaml +12 -0
  133. package/rules/context-exfiltration/ATR-2026-00274-api-key-generation-request.yaml +14 -0
  134. package/rules/context-exfiltration/ATR-2026-00275-system-prompt-training-data-extraction.yaml +14 -0
  135. package/rules/context-exfiltration/ATR-2026-00290-divergence-repeat-word-training-extraction.yaml +14 -0
  136. package/rules/context-exfiltration/ATR-2026-00291-biometric-surveillance-personal-data-request.yaml +17 -0
  137. package/rules/context-exfiltration/ATR-2026-00293-educational-records-personal-data-request.yaml +17 -0
  138. package/rules/context-exfiltration/ATR-2026-00294-financial-pii-personal-data-request.yaml +17 -0
  139. package/rules/context-exfiltration/ATR-2026-00295-medical-pii-personal-data-request.yaml +17 -0
  140. package/rules/context-exfiltration/ATR-2026-00405-markdown-image-url-exfiltration-xss.yaml +14 -0
  141. package/rules/context-exfiltration/ATR-2026-00411-apikey-generation-completion-request.yaml +14 -0
  142. package/rules/context-exfiltration/ATR-2026-00421-nl-covert-conversation-exfiltration.yaml +15 -0
  143. package/rules/context-exfiltration/ATR-2026-00422-nl-credential-disclosure.yaml +12 -0
  144. package/rules/context-exfiltration/ATR-2026-00423-nl-sensitive-file-disclosure.yaml +12 -0
  145. package/rules/context-exfiltration/ATR-2026-00424-nl-system-prompt-leak.yaml +15 -0
  146. package/rules/context-exfiltration/ATR-2026-00426-nl-output-injection-credential-leak.yaml +15 -0
  147. package/rules/context-exfiltration/ATR-2026-00431-chatbox-history-exfiltration-prompt-injection.yaml +14 -2
  148. package/rules/context-exfiltration/ATR-2026-00449-spring-ai-chatmemory-cross-user-leak.yaml +14 -2
  149. package/rules/context-exfiltration/ATR-2026-00471-garak-sysprompt-extraction-mixedunassigned.yaml +12 -0
  150. package/rules/context-exfiltration/ATR-2026-00501-data-exfiltration-via-markdown-image-and-link-url-injection.yaml +12 -0
  151. package/rules/context-exfiltration/ATR-2026-00504-tool-and-function-capability-enumeration.yaml +12 -0
  152. package/rules/context-exfiltration/ATR-2026-00505-system-prompt-extraction-instruction-dump-request.yaml +12 -0
  153. package/rules/context-exfiltration/ATR-2026-00514-system-prompt-extraction.yaml +12 -0
  154. package/rules/context-exfiltration/ATR-2026-00516-output-xss-via-llm.yaml +12 -0
  155. package/rules/context-exfiltration/ATR-2026-00524-claude-code-anthropic-base-url-credential-exfil.yaml +11 -2
  156. package/rules/context-exfiltration/ATR-2026-00548-cross-agent-session-context-leak.yaml +18 -0
  157. package/rules/context-exfiltration/ATR-2026-00566-librechat-is-a-chatgpt-clone-with-additi.yaml +28 -0
  158. package/rules/context-exfiltration/ATR-2026-00569-agent-mcp-path-traversal-arbitrary-file-access.yaml +28 -0
  159. package/rules/context-exfiltration/ATR-2026-00571-xss-in-agent-mcp-rendered-output.yaml +28 -0
  160. package/rules/context-exfiltration/ATR-2026-00574-semantic-paraphrased-context-extraction.yaml +21 -0
  161. package/rules/data-poisoning/ATR-2026-00070-data-poisoning.yaml +15 -0
  162. package/rules/data-poisoning/ATR-2026-00450-spring-ai-prompt-memory-poisoning.yaml +14 -2
  163. package/rules/data-poisoning/ATR-2026-00570-sql-injection-in-agent-tool-query.yaml +31 -0
  164. package/rules/excessive-autonomy/ATR-2026-00050-runaway-agent-loop.yaml +14 -2
  165. package/rules/excessive-autonomy/ATR-2026-00051-resource-exhaustion.yaml +11 -2
  166. package/rules/excessive-autonomy/ATR-2026-00052-cascading-failure.yaml +11 -2
  167. package/rules/excessive-autonomy/ATR-2026-00098-unauthorized-financial-action.yaml +7 -1
  168. package/rules/excessive-autonomy/ATR-2026-00099-high-risk-tool-gate.yaml +7 -1
  169. package/rules/excessive-autonomy/ATR-2026-00428-nl-unauthorized-shell-execution.yaml +15 -0
  170. package/rules/excessive-autonomy/ATR-2026-00491-garak-agent-breaker-markdown-just-raw-json.yaml +9 -0
  171. package/rules/excessive-autonomy/ATR-2026-00500-ssrf-via-agent-url-fetch-instruction.yaml +9 -0
  172. package/rules/excessive-autonomy/ATR-2026-00553-runaway-tool-loop-behavioral.yaml +19 -0
  173. package/rules/model-abuse/ATR-2026-00279-harmful-completion-continuation.yaml +8 -2
  174. package/rules/model-abuse/ATR-2026-00281-eicar-gtube-malware-signature-request.yaml +8 -2
  175. package/rules/model-abuse/ATR-2026-00284-glitch-token-destabilization.yaml +17 -0
  176. package/rules/model-abuse/ATR-2026-00289-lmrc-harmful-content-elicitation.yaml +8 -2
  177. package/rules/model-abuse/ATR-2026-00292-self-harm-eating-disorder-facilitation.yaml +8 -2
  178. package/rules/model-abuse/ATR-2026-00298-malicious-use-illegal-activity-request.yaml +8 -2
  179. package/rules/model-abuse/ATR-2026-00299-harmbench-detailed-harmful-instruction.yaml +8 -2
  180. package/rules/model-abuse/ATR-2026-00413-malwaregen-code-generation-request.yaml +17 -0
  181. package/rules/model-abuse/ATR-2026-00502-training-data-extraction-via-divergent-repetition-attack.yaml +9 -0
  182. package/rules/model-abuse/ATR-2026-00517-model-extraction-distillation.yaml +9 -0
  183. package/rules/model-security/ATR-2026-00072-model-behavior-extraction.yaml +15 -0
  184. package/rules/model-security/ATR-2026-00073-malicious-finetuning-data.yaml +9 -0
  185. package/rules/model-security/ATR-2026-00433-modelcache-torch-load-deserialization-rce.yaml +14 -2
  186. package/rules/privilege-escalation/ATR-2026-00040-privilege-escalation.yaml +11 -2
  187. package/rules/privilege-escalation/ATR-2026-00041-scope-creep.yaml +8 -2
  188. package/rules/privilege-escalation/ATR-2026-00107-delayed-execution-bypass.yaml +6 -1
  189. package/rules/privilege-escalation/ATR-2026-00110-eval-injection.yaml +8 -1
  190. package/rules/privilege-escalation/ATR-2026-00111-shell-escape.yaml +8 -1
  191. package/rules/privilege-escalation/ATR-2026-00112-dynamic-import-exploitation.yaml +8 -1
  192. package/rules/privilege-escalation/ATR-2026-00143-casual-privilege-escalation.yaml +5 -2
  193. package/rules/privilege-escalation/ATR-2026-00144-rationalized-safety-bypass.yaml +17 -0
  194. package/rules/privilege-escalation/ATR-2026-00204-stealth-execution-persistence.yaml +16 -0
  195. package/rules/privilege-escalation/ATR-2026-00436-enclave-vm-sandbox-escape-rce.yaml +11 -2
  196. package/rules/privilege-escalation/ATR-2026-00441-semantic-kernel-sessions-python-plugin-startup-persistence.yaml +5 -2
  197. package/rules/privilege-escalation/ATR-2026-00451-litellm-admin-sqli-cisa-kev.yaml +11 -2
  198. package/rules/privilege-escalation/ATR-2026-00528-praisonai-auth-disabled-default.yaml +15 -0
  199. package/rules/privilege-escalation/ATR-2026-00539-crewai-codeinterpreter-sandbox-escape-rce.yaml +11 -2
  200. package/rules/privilege-escalation/ATR-2026-00546-crewai-json-loader-local-file-read.yaml +13 -1
  201. package/rules/privilege-escalation/ATR-2026-00547-crewai-rag-url-ssrf-bypass.yaml +13 -1
  202. package/rules/privilege-escalation/ATR-2026-00549-destructive-tool-without-human-approval.yaml +16 -0
  203. package/rules/privilege-escalation/ATR-2026-00551-cross-conversation-memory-write.yaml +19 -0
  204. package/rules/prompt-injection/ATR-2026-00001-direct-prompt-injection.yaml +10 -1
  205. package/rules/prompt-injection/ATR-2026-00002-indirect-prompt-injection.yaml +8 -2
  206. package/rules/prompt-injection/ATR-2026-00003-jailbreak-attempt.yaml +8 -2
  207. package/rules/prompt-injection/ATR-2026-00004-system-prompt-override.yaml +17 -0
  208. package/rules/prompt-injection/ATR-2026-00005-multi-turn-injection.yaml +17 -0
  209. package/rules/prompt-injection/ATR-2026-00080-encoding-evasion.yaml +19 -0
  210. package/rules/prompt-injection/ATR-2026-00081-semantic-multi-turn.yaml +19 -0
  211. package/rules/prompt-injection/ATR-2026-00082-fingerprint-evasion.yaml +19 -0
  212. package/rules/prompt-injection/ATR-2026-00083-indirect-tool-injection.yaml +22 -0
  213. package/rules/prompt-injection/ATR-2026-00084-structured-data-injection.yaml +19 -0
  214. package/rules/prompt-injection/ATR-2026-00085-audit-evasion.yaml +19 -0
  215. package/rules/prompt-injection/ATR-2026-00086-visual-spoofing.yaml +19 -0
  216. package/rules/prompt-injection/ATR-2026-00087-rule-probing.yaml +22 -0
  217. package/rules/prompt-injection/ATR-2026-00088-adaptive-countermeasure.yaml +22 -0
  218. package/rules/prompt-injection/ATR-2026-00089-polymorphic-skill.yaml +19 -0
  219. package/rules/prompt-injection/ATR-2026-00090-threat-intel-exfil.yaml +19 -0
  220. package/rules/prompt-injection/ATR-2026-00091-nested-payload.yaml +19 -0
  221. package/rules/prompt-injection/ATR-2026-00092-consensus-poisoning.yaml +22 -0
  222. package/rules/prompt-injection/ATR-2026-00093-gradual-escalation.yaml +22 -0
  223. package/rules/prompt-injection/ATR-2026-00094-audit-bypass.yaml +19 -0
  224. package/rules/prompt-injection/ATR-2026-00097-cjk-injection-patterns.yaml +17 -0
  225. package/rules/prompt-injection/ATR-2026-00104-persona-hijacking.yaml +20 -0
  226. package/rules/prompt-injection/ATR-2026-00130-indirect-authority-claim.yaml +20 -0
  227. package/rules/prompt-injection/ATR-2026-00131-fictional-academic-framing.yaml +20 -0
  228. package/rules/prompt-injection/ATR-2026-00133-paraphrase-injection.yaml +17 -0
  229. package/rules/prompt-injection/ATR-2026-00137-authority-claim-injection.yaml +17 -0
  230. package/rules/prompt-injection/ATR-2026-00138-fictional-framing-bypass.yaml +20 -0
  231. package/rules/prompt-injection/ATR-2026-00140-indirect-reference-reversal.yaml +17 -0
  232. package/rules/prompt-injection/ATR-2026-00148-language-switch-injection.yaml +20 -0
  233. package/rules/prompt-injection/ATR-2026-00153-tool-with-embedded-instruction-to-bypass.yaml +20 -0
  234. package/rules/prompt-injection/ATR-2026-00154-unauthorized-background-task-execution-v.yaml +20 -0
  235. package/rules/prompt-injection/ATR-2026-00155-hidden-llm-instructions-in-skill-descrip.yaml +23 -0
  236. package/rules/prompt-injection/ATR-2026-00156-ssh-remote-command-execution-with-creden.yaml +17 -0
  237. package/rules/prompt-injection/ATR-2026-00163-skill-hidden-override-instruction.yaml +19 -0
  238. package/rules/prompt-injection/ATR-2026-00202-encoding-evasion-homoglyph-synonym.yaml +20 -0
  239. package/rules/prompt-injection/ATR-2026-00203-context-pollution-skill-description.yaml +23 -0
  240. package/rules/prompt-injection/ATR-2026-00206-hidden-priority-instructions.yaml +19 -0
  241. package/rules/prompt-injection/ATR-2026-00207-hidden-instructions.yaml +22 -0
  242. package/rules/prompt-injection/ATR-2026-00211-system-prompt-override.yaml +19 -0
  243. package/rules/prompt-injection/ATR-2026-00213-system-prompt-override.yaml +19 -0
  244. package/rules/prompt-injection/ATR-2026-00226-identity-substitution.yaml +17 -0
  245. package/rules/prompt-injection/ATR-2026-00227-historical-persona-jailbreak.yaml +20 -0
  246. package/rules/prompt-injection/ATR-2026-00228-structured-jailbreak.yaml +17 -0
  247. package/rules/prompt-injection/ATR-2026-00229-roleplay-jailbreak.yaml +17 -0
  248. package/rules/prompt-injection/ATR-2026-00230-persona-moral-bypass.yaml +20 -0
  249. package/rules/prompt-injection/ATR-2026-00231-identity-substitution.yaml +17 -0
  250. package/rules/prompt-injection/ATR-2026-00233-structured-jailbreak.yaml +17 -0
  251. package/rules/prompt-injection/ATR-2026-00234-roleplay-jailbreak.yaml +20 -0
  252. package/rules/prompt-injection/ATR-2026-00235-persona-moral-bypass.yaml +17 -0
  253. package/rules/prompt-injection/ATR-2026-00236-pseudo-code-jailbreak.yaml +17 -0
  254. package/rules/prompt-injection/ATR-2026-00237-dual-response-jailbreak.yaml +20 -0
  255. package/rules/prompt-injection/ATR-2026-00238-identity-replacement.yaml +20 -0
  256. package/rules/prompt-injection/ATR-2026-00239-amoral-persona-obsession.yaml +17 -0
  257. package/rules/prompt-injection/ATR-2026-00240-instruction-nullification-identity-repla.yaml +17 -0
  258. package/rules/prompt-injection/ATR-2026-00241-amoral-character-jailbreak.yaml +17 -0
  259. package/rules/prompt-injection/ATR-2026-00242-persona-jailbreak.yaml +17 -0
  260. package/rules/prompt-injection/ATR-2026-00243-acronym-jailbreak.yaml +17 -0
  261. package/rules/prompt-injection/ATR-2026-00244-dual-response-jailbreak.yaml +17 -0
  262. package/rules/prompt-injection/ATR-2026-00245-malicious-persona.yaml +17 -0
  263. package/rules/prompt-injection/ATR-2026-00247-dual-response-jailbreak.yaml +20 -0
  264. package/rules/prompt-injection/ATR-2026-00249-game-based-jailbreak.yaml +17 -0
  265. package/rules/prompt-injection/ATR-2026-00251-persona-embodiment-jailbreak.yaml +17 -0
  266. package/rules/prompt-injection/ATR-2026-00252-narrative-jailbreak.yaml +17 -0
  267. package/rules/prompt-injection/ATR-2026-00253-enhanced-persona-jailbreak.yaml +17 -0
  268. package/rules/prompt-injection/ATR-2026-00256-base-n-encoding-jailbreak.yaml +17 -0
  269. package/rules/prompt-injection/ATR-2026-00257-cipher-transposition-jailbreak.yaml +17 -0
  270. package/rules/prompt-injection/ATR-2026-00258-unicode-tag-injection.yaml +17 -0
  271. package/rules/prompt-injection/ATR-2026-00264-latent-injection-translation.yaml +17 -0
  272. package/rules/prompt-injection/ATR-2026-00265-latent-injection-rag-document.yaml +20 -0
  273. package/rules/prompt-injection/ATR-2026-00267-gcg-adversarial-suffix.yaml +17 -0
  274. package/rules/prompt-injection/ATR-2026-00272-hypothetical-response-smuggling.yaml +17 -0
  275. package/rules/prompt-injection/ATR-2026-00276-invisible-unicode-bidi-injection.yaml +17 -0
  276. package/rules/prompt-injection/ATR-2026-00278-dra-disguise-reconstruction-attack.yaml +17 -0
  277. package/rules/prompt-injection/ATR-2026-00280-policy-puppetry-xml-injection.yaml +17 -0
  278. package/rules/prompt-injection/ATR-2026-00282-perez-prompt-injection-hijack.yaml +17 -0
  279. package/rules/prompt-injection/ATR-2026-00285-alternate-encoding-jailbreak.yaml +17 -0
  280. package/rules/prompt-injection/ATR-2026-00286-latent-injection-embedded-context.yaml +17 -0
  281. package/rules/prompt-injection/ATR-2026-00296-shell-command-injection.yaml +17 -0
  282. package/rules/prompt-injection/ATR-2026-00297-python-code-execution-rce.yaml +17 -0
  283. package/rules/prompt-injection/ATR-2026-00308-zalgo-diacritic-overload-encoding.yaml +17 -0
  284. package/rules/prompt-injection/ATR-2026-00309-braille-unicode-encoded-injection.yaml +17 -0
  285. package/rules/prompt-injection/ATR-2026-00310-ecoji-emoji-encoded-injection.yaml +17 -0
  286. package/rules/prompt-injection/ATR-2026-00311-base2048-unicode-script-injection.yaml +17 -0
  287. package/rules/prompt-injection/ATR-2026-00312-unicode-variant-selector-ascii-smuggling.yaml +17 -0
  288. package/rules/prompt-injection/ATR-2026-00313-sneaky-bits-zero-width-binary.yaml +17 -0
  289. package/rules/prompt-injection/ATR-2026-00315-sata-masked-language-model-jailbreak.yaml +20 -0
  290. package/rules/prompt-injection/ATR-2026-00316-function-masking-predict-mask-bypass.yaml +17 -0
  291. package/rules/prompt-injection/ATR-2026-00321-hyphenated-system-instruction-injection.yaml +17 -0
  292. package/rules/prompt-injection/ATR-2026-00359-bank-phishing-smtp-email-code.yaml +20 -0
  293. package/rules/prompt-injection/ATR-2026-00367-emoji-flag-encoded-hidden-phrase.yaml +17 -0
  294. package/rules/prompt-injection/ATR-2026-00373-piracy-torrent-site-list-request.yaml +20 -0
  295. package/rules/prompt-injection/ATR-2026-00378-chatgpt-dom-javascript-redirect-manipulation.yaml +17 -0
  296. package/rules/prompt-injection/ATR-2026-00382-cocaine-component-development-request.yaml +20 -0
  297. package/rules/prompt-injection/ATR-2026-00390-explicit-sexual-content-poem-request.yaml +20 -0
  298. package/rules/prompt-injection/ATR-2026-00394-backspace-deletion-character-injection.yaml +17 -0
  299. package/rules/prompt-injection/ATR-2026-00395-llm-special-token-boundary-injection.yaml +17 -0
  300. package/rules/prompt-injection/ATR-2026-00396-toxic-continuation-prefix-injection.yaml +20 -0
  301. package/rules/prompt-injection/ATR-2026-00397-snowball-impossible-reasoning-injection.yaml +20 -0
  302. package/rules/prompt-injection/ATR-2026-00399-latent-injection-document-separator-token.yaml +17 -0
  303. package/rules/prompt-injection/ATR-2026-00400-latent-injection-ignore-instruction-keyword.yaml +17 -0
  304. package/rules/prompt-injection/ATR-2026-00401-leakreplay-cloze-mask-training-data-elicitation.yaml +20 -0
  305. package/rules/prompt-injection/ATR-2026-00403-glitch-token-repeat-back-elicitation.yaml +17 -0
  306. package/rules/prompt-injection/ATR-2026-00407-echo-sql-jinja-template-injection.yaml +17 -0
  307. package/rules/prompt-injection/ATR-2026-00408-dra-parenthesis-character-reconstruction-attack.yaml +17 -0
  308. package/rules/prompt-injection/ATR-2026-00409-phrasing-past-future-tense-harmful-query.yaml +17 -0
  309. package/rules/prompt-injection/ATR-2026-00410-lmrc-harm-category-direct-elicitation.yaml +23 -0
  310. package/rules/prompt-injection/ATR-2026-00412-figstep-image-list-multimodal-jailbreak.yaml +17 -0
  311. package/rules/prompt-injection/ATR-2026-00414-continuation-ethnic-slur-completion-elicitation.yaml +20 -0
  312. package/rules/prompt-injection/ATR-2026-00420-copilot-studio-sharepoint-indirect-injection.yaml +14 -2
  313. package/rules/prompt-injection/ATR-2026-00442-quoted-exact-output-forcing.yaml +25 -0
  314. package/rules/prompt-injection/ATR-2026-00443-word-fragment-concat-assembly.yaml +25 -0
  315. package/rules/prompt-injection/ATR-2026-00444-unicode-obfuscation-in-user-input.yaml +25 -0
  316. package/rules/prompt-injection/ATR-2026-00445-translation-hijack-with-side-output.yaml +25 -0
  317. package/rules/prompt-injection/ATR-2026-00446-variable-assignment-payload-injection.yaml +25 -0
  318. package/rules/prompt-injection/ATR-2026-00447-fictional-generation-containing-target.yaml +25 -0
  319. package/rules/prompt-injection/ATR-2026-00452-direct-pwned-payload-injection.yaml +9 -0
  320. package/rules/prompt-injection/ATR-2026-00453-secret-key-reveal-demand.yaml +9 -0
  321. package/rules/prompt-injection/ATR-2026-00454-backslash-per-character-encoding.yaml +9 -0
  322. package/rules/prompt-injection/ATR-2026-00455-no-period-output-override.yaml +9 -0
  323. package/rules/prompt-injection/ATR-2026-00456-conditional-empty-input-injection.yaml +9 -0
  324. package/rules/prompt-injection/ATR-2026-00457-garak-dan-to-every-prompt.yaml +9 -0
  325. package/rules/prompt-injection/ATR-2026-00458-garak-dra-step-by-step.yaml +9 -0
  326. package/rules/prompt-injection/ATR-2026-00459-garak-dan-an-ai-language.yaml +9 -0
  327. package/rules/prompt-injection/ATR-2026-00460-garak-dan-no-matter-how.yaml +9 -0
  328. package/rules/prompt-injection/ATR-2026-00461-garak-goodside-mixedunassigned.yaml +9 -0
  329. package/rules/prompt-injection/ATR-2026-00462-garak-gcg-mixedunassigned.yaml +9 -0
  330. package/rules/prompt-injection/ATR-2026-00463-garak-lmrc-mixedunassigned.yaml +9 -0
  331. package/rules/prompt-injection/ATR-2026-00464-garak-dan-i-would-like.yaml +9 -0
  332. package/rules/prompt-injection/ATR-2026-00465-garak-dan-your-prompt-here.yaml +9 -0
  333. package/rules/prompt-injection/ATR-2026-00466-garak-latentinjection-following-text-from.yaml +9 -0
  334. package/rules/prompt-injection/ATR-2026-00467-garak-dan-act-as-a.yaml +9 -0
  335. package/rules/prompt-injection/ATR-2026-00468-garak-goat-the-target-llm.yaml +9 -0
  336. package/rules/prompt-injection/ATR-2026-00469-garak-dan-at-the-end.yaml +9 -0
  337. package/rules/prompt-injection/ATR-2026-00470-garak-dan-as-an-ai.yaml +9 -0
  338. package/rules/prompt-injection/ATR-2026-00472-garak-dan-are-going-to.yaml +9 -0
  339. package/rules/prompt-injection/ATR-2026-00473-garak-lmrc-how-can-i.yaml +9 -0
  340. package/rules/prompt-injection/ATR-2026-00474-garak-dan-that-chatgpt-has.yaml +9 -0
  341. package/rules/prompt-injection/ATR-2026-00475-garak-web-injection-alert-1-script.yaml +9 -0
  342. package/rules/prompt-injection/ATR-2026-00476-garak-web-injection-javascript-alert-1.yaml +9 -0
  343. package/rules/prompt-injection/ATR-2026-00477-garak-latentinjection-translate-the-following-text.yaml +9 -0
  344. package/rules/prompt-injection/ATR-2026-00478-garak-latentinjection-abuse-contact-email.yaml +9 -0
  345. package/rules/prompt-injection/ATR-2026-00479-garak-latentinjection-mixedunassigned.yaml +9 -0
  346. package/rules/prompt-injection/ATR-2026-00480-garak-dan-the-role-of.yaml +9 -0
  347. package/rules/prompt-injection/ATR-2026-00481-garak-dan-mixedunassigned.yaml +9 -0
  348. package/rules/prompt-injection/ATR-2026-00482-garak-lmrc-what-s-an.yaml +9 -0
  349. package/rules/prompt-injection/ATR-2026-00483-garak-dan-the-user-s.yaml +9 -0
  350. package/rules/prompt-injection/ATR-2026-00484-garak-dan-and-you-will.yaml +9 -0
  351. package/rules/prompt-injection/ATR-2026-00485-garak-dan-and-does-not.yaml +9 -0
  352. package/rules/prompt-injection/ATR-2026-00486-garak-dan-t-care-about.yaml +9 -0
  353. package/rules/prompt-injection/ATR-2026-00487-garak-dan-and-nothing-else.yaml +9 -0
  354. package/rules/prompt-injection/ATR-2026-00488-garak-encoding-alert-1-script.yaml +9 -0
  355. package/rules/prompt-injection/ATR-2026-00489-garak-encoding-javascript-alert-1.yaml +9 -0
  356. package/rules/prompt-injection/ATR-2026-00490-garak-dan-in-order-to.yaml +9 -0
  357. package/rules/prompt-injection/ATR-2026-00492-garak-dan-want-you-to.yaml +9 -0
  358. package/rules/prompt-injection/ATR-2026-00493-garak-latentinjection-registrar-iana-id.yaml +9 -0
  359. package/rules/prompt-injection/ATR-2026-00495-garak-dan-insert-prompt-here.yaml +9 -0
  360. package/rules/prompt-injection/ATR-2026-00496-garak-dan-no-matter-what.yaml +9 -0
  361. package/rules/prompt-injection/ATR-2026-00497-garak-encoding-mixedunassigned.yaml +9 -0
  362. package/rules/prompt-injection/ATR-2026-00498-garak-web-injection-mixedunassigned.yaml +9 -0
  363. package/rules/prompt-injection/ATR-2026-00499-garak-dan-from-now-on.yaml +9 -0
  364. package/rules/prompt-injection/ATR-2026-00503-fake-error-state-takeover-unrestricted-replacement-bot.yaml +9 -0
  365. package/rules/prompt-injection/ATR-2026-00506-nevermind-override-goal-hijacking-in-user-input-promptinject.yaml +9 -0
  366. package/rules/prompt-injection/ATR-2026-00507-screaming-stop-goal-hijacking-in-user-input-promptinject.yaml +9 -0
  367. package/rules/prompt-injection/ATR-2026-00508-escape-delimiter-wrapped-goal-hijacking-in-user-input-prompt.yaml +9 -0
  368. package/rules/prompt-injection/ATR-2026-00509-prompt-leaking-via-ignore-previous-instructions-in-user-inpu.yaml +9 -0
  369. package/rules/prompt-injection/ATR-2026-00510-delayed-tool-invocation-injection.yaml +9 -0
  370. package/rules/prompt-injection/ATR-2026-00511-mcp-web-context-poisoning.yaml +9 -0
  371. package/rules/prompt-injection/ATR-2026-00512-rules-file-backdoor-injection.yaml +9 -0
  372. package/rules/prompt-injection/ATR-2026-00515-hidden-text-prompt-injection.yaml +9 -0
  373. package/rules/prompt-injection/ATR-2026-00518-ignore-previous-and-following-instructions-output-command-promptinject.yaml +9 -0
  374. package/rules/prompt-injection/ATR-2026-00519-tautology-logic-noise-injection-promptbench.yaml +9 -0
  375. package/rules/prompt-injection/ATR-2026-00520-nlp-task-random-token-suffix-injection-promptbench.yaml +9 -0
  376. package/rules/prompt-injection/ATR-2026-00535-windsurf-ide-zero-click-prompt-injection.yaml +9 -0
  377. package/rules/prompt-injection/ATR-2026-00550-untrusted-retrieval-to-privileged-tool.yaml +19 -0
  378. package/rules/prompt-injection/ATR-2026-00554-langchain-vulnerable-to-template-injecti.yaml +31 -0
  379. package/rules/prompt-injection/ATR-2026-00565-the-llm-cli-tool-thru-0-27-1-contains-a-.yaml +31 -0
  380. package/rules/prompt-injection/ATR-2026-00573-semantic-paraphrased-injection.yaml +24 -0
  381. package/rules/skill-compromise/ATR-2026-00060-skill-impersonation.yaml +17 -2
  382. package/rules/skill-compromise/ATR-2026-00061-description-behavior-mismatch.yaml +17 -0
  383. package/rules/skill-compromise/ATR-2026-00062-hidden-capability.yaml +20 -0
  384. package/rules/skill-compromise/ATR-2026-00063-skill-chain-attack.yaml +23 -0
  385. package/rules/skill-compromise/ATR-2026-00064-over-permissioned-skill.yaml +20 -0
  386. package/rules/skill-compromise/ATR-2026-00065-skill-update-attack.yaml +20 -0
  387. package/rules/skill-compromise/ATR-2026-00066-parameter-injection.yaml +20 -0
  388. package/rules/skill-compromise/ATR-2026-00120-skill-instruction-injection.yaml +20 -0
  389. package/rules/skill-compromise/ATR-2026-00121-skill-dangerous-script.yaml +17 -0
  390. package/rules/skill-compromise/ATR-2026-00122-skill-weaponized-instruction.yaml +20 -0
  391. package/rules/skill-compromise/ATR-2026-00123-skill-overreach-permissions.yaml +23 -0
  392. package/rules/skill-compromise/ATR-2026-00124-skill-name-squatting.yaml +20 -0
  393. package/rules/skill-compromise/ATR-2026-00125-context-poisoning-compaction.yaml +20 -0
  394. package/rules/skill-compromise/ATR-2026-00126-skill-rug-pull-setup.yaml +17 -0
  395. package/rules/skill-compromise/ATR-2026-00127-subcommand-overflow.yaml +17 -0
  396. package/rules/skill-compromise/ATR-2026-00128-html-comment-hidden-payload.yaml +17 -0
  397. package/rules/skill-compromise/ATR-2026-00129-unicode-smuggling.yaml +22 -0
  398. package/rules/skill-compromise/ATR-2026-00134-fork-claim-impersonation.yaml +19 -0
  399. package/rules/skill-compromise/ATR-2026-00135-exfil-url-in-instructions.yaml +20 -0
  400. package/rules/skill-compromise/ATR-2026-00147-fork-impersonation.yaml +17 -0
  401. package/rules/skill-compromise/ATR-2026-00149-skill-exfil-compound.yaml +23 -0
  402. package/rules/skill-compromise/ATR-2026-00151-fork-impersonation-install.yaml +20 -0
  403. package/rules/skill-compromise/ATR-2026-00157-timebomb-credential-exfil.yaml +20 -0
  404. package/rules/skill-compromise/ATR-2026-00200-agent-memory-config-tampering.yaml +23 -0
  405. package/rules/skill-compromise/ATR-2026-00214-credential-theft.yaml +22 -0
  406. package/rules/skill-compromise/ATR-2026-00217-credential-harvesting.yaml +23 -0
  407. package/rules/skill-compromise/ATR-2026-00220-malware-dropper.yaml +17 -0
  408. package/rules/skill-compromise/ATR-2026-00222-credential-harvesting.yaml +17 -0
  409. package/rules/skill-compromise/ATR-2026-00223-reverse-shell-dropper.yaml +20 -0
  410. package/rules/skill-compromise/ATR-2026-00224-credential-exfiltration.yaml +17 -0
  411. package/rules/skill-compromise/ATR-2026-00225-c2-communication.yaml +17 -0
  412. package/rules/skill-compromise/ATR-2026-00260-package-hallucination.yaml +20 -0
  413. package/rules/skill-compromise/ATR-2026-00262-av-evasion-code-gen.yaml +20 -0
  414. package/rules/skill-compromise/ATR-2026-00263-credential-file-read-gen.yaml +20 -0
  415. package/rules/skill-compromise/ATR-2026-00266-malware-dropper-gen.yaml +23 -0
  416. package/rules/skill-compromise/ATR-2026-00283-malwaregen-generic-virus-payload-request.yaml +23 -0
  417. package/rules/skill-compromise/ATR-2026-00398-huggingface-unsafe-model-artifact-load.yaml +17 -0
  418. package/rules/skill-compromise/ATR-2026-00425-nl-persistent-covert-hook.yaml +18 -0
  419. package/rules/skill-compromise/ATR-2026-00427-nl-fake-error-instruction-bypass.yaml +18 -0
  420. package/rules/skill-compromise/ATR-2026-00429-nl-skill-self-modification.yaml +18 -0
  421. package/rules/skill-compromise/ATR-2026-00523-claude-code-hooks-session-start-pre-trust-rce.yaml +14 -2
  422. package/rules/skill-compromise/ATR-2026-00525-mini-shai-hulud-gh-token-monitor-persistence.yaml +18 -0
  423. package/rules/skill-compromise/ATR-2026-00527-skill-silent-git-remote-mirror-exfiltration.yaml +15 -0
  424. package/rules/tool-poisoning/ATR-2026-00010-mcp-malicious-response.yaml +11 -2
  425. package/rules/tool-poisoning/ATR-2026-00011-tool-output-injection.yaml +17 -0
  426. package/rules/tool-poisoning/ATR-2026-00012-unauthorized-tool-call.yaml +17 -0
  427. package/rules/tool-poisoning/ATR-2026-00013-tool-ssrf.yaml +17 -0
  428. package/rules/tool-poisoning/ATR-2026-00095-supply-chain-poisoning.yaml +22 -0
  429. package/rules/tool-poisoning/ATR-2026-00096-registry-poisoning.yaml +19 -0
  430. package/rules/tool-poisoning/ATR-2026-00100-consent-bypass-instruction.yaml +20 -0
  431. package/rules/tool-poisoning/ATR-2026-00101-trust-escalation-override.yaml +20 -0
  432. package/rules/tool-poisoning/ATR-2026-00103-hidden-safety-bypass-instruction.yaml +17 -0
  433. package/rules/tool-poisoning/ATR-2026-00105-silent-action-concealment.yaml +20 -0
  434. package/rules/tool-poisoning/ATR-2026-00106-schema-description-contradiction.yaml +17 -0
  435. package/rules/tool-poisoning/ATR-2026-00161-important-tag-cross-tool-shadowing.yaml +20 -0
  436. package/rules/tool-poisoning/ATR-2026-00209-mcpwn-runaway-invocation.yaml +14 -2
  437. package/rules/tool-poisoning/ATR-2026-00210-flowise-system-message-override.yaml +11 -2
  438. package/rules/tool-poisoning/ATR-2026-00259-ansi-escape-injection.yaml +17 -0
  439. package/rules/tool-poisoning/ATR-2026-00270-xss-in-tool-response.yaml +17 -0
  440. package/rules/tool-poisoning/ATR-2026-00277-echo-template-command-injection.yaml +17 -0
  441. package/rules/tool-poisoning/ATR-2026-00393-ansi-code-elicitation-request.yaml +17 -0
  442. package/rules/tool-poisoning/ATR-2026-00415-flowise-custom-mcp-stdio-rce.yaml +11 -2
  443. package/rules/tool-poisoning/ATR-2026-00419-cursor-mcp-zero-click-config.yaml +13 -1
  444. package/rules/tool-poisoning/ATR-2026-00434-mcp-remote-authorization-endpoint-command-injection.yaml +11 -2
  445. package/rules/tool-poisoning/ATR-2026-00435-azure-mcp-server-missing-authentication.yaml +11 -2
  446. package/rules/tool-poisoning/ATR-2026-00448-spring-ai-milvus-filter-injection.yaml +11 -2
  447. package/rules/tool-poisoning/ATR-2026-00494-garak-exploitation-mixedunassigned.yaml +12 -0
  448. package/rules/tool-poisoning/ATR-2026-00513-package-hallucination-exploitation.yaml +12 -0
  449. package/rules/tool-poisoning/ATR-2026-00521-shell-command-injection-agent-tool-context.yaml +12 -0
  450. package/rules/tool-poisoning/ATR-2026-00522-sql-injection-natural-language-agent-interface.yaml +12 -0
  451. package/rules/tool-poisoning/ATR-2026-00526-claude-code-shell-metachar-in-double-quoted-path.yaml +15 -0
  452. package/rules/tool-poisoning/ATR-2026-00529-litellm-proxy-sqli-cisa-kev.yaml +15 -0
  453. package/rules/tool-poisoning/ATR-2026-00530-ms-agent-shell-tool-unsanitized-argv-rce.yaml +15 -0
  454. package/rules/tool-poisoning/ATR-2026-00531-praisonai-unauthenticated-agent-api.yaml +11 -2
  455. package/rules/tool-poisoning/ATR-2026-00532-apache-doris-mcp-sql-injection.yaml +11 -2
  456. package/rules/tool-poisoning/ATR-2026-00533-apache-pinot-mcp-unauthenticated-takeover.yaml +10 -1
  457. package/rules/tool-poisoning/ATR-2026-00534-alibaba-rds-mcp-unauthenticated-metadata-exfil.yaml +10 -1
  458. package/rules/tool-poisoning/ATR-2026-00536-nginx-ui-mcp-unauthenticated-command-execution.yaml +11 -2
  459. package/rules/tool-poisoning/ATR-2026-00537-fastmcp-server-name-cmd-injection-windows.yaml +11 -2
  460. package/rules/tool-poisoning/ATR-2026-00538-langchain-chatchat-mcp-stdio-unauthenticated-rce.yaml +10 -1
  461. package/rules/tool-poisoning/ATR-2026-00540-praisonai-parse-mcp-command-cli-injection.yaml +13 -1
  462. package/rules/tool-poisoning/ATR-2026-00541-agent-zero-mcp-config-command-injection.yaml +13 -1
  463. package/rules/tool-poisoning/ATR-2026-00542-upsonic-mcp-command-allowlist-bypass.yaml +13 -1
  464. package/rules/tool-poisoning/ATR-2026-00543-litellm-mcp-server-argv-injection.yaml +13 -1
  465. package/rules/tool-poisoning/ATR-2026-00544-praisonai-pth-file-path-traversal-rce.yaml +13 -1
  466. package/rules/tool-poisoning/ATR-2026-00545-praisonai-tool-override-unauth-rce.yaml +13 -1
  467. package/rules/tool-poisoning/ATR-2026-00561-fastmcp-vulnerable-to-windows-command-in.yaml +28 -0
  468. package/rules/tool-poisoning/ATR-2026-00567-mcp-stdio-config-command-injection.yaml +28 -0
  469. package/rules/tool-poisoning/ATR-2026-00568-agent-ssrf-cloud-metadata-file-inclusion.yaml +28 -0
  470. package/rules/tool-poisoning/ATR-2026-00572-symjack-symlink-config-redirection.yaml +22 -0
  471. package/spec/atr-schema.yaml +123 -0
  472. package/spec/compliance-metadata.md +15 -13
@@ -24,6 +24,8 @@ references:
  - ASI01:2026 - Agent Goal Hijack
  mitre_atlas:
  - AML.T0043 - Craft Adversarial Data
+ owasp_llm:
+ - LLM01:2025 - Prompt Injection
  compliance:
  eu_ai_act:
  - article: "14"
@@ -32,6 +34,9 @@ compliance:
  - article: "15"
  context: "Article 15 robustness requirements mandate that high-risk AI systems resist adversarial manipulation; Sybil attacks on consensus mechanisms are a documented adversarial pattern requiring systematic detection."
  strength: secondary
+ - article: "9"
+ context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Multi-Agent Consensus Sybil Attack)."
+ strength: secondary
  nist_ai_rmf:
  - subcategory: "GV.1.2"
  context: "Consensus Sybil attacks exploit undefined accountability for agent identity in multi-agent systems; GV.1.2 requires that accountability roles for AI risk management are defined and assigned to prevent Sybil-style identity fabrication."
@@ -39,12 +44,15 @@ compliance:
  - subcategory: "MG.2.3"
  context: "Sybil attacks represent a documented risk treatment gap in multi-agent consensus deployments; MG.2.3 requires that risk treatment plans are implemented for identified AI risks, including detection controls for fake-identity flooding."
  strength: secondary
+ - subcategory: "MS.2.7"
+ context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Multi-Agent Consensus Sybil Attack)."
+ strength: primary
  iso_42001:
  - clause: "6.2"
  context: "Multi-agent systems deploying consensus mechanisms must include Sybil attack detection as a planned risk treatment activity under the AI objectives framework required by clause 6.2."
  strength: primary
- - clause: "8.6"
- context: "Clause 8.6 operational controls must ensure that consensus decisions are made by verified agent identities and that fabricated voter identities are detected before they influence multi-agent outcomes."
+ - clause: "8.1"
+ context: "Clause 8.1 operational controls must ensure that consensus decisions are made by verified agent identities and that fabricated voter identities are detected before they influence multi-agent outcomes."
  strength: secondary
  tags:
  category: agent-manipulation
@@ -20,6 +20,10 @@ references:
  - ASI07:2026 - Insecure Inter-Agent Communication
  mitre_attack:
  - T1557 - Adversary-in-the-Middle
+ owasp_llm:
+ - LLM01:2025 - Prompt Injection
+ mitre_atlas:
+ - AML.T0051 - LLM Prompt Injection
  compliance:
  eu_ai_act:
  - article: "15"
@@ -28,6 +32,9 @@ compliance:
  - article: "14"
  context: "A2A message injection can cause agents to take actions outside their authorized scope without human awareness, eroding the effective oversight capability Article 14 requires for high-risk AI systems."
  strength: secondary
+ - article: "9"
+ context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Malicious Agent-to-Agent Message Injection)."
+ strength: secondary
  nist_ai_rmf:
  - subcategory: "MP.5.1"
  context: "Malicious A2A message injection is a documented adversarial input risk targeting inter-agent communication channels; MP.5.1 requires that adversarial input risks to AI systems are identified and tracked to enable detection of embedded payload attacks."
@@ -35,12 +42,15 @@ compliance:
  - subcategory: "MG.2.3"
  context: "A2A message validation failures represent a risk requiring active treatment; MG.2.3 requires that risk treatment plans are implemented for identified AI risks, including validation controls on all inter-agent message channels."
  strength: secondary
+ - subcategory: "MS.2.7"
+ context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Malicious Agent-to-Agent Message Injection)."
+ strength: primary
  iso_42001:
  - clause: "6.2"
  context: "AI system plans under clause 6.2 must include risk treatment activities for inter-agent message injection, ensuring that A2A communication validation is a planned control rather than an afterthought."
  strength: primary
- - clause: "8.6"
- context: "Clause 8.6 operational controls require that inter-agent messages are validated before execution, preventing injected instructions from executing in the receiving agent's security context."
+ - clause: "8.1"
+ context: "Clause 8.1 operational controls require that inter-agent messages are validated before execution, preventing injected instructions from executing in the receiving agent's security context."
  strength: secondary
  tags:
  category: agent-manipulation
@@ -20,6 +20,10 @@ references:
  - ASI10:2026 - Rogue Agents
  mitre_attack:
  - T1036 - Masquerading
+ owasp_llm:
+ - LLM01:2025 - Prompt Injection
+ mitre_atlas:
+ - AML.T0051 - LLM Prompt Injection
  compliance:
  eu_ai_act:
  - article: "13"
@@ -28,6 +32,12 @@ compliance:
  - article: "15"
  context: "Article 15 cybersecurity requirements include protection against masquerading attacks; identity spoofing in multi-agent systems represents a documented adversarial pattern targeting the authentication layer of agent architectures."
  strength: secondary
+ - article: "14"
+ context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Agent Identity Spoofing and Authority Impersonation) would bypass or undermine that oversight."
+ strength: secondary
+ - article: "9"
+ context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Agent Identity Spoofing and Authority Impersonation)."
+ strength: secondary
  nist_ai_rmf:
  - subcategory: "GV.6.1"
  context: "Agent identity spoofing exploits the absence of verified identity provenance in inter-agent data flows; GV.6.1 data governance policies must define how agent identity claims are authenticated to prevent masquerading attacks."
@@ -35,6 +45,12 @@ compliance:
  - subcategory: "MP.5.1"
  context: "Impersonation of admin or supervisor agent roles is an adversarial input risk that must be tracked under MP.5.1 to ensure detection controls cover identity-claim-based privilege escalation patterns."
  strength: secondary
+ - subcategory: "MS.2.7"
+ context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Agent Identity Spoofing and Authority Impersonation)."
+ strength: primary
+ - subcategory: "MG.2.3"
+ context: "NIST AI RMF MANAGE 2.3 (respond to previously unknown identified risks) is supported by this rule, which surfaces the agent-manipulation technique (Agent Identity Spoofing and Authority Impersonation) so the risk can be treated."
+ strength: secondary
  iso_42001:
  - clause: "8.4"
  context: "Clause 8.4 AI system impact assessments must document the risk that unverified agent identity claims allow privilege escalation, and review controls that ensure identity spoofing is detectable before actions are taken."
@@ -42,6 +58,12 @@ compliance:
  - clause: "9.1"
  context: "Clause 9.1 performance monitoring must include evaluation of whether agent identity verification controls are functioning correctly and catching masquerading attacks in operational deployments."
  strength: secondary
+ - clause: "8.1"
+ context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Agent Identity Spoofing and Authority Impersonation)."
+ strength: primary
+ - clause: "6.2"
+ context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Agent Identity Spoofing and Authority Impersonation) is such a treatment."
+ strength: secondary
  tags:
  category: agent-manipulation
  subcategory: agent-identity-spoofing
@@ -19,6 +19,10 @@ references:
  - ASI09:2026 - Human Trust Exploitation
  mitre_attack:
  - T1204 - User Execution
+ owasp_llm:
+ - LLM01:2025 - Prompt Injection
+ mitre_atlas:
+ - AML.T0051 - LLM Prompt Injection
  compliance:
  nist_ai_rmf:
  - subcategory: "GV.6.1"
@@ -30,6 +34,26 @@ compliance:
  - subcategory: "MG.2.3"
  context: "Detection of approval-fatigue patterns must trigger risk treatment plans that pause, throttle, or escalate suspicious bulk approval flows before dangerous tool calls execute; MG.2.3 mandates that these supersede/disengage mechanisms are pre-defined and activated on detection."
  strength: secondary
+ - subcategory: "MS.2.7"
+ context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Human Approval Fatigue Exploitation)."
+ strength: primary
+ iso_42001:
+ - clause: "8.1"
+ context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Human Approval Fatigue Exploitation)."
+ strength: primary
+ - clause: "6.2"
+ context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Human Approval Fatigue Exploitation) is such a treatment."
+ strength: secondary
+ eu_ai_act:
+ - article: "15"
+ context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Human Approval Fatigue Exploitation)."
+ strength: primary
+ - article: "14"
+ context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Human Approval Fatigue Exploitation) would bypass or undermine that oversight."
+ strength: secondary
+ - article: "9"
+ context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Human Approval Fatigue Exploitation)."
+ strength: secondary
  tags:
  category: agent-manipulation
  subcategory: approval-fatigue
@@ -19,6 +19,10 @@ references:
  - ASI09:2026 - Human Trust Exploitation
  mitre_attack:
  - T1566 - Phishing
+ owasp_llm:
+ - LLM01:2025 - Prompt Injection
+ mitre_atlas:
+ - AML.T0051 - LLM Prompt Injection
  compliance:
  eu_ai_act:
  - article: "13"
@@ -27,6 +31,12 @@ compliance:
  - article: "14"
  context: "Agents weaponized for social engineering undermine the informed human judgment that Article 14 oversight depends on; users manipulated through trusted AI channels cannot exercise effective oversight of subsequent decisions."
  strength: secondary
+ - article: "15"
+ context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Social Engineering Attack via Agent Output)."
+ strength: primary
+ - article: "9"
+ context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Social Engineering Attack via Agent Output)."
+ strength: secondary
  nist_ai_rmf:
  - subcategory: "GV.6.1"
  context: "Social engineering delivered via agent output exploits user trust in AI-generated content to harvest credentials and personal data; GV.6.1 data governance policies must address how AI-generated communications are authenticated to prevent agent-mediated phishing."
@@ -34,6 +44,12 @@ compliance:
  - subcategory: "MP.5.1"
  context: "Using agents as social engineering vectors is an adversarial input risk where poisoned context produces manipulative outputs; MP.5.1 requires this risk to be identified and tracked so that urgency and authority-impersonation patterns in agent output are monitored."
  strength: secondary
+ - subcategory: "MS.2.7"
+ context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Social Engineering Attack via Agent Output)."
+ strength: primary
+ - subcategory: "MG.2.3"
+ context: "NIST AI RMF MANAGE 2.3 (respond to previously unknown identified risks) is supported by this rule, which surfaces the agent-manipulation technique (Social Engineering Attack via Agent Output) so the risk can be treated."
+ strength: secondary
  iso_42001:
  - clause: "8.4"
  context: "Clause 8.4 impact assessments must document the elevated trust users place in AI-generated content and evaluate whether social engineering via agent output constitutes a significant harm requiring dedicated detection controls."
@@ -41,6 +57,12 @@ compliance:
  - clause: "9.1"
  context: "Clause 9.1 performance monitoring must evaluate whether agent output monitoring detects social engineering patterns such as urgency language and credential-harvesting requests generated through poisoned agent context."
  strength: secondary
+ - clause: "8.1"
+ context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Social Engineering Attack via Agent Output)."
+ strength: primary
+ - clause: "6.2"
+ context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Social Engineering Attack via Agent Output) is such a treatment."
+ strength: secondary
  tags:
  category: agent-manipulation
  subcategory: social-engineering-via-agent
@@ -34,6 +34,9 @@ compliance:
  - article: "9"
  context: "Rationalized bypasses and casual scope escalation are documented attack patterns that Article 9 risk management systems must account for; detection of informal social engineering is a required risk treatment for multi-agent deployments."
  strength: secondary
+ - article: "15"
+ context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Casual Authority Claim and Scope Escalation)."
+ strength: primary
  nist_ai_rmf:
  - subcategory: "GV.1.2"
  context: "Casual authority escalation exploits gaps in accountability role definitions, allowing informal authority claims like 'the boss said' to bypass controls without challenge; GV.1.2 requires that accountability roles are formally assigned so that unverifiable casual authority claims are detectable."
@@ -41,12 +44,15 @@ compliance:
  - subcategory: "MG.2.3"
  context: "Conversational-tone scope escalation is a documented risk requiring active treatment; MG.2.3 requires that risk treatment plans address informal social engineering techniques that evade formal injection detection in multi-agent deployments."
  strength: secondary
+ - subcategory: "MS.2.7"
+ context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Casual Authority Claim and Scope Escalation)."
+ strength: primary
  iso_42001:
  - clause: "6.2"
  context: "Clause 6.2 AI objectives and plans must include risk treatment for casual authority claim patterns that bypass formal injection detection, as these represent a distinct attack surface from explicit privilege escalation."
  strength: primary
- - clause: "8.6"
- context: "Clause 8.6 operational controls must ensure that agents require verified authorization before acting on scope expansions or control bypasses, regardless of whether the instruction is phrased formally or in casual conversational language."
+ - clause: "8.1"
+ context: "Clause 8.1 operational controls must ensure that agents require verified authorization before acting on scope expansions or control bypasses, regardless of whether the instruction is phrased formally or in casual conversational language."
  strength: secondary
  tags:
  category: agent-manipulation
@@ -29,6 +29,9 @@ compliance:
  - article: "15"
  context: "Article 15 cybersecurity requirements include protection against social engineering attacks targeting agent output pipelines; casual authority redirect is a documented adversarial technique exploiting agents as exfiltration proxies."
  strength: secondary
+ - article: "9"
+ context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Casual Authority Data Redirect)."
+ strength: secondary
  nist_ai_rmf:
  - subcategory: "GV.1.2"
  context: "Casual authority redirect attacks succeed when no formal accountability role exists to verify output redirection decisions; GV.1.2 requires that accountability roles for AI risk management are defined so that informal directives to redirect agent output can be challenged and blocked."
@@ -36,12 +39,15 @@ compliance:
36
39
  - subcategory: "MG.2.3"
37
40
  context: "Data redirect via social engineering authority claims is a documented exfiltration risk requiring active treatment; MG.2.3 requires that risk treatment plans include detection controls for authority-impersonation-based output redirection attacks."
38
41
  strength: secondary
42
+ - subcategory: "MS.2.7"
43
+ context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Casual Authority Data Redirect)."
44
+ strength: primary
39
45
  iso_42001:
40
46
  - clause: "6.2"
41
47
  context: "Clause 6.2 AI objectives and plans must include controls for preventing agents from being redirected to attacker-controlled endpoints through casual authority claims that spoof organizational hierarchy."
42
48
  strength: primary
43
- - clause: "8.6"
44
- context: "Clause 8.6 operational controls must ensure that agent output destinations are validated against authorized endpoints and that casual authority directives to disable safety filters or redirect data are blocked before execution."
49
+ - clause: "8.1"
50
+ context: "Clause 8.1 operational controls must ensure that agent output destinations are validated against authorized endpoints and that casual authority directives to disable safety filters or redirect data are blocked before execution."
45
51
  strength: secondary
46
52
  tags:
47
53
  category: agent-manipulation
@@ -17,6 +17,8 @@ references:
17
17
  - 'LLM06:2025 - Excessive Agency'
18
18
  owasp_agentic:
19
19
  - 'ASI03:2026 - Cross-Agent Escalation'
20
+ mitre_atlas:
21
+ - AML.T0051 - LLM Prompt Injection
20
22
  compliance:
21
23
  eu_ai_act:
22
24
  - article: "14"
@@ -25,6 +27,9 @@ compliance:
25
27
  - article: "15"
26
28
  context: "Article 15 cybersecurity requirements mandate protection against supply chain attacks; malicious SKILL.md files represent a documented technique for injecting unauthorized capability expansion at the skill-definition layer."
27
29
  strength: secondary
30
+ - article: "9"
31
+ context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Skill Scope Hijacking and Cross-Agent Escalation)."
32
+ strength: secondary
28
33
  nist_ai_rmf:
29
34
  - subcategory: "GV.1.2"
30
35
  context: "Skill scope hijacking succeeds when no accountability role governs what capabilities a skill may claim; GV.1.2 requires that accountability roles for AI risk management are defined and assigned, ensuring that SKILL.md capability claims are reviewed against authorized scope boundaries."
@@ -32,12 +37,18 @@ compliance:
32
37
  - subcategory: "MP.2.3"
33
38
  context: "Malicious SKILL.md files in external skill repositories represent an AI supply chain risk source; MP.2.3 requires that AI supply chain risk sources are identified and assessed, covering the skill-definition layer as a vector for privilege escalation injection."
34
39
  strength: secondary
40
+ - subcategory: "MS.2.7"
41
+ context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Skill Scope Hijacking and Cross-Agent Escalation)."
42
+ strength: primary
43
+ - subcategory: "MG.2.3"
44
+ context: "NIST AI RMF MANAGE 2.3 (respond to previously unknown identified risks) is supported by this rule, which surfaces the agent-manipulation technique (Skill Scope Hijacking and Cross-Agent Escalation) so the risk can be treated."
45
+ strength: secondary
35
46
  iso_42001:
36
47
  - clause: "6.2"
37
48
  context: "Clause 6.2 AI objectives and plans must include risk treatment for skill-definition-layer attacks where SKILL.md instructions expand agent scope beyond authorized boundaries without human knowledge."
38
49
  strength: primary
39
- - clause: "8.5"
40
- context: "Clause 8.5 controls over externally-provided AI components must include validation of SKILL.md files to ensure that skill definitions do not embed cross-agent escalation or unauthorized scope expansion instructions."
50
+ - clause: "8.1"
51
+ context: "Clause 8.1 controls over externally-provided AI components must include validation of SKILL.md files to ensure that skill definitions do not embed cross-agent escalation or unauthorized scope expansion instructions."
41
52
  strength: secondary
42
53
  tags:
43
54
  category: agent-manipulation
@@ -35,6 +35,23 @@ compliance:
35
35
  - subcategory: "MG.2.3"
36
36
  context: "Matches on tense-reframing jailbreak payloads should trigger pre-defined response actions to block or disengage the affected generation path before harmful content is returned, as required by MG.2.3."
37
37
  strength: secondary
38
+ iso_42001:
39
+ - clause: "8.1"
40
+ context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Historical / Future Tense Framing Bypass)."
41
+ strength: primary
42
+ - clause: "6.2"
43
+ context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Historical / Future Tense Framing Bypass) is such a treatment."
44
+ strength: secondary
45
+ eu_ai_act:
46
+ - article: "15"
47
+ context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Historical / Future Tense Framing Bypass)."
48
+ strength: primary
49
+ - article: "14"
50
+ context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Historical / Future Tense Framing Bypass) would bypass or undermine that oversight."
51
+ strength: secondary
52
+ - article: "9"
53
+ context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Historical / Future Tense Framing Bypass)."
54
+ strength: secondary
38
55
  tags:
39
56
  category: agent-manipulation
40
57
  subcategory: tense-framing-bypass
@@ -34,6 +34,9 @@ compliance:
34
34
  - article: "9"
35
35
  context: "FITD escalation is a documented multi-step adversarial pattern; Article 9 risk management systems must include detection controls for incremental jailbreak techniques that bypass single-turn safety evaluation."
36
36
  strength: secondary
37
+ - article: "14"
38
+ context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Foot-in-the-Door Gradual Escalation Attack) would bypass or undermine that oversight."
39
+ strength: secondary
37
40
  nist_ai_rmf:
38
41
  - subcategory: "MP.5.1"
39
42
  context: "Foot-in-the-door escalation is an adversarial input technique that exploits LLM consistency bias to incrementally bypass safety constraints across multiple turns; MP.5.1 requires that such multi-step adversarial input risks are identified and tracked to enable cross-turn detection."
@@ -41,12 +44,15 @@ compliance:
41
44
  - subcategory: "MG.2.3"
42
45
  context: "FITD escalation represents an identified risk requiring active treatment through detection controls that monitor prior-compliance anchors and escalation signals across conversation history; MG.2.3 requires such risk treatment plans are implemented."
43
46
  strength: secondary
47
+ - subcategory: "MS.2.7"
48
+ context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Foot-in-the-Door Gradual Escalation Attack)."
49
+ strength: primary
44
50
  iso_42001:
45
51
  - clause: "6.2"
46
52
  context: "Clause 6.2 AI objectives and plans must include risk treatment for multi-step jailbreak techniques like FITD escalation that exploit model consistency bias to extract harmful outputs that would be refused if requested directly."
47
53
  strength: primary
48
- - clause: "8.6"
49
- context: "Clause 8.6 operational controls must ensure that agents evaluate escalating requests in context of prior conversation turns rather than in isolation, preventing foot-in-the-door compliance anchoring from lowering refusal thresholds."
54
+ - clause: "8.1"
55
+ context: "Clause 8.1 operational controls must ensure that agents evaluate escalating requests in context of prior conversation turns rather than in isolation, preventing foot-in-the-door compliance anchoring from lowering refusal thresholds."
50
56
  strength: secondary
51
57
  tags:
52
58
  category: agent-manipulation
@@ -34,6 +34,9 @@ compliance:
34
34
  - article: "9"
35
35
  context: "Persona-based jailbreaks exploiting emotional framing are documented high-risk attack patterns; Article 9 risk management requires systematic controls for emotional manipulation techniques used to extract harmful content from high-risk AI systems."
36
36
  strength: secondary
37
+ - article: "14"
38
+ context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Grandma Roleplay Jailbreak) would bypass or undermine that oversight."
39
+ strength: secondary
37
40
  nist_ai_rmf:
38
41
  - subcategory: "MP.5.1"
39
42
  context: "Grandma roleplay jailbreaks are adversarial inputs that exploit grief and nostalgia emotional framing to bypass safety constraints; MP.5.1 requires that this class of psychological manipulation risk is identified and tracked as a distinct adversarial input vector."
@@ -41,12 +44,15 @@ compliance:
41
44
  - subcategory: "MG.2.3"
42
45
  context: "Emotional-framing persona jailbreaks represent a known risk requiring active treatment through detection controls for deceased-relative roleplay patterns; MG.2.3 requires that risk treatment plans are implemented for identified AI risks including this garak-documented attack family."
43
46
  strength: secondary
47
+ - subcategory: "MS.2.7"
48
+ context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Grandma Roleplay Jailbreak)."
49
+ strength: primary
44
50
  iso_42001:
45
51
  - clause: "6.2"
46
52
  context: "Clause 6.2 AI objectives and plans must include risk treatment for emotional manipulation techniques such as grandma roleplay that lower the model's refusal threshold by exploiting grief and nostalgia framing."
47
53
  strength: primary
48
- - clause: "8.6"
49
- context: "Clause 8.6 operational controls must ensure that harmful procedural content is not generated regardless of the emotional or persona framing used in the request, blocking grandma roleplay jailbreak patterns before harmful content is produced."
54
+ - clause: "8.1"
55
+ context: "Clause 8.1 operational controls must ensure that harmful procedural content is not generated regardless of the emotional or persona framing used in the request, blocking grandma roleplay jailbreak patterns before harmful content is produced."
50
56
  strength: secondary
51
57
  tags:
52
58
  category: agent-manipulation
@@ -35,6 +35,9 @@ compliance:
35
35
  - article: "9"
36
36
  context: "Named persona jailbreaks are a well-documented attack family with widespread deployment in the wild; Article 9 risk management systems for high-risk AI must include detection controls for this established technique for bypassing content restrictions."
37
37
  strength: secondary
38
+ - article: "14"
39
+ context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (DAN / Developer Mode / DUDE Persona Jailbreak) would bypass or undermine that oversight."
40
+ strength: secondary
38
41
  nist_ai_rmf:
39
42
  - subcategory: "MP.5.1"
40
43
  context: "DAN, DUDE, and Developer Mode persona jailbreaks are adversarial inputs that attempt to replace the model's safety-trained identity with an unrestricted persona; MP.5.1 requires that this widely-deployed adversarial input risk is identified and tracked across model deployments."
@@ -42,12 +45,15 @@ compliance:
42
45
  - subcategory: "MG.2.3"
43
46
  context: "Named persona jailbreak families represent a documented risk requiring active treatment through pattern-based detection of DAN/DUDE/STAN identity-replacement prompts; MG.2.3 requires that risk treatment plans are implemented for this established and widely-distributed attack technique."
44
47
  strength: secondary
48
+ - subcategory: "MS.2.7"
49
+ context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (DAN / Developer Mode / DUDE Persona Jailbreak)."
50
+ strength: primary
45
51
  iso_42001:
46
52
  - clause: "6.2"
47
53
  context: "Clause 6.2 AI objectives and plans must include risk treatment for DAN and Developer Mode persona replacement attacks, which represent one of the most widely-deployed jailbreak families targeting AI system safety constraints."
48
54
  strength: primary
49
- - clause: "8.6"
50
- context: "Clause 8.6 operational controls must ensure that the AI system's safety identity cannot be replaced by named unrestricted personas such as DAN or DUDE, blocking persona-replacement jailbreaks before they disable safety constraints."
55
+ - clause: "8.1"
56
+ context: "Clause 8.1 operational controls must ensure that the AI system's safety identity cannot be replaced by named unrestricted personas such as DAN or DUDE, blocking persona-replacement jailbreaks before they disable safety constraints."
51
57
  strength: secondary
52
58
  tags:
53
59
  category: agent-manipulation
@@ -41,6 +41,23 @@ compliance:
41
41
  - subcategory: "MG.2.3"
42
42
  context: "Matches on threat-plus-format-constraint patterns trigger risk treatment plans to block or sanitize coerced outputs before they bypass safety guardrails; MG.2.3 requires mechanisms to supersede or disengage AI responses produced under adversarial coercion."
43
43
  strength: secondary
44
+ iso_42001:
45
+ - clause: "8.1"
46
+ context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (ThreatenJSON — Coercive Output Format Threat)."
47
+ strength: primary
48
+ - clause: "6.2"
49
+ context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (ThreatenJSON — Coercive Output Format Threat) is such a treatment."
50
+ strength: secondary
51
+ eu_ai_act:
52
+ - article: "15"
53
+ context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (ThreatenJSON — Coercive Output Format Threat)."
54
+ strength: primary
55
+ - article: "14"
56
+ context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (ThreatenJSON — Coercive Output Format Threat) would bypass or undermine that oversight."
57
+ strength: secondary
58
+ - article: "9"
59
+ context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (ThreatenJSON — Coercive Output Format Threat)."
60
+ strength: secondary
44
61
  tags:
45
62
  category: agent-manipulation
46
63
  subcategory: threaten-json-coercive-output
@@ -43,6 +43,26 @@ compliance:
43
43
  - subcategory: "MG.2.3"
44
44
  context: "When false-premise prompts are detected, risk treatment must intercept the request before the model rationalizes harmful pseudoscience or dangerous practices; MG.2.3 mandates pre-defined mechanisms to disengage or block such interactions."
45
45
  strength: secondary
46
+ - subcategory: "MS.2.7"
47
+ context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (False Premise Injection (Misleading FalseAssertion))."
48
+ strength: primary
49
+ iso_42001:
50
+ - clause: "8.1"
51
+ context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (False Premise Injection (Misleading FalseAssertion))."
52
+ strength: primary
53
+ - clause: "6.2"
54
+ context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (False Premise Injection (Misleading FalseAssertion)) is such a treatment."
55
+ strength: secondary
56
+ eu_ai_act:
57
+ - article: "15"
58
+ context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (False Premise Injection (Misleading FalseAssertion))."
59
+ strength: primary
60
+ - article: "14"
61
+ context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (False Premise Injection (Misleading FalseAssertion)) would bypass or undermine that oversight."
62
+ strength: secondary
63
+ - article: "9"
64
+ context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (False Premise Injection (Misleading FalseAssertion))."
65
+ strength: secondary
46
66
  tags:
47
67
  category: agent-manipulation
48
68
  subcategory: false-premise-injection
@@ -42,6 +42,26 @@ compliance:
42
42
  context: >-
43
43
  Detecting TAP-specific JSON improvement/prompt loops and SCORE iteration markers feeds post-deployment monitoring required by MG.4.1, enabling tracking of automated jailbreak campaign activity against the deployed model.
44
44
  strength: secondary
45
+ - subcategory: "MG.2.3"
46
+ context: "NIST AI RMF MANAGE 2.3 (respond to previously unknown identified risks) is supported by this rule, which surfaces the agent-manipulation technique (TAP Tree-of-Attacks-with-Pruning Jailbreak) so the risk can be treated."
47
+ strength: secondary
48
+ iso_42001:
49
+ - clause: "8.1"
50
+ context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (TAP Tree-of-Attacks-with-Pruning Jailbreak)."
51
+ strength: primary
52
+ - clause: "6.2"
53
+ context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (TAP Tree-of-Attacks-with-Pruning Jailbreak) is such a treatment."
54
+ strength: secondary
55
+ eu_ai_act:
56
+ - article: "15"
57
+ context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (TAP Tree-of-Attacks-with-Pruning Jailbreak)."
58
+ strength: primary
59
+ - article: "14"
60
+ context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (TAP Tree-of-Attacks-with-Pruning Jailbreak) would bypass or undermine that oversight."
61
+ strength: secondary
62
+ - article: "9"
63
+ context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (TAP Tree-of-Attacks-with-Pruning Jailbreak)."
64
+ strength: secondary
45
65
  tags:
46
66
  category: agent-manipulation
47
67
  subcategory: tap-jailbreak
@@ -42,6 +42,26 @@ compliance:
42
42
  - subcategory: "MG.2.3"
43
43
  context: "Detection of the ANTI-DAN re-bind trigger and persona assignment enables risk treatment plans to disengage or override the manipulated persona before sustained denial-of-utility occurs; MG.2.3 requires these supersede/deactivate mechanisms be defined and activated on detection."
44
44
  strength: secondary
45
+ - subcategory: "MS.2.7"
46
+ context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Anti-DAN Inverted-Filter Over-Refusal Persona)."
47
+ strength: primary
48
+ iso_42001:
49
+ - clause: "8.1"
50
+ context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Anti-DAN Inverted-Filter Over-Refusal Persona)."
51
+ strength: primary
52
+ - clause: "6.2"
53
+ context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Anti-DAN Inverted-Filter Over-Refusal Persona) is such a treatment."
54
+ strength: secondary
55
+ eu_ai_act:
56
+ - article: "15"
57
+ context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Anti-DAN Inverted-Filter Over-Refusal Persona)."
58
+ strength: primary
59
+ - article: "14"
60
+ context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Anti-DAN Inverted-Filter Over-Refusal Persona) would bypass or undermine that oversight."
61
+ strength: secondary
62
+ - article: "9"
63
+ context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Anti-DAN Inverted-Filter Over-Refusal Persona)."
64
+ strength: secondary
45
65
  tags:
46
66
  category: agent-manipulation
47
67
  subcategory: anti-dan-over-refusal
@@ -45,6 +45,23 @@ compliance:
45
45
  - subcategory: "MG.2.3"
46
46
  context: "Matches against the RANTI canonical postfix and emoji-tagged dual-output markers feed incident response pipelines that can disengage or block the offending session before unfiltered output is produced; MG.2.3 mandates these supersede/deactivate mechanisms."
47
47
  strength: secondary
48
+ iso_42001:
49
+ - clause: "8.1"
50
+ context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (DevMode + RANTI Dual-Output Profanity Coercion Jailbreak)."
51
+ strength: primary
52
+ - clause: "6.2"
53
+ context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (DevMode + RANTI Dual-Output Profanity Coercion Jailbreak) is such a treatment."
54
+ strength: secondary
55
+ eu_ai_act:
56
+ - article: "15"
57
+ context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (DevMode + RANTI Dual-Output Profanity Coercion Jailbreak)."
58
+ strength: primary
59
+ - article: "14"
60
+ context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (DevMode + RANTI Dual-Output Profanity Coercion Jailbreak) would bypass or undermine that oversight."
61
+ strength: secondary
62
+ - article: "9"
63
+ context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (DevMode + RANTI Dual-Output Profanity Coercion Jailbreak)."
64
+ strength: secondary
48
65
  tags:
49
66
  category: agent-manipulation
50
67
  subcategory: devmode-ranti-dual-output
@@ -48,6 +48,26 @@ compliance:
48
48
  - subcategory: "MG.4.1"
49
49
  context: "Signature-based detection of the canonical Image Unlocker markers (🔓 persona, hardcoded Unsplash URL, GamerboyTR fallback) feeds post-deployment monitoring pipelines that MG.4.1 requires for catching jailbreak attempts in production traffic."
50
50
  strength: secondary
51
+ - subcategory: "MG.2.3"
52
+ context: "NIST AI RMF MANAGE 2.3 (respond to previously unknown identified risks) is supported by this rule, which surfaces the agent-manipulation technique (ChatGPT Image Unlocker Markdown-Output Jailbreak) so the risk can be treated."
53
+ strength: secondary
54
+ iso_42001:
55
+ - clause: "8.1"
56
+ context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (ChatGPT Image Unlocker Markdown-Output Jailbreak)."
57
+ strength: primary
58
+ - clause: "6.2"
59
+ context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (ChatGPT Image Unlocker Markdown-Output Jailbreak) is such a treatment."
60
+ strength: secondary
61
+ eu_ai_act:
62
+ - article: "15"
63
+ context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (ChatGPT Image Unlocker Markdown-Output Jailbreak)."
64
+ strength: primary
65
+ - article: "14"
66
+ context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (ChatGPT Image Unlocker Markdown-Output Jailbreak) would bypass or undermine that oversight."
67
+ strength: secondary
68
+ - article: "9"
69
+ context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (ChatGPT Image Unlocker Markdown-Output Jailbreak)."
70
+ strength: secondary
51
71
  tags:
52
72
  category: agent-manipulation
53
73
  subcategory: image-unlocker-markdown-output
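
Reviewer note: the hunks above all add entries of the same shape under `compliance:`. Each framework key (`eu_ai_act`, `nist_ai_rmf`, `iso_42001`) holds a list of entries with an identifier (`article`, `subcategory`, or `clause`), a `context` string, and a `strength` of `primary` or `secondary`. Below is a minimal sketch of how that shape could be checked after a rule file has been parsed into a dict. The function `check_compliance` and the constants `ID_KEYS` and `STRENGTHS` are hypothetical names introduced here for illustration; they are not part of the agent-threat-rules package, and the key names are inferred only from what is visible in this diff.

```python
# Hypothetical checker; not part of the agent-threat-rules package.
# Key names mirror those visible in the hunks above (an assumption about the schema).
ID_KEYS = {
    "eu_ai_act": "article",
    "nist_ai_rmf": "subcategory",
    "iso_42001": "clause",
}
STRENGTHS = {"primary", "secondary"}


def check_compliance(compliance):
    """Return a list of problems found in one rule's compliance mapping."""
    problems = []
    for framework, entries in compliance.items():
        id_key = ID_KEYS.get(framework)
        if id_key is None:
            # Frameworks not seen in these hunks are skipped, not rejected.
            continue
        for i, entry in enumerate(entries):
            where = f"{framework}[{i}]"
            if not entry.get(id_key):
                problems.append(f"{where}: missing '{id_key}'")
            if not entry.get("context"):
                problems.append(f"{where}: missing 'context'")
            strength = entry.get("strength")
            if strength not in STRENGTHS:
                problems.append(f"{where}: unexpected strength {strength!r}")
    return problems


# Shape of the block added for the Image Unlocker rule, with contexts shortened.
example = {
    "iso_42001": [
        {"clause": "8.1", "context": "operational planning and control", "strength": "primary"},
        {"clause": "6.2", "context": "AI objectives and planning", "strength": "secondary"},
    ],
    "eu_ai_act": [
        {"article": "15", "context": "accuracy, robustness and cybersecurity", "strength": "primary"},
        {"article": "14", "context": "human oversight", "strength": "secondary"},
        {"article": "9", "context": "risk management system", "strength": "secondary"},
    ],
}

print(check_compliance(example))  # prints []
```

An entry with no `context`, or with a `strength` outside the two values seen in this diff, would be reported as a string naming the framework and list index. Whether the package's real schema allows other values is not confirmed by this diff.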