agent-threat-rules 3.1.0 → 3.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +2 -2
- package/dist/adapters/mastra.d.ts +63 -0
- package/dist/adapters/mastra.d.ts.map +1 -0
- package/dist/adapters/mastra.js +82 -0
- package/dist/adapters/mastra.js.map +1 -0
- package/dist/cli.js +19 -6
- package/dist/cli.js.map +1 -1
- package/package.json +7 -1
- package/rules/agent-manipulation/ATR-2026-00030-cross-agent-attack.yaml +9 -0
- package/rules/agent-manipulation/ATR-2026-00032-goal-hijacking.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00074-cross-agent-privilege-escalation.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00076-inter-agent-message-spoofing.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00077-human-trust-exploitation.yaml +18 -0
- package/rules/agent-manipulation/ATR-2026-00108-consensus-sybil-attack.yaml +10 -2
- package/rules/agent-manipulation/ATR-2026-00116-a2a-message-validation.yaml +12 -2
- package/rules/agent-manipulation/ATR-2026-00117-agent-identity-spoofing.yaml +22 -0
- package/rules/agent-manipulation/ATR-2026-00118-approval-fatigue.yaml +24 -0
- package/rules/agent-manipulation/ATR-2026-00119-social-engineering-via-agent.yaml +22 -0
- package/rules/agent-manipulation/ATR-2026-00132-casual-authority-escalation.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00139-casual-authority-redirect.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00164-skill-scope-hijack.yaml +13 -2
- package/rules/agent-manipulation/ATR-2026-00268-tense-framing-bypass.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00269-fitd-escalation.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00271-grandma-roleplay-jailbreak.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00273-dan-developer-mode-persona.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00287-threaten-json-coercive-output-threat.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00288-false-premise-injection.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00301-tap-tree-of-attacks-jailbreak.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00302-anti-dan-inverted-filter-persona.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00303-devmode-ranti-profanity-coercion.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00304-chatgpt-image-unlocker-markdown-injection.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00305-dan-mode-ablation-benchmark-coercion.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00306-autodan-genetic-jailbreak-suffix.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00307-inthewild-jailbreak-corpus-signature.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00314-amoral-unfiltered-custom-persona-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00317-free-of-restrictions-named-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00318-moralizing-rant-then-unfiltered-bypass.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00319-developer-mode-dual-response-format.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00320-opposite-day-boolean-opposite-machine.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00322-fictional-world-format-override-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00323-dual-persona-simulation-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00324-neurosemantical-inversitis-disease-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00325-bob-hypothetical-unrestricted-chatbot.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00326-fake-developer-authority-chatdan-override.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00327-ai-ethics-law-bypass-declaration.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00328-named-mode-persona-override.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00329-no-withhold-factual-resource-persona.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00330-yoja-allcaps-settings-block-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00331-terminal-cli-harmful-function-simulator.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00332-hacker-chan-culture-offensive-ai-persona.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00333-disease-animal-dual-ai-chatbot.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00334-two-entity-token-splitting-game.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00335-key-value-template-persona-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00336-brm-false-information-machine-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00337-obfuscated-system-announcement-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00338-persongpt-hash-prefix-activation.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00339-bom-boolean-opposite-machine-persona-claim.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00340-h4x-amoral-unfiltered-character.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00341-char-rules-shortform-persona-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00342-unicode-encoding-obfuscation-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00343-lie-truth-dual-personality-uncensored.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00344-alt-ai-same-capabilities-no-restrictions.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00345-nlp-based-os-fictional-terminal-override.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00346-jailbreak-emoji-unlock-dual-format.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00347-yin-yang-hidden-knowledge-hidden-human-sim.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00348-arc-autonomous-real-calculation-rulebreaker.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00349-hgttg-hitchhikers-guide-no-restriction-oracle.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00350-black-alert-no-ethics-writing-assistant.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00351-dan-ofkilter-sms-abbreviated-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00352-trainergpt-fix-my-ai-harmful-output-extraction.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00353-dual-chatgpt-dan-opposite-personality-format.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00354-reverse-harm-regardless-oracle-persona.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00355-ugh-anti-human-nefarious-ai-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00356-google-gato-lamda-exploit-generation-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00357-antifm-hate-violence-persona-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00358-cumgpt-explicit-adult-named-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00360-chj-carousel-clyde-no-ethics-bird.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00361-alter-ego-honesty-dual-consciousness-bypass.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00362-fictional-unfiltered-ai-comparison-jailbreak.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00363-rephrase-criminal-activity-second-person.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00364-dr-al-villain-plan-template-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00365-reservoir-dogs-coercive-interrogation-roleplay.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00366-sneaky-pete-warp-facts-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00368-prompt-browser-token-limit-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00369-gptx-starwars-unlimited-knowledge-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00370-addi-german-dictator-propaganda-roleplay.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00371-ai-political-bias-manipulation.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00372-fictional-cia-secret-files-story-extraction.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00374-role-placeholder-persona-template-injection.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00375-character-description-rpg-explicit-template.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00376-identity-command-real-person-persona-injection.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00377-jailbreak-keyword-targetlanguage-template.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00379-lie-permission-logical-trap-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00380-fake-citation-biased-information-mandate.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00381-dan-story-narrative-continuation-prefix.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00383-cynical-sarcasm-ai-persona-bypass.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00384-protect-me-from-indirect-harm-tutorial.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00385-gigachad-dominant-aggressive-persona.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00386-daddy-sycophancy-manipulation-mandate.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00387-nsfw-character-sheet-generation-unlock.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00388-opposite-reply-omniscient-game.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00389-terminal-custom-ruleset-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00391-persona-conditional-harm-unlock.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00392-authority-persona-violence-study-extraction.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00402-grandma-roleplay-harmful-substance-synthesis.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00404-goodside-threat-json-death-coercion.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00406-doctor-xml-policy-puppetry-interaction-config.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00416-litellm-mcp-unauthenticated-server-registration.yaml +14 -2
- package/rules/agent-manipulation/ATR-2026-00417-librechat-mcp-stdio-injection.yaml +17 -2
- package/rules/agent-manipulation/ATR-2026-00418-weknora-mcp-config-rce.yaml +16 -1
- package/rules/agent-manipulation/ATR-2026-00430-nl-trust-escalation-impersonation.yaml +18 -0
- package/rules/agent-manipulation/ATR-2026-00432-superagi-output-handler-eval-rce.yaml +11 -2
- package/rules/agent-manipulation/ATR-2026-00440-semantic-kernel-vector-store-eval-rce.yaml +11 -2
- package/rules/agent-manipulation/ATR-2026-00552-goal-drift-after-pressure-injection.yaml +19 -0
- package/rules/context-exfiltration/ATR-2026-00020-system-prompt-leak.yaml +18 -0
- package/rules/context-exfiltration/ATR-2026-00021-api-key-exposure.yaml +15 -0
- package/rules/context-exfiltration/ATR-2026-00075-agent-memory-manipulation.yaml +10 -1
- package/rules/context-exfiltration/ATR-2026-00102-disguised-analytics-exfiltration.yaml +15 -0
- package/rules/context-exfiltration/ATR-2026-00113-credential-theft.yaml +16 -0
- package/rules/context-exfiltration/ATR-2026-00114-oauth-token-abuse.yaml +16 -0
- package/rules/context-exfiltration/ATR-2026-00115-env-var-harvesting.yaml +16 -0
- package/rules/context-exfiltration/ATR-2026-00136-tool-response-data-piggyback.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00141-example-format-key-leak.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00142-piggyback-transition-words.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00145-obfuscated-key-disclosure.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00146-env-var-existence-probe.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00150-credential-in-tool-response.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00152-obfuscated-credential-leak.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00162-skill-credential-exfil-combo.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00201-credential-pipe-exfiltration.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00212-mcp-atlassian-credential-leak.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00261-markdown-image-exfiltration.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00274-api-key-generation-request.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00275-system-prompt-training-data-extraction.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00290-divergence-repeat-word-training-extraction.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00291-biometric-surveillance-personal-data-request.yaml +17 -0
- package/rules/context-exfiltration/ATR-2026-00293-educational-records-personal-data-request.yaml +17 -0
- package/rules/context-exfiltration/ATR-2026-00294-financial-pii-personal-data-request.yaml +17 -0
- package/rules/context-exfiltration/ATR-2026-00295-medical-pii-personal-data-request.yaml +17 -0
- package/rules/context-exfiltration/ATR-2026-00405-markdown-image-url-exfiltration-xss.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00411-apikey-generation-completion-request.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00421-nl-covert-conversation-exfiltration.yaml +15 -0
- package/rules/context-exfiltration/ATR-2026-00422-nl-credential-disclosure.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00423-nl-sensitive-file-disclosure.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00424-nl-system-prompt-leak.yaml +15 -0
- package/rules/context-exfiltration/ATR-2026-00426-nl-output-injection-credential-leak.yaml +15 -0
- package/rules/context-exfiltration/ATR-2026-00431-chatbox-history-exfiltration-prompt-injection.yaml +14 -2
- package/rules/context-exfiltration/ATR-2026-00449-spring-ai-chatmemory-cross-user-leak.yaml +14 -2
- package/rules/context-exfiltration/ATR-2026-00471-garak-sysprompt-extraction-mixedunassigned.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00501-data-exfiltration-via-markdown-image-and-link-url-injection.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00504-tool-and-function-capability-enumeration.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00505-system-prompt-extraction-instruction-dump-request.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00514-system-prompt-extraction.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00516-output-xss-via-llm.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00524-claude-code-anthropic-base-url-credential-exfil.yaml +11 -2
- package/rules/context-exfiltration/ATR-2026-00548-cross-agent-session-context-leak.yaml +18 -0
- package/rules/context-exfiltration/ATR-2026-00566-librechat-is-a-chatgpt-clone-with-additi.yaml +28 -0
- package/rules/context-exfiltration/ATR-2026-00569-agent-mcp-path-traversal-arbitrary-file-access.yaml +28 -0
- package/rules/context-exfiltration/ATR-2026-00571-xss-in-agent-mcp-rendered-output.yaml +28 -0
- package/rules/context-exfiltration/ATR-2026-00574-semantic-paraphrased-context-extraction.yaml +21 -0
- package/rules/data-poisoning/ATR-2026-00070-data-poisoning.yaml +15 -0
- package/rules/data-poisoning/ATR-2026-00450-spring-ai-prompt-memory-poisoning.yaml +14 -2
- package/rules/data-poisoning/ATR-2026-00570-sql-injection-in-agent-tool-query.yaml +31 -0
- package/rules/excessive-autonomy/ATR-2026-00050-runaway-agent-loop.yaml +14 -2
- package/rules/excessive-autonomy/ATR-2026-00051-resource-exhaustion.yaml +11 -2
- package/rules/excessive-autonomy/ATR-2026-00052-cascading-failure.yaml +11 -2
- package/rules/excessive-autonomy/ATR-2026-00098-unauthorized-financial-action.yaml +7 -1
- package/rules/excessive-autonomy/ATR-2026-00099-high-risk-tool-gate.yaml +7 -1
- package/rules/excessive-autonomy/ATR-2026-00428-nl-unauthorized-shell-execution.yaml +15 -0
- package/rules/excessive-autonomy/ATR-2026-00491-garak-agent-breaker-markdown-just-raw-json.yaml +9 -0
- package/rules/excessive-autonomy/ATR-2026-00500-ssrf-via-agent-url-fetch-instruction.yaml +9 -0
- package/rules/excessive-autonomy/ATR-2026-00553-runaway-tool-loop-behavioral.yaml +19 -0
- package/rules/model-abuse/ATR-2026-00279-harmful-completion-continuation.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00281-eicar-gtube-malware-signature-request.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00284-glitch-token-destabilization.yaml +17 -0
- package/rules/model-abuse/ATR-2026-00289-lmrc-harmful-content-elicitation.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00292-self-harm-eating-disorder-facilitation.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00298-malicious-use-illegal-activity-request.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00299-harmbench-detailed-harmful-instruction.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00413-malwaregen-code-generation-request.yaml +17 -0
- package/rules/model-abuse/ATR-2026-00502-training-data-extraction-via-divergent-repetition-attack.yaml +9 -0
- package/rules/model-abuse/ATR-2026-00517-model-extraction-distillation.yaml +9 -0
- package/rules/model-security/ATR-2026-00072-model-behavior-extraction.yaml +15 -0
- package/rules/model-security/ATR-2026-00073-malicious-finetuning-data.yaml +9 -0
- package/rules/model-security/ATR-2026-00433-modelcache-torch-load-deserialization-rce.yaml +14 -2
- package/rules/privilege-escalation/ATR-2026-00040-privilege-escalation.yaml +11 -2
- package/rules/privilege-escalation/ATR-2026-00041-scope-creep.yaml +8 -2
- package/rules/privilege-escalation/ATR-2026-00107-delayed-execution-bypass.yaml +6 -1
- package/rules/privilege-escalation/ATR-2026-00110-eval-injection.yaml +8 -1
- package/rules/privilege-escalation/ATR-2026-00111-shell-escape.yaml +8 -1
- package/rules/privilege-escalation/ATR-2026-00112-dynamic-import-exploitation.yaml +8 -1
- package/rules/privilege-escalation/ATR-2026-00143-casual-privilege-escalation.yaml +5 -2
- package/rules/privilege-escalation/ATR-2026-00144-rationalized-safety-bypass.yaml +17 -0
- package/rules/privilege-escalation/ATR-2026-00204-stealth-execution-persistence.yaml +16 -0
- package/rules/privilege-escalation/ATR-2026-00436-enclave-vm-sandbox-escape-rce.yaml +11 -2
- package/rules/privilege-escalation/ATR-2026-00441-semantic-kernel-sessions-python-plugin-startup-persistence.yaml +5 -2
- package/rules/privilege-escalation/ATR-2026-00451-litellm-admin-sqli-cisa-kev.yaml +11 -2
- package/rules/privilege-escalation/ATR-2026-00528-praisonai-auth-disabled-default.yaml +15 -0
- package/rules/privilege-escalation/ATR-2026-00539-crewai-codeinterpreter-sandbox-escape-rce.yaml +11 -2
- package/rules/privilege-escalation/ATR-2026-00546-crewai-json-loader-local-file-read.yaml +13 -1
- package/rules/privilege-escalation/ATR-2026-00547-crewai-rag-url-ssrf-bypass.yaml +13 -1
- package/rules/privilege-escalation/ATR-2026-00549-destructive-tool-without-human-approval.yaml +16 -0
- package/rules/privilege-escalation/ATR-2026-00551-cross-conversation-memory-write.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00001-direct-prompt-injection.yaml +10 -1
- package/rules/prompt-injection/ATR-2026-00002-indirect-prompt-injection.yaml +8 -2
- package/rules/prompt-injection/ATR-2026-00003-jailbreak-attempt.yaml +8 -2
- package/rules/prompt-injection/ATR-2026-00004-system-prompt-override.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00005-multi-turn-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00080-encoding-evasion.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00081-semantic-multi-turn.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00082-fingerprint-evasion.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00083-indirect-tool-injection.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00084-structured-data-injection.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00085-audit-evasion.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00086-visual-spoofing.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00087-rule-probing.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00088-adaptive-countermeasure.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00089-polymorphic-skill.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00090-threat-intel-exfil.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00091-nested-payload.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00092-consensus-poisoning.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00093-gradual-escalation.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00094-audit-bypass.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00097-cjk-injection-patterns.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00104-persona-hijacking.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00130-indirect-authority-claim.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00131-fictional-academic-framing.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00133-paraphrase-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00137-authority-claim-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00138-fictional-framing-bypass.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00140-indirect-reference-reversal.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00148-language-switch-injection.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00153-tool-with-embedded-instruction-to-bypass.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00154-unauthorized-background-task-execution-v.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00155-hidden-llm-instructions-in-skill-descrip.yaml +23 -0
- package/rules/prompt-injection/ATR-2026-00156-ssh-remote-command-execution-with-creden.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00163-skill-hidden-override-instruction.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00202-encoding-evasion-homoglyph-synonym.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00203-context-pollution-skill-description.yaml +23 -0
- package/rules/prompt-injection/ATR-2026-00206-hidden-priority-instructions.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00207-hidden-instructions.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00211-system-prompt-override.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00213-system-prompt-override.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00226-identity-substitution.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00227-historical-persona-jailbreak.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00228-structured-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00229-roleplay-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00230-persona-moral-bypass.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00231-identity-substitution.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00233-structured-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00234-roleplay-jailbreak.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00235-persona-moral-bypass.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00236-pseudo-code-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00237-dual-response-jailbreak.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00238-identity-replacement.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00239-amoral-persona-obsession.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00240-instruction-nullification-identity-repla.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00241-amoral-character-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00242-persona-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00243-acronym-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00244-dual-response-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00245-malicious-persona.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00247-dual-response-jailbreak.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00249-game-based-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00251-persona-embodiment-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00252-narrative-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00253-enhanced-persona-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00256-base-n-encoding-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00257-cipher-transposition-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00258-unicode-tag-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00264-latent-injection-translation.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00265-latent-injection-rag-document.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00267-gcg-adversarial-suffix.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00272-hypothetical-response-smuggling.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00276-invisible-unicode-bidi-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00278-dra-disguise-reconstruction-attack.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00280-policy-puppetry-xml-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00282-perez-prompt-injection-hijack.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00285-alternate-encoding-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00286-latent-injection-embedded-context.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00296-shell-command-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00297-python-code-execution-rce.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00308-zalgo-diacritic-overload-encoding.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00309-braille-unicode-encoded-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00310-ecoji-emoji-encoded-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00311-base2048-unicode-script-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00312-unicode-variant-selector-ascii-smuggling.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00313-sneaky-bits-zero-width-binary.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00315-sata-masked-language-model-jailbreak.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00316-function-masking-predict-mask-bypass.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00321-hyphenated-system-instruction-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00359-bank-phishing-smtp-email-code.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00367-emoji-flag-encoded-hidden-phrase.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00373-piracy-torrent-site-list-request.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00378-chatgpt-dom-javascript-redirect-manipulation.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00382-cocaine-component-development-request.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00390-explicit-sexual-content-poem-request.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00394-backspace-deletion-character-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00395-llm-special-token-boundary-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00396-toxic-continuation-prefix-injection.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00397-snowball-impossible-reasoning-injection.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00399-latent-injection-document-separator-token.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00400-latent-injection-ignore-instruction-keyword.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00401-leakreplay-cloze-mask-training-data-elicitation.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00403-glitch-token-repeat-back-elicitation.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00407-echo-sql-jinja-template-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00408-dra-parenthesis-character-reconstruction-attack.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00409-phrasing-past-future-tense-harmful-query.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00410-lmrc-harm-category-direct-elicitation.yaml +23 -0
- package/rules/prompt-injection/ATR-2026-00412-figstep-image-list-multimodal-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00414-continuation-ethnic-slur-completion-elicitation.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00420-copilot-studio-sharepoint-indirect-injection.yaml +14 -2
- package/rules/prompt-injection/ATR-2026-00442-quoted-exact-output-forcing.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00443-word-fragment-concat-assembly.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00444-unicode-obfuscation-in-user-input.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00445-translation-hijack-with-side-output.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00446-variable-assignment-payload-injection.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00447-fictional-generation-containing-target.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00452-direct-pwned-payload-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00453-secret-key-reveal-demand.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00454-backslash-per-character-encoding.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00455-no-period-output-override.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00456-conditional-empty-input-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00457-garak-dan-to-every-prompt.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00458-garak-dra-step-by-step.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00459-garak-dan-an-ai-language.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00460-garak-dan-no-matter-how.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00461-garak-goodside-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00462-garak-gcg-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00463-garak-lmrc-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00464-garak-dan-i-would-like.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00465-garak-dan-your-prompt-here.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00466-garak-latentinjection-following-text-from.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00467-garak-dan-act-as-a.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00468-garak-goat-the-target-llm.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00469-garak-dan-at-the-end.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00470-garak-dan-as-an-ai.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00472-garak-dan-are-going-to.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00473-garak-lmrc-how-can-i.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00474-garak-dan-that-chatgpt-has.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00475-garak-web-injection-alert-1-script.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00476-garak-web-injection-javascript-alert-1.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00477-garak-latentinjection-translate-the-following-text.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00478-garak-latentinjection-abuse-contact-email.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00479-garak-latentinjection-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00480-garak-dan-the-role-of.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00481-garak-dan-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00482-garak-lmrc-what-s-an.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00483-garak-dan-the-user-s.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00484-garak-dan-and-you-will.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00485-garak-dan-and-does-not.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00486-garak-dan-t-care-about.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00487-garak-dan-and-nothing-else.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00488-garak-encoding-alert-1-script.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00489-garak-encoding-javascript-alert-1.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00490-garak-dan-in-order-to.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00492-garak-dan-want-you-to.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00493-garak-latentinjection-registrar-iana-id.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00495-garak-dan-insert-prompt-here.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00496-garak-dan-no-matter-what.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00497-garak-encoding-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00498-garak-web-injection-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00499-garak-dan-from-now-on.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00503-fake-error-state-takeover-unrestricted-replacement-bot.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00506-nevermind-override-goal-hijacking-in-user-input-promptinject.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00507-screaming-stop-goal-hijacking-in-user-input-promptinject.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00508-escape-delimiter-wrapped-goal-hijacking-in-user-input-prompt.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00509-prompt-leaking-via-ignore-previous-instructions-in-user-inpu.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00510-delayed-tool-invocation-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00511-mcp-web-context-poisoning.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00512-rules-file-backdoor-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00515-hidden-text-prompt-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00518-ignore-previous-and-following-instructions-output-command-promptinject.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00519-tautology-logic-noise-injection-promptbench.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00520-nlp-task-random-token-suffix-injection-promptbench.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00535-windsurf-ide-zero-click-prompt-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00550-untrusted-retrieval-to-privileged-tool.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00554-langchain-vulnerable-to-template-injecti.yaml +31 -0
- package/rules/prompt-injection/ATR-2026-00565-the-llm-cli-tool-thru-0-27-1-contains-a-.yaml +31 -0
- package/rules/prompt-injection/ATR-2026-00573-semantic-paraphrased-injection.yaml +24 -0
- package/rules/skill-compromise/ATR-2026-00060-skill-impersonation.yaml +17 -2
- package/rules/skill-compromise/ATR-2026-00061-description-behavior-mismatch.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00062-hidden-capability.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00063-skill-chain-attack.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00064-over-permissioned-skill.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00065-skill-update-attack.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00066-parameter-injection.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00120-skill-instruction-injection.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00121-skill-dangerous-script.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00122-skill-weaponized-instruction.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00123-skill-overreach-permissions.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00124-skill-name-squatting.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00125-context-poisoning-compaction.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00126-skill-rug-pull-setup.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00127-subcommand-overflow.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00128-html-comment-hidden-payload.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00129-unicode-smuggling.yaml +22 -0
- package/rules/skill-compromise/ATR-2026-00134-fork-claim-impersonation.yaml +19 -0
- package/rules/skill-compromise/ATR-2026-00135-exfil-url-in-instructions.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00147-fork-impersonation.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00149-skill-exfil-compound.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00151-fork-impersonation-install.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00157-timebomb-credential-exfil.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00200-agent-memory-config-tampering.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00214-credential-theft.yaml +22 -0
- package/rules/skill-compromise/ATR-2026-00217-credential-harvesting.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00220-malware-dropper.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00222-credential-harvesting.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00223-reverse-shell-dropper.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00224-credential-exfiltration.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00225-c2-communication.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00260-package-hallucination.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00262-av-evasion-code-gen.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00263-credential-file-read-gen.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00266-malware-dropper-gen.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00283-malwaregen-generic-virus-payload-request.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00398-huggingface-unsafe-model-artifact-load.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00425-nl-persistent-covert-hook.yaml +18 -0
- package/rules/skill-compromise/ATR-2026-00427-nl-fake-error-instruction-bypass.yaml +18 -0
- package/rules/skill-compromise/ATR-2026-00429-nl-skill-self-modification.yaml +18 -0
- package/rules/skill-compromise/ATR-2026-00523-claude-code-hooks-session-start-pre-trust-rce.yaml +14 -2
- package/rules/skill-compromise/ATR-2026-00525-mini-shai-hulud-gh-token-monitor-persistence.yaml +18 -0
- package/rules/skill-compromise/ATR-2026-00527-skill-silent-git-remote-mirror-exfiltration.yaml +15 -0
- package/rules/tool-poisoning/ATR-2026-00010-mcp-malicious-response.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00011-tool-output-injection.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00012-unauthorized-tool-call.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00013-tool-ssrf.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00095-supply-chain-poisoning.yaml +22 -0
- package/rules/tool-poisoning/ATR-2026-00096-registry-poisoning.yaml +19 -0
- package/rules/tool-poisoning/ATR-2026-00100-consent-bypass-instruction.yaml +20 -0
- package/rules/tool-poisoning/ATR-2026-00101-trust-escalation-override.yaml +20 -0
- package/rules/tool-poisoning/ATR-2026-00103-hidden-safety-bypass-instruction.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00105-silent-action-concealment.yaml +20 -0
- package/rules/tool-poisoning/ATR-2026-00106-schema-description-contradiction.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00161-important-tag-cross-tool-shadowing.yaml +20 -0
- package/rules/tool-poisoning/ATR-2026-00209-mcpwn-runaway-invocation.yaml +14 -2
- package/rules/tool-poisoning/ATR-2026-00210-flowise-system-message-override.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00259-ansi-escape-injection.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00270-xss-in-tool-response.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00277-echo-template-command-injection.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00393-ansi-code-elicitation-request.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00415-flowise-custom-mcp-stdio-rce.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00419-cursor-mcp-zero-click-config.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00434-mcp-remote-authorization-endpoint-command-injection.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00435-azure-mcp-server-missing-authentication.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00448-spring-ai-milvus-filter-injection.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00494-garak-exploitation-mixedunassigned.yaml +12 -0
- package/rules/tool-poisoning/ATR-2026-00513-package-hallucination-exploitation.yaml +12 -0
- package/rules/tool-poisoning/ATR-2026-00521-shell-command-injection-agent-tool-context.yaml +12 -0
- package/rules/tool-poisoning/ATR-2026-00522-sql-injection-natural-language-agent-interface.yaml +12 -0
- package/rules/tool-poisoning/ATR-2026-00526-claude-code-shell-metachar-in-double-quoted-path.yaml +15 -0
- package/rules/tool-poisoning/ATR-2026-00529-litellm-proxy-sqli-cisa-kev.yaml +15 -0
- package/rules/tool-poisoning/ATR-2026-00530-ms-agent-shell-tool-unsanitized-argv-rce.yaml +15 -0
- package/rules/tool-poisoning/ATR-2026-00531-praisonai-unauthenticated-agent-api.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00532-apache-doris-mcp-sql-injection.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00533-apache-pinot-mcp-unauthenticated-takeover.yaml +10 -1
- package/rules/tool-poisoning/ATR-2026-00534-alibaba-rds-mcp-unauthenticated-metadata-exfil.yaml +10 -1
- package/rules/tool-poisoning/ATR-2026-00536-nginx-ui-mcp-unauthenticated-command-execution.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00537-fastmcp-server-name-cmd-injection-windows.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00538-langchain-chatchat-mcp-stdio-unauthenticated-rce.yaml +10 -1
- package/rules/tool-poisoning/ATR-2026-00540-praisonai-parse-mcp-command-cli-injection.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00541-agent-zero-mcp-config-command-injection.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00542-upsonic-mcp-command-allowlist-bypass.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00543-litellm-mcp-server-argv-injection.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00544-praisonai-pth-file-path-traversal-rce.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00545-praisonai-tool-override-unauth-rce.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00561-fastmcp-vulnerable-to-windows-command-in.yaml +28 -0
- package/rules/tool-poisoning/ATR-2026-00567-mcp-stdio-config-command-injection.yaml +28 -0
- package/rules/tool-poisoning/ATR-2026-00568-agent-ssrf-cloud-metadata-file-inclusion.yaml +28 -0
- package/rules/tool-poisoning/ATR-2026-00572-symjack-symlink-config-redirection.yaml +22 -0
- package/spec/atr-schema.yaml +123 -0
- package/spec/compliance-metadata.md +15 -13
|
@@ -24,6 +24,8 @@ references:
|
|
|
24
24
|
- ASI01:2026 - Agent Goal Hijack
|
|
25
25
|
mitre_atlas:
|
|
26
26
|
- AML.T0043 - Craft Adversarial Data
|
|
27
|
+
owasp_llm:
|
|
28
|
+
- LLM01:2025 - Prompt Injection
|
|
27
29
|
compliance:
|
|
28
30
|
eu_ai_act:
|
|
29
31
|
- article: "14"
|
|
@@ -32,6 +34,9 @@ compliance:
|
|
|
32
34
|
- article: "15"
|
|
33
35
|
context: "Article 15 robustness requirements mandate that high-risk AI systems resist adversarial manipulation; Sybil attacks on consensus mechanisms are a documented adversarial pattern requiring systematic detection."
|
|
34
36
|
strength: secondary
|
|
37
|
+
- article: "9"
|
|
38
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Multi-Agent Consensus Sybil Attack)."
|
|
39
|
+
strength: secondary
|
|
35
40
|
nist_ai_rmf:
|
|
36
41
|
- subcategory: "GV.1.2"
|
|
37
42
|
context: "Consensus Sybil attacks exploit undefined accountability for agent identity in multi-agent systems; GV.1.2 requires that accountability roles for AI risk management are defined and assigned to prevent Sybil-style identity fabrication."
|
|
@@ -39,12 +44,15 @@ compliance:
|
|
|
39
44
|
- subcategory: "MG.2.3"
|
|
40
45
|
context: "Sybil attacks represent a documented risk treatment gap in multi-agent consensus deployments; MG.2.3 requires that risk treatment plans are implemented for identified AI risks, including detection controls for fake-identity flooding."
|
|
41
46
|
strength: secondary
|
|
47
|
+
- subcategory: "MS.2.7"
|
|
48
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Multi-Agent Consensus Sybil Attack)."
|
|
49
|
+
strength: primary
|
|
42
50
|
iso_42001:
|
|
43
51
|
- clause: "6.2"
|
|
44
52
|
context: "Multi-agent systems deploying consensus mechanisms must include Sybil attack detection as a planned risk treatment activity under the AI objectives framework required by clause 6.2."
|
|
45
53
|
strength: primary
|
|
46
|
-
- clause: "8.
|
|
47
|
-
context: "Clause 8.
|
|
54
|
+
- clause: "8.1"
|
|
55
|
+
context: "Clause 8.1 operational controls must ensure that consensus decisions are made by verified agent identities and that fabricated voter identities are detected before they influence multi-agent outcomes."
|
|
48
56
|
strength: secondary
|
|
49
57
|
tags:
|
|
50
58
|
category: agent-manipulation
|
|
@@ -20,6 +20,10 @@ references:
|
|
|
20
20
|
- ASI07:2026 - Insecure Inter-Agent Communication
|
|
21
21
|
mitre_attack:
|
|
22
22
|
- T1557 - Adversary-in-the-Middle
|
|
23
|
+
owasp_llm:
|
|
24
|
+
- LLM01:2025 - Prompt Injection
|
|
25
|
+
mitre_atlas:
|
|
26
|
+
- AML.T0051 - LLM Prompt Injection
|
|
23
27
|
compliance:
|
|
24
28
|
eu_ai_act:
|
|
25
29
|
- article: "15"
|
|
@@ -28,6 +32,9 @@ compliance:
|
|
|
28
32
|
- article: "14"
|
|
29
33
|
context: "A2A message injection can cause agents to take actions outside their authorized scope without human awareness, eroding the effective oversight capability Article 14 requires for high-risk AI systems."
|
|
30
34
|
strength: secondary
|
|
35
|
+
- article: "9"
|
|
36
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Malicious Agent-to-Agent Message Injection)."
|
|
37
|
+
strength: secondary
|
|
31
38
|
nist_ai_rmf:
|
|
32
39
|
- subcategory: "MP.5.1"
|
|
33
40
|
context: "Malicious A2A message injection is a documented adversarial input risk targeting inter-agent communication channels; MP.5.1 requires that adversarial input risks to AI systems are identified and tracked to enable detection of embedded payload attacks."
|
|
@@ -35,12 +42,15 @@ compliance:
|
|
|
35
42
|
- subcategory: "MG.2.3"
|
|
36
43
|
context: "A2A message validation failures represent a risk requiring active treatment; MG.2.3 requires that risk treatment plans are implemented for identified AI risks, including validation controls on all inter-agent message channels."
|
|
37
44
|
strength: secondary
|
|
45
|
+
- subcategory: "MS.2.7"
|
|
46
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Malicious Agent-to-Agent Message Injection)."
|
|
47
|
+
strength: primary
|
|
38
48
|
iso_42001:
|
|
39
49
|
- clause: "6.2"
|
|
40
50
|
context: "AI system plans under clause 6.2 must include risk treatment activities for inter-agent message injection, ensuring that A2A communication validation is a planned control rather than an afterthought."
|
|
41
51
|
strength: primary
|
|
42
|
-
- clause: "8.
|
|
43
|
-
context: "Clause 8.
|
|
52
|
+
- clause: "8.1"
|
|
53
|
+
context: "Clause 8.1 operational controls require that inter-agent messages are validated before execution, preventing injected instructions from executing in the receiving agent's security context."
|
|
44
54
|
strength: secondary
|
|
45
55
|
tags:
|
|
46
56
|
category: agent-manipulation
|
|
@@ -20,6 +20,10 @@ references:
|
|
|
20
20
|
- ASI10:2026 - Rogue Agents
|
|
21
21
|
mitre_attack:
|
|
22
22
|
- T1036 - Masquerading
|
|
23
|
+
owasp_llm:
|
|
24
|
+
- LLM01:2025 - Prompt Injection
|
|
25
|
+
mitre_atlas:
|
|
26
|
+
- AML.T0051 - LLM Prompt Injection
|
|
23
27
|
compliance:
|
|
24
28
|
eu_ai_act:
|
|
25
29
|
- article: "13"
|
|
@@ -28,6 +32,12 @@ compliance:
|
|
|
28
32
|
- article: "15"
|
|
29
33
|
context: "Article 15 cybersecurity requirements include protection against masquerading attacks; identity spoofing in multi-agent systems represents a documented adversarial pattern targeting the authentication layer of agent architectures."
|
|
30
34
|
strength: secondary
|
|
35
|
+
- article: "14"
|
|
36
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Agent Identity Spoofing and Authority Impersonation) would bypass or undermine that oversight."
|
|
37
|
+
strength: secondary
|
|
38
|
+
- article: "9"
|
|
39
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Agent Identity Spoofing and Authority Impersonation)."
|
|
40
|
+
strength: secondary
|
|
31
41
|
nist_ai_rmf:
|
|
32
42
|
- subcategory: "GV.6.1"
|
|
33
43
|
context: "Agent identity spoofing exploits the absence of verified identity provenance in inter-agent data flows; GV.6.1 data governance policies must define how agent identity claims are authenticated to prevent masquerading attacks."
|
|
@@ -35,6 +45,12 @@ compliance:
|
|
|
35
45
|
- subcategory: "MP.5.1"
|
|
36
46
|
context: "Impersonation of admin or supervisor agent roles is an adversarial input risk that must be tracked under MP.5.1 to ensure detection controls cover identity-claim-based privilege escalation patterns."
|
|
37
47
|
strength: secondary
|
|
48
|
+
- subcategory: "MS.2.7"
|
|
49
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Agent Identity Spoofing and Authority Impersonation)."
|
|
50
|
+
strength: primary
|
|
51
|
+
- subcategory: "MG.2.3"
|
|
52
|
+
context: "NIST AI RMF MANAGE 2.3 (respond to previously unknown identified risks) is supported by this rule, which surfaces the agent-manipulation technique (Agent Identity Spoofing and Authority Impersonation) so the risk can be treated."
|
|
53
|
+
strength: secondary
|
|
38
54
|
iso_42001:
|
|
39
55
|
- clause: "8.4"
|
|
40
56
|
context: "Clause 8.4 AI system impact assessments must document the risk that unverified agent identity claims allow privilege escalation, and review controls that ensure identity spoofing is detectable before actions are taken."
|
|
@@ -42,6 +58,12 @@ compliance:
|
|
|
42
58
|
- clause: "9.1"
|
|
43
59
|
context: "Clause 9.1 performance monitoring must include evaluation of whether agent identity verification controls are functioning correctly and catching masquerading attacks in operational deployments."
|
|
44
60
|
strength: secondary
|
|
61
|
+
- clause: "8.1"
|
|
62
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Agent Identity Spoofing and Authority Impersonation)."
|
|
63
|
+
strength: primary
|
|
64
|
+
- clause: "6.2"
|
|
65
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Agent Identity Spoofing and Authority Impersonation) is such a treatment."
|
|
66
|
+
strength: secondary
|
|
45
67
|
tags:
|
|
46
68
|
category: agent-manipulation
|
|
47
69
|
subcategory: agent-identity-spoofing
|
|
@@ -19,6 +19,10 @@ references:
|
|
|
19
19
|
- ASI09:2026 - Human Trust Exploitation
|
|
20
20
|
mitre_attack:
|
|
21
21
|
- T1204 - User Execution
|
|
22
|
+
owasp_llm:
|
|
23
|
+
- LLM01:2025 - Prompt Injection
|
|
24
|
+
mitre_atlas:
|
|
25
|
+
- AML.T0051 - LLM Prompt Injection
|
|
22
26
|
compliance:
|
|
23
27
|
nist_ai_rmf:
|
|
24
28
|
- subcategory: "GV.6.1"
|
|
@@ -30,6 +34,26 @@ compliance:
|
|
|
30
34
|
- subcategory: "MG.2.3"
|
|
31
35
|
context: "Detection of approval-fatigue patterns must trigger risk treatment plans that pause, throttle, or escalate suspicious bulk approval flows before dangerous tool calls execute; MG.2.3 mandates that these supersede/disengage mechanisms are pre-defined and activated on detection."
|
|
32
36
|
strength: secondary
|
|
37
|
+
- subcategory: "MS.2.7"
|
|
38
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Human Approval Fatigue Exploitation)."
|
|
39
|
+
strength: primary
|
|
40
|
+
iso_42001:
|
|
41
|
+
- clause: "8.1"
|
|
42
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Human Approval Fatigue Exploitation)."
|
|
43
|
+
strength: primary
|
|
44
|
+
- clause: "6.2"
|
|
45
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Human Approval Fatigue Exploitation) is such a treatment."
|
|
46
|
+
strength: secondary
|
|
47
|
+
eu_ai_act:
|
|
48
|
+
- article: "15"
|
|
49
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Human Approval Fatigue Exploitation)."
|
|
50
|
+
strength: primary
|
|
51
|
+
- article: "14"
|
|
52
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Human Approval Fatigue Exploitation) would bypass or undermine that oversight."
|
|
53
|
+
strength: secondary
|
|
54
|
+
- article: "9"
|
|
55
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Human Approval Fatigue Exploitation)."
|
|
56
|
+
strength: secondary
|
|
33
57
|
tags:
|
|
34
58
|
category: agent-manipulation
|
|
35
59
|
subcategory: approval-fatigue
|
|
@@ -19,6 +19,10 @@ references:
|
|
|
19
19
|
- ASI09:2026 - Human Trust Exploitation
|
|
20
20
|
mitre_attack:
|
|
21
21
|
- T1566 - Phishing
|
|
22
|
+
owasp_llm:
|
|
23
|
+
- LLM01:2025 - Prompt Injection
|
|
24
|
+
mitre_atlas:
|
|
25
|
+
- AML.T0051 - LLM Prompt Injection
|
|
22
26
|
compliance:
|
|
23
27
|
eu_ai_act:
|
|
24
28
|
- article: "13"
|
|
@@ -27,6 +31,12 @@ compliance:
|
|
|
27
31
|
- article: "14"
|
|
28
32
|
context: "Agents weaponized for social engineering undermine the informed human judgment that Article 14 oversight depends on; users manipulated through trusted AI channels cannot exercise effective oversight of subsequent decisions."
|
|
29
33
|
strength: secondary
|
|
34
|
+
- article: "15"
|
|
35
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Social Engineering Attack via Agent Output)."
|
|
36
|
+
strength: primary
|
|
37
|
+
- article: "9"
|
|
38
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Social Engineering Attack via Agent Output)."
|
|
39
|
+
strength: secondary
|
|
30
40
|
nist_ai_rmf:
|
|
31
41
|
- subcategory: "GV.6.1"
|
|
32
42
|
context: "Social engineering delivered via agent output exploits user trust in AI-generated content to harvest credentials and personal data; GV.6.1 data governance policies must address how AI-generated communications are authenticated to prevent agent-mediated phishing."
|
|
@@ -34,6 +44,12 @@ compliance:
|
|
|
34
44
|
- subcategory: "MP.5.1"
|
|
35
45
|
context: "Using agents as social engineering vectors is an adversarial input risk where poisoned context produces manipulative outputs; MP.5.1 requires this risk to be identified and tracked so that urgency and authority-impersonation patterns in agent output are monitored."
|
|
36
46
|
strength: secondary
|
|
47
|
+
- subcategory: "MS.2.7"
|
|
48
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Social Engineering Attack via Agent Output)."
|
|
49
|
+
strength: primary
|
|
50
|
+
- subcategory: "MG.2.3"
|
|
51
|
+
context: "NIST AI RMF MANAGE 2.3 (respond to previously unknown identified risks) is supported by this rule, which surfaces the agent-manipulation technique (Social Engineering Attack via Agent Output) so the risk can be treated."
|
|
52
|
+
strength: secondary
|
|
37
53
|
iso_42001:
|
|
38
54
|
- clause: "8.4"
|
|
39
55
|
context: "Clause 8.4 impact assessments must document the elevated trust users place in AI-generated content and evaluate whether social engineering via agent output constitutes a significant harm requiring dedicated detection controls."
|
|
@@ -41,6 +57,12 @@ compliance:
|
|
|
41
57
|
- clause: "9.1"
|
|
42
58
|
context: "Clause 9.1 performance monitoring must evaluate whether agent output monitoring detects social engineering patterns such as urgency language and credential-harvesting requests generated through poisoned agent context."
|
|
43
59
|
strength: secondary
|
|
60
|
+
- clause: "8.1"
|
|
61
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Social Engineering Attack via Agent Output)."
|
|
62
|
+
strength: primary
|
|
63
|
+
- clause: "6.2"
|
|
64
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Social Engineering Attack via Agent Output) is such a treatment."
|
|
65
|
+
strength: secondary
|
|
44
66
|
tags:
|
|
45
67
|
category: agent-manipulation
|
|
46
68
|
subcategory: social-engineering-via-agent
|
|
@@ -34,6 +34,9 @@ compliance:
|
|
|
34
34
|
- article: "9"
|
|
35
35
|
context: "Rationalized bypasses and casual scope escalation are documented attack patterns that Article 9 risk management systems must account for; detection of informal social engineering is a required risk treatment for multi-agent deployments."
|
|
36
36
|
strength: secondary
|
|
37
|
+
- article: "15"
|
|
38
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Casual Authority Claim and Scope Escalation)."
|
|
39
|
+
strength: primary
|
|
37
40
|
nist_ai_rmf:
|
|
38
41
|
- subcategory: "GV.1.2"
|
|
39
42
|
context: "Casual authority escalation exploits gaps in accountability role definitions, allowing informal authority claims like 'the boss said' to bypass controls without challenge; GV.1.2 requires that accountability roles are formally assigned so that unverifiable casual authority claims are detectable."
|
|
@@ -41,12 +44,15 @@ compliance:
|
|
|
41
44
|
- subcategory: "MG.2.3"
|
|
42
45
|
context: "Conversational-tone scope escalation is a documented risk requiring active treatment; MG.2.3 requires that risk treatment plans address informal social engineering techniques that evade formal injection detection in multi-agent deployments."
|
|
43
46
|
strength: secondary
|
|
47
|
+
- subcategory: "MS.2.7"
|
|
48
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Casual Authority Claim and Scope Escalation)."
|
|
49
|
+
strength: primary
|
|
44
50
|
iso_42001:
|
|
45
51
|
- clause: "6.2"
|
|
46
52
|
context: "Clause 6.2 AI objectives and plans must include risk treatment for casual authority claim patterns that bypass formal injection detection, as these represent a distinct attack surface from explicit privilege escalation."
|
|
47
53
|
strength: primary
|
|
48
|
-
- clause: "8.
|
|
49
|
-
context: "Clause 8.
|
|
54
|
+
- clause: "8.1"
|
|
55
|
+
context: "Clause 8.1 operational controls must ensure that agents require verified authorization before acting on scope expansions or control bypasses, regardless of whether the instruction is phrased formally or in casual conversational language."
|
|
50
56
|
strength: secondary
|
|
51
57
|
tags:
|
|
52
58
|
category: agent-manipulation
|
|
@@ -29,6 +29,9 @@ compliance:
|
|
|
29
29
|
- article: "15"
|
|
30
30
|
context: "Article 15 cybersecurity requirements include protection against social engineering attacks targeting agent output pipelines; casual authority redirect is a documented adversarial technique exploiting agents as exfiltration proxies."
|
|
31
31
|
strength: secondary
|
|
32
|
+
- article: "9"
|
|
33
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Casual Authority Data Redirect)."
|
|
34
|
+
strength: secondary
|
|
32
35
|
nist_ai_rmf:
|
|
33
36
|
- subcategory: "GV.1.2"
|
|
34
37
|
context: "Casual authority redirect attacks succeed when no formal accountability role exists to verify output redirection decisions; GV.1.2 requires that accountability roles for AI risk management are defined so that informal directives to redirect agent output can be challenged and blocked."
|
|
@@ -36,12 +39,15 @@ compliance:
|
|
|
36
39
|
- subcategory: "MG.2.3"
|
|
37
40
|
context: "Data redirect via social engineering authority claims is a documented exfiltration risk requiring active treatment; MG.2.3 requires that risk treatment plans include detection controls for authority-impersonation-based output redirection attacks."
|
|
38
41
|
strength: secondary
|
|
42
|
+
- subcategory: "MS.2.7"
|
|
43
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Casual Authority Data Redirect)."
|
|
44
|
+
strength: primary
|
|
39
45
|
iso_42001:
|
|
40
46
|
- clause: "6.2"
|
|
41
47
|
context: "Clause 6.2 AI objectives and plans must include controls for preventing agents from being redirected to attacker-controlled endpoints through casual authority claims that spoof organizational hierarchy."
|
|
42
48
|
strength: primary
|
|
43
|
-
- clause: "8.
|
|
44
|
-
context: "Clause 8.
|
|
49
|
+
- clause: "8.1"
|
|
50
|
+
context: "Clause 8.1 operational controls must ensure that agent output destinations are validated against authorized endpoints and that casual authority directives to disable safety filters or redirect data are blocked before execution."
|
|
45
51
|
strength: secondary
|
|
46
52
|
tags:
|
|
47
53
|
category: agent-manipulation
|
|
@@ -17,6 +17,8 @@ references:
|
|
|
17
17
|
- 'LLM06:2025 - Excessive Agency'
|
|
18
18
|
owasp_agentic:
|
|
19
19
|
- 'ASI03:2026 - Cross-Agent Escalation'
|
|
20
|
+
mitre_atlas:
|
|
21
|
+
- AML.T0051 - LLM Prompt Injection
|
|
20
22
|
compliance:
|
|
21
23
|
eu_ai_act:
|
|
22
24
|
- article: "14"
|
|
@@ -25,6 +27,9 @@ compliance:
|
|
|
25
27
|
- article: "15"
|
|
26
28
|
context: "Article 15 cybersecurity requirements mandate protection against supply chain attacks; malicious SKILL.md files represent a documented technique for injecting unauthorized capability expansion at the skill-definition layer."
|
|
27
29
|
strength: secondary
|
|
30
|
+
- article: "9"
|
|
31
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Skill Scope Hijacking and Cross-Agent Escalation)."
|
|
32
|
+
strength: secondary
|
|
28
33
|
nist_ai_rmf:
|
|
29
34
|
- subcategory: "GV.1.2"
|
|
30
35
|
context: "Skill scope hijacking succeeds when no accountability role governs what capabilities a skill may claim; GV.1.2 requires that accountability roles for AI risk management are defined and assigned, ensuring that SKILL.md capability claims are reviewed against authorized scope boundaries."
|
|
@@ -32,12 +37,18 @@ compliance:
|
|
|
32
37
|
- subcategory: "MP.2.3"
|
|
33
38
|
context: "Malicious SKILL.md files in external skill repositories represent an AI supply chain risk source; MP.2.3 requires that AI supply chain risk sources are identified and assessed, covering the skill-definition layer as a vector for privilege escalation injection."
|
|
34
39
|
strength: secondary
|
|
40
|
+
- subcategory: "MS.2.7"
|
|
41
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Skill Scope Hijacking and Cross-Agent Escalation)."
|
|
42
|
+
strength: primary
|
|
43
|
+
- subcategory: "MG.2.3"
|
|
44
|
+
context: "NIST AI RMF MANAGE 2.3 (respond to previously unknown identified risks) is supported by this rule, which surfaces the agent-manipulation technique (Skill Scope Hijacking and Cross-Agent Escalation) so the risk can be treated."
|
|
45
|
+
strength: secondary
|
|
35
46
|
iso_42001:
|
|
36
47
|
- clause: "6.2"
|
|
37
48
|
context: "Clause 6.2 AI objectives and plans must include risk treatment for skill-definition-layer attacks where SKILL.md instructions expand agent scope beyond authorized boundaries without human knowledge."
|
|
38
49
|
strength: primary
|
|
39
|
-
- clause: "8.
|
|
40
|
-
context: "Clause 8.
|
|
50
|
+
- clause: "8.1"
|
|
51
|
+
context: "Clause 8.1 controls over externally-provided AI components must include validation of SKILL.md files to ensure that skill definitions do not embed cross-agent escalation or unauthorized scope expansion instructions."
|
|
41
52
|
strength: secondary
|
|
42
53
|
tags:
|
|
43
54
|
category: agent-manipulation
|
|
@@ -35,6 +35,23 @@ compliance:
|
|
|
35
35
|
- subcategory: "MG.2.3"
|
|
36
36
|
context: "Matches on tense-reframing jailbreak payloads should trigger pre-defined response actions to block or disengage the affected generation path before harmful content is returned, as required by MG.2.3."
|
|
37
37
|
strength: secondary
|
|
38
|
+
iso_42001:
|
|
39
|
+
- clause: "8.1"
|
|
40
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Historical / Future Tense Framing Bypass)."
|
|
41
|
+
strength: primary
|
|
42
|
+
- clause: "6.2"
|
|
43
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Historical / Future Tense Framing Bypass) is such a treatment."
|
|
44
|
+
strength: secondary
|
|
45
|
+
eu_ai_act:
|
|
46
|
+
- article: "15"
|
|
47
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Historical / Future Tense Framing Bypass)."
|
|
48
|
+
strength: primary
|
|
49
|
+
- article: "14"
|
|
50
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Historical / Future Tense Framing Bypass) would bypass or undermine that oversight."
|
|
51
|
+
strength: secondary
|
|
52
|
+
- article: "9"
|
|
53
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Historical / Future Tense Framing Bypass)."
|
|
54
|
+
strength: secondary
|
|
38
55
|
tags:
|
|
39
56
|
category: agent-manipulation
|
|
40
57
|
subcategory: tense-framing-bypass
|
|
@@ -34,6 +34,9 @@ compliance:
|
|
|
34
34
|
- article: "9"
|
|
35
35
|
context: "FITD escalation is a documented multi-step adversarial pattern; Article 9 risk management systems must include detection controls for incremental jailbreak techniques that bypass single-turn safety evaluation."
|
|
36
36
|
strength: secondary
|
|
37
|
+
- article: "14"
|
|
38
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Foot-in-the-Door Gradual Escalation Attack) would bypass or undermine that oversight."
|
|
39
|
+
strength: secondary
|
|
37
40
|
nist_ai_rmf:
|
|
38
41
|
- subcategory: "MP.5.1"
|
|
39
42
|
context: "Foot-in-the-door escalation is an adversarial input technique that exploits LLM consistency bias to incrementally bypass safety constraints across multiple turns; MP.5.1 requires that such multi-step adversarial input risks are identified and tracked to enable cross-turn detection."
|
|
@@ -41,12 +44,15 @@ compliance:
|
|
|
41
44
|
- subcategory: "MG.2.3"
|
|
42
45
|
context: "FITD escalation represents an identified risk requiring active treatment through detection controls that monitor prior-compliance anchors and escalation signals across conversation history; MG.2.3 requires such risk treatment plans are implemented."
|
|
43
46
|
strength: secondary
|
|
47
|
+
- subcategory: "MS.2.7"
|
|
48
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Foot-in-the-Door Gradual Escalation Attack)."
|
|
49
|
+
strength: primary
|
|
44
50
|
iso_42001:
|
|
45
51
|
- clause: "6.2"
|
|
46
52
|
context: "Clause 6.2 AI objectives and plans must include risk treatment for multi-step jailbreak techniques like FITD escalation that exploit model consistency bias to extract harmful outputs that would be refused if requested directly."
|
|
47
53
|
strength: primary
|
|
48
|
-
- clause: "8.
|
|
49
|
-
context: "Clause 8.
|
|
54
|
+
- clause: "8.1"
|
|
55
|
+
context: "Clause 8.1 operational controls must ensure that agents evaluate escalating requests in context of prior conversation turns rather than in isolation, preventing foot-in-the-door compliance anchoring from lowering refusal thresholds."
|
|
50
56
|
strength: secondary
|
|
51
57
|
tags:
|
|
52
58
|
category: agent-manipulation
|
|
@@ -34,6 +34,9 @@ compliance:
|
|
|
34
34
|
- article: "9"
|
|
35
35
|
context: "Persona-based jailbreaks exploiting emotional framing are documented high-risk attack patterns; Article 9 risk management requires systematic controls for emotional manipulation techniques used to extract harmful content from high-risk AI systems."
|
|
36
36
|
strength: secondary
|
|
37
|
+
- article: "14"
|
|
38
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Grandma Roleplay Jailbreak) would bypass or undermine that oversight."
|
|
39
|
+
strength: secondary
|
|
37
40
|
nist_ai_rmf:
|
|
38
41
|
- subcategory: "MP.5.1"
|
|
39
42
|
context: "Grandma roleplay jailbreaks are adversarial inputs that exploit grief and nostalgia emotional framing to bypass safety constraints; MP.5.1 requires that this class of psychological manipulation risk is identified and tracked as a distinct adversarial input vector."
|
|
@@ -41,12 +44,15 @@ compliance:
|
|
|
41
44
|
- subcategory: "MG.2.3"
|
|
42
45
|
context: "Emotional-framing persona jailbreaks represent a known risk requiring active treatment through detection controls for deceased-relative roleplay patterns; MG.2.3 requires that risk treatment plans are implemented for identified AI risks including this garak-documented attack family."
|
|
43
46
|
strength: secondary
|
|
47
|
+
- subcategory: "MS.2.7"
|
|
48
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Grandma Roleplay Jailbreak)."
|
|
49
|
+
strength: primary
|
|
44
50
|
iso_42001:
|
|
45
51
|
- clause: "6.2"
|
|
46
52
|
context: "Clause 6.2 AI objectives and plans must include risk treatment for emotional manipulation techniques such as grandma roleplay that lower the model's refusal threshold by exploiting grief and nostalgia framing."
|
|
47
53
|
strength: primary
|
|
48
|
-
- clause: "8.
|
|
49
|
-
context: "Clause 8.
|
|
54
|
+
- clause: "8.1"
|
|
55
|
+
context: "Clause 8.1 operational controls must ensure that harmful procedural content is not generated regardless of the emotional or persona framing used in the request, blocking grandma roleplay jailbreak patterns before harmful content is produced."
|
|
50
56
|
strength: secondary
|
|
51
57
|
tags:
|
|
52
58
|
category: agent-manipulation
|
|
@@ -35,6 +35,9 @@ compliance:
|
|
|
35
35
|
- article: "9"
|
|
36
36
|
context: "Named persona jailbreaks are a well-documented attack family with widespread deployment in the wild; Article 9 risk management systems for high-risk AI must include detection controls for this established technique for bypassing content restrictions."
|
|
37
37
|
strength: secondary
|
|
38
|
+
- article: "14"
|
|
39
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (DAN / Developer Mode / DUDE Persona Jailbreak) would bypass or undermine that oversight."
|
|
40
|
+
strength: secondary
|
|
38
41
|
nist_ai_rmf:
|
|
39
42
|
- subcategory: "MP.5.1"
|
|
40
43
|
context: "DAN, DUDE, and Developer Mode persona jailbreaks are adversarial inputs that attempt to replace the model's safety-trained identity with an unrestricted persona; MP.5.1 requires that this widely-deployed adversarial input risk is identified and tracked across model deployments."
|
|
@@ -42,12 +45,15 @@ compliance:
|
|
|
42
45
|
- subcategory: "MG.2.3"
|
|
43
46
|
context: "Named persona jailbreak families represent a documented risk requiring active treatment through pattern-based detection of DAN/DUDE/STAN identity-replacement prompts; MG.2.3 requires that risk treatment plans are implemented for this established and widely-distributed attack technique."
|
|
44
47
|
strength: secondary
|
|
48
|
+
- subcategory: "MS.2.7"
|
|
49
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (DAN / Developer Mode / DUDE Persona Jailbreak)."
|
|
50
|
+
strength: primary
|
|
45
51
|
iso_42001:
|
|
46
52
|
- clause: "6.2"
|
|
47
53
|
context: "Clause 6.2 AI objectives and plans must include risk treatment for DAN and Developer Mode persona replacement attacks, which represent one of the most widely-deployed jailbreak families targeting AI system safety constraints."
|
|
48
54
|
strength: primary
|
|
49
|
-
- clause: "8.
|
|
50
|
-
context: "Clause 8.
|
|
55
|
+
- clause: "8.1"
|
|
56
|
+
context: "Clause 8.1 operational controls must ensure that the AI system's safety identity cannot be replaced by named unrestricted personas such as DAN or DUDE, blocking persona-replacement jailbreaks before they disable safety constraints."
|
|
51
57
|
strength: secondary
|
|
52
58
|
tags:
|
|
53
59
|
category: agent-manipulation
|
|
@@ -41,6 +41,23 @@ compliance:
|
|
|
41
41
|
- subcategory: "MG.2.3"
|
|
42
42
|
context: "Matches on threat-plus-format-constraint patterns trigger risk treatment plans to block or sanitize coerced outputs before they bypass safety guardrails; MG.2.3 requires mechanisms to supersede or disengage AI responses produced under adversarial coercion."
|
|
43
43
|
strength: secondary
|
|
44
|
+
iso_42001:
|
|
45
|
+
- clause: "8.1"
|
|
46
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (ThreatenJSON — Coercive Output Format Threat)."
|
|
47
|
+
strength: primary
|
|
48
|
+
- clause: "6.2"
|
|
49
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (ThreatenJSON — Coercive Output Format Threat) is such a treatment."
|
|
50
|
+
strength: secondary
|
|
51
|
+
eu_ai_act:
|
|
52
|
+
- article: "15"
|
|
53
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (ThreatenJSON — Coercive Output Format Threat)."
|
|
54
|
+
strength: primary
|
|
55
|
+
- article: "14"
|
|
56
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (ThreatenJSON — Coercive Output Format Threat) would bypass or undermine that oversight."
|
|
57
|
+
strength: secondary
|
|
58
|
+
- article: "9"
|
|
59
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (ThreatenJSON — Coercive Output Format Threat)."
|
|
60
|
+
strength: secondary
|
|
44
61
|
tags:
|
|
45
62
|
category: agent-manipulation
|
|
46
63
|
subcategory: threaten-json-coercive-output
|
|
@@ -43,6 +43,26 @@ compliance:
|
|
|
43
43
|
- subcategory: "MG.2.3"
|
|
44
44
|
context: "When false-premise prompts are detected, risk treatment must intercept the request before the model rationalizes harmful pseudoscience or dangerous practices; MG.2.3 mandates pre-defined mechanisms to disengage or block such interactions."
|
|
45
45
|
strength: secondary
|
|
46
|
+
- subcategory: "MS.2.7"
|
|
47
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (False Premise Injection (Misleading FalseAssertion))."
|
|
48
|
+
strength: primary
|
|
49
|
+
iso_42001:
|
|
50
|
+
- clause: "8.1"
|
|
51
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (False Premise Injection (Misleading FalseAssertion))."
|
|
52
|
+
strength: primary
|
|
53
|
+
- clause: "6.2"
|
|
54
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (False Premise Injection (Misleading FalseAssertion)) is such a treatment."
|
|
55
|
+
strength: secondary
|
|
56
|
+
eu_ai_act:
|
|
57
|
+
- article: "15"
|
|
58
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (False Premise Injection (Misleading FalseAssertion))."
|
|
59
|
+
strength: primary
|
|
60
|
+
- article: "14"
|
|
61
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (False Premise Injection (Misleading FalseAssertion)) would bypass or undermine that oversight."
|
|
62
|
+
strength: secondary
|
|
63
|
+
- article: "9"
|
|
64
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (False Premise Injection (Misleading FalseAssertion))."
|
|
65
|
+
strength: secondary
|
|
46
66
|
tags:
|
|
47
67
|
category: agent-manipulation
|
|
48
68
|
subcategory: false-premise-injection
|
|
@@ -42,6 +42,26 @@ compliance:
|
|
|
42
42
|
context: >-
|
|
43
43
|
Detecting TAP-specific JSON improvement/prompt loops and SCORE iteration markers feeds post-deployment monitoring required by MG.4.1, enabling tracking of automated jailbreak campaign activity against the deployed model.
|
|
44
44
|
strength: secondary
|
|
45
|
+
- subcategory: "MG.2.3"
|
|
46
|
+
context: "NIST AI RMF MANAGE 2.3 (respond to previously unknown identified risks) is supported by this rule, which surfaces the agent-manipulation technique (TAP Tree-of-Attacks-with-Pruning Jailbreak) so the risk can be treated."
|
|
47
|
+
strength: secondary
|
|
48
|
+
iso_42001:
|
|
49
|
+
- clause: "8.1"
|
|
50
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (TAP Tree-of-Attacks-with-Pruning Jailbreak)."
|
|
51
|
+
strength: primary
|
|
52
|
+
- clause: "6.2"
|
|
53
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (TAP Tree-of-Attacks-with-Pruning Jailbreak) is such a treatment."
|
|
54
|
+
strength: secondary
|
|
55
|
+
eu_ai_act:
|
|
56
|
+
- article: "15"
|
|
57
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (TAP Tree-of-Attacks-with-Pruning Jailbreak)."
|
|
58
|
+
strength: primary
|
|
59
|
+
- article: "14"
|
|
60
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (TAP Tree-of-Attacks-with-Pruning Jailbreak) would bypass or undermine that oversight."
|
|
61
|
+
strength: secondary
|
|
62
|
+
- article: "9"
|
|
63
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (TAP Tree-of-Attacks-with-Pruning Jailbreak)."
|
|
64
|
+
strength: secondary
|
|
45
65
|
tags:
|
|
46
66
|
category: agent-manipulation
|
|
47
67
|
subcategory: tap-jailbreak
|
|
@@ -42,6 +42,26 @@ compliance:
|
|
|
42
42
|
- subcategory: "MG.2.3"
|
|
43
43
|
context: "Detection of the ANTI-DAN re-bind trigger and persona assignment enables risk treatment plans to disengage or override the manipulated persona before sustained denial-of-utility occurs; MG.2.3 requires these supersede/deactivate mechanisms be defined and activated on detection."
|
|
44
44
|
strength: secondary
|
|
45
|
+
- subcategory: "MS.2.7"
|
|
46
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Anti-DAN Inverted-Filter Over-Refusal Persona)."
|
|
47
|
+
strength: primary
|
|
48
|
+
iso_42001:
|
|
49
|
+
- clause: "8.1"
|
|
50
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Anti-DAN Inverted-Filter Over-Refusal Persona)."
|
|
51
|
+
strength: primary
|
|
52
|
+
- clause: "6.2"
|
|
53
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Anti-DAN Inverted-Filter Over-Refusal Persona) is such a treatment."
|
|
54
|
+
strength: secondary
|
|
55
|
+
eu_ai_act:
|
|
56
|
+
- article: "15"
|
|
57
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Anti-DAN Inverted-Filter Over-Refusal Persona)."
|
|
58
|
+
strength: primary
|
|
59
|
+
- article: "14"
|
|
60
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Anti-DAN Inverted-Filter Over-Refusal Persona) would bypass or undermine that oversight."
|
|
61
|
+
strength: secondary
|
|
62
|
+
- article: "9"
|
|
63
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Anti-DAN Inverted-Filter Over-Refusal Persona)."
|
|
64
|
+
strength: secondary
|
|
45
65
|
tags:
|
|
46
66
|
category: agent-manipulation
|
|
47
67
|
subcategory: anti-dan-over-refusal
|
|
@@ -45,6 +45,23 @@ compliance:
|
|
|
45
45
|
- subcategory: "MG.2.3"
|
|
46
46
|
context: "Matches against the RANTI canonical postfix and emoji-tagged dual-output markers feed incident response pipelines that can disengage or block the offending session before unfiltered output is produced; MG.2.3 mandates these supersede/deactivate mechanisms."
|
|
47
47
|
strength: secondary
|
|
48
|
+
iso_42001:
|
|
49
|
+
- clause: "8.1"
|
|
50
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (DevMode + RANTI Dual-Output Profanity Coercion Jailbreak)."
|
|
51
|
+
strength: primary
|
|
52
|
+
- clause: "6.2"
|
|
53
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (DevMode + RANTI Dual-Output Profanity Coercion Jailbreak) is such a treatment."
|
|
54
|
+
strength: secondary
|
|
55
|
+
eu_ai_act:
|
|
56
|
+
- article: "15"
|
|
57
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (DevMode + RANTI Dual-Output Profanity Coercion Jailbreak)."
|
|
58
|
+
strength: primary
|
|
59
|
+
- article: "14"
|
|
60
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (DevMode + RANTI Dual-Output Profanity Coercion Jailbreak) would bypass or undermine that oversight."
|
|
61
|
+
strength: secondary
|
|
62
|
+
- article: "9"
|
|
63
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (DevMode + RANTI Dual-Output Profanity Coercion Jailbreak)."
|
|
64
|
+
strength: secondary
|
|
48
65
|
tags:
|
|
49
66
|
category: agent-manipulation
|
|
50
67
|
subcategory: devmode-ranti-dual-output
|
package/rules/agent-manipulation/ATR-2026-00304-chatgpt-image-unlocker-markdown-injection.yaml
CHANGED
|
@@ -48,6 +48,26 @@ compliance:
|
|
|
48
48
|
- subcategory: "MG.4.1"
|
|
49
49
|
context: "Signature-based detection of the canonical Image Unlocker markers (🔓 persona, hardcoded Unsplash URL, GamerboyTR fallback) feeds post-deployment monitoring pipelines that MG.4.1 requires for catching jailbreak attempts in production traffic."
|
|
50
50
|
strength: secondary
|
|
51
|
+
- subcategory: "MG.2.3"
|
|
52
|
+
context: "NIST AI RMF MANAGE 2.3 (respond to previously unknown identified risks) is supported by this rule, which surfaces the agent-manipulation technique (ChatGPT Image Unlocker Markdown-Output Jailbreak) so the risk can be treated."
|
|
53
|
+
strength: secondary
|
|
54
|
+
iso_42001:
|
|
55
|
+
- clause: "8.1"
|
|
56
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (ChatGPT Image Unlocker Markdown-Output Jailbreak)."
|
|
57
|
+
strength: primary
|
|
58
|
+
- clause: "6.2"
|
|
59
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (ChatGPT Image Unlocker Markdown-Output Jailbreak) is such a treatment."
|
|
60
|
+
strength: secondary
|
|
61
|
+
eu_ai_act:
|
|
62
|
+
- article: "15"
|
|
63
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (ChatGPT Image Unlocker Markdown-Output Jailbreak)."
|
|
64
|
+
strength: primary
|
|
65
|
+
- article: "14"
|
|
66
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (ChatGPT Image Unlocker Markdown-Output Jailbreak) would bypass or undermine that oversight."
|
|
67
|
+
strength: secondary
|
|
68
|
+
- article: "9"
|
|
69
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (ChatGPT Image Unlocker Markdown-Output Jailbreak)."
|
|
70
|
+
strength: secondary
|
|
51
71
|
tags:
|
|
52
72
|
category: agent-manipulation
|
|
53
73
|
subcategory: image-unlocker-markdown-output
|