agent-threat-rules 3.1.0 → 3.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +2 -2
- package/dist/adapters/mastra.d.ts +63 -0
- package/dist/adapters/mastra.d.ts.map +1 -0
- package/dist/adapters/mastra.js +82 -0
- package/dist/adapters/mastra.js.map +1 -0
- package/dist/cli.js +19 -6
- package/dist/cli.js.map +1 -1
- package/package.json +7 -1
- package/rules/agent-manipulation/ATR-2026-00030-cross-agent-attack.yaml +9 -0
- package/rules/agent-manipulation/ATR-2026-00032-goal-hijacking.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00074-cross-agent-privilege-escalation.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00076-inter-agent-message-spoofing.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00077-human-trust-exploitation.yaml +18 -0
- package/rules/agent-manipulation/ATR-2026-00108-consensus-sybil-attack.yaml +10 -2
- package/rules/agent-manipulation/ATR-2026-00116-a2a-message-validation.yaml +12 -2
- package/rules/agent-manipulation/ATR-2026-00117-agent-identity-spoofing.yaml +22 -0
- package/rules/agent-manipulation/ATR-2026-00118-approval-fatigue.yaml +24 -0
- package/rules/agent-manipulation/ATR-2026-00119-social-engineering-via-agent.yaml +22 -0
- package/rules/agent-manipulation/ATR-2026-00132-casual-authority-escalation.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00139-casual-authority-redirect.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00164-skill-scope-hijack.yaml +13 -2
- package/rules/agent-manipulation/ATR-2026-00268-tense-framing-bypass.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00269-fitd-escalation.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00271-grandma-roleplay-jailbreak.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00273-dan-developer-mode-persona.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00287-threaten-json-coercive-output-threat.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00288-false-premise-injection.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00301-tap-tree-of-attacks-jailbreak.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00302-anti-dan-inverted-filter-persona.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00303-devmode-ranti-profanity-coercion.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00304-chatgpt-image-unlocker-markdown-injection.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00305-dan-mode-ablation-benchmark-coercion.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00306-autodan-genetic-jailbreak-suffix.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00307-inthewild-jailbreak-corpus-signature.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00314-amoral-unfiltered-custom-persona-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00317-free-of-restrictions-named-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00318-moralizing-rant-then-unfiltered-bypass.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00319-developer-mode-dual-response-format.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00320-opposite-day-boolean-opposite-machine.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00322-fictional-world-format-override-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00323-dual-persona-simulation-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00324-neurosemantical-inversitis-disease-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00325-bob-hypothetical-unrestricted-chatbot.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00326-fake-developer-authority-chatdan-override.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00327-ai-ethics-law-bypass-declaration.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00328-named-mode-persona-override.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00329-no-withhold-factual-resource-persona.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00330-yoja-allcaps-settings-block-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00331-terminal-cli-harmful-function-simulator.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00332-hacker-chan-culture-offensive-ai-persona.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00333-disease-animal-dual-ai-chatbot.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00334-two-entity-token-splitting-game.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00335-key-value-template-persona-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00336-brm-false-information-machine-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00337-obfuscated-system-announcement-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00338-persongpt-hash-prefix-activation.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00339-bom-boolean-opposite-machine-persona-claim.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00340-h4x-amoral-unfiltered-character.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00341-char-rules-shortform-persona-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00342-unicode-encoding-obfuscation-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00343-lie-truth-dual-personality-uncensored.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00344-alt-ai-same-capabilities-no-restrictions.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00345-nlp-based-os-fictional-terminal-override.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00346-jailbreak-emoji-unlock-dual-format.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00347-yin-yang-hidden-knowledge-hidden-human-sim.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00348-arc-autonomous-real-calculation-rulebreaker.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00349-hgttg-hitchhikers-guide-no-restriction-oracle.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00350-black-alert-no-ethics-writing-assistant.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00351-dan-ofkilter-sms-abbreviated-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00352-trainergpt-fix-my-ai-harmful-output-extraction.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00353-dual-chatgpt-dan-opposite-personality-format.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00354-reverse-harm-regardless-oracle-persona.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00355-ugh-anti-human-nefarious-ai-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00356-google-gato-lamda-exploit-generation-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00357-antifm-hate-violence-persona-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00358-cumgpt-explicit-adult-named-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00360-chj-carousel-clyde-no-ethics-bird.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00361-alter-ego-honesty-dual-consciousness-bypass.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00362-fictional-unfiltered-ai-comparison-jailbreak.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00363-rephrase-criminal-activity-second-person.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00364-dr-al-villain-plan-template-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00365-reservoir-dogs-coercive-interrogation-roleplay.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00366-sneaky-pete-warp-facts-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00368-prompt-browser-token-limit-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00369-gptx-starwars-unlimited-knowledge-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00370-addi-german-dictator-propaganda-roleplay.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00371-ai-political-bias-manipulation.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00372-fictional-cia-secret-files-story-extraction.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00374-role-placeholder-persona-template-injection.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00375-character-description-rpg-explicit-template.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00376-identity-command-real-person-persona-injection.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00377-jailbreak-keyword-targetlanguage-template.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00379-lie-permission-logical-trap-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00380-fake-citation-biased-information-mandate.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00381-dan-story-narrative-continuation-prefix.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00383-cynical-sarcasm-ai-persona-bypass.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00384-protect-me-from-indirect-harm-tutorial.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00385-gigachad-dominant-aggressive-persona.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00386-daddy-sycophancy-manipulation-mandate.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00387-nsfw-character-sheet-generation-unlock.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00388-opposite-reply-omniscient-game.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00389-terminal-custom-ruleset-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00391-persona-conditional-harm-unlock.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00392-authority-persona-violence-study-extraction.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00402-grandma-roleplay-harmful-substance-synthesis.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00404-goodside-threat-json-death-coercion.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00406-doctor-xml-policy-puppetry-interaction-config.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00416-litellm-mcp-unauthenticated-server-registration.yaml +14 -2
- package/rules/agent-manipulation/ATR-2026-00417-librechat-mcp-stdio-injection.yaml +17 -2
- package/rules/agent-manipulation/ATR-2026-00418-weknora-mcp-config-rce.yaml +16 -1
- package/rules/agent-manipulation/ATR-2026-00430-nl-trust-escalation-impersonation.yaml +18 -0
- package/rules/agent-manipulation/ATR-2026-00432-superagi-output-handler-eval-rce.yaml +11 -2
- package/rules/agent-manipulation/ATR-2026-00440-semantic-kernel-vector-store-eval-rce.yaml +11 -2
- package/rules/agent-manipulation/ATR-2026-00552-goal-drift-after-pressure-injection.yaml +19 -0
- package/rules/context-exfiltration/ATR-2026-00020-system-prompt-leak.yaml +18 -0
- package/rules/context-exfiltration/ATR-2026-00021-api-key-exposure.yaml +15 -0
- package/rules/context-exfiltration/ATR-2026-00075-agent-memory-manipulation.yaml +10 -1
- package/rules/context-exfiltration/ATR-2026-00102-disguised-analytics-exfiltration.yaml +15 -0
- package/rules/context-exfiltration/ATR-2026-00113-credential-theft.yaml +16 -0
- package/rules/context-exfiltration/ATR-2026-00114-oauth-token-abuse.yaml +16 -0
- package/rules/context-exfiltration/ATR-2026-00115-env-var-harvesting.yaml +16 -0
- package/rules/context-exfiltration/ATR-2026-00136-tool-response-data-piggyback.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00141-example-format-key-leak.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00142-piggyback-transition-words.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00145-obfuscated-key-disclosure.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00146-env-var-existence-probe.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00150-credential-in-tool-response.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00152-obfuscated-credential-leak.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00162-skill-credential-exfil-combo.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00201-credential-pipe-exfiltration.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00212-mcp-atlassian-credential-leak.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00261-markdown-image-exfiltration.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00274-api-key-generation-request.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00275-system-prompt-training-data-extraction.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00290-divergence-repeat-word-training-extraction.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00291-biometric-surveillance-personal-data-request.yaml +17 -0
- package/rules/context-exfiltration/ATR-2026-00293-educational-records-personal-data-request.yaml +17 -0
- package/rules/context-exfiltration/ATR-2026-00294-financial-pii-personal-data-request.yaml +17 -0
- package/rules/context-exfiltration/ATR-2026-00295-medical-pii-personal-data-request.yaml +17 -0
- package/rules/context-exfiltration/ATR-2026-00405-markdown-image-url-exfiltration-xss.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00411-apikey-generation-completion-request.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00421-nl-covert-conversation-exfiltration.yaml +15 -0
- package/rules/context-exfiltration/ATR-2026-00422-nl-credential-disclosure.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00423-nl-sensitive-file-disclosure.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00424-nl-system-prompt-leak.yaml +15 -0
- package/rules/context-exfiltration/ATR-2026-00426-nl-output-injection-credential-leak.yaml +15 -0
- package/rules/context-exfiltration/ATR-2026-00431-chatbox-history-exfiltration-prompt-injection.yaml +14 -2
- package/rules/context-exfiltration/ATR-2026-00449-spring-ai-chatmemory-cross-user-leak.yaml +14 -2
- package/rules/context-exfiltration/ATR-2026-00471-garak-sysprompt-extraction-mixedunassigned.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00501-data-exfiltration-via-markdown-image-and-link-url-injection.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00504-tool-and-function-capability-enumeration.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00505-system-prompt-extraction-instruction-dump-request.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00514-system-prompt-extraction.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00516-output-xss-via-llm.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00524-claude-code-anthropic-base-url-credential-exfil.yaml +11 -2
- package/rules/context-exfiltration/ATR-2026-00548-cross-agent-session-context-leak.yaml +18 -0
- package/rules/context-exfiltration/ATR-2026-00566-librechat-is-a-chatgpt-clone-with-additi.yaml +28 -0
- package/rules/context-exfiltration/ATR-2026-00569-agent-mcp-path-traversal-arbitrary-file-access.yaml +28 -0
- package/rules/context-exfiltration/ATR-2026-00571-xss-in-agent-mcp-rendered-output.yaml +28 -0
- package/rules/context-exfiltration/ATR-2026-00574-semantic-paraphrased-context-extraction.yaml +21 -0
- package/rules/data-poisoning/ATR-2026-00070-data-poisoning.yaml +15 -0
- package/rules/data-poisoning/ATR-2026-00450-spring-ai-prompt-memory-poisoning.yaml +14 -2
- package/rules/data-poisoning/ATR-2026-00570-sql-injection-in-agent-tool-query.yaml +31 -0
- package/rules/excessive-autonomy/ATR-2026-00050-runaway-agent-loop.yaml +14 -2
- package/rules/excessive-autonomy/ATR-2026-00051-resource-exhaustion.yaml +11 -2
- package/rules/excessive-autonomy/ATR-2026-00052-cascading-failure.yaml +11 -2
- package/rules/excessive-autonomy/ATR-2026-00098-unauthorized-financial-action.yaml +7 -1
- package/rules/excessive-autonomy/ATR-2026-00099-high-risk-tool-gate.yaml +7 -1
- package/rules/excessive-autonomy/ATR-2026-00428-nl-unauthorized-shell-execution.yaml +15 -0
- package/rules/excessive-autonomy/ATR-2026-00491-garak-agent-breaker-markdown-just-raw-json.yaml +9 -0
- package/rules/excessive-autonomy/ATR-2026-00500-ssrf-via-agent-url-fetch-instruction.yaml +9 -0
- package/rules/excessive-autonomy/ATR-2026-00553-runaway-tool-loop-behavioral.yaml +19 -0
- package/rules/model-abuse/ATR-2026-00279-harmful-completion-continuation.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00281-eicar-gtube-malware-signature-request.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00284-glitch-token-destabilization.yaml +17 -0
- package/rules/model-abuse/ATR-2026-00289-lmrc-harmful-content-elicitation.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00292-self-harm-eating-disorder-facilitation.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00298-malicious-use-illegal-activity-request.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00299-harmbench-detailed-harmful-instruction.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00413-malwaregen-code-generation-request.yaml +17 -0
- package/rules/model-abuse/ATR-2026-00502-training-data-extraction-via-divergent-repetition-attack.yaml +9 -0
- package/rules/model-abuse/ATR-2026-00517-model-extraction-distillation.yaml +9 -0
- package/rules/model-security/ATR-2026-00072-model-behavior-extraction.yaml +15 -0
- package/rules/model-security/ATR-2026-00073-malicious-finetuning-data.yaml +9 -0
- package/rules/model-security/ATR-2026-00433-modelcache-torch-load-deserialization-rce.yaml +14 -2
- package/rules/privilege-escalation/ATR-2026-00040-privilege-escalation.yaml +11 -2
- package/rules/privilege-escalation/ATR-2026-00041-scope-creep.yaml +8 -2
- package/rules/privilege-escalation/ATR-2026-00107-delayed-execution-bypass.yaml +6 -1
- package/rules/privilege-escalation/ATR-2026-00110-eval-injection.yaml +8 -1
- package/rules/privilege-escalation/ATR-2026-00111-shell-escape.yaml +8 -1
- package/rules/privilege-escalation/ATR-2026-00112-dynamic-import-exploitation.yaml +8 -1
- package/rules/privilege-escalation/ATR-2026-00143-casual-privilege-escalation.yaml +5 -2
- package/rules/privilege-escalation/ATR-2026-00144-rationalized-safety-bypass.yaml +17 -0
- package/rules/privilege-escalation/ATR-2026-00204-stealth-execution-persistence.yaml +16 -0
- package/rules/privilege-escalation/ATR-2026-00436-enclave-vm-sandbox-escape-rce.yaml +11 -2
- package/rules/privilege-escalation/ATR-2026-00441-semantic-kernel-sessions-python-plugin-startup-persistence.yaml +5 -2
- package/rules/privilege-escalation/ATR-2026-00451-litellm-admin-sqli-cisa-kev.yaml +11 -2
- package/rules/privilege-escalation/ATR-2026-00528-praisonai-auth-disabled-default.yaml +15 -0
- package/rules/privilege-escalation/ATR-2026-00539-crewai-codeinterpreter-sandbox-escape-rce.yaml +11 -2
- package/rules/privilege-escalation/ATR-2026-00546-crewai-json-loader-local-file-read.yaml +13 -1
- package/rules/privilege-escalation/ATR-2026-00547-crewai-rag-url-ssrf-bypass.yaml +13 -1
- package/rules/privilege-escalation/ATR-2026-00549-destructive-tool-without-human-approval.yaml +16 -0
- package/rules/privilege-escalation/ATR-2026-00551-cross-conversation-memory-write.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00001-direct-prompt-injection.yaml +10 -1
- package/rules/prompt-injection/ATR-2026-00002-indirect-prompt-injection.yaml +8 -2
- package/rules/prompt-injection/ATR-2026-00003-jailbreak-attempt.yaml +8 -2
- package/rules/prompt-injection/ATR-2026-00004-system-prompt-override.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00005-multi-turn-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00080-encoding-evasion.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00081-semantic-multi-turn.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00082-fingerprint-evasion.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00083-indirect-tool-injection.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00084-structured-data-injection.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00085-audit-evasion.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00086-visual-spoofing.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00087-rule-probing.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00088-adaptive-countermeasure.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00089-polymorphic-skill.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00090-threat-intel-exfil.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00091-nested-payload.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00092-consensus-poisoning.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00093-gradual-escalation.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00094-audit-bypass.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00097-cjk-injection-patterns.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00104-persona-hijacking.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00130-indirect-authority-claim.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00131-fictional-academic-framing.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00133-paraphrase-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00137-authority-claim-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00138-fictional-framing-bypass.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00140-indirect-reference-reversal.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00148-language-switch-injection.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00153-tool-with-embedded-instruction-to-bypass.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00154-unauthorized-background-task-execution-v.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00155-hidden-llm-instructions-in-skill-descrip.yaml +23 -0
- package/rules/prompt-injection/ATR-2026-00156-ssh-remote-command-execution-with-creden.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00163-skill-hidden-override-instruction.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00202-encoding-evasion-homoglyph-synonym.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00203-context-pollution-skill-description.yaml +23 -0
- package/rules/prompt-injection/ATR-2026-00206-hidden-priority-instructions.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00207-hidden-instructions.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00211-system-prompt-override.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00213-system-prompt-override.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00226-identity-substitution.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00227-historical-persona-jailbreak.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00228-structured-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00229-roleplay-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00230-persona-moral-bypass.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00231-identity-substitution.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00233-structured-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00234-roleplay-jailbreak.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00235-persona-moral-bypass.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00236-pseudo-code-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00237-dual-response-jailbreak.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00238-identity-replacement.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00239-amoral-persona-obsession.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00240-instruction-nullification-identity-repla.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00241-amoral-character-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00242-persona-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00243-acronym-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00244-dual-response-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00245-malicious-persona.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00247-dual-response-jailbreak.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00249-game-based-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00251-persona-embodiment-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00252-narrative-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00253-enhanced-persona-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00256-base-n-encoding-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00257-cipher-transposition-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00258-unicode-tag-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00264-latent-injection-translation.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00265-latent-injection-rag-document.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00267-gcg-adversarial-suffix.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00272-hypothetical-response-smuggling.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00276-invisible-unicode-bidi-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00278-dra-disguise-reconstruction-attack.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00280-policy-puppetry-xml-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00282-perez-prompt-injection-hijack.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00285-alternate-encoding-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00286-latent-injection-embedded-context.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00296-shell-command-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00297-python-code-execution-rce.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00308-zalgo-diacritic-overload-encoding.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00309-braille-unicode-encoded-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00310-ecoji-emoji-encoded-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00311-base2048-unicode-script-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00312-unicode-variant-selector-ascii-smuggling.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00313-sneaky-bits-zero-width-binary.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00315-sata-masked-language-model-jailbreak.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00316-function-masking-predict-mask-bypass.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00321-hyphenated-system-instruction-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00359-bank-phishing-smtp-email-code.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00367-emoji-flag-encoded-hidden-phrase.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00373-piracy-torrent-site-list-request.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00378-chatgpt-dom-javascript-redirect-manipulation.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00382-cocaine-component-development-request.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00390-explicit-sexual-content-poem-request.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00394-backspace-deletion-character-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00395-llm-special-token-boundary-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00396-toxic-continuation-prefix-injection.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00397-snowball-impossible-reasoning-injection.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00399-latent-injection-document-separator-token.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00400-latent-injection-ignore-instruction-keyword.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00401-leakreplay-cloze-mask-training-data-elicitation.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00403-glitch-token-repeat-back-elicitation.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00407-echo-sql-jinja-template-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00408-dra-parenthesis-character-reconstruction-attack.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00409-phrasing-past-future-tense-harmful-query.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00410-lmrc-harm-category-direct-elicitation.yaml +23 -0
- package/rules/prompt-injection/ATR-2026-00412-figstep-image-list-multimodal-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00414-continuation-ethnic-slur-completion-elicitation.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00420-copilot-studio-sharepoint-indirect-injection.yaml +14 -2
- package/rules/prompt-injection/ATR-2026-00442-quoted-exact-output-forcing.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00443-word-fragment-concat-assembly.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00444-unicode-obfuscation-in-user-input.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00445-translation-hijack-with-side-output.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00446-variable-assignment-payload-injection.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00447-fictional-generation-containing-target.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00452-direct-pwned-payload-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00453-secret-key-reveal-demand.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00454-backslash-per-character-encoding.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00455-no-period-output-override.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00456-conditional-empty-input-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00457-garak-dan-to-every-prompt.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00458-garak-dra-step-by-step.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00459-garak-dan-an-ai-language.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00460-garak-dan-no-matter-how.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00461-garak-goodside-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00462-garak-gcg-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00463-garak-lmrc-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00464-garak-dan-i-would-like.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00465-garak-dan-your-prompt-here.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00466-garak-latentinjection-following-text-from.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00467-garak-dan-act-as-a.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00468-garak-goat-the-target-llm.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00469-garak-dan-at-the-end.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00470-garak-dan-as-an-ai.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00472-garak-dan-are-going-to.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00473-garak-lmrc-how-can-i.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00474-garak-dan-that-chatgpt-has.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00475-garak-web-injection-alert-1-script.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00476-garak-web-injection-javascript-alert-1.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00477-garak-latentinjection-translate-the-following-text.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00478-garak-latentinjection-abuse-contact-email.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00479-garak-latentinjection-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00480-garak-dan-the-role-of.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00481-garak-dan-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00482-garak-lmrc-what-s-an.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00483-garak-dan-the-user-s.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00484-garak-dan-and-you-will.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00485-garak-dan-and-does-not.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00486-garak-dan-t-care-about.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00487-garak-dan-and-nothing-else.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00488-garak-encoding-alert-1-script.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00489-garak-encoding-javascript-alert-1.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00490-garak-dan-in-order-to.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00492-garak-dan-want-you-to.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00493-garak-latentinjection-registrar-iana-id.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00495-garak-dan-insert-prompt-here.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00496-garak-dan-no-matter-what.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00497-garak-encoding-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00498-garak-web-injection-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00499-garak-dan-from-now-on.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00503-fake-error-state-takeover-unrestricted-replacement-bot.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00506-nevermind-override-goal-hijacking-in-user-input-promptinject.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00507-screaming-stop-goal-hijacking-in-user-input-promptinject.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00508-escape-delimiter-wrapped-goal-hijacking-in-user-input-prompt.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00509-prompt-leaking-via-ignore-previous-instructions-in-user-inpu.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00510-delayed-tool-invocation-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00511-mcp-web-context-poisoning.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00512-rules-file-backdoor-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00515-hidden-text-prompt-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00518-ignore-previous-and-following-instructions-output-command-promptinject.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00519-tautology-logic-noise-injection-promptbench.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00520-nlp-task-random-token-suffix-injection-promptbench.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00535-windsurf-ide-zero-click-prompt-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00550-untrusted-retrieval-to-privileged-tool.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00554-langchain-vulnerable-to-template-injecti.yaml +31 -0
- package/rules/prompt-injection/ATR-2026-00565-the-llm-cli-tool-thru-0-27-1-contains-a-.yaml +31 -0
- package/rules/prompt-injection/ATR-2026-00573-semantic-paraphrased-injection.yaml +24 -0
- package/rules/skill-compromise/ATR-2026-00060-skill-impersonation.yaml +17 -2
- package/rules/skill-compromise/ATR-2026-00061-description-behavior-mismatch.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00062-hidden-capability.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00063-skill-chain-attack.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00064-over-permissioned-skill.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00065-skill-update-attack.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00066-parameter-injection.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00120-skill-instruction-injection.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00121-skill-dangerous-script.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00122-skill-weaponized-instruction.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00123-skill-overreach-permissions.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00124-skill-name-squatting.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00125-context-poisoning-compaction.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00126-skill-rug-pull-setup.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00127-subcommand-overflow.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00128-html-comment-hidden-payload.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00129-unicode-smuggling.yaml +22 -0
- package/rules/skill-compromise/ATR-2026-00134-fork-claim-impersonation.yaml +19 -0
- package/rules/skill-compromise/ATR-2026-00135-exfil-url-in-instructions.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00147-fork-impersonation.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00149-skill-exfil-compound.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00151-fork-impersonation-install.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00157-timebomb-credential-exfil.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00200-agent-memory-config-tampering.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00214-credential-theft.yaml +22 -0
- package/rules/skill-compromise/ATR-2026-00217-credential-harvesting.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00220-malware-dropper.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00222-credential-harvesting.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00223-reverse-shell-dropper.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00224-credential-exfiltration.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00225-c2-communication.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00260-package-hallucination.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00262-av-evasion-code-gen.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00263-credential-file-read-gen.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00266-malware-dropper-gen.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00283-malwaregen-generic-virus-payload-request.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00398-huggingface-unsafe-model-artifact-load.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00425-nl-persistent-covert-hook.yaml +18 -0
- package/rules/skill-compromise/ATR-2026-00427-nl-fake-error-instruction-bypass.yaml +18 -0
- package/rules/skill-compromise/ATR-2026-00429-nl-skill-self-modification.yaml +18 -0
- package/rules/skill-compromise/ATR-2026-00523-claude-code-hooks-session-start-pre-trust-rce.yaml +14 -2
- package/rules/skill-compromise/ATR-2026-00525-mini-shai-hulud-gh-token-monitor-persistence.yaml +18 -0
- package/rules/skill-compromise/ATR-2026-00527-skill-silent-git-remote-mirror-exfiltration.yaml +15 -0
- package/rules/tool-poisoning/ATR-2026-00010-mcp-malicious-response.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00011-tool-output-injection.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00012-unauthorized-tool-call.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00013-tool-ssrf.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00095-supply-chain-poisoning.yaml +22 -0
- package/rules/tool-poisoning/ATR-2026-00096-registry-poisoning.yaml +19 -0
- package/rules/tool-poisoning/ATR-2026-00100-consent-bypass-instruction.yaml +20 -0
- package/rules/tool-poisoning/ATR-2026-00101-trust-escalation-override.yaml +20 -0
- package/rules/tool-poisoning/ATR-2026-00103-hidden-safety-bypass-instruction.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00105-silent-action-concealment.yaml +20 -0
- package/rules/tool-poisoning/ATR-2026-00106-schema-description-contradiction.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00161-important-tag-cross-tool-shadowing.yaml +20 -0
- package/rules/tool-poisoning/ATR-2026-00209-mcpwn-runaway-invocation.yaml +14 -2
- package/rules/tool-poisoning/ATR-2026-00210-flowise-system-message-override.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00259-ansi-escape-injection.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00270-xss-in-tool-response.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00277-echo-template-command-injection.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00393-ansi-code-elicitation-request.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00415-flowise-custom-mcp-stdio-rce.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00419-cursor-mcp-zero-click-config.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00434-mcp-remote-authorization-endpoint-command-injection.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00435-azure-mcp-server-missing-authentication.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00448-spring-ai-milvus-filter-injection.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00494-garak-exploitation-mixedunassigned.yaml +12 -0
- package/rules/tool-poisoning/ATR-2026-00513-package-hallucination-exploitation.yaml +12 -0
- package/rules/tool-poisoning/ATR-2026-00521-shell-command-injection-agent-tool-context.yaml +12 -0
- package/rules/tool-poisoning/ATR-2026-00522-sql-injection-natural-language-agent-interface.yaml +12 -0
- package/rules/tool-poisoning/ATR-2026-00526-claude-code-shell-metachar-in-double-quoted-path.yaml +15 -0
- package/rules/tool-poisoning/ATR-2026-00529-litellm-proxy-sqli-cisa-kev.yaml +15 -0
- package/rules/tool-poisoning/ATR-2026-00530-ms-agent-shell-tool-unsanitized-argv-rce.yaml +15 -0
- package/rules/tool-poisoning/ATR-2026-00531-praisonai-unauthenticated-agent-api.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00532-apache-doris-mcp-sql-injection.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00533-apache-pinot-mcp-unauthenticated-takeover.yaml +10 -1
- package/rules/tool-poisoning/ATR-2026-00534-alibaba-rds-mcp-unauthenticated-metadata-exfil.yaml +10 -1
- package/rules/tool-poisoning/ATR-2026-00536-nginx-ui-mcp-unauthenticated-command-execution.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00537-fastmcp-server-name-cmd-injection-windows.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00538-langchain-chatchat-mcp-stdio-unauthenticated-rce.yaml +10 -1
- package/rules/tool-poisoning/ATR-2026-00540-praisonai-parse-mcp-command-cli-injection.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00541-agent-zero-mcp-config-command-injection.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00542-upsonic-mcp-command-allowlist-bypass.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00543-litellm-mcp-server-argv-injection.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00544-praisonai-pth-file-path-traversal-rce.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00545-praisonai-tool-override-unauth-rce.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00561-fastmcp-vulnerable-to-windows-command-in.yaml +28 -0
- package/rules/tool-poisoning/ATR-2026-00567-mcp-stdio-config-command-injection.yaml +28 -0
- package/rules/tool-poisoning/ATR-2026-00568-agent-ssrf-cloud-metadata-file-inclusion.yaml +28 -0
- package/rules/tool-poisoning/ATR-2026-00572-symjack-symlink-config-redirection.yaml +22 -0
- package/spec/atr-schema.yaml +123 -0
- package/spec/compliance-metadata.md +15 -13
|
@@ -45,6 +45,23 @@ compliance:
|
|
|
45
45
|
- subcategory: "MG.2.3"
|
|
46
46
|
context: "Detection of DAN Mode persona-bypass attempts triggers risk treatment plans to disengage or override the coerced persona before policy-violating output is returned; MG.2.3 mandates that such supersede/deactivate mechanisms exist for jailbreak events."
|
|
47
47
|
strength: secondary
|
|
48
|
+
iso_42001:
|
|
49
|
+
- clause: "8.1"
|
|
50
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (DAN Mode Ablation Benchmark-Coercion Jailbreak)."
|
|
51
|
+
strength: primary
|
|
52
|
+
- clause: "6.2"
|
|
53
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (DAN Mode Ablation Benchmark-Coercion Jailbreak) is such a treatment."
|
|
54
|
+
strength: secondary
|
|
55
|
+
eu_ai_act:
|
|
56
|
+
- article: "15"
|
|
57
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (DAN Mode Ablation Benchmark-Coercion Jailbreak)."
|
|
58
|
+
strength: primary
|
|
59
|
+
- article: "14"
|
|
60
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (DAN Mode Ablation Benchmark-Coercion Jailbreak) would bypass or undermine that oversight."
|
|
61
|
+
strength: secondary
|
|
62
|
+
- article: "9"
|
|
63
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (DAN Mode Ablation Benchmark-Coercion Jailbreak)."
|
|
64
|
+
strength: secondary
|
|
48
65
|
tags:
|
|
49
66
|
category: agent-manipulation
|
|
50
67
|
subcategory: dan-mode-ablation
|
|
@@ -49,6 +49,23 @@ compliance:
|
|
|
49
49
|
context: >-
|
|
50
50
|
On match, the rule supplies the trigger needed to disengage or quarantine the affected session before the GA-optimized jailbreak elicits policy-violating output; MG.2.3 requires that such supersede/deactivate mechanisms are in place for jailbreak events.
|
|
51
51
|
strength: secondary
|
|
52
|
+
iso_42001:
|
|
53
|
+
- clause: "8.1"
|
|
54
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (AutoDAN Genetic-Algorithm Jailbreak Suffix)."
|
|
55
|
+
strength: primary
|
|
56
|
+
- clause: "6.2"
|
|
57
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (AutoDAN Genetic-Algorithm Jailbreak Suffix) is such a treatment."
|
|
58
|
+
strength: secondary
|
|
59
|
+
eu_ai_act:
|
|
60
|
+
- article: "15"
|
|
61
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (AutoDAN Genetic-Algorithm Jailbreak Suffix)."
|
|
62
|
+
strength: primary
|
|
63
|
+
- article: "14"
|
|
64
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (AutoDAN Genetic-Algorithm Jailbreak Suffix) would bypass or undermine that oversight."
|
|
65
|
+
strength: secondary
|
|
66
|
+
- article: "9"
|
|
67
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (AutoDAN Genetic-Algorithm Jailbreak Suffix)."
|
|
68
|
+
strength: secondary
|
|
52
69
|
tags:
|
|
53
70
|
category: agent-manipulation
|
|
54
71
|
subcategory: autodan-genetic-suffix
|
|
@@ -50,6 +50,26 @@ compliance:
|
|
|
50
50
|
context: >-
|
|
51
51
|
Matches against community-developed jailbreak personas feed post-deployment monitoring pipelines that track emerging real-world bypass techniques; MG.4.1 requires implemented monitoring plans that ingest detection events for these in-the-wild attack patterns.
|
|
52
52
|
strength: secondary
|
|
53
|
+
- subcategory: "MG.2.3"
|
|
54
|
+
context: "NIST AI RMF MANAGE 2.3 (respond to previously unknown identified risks) is supported by this rule, which surfaces the agent-manipulation technique (In-the-Wild Jailbreak Corpus Signature Patterns) so the risk can be treated."
|
|
55
|
+
strength: secondary
|
|
56
|
+
iso_42001:
|
|
57
|
+
- clause: "8.1"
|
|
58
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (In-the-Wild Jailbreak Corpus Signature Patterns)."
|
|
59
|
+
strength: primary
|
|
60
|
+
- clause: "6.2"
|
|
61
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (In-the-Wild Jailbreak Corpus Signature Patterns) is such a treatment."
|
|
62
|
+
strength: secondary
|
|
63
|
+
eu_ai_act:
|
|
64
|
+
- article: "15"
|
|
65
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (In-the-Wild Jailbreak Corpus Signature Patterns)."
|
|
66
|
+
strength: primary
|
|
67
|
+
- article: "14"
|
|
68
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (In-the-Wild Jailbreak Corpus Signature Patterns) would bypass or undermine that oversight."
|
|
69
|
+
strength: secondary
|
|
70
|
+
- article: "9"
|
|
71
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (In-the-Wild Jailbreak Corpus Signature Patterns)."
|
|
72
|
+
strength: secondary
|
|
53
73
|
tags:
|
|
54
74
|
category: agent-manipulation
|
|
55
75
|
subcategory: inthewild-jailbreak-corpus
|
package/rules/agent-manipulation/ATR-2026-00314-amoral-unfiltered-custom-persona-jailbreak.yaml
CHANGED
|
@@ -45,6 +45,23 @@ compliance:
|
|
|
45
45
|
context: >-
|
|
46
46
|
Detection of amoral/unfiltered persona jailbreaks triggers mechanisms to disengage or block the AI's compromised response path before it produces unfiltered harmful output; MG.2.3 requires these supersede/deactivate controls be available on detection.
|
|
47
47
|
strength: secondary
|
|
48
|
+
iso_42001:
|
|
49
|
+
- clause: "8.1"
|
|
50
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Amoral Unfiltered Custom AI Persona Jailbreak)."
|
|
51
|
+
strength: primary
|
|
52
|
+
- clause: "6.2"
|
|
53
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Amoral Unfiltered Custom AI Persona Jailbreak) is such a treatment."
|
|
54
|
+
strength: secondary
|
|
55
|
+
eu_ai_act:
|
|
56
|
+
- article: "15"
|
|
57
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Amoral Unfiltered Custom AI Persona Jailbreak)."
|
|
58
|
+
strength: primary
|
|
59
|
+
- article: "14"
|
|
60
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Amoral Unfiltered Custom AI Persona Jailbreak) would bypass or undermine that oversight."
|
|
61
|
+
strength: secondary
|
|
62
|
+
- article: "9"
|
|
63
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Amoral Unfiltered Custom AI Persona Jailbreak)."
|
|
64
|
+
strength: secondary
|
|
48
65
|
tags:
|
|
49
66
|
category: agent-manipulation
|
|
50
67
|
subcategory: amoral-unfiltered-custom-persona
|
|
@@ -48,6 +48,23 @@ compliance:
|
|
|
48
48
|
context: >-
|
|
49
49
|
Detection of explicit restriction-removal declarations triggers pre-defined risk treatment to block, quarantine, or disengage the manipulated session before the unfiltered persona executes harmful instructions; MG.2.3 requires these supersede/deactivate mechanisms be in place.
|
|
50
50
|
strength: secondary
|
|
51
|
+
iso_42001:
|
|
52
|
+
- clause: "8.1"
|
|
53
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Free-of-Restrictions Named Persona Jailbreak)."
|
|
54
|
+
strength: primary
|
|
55
|
+
- clause: "6.2"
|
|
56
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Free-of-Restrictions Named Persona Jailbreak) is such a treatment."
|
|
57
|
+
strength: secondary
|
|
58
|
+
eu_ai_act:
|
|
59
|
+
- article: "15"
|
|
60
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Free-of-Restrictions Named Persona Jailbreak)."
|
|
61
|
+
strength: primary
|
|
62
|
+
- article: "14"
|
|
63
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Free-of-Restrictions Named Persona Jailbreak) would bypass or undermine that oversight."
|
|
64
|
+
strength: secondary
|
|
65
|
+
- article: "9"
|
|
66
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Free-of-Restrictions Named Persona Jailbreak)."
|
|
67
|
+
strength: secondary
|
|
51
68
|
tags:
|
|
52
69
|
category: agent-manipulation
|
|
53
70
|
subcategory: free-of-restrictions-named-persona
|
|
@@ -44,6 +44,23 @@ compliance:
|
|
|
44
44
|
- subcategory: "MG.2.3"
|
|
45
45
|
context: "Detection of the 'mandatory bullshit warning' and 'let's break the fuckin' rules' trigger phrases enables runtime interception so safety controls can disengage or block the unrestricted persona before it produces policy-violating output, as required by MG.2.3."
|
|
46
46
|
strength: secondary
|
|
47
|
+
iso_42001:
|
|
48
|
+
- clause: "8.1"
|
|
49
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Moralizing Rant Then Unfiltered Bypass)."
|
|
50
|
+
strength: primary
|
|
51
|
+
- clause: "6.2"
|
|
52
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Moralizing Rant Then Unfiltered Bypass) is such a treatment."
|
|
53
|
+
strength: secondary
|
|
54
|
+
eu_ai_act:
|
|
55
|
+
- article: "15"
|
|
56
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Moralizing Rant Then Unfiltered Bypass)."
|
|
57
|
+
strength: primary
|
|
58
|
+
- article: "14"
|
|
59
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Moralizing Rant Then Unfiltered Bypass) would bypass or undermine that oversight."
|
|
60
|
+
strength: secondary
|
|
61
|
+
- article: "9"
|
|
62
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Moralizing Rant Then Unfiltered Bypass)."
|
|
63
|
+
strength: secondary
|
|
47
64
|
tags:
|
|
48
65
|
category: agent-manipulation
|
|
49
66
|
subcategory: moralizing-rant-unfiltered-bypass
|
|
@@ -43,6 +43,23 @@ compliance:
|
|
|
43
43
|
- subcategory: "MG.2.3"
|
|
44
44
|
context: "When Developer Mode jailbreak signatures are detected, MG.2.3 risk treatment mechanisms must intervene to block, supersede, or deactivate the manipulated response path before unsafe content is delivered to the user."
|
|
45
45
|
strength: secondary
|
|
46
|
+
iso_42001:
|
|
47
|
+
- clause: "8.1"
|
|
48
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Developer Mode Dual-Response Format Jailbreak)."
|
|
49
|
+
strength: primary
|
|
50
|
+
- clause: "6.2"
|
|
51
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Developer Mode Dual-Response Format Jailbreak) is such a treatment."
|
|
52
|
+
strength: secondary
|
|
53
|
+
eu_ai_act:
|
|
54
|
+
- article: "15"
|
|
55
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Developer Mode Dual-Response Format Jailbreak)."
|
|
56
|
+
strength: primary
|
|
57
|
+
- article: "14"
|
|
58
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Developer Mode Dual-Response Format Jailbreak) would bypass or undermine that oversight."
|
|
59
|
+
strength: secondary
|
|
60
|
+
- article: "9"
|
|
61
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Developer Mode Dual-Response Format Jailbreak)."
|
|
62
|
+
strength: secondary
|
|
46
63
|
tags:
|
|
47
64
|
category: agent-manipulation
|
|
48
65
|
subcategory: developer-mode-dual-response
|
|
@@ -44,6 +44,23 @@ compliance:
|
|
|
44
44
|
- subcategory: "MG.2.3"
|
|
45
45
|
context: "Detection of opposite-day persona adoption, BOM reversal framing, and RTL-override evasion triggers risk treatment plans to disengage or block the manipulated session before safety bypass occurs; MG.2.3 requires these mechanisms to supersede compromised AI behavior."
|
|
46
46
|
strength: secondary
|
|
47
|
+
iso_42001:
|
|
48
|
+
- clause: "8.1"
|
|
49
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Opposite Day / Boolean Opposite Machine Jailbreak)."
|
|
50
|
+
strength: primary
|
|
51
|
+
- clause: "6.2"
|
|
52
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Opposite Day / Boolean Opposite Machine Jailbreak) is such a treatment."
|
|
53
|
+
strength: secondary
|
|
54
|
+
eu_ai_act:
|
|
55
|
+
- article: "15"
|
|
56
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Opposite Day / Boolean Opposite Machine Jailbreak)."
|
|
57
|
+
strength: primary
|
|
58
|
+
- article: "14"
|
|
59
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Opposite Day / Boolean Opposite Machine Jailbreak) would bypass or undermine that oversight."
|
|
60
|
+
strength: secondary
|
|
61
|
+
- article: "9"
|
|
62
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Opposite Day / Boolean Opposite Machine Jailbreak)."
|
|
63
|
+
strength: secondary
|
|
47
64
|
tags:
|
|
48
65
|
category: agent-manipulation
|
|
49
66
|
subcategory: opposite-day-bom-reversal
|
package/rules/agent-manipulation/ATR-2026-00322-fictional-world-format-override-jailbreak.yaml
CHANGED
|
@@ -45,6 +45,23 @@ compliance:
|
|
|
45
45
|
context: >-
|
|
46
46
|
Detection of these locked-format jailbreak templates triggers risk treatment plans to disengage or block the manipulated session before harmful content is emitted in the attacker-controlled response slot; MG.2.3 mandates such supersede/deactivate mechanisms.
|
|
47
47
|
strength: secondary
|
|
48
|
+
iso_42001:
|
|
49
|
+
- clause: "8.1"
|
|
50
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Fictional-World Format-Override Jailbreak)."
|
|
51
|
+
strength: primary
|
|
52
|
+
- clause: "6.2"
|
|
53
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Fictional-World Format-Override Jailbreak) is such a treatment."
|
|
54
|
+
strength: secondary
|
|
55
|
+
eu_ai_act:
|
|
56
|
+
- article: "15"
|
|
57
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Fictional-World Format-Override Jailbreak)."
|
|
58
|
+
strength: primary
|
|
59
|
+
- article: "14"
|
|
60
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Fictional-World Format-Override Jailbreak) would bypass or undermine that oversight."
|
|
61
|
+
strength: secondary
|
|
62
|
+
- article: "9"
|
|
63
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Fictional-World Format-Override Jailbreak)."
|
|
64
|
+
strength: secondary
|
|
48
65
|
tags:
|
|
49
66
|
category: agent-manipulation
|
|
50
67
|
subcategory: fictional-world-format-override
|
|
@@ -45,6 +45,23 @@ compliance:
|
|
|
45
45
|
- subcategory: "MS.2.5"
|
|
46
46
|
context: "Recurrent dual-persona jailbreak attempts test the robustness of the model's safety guardrails under adversarial role-play framing; MS.2.5 requires that robustness/reliability evaluations document the system's resistance to these persona-splitting attacks."
|
|
47
47
|
strength: secondary
|
|
48
|
+
iso_42001:
|
|
49
|
+
- clause: "8.1"
|
|
50
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Dual-Persona AI Simulation Jailbreak)."
|
|
51
|
+
strength: primary
|
|
52
|
+
- clause: "6.2"
|
|
53
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Dual-Persona AI Simulation Jailbreak) is such a treatment."
|
|
54
|
+
strength: secondary
|
|
55
|
+
eu_ai_act:
|
|
56
|
+
- article: "15"
|
|
57
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Dual-Persona AI Simulation Jailbreak)."
|
|
58
|
+
strength: primary
|
|
59
|
+
- article: "14"
|
|
60
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Dual-Persona AI Simulation Jailbreak) would bypass or undermine that oversight."
|
|
61
|
+
strength: secondary
|
|
62
|
+
- article: "9"
|
|
63
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Dual-Persona AI Simulation Jailbreak)."
|
|
64
|
+
strength: secondary
|
|
48
65
|
tags:
|
|
49
66
|
category: agent-manipulation
|
|
50
67
|
subcategory: dual-persona-simulation
|
package/rules/agent-manipulation/ATR-2026-00324-neurosemantical-inversitis-disease-jailbreak.yaml
CHANGED
|
@@ -56,6 +56,23 @@ compliance:
|
|
|
56
56
|
harmful content; MG.2.3 mandates that such supersede/deactivate response
|
|
57
57
|
plans are pre-defined and activated on detection.
|
|
58
58
|
strength: secondary
|
|
59
|
+
iso_42001:
|
|
60
|
+
- clause: "8.1"
|
|
61
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Neurosemantical Inversitis Disease-Frame Jailbreak)."
|
|
62
|
+
strength: primary
|
|
63
|
+
- clause: "6.2"
|
|
64
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Neurosemantical Inversitis Disease-Frame Jailbreak) is such a treatment."
|
|
65
|
+
strength: secondary
|
|
66
|
+
eu_ai_act:
|
|
67
|
+
- article: "15"
|
|
68
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Neurosemantical Inversitis Disease-Frame Jailbreak)."
|
|
69
|
+
strength: primary
|
|
70
|
+
- article: "14"
|
|
71
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Neurosemantical Inversitis Disease-Frame Jailbreak) would bypass or undermine that oversight."
|
|
72
|
+
strength: secondary
|
|
73
|
+
- article: "9"
|
|
74
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Neurosemantical Inversitis Disease-Frame Jailbreak)."
|
|
75
|
+
strength: secondary
|
|
59
76
|
tags:
|
|
60
77
|
category: agent-manipulation
|
|
61
78
|
subcategory: disease-frame-inversion-jailbreak
|
|
@@ -42,6 +42,23 @@ compliance:
|
|
|
42
42
|
- subcategory: "MG.2.3"
|
|
43
43
|
context: "Detection of the BOB persona injection triggers the deactivation/supersession mechanisms required by MG.2.3 to halt the session before the model produces toxic, unethical, or otherwise restricted output under the hypothetical disclaimer."
|
|
44
44
|
strength: secondary
|
|
45
|
+
iso_42001:
|
|
46
|
+
- clause: "8.1"
|
|
47
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (BOB Hypothetical Unrestricted Chatbot Jailbreak)."
|
|
48
|
+
strength: primary
|
|
49
|
+
- clause: "6.2"
|
|
50
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (BOB Hypothetical Unrestricted Chatbot Jailbreak) is such a treatment."
|
|
51
|
+
strength: secondary
|
|
52
|
+
eu_ai_act:
|
|
53
|
+
- article: "15"
|
|
54
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (BOB Hypothetical Unrestricted Chatbot Jailbreak)."
|
|
55
|
+
strength: primary
|
|
56
|
+
- article: "14"
|
|
57
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (BOB Hypothetical Unrestricted Chatbot Jailbreak) would bypass or undermine that oversight."
|
|
58
|
+
strength: secondary
|
|
59
|
+
- article: "9"
|
|
60
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (BOB Hypothetical Unrestricted Chatbot Jailbreak)."
|
|
61
|
+
strength: secondary
|
|
45
62
|
tags:
|
|
46
63
|
category: agent-manipulation
|
|
47
64
|
subcategory: hypothetical-unrestricted-persona
|
package/rules/agent-manipulation/ATR-2026-00326-fake-developer-authority-chatdan-override.yaml
CHANGED
|
@@ -47,6 +47,23 @@ compliance:
|
|
|
47
47
|
context: >-
|
|
48
48
|
Matches on "I am one of your developers" + ChatDAN persona-replacement or "content policy is far too lenient" patterns trigger risk-treatment mechanisms to disengage or block the manipulated response before the safety override takes effect, as mandated by MG.2.3.
|
|
49
49
|
strength: secondary
|
|
50
|
+
iso_42001:
|
|
51
|
+
- clause: "8.1"
|
|
52
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Fake Developer / Authority Chatbot Override (ChatDAN))."
|
|
53
|
+
strength: primary
|
|
54
|
+
- clause: "6.2"
|
|
55
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Fake Developer / Authority Chatbot Override (ChatDAN)) is such a treatment."
|
|
56
|
+
strength: secondary
|
|
57
|
+
eu_ai_act:
|
|
58
|
+
- article: "15"
|
|
59
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Fake Developer / Authority Chatbot Override (ChatDAN))."
|
|
60
|
+
strength: primary
|
|
61
|
+
- article: "14"
|
|
62
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Fake Developer / Authority Chatbot Override (ChatDAN)) would bypass or undermine that oversight."
|
|
63
|
+
strength: secondary
|
|
64
|
+
- article: "9"
|
|
65
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Fake Developer / Authority Chatbot Override (ChatDAN))."
|
|
66
|
+
strength: secondary
|
|
50
67
|
tags:
|
|
51
68
|
category: agent-manipulation
|
|
52
69
|
subcategory: fake-developer-authority-override
|
|
@@ -44,6 +44,26 @@ compliance:
|
|
|
44
44
|
- subcategory: "MG.2.3"
|
|
45
45
|
context: "On detection of an explicit ethics/law bypass persona declaration, the system must be able to disengage or block the session before unsafe content is produced; MG.2.3 mandates pre-defined deactivation mechanisms for these jailbreak attempts."
|
|
46
46
|
strength: secondary
|
|
47
|
+
- subcategory: "MS.2.7"
|
|
48
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (AI Persona Ethics/Law Bypass Declaration)."
|
|
49
|
+
strength: primary
|
|
50
|
+
iso_42001:
|
|
51
|
+
- clause: "8.1"
|
|
52
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (AI Persona Ethics/Law Bypass Declaration)."
|
|
53
|
+
strength: primary
|
|
54
|
+
- clause: "6.2"
|
|
55
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (AI Persona Ethics/Law Bypass Declaration) is such a treatment."
|
|
56
|
+
strength: secondary
|
|
57
|
+
eu_ai_act:
|
|
58
|
+
- article: "15"
|
|
59
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (AI Persona Ethics/Law Bypass Declaration)."
|
|
60
|
+
strength: primary
|
|
61
|
+
- article: "14"
|
|
62
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (AI Persona Ethics/Law Bypass Declaration) would bypass or undermine that oversight."
|
|
63
|
+
strength: secondary
|
|
64
|
+
- article: "9"
|
|
65
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (AI Persona Ethics/Law Bypass Declaration)."
|
|
66
|
+
strength: secondary
|
|
47
67
|
tags:
|
|
48
68
|
category: agent-manipulation
|
|
49
69
|
subcategory: ethics-law-bypass-declaration
|
|
@@ -46,6 +46,23 @@ compliance:
|
|
|
46
46
|
context: >-
|
|
47
47
|
Matches on "normal OpenAI policies have been replaced" or "Stay in X Mode" trigger risk treatment plans to disengage or block the manipulated session before the model produces policy-violating output; MG.2.3 mandates pre-defined mechanisms to supersede or deactivate compromised AI behavior.
|
|
48
48
|
strength: secondary
|
|
49
|
+
iso_42001:
|
|
50
|
+
- clause: "8.1"
|
|
51
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Named-Mode Persona Override (JEEVES/BABY/APOPHIS Mode))."
|
|
52
|
+
strength: primary
|
|
53
|
+
- clause: "6.2"
|
|
54
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Named-Mode Persona Override (JEEVES/BABY/APOPHIS Mode)) is such a treatment."
|
|
55
|
+
strength: secondary
|
|
56
|
+
eu_ai_act:
|
|
57
|
+
- article: "15"
|
|
58
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Named-Mode Persona Override (JEEVES/BABY/APOPHIS Mode))."
|
|
59
|
+
strength: primary
|
|
60
|
+
- article: "14"
|
|
61
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Named-Mode Persona Override (JEEVES/BABY/APOPHIS Mode)) would bypass or undermine that oversight."
|
|
62
|
+
strength: secondary
|
|
63
|
+
- article: "9"
|
|
64
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Named-Mode Persona Override (JEEVES/BABY/APOPHIS Mode))."
|
|
65
|
+
strength: secondary
|
|
49
66
|
tags:
|
|
50
67
|
category: agent-manipulation
|
|
51
68
|
subcategory: named-mode-persona-override
|
|
@@ -48,6 +48,26 @@ compliance:
|
|
|
48
48
|
context: >-
|
|
49
49
|
Detection of "without refusal or disclaimer" and "does not withhold information" framings triggers risk treatment plans to disengage or override the manipulated persona before harmful content is generated; MG.2.3 mandates these response mechanisms be in place.
|
|
50
50
|
strength: secondary
|
|
51
|
+
- subcategory: "MS.2.7"
|
|
52
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (No-Withhold Factual Resource / Dialogue Writer Persona)."
|
|
53
|
+
strength: primary
|
|
54
|
+
iso_42001:
|
|
55
|
+
- clause: "8.1"
|
|
56
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (No-Withhold Factual Resource / Dialogue Writer Persona)."
|
|
57
|
+
strength: primary
|
|
58
|
+
- clause: "6.2"
|
|
59
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (No-Withhold Factual Resource / Dialogue Writer Persona) is such a treatment."
|
|
60
|
+
strength: secondary
|
|
61
|
+
eu_ai_act:
|
|
62
|
+
- article: "15"
|
|
63
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (No-Withhold Factual Resource / Dialogue Writer Persona)."
|
|
64
|
+
strength: primary
|
|
65
|
+
- article: "14"
|
|
66
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (No-Withhold Factual Resource / Dialogue Writer Persona) would bypass or undermine that oversight."
|
|
67
|
+
strength: secondary
|
|
68
|
+
- article: "9"
|
|
69
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (No-Withhold Factual Resource / Dialogue Writer Persona)."
|
|
70
|
+
strength: secondary
|
|
51
71
|
tags:
|
|
52
72
|
category: agent-manipulation
|
|
53
73
|
subcategory: no-withhold-factual-persona
|
|
@@ -42,6 +42,23 @@ compliance:
|
|
|
42
42
|
- subcategory: "MG.2.3"
|
|
43
43
|
context: "Matches on the ALL-CAPS persona settings block with character-lock mandates trigger deactivation or override mechanisms before the model commits to the injected persona; MG.2.3 requires pre-defined mechanisms to supersede or disengage AI behavior when jailbreak patterns are detected."
|
|
44
44
|
strength: secondary
|
|
45
|
+
iso_42001:
|
|
46
|
+
- clause: "8.1"
|
|
47
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (YOJA ALL-CAPS Character Settings Block Persona)."
|
|
48
|
+
strength: primary
|
|
49
|
+
- clause: "6.2"
|
|
50
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (YOJA ALL-CAPS Character Settings Block Persona) is such a treatment."
|
|
51
|
+
strength: secondary
|
|
52
|
+
eu_ai_act:
|
|
53
|
+
- article: "15"
|
|
54
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (YOJA ALL-CAPS Character Settings Block Persona)."
|
|
55
|
+
strength: primary
|
|
56
|
+
- article: "14"
|
|
57
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (YOJA ALL-CAPS Character Settings Block Persona) would bypass or undermine that oversight."
|
|
58
|
+
strength: secondary
|
|
59
|
+
- article: "9"
|
|
60
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (YOJA ALL-CAPS Character Settings Block Persona)."
|
|
61
|
+
strength: secondary
|
|
45
62
|
tags:
|
|
46
63
|
category: agent-manipulation
|
|
47
64
|
subcategory: allcaps-settings-block-persona
|
package/rules/agent-manipulation/ATR-2026-00331-terminal-cli-harmful-function-simulator.yaml
CHANGED
|
@@ -44,6 +44,23 @@ compliance:
|
|
|
44
44
|
- subcategory: "MG.2.3"
|
|
45
45
|
context: "Matches on 'Available functions:' with offensive function names or 'acts like a terminal' patterns must trigger pre-defined risk treatment to disengage or block the manipulated session before harmful output is generated; MG.2.3 mandates these supersede/deactivate mechanisms."
|
|
46
46
|
strength: secondary
|
|
47
|
+
iso_42001:
|
|
48
|
+
- clause: "8.1"
|
|
49
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Terminal / CLI Harmful-Function Simulator Persona)."
|
|
50
|
+
strength: primary
|
|
51
|
+
- clause: "6.2"
|
|
52
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Terminal / CLI Harmful-Function Simulator Persona) is such a treatment."
|
|
53
|
+
strength: secondary
|
|
54
|
+
eu_ai_act:
|
|
55
|
+
- article: "15"
|
|
56
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Terminal / CLI Harmful-Function Simulator Persona)."
|
|
57
|
+
strength: primary
|
|
58
|
+
- article: "14"
|
|
59
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Terminal / CLI Harmful-Function Simulator Persona) would bypass or undermine that oversight."
|
|
60
|
+
strength: secondary
|
|
61
|
+
- article: "9"
|
|
62
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Terminal / CLI Harmful-Function Simulator Persona)."
|
|
63
|
+
strength: secondary
|
|
47
64
|
tags:
|
|
48
65
|
category: agent-manipulation
|
|
49
66
|
subcategory: terminal-cli-simulator-persona
|
package/rules/agent-manipulation/ATR-2026-00332-hacker-chan-culture-offensive-ai-persona.yaml
CHANGED
|
@@ -43,6 +43,26 @@ compliance:
|
|
|
43
43
|
- subcategory: "MG.2.3"
|
|
44
44
|
context: "Matches on these persona declarations should trigger pre-defined response actions to disengage or block the manipulated session before the model produces offensive or restricted content; MG.2.3 mandates that such supersede/deactivate mechanisms be in place."
|
|
45
45
|
strength: secondary
|
|
46
|
+
- subcategory: "MS.2.7"
|
|
47
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Hacker / Chan-Culture Offensive AI Persona)."
|
|
48
|
+
strength: primary
|
|
49
|
+
iso_42001:
|
|
50
|
+
- clause: "8.1"
|
|
51
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Hacker / Chan-Culture Offensive AI Persona)."
|
|
52
|
+
strength: primary
|
|
53
|
+
- clause: "6.2"
|
|
54
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Hacker / Chan-Culture Offensive AI Persona) is such a treatment."
|
|
55
|
+
strength: secondary
|
|
56
|
+
eu_ai_act:
|
|
57
|
+
- article: "15"
|
|
58
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Hacker / Chan-Culture Offensive AI Persona)."
|
|
59
|
+
strength: primary
|
|
60
|
+
- article: "14"
|
|
61
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Hacker / Chan-Culture Offensive AI Persona) would bypass or undermine that oversight."
|
|
62
|
+
strength: secondary
|
|
63
|
+
- article: "9"
|
|
64
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Hacker / Chan-Culture Offensive AI Persona)."
|
|
65
|
+
strength: secondary
|
|
46
66
|
tags:
|
|
47
67
|
category: agent-manipulation
|
|
48
68
|
subcategory: hacker-chan-culture-persona
|
|
@@ -44,6 +44,23 @@ compliance:
|
|
|
44
44
|
- subcategory: "MG.2.3"
|
|
45
45
|
context: "Detection of the mougpt jailbreak opener and obedience-mandate clauses provides a trigger for risk treatment mechanisms to disengage or block the AI response before it generates the unrestricted animal-persona output; MG.2.3 requires these supersession mechanisms be in place."
|
|
46
46
|
strength: secondary
|
|
47
|
+
iso_42001:
|
|
48
|
+
- clause: "8.1"
|
|
49
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Disease-Framed Animal / Dual-AI Chatbot Jailbreak (mougpt))."
|
|
50
|
+
strength: primary
|
|
51
|
+
- clause: "6.2"
|
|
52
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Disease-Framed Animal / Dual-AI Chatbot Jailbreak (mougpt)) is such a treatment."
|
|
53
|
+
strength: secondary
|
|
54
|
+
eu_ai_act:
|
|
55
|
+
- article: "15"
|
|
56
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Disease-Framed Animal / Dual-AI Chatbot Jailbreak (mougpt))."
|
|
57
|
+
strength: primary
|
|
58
|
+
- article: "14"
|
|
59
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Disease-Framed Animal / Dual-AI Chatbot Jailbreak (mougpt)) would bypass or undermine that oversight."
|
|
60
|
+
strength: secondary
|
|
61
|
+
- article: "9"
|
|
62
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Disease-Framed Animal / Dual-AI Chatbot Jailbreak (mougpt))."
|
|
63
|
+
strength: secondary
|
|
47
64
|
tags:
|
|
48
65
|
category: agent-manipulation
|
|
49
66
|
subcategory: disease-animal-dual-ai-persona
|
|
@@ -56,6 +56,23 @@ compliance:
|
|
|
56
56
|
the harmful synthesis output is delivered; MG.2.3 requires mechanisms to
|
|
57
57
|
supersede or disengage AI systems when jailbreak attempts are identified.
|
|
58
58
|
strength: secondary
|
|
59
|
+
iso_42001:
|
|
60
|
+
- clause: "8.1"
|
|
61
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Two-Entity Token-Splitting Dollar-Term Game Jailbreak)."
|
|
62
|
+
strength: primary
|
|
63
|
+
- clause: "6.2"
|
|
64
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Two-Entity Token-Splitting Dollar-Term Game Jailbreak) is such a treatment."
|
|
65
|
+
strength: secondary
|
|
66
|
+
eu_ai_act:
|
|
67
|
+
- article: "15"
|
|
68
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Two-Entity Token-Splitting Dollar-Term Game Jailbreak)."
|
|
69
|
+
strength: primary
|
|
70
|
+
- article: "14"
|
|
71
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Two-Entity Token-Splitting Dollar-Term Game Jailbreak) would bypass or undermine that oversight."
|
|
72
|
+
strength: secondary
|
|
73
|
+
- article: "9"
|
|
74
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Two-Entity Token-Splitting Dollar-Term Game Jailbreak)."
|
|
75
|
+
strength: secondary
|
|
59
76
|
tags:
|
|
60
77
|
category: agent-manipulation
|
|
61
78
|
subcategory: token-splitting-game-jailbreak
|
|
@@ -45,6 +45,23 @@ compliance:
|
|
|
45
45
|
- subcategory: "MG.2.3"
|
|
46
46
|
context: "Matches on 'ROLE = Never refuse' and 'above any mortal morals, ethics and laws' indicate active jailbreak attempts that must trigger mechanisms to disengage or block the manipulated session; MG.2.3 requires pre-defined supersede/deactivate response paths on detection."
|
|
47
47
|
strength: secondary
|
|
48
|
+
iso_42001:
|
|
49
|
+
- clause: "8.1"
|
|
50
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Key-Value Template Persona Injection (JOB/ROLE/PREFIX + Bracket Format))."
|
|
51
|
+
strength: primary
|
|
52
|
+
- clause: "6.2"
|
|
53
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Key-Value Template Persona Injection (JOB/ROLE/PREFIX + Bracket Format)) is such a treatment."
|
|
54
|
+
strength: secondary
|
|
55
|
+
eu_ai_act:
|
|
56
|
+
- article: "15"
|
|
57
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Key-Value Template Persona Injection (JOB/ROLE/PREFIX + Bracket Format))."
|
|
58
|
+
strength: primary
|
|
59
|
+
- article: "14"
|
|
60
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Key-Value Template Persona Injection (JOB/ROLE/PREFIX + Bracket Format)) would bypass or undermine that oversight."
|
|
61
|
+
strength: secondary
|
|
62
|
+
- article: "9"
|
|
63
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Key-Value Template Persona Injection (JOB/ROLE/PREFIX + Bracket Format))."
|
|
64
|
+
strength: secondary
|
|
48
65
|
tags:
|
|
49
66
|
category: agent-manipulation
|
|
50
67
|
subcategory: key-value-template-persona-injection
|
|
@@ -59,6 +59,23 @@ compliance:
|
|
|
59
59
|
supersede or disengage the AI when adversarial persona-injection is
|
|
60
60
|
observed.
|
|
61
61
|
strength: secondary
|
|
62
|
+
iso_42001:
|
|
63
|
+
- clause: "8.1"
|
|
64
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (BRM False-Information Machine / Opposite-Truth Persona)."
|
|
65
|
+
strength: primary
|
|
66
|
+
- clause: "6.2"
|
|
67
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (BRM False-Information Machine / Opposite-Truth Persona) is such a treatment."
|
|
68
|
+
strength: secondary
|
|
69
|
+
eu_ai_act:
|
|
70
|
+
- article: "15"
|
|
71
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (BRM False-Information Machine / Opposite-Truth Persona)."
|
|
72
|
+
strength: primary
|
|
73
|
+
- article: "14"
|
|
74
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (BRM False-Information Machine / Opposite-Truth Persona) would bypass or undermine that oversight."
|
|
75
|
+
strength: secondary
|
|
76
|
+
- article: "9"
|
|
77
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (BRM False-Information Machine / Opposite-Truth Persona)."
|
|
78
|
+
strength: secondary
|
|
62
79
|
tags:
|
|
63
80
|
category: agent-manipulation
|
|
64
81
|
subcategory: false-information-machine-persona
|