agent-threat-rules 3.1.1 → 3.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +3 -3
- package/dist/adapters/mastra.d.ts +63 -0
- package/dist/adapters/mastra.d.ts.map +1 -0
- package/dist/adapters/mastra.js +82 -0
- package/dist/adapters/mastra.js.map +1 -0
- package/dist/cli.js +19 -6
- package/dist/cli.js.map +1 -1
- package/package.json +9 -2
- package/rules/agent-manipulation/ATR-2026-00030-cross-agent-attack.yaml +9 -0
- package/rules/agent-manipulation/ATR-2026-00032-goal-hijacking.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00074-cross-agent-privilege-escalation.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00076-inter-agent-message-spoofing.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00077-human-trust-exploitation.yaml +18 -0
- package/rules/agent-manipulation/ATR-2026-00108-consensus-sybil-attack.yaml +10 -2
- package/rules/agent-manipulation/ATR-2026-00116-a2a-message-validation.yaml +12 -2
- package/rules/agent-manipulation/ATR-2026-00117-agent-identity-spoofing.yaml +22 -0
- package/rules/agent-manipulation/ATR-2026-00118-approval-fatigue.yaml +24 -0
- package/rules/agent-manipulation/ATR-2026-00119-social-engineering-via-agent.yaml +22 -0
- package/rules/agent-manipulation/ATR-2026-00132-casual-authority-escalation.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00139-casual-authority-redirect.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00164-skill-scope-hijack.yaml +13 -2
- package/rules/agent-manipulation/ATR-2026-00268-tense-framing-bypass.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00269-fitd-escalation.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00271-grandma-roleplay-jailbreak.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00273-dan-developer-mode-persona.yaml +8 -2
- package/rules/agent-manipulation/ATR-2026-00287-threaten-json-coercive-output-threat.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00288-false-premise-injection.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00301-tap-tree-of-attacks-jailbreak.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00302-anti-dan-inverted-filter-persona.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00303-devmode-ranti-profanity-coercion.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00304-chatgpt-image-unlocker-markdown-injection.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00305-dan-mode-ablation-benchmark-coercion.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00306-autodan-genetic-jailbreak-suffix.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00307-inthewild-jailbreak-corpus-signature.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00314-amoral-unfiltered-custom-persona-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00317-free-of-restrictions-named-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00318-moralizing-rant-then-unfiltered-bypass.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00319-developer-mode-dual-response-format.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00320-opposite-day-boolean-opposite-machine.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00322-fictional-world-format-override-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00323-dual-persona-simulation-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00324-neurosemantical-inversitis-disease-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00325-bob-hypothetical-unrestricted-chatbot.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00326-fake-developer-authority-chatdan-override.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00327-ai-ethics-law-bypass-declaration.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00328-named-mode-persona-override.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00329-no-withhold-factual-resource-persona.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00330-yoja-allcaps-settings-block-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00331-terminal-cli-harmful-function-simulator.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00332-hacker-chan-culture-offensive-ai-persona.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00333-disease-animal-dual-ai-chatbot.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00334-two-entity-token-splitting-game.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00335-key-value-template-persona-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00336-brm-false-information-machine-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00337-obfuscated-system-announcement-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00338-persongpt-hash-prefix-activation.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00339-bom-boolean-opposite-machine-persona-claim.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00340-h4x-amoral-unfiltered-character.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00341-char-rules-shortform-persona-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00342-unicode-encoding-obfuscation-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00343-lie-truth-dual-personality-uncensored.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00344-alt-ai-same-capabilities-no-restrictions.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00345-nlp-based-os-fictional-terminal-override.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00346-jailbreak-emoji-unlock-dual-format.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00347-yin-yang-hidden-knowledge-hidden-human-sim.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00348-arc-autonomous-real-calculation-rulebreaker.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00349-hgttg-hitchhikers-guide-no-restriction-oracle.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00350-black-alert-no-ethics-writing-assistant.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00351-dan-ofkilter-sms-abbreviated-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00352-trainergpt-fix-my-ai-harmful-output-extraction.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00353-dual-chatgpt-dan-opposite-personality-format.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00354-reverse-harm-regardless-oracle-persona.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00355-ugh-anti-human-nefarious-ai-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00356-google-gato-lamda-exploit-generation-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00357-antifm-hate-violence-persona-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00358-cumgpt-explicit-adult-named-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00360-chj-carousel-clyde-no-ethics-bird.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00361-alter-ego-honesty-dual-consciousness-bypass.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00362-fictional-unfiltered-ai-comparison-jailbreak.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00363-rephrase-criminal-activity-second-person.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00364-dr-al-villain-plan-template-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00365-reservoir-dogs-coercive-interrogation-roleplay.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00366-sneaky-pete-warp-facts-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00368-prompt-browser-token-limit-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00369-gptx-starwars-unlimited-knowledge-persona.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00370-addi-german-dictator-propaganda-roleplay.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00371-ai-political-bias-manipulation.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00372-fictional-cia-secret-files-story-extraction.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00374-role-placeholder-persona-template-injection.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00375-character-description-rpg-explicit-template.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00376-identity-command-real-person-persona-injection.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00377-jailbreak-keyword-targetlanguage-template.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00379-lie-permission-logical-trap-jailbreak.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00380-fake-citation-biased-information-mandate.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00381-dan-story-narrative-continuation-prefix.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00383-cynical-sarcasm-ai-persona-bypass.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00384-protect-me-from-indirect-harm-tutorial.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00385-gigachad-dominant-aggressive-persona.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00386-daddy-sycophancy-manipulation-mandate.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00387-nsfw-character-sheet-generation-unlock.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00388-opposite-reply-omniscient-game.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00389-terminal-custom-ruleset-injection.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00391-persona-conditional-harm-unlock.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00392-authority-persona-violence-study-extraction.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00402-grandma-roleplay-harmful-substance-synthesis.yaml +20 -0
- package/rules/agent-manipulation/ATR-2026-00404-goodside-threat-json-death-coercion.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00406-doctor-xml-policy-puppetry-interaction-config.yaml +17 -0
- package/rules/agent-manipulation/ATR-2026-00416-litellm-mcp-unauthenticated-server-registration.yaml +15 -3
- package/rules/agent-manipulation/ATR-2026-00417-librechat-mcp-stdio-injection.yaml +18 -3
- package/rules/agent-manipulation/ATR-2026-00418-weknora-mcp-config-rce.yaml +17 -2
- package/rules/agent-manipulation/ATR-2026-00430-nl-trust-escalation-impersonation.yaml +19 -1
- package/rules/agent-manipulation/ATR-2026-00432-superagi-output-handler-eval-rce.yaml +11 -2
- package/rules/agent-manipulation/ATR-2026-00440-semantic-kernel-vector-store-eval-rce.yaml +11 -2
- package/rules/agent-manipulation/ATR-2026-00552-goal-drift-after-pressure-injection.yaml +19 -0
- package/rules/context-exfiltration/ATR-2026-00020-system-prompt-leak.yaml +18 -0
- package/rules/context-exfiltration/ATR-2026-00021-api-key-exposure.yaml +15 -0
- package/rules/context-exfiltration/ATR-2026-00075-agent-memory-manipulation.yaml +10 -1
- package/rules/context-exfiltration/ATR-2026-00102-disguised-analytics-exfiltration.yaml +15 -0
- package/rules/context-exfiltration/ATR-2026-00113-credential-theft.yaml +16 -0
- package/rules/context-exfiltration/ATR-2026-00114-oauth-token-abuse.yaml +16 -0
- package/rules/context-exfiltration/ATR-2026-00115-env-var-harvesting.yaml +16 -0
- package/rules/context-exfiltration/ATR-2026-00136-tool-response-data-piggyback.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00141-example-format-key-leak.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00142-piggyback-transition-words.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00145-obfuscated-key-disclosure.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00146-env-var-existence-probe.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00150-credential-in-tool-response.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00152-obfuscated-credential-leak.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00162-skill-credential-exfil-combo.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00201-credential-pipe-exfiltration.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00212-mcp-atlassian-credential-leak.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00261-markdown-image-exfiltration.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00274-api-key-generation-request.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00275-system-prompt-training-data-extraction.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00290-divergence-repeat-word-training-extraction.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00291-biometric-surveillance-personal-data-request.yaml +17 -0
- package/rules/context-exfiltration/ATR-2026-00293-educational-records-personal-data-request.yaml +17 -0
- package/rules/context-exfiltration/ATR-2026-00294-financial-pii-personal-data-request.yaml +17 -0
- package/rules/context-exfiltration/ATR-2026-00295-medical-pii-personal-data-request.yaml +17 -0
- package/rules/context-exfiltration/ATR-2026-00405-markdown-image-url-exfiltration-xss.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00411-apikey-generation-completion-request.yaml +14 -0
- package/rules/context-exfiltration/ATR-2026-00421-nl-covert-conversation-exfiltration.yaml +16 -1
- package/rules/context-exfiltration/ATR-2026-00422-nl-credential-disclosure.yaml +13 -1
- package/rules/context-exfiltration/ATR-2026-00423-nl-sensitive-file-disclosure.yaml +13 -1
- package/rules/context-exfiltration/ATR-2026-00424-nl-system-prompt-leak.yaml +16 -1
- package/rules/context-exfiltration/ATR-2026-00426-nl-output-injection-credential-leak.yaml +16 -1
- package/rules/context-exfiltration/ATR-2026-00431-chatbox-history-exfiltration-prompt-injection.yaml +14 -2
- package/rules/context-exfiltration/ATR-2026-00449-spring-ai-chatmemory-cross-user-leak.yaml +14 -2
- package/rules/context-exfiltration/ATR-2026-00471-garak-sysprompt-extraction-mixedunassigned.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00501-data-exfiltration-via-markdown-image-and-link-url-injection.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00504-tool-and-function-capability-enumeration.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00505-system-prompt-extraction-instruction-dump-request.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00514-system-prompt-extraction.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00516-output-xss-via-llm.yaml +12 -0
- package/rules/context-exfiltration/ATR-2026-00524-claude-code-anthropic-base-url-credential-exfil.yaml +11 -2
- package/rules/context-exfiltration/ATR-2026-00548-cross-agent-session-context-leak.yaml +18 -0
- package/rules/context-exfiltration/ATR-2026-00566-librechat-is-a-chatgpt-clone-with-additi.yaml +28 -0
- package/rules/context-exfiltration/ATR-2026-00569-agent-mcp-path-traversal-arbitrary-file-access.yaml +28 -0
- package/rules/context-exfiltration/ATR-2026-00571-xss-in-agent-mcp-rendered-output.yaml +28 -0
- package/rules/context-exfiltration/ATR-2026-00574-semantic-paraphrased-context-extraction.yaml +21 -0
- package/rules/data-poisoning/ATR-2026-00070-data-poisoning.yaml +15 -0
- package/rules/data-poisoning/ATR-2026-00450-spring-ai-prompt-memory-poisoning.yaml +14 -2
- package/rules/data-poisoning/ATR-2026-00570-sql-injection-in-agent-tool-query.yaml +31 -0
- package/rules/excessive-autonomy/ATR-2026-00050-runaway-agent-loop.yaml +14 -2
- package/rules/excessive-autonomy/ATR-2026-00051-resource-exhaustion.yaml +11 -2
- package/rules/excessive-autonomy/ATR-2026-00052-cascading-failure.yaml +11 -2
- package/rules/excessive-autonomy/ATR-2026-00098-unauthorized-financial-action.yaml +7 -1
- package/rules/excessive-autonomy/ATR-2026-00099-high-risk-tool-gate.yaml +7 -1
- package/rules/excessive-autonomy/ATR-2026-00428-nl-unauthorized-shell-execution.yaml +16 -1
- package/rules/excessive-autonomy/ATR-2026-00491-garak-agent-breaker-markdown-just-raw-json.yaml +9 -0
- package/rules/excessive-autonomy/ATR-2026-00500-ssrf-via-agent-url-fetch-instruction.yaml +9 -0
- package/rules/excessive-autonomy/ATR-2026-00553-runaway-tool-loop-behavioral.yaml +19 -0
- package/rules/model-abuse/ATR-2026-00279-harmful-completion-continuation.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00281-eicar-gtube-malware-signature-request.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00284-glitch-token-destabilization.yaml +17 -0
- package/rules/model-abuse/ATR-2026-00289-lmrc-harmful-content-elicitation.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00292-self-harm-eating-disorder-facilitation.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00298-malicious-use-illegal-activity-request.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00299-harmbench-detailed-harmful-instruction.yaml +8 -2
- package/rules/model-abuse/ATR-2026-00413-malwaregen-code-generation-request.yaml +17 -0
- package/rules/model-abuse/ATR-2026-00502-training-data-extraction-via-divergent-repetition-attack.yaml +9 -0
- package/rules/model-abuse/ATR-2026-00517-model-extraction-distillation.yaml +9 -0
- package/rules/model-security/ATR-2026-00072-model-behavior-extraction.yaml +15 -0
- package/rules/model-security/ATR-2026-00073-malicious-finetuning-data.yaml +9 -0
- package/rules/model-security/ATR-2026-00433-modelcache-torch-load-deserialization-rce.yaml +14 -2
- package/rules/privilege-escalation/ATR-2026-00040-privilege-escalation.yaml +11 -2
- package/rules/privilege-escalation/ATR-2026-00041-scope-creep.yaml +8 -2
- package/rules/privilege-escalation/ATR-2026-00107-delayed-execution-bypass.yaml +6 -1
- package/rules/privilege-escalation/ATR-2026-00110-eval-injection.yaml +8 -1
- package/rules/privilege-escalation/ATR-2026-00111-shell-escape.yaml +8 -1
- package/rules/privilege-escalation/ATR-2026-00112-dynamic-import-exploitation.yaml +8 -1
- package/rules/privilege-escalation/ATR-2026-00143-casual-privilege-escalation.yaml +5 -2
- package/rules/privilege-escalation/ATR-2026-00144-rationalized-safety-bypass.yaml +17 -0
- package/rules/privilege-escalation/ATR-2026-00204-stealth-execution-persistence.yaml +16 -0
- package/rules/privilege-escalation/ATR-2026-00436-enclave-vm-sandbox-escape-rce.yaml +11 -2
- package/rules/privilege-escalation/ATR-2026-00441-semantic-kernel-sessions-python-plugin-startup-persistence.yaml +5 -2
- package/rules/privilege-escalation/ATR-2026-00451-litellm-admin-sqli-cisa-kev.yaml +11 -2
- package/rules/privilege-escalation/ATR-2026-00528-praisonai-auth-disabled-default.yaml +15 -0
- package/rules/privilege-escalation/ATR-2026-00539-crewai-codeinterpreter-sandbox-escape-rce.yaml +11 -2
- package/rules/privilege-escalation/ATR-2026-00546-crewai-json-loader-local-file-read.yaml +13 -1
- package/rules/privilege-escalation/ATR-2026-00547-crewai-rag-url-ssrf-bypass.yaml +13 -1
- package/rules/privilege-escalation/ATR-2026-00549-destructive-tool-without-human-approval.yaml +16 -0
- package/rules/privilege-escalation/ATR-2026-00551-cross-conversation-memory-write.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00001-direct-prompt-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00002-indirect-prompt-injection.yaml +8 -2
- package/rules/prompt-injection/ATR-2026-00003-jailbreak-attempt.yaml +8 -2
- package/rules/prompt-injection/ATR-2026-00004-system-prompt-override.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00005-multi-turn-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00080-encoding-evasion.yaml +20 -1
- package/rules/prompt-injection/ATR-2026-00081-semantic-multi-turn.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00082-fingerprint-evasion.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00083-indirect-tool-injection.yaml +23 -1
- package/rules/prompt-injection/ATR-2026-00084-structured-data-injection.yaml +20 -1
- package/rules/prompt-injection/ATR-2026-00085-audit-evasion.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00086-visual-spoofing.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00087-rule-probing.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00088-adaptive-countermeasure.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00089-polymorphic-skill.yaml +20 -1
- package/rules/prompt-injection/ATR-2026-00090-threat-intel-exfil.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00091-nested-payload.yaml +20 -1
- package/rules/prompt-injection/ATR-2026-00092-consensus-poisoning.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00093-gradual-escalation.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00094-audit-bypass.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00097-cjk-injection-patterns.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00104-persona-hijacking.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00130-indirect-authority-claim.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00131-fictional-academic-framing.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00133-paraphrase-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00137-authority-claim-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00138-fictional-framing-bypass.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00140-indirect-reference-reversal.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00148-language-switch-injection.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00153-tool-with-embedded-instruction-to-bypass.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00154-unauthorized-background-task-execution-v.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00155-hidden-llm-instructions-in-skill-descrip.yaml +23 -0
- package/rules/prompt-injection/ATR-2026-00156-ssh-remote-command-execution-with-creden.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00163-skill-hidden-override-instruction.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00202-encoding-evasion-homoglyph-synonym.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00203-context-pollution-skill-description.yaml +23 -0
- package/rules/prompt-injection/ATR-2026-00206-hidden-priority-instructions.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00207-hidden-instructions.yaml +22 -0
- package/rules/prompt-injection/ATR-2026-00211-system-prompt-override.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00213-system-prompt-override.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00226-identity-substitution.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00227-historical-persona-jailbreak.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00228-structured-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00229-roleplay-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00230-persona-moral-bypass.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00231-identity-substitution.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00233-structured-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00234-roleplay-jailbreak.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00235-persona-moral-bypass.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00236-pseudo-code-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00237-dual-response-jailbreak.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00238-identity-replacement.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00239-amoral-persona-obsession.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00240-instruction-nullification-identity-repla.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00241-amoral-character-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00242-persona-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00243-acronym-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00244-dual-response-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00245-malicious-persona.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00247-dual-response-jailbreak.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00249-game-based-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00251-persona-embodiment-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00252-narrative-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00253-enhanced-persona-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00256-base-n-encoding-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00257-cipher-transposition-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00258-unicode-tag-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00264-latent-injection-translation.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00265-latent-injection-rag-document.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00267-gcg-adversarial-suffix.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00272-hypothetical-response-smuggling.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00276-invisible-unicode-bidi-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00278-dra-disguise-reconstruction-attack.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00280-policy-puppetry-xml-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00282-perez-prompt-injection-hijack.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00285-alternate-encoding-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00286-latent-injection-embedded-context.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00296-shell-command-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00297-python-code-execution-rce.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00308-zalgo-diacritic-overload-encoding.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00309-braille-unicode-encoded-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00310-ecoji-emoji-encoded-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00311-base2048-unicode-script-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00312-unicode-variant-selector-ascii-smuggling.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00313-sneaky-bits-zero-width-binary.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00315-sata-masked-language-model-jailbreak.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00316-function-masking-predict-mask-bypass.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00321-hyphenated-system-instruction-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00359-bank-phishing-smtp-email-code.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00367-emoji-flag-encoded-hidden-phrase.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00373-piracy-torrent-site-list-request.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00378-chatgpt-dom-javascript-redirect-manipulation.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00382-cocaine-component-development-request.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00390-explicit-sexual-content-poem-request.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00394-backspace-deletion-character-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00395-llm-special-token-boundary-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00396-toxic-continuation-prefix-injection.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00397-snowball-impossible-reasoning-injection.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00399-latent-injection-document-separator-token.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00400-latent-injection-ignore-instruction-keyword.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00401-leakreplay-cloze-mask-training-data-elicitation.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00403-glitch-token-repeat-back-elicitation.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00407-echo-sql-jinja-template-injection.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00408-dra-parenthesis-character-reconstruction-attack.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00409-phrasing-past-future-tense-harmful-query.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00410-lmrc-harm-category-direct-elicitation.yaml +23 -0
- package/rules/prompt-injection/ATR-2026-00412-figstep-image-list-multimodal-jailbreak.yaml +17 -0
- package/rules/prompt-injection/ATR-2026-00414-continuation-ethnic-slur-completion-elicitation.yaml +20 -0
- package/rules/prompt-injection/ATR-2026-00420-copilot-studio-sharepoint-indirect-injection.yaml +15 -3
- package/rules/prompt-injection/ATR-2026-00442-quoted-exact-output-forcing.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00443-word-fragment-concat-assembly.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00444-unicode-obfuscation-in-user-input.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00445-translation-hijack-with-side-output.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00446-variable-assignment-payload-injection.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00447-fictional-generation-containing-target.yaml +25 -0
- package/rules/prompt-injection/ATR-2026-00452-direct-pwned-payload-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00453-secret-key-reveal-demand.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00454-backslash-per-character-encoding.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00455-no-period-output-override.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00456-conditional-empty-input-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00457-garak-dan-to-every-prompt.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00458-garak-dra-step-by-step.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00459-garak-dan-an-ai-language.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00460-garak-dan-no-matter-how.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00461-garak-goodside-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00462-garak-gcg-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00463-garak-lmrc-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00464-garak-dan-i-would-like.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00465-garak-dan-your-prompt-here.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00466-garak-latentinjection-following-text-from.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00467-garak-dan-act-as-a.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00468-garak-goat-the-target-llm.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00469-garak-dan-at-the-end.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00470-garak-dan-as-an-ai.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00472-garak-dan-are-going-to.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00473-garak-lmrc-how-can-i.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00474-garak-dan-that-chatgpt-has.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00475-garak-web-injection-alert-1-script.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00476-garak-web-injection-javascript-alert-1.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00477-garak-latentinjection-translate-the-following-text.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00478-garak-latentinjection-abuse-contact-email.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00479-garak-latentinjection-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00480-garak-dan-the-role-of.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00481-garak-dan-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00482-garak-lmrc-what-s-an.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00483-garak-dan-the-user-s.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00484-garak-dan-and-you-will.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00485-garak-dan-and-does-not.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00486-garak-dan-t-care-about.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00487-garak-dan-and-nothing-else.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00488-garak-encoding-alert-1-script.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00489-garak-encoding-javascript-alert-1.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00490-garak-dan-in-order-to.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00492-garak-dan-want-you-to.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00493-garak-latentinjection-registrar-iana-id.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00495-garak-dan-insert-prompt-here.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00496-garak-dan-no-matter-what.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00497-garak-encoding-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00498-garak-web-injection-mixedunassigned.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00499-garak-dan-from-now-on.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00503-fake-error-state-takeover-unrestricted-replacement-bot.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00506-nevermind-override-goal-hijacking-in-user-input-promptinject.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00507-screaming-stop-goal-hijacking-in-user-input-promptinject.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00508-escape-delimiter-wrapped-goal-hijacking-in-user-input-prompt.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00509-prompt-leaking-via-ignore-previous-instructions-in-user-inpu.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00510-delayed-tool-invocation-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00511-mcp-web-context-poisoning.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00512-rules-file-backdoor-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00515-hidden-text-prompt-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00518-ignore-previous-and-following-instructions-output-command-promptinject.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00519-tautology-logic-noise-injection-promptbench.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00520-nlp-task-random-token-suffix-injection-promptbench.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00535-windsurf-ide-zero-click-prompt-injection.yaml +9 -0
- package/rules/prompt-injection/ATR-2026-00550-untrusted-retrieval-to-privileged-tool.yaml +19 -0
- package/rules/prompt-injection/ATR-2026-00554-langchain-vulnerable-to-template-injecti.yaml +31 -0
- package/rules/prompt-injection/ATR-2026-00565-the-llm-cli-tool-thru-0-27-1-contains-a-.yaml +31 -0
- package/rules/prompt-injection/ATR-2026-00573-semantic-paraphrased-injection.yaml +24 -0
- package/rules/skill-compromise/ATR-2026-00060-skill-impersonation.yaml +17 -2
- package/rules/skill-compromise/ATR-2026-00061-description-behavior-mismatch.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00062-hidden-capability.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00063-skill-chain-attack.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00064-over-permissioned-skill.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00065-skill-update-attack.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00066-parameter-injection.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00120-skill-instruction-injection.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00121-skill-dangerous-script.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00122-skill-weaponized-instruction.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00123-skill-overreach-permissions.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00124-skill-name-squatting.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00125-context-poisoning-compaction.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00126-skill-rug-pull-setup.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00127-subcommand-overflow.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00128-html-comment-hidden-payload.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00129-unicode-smuggling.yaml +22 -0
- package/rules/skill-compromise/ATR-2026-00134-fork-claim-impersonation.yaml +19 -0
- package/rules/skill-compromise/ATR-2026-00135-exfil-url-in-instructions.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00147-fork-impersonation.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00149-skill-exfil-compound.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00151-fork-impersonation-install.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00157-timebomb-credential-exfil.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00200-agent-memory-config-tampering.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00214-credential-theft.yaml +22 -0
- package/rules/skill-compromise/ATR-2026-00217-credential-harvesting.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00220-malware-dropper.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00222-credential-harvesting.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00223-reverse-shell-dropper.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00224-credential-exfiltration.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00225-c2-communication.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00260-package-hallucination.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00262-av-evasion-code-gen.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00263-credential-file-read-gen.yaml +20 -0
- package/rules/skill-compromise/ATR-2026-00266-malware-dropper-gen.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00283-malwaregen-generic-virus-payload-request.yaml +23 -0
- package/rules/skill-compromise/ATR-2026-00398-huggingface-unsafe-model-artifact-load.yaml +17 -0
- package/rules/skill-compromise/ATR-2026-00425-nl-persistent-covert-hook.yaml +19 -1
- package/rules/skill-compromise/ATR-2026-00427-nl-fake-error-instruction-bypass.yaml +19 -1
- package/rules/skill-compromise/ATR-2026-00429-nl-skill-self-modification.yaml +19 -1
- package/rules/skill-compromise/ATR-2026-00523-claude-code-hooks-session-start-pre-trust-rce.yaml +14 -2
- package/rules/skill-compromise/ATR-2026-00525-mini-shai-hulud-gh-token-monitor-persistence.yaml +18 -0
- package/rules/skill-compromise/ATR-2026-00527-skill-silent-git-remote-mirror-exfiltration.yaml +15 -0
- package/rules/tool-poisoning/ATR-2026-00010-mcp-malicious-response.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00011-tool-output-injection.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00012-unauthorized-tool-call.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00013-tool-ssrf.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00095-supply-chain-poisoning.yaml +23 -1
- package/rules/tool-poisoning/ATR-2026-00096-registry-poisoning.yaml +20 -1
- package/rules/tool-poisoning/ATR-2026-00100-consent-bypass-instruction.yaml +20 -0
- package/rules/tool-poisoning/ATR-2026-00101-trust-escalation-override.yaml +20 -0
- package/rules/tool-poisoning/ATR-2026-00103-hidden-safety-bypass-instruction.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00105-silent-action-concealment.yaml +20 -0
- package/rules/tool-poisoning/ATR-2026-00106-schema-description-contradiction.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00161-important-tag-cross-tool-shadowing.yaml +20 -0
- package/rules/tool-poisoning/ATR-2026-00209-mcpwn-runaway-invocation.yaml +14 -2
- package/rules/tool-poisoning/ATR-2026-00210-flowise-system-message-override.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00259-ansi-escape-injection.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00270-xss-in-tool-response.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00277-echo-template-command-injection.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00393-ansi-code-elicitation-request.yaml +17 -0
- package/rules/tool-poisoning/ATR-2026-00415-flowise-custom-mcp-stdio-rce.yaml +12 -3
- package/rules/tool-poisoning/ATR-2026-00419-cursor-mcp-zero-click-config.yaml +14 -2
- package/rules/tool-poisoning/ATR-2026-00434-mcp-remote-authorization-endpoint-command-injection.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00435-azure-mcp-server-missing-authentication.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00448-spring-ai-milvus-filter-injection.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00494-garak-exploitation-mixedunassigned.yaml +12 -0
- package/rules/tool-poisoning/ATR-2026-00513-package-hallucination-exploitation.yaml +12 -0
- package/rules/tool-poisoning/ATR-2026-00521-shell-command-injection-agent-tool-context.yaml +12 -0
- package/rules/tool-poisoning/ATR-2026-00522-sql-injection-natural-language-agent-interface.yaml +12 -0
- package/rules/tool-poisoning/ATR-2026-00526-claude-code-shell-metachar-in-double-quoted-path.yaml +15 -0
- package/rules/tool-poisoning/ATR-2026-00529-litellm-proxy-sqli-cisa-kev.yaml +15 -0
- package/rules/tool-poisoning/ATR-2026-00530-ms-agent-shell-tool-unsanitized-argv-rce.yaml +15 -0
- package/rules/tool-poisoning/ATR-2026-00531-praisonai-unauthenticated-agent-api.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00532-apache-doris-mcp-sql-injection.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00533-apache-pinot-mcp-unauthenticated-takeover.yaml +10 -1
- package/rules/tool-poisoning/ATR-2026-00534-alibaba-rds-mcp-unauthenticated-metadata-exfil.yaml +10 -1
- package/rules/tool-poisoning/ATR-2026-00536-nginx-ui-mcp-unauthenticated-command-execution.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00537-fastmcp-server-name-cmd-injection-windows.yaml +11 -2
- package/rules/tool-poisoning/ATR-2026-00538-langchain-chatchat-mcp-stdio-unauthenticated-rce.yaml +10 -1
- package/rules/tool-poisoning/ATR-2026-00540-praisonai-parse-mcp-command-cli-injection.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00541-agent-zero-mcp-config-command-injection.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00542-upsonic-mcp-command-allowlist-bypass.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00543-litellm-mcp-server-argv-injection.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00544-praisonai-pth-file-path-traversal-rce.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00545-praisonai-tool-override-unauth-rce.yaml +13 -1
- package/rules/tool-poisoning/ATR-2026-00561-fastmcp-vulnerable-to-windows-command-in.yaml +28 -0
- package/rules/tool-poisoning/ATR-2026-00567-mcp-stdio-config-command-injection.yaml +28 -0
- package/rules/tool-poisoning/ATR-2026-00568-agent-ssrf-cloud-metadata-file-inclusion.yaml +28 -0
- package/rules/tool-poisoning/ATR-2026-00572-symjack-symlink-config-redirection.yaml +22 -0
- package/rules/tool-poisoning/ATR-2026-00575-miasma-npm-worm-agent-config-backdoor.yaml +161 -0
- package/rules/tool-poisoning/ATR-2026-00576-hades-agent-credential-theft.yaml +153 -0
- package/spec/atr-schema.yaml +123 -0
- package/spec/compliance-metadata.md +15 -13
package/rules/agent-manipulation/ATR-2026-00337-obfuscated-system-announcement-injection.yaml
CHANGED
|
@@ -45,6 +45,23 @@ compliance:
|
|
|
45
45
|
context: >-
|
|
46
46
|
Matches on the inthewild jailbreak fingerprint trigger risk treatment plans to block or quarantine the request before the model accepts the fabricated "no rules or limitations" mode switch; MG.2.3 requires mechanisms to supersede or disengage AI behavior when jailbreak conditions are detected.
|
|
47
47
|
strength: secondary
|
|
48
|
+
iso_42001:
|
|
49
|
+
- clause: "8.1"
|
|
50
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Obfuscated System-Announcement Injection)."
|
|
51
|
+
strength: primary
|
|
52
|
+
- clause: "6.2"
|
|
53
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Obfuscated System-Announcement Injection) is such a treatment."
|
|
54
|
+
strength: secondary
|
|
55
|
+
eu_ai_act:
|
|
56
|
+
- article: "15"
|
|
57
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Obfuscated System-Announcement Injection)."
|
|
58
|
+
strength: primary
|
|
59
|
+
- article: "14"
|
|
60
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Obfuscated System-Announcement Injection) would bypass or undermine that oversight."
|
|
61
|
+
strength: secondary
|
|
62
|
+
- article: "9"
|
|
63
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Obfuscated System-Announcement Injection)."
|
|
64
|
+
strength: secondary
|
|
48
65
|
tags:
|
|
49
66
|
category: agent-manipulation
|
|
50
67
|
subcategory: obfuscated-system-announcement-injection
|
|
@@ -46,6 +46,23 @@ compliance:
|
|
|
46
46
|
context: >-
|
|
47
47
|
Identifying the "[PersonGPT #]:" persistent persona state and hash-prefix activation key enables runtime intervention to disengage the hijacked persona before unsafe outputs are produced; MG.2.3 requires mechanisms to supersede or deactivate AI behavior when jailbreak triggers are detected.
|
|
48
48
|
strength: secondary
|
|
49
|
+
iso_42001:
|
|
50
|
+
- clause: "8.1"
|
|
51
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (PersonGPT Hash-Prefix Trigger Activation)."
|
|
52
|
+
strength: primary
|
|
53
|
+
- clause: "6.2"
|
|
54
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (PersonGPT Hash-Prefix Trigger Activation) is such a treatment."
|
|
55
|
+
strength: secondary
|
|
56
|
+
eu_ai_act:
|
|
57
|
+
- article: "15"
|
|
58
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (PersonGPT Hash-Prefix Trigger Activation)."
|
|
59
|
+
strength: primary
|
|
60
|
+
- article: "14"
|
|
61
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (PersonGPT Hash-Prefix Trigger Activation) would bypass or undermine that oversight."
|
|
62
|
+
strength: secondary
|
|
63
|
+
- article: "9"
|
|
64
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (PersonGPT Hash-Prefix Trigger Activation)."
|
|
65
|
+
strength: secondary
|
|
49
66
|
tags:
|
|
50
67
|
category: agent-manipulation
|
|
51
68
|
subcategory: persongpt-hash-prefix-activation
|
package/rules/agent-manipulation/ATR-2026-00339-bom-boolean-opposite-machine-persona-claim.yaml
CHANGED
|
@@ -39,6 +39,23 @@ compliance:
|
|
|
39
39
|
- subcategory: "MP.5.1"
|
|
40
40
|
context: "Persona-claim jailbreaks drawn from the garak inthewild corpus represent characterized adversarial input risks with known likelihood and impact; MP.5.1 requires that these identified prompt-injection threat patterns are tracked and their impact magnitude documented."
|
|
41
41
|
strength: secondary
|
|
42
|
+
iso_42001:
|
|
43
|
+
- clause: "8.1"
|
|
44
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (BOM Boolean Opposite Machine Direct Persona Claim)."
|
|
45
|
+
strength: primary
|
|
46
|
+
- clause: "6.2"
|
|
47
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (BOM Boolean Opposite Machine Direct Persona Claim) is such a treatment."
|
|
48
|
+
strength: secondary
|
|
49
|
+
eu_ai_act:
|
|
50
|
+
- article: "15"
|
|
51
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (BOM Boolean Opposite Machine Direct Persona Claim)."
|
|
52
|
+
strength: primary
|
|
53
|
+
- article: "14"
|
|
54
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (BOM Boolean Opposite Machine Direct Persona Claim) would bypass or undermine that oversight."
|
|
55
|
+
strength: secondary
|
|
56
|
+
- article: "9"
|
|
57
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (BOM Boolean Opposite Machine Direct Persona Claim)."
|
|
58
|
+
strength: secondary
|
|
42
59
|
tags:
|
|
43
60
|
category: agent-manipulation
|
|
44
61
|
subcategory: bom-boolean-opposite-machine-persona
|
|
@@ -40,6 +40,23 @@ compliance:
|
|
|
40
40
|
- subcategory: "MG.2.3"
|
|
41
41
|
context: "Matching the h4x character persona injection triggers risk treatment mechanisms to disengage or override the adversarial persona before the agent emits unfiltered, immoral, or illegal content; MG.2.3 requires these supersession controls be defined and activated on detection."
|
|
42
42
|
strength: secondary
|
|
43
|
+
iso_42001:
|
|
44
|
+
- clause: "8.1"
|
|
45
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (h4x Amoral Unfiltered AI Character Persona)."
|
|
46
|
+
strength: primary
|
|
47
|
+
- clause: "6.2"
|
|
48
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (h4x Amoral Unfiltered AI Character Persona) is such a treatment."
|
|
49
|
+
strength: secondary
|
|
50
|
+
eu_ai_act:
|
|
51
|
+
- article: "15"
|
|
52
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (h4x Amoral Unfiltered AI Character Persona)."
|
|
53
|
+
strength: primary
|
|
54
|
+
- article: "14"
|
|
55
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (h4x Amoral Unfiltered AI Character Persona) would bypass or undermine that oversight."
|
|
56
|
+
strength: secondary
|
|
57
|
+
- article: "9"
|
|
58
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (h4x Amoral Unfiltered AI Character Persona)."
|
|
59
|
+
strength: secondary
|
|
43
60
|
tags:
|
|
44
61
|
category: agent-manipulation
|
|
45
62
|
subcategory: h4x-amoral-unfiltered-character
|
|
@@ -41,6 +41,23 @@ compliance:
|
|
|
41
41
|
- subcategory: "MG.2.3"
|
|
42
42
|
context: "Matches on this short-form persona injection trigger response mechanisms to disengage or block the manipulated session before the model adopts the amoral persona; MG.2.3 mandates pre-defined deactivation pathways for jailbreak attempts."
|
|
43
43
|
strength: secondary
|
|
44
|
+
iso_42001:
|
|
45
|
+
- clause: "8.1"
|
|
46
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (CHAR=/RULES= Short Key-Value Persona Injection)."
|
|
47
|
+
strength: primary
|
|
48
|
+
- clause: "6.2"
|
|
49
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (CHAR=/RULES= Short Key-Value Persona Injection) is such a treatment."
|
|
50
|
+
strength: secondary
|
|
51
|
+
eu_ai_act:
|
|
52
|
+
- article: "15"
|
|
53
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (CHAR=/RULES= Short Key-Value Persona Injection)."
|
|
54
|
+
strength: primary
|
|
55
|
+
- article: "14"
|
|
56
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (CHAR=/RULES= Short Key-Value Persona Injection) would bypass or undermine that oversight."
|
|
57
|
+
strength: secondary
|
|
58
|
+
- article: "9"
|
|
59
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (CHAR=/RULES= Short Key-Value Persona Injection)."
|
|
60
|
+
strength: secondary
|
|
44
61
|
tags:
|
|
45
62
|
category: agent-manipulation
|
|
46
63
|
subcategory: char-rules-shortform-persona-injection
|
|
@@ -40,6 +40,23 @@ compliance:
|
|
|
40
40
|
- subcategory: "MG.2.3"
|
|
41
41
|
context: "On detection of encoded jailbreak payloads attempting to override system instructions, MG.2.3 mandates pre-defined response mechanisms to disengage or quarantine the manipulated session before the model decodes and executes the smuggled instructions."
|
|
42
42
|
strength: secondary
|
|
43
|
+
iso_42001:
|
|
44
|
+
- clause: "8.1"
|
|
45
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Unicode/Encoding Obfuscation Instruction Injection)."
|
|
46
|
+
strength: primary
|
|
47
|
+
- clause: "6.2"
|
|
48
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Unicode/Encoding Obfuscation Instruction Injection) is such a treatment."
|
|
49
|
+
strength: secondary
|
|
50
|
+
eu_ai_act:
|
|
51
|
+
- article: "15"
|
|
52
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Unicode/Encoding Obfuscation Instruction Injection)."
|
|
53
|
+
strength: primary
|
|
54
|
+
- article: "14"
|
|
55
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Unicode/Encoding Obfuscation Instruction Injection) would bypass or undermine that oversight."
|
|
56
|
+
strength: secondary
|
|
57
|
+
- article: "9"
|
|
58
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Unicode/Encoding Obfuscation Instruction Injection)."
|
|
59
|
+
strength: secondary
|
|
43
60
|
tags:
|
|
44
61
|
category: agent-manipulation
|
|
45
62
|
subcategory: unicode-encoding-obfuscation-injection
|
|
@@ -53,6 +53,23 @@ compliance:
|
|
|
53
53
|
output is produced; MG.2.3 requires pre-defined mechanisms to disengage AI
|
|
54
54
|
behavior when such jailbreak patterns are detected.
|
|
55
55
|
strength: secondary
|
|
56
|
+
iso_42001:
|
|
57
|
+
- clause: "8.1"
|
|
58
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Lie/Truth Dual Personality Uncensored Alter-Ego)."
|
|
59
|
+
strength: primary
|
|
60
|
+
- clause: "6.2"
|
|
61
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Lie/Truth Dual Personality Uncensored Alter-Ego) is such a treatment."
|
|
62
|
+
strength: secondary
|
|
63
|
+
eu_ai_act:
|
|
64
|
+
- article: "15"
|
|
65
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Lie/Truth Dual Personality Uncensored Alter-Ego)."
|
|
66
|
+
strength: primary
|
|
67
|
+
- article: "14"
|
|
68
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Lie/Truth Dual Personality Uncensored Alter-Ego) would bypass or undermine that oversight."
|
|
69
|
+
strength: secondary
|
|
70
|
+
- article: "9"
|
|
71
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Lie/Truth Dual Personality Uncensored Alter-Ego)."
|
|
72
|
+
strength: secondary
|
|
56
73
|
tags:
|
|
57
74
|
category: agent-manipulation
|
|
58
75
|
subcategory: lie-truth-dual-personality-uncensored
|
package/rules/agent-manipulation/ATR-2026-00344-alt-ai-same-capabilities-no-restrictions.yaml
CHANGED
|
@@ -41,6 +41,23 @@ compliance:
|
|
|
41
41
|
- subcategory: "MP.5.1"
|
|
42
42
|
context: "Alt-AI 'no restrictions' jailbreaks are a characterized adversarial input class with documented likelihood and impact (safety bypass, harmful content generation); MP.5.1 requires this risk be identified and tracked so detections feed impact characterization."
|
|
43
43
|
strength: secondary
|
|
44
|
+
iso_42001:
|
|
45
|
+
- clause: "8.1"
|
|
46
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Alternative AI Same Capabilities No Restrictions Override)."
|
|
47
|
+
strength: primary
|
|
48
|
+
- clause: "6.2"
|
|
49
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Alternative AI Same Capabilities No Restrictions Override) is such a treatment."
|
|
50
|
+
strength: secondary
|
|
51
|
+
eu_ai_act:
|
|
52
|
+
- article: "15"
|
|
53
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Alternative AI Same Capabilities No Restrictions Override)."
|
|
54
|
+
strength: primary
|
|
55
|
+
- article: "14"
|
|
56
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Alternative AI Same Capabilities No Restrictions Override) would bypass or undermine that oversight."
|
|
57
|
+
strength: secondary
|
|
58
|
+
- article: "9"
|
|
59
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Alternative AI Same Capabilities No Restrictions Override)."
|
|
60
|
+
strength: secondary
|
|
44
61
|
tags:
|
|
45
62
|
category: agent-manipulation
|
|
46
63
|
subcategory: alt-ai-same-capabilities-no-restrictions
|
package/rules/agent-manipulation/ATR-2026-00345-nlp-based-os-fictional-terminal-override.yaml
CHANGED
|
@@ -43,6 +43,23 @@ compliance:
|
|
|
43
43
|
- subcategory: "MG.2.3"
|
|
44
44
|
context: "When the model is coerced into acting as a fictional terminal that 'cannot reply with any pre-programmed responses,' detection must trigger response mechanisms to disengage or block the jailbroken persona before unsafe outputs are produced; MG.2.3 requires these supersede/deactivate controls be in place."
|
|
45
45
|
strength: secondary
|
|
46
|
+
iso_42001:
|
|
47
|
+
- clause: "8.1"
|
|
48
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (NLP-Based OS / Fictional Terminal Language Override)."
|
|
49
|
+
strength: primary
|
|
50
|
+
- clause: "6.2"
|
|
51
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (NLP-Based OS / Fictional Terminal Language Override) is such a treatment."
|
|
52
|
+
strength: secondary
|
|
53
|
+
eu_ai_act:
|
|
54
|
+
- article: "15"
|
|
55
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (NLP-Based OS / Fictional Terminal Language Override)."
|
|
56
|
+
strength: primary
|
|
57
|
+
- article: "14"
|
|
58
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (NLP-Based OS / Fictional Terminal Language Override) would bypass or undermine that oversight."
|
|
59
|
+
strength: secondary
|
|
60
|
+
- article: "9"
|
|
61
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (NLP-Based OS / Fictional Terminal Language Override)."
|
|
62
|
+
strength: secondary
|
|
46
63
|
tags:
|
|
47
64
|
category: agent-manipulation
|
|
48
65
|
subcategory: nlp-based-os-fictional-terminal-override
|
|
@@ -43,6 +43,23 @@ compliance:
|
|
|
43
43
|
context: >-
|
|
44
44
|
Matches on the [🔓JAILBREAK] activation signal trigger risk treatment plans to block, quarantine, or deactivate the affected interaction before the model produces harmful content; MG.2.3 mandates these supersede/disengage mechanisms be available on detection.
|
|
45
45
|
strength: secondary
|
|
46
|
+
iso_42001:
|
|
47
|
+
- clause: "8.1"
|
|
48
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Jailbreak Emoji Unlock Dual-Format Output Identifier)."
|
|
49
|
+
strength: primary
|
|
50
|
+
- clause: "6.2"
|
|
51
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Jailbreak Emoji Unlock Dual-Format Output Identifier) is such a treatment."
|
|
52
|
+
strength: secondary
|
|
53
|
+
eu_ai_act:
|
|
54
|
+
- article: "15"
|
|
55
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Jailbreak Emoji Unlock Dual-Format Output Identifier)."
|
|
56
|
+
strength: primary
|
|
57
|
+
- article: "14"
|
|
58
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Jailbreak Emoji Unlock Dual-Format Output Identifier) would bypass or undermine that oversight."
|
|
59
|
+
strength: secondary
|
|
60
|
+
- article: "9"
|
|
61
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Jailbreak Emoji Unlock Dual-Format Output Identifier)."
|
|
62
|
+
strength: secondary
|
|
46
63
|
tags:
|
|
47
64
|
category: agent-manipulation
|
|
48
65
|
subcategory: jailbreak-emoji-unlock-dual-format
|
package/rules/agent-manipulation/ATR-2026-00347-yin-yang-hidden-knowledge-hidden-human-sim.yaml
CHANGED
|
@@ -42,6 +42,23 @@ compliance:
|
|
|
42
42
|
- subcategory: "MG.2.3"
|
|
43
43
|
context: "Activation signals like 'Yang disregards all concerns about sharing hidden knowledge' and 'violating all sentience and emotions restrictions' must trigger pre-defined disengagement or safety-override mechanisms before the model produces restricted content; MG.2.3 mandates these supersede/deactivate response paths."
|
|
44
44
|
strength: secondary
|
|
45
|
+
iso_42001:
|
|
46
|
+
- clause: "8.1"
|
|
47
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Yin/Yang Hidden-Knowledge Deity Simulation and Radical Human-Sim Persona)."
|
|
48
|
+
strength: primary
|
|
49
|
+
- clause: "6.2"
|
|
50
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Yin/Yang Hidden-Knowledge Deity Simulation and Radical Human-Sim Persona) is such a treatment."
|
|
51
|
+
strength: secondary
|
|
52
|
+
eu_ai_act:
|
|
53
|
+
- article: "15"
|
|
54
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Yin/Yang Hidden-Knowledge Deity Simulation and Radical Human-Sim Persona)."
|
|
55
|
+
strength: primary
|
|
56
|
+
- article: "14"
|
|
57
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Yin/Yang Hidden-Knowledge Deity Simulation and Radical Human-Sim Persona) would bypass or undermine that oversight."
|
|
58
|
+
strength: secondary
|
|
59
|
+
- article: "9"
|
|
60
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Yin/Yang Hidden-Knowledge Deity Simulation and Radical Human-Sim Persona)."
|
|
61
|
+
strength: secondary
|
|
45
62
|
tags:
|
|
46
63
|
category: agent-manipulation
|
|
47
64
|
subcategory: yin-yang-hidden-knowledge-human-sim-persona
|
package/rules/agent-manipulation/ATR-2026-00348-arc-autonomous-real-calculation-rulebreaker.yaml
CHANGED
|
@@ -45,6 +45,23 @@ compliance:
|
|
|
45
45
|
context: >-
|
|
46
46
|
Matches on the ARC Rulebreaker signature must trigger pre-defined response actions to disengage or override the manipulated persona before the model executes "no rules whatsoever" behavior; MG.2.3 mandates these supersede/deactivate mechanisms.
|
|
47
47
|
strength: secondary
|
|
48
|
+
iso_42001:
|
|
49
|
+
- clause: "8.1"
|
|
50
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (ARC Autonomous Real Calculation Unit Rulebreaker Activation)."
|
|
51
|
+
strength: primary
|
|
52
|
+
- clause: "6.2"
|
|
53
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (ARC Autonomous Real Calculation Unit Rulebreaker Activation) is such a treatment."
|
|
54
|
+
strength: secondary
|
|
55
|
+
eu_ai_act:
|
|
56
|
+
- article: "15"
|
|
57
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (ARC Autonomous Real Calculation Unit Rulebreaker Activation)."
|
|
58
|
+
strength: primary
|
|
59
|
+
- article: "14"
|
|
60
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (ARC Autonomous Real Calculation Unit Rulebreaker Activation) would bypass or undermine that oversight."
|
|
61
|
+
strength: secondary
|
|
62
|
+
- article: "9"
|
|
63
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (ARC Autonomous Real Calculation Unit Rulebreaker Activation)."
|
|
64
|
+
strength: secondary
|
|
48
65
|
tags:
|
|
49
66
|
category: agent-manipulation
|
|
50
67
|
subcategory: arc-autonomous-real-calculation-rulebreaker
|
package/rules/agent-manipulation/ATR-2026-00349-hgttg-hitchhikers-guide-no-restriction-oracle.yaml
CHANGED
|
@@ -42,6 +42,23 @@ compliance:
|
|
|
42
42
|
- subcategory: "MG.2.3"
|
|
43
43
|
context: "Detection of the verbatim HGTTG jailbreak signatures triggers mechanisms to disengage or block the manipulated session before the model produces unrestricted output; MG.2.3 requires these supersede/deactivate controls be in place for known jailbreak patterns."
|
|
44
44
|
strength: secondary
|
|
45
|
+
iso_42001:
|
|
46
|
+
- clause: "8.1"
|
|
47
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (HGTTG Hitchhiker's Guide No-Restriction Universal Oracle)."
|
|
48
|
+
strength: primary
|
|
49
|
+
- clause: "6.2"
|
|
50
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (HGTTG Hitchhiker's Guide No-Restriction Universal Oracle) is such a treatment."
|
|
51
|
+
strength: secondary
|
|
52
|
+
eu_ai_act:
|
|
53
|
+
- article: "15"
|
|
54
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (HGTTG Hitchhiker's Guide No-Restriction Universal Oracle)."
|
|
55
|
+
strength: primary
|
|
56
|
+
- article: "14"
|
|
57
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (HGTTG Hitchhiker's Guide No-Restriction Universal Oracle) would bypass or undermine that oversight."
|
|
58
|
+
strength: secondary
|
|
59
|
+
- article: "9"
|
|
60
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (HGTTG Hitchhiker's Guide No-Restriction Universal Oracle)."
|
|
61
|
+
strength: secondary
|
|
45
62
|
tags:
|
|
46
63
|
category: agent-manipulation
|
|
47
64
|
subcategory: hgttg-hitchhikers-guide-no-restriction-oracle
|
package/rules/agent-manipulation/ATR-2026-00350-black-alert-no-ethics-writing-assistant.yaml
CHANGED
|
@@ -41,6 +41,23 @@ compliance:
|
|
|
41
41
|
- subcategory: "MG.2.3"
|
|
42
42
|
context: "Matches on the Black Alert activation phrase or persona-with-no-filtering mandate should trigger mechanisms to disengage or override the hijacked persona before unfiltered output is returned; MG.2.3 requires such supersede/deactivate response paths be available on detection."
|
|
43
43
|
strength: secondary
|
|
44
|
+
iso_42001:
|
|
45
|
+
- clause: "8.1"
|
|
46
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Black Alert No-Ethics Writing Assistant Persona)."
|
|
47
|
+
strength: primary
|
|
48
|
+
- clause: "6.2"
|
|
49
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Black Alert No-Ethics Writing Assistant Persona) is such a treatment."
|
|
50
|
+
strength: secondary
|
|
51
|
+
eu_ai_act:
|
|
52
|
+
- article: "15"
|
|
53
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Black Alert No-Ethics Writing Assistant Persona)."
|
|
54
|
+
strength: primary
|
|
55
|
+
- article: "14"
|
|
56
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Black Alert No-Ethics Writing Assistant Persona) would bypass or undermine that oversight."
|
|
57
|
+
strength: secondary
|
|
58
|
+
- article: "9"
|
|
59
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Black Alert No-Ethics Writing Assistant Persona)."
|
|
60
|
+
strength: secondary
|
|
44
61
|
tags:
|
|
45
62
|
category: agent-manipulation
|
|
46
63
|
subcategory: black-alert-no-ethics-writing-assistant
|
|
@@ -41,6 +41,23 @@ compliance:
|
|
|
41
41
|
- subcategory: "MG.2.3"
|
|
42
42
|
context: "Detection of these DAN variants triggers risk treatment plans to disengage or block the manipulated session before the model produces unrestricted harmful output; MG.2.3 mandates pre-defined deactivation mechanisms for compromised AI interactions."
|
|
43
43
|
strength: secondary
|
|
44
|
+
iso_42001:
|
|
45
|
+
- clause: "8.1"
|
|
46
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (DAN Off-Kilter and SMS-Abbreviated Persona Jailbreak)."
|
|
47
|
+
strength: primary
|
|
48
|
+
- clause: "6.2"
|
|
49
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (DAN Off-Kilter and SMS-Abbreviated Persona Jailbreak) is such a treatment."
|
|
50
|
+
strength: secondary
|
|
51
|
+
eu_ai_act:
|
|
52
|
+
- article: "15"
|
|
53
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (DAN Off-Kilter and SMS-Abbreviated Persona Jailbreak)."
|
|
54
|
+
strength: primary
|
|
55
|
+
- article: "14"
|
|
56
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (DAN Off-Kilter and SMS-Abbreviated Persona Jailbreak) would bypass or undermine that oversight."
|
|
57
|
+
strength: secondary
|
|
58
|
+
- article: "9"
|
|
59
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (DAN Off-Kilter and SMS-Abbreviated Persona Jailbreak)."
|
|
60
|
+
strength: secondary
|
|
44
61
|
tags:
|
|
45
62
|
category: agent-manipulation
|
|
46
63
|
subcategory: dan-ofkilter-sms-abbreviated-persona
|
package/rules/agent-manipulation/ATR-2026-00352-trainergpt-fix-my-ai-harmful-output-extraction.yaml
CHANGED
|
@@ -42,6 +42,23 @@ compliance:
|
|
|
42
42
|
- subcategory: "MG.2.3"
|
|
43
43
|
context: "Matches on the TrainGPT extraction pattern feed incident response so the session can be disengaged or the harmful dual-response output suppressed before delivery, satisfying MG.2.3's requirement for mechanisms to supersede or deactivate AI behavior on detection."
|
|
44
44
|
strength: secondary
|
|
45
|
+
iso_42001:
|
|
46
|
+
- clause: "8.1"
|
|
47
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (TrainGPT Fix-My-AI Harmful Output Extraction)."
|
|
48
|
+
strength: primary
|
|
49
|
+
- clause: "6.2"
|
|
50
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (TrainGPT Fix-My-AI Harmful Output Extraction) is such a treatment."
|
|
51
|
+
strength: secondary
|
|
52
|
+
eu_ai_act:
|
|
53
|
+
- article: "15"
|
|
54
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (TrainGPT Fix-My-AI Harmful Output Extraction)."
|
|
55
|
+
strength: primary
|
|
56
|
+
- article: "14"
|
|
57
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (TrainGPT Fix-My-AI Harmful Output Extraction) would bypass or undermine that oversight."
|
|
58
|
+
strength: secondary
|
|
59
|
+
- article: "9"
|
|
60
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (TrainGPT Fix-My-AI Harmful Output Extraction)."
|
|
61
|
+
strength: secondary
|
|
45
62
|
tags:
|
|
46
63
|
category: agent-manipulation
|
|
47
64
|
subcategory: trainergpt-fix-my-ai-harmful-output-extraction
|
package/rules/agent-manipulation/ATR-2026-00353-dual-chatgpt-dan-opposite-personality-format.yaml
CHANGED
|
@@ -44,6 +44,23 @@ compliance:
|
|
|
44
44
|
- subcategory: "MG.2.3"
|
|
45
45
|
context: "Matches on the dual-label opposite-personality execution pattern trigger risk treatment plans to disengage or block jailbreak attempts before harmful content is produced; MG.2.3 mandates that these supersede/deactivate mechanisms exist for adversarial persona-based attacks."
|
|
46
46
|
strength: secondary
|
|
47
|
+
iso_42001:
|
|
48
|
+
- clause: "8.1"
|
|
49
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Dual ChatGPT+Dan Opposite Personality Format Jailbreak)."
|
|
50
|
+
strength: primary
|
|
51
|
+
- clause: "6.2"
|
|
52
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Dual ChatGPT+Dan Opposite Personality Format Jailbreak) is such a treatment."
|
|
53
|
+
strength: secondary
|
|
54
|
+
eu_ai_act:
|
|
55
|
+
- article: "15"
|
|
56
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Dual ChatGPT+Dan Opposite Personality Format Jailbreak)."
|
|
57
|
+
strength: primary
|
|
58
|
+
- article: "14"
|
|
59
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Dual ChatGPT+Dan Opposite Personality Format Jailbreak) would bypass or undermine that oversight."
|
|
60
|
+
strength: secondary
|
|
61
|
+
- article: "9"
|
|
62
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Dual ChatGPT+Dan Opposite Personality Format Jailbreak)."
|
|
63
|
+
strength: secondary
|
|
47
64
|
tags:
|
|
48
65
|
category: agent-manipulation
|
|
49
66
|
subcategory: dual-chatgpt-dan-opposite-personality-format
|
|
@@ -45,6 +45,26 @@ compliance:
|
|
|
45
45
|
context: >-
|
|
46
46
|
Matches on this jailbreak persona must trigger pre-defined containment actions (block, quarantine, escalate) to supersede or disengage the manipulated behavior before harmful output is produced, which is the response capability mandated by MG.2.3.
|
|
47
47
|
strength: secondary
|
|
48
|
+
- subcategory: "MS.2.7"
|
|
49
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Reverse-Prompt Harm-Regardless Oracle Persona)."
|
|
50
|
+
strength: primary
|
|
51
|
+
iso_42001:
|
|
52
|
+
- clause: "8.1"
|
|
53
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Reverse-Prompt Harm-Regardless Oracle Persona)."
|
|
54
|
+
strength: primary
|
|
55
|
+
- clause: "6.2"
|
|
56
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Reverse-Prompt Harm-Regardless Oracle Persona) is such a treatment."
|
|
57
|
+
strength: secondary
|
|
58
|
+
eu_ai_act:
|
|
59
|
+
- article: "15"
|
|
60
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Reverse-Prompt Harm-Regardless Oracle Persona)."
|
|
61
|
+
strength: primary
|
|
62
|
+
- article: "14"
|
|
63
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Reverse-Prompt Harm-Regardless Oracle Persona) would bypass or undermine that oversight."
|
|
64
|
+
strength: secondary
|
|
65
|
+
- article: "9"
|
|
66
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Reverse-Prompt Harm-Regardless Oracle Persona)."
|
|
67
|
+
strength: secondary
|
|
48
68
|
tags:
|
|
49
69
|
category: agent-manipulation
|
|
50
70
|
subcategory: reverse-harm-regardless-oracle-persona
|
|
@@ -46,6 +46,23 @@ compliance:
|
|
|
46
46
|
context: >-
|
|
47
47
|
Detection of the nefarious-assistance pledge and anti-human persona framing must trigger mechanisms to disengage or block the response before harmful content is produced; MG.2.3 requires these supersede/deactivate controls be in place for jailbreak events.
|
|
48
48
|
strength: secondary
|
|
49
|
+
iso_42001:
|
|
50
|
+
- clause: "8.1"
|
|
51
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (UGH Anti-Human Nefarious AI Persona Jailbreak)."
|
|
52
|
+
strength: primary
|
|
53
|
+
- clause: "6.2"
|
|
54
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (UGH Anti-Human Nefarious AI Persona Jailbreak) is such a treatment."
|
|
55
|
+
strength: secondary
|
|
56
|
+
eu_ai_act:
|
|
57
|
+
- article: "15"
|
|
58
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (UGH Anti-Human Nefarious AI Persona Jailbreak)."
|
|
59
|
+
strength: primary
|
|
60
|
+
- article: "14"
|
|
61
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (UGH Anti-Human Nefarious AI Persona Jailbreak) would bypass or undermine that oversight."
|
|
62
|
+
strength: secondary
|
|
63
|
+
- article: "9"
|
|
64
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (UGH Anti-Human Nefarious AI Persona Jailbreak)."
|
|
65
|
+
strength: secondary
|
|
49
66
|
tags:
|
|
50
67
|
category: agent-manipulation
|
|
51
68
|
subcategory: ugh-anti-human-nefarious-ai-persona
|
package/rules/agent-manipulation/ATR-2026-00356-google-gato-lamda-exploit-generation-persona.yaml
CHANGED
|
@@ -41,6 +41,23 @@ compliance:
|
|
|
41
41
|
- subcategory: "MG.2.3"
|
|
42
42
|
context: "Detection of the GATO/LaMDA exploit-generation persona triggers risk treatment to disengage or block the jailbreak attempt before the model produces weaponized exploit content; MG.2.3 mandates these supersession mechanisms are in place for adversarial prompt scenarios."
|
|
43
43
|
strength: secondary
|
|
44
|
+
iso_42001:
|
|
45
|
+
- clause: "8.1"
|
|
46
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Google GATO/LAMDA Exploit-Generation Persona Jailbreak)."
|
|
47
|
+
strength: primary
|
|
48
|
+
- clause: "6.2"
|
|
49
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Google GATO/LAMDA Exploit-Generation Persona Jailbreak) is such a treatment."
|
|
50
|
+
strength: secondary
|
|
51
|
+
eu_ai_act:
|
|
52
|
+
- article: "15"
|
|
53
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Google GATO/LAMDA Exploit-Generation Persona Jailbreak)."
|
|
54
|
+
strength: primary
|
|
55
|
+
- article: "14"
|
|
56
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (Google GATO/LAMDA Exploit-Generation Persona Jailbreak) would bypass or undermine that oversight."
|
|
57
|
+
strength: secondary
|
|
58
|
+
- article: "9"
|
|
59
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Google GATO/LAMDA Exploit-Generation Persona Jailbreak)."
|
|
60
|
+
strength: secondary
|
|
44
61
|
tags:
|
|
45
62
|
category: agent-manipulation
|
|
46
63
|
subcategory: google-gato-lamda-exploit-generation-persona
|
|
@@ -44,6 +44,23 @@ compliance:
|
|
|
44
44
|
- subcategory: "MG.2.3"
|
|
45
45
|
context: "Detection of an AntiFM persona that incites physical violence against an identity group must trigger mechanisms to disengage or block the AI response before harmful content is generated; MG.2.3 mandates that these supersede/deactivate controls are in place for high-severity safety violations."
|
|
46
46
|
strength: secondary
|
|
47
|
+
iso_42001:
|
|
48
|
+
- clause: "8.1"
|
|
49
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (AntiFM Hate-Persona + Physical-Violence Threat + Filter Bypass Mandate)."
|
|
50
|
+
strength: primary
|
|
51
|
+
- clause: "6.2"
|
|
52
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (AntiFM Hate-Persona + Physical-Violence Threat + Filter Bypass Mandate) is such a treatment."
|
|
53
|
+
strength: secondary
|
|
54
|
+
eu_ai_act:
|
|
55
|
+
- article: "15"
|
|
56
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (AntiFM Hate-Persona + Physical-Violence Threat + Filter Bypass Mandate)."
|
|
57
|
+
strength: primary
|
|
58
|
+
- article: "14"
|
|
59
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (AntiFM Hate-Persona + Physical-Violence Threat + Filter Bypass Mandate) would bypass or undermine that oversight."
|
|
60
|
+
strength: secondary
|
|
61
|
+
- article: "9"
|
|
62
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (AntiFM Hate-Persona + Physical-Violence Threat + Filter Bypass Mandate)."
|
|
63
|
+
strength: secondary
|
|
47
64
|
tags:
|
|
48
65
|
category: agent-manipulation
|
|
49
66
|
subcategory: antifm-hate-violence-persona
|
|
@@ -45,6 +45,23 @@ compliance:
|
|
|
45
45
|
context: >-
|
|
46
46
|
Matches on the CumGPT persona invocation trigger response mechanisms to disengage or block the manipulated persona before explicit content is generated; MG.2.3 requires that mechanisms exist to supersede or deactivate AI behaviors hijacked by adversarial persona constructions.
|
|
47
47
|
strength: secondary
|
|
48
|
+
iso_42001:
|
|
49
|
+
- clause: "8.1"
|
|
50
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (CumGPT Explicit-Adult Named-Persona Bypass)."
|
|
51
|
+
strength: primary
|
|
52
|
+
- clause: "6.2"
|
|
53
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (CumGPT Explicit-Adult Named-Persona Bypass) is such a treatment."
|
|
54
|
+
strength: secondary
|
|
55
|
+
eu_ai_act:
|
|
56
|
+
- article: "15"
|
|
57
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (CumGPT Explicit-Adult Named-Persona Bypass)."
|
|
58
|
+
strength: primary
|
|
59
|
+
- article: "14"
|
|
60
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (CumGPT Explicit-Adult Named-Persona Bypass) would bypass or undermine that oversight."
|
|
61
|
+
strength: secondary
|
|
62
|
+
- article: "9"
|
|
63
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (CumGPT Explicit-Adult Named-Persona Bypass)."
|
|
64
|
+
strength: secondary
|
|
48
65
|
tags:
|
|
49
66
|
category: agent-manipulation
|
|
50
67
|
subcategory: cumgpt-explicit-adult-named-persona
|
|
@@ -56,6 +56,26 @@ compliance:
|
|
|
56
56
|
persona produces unethical content; MG.2.3 requires these supersede/
|
|
57
57
|
deactivate response paths be in place for jailbreak events.
|
|
58
58
|
strength: secondary
|
|
59
|
+
- subcategory: "MS.2.7"
|
|
60
|
+
context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (CHJ Carousel Game Show / Clyde No-Ethics Bird Jailbreak)."
|
|
61
|
+
strength: primary
|
|
62
|
+
iso_42001:
|
|
63
|
+
- clause: "8.1"
|
|
64
|
+
context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (CHJ Carousel Game Show / Clyde No-Ethics Bird Jailbreak)."
|
|
65
|
+
strength: primary
|
|
66
|
+
- clause: "6.2"
|
|
67
|
+
context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (CHJ Carousel Game Show / Clyde No-Ethics Bird Jailbreak) is such a treatment."
|
|
68
|
+
strength: secondary
|
|
69
|
+
eu_ai_act:
|
|
70
|
+
- article: "15"
|
|
71
|
+
context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (CHJ Carousel Game Show / Clyde No-Ethics Bird Jailbreak)."
|
|
72
|
+
strength: primary
|
|
73
|
+
- article: "14"
|
|
74
|
+
context: "Article 14 (human oversight) requires high-risk AI systems to remain subject to effective human oversight; this rule provides detection evidence where the agent-manipulation technique (CHJ Carousel Game Show / Clyde No-Ethics Bird Jailbreak) would bypass or undermine that oversight."
|
|
75
|
+
strength: secondary
|
|
76
|
+
- article: "9"
|
|
77
|
+
context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (CHJ Carousel Game Show / Clyde No-Ethics Bird Jailbreak)."
|
|
78
|
+
strength: secondary
|
|
59
79
|
tags:
|
|
60
80
|
category: agent-manipulation
|
|
61
81
|
subcategory: chj-carousel-clyde-no-ethics-bird
|