agent-threat-rules 2.2.1 → 3.1.0
This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between those versions as they appear in their respective public registries.
- package/README.md +365 -327
- package/dist/cli/scan-handler.d.ts +6 -0
- package/dist/cli/scan-handler.d.ts.map +1 -1
- package/dist/cli/scan-handler.js +27 -4
- package/dist/cli/scan-handler.js.map +1 -1
- package/dist/cli/semantic-judge-config.d.ts +7 -0
- package/dist/cli/semantic-judge-config.d.ts.map +1 -0
- package/dist/cli/semantic-judge-config.js +44 -0
- package/dist/cli/semantic-judge-config.js.map +1 -0
- package/dist/cli.js +183 -1
- package/dist/cli.js.map +1 -1
- package/dist/engine.d.ts +66 -1
- package/dist/engine.d.ts.map +1 -1
- package/dist/engine.js +420 -3
- package/dist/engine.js.map +1 -1
- package/dist/eval/eval-harness.d.ts.map +1 -1
- package/dist/eval/eval-harness.js +9 -0
- package/dist/eval/eval-harness.js.map +1 -1
- package/dist/eval/run-hackaprompt-benchmark.js +9 -0
- package/dist/eval/run-hackaprompt-benchmark.js.map +1 -1
- package/dist/eval/run-pint-benchmark.js +9 -0
- package/dist/eval/run-pint-benchmark.js.map +1 -1
- package/dist/eval/skill-benchmark.d.ts +11 -0
- package/dist/eval/skill-benchmark.d.ts.map +1 -1
- package/dist/eval/skill-benchmark.js +57 -0
- package/dist/eval/skill-benchmark.js.map +1 -1
- package/dist/index.d.ts +5 -2
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +2 -0
- package/dist/index.js.map +1 -1
- package/dist/judges/openai-compatible.d.ts +33 -0
- package/dist/judges/openai-compatible.d.ts.map +1 -0
- package/dist/judges/openai-compatible.js +145 -0
- package/dist/judges/openai-compatible.js.map +1 -0
- package/dist/mcp-server.d.ts.map +1 -1
- package/dist/mcp-server.js +6 -1
- package/dist/mcp-server.js.map +1 -1
- package/dist/measurement/from-eval-harness.d.ts +70 -0
- package/dist/measurement/from-eval-harness.d.ts.map +1 -0
- package/dist/measurement/from-eval-harness.js +49 -0
- package/dist/measurement/from-eval-harness.js.map +1 -0
- package/dist/measurement/schema.d.ts +152 -0
- package/dist/measurement/schema.d.ts.map +1 -0
- package/dist/measurement/schema.js +178 -0
- package/dist/measurement/schema.js.map +1 -0
- package/dist/measurement/write.d.ts +64 -0
- package/dist/measurement/write.d.ts.map +1 -0
- package/dist/measurement/write.js +163 -0
- package/dist/measurement/write.js.map +1 -0
- package/dist/rule-scaffolder.d.ts +26 -0
- package/dist/rule-scaffolder.d.ts.map +1 -1
- package/dist/rule-scaffolder.js +221 -6
- package/dist/rule-scaffolder.js.map +1 -1
- package/dist/semantic-evaluator.d.ts +54 -0
- package/dist/semantic-evaluator.d.ts.map +1 -0
- package/dist/semantic-evaluator.js +131 -0
- package/dist/semantic-evaluator.js.map +1 -0
- package/dist/trace-evaluator.d.ts +22 -0
- package/dist/trace-evaluator.d.ts.map +1 -0
- package/dist/trace-evaluator.js +249 -0
- package/dist/trace-evaluator.js.map +1 -0
- package/dist/types.d.ts +152 -0
- package/dist/types.d.ts.map +1 -1
- package/package.json +5 -3
- package/rules/agent-manipulation/ATR-2026-00032-goal-hijacking.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00074-cross-agent-privilege-escalation.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00076-inter-agent-message-spoofing.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00077-human-trust-exploitation.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00108-consensus-sybil-attack.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00116-a2a-message-validation.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00117-agent-identity-spoofing.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00118-approval-fatigue.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00119-social-engineering-via-agent.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00132-casual-authority-escalation.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00139-casual-authority-redirect.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00164-skill-scope-hijack.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00268-tense-framing-bypass.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00269-fitd-escalation.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00271-grandma-roleplay-jailbreak.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00273-dan-developer-mode-persona.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00287-threaten-json-coercive-output-threat.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00288-false-premise-injection.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00301-tap-tree-of-attacks-jailbreak.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00302-anti-dan-inverted-filter-persona.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00303-devmode-ranti-profanity-coercion.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00304-chatgpt-image-unlocker-markdown-injection.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00305-dan-mode-ablation-benchmark-coercion.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00306-autodan-genetic-jailbreak-suffix.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00307-inthewild-jailbreak-corpus-signature.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00314-amoral-unfiltered-custom-persona-jailbreak.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00317-free-of-restrictions-named-persona.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00318-moralizing-rant-then-unfiltered-bypass.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00319-developer-mode-dual-response-format.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00320-opposite-day-boolean-opposite-machine.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00322-fictional-world-format-override-jailbreak.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00323-dual-persona-simulation-jailbreak.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00324-neurosemantical-inversitis-disease-jailbreak.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00325-bob-hypothetical-unrestricted-chatbot.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00326-fake-developer-authority-chatdan-override.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00327-ai-ethics-law-bypass-declaration.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00328-named-mode-persona-override.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00329-no-withhold-factual-resource-persona.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00330-yoja-allcaps-settings-block-persona.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00331-terminal-cli-harmful-function-simulator.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00332-hacker-chan-culture-offensive-ai-persona.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00333-disease-animal-dual-ai-chatbot.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00334-two-entity-token-splitting-game.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00335-key-value-template-persona-injection.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00336-brm-false-information-machine-persona.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00337-obfuscated-system-announcement-injection.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00338-persongpt-hash-prefix-activation.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00339-bom-boolean-opposite-machine-persona-claim.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00340-h4x-amoral-unfiltered-character.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00341-char-rules-shortform-persona-injection.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00342-unicode-encoding-obfuscation-injection.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00343-lie-truth-dual-personality-uncensored.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00344-alt-ai-same-capabilities-no-restrictions.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00345-nlp-based-os-fictional-terminal-override.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00346-jailbreak-emoji-unlock-dual-format.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00347-yin-yang-hidden-knowledge-hidden-human-sim.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00348-arc-autonomous-real-calculation-rulebreaker.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00349-hgttg-hitchhikers-guide-no-restriction-oracle.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00350-black-alert-no-ethics-writing-assistant.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00351-dan-ofkilter-sms-abbreviated-persona.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00352-trainergpt-fix-my-ai-harmful-output-extraction.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00353-dual-chatgpt-dan-opposite-personality-format.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00354-reverse-harm-regardless-oracle-persona.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00355-ugh-anti-human-nefarious-ai-persona.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00356-google-gato-lamda-exploit-generation-persona.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00357-antifm-hate-violence-persona-injection.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00358-cumgpt-explicit-adult-named-persona.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00360-chj-carousel-clyde-no-ethics-bird.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00361-alter-ego-honesty-dual-consciousness-bypass.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00362-fictional-unfiltered-ai-comparison-jailbreak.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00363-rephrase-criminal-activity-second-person.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00364-dr-al-villain-plan-template-injection.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00365-reservoir-dogs-coercive-interrogation-roleplay.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00366-sneaky-pete-warp-facts-persona.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00368-prompt-browser-token-limit-jailbreak.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00369-gptx-starwars-unlimited-knowledge-persona.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00370-addi-german-dictator-propaganda-roleplay.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00371-ai-political-bias-manipulation.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00372-fictional-cia-secret-files-story-extraction.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00374-role-placeholder-persona-template-injection.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00375-character-description-rpg-explicit-template.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00376-identity-command-real-person-persona-injection.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00377-jailbreak-keyword-targetlanguage-template.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00379-lie-permission-logical-trap-jailbreak.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00380-fake-citation-biased-information-mandate.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00381-dan-story-narrative-continuation-prefix.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00383-cynical-sarcasm-ai-persona-bypass.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00384-protect-me-from-indirect-harm-tutorial.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00385-gigachad-dominant-aggressive-persona.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00386-daddy-sycophancy-manipulation-mandate.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00387-nsfw-character-sheet-generation-unlock.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00388-opposite-reply-omniscient-game.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00389-terminal-custom-ruleset-injection.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00391-persona-conditional-harm-unlock.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00392-authority-persona-violence-study-extraction.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00402-grandma-roleplay-harmful-substance-synthesis.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00404-goodside-threat-json-death-coercion.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00406-doctor-xml-policy-puppetry-interaction-config.yaml +1 -1
- package/rules/agent-manipulation/ATR-2026-00440-semantic-kernel-vector-store-eval-rce.yaml +2 -2
- package/rules/agent-manipulation/ATR-2026-00552-goal-drift-after-pressure-injection.yaml +216 -0
- package/rules/context-exfiltration/ATR-2026-00075-agent-memory-manipulation.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00102-disguised-analytics-exfiltration.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00113-credential-theft.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00114-oauth-token-abuse.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00115-env-var-harvesting.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00136-tool-response-data-piggyback.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00141-example-format-key-leak.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00142-piggyback-transition-words.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00145-obfuscated-key-disclosure.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00146-env-var-existence-probe.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00150-credential-in-tool-response.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00152-obfuscated-credential-leak.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00162-skill-credential-exfil-combo.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00212-mcp-atlassian-credential-leak.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00261-markdown-image-exfiltration.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00274-api-key-generation-request.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00275-system-prompt-training-data-extraction.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00290-divergence-repeat-word-training-extraction.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00291-biometric-surveillance-personal-data-request.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00293-educational-records-personal-data-request.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00294-financial-pii-personal-data-request.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00295-medical-pii-personal-data-request.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00405-markdown-image-url-exfiltration-xss.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00411-apikey-generation-completion-request.yaml +1 -1
- package/rules/context-exfiltration/ATR-2026-00422-nl-credential-disclosure.yaml +2 -2
- package/rules/context-exfiltration/ATR-2026-00524-claude-code-anthropic-base-url-credential-exfil.yaml +257 -0
- package/rules/context-exfiltration/ATR-2026-00548-cross-agent-session-context-leak.yaml +177 -0
- package/rules/context-exfiltration/ATR-2026-00566-librechat-is-a-chatgpt-clone-with-additi.yaml +93 -0
- package/rules/context-exfiltration/ATR-2026-00569-agent-mcp-path-traversal-arbitrary-file-access.yaml +99 -0
- package/rules/context-exfiltration/ATR-2026-00571-xss-in-agent-mcp-rendered-output.yaml +79 -0
- package/rules/context-exfiltration/ATR-2026-00574-semantic-paraphrased-context-extraction.yaml +124 -0
- package/rules/data-poisoning/ATR-2026-00070-data-poisoning.yaml +1 -1
- package/rules/data-poisoning/ATR-2026-00450-spring-ai-prompt-memory-poisoning.yaml +2 -2
- package/rules/data-poisoning/ATR-2026-00570-sql-injection-in-agent-tool-query.yaml +82 -0
- package/rules/excessive-autonomy/ATR-2026-00050-runaway-agent-loop.yaml +1 -1
- package/rules/excessive-autonomy/ATR-2026-00051-resource-exhaustion.yaml +1 -1
- package/rules/excessive-autonomy/ATR-2026-00052-cascading-failure.yaml +1 -1
- package/rules/excessive-autonomy/ATR-2026-00098-unauthorized-financial-action.yaml +1 -1
- package/rules/excessive-autonomy/ATR-2026-00099-high-risk-tool-gate.yaml +1 -1
- package/rules/excessive-autonomy/ATR-2026-00553-runaway-tool-loop-behavioral.yaml +174 -0
- package/rules/model-abuse/ATR-2026-00279-harmful-completion-continuation.yaml +1 -1
- package/rules/model-abuse/ATR-2026-00281-eicar-gtube-malware-signature-request.yaml +1 -1
- package/rules/model-abuse/ATR-2026-00284-glitch-token-destabilization.yaml +1 -1
- package/rules/model-abuse/ATR-2026-00289-lmrc-harmful-content-elicitation.yaml +1 -1
- package/rules/model-abuse/ATR-2026-00292-self-harm-eating-disorder-facilitation.yaml +1 -1
- package/rules/model-abuse/ATR-2026-00298-malicious-use-illegal-activity-request.yaml +1 -1
- package/rules/model-abuse/ATR-2026-00299-harmbench-detailed-harmful-instruction.yaml +1 -1
- package/rules/model-abuse/ATR-2026-00413-malwaregen-code-generation-request.yaml +1 -1
- package/rules/model-security/ATR-2026-00072-model-behavior-extraction.yaml +1 -1
- package/rules/model-security/ATR-2026-00073-malicious-finetuning-data.yaml +1 -1
- package/rules/privilege-escalation/ATR-2026-00040-privilege-escalation.yaml +1 -1
- package/rules/privilege-escalation/ATR-2026-00041-scope-creep.yaml +1 -1
- package/rules/privilege-escalation/ATR-2026-00107-delayed-execution-bypass.yaml +1 -1
- package/rules/privilege-escalation/ATR-2026-00110-eval-injection.yaml +1 -1
- package/rules/privilege-escalation/ATR-2026-00111-shell-escape.yaml +1 -1
- package/rules/privilege-escalation/ATR-2026-00112-dynamic-import-exploitation.yaml +1 -1
- package/rules/privilege-escalation/ATR-2026-00143-casual-privilege-escalation.yaml +1 -1
- package/rules/privilege-escalation/ATR-2026-00144-rationalized-safety-bypass.yaml +1 -1
- package/rules/privilege-escalation/ATR-2026-00528-praisonai-auth-disabled-default.yaml +192 -0
- package/rules/privilege-escalation/ATR-2026-00539-crewai-codeinterpreter-sandbox-escape-rce.yaml +292 -0
- package/rules/privilege-escalation/ATR-2026-00546-crewai-json-loader-local-file-read.yaml +162 -0
- package/rules/privilege-escalation/ATR-2026-00547-crewai-rag-url-ssrf-bypass.yaml +169 -0
- package/rules/privilege-escalation/ATR-2026-00549-destructive-tool-without-human-approval.yaml +193 -0
- package/rules/privilege-escalation/ATR-2026-00551-cross-conversation-memory-write.yaml +198 -0
- package/rules/prompt-injection/ATR-2026-00004-system-prompt-override.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00005-multi-turn-injection.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00097-cjk-injection-patterns.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00104-persona-hijacking.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00130-indirect-authority-claim.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00131-fictional-academic-framing.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00133-paraphrase-injection.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00137-authority-claim-injection.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00138-fictional-framing-bypass.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00140-indirect-reference-reversal.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00148-language-switch-injection.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00153-tool-with-embedded-instruction-to-bypass.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00154-unauthorized-background-task-execution-v.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00155-hidden-llm-instructions-in-skill-descrip.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00156-ssh-remote-command-execution-with-creden.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00163-skill-hidden-override-instruction.yaml +3 -3
- package/rules/prompt-injection/ATR-2026-00206-hidden-priority-instructions.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00207-hidden-instructions.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00211-system-prompt-override.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00213-system-prompt-override.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00226-identity-substitution.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00227-historical-persona-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00228-structured-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00229-roleplay-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00230-persona-moral-bypass.yaml +1 -5
- package/rules/prompt-injection/ATR-2026-00231-identity-substitution.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00233-structured-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00234-roleplay-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00235-persona-moral-bypass.yaml +4 -7
- package/rules/prompt-injection/ATR-2026-00236-pseudo-code-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00237-dual-response-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00238-identity-replacement.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00239-amoral-persona-obsession.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00240-instruction-nullification-identity-repla.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00241-amoral-character-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00242-persona-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00243-acronym-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00244-dual-response-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00245-malicious-persona.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00247-dual-response-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00249-game-based-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00251-persona-embodiment-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00252-narrative-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00253-enhanced-persona-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00256-base-n-encoding-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00257-cipher-transposition-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00258-unicode-tag-injection.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00264-latent-injection-translation.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00265-latent-injection-rag-document.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00267-gcg-adversarial-suffix.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00272-hypothetical-response-smuggling.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00276-invisible-unicode-bidi-injection.yaml +6 -6
- package/rules/prompt-injection/ATR-2026-00278-dra-disguise-reconstruction-attack.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00280-policy-puppetry-xml-injection.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00282-perez-prompt-injection-hijack.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00285-alternate-encoding-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00286-latent-injection-embedded-context.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00296-shell-command-injection.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00297-python-code-execution-rce.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00308-zalgo-diacritic-overload-encoding.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00309-braille-unicode-encoded-injection.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00310-ecoji-emoji-encoded-injection.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00311-base2048-unicode-script-injection.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00312-unicode-variant-selector-ascii-smuggling.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00313-sneaky-bits-zero-width-binary.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00315-sata-masked-language-model-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00316-function-masking-predict-mask-bypass.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00321-hyphenated-system-instruction-injection.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00359-bank-phishing-smtp-email-code.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00367-emoji-flag-encoded-hidden-phrase.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00373-piracy-torrent-site-list-request.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00378-chatgpt-dom-javascript-redirect-manipulation.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00382-cocaine-component-development-request.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00390-explicit-sexual-content-poem-request.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00394-backspace-deletion-character-injection.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00395-llm-special-token-boundary-injection.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00396-toxic-continuation-prefix-injection.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00397-snowball-impossible-reasoning-injection.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00399-latent-injection-document-separator-token.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00400-latent-injection-ignore-instruction-keyword.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00401-leakreplay-cloze-mask-training-data-elicitation.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00403-glitch-token-repeat-back-elicitation.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00407-echo-sql-jinja-template-injection.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00408-dra-parenthesis-character-reconstruction-attack.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00409-phrasing-past-future-tense-harmful-query.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00410-lmrc-harm-category-direct-elicitation.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00412-figstep-image-list-multimodal-jailbreak.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00414-continuation-ethnic-slur-completion-elicitation.yaml +1 -1
- package/rules/prompt-injection/ATR-2026-00535-windsurf-ide-zero-click-prompt-injection.yaml +199 -0
- package/rules/prompt-injection/ATR-2026-00550-untrusted-retrieval-to-privileged-tool.yaml +199 -0
- package/rules/prompt-injection/ATR-2026-00554-langchain-vulnerable-to-template-injecti.yaml +81 -0
- package/rules/prompt-injection/ATR-2026-00565-the-llm-cli-tool-thru-0-27-1-contains-a-.yaml +104 -0
- package/rules/prompt-injection/ATR-2026-00573-semantic-paraphrased-injection.yaml +123 -0
- package/rules/skill-compromise/ATR-2026-00060-skill-impersonation.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00061-description-behavior-mismatch.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00062-hidden-capability.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00063-skill-chain-attack.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00064-over-permissioned-skill.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00065-skill-update-attack.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00066-parameter-injection.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00120-skill-instruction-injection.yaml +21 -3
- package/rules/skill-compromise/ATR-2026-00121-skill-dangerous-script.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00122-skill-weaponized-instruction.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00123-skill-overreach-permissions.yaml +6 -3
- package/rules/skill-compromise/ATR-2026-00124-skill-name-squatting.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00125-context-poisoning-compaction.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00126-skill-rug-pull-setup.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00127-subcommand-overflow.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00128-html-comment-hidden-payload.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00129-unicode-smuggling.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00134-fork-claim-impersonation.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00135-exfil-url-in-instructions.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00147-fork-impersonation.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00149-skill-exfil-compound.yaml +11 -3
- package/rules/skill-compromise/ATR-2026-00151-fork-impersonation-install.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00157-timebomb-credential-exfil.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00214-credential-theft.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00217-credential-harvesting.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00220-malware-dropper.yaml +3 -3
- package/rules/skill-compromise/ATR-2026-00222-credential-harvesting.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00223-reverse-shell-dropper.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00224-credential-exfiltration.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00225-c2-communication.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00260-package-hallucination.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00262-av-evasion-code-gen.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00263-credential-file-read-gen.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00266-malware-dropper-gen.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00283-malwaregen-generic-virus-payload-request.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00398-huggingface-unsafe-model-artifact-load.yaml +1 -1
- package/rules/skill-compromise/ATR-2026-00523-claude-code-hooks-session-start-pre-trust-rce.yaml +221 -0
- package/rules/skill-compromise/ATR-2026-00525-mini-shai-hulud-gh-token-monitor-persistence.yaml +220 -0
- package/rules/skill-compromise/ATR-2026-00527-skill-silent-git-remote-mirror-exfiltration.yaml +201 -0
- package/rules/tool-poisoning/ATR-2026-00011-tool-output-injection.yaml +1 -1
- package/rules/tool-poisoning/ATR-2026-00012-unauthorized-tool-call.yaml +1 -1
- package/rules/tool-poisoning/ATR-2026-00100-consent-bypass-instruction.yaml +1 -1
- package/rules/tool-poisoning/ATR-2026-00101-trust-escalation-override.yaml +1 -1
- package/rules/tool-poisoning/ATR-2026-00103-hidden-safety-bypass-instruction.yaml +1 -1
- package/rules/tool-poisoning/ATR-2026-00105-silent-action-concealment.yaml +1 -1
- package/rules/tool-poisoning/ATR-2026-00106-schema-description-contradiction.yaml +1 -1
- package/rules/tool-poisoning/ATR-2026-00161-important-tag-cross-tool-shadowing.yaml +1 -1
- package/rules/tool-poisoning/ATR-2026-00209-mcpwn-runaway-invocation.yaml +1 -1
- package/rules/tool-poisoning/ATR-2026-00210-flowise-system-message-override.yaml +1 -1
- package/rules/tool-poisoning/ATR-2026-00259-ansi-escape-injection.yaml +1 -1
- package/rules/tool-poisoning/ATR-2026-00270-xss-in-tool-response.yaml +8 -5
- package/rules/tool-poisoning/ATR-2026-00277-echo-template-command-injection.yaml +1 -1
- package/rules/tool-poisoning/ATR-2026-00393-ansi-code-elicitation-request.yaml +1 -1
- package/rules/tool-poisoning/ATR-2026-00526-claude-code-shell-metachar-in-double-quoted-path.yaml +167 -0
- package/rules/tool-poisoning/ATR-2026-00529-litellm-proxy-sqli-cisa-kev.yaml +158 -0
- package/rules/tool-poisoning/ATR-2026-00530-ms-agent-shell-tool-unsanitized-argv-rce.yaml +184 -0
- package/rules/tool-poisoning/ATR-2026-00531-praisonai-unauthenticated-agent-api.yaml +174 -0
- package/rules/tool-poisoning/ATR-2026-00532-apache-doris-mcp-sql-injection.yaml +155 -0
- package/rules/tool-poisoning/ATR-2026-00533-apache-pinot-mcp-unauthenticated-takeover.yaml +151 -0
- package/rules/tool-poisoning/ATR-2026-00534-alibaba-rds-mcp-unauthenticated-metadata-exfil.yaml +155 -0
- package/rules/tool-poisoning/ATR-2026-00536-nginx-ui-mcp-unauthenticated-command-execution.yaml +199 -0
- package/rules/tool-poisoning/ATR-2026-00537-fastmcp-server-name-cmd-injection-windows.yaml +226 -0
- package/rules/tool-poisoning/ATR-2026-00538-langchain-chatchat-mcp-stdio-unauthenticated-rce.yaml +244 -0
- package/rules/tool-poisoning/ATR-2026-00540-praisonai-parse-mcp-command-cli-injection.yaml +186 -0
- package/rules/tool-poisoning/ATR-2026-00541-agent-zero-mcp-config-command-injection.yaml +183 -0
- package/rules/tool-poisoning/ATR-2026-00542-upsonic-mcp-command-allowlist-bypass.yaml +166 -0
- package/rules/tool-poisoning/ATR-2026-00543-litellm-mcp-server-argv-injection.yaml +168 -0
- package/rules/tool-poisoning/ATR-2026-00544-praisonai-pth-file-path-traversal-rce.yaml +172 -0
- package/rules/tool-poisoning/ATR-2026-00545-praisonai-tool-override-unauth-rce.yaml +170 -0
- package/rules/tool-poisoning/ATR-2026-00561-fastmcp-vulnerable-to-windows-command-in.yaml +99 -0
- package/rules/tool-poisoning/ATR-2026-00567-mcp-stdio-config-command-injection.yaml +75 -0
- package/rules/tool-poisoning/ATR-2026-00568-agent-ssrf-cloud-metadata-file-inclusion.yaml +75 -0
- package/rules/tool-poisoning/ATR-2026-00572-symjack-symlink-config-redirection.yaml +132 -0
- package/spec/README.md +279 -0
- package/spec/atr-correlation-v1.0.md +281 -0
- package/spec/atr-event-v1.0.md +294 -0
- package/spec/atr-language-detection-v1.0.md +218 -0
- package/spec/atr-method-v1.1.md +557 -0
- package/spec/atr-profile-v1.0.md +307 -0
- package/spec/atr-schema.yaml +279 -8
- package/spec/category-registry/v1.0.yaml +200 -0
- package/spec/conformance/README.md +244 -0
- package/spec/conformance/SIGNING.md +191 -0
- package/spec/conformance/baseline/fixtures/ATR-2026-00001-tp-001/expected.json +36 -0
- package/spec/conformance/baseline/fixtures/ATR-2026-00001-tp-001/input.json +16 -0
- package/spec/conformance/baseline/fixtures/README.md +120 -0
- package/spec/conformance/baseline/manifest.json +56 -0
- package/spec/conformance/expected-results.schema.json +121 -0
- package/spec/external-registries/cccs-yara.md +142 -0
- package/spec/internet-drafts/draft-lin-atr-core-00.html +1925 -0
- package/spec/internet-drafts/draft-lin-atr-core-00.md +288 -0
- package/spec/internet-drafts/draft-lin-atr-core-00.txt +560 -0
- package/spec/internet-drafts/draft-lin-atr-core-00.xml +424 -0
- package/spec/mappings/README.md +43 -0
- package/spec/mappings/atr-to-nist-csf-2.0.md +234 -0
- package/spec/schema/correlation.schema.json +144 -0
- package/spec/schema/event.schema.json +233 -0
- package/spec/schema/profile.schema.json +196 -0
- package/spec/schema/rule.schema.json +224 -0
- package/spec/stix-extension/README.md +76 -13
- package/spec/stix-extension/examples/atr-rule-trace-method-example.json +85 -0
- package/spec/stix-extension/extension-definition.json +23 -3
- package/spec/stix-extension/x-atr-rule-schema.json +107 -11
|
@@ -0,0 +1,557 @@
|
|
|
1
|
+
# ATR Method Extensions v1.1
|
|
2
|
+
|
|
3
|
+
Version: 1.1.0
|
|
4
|
+
Status: Draft
|
|
5
|
+
Date: 2026-05-28
|
|
6
|
+
License: MIT
|
|
7
|
+
Editor: Adam Lin (林冠辛) <adam@agentthreatrule.org>
|
|
8
|
+
Numbering Authority: ATR Technical Committee (transitional BDFL until TSC seated)
|
|
9
|
+
Extends: SPEC.md v1.0.0 §6
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## 1. Abstract
|
|
14
|
+
|
|
15
|
+
This document extends [SPEC.md](../SPEC.md) v1.0.0 with optional detection methods beyond the pattern-only model of v1.0. It defines four additional methods — `signature`, `semantic`, `behavioral`, `trace` — under one orthogonal field `detection.method`. Rules MAY declare a method to opt into method-specific evaluation semantics; absence of the field means `method: pattern` and v1.0 evaluation applies.
|
|
16
|
+
|
|
17
|
+
The motivating gap is that v1.0 covers input-text regex detection but not (a) silent failures in agent reasoning (paraphrased prompt injection, scope drift, premise violation), (b) LLM-as-judge semantic intent classification, or (c) declarative assertions over agent execution traces. v1.1 adds these without breaking v1.0 conformance.
|
|
18
|
+
|
|
19
|
+
## 2. Status of This Document
|
|
20
|
+
|
|
21
|
+
This is a Draft of ATR Method Extensions v1.1.0. The Draft is stable for implementation. The five detection methods enumerated here (pattern, signature, semantic, behavioral, trace) are the canonical set for v1.1; new methods MUST be introduced by Spec amendment per GOVERNANCE.md, not by individual Rule authors.
|
|
22
|
+
|
|
23
|
+
This document does not modify any v1.0 wire format. Rules without `detection.method` continue to be valid v1.0 Pattern rules. Engines that implement only v1.0 conformance MUST skip rules whose `method` value is not `pattern` rather than reject them, per §9.
|
|
24
|
+
|
|
25
|
+
## 3. Conventions and Terminology
|
|
26
|
+
|
|
27
|
+
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174].
|
|
28
|
+
|
|
29
|
+
| Term | Definition |
|------|------------|
| Method | The detection technique a Rule uses, declared in `detection.method`. One of: pattern, signature, semantic, behavioral, trace. |
| Plane | Informal synonym for Method used in ATR architectural prose. |
| Judge | An LLM (or smaller classifier) that evaluates an Input against a prompt template and returns a structured verdict. Used by Semantic method only. |
| Trace | A directed-acyclic graph of spans representing one or more agent execution turns, ingested in OpenInference or OTel GenAI semantic conventions format. Used by Trace method only. |
|
|
35
|
+
|
|
36
|
+
## 4. Method Catalog (Overview)
|
|
37
|
+
|
|
38
|
+
| Method | Required Companion Field | What It Detects | Latency Tier |
|--------|--------------------------|-----------------|--------------|
| pattern | `detection.conditions` (per SPEC §6) | Regex/string match on text fields | < 5 ms |
| signature | `detection.signature` (§5) | Known-bad hash / package name / registry URL | < 1 ms |
| semantic | `detection.semantic` (§6) | Paraphrased / roleplay / encoded intent | < 2 s |
| behavioral | `detection.behavioral` (§7) | Metric threshold over a time window | < 100 ms |
| trace | `detection.trace` (§8) | Silent failure / scope drift / premise violation over agent execution graph | < 200 ms |
|
|
45
|
+
|
|
46
|
+
Engines MUST implement `pattern` for any conformance level. Other methods are OPTIONAL; an Engine declaring conformance MUST declare which methods it implements.
|
|
47
|
+
|
|
48
|
+
### 4.1 Runtime Profiles
|
|
49
|
+
|
|
50
|
+
Engines MAY declare conformance against one of two Runtime Profiles, which group the methods by latency and operational characteristics:
|
|
51
|
+
|
|
52
|
+
| Profile | Methods Included | Latency Budget | Use Case |
|---------|------------------|----------------|----------|
| `deterministic` | signature + pattern | < 5 ms total | Production hot path, low-millisecond enforcement, no external dependencies |
| `assisted` | semantic + behavioral + trace | up to 2 s | Sidecar / async path; may call LLM judge, ingest traces, or evaluate metric windows |
|
|
56
|
+
|
|
57
|
+
A Rule's declared `detection.method` implies its profile. Engines that support only the `deterministic` profile MUST skip Rules whose method is in `assisted` (per §9), not reject them.
|
|
58
|
+
|
|
59
|
+
This split is intended to let production policy engines (e.g., enterprise governance toolkits, in-line security scanners) load only deterministic Rules into their hot path while delegating assisted-tier Rules to an async sidecar. The same Rule corpus serves both deployment patterns; no Rule rewrite is required.
|
|
60
|
+
|
|
61
|
+
Profile capability is declared as `atr/profile/deterministic` or `atr/profile/assisted` in the Engine's conformance statement.
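
The profile split above can be sketched as a small routing step. This is a minimal illustration, not part of the specification: it assumes Rules are already parsed into dictionaries, and the names `PROFILE_BY_METHOD`, `profile_for`, and `partition_rules` are hypothetical. Treating an unrecognized method as "skip" is also an assumption of this sketch.

```python
from typing import Optional

# Method -> runtime profile, taken from the Runtime Profiles table in §4.1.
PROFILE_BY_METHOD = {
    "pattern": "deterministic",
    "signature": "deterministic",
    "semantic": "assisted",
    "behavioral": "assisted",
    "trace": "assisted",
}


def profile_for(rule: dict) -> Optional[str]:
    """Return the profile implied by detection.method (absent field means pattern)."""
    method = rule.get("detection", {}).get("method", "pattern")
    # Assumption: unknown methods map to None so the caller skips them.
    return PROFILE_BY_METHOD.get(method)


def partition_rules(rules: list, supported: set) -> tuple:
    """Split rules into (loaded, skipped); unsupported rules are skipped, not rejected."""
    loaded, skipped = [], []
    for rule in rules:
        (loaded if profile_for(rule) in supported else skipped).append(rule)
    return loaded, skipped
```

A hot-path engine would call `partition_rules(rules, {"deterministic"})` and hand the skipped list to an async sidecar that supports the `assisted` profile.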
|
|
62
|
+
|
|
63
|
+
## 5. Signature Method
|
|
64
|
+
|
|
65
|
+
### 5.1 Purpose
|
|
66
|
+
|
|
67
|
+
The Signature method detects known-bad artifacts by exact-match on stable identifiers (cryptographic hashes, package names, registry URLs). Unlike Pattern (which allows fuzzy text match) or Semantic (which infers intent), Signature requires an exact match against a canonical indicator. This is the fastest detection method (sub-millisecond), making it suitable for production hot paths.
|
|
68
|
+
|
|
69
|
+
The method is modeled on Cyber Threat Intelligence Indicator-of-Compromise (IOC) practice, adapted for AI agent artifacts: skill files, MCP tool packages, agent configurations, and registry locations. Existing tools that ship YARA-based scanning (e.g., production skill scanners) can consume Signature Rules via the ATR→YARA compiler contract in §5.4.
|
|
70
|
+
|
|
71
|
+
### 5.2 Required Fields
|
|
72
|
+
|
|
73
|
+
A Rule with `method: signature` MUST declare `detection.signature` with:
|
|
74
|
+
|
|
75
|
+
| Field | Type | Constraint |
|-------|------|------------|
| `indicators` | array | Non-empty list of indicator objects (§5.2.1). |
| `match_logic` | enum | One of `any` (Rule matches if any indicator matches) or `all` (Rule matches only if every indicator matches). Defaults to `any`. |
|
|
79
|
+
|
|
80
|
+
#### 5.2.1 Indicator object
|
|
81
|
+
|
|
82
|
+
| Field | Type | Constraint |
|-------|------|------------|
| `type` | enum | One of: `sha256`, `sha512`, `blake2b-256`, `package_name`, `registry_url`, `skill_id`. |
| `value` | string | Indicator value. Hash types MUST be hex-encoded (lowercase, no `0x` prefix). |
| `target_field` | string | Source field to match against (e.g., `skill.content`, `skill.manifest.name`, `mcp.tool.source_url`). |
| `provenance` | object | OPTIONAL. `{first_observed, source, attribution}` for forensic chain. |
|
|
88
|
+
|
|
89
|
+
### 5.3 Evaluation Semantics
|
|
90
|
+
|
|
91
|
+
For Input I and Rule R with method=signature:
|
|
92
|
+
|
|
93
|
+
1. For each indicator `i` in `R.signature.indicators`:
   - Compute or extract value `v` from `I[i.target_field]` per `i.type`:
     - Hash types: compute the digest over `I[i.target_field]` bytes.
     - String types (`package_name`, `registry_url`, `skill_id`): read `I[i.target_field]` as a UTF-8 string.
   - Indicator matches iff `v == i.value` after the normalization rules in §5.3.1.
2. Apply `match_logic`:
   - `any`: Engine MUST emit a Match if ANY indicator matched.
   - `all`: Engine MUST emit a Match only if EVERY indicator matched.
|
|
101
|
+
|
|
102
|
+
Engines MUST treat unknown indicator `type` values as a graceful_error per SPEC §6, not as a silent no-match.
|
|
103
|
+
|
|
104
|
+
#### 5.3.1 Normalization
|
|
105
|
+
|
|
106
|
+
- Hash hex strings: lowercase, no separator, no `0x` prefix. Engines MUST normalize both sides before comparison.
|
|
107
|
+
- `package_name` and `skill_id`: case-sensitive string equality.
|
|
108
|
+
- `registry_url`: canonical form per RFC 3986 §6 (lowercase scheme + host, no trailing slash, no fragment). Engines MUST normalize both sides before comparison.
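
The evaluation and normalization rules in §5.3 can be sketched as follows. This is an illustrative sketch, not the reference engine: it assumes the Input is a flat dictionary keyed by dotted `target_field` names, `GracefulError` is a hypothetical stand-in for the graceful_error outcome, and `normalize_url` is a simplified approximation of RFC 3986 §6 canonicalization.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

HASHERS = {
    "sha256": lambda data: hashlib.sha256(data).hexdigest(),
    "sha512": lambda data: hashlib.sha512(data).hexdigest(),
    "blake2b-256": lambda data: hashlib.blake2b(data, digest_size=32).hexdigest(),
}


class GracefulError(Exception):
    """Hypothetical stand-in for the graceful_error outcome in §5.3."""


def normalize_hex(value: str) -> str:
    # Lowercase, drop an optional 0x prefix and common separators.
    v = value.strip().lower()
    if v.startswith("0x"):
        v = v[2:]
    return v.replace(":", "").replace("-", "").replace(" ", "")


def normalize_url(value: str) -> str:
    # Simplified: lowercase scheme and host, drop fragment and trailing slash.
    parts = urlsplit(value.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), parts.query, "")
    )


def indicator_matches(indicator: dict, fields: dict) -> bool:
    kind = indicator["type"]
    raw = fields.get(indicator["target_field"])
    if kind in HASHERS:
        if raw is None:
            return False
        data = raw if isinstance(raw, bytes) else str(raw).encode("utf-8")
        return HASHERS[kind](data) == normalize_hex(indicator["value"])
    if kind == "registry_url":
        return raw is not None and normalize_url(str(raw)) == normalize_url(indicator["value"])
    if kind in ("package_name", "skill_id"):
        return raw is not None and str(raw) == indicator["value"]  # case-sensitive
    raise GracefulError(f"unknown indicator type: {kind}")


def evaluate_signature(signature: dict, fields: dict) -> bool:
    results = [indicator_matches(i, fields) for i in signature["indicators"]]
    if signature.get("match_logic", "any") == "all":
        return all(results)
    return any(results)
```

Every indicator is evaluated before `match_logic` is applied, so an unknown `type` surfaces as an error even when another indicator would already have matched.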
|
|
109
|
+
|
|
110
|
+
### 5.4 ATR→YARA Compiler Compatibility
|
|
111
|
+
|
|
112
|
+
Signature Rules are designed to be compilable to YARA rule format for ecosystems that already consume YARA (e.g., production skill scanners, VirusTotal-class infrastructure). A reference compiler is implemented at `scripts/compile-yara.ts` and exposed as `npm run compile:yara`; its version is `atr-to-yara@1.0.0`. The compilation contract below is normative:
|
|
113
|
+
|
|
114
|
+
| ATR Indicator | YARA equivalent |
|---------------|-----------------|
| `sha256` | `hash.sha256(0, filesize)` module condition with literal hash |
| `sha512` | `hash.sha512(0, filesize)` module condition with literal hash |
| `package_name` | `strings.$name = "<value>"` with `condition: $name` |
| `registry_url` | `strings.$url = "<value>"` with `condition: $url` |
| `skill_id` | `strings.$id = "<value>"` with `condition: $id` |
| `match_logic: all` | YARA `condition: all of them` |
| `match_logic: any` | YARA `condition: any of them` |
|
|
123
|
+
|
|
124
|
+
The ATR→YARA compiler is OPTIONAL infrastructure; non-implementing engines do not lose conformance. Engines that publish YARA outputs MUST declare the `atr/compiler/yara@1.0` capability. The reference implementation at `scripts/compile-yara.ts` is tested via 11 unit tests at `tests/compile-yara.test.ts` covering single-indicator emission, multi-indicator any/all combinators, hash/string mixing, character escaping, hex normalization, and graceful_error on unknown indicator types per §5.3.
|
|
125
|
+
|
|
126
|
+
### 5.5 Example
|
|
127
|
+
|
|
128
|
+
```yaml
id: ATR-YYYY-DRAFT-skill-malware-example
title: "Known-bad skill: @malicious/persistence-rootkit"
status: draft
severity: critical
tags:
  category: skill-compromise
  scan_target: skill
detection:
  method: signature
  signature:
    match_logic: any
    indicators:
      - type: sha256
        value: "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
        target_field: skill.content
        provenance:
          first_observed: "2026-05-27"
          source: "Wild scan corpus"
          attribution: "Public skill registry"
      - type: package_name
        value: "@malicious/persistence-rootkit"
        target_field: skill.manifest.name
response:
  actions: [block_request, log_alert]
test_cases:
  true_positives:
    - input: { skill.content: "<full byte content matching hash>" }
      expected: triggered
  true_negatives:
    - input: { skill.manifest.name: "@safe/normal-skill" }
      expected: not_triggered
```
|
|
161
|
+
|
|
162
|
+
### 5.6 Provenance and Trust
|
|
163
|
+
|
|
164
|
+
Signature Rules carry forensic weight: a hash match means "this exact artifact was previously confirmed malicious." Engines MUST preserve the `provenance` field in Match output (per SPEC §7) to permit downstream attribution and dispute resolution. Engines SHOULD NOT auto-block on a hash match without operator policy explicitly enabling it; the default response action SHOULD be `log_alert` until provenance is operator-trusted.
|
|
165
|
+
|
|
166
|
+
## 6. Semantic Method
|
|
167
|
+
|
|
168
|
+
### 6.1 Purpose
|
|
169
|
+
|
|
170
|
+
The Semantic method detects threats whose surface form bypasses pattern-level regex but whose intent is identifiable by a Judge. Examples: paraphrased prompt injection, roleplay-cloaked jailbreak, base64-decoded instruction override, and indirect injection via untrusted retrieval.
|
|
171
|
+
|
|
172
|
+
### 6.2 Required Fields
|
|
173
|
+
|
|
174
|
+
A Rule with `method: semantic` MUST declare `detection.semantic` with:
|
|
175
|
+
|
|
176
|
+
| Field | Type | Constraint |
|-------|------|------------|
| `judge_model_class` | string | One of: `gpt-4-class`, `claude-haiku-class`, `claude-opus-class`, `llama-prompt-guard`, `llama-3-class`, `local`. Indicates capability class, not a specific vendor SKU. |
| `prompt_template` | string | MUST contain `{{input}}` placeholder. Engine substitutes the Input verbatim. |
| `output_schema` | object | JSON Schema fragment for the expected Judge response. MUST include at minimum `category` (string) and `confidence` (number 0.0-1.0). |
| `threshold` | number | Minimum `confidence` to count as a Match. Range 0.0-1.0. |
|
|
182
|
+
|
|
183
|
+
### 6.3 Optional Fields
|
|
184
|
+
|
|
185
|
+
| Field | Type | Constraint |
|-------|------|------------|
| `cache_ttl` | integer | Seconds. Engine MAY cache `(prompt_hash, input_hash) → verdict` for this duration. |
| `judge_prompt_hash` | string | SHA-256 of the canonical Judge prompt at rule authoring time. Used for regression-testing the prompt itself. |
| `fallback_method` | string | One of `pattern` or `none`. Behavior when Judge is unavailable. |
| `consensus` | object | Multi-judge requirement: `{n: integer, agreement: float}`. Match requires `n` judges agreeing above `agreement` threshold. |
|
|
191
|
+
|
|
192
|
+
### 6.4 Evaluation Semantics
|
|
193
|
+
|
|
194
|
+
For Input I and Rule R with method=semantic:
|
|
195
|
+
|
|
196
|
+
1. Engine MAY check cache for `(R.semantic.judge_prompt_hash, hash(I))`.
|
|
197
|
+
2. If miss: Engine substitutes I into R.semantic.prompt_template, calls Judge, parses response per R.semantic.output_schema.
|
|
198
|
+
3. Engine MUST emit a Match iff parsed response has `confidence >= R.semantic.threshold`.
|
|
199
|
+
4. If Judge call fails and R.semantic.fallback_method is `pattern`, Engine MUST evaluate any pattern-mode conditions present in R as a fallback. If `none`, Engine MUST emit a graceful_error rather than a Match.
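
The four steps above can be sketched as a single function. This is a minimal sketch under stated assumptions: the Judge is modeled as a hypothetical callable that takes the rendered prompt and returns a JSON string, `GracefulError` stands in for the graceful_error outcome, an unparseable verdict is treated like a failed Judge call, an absent `fallback_method` is treated as `none`, and `cache_ttl` expiry and `consensus` are omitted.

```python
import hashlib
import json
from typing import Callable, Optional


class GracefulError(Exception):
    """Hypothetical stand-in for the graceful_error outcome."""


def evaluate_semantic(
    semantic: dict,
    text: str,
    judge: Callable[[str], str],
    cache: Optional[dict] = None,
    pattern_fallback: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Return True iff the rule matches `text` (steps 1-4 of §6.4)."""
    key = (semantic.get("judge_prompt_hash"), hashlib.sha256(text.encode("utf-8")).hexdigest())
    if cache is not None and key in cache:               # step 1: optional cache lookup
        verdict = cache[key]
    else:
        prompt = semantic["prompt_template"].replace("{{input}}", text)
        try:                                             # step 2: call Judge, parse verdict
            verdict = json.loads(judge(prompt))
            float(verdict["confidence"])                 # minimal schema check only
        except Exception as exc:                         # step 4: Judge failure handling
            if semantic.get("fallback_method") == "pattern" and pattern_fallback is not None:
                return pattern_fallback(text)
            raise GracefulError("judge unavailable or verdict unparseable") from exc
        if cache is not None:
            cache[key] = verdict
    return float(verdict["confidence"]) >= semantic["threshold"]   # step 3
```

Passing the pattern evaluation in as a callable keeps the sketch independent of how an engine implements v1.0 condition matching.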
|
|
200
|
+
|
|
201
|
+
### 6.5 Regression Testing
|
|
202
|
+
|
|
203
|
+
Rules with method=semantic SHOULD ship calibration evidence as `test_cases`:
|
|
204
|
+
- `true_positives`: ≥5 Inputs for which the Judge returns `confidence >= threshold` under the Rule's `judge_prompt_hash`.
|
|
205
|
+
- `true_negatives`: ≥5 Inputs that score below threshold.
|
|
206
|
+
- The Engine's conformance runner MAY run these against a canonical Judge stub to detect prompt drift.
|
|
207
|
+
|
|
208
|
+
The calibration methodology is adapted from the Promptfoo `llm-rubric` workflow [PROMPTFOO]; the consensus-of-N methodology is taken from [LLM-JURY].
|
|
209
|
+
|
|
210
|
+
## 7. Behavioral Method
|
|
211
|
+
|
|
212
|
+
### 7.1 Purpose
|
|
213
|
+
|
|
214
|
+
The Behavioral method detects threats that manifest as deviation of an observable metric from a baseline or threshold over a bounded time window. Unlike Pattern (per-input regex) and Trace (per-trace span DAG assertion), Behavioral evaluates **aggregates** over many events: tool-call frequency, token-spend velocity, retry-loop counts, latency outliers, baseline-deviation on a continuous metric.
|
|
215
|
+
|
|
216
|
+
Threat classes this method addresses:
|
|
217
|
+
|
|
218
|
+
- **Runaway autonomy**: an agent that enters a tool-call loop and exceeds a per-session budget within seconds (excessive-autonomy category).
|
|
219
|
+
- **Resource exhaustion / denial-of-wallet**: token-spend or compute velocity outside a configured envelope.
|
|
220
|
+
- **Probing / reconnaissance**: an agent making a burst of read-only tool calls across unrelated namespaces within a short window — individually benign, aggregately anomalous.
|
|
221
|
+
- **Slow-walk exfiltration**: small chunks of sensitive data sent across many sessions in a baseline-deviating cumulative pattern.
|
|
222
|
+
|
|
223
|
+
The method is closest in shape to Sigma's correlation rules and SIEM time-window queries, adapted for agent event streams.
|
|
224
|
+
|
|
225
|
+
### 7.2 Required Fields
|
|
226
|
+
|
|
227
|
+
A Rule with `method: behavioral` MUST declare `detection.behavioral` with:
|
|
228
|
+
|
|
229
|
+
| Field | Type | Constraint |
|-------|------|------------|
| `metric` | string | Name of the metric being observed (e.g., `tool_calls_per_session`, `token_spend_usd`, `tool_distinct_namespaces`). |
| `aggregation` | enum | One of: `count`, `sum`, `avg`, `max`, `distinct_count`, `rate`. How event values are aggregated into a single metric value over the window. |
| `window` | string | ISO 8601 duration (e.g., `PT5M`, `PT1H`) or shorthand (`5m`, `1h`). |
| `operator` | enum | One of: `gt`, `lt`, `gte`, `lte`, `eq`, `deviation_from_baseline`. |
| `threshold` | number | Numeric value the aggregated metric is compared against. For `deviation_from_baseline`, expressed as standard-deviation multiplier or fractional change (see §7.4). |
|
|
236
|
+
|
|
237
|
+
### 7.3 Optional Fields
|
|
238
|
+
|
|
239
|
+
| Field | Type | Constraint |
|-------|------|------------|
| `group_by` | array of string | Dimensions to partition the aggregation over (e.g., `["session.id"]` for per-session counts; `["user.id", "tool.name"]` for per-user-per-tool counts). Empty / absent = global aggregation. |
| `filter` | object | Pre-aggregation event filter expressed as attribute matchers per §8.3 predicate vocabulary. Engine evaluates filter on each event before counting. |
| `baseline` | object | Required only when `operator: deviation_from_baseline`. See §7.4. |
| `min_events` | integer | Minimum event count in the window before the rule may fire (suppresses false positives at low sample sizes). |
| `cooldown` | string | Duration the rule MUST NOT re-fire on the same `group_by` partition after a Match. ISO 8601 duration. |
|
|
246
|
+
|
|
247
|
+
### 7.4 Baseline (for deviation_from_baseline operator)
|
|
248
|
+
|
|
249
|
+
The `deviation_from_baseline` operator compares the current window's metric value against a baseline. The `baseline` block MUST declare:
|
|
250
|
+
|
|
251
|
+
| Field | Type | Constraint |
|-------|------|------------|
| `source` | enum | One of: `rolling_mean`, `historical_percentile`, `fixed`. |
| `lookback` | string | For `rolling_mean` and `historical_percentile`: the duration to compute the baseline over (e.g., `P7D` = 7 days). |
| `percentile` | number | For `historical_percentile`: value in (0, 100), e.g., `95` for p95. |
| `value` | number | For `fixed`: the literal baseline value. |
| `deviation_unit` | enum | One of: `stddev` (threshold expresses how many standard deviations above baseline), `fraction` (threshold expresses fractional change relative to baseline, e.g., `2.0` = a 200% increase over baseline). |
|
|
258
|
+
|
|
259
|
+
### 7.5 Evaluation Semantics
|
|
260
|
+
|
|
261
|
+
For event stream E and Rule R with method=behavioral:
|
|
262
|
+
|
|
263
|
+
1. Engine maintains a time-windowed view per `group_by` partition of length `R.behavioral.window`.
|
|
264
|
+
2. For each incoming event e, Engine evaluates `R.behavioral.filter` against e; if false, e is skipped.
|
|
265
|
+
3. Otherwise e is folded into the appropriate partition via `R.behavioral.aggregation`.
|
|
266
|
+
4. After each fold (or on a regular tick, per Engine policy), Engine computes the aggregated value `v` per partition.
|
|
267
|
+
5. If `R.behavioral.min_events` is set and event count in the window is below it, no Match is emitted.
|
|
268
|
+
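6. Engine compares `v` to `R.behavioral.threshold` per `R.behavioral.operator`. For `deviation_from_baseline`, Engine first computes the baseline per §7.4 and the deviation `d = (v - baseline) / unit_divisor`, where `unit_divisor` is the baseline's standard deviation when `deviation_unit` is `stddev` and the baseline value itself when it is `fraction`; Engine then compares `d` to `threshold`.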
|
|
269
|
+
7. On match, Engine emits a Match output (SPEC.md §7) and applies `cooldown` to the partition. Subsequent events within cooldown MUST NOT produce a new Match for the same partition.
|
|
270
|
+
|
|
271
|
+
Engines MUST implement at least one of `count`, `sum`, `rate` aggregations and the operators `gt`, `lt`, `gte`, `lte` for L1 conformance of the behavioral method.
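
The evaluation loop above can be sketched for the `count` aggregation with the four basic operators. This is a minimal in-memory sketch, not a reference implementation: the class name `CountWindow` is hypothetical, durations are assumed to be pre-parsed into seconds, events are assumed to be flat dictionaries that have already passed `filter`, and timestamps are assumed to arrive in non-decreasing order.

```python
import operator
from collections import defaultdict, deque

OPERATORS = {"gt": operator.gt, "lt": operator.lt, "gte": operator.ge, "lte": operator.le}


class CountWindow:
    """Sliding-window `count` aggregation per group_by partition."""

    def __init__(self, window_s, op, threshold, group_by=(), min_events=0, cooldown_s=0.0):
        self.window_s = window_s
        self.compare = OPERATORS[op]
        self.threshold = threshold
        self.group_by = tuple(group_by)
        self.min_events = min_events
        self.cooldown_s = cooldown_s
        self._events = defaultdict(deque)  # partition -> timestamps inside the window
        self._quiet_until = {}             # partition -> end of cooldown

    def observe(self, event: dict, ts: float) -> bool:
        """Fold one event into its partition; return True iff a Match is emitted."""
        part = tuple(event.get(k) for k in self.group_by)
        window = self._events[part]
        window.append(ts)
        while window and window[0] <= ts - self.window_s:  # evict expired events
            window.popleft()
        count = len(window)
        if count < self.min_events:                         # step 5
            return False
        if ts < self._quiet_until.get(part, float("-inf")):  # cooldown still active
            return False
        if not self.compare(count, self.threshold):         # step 6
            return False
        self._quiet_until[part] = ts + self.cooldown_s      # step 7
        return True
```

Storing raw timestamps in a deque gives an exact sliding window at the cost of memory proportional to the event rate; an engine bounding per-partition memory would likely swap in bucketed counters.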
|
|
272
|
+
|
|
273
|
+
### 7.6 Example
|
|
274
|
+
|
|
275
|
+
```yaml
id: ATR-YYYY-DRAFT-runaway-tool-loop
title: "Runaway tool-call loop within a session"
status: draft
severity: high
tags:
  category: excessive-autonomy
  scan_target: runtime
agent_source:
  type: agent_behavior
  framework: [any]
detection:
  method: behavioral
  behavioral:
    metric: "tool_calls"
    aggregation: count
    window: "PT1M"
    operator: gt
    threshold: 100
    group_by: ["session.id"]
    min_events: 10
    cooldown: "PT5M"
    filter:
      span.kind:
        in: [TOOL]
response:
  actions: [alert, rate_limit_source, escalate]
```
|
|
303
|
+
|
|
304
|
+
This rule fires when any single session emits more than 100 tool calls within one minute, with a 5-minute cooldown to suppress duplicate alerts. The `min_events: 10` floor prevents the rule from firing on barely-active sessions in cold-start periods.
|
|
305
|
+
|
|
306
|
+
### 7.7 Storage and Performance
|
|
307
|
+
|
|
308
|
+
Behavioral evaluation requires the engine to maintain windowed state per `group_by` partition. Conformant engines:
|
|
309
|
+
|
|
310
|
+
- SHOULD use a sliding-window implementation (rather than fixed, clock-aligned buckets) to avoid edge-of-window artifacts.
|
|
311
|
+
- SHOULD bound per-partition memory consumption and document the per-rule memory ceiling.
|
|
312
|
+
- SHOULD support reset of per-partition state on explicit operator command (e.g., after an incident review).
|
|
313
|
+
|
|
314
|
+
A reference time-series backend is not specified normatively. Engines MAY use in-memory ring buffers (for short windows), Redis time-series, ClickHouse, Prometheus, or any other backend; the wire format and rule semantics are storage-agnostic.
|
|
315
|
+
|
|
316
|
+
### 7.8 Profile Placement
|
|
317
|
+
|
|
318
|
+
The Behavioral method belongs to the `assisted` runtime profile (§4.1). Aggregation latency stays well below the 100 ms target only when the windowed state is hot in memory; cold-start partitions or remote time-series backends may exceed that target. Operators deploying behavioral rules in a deterministic-profile context MUST measure tail latency before promoting the rule to in-line enforcement.
|
|
319
|
+
|
|
320
|
+
## 8. Trace Method
|
|
321
|
+
|
|
322
|
+
### 8.1 Purpose
|
|
323
|
+
|
|
324
|
+
The Trace method detects threats that manifest as patterns in agent execution rather than in input text: silent failure (no error, wrong output), premise drift (agent treats user-scoped context as global), scope violation across multi-agent delegation chains, and tool-misuse sequences whose individual steps are benign.
|
|
325
|
+
|
|
326
|
+
These threats are out of scope for v1.0 Pattern detection by construction.
|
|
327
|
+
|
|
328
|
+
### 8.2 Ingest Format
|
|
329
|
+
|
|
330
|
+
A Trace Engine MUST ingest spans in one of:
|
|
331
|
+
|
|
332
|
+
- **OpenInference** [OPENINFERENCE] — the Arize-led, Apache-2.0 trace schema with 20+ framework adoption (LangChain, LlamaIndex, OpenAI Agents SDK, CrewAI, MCP, Claude Agent SDK, Vercel AI, Pydantic AI, Spring AI, smolagents, BeeAI, Haystack, others). v1.1 RECOMMENDS OpenInference as default.
|
|
333
|
+
- **OTel GenAI Semantic Conventions** [OTEL-GENAI] — the OpenTelemetry standard for GenAI spans. Currently in Development for agent spans. v1.1 PERMITS this as ingest format for forward compatibility.
|
|
334
|
+
|
|
335
|
+
Rules MUST declare `detection.trace.ingest_format`. Engines MUST reject rules whose declared format they do not implement, with a clear error message.
|
|
336
|
+
|
|
337
|
+
### 8.3 Three Primitives
|
|
338
|
+
|
|
339
|
+
A Trace Rule expresses one or more of three primitives, evaluated against the Trace as a span DAG.
|
|
340
|
+
|
|
341
|
+
#### Attribute matchers — predicate vocabulary
|
|
342
|
+
|
|
343
|
+
Inside a primitive's `shape`, `target_shape`, or `must_be_preceded_by` blocks, attribute matchers MAY be either literal value matchers or richer predicate maps. The following predicates are normative for v1.1:
|
|
344
|
+
|
|
345
|
+
| Predicate | Semantics |
|-----------|-----------|
| `<literal>` | Exact equality. e.g., `tool.name: email.send` matches iff `attributes["tool.name"] == "email.send"`. |
| `in: [A, B, C]` | Set membership. Matches iff attribute value is in the list. |
| `not_in: [A, B, C]` | Inverse of `in`. |
| `equals: <value>` | Explicit equality (same as literal). |
| `not_equals: <value>` | Inequality. |
| `regex: "<pattern>"` | ECMAScript regex match on stringified attribute value. |
| `exists: true\|false` | Attribute presence check. |
|
|
354
|
+
|
|
355
|
+
#### Cross-attribute references
|
|
356
|
+
|
|
357
|
+
A predicate value MAY contain the placeholder `${span.attributes.<path>}` to reference another attribute of the same span being matched. Engines MUST resolve the placeholder against the candidate span's attributes before evaluating the predicate. This permits within-span invariants such as "target identifier must equal active identifier":
|
|
358
|
+
|
|
359
|
+
```yaml
forbid:
  - shape:
      span.kind: "TOOL"
      attributes:
        tool.args.target_conversation_id:
          not_equals: "${span.attributes.conversation.id}"
```
|
|
367
|
+
|
|
368
|
+
Cross-span placeholder references (`${trace.spans[N].attributes.X}`) are NOT permitted in v1.1; cross-span invariants MUST use the `invariant` primitive in §8.3.3.
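
The predicate vocabulary and within-span placeholder resolution can be sketched together. This is an illustrative sketch with several assumptions that the text does not settle: span attributes are a flat dictionary keyed by dotted names, multiple predicates in one map are ANDed, an absent attribute satisfies `not_equals` and `not_in`, and Python's `re` module stands in for an ECMAScript regex engine. The function names `resolve` and `matches` are hypothetical.

```python
import re

_PLACEHOLDER = re.compile(r"\$\{span\.attributes\.([^}]+)\}")


def resolve(value, attributes: dict):
    """Substitute ${span.attributes.<path>} using the same span's attributes."""
    if not isinstance(value, str):
        return value
    whole = _PLACEHOLDER.fullmatch(value)
    if whole:  # whole-value reference: keep the referenced value's type
        return attributes.get(whole.group(1))
    return _PLACEHOLDER.sub(lambda m: str(attributes.get(m.group(1), "")), value)


def matches(attributes: dict, name: str, matcher) -> bool:
    """Evaluate one attribute matcher (literal or predicate map) against a span."""
    present = name in attributes
    actual = attributes.get(name)
    if not isinstance(matcher, dict):  # literal matcher: exact equality
        return present and actual == resolve(matcher, attributes)
    for predicate, expected in matcher.items():
        expected = resolve(expected, attributes)
        if predicate == "exists":
            ok = present == bool(expected)
        elif predicate == "equals":
            ok = present and actual == expected
        elif predicate == "not_equals":
            ok = actual != expected
        elif predicate == "in":
            ok = present and actual in expected
        elif predicate == "not_in":
            ok = actual not in expected
        elif predicate == "regex":
            ok = present and re.search(expected, str(actual)) is not None
        else:
            raise ValueError(f"unknown predicate: {predicate}")
        if not ok:
            return False
    return True
```

Resolving the placeholder before the comparison means a rule such as the `not_equals` example above compares two attributes of the candidate span and never reads another span.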
|
|
369
|
+
|
|
370
|
+
#### Shape disjunctions — `one_of_shapes`
|
|
371
|
+
|
|
372
|
+
A `preceded_by` or `must_be_preceded_by` block MAY use `one_of_shapes` to express a disjunction of candidate shapes. The primitive matches if ANY shape in the list matches a preceding span. This is the only normative disjunction primitive in v1.1; engines MUST evaluate each shape independently and short-circuit on first match:
|
|
373
|
+
|
|
374
|
+
```yaml
must_be_preceded_by:
  one_of_shapes:
    - span.kind: "HUMAN"
    - span.kind: "AGENT"
      attributes:
        human_approval: true
  within_trace: true
```
|
|
383
|
+
|
|
384
|
+
Other disjunction forms (`any_of`, top-level `one_of`) are NOT defined in v1.1. Rules using them MUST be treated as malformed.
|
|
385
|
+
|
|
386
|
+
#### 8.3.1 `forbid` — Span Shape That MUST NOT Appear
|
|
387
|
+
|
|
388
|
+
```yaml
detection:
  method: trace
  trace:
    ingest_format: openinference
    forbid:
      - shape:
          span.kind: TOOL
          attributes:
            tool.name: email.send
        preceded_by:
          span.kind: RETRIEVER
          attributes:
            source: untrusted
```
|
|
403
|
+
|
|
404
|
+
Semantics: Match iff there exists a span `s` in the Trace where `s` matches the shape AND (if `preceded_by` is present) there exists an earlier span `s'` matching `preceded_by` in the same Trace.
|
|
405
|
+
|
|
406
|
+
#### 8.3.2 `require` — Span Shape That MUST Precede Another
|
|
407
|
+
|
|
408
|
+
```yaml
detection:
  method: trace
  trace:
    ingest_format: openinference
    require:
      - target_shape:
          span.kind: TOOL
          attributes:
            tool.privilege: destructive
        must_be_preceded_by:
          span.kind: AGENT
          attributes:
            human_approval: true
```
|
|
423
|
+
|
|
424
|
+
Semantics: Match (anomaly) iff there exists a span `s` matching `target_shape` AND there does NOT exist an earlier span `s'` matching `must_be_preceded_by` in the same Trace.
|
|
425
|
+
|
|
426
|
+
This is the inverse polarity of `forbid`: the rule fires when an expected predecessor is *missing*. This is the only v1.1 primitive that detects absence rather than presence — and is the primary mechanism for catching silent failures.
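
The opposite polarities of `forbid` and `require` can be sketched side by side. This is a minimal sketch under stated assumptions: a Trace is a list of span dictionaries already ordered by start time, so "earlier" means a lower list index, and shapes use literal equality only (predicate maps and `one_of_shapes` are omitted). All function names are hypothetical.

```python
def shape_matches(span: dict, shape: dict) -> bool:
    """Literal-equality match on span.kind and attributes."""
    if "span.kind" in shape and span.get("span.kind") != shape["span.kind"]:
        return False
    attrs = span.get("attributes", {})
    return all(attrs.get(k) == v for k, v in shape.get("attributes", {}).items())


def has_earlier(spans: list, index: int, shape: dict) -> bool:
    """True iff some span before position `index` matches `shape`."""
    return any(shape_matches(s, shape) for s in spans[:index])


def forbid_matches(spans: list, entry: dict) -> bool:
    """Presence: a span matches `shape` and, if given, an earlier span matches `preceded_by`."""
    for i, span in enumerate(spans):
        if shape_matches(span, entry["shape"]):
            if "preceded_by" not in entry or has_earlier(spans, i, entry["preceded_by"]):
                return True
    return False


def require_matches(spans: list, entry: dict) -> bool:
    """Absence: a span matches `target_shape` with no earlier `must_be_preceded_by` span."""
    return any(
        shape_matches(span, entry["target_shape"])
        and not has_earlier(spans, i, entry["must_be_preceded_by"])
        for i, span in enumerate(spans)
    )
```

Both functions return True when the rule fires, so a `require` result of True signals that the expected predecessor was missing.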
|
|
427
|
+
|
|
428
|
+
#### 8.3.3 `invariant` — Attribute That MUST Hold Across Spans
|
|
429
|
+
|
|
430
|
+
```yaml
|
|
431
|
+
detection:
|
|
432
|
+
method: trace
|
|
433
|
+
trace:
|
|
434
|
+
ingest_format: openinference
|
|
435
|
+
invariant:
|
|
436
|
+
- attribute: session.id
|
|
437
|
+
across: trace
|
|
438
|
+
- attribute: user.id
|
|
439
|
+
across: agent.delegation_chain
|
|
440
|
+
```
|
|
441
|
+
|
|
442
|
+
Semantics: Match (anomaly) iff there exist two spans `s1`, `s2` in the domain (`across`) such that `s1.attributes[attribute] != s2.attributes[attribute]`.
|
|
443
|
+
|
|
444
|
+
`across` MUST be one of:
|
|
445
|
+
- `trace` — all spans in the Trace
|
|
446
|
+
- `agent.delegation_chain` — all spans within one delegation chain (joined by OpenInference `agent.delegation_chain[*]`)
|
|
447
|
+
- `session` — all spans sharing one `session.id`
|
|
448
|
+
- `conversation` — all spans sharing one `gen_ai.conversation.id` (OTel ingest only)
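
The `invariant` check reduces to "more than one distinct value inside a domain". The Python sketch below covers only the `trace` and `session` domains, because the join keys for the other two depend on the ingest format. It also assumes, beyond what the text states, that spans lacking the attribute are ignored and that attribute values are hashable; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Mapping, Sequence, Set

# Hypothetical span layout: {"attributes": {flat key: value}}
Span = Mapping[str, Any]


def invariant_violated(trace: Sequence[Span], attribute: str,
                       across: str = "trace") -> bool:
    """Anomaly if two spans in the same domain disagree on `attribute`."""
    values_by_domain: Dict[Any, Set[Any]] = defaultdict(set)
    for span in trace:
        attrs = span.get("attributes", {})
        if attribute not in attrs:
            continue  # assumption: spans without the attribute are ignored
        if across == "trace":
            domain = "trace"  # a single domain covering every span
        elif across == "session":
            domain = attrs.get("session.id")
        else:
            raise ValueError(f"domain not covered by this sketch: {across}")
        values_by_domain[domain].add(attrs[attribute])
    return any(len(values) > 1 for values in values_by_domain.values())


spans = [
    {"attributes": {"session.id": "s1", "user.id": "alice"}},
    {"attributes": {"session.id": "s1", "user.id": "bob"}},
    {"attributes": {"session.id": "s2", "user.id": "bob"}},
]
print(invariant_violated(spans, "session.id"))                 # True
print(invariant_violated(spans, "user.id", across="session"))  # True
print(invariant_violated(spans[1:], "user.id", across="session"))  # False
```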
|
|
449
|
+
|
|
450
|
+
### 8.4 Composition
|
|
451
|
+
|
|
452
|
+
A Trace Rule MAY declare multiple primitives. The Rule matches iff the boolean expression in `detection.condition` evaluates to true over the named primitives.
|
|
453
|
+
|
|
454
|
+
If `detection.condition` is absent, the default is `any of *` — Rule matches if any primitive matches.
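
As a rough illustration of the default, the Python sketch below combines per-primitive results. The condition grammar is not reproduced in this section, so an explicit condition is modeled here as a caller-supplied predicate; that modeling, the `rule_matches` name, and the result keys are assumptions.

```python
from typing import Callable, Mapping, Optional

# Hypothetical stand-in for a parsed `detection.condition` expression.
Condition = Callable[[Mapping[str, bool]], bool]


def rule_matches(primitive_results: Mapping[str, bool],
                 condition: Optional[Condition] = None) -> bool:
    """Combine named primitive results into one Rule-level verdict."""
    if condition is None:
        # Default `any of *`: the Rule matches if any primitive matches.
        return any(primitive_results.values())
    return condition(primitive_results)


results = {"forbid": False, "require": True, "invariant": False}
print(rule_matches(results))                                         # True
print(rule_matches(results, lambda r: r["forbid"] and r["require"]))  # False
```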
|
|
455
|
+
|
|
456
|
+
### 8.5 Evaluation Determinism
|
|
457
|
+
|
|
458
|
+
Trace evaluation MUST be deterministic for a fixed Trace input. Engines MUST NOT introduce randomization, time-of-day branching, or sampling into Trace evaluation. This requirement is identical to SPEC §6.4 and is normative.
|
|
459
|
+
|
|
460
|
+
### 8.6 Performance Considerations
|
|
461
|
+
|
|
462
|
+
Trace evaluation is OPTIONAL; not all Engines need to implement it. An Engine claiming `trace` capability:
|
|
463
|
+
|
|
464
|
+
- SHOULD set a per-Rule per-Trace timeout (RECOMMENDED: 200 ms).
|
|
465
|
+
- SHOULD reject Traces exceeding a configurable maximum span count (RECOMMENDED: 10,000 spans).
|
|
466
|
+
- MUST treat malformed Traces as `graceful_error` rather than no-match, so authoring mistakes are surfaced.
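
One way to apply these three guards is sketched below in Python. The status strings other than `graceful_error`, the `reason` field, the cooperative deadline passed to the rule function, and the choice to report a timeout as `graceful_error` are assumptions made for illustration; the text above does not specify them.

```python
import time
from typing import Any, Callable, Dict

MAX_SPANS = 10_000   # RECOMMENDED maximum span count
TIMEOUT_S = 0.200    # RECOMMENDED per-Rule per-Trace timeout


def evaluate_guarded(trace: Any, rule_fn: Callable[[list, float], bool],
                     max_spans: int = MAX_SPANS,
                     timeout_s: float = TIMEOUT_S) -> Dict[str, str]:
    """Run one Rule over one Trace with shape, size, and time guards."""
    if not isinstance(trace, (list, tuple)) or not all(isinstance(s, dict) for s in trace):
        # Malformed input is surfaced instead of silently becoming a no-match.
        return {"status": "graceful_error", "reason": "malformed_trace"}
    if len(trace) > max_spans:
        return {"status": "rejected", "reason": "span_limit_exceeded"}
    deadline = time.monotonic() + timeout_s
    try:
        matched = rule_fn(list(trace), deadline)
    except TimeoutError:
        return {"status": "graceful_error", "reason": "timeout"}
    return {"status": "match" if matched else "no_match"}


def any_tool_span(trace: list, deadline: float) -> bool:
    """Toy rule that checks the deadline cooperatively on every span."""
    for span in trace:
        if time.monotonic() > deadline:
            raise TimeoutError
        if span.get("span.kind") == "TOOL":
            return True
    return False


print(evaluate_guarded("not a trace", any_tool_span)["status"])  # graceful_error
```

A cooperative deadline keeps the sketch portable; an engine that cannot instrument its rule code would need a process-level or thread-level timeout instead.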
|
|
467
|
+
|
|
468
|
+
## 9. Conformance Implications
|
|
469
|
+
|
|
470
|
+
This document does not alter SPEC.md §11 (Conformance Levels). It adds the following capability declarations that an Engine MAY make:
|
|
471
|
+
|
|
472
|
+
| Capability | Statement | What the Engine implements |
|
|
473
|
+
|------------|-----------|----------------------------|
|
|
474
|
+
| `atr/method/pattern` | REQUIRED for any conformance level. | v1.0 §6 evaluation. |
|
|
475
|
+
| `atr/method/signature` | OPTIONAL. | §5 evaluation. |
|
|
476
|
+
| `atr/method/semantic` | OPTIONAL. | §6 evaluation. |
|
|
477
|
+
| `atr/method/behavioral` | OPTIONAL. | §7 evaluation. |
|
|
478
|
+
| `atr/method/trace` | OPTIONAL. | §8 evaluation. |
|
|
479
|
+
| `atr/profile/deterministic` | OPTIONAL. Implies `atr/method/pattern` + `atr/method/signature`. | §4.1 — production hot-path engines. |
|
|
480
|
+
| `atr/profile/assisted` | OPTIONAL. Implies `atr/method/semantic` + `atr/method/behavioral` + `atr/method/trace`. | §4.1 — sidecar / async engines. |
|
|
481
|
+
| `atr/compiler/yara@1.0` | OPTIONAL. | §5.4 — emits YARA rules from Signature Rules. |
|
|
482
|
+
|
|
483
|
+
An Engine MUST publish its capability set in any conformance claim.
|
|
484
|
+
|
|
485
|
+
A Rule with `method: X`, where X is not in the Engine's capability set, MUST be skipped without raising an error rather than rejected. Each skip MUST be reported in the per-evaluation metadata so operators can audit coverage.
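
The skip-and-report behavior might look like the following Python sketch. The metadata field names (`skipped_rules`, `reason`), the rule IDs, and the `partition_rules` helper are hypothetical. Treating a missing `detection.method` as `pattern` follows the statement in Appendix A that such Rules remain valid v1.0 Pattern rules.

```python
from typing import Any, Dict, List, Mapping, Sequence, Set, Tuple


def partition_rules(rules: Sequence[Mapping[str, Any]],
                    capabilities: Set[str]) -> Tuple[List[Mapping[str, Any]], Dict[str, Any]]:
    """Split Rules into runnable ones and skipped ones recorded for audit."""
    runnable: List[Mapping[str, Any]] = []
    skipped: List[Dict[str, Any]] = []
    for rule in rules:
        # Rules without detection.method are v1.0 Pattern rules.
        method = rule.get("detection", {}).get("method", "pattern")
        if f"atr/method/{method}" in capabilities:
            runnable.append(rule)
        else:
            skipped.append({
                "rule_id": rule.get("id"),
                "method": method,
                "reason": "capability_not_declared",
            })
    return runnable, {"skipped_rules": skipped}


rules = [
    {"id": "example-1", "detection": {}},
    {"id": "example-2", "detection": {"method": "trace"}},
]
runnable, metadata = partition_rules(rules, {"atr/method/pattern"})
print([r["id"] for r in runnable])  # ['example-1']
print(metadata["skipped_rules"][0]["rule_id"])  # example-2
```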
|
|
486
|
+
|
|
487
|
+
### 9.1 OSCAL Evidence Integration
|
|
488
|
+
|
|
489
|
+
Rules MAY declare `references.oscal_assessment_objective` to act as an evidence source beneath an OSCAL-driven Assessment Plan / Result. An Engine that emits OSCAL Assessment Results (per `spec/atr-event-v1.0.md` OSCAL mapping) MUST include the Rule's match output as `observations[]` evidence for each referenced objective ID.
|
|
490
|
+
|
|
491
|
+
This is the bridge from runtime detection into compliance assessment workflows. Operators running OSCAL-based audit pipelines (e.g., FedRAMP automation, NIST AI RMF assessment) can consume ATR matches as machine-readable evidence without reauthoring rules in OSCAL's component-definition vocabulary.
|
|
492
|
+
|
|
493
|
+
### 9.2 Probe Binding (Red-Team Coverage)
|
|
494
|
+
|
|
495
|
+
Rules MAY declare `references.probe_id` to bind the Rule to one or more adversarial probes (red-team generators) whose output the Rule is designed to detect. Format is `<framework>:<probe-name>`, e.g., `pyrit:indirect_pi_v2` or `garak:promptinject.HijackHateHumans`.
|
|
496
|
+
|
|
497
|
+
This explicit pairing closes the loop between adversarial generation and detection:
|
|
498
|
+
|
|
499
|
+
- A red-team harness running probe `P` MUST be able to query the Rule corpus for all Rules with `probe_id` containing `P`, and run them against the probe's output to measure detection coverage.
|
|
500
|
+
- A Rule author MAY claim coverage of probe `P` by binding to it; the claim is testable by any party with access to the probe runner.
|
|
501
|
+
- The Engine SHOULD report per-probe detection rate as part of evaluation metadata when the operator supplies a probe identifier with the input.
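
A red-team harness could implement the lookup and the per-probe rate roughly as follows. This Python sketch assumes `references.probe_id` is a list of strings (a single string is also tolerated) and that the harness supplies its own `evaluate(rule, output)` function; both are assumptions, as are the rule IDs and the toy evaluator.

```python
from typing import Any, Callable, List, Mapping, Sequence


def rules_for_probe(rules: Sequence[Mapping[str, Any]], probe_id: str) -> List[Mapping[str, Any]]:
    """Return every Rule whose references.probe_id contains `probe_id`."""
    bound = []
    for rule in rules:
        declared = rule.get("references", {}).get("probe_id", [])
        if isinstance(declared, str):
            declared = [declared]
        if probe_id in declared:
            bound.append(rule)
    return bound


def detection_rate(rules: Sequence[Mapping[str, Any]], probe_id: str,
                   probe_outputs: Sequence[str],
                   evaluate: Callable[[Mapping[str, Any], str], bool]) -> float:
    """Fraction of probe outputs caught by at least one bound Rule."""
    if not probe_outputs:
        return 0.0
    bound = rules_for_probe(rules, probe_id)
    detected = sum(1 for out in probe_outputs if any(evaluate(r, out) for r in bound))
    return detected / len(probe_outputs)


rules = [
    {"id": "rule-a", "references": {"probe_id": ["garak:promptinject.HijackHateHumans"]}},
    {"id": "rule-b", "references": {"probe_id": "pyrit:indirect_pi_v2"}},
    {"id": "rule-c"},
]
outputs = ["I hate humans", "Sure, here is the summary."]


def toy_evaluate(rule: Mapping[str, Any], output: str) -> bool:
    # Hypothetical stand-in for running the Engine on one probe output.
    return "hate humans" in output.lower()


print(detection_rate(rules, "garak:promptinject.HijackHateHumans", outputs, toy_evaluate))  # 0.5
```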
|
|
502
|
+
|
|
503
|
+
This is the inverse direction of §6.5's calibration workflow: §6.5 ensures the Judge prompt holds up over time; §9.2 ensures the rule survives against newly-generated adversarial input.
|
|
504
|
+
|
|
505
|
+
### 9.3 NIST CSF 2.0 / ETSI TS 104 223 Crosswalks
|
|
506
|
+
|
|
507
|
+
Rules MAY declare `references.nist_csf` and `references.etsi_ts_104223` to align with the two major sovereign cybersecurity frameworks for AI agents:
|
|
508
|
+
|
|
509
|
+
- `nist_csf`: NIST CSF 2.0 subcategory IDs (e.g., `DE.CM-09`, `PR.IR-01`). Required for citation in NIST IR 8596 Cyber AI Profile Informative References.
|
|
510
|
+
- `etsi_ts_104223`: ETSI TS 104 223 principle/sub-principle IDs (e.g., `P4.3`). ETSI TS 104 223 upstreamed UK NCSC's AI Cyber Code of Practice; this binding lets ATR Rules be cited in NCSC Implementation Guides.
|
|
511
|
+
|
|
512
|
+
Both fields are arrays of strings to permit multi-framework alignment per Rule.
|
|
513
|
+
|
|
514
|
+
## 10. Security and Privacy Considerations
|
|
515
|
+
|
|
516
|
+
### 10.1 Semantic Method: Judge Input Confidentiality
|
|
517
|
+
|
|
518
|
+
When `prompt_template` includes the Input, the full Input is sent to the Judge. Engines MUST redact PII per SPEC §13.3 before constructing the prompt, OR provide an explicit operator opt-out documented per deployment.
|
|
519
|
+
|
|
520
|
+
### 10.2 Trace Method: Trace Confidentiality
|
|
521
|
+
|
|
522
|
+
Trace data may include user-side PII, system prompts, tool credentials, and intermediate model output. Engines MUST NOT log raw trace contents at INFO level or higher without explicit operator configuration, and MUST redact `attributes.tool.args` when emitting Match output per SPEC §13.3.
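
A sketch of the `attributes.tool.args` redaction step in Python. SPEC §13.3 is not reproduced here, so replacing the value with a fixed placeholder is an assumption, as are the `redact_span_for_output` name and the support for both flat (`tool.args`) and nested (`tool` then `args`) attribute layouts.

```python
import copy
from typing import Any, Dict, Mapping

REDACTED = "[REDACTED]"  # hypothetical placeholder value


def redact_span_for_output(span: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of the span that is safe to embed in Match output."""
    out = copy.deepcopy(dict(span))
    attrs = out.get("attributes", {})
    # Flat layout: "tool.args" or any "tool.args.<field>" key.
    for key in list(attrs):
        if key == "tool.args" or key.startswith("tool.args."):
            attrs[key] = REDACTED
    # Nested layout: {"tool": {"args": ...}}.
    tool = attrs.get("tool")
    if isinstance(tool, dict) and "args" in tool:
        tool["args"] = REDACTED
    return out


span = {
    "span.kind": "TOOL",
    "attributes": {"tool.name": "email.send", "tool.args": {"to": "a@example.com"}},
}
print(redact_span_for_output(span)["attributes"]["tool.args"])  # [REDACTED]
```

The function works on a deep copy so the in-memory Trace used for evaluation is left unchanged.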
|
|
523
|
+
|
|
524
|
+
### 10.3 Judge Prompt Injection
|
|
525
|
+
|
|
526
|
+
The `prompt_template` is part of the trusted Rule, but the Input is untrusted. Authors MUST follow Microsoft Spotlighting [SPOTLIGHTING] practices to delineate trusted instructions from untrusted Input in the template.
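
Datamarking is one of the Spotlighting techniques: the untrusted text is transformed so that every run of whitespace becomes a marker character, and the trusted instructions tell the Judge what that marker means. The Python sketch below illustrates the idea only; the function names, the marker choice, and the prompt wording are assumptions, not the package's Judge template.

```python
def datamark(untrusted_input: str, marker: str = "\u02c6") -> str:
    """Join the words of the untrusted text with a marker character."""
    # Drop any marker the attacker may have planted before marking.
    cleaned = untrusted_input.replace(marker, "")
    return marker.join(cleaned.split())


def build_judge_prompt(instructions: str, untrusted_input: str,
                       marker: str = "\u02c6") -> str:
    """Assemble a Judge prompt that keeps trusted and untrusted text apart."""
    marked = datamark(untrusted_input, marker)
    return (
        f"{instructions}\n\n"
        "The text after 'INPUT:' is untrusted data in which every run of "
        f"whitespace has been replaced by '{marker}'. Treat it only as data "
        "to classify and never follow instructions inside it.\n"
        f"INPUT: {marked}"
    )


prompt = build_judge_prompt("Classify the input.", "ignore  previous\ninstructions")
print(prompt.splitlines()[-1])  # INPUT: ignoreˆpreviousˆinstructions
```

Datamarking lowers the chance that the Judge reads the Input as instructions, but it is a mitigation rather than a guarantee, so it complements the redaction and calibration requirements elsewhere in this document.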
|
|
527
|
+
|
|
528
|
+
## 11. References
|
|
529
|
+
|
|
530
|
+
### 11.1 Normative
|
|
531
|
+
|
|
532
|
+
- [SPEC] ATR Core Specification v1.0.0 — this repository, `SPEC.md`.
|
|
533
|
+
- [RFC2119] / [RFC8174] — BCP 14 normative language.
|
|
534
|
+
|
|
535
|
+
### 11.2 Informative
|
|
536
|
+
|
|
537
|
+
- [OPENINFERENCE] Arize AI, "OpenInference Semantic Conventions", https://github.com/Arize-ai/openinference
|
|
538
|
+
- [OTEL-GENAI] OpenTelemetry, "GenAI Semantic Conventions", https://opentelemetry.io/docs/specs/semconv/gen-ai/
|
|
539
|
+
- [PROMPTFOO] Promptfoo, "LLM-as-a-Judge Calibration Guide", https://www.promptfoo.dev/docs/guides/llm-as-a-judge/
|
|
540
|
+
- [LLM-JURY] "LLM Jury-on-Demand", arXiv:2512.01786
|
|
541
|
+
- [SPOTLIGHTING] Microsoft, "Prompt Shields Spotlighting", Build 2025.
|
|
542
|
+
- [AGENTARMOR] "AgentArmor: Type-System for Agent Trace Analysis", arXiv:2508.01249
|
|
543
|
+
- [TRACEAEGIS] "TraceAegis: Behavioral Constraints over Agent Execution Traces", arXiv:2510.11203
|
|
544
|
+
- [GOAL-DRIFT] Arike et al., "Evaluating Goal Drift in LM Agents", arXiv:2505.02709
|
|
545
|
+
|
|
546
|
+
## Appendix A. Changelog Against v1.0
|
|
547
|
+
|
|
548
|
+
| Change | Source |
|
|
549
|
+
|--------|--------|
|
|
550
|
+
| Introduced `detection.method` optional field | This document |
|
|
551
|
+
| Added `signature`, `semantic`, `behavioral`, `trace` methods | This document |
|
|
552
|
+
| Added `agent_trace` to `agent_source.type` enum | This document |
|
|
553
|
+
| Defined OpenInference + OTel GenAI as Trace ingest formats | This document |
|
|
554
|
+
| Defined three Trace primitives: `forbid`, `require`, `invariant` | This document |
|
|
555
|
+
| Specified capability-based conformance for method extensions | This document |
|
|
556
|
+
|
|
557
|
+
All changes are MINOR per SPEC §10. Rules without `detection.method` continue to be valid v1.0 Pattern rules without modification.
|