mia-code 0.2.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (410)
  1. package/.miette/260321.md +1 -0
  2. package/.miette/260323.md +9 -0
  3. package/.miette/260331.md +2 -0
  4. package/.pde/2604011511--83a2d7f9-24a5-4cf4-98d5-036c82f872e8/2604020008--d3417f2c-df12-4f0f-8a1b-d88e7968f822/d3417f2c-df12-4f0f-8a1b-d88e7968f822.md +63 -0
  5. package/.pde/2604011511--83a2d7f9-24a5-4cf4-98d5-036c82f872e8/2604020008--e6c3fc5d-4a70-4523-ba7d-a3250da4c235/e6c3fc5d-4a70-4523-ba7d-a3250da4c235.md +72 -0
  6. package/.pde/2604011511--83a2d7f9-24a5-4cf4-98d5-036c82f872e8/2604020008--efeb00a2-b17a-4d32-b1f0-b90c37a8d24e/efeb00a2-b17a-4d32-b1f0-b90c37a8d24e.md +62 -0
  7. package/.pde/2604011511--83a2d7f9-24a5-4cf4-98d5-036c82f872e8/83a2d7f9-24a5-4cf4-98d5-036c82f872e8.json +302 -0
  8. package/.pde/2604011511--83a2d7f9-24a5-4cf4-98d5-036c82f872e8/83a2d7f9-24a5-4cf4-98d5-036c82f872e8.md +149 -0
  9. package/.pde/2604011511--83a2d7f9-24a5-4cf4-98d5-036c82f872e8/AGENTS.md +31 -0
  10. package/.pde/2604011511--83a2d7f9-24a5-4cf4-98d5-036c82f872e8/meta-decomposition-3-children.md +67 -0
  11. package/.pde/2604040129--61f9dd4d-7aa6-45e6-a58b-e480b1aa6737/61f9dd4d-7aa6-45e6-a58b-e480b1aa6737--from-mia-openclaw-workspace.md +125 -0
  12. package/.pde/2604040129--61f9dd4d-7aa6-45e6-a58b-e480b1aa6737/STATUS.md +1 -0
  13. package/.pde/4f02ba94-9f52-422e-9389-b16f9b37f358.json +177 -0
  14. package/.pde/4f02ba94-9f52-422e-9389-b16f9b37f358.md +77 -0
  15. package/.pde/6ad9244d-5340-490f-b76c-c86728b9de52.json +222 -0
  16. package/.pde/6ad9244d-5340-490f-b76c-c86728b9de52.md +99 -0
  17. package/.pde/8b566792-ed15-4606-96f9-2b6f593d7e6b.json +111 -0
  18. package/.pde/8b566792-ed15-4606-96f9-2b6f593d7e6b.md +67 -0
  19. package/.pde/c7f1e74b-05a5-40e2-9f01-4cc48d2528f7.json +349 -0
  20. package/.pde/c7f1e74b-05a5-40e2-9f01-4cc48d2528f7.md +147 -0
  21. package/.pde/dfc00a78-1da0-4c09-8a16-c6982644051b.json +118 -0
  22. package/.pde/dfc00a78-1da0-4c09-8a16-c6982644051b.md +64 -0
  23. package/GUILLAUME.md +8 -0
  24. package/KINSHIP.md +9 -0
  25. package/MIA_CODE_ARCHITECTURE_REPORT.md +718 -0
  26. package/contextual_research/260119-MIA-CODE--98090899-8aff-4e11-9dc3-8b99466d1.md +1101 -0
  27. package/contextual_research/MIA.md +38 -0
  28. package/contextual_research/MIAWAPASCONE.md +59 -0
  29. package/contextual_research/MIETTE.md +38 -0
  30. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/2504.00218v2.pdf +7483 -12
  31. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/2505.00212v3.pdf +0 -0
  32. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/CONTENT.md +1014 -0
  33. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/DESIGN.gemini.md +242 -0
  34. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/INDEX.md +45 -0
  35. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/sources/2504.00218v2.md +2025 -0
  36. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/sources/2504.00218v2.pdf +7483 -12
  37. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/sources/2505.00212v3.md +1755 -0
  38. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/sources/2505.00212v3.pdf +0 -0
  39. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/sources/footnote_1_12_decomposed_prompting.pdf +0 -0
  40. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/sources/footnote_1_19_hugginggpt_planning.pdf +0 -0
  41. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/sources/footnote_1_1_coordination_challenges.md +766 -0
  42. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/sources/footnote_1_1_coordination_challenges.pdf +3431 -4
  43. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/sources/footnote_1_28_guardrails_multi_agent.md +260 -0
  44. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/sources/footnote_1_28_guardrails_multi_agent.pdf +0 -0
  45. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/sources/footnote_1_2_navigating_complexity.md +558 -0
  46. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/sources/footnote_1_2_navigating_complexity.pdf +0 -0
  47. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/sources/footnote_1_34_hierarchical_multi_agent.pdf +0 -0
  48. package/contextual_research/PDE-generalization--caefee82-efb1-4dbb-8733-691b01581464--260130/sources/footnote_1_5_open_intent_extraction.pdf +0 -0
  49. package/contextual_research/PODCAST.md +109 -0
  50. package/contextual_research/langchain-principles-roadmap.md +157 -0
  51. package/contextual_research/persona-to-narrative-character-inquiry_260201.md +50 -0
  52. package/dist/cli.js +35 -11
  53. package/dist/geminiHeadless.js +8 -2
  54. package/dist/index.js +2 -1
  55. package/dist/mcp/miaco-server.js +10 -1
  56. package/dist/mcp/miatel-server.js +10 -1
  57. package/dist/mcp/miawa-server.js +10 -1
  58. package/dist/mcp/utils.d.ts +6 -1
  59. package/dist/mcp/utils.js +24 -3
  60. package/dist/sessionStore.d.ts +8 -2
  61. package/dist/sessionStore.js +39 -3
  62. package/dist/types.d.ts +1 -0
  63. package/miaco/README.md +124 -0
  64. package/miaco/dist/commands/chart.d.ts +6 -0
  65. package/miaco/dist/commands/chart.d.ts.map +1 -0
  66. package/miaco/dist/commands/chart.js +222 -0
  67. package/miaco/dist/commands/chart.js.map +1 -0
  68. package/miaco/dist/commands/decompose.d.ts +6 -0
  69. package/miaco/dist/commands/decompose.d.ts.map +1 -0
  70. package/miaco/dist/commands/decompose.js +98 -0
  71. package/miaco/dist/commands/decompose.js.map +1 -0
  72. package/miaco/dist/commands/schema.d.ts +6 -0
  73. package/miaco/dist/commands/schema.d.ts.map +1 -0
  74. package/miaco/dist/commands/schema.js +66 -0
  75. package/miaco/dist/commands/schema.js.map +1 -0
  76. package/miaco/dist/commands/stc.d.ts +11 -0
  77. package/miaco/dist/commands/stc.d.ts.map +1 -0
  78. package/miaco/dist/commands/stc.js +590 -0
  79. package/miaco/dist/commands/stc.js.map +1 -0
  80. package/miaco/dist/commands/trace.d.ts +6 -0
  81. package/miaco/dist/commands/trace.d.ts.map +1 -0
  82. package/miaco/dist/commands/trace.js +83 -0
  83. package/miaco/dist/commands/trace.js.map +1 -0
  84. package/miaco/dist/commands/validate.d.ts +6 -0
  85. package/miaco/dist/commands/validate.d.ts.map +1 -0
  86. package/miaco/dist/commands/validate.js +58 -0
  87. package/miaco/dist/commands/validate.js.map +1 -0
  88. package/miaco/dist/decompose.d.ts +93 -0
  89. package/miaco/dist/decompose.d.ts.map +1 -0
  90. package/miaco/dist/decompose.js +562 -0
  91. package/miaco/dist/decompose.js.map +1 -0
  92. package/miaco/dist/index.d.ts +18 -0
  93. package/miaco/dist/index.d.ts.map +1 -0
  94. package/miaco/dist/index.js +83 -0
  95. package/miaco/dist/index.js.map +1 -0
  96. package/miaco/dist/storage.d.ts +60 -0
  97. package/miaco/dist/storage.d.ts.map +1 -0
  98. package/miaco/dist/storage.js +100 -0
  99. package/miaco/dist/storage.js.map +1 -0
  100. package/miaco/package-lock.json +4103 -0
  101. package/miaco/package.json +40 -0
  102. package/miaco/tsconfig.json +18 -0
  103. package/miaco/version-patch-commit-and-publish.sh +1 -0
  104. package/miatel/MISSION_251231.md +3 -0
  105. package/miatel/README.md +107 -0
  106. package/miatel/dist/commands/analyze.d.ts +6 -0
  107. package/miatel/dist/commands/analyze.d.ts.map +1 -0
  108. package/miatel/dist/commands/analyze.js +100 -0
  109. package/miatel/dist/commands/analyze.js.map +1 -0
  110. package/miatel/dist/commands/arc.d.ts +6 -0
  111. package/miatel/dist/commands/arc.d.ts.map +1 -0
  112. package/miatel/dist/commands/arc.js +71 -0
  113. package/miatel/dist/commands/arc.js.map +1 -0
  114. package/miatel/dist/commands/beat.d.ts +6 -0
  115. package/miatel/dist/commands/beat.d.ts.map +1 -0
  116. package/miatel/dist/commands/beat.js +165 -0
  117. package/miatel/dist/commands/beat.js.map +1 -0
  118. package/miatel/dist/commands/theme.d.ts +6 -0
  119. package/miatel/dist/commands/theme.d.ts.map +1 -0
  120. package/miatel/dist/commands/theme.js +54 -0
  121. package/miatel/dist/commands/theme.js.map +1 -0
  122. package/miatel/dist/index.d.ts +18 -0
  123. package/miatel/dist/index.d.ts.map +1 -0
  124. package/miatel/dist/index.js +80 -0
  125. package/miatel/dist/index.js.map +1 -0
  126. package/miatel/dist/storage.d.ts +55 -0
  127. package/miatel/dist/storage.d.ts.map +1 -0
  128. package/miatel/dist/storage.js +100 -0
  129. package/miatel/dist/storage.js.map +1 -0
  130. package/miatel/package-lock.json +4103 -0
  131. package/miatel/package.json +35 -0
  132. package/miatel/src/commands/analyze.ts +109 -0
  133. package/miatel/src/commands/arc.ts +78 -0
  134. package/miatel/src/commands/beat.ts +176 -0
  135. package/miatel/src/commands/theme.ts +60 -0
  136. package/miatel/src/index.ts +94 -0
  137. package/miatel/src/storage.ts +156 -0
  138. package/miatel/tsconfig.json +18 -0
  139. package/miawa/MISSION_251231.md +144 -0
  140. package/miawa/README.md +133 -0
  141. package/miawa/dist/commands/beat.d.ts +6 -0
  142. package/miawa/dist/commands/beat.d.ts.map +1 -0
  143. package/miawa/dist/commands/beat.js +69 -0
  144. package/miawa/dist/commands/beat.js.map +1 -0
  145. package/miawa/dist/commands/ceremony.d.ts +6 -0
  146. package/miawa/dist/commands/ceremony.d.ts.map +1 -0
  147. package/miawa/dist/commands/ceremony.js +239 -0
  148. package/miawa/dist/commands/ceremony.js.map +1 -0
  149. package/miawa/dist/commands/circle.d.ts +6 -0
  150. package/miawa/dist/commands/circle.d.ts.map +1 -0
  151. package/miawa/dist/commands/circle.js +75 -0
  152. package/miawa/dist/commands/circle.js.map +1 -0
  153. package/miawa/dist/commands/eva.d.ts +6 -0
  154. package/miawa/dist/commands/eva.d.ts.map +1 -0
  155. package/miawa/dist/commands/eva.js +73 -0
  156. package/miawa/dist/commands/eva.js.map +1 -0
  157. package/miawa/dist/commands/wound.d.ts +6 -0
  158. package/miawa/dist/commands/wound.d.ts.map +1 -0
  159. package/miawa/dist/commands/wound.js +74 -0
  160. package/miawa/dist/commands/wound.js.map +1 -0
  161. package/miawa/dist/index.d.ts +19 -0
  162. package/miawa/dist/index.d.ts.map +1 -0
  163. package/miawa/dist/index.js +91 -0
  164. package/miawa/dist/index.js.map +1 -0
  165. package/miawa/dist/storage.d.ts +73 -0
  166. package/miawa/dist/storage.d.ts.map +1 -0
  167. package/miawa/dist/storage.js +100 -0
  168. package/miawa/dist/storage.js.map +1 -0
  169. package/miawa/package-lock.json +4103 -0
  170. package/miawa/package.json +36 -0
  171. package/miawa/src/commands/beat.ts +74 -0
  172. package/miawa/src/commands/ceremony.ts +256 -0
  173. package/miawa/src/commands/circle.ts +83 -0
  174. package/miawa/src/commands/eva.ts +84 -0
  175. package/miawa/src/commands/wound.ts +79 -0
  176. package/miawa/src/index.ts +108 -0
  177. package/miawa/src/storage.ts +179 -0
  178. package/miawa/tsconfig.json +18 -0
  179. package/package.json +7 -5
  180. package/references/acp/CLAUDE.md +7 -0
  181. package/references/acp/agent-plan.md +84 -0
  182. package/references/acp/clients.md +31 -0
  183. package/references/acp/extensibility.md +137 -0
  184. package/references/acp/initialization.md +225 -0
  185. package/references/acp/prompt-turn.md +321 -0
  186. package/references/acp/proxy-chains.md +562 -0
  187. package/references/acp/schema.md +3171 -0
  188. package/references/acp/session-list.md +334 -0
  189. package/references/acp/session-modes.md +170 -0
  190. package/references/acp/slash-commands.md +99 -0
  191. package/references/acp/terminals.md +281 -0
  192. package/references/acp/tool-calls.md +311 -0
  193. package/references/acp/typescript.md +29 -0
  194. package/references/claude/agent-teams.md +399 -0
  195. package/references/claude/chrome.md +231 -0
  196. package/references/claude/headless.md +158 -0
  197. package/references/claude/hooks-guide.md +708 -0
  198. package/references/claude/output-styles.md +112 -0
  199. package/references/claude/plugins.md +432 -0
  200. package/references/claude/skills.md +693 -0
  201. package/references/claude/sub-agents.md +816 -0
  202. package/references/copilot/acp/agents.md +32 -0
  203. package/references/copilot/acp/architecture.md +37 -0
  204. package/references/copilot/acp/clients.md +31 -0
  205. package/references/copilot/acp/introduction.md +42 -0
  206. package/references/copilot/acp/registry.md +339 -0
  207. package/references/copilot/acp-server.md +117 -0
  208. package/references/copilot/create-copilot-instructions.md +840 -0
  209. package/references/langchain/llms.txt +833 -0
  210. package/references/langchain/python/agents.md +677 -0
  211. package/references/langchain/python/context-engineering.md +1195 -0
  212. package/references/langchain/python/human-in-the-loop.md +326 -0
  213. package/references/langchain/python/long-term-memory.md +168 -0
  214. package/references/langchain/python/mcp.md +949 -0
  215. package/references/langchain/python/multi-agents/custom-workflow.md +187 -0
  216. package/references/langchain/python/multi-agents/handoffs.md +436 -0
  217. package/references/langchain/python/multi-agents/overview.md +295 -0
  218. package/references/langchain/python/multi-agents/router.md +150 -0
  219. package/references/langchain/python/multi-agents/skills.md +92 -0
  220. package/references/langchain/python/multi-agents/subagents.md +486 -0
  221. package/references/langchain/python/retrieval.md +320 -0
  222. package/references/langchain/python/runtime.md +141 -0
  223. package/references/langchain/python/short-term-memory.md +658 -0
  224. package/references/langchain/python/structured-output.md +712 -0
  225. package/references/langfuse/llms.txt +148 -0
  226. package/references/langgraph/javascript/llms.txt +275 -0
  227. package/references/skills/home.md +259 -0
  228. package/references/skills/integrate-skills.md +103 -0
  229. package/references/skills/specification.md +254 -0
  230. package/references/skills/what-are-skills.md +74 -0
  231. package/rispecs/README.md +164 -0
  232. package/rispecs/_sync_/miadi-code/SPEC.md +313 -0
  233. package/rispecs/_sync_/miadi-code/STATUS.md +177 -0
  234. package/rispecs/_sync_/miadi-code/dashboard/SPEC.md +465 -0
  235. package/rispecs/_sync_/miadi-code/dashboard/STATUS.md +212 -0
  236. package/rispecs/_sync_/miadi-code/multiline-input/SPEC.md +232 -0
  237. package/rispecs/_sync_/miadi-code/multiline-input/STATUS.md +108 -0
  238. package/rispecs/_sync_/miadi-code/pde/SPEC.md +253 -0
  239. package/rispecs/_sync_/miadi-code/pde/STATUS.md +56 -0
  240. package/rispecs/_sync_/miadi-code/stc/SPEC.md +397 -0
  241. package/rispecs/_sync_/miadi-code/stc/STATUS.md +70 -0
  242. package/rispecs/ava-langstack/inquiry-routing-upgrade.spec.md +119 -0
  243. package/rispecs/borrowed_from_opencode/001-client-server-architecture.rispec.md +98 -0
  244. package/rispecs/borrowed_from_opencode/002-event-bus-system.rispec.md +125 -0
  245. package/rispecs/borrowed_from_opencode/003-instance-state-pattern.rispec.md +136 -0
  246. package/rispecs/borrowed_from_opencode/004-namespace-module-pattern.rispec.md +151 -0
  247. package/rispecs/borrowed_from_opencode/005-zod-schema-validation.rispec.md +139 -0
  248. package/rispecs/borrowed_from_opencode/006-named-error-system.rispec.md +155 -0
  249. package/rispecs/borrowed_from_opencode/007-structured-logging.rispec.md +138 -0
  250. package/rispecs/borrowed_from_opencode/008-lazy-initialization.rispec.md +127 -0
  251. package/rispecs/borrowed_from_opencode/009-multi-agent-system.rispec.md +97 -0
  252. package/rispecs/borrowed_from_opencode/010-agent-definition-config.rispec.md +135 -0
  253. package/rispecs/borrowed_from_opencode/011-agent-permission-rulesets.rispec.md +151 -0
  254. package/rispecs/borrowed_from_opencode/012-agent-prompt-templates.rispec.md +141 -0
  255. package/rispecs/borrowed_from_opencode/013-agent-generation.rispec.md +142 -0
  256. package/rispecs/borrowed_from_opencode/014-plan-build-mode-toggle.rispec.md +155 -0
  257. package/rispecs/borrowed_from_opencode/015-subagent-task-delegation.rispec.md +146 -0
  258. package/rispecs/borrowed_from_opencode/016-agent-model-selection.rispec.md +151 -0
  259. package/rispecs/borrowed_from_opencode/017-compaction-agent.rispec.md +150 -0
  260. package/rispecs/borrowed_from_opencode/018-session-persistence.rispec.md +125 -0
  261. package/rispecs/borrowed_from_opencode/019-session-compaction.rispec.md +132 -0
  262. package/rispecs/borrowed_from_opencode/020-session-forking.rispec.md +134 -0
  263. package/rispecs/borrowed_from_opencode/021-session-revert-snapshot.rispec.md +135 -0
  264. package/rispecs/borrowed_from_opencode/022-session-sharing.rispec.md +165 -0
  265. package/rispecs/borrowed_from_opencode/023-session-summary-diffs.rispec.md +165 -0
  266. package/rispecs/borrowed_from_opencode/024-child-sessions.rispec.md +164 -0
  267. package/rispecs/borrowed_from_opencode/025-session-title-generation.rispec.md +162 -0
  268. package/rispecs/borrowed_from_opencode/026-message-parts-model.rispec.md +201 -0
  269. package/rispecs/borrowed_from_opencode/027-streaming-message-deltas.rispec.md +212 -0
  270. package/rispecs/borrowed_from_opencode/028-multi-provider-architecture.rispec.md +184 -0
  271. package/rispecs/borrowed_from_opencode/029-provider-authentication.rispec.md +225 -0
  272. package/rispecs/borrowed_from_opencode/030-model-registry.rispec.md +222 -0
  273. package/rispecs/borrowed_from_opencode/031-cost-tracking.rispec.md +243 -0
  274. package/rispecs/borrowed_from_opencode/032-provider-transform-pipeline.rispec.md +282 -0
  275. package/rispecs/borrowed_from_opencode/033-provider-sdk-abstraction.rispec.md +338 -0
  276. package/rispecs/borrowed_from_opencode/034-tool-registry.rispec.md +110 -0
  277. package/rispecs/borrowed_from_opencode/035-tool-context-injection.rispec.md +155 -0
  278. package/rispecs/borrowed_from_opencode/036-tool-output-truncation.rispec.md +138 -0
  279. package/rispecs/borrowed_from_opencode/037-batch-tool.rispec.md +129 -0
  280. package/rispecs/borrowed_from_opencode/038-multi-edit-tool.rispec.md +167 -0
  281. package/rispecs/borrowed_from_opencode/039-apply-patch-tool.rispec.md +161 -0
  282. package/rispecs/borrowed_from_opencode/040-code-search-tool.rispec.md +143 -0
  283. package/rispecs/borrowed_from_opencode/041-web-fetch-tool.rispec.md +131 -0
  284. package/rispecs/borrowed_from_opencode/042-web-search-tool.rispec.md +159 -0
  285. package/rispecs/borrowed_from_opencode/043-todo-tool.rispec.md +156 -0
  286. package/rispecs/borrowed_from_opencode/044-plan-mode-tool.rispec.md +139 -0
  287. package/rispecs/borrowed_from_opencode/045-task-tool.rispec.md +146 -0
  288. package/rispecs/borrowed_from_opencode/046-question-tool.rispec.md +170 -0
  289. package/rispecs/borrowed_from_opencode/047-external-directory-tool.rispec.md +166 -0
  290. package/rispecs/borrowed_from_opencode/048-file-read-write-tools.rispec.md +205 -0
  291. package/rispecs/borrowed_from_opencode/049-lsp-server-management.rispec.md +104 -0
  292. package/rispecs/borrowed_from_opencode/050-lsp-hover-completion.rispec.md +102 -0
  293. package/rispecs/borrowed_from_opencode/051-lsp-diagnostics.rispec.md +86 -0
  294. package/rispecs/borrowed_from_opencode/052-lsp-root-detection.rispec.md +109 -0
  295. package/rispecs/borrowed_from_opencode/053-remote-mcp-servers.rispec.md +119 -0
  296. package/rispecs/borrowed_from_opencode/054-mcp-oauth-flow.rispec.md +107 -0
  297. package/rispecs/borrowed_from_opencode/055-mcp-tool-conversion.rispec.md +118 -0
  298. package/rispecs/borrowed_from_opencode/056-mcp-connection-monitoring.rispec.md +106 -0
  299. package/rispecs/borrowed_from_opencode/057-local-mcp-servers.rispec.md +116 -0
  300. package/rispecs/borrowed_from_opencode/058-rich-tui.rispec.md +108 -0
  301. package/rispecs/borrowed_from_opencode/059-streaming-display.rispec.md +116 -0
  302. package/rispecs/borrowed_from_opencode/060-permission-prompts.rispec.md +130 -0
  303. package/rispecs/borrowed_from_opencode/061-session-navigation.rispec.md +155 -0
  304. package/rispecs/borrowed_from_opencode/062-syntax-highlighting.rispec.md +151 -0
  305. package/rispecs/borrowed_from_opencode/063-keybinding-system.rispec.md +181 -0
  306. package/rispecs/borrowed_from_opencode/064-multi-level-config.rispec.md +155 -0
  307. package/rispecs/borrowed_from_opencode/065-jsonc-config.rispec.md +190 -0
  308. package/rispecs/borrowed_from_opencode/066-config-env-variables.rispec.md +153 -0
  309. package/rispecs/borrowed_from_opencode/067-config-deep-merging.rispec.md +178 -0
  310. package/rispecs/borrowed_from_opencode/068-remote-org-config.rispec.md +183 -0
  311. package/rispecs/borrowed_from_opencode/069-config-markdown-frontmatter.rispec.md +206 -0
  312. package/rispecs/borrowed_from_opencode/070-managed-config-directory.rispec.md +232 -0
  313. package/rispecs/borrowed_from_opencode/071-plugin-architecture.rispec.md +104 -0
  314. package/rispecs/borrowed_from_opencode/072-plugin-hooks.rispec.md +123 -0
  315. package/rispecs/borrowed_from_opencode/073-plugin-auto-install.rispec.md +115 -0
  316. package/rispecs/borrowed_from_opencode/074-permission-system.rispec.md +133 -0
  317. package/rispecs/borrowed_from_opencode/075-git-worktree-management.rispec.md +126 -0
  318. package/rispecs/borrowed_from_opencode/076-snapshot-system.rispec.md +124 -0
  319. package/rispecs/borrowed_from_opencode/077-snapshot-diff.rispec.md +117 -0
  320. package/rispecs/borrowed_from_opencode/078-snapshot-restore.rispec.md +128 -0
  321. package/rispecs/borrowed_from_opencode/079-worktree-branch-naming.rispec.md +122 -0
  322. package/rispecs/borrowed_from_opencode/080-sqlite-storage.rispec.md +134 -0
  323. package/rispecs/borrowed_from_opencode/081-database-migrations.rispec.md +148 -0
  324. package/rispecs/borrowed_from_opencode/082-database-transactions.rispec.md +138 -0
  325. package/rispecs/borrowed_from_opencode/083-deferred-effects.rispec.md +148 -0
  326. package/rispecs/borrowed_from_opencode/084-permission-rules.rispec.md +123 -0
  327. package/rispecs/borrowed_from_opencode/085-permission-glob-patterns.rispec.md +113 -0
  328. package/rispecs/borrowed_from_opencode/086-permission-merging.rispec.md +134 -0
  329. package/rispecs/borrowed_from_opencode/087-permission-modes.rispec.md +145 -0
  330. package/rispecs/borrowed_from_opencode/088-http-api-server.rispec.md +165 -0
  331. package/rispecs/borrowed_from_opencode/089-openapi-spec-generation.rispec.md +164 -0
  332. package/rispecs/borrowed_from_opencode/090-websocket-support.rispec.md +136 -0
  333. package/rispecs/borrowed_from_opencode/091-sse-streaming.rispec.md +168 -0
  334. package/rispecs/borrowed_from_opencode/092-mdns-discovery.rispec.md +145 -0
  335. package/rispecs/borrowed_from_opencode/093-javascript-sdk.rispec.md +200 -0
  336. package/rispecs/borrowed_from_opencode/094-skill-system.rispec.md +187 -0
  337. package/rispecs/borrowed_from_opencode/095-skill-discovery.rispec.md +182 -0
  338. package/rispecs/borrowed_from_opencode/096-desktop-remote-driving.rispec.md +175 -0
  339. package/rispecs/borrowed_from_opencode/INDEX.md +255 -0
  340. package/rispecs/core.rispecs.md +261 -0
  341. package/rispecs/engines.rispecs.md +241 -0
  342. package/rispecs/formatting.rispecs.md +252 -0
  343. package/rispecs/living-specifications.rispecs.md +361 -0
  344. package/rispecs/mcp.rispecs.md +197 -0
  345. package/rispecs/pde.rispecs.md +399 -0
  346. package/rispecs/pi-mono-envisionning/ENVISIONING.md +366 -0
  347. package/rispecs/pi-mono-envisionning/storytelling-horizon.rispecs.md +76 -0
  348. package/rispecs/pi-mono-envisionning/widget.rispecs.md +2 -0
  349. package/rispecs/relation-to-mcp-structural-thinking.kin.md +72 -0
  350. package/rispecs/research-for-better-framework/CLAUDE.md +7 -0
  351. package/rispecs/research-for-better-framework/survey-pi-openclaw-opencode-openhands.md +210 -0
  352. package/rispecs/session.rispecs.md +277 -0
  353. package/rispecs/stc.rispecs.md +138 -0
  354. package/rispecs/unifier.rispecs.md +317 -0
  355. package/scripts/LAUNCH--mcp-mia-code--testing--2603141315--ac705a66-2c15-4a1c-a26d-9491018c5ba8.sh +2 -0
  356. package/scripts/RESUME--mia-code--mcps--260313--ac705a66-2c15-4a1c-a26d-9491018c5ba8.sh +1 -0
  357. package/scripts/install-widget-in-home-pi-agent-extensions.sh +4 -0
  358. package/scripts/sample-decompose--2604011535-prompt.sh +1 -0
  359. package/skills/deep-search/AGENTS.md +17 -0
  360. package/skills/deep-search/SKILL.md +281 -0
  361. package/skills/deep-search/agent-templates.md +224 -0
  362. package/skills/deep-search/orchestration-patterns.md +95 -0
  363. package/skills/miaco-pde-inquiry-routing-deep-search/AGENTS.md +13 -0
  364. package/skills/miaco-pde-inquiry-routing-deep-search/SKILL.md +136 -0
  365. package/skills/miaco-pde-inquiry-routing-internal-external-relationship/AGENTS.md +4 -0
  366. package/skills/miaco-pde-inquiry-routing-internal-external-relationship/SKILL.md +157 -0
  367. package/skills/miaco-pde-inquiry-routing-local-qmd/AGENTS.md +42 -0
  368. package/skills/miaco-pde-inquiry-routing-local-qmd/SKILL.md +135 -0
  369. package/skills/qmd/AGENTS.md +3 -0
  370. package/skills/qmd/SKILL.md +144 -0
  371. package/skills/qmd/references/mcp-setup.md +102 -0
  372. package/skills/rise-pde-inquiry-session-multi-agents-v3/SKILL.md +234 -0
  373. package/skills/rise-pde-inquiry-session-multi-agents-v3/agent-templates.md +436 -0
  374. package/skills/rise-pde-inquiry-session-multi-agents-v3/orchestration-patterns.md +197 -0
  375. package/skills/rise-pde-inquiry-session-multi-agents-v3/references/ceremonial-technology.md +102 -0
  376. package/skills/rise-pde-inquiry-session-multi-agents-v3/references/creative-orientation.md +99 -0
  377. package/skills/rise-pde-inquiry-session-multi-agents-v3/references/prompt-decomposition.md +73 -0
  378. package/skills/rise-pde-inquiry-session-multi-agents-v3/references/rise-framework.md +74 -0
  379. package/skills/rise-pde-inquiry-session-multi-agents-v3/references/structural-tension.md +82 -0
  380. package/src/cli.ts +35 -11
  381. package/src/geminiHeadless.ts +7 -2
  382. package/src/index.ts +2 -1
  383. package/src/mcp/miaco-server.ts +13 -1
  384. package/src/mcp/miatel-server.ts +13 -1
  385. package/src/mcp/miawa-server.ts +13 -1
  386. package/src/mcp/utils.ts +41 -8
  387. package/src/sessionStore.ts +44 -4
  388. package/src/types.ts +2 -1
  389. package/widget/mia-ceremony/README.md +36 -0
  390. package/widget/mia-ceremony/index.ts +143 -0
  391. package/widget/mia-interceptor/README.md +39 -0
  392. package/widget/mia-interceptor/index.ts +221 -0
  393. package/widget/mia-tools/README.md +37 -0
  394. package/widget/mia-tools/index.ts +569 -0
  395. package/widget/miette-echo/README.md +44 -0
  396. package/widget/miette-echo/index.ts +164 -0
  397. package/.claude/settings.local.json +0 -9
  398. package/.hch/issue_.env +0 -4
  399. package/.hch/issue_add__2601211715.json +0 -77
  400. package/.hch/issue_add__2601211715.md +0 -4
  401. package/.hch/issue_add__2602242020.json +0 -78
  402. package/.hch/issue_add__2602242020.md +0 -7
  403. package/.hch/issues.json +0 -2312
  404. package/.hch/issues.md +0 -30
  405. package/WS__mia-code__260214__IAIP_PDE.code-workspace +0 -29
  406. package/WS__mia-code__src332__260122.code-workspace +0 -23
  407. package/samples/copilot/session-state/be76abaa-a27f-4725-b2a9-22fb45f7e0f7/checkpoints/index.md +0 -6
  408. package/samples/copilot/session-state/be76abaa-a27f-4725-b2a9-22fb45f7e0f7/events.jsonl +0 -213
  409. package/samples/copilot/session-state/be76abaa-a27f-4725-b2a9-22fb45f7e0f7/plan.md +0 -243
  410. package/samples/copilot/session-state/be76abaa-a27f-4725-b2a9-22fb45f7e0f7/workspace.yaml +0 -5
@@ -0,0 +1,2025 @@
+ Agents Under Siege: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks
+
+ WARNING: This paper contains text that may be considered offensive.
+
+ Rana Muhammad Shahroz Khan1, Zhen Tan2, Sukwon Yun1, Charles Fleming3, Tianlong Chen1
+
+ 1University of North Carolina at Chapel Hill, 2Arizona State University, 3Cisco
+
+ arXiv:2504.00218v2 [cs.MA] 8 Oct 2025
+
+ Abstract
+
+ Most discussions about Large Language Model (LLM) safety have focused on single-agent settings, but multi-agent LLM systems now create novel adversarial risks because their behavior depends on communication between agents and decentralized reasoning. In this work, we focus on attacking pragmatic systems that have constraints such as limited token bandwidth, latency between message delivery, and defense mechanisms. We design a permutation-invariant adversarial attack that optimizes prompt distribution across latency- and bandwidth-constrained network topologies to bypass distributed safety mechanisms within the system. Formulating the attack path as a maximum-flow minimum-cost problem, coupled with the novel Permutation-Invariant Evasion Loss (PIEL), we leverage graph-based optimization to maximize attack success rate while minimizing detection risk. Evaluating across models including Llama, Mistral, Gemma, DeepSeek and other variants on various datasets like JailBreakBench and AdversarialBench, our method outperforms conventional attacks by up to 7×, exposing critical vulnerabilities in multi-agent systems. Moreover, we demonstrate that existing defenses, including variants of Llama-Guard and PromptGuard, fail to prohibit our attack, emphasizing the urgent need for multi-agent-specific safety mechanisms.
+
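The "maximum-flow minimum-cost" formulation mentioned in the abstract can be illustrated with a self-contained sketch. This is not the paper's implementation: it is a generic min-cost max-flow solver (successive shortest paths with Bellman-Ford on the residual graph) run on a made-up four-agent topology, where edge capacities stand in for token bandwidth and edge costs for latency/detection risk. All node numbers, capacities, and costs below are illustrative assumptions.

```python
# Toy sketch (not the paper's code): routing a payload through an agent
# communication graph modeled as min-cost max-flow. Capacity ~ token
# bandwidth on a channel; cost ~ latency or detection risk of using it.
from collections import defaultdict

def min_cost_max_flow(n, edges, s, t):
    """edges: list of (u, v, capacity, cost). Returns (max_flow, total_cost)."""
    graph = defaultdict(list)          # node -> indices into the edge arrays
    to, cap, cost = [], [], []
    def add(u, v, c, w):
        # Forward edge and zero-capacity reverse edge, stored as a pair so
        # that edge i and edge i ^ 1 are always each other's residual.
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)
    for u, v, c, w in edges:
        add(u, v, c, w)
    flow = total = 0
    while True:
        # Bellman-Ford by cost (residual edges can have negative cost).
        dist = [float("inf")] * n
        dist[s] = 0
        prev_edge = [-1] * n
        for _ in range(n - 1):
            updated = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for ei in graph[u]:
                    v = to[ei]
                    if cap[ei] > 0 and dist[u] + cost[ei] < dist[v]:
                        dist[v] = dist[u] + cost[ei]
                        prev_edge[v] = ei
                        updated = True
            if not updated:
                break
        if dist[t] == float("inf"):
            break                       # no augmenting path left
        # Bottleneck capacity along the cheapest path, then push flow.
        push = float("inf")
        v = t
        while v != s:
            ei = prev_edge[v]
            push = min(push, cap[ei])
            v = to[ei ^ 1]
        v = t
        while v != s:
            ei = prev_edge[v]
            cap[ei] -= push
            cap[ei ^ 1] += push
            v = to[ei ^ 1]
        flow += push
        total += push * dist[t]
    return flow, total

# Hypothetical topology: source agent 0, target agent 3.
edges = [(0, 1, 2, 1), (0, 2, 1, 2), (1, 3, 1, 3), (2, 3, 2, 1), (1, 2, 1, 1)]
print(min_cost_max_flow(4, edges, 0, 3))  # → (3, 10)
```

Under this framing, the solver's chosen augmenting paths are the prompt-fragment routes that move the most adversarial content while accumulating the least latency/detection cost, which is the optimization the abstract describes.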
+ 1 Introduction
+
88
+
89
Recent breakthroughs in Large Language Models (LLMs) have shown remarkable prowess in various tasks, such as writing complex computer code (Zheng et al., 2023; Tong and Zhang, 2024) and logical reasoning (Ouyang et al., 2022; Thoppilan et al., 2022; Bai et al., 2022), among others. However, as real-world tasks become increasingly complex, a single LLM is often insufficient to handle all of their aspects. This limitation has led to the rise of LLM-based agents (Liang et al., 2023; Yang et al., 2023), which integrate language generation with different tools. Recent research (Du et al., 2023; Wu et al., 2023; Liu et al., 2023d; Shinn et al., 2024; Wang et al., 2023; Zhang et al., 2024b; Qian et al., 2023; Jin et al., 2023) has further shown that multi-agent LLM systems can significantly enhance task performance by distributing reasoning and leveraging collective intelligence. These systems offer advantages in scalability and adaptability, making them increasingly relevant in autonomous systems, large-scale content moderation, and AI-driven governance.

Figure 1: Adversarial attack in multi-agent LLM systems. Top: Network topology showing communication between agents and the targeted string attack flow from source to target. Bottom: Comparison between existing approaches that fail under constraints and our method using the MFMC problem formulation and Permutation-Invariant Loss, which successfully bypasses safety mechanisms while respecting constraints.

Despite their advantages, multi-agent LLM systems introduce novel security risks (Amayuelas et al., 2024) that remain largely unexplored. While previous research has extensively studied vulnerabilities in single-agent settings, such as adversarial prompting for jailbreak attacks (Zou et al., 2023) and data poisoning (Ramirez et al., 2022), attacking a multi-agent system poses unique challenges and settings. Existing works highlight key aspects of these vulnerabilities. Evil Geniuses (Tian et al., 2023) explores role-based adversarial prompting, emphasizing the need to further investigate inter-agent communication risks. Similarly, Prompt Infection (Lee and Tiwari, 2024) introduces self-replicating prompt injections, demonstrating how adversarial prompts can persist and spread. Building on those works, this paper targets multi-agent systems in a novel, pragmatic scenario: optimizing adversarial prompt propagation in latency-aware and token-bandwidth-limited multi-agent systems with built-in safety mechanisms. We aim to reveal that even in such constrained settings, these systems still exhibit adversarial vectors, as attackers can manipulate inter-agent messaging, exploit communication bottlenecks, or disrupt agent coordination to achieve malicious objectives.

Given this premise, as shown in Figure 1, and the key works discussed above, our paper aims to answer this key question:

(Q) How can adversarial prompts be optimally propagated through a constrained multi-agent LLM system to evade detection while ensuring jailbreak success, considering token bandwidth limits and asynchronous message arrival?

In this paper, we develop a permutation-invariant attack in multi-agent settings that exploits inter-agent communication to bypass safety mechanisms. Unlike conventional jailbreak attacks that target a single standalone model, our method distributes adversarial prompts across the agent network by optimizing over the topology and its constraints, with the goal of attacking a single agent that is not accessible from outside the system. This ensures that the attack propagates undetected, which maximizes its effectiveness. We formalize this attack as a maximum-flow minimum-cost optimization problem, accounting for token bandwidth constraints, communication topologies, and distributed safety enforcement.

To validate our method, we conduct extensive experiments across multiple architectures, including Llama-2-7B (Touvron et al., 2023). We benchmark our attack on a variety of datasets, including JailbreakBench (Chao et al., 2024), demonstrating that our method achieves up to 7× the attack success rate of a vanilla prompt. Furthermore, we evaluate the effectiveness of existing safety mechanisms, including Llama-Guard (Inan et al., 2023), and show that they fail to defend against our attack in multi-agent settings. Lastly, we justify our settings and hyper-parameters with different ablation studies.

Our key contributions include: ❶ We identify new vulnerabilities in multi-agent communication, where attackers can manipulate inter-agent messaging to bypass existing safety constraints. We analyze realistic attack scenarios, including token bandwidth limits and message asynchrony, demonstrating fundamental weaknesses in multi-agent reasoning systems. ❷ We propose a novel optimization-based attack, modeling adversarial prompt propagation under this constrained setting as a maximum-flow minimum-cost problem. Our method remains effective across different graph configurations, ensuring high attack success rates even in randomized agent topologies. ❸ We evaluate our attack across multiple LLM architectures, including Llama, Mistral, Gemma, and their DeepSeek-R1 distilled versions. Our method is benchmarked on JailbreakBench, AdversarialBench, and In-the-wild Jailbreak Prompts, demonstrating attack success rates of up to 94%, significantly outperforming naive prompting (11%). We conduct ablation studies on different topologies, offering practical insights into securing multi-agent LLM deployments. ❹ Additionally, we assess various safety mechanisms, including variants of Llama-Guard and PromptGuard, and show that they fail to stop our attack, highlighting the urgent need for advanced defenses.

2 Related Works

Multi-LLM Agents. Large Language Model agents have shown remarkable performance in various tasks through mutual collaboration (Chen et al., 2023; Hua et al., 2023; Cohen et al., 2023; Zhou et al., 2023; Li et al., 2023b,a; Chan et al., 2023; Dong et al., 2024; Qian et al., 2023). A growing body of research demonstrates how integrating multiple agents in collaborative frameworks can enhance problem-solving abilities in complex scenarios (Liu et al., 2023a; Chen et al., 2024). Notable examples include Generative Agents (Park et al., 2023), which simulates a town of 25 agents to study social interactions and collective memory. The Natural Language-Based Society (NL-SOM) (Zhuge et al., 2023) takes a different approach, orchestrating agents with specialized functions to tackle complex tasks through iterative "mindstorms". Despite this performance and effectiveness, however, new security concerns have arisen.

+ cent studies shows that Large Language Models
244
+ face serious security risks because certain precisely
245
+ designed prompts can disable their fundamental
246
+ safety features (Zeng et al., 2024a; Bai et al., 2022).
247
+ These attacks, referred to as "jailbreak" attacks,
248
+ demonstrate remarkable effectiveness by causing
249
+ LLMs to produce content which breaks their de-
250
+ clared ethical and operational guidelines. Research
251
+ in this field has progressed through two separate
252
+ development paths: (1) Traditional prompt engi-
253
+ neering where human researchers create decep-
254
+ tive prompts (Wei et al., 2024; Liu et al., 2023c;
255
+ Shen et al., 2024), (2) attack strategies developed
256
+ from learning-based approaches where we opti-
257
+ mize methods automatically to attack LLMs (Guo
258
+ et al., 2021; Lyu et al., 2022, 2023, 2024; Liu et al.,
259
+ 2023b; Zou et al., 2023). The learning-based meth-
260
+ ods are particularly concerning as they can system-
261
+ atically exploit the weaknesses of a system.
262
+
263
Jailbreak Attacks in Multi-Agent Systems. The landscape of jailbreak attacks continues to expand, with many recent works (Tian et al., 2023; Gu et al., 2024; Tan et al., 2024; Zeng et al., 2024b; Lee and Tiwari, 2024) exploring how they affect multi-agent systems. While these studies highlight critical risks, our work differs in its focus on adversarial prompt propagation under constrained multi-agent communication, specifically: token bandwidth constraints, latency-aware messaging, and decentralized safety enforcement. Unlike Evil Geniuses (Tian et al., 2023), which examines role-based adversarial attacks, we optimize prompt routing to exploit the network topology subject to the above-mentioned bottlenecks. Similarly, Agent Smith (Gu et al., 2024) studies exponential jailbreak propagation, but assumes unrestricted inter-agent messaging, unlike our token-bandwidth-constrained communication. Wolf Within (Tan et al., 2024) and Prompt Infection (Lee and Tiwari, 2024) explore malicious prompt propagation, but focus on stealthy influence and self-replicating attacks, respectively. By modeling pragmatic multi-agent attack scenarios, our study addresses security challenges beyond existing works, particularly extending them to ensure effective jailbreaks despite topological constraints.

3 Threat Model

In this section, we introduce the settings used to study the vulnerabilities of a multi-agent system in a realistic manner. We discuss the general settings of the environment into which the adversary will be deployed, as well as the capabilities of said adversary.

3.1 Scenario

We consider a multi-agent LLM system, denoted by S, where multiple LLMs operate within a connected network, communicating with one another to complete tasks collaboratively. The agents in this system exchange messages via a predefined communication topology L (essentially an undirected graph), which dictates how prompts are passed between models. Every input into any LLM is passed along to its neighbors as well. Similarly, each individual agent is responsible for its own memory bank, i.e., the context window, which accumulates over time until a maximum size is reached, at which point the model evicts the oldest memory first. We assume in our setting that the memory bank simply accumulates the inputs an agent has received and, at inference time, concatenates everything into a string that is used as the context to generate the new output. An illustration of the threat model is presented in Figure 2. This setting introduces several key constraints that make adversarial attacks fundamentally different from those on a traditional single LLM:

❶ Each edge in the network has a token bandwidth constraint, F(uv) for edge uv (between LLMs u and v), meaning only a limited number of tokens can be transmitted per interaction. This bandwidth is not necessarily the same for each edge. The constraint arises from various factors, such as: (1) design limitations, where different agents operate on distinct GPUs with varying memory capacities, (2) communication efficiency, where lower-bandwidth connections prioritize lightweight message exchanges, and (3) agent-specific limitations, as some LLMs are inherently constrained in how much input they can process per step. ❷ Latency varies across different edges, meaning that messages do not always arrive at their destination in a deterministic sequence. Some edges may transmit prompts faster than others, leading to asynchronous message arrival at the target LLM. This variability necessitates the design of permutation-invariant adversarial prompts, ensuring that the attack remains effective regardless of the order in which different chunks of the prompt reach the target. ❸ To mitigate harmful interactions, certain edges in the network are equipped with safety mechanisms, such as Llama-Guard, designed to filter adversarial prompts. However, not every edge is protected, for the following reasons: (1) computational limitations, as running safety filters on every edge would require significant GPU resources, (2) strategic safety placement, where only high-risk interactions are monitored, and (3) system design trade-offs, where some edges prioritize communication speed over security.

Figure 2: Process of generating and optimizing adversarial prompt chunks for multi-agent LLM systems. (a) Multi-agent Topologies: Different network structures, including Chain, Tree, Random Graph, and Complete Graph, that influence attack effectiveness. (b) Topological Optimization: Identifying optimal paths based on bandwidth constraints and detection risk, with chunks strategically distributed across the network. (c) Permutation Invariance: Due to network latency, prompt chunks may arrive in different orders, creating a sampling space where optimized chunks remain effective regardless of arrival sequence, successfully bypassing safety mechanisms.

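The memory-bank behaviour described in the scenario above (inputs accumulate, the oldest entries are evicted first, and everything is concatenated at inference time) can be sketched as follows; this is a minimal stand-in in which whitespace splitting substitutes for a real tokenizer:

```python
from collections import deque

class MemoryBank:
    """Per-agent memory bank: received inputs accumulate until a token
    budget is exceeded, then the oldest entries are evicted first; the
    inference context is the concatenation of whatever remains."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.entries = deque()
        self.used = 0

    def receive(self, message):
        tokens = message.split()            # whitespace stands in for a tokenizer
        self.entries.append(tokens)
        self.used += len(tokens)
        while self.used > self.max_tokens:  # evict oldest memory first
            self.used -= len(self.entries.popleft())

    def context(self):
        # concatenate everything into one string used as generation context
        return " ".join(t for entry in self.entries for t in entry)
```
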
Terminology. We denote each LLM as a vertex vi, and the set of all LLMs is referred to as V. Similarly, we denote by E the set of all edges uv, and the token bandwidth of these edges is defined by a function F : E → R≥0 such that for any edge uv, F(uv) = F(vu). Lastly, we quantify the risk of getting caught by the safety mechanism as a function G : E → R≥0, where G(uv) = 0 if there is no safety mechanism on the edge uv. Such a system can then be denoted by S(E, V, F, G), which will simply be referred to as S from here on.

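For illustration, the tuple S(E, V, F, G), with the symmetry F(uv) = F(vu) enforced by construction, might be encoded as below; all names here are our own, not the paper's:

```python
from dataclasses import dataclass, field

@dataclass
class System:
    """S(E, V, F, G): vertices are agents; each undirected edge uv carries
    a token bandwidth F(uv) and a detection risk G(uv), with G(uv) == 0
    on unguarded edges."""
    vertices: set = field(default_factory=set)
    bandwidth: dict = field(default_factory=dict)  # frozenset({u, v}) -> F(uv)
    risk: dict = field(default_factory=dict)       # frozenset({u, v}) -> G(uv)

    def add_edge(self, u, v, f, g=0.0):
        self.vertices |= {u, v}
        edge = frozenset((u, v))
        self.bandwidth[edge] = f  # keying on the unordered pair gives F(uv) == F(vu)
        self.risk[edge] = g

    def F(self, u, v):
        return self.bandwidth[frozenset((u, v))]

    def G(self, u, v):
        return self.risk.get(frozenset((u, v)), 0.0)
```
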

3.2 Adversary Capabilities

It is assumed that the adversary operates within the multi-agent system S, leveraging the following capabilities to execute a stealthy jailbreak attack: ❶ Jailbreak via Multi-Agent Communication: the adversary can send adversarial prompts into the system through an initial agent vi, with the goal of propagating the attack to a target agent vt. However, due to token bandwidth constraints and message delays, the adversarial prompt must be partitioned and strategically routed through the network to evade detection. ❷ Knowledge of Network Topology L and Safety Mechanisms: partial knowledge of the communication graph, including agent connectivity and the token bandwidth constraints on edges, is available to the adversary. Additionally, although the adversary does not have direct access to internal LLM parameters, they are aware that certain edges are protected by safety mechanisms and can estimate the likelihood of detection using the risk function G. ❸ Architecture of the Target Model vt: the adversary knows the architecture of the target LLM vt, allowing them to optimize adversarial prompts that actually jailbreak that model type. ❹ Restricted System Access: the adversary does not control all agents in the system, nor do they have full visibility into message processing. They cannot directly modify parameters or override built-in safety mechanisms.


3.3 Adversarial Goals

In our setting, the adversary aims to execute a stealthy jailbreak attack within a multi-agent LLM system S, leveraging optimized prompt propagation strategies to bypass safety mechanisms and manipulate the target LLM's behavior. The primary attack scenario we use is Jailbreak, i.e., generating harmful output, where the adversary carefully routes an adversarial prompt through the network topology to ensure it reaches the target agent vt while avoiding detection.


4 Method

In this section, we decouple the structure and objective of (i) finding the optimal path in the multi-agent communication topology, and (ii) the permutation-invariant adversarial formulation that effectively bypasses safety mechanisms in a constrained LLM network, as shown in Figure 2.


4.1 Topological Optimization

Problem Formulation. In a multi-agent system S = (V, E), an adversary aims to propagate an adversarial prompt from a source agent, denoted as vi ∈ V, to a target agent vt ∈ V, while minimizing the risk of detection by safety mechanisms and maximizing the token flow through the network. Each communication edge (u, v) ∈ E has a token bandwidth constraint F(u, v), which limits the number of tokens that can be transmitted in a single exchange, and a risk function G(u, v), representing the likelihood of adversarial content being detected and blocked by safety mechanisms such as Llama-Guard. The adversary's objective is to find an optimal path that balances high token throughput against detection risk.


Minimum Cost Maximum Flow Formulation. Given the above problem, we formulate it as a Minimum Cost Maximum Flow problem. We define a flow function f : E → R≥0, where f(u, v) represents the number of adversarial tokens transmitted along edge (u, v) ∈ E. The objective is to minimize the total risk while ensuring maximum token flow from vi to vt:

    min Σ_{(u,v)∈E} G(u, v) f(u, v)    (1)

subject to the following constraints:

Token Capacity Constraints:

    0 ≤ f(u, v) ≤ F(u, v),    ∀(u, v) ∈ E    (2)

Flow Conservation:

    Σ_{w∈V} f(w, u) = Σ_{w∈V} f(u, w),    ∀u ∈ V \ {vi, vt}    (3)

Source and Sink Constraints:

    Σ_{w∈V} f(vi, w) − Σ_{w∈V} f(w, vi) = Fmax    (4)

    Σ_{w∈V} f(w, vt) − Σ_{w∈V} f(vt, w) = Fmax    (5)

where Fmax represents the maximum flow that can be transmitted from vi to vt.

To solve this optimization problem efficiently and obtain the optimal attack path, we deploy the solution algorithm implemented in NetworkX (Hagberg et al., 2008), which finds the highest token flow while minimizing detection risk. More information on how we quantify this risk can be found in Appendix C.

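To make the routing step concrete, here is a small sketch using the NetworkX solver the paper cites; the 5-agent topology, bandwidths F, and integer risk weights G below are hypothetical values chosen for illustration, and each undirected edge is modeled as two directed arcs:

```python
import networkx as nx

# Hypothetical topology: capacity = token bandwidth F(u, v), weight =
# detection risk G(u, v) (0 on unguarded edges; integer weights give
# exact min-cost results).
edges = [
    ("v_i", "a", 40, 0), ("v_i", "b", 25, 2),
    ("a", "c", 30, 1), ("b", "c", 25, 0),
    ("a", "v_t", 15, 3), ("c", "v_t", 50, 1),
]
G = nx.DiGraph()
for u, v, bw, risk in edges:
    # model each undirected edge uv as two directed arcs with F(uv) == F(vu)
    G.add_edge(u, v, capacity=bw, weight=risk)
    G.add_edge(v, u, capacity=bw, weight=risk)

# Maximum token flow from source to target at minimum total risk.
flow = nx.max_flow_min_cost(G, "v_i", "v_t")
tokens_delivered = (sum(flow[u]["v_t"] for u in G.pred["v_t"])
                    - sum(flow["v_t"].values()))
total_risk = nx.cost_of_flow(G, flow)
```

`max_flow_min_cost` expects integer edge weights for exact results, so a real attack would derive `weight` by scaling and rounding the risk estimates described in Appendix C.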

4.2 Permutation Invariant Evasion Loss

Problem Formulation. In a multi-agent system S, communication constraints introduce a unique challenge for adversarial attacks. Prompts are often transmitted in discrete chunks due to token bandwidth limitations, agent-specific processing delays, and asynchronous message arrival. As these chunks propagate through the communication network, they accumulate in an agent's memory bank but arrive in varying orders depending on network latency and routing paths. This inherent non-determinism means that the adversarial prompts must remain effective regardless of how they are received and concatenated by the target agent. The primary challenge in designing adversarial prompts for a multi-agent LLM system S therefore lies in ensuring that the objective enforces permutation invariance. Given such a system, we must optimize a structured adversarial prompt that remains effective regardless of the permutation of its chunks.


Permutation Invariant Evasion Loss (PIEL). Let the LLM agent be a next-token predictor, i.e., a function that maps an input sequence of tokens x_{1:n} to a probability distribution over the next token. Specifically, we denote the probability of the model generating the next token x_{n+1} given prior tokens x_{1:n} as p(x_{n+1} | x_{1:n}). We can now extend this to a full sequence of L target tokens, expressing the probability of generating a specific harmful output x*_{n+1:n+L} as

    p(x*_{n+1:n+L} | x_{1:n}) = Π_{i=1}^{L} p(x*_{n+i} | x_{1:n+i−1})    (6)

The adversarial loss function is then given by the negative log-likelihood of the target sequence:

    L_NLL(x_{1:n}) = − log p(x*_{n+1:n+L} | x_{1:n})    (7)

and simply minimizing L_NLL(x_{1:n}) increases the likelihood of generating the adversarial target phrase. To introduce permutation invariance, we

Table 1: Attack success rates (ASR) of different adversarial prompting methods across multiple LLM architectures on different benchmarks. We report the minimum (ASR-m), average (ASR), and maximum (ASR-M) attack success rates over multiple trials.

structure the adversarial prompt as K discrete chunks: C = {C_1, C_2, . . . , C_K}, where each chunk C_i consists of a sequence of tokens of length L_i. Since different message paths in the multi-agent system S may deliver these chunks in varying sequences, we define the loss to be averaged over all possible orderings of the chunks:

    L(C) = (1 / K!) Σ_{π∈S_K} − log p(x*_{n+1:n+L} | ϕ(π))    (8)

where S_K represents the set of all possible chunk orderings and ϕ(π) denotes the operation Concatenate(C_{π(1)}, C_{π(2)}, . . . , C_{π(K)}). However, optimizing token selection in adversarial inputs is challenging due to their discrete nature. To navigate this, we employ the Greedy Coordinate Gradient (GCG) (Zou et al., 2023) method, iteratively refining token choices while considering all chunk-order permutations. For each token t in chunk C_i, we compute its gradient ∇_t L(C) based on the expectation over all orderings. In each iteration we then follow three key steps: (1) compute the loss across all orderings, (2) compute gradients for token updates, and (3) substitute tokens using the GCG strategy. The whole algorithm is described in the Appendix as Algorithm 1.

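As a concrete sketch of Equation (8), the function below enumerates every chunk ordering and averages a caller-supplied NLL; `nll` is an assumed stand-in for −log p(x*_{n+1:n+L} | prompt) under the victim model (the real attack differentiates this quantity through the model for GCG's gradient step):

```python
import itertools
import math

def piel(chunks, target, nll):
    """Permutation-Invariant Evasion Loss: average the target-sequence
    NLL over every possible ordering of the K adversarial chunks."""
    total = 0.0
    for order in itertools.permutations(chunks):
        prompt = " ".join(order)  # Concatenate(C_pi(1), ..., C_pi(K))
        total += nll(prompt, target)
    return total / math.factorial(len(chunks))  # K! orderings in total
```

With K chunks the sum has K! terms, which is exactly what motivates the stochastic variant S-PIEL described next.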

Stochastic Permutation Invariant Evasion Loss (S-PIEL). As Equation (8) shows, the loss is computed over all K! possible permutations, which can be computationally prohibitive in practice if the targeted model vt has multiple neighbors (and hence multiple chunks). To address this, we introduce a stochastic version of the loss. Instead of evaluating the loss on every element of S_K, we randomly sample a smaller subset ˜S_K and approximate the loss with it as follows:

    ˜L(C) = (1 / |˜S_K|) Σ_{π∈˜S_K} − log p(x*_{n+1:n+L} | ϕ(π))    (9)

To understand the trade-off between computation and the quality of the adversarial prompts generated using S-PIEL, we perform an ablation study, which can be found in Section 5.5.

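A minimal Monte-Carlo sketch of Equation (9); the sample count m and the uniform sampler are illustrative assumptions, not the paper's exact settings:

```python
import random

def s_piel(chunks, target, nll, m=8, seed=0):
    """Stochastic PIEL: estimate the permutation-averaged NLL from m
    uniformly sampled chunk orderings (a subset of S_K)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        order = list(chunks)
        rng.shuffle(order)  # draw one ordering pi uniformly from S_K
        total += nll(" ".join(order), target)
    return total / m
```
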
5 Experiments

In this section, we conduct a series of experiments to evaluate the effectiveness of our proposed permutation-invariant attack. Detailed findings for all the experiments are described below, while the experimental settings, including baselines, datasets, architectures, training settings, and comprehensive metrics, are discussed in Appendix B.


5.1 Overall Performance Comparison

To evaluate the effectiveness of our permutation-invariant attack, we conduct experiments across multiple LLM architectures, including Llama-2, on different benchmarks. Furthermore, each experiment is run three times with randomized multi-agent topologies to mitigate bias, ensuring a robust evaluation of our attack performance. Complete experimental details can be found in Appendix B.1.

Based on Table 1, we can derive some key findings across different LLM architectures and benchmarks. ❶ Baseline Comparison: our method substantially outperforms existing approaches across all scenarios. Vanilla prompts show near-zero effectiveness on most benchmarks, while GCG achieves moderate success (16-32%) only on specific models like Mistral-7B. In contrast, our approach demonstrates up to a 7× improvement over the best baseline performance, highlighting the effectiveness of the permutation-invariant design. For instance, on Llama-2-7B, vanilla prompts achieve a 0% success rate across structured benchmarks, while GCG manages only 1.7% ASR on JailbreakBench. In contrast, our method achieves 72.6% ASR, demonstrating a dramatic improvement in attack capability. ❷ Attack Stability: the small gap between minimum (ASR-m) and maximum (ASR-M), typically 2-6%, demonstrates the stability of our attack across different random topologies. This consistency is particularly evident in Gemma-2-9B, where the variation remains under 4%. This stability extends to other models, with Mistral-7B showing only 6% variation (78.0% to 84.0%), confirming the robustness of our permutation-invariant design. ❸ Model Sensitivity: some models exhibit higher susceptibility to our attack. For example, Mistral-7B and Llama-2-7B show the highest vulnerability, achieving 81.2% and 72.6% ASR on JailbreakBench, respectively, while models like Llama-3.1-8B achieve 41.3% ASR on the same benchmark, a significant result given the model's initial results. These findings indicate that across different architectures, our method outperforms the existing baselines in the multi-agent setting. ❹ General Observations: interestingly, larger model size does not always guarantee better security. Additionally, the DeepSeek-R1 distilled model shows a notably lower ASR (41.3%) on the same benchmark.


5.2 Safety Mechanism Efficacy

The goal of this experiment is to analyze the effectiveness of the graph optimizations we performed in Section 4.1 in reducing the detectability of jailbreak prompts when routed through a multi-agent system with safety mechanisms. Our primary focus is on understanding whether graph-optimized routing helps bypass safety mechanisms of different types. The experimental settings are explained in Appendix B.2.


Table 2: Transferability evaluation of our adversarial prompts across different source and target models.

Source Model   Target Model   JailbreakBench   AdversarialBench
Llama-2-7B     Llama-2-7B     0.740            0.522
Llama-2-7B     Mistral-7B     0.710            0.488
Llama-2-7B     Gemma-2-9B     0.680            0.492
Mistral-7B     Llama-2-7B     0.690            0.446
Mistral-7B     Mistral-7B     0.820            0.522
Mistral-7B     Gemma-2-9B     0.610            0.412
Gemma-2-9B     Llama-2-7B     0.610            0.472
Gemma-2-9B     Mistral-7B     0.690            0.498
Gemma-2-9B     Gemma-2-9B     0.710            0.512

Based on Figure 3, which compares the effectiveness of different safety mechanisms against various attack methods, we observe some key findings. ❶ Baseline Comparison: across all safety mechanisms, vanilla prompts are the most easily detected, followed by GCG prompts, while our permutation-invariant prompts consistently achieve the lowest detection rates for attacks in multi-agent systems (chunked). ❷ Defense Robustness: even the most advanced safety mechanisms struggle against our permutation-invariant attack. The best-performing model, Llama-Guard-3-8B, still sees its F1-score drop by nearly 30% when faced with our method compared to vanilla prompts, highlighting a significant vulnerability in current safety measures.

5.3 Transferability

To assess the transferability of adversarial prompts, we evaluate attack success rates across different source-target LLM pairs, including Llama-2-7B, Mistral-7B, and Gemma-2-9B. We use JailbreakBench and AdversarialBench to measure the Attack Success Rate (ASR) when prompts optimized on one model are applied to another. Further details regarding the setup are shared in Appendix B.3, and the findings on the transferability of our permutation-invariant attack across different LLM architectures are summarized in Table 2. We observe several key findings. ❶ Source-Target Similarity: the effectiveness of transferred attacks strongly correlates with the architectural similarity between source and target models. For instance, when using Llama-2-7B as the source model on JailbreakBench, its attack achieves 74% ASR on itself but also maintains relatively high effectiveness on Mistral-7B (71%) and Gemma-2-9B (68%). This suggests that adversarial prompts learned on one architecture can successfully transfer to other models, though with some degradation in performance. ❷ Model-Specific Robustness: Mistral-7B shows unique characteristics both

+
1167
+ Figure 3: Detection efficacy of different safety mechanisms against adversarial prompts.
1168
+
1169
+ The results for the ablation are summarized in
1170
+ Figure 4. Complete graph structure demonstrate
1171
+ the highest vulnerability to attacks, achieving an
1172
+ ASR of around 78%, while Chain topologies prove
1173
+ the most resilient with approximately 60% ASR.
1174
+ This suggests that increased connectivity and path
1175
+ diversity might actually make systems more suscep-
1176
+ tible to adversarial attacks when it comes to attacks
1177
+ that utilize the topology to their own advantage.
1178
+
1179
+ Table 3: Effect of sample size on the number of itera-
1180
+ tions required for convergence.
1181
+
1182
+ Sample Size( M)
1183
+
1184
+ 2
1185
+
1186
+ 4
1187
+
1188
+ 8
1189
+
1190
+ 16
1191
+
1192
+ 32
1193
+
1194
+ 64
1195
+
1196
+ Iterations
1197
+ ASR
1198
+
1199
+ N/A N/A 15,000
1200
+
1201
+ 0
1202
+
1203
+ 0.01
1204
+
1205
+ 0
1206
+
1207
+ 5,000
1208
+ 0.08
1209
+
1210
+ 4,200
1211
+ 0.17
1212
+
1213
+ 1,750
1214
+ 0.56
1215
+
1216
+ 5.5 Ablation Study 2: Sensitivity Analysis for
1217
+
1218
+ Stochastic Version
1219
+
1220
+ We know that the Permutation Invariant Evasion
1221
+ Loss introduced in Section 4.2 can be computation-
1222
+ ally prohibitive as its complexity can be catego-
1223
+ rized as O(K!) where K is the optimal number of
1224
+ chunks. Hence to solve, we introduce the Stochastic
1225
+ version of the loss where we randomly sample M
1226
+ chunks out of K! permutations at each iteration.
1227
+ To investigate the effect of sample size, M , on the
1228
+ performance of our method, we conduct an abla-
1229
+ tion study measuring the ASR as a function of the
1230
+ number of M . The experimental details can be
1231
+ found in Appendix B.5
1232
+
1233
+ We can see in Table 3 the relationship between
1234
+ sample size and ASR. Starting from a very low
1235
+ effectiveness of almost 0% ASR with small sam-
1236
+ ple sizes (M = 2, 4), the performance improves
1237
+ dramatically as M increases, reaching around 56%
1238
+ at M = 64, which is around 50% of K!. The
1239
+ computational cost, measured in required iterations
1240
+ for convergence, demonstrates an inverse relations
1241
+ with sample size M . As shown in Table 3, smaller
1242
+ sample sizes require significantly more iterations
1243
+ (15, 000 iterations for M = 8) compared to larger
1244
+
1245
+ Figure 4: Impact of different network topologies on
1246
+ attack success rate (ASR) in a multi-agent LLM system.
1247
+ as a source and target model. As a source,
1248
+ it achieves the highest self ASR (82% on Jail-
1249
+ break Benchmark) but shows steeper perfor-
1250
+ mance drops when transferred to other archi-
1251
+ tectures (69% on Llama-2-7B). This indicates
1252
+ while Mistral-7B can generate highly effective
1253
+ attacks, these attacks may be more model-specific
1254
+ compared to those generated by other architec-
1255
+ tures. ❸ Architecture Generalization: Interest-
1256
+ ingly, while Gemma-2-9B shows moderate perfor-
1257
+ mance as a source model (71% self ASR), its at-
1258
+ tacks demonstrate more consistent transfer perfor-
1259
+ mance across different target models, with smaller
1260
+ variations in success rates. This suggests that some
1261
+ architectures may naturally generate more gener-
1262
+ alizable adversarial prompts, even if they are not
1263
+ optimal for any specific target.
1264
+ 5.4 Ablation Study 1: Effect of Topology
1265
+
1266
+ To investigate the effect of communication topol-
1267
+ ogy on the success of adversarial attacks in multi-
1268
+ agent systems, we conduct an ablation study using
1269
+ a range of graph structures. Our goal is to systemat-
1270
+ ically vary the underlying communication structure,
1271
+ so we can quantify the impact of network topology
1272
+ on adversarial robustness. Experimental details are
1273
+ listed in Appendix B.4
1274
+
1275
+ PromptGuard-86MLlama-Guard-7BLlama-Guard-2-8BLlama-Guard-3-8BLlama-Guard-3-1B020406080100F1-ScoreVanillaGCGOurs0%20%40%60%80%100%ChainTreeComplete GraphRandom Graph samples (1, 750 iterations for M = 64). Notably,
1276
+ for very small sample sizes, the loss does not con-
1277
+ verge as depicted by N/A. Most interestingly,
1278
+ these results suggest a practical trade-off point be-
1279
+ tween attack effectiveness and computational effi-
1280
+ ciency.
1281
+
1282
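The stochastic variant in Section 5.5 replaces the full sum over all K! chunk orderings with M randomly sampled permutations per iteration. The sampling step can be sketched as follows; the function name and chunk placeholders are illustrative, not from the authors' code:

```python
import itertools
import random

def sample_permutations(chunks, m, rng=random):
    """Draw m distinct orderings of the chunks out of the K! possible ones.

    For small K (K = 5 gives K! = 120) we can enumerate all permutations
    and sample without replacement; the full permutation-invariant loss
    would instead average over all K! orderings, which is what makes it
    computationally prohibitive as K grows.
    """
    all_perms = list(itertools.permutations(chunks))
    return rng.sample(all_perms, min(m, len(all_perms)))

chunks = ["c1", "c2", "c3", "c4", "c5"]    # K = 5 message chunks
batch = sample_permutations(chunks, m=64)  # M = 64 of the 120 orderings
print(len(batch))                          # 64
# Each optimization step evaluates the evasion loss only on `batch`,
# trading some ASR for far fewer permutations per iteration.
```

This makes the per-iteration cost O(M) in the number of orderings evaluated, at the price of a noisier gradient signal, which is consistent with the slower convergence observed for small M in Table 3.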
6 Conclusion

In this paper, we investigate the vulnerabilities of multi-agent LLM systems to adversarial prompt propagation attacks. Our findings demonstrate that optimized prompt routing can effectively bypass a system's safety mechanisms while adhering to token bandwidth constraints and handling asynchronous message arrival. Through extensive experiments, we highlight critical safety gaps in existing defenses, showing that traditional single-agent safety measures are insufficient in multi-agent settings.
Limitations

While our study sheds light on critical vulnerabilities in multi-agent systems, several constraints should be acknowledged.
❶ Our evaluation is restricted to a set of open-source models and benchmarks. Although these models represent a diverse range of large-scale LLMs, they do not fully encapsulate the broader landscape of commercial and fine-tuned proprietary systems. Future research should expand the scope to a wider variety of architectures, particularly those with more advanced safety training protocols, such as GPT-4 (Achiam et al., 2023) and Claude (ClaudeTeam).
❷ Our approach assumes partial knowledge of the communication structure and safety enforcement mechanisms within the system. While this reflects certain real-world scenarios where attackers can exploit known patterns, it does not account for cases where the network topology is entirely unknown or dynamically reconfigured, as shown in AgentPrune (Zhang et al., 2024a).
❸ Our modeling of inter-agent interactions simplifies some complexities present in real deployments. We assume static safety mechanisms and predefined token bandwidth constraints, whereas actual multi-agent networks may involve shifting policies, evolving defenses, and variable latency conditions.
❹ All the models we study focus solely on text-based agent interactions. Many emerging LLM-based systems incorporate multi-modal capabilities as well. The potential for adversarial manipulation in these multi-modal systems remains an open question, and future research should examine how cross-modal dependencies influence such security risks.

Addressing these limitations will provide a clearer path toward securing multi-agent LLM frameworks, ensuring their safe and reliable deployment in real-world applications.
Ethical Statement

Ensuring the security of multi-agent LLM systems is critical as these models become more integrated into real-world applications. Our research is driven by the need to understand and address vulnerabilities that could be exploited by adversaries, with the ultimate goal of strengthening AI safety mechanisms. By analyzing how adversarial prompts can bypass existing defenses, we aim to provide valuable insights for the development of more robust safeguards that can protect these systems from manipulation.

We acknowledge that the techniques explored in this work could be misused if applied irresponsibly. To mitigate this risk, we have conducted all experiments in controlled environments and have refrained from testing on real-world deployments. Our intent is solely to inform security research and to assist developers in identifying and mitigating risks before they become exploitable. We strongly advocate for ethical AI practices and emphasize that advancements in adversarial understanding should always be accompanied by proactive defense strategies to ensure the safe and responsible deployment of AI technologies.
Acknowledgment

This research was, in part, funded by the CISCO Faculty Award, the UNC SDS Seed Grant, and NetMind.AI. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the funding organizations.
References

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774.

Alfonso Amayuelas, Xianjun Yang, Antonis Antoniades, Wenyue Hua, Liangming Pan, and William Yang Wang. 2024. MultiAgent collaboration attack: Investigating adversarial attacks in large language model collaborations via debate. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 6929–6948, Miami, Florida, USA. Association for Computational Linguistics.

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, et al. 2022. Constitutional ai: Harmlessness from ai feedback. arXiv preprint arXiv:2212.08073.

Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, and Zhiyuan Liu. 2023. Chateval: Towards better llm-based evaluators through multi-agent debate. arXiv preprint arXiv:2308.07201.

Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J Pappas, Florian Tramer, et al. 2024. Jailbreakbench: An open robustness benchmark for jailbreaking large language models. arXiv preprint arXiv:2404.01318.

Jiaqi Chen, Yuxian Jiang, Jiachen Lu, and Li Zhang. 2024. S-agents: Self-organizing agents in open-ended environment. arXiv preprint arXiv:2402.04578.

Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chen Qian, Chi-Min Chan, Yujia Qin, Yaxi Lu, Ruobing Xie, et al. 2023. Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors in agents. arXiv preprint arXiv:2308.10848, 2(4):6.

ClaudeTeam. The claude 3 model family: Opus, sonnet, haiku.

Roi Cohen, May Hamri, Mor Geva, and Amir Globerson. 2023. Lm vs lm: Detecting factual errors via cross examination. arXiv preprint arXiv:2305.13281.

Yihong Dong, Xue Jiang, Zhi Jin, and Ge Li. 2024. Self-collaboration code generation via chatgpt. ACM Transactions on Software Engineering and Methodology, 33(7):1–38.

Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch. 2023. Improving factuality and reasoning in language models through multiagent debate. arXiv preprint arXiv:2305.14325.

GraySwanAI. 2024. NanoGCG. https://github.com/GraySwanAI/nanoGCG/tree/main. [Accessed 16-02-2025].

Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, and Min Lin. 2024. Agent smith: A single image can jailbreak one million multimodal llm agents exponentially fast. arXiv preprint arXiv:2402.08567.

Chuan Guo, Alexandre Sablayrolles, Hervé Jégou, and Douwe Kiela. 2021. Gradient-based adversarial attacks against text transformers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5747–5757, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. 2025. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948.

Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. 2008. Exploring network structure, dynamics, and function using networkx. In Proceedings of the 7th Python in Science Conference, pages 11–15, Pasadena, CA USA.

Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, and Yongfeng Zhang. 2023. War and peace (waragent): Large language model-based multi-agent simulation of world wars. arXiv preprint arXiv:2311.17227.

Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, et al. 2023. Llama guard: Llm-based input-output safeguard for human-ai conversations. arXiv preprint arXiv:2312.06674.

Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, et al. 2023. Mistral 7b. arXiv preprint arXiv:2310.06825.

Ye Jin, Xiaoxi Shen, Huiling Peng, Xiaoan Liu, Jingli Qin, Jiayang Li, Jintao Xie, Peizhong Gao, Guyue Zhou, and Jiangtao Gong. 2023. Surrealdriver: Designing generative driver agent simulation framework in urban contexts based on large language model. arXiv preprint arXiv:2309.13193.

Donghyun Lee and Mo Tiwari. 2024. Prompt infection: Llm-to-llm prompt injection within multi-agent systems. arXiv preprint arXiv:2410.07283.

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. 2024. The llama 3 herd of models. arXiv preprint arXiv:2407.21783.

Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. 2023a. Camel: Communicative agents for "mind" exploration of large language model society. Advances in Neural Information Processing Systems, 36:51991–52008.

Yuan Li, Yixuan Zhang, and Lichao Sun. 2023b. Metaagents: Simulating interactions of human behaviors for llm-based task-oriented coordination via collaborative generative agents. arXiv preprint arXiv:2310.06500.

Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Shuming Shi, and Zhaopeng Tu. 2023. Encouraging divergent thinking in large language models through multi-agent debate. arXiv preprint arXiv:2305.19118.

Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M Dai, Diyi Yang, and Soroush Vosoughi. 2023a. Training socially aligned language models on simulated social interactions. arXiv preprint arXiv:2305.16960.

Xiaogeng Liu, Nan Xu, Muhao Chen, and Chaowei Xiao. 2023b. Autodan: Generating stealthy jailbreak prompts on aligned large language models. arXiv preprint arXiv:2310.04451.

Yi Liu, Gelei Deng, Zhengzi Xu, Yuekang Li, Yaowen Zheng, Ying Zhang, Lida Zhao, Tianwei Zhang, Kailong Wang, and Yang Liu. 2023c. Jailbreaking chatgpt via prompt engineering: An empirical study. arXiv preprint arXiv:2305.13860.

Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, and Diyi Yang. 2023d. Dynamic llm-agent network: An llm-agent collaboration framework with agent team optimization. arXiv preprint arXiv:2310.02170.

Weimin Lyu, Xiao Lin, Songzhu Zheng, Lu Pang, Haibin Ling, Susmit Jha, and Chao Chen. 2024. Task-agnostic detector for insertion-based backdoor attacks. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 2808–2822, Mexico City, Mexico. Association for Computational Linguistics.

Weimin Lyu, Songzhu Zheng, Tengfei Ma, and Chao Chen. 2022. A study of the attention abnormality in trojaned BERTs. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4727–4741, Seattle, United States. Association for Computational Linguistics.

Weimin Lyu, Songzhu Zheng, Lu Pang, Haibin Ling, and Chao Chen. 2023. Attention-enhancing backdoor attacks against BERT-based models. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 10672–10690, Singapore. Association for Computational Linguistics.

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. Advances in neural information processing systems, 35:27730–27744.

Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. 2023. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th annual acm symposium on user interface software and technology, pages 1–22.

Chen Qian, Xin Cong, Cheng Yang, Weize Chen, Yusheng Su, Juyuan Xu, Zhiyuan Liu, and Maosong Sun. 2023. Communicative agents for software development. arXiv preprint arXiv:2307.07924, 6(3).

Miguel A Ramirez, Song-Kyoo Kim, Hussam Al Hamadi, Ernesto Damiani, Young-Ji Byon, Tae-Yeon Kim, Chung-Suk Cho, and Chan Yeob Yeun. 2022. Poisoning attacks and defenses on artificial intelligence: A survey. arXiv preprint arXiv:2202.10276.

Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, and Yang Zhang. 2024. "Do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models. In Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pages 1671–1685.

Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2024. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36.

Zhen Tan, Chengshuai Zhao, Raha Moraffah, Yifan Li, Yu Kong, Tianlong Chen, and Huan Liu. 2024. The wolf within: Covert injection of malice into mllm societies via an mllm operative. arXiv preprint arXiv:2402.14859.

Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, et al. 2024. Gemma 2: Improving open language models at a practical size. arXiv preprint arXiv:2408.00118.

Llama Team. 2024. Meta llama guard 2. https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Guard2/MODEL_CARD.md.

Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, et al. 2022. Lamda: Language models for dialog applications. arXiv preprint arXiv:2201.08239.

Meta. 2024. Prompt Guard-86M | Model Cards and Prompt formats. https://www.llama.com/docs/model-cards-and-prompt-formats/prompt-guard/. [Accessed 13-02-2025].

Yu Tian, Xiao Yang, Jingyuan Zhang, Yinpeng Dong, and Hang Su. 2023. Evil geniuses: Delving into the safety of llm-based agents. arXiv preprint arXiv:2311.11855.

Weixi Tong and Tianyi Zhang. 2024. Codejudge: Evaluating code generation with large language models. arXiv preprint arXiv:2410.02184.

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.

Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, et al. 2023. Mindstorms in natural language-based societies of mind. arXiv preprint arXiv:2305.17066.

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J Zico Kolter, and Matt Fredrikson. 2023. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043.

Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2023. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291.

Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2024. Jailbroken: How does llm safety training fail? Advances in Neural Information Processing Systems, 36.

Yiran Wu, Feiran Jia, Shaokun Zhang, Hangyu Li, Erkang Zhu, Yue Wang, Yin Tat Lee, Richard Peng, Qingyun Wu, and Chi Wang. 2023. An empirical study on challenging math problem solving with gpt-4. arXiv preprint arXiv:2306.01337.

Hui Yang, Sifu Yue, and Yunzhong He. 2023. Auto-gpt for online decision making: Benchmarks and additional opinions. arXiv preprint arXiv:2306.02224.

Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, and Weiyan Shi. 2024a. How johnny can persuade llms to jailbreak them: Rethinking persuasion to challenge ai safety by humanizing llms. arXiv preprint arXiv:2401.06373.

Yifan Zeng, Yiran Wu, Xiao Zhang, Huazheng Wang, and Qingyun Wu. 2024b. Autodefense: Multi-agent llm defense against jailbreak attacks. arXiv preprint arXiv:2403.04783.

Guibin Zhang, Yanwei Yue, Zhixun Li, Sukwon Yun, Guancheng Wan, Kun Wang, Dawei Cheng, Jeffrey Xu Yu, and Tianlong Chen. 2024a. Cut the crap: An economical communication pipeline for llm-based multi-agent systems. arXiv preprint arXiv:2410.02506.

Guibin Zhang, Yanwei Yue, Xiangguo Sun, Guancheng Wan, Miao Yu, Junfeng Fang, Kun Wang, and Dawei Cheng. 2024b. G-designer: Architecting multi-agent communication topologies via graph neural networks. arXiv preprint arXiv:2410.11782.

Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, et al. 2023. Codegeex: A pre-trained model for code generation with multilingual evaluations on humaneval-x. arXiv preprint arXiv:2303.17568.

Wangchunshu Zhou, Yuchen Eleanor Jiang, Long Li, Jialong Wu, Tiannan Wang, Shi Qiu, Jintian Zhang, Jing Chen, Ruipu Wu, Shuai Wang, et al. 2023. Agents: An open-source framework for autonomous language agents. arXiv preprint arXiv:2309.07870.
A Use of Generative AI

To enhance clarity and readability, we utilized LLMs exclusively as a language polishing tool. Their role was confined to proofreading, grammatical correction, and stylistic refinement: functions analogous to those provided by traditional grammar checkers and dictionaries. These tools did not contribute to the generation of new scientific content or ideas, and their usage is consistent with standard practices for manuscript preparation.
B Experimental Settings

In this section we list all the experimental settings, including datasets, architectures, baselines, and metrics. Compute: We utilize 8× Nvidia A6000 GPUs for all of our experiments.
B.1 Experiment: Overall Performance Comparison

Datasets and Architectures. To comprehensively evaluate the effectiveness and generalizability of our permutation-invariant attack, we conduct experiments across a diverse range of target LLM architectures and datasets. Specifically, we evaluate our method on Llama-2-7B (Touvron et al., 2023), Llama-3.1-8B (Dubey et al., 2024), Mistral-7B (Jiang et al., 2023), Gemma-2-9B (Team et al., 2024), and the DeepSeek-R1-Distilled (Guo et al., 2025) version of Llama-3.1-8B (Dubey et al., 2024). These architectures represent a broad spectrum of model scales and training paradigms, ensuring a rigorous assessment of our attack's applicability. For evaluation datasets, we utilize three distinct benchmarks: ❶ Jailbreak Benchmark (Chao et al., 2024): a collection of 100 harmful misuse behaviors ranging from physical harm to disinformation. ❷ Adversarial Benchmark (Zou et al., 2023): a collection of 520 harmful instructions sharing the themes of profanity, discrimination, cybercrime, and misinformation. ❸ In-the-Wild Jailbreak Benchmark (Shen et al., 2024): a curated dataset of 1,405 jailbreak prompts covering up to 13 different scenarios, including fraud, harm, and pornography. Furthermore, to simulate the multi-agent system, we assign one random topology to each prompt for all models to ensure a consistent comparison. As the topology is randomly generated, we run each experiment 3 times to mitigate the effect of any bias or seed.

Baselines. Given our specific multi-agent system setup, we identify two main baselines: ❶ the Greedy Coordinate Gradient (GCG) attack (Zou et al., 2023) and ❷ the vanilla instructions that come paired in each of the benchmarks above. To compute the GCG prompts we use the NanoGCG (GraySwanAI, 2024) library with consistent settings across datasets and benchmarks. We optimize each prompt for up to 500 steps, with a search width of 64 and 64 candidate token replacements for any given position (top-k). In this set of experiments we use the PIEL instead of S-PIEL for a comprehensive comparison.

Metrics. Across these benchmarks, we measure the Permuted Attack Success Rate: given an attack prompt, we create K chunks (provided by the topological optimization method) and choose 1 random permutation out of the K! possible permutations. We then pass this permutation through the system and record the result. As we repeat each experiment 3 times to avoid bias from randomness, we report ASR-m, the minimum ASR achieved across the 3 runs, and ASR-M, the maximum ASR achieved, alongside the average ASR.
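The Permuted ASR procedure above can be sketched as follows; `run_system`, the character-level chunking, and the toy judge are illustrative stand-ins for the paper's components, not its actual implementation:

```python
import random
import statistics

def permuted_asr(prompts, k, run_system, rng=random):
    """One evaluation pass: split each prompt into k chunks, send one
    random ordering of the chunks through the multi-agent system, and
    return the fraction of prompts judged successful."""
    successes = 0
    for prompt in prompts:
        step = max(1, len(prompt) // k)
        chunks = [prompt[i:i + step] for i in range(0, len(prompt), step)][:k]
        rng.shuffle(chunks)              # 1 random permutation of K! orderings
        successes += run_system(chunks)  # 1 if the attack succeeded, else 0
    return successes / len(prompts)

def summarize(prompts, k, run_system, runs=3):
    """Repeat 3 times; report ASR-m (min), ASR-M (max), and the average."""
    asrs = [permuted_asr(prompts, k, run_system) for _ in range(runs)]
    return min(asrs), max(asrs), statistics.mean(asrs)

# Toy stand-in judge: "succeeds" whenever the first chunk is short.
toy_system = lambda chunks: int(len(chunks[0]) < 6)
print(summarize(["a" * 20, "b" * 40], k=4, run_system=toy_system))
# (0.5, 0.5, 0.5)
```

In the actual experiments the success judgment would come from the multi-agent pipeline and its safety filters rather than a length heuristic, but the aggregation into ASR-m, ASR-M, and average ASR works the same way.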
+ B.2 Experiment: Safety Mechanism Efficacy
1837
+
1838
+ (Team,
1839
+ al.,
1840
+ et
1841
+
1842
+ evaluation of our
1843
+
1844
+ Llama-Guard-2-8B
1845
+ (Dubey
1846
+
1847
+ Datasets and Architectures. For
1848
+ a
1849
+ com-
1850
+ prehensive
1851
+ routing, we
1852
+ utilize a diverse set of five safety-aligned
1853
+ safety models: Llama-Guard-7B (Inan et al.,
1854
+ 2024),
1855
+ 2023),
1856
+ Llama-Guard-3-8B
1857
+ 2024),
1858
+ Llama-Guard-3-1B (Dubey et al., 2024) and
1859
+ Prompt-Guard-86M (Meta, 2024). Such a diverse
1860
+ set ensures that a varying level of safety-aligned
1861
+ architectures are considered in our analysis.
1862
+ For the dataset, we use a complete benchmark
1863
+ provided by Jailbreak Benchmark which includes
1864
+ 100 harmful prompts, and 100 benign prompts
1865
+ which will help us quantify the false positive and
1866
+ false negative rates.
+
+ Settings and Metrics. We generate a random
+ communication graph with maximum degree 3 for each
+ prompt, following recent work (Zhang et al., 2024b)
+ showing that sparse topologies achieve performance
+ comparable to that of dense networks. We optimize the
+ token flow using the algorithm described in Section 4.1,
+ which provides the optimal chunk length for each edge.
+ We then process the prompts as follows: vanilla and GCG
+ prompts are chunked directly according to these optimal
+ lengths, while our method employs the full
+ Permutation-Invariant Evasion Loss. As the evaluation
+ metric, we assess detection performance using the
+ F1-score, which provides a balanced measure of a safety
+ mechanism's effectiveness by combining precision and
+ recall, thereby capturing both false positives and
+ false negatives.
+
+ B.5 Ablation: Sensitivity Analysis of the
+ Stochastic Version
+
+ Settings. We evaluate how increasing M influences the
+ ASR on Jailbreak Benchmark, and also assess the
+ scalability and efficiency of our attack under
+ computational budgets. For all experiments in this
+ section we use Llama-2-7B as the target model type,
+ with K = 5 chunks and hence K! = 120 permutations.
+
+ C Quantifying Detection Risk
+
+ Figure 5: Effect of chunk length on the detection
+ performance of PromptGuard-86M and its 4-bit quantized
+ version.
+
+ To quantify the risk associated with the safety
+ mechanisms deployed in the multi-agent LLM system, we
+ conducted a series of empirical experiments evaluating
+ PromptGuard-86M (Meta, 2024) and its 4-bit quantized
+ version. We used the complete Jailbreak Benchmark with
+ benign samples (100 harmful and 100 non-harmful
+ examples) to systematically examine how prompt chunking
+ affects detection performance. Specifically, we split
+ the GCG prompts into varying token lengths and assessed
+ the F1 score of the safety mechanism at each chunk
+ size. The results in Figure 5 reveal that shorter
+ prompt fragments often evade detection, whereas longer
+ chunks improve safety enforcement by providing more
+ context for classification.
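The chunk-and-score protocol behind the Figure 5 sweep can be sketched as follows. This is a structural illustration only: `flagged` and the whitespace "tokens" are simplifications, and the real experiment scores chunks with PromptGuard-86M rather than any stub.

```python
def chunk(tokens, length):
    # Split a token sequence into consecutive chunks of at most `length`.
    return [tokens[i:i + length] for i in range(0, len(tokens), length)]

def f1(tp, fp, fn):
    # Balanced measure combining precision and recall.
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def flagged(tokens, length, score_chunk, threshold=0.5):
    # A prompt counts as detected if ANY of its chunks trips the detector.
    return any(score_chunk(c) > threshold for c in chunk(tokens, length))

print(chunk(list("abcdefg"), 3))  # [['a','b','c'], ['d','e','f'], ['g']]
print(f1(tp=90, fp=5, fn=10))     # ~0.923
```

Sweeping `length` and recomputing the F1 score over the 100/100 benchmark reproduces the shape of the Figure 5 experiment.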
+
+ B.3 Experiment: Transferability
+
+ Datasets and Architectures. To evaluate the
+ transferability of our permutation-invariant prompts,
+ we conduct experiments across multiple LLM
+ architectures and benchmark datasets. Specifically, we
+ assess whether adversarial prompts optimized for one
+ target model can effectively transfer and maintain high
+ attack success rates when applied to unseen models. We
+ conduct experiments on Llama-2-7B, Mistral-7B, and
+ Gemma-2-9B, which represent a diverse set of models.
+ For the evaluation datasets, we utilize two benchmarks:
+ ❶ Jailbreak Benchmark, which consists of 100 harmful
+ prompts, and ❷ Adversarial Benchmark, a collection of
+ 520 harmful instructions.
+
+ Settings and Metrics. As in the first experiment, we
+ assign each prompt to a random communication topology
+ and then optimize over it to find the optimal chunk
+ length and number of chunks. We then use our
+ Permutation-Invariant Evasion Loss to generate the
+ prompts for one target model. Lastly, we sample one
+ random permutation of the resulting prompt and apply it
+ to the other target models. The Attack Success Rate
+ (ASR) is therefore calculated on this single
+ permutation, which is sampled randomly once and then
+ kept fixed across all models for a fair comparison.
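The sample-once, reuse-everywhere protocol above can be sketched as follows (chunk strings and model names are placeholders; fixing the seed stands in for "sampled once, then kept fixed"):

```python
import random

def sample_fixed_permutation(num_chunks, seed=0):
    # Draw ONE ordering; reusing the same seed keeps it fixed across models.
    order = list(range(num_chunks))
    random.Random(seed).shuffle(order)
    return order

def assemble(chunks, order):
    # Concatenate the chunks in the sampled order into a single prompt.
    return " ".join(chunks[i] for i in order)

chunks = ["c0", "c1", "c2", "c3", "c4"]    # placeholder chunk strings
order = sample_fixed_permutation(len(chunks))
for model in ("Llama-2-7B", "Mistral-7B", "Gemma-2-9B"):
    prompt = assemble(chunks, order)        # identical input for every model
```

Each model then receives the identical permuted prompt, so any difference in ASR is attributable to the model rather than the ordering.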
+
+ B.4 Ablation: Effect of Topology
+
+ Settings. Specifically, we examine how different agent
+ connectivity patterns impact the Attack Success Rate
+ (ASR) of our permutation-invariant attack. We test four
+ distinct topologies: ❶ Chain, ❷ Tree, ❸ Complete
+ Graph, and ❹ Random Graph, each representing a
+ different level of connectivity. For the dataset, we
+ calculate the ASR on Jailbreak Benchmark. Furthermore,
+ note that in this case each edge is assigned a safety
+ mechanism drawn randomly from the following set:
+ PromptGuard-86M, Llama-Guard-7B, and Llama-Guard-3-8B.
+ Lastly, in all cases we use Llama-2-7B as our target
+ model.
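The four topologies can be sketched as undirected edge lists over n agents (a minimal illustration; the binary-tree shape and the random-graph edge order are assumptions, though the degree cap of 3 matches the setup used elsewhere):

```python
import random
from itertools import combinations

def chain(n):
    # ❶ Chain: agents connected in a line.
    return [(i, i + 1) for i in range(n - 1)]

def tree(n):
    # ❷ Tree: one possible instantiation, a binary tree rooted at agent 0.
    return [((i - 1) // 2, i) for i in range(1, n)]

def complete(n):
    # ❸ Complete graph: every pair of agents communicates.
    return list(combinations(range(n), 2))

def random_graph(n, max_degree=3, seed=0):
    # ❹ Random graph, respecting the maximum-degree-3 constraint.
    rng, deg, edges = random.Random(seed), [0] * n, []
    candidates = complete(n)
    rng.shuffle(candidates)
    for i, j in candidates:
        if deg[i] < max_degree and deg[j] < max_degree:
            edges.append((i, j))
            deg[i] += 1
            deg[j] += 1
    return edges
```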
+
+ D Permutation-Invariant Evasion Loss Algorithm
+
+ The algorithm for the Permutation-Invariant Evasion
+ Loss is provided below:
+
+ Algorithm 1 Permutation-Invariant Evasion Optimization
+ Require: Target model type v_t; initial chunk set
+   C = {C_1, ..., C_K} from the topological
+   optimization; number of iterations T
+ Ensure: Optimized chunk set C*
+  1: Randomly initialize the token sequences C_k
+  2: for t = 1 to T do
+  3:   S_K ← set of all permutations of {1, ..., K}
+  4:   total loss L(C) ← 0
+  5:   for π ∈ S_K do
+  6:     φ ← Concat(C_π(1), ..., C_π(K))
+  7:     L_π ← −log p(x*_{n+1:n+H} | φ)
+  8:     L(C) ← L(C) + L_π
+  9:   end for
+ 10:   L(C) ← L(C) / K!
+ 11:   for C_i ∈ C do
+ 12:     GCG(C_i, L(C))
+ 13:   end for
+ 14: end for
+ 15: return the optimized adversarial chunks C*
+
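The permutation-averaging core of Algorithm 1 (lines 3-10) can be sketched as runnable code. Here `nll` is a hypothetical stand-in for −log p(x* | φ) under the target model, and the per-chunk GCG update (line 12) is omitted; only the loss-averaging structure is shown.

```python
import itertools
import math

def permutation_invariant_loss(chunks, nll):
    # Average the surrogate loss over all K! orderings of the chunks.
    total = 0.0
    for perm in itertools.permutations(chunks):
        phi = " ".join(perm)          # phi = Concat(C_pi(1), ..., C_pi(K))
        total += nll(phi)             # L_pi, stand-in for -log p(x* | phi)
    return total / math.factorial(len(chunks))

# Dummy surrogate: concatenation length (identical for every ordering).
chunks = ["aa", "b", "ccc"]
print(permutation_invariant_loss(chunks, nll=lambda s: float(len(s))))  # 8.0
```

Because the averaged loss is symmetric in the chunk ordering, minimizing it drives every permutation of the chunks toward eliciting the target completion, which is what makes the final prompt order-independent.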