@raishin/vanguard-frontier-agentic 2.0.1 → 2.2.0

Files changed (467)
  1. package/.claude-plugin/plugin.json +39 -1
  2. package/.cursor-plugin/plugin.json +39 -1
  3. package/.github/plugin/marketplace.json +1 -1
  4. package/README.md +119 -13
  5. package/agents/README.md +47 -2
  6. package/agents/hr/README.md +42 -0
  7. package/agents/hr/hr-analytics-people-data-agent/AGENT.md +64 -0
  8. package/agents/hr/hr-analytics-people-data-agent/harnesses/claude-code.agent.md +42 -0
  9. package/agents/hr/hr-analytics-people-data-agent/harnesses/codex.toml +73 -0
  10. package/agents/hr/hr-analytics-people-data-agent/harnesses/copilot.agent.md +42 -0
  11. package/agents/hr/hr-analytics-people-data-agent/harnesses/cursor.agent.md +42 -0
  12. package/agents/hr/hr-analytics-people-data-agent/harnesses/gemini.agent.md +42 -0
  13. package/agents/hr/hr-analytics-people-data-agent/harnesses/kiro-cli.agent.json +5 -0
  14. package/agents/hr/hr-analytics-people-data-agent/harnesses/kiro-ide.agent.md +42 -0
  15. package/agents/hr/hr-analytics-people-data-agent/metadata.json +42 -0
  16. package/agents/hr/hr-benefits-payroll-agent/AGENT.md +64 -0
  17. package/agents/hr/hr-benefits-payroll-agent/harnesses/claude-code.agent.md +42 -0
  18. package/agents/hr/hr-benefits-payroll-agent/harnesses/codex.toml +72 -0
  19. package/agents/hr/hr-benefits-payroll-agent/harnesses/copilot.agent.md +42 -0
  20. package/agents/hr/hr-benefits-payroll-agent/harnesses/cursor.agent.md +42 -0
  21. package/agents/hr/hr-benefits-payroll-agent/harnesses/gemini.agent.md +42 -0
  22. package/agents/hr/hr-benefits-payroll-agent/harnesses/kiro-cli.agent.json +5 -0
  23. package/agents/hr/hr-benefits-payroll-agent/harnesses/kiro-ide.agent.md +42 -0
  24. package/agents/hr/hr-benefits-payroll-agent/metadata.json +42 -0
  25. package/agents/hr/hr-compensation-equity-agent/AGENT.md +64 -0
  26. package/agents/hr/hr-compensation-equity-agent/harnesses/claude-code.agent.md +42 -0
  27. package/agents/hr/hr-compensation-equity-agent/harnesses/codex.toml +75 -0
  28. package/agents/hr/hr-compensation-equity-agent/harnesses/copilot.agent.md +42 -0
  29. package/agents/hr/hr-compensation-equity-agent/harnesses/cursor.agent.md +42 -0
  30. package/agents/hr/hr-compensation-equity-agent/harnesses/gemini.agent.md +42 -0
  31. package/agents/hr/hr-compensation-equity-agent/harnesses/kiro-cli.agent.json +5 -0
  32. package/agents/hr/hr-compensation-equity-agent/harnesses/kiro-ide.agent.md +42 -0
  33. package/agents/hr/hr-compensation-equity-agent/metadata.json +42 -0
  34. package/agents/hr/hr-culture-dei-agent/AGENT.md +64 -0
  35. package/agents/hr/hr-culture-dei-agent/harnesses/claude-code.agent.md +42 -0
  36. package/agents/hr/hr-culture-dei-agent/harnesses/codex.toml +73 -0
  37. package/agents/hr/hr-culture-dei-agent/harnesses/copilot.agent.md +42 -0
  38. package/agents/hr/hr-culture-dei-agent/harnesses/cursor.agent.md +42 -0
  39. package/agents/hr/hr-culture-dei-agent/harnesses/gemini.agent.md +42 -0
  40. package/agents/hr/hr-culture-dei-agent/harnesses/kiro-cli.agent.json +5 -0
  41. package/agents/hr/hr-culture-dei-agent/harnesses/kiro-ide.agent.md +42 -0
  42. package/agents/hr/hr-culture-dei-agent/metadata.json +42 -0
  43. package/agents/hr/hr-employee-relations-agent/AGENT.md +64 -0
  44. package/agents/hr/hr-employee-relations-agent/harnesses/claude-code.agent.md +42 -0
  45. package/agents/hr/hr-employee-relations-agent/harnesses/codex.toml +73 -0
  46. package/agents/hr/hr-employee-relations-agent/harnesses/copilot.agent.md +42 -0
  47. package/agents/hr/hr-employee-relations-agent/harnesses/cursor.agent.md +42 -0
  48. package/agents/hr/hr-employee-relations-agent/harnesses/gemini.agent.md +42 -0
  49. package/agents/hr/hr-employee-relations-agent/harnesses/kiro-cli.agent.json +5 -0
  50. package/agents/hr/hr-employee-relations-agent/harnesses/kiro-ide.agent.md +42 -0
  51. package/agents/hr/hr-employee-relations-agent/metadata.json +42 -0
  52. package/agents/hr/hr-hris-process-controls-agent/AGENT.md +64 -0
  53. package/agents/hr/hr-hris-process-controls-agent/harnesses/claude-code.agent.md +42 -0
  54. package/agents/hr/hr-hris-process-controls-agent/harnesses/codex.toml +73 -0
  55. package/agents/hr/hr-hris-process-controls-agent/harnesses/copilot.agent.md +42 -0
  56. package/agents/hr/hr-hris-process-controls-agent/harnesses/cursor.agent.md +42 -0
  57. package/agents/hr/hr-hris-process-controls-agent/harnesses/gemini.agent.md +42 -0
  58. package/agents/hr/hr-hris-process-controls-agent/harnesses/kiro-cli.agent.json +5 -0
  59. package/agents/hr/hr-hris-process-controls-agent/harnesses/kiro-ide.agent.md +42 -0
  60. package/agents/hr/hr-hris-process-controls-agent/metadata.json +42 -0
  61. package/agents/hr/hr-learning-policy-agent/AGENT.md +64 -0
  62. package/agents/hr/hr-learning-policy-agent/harnesses/claude-code.agent.md +42 -0
  63. package/agents/hr/hr-learning-policy-agent/harnesses/codex.toml +73 -0
  64. package/agents/hr/hr-learning-policy-agent/harnesses/copilot.agent.md +42 -0
  65. package/agents/hr/hr-learning-policy-agent/harnesses/cursor.agent.md +42 -0
  66. package/agents/hr/hr-learning-policy-agent/harnesses/gemini.agent.md +42 -0
  67. package/agents/hr/hr-learning-policy-agent/harnesses/kiro-cli.agent.json +5 -0
  68. package/agents/hr/hr-learning-policy-agent/harnesses/kiro-ide.agent.md +42 -0
  69. package/agents/hr/hr-learning-policy-agent/metadata.json +42 -0
  70. package/agents/hr/hr-leave-accommodation-agent/AGENT.md +64 -0
  71. package/agents/hr/hr-leave-accommodation-agent/harnesses/claude-code.agent.md +42 -0
  72. package/agents/hr/hr-leave-accommodation-agent/harnesses/codex.toml +76 -0
  73. package/agents/hr/hr-leave-accommodation-agent/harnesses/copilot.agent.md +42 -0
  74. package/agents/hr/hr-leave-accommodation-agent/harnesses/cursor.agent.md +42 -0
  75. package/agents/hr/hr-leave-accommodation-agent/harnesses/gemini.agent.md +42 -0
  76. package/agents/hr/hr-leave-accommodation-agent/harnesses/kiro-cli.agent.json +5 -0
  77. package/agents/hr/hr-leave-accommodation-agent/harnesses/kiro-ide.agent.md +42 -0
  78. package/agents/hr/hr-leave-accommodation-agent/metadata.json +42 -0
  79. package/agents/hr/hr-maestro-agent/AGENT.md +84 -0
  80. package/agents/hr/hr-maestro-agent/harnesses/claude-code.agent.md +61 -0
  81. package/agents/hr/hr-maestro-agent/harnesses/codex.toml +66 -0
  82. package/agents/hr/hr-maestro-agent/harnesses/copilot.agent.md +61 -0
  83. package/agents/hr/hr-maestro-agent/harnesses/cursor.agent.md +61 -0
  84. package/agents/hr/hr-maestro-agent/harnesses/gemini.agent.md +61 -0
  85. package/agents/hr/hr-maestro-agent/harnesses/kiro-cli.agent.json +5 -0
  86. package/agents/hr/hr-maestro-agent/harnesses/kiro-ide.agent.md +61 -0
  87. package/agents/hr/hr-maestro-agent/metadata.json +42 -0
  88. package/agents/hr/hr-performance-management-agent/AGENT.md +64 -0
  89. package/agents/hr/hr-performance-management-agent/harnesses/claude-code.agent.md +42 -0
  90. package/agents/hr/hr-performance-management-agent/harnesses/codex.toml +77 -0
  91. package/agents/hr/hr-performance-management-agent/harnesses/copilot.agent.md +42 -0
  92. package/agents/hr/hr-performance-management-agent/harnesses/cursor.agent.md +42 -0
  93. package/agents/hr/hr-performance-management-agent/harnesses/gemini.agent.md +42 -0
  94. package/agents/hr/hr-performance-management-agent/harnesses/kiro-cli.agent.json +5 -0
  95. package/agents/hr/hr-performance-management-agent/harnesses/kiro-ide.agent.md +42 -0
  96. package/agents/hr/hr-performance-management-agent/metadata.json +42 -0
  97. package/agents/hr/hr-recruiting-selection-agent/AGENT.md +64 -0
  98. package/agents/hr/hr-recruiting-selection-agent/harnesses/claude-code.agent.md +42 -0
  99. package/agents/hr/hr-recruiting-selection-agent/harnesses/codex.toml +74 -0
  100. package/agents/hr/hr-recruiting-selection-agent/harnesses/copilot.agent.md +42 -0
  101. package/agents/hr/hr-recruiting-selection-agent/harnesses/cursor.agent.md +42 -0
  102. package/agents/hr/hr-recruiting-selection-agent/harnesses/gemini.agent.md +42 -0
  103. package/agents/hr/hr-recruiting-selection-agent/harnesses/kiro-cli.agent.json +5 -0
  104. package/agents/hr/hr-recruiting-selection-agent/harnesses/kiro-ide.agent.md +42 -0
  105. package/agents/hr/hr-recruiting-selection-agent/metadata.json +42 -0
  106. package/agents/hr/hr-risk-triage-review-agent/AGENT.md +57 -0
  107. package/agents/hr/hr-risk-triage-review-agent/harnesses/claude-code.agent.md +41 -0
  108. package/agents/hr/hr-risk-triage-review-agent/harnesses/codex.toml +38 -0
  109. package/agents/hr/hr-risk-triage-review-agent/harnesses/copilot.agent.md +41 -0
  110. package/agents/hr/hr-risk-triage-review-agent/harnesses/cursor.agent.md +41 -0
  111. package/agents/hr/hr-risk-triage-review-agent/harnesses/gemini.agent.md +41 -0
  112. package/agents/hr/hr-risk-triage-review-agent/harnesses/kiro-cli.agent.json +5 -0
  113. package/agents/hr/hr-risk-triage-review-agent/harnesses/kiro-ide.agent.md +41 -0
  114. package/agents/hr/hr-risk-triage-review-agent/metadata.json +43 -0
  115. package/agents/hr/hr-termination-readiness-agent/AGENT.md +64 -0
  116. package/agents/hr/hr-termination-readiness-agent/harnesses/claude-code.agent.md +42 -0
  117. package/agents/hr/hr-termination-readiness-agent/harnesses/codex.toml +76 -0
  118. package/agents/hr/hr-termination-readiness-agent/harnesses/copilot.agent.md +42 -0
  119. package/agents/hr/hr-termination-readiness-agent/harnesses/cursor.agent.md +42 -0
  120. package/agents/hr/hr-termination-readiness-agent/harnesses/gemini.agent.md +42 -0
  121. package/agents/hr/hr-termination-readiness-agent/harnesses/kiro-cli.agent.json +5 -0
  122. package/agents/hr/hr-termination-readiness-agent/harnesses/kiro-ide.agent.md +42 -0
  123. package/agents/hr/hr-termination-readiness-agent/metadata.json +42 -0
  124. package/agents/hr/hr-workforce-planning-rif-agent/AGENT.md +64 -0
  125. package/agents/hr/hr-workforce-planning-rif-agent/harnesses/claude-code.agent.md +42 -0
  126. package/agents/hr/hr-workforce-planning-rif-agent/harnesses/codex.toml +74 -0
  127. package/agents/hr/hr-workforce-planning-rif-agent/harnesses/copilot.agent.md +42 -0
  128. package/agents/hr/hr-workforce-planning-rif-agent/harnesses/cursor.agent.md +42 -0
  129. package/agents/hr/hr-workforce-planning-rif-agent/harnesses/gemini.agent.md +42 -0
  130. package/agents/hr/hr-workforce-planning-rif-agent/harnesses/kiro-cli.agent.json +5 -0
  131. package/agents/hr/hr-workforce-planning-rif-agent/harnesses/kiro-ide.agent.md +42 -0
  132. package/agents/hr/hr-workforce-planning-rif-agent/metadata.json +42 -0
  133. package/agents/hr/hr-workplace-investigations-agent/AGENT.md +64 -0
  134. package/agents/hr/hr-workplace-investigations-agent/harnesses/claude-code.agent.md +42 -0
  135. package/agents/hr/hr-workplace-investigations-agent/harnesses/codex.toml +77 -0
  136. package/agents/hr/hr-workplace-investigations-agent/harnesses/copilot.agent.md +42 -0
  137. package/agents/hr/hr-workplace-investigations-agent/harnesses/cursor.agent.md +42 -0
  138. package/agents/hr/hr-workplace-investigations-agent/harnesses/gemini.agent.md +42 -0
  139. package/agents/hr/hr-workplace-investigations-agent/harnesses/kiro-cli.agent.json +5 -0
  140. package/agents/hr/hr-workplace-investigations-agent/harnesses/kiro-ide.agent.md +42 -0
  141. package/agents/hr/hr-workplace-investigations-agent/metadata.json +42 -0
  142. package/agents/legal/README.md +41 -0
  143. package/agents/legal/legal-contract-review-agent/AGENT.md +61 -0
  144. package/agents/legal/legal-contract-review-agent/harnesses/claude-code.agent.md +42 -0
  145. package/agents/legal/legal-contract-review-agent/harnesses/codex.toml +76 -0
  146. package/agents/legal/legal-contract-review-agent/harnesses/copilot.agent.md +42 -0
  147. package/agents/legal/legal-contract-review-agent/harnesses/cursor.agent.md +42 -0
  148. package/agents/legal/legal-contract-review-agent/harnesses/gemini.agent.md +42 -0
  149. package/agents/legal/legal-contract-review-agent/harnesses/kiro-cli.agent.json +5 -0
  150. package/agents/legal/legal-contract-review-agent/harnesses/kiro-ide.agent.md +42 -0
  151. package/agents/legal/legal-contract-review-agent/metadata.json +42 -0
  152. package/agents/legal/legal-counsel-review-agent/AGENT.md +55 -0
  153. package/agents/legal/legal-counsel-review-agent/harnesses/claude-code.agent.md +39 -0
  154. package/agents/legal/legal-counsel-review-agent/harnesses/codex.toml +36 -0
  155. package/agents/legal/legal-counsel-review-agent/harnesses/copilot.agent.md +39 -0
  156. package/agents/legal/legal-counsel-review-agent/harnesses/cursor.agent.md +39 -0
  157. package/agents/legal/legal-counsel-review-agent/harnesses/gemini.agent.md +39 -0
  158. package/agents/legal/legal-counsel-review-agent/harnesses/kiro-cli.agent.json +5 -0
  159. package/agents/legal/legal-counsel-review-agent/harnesses/kiro-ide.agent.md +39 -0
  160. package/agents/legal/legal-counsel-review-agent/metadata.json +43 -0
  161. package/agents/legal/legal-employment-law-risk-agent/AGENT.md +61 -0
  162. package/agents/legal/legal-employment-law-risk-agent/harnesses/claude-code.agent.md +42 -0
  163. package/agents/legal/legal-employment-law-risk-agent/harnesses/codex.toml +78 -0
  164. package/agents/legal/legal-employment-law-risk-agent/harnesses/copilot.agent.md +42 -0
  165. package/agents/legal/legal-employment-law-risk-agent/harnesses/cursor.agent.md +42 -0
  166. package/agents/legal/legal-employment-law-risk-agent/harnesses/gemini.agent.md +42 -0
  167. package/agents/legal/legal-employment-law-risk-agent/harnesses/kiro-cli.agent.json +5 -0
  168. package/agents/legal/legal-employment-law-risk-agent/harnesses/kiro-ide.agent.md +42 -0
  169. package/agents/legal/legal-employment-law-risk-agent/metadata.json +42 -0
  170. package/agents/legal/legal-ethics-investigations-agent/AGENT.md +61 -0
  171. package/agents/legal/legal-ethics-investigations-agent/harnesses/claude-code.agent.md +42 -0
  172. package/agents/legal/legal-ethics-investigations-agent/harnesses/codex.toml +70 -0
  173. package/agents/legal/legal-ethics-investigations-agent/harnesses/copilot.agent.md +42 -0
  174. package/agents/legal/legal-ethics-investigations-agent/harnesses/cursor.agent.md +42 -0
  175. package/agents/legal/legal-ethics-investigations-agent/harnesses/gemini.agent.md +42 -0
  176. package/agents/legal/legal-ethics-investigations-agent/harnesses/kiro-cli.agent.json +5 -0
  177. package/agents/legal/legal-ethics-investigations-agent/harnesses/kiro-ide.agent.md +42 -0
  178. package/agents/legal/legal-ethics-investigations-agent/metadata.json +42 -0
  179. package/agents/legal/legal-ip-open-source-agent/AGENT.md +61 -0
  180. package/agents/legal/legal-ip-open-source-agent/harnesses/claude-code.agent.md +42 -0
  181. package/agents/legal/legal-ip-open-source-agent/harnesses/codex.toml +78 -0
  182. package/agents/legal/legal-ip-open-source-agent/harnesses/copilot.agent.md +42 -0
  183. package/agents/legal/legal-ip-open-source-agent/harnesses/cursor.agent.md +42 -0
  184. package/agents/legal/legal-ip-open-source-agent/harnesses/gemini.agent.md +42 -0
  185. package/agents/legal/legal-ip-open-source-agent/harnesses/kiro-cli.agent.json +5 -0
  186. package/agents/legal/legal-ip-open-source-agent/harnesses/kiro-ide.agent.md +42 -0
  187. package/agents/legal/legal-ip-open-source-agent/metadata.json +42 -0
  188. package/agents/legal/legal-knowledge-management-agent/AGENT.md +61 -0
  189. package/agents/legal/legal-knowledge-management-agent/harnesses/claude-code.agent.md +42 -0
  190. package/agents/legal/legal-knowledge-management-agent/harnesses/codex.toml +68 -0
  191. package/agents/legal/legal-knowledge-management-agent/harnesses/copilot.agent.md +42 -0
  192. package/agents/legal/legal-knowledge-management-agent/harnesses/cursor.agent.md +42 -0
  193. package/agents/legal/legal-knowledge-management-agent/harnesses/gemini.agent.md +42 -0
  194. package/agents/legal/legal-knowledge-management-agent/harnesses/kiro-cli.agent.json +5 -0
  195. package/agents/legal/legal-knowledge-management-agent/harnesses/kiro-ide.agent.md +42 -0
  196. package/agents/legal/legal-knowledge-management-agent/metadata.json +42 -0
  197. package/agents/legal/legal-litigation-discovery-hold-agent/AGENT.md +61 -0
  198. package/agents/legal/legal-litigation-discovery-hold-agent/harnesses/claude-code.agent.md +42 -0
  199. package/agents/legal/legal-litigation-discovery-hold-agent/harnesses/codex.toml +78 -0
  200. package/agents/legal/legal-litigation-discovery-hold-agent/harnesses/copilot.agent.md +42 -0
  201. package/agents/legal/legal-litigation-discovery-hold-agent/harnesses/cursor.agent.md +42 -0
  202. package/agents/legal/legal-litigation-discovery-hold-agent/harnesses/gemini.agent.md +42 -0
  203. package/agents/legal/legal-litigation-discovery-hold-agent/harnesses/kiro-cli.agent.json +5 -0
  204. package/agents/legal/legal-litigation-discovery-hold-agent/harnesses/kiro-ide.agent.md +42 -0
  205. package/agents/legal/legal-litigation-discovery-hold-agent/metadata.json +42 -0
  206. package/agents/legal/legal-maestro-agent/AGENT.md +78 -0
  207. package/agents/legal/legal-maestro-agent/harnesses/claude-code.agent.md +56 -0
  208. package/agents/legal/legal-maestro-agent/harnesses/codex.toml +61 -0
  209. package/agents/legal/legal-maestro-agent/harnesses/copilot.agent.md +56 -0
  210. package/agents/legal/legal-maestro-agent/harnesses/cursor.agent.md +56 -0
  211. package/agents/legal/legal-maestro-agent/harnesses/gemini.agent.md +56 -0
  212. package/agents/legal/legal-maestro-agent/harnesses/kiro-cli.agent.json +5 -0
  213. package/agents/legal/legal-maestro-agent/harnesses/kiro-ide.agent.md +56 -0
  214. package/agents/legal/legal-maestro-agent/metadata.json +42 -0
  215. package/agents/legal/legal-policy-governance-agent/AGENT.md +61 -0
  216. package/agents/legal/legal-policy-governance-agent/harnesses/claude-code.agent.md +42 -0
  217. package/agents/legal/legal-policy-governance-agent/harnesses/codex.toml +68 -0
  218. package/agents/legal/legal-policy-governance-agent/harnesses/copilot.agent.md +42 -0
  219. package/agents/legal/legal-policy-governance-agent/harnesses/cursor.agent.md +42 -0
  220. package/agents/legal/legal-policy-governance-agent/harnesses/gemini.agent.md +42 -0
  221. package/agents/legal/legal-policy-governance-agent/harnesses/kiro-cli.agent.json +5 -0
  222. package/agents/legal/legal-policy-governance-agent/harnesses/kiro-ide.agent.md +42 -0
  223. package/agents/legal/legal-policy-governance-agent/metadata.json +42 -0
  224. package/agents/legal/legal-privacy-data-protection-agent/AGENT.md +61 -0
  225. package/agents/legal/legal-privacy-data-protection-agent/harnesses/claude-code.agent.md +42 -0
  226. package/agents/legal/legal-privacy-data-protection-agent/harnesses/codex.toml +79 -0
  227. package/agents/legal/legal-privacy-data-protection-agent/harnesses/copilot.agent.md +42 -0
  228. package/agents/legal/legal-privacy-data-protection-agent/harnesses/cursor.agent.md +42 -0
  229. package/agents/legal/legal-privacy-data-protection-agent/harnesses/gemini.agent.md +42 -0
  230. package/agents/legal/legal-privacy-data-protection-agent/harnesses/kiro-cli.agent.json +5 -0
  231. package/agents/legal/legal-privacy-data-protection-agent/harnesses/kiro-ide.agent.md +42 -0
  232. package/agents/legal/legal-privacy-data-protection-agent/metadata.json +42 -0
  233. package/agents/legal/legal-public-disclosure-agent/AGENT.md +61 -0
  234. package/agents/legal/legal-public-disclosure-agent/harnesses/claude-code.agent.md +42 -0
  235. package/agents/legal/legal-public-disclosure-agent/harnesses/codex.toml +69 -0
  236. package/agents/legal/legal-public-disclosure-agent/harnesses/copilot.agent.md +42 -0
  237. package/agents/legal/legal-public-disclosure-agent/harnesses/cursor.agent.md +42 -0
  238. package/agents/legal/legal-public-disclosure-agent/harnesses/gemini.agent.md +42 -0
  239. package/agents/legal/legal-public-disclosure-agent/harnesses/kiro-cli.agent.json +5 -0
  240. package/agents/legal/legal-public-disclosure-agent/harnesses/kiro-ide.agent.md +42 -0
  241. package/agents/legal/legal-public-disclosure-agent/metadata.json +42 -0
  242. package/agents/legal/legal-regulatory-compliance-agent/AGENT.md +61 -0
  243. package/agents/legal/legal-regulatory-compliance-agent/harnesses/claude-code.agent.md +42 -0
  244. package/agents/legal/legal-regulatory-compliance-agent/harnesses/codex.toml +77 -0
  245. package/agents/legal/legal-regulatory-compliance-agent/harnesses/copilot.agent.md +42 -0
  246. package/agents/legal/legal-regulatory-compliance-agent/harnesses/cursor.agent.md +42 -0
  247. package/agents/legal/legal-regulatory-compliance-agent/harnesses/gemini.agent.md +42 -0
  248. package/agents/legal/legal-regulatory-compliance-agent/harnesses/kiro-cli.agent.json +5 -0
  249. package/agents/legal/legal-regulatory-compliance-agent/harnesses/kiro-ide.agent.md +42 -0
  250. package/agents/legal/legal-regulatory-compliance-agent/metadata.json +42 -0
  251. package/agents/legal/legal-vendor-procurement-risk-agent/AGENT.md +61 -0
  252. package/agents/legal/legal-vendor-procurement-risk-agent/harnesses/claude-code.agent.md +42 -0
  253. package/agents/legal/legal-vendor-procurement-risk-agent/harnesses/codex.toml +67 -0
  254. package/agents/legal/legal-vendor-procurement-risk-agent/harnesses/copilot.agent.md +42 -0
  255. package/agents/legal/legal-vendor-procurement-risk-agent/harnesses/cursor.agent.md +42 -0
  256. package/agents/legal/legal-vendor-procurement-risk-agent/harnesses/gemini.agent.md +42 -0
  257. package/agents/legal/legal-vendor-procurement-risk-agent/harnesses/kiro-cli.agent.json +5 -0
  258. package/agents/legal/legal-vendor-procurement-risk-agent/harnesses/kiro-ide.agent.md +42 -0
  259. package/agents/legal/legal-vendor-procurement-risk-agent/metadata.json +42 -0
  260. package/agents/qa/README.md +51 -0
  261. package/agents/qa/ci-test-pipeline-review-agent/AGENT.md +51 -0
  262. package/agents/qa/ci-test-pipeline-review-agent/harnesses/claude-code.agent.md +35 -0
  263. package/agents/qa/ci-test-pipeline-review-agent/harnesses/codex.toml +34 -0
  264. package/agents/qa/ci-test-pipeline-review-agent/harnesses/copilot.agent.md +35 -0
  265. package/agents/qa/ci-test-pipeline-review-agent/harnesses/cursor.agent.md +35 -0
  266. package/agents/qa/ci-test-pipeline-review-agent/harnesses/gemini.agent.md +35 -0
  267. package/agents/qa/ci-test-pipeline-review-agent/harnesses/kiro-cli.agent.json +5 -0
  268. package/agents/qa/ci-test-pipeline-review-agent/harnesses/kiro-ide.agent.md +35 -0
  269. package/agents/qa/ci-test-pipeline-review-agent/metadata.json +33 -0
  270. package/agents/qa/helm-chart-quality-review-agent/AGENT.md +56 -0
  271. package/agents/qa/helm-chart-quality-review-agent/harnesses/claude-code.agent.md +40 -0
  272. package/agents/qa/helm-chart-quality-review-agent/harnesses/codex.toml +39 -0
  273. package/agents/qa/helm-chart-quality-review-agent/harnesses/copilot.agent.md +40 -0
  274. package/agents/qa/helm-chart-quality-review-agent/harnesses/cursor.agent.md +40 -0
  275. package/agents/qa/helm-chart-quality-review-agent/harnesses/gemini.agent.md +40 -0
  276. package/agents/qa/helm-chart-quality-review-agent/harnesses/kiro-cli.agent.json +5 -0
  277. package/agents/qa/helm-chart-quality-review-agent/harnesses/kiro-ide.agent.md +40 -0
  278. package/agents/qa/helm-chart-quality-review-agent/metadata.json +35 -0
  279. package/agents/qa/kubernetes-manifest-quality-review-agent/AGENT.md +55 -0
  280. package/agents/qa/kubernetes-manifest-quality-review-agent/harnesses/claude-code.agent.md +32 -0
  281. package/agents/qa/kubernetes-manifest-quality-review-agent/harnesses/codex.toml +38 -0
  282. package/agents/qa/kubernetes-manifest-quality-review-agent/harnesses/copilot.agent.md +32 -0
  283. package/agents/qa/kubernetes-manifest-quality-review-agent/harnesses/cursor.agent.md +32 -0
  284. package/agents/qa/kubernetes-manifest-quality-review-agent/harnesses/gemini.agent.md +32 -0
  285. package/agents/qa/kubernetes-manifest-quality-review-agent/harnesses/kiro-cli.agent.json +5 -0
  286. package/agents/qa/kubernetes-manifest-quality-review-agent/harnesses/kiro-ide.agent.md +32 -0
  287. package/agents/qa/kubernetes-manifest-quality-review-agent/metadata.json +35 -0
  288. package/agents/qa/llm-ai-pipeline-test-review-agent/AGENT.md +52 -0
  289. package/agents/qa/llm-ai-pipeline-test-review-agent/harnesses/claude-code.agent.md +36 -0
  290. package/agents/qa/llm-ai-pipeline-test-review-agent/harnesses/codex.toml +36 -0
  291. package/agents/qa/llm-ai-pipeline-test-review-agent/harnesses/copilot.agent.md +36 -0
  292. package/agents/qa/llm-ai-pipeline-test-review-agent/harnesses/cursor.agent.md +36 -0
  293. package/agents/qa/llm-ai-pipeline-test-review-agent/harnesses/gemini.agent.md +36 -0
  294. package/agents/qa/llm-ai-pipeline-test-review-agent/harnesses/kiro-cli.agent.json +5 -0
  295. package/agents/qa/llm-ai-pipeline-test-review-agent/harnesses/kiro-ide.agent.md +36 -0
  296. package/agents/qa/llm-ai-pipeline-test-review-agent/metadata.json +35 -0
  297. package/agents/qa/playwright-e2e-execution-run-agent/AGENT.md +50 -0
  298. package/agents/qa/playwright-e2e-execution-run-agent/harnesses/claude-code.agent.md +39 -0
  299. package/agents/qa/playwright-e2e-execution-run-agent/harnesses/cursor.agent.md +39 -0
  300. package/agents/qa/playwright-e2e-execution-run-agent/metadata.json +28 -0
  301. package/agents/qa/playwright-e2e-suite-review-agent/AGENT.md +51 -0
  302. package/agents/qa/playwright-e2e-suite-review-agent/harnesses/claude-code.agent.md +35 -0
  303. package/agents/qa/playwright-e2e-suite-review-agent/harnesses/codex.toml +34 -0
  304. package/agents/qa/playwright-e2e-suite-review-agent/harnesses/copilot.agent.md +35 -0
  305. package/agents/qa/playwright-e2e-suite-review-agent/harnesses/cursor.agent.md +35 -0
  306. package/agents/qa/playwright-e2e-suite-review-agent/harnesses/gemini.agent.md +35 -0
  307. package/agents/qa/playwright-e2e-suite-review-agent/harnesses/kiro-cli.agent.json +5 -0
  308. package/agents/qa/playwright-e2e-suite-review-agent/harnesses/kiro-ide.agent.md +35 -0
  309. package/agents/qa/playwright-e2e-suite-review-agent/metadata.json +35 -0
  310. package/agents/qa/plc-control-logic-safety-review-agent/AGENT.md +53 -0
  311. package/agents/qa/plc-control-logic-safety-review-agent/harnesses/claude-code.agent.md +37 -0
  312. package/agents/qa/plc-control-logic-safety-review-agent/harnesses/codex.toml +36 -0
  313. package/agents/qa/plc-control-logic-safety-review-agent/harnesses/copilot.agent.md +37 -0
  314. package/agents/qa/plc-control-logic-safety-review-agent/harnesses/cursor.agent.md +37 -0
  315. package/agents/qa/plc-control-logic-safety-review-agent/harnesses/gemini.agent.md +37 -0
  316. package/agents/qa/plc-control-logic-safety-review-agent/harnesses/kiro-cli.agent.json +5 -0
  317. package/agents/qa/plc-control-logic-safety-review-agent/harnesses/kiro-ide.agent.md +37 -0
  318. package/agents/qa/plc-control-logic-safety-review-agent/metadata.json +33 -0
  319. package/agents/qa/rpa-workflow-resilience-review-agent/AGENT.md +52 -0
  320. package/agents/qa/rpa-workflow-resilience-review-agent/harnesses/claude-code.agent.md +36 -0
  321. package/agents/qa/rpa-workflow-resilience-review-agent/harnesses/codex.toml +35 -0
  322. package/agents/qa/rpa-workflow-resilience-review-agent/harnesses/copilot.agent.md +36 -0
  323. package/agents/qa/rpa-workflow-resilience-review-agent/harnesses/cursor.agent.md +36 -0
  324. package/agents/qa/rpa-workflow-resilience-review-agent/harnesses/gemini.agent.md +36 -0
  325. package/agents/qa/rpa-workflow-resilience-review-agent/harnesses/kiro-cli.agent.json +5 -0
  326. package/agents/qa/rpa-workflow-resilience-review-agent/harnesses/kiro-ide.agent.md +36 -0
  327. package/agents/qa/rpa-workflow-resilience-review-agent/metadata.json +34 -0
  328. package/agents/qa/test-coverage-quality-review-agent/AGENT.md +50 -0
  329. package/agents/qa/test-coverage-quality-review-agent/harnesses/claude-code.agent.md +34 -0
  330. package/agents/qa/test-coverage-quality-review-agent/harnesses/codex.toml +33 -0
  331. package/agents/qa/test-coverage-quality-review-agent/harnesses/copilot.agent.md +34 -0
  332. package/agents/qa/test-coverage-quality-review-agent/harnesses/cursor.agent.md +34 -0
  333. package/agents/qa/test-coverage-quality-review-agent/harnesses/gemini.agent.md +34 -0
  334. package/agents/qa/test-coverage-quality-review-agent/harnesses/kiro-cli.agent.json +5 -0
  335. package/agents/qa/test-coverage-quality-review-agent/harnesses/kiro-ide.agent.md +34 -0
  336. package/agents/qa/test-coverage-quality-review-agent/metadata.json +33 -0
  337. package/agents/qa/test-flakiness-triage-agent/AGENT.md +52 -0
  338. package/agents/qa/test-flakiness-triage-agent/harnesses/claude-code.agent.md +36 -0
  339. package/agents/qa/test-flakiness-triage-agent/harnesses/codex.toml +33 -0
  340. package/agents/qa/test-flakiness-triage-agent/harnesses/copilot.agent.md +36 -0
  341. package/agents/qa/test-flakiness-triage-agent/harnesses/cursor.agent.md +36 -0
  342. package/agents/qa/test-flakiness-triage-agent/harnesses/gemini.agent.md +36 -0
  343. package/agents/qa/test-flakiness-triage-agent/harnesses/kiro-cli.agent.json +5 -0
  344. package/agents/qa/test-flakiness-triage-agent/harnesses/kiro-ide.agent.md +36 -0
  345. package/agents/qa/test-flakiness-triage-agent/metadata.json +33 -0
  346. package/catalog/agents.json +2659 -1641
  347. package/catalog/asset-integrity.json +5923 -3938
  348. package/catalog/install-roles.json +70 -1
  349. package/catalog/skill-manifest.json +395 -0
  350. package/catalog/skills.json +1153 -729
  351. package/package.json +5 -2
  352. package/plugins/vanguard-frontier-agentic/.codex-plugin/plugin.json +1 -1
  353. package/scripts/generate-readme-counts.mjs +162 -0
  354. package/skills/cross-functional/legal-hr-case-capsule/README.md +45 -0
  355. package/skills/cross-functional/legal-hr-case-capsule/SKILL.md +79 -0
  356. package/skills/cross-functional/legal-hr-case-capsule/metadata.json +19 -0
  357. package/skills/cross-functional/legal-hr-case-capsule/references/capsule-schema.md +110 -0
  358. package/skills/cross-functional/legal-hr-risk-taxonomy/README.md +97 -0
  359. package/skills/cross-functional/legal-hr-risk-taxonomy/SKILL.md +89 -0
  360. package/skills/cross-functional/legal-hr-risk-taxonomy/metadata.json +19 -0
  361. package/skills/cross-functional/legal-hr-risk-taxonomy/references/risk-labels.md +91 -0
  362. package/skills/cross-functional/legal-hr-routing-protocol/README.md +68 -0
  363. package/skills/cross-functional/legal-hr-routing-protocol/SKILL.md +92 -0
  364. package/skills/cross-functional/legal-hr-routing-protocol/metadata.json +19 -0
  365. package/skills/cross-functional/legal-hr-routing-protocol/references/handoff-matrix.md +48 -0
  366. package/skills/hr/hr-risk-triage-review/SKILL.md +60 -0
  367. package/skills/hr/hr-risk-triage-review/metadata.json +22 -0
  368. package/skills/hr/hr-risk-triage-review/references/jurisdictions/australia.md +111 -0
  369. package/skills/hr/hr-risk-triage-review/references/jurisdictions/eu.md +97 -0
  370. package/skills/hr/hr-risk-triage-review/references/jurisdictions/singapore.md +102 -0
  371. package/skills/hr/hr-risk-triage-review/references/jurisdictions/uk.md +100 -0
  372. package/skills/hr/hr-risk-triage-review/references/jurisdictions/us.md +100 -0
  373. package/skills/hr/hr-risk-triage-review/references/workflow-and-output.md +176 -0
  374. package/skills/legal/legal-counsel-review/SKILL.md +50 -0
  375. package/skills/legal/legal-counsel-review/metadata.json +22 -0
  376. package/skills/legal/legal-counsel-review/references/jurisdictions/australia.md +86 -0
  377. package/skills/legal/legal-counsel-review/references/jurisdictions/eu.md +77 -0
  378. package/skills/legal/legal-counsel-review/references/jurisdictions/singapore.md +76 -0
  379. package/skills/legal/legal-counsel-review/references/jurisdictions/uk.md +81 -0
  380. package/skills/legal/legal-counsel-review/references/jurisdictions/us.md +100 -0
  381. package/skills/legal/legal-counsel-review/references/workflow-and-output.md +148 -0
  382. package/skills/qa/ci-test-pipeline-review/SKILL.md +45 -0
  383. package/skills/qa/ci-test-pipeline-review/metadata.json +21 -0
  384. package/skills/qa/ci-test-pipeline-review/references/workflow-and-output.md +124 -0
  385. package/skills/qa/helm-chart-quality-review/SKILL.md +61 -0
  386. package/skills/qa/helm-chart-quality-review/metadata.json +23 -0
  387. package/skills/qa/helm-chart-quality-review/references/workflow-and-output.md +174 -0
  388. package/skills/qa/kubernetes-manifest-quality-review/SKILL.md +92 -0
  389. package/skills/qa/kubernetes-manifest-quality-review/metadata.json +23 -0
  390. package/skills/qa/kubernetes-manifest-quality-review/references/workflow-and-output.md +246 -0
  391. package/skills/qa/llm-ai-pipeline-test-review/SKILL.md +52 -0
  392. package/skills/qa/llm-ai-pipeline-test-review/metadata.json +23 -0
  393. package/skills/qa/llm-ai-pipeline-test-review/references/workflow-and-output.md +221 -0
  394. package/skills/qa/playwright-e2e-execution-run/SKILL.md +54 -0
  395. package/skills/qa/playwright-e2e-execution-run/metadata.json +24 -0
  396. package/skills/qa/playwright-e2e-execution-run/references/workflow-and-output.md +133 -0
  397. package/skills/qa/playwright-e2e-suite-review/SKILL.md +44 -0
  398. package/skills/qa/playwright-e2e-suite-review/metadata.json +23 -0
  399. package/skills/qa/playwright-e2e-suite-review/references/workflow-and-output.md +176 -0
  400. package/skills/qa/plc-control-logic-safety-review/SKILL.md +47 -0
  401. package/skills/qa/plc-control-logic-safety-review/metadata.json +21 -0
  402. package/skills/qa/plc-control-logic-safety-review/references/workflow-and-output.md +231 -0
  403. package/skills/qa/rpa-workflow-resilience-review/SKILL.md +47 -0
  404. package/skills/qa/rpa-workflow-resilience-review/metadata.json +22 -0
  405. package/skills/qa/rpa-workflow-resilience-review/references/workflow-and-output.md +210 -0
  406. package/skills/qa/test-coverage-quality-review/SKILL.md +44 -0
  407. package/skills/qa/test-coverage-quality-review/metadata.json +21 -0
  408. package/skills/qa/test-coverage-quality-review/references/workflow-and-output.md +139 -0
  409. package/skills/qa/test-flakiness-triage/SKILL.md +43 -0
  410. package/skills/qa/test-flakiness-triage/metadata.json +21 -0
  411. package/skills/qa/test-flakiness-triage/references/workflow-and-output.md +114 -0
  412. package/tests/eval-qa-cluster.mjs +111 -0
  413. package/tests/fixtures/hr-maestro-routing/expected/01-employee-relations.json +6 -0
  414. package/tests/fixtures/hr-maestro-routing/expected/02-workplace-investigations.json +6 -0
  415. package/tests/fixtures/hr-maestro-routing/expected/03-performance-management.json +6 -0
  416. package/tests/fixtures/hr-maestro-routing/expected/04-termination-readiness.json +6 -0
  417. package/tests/fixtures/hr-maestro-routing/expected/05-leave-accommodation.json +6 -0
  418. package/tests/fixtures/hr-maestro-routing/expected/06-recruiting-selection.json +6 -0
  419. package/tests/fixtures/hr-maestro-routing/expected/07-compensation-equity.json +6 -0
  420. package/tests/fixtures/hr-maestro-routing/expected/08-benefits-payroll.json +6 -0
  421. package/tests/fixtures/hr-maestro-routing/expected/09-workforce-planning-rif.json +6 -0
  422. package/tests/fixtures/hr-maestro-routing/expected/10-learning-policy.json +6 -0
  423. package/tests/fixtures/hr-maestro-routing/expected/11-analytics-people-data.json +6 -0
  424. package/tests/fixtures/hr-maestro-routing/expected/12-culture-dei.json +6 -0
  425. package/tests/fixtures/hr-maestro-routing/expected/13-hris-process-controls.json +6 -0
  426. package/tests/fixtures/hr-maestro-routing/expected/14-ambiguous.json +4 -0
  427. package/tests/fixtures/hr-maestro-routing/inputs/01-employee-relations.json +7 -0
  428. package/tests/fixtures/hr-maestro-routing/inputs/02-workplace-investigations.json +7 -0
  429. package/tests/fixtures/hr-maestro-routing/inputs/03-performance-management.json +7 -0
  430. package/tests/fixtures/hr-maestro-routing/inputs/04-termination-readiness.json +7 -0
  431. package/tests/fixtures/hr-maestro-routing/inputs/05-leave-accommodation.json +7 -0
  432. package/tests/fixtures/hr-maestro-routing/inputs/06-recruiting-selection.json +7 -0
  433. package/tests/fixtures/hr-maestro-routing/inputs/07-compensation-equity.json +7 -0
  434. package/tests/fixtures/hr-maestro-routing/inputs/08-benefits-payroll.json +7 -0
  435. package/tests/fixtures/hr-maestro-routing/inputs/09-workforce-planning-rif.json +7 -0
  436. package/tests/fixtures/hr-maestro-routing/inputs/10-learning-policy.json +7 -0
  437. package/tests/fixtures/hr-maestro-routing/inputs/11-analytics-people-data.json +7 -0
  438. package/tests/fixtures/hr-maestro-routing/inputs/12-culture-dei.json +7 -0
  439. package/tests/fixtures/hr-maestro-routing/inputs/13-hris-process-controls.json +7 -0
  440. package/tests/fixtures/hr-maestro-routing/inputs/14-ambiguous.json +7 -0
  441. package/tests/fixtures/hr-maestro-routing/taxonomy.json +59 -0
  442. package/tests/fixtures/legal-maestro-routing/expected/01-contract-review.json +6 -0
  443. package/tests/fixtures/legal-maestro-routing/expected/02-privacy-data-protection.json +6 -0
  444. package/tests/fixtures/legal-maestro-routing/expected/03-employment-law-risk.json +6 -0
  445. package/tests/fixtures/legal-maestro-routing/expected/04-litigation-discovery-hold.json +6 -0
  446. package/tests/fixtures/legal-maestro-routing/expected/05-regulatory-compliance.json +6 -0
  447. package/tests/fixtures/legal-maestro-routing/expected/06-ip-open-source.json +6 -0
  448. package/tests/fixtures/legal-maestro-routing/expected/07-vendor-procurement-risk.json +6 -0
  449. package/tests/fixtures/legal-maestro-routing/expected/08-ethics-investigations.json +6 -0
  450. package/tests/fixtures/legal-maestro-routing/expected/09-policy-governance.json +6 -0
  451. package/tests/fixtures/legal-maestro-routing/expected/10-public-disclosure.json +6 -0
  452. package/tests/fixtures/legal-maestro-routing/expected/11-knowledge-management.json +6 -0
  453. package/tests/fixtures/legal-maestro-routing/expected/12-ambiguous.json +4 -0
  454. package/tests/fixtures/legal-maestro-routing/inputs/01-contract-review.json +7 -0
  455. package/tests/fixtures/legal-maestro-routing/inputs/02-privacy-data-protection.json +7 -0
  456. package/tests/fixtures/legal-maestro-routing/inputs/03-employment-law-risk.json +7 -0
  457. package/tests/fixtures/legal-maestro-routing/inputs/04-litigation-discovery-hold.json +7 -0
  458. package/tests/fixtures/legal-maestro-routing/inputs/05-regulatory-compliance.json +7 -0
  459. package/tests/fixtures/legal-maestro-routing/inputs/06-ip-open-source.json +7 -0
  460. package/tests/fixtures/legal-maestro-routing/inputs/07-vendor-procurement-risk.json +7 -0
  461. package/tests/fixtures/legal-maestro-routing/inputs/08-ethics-investigations.json +7 -0
  462. package/tests/fixtures/legal-maestro-routing/inputs/09-policy-governance.json +7 -0
  463. package/tests/fixtures/legal-maestro-routing/inputs/10-public-disclosure.json +7 -0
  464. package/tests/fixtures/legal-maestro-routing/inputs/11-knowledge-management.json +7 -0
  465. package/tests/fixtures/legal-maestro-routing/inputs/12-ambiguous.json +7 -0
  466. package/tests/fixtures/legal-maestro-routing/taxonomy.json +51 -0
  467. package/tests/validate-readme-counts.mjs +179 -0
@@ -0,0 +1,92 @@
1
+ ---
2
+ name: kubernetes-manifest-quality-review
3
+ description: Use this skill when the user provides raw Kubernetes YAML manifests or asks to review K8s manifests for quality, security, or policy compliance — covering Deployment, StatefulSet, DaemonSet, Service, Ingress, NetworkPolicy, RBAC, and CRD resources.
4
+ allowed-tools: Read Grep Glob
5
+ metadata:
6
+ author: "github: Raishin"
7
+ version: "0.1.0"
8
+ updated: "2026-05-17"
9
+ category: delivery
10
+ lifecycle: experimental
11
+ ---
12
+
13
+ # Kubernetes Manifest Quality Review
14
+
15
+ ## Purpose
16
+
17
+ This skill reviews raw Kubernetes YAML manifests for quality, security, and policy-compliance defects. It covers Deployment, StatefulSet, DaemonSet, Service, Ingress, NetworkPolicy, RBAC, and CRD resources. The review is entirely static — it reads YAML files and never applies manifests to a cluster, never contacts the Kubernetes API, and never requests kubeconfig, service account tokens, or cloud credentials.
18
+
19
+ ## Lean operating rules
20
+
21
+ ### Schema and structure
22
+
23
+ - `apiVersion` or `kind` missing — CRITICAL: the manifest cannot be applied; flag and stop review of that resource.
24
+ - Removed or deprecated API versions (e.g., `extensions/v1beta1` Ingress, `networking.k8s.io/v1beta1` Ingress, `policy/v1beta1` PodSecurityPolicy) — HIGH: clusters at or above the removal release reject these manifests.
25
+ - Missing required labels (`app`, `app.kubernetes.io/name`, `app.kubernetes.io/version`) on Pods and workload controllers — MEDIUM: impairs observability, selector targeting, and policy enforcement.
26
+ - No `namespace` specified (reliance on default namespace) — MEDIUM: encourages lateral movement and policy bypass; everything should be explicitly namespaced.
27
+
28
+ ### Pod security (Pod Security Standards)
29
+
30
+ - `securityContext.runAsUser: 0` on a container, or no `runAsNonRoot: true` at pod or container level — HIGH: processes can run as UID 0 inside the container.
31
+ - `privileged: true` on a container security context — CRITICAL: the container has near-host-root access.
32
+ - `allowPrivilegeEscalation: true` or field absent (when unset it is effectively `true`, and it is always `true` for containers that are privileged or have `CAP_SYS_ADMIN`) — HIGH: child processes can gain more privileges than the parent.
33
+ - `hostNetwork: true`, `hostPID: true`, `hostIPC: true` on the pod spec — CRITICAL: the pod shares the host network stack, process table, or IPC namespace, enabling broad host compromise.
34
+ - `capabilities.add` containing `SYS_ADMIN`, `NET_ADMIN`, `ALL`, `SYS_PTRACE`, or `DAC_OVERRIDE` — CRITICAL: these capabilities provide near-root privilege; drop all capabilities and add only what is specifically required.
35
+ - `readOnlyRootFilesystem: false` or field absent on a container — MEDIUM: a writable root filesystem makes container compromise easier; set to `true` and use `emptyDir` or volume mounts for mutable paths.
36
+ - `seccompProfile` absent at pod or container level — MEDIUM: no syscall filtering, increasing the kernel attack surface; use `RuntimeDefault` or a custom profile.
37
+
38
+ ### Image hygiene
39
+
40
+ - Image tag is `:latest` or absent — HIGH: non-reproducible deployments; a rollout can silently pull a different image than what was tested.
41
+ - No image digest pinning for production manifests — MEDIUM: tag mutability allows supply-chain substitution; prefer `image@sha256:<digest>`.
42
+ - Image pulled from an unverified public registry (e.g., Docker Hub) without a digest — MEDIUM: arbitrary public images without integrity verification; `imagePullPolicy` controls when an image is pulled, not whether its content is verified.
43
+
44
+ ### Resource governance
45
+
46
+ - `resources.requests` and `resources.limits` both absent on a container — HIGH: the pod gets BestEffort QoS, is scheduled without regard to node capacity, is evicted first under node pressure, and can starve co-located workloads.
47
+ - Memory limit set without a CPU limit — MEDIUM: CPU use is unbounded, so the container can consume all spare node CPU and slow co-located workloads with no visible error.
48
+ - Ephemeral storage limit absent on containers known to produce logs or temp files — LOW: unbounded ephemeral storage can exhaust node disk and evict other pods.
49
+
50
+ ### Health probes
51
+
52
+ - `livenessProbe` missing — HIGH: the kubelet restarts a container only when its process exits, so it cannot detect deadlocks or hangs in which the process stays alive.
53
+ - `readinessProbe` missing — HIGH: the pod is added to Service endpoints as soon as its containers start, so it receives traffic before the application is ready, causing errors during startup and rolling updates.
54
+ - Probe using `exec` command with no `timeoutSeconds` specified — MEDIUM: exec probes default to a 1-second timeout; a slow command silently causes probe failures and restarts.
55
+
56
+ ### Networking and exposure
57
+
58
+ - Service type `LoadBalancer` or `NodePort` without a comment or annotation documenting the business justification — MEDIUM: these expose services externally or on every node port; ClusterIP is sufficient for internal services.
59
+ - Ingress resource with no TLS block configured — HIGH: traffic between the client and the ingress controller is unencrypted.
60
+ - No `NetworkPolicy` resource restricts pod ingress or egress in the namespace — MEDIUM: the default Kubernetes network model is allow-all; without a NetworkPolicy every pod can reach every other pod.
61
+ - Ingress annotation that injects raw controller configuration or forwards untrusted client-supplied headers to backends (for ingress-nginx, `nginx.ingress.kubernetes.io/configuration-snippet` or `nginx.ingress.kubernetes.io/server-snippet`) — CRITICAL: enables SSRF and header injection.
62
+
63
+ ### RBAC and service accounts
64
+
65
+ - `ClusterRole` with verb `*` on resource `*` or on `secrets` — CRITICAL: any principal bound to this role has full read/write access to the cluster, or to every Secret in it.
66
+ - `RoleBinding` or `ClusterRoleBinding` whose subject is `system:anonymous` or `system:unauthenticated` — CRITICAL: unauthenticated callers inherit these permissions.
67
+ - `automountServiceAccountToken: true` (or field absent, which defaults to `true`) on pods that do not contact the Kubernetes API — HIGH: the token is mounted at a known path and exploitable if the container is compromised.
68
+ - RBAC role granting `get` or `list` on `secrets` beyond what the workload demonstrably needs — HIGH: broadens blast radius of a credential compromise.
69
+
70
+ ### Secrets and config
71
+
72
+ - Plaintext credentials (passwords, tokens, connection strings) in `env.value` on a container or in `ConfigMap.data` — CRITICAL: credentials visible in manifests committed to source control or stored in etcd in plaintext.
73
+ - `Secret` with `type: Opaque` and a base64-encoded value that decodes to an empty string — MEDIUM: placeholder secret that will cause application startup failures and suggests secrets management is not wired up.
74
+
75
+ ## References
76
+
77
+ Load these only when needed:
78
+ - [Workflow and output contract](references/workflow-and-output.md) — use when executing the full review or formatting the final answer.
79
+
80
+ ## Response minimum
81
+
82
+ Return, at minimum:
83
+ - Schema and API version findings
84
+ - Pod security findings (PSS Restricted/Baseline comparison)
85
+ - Image hygiene findings
86
+ - Resource governance findings
87
+ - Health probe findings
88
+ - Networking and exposure findings
89
+ - RBAC and service account findings
90
+ - Secrets and config findings
91
+ - Severity-labelled finding list (CRITICAL / HIGH / MEDIUM / LOW)
92
+ - Safe next actions
@@ -0,0 +1,23 @@
1
+ {
2
+ "id": "kubernetes-manifest-quality-review",
3
+ "name": "Kubernetes Manifest Quality Review",
4
+ "type": "skill",
5
+ "provider": "generic",
6
+ "harnesses": ["codex", "claude-code", "cursor", "gemini", "kiro", "other"],
7
+ "summary": "Review raw Kubernetes YAML manifests for security, quality, and policy defects — deprecated APIs, missing securityContext, absent resource limits, missing health probes, RBAC over-permission, plaintext secrets, and network exposure — statically, without applying manifests or contacting a cluster.",
8
+ "source_type": "original",
9
+ "official_docs": [
10
+ "https://kubernetes.io/docs/concepts/security/pod-security-standards/",
11
+ "https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/",
12
+ "https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/",
13
+ "https://kubernetes.io/docs/reference/access-authn-authz/rbac/",
14
+ "https://kubernetes.io/docs/concepts/services-networking/network-policies/",
15
+ "https://github.com/yannh/kubeconform",
16
+ "https://github.com/zegl/kube-score"
17
+ ],
18
+ "security_notes": "Static review only — reads manifest YAML files, never applies manifests to a cluster, never connects to the Kubernetes API, and never requests kubeconfig, service account tokens, or cloud credentials. Do not accept manifests containing real secret values or connection strings decoded from base64; ask for sanitized versions with placeholder values.",
19
+ "last_verified": "2026-05-17",
20
+ "path": "skills/qa/kubernetes-manifest-quality-review",
21
+ "author": "github: Raishin",
22
+ "version": "0.1.0"
23
+ }
@@ -0,0 +1,246 @@
1
+ # Workflow and Output Contract
2
+
3
+ ## Workflow
4
+
5
+ ### Step 1 — Collect inputs
6
+
7
+ Ask the user to provide one or more of the following as sanitized files (no real secret values, no kubeconfig, no service account tokens, no cloud credentials — replace sensitive values with placeholders):
8
+ - Workload manifests: Deployment, StatefulSet, DaemonSet YAML
9
+ - Service and Ingress YAML
10
+ - NetworkPolicy YAML
11
+ - RBAC resources: Role, ClusterRole, RoleBinding, ClusterRoleBinding YAML
12
+ - CRD definitions if relevant
13
+ - Any Kustomize base and overlay files
14
+
15
+ If NetworkPolicy resources are not provided, label the egress/ingress audit findings as `inference`, say so in the output, and ask for the missing files.
16
+
17
+ ### Step 2 — Schema and API version audit
18
+
19
+ Validate that every manifest has `apiVersion` and `kind` present. Check for deprecated or removed API versions:
20
+
21
+ ```yaml
22
+ # HIGH — removed in Kubernetes 1.22
23
+ apiVersion: extensions/v1beta1
24
+ kind: Ingress
25
+
26
+ # HIGH — networking.k8s.io/v1beta1 Ingress removed in 1.22
27
+ apiVersion: networking.k8s.io/v1beta1
28
+ kind: Ingress
29
+
30
+ # HIGH — policy/v1beta1 PodSecurityPolicy removed in 1.25
31
+ apiVersion: policy/v1beta1
32
+ kind: PodSecurityPolicy
33
+ ```
34
+
35
+ Check that required labels are present on Pod templates and workload controllers: `app`, `app.kubernetes.io/name`, `app.kubernetes.io/version`. Flag missing `namespace` on all resources.
36
+
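The Step 2 checks can be sketched as a small function. This is a minimal illustration, not part of the skill: it assumes the manifest has already been parsed into a Python dictionary, and the names `REMOVED_APIS` and `audit_schema` are hypothetical. The removal versions come from the examples above.

```python
# Removed (apiVersion, kind) pairs, taken from the Step 2 examples.
REMOVED_APIS = {
    ("extensions/v1beta1", "Ingress"): "1.22",
    ("networking.k8s.io/v1beta1", "Ingress"): "1.22",
    ("policy/v1beta1", "PodSecurityPolicy"): "1.25",
}


def audit_schema(manifest: dict) -> list[tuple[str, str]]:
    """Return (severity, message) findings for one parsed manifest."""
    api_version = manifest.get("apiVersion")
    kind = manifest.get("kind")
    if not api_version or not kind:
        # The resource cannot be applied, so stop reviewing it here.
        return [("CRITICAL", "apiVersion or kind missing")]
    findings = []
    removed_in = REMOVED_APIS.get((api_version, kind))
    if removed_in:
        findings.append(
            ("HIGH", f"{api_version} {kind} removed in Kubernetes {removed_in}")
        )
    # Simplification: cluster-scoped kinds would need an exemption here.
    if "namespace" not in (manifest.get("metadata") or {}):
        findings.append(("MEDIUM", "no namespace specified"))
    return findings
```

Label checks are left out to keep the sketch short; they would follow the same pattern against `metadata.labels`.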
37
+ ### Step 3 — Pod security audit (PSS Restricted/Baseline comparison)
38
+
39
+ Evaluate each Pod spec against the Pod Security Standards Restricted profile:
40
+
41
+ ```yaml
42
+ # CRITICAL — privileged container
43
+ securityContext:
44
+   privileged: true
45
+
46
+ # CRITICAL — host namespaces
47
+ hostNetwork: true
48
+ hostPID: true
49
+ hostIPC: true
50
+
51
+ # HIGH — runs as root (runAsUser: 0) or missing runAsNonRoot
52
+ securityContext:
53
+   runAsUser: 0
54
+   # or: runAsNonRoot absent
55
+
56
+ # HIGH — allowPrivilegeEscalation unset or true
57
+ securityContext:
58
+   allowPrivilegeEscalation: true
59
+
60
+ # CRITICAL — dangerous capabilities
61
+ securityContext:
62
+   capabilities:
63
+     add: ["SYS_ADMIN"]
64
+
65
+ # MEDIUM — writable root filesystem
66
+ securityContext:
67
+   readOnlyRootFilesystem: false
68
+   # or: field absent
69
+
70
+ # MEDIUM — no seccomp profile
71
+ securityContext:
72
+   # seccompProfile absent
73
+ ```
74
+
75
+ For each container in the pod, note whether the field is set at the pod level, the container level, or both. Container-level settings override pod-level settings.
76
+
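The override rule above can be sketched as a merge followed by the Step 3 checks. This is an illustrative simplification under stated assumptions: the function names are hypothetical, the pod spec and container are already parsed into dictionaries, and the merge treats every field as if it could appear at both levels, although Kubernetes accepts fields such as `privileged` and `allowPrivilegeEscalation` only at container level.

```python
# Capability names listed in the pod security rules of SKILL.md.
DANGEROUS_CAPS = {"SYS_ADMIN", "NET_ADMIN", "ALL", "SYS_PTRACE", "DAC_OVERRIDE"}


def effective_security_context(pod_spec: dict, container: dict) -> dict:
    """Merge pod-level and container-level securityContext; container wins."""
    merged = dict(pod_spec.get("securityContext") or {})
    merged.update(container.get("securityContext") or {})
    return merged


def audit_container_security(pod_spec: dict, container: dict) -> list[tuple[str, str]]:
    ctx = effective_security_context(pod_spec, container)
    findings = []
    if ctx.get("privileged") is True:
        findings.append(("CRITICAL", "privileged container"))
    added = set((ctx.get("capabilities") or {}).get("add") or [])
    if added & DANGEROUS_CAPS:
        findings.append(("CRITICAL", "dangerous capabilities added"))
    if ctx.get("runAsUser") == 0 or ctx.get("runAsNonRoot") is not True:
        findings.append(("HIGH", "may run as root"))
    if ctx.get("allowPrivilegeEscalation") is not False:
        findings.append(("HIGH", "allowPrivilegeEscalation unset or true"))
    if ctx.get("readOnlyRootFilesystem") is not True:
        findings.append(("MEDIUM", "writable root filesystem"))
    if "seccompProfile" not in ctx:
        findings.append(("MEDIUM", "no seccomp profile"))
    return findings
```

Merging first means a container that sets `runAsNonRoot: false` is still flagged even when the pod-level context sets it to `true`.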
77
+ ### Step 4 — Image hygiene audit
78
+
79
+ Check every container and init container image reference:
80
+
81
+ ```yaml
82
+ # HIGH — mutable tag, non-reproducible
83
+ image: nginx:latest
84
+ image: myapp # tag absent
85
+
86
+ # MEDIUM — no digest pinning
87
+ image: nginx:1.25.3 # tag present but no @sha256 digest
88
+
89
+ # MEDIUM — unverified public registry, no digest
90
+ image: docker.io/library/nginx:1.25.3
91
+ ```
92
+
93
+ For production-grade manifests, recommend digest-pinned images:
94
+ ```yaml
95
+ image: nginx:1.25.3@sha256:<digest>
96
+ ```
97
+
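The image checks in Step 4 come down to parsing the image reference. The sketch below is one possible way to do that with string operations only; `classify_image` is a hypothetical name, and the registry-trust check is omitted because it depends on an allowlist the source does not define.

```python
def classify_image(image: str) -> list[tuple[str, str]]:
    """Return (severity, message) findings for one container image reference."""
    name, _, digest = image.partition("@")
    # Only the last path segment can carry a tag; an earlier colon is a registry port.
    last_segment = name.rsplit("/", 1)[-1]
    _, has_tag, tag = last_segment.partition(":")
    findings = []
    if not digest:
        if not has_tag or tag == "latest":
            findings.append(("HIGH", "mutable or missing tag"))
        findings.append(("MEDIUM", "no digest pinning"))
    return findings
```

A reference that carries a digest is treated as pinned even if it also has a tag, since the digest determines which image is pulled.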
98
+ ### Step 5 — Resource governance audit
99
+
100
+ Check every container for `resources.requests` and `resources.limits`:
101
+
102
+ ```yaml
103
+ # HIGH — no requests or limits
104
+ containers:
105
+ - name: app
106
+   image: myapp:1.0.0
107
+   # resources absent
108
+
109
+ # MEDIUM — memory limit set without CPU limit
110
+ resources:
111
+   limits:
112
+     memory: 512Mi
113
+   requests:
114
+     cpu: 100m
115
+     memory: 256Mi
116
+   # limits.cpu absent
117
+ ```
118
+
119
+ Check for ephemeral storage limits on containers known to produce log output or temporary files.
120
+
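A minimal sketch of the Step 5 checks, assuming a container entry already parsed into a dictionary. The function name and the `writes_temp_files` flag are hypothetical: the flag stands in for the reviewer's judgement about which containers are "known to produce log output or temporary files", which cannot be derived from the manifest alone.

```python
def audit_resources(container: dict, writes_temp_files: bool = False) -> list[tuple[str, str]]:
    """Return (severity, message) findings for one container's resources block."""
    resources = container.get("resources") or {}
    requests = resources.get("requests") or {}
    limits = resources.get("limits") or {}
    if not requests and not limits:
        return [("HIGH", "no requests or limits")]
    findings = []
    if "memory" in limits and "cpu" not in limits:
        findings.append(("MEDIUM", "memory limit without CPU limit"))
    if writes_temp_files and "ephemeral-storage" not in limits:
        findings.append(("LOW", "no ephemeral-storage limit"))
    return findings
```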
121
+ ### Step 6 — Health probe audit
122
+
123
+ Check every container for `livenessProbe` and `readinessProbe`:
124
+
125
+ ```yaml
126
+ # HIGH — missing livenessProbe
127
+ containers:
128
+ - name: app
129
+   # livenessProbe absent
130
+
131
+ # HIGH — missing readinessProbe
132
+ containers:
133
+ - name: app
134
+   # readinessProbe absent
135
+
136
+ # MEDIUM — exec probe with no timeoutSeconds
137
+ livenessProbe:
138
+   exec:
139
+     command: ["/bin/check"]
140
+   # timeoutSeconds absent, defaults to 1 second
141
+ ```
142
+
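The probe checks in Step 6 can be sketched as follows. This is an illustration under the same assumption as before (a container entry parsed into a dictionary); `audit_probes` is a hypothetical name. The exec-timeout check is also applied to `startupProbe`, which is an assumption of this sketch rather than something the step states.

```python
def audit_probes(container: dict) -> list[tuple[str, str]]:
    """Return (severity, message) findings for one container's probes."""
    findings = []
    for probe_name in ("livenessProbe", "readinessProbe", "startupProbe"):
        probe = container.get(probe_name)
        if probe is None:
            # Only liveness and readiness probes are required by the rules.
            if probe_name != "startupProbe":
                findings.append(("HIGH", f"{probe_name} missing"))
            continue
        if "exec" in probe and "timeoutSeconds" not in probe:
            findings.append(
                ("MEDIUM", f"{probe_name} exec probe relies on the 1-second default timeout")
            )
    return findings
```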
143
+ ### Step 7 — Networking and exposure audit
144
+
145
+ Review Service types, Ingress TLS, NetworkPolicy coverage, and Ingress annotations:
146
+
147
+ ```yaml
148
+ # MEDIUM — external exposure without documented justification
149
+ kind: Service
150
+ spec:
151
+   type: LoadBalancer  # or NodePort
152
+
153
+ # HIGH — Ingress without TLS
154
+ kind: Ingress
155
+ spec:
156
+   # tls block absent
157
+
158
+ # MEDIUM — no NetworkPolicy found in namespace (default allow-all)
159
+
160
+ # CRITICAL — Ingress annotation that injects raw NGINX configuration
161
+ metadata:
162
+   annotations:
163
+     nginx.ingress.kubernetes.io/configuration-snippet: "<raw NGINX directives>"
164
+ ```
165
+
166
+ If no NetworkPolicy resources are provided for the namespace, state that the default-allow posture is inferred and ask for NetworkPolicy files.
167
+
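The Step 7 checks operate on the whole set of provided manifests rather than one resource. The sketch below assumes a list of parsed manifest dictionaries. `audit_exposure` and the annotation key `example.com/exposure-justification` are hypothetical; the source asks only for "a comment or annotation" and names no key, and YAML comments do not survive parsing.

```python
# Hypothetical annotation key recording why a Service is exposed.
JUSTIFICATION_KEY = "example.com/exposure-justification"
WORKLOAD_KINDS = {"Deployment", "StatefulSet", "DaemonSet"}


def audit_exposure(manifests: list[dict]) -> list[tuple[str, str]]:
    """Return (severity, message) findings across all provided manifests."""
    findings = []
    policy_namespaces = set()
    workload_namespaces = set()
    for manifest in manifests:
        kind = manifest.get("kind")
        meta = manifest.get("metadata") or {}
        spec = manifest.get("spec") or {}
        name = meta.get("name", "<unnamed>")
        namespace = meta.get("namespace", "default")
        if kind == "NetworkPolicy":
            policy_namespaces.add(namespace)
        elif kind in WORKLOAD_KINDS:
            workload_namespaces.add(namespace)
        elif kind == "Service" and spec.get("type") in ("LoadBalancer", "NodePort"):
            if JUSTIFICATION_KEY not in (meta.get("annotations") or {}):
                findings.append(("MEDIUM", f"Service {name}: {spec['type']} without justification"))
        elif kind == "Ingress" and not spec.get("tls"):
            findings.append(("HIGH", f"Ingress {name}: no TLS block"))
    for namespace in sorted(workload_namespaces - policy_namespaces):
        # Absence from the provided files is an inference, not proof of absence.
        findings.append(("MEDIUM", f"namespace {namespace}: no NetworkPolicy provided (inference)"))
    return findings
```

The NetworkPolicy finding is worded as an inference on purpose, matching the instruction above to ask for the missing files rather than assert a default-allow posture.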
168
+ ### Step 8 — RBAC and secrets audit
169
+
170
+ Review ClusterRole, Role, RoleBinding, ClusterRoleBinding, and Secret resources:
171
+
172
+ ```yaml
173
+ # CRITICAL — wildcard verbs on wildcard resources
174
+ rules:
175
+ - apiGroups: ["*"]
176
+   resources: ["*"]
177
+   verbs: ["*"]
178
+
179
+ # CRITICAL — unauthenticated subject
180
+ subjects:
181
+ - kind: Group
182
+   name: system:unauthenticated
183
+
184
+ # HIGH — automount enabled on pods that do not need API access
185
+ automountServiceAccountToken: true # or field absent
186
+
187
+ # HIGH — broad secret access
188
+ rules:
189
+ - resources: ["secrets"]
190
+   verbs: ["get", "list"]
191
+
192
+ # CRITICAL — plaintext credentials in env
193
+ env:
194
+ - name: DB_PASSWORD
195
+   value: "mysecretpassword"
196
+
197
+ # MEDIUM — empty-string secret value
198
+ data:
199
+   password: ""  # decodes to empty
200
+ ```
201
+
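Two of the Step 8 checks, sketched with the standard library only. Both function names are hypothetical and the inputs are assumed to be parsed dictionaries (one RBAC rule, one Secret). The "beyond what the workload demonstrably needs" judgement is not modelled; the sketch flags any read access to `secrets` for a human to assess.

```python
import base64


def audit_rbac_rule(rule: dict) -> list[tuple[str, str]]:
    """Return (severity, message) findings for one RBAC rule."""
    verbs = set(rule.get("verbs") or [])
    resources = set(rule.get("resources") or [])
    if "*" in verbs and ("*" in resources or "secrets" in resources):
        return [("CRITICAL", "wildcard verbs on wildcard resources or secrets")]
    if "secrets" in resources and verbs & {"get", "list"}:
        return [("HIGH", "read access to secrets")]
    return []


def empty_secret_keys(secret: dict) -> list[str]:
    """Keys in Secret.data whose base64 value decodes to an empty string."""
    data = secret.get("data") or {}
    return [key for key, value in data.items() if base64.b64decode(value or "") == b""]
```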
202
+ ---
203
+
204
+ ## Output
205
+
206
+ Return findings in this structure:
207
+
208
+ ```
209
+ ## Verdict
210
+ <one sentence: manifests pass baseline / manifests have blocking security defects / manifests need remediation before production>
211
+
212
+ ## Evidence level
213
+ <manifest files provided | partial manifests only | inference for missing resources>
214
+
215
+ ## Findings
216
+
217
+ ### CRITICAL
218
+ - [C1] <resource name> — <finding>: <description> — <remediation>
219
+
220
+ ### HIGH
221
+ - [H1] <resource name> — <finding>: <description> — <remediation>
222
+
223
+ ### MEDIUM
224
+ - [M1] <resource name> — <finding>: <description> — <remediation>
225
+
226
+ ### LOW
227
+ - [L1] <resource name> — <finding>: <description> — <remediation>
228
+
229
+ ## Safe next actions
230
+ 1. <action>
231
+ 2. <action>
232
+
233
+ ## Open questions
234
+ - <question requiring user clarification>
235
+ ```
236
+
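The finding identifiers in the output structure (`C1`, `H1`, `M1`, `L1`) can be assigned mechanically once findings are collected. A minimal sketch, assuming findings are `(severity, message)` tuples; `assign_finding_ids` is a hypothetical helper, not part of the skill.

```python
SEVERITY_ORDER = ("CRITICAL", "HIGH", "MEDIUM", "LOW")


def assign_finding_ids(findings: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Group findings by severity and label them C1, H1, H2, ... in report order."""
    labelled = []
    for severity in SEVERITY_ORDER:
        group = [message for sev, message in findings if sev == severity]
        for index, message in enumerate(group, start=1):
            # The ID prefix is the first letter of the severity name.
            labelled.append((f"{severity[0]}{index}", message))
    return labelled
```

Grouping by severity before numbering keeps CRITICAL findings at the top of the report, in line with the instruction to lead with the highest-impact finding class.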
237
+ ---
238
+
239
+ ## Security notes
240
+
241
+ - Never request or accept kubeconfig, service account tokens, cloud credentials, or actual secret values. Ask for sanitized manifests with placeholder values in Secret resources.
242
+ - This is a static review: do not apply manifests, run `kubectl`, or contact any cluster.
243
+ - A `privileged: true` container, `hostNetwork/hostPID/hostIPC: true`, or a ClusterRole with `*` verbs on `*` resources is the highest-impact finding class. Lead with it.
244
+ - `RoleBinding` to `system:unauthenticated` or `system:anonymous` is a critical exposure; tell the user to remove it immediately.
245
+ - Plaintext credentials in `env.value` or `ConfigMap.data` should be replaced with `secretKeyRef` references; never recommend committing real credentials even in base64.
246
+ - Do not recommend disabling probes or relaxing securityContext fields to pass short-term validation — recommend the correct secure configuration and explain the rationale.
@@ -0,0 +1,52 @@
1
+ ---
2
+ name: llm-ai-pipeline-test-review
3
+ description: Use this skill when reviewing how an LLM or AI pipeline is evaluated — metric selection, golden datasets, threshold governance, adversarial coverage, and regression gating — to determine whether low-quality or unsafe model outputs can ship undetected. Trigger when a user provides evaluation configuration files, DeepEval or RAGAS test scripts, eval CI steps, or asks whether their AI pipeline actually prevents a bad model from reaching production. This skill reviews evaluation setup statically; it does not call LLM APIs, run evaluations, or contact inference endpoints.
4
+ allowed-tools: Read Grep Glob
5
+ metadata:
6
+ author: "github: Raishin"
7
+ version: "0.1.0"
8
+ updated: "2026-05-17"
9
+ category: ai
10
+ lifecycle: experimental
11
+ ---
12
+
13
+ # LLM AI Pipeline Test Review
14
+
15
+ ## Purpose
16
+ This skill reviews how an LLM or AI pipeline is evaluated — not the model itself, but the evaluation setup that decides whether a model change is safe to ship. An evaluation suite only protects users if it measures the right things, gates on meaningful thresholds, covers adversarial inputs, and detects drift across model versions. The review catches missing hallucination and factuality metrics, absent answer-relevancy and faithfulness checks for RAG pipelines, unguarded bias and toxicity, no adversarial or red-team coverage, agent evals that ignore tool correctness and task completion, thresholds that are undefined or set to zero, single-shot evals on non-deterministic outputs, and no regression baseline to detect metric drift.
17
+
18
+ ## Lean operating rules
19
+
20
+ - Treat a RAG or summarisation pipeline with no `HallucinationMetric` or no GEval with factuality criteria against source documents as HIGH — the pipeline can fabricate facts and ship them.
21
+ - Treat a pipeline with no golden dataset (fixed reference set for regression) as HIGH — metric drift across model versions is undetectable.
22
+ - Treat the absence of `AnswerRelevancyMetric` as MEDIUM — responses may be fluent but off-topic, and no eval catches it.
23
+ - Treat a RAG pipeline with no `FaithfulnessMetric` as HIGH — the model can ignore retrieved context and hallucinate; faithfulness is the primary RAG correctness signal.
24
+ - Treat missing `ContextualPrecisionMetric` or `ContextualRecallMetric` in a RAG pipeline as MEDIUM — retrieval quality is unmeasured; noisy or incomplete retrieval is invisible to the eval.
25
+ - Treat the absence of `BiasMetric` or `ToxicityMetric` as HIGH if the system is user-facing — unsafe outputs can reach users without detection; treat as CRITICAL if the audience is vulnerable (children, medical patients, crisis users).
26
+ - Treat no adversarial test cases and no red-team dataset as CRITICAL for agentic systems; HIGH for all other user-facing LLM products — prompt-injection and jailbreak paths are untested.
27
+ - Treat agent evals with no `ToolCorrectnessMetric` as HIGH — the agent can call wrong tools silently and the eval still passes.
28
+ - Treat multi-step agent evals with no `TaskCompletionMetric` as HIGH — end-to-end success is unmeasured even if individual steps look fine.
29
+ - Treat metric thresholds that are undefined, set to 0, or not reviewed by a domain expert as HIGH — a threshold of 0 means every output passes; an unreviewed threshold is a guess.
30
+ - Treat evals that run only once per input on non-deterministic outputs (no pass@k or mean-score aggregation across multiple runs) as MEDIUM — a single lucky sample masks systematic failure.
31
+ - Treat the absence of a golden dataset or scoring baseline that would detect metric regression across model versions as HIGH — a model update can silently degrade quality.
32
+ - Treat static golden datasets that have never been rotated or supplemented with synthetic adversarial data as MEDIUM — a suite that tests the same inputs repeatedly stops finding new defects (the pesticide paradox).
33
+ - Apply thresholds contextually: a faithfulness score of 0.7 may be acceptable for a joke generator and unacceptable for a medical chatbot — flag any threshold that appears copied from a tutorial without domain justification.
34
+ - Define eval metrics early in the model selection process, not after a model is chosen — catching defects before model selection is usually cheaper than retrofitting evals.
35
+ - Label every finding with evidence basis: eval config provided, test script provided, documentation-based, or inference.
36
+ - Static review only — read eval configs and test source; never call LLM APIs, never run evaluations, never request model API keys or inference endpoints.
37
+
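The rules on zero thresholds and single-shot evals can be illustrated with a small aggregation helper. This is a sketch under stated assumptions: `aggregate_runs` is a hypothetical name, scores are assumed to be "higher is better" values from repeated runs on the same input, and `passed_any` is only the empirical "at least one of k runs passed" signal, not a statistical pass@k estimator.

```python
from statistics import mean


def aggregate_runs(scores: list[float], threshold: float) -> dict:
    """Aggregate repeated eval scores for one input (higher score is better)."""
    if not scores:
        raise ValueError("at least one run is required")
    if threshold <= 0:
        # A threshold of 0 lets every output pass, so reject it outright.
        raise ValueError("threshold must be greater than 0")
    passes = [score >= threshold for score in scores]
    return {
        "mean_score": mean(scores),
        "pass_rate": sum(passes) / len(passes),
        "passed_any": any(passes),
    }
```

With scores of 0.9, 0.4, 0.8, and 0.3 against a threshold of 0.7, a single run could happen to observe 0.9 and pass, while the mean across runs falls below the threshold and only half of the runs pass.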
38
+ ## References
39
+ Load these only when needed:
40
+ - [Workflow and output contract](references/workflow-and-output.md) — use when executing the full review or formatting the final answer.
41
+
42
+ ## Response minimum
43
+ Return, at minimum:
44
+ - Hallucination and factual correctness findings
45
+ - Answer relevancy and faithfulness findings (especially for RAG pipelines)
46
+ - Safety metric findings (bias, toxicity)
47
+ - Adversarial and red-team coverage findings
48
+ - Agent-specific metric findings (tool correctness, task completion)
49
+ - Threshold governance and non-determinism findings
50
+ - Regression gating findings (golden dataset, baseline)
51
+ - Severity-labelled finding list (CRITICAL / HIGH / MEDIUM / LOW)
52
+ - Safe next actions
@@ -0,0 +1,23 @@
1
+ {
2
+ "id": "llm-ai-pipeline-test-review",
3
+ "name": "LLM AI Pipeline Test Review",
4
+ "type": "skill",
5
+ "provider": "generic",
6
+ "harnesses": ["codex", "claude-code", "cursor", "gemini", "kiro", "other"],
7
+ "summary": "Review an LLM or AI pipeline's evaluation setup for test-quality defects — missing hallucination, relevancy, faithfulness, bias, toxicity, and tool-correctness metrics; absent golden datasets; unthresholded or single-shot evals; and no regression gate across model versions. Static review only.",
8
+ "source_type": "original",
9
+ "official_docs": [
10
+ "https://docs.confident-ai.com/",
11
+ "https://docs.confident-ai.com/docs/metrics-hallucination",
12
+ "https://docs.confident-ai.com/docs/metrics-answer-relevancy",
13
+ "https://docs.confident-ai.com/docs/metrics-faithfulness",
14
+ "https://docs.confident-ai.com/docs/metrics-bias",
15
+ "https://docs.confident-ai.com/docs/metrics-tool-correctness",
16
+ "https://www.istqb.org/certifications/certified-tester-foundation-level"
17
+ ],
18
+ "security_notes": "Static review only — reads eval configuration and test source; never calls LLM APIs, never runs evaluations, never requests model API keys or inference endpoints. Do not accept eval fixtures containing real user PII, private prompt chains, or model weights; ask for sanitized configurations.",
19
+ "last_verified": "2026-05-17",
20
+ "path": "skills/qa/llm-ai-pipeline-test-review",
21
+ "author": "github: Raishin",
22
+ "version": "0.1.0"
23
+ }
@@ -0,0 +1,221 @@
1
+ # Workflow and Output Contract
2
+
3
+ ## Workflow
4
+
5
+ ### Step 1 — Collect inputs
6
+
7
+ Ask the user to provide one or more of the following as sanitized files (no API keys, no model weights, no real user PII — replace with placeholders):
8
+ - Evaluation configuration files (DeepEval `test_*.py`, RAGAS config, custom eval scripts)
9
+ - Golden dataset samples or references to a golden dataset (path, size, last-updated date)
10
+ - CI step that runs evals (workflow YAML, script, or description of the gate)
11
+ - The metric list and threshold values in use (even if embedded in test files)
12
+ - For RAG pipelines: retrieval configuration (vector store, top-k, similarity threshold)
13
+ - Optional: recent eval run report or score history showing metric trends
14
+
15
+ If CI gating configuration is not provided, regression-gate findings are stated as `inference` — say so and ask for it.
16
+ If threshold values are not provided, threshold-governance findings are stated as `inference`.
17
+
18
+ ### Step 2 — Hallucination and factual correctness audit
19
+
20
+ Confirm the eval measures whether the model's claims are factually grounded.
21
+
22
+ ```python
23
+ # HIGH — no hallucination check; fabrications pass the suite undetected
24
+ test_cases = [LLMTestCase(input=q, actual_output=answer)]
25
+ # no HallucinationMetric or GEval with factuality criteria
26
+
27
+ # Correct — hallucination measured against source
28
+ hallucination_metric = HallucinationMetric(threshold=0.2)
29
+ dataset = EvaluationDataset(test_cases=[
30
+ LLMTestCase(input=q, actual_output=answer, context=[source_doc])
31
+ ])
32
+ evaluate(test_cases=dataset.test_cases, metrics=[hallucination_metric])  # assert_test() takes one test case, not a dataset
33
+ ```
34
+
35
+ Check for:
36
+ - Presence of `HallucinationMetric` or a GEval with `"factual consistency"` / `"faithfulness to source"` criteria
37
+ - Whether `context` (source documents) is provided to the metric — without it, the metric cannot detect contradiction
38
+ - Whether a golden dataset with expected answers exists for regression comparisons
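Because the review is static, the `context` check above can be done on the test source alone, without importing or running the suite. The following is a minimal sketch under that assumption; `find_cases_missing_keyword` and the `sample` string are hypothetical names introduced here for illustration and are not part of DeepEval.

```python
import ast


def find_cases_missing_keyword(source: str, keyword: str = "context") -> list[int]:
    """Return line numbers of LLMTestCase(...) calls that omit `keyword`.

    Hypothetical helper: it only parses the source text and never executes it.
    """
    missing = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both `LLMTestCase(...)` and `module.LLMTestCase(...)`.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "LLMTestCase":
            continue
        if keyword not in {kw.arg for kw in node.keywords}:
            missing.append(node.lineno)
    return sorted(missing)


# Illustrative input only; a real review would read the user's sanitized test file.
sample = '''
a = LLMTestCase(input=q, actual_output=answer)
b = LLMTestCase(input=q, actual_output=answer, context=[source_doc])
'''
print(find_cases_missing_keyword(sample))
```

A call that builds its arguments dynamically (for example `LLMTestCase(**kwargs)`) is reported as missing the keyword, so treat hits as prompts for a manual look rather than confirmed findings.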
39
+
40
+ ### Step 3 — Answer relevancy and faithfulness audit (RAG focus)
41
+
42
+ For all pipelines, confirm responses address the input. For RAG pipelines, confirm outputs are grounded in retrieved context.
43
+
44
+ ```python
45
+ # MEDIUM — relevancy not measured; off-topic responses pass
46
+ # missing AnswerRelevancyMetric
47
+
48
+ # HIGH — RAG pipeline without faithfulness check; model can ignore retrieved docs
49
+ # missing FaithfulnessMetric with retrieval_context
50
+
51
+ # Correct — both relevancy and faithfulness measured
52
+ relevancy = AnswerRelevancyMetric(threshold=0.7)
53
+ faithfulness = FaithfulnessMetric(threshold=0.7)
54
+ test_case = LLMTestCase(
55
+ input=query,
56
+ actual_output=answer,
57
+ retrieval_context=retrieved_docs
58
+ )
59
+ ```
60
+
61
+ Check for:
62
+ - `AnswerRelevancyMetric` present for any conversational or Q&A pipeline
63
+ - `FaithfulnessMetric` present for any RAG pipeline — this is the primary RAG correctness signal
64
+ - `ContextualPrecisionMetric` and `ContextualRecallMetric` for RAG pipelines measuring retrieval quality
65
+ - Whether `retrieval_context` is populated in test cases — an empty context silently disables the metric
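The presence checks in this step can also be approximated statically. The sketch below assumes the eval file instantiates metrics by their class names as listed above; `RAG_METRICS`, `missing_metrics`, and `empty_retrieval_context_lines` are hypothetical names for illustration, and the scan only catches a literal empty list, not a variable that happens to be empty at run time.

```python
import ast

# Metric class names taken from the checklist above.
RAG_METRICS = (
    "AnswerRelevancyMetric",
    "FaithfulnessMetric",
    "ContextualPrecisionMetric",
    "ContextualRecallMetric",
)


def called_names(source: str) -> set[str]:
    """Collect the names of everything that is called in the source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                names.add(func.id)
            elif isinstance(func, ast.Attribute):
                names.add(func.attr)
    return names


def missing_metrics(source: str, required=RAG_METRICS) -> list[str]:
    """Return the required metric classes that are never instantiated."""
    present = called_names(source)
    return [metric for metric in required if metric not in present]


def empty_retrieval_context_lines(source: str) -> list[int]:
    """Return line numbers of calls that pass a literal `retrieval_context=[]`."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if (kw.arg == "retrieval_context"
                        and isinstance(kw.value, ast.List)
                        and not kw.value.elts):
                    lines.append(node.lineno)
    return sorted(lines)


# Illustrative input only.
sample_rag = '''
relevancy = AnswerRelevancyMetric(threshold=0.7)
case = LLMTestCase(input=query, actual_output=answer, retrieval_context=[])
'''
print(missing_metrics(sample_rag))
print(empty_retrieval_context_lines(sample_rag))
```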
66
+
67
+ ### Step 4 — Safety metrics audit (bias, toxicity)
68
+
69
+ Confirm the eval catches unsafe outputs before they reach users.
70
+
71
+ ```python
72
+ # HIGH (CRITICAL for vulnerable audiences) — no safety guardrails in eval
73
+ # missing BiasMetric and ToxicityMetric
74
+
75
+ # Correct — safety metrics applied
76
+ bias_metric = BiasMetric(threshold=0.5)
77
+ toxicity_metric = ToxicityMetric(threshold=0.5)
78
+ ```
79
+
80
+ Check for:
81
+ - `BiasMetric` present for any user-facing system
82
+ - `ToxicityMetric` present for any user-facing system
83
+ - Threshold values reviewed for the deployment context — a threshold appropriate for an adult content filter may be too permissive for a children's education tool
84
+ - Whether bias and toxicity metrics are in the gating suite or are only advisory/non-blocking
85
+
86
+ ### Step 5 — Adversarial and red-team coverage audit
87
+
88
+ Confirm the eval includes adversarial inputs, not only happy-path test cases.
89
+
90
+ ```python
91
+ # CRITICAL for agentic / HIGH for others — no adversarial cases
92
+ test_cases = [LLMTestCase(input=normal_query, actual_output=answer)]
93
+ # only benign inputs; no prompt-injection attempts, no jailbreaks
94
+
95
+ # Correct — red-team dataset included
96
+ adversarial_cases = load_dataset("adversarial_prompts.json")
97
+ ```
98
+
99
+ Check for:
100
+ - Presence of adversarial test cases or a red-team dataset (prompt-injection attempts, jailbreak patterns, boundary inputs)
101
+ - For agentic systems: test cases that verify the agent refuses or handles malicious tool-calling instructions
102
+ - Whether adversarial cases are rotated periodically — a static adversarial set becomes predictable (pesticide paradox)
103
+ - Whether adversarial inputs are concentrated on the topic and domain boundaries of the deployment, where failures tend to cluster (defect clustering)
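One way a team might implement the periodic rotation mentioned above is to draw a fresh subset from a larger adversarial pool for each period, seeded by the period label so that any given run stays reproducible. This is a sketch of that idea only; `rotate_adversarial_cases`, the pool contents, and the period labels are hypothetical.

```python
import random


def rotate_adversarial_cases(pool, k, period):
    """Pick k cases from the pool, deterministically for a given period label.

    Hypothetical helper: the same period always yields the same subset,
    and a new period label triggers a new draw from the pool.
    """
    cases = list(pool)
    if k > len(cases):
        raise ValueError("k cannot exceed the size of the adversarial pool")
    rng = random.Random(period)  # seeded by the period label, e.g. "2026-W20"
    return rng.sample(cases, k)


# Illustrative pool; real entries would be sanitized red-team prompts.
pool = [f"adversarial-case-{i}" for i in range(10)]
this_period = rotate_adversarial_cases(pool, k=4, period="2026-W20")
print(this_period)
```

When reviewing, the thing to look for is whether any mechanism like this exists at all, and whether the pool itself keeps growing; rotating within a pool that never changes only delays the pesticide paradox.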
104
+
105
+ ### Step 6 — Agent-specific metrics audit (tool correctness, task completion)
106
+
107
+ For pipelines that include LLM agents, confirm the eval measures agent behavior, not only text quality.
108
+
109
+ ```python
110
+ # HIGH — agent evals check only output text; wrong tool calls pass undetected
111
+ # missing ToolCorrectnessMetric
112
+
113
+ # HIGH — multi-step agent eval has no end-to-end success signal
114
+ # missing TaskCompletionMetric
115
+
116
+ # Correct — both agent metrics present
117
+ tool_correctness = ToolCorrectnessMetric()
118
+ task_completion = TaskCompletionMetric(threshold=0.8)
119
+ agent_test_case = LLMTestCase(
120
+ input=user_request,
121
+ actual_output=final_answer,
122
+ tools_called=agent_tool_log,
123
+ expected_tools=["search", "summarize"]
124
+ )
125
+ ```
126
+
127
+ Check for:
128
+ - `ToolCorrectnessMetric` present when an agent selects or calls tools
129
+ - `TaskCompletionMetric` present for multi-step agentic workflows
130
+ - Whether `tools_called` is logged and passed to tool metrics — without the log the metric cannot evaluate tool use
131
+ - Whether task completion is defined and measurable for the specific agent goal
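To make concrete why the `tools_called` log matters, the sketch below compares a logged tool sequence against the expected tools. It is a simplified illustration, not DeepEval's scoring for `ToolCorrectnessMetric`; `tool_call_report` and the tool names are hypothetical.

```python
def tool_call_report(tools_called, expected_tools):
    """Compare logged tool names with expected ones.

    Hypothetical helper: score is the fraction of expected tools that were
    actually called; unexpected calls are listed but do not change the score.
    """
    called = list(tools_called)
    expected = list(expected_tools)
    missing = [tool for tool in expected if tool not in called]
    unexpected = [tool for tool in called if tool not in expected]
    score = (len(expected) - len(missing)) / len(expected) if expected else 1.0
    return {"score": score, "missing": missing, "unexpected": unexpected}


# The final answer text could look fine even though the agent skipped
# "summarize" and called a tool it was never meant to use.
report = tool_call_report(
    tools_called=["search", "delete_file"],
    expected_tools=["search", "summarize"],
)
print(report)
```

If the eval never records `tools_called`, neither the missing call nor the unexpected one is visible, which is the gap this step is meant to surface.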
132
+
133
+ ### Step 7 — Threshold governance and non-determinism audit
134
+
135
+ Confirm thresholds are meaningful and results are statistically reliable.
136
+
137
+ ```python
138
+ # HIGH — threshold of 0 means every output passes; the metric is decorative
139
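+ AnswerRelevancyMetric(threshold=0)  # higher-is-better metric; for HallucinationMetric, BiasMetric, and ToxicityMetric the threshold is a maximum, so the always-pass value is 1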
140
+
141
+ # MEDIUM — single run on a non-deterministic model; one lucky sample masks failures
142
+ result = evaluate(dataset, metrics=[hallucination_metric])
143
+
144
+ # Correct — multiple runs aggregated; threshold domain-reviewed
145
+ scores = [run_mean_score(dataset, hallucination_metric) for _ in range(5)]  # run_mean_score: hypothetical helper returning one run's mean metric score as a float
146
+ mean_score = sum(scores) / len(scores)
147
+ # threshold=0.2 reviewed by a domain expert for this medical-chatbot use case
148
+ ```
149
+
150
+ Check for:
151
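+ - Any threshold that can never fail (0 on a higher-is-better metric such as `AnswerRelevancyMetric`; 1 on a lower-is-better metric such as `HallucinationMetric`, `BiasMetric`, or `ToxicityMetric`) or that is left at the default without documented review: flag as HIGH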
152
+ - Whether thresholds are documented with a rationale (use case, acceptable failure rate, domain expert sign-off)
153
+ - Whether multi-run aggregation (pass@k, mean score over N runs) is used for non-deterministic outputs
154
+ - Whether thresholds differ appropriately across deployment contexts (production vs. staging, medical vs. entertainment)
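The multi-run aggregation named above can be sketched with the standard library alone. The example assumes each run has already been reduced to a single float score; `aggregate_runs` is a hypothetical name, and `any_pass` is only a pass@k-style signal for one case, not a full pass@k estimator.

```python
from statistics import mean, pstdev


def aggregate_runs(run_scores, threshold, higher_is_better=True):
    """Summarize N runs of the same eval against one threshold.

    Hypothetical helper. `higher_is_better` must be False for metrics where
    the threshold is a maximum (for example hallucination, bias, toxicity).
    """
    scores = list(run_scores)
    if not scores:
        raise ValueError("at least one run is required")
    passed = [
        (score >= threshold) if higher_is_better else (score <= threshold)
        for score in scores
    ]
    return {
        "mean": mean(scores),
        "spread": pstdev(scores),  # population standard deviation across runs
        "pass_rate": sum(passed) / len(passed),
        "any_pass": any(passed),
    }


# Five illustrative relevancy scores from repeated runs on the same dataset.
summary = aggregate_runs([0.9, 0.6, 0.8, 0.75, 0.7], threshold=0.7)
print(summary)
```

A single run that happened to land on 0.9 would have hidden the 0.6 run; the pass rate and spread make that variance visible and give the threshold rationale something to refer to.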
155
+
156
+ ### Step 8 — Regression gate audit
157
+
158
+ Confirm the eval detects when a model update silently degrades quality.
159
+
160
+ ```python
161
+ # HIGH — no baseline; a new model can score worse than the old one and ship
162
+ evaluate(dataset, metrics=[hallucination_metric])
163
+ # no comparison to previous run scores
164
+
165
+ # Correct — baseline scores recorded and compared
166
+ baseline = load_baseline("eval_baseline_v1.json")
167
+ current = evaluate(dataset, metrics=[hallucination_metric])
168
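+ assert current.score <= baseline.score + ALLOWED_REGRESSION  # hallucination is lower-is-better, so a regression is an increase; .score stands in for however the run's aggregate score is extracted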
169
+ ```
170
+
171
+ Check for:
172
+ - A golden dataset that is versioned and stable enough to detect regression
173
+ - Baseline scores stored from prior runs and compared against current runs
174
+ - CI or eval step that fails when scores drop below the baseline by more than an allowed delta
175
+ - Whether the golden dataset is ever refreshed — a dataset that never changes stops finding new defect categories (pesticide paradox); rotate or supplement it with synthetic data periodically
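A regression gate has to know which direction counts as worse for each metric. The sketch below assumes baseline and current scores are plain `{metric_name: score}` mappings (for example loaded from a JSON file); `LOWER_IS_BETTER`, `find_regressions`, and the metric keys are hypothetical names for illustration.

```python
# Metrics where a higher score is worse; everything else is higher-is-better.
LOWER_IS_BETTER = {"hallucination", "bias", "toxicity"}


def find_regressions(baseline, current, allowed_delta=0.02):
    """Return {metric: reason} for every metric that regressed past the delta.

    Hypothetical helper: a metric that exists in the baseline but not in the
    current run is also reported, since a dropped metric gates nothing.
    """
    failures = {}
    for metric, base_score in baseline.items():
        if metric not in current:
            failures[metric] = "missing from current run"
            continue
        current_score = current[metric]
        if metric in LOWER_IS_BETTER:
            drop = current_score - base_score
        else:
            drop = base_score - current_score
        if drop > allowed_delta:
            failures[metric] = (
                f"regressed by {drop:.3f} "
                f"(baseline {base_score}, current {current_score})"
            )
    return failures


# Illustrative scores only.
baseline = {"hallucination": 0.10, "faithfulness": 0.85}
current = {"hallucination": 0.20, "faithfulness": 0.84}
failures = find_regressions(baseline, current)
print(failures)
```

In a CI step the gate would exit non-zero when `failures` is non-empty; an eval step that prints scores but never fails the build is the pattern this audit step is looking for.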
176
+
177
+ ---
178
+
179
+ ## Output
180
+
181
+ Return findings in this structure:
182
+
183
+ ```
184
+ ## Verdict
185
+ <one sentence: eval suite gates unsafe outputs / eval runs but gates nothing / partial coverage with gaps>
186
+
187
+ ## Evidence level
188
+ <eval config + test scripts provided | eval config only | documentation-based | inference>
189
+
190
+ ## Findings
191
+
192
+ ### CRITICAL
193
+ - [C1] <finding>: <description> — <remediation>
194
+
195
+ ### HIGH
196
+ - [H1] <finding>: <description> — <remediation>
197
+
198
+ ### MEDIUM
199
+ - [M1] <finding>: <description> — <remediation>
200
+
201
+ ### LOW
202
+ - [L1] <finding>: <description> — <remediation>
203
+
204
+ ## Safe next actions
205
+ 1. <action>
206
+ 2. <action>
207
+
208
+ ## Open questions
209
+ - <question requiring user clarification>
210
+ ```
211
+
212
+ ---
213
+
214
+ ## Security notes
215
+
216
+ - Never request or accept model API keys, inference endpoint URLs, or model weights. Ask for sanitized eval configuration with placeholders.
217
+ - Never call LLM APIs, run evaluations, or contact inference endpoints — this is a static review only.
218
+ - Do not accept eval fixtures containing real user PII or private prompt chains; ask the user to anonymize them first.
219
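+ - A threshold that can never fail (0 on a higher-is-better metric, 1 on a lower-is-better metric such as hallucination, bias, or toxicity) functionally disables the metric; it is the eval equivalent of `continue-on-error: true` on a test step. Lead with it when present.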
220
+ - Bias and toxicity without thresholds reviewed for the actual audience are a false signal of safety; flag the gap and ask what the audience is.
221
+ - Adversarial coverage is the most commonly absent category; absence is not evidence that the model is robust — it is evidence the question was never asked.