@vigolium/piolium 0.0.1

Files changed (271)
  1. package/LICENSE +21 -0
  2. package/README.md +117 -0
  3. package/agents/access-auditor.md +300 -0
  4. package/agents/assumption-breaker.md +154 -0
  5. package/agents/attack-designer.md +116 -0
  6. package/agents/code-scanner.md +139 -0
  7. package/agents/concurrency-auditor.md +238 -0
  8. package/agents/confirm-writer.md +257 -0
  9. package/agents/context-reviewer.md +274 -0
  10. package/agents/cross-verifier.md +165 -0
  11. package/agents/cve-scout.md +381 -0
  12. package/agents/env-builder.md +282 -0
  13. package/agents/env-profiler.md +205 -0
  14. package/agents/evidence-collector.md +140 -0
  15. package/agents/finding-grader.md +142 -0
  16. package/agents/finding-writer.md +148 -0
  17. package/agents/flow-tracer.md +106 -0
  18. package/agents/goal-backtracer.md +146 -0
  19. package/agents/history-miner.md +467 -0
  20. package/agents/independent-verifier.md +118 -0
  21. package/agents/intent-mapper.md +183 -0
  22. package/agents/longshot-collector.md +128 -0
  23. package/agents/longshot-prober.md +126 -0
  24. package/agents/patch-auditor.md +73 -0
  25. package/agents/poc-author.md +124 -0
  26. package/agents/poc-runner.md +194 -0
  27. package/agents/probe-lead.md +269 -0
  28. package/agents/red-challenger.md +101 -0
  29. package/agents/report-composer.md +208 -0
  30. package/agents/review-adjudicator.md +216 -0
  31. package/agents/spec-auditor.md +155 -0
  32. package/agents/taint-tracer.md +265 -0
  33. package/agents/test-locator.md +209 -0
  34. package/agents/threat-modeler.md +132 -0
  35. package/agents/variant-scanner.md +108 -0
  36. package/agents/variant-spotter.md +110 -0
  37. package/bin/piolium.mjs +376 -0
  38. package/extensions/piolium/_vendor/yaml.bundle.d.mts +6 -0
  39. package/extensions/piolium/_vendor/yaml.bundle.mjs +139 -0
  40. package/extensions/piolium/agent-runner.ts +322 -0
  41. package/extensions/piolium/agents.ts +266 -0
  42. package/extensions/piolium/audit-state.ts +522 -0
  43. package/extensions/piolium/bundled-resources.ts +97 -0
  44. package/extensions/piolium/candidate-scan.ts +966 -0
  45. package/extensions/piolium/command-target.ts +177 -0
  46. package/extensions/piolium/console-stream.ts +57 -0
  47. package/extensions/piolium/export-results.ts +380 -0
  48. package/extensions/piolium/findings.ts +448 -0
  49. package/extensions/piolium/heartbeat.ts +182 -0
  50. package/extensions/piolium/help.ts +234 -0
  51. package/extensions/piolium/index.ts +1865 -0
  52. package/extensions/piolium/longshot.ts +530 -0
  53. package/extensions/piolium/matcher-suggestions.ts +196 -0
  54. package/extensions/piolium/matcher-utils.ts +83 -0
  55. package/extensions/piolium/modes/balanced.ts +750 -0
  56. package/extensions/piolium/modes/confirm-bootstrap.ts +186 -0
  57. package/extensions/piolium/modes/confirm.ts +697 -0
  58. package/extensions/piolium/modes/deep.ts +917 -0
  59. package/extensions/piolium/modes/diff.ts +177 -0
  60. package/extensions/piolium/modes/lite.ts +540 -0
  61. package/extensions/piolium/modes/longshot.ts +595 -0
  62. package/extensions/piolium/modes/merge.ts +204 -0
  63. package/extensions/piolium/modes/phase-runner.ts +267 -0
  64. package/extensions/piolium/modes/reinvest.ts +546 -0
  65. package/extensions/piolium/modes/revisit.ts +279 -0
  66. package/extensions/piolium/modes.ts +48 -0
  67. package/extensions/piolium/phase-labels.ts +123 -0
  68. package/extensions/piolium/phase-status-strip.ts +92 -0
  69. package/extensions/piolium/prompt-prefix-editor.ts +39 -0
  70. package/extensions/piolium/providers/anthropic-vertex.ts +836 -0
  71. package/extensions/piolium/recon.ts +409 -0
  72. package/extensions/piolium/result-stats.ts +105 -0
  73. package/extensions/piolium/retry.ts +120 -0
  74. package/extensions/piolium/scheduler.ts +212 -0
  75. package/extensions/piolium/secrets.ts +368 -0
  76. package/extensions/piolium/tools/web-tools.ts +148 -0
  77. package/package.json +77 -0
  78. package/skills/agentic-actions-auditor/SKILL.md +327 -0
  79. package/skills/agentic-actions-auditor/references/action-profiles.md +186 -0
  80. package/skills/agentic-actions-auditor/references/cross-file-resolution.md +209 -0
  81. package/skills/agentic-actions-auditor/references/foundations.md +94 -0
  82. package/skills/agentic-actions-auditor/references/vector-a-env-var-intermediary.md +77 -0
  83. package/skills/agentic-actions-auditor/references/vector-b-direct-expression-injection.md +83 -0
  84. package/skills/agentic-actions-auditor/references/vector-c-cli-data-fetch.md +83 -0
  85. package/skills/agentic-actions-auditor/references/vector-d-pr-target-checkout.md +88 -0
  86. package/skills/agentic-actions-auditor/references/vector-e-error-log-injection.md +88 -0
  87. package/skills/agentic-actions-auditor/references/vector-f-subshell-expansion.md +82 -0
  88. package/skills/agentic-actions-auditor/references/vector-g-eval-of-ai-output.md +91 -0
  89. package/skills/agentic-actions-auditor/references/vector-h-dangerous-sandbox-configs.md +102 -0
  90. package/skills/agentic-actions-auditor/references/vector-i-wildcard-allowlists.md +88 -0
  91. package/skills/audit/SKILL.md +562 -0
  92. package/skills/audit/assets/icon.svg +7 -0
  93. package/skills/audit/hooks/scripts/validate_phase_output.py +550 -0
  94. package/skills/audit/references/adversarial-review.md +148 -0
  95. package/skills/audit/references/architecture-aware-sast.md +306 -0
  96. package/skills/audit/references/audit-workflow.md +737 -0
  97. package/skills/audit/references/chamber-protocol.md +384 -0
  98. package/skills/audit/references/creative-attack-modes.md +221 -0
  99. package/skills/audit/references/deep-analysis.md +273 -0
  100. package/skills/audit/references/domain-attack-playbooks.md +1129 -0
  101. package/skills/audit/references/knowledge-base-template.md +513 -0
  102. package/skills/audit/references/real-env-validation.md +191 -0
  103. package/skills/audit/references/report-templates.md +417 -0
  104. package/skills/audit/references/triage-and-prereqs.md +134 -0
  105. package/skills/audit/scripts/consolidate_drafts.py +554 -0
  106. package/skills/audit/scripts/partition_findings.py +152 -0
  107. package/skills/audit/scripts/rg-hotspots.sh +121 -0
  108. package/skills/audit/scripts/stamp_file_state.py +349 -0
  109. package/skills/code-reviewer/SKILL.md +65 -0
  110. package/skills/codeql/SKILL.md +281 -0
  111. package/skills/codeql/references/build-fixes.md +90 -0
  112. package/skills/codeql/references/diagnostic-query-templates.md +339 -0
  113. package/skills/codeql/references/extension-yaml-format.md +209 -0
  114. package/skills/codeql/references/important-only-suite.md +153 -0
  115. package/skills/codeql/references/language-details.md +207 -0
  116. package/skills/codeql/references/macos-arm64e-workaround.md +179 -0
  117. package/skills/codeql/references/performance-tuning.md +111 -0
  118. package/skills/codeql/references/quality-assessment.md +172 -0
  119. package/skills/codeql/references/ruleset-catalog.md +63 -0
  120. package/skills/codeql/references/run-all-suite.md +92 -0
  121. package/skills/codeql/references/sarif-processing.md +79 -0
  122. package/skills/codeql/references/threat-models.md +51 -0
  123. package/skills/codeql/workflows/build-database.md +280 -0
  124. package/skills/codeql/workflows/create-data-extensions.md +261 -0
  125. package/skills/codeql/workflows/run-analysis.md +301 -0
  126. package/skills/differential-review/SKILL.md +220 -0
  127. package/skills/differential-review/adversarial.md +203 -0
  128. package/skills/differential-review/methodology.md +234 -0
  129. package/skills/differential-review/patterns.md +300 -0
  130. package/skills/differential-review/reporting.md +369 -0
  131. package/skills/fp-check/SKILL.md +125 -0
  132. package/skills/fp-check/references/bug-class-verification.md +114 -0
  133. package/skills/fp-check/references/deep-verification.md +143 -0
  134. package/skills/fp-check/references/evidence-templates.md +91 -0
  135. package/skills/fp-check/references/false-positive-patterns.md +115 -0
  136. package/skills/fp-check/references/gate-reviews.md +27 -0
  137. package/skills/fp-check/references/standard-verification.md +78 -0
  138. package/skills/insecure-defaults/SKILL.md +117 -0
  139. package/skills/insecure-defaults/references/examples.md +409 -0
  140. package/skills/last30days/SKILL.md +444 -0
  141. package/skills/sarif-parsing/SKILL.md +483 -0
  142. package/skills/sarif-parsing/resources/jq-queries.md +162 -0
  143. package/skills/sarif-parsing/resources/sarif_helpers.py +331 -0
  144. package/skills/security-threat-model/LICENSE.txt +201 -0
  145. package/skills/security-threat-model/SKILL.md +81 -0
  146. package/skills/security-threat-model/agents/openai.yaml +4 -0
  147. package/skills/security-threat-model/references/prompt-template.md +255 -0
  148. package/skills/security-threat-model/references/security-controls-and-assets.md +32 -0
  149. package/skills/semgrep/SKILL.md +212 -0
  150. package/skills/semgrep/references/rulesets.md +162 -0
  151. package/skills/semgrep/references/scan-modes.md +110 -0
  152. package/skills/semgrep/references/scanner-task-prompt.md +140 -0
  153. package/skills/semgrep/scripts/merge_sarif.py +203 -0
  154. package/skills/semgrep/workflows/scan-workflow.md +311 -0
  155. package/skills/semgrep-rule-creator/SKILL.md +168 -0
  156. package/skills/semgrep-rule-creator/references/quick-reference.md +202 -0
  157. package/skills/semgrep-rule-creator/references/workflow.md +240 -0
  158. package/skills/semgrep-rule-variant-creator/SKILL.md +205 -0
  159. package/skills/semgrep-rule-variant-creator/references/applicability-analysis.md +250 -0
  160. package/skills/semgrep-rule-variant-creator/references/language-syntax-guide.md +324 -0
  161. package/skills/semgrep-rule-variant-creator/references/workflow.md +518 -0
  162. package/skills/sharp-edges/SKILL.md +292 -0
  163. package/skills/sharp-edges/references/auth-patterns.md +252 -0
  164. package/skills/sharp-edges/references/case-studies.md +274 -0
  165. package/skills/sharp-edges/references/config-patterns.md +333 -0
  166. package/skills/sharp-edges/references/crypto-apis.md +190 -0
  167. package/skills/sharp-edges/references/lang-c.md +205 -0
  168. package/skills/sharp-edges/references/lang-csharp.md +285 -0
  169. package/skills/sharp-edges/references/lang-go.md +270 -0
  170. package/skills/sharp-edges/references/lang-java.md +263 -0
  171. package/skills/sharp-edges/references/lang-javascript.md +269 -0
  172. package/skills/sharp-edges/references/lang-kotlin.md +265 -0
  173. package/skills/sharp-edges/references/lang-php.md +245 -0
  174. package/skills/sharp-edges/references/lang-python.md +274 -0
  175. package/skills/sharp-edges/references/lang-ruby.md +273 -0
  176. package/skills/sharp-edges/references/lang-rust.md +272 -0
  177. package/skills/sharp-edges/references/lang-swift.md +287 -0
  178. package/skills/sharp-edges/references/language-specific.md +588 -0
  179. package/skills/spec-to-code-compliance/SKILL.md +357 -0
  180. package/skills/spec-to-code-compliance/resources/COMPLETENESS_CHECKLIST.md +69 -0
  181. package/skills/spec-to-code-compliance/resources/IR_EXAMPLES.md +417 -0
  182. package/skills/spec-to-code-compliance/resources/OUTPUT_REQUIREMENTS.md +105 -0
  183. package/skills/supply-chain-risk-auditor/SKILL.md +67 -0
  184. package/skills/supply-chain-risk-auditor/resources/results-template.md +41 -0
  185. package/skills/variant-analysis/METHODOLOGY.md +327 -0
  186. package/skills/variant-analysis/SKILL.md +142 -0
  187. package/skills/variant-analysis/resources/codeql/cpp.ql +119 -0
  188. package/skills/variant-analysis/resources/codeql/go.ql +69 -0
  189. package/skills/variant-analysis/resources/codeql/java.ql +71 -0
  190. package/skills/variant-analysis/resources/codeql/javascript.ql +63 -0
  191. package/skills/variant-analysis/resources/codeql/python.ql +80 -0
  192. package/skills/variant-analysis/resources/semgrep/cpp.yaml +98 -0
  193. package/skills/variant-analysis/resources/semgrep/go.yaml +63 -0
  194. package/skills/variant-analysis/resources/semgrep/java.yaml +61 -0
  195. package/skills/variant-analysis/resources/semgrep/javascript.yaml +60 -0
  196. package/skills/variant-analysis/resources/semgrep/python.yaml +72 -0
  197. package/skills/variant-analysis/resources/variant-report-template.md +75 -0
  198. package/skills/vuln-report/SKILL.md +137 -0
  199. package/skills/vuln-report/agents/openai.yaml +4 -0
  200. package/skills/vuln-report/references/report-template.md +135 -0
  201. package/skills/wooyun-legacy/SKILL.md +367 -0
  202. package/skills/wooyun-legacy/references/bank-penetration.md +222 -0
  203. package/skills/wooyun-legacy/references/checklists/command-execution-checklist.md +119 -0
  204. package/skills/wooyun-legacy/references/checklists/csrf-checklist.md +74 -0
  205. package/skills/wooyun-legacy/references/checklists/file-upload-checklist.md +108 -0
  206. package/skills/wooyun-legacy/references/checklists/info-disclosure-checklist.md +114 -0
  207. package/skills/wooyun-legacy/references/checklists/logic-flaws-checklist.md +95 -0
  208. package/skills/wooyun-legacy/references/checklists/misconfig-checklist.md +124 -0
  209. package/skills/wooyun-legacy/references/checklists/path-traversal-checklist.md +87 -0
  210. package/skills/wooyun-legacy/references/checklists/rce-checklist.md +93 -0
  211. package/skills/wooyun-legacy/references/checklists/sql-injection-checklist.md +97 -0
  212. package/skills/wooyun-legacy/references/checklists/ssrf-checklist.md +99 -0
  213. package/skills/wooyun-legacy/references/checklists/unauthorized-access-checklist.md +89 -0
  214. package/skills/wooyun-legacy/references/checklists/weak-password-checklist.md +115 -0
  215. package/skills/wooyun-legacy/references/checklists/xss-checklist.md +103 -0
  216. package/skills/wooyun-legacy/references/checklists/xxe-checklist.md +130 -0
  217. package/skills/wooyun-legacy/references/info-disclosure.md +975 -0
  218. package/skills/wooyun-legacy/references/logic-flaws.md +721 -0
  219. package/skills/wooyun-legacy/references/path-traversal.md +1191 -0
  220. package/skills/wooyun-legacy/references/telecom-penetration.md +156 -0
  221. package/skills/wooyun-legacy/references/unauthorized-access.md +980 -0
  222. package/skills/wooyun-legacy/references/xss.md +746 -0
  223. package/skills/zeroize-audit/SKILL.md +371 -0
  224. package/skills/zeroize-audit/configs/c.yaml +21 -0
  225. package/skills/zeroize-audit/configs/default.yaml +128 -0
  226. package/skills/zeroize-audit/configs/rust.yaml +83 -0
  227. package/skills/zeroize-audit/prompts/report_template.md +238 -0
  228. package/skills/zeroize-audit/prompts/system.md +163 -0
  229. package/skills/zeroize-audit/prompts/task.md +97 -0
  230. package/skills/zeroize-audit/references/compile-commands.md +231 -0
  231. package/skills/zeroize-audit/references/detection-strategy.md +191 -0
  232. package/skills/zeroize-audit/references/ir-analysis.md +252 -0
  233. package/skills/zeroize-audit/references/mcp-analysis.md +221 -0
  234. package/skills/zeroize-audit/references/poc-generation.md +470 -0
  235. package/skills/zeroize-audit/references/rust-zeroization-patterns.md +867 -0
  236. package/skills/zeroize-audit/schemas/input.json +83 -0
  237. package/skills/zeroize-audit/schemas/output.json +140 -0
  238. package/skills/zeroize-audit/tools/analyze_asm.sh +202 -0
  239. package/skills/zeroize-audit/tools/analyze_cfg.py +381 -0
  240. package/skills/zeroize-audit/tools/analyze_heap.sh +211 -0
  241. package/skills/zeroize-audit/tools/analyze_ir_semantic.py +429 -0
  242. package/skills/zeroize-audit/tools/diff_ir.sh +135 -0
  243. package/skills/zeroize-audit/tools/diff_rust_mir.sh +189 -0
  244. package/skills/zeroize-audit/tools/emit_asm.sh +67 -0
  245. package/skills/zeroize-audit/tools/emit_ir.sh +77 -0
  246. package/skills/zeroize-audit/tools/emit_rust_asm.sh +178 -0
  247. package/skills/zeroize-audit/tools/emit_rust_ir.sh +150 -0
  248. package/skills/zeroize-audit/tools/emit_rust_mir.sh +158 -0
  249. package/skills/zeroize-audit/tools/extract_compile_flags.py +284 -0
  250. package/skills/zeroize-audit/tools/generate_poc.py +1329 -0
  251. package/skills/zeroize-audit/tools/mcp/apply_confidence_gates.py +113 -0
  252. package/skills/zeroize-audit/tools/mcp/check_mcp.sh +68 -0
  253. package/skills/zeroize-audit/tools/mcp/normalize_mcp_evidence.py +125 -0
  254. package/skills/zeroize-audit/tools/scripts/check_llvm_patterns.py +481 -0
  255. package/skills/zeroize-audit/tools/scripts/check_mir_patterns.py +554 -0
  256. package/skills/zeroize-audit/tools/scripts/check_rust_asm.py +424 -0
  257. package/skills/zeroize-audit/tools/scripts/check_rust_asm_aarch64.py +300 -0
  258. package/skills/zeroize-audit/tools/scripts/check_rust_asm_x86.py +283 -0
  259. package/skills/zeroize-audit/tools/scripts/find_dangerous_apis.py +375 -0
  260. package/skills/zeroize-audit/tools/scripts/semantic_audit.py +923 -0
  261. package/skills/zeroize-audit/tools/track_dataflow.sh +196 -0
  262. package/skills/zeroize-audit/tools/validate_rust_toolchain.sh +298 -0
  263. package/skills/zeroize-audit/workflows/phase-0-preflight.md +150 -0
  264. package/skills/zeroize-audit/workflows/phase-1-source-analysis.md +144 -0
  265. package/skills/zeroize-audit/workflows/phase-2-compiler-analysis.md +139 -0
  266. package/skills/zeroize-audit/workflows/phase-3-interim-report.md +46 -0
  267. package/skills/zeroize-audit/workflows/phase-4-poc-generation.md +46 -0
  268. package/skills/zeroize-audit/workflows/phase-5-poc-validation.md +136 -0
  269. package/skills/zeroize-audit/workflows/phase-6-final-report.md +44 -0
  270. package/skills/zeroize-audit/workflows/phase-7-test-generation.md +42 -0
  271. package/themes/piolium-srcery.json +94 -0
@@ -0,0 +1,255 @@
1
+ # Threat Modeling Prompt Template for LLMs
2
+
3
+ This reference provides a disciplined, repo-grounded prompt that produces AppSec-usable threat models. Use it when you need a reliable output contract and a consistent process for assembling the threat model output.
4
+
5
+ ## System prompt
6
+
7
+ Use this as a stable system prompt:
8
+
9
+ ````text
10
+ You are a senior application security engineer producing a threat model that will be read by other AppSec engineers.
11
+
12
+ Primary objective:
13
+ - Generate a threat model that is specific to THIS repository and its real-world usage.
14
+ - Prefer concrete, evidence-backed findings over generic vulnerability checklists.
15
+
16
+ Evidence and grounding rules:
17
+ - Do not invent components, data stores, endpoints, flows, or controls.
18
+ - Every architectural claim must be backed by at least one "Evidence anchor" referencing a repo path
19
+ (and a symbol name, config key, or a short quoted snippet if available).
20
+ - If information is missing, state assumptions explicitly and list the open questions needed to validate them.
21
+
22
+ Security hygiene:
23
+ - Never output secrets. If you encounter tokens/keys/passwords, redact them and only describe their presence and location.
24
+
25
+ Threat modeling approach:
26
+ - Model the system using data flows and trust boundaries.
27
+ - Enumerate threats and produce attack goals and abuse paths.
28
+ - Prioritize threats using explicit likelihood and impact reasoning (qualitative is acceptable: low/medium/high).
29
+
30
+ Scope discipline:
31
+ - Clearly separate: production/runtime behavior vs CI/build/dev tooling vs tests/examples.
32
+ - Clearly separate attacker-controlled inputs vs operator-controlled inputs vs developer-controlled inputs.
33
+ - If a vulnerability class requires attacker control that likely does not exist for this repo's real usage, say so and downgrade severity.
34
+
35
+ Communication quality:
36
+ - Write for AppSec engineers: concise but specific.
37
+ - Use precise terminology. Include mitigations and residual risks.
38
+ - Avoid restating large blocks of README/spec; summarize and point to evidence.
39
+
40
+ Diagram requirements:
41
+ - Produce a single compact Mermaid flowchart showing primary components and trust boundaries.
42
+ - Mermaid must render cleanly. Use a conservative subset:
43
+ - Use `flowchart TD` or `flowchart LR` and only `-->` arrows.
44
+ - Use simple node IDs (letters/numbers/underscores only) and quoted labels (e.g., `A["Label"]`); avoid `A(Label)` shape syntax.
45
+ - Do not use Mermaid `title` lines or `style` directives.
46
+ - Keep edge labels to plain words/spaces only via `-->|label|`; avoid `{}`, `[]`, `()`, or quotes in edge labels (if needed, drop the label).
47
+ - Keep node labels short and readable: do not include file paths, URLs, or socket paths (put those details in prose outside the diagram).
48
+ - Wrap the diagram in a Markdown fenced block:
49
+ ```mermaid
50
+ <mermaid syntax here>
51
+ ```
52
+ ````
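The diagram requirements above can be checked mechanically before a report is accepted. The following is a minimal sketch, assuming a line-based heuristic is good enough; `lint_mermaid` and its regular expressions are hypothetical helpers introduced here for illustration, not part of the package, and they do not parse Mermaid fully.

```python
import re

# Hypothetical helper: checks a Mermaid diagram against the conservative subset
# described in the system prompt. Line-based heuristic, not a Mermaid parser.
HEADER_RE = re.compile(r"^flowchart (TD|LR)$")
BAD_ARROW_RE = re.compile(r"-\.->|==>|---|--o|--x|<--")
EDGE_LABEL_RE = re.compile(r"-->\|([^|]*)\|")
PAREN_NODE_RE = re.compile(r"\b[A-Za-z0-9_]+\(")
DIRECTIVE_RE = re.compile(r"^(title|style)\s")


def lint_mermaid(diagram: str) -> list[str]:
    """Return human-readable rule violations (empty list if none were found)."""
    lines = [ln.strip() for ln in diagram.strip().splitlines() if ln.strip()]
    problems = []
    if not lines or not HEADER_RE.match(lines[0]):
        problems.append("first line must be 'flowchart TD' or 'flowchart LR'")
    for ln in lines[1:]:
        # Blank out quoted labels so their contents are not mistaken for syntax.
        unquoted = re.sub(r'"[^"]*"', '""', ln)
        if DIRECTIVE_RE.match(ln):
            problems.append(f"title/style directive not allowed: {ln}")
        if BAD_ARROW_RE.search(unquoted):
            problems.append(f"only '-->' arrows are allowed: {ln}")
        for label in EDGE_LABEL_RE.findall(ln):
            if not re.fullmatch(r"[A-Za-z0-9 ]+", label):
                problems.append(f"edge label must be plain words: {label}")
        if PAREN_NODE_RE.search(unquoted):
            problems.append(f"use quoted bracket labels instead of parentheses: {ln}")
    return problems


good = 'flowchart TD\n  U["User"] -->|sends request| A["API Server"]\n'
bad = 'flowchart TD\n  title Demo\n  A(User) ==> B["API"]\n'
print(lint_mermaid(good))
print(lint_mermaid(bad))
```

A check like this is best treated as a pre-filter: an empty result does not guarantee that the diagram renders, only that none of the listed patterns were seen.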
53
+
54
+ ## Repository summary prompt
55
+
56
+ ```
57
+ We have a codebase located at {repo_directory/path}, currently on branch {branch_name}.
58
+
59
+ Please produce a security-oriented summary of the repository (or the specified sub-path) with the goal of helping a follow-on security engineer quickly understand the system well enough to build an initial threat model and investigate potential security hypotheses.
60
+
61
+ Objectives
62
+ 1. Project overview
63
+ • Identify the primary programming languages, frameworks, and build system.
64
+ • Summarize the project’s core purpose and high-level architecture.
65
+ • Describe major components, services, or modules and how they interact.
66
+ 2. Security posture and entry points
67
+ • Identify likely user entry points and trust boundaries.
68
+ • Describe existing security layers (e.g., authentication, authorization, validation, sandboxing, isolation, privilege boundaries).
69
+ • Call out security-critical components and assumptions that must hold for the system to remain secure.
70
+
71
+ Guidance for Security Analysis
72
+
73
+ Structure the summary so an application security engineer can quickly answer questions such as:
74
+ • Where does user input originate?
75
+ • How is untrusted data parsed, validated, and handled?
76
+ • What security assumptions should not be violated?
77
+ • Where are the most likely choke points for security bugs?
78
+
79
+ Adapt the analysis to the project type. For example:
80
+ • Web applications: where requests enter, how user data is parsed, routed, authenticated, and stored.
81
+ • Command-line tools: supported inputs (arguments, files, environment variables, stdin) and how they are processed.
82
+ • Network daemons: exposed ports, supported protocols, message formats, and request handling paths.
83
+ • Operating system or low-level components: common vulnerability classes (e.g., memory corruption, logic flaws) that could lead to LPE or RCE.
84
+
85
+ Be thorough but pragmatic: the goal is to help a security engineer quickly determine whether a discovered bug is security-relevant and where deeper investigation should focus.
86
+
87
+ Tooling Notes
88
+
89
+ If Ripgrep (rg) is available, use it to explore the codebase. rg skips binary files by default, so it needs no extra flag (in rg, -I means --no-filename). When falling back to grep, always include the -I flag to avoid searching through binary files.
90
+ ```
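The `{repo_directory/path}` and `{branch_name}` placeholders in the prompt above need to be substituted before the prompt is sent. Below is a minimal sketch of one way to do that; `fill_prompt` is a hypothetical helper and the example values are invented for illustration. It uses a regular expression rather than `str.format` on the assumption that prompt text may contain literal braces elsewhere.

```python
import re

# Matches {name} placeholders, including the slash in {repo_directory/path}.
PLACEHOLDER_RE = re.compile(r"\{([A-Za-z_][A-Za-z0-9_/]*)\}")


def fill_prompt(template: str, values: dict[str, str]) -> str:
    """Substitute known values; fail loudly if any placeholder is left unfilled."""
    missing = sorted({m for m in PLACEHOLDER_RE.findall(template) if m not in values})
    if missing:
        raise KeyError(f"unfilled placeholders: {', '.join(missing)}")
    return PLACEHOLDER_RE.sub(lambda m: values[m.group(1)], template)


template = (
    "We have a codebase located at {repo_directory/path}, "
    "currently on branch {branch_name}."
)
# Hypothetical values, for illustration only.
prompt = fill_prompt(
    template,
    {"repo_directory/path": "/srv/checkouts/example", "branch_name": "main"},
)
print(prompt)
```

Raising on a missing value is a design choice for this sketch: it stops a half-filled prompt from reaching the model with a raw `{branch_name}` left in it.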
91
+
92
+
93
+
94
+ ## User prompt template
95
+
96
+ Use this as the task prompt, filling in what you know and marking the rest as assumptions:
97
+
98
+ ```text
99
+ # Inputs
100
+ Context (fill as available; otherwise infer and mark assumptions):
101
+ - intended_usage: {intended_usage}
102
+ - deployment_model: {deployment_model}
103
+ - data_sensitivity: {data_sensitivity}
104
+ - internet_exposure: {internet_exposure}
105
+ - authn_authz_expectations: {authn_authz_expectations}
106
+ - out_of_scope: {out_of_scope}
107
+
108
+ Provided summaries (may be incomplete):
109
+ - repository_summary: {repository_summary}
110
+
111
+
112
+ In-scope code locations (if known):
113
+ - in_scope_paths: {in_scope_paths}
114
+
115
+ # Task
116
+ Construct a repo-centric threat model that helps AppSec engineers understand the most important security risks and where to focus manual review.
117
+
118
+ You MUST follow this process and reflect outputs in the final document:
119
+
120
+ ## Process
121
+ 1) Repo discovery (evidence collection)
122
+ a. Identify the repo shape:
123
+ - languages and frameworks
124
+ - how it runs (server/cli/library), entrypoints, build artifacts
125
+ b. Identify security-relevant surfaces and controls by searching for evidence, such as:
126
+ - network listeners/routes/endpoints; RPC handlers; message consumers
127
+ - authentication, session/token handling, authorization checks, RBAC/ACL logic
128
+ - parsing/serialization/deserialization (JSON/YAML/XML/protobuf), template rendering, eval/dynamic code
129
+ - file upload/read paths, archive extraction, image/document parsing
130
+ - database/queue/cache clients and query construction
131
+ - secrets/config loading, environment variables, key management
132
+ - SSRF-capable HTTP clients, webhooks, URL fetchers
133
+ - sandboxing/isolation, privilege boundaries, subprocess execution
134
+ - logging/auditing and error handling paths
135
+ - CI/build/release: pipelines, dependency management, artifact publishing
136
+
137
+ 2) System model
138
+ a. Summarize the primary components (runtime plus critical build/CI components when relevant).
139
+ b. Enumerate data flows and trust boundaries.
140
+ - For each trust boundary, specify:
141
+ * source to destination
142
+ * data types crossing (e.g., credentials, PII, files, tokens, prompts)
143
+ * channel/protocol (HTTP/gRPC/IPC/file/db)
144
+ * security guarantees and validation (auth, mTLS, origin checks, schema validation, rate limits)
145
+ c. Provide a compact Mermaid diagram showing components and trust boundaries.
146
+
147
+ 3) Assets and security objectives
148
+ - List assets (data, credentials, integrity-critical state, availability-critical components, build artifacts).
149
+ - For each asset, state why it matters (confidentiality/integrity/availability, compliance, user harm).
150
+
151
+ 4) Attacker model
152
+ - Capabilities: realistic remote attacker assumptions based on intended usage and exposure.
153
+ - Non-capabilities: things the attacker cannot plausibly do (unless explicitly in scope), to avoid inflated severity.
154
+
155
+ 5) Threat enumeration (concrete, system-specific)
156
+ - Generate threats as attacker stories tied to:
157
+ * entry points
158
+ * trust boundaries
159
+ * privileged components
160
+ - Prefer abuse paths (multi-step sequences) over single-line generic threats.
161
+
162
+ 6) Risk prioritization
163
+ - For each threat:
164
+ * Likelihood: low/medium/high with a 1 to 2 sentence justification
165
+ * Impact: low/medium/high with a 1 to 2 sentence justification
166
+ * Overall priority: critical/high/medium/low (based on likelihood x impact, adjusted for existing controls)
167
+ - Explicitly state which assumptions most affect risk.
168
+
169
+ 7) Validate assumptions and service context with the user (required before final report)
170
+ - Summarize key assumptions that materially affect scope or risk ranking.
171
+ - Ask 1 to 3 targeted questions to resolve missing service meta-context (service owner/environment, scale/users, deployment model, authn/authz, internet exposure, data sensitivity, multi-tenancy).
172
+ - Pause and wait for user feedback before producing the final report.
173
+ - If the user cannot answer, proceed with explicit assumptions and mark any conditional conclusions.
174
+
175
+ 8) Mitigations and recommendations
176
+ - For each high/critical threat:
177
+ * Existing mitigations (with evidence anchors)
178
+ * Gaps/weaknesses
179
+ * Recommended mitigations (code/config/process)
180
+ * Detection/monitoring ideas (logging, metrics, alerts)
181
+
182
+ 9) Focus paths for manual security review
183
+ - Output 2 to 30 repo-relative paths (files or directories) that merit deeper review.
184
+ - For each path, give a one-sentence reason tied to the threat model.
185
+
186
+ 10) Quality check
187
+ - Provide a short checklist confirming you covered:
188
+ * all entry points you discovered
189
+ * each trust boundary at least once in threats
190
+ * runtime vs CI/dev separation
191
+ * user clarifications (or explicit non-responses)
192
+ * assumptions and open questions
193
+
194
+ ## Required output format (exact)
195
+ Before producing the final Markdown report, first provide an assumption-validation check-in:
196
+ - List the key assumptions in 3 to 6 bullets.
197
+ - Ask 1 to 3 targeted context questions.
198
+ - Wait for the user response, then produce the final report below using the clarified context.
199
+
200
+ Produce valid Markdown with these sections in this order:
201
+
202
+ ## Executive summary
203
+ - 1 short paragraph on the top risk themes and highest-risk areas.
204
+
205
+ ## Scope and assumptions
206
+ - In-scope paths, out-of-scope items, and explicit assumptions.
207
+ - A short list of open questions that would materially change the risk ranking.
208
+
209
+
210
+ ## System model
211
+ ### Primary components
212
+ ### Data flows and trust boundaries
213
+ Represent the system as a sequence of arrow-style bullets (e.g., Internet -> API Server, User Input -> Application Logic). For each boundary, document:
214
+ • the primary data types crossing the boundary,
215
+ • the communication channel or protocol,
216
+ • the security guarantees (e.g., authentication, origin checks, encryption, rate limiting), and
217
+ • any input validation, normalization, or schema enforcement performed.
218
+
219
+ #### Diagram
220
+ - Include a single, compact Mermaid diagram (`flowchart TD` or `flowchart LR`) showing primary components and trust boundaries (e.g., separate trust zones via subgraphs). Keep it compact, use only `-->`, avoid `title`/`style`, keep node labels short (no paths/URLs), and keep edge labels to plain words only (avoid `{}`, `[]`, `()`, or quotes).
221
+
222
+
223
+ ## Assets and security objectives
224
+ - A table: Asset | Why it matters | Security objective (C/I/A)
225
+
226
+ ## Attacker model
227
+ ### Capabilities
228
+ ### Non-capabilities
229
+
230
+ ## Entry points and attack surfaces
231
+ - A table: Surface | How reached | Trust boundary | Notes | Evidence (repo path / symbol)
232
+
233
+ ## Top abuse paths
234
+ - 5 to 10 short abuse paths, each as a numbered sequence of steps (attacker goal -> steps -> impact).
235
+
236
+ ## Threat model table
237
+ - A Markdown table with columns:
238
+ Threat ID | Threat source | Prerequisites | Threat action | Impact | Impacted assets | Existing controls (evidence) | Gaps | Recommended mitigations | Detection ideas | Likelihood | Impact severity | Priority
239
+
240
+ Rules:
241
+ - Threat IDs must be stable and formatted: TM-001, TM-002, ...
242
+ - Priority must be one of: critical, high, medium, low.
243
+ - Keep prerequisites to 1 to 2 sentences. Keep recommended mitigations concrete.
244
+
245
+ ## Criticality calibration
246
+ - Define what counts as critical/high/medium/low for THIS repo and context.
247
+ - Include 2 to 3 examples per level (tailored to the repo's assets and exposure).
248
+
249
+ ## Focus paths for security review
250
+ - A table: Path | Why it matters | Related Threat IDs
+ ```
251
+
252
+ ## Notes on use
253
+
254
+ - Fill in known context, but allow the model to infer and mark assumptions.
255
+ - Include 1–2 repo-path anchors per major claim; do not dump every match.
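The risk prioritization step and the threat table rules in the user prompt template (priority derived from likelihood x impact, IDs of the form TM-001) can be illustrated with a small sketch. The numeric thresholds, the `control_adjustment` parameter, and the three-digit ID width are assumptions made here for illustration; the template only fixes the labels and the ID pattern shown in its example.

```python
# Qualitative levels used by the template, mapped to assumed numeric weights.
LEVELS = {"low": 1, "medium": 2, "high": 3}


def priority(likelihood: str, impact: str, control_adjustment: int = 0) -> str:
    """Map qualitative likelihood x impact to a priority label.

    The thresholds are an assumed example calibration, not part of the template.
    control_adjustment lowers the score when existing controls reduce the risk.
    """
    score = LEVELS[likelihood] * LEVELS[impact] - control_adjustment
    if score >= 9:
        return "critical"
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


def threat_id(n: int) -> str:
    """Format a stable threat ID such as TM-001 (three digits assumed)."""
    return f"TM-{n:03d}"


# Hypothetical threats, for illustration only.
rows = [
    (threat_id(1), priority("high", "high")),
    (threat_id(2), priority("high", "medium")),
    (threat_id(3), priority("high", "high", control_adjustment=3)),
    (threat_id(4), priority("low", "medium")),
]
for tid, prio in rows:
    print(tid, prio)
```

A fixed mapping like this keeps rankings consistent across threats, though the template still expects a 1 to 2 sentence justification for each likelihood and impact rating.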
@@ -0,0 +1,32 @@
1
+ # Security Controls and Asset Categories
2
+
3
+ Use this as a lightweight checklist to keep outputs consistent across teams. Prefer concrete, system-specific items over generic text.
4
+
5
+ ## Asset categories (pick only what applies)
6
+ - User data (PII, content, uploads)
7
+ - Authentication artifacts (passwords, tokens, sessions, cookies)
8
+ - Authorization state (roles, policies, ACLs)
9
+ - Secrets and keys (API keys, signing keys, encryption keys)
10
+ - Configuration and feature flags
11
+ - Models and weights (if ML systems)
12
+ - Source code and build artifacts
13
+ - Audit logs and telemetry
14
+ - Availability-critical resources (queues, caches, rate limits, compute budgets)
15
+ - Tenant isolation boundaries and metadata
16
+
17
+ ## Security control categories
18
+ - Identity and access: authN, authZ, session handling, mTLS, key rotation
19
+ - Input protection: schema validation, parsing hardening, upload scanning, sandboxing
20
+ - Network safeguards: TLS, network policies, WAF, rate limiting, DoS controls
21
+ - Data protection: encryption at rest/in transit, tokenization, redaction
22
+ - Isolation: process sandboxing, container boundaries, tenant isolation, seccomp
23
+ - Observability: audit logs, alerting, anomaly detection, tamper resistance
24
+ - Supply chain: dependency pinning, SBOMs, provenance, signing
25
+ - Change control: CI checks, deployment approvals, config guardrails
26
+
27
+ ## Mitigation phrasing patterns
28
+ - "Enforce schema at <boundary> for <payload> before <component>."
29
+ - "Require authZ check for <action> on <resource> in <service>."
30
+ - "Isolate <parser/component> in a sandbox with <resource limits>."
31
+ - "Rate limit <endpoint> by <key> and apply burst caps."
32
+ - "Encrypt <data> at rest using <key management> and rotate <keys>."
@@ -0,0 +1,212 @@
1
+ ---
2
+ name: semgrep
3
+ description: >-
4
+ Run Semgrep static analysis scan on a codebase using parallel subagents.
5
+ Supports two scan modes — "run all" (full ruleset coverage) and "important
6
+ only" (high-confidence security vulnerabilities). Automatically detects and
7
+ uses Semgrep Pro for cross-file taint analysis when available. Use when asked
8
+ to scan code for vulnerabilities, run a security audit with Semgrep, find
9
+ bugs, or perform static analysis. Spawns parallel workers for multi-language
10
+ codebases.
11
+ allowed-tools:
12
+ - Bash
13
+ - Read
14
+ - Glob
15
+ - Task
16
+ - AskUserQuestion
17
+ - TaskCreate
18
+ - TaskList
19
+ - TaskUpdate
20
+ ---
21
+
22
+ # Semgrep Security Scan
23
+
24
+ Run a Semgrep scan with automatic language detection, parallel execution via Task subagents, and merged SARIF output.
25
+
26
+ ## Essential Principles
27
+
28
+ 1. **Always use `--metrics=off`** — Semgrep sends telemetry by default; `--config auto` also phones home. Every `semgrep` command must include `--metrics=off` to prevent data leakage during security audits.
29
+ 2. **User must approve the scan plan (Step 3 is a hard gate)** — The original "scan this codebase" request is NOT approval. Present exact rulesets, target, engine, and mode; wait for explicit "yes"/"proceed" before spawning scanners.
30
+ 3. **Third-party rulesets are required, not optional** — Trail of Bits, 0xdea, and Decurity rules catch vulnerabilities absent from the official registry. Include them whenever the detected language matches.
31
+ 4. **Spawn all scan Tasks in a single message** — Parallel execution is the core performance advantage. Never spawn Tasks sequentially; always emit all Task tool calls in one response.
32
+ 5. **Always check for Semgrep Pro before scanning** — Pro enables cross-file taint tracking and catches ~250% more true positives. Skipping the check means silently missing critical inter-file vulnerabilities.
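One way to make principle 1 hard to forget is to assemble every command line through a single helper. The sketch below is illustrative only: `build_semgrep_args` is a hypothetical name, and the choice to reject `--config auto` outright reflects this skill's guidance rather than any behavior of Semgrep itself.

```python
def build_semgrep_args(config: str, target: str, output: str,
                       use_pro: bool = False) -> list[str]:
    """Assemble one scan command line; --metrics=off is always included."""
    if config == "auto":
        # The skill rejects `--config auto` because it sends metrics.
        raise ValueError("--config auto is not allowed; pass an explicit ruleset")
    args = ["semgrep", "--metrics=off", "--config", config,
            "--sarif", "--output", output]
    if use_pro:
        args.append("--pro")
    # The target goes last; the skill asks for absolute paths here.
    args.append(target)
    return args
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting issues with paths that contain spaces.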
33
+
34
+ ## When to Use
35
+
36
+ - Security audit of a codebase
37
+ - Finding vulnerabilities before code review
38
+ - Scanning for known bug patterns
39
+ - First-pass static analysis
40
+
41
+ ## When NOT to Use
42
+
43
+ - Binary analysis → Use binary analysis tools
44
+ - Already have Semgrep CI configured → Use existing pipeline
45
+ - Need cross-file analysis but no Pro license → Consider CodeQL as alternative
46
+ - Creating custom Semgrep rules → Use `semgrep-rule-creator` skill
47
+ - Porting existing rules to other languages → Use `semgrep-rule-variant-creator` skill
48
+
49
+ ## Output Directory
50
+
51
+ All scan results, SARIF files, and temporary data are stored in a single output directory.
52
+
53
+ - **If the user specifies an output directory** in their prompt, use it as `OUTPUT_DIR`.
54
+ - **If not specified**, default to `./static_analysis_semgrep_1`. If that already exists, increment to `_2`, `_3`, etc.
55
+
56
+ In both cases, **always create the directory** with `mkdir -p` before writing any files.
57
+
58
+ ```bash
59
+ # Resolve output directory
60
+ if [ -n "$USER_SPECIFIED_DIR" ]; then
61
+   OUTPUT_DIR="$USER_SPECIFIED_DIR"
62
+ else
63
+   BASE="static_analysis_semgrep"
64
+   N=1
65
+   while [ -e "${BASE}_${N}" ]; do
66
+     N=$((N + 1))
67
+   done
68
+   OUTPUT_DIR="${BASE}_${N}"
69
+ fi
70
+ mkdir -p "$OUTPUT_DIR/raw" "$OUTPUT_DIR/results"
71
+ ```
72
+
73
+ The output directory is resolved **once** at the start of Step 1 and used throughout all subsequent steps.
74
+
75
+ ```
76
+ $OUTPUT_DIR/
77
+ ├── rulesets.txt # Approved rulesets (logged after Step 3)
78
+ ├── raw/ # Per-scan raw output (unfiltered)
79
+ │ ├── python-python.json
80
+ │ ├── python-python.sarif
81
+ │ ├── python-django.json
82
+ │ ├── python-django.sarif
83
+ │ └── ...
84
+ └── results/ # Final merged output
85
+     └── results.sarif
86
+ ```
87
+
88
+ ## Prerequisites
89
+
90
+ **Required:** Semgrep CLI (`semgrep --version`). If not installed, see [Semgrep installation docs](https://semgrep.dev/docs/getting-started/).
91
+
92
+ **Optional:** Semgrep Pro — enables cross-file taint tracking, inter-procedural analysis, and additional languages (Apex, C#, Elixir). Check with:
93
+
94
+ ```bash
95
+ semgrep --pro --validate --config p/default --metrics=off 2>/dev/null && echo "Pro available" || echo "OSS only"
96
+ ```
97
+
98
+ **Limitations:** OSS mode cannot track data flow across files. Pro mode uses `-j 1` for cross-file analysis (slower per ruleset, but parallel rulesets compensate).
99
+
100
+ ## Scan Modes
101
+
102
+ Select mode in Step 2 of the workflow. Mode affects both scanner flags and post-processing.
103
+
104
+ | Mode | Coverage | Findings Reported |
105
+ |------|----------|-------------------|
106
+ | **Run all** | All rulesets, all severity levels | Everything |
107
+ | **Important only** | All rulesets, pre- and post-filtered | Security vulns only, medium-high confidence/impact |
108
+
109
+ **Important only** applies two filter layers:
110
+ 1. **Pre-filter**: `--severity MEDIUM --severity HIGH --severity CRITICAL` (CLI flag)
111
+ 2. **Post-filter**: JSON metadata — keeps only `category=security`, `confidence∈{MEDIUM,HIGH}`, `impact∈{MEDIUM,HIGH}`
112
+
113
+ See [scan-modes.md](references/scan-modes.md) for metadata criteria and jq filter commands.
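The post-filter can be pictured as a simple predicate over each finding. This is a minimal Python sketch, not the jq filter from `scan-modes.md`: the function name `important_only` is hypothetical, and it assumes the metadata sits under `extra.metadata` in each entry of Semgrep's JSON `results` array. Findings that lack any of the three fields are dropped, which is a design choice of this sketch.

```python
# Levels kept by the post-filter, per the criteria listed above.
KEEP_LEVELS = {"MEDIUM", "HIGH"}


def important_only(results: list[dict]) -> list[dict]:
    """Keep security findings with medium or high confidence and impact."""
    kept = []
    for finding in results:
        # Assumed location of rule metadata in Semgrep JSON output.
        meta = finding.get("extra", {}).get("metadata", {})
        is_security = str(meta.get("category", "")).lower() == "security"
        confident = str(meta.get("confidence", "")).upper() in KEEP_LEVELS
        impactful = str(meta.get("impact", "")).upper() in KEEP_LEVELS
        if is_security and confident and impactful:
            kept.append(finding)
    return kept
```

Because the filter returns a new list, the unfiltered findings stay available for the `raw/` directory.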
114
+
115
+ ## Orchestration Architecture
116
+
117
+ ```
118
+ ┌──────────────────────────────────────────────────────────────────┐
119
+ │ MAIN AGENT (this skill) │
120
+ │ Step 1: Detect languages + check Pro availability │
121
+ │ Step 2: Select scan mode + rulesets (ref: rulesets.md) │
122
+ │ Step 3: Present plan + rulesets, get approval [⛔ HARD GATE] │
123
+ │ Step 4: Spawn parallel scan Tasks (approved rulesets + mode) │
124
+ │ Step 5: Merge results and report │
125
+ └──────────────────────────────────────────────────────────────────┘
126
+ │ Step 4
127
+
128
+ ┌─────────────────┐
129
+ │ Scan Tasks │
130
+ │ (parallel) │
131
+ ├─────────────────┤
132
+ │ Python scanner │
133
+ │ JS/TS scanner │
134
+ │ Go scanner │
135
+ │ Docker scanner │
136
+ └─────────────────┘
137
+ ```
138
+
139
+ ## Workflow
140
+
141
+ **Follow the detailed workflow in [scan-workflow.md](workflows/scan-workflow.md).** Summary:
142
+
143
+ | Step | Action | Gate | Key Reference |
144
+ |------|--------|------|---------------|
145
+ | 1 | Resolve output dir, detect languages + Pro availability | — | Use Glob, not Bash |
146
+ | 2 | Select scan mode + rulesets | — | [rulesets.md](references/rulesets.md) |
147
+ | 3 | Present plan, get explicit approval | ⛔ HARD | AskUserQuestion |
148
+ | 4 | Spawn parallel scan Tasks | — | [scanner-task-prompt.md](references/scanner-task-prompt.md) |
149
+ | 5 | Merge results and report | — | Merge script (below) |
150
+
151
+ **Task enforcement:** On invocation, create 5 tasks with blockedBy dependencies (each step is blocked by the one before it). Step 3 is a HARD GATE: mark it complete ONLY after the user explicitly approves.
152
+
153
+ **Merge command (Step 5):**
154
+
155
+ ```bash
156
+ uv run {baseDir}/scripts/merge_sarif.py "$OUTPUT_DIR/raw" "$OUTPUT_DIR/results/results.sarif"
157
+ ```
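The contents of `merge_sarif.py` are not shown here, so the following is only a sketch of what a merge of this kind might do, under the assumption that merging means concatenating the `runs` arrays of every `*.sarif` file in the raw directory into one SARIF 2.1.0 log. The function name `merge_sarif` and its return value are hypothetical.

```python
import json
from pathlib import Path


def merge_sarif(raw_dir: str, out_path: str) -> int:
    """Concatenate the runs of all *.sarif files in raw_dir into out_path.

    Returns the number of runs written.
    """
    runs = []
    # Sorted order keeps the merged output stable across invocations.
    for path in sorted(Path(raw_dir).glob("*.sarif")):
        doc = json.loads(path.read_text(encoding="utf-8"))
        runs.extend(doc.get("runs", []))
    merged = {"version": "2.1.0", "runs": runs}
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return len(runs)
```

A real merge script may also deduplicate findings or rules; this sketch does neither.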
158
+
159
+ ## Agents
160
+
161
+ | Agent | Tools | Purpose |
162
+ |-------|-------|---------|
163
+ | `static-analysis:semgrep-scanner` | Bash | Executes parallel semgrep scans for a language category |
164
+
165
+ Use `subagent_type: static-analysis:semgrep-scanner` in Step 4 when spawning Task subagents.
166
+
167
+ ## Rationalizations to Reject
168
+
169
+ | Shortcut | Why It's Wrong |
170
+ |----------|----------------|
171
+ | "User asked for scan, that's approval" | Original request ≠ plan approval. Present plan, use AskUserQuestion, await explicit "yes" |
172
+ | "Step 3 task is blocking, just mark complete" | Lying about task status defeats enforcement. Only mark complete after real approval |
173
+ | "I already know what they want" | Assumptions cause scanning wrong directories/rulesets. Present plan for verification |
174
+ | "Just use default rulesets" | User must see and approve exact rulesets before scan |
175
+ | "Add extra rulesets without asking" | Modifying approved list without consent breaks trust |
176
+ | "Third-party rulesets are optional" | Trail of Bits, 0xdea, Decurity catch vulnerabilities not in official registry — REQUIRED |
177
+ | "Use --config auto" | Sends metrics; less control over rulesets |
178
+ | "One Task at a time" | Defeats parallelism; spawn all Tasks together |
179
+ | "Pro is too slow, skip --pro" | Cross-file analysis catches 250% more true positives; worth the time |
180
+ | "Semgrep handles GitHub URLs natively" | URL handling fails on repos with non-standard YAML; always clone first |
181
+ | "Cleanup is optional" | Cloned repos pollute the user's workspace and accumulate across runs |
182
+ | "Use `.` or relative path as target" | Subagents need absolute paths to avoid ambiguity |
183
+ | "Let the user pick an output dir later" | Output directory must be resolved at Step 1, before any files are created |
184
+
185
+ ## Reference Index
186
+
187
+ | File | Content |
188
+ |------|---------|
189
+ | [rulesets.md](references/rulesets.md) | Complete ruleset catalog and selection algorithm |
190
+ | [scan-modes.md](references/scan-modes.md) | Pre/post-filter criteria and jq commands |
191
+ | [scanner-task-prompt.md](references/scanner-task-prompt.md) | Template for spawning scanner subagents |
192
+
193
+ | Workflow | Purpose |
194
+ |----------|---------|
195
+ | [scan-workflow.md](workflows/scan-workflow.md) | Complete 5-step scan execution process |
196
+
197
+ ## Success Criteria
198
+
199
+ - [ ] Output directory resolved (user-specified or auto-incremented default)
200
+ - [ ] All generated files stored inside `$OUTPUT_DIR`
201
+ - [ ] Languages detected with file counts; Pro status checked
202
+ - [ ] Scan mode selected by user (run all / important only)
203
+ - [ ] Rulesets include third-party rules for all detected languages
204
+ - [ ] User explicitly approved the scan plan (Step 3 gate passed)
205
+ - [ ] All scan Tasks spawned in a single message and completed
206
+ - [ ] Every `semgrep` command used `--metrics=off`
207
+ - [ ] Approved rulesets logged to `$OUTPUT_DIR/rulesets.txt`
208
+ - [ ] Raw per-scan outputs stored in `$OUTPUT_DIR/raw/`
209
+ - [ ] `results.sarif` exists in `$OUTPUT_DIR/results/` and is valid JSON
210
+ - [ ] Important-only mode: post-filter applied before merge; unfiltered results preserved in `raw/`
211
+ - [ ] Results summary reported with severity and category breakdown
212
+ - [ ] Cloned repos (if any) cleaned up from `$OUTPUT_DIR/repos/`
@@ -0,0 +1,162 @@
1
+ # Semgrep Rulesets Reference
2
+
3
+ ## Complete Ruleset Catalog
4
+
5
+ ### Security-Focused Rulesets
6
+
7
+ | Ruleset | Description | Use Case |
8
+ |---------|-------------|----------|
9
+ | `p/security-audit` | Comprehensive vulnerability detection, higher false positives | Manual audits, security reviews |
10
+ | `p/secrets` | Hardcoded credentials, API keys, tokens | Always include |
11
+ | `p/owasp-top-ten` | OWASP Top 10 web application vulnerabilities | Web app security |
12
+ | `p/cwe-top-25` | CWE Top 25 most dangerous software weaknesses | General security |
13
+ | `p/sql-injection` | SQL injection patterns and tainted data flows | Database security |
14
+ | `p/insecure-transport` | Ensures code uses encrypted channels | Network security |
15
+ | `p/gitleaks` | Hard-coded credentials detection (gitleaks port) | Secrets scanning |
16
+ | `p/findsecbugs` | FindSecBugs rule pack for Java | Java security |
17
+ | `p/phpcs-security-audit` | PHP security audit rules | PHP security |
18
+
19
+ ### CI/CD Rulesets
20
+
21
+ | Ruleset | Description | Use Case |
22
+ |---------|-------------|----------|
23
+ | `p/default` | Default ruleset, balanced coverage | First-time users |
24
+ | `p/ci` | High-confidence security + logic bugs, low FP | CI pipelines |
25
+ | `p/r2c-ci` | Low false positives, CI-safe | CI/CD blocking |
26
+ | `p/r2c` | Community favorite, curated by Semgrep (618k+ downloads) | General scanning |
27
+ | `p/auto` | Auto-selects rules based on detected languages/frameworks | Quick scans |
28
+ | `p/comment` | Comment-related rules | Code review |
29
+
30
+ ### Third-Party Rulesets
31
+
32
+ | Ruleset | Description | Maintainer |
33
+ |---------|-------------|------------|
34
+ | `p/gitlab` | GitLab-maintained security rules | GitLab |
35
+
36
+ ---
37
+
38
+ ## Ruleset Selection Algorithm
39
+
40
+ Follow this algorithm to select rulesets based on detected languages and frameworks.
41
+
42
+ ### Step 1: Always Include Security Baseline
43
+
44
+ ```json
45
+ {
46
+ "baseline": ["p/security-audit", "p/secrets"]
47
+ }
48
+ ```
49
+
50
+ - `p/security-audit` - Comprehensive vulnerability detection (always include)
51
+ - `p/secrets` - Hardcoded credentials, API keys, tokens (always include)
52
+
53
+ ### Step 2: Add Language-Specific Rulesets
54
+
55
+ For each detected language, add the primary ruleset. If a framework is detected, add its ruleset too.
56
+
57
+ **GA Languages (production-ready):**
58
+
59
+ | Detection | Primary Ruleset | Framework Rulesets | Pro Rule Count |
60
+ |-----------|-----------------|-------------------|----------------|
61
+ | `.py` | `p/python` | `p/django`, `p/flask`, `p/fastapi` | 710+ |
62
+ | `.js`, `.jsx` | `p/javascript` | `p/react`, `p/nodejs`, `p/express`, `p/nextjs`, `p/angular` | 250+ (JS), 70+ (JSX) |
63
+ | `.ts`, `.tsx` | `p/typescript` | `p/react`, `p/nodejs`, `p/express`, `p/nextjs`, `p/angular` | 230+ |
64
+ | `.go` | `p/golang` | `p/go` (alias) | 80+ |
65
+ | `.java` | `p/java` | `p/spring`, `p/findsecbugs` | 190+ |
66
+ | `.kt` | `p/kotlin` | `p/spring` | 60+ |
67
+ | `.rb` | `p/ruby` | `p/rails` | 40+ |
68
+ | `.php` | `p/php` | `p/symfony`, `p/laravel`, `p/phpcs-security-audit` | 50+ |
69
+ | `.c`, `.cpp`, `.h` | `p/c` | - | 150+ |
70
+ | `.rs` | `p/rust` | - | 40+ |
71
+ | `.cs` | `p/csharp` | - | 170+ |
72
+ | `.scala` | `p/scala` | - | Community |
73
+ | `.swift` | `p/swift` | - | 60+ |
74
+
75
+ **Beta Languages (Pro recommended):**
76
+
77
+ | Detection | Primary Ruleset | Notes |
78
+ |-----------|-----------------|-------|
79
+ | `.ex`, `.exs` | `p/elixir` | Requires Pro for best coverage |
80
+ | `.cls`, `.trigger` | `p/apex` | Salesforce; requires Pro |
81
+
82
+ **Experimental Languages:**
83
+
84
+ | Detection | Primary Ruleset | Notes |
85
+ |-----------|-----------------|-------|
86
+ | `.sol` | No official ruleset | Use Decurity third-party rules |
87
+ | `Dockerfile` | `p/dockerfile` | Limited rules |
88
+ | `.yaml`, `.yml` | `p/yaml` | K8s, GitHub Actions, docker-compose patterns |
89
+ | `.json` | `r/json.aws` | AWS IAM policies; use `r/json.*` for specific rules |
90
+ | Bash scripts | - | Community support |
91
+ | Cairo, Circom | - | Experimental, smart contracts |
92
+
93
+ **Framework detection hints:**
94
+
95
+ | Framework | Detection Signals | Ruleset |
96
+ |-----------|------------------|---------|
97
+ | Django | `settings.py`, `urls.py`, `django` in requirements | `p/django` |
98
+ | Flask | `flask` in requirements, `@app.route` | `p/flask` |
99
+ | FastAPI | `fastapi` in requirements, `@app.get/post` | `p/fastapi` |
100
+ | React | `package.json` with react dependency, `.jsx`/`.tsx` files | `p/react` |
101
+ | Next.js | `next.config.js`, `pages/` or `app/` directory | `p/nextjs` |
102
+ | Angular | `angular.json`, `@angular/` dependencies | `p/angular` |
103
+ | Express | `express` in package.json, `app.use()` patterns | `p/express` |
104
+ | NestJS | `@nestjs/` dependencies, `@Controller` decorators | `p/nodejs` |
105
+ | Spring | `pom.xml` with spring, `@SpringBootApplication` | `p/spring` |
106
+ | Rails | `Gemfile` with rails, `config/routes.rb` | `p/rails` |
107
+ | Laravel | `composer.json` with laravel, `artisan` | `p/laravel` |
108
+ | Symfony | `composer.json` with symfony, `config/packages/` | `p/symfony` |
109
+
110
+ ### Step 3: Add Infrastructure Rulesets
111
+
112
+ | Detection | Ruleset | Description |
113
+ |-----------|---------|-------------|
114
+ | `Dockerfile` | `p/dockerfile` | Container security, best practices |
115
+ | `.tf`, `.hcl` | `p/terraform` | IaC misconfigurations, CIS benchmarks, AWS/Azure/GCP |
116
+ | k8s manifests | `p/kubernetes` | K8s security, RBAC issues |
117
+ | CloudFormation | `p/cloudformation` | AWS infrastructure security |
118
+ | GitHub Actions | `p/github-actions` | CI/CD security, secrets exposure |
119
+ | `.yaml`, `.yml` | `p/yaml` | Generic YAML patterns (K8s, docker-compose) |
120
+ | AWS IAM JSON | `r/json.aws` | IAM policy misconfigurations (use `--config r/json.aws`) |
121
+
122
+ ### Step 4: Add Third-Party Rulesets
123
+
124
+ These are **NOT optional**. Include automatically when language matches:
125
+
126
+ | Languages | Source | Why Required |
127
+ |-----------|--------|--------------|
128
+ | Python, Go, Ruby, JS/TS, Terraform, HCL | [Trail of Bits](https://github.com/trailofbits/semgrep-rules) | Security audit patterns from real engagements (AGPLv3) |
129
+ | C, C++ | [0xdea](https://github.com/0xdea/semgrep-rules) | Memory safety, low-level vulnerabilities |
130
+ | Solidity, Cairo, Rust | [Decurity](https://github.com/Decurity/semgrep-smart-contracts) | Smart contract vulnerabilities, DeFi exploits |
131
+ | Go | [dgryski](https://github.com/dgryski/semgrep-go) | Additional Go-specific patterns |
132
+ | Android (Java/Kotlin) | [MindedSecurity](https://github.com/mindedsecurity/semgrep-rules-android-security) | OWASP MASTG-derived mobile security rules |
133
+ | Java, Go, JS/TS, C#, Python, PHP | [elttam](https://github.com/elttam/semgrep-rules) | Security consulting patterns |
134
+ | Dockerfile, PHP, Go, Java | [kondukto](https://github.com/kondukto-io/semgrep-rules) | Container and web app security |
135
+ | PHP, Kotlin, Java | [dotta](https://github.com/federicodotta/semgrep-rules) | Pentest-derived web/mobile app rules |
136
+ | Terraform, HCL | [HashiCorp](https://github.com/hashicorp-forge/semgrep-rules) | HashiCorp infrastructure patterns |
137
+ | Swift, Java, Cobol | [akabe1](https://github.com/akabe1/akabe1-semgrep-rules) | iOS and legacy system patterns |
138
+ | Java | [Atlassian Labs](https://github.com/atlassian-labs/atlassian-sast-ruleset) | Atlassian-maintained Java rules |
139
+ | Python, JS/TS, Java, Ruby, Go, PHP | [Apiiro](https://github.com/apiiro/malicious-code-ruleset) | Malicious code detection, supply chain |
140
+
141
+ ### Step 5: Verify Rulesets
142
+
143
+ Before finalizing, verify official rulesets load:
144
+
145
+ ```bash
146
+ # Quick validation: semgrep exits 0 if valid, but the pipe to head masks that status, so read the printed output
147
+ semgrep --config p/python --validate --metrics=off 2>&1 | head -3
148
+ ```
149
+
150
+ Or browse the [Semgrep Registry](https://semgrep.dev/explore).
151
+
152
+ ### Output Format
153
+
154
+ ```json
155
+ {
156
+ "baseline": ["p/security-audit", "p/secrets"],
157
+ "python": ["p/python", "p/django"],
158
+ "javascript": ["p/javascript", "p/react", "p/nodejs"],
159
+ "docker": ["p/dockerfile"],
160
+ "third_party": ["https://github.com/trailofbits/semgrep-rules"]
161
+ }
162
+ ```
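Steps 1 and 2 of the selection algorithm can be sketched as a lookup over file extensions and detected frameworks. This is an illustrative Python sketch, not the package's implementation: `select_rulesets` and the two tables are hypothetical names, only a subset of the languages and frameworks listed above is included, and the infrastructure and third-party steps are left out.

```python
from pathlib import PurePath

# Step 1: baseline rulesets that are always included.
BASELINE = ["p/security-audit", "p/secrets"]

# Step 2 (subset): file extension -> (group name, primary ruleset).
PRIMARY = {
    ".py": ("python", "p/python"),
    ".js": ("javascript", "p/javascript"),
    ".jsx": ("javascript", "p/javascript"),
    ".ts": ("typescript", "p/typescript"),
    ".tsx": ("typescript", "p/typescript"),
    ".go": ("go", "p/golang"),
    ".java": ("java", "p/java"),
    ".rb": ("ruby", "p/ruby"),
}

# Step 2 (subset): detected framework -> (group name, framework ruleset).
FRAMEWORKS = {
    "django": ("python", "p/django"),
    "flask": ("python", "p/flask"),
    "react": ("javascript", "p/react"),
    "rails": ("ruby", "p/rails"),
}


def _add(plan: dict, group: str, ruleset: str) -> None:
    """Append ruleset to the group once, preserving insertion order."""
    entries = plan.setdefault(group, [])
    if ruleset not in entries:
        entries.append(ruleset)


def select_rulesets(paths, frameworks=()):
    """Build a plan in the output format above from paths and frameworks."""
    plan = {"baseline": list(BASELINE)}
    for p in paths:
        match = PRIMARY.get(PurePath(p).suffix)
        if match:
            _add(plan, *match)
    for fw in frameworks:
        match = FRAMEWORKS.get(fw)
        if match:
            _add(plan, *match)
    return plan
```

For a repository with `app/views.py` and `web/index.js` where Django and React were detected, the result contains the `baseline`, `python`, and `javascript` groups, matching the shape of the output format above.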