@vigolium/piolium 0.0.1

Files changed (271)
  1. package/LICENSE +21 -0
  2. package/README.md +117 -0
  3. package/agents/access-auditor.md +300 -0
  4. package/agents/assumption-breaker.md +154 -0
  5. package/agents/attack-designer.md +116 -0
  6. package/agents/code-scanner.md +139 -0
  7. package/agents/concurrency-auditor.md +238 -0
  8. package/agents/confirm-writer.md +257 -0
  9. package/agents/context-reviewer.md +274 -0
  10. package/agents/cross-verifier.md +165 -0
  11. package/agents/cve-scout.md +381 -0
  12. package/agents/env-builder.md +282 -0
  13. package/agents/env-profiler.md +205 -0
  14. package/agents/evidence-collector.md +140 -0
  15. package/agents/finding-grader.md +142 -0
  16. package/agents/finding-writer.md +148 -0
  17. package/agents/flow-tracer.md +106 -0
  18. package/agents/goal-backtracer.md +146 -0
  19. package/agents/history-miner.md +467 -0
  20. package/agents/independent-verifier.md +118 -0
  21. package/agents/intent-mapper.md +183 -0
  22. package/agents/longshot-collector.md +128 -0
  23. package/agents/longshot-prober.md +126 -0
  24. package/agents/patch-auditor.md +73 -0
  25. package/agents/poc-author.md +124 -0
  26. package/agents/poc-runner.md +194 -0
  27. package/agents/probe-lead.md +269 -0
  28. package/agents/red-challenger.md +101 -0
  29. package/agents/report-composer.md +208 -0
  30. package/agents/review-adjudicator.md +216 -0
  31. package/agents/spec-auditor.md +155 -0
  32. package/agents/taint-tracer.md +265 -0
  33. package/agents/test-locator.md +209 -0
  34. package/agents/threat-modeler.md +132 -0
  35. package/agents/variant-scanner.md +108 -0
  36. package/agents/variant-spotter.md +110 -0
  37. package/bin/piolium.mjs +376 -0
  38. package/extensions/piolium/_vendor/yaml.bundle.d.mts +6 -0
  39. package/extensions/piolium/_vendor/yaml.bundle.mjs +139 -0
  40. package/extensions/piolium/agent-runner.ts +322 -0
  41. package/extensions/piolium/agents.ts +266 -0
  42. package/extensions/piolium/audit-state.ts +522 -0
  43. package/extensions/piolium/bundled-resources.ts +97 -0
  44. package/extensions/piolium/candidate-scan.ts +966 -0
  45. package/extensions/piolium/command-target.ts +177 -0
  46. package/extensions/piolium/console-stream.ts +57 -0
  47. package/extensions/piolium/export-results.ts +380 -0
  48. package/extensions/piolium/findings.ts +448 -0
  49. package/extensions/piolium/heartbeat.ts +182 -0
  50. package/extensions/piolium/help.ts +234 -0
  51. package/extensions/piolium/index.ts +1865 -0
  52. package/extensions/piolium/longshot.ts +530 -0
  53. package/extensions/piolium/matcher-suggestions.ts +196 -0
  54. package/extensions/piolium/matcher-utils.ts +83 -0
  55. package/extensions/piolium/modes/balanced.ts +750 -0
  56. package/extensions/piolium/modes/confirm-bootstrap.ts +186 -0
  57. package/extensions/piolium/modes/confirm.ts +697 -0
  58. package/extensions/piolium/modes/deep.ts +917 -0
  59. package/extensions/piolium/modes/diff.ts +177 -0
  60. package/extensions/piolium/modes/lite.ts +540 -0
  61. package/extensions/piolium/modes/longshot.ts +595 -0
  62. package/extensions/piolium/modes/merge.ts +204 -0
  63. package/extensions/piolium/modes/phase-runner.ts +267 -0
  64. package/extensions/piolium/modes/reinvest.ts +546 -0
  65. package/extensions/piolium/modes/revisit.ts +279 -0
  66. package/extensions/piolium/modes.ts +48 -0
  67. package/extensions/piolium/phase-labels.ts +123 -0
  68. package/extensions/piolium/phase-status-strip.ts +92 -0
  69. package/extensions/piolium/prompt-prefix-editor.ts +39 -0
  70. package/extensions/piolium/providers/anthropic-vertex.ts +836 -0
  71. package/extensions/piolium/recon.ts +409 -0
  72. package/extensions/piolium/result-stats.ts +105 -0
  73. package/extensions/piolium/retry.ts +120 -0
  74. package/extensions/piolium/scheduler.ts +212 -0
  75. package/extensions/piolium/secrets.ts +368 -0
  76. package/extensions/piolium/tools/web-tools.ts +148 -0
  77. package/package.json +77 -0
  78. package/skills/agentic-actions-auditor/SKILL.md +327 -0
  79. package/skills/agentic-actions-auditor/references/action-profiles.md +186 -0
  80. package/skills/agentic-actions-auditor/references/cross-file-resolution.md +209 -0
  81. package/skills/agentic-actions-auditor/references/foundations.md +94 -0
  82. package/skills/agentic-actions-auditor/references/vector-a-env-var-intermediary.md +77 -0
  83. package/skills/agentic-actions-auditor/references/vector-b-direct-expression-injection.md +83 -0
  84. package/skills/agentic-actions-auditor/references/vector-c-cli-data-fetch.md +83 -0
  85. package/skills/agentic-actions-auditor/references/vector-d-pr-target-checkout.md +88 -0
  86. package/skills/agentic-actions-auditor/references/vector-e-error-log-injection.md +88 -0
  87. package/skills/agentic-actions-auditor/references/vector-f-subshell-expansion.md +82 -0
  88. package/skills/agentic-actions-auditor/references/vector-g-eval-of-ai-output.md +91 -0
  89. package/skills/agentic-actions-auditor/references/vector-h-dangerous-sandbox-configs.md +102 -0
  90. package/skills/agentic-actions-auditor/references/vector-i-wildcard-allowlists.md +88 -0
  91. package/skills/audit/SKILL.md +562 -0
  92. package/skills/audit/assets/icon.svg +7 -0
  93. package/skills/audit/hooks/scripts/validate_phase_output.py +550 -0
  94. package/skills/audit/references/adversarial-review.md +148 -0
  95. package/skills/audit/references/architecture-aware-sast.md +306 -0
  96. package/skills/audit/references/audit-workflow.md +737 -0
  97. package/skills/audit/references/chamber-protocol.md +384 -0
  98. package/skills/audit/references/creative-attack-modes.md +221 -0
  99. package/skills/audit/references/deep-analysis.md +273 -0
  100. package/skills/audit/references/domain-attack-playbooks.md +1129 -0
  101. package/skills/audit/references/knowledge-base-template.md +513 -0
  102. package/skills/audit/references/real-env-validation.md +191 -0
  103. package/skills/audit/references/report-templates.md +417 -0
  104. package/skills/audit/references/triage-and-prereqs.md +134 -0
  105. package/skills/audit/scripts/consolidate_drafts.py +554 -0
  106. package/skills/audit/scripts/partition_findings.py +152 -0
  107. package/skills/audit/scripts/rg-hotspots.sh +121 -0
  108. package/skills/audit/scripts/stamp_file_state.py +349 -0
  109. package/skills/code-reviewer/SKILL.md +65 -0
  110. package/skills/codeql/SKILL.md +281 -0
  111. package/skills/codeql/references/build-fixes.md +90 -0
  112. package/skills/codeql/references/diagnostic-query-templates.md +339 -0
  113. package/skills/codeql/references/extension-yaml-format.md +209 -0
  114. package/skills/codeql/references/important-only-suite.md +153 -0
  115. package/skills/codeql/references/language-details.md +207 -0
  116. package/skills/codeql/references/macos-arm64e-workaround.md +179 -0
  117. package/skills/codeql/references/performance-tuning.md +111 -0
  118. package/skills/codeql/references/quality-assessment.md +172 -0
  119. package/skills/codeql/references/ruleset-catalog.md +63 -0
  120. package/skills/codeql/references/run-all-suite.md +92 -0
  121. package/skills/codeql/references/sarif-processing.md +79 -0
  122. package/skills/codeql/references/threat-models.md +51 -0
  123. package/skills/codeql/workflows/build-database.md +280 -0
  124. package/skills/codeql/workflows/create-data-extensions.md +261 -0
  125. package/skills/codeql/workflows/run-analysis.md +301 -0
  126. package/skills/differential-review/SKILL.md +220 -0
  127. package/skills/differential-review/adversarial.md +203 -0
  128. package/skills/differential-review/methodology.md +234 -0
  129. package/skills/differential-review/patterns.md +300 -0
  130. package/skills/differential-review/reporting.md +369 -0
  131. package/skills/fp-check/SKILL.md +125 -0
  132. package/skills/fp-check/references/bug-class-verification.md +114 -0
  133. package/skills/fp-check/references/deep-verification.md +143 -0
  134. package/skills/fp-check/references/evidence-templates.md +91 -0
  135. package/skills/fp-check/references/false-positive-patterns.md +115 -0
  136. package/skills/fp-check/references/gate-reviews.md +27 -0
  137. package/skills/fp-check/references/standard-verification.md +78 -0
  138. package/skills/insecure-defaults/SKILL.md +117 -0
  139. package/skills/insecure-defaults/references/examples.md +409 -0
  140. package/skills/last30days/SKILL.md +444 -0
  141. package/skills/sarif-parsing/SKILL.md +483 -0
  142. package/skills/sarif-parsing/resources/jq-queries.md +162 -0
  143. package/skills/sarif-parsing/resources/sarif_helpers.py +331 -0
  144. package/skills/security-threat-model/LICENSE.txt +201 -0
  145. package/skills/security-threat-model/SKILL.md +81 -0
  146. package/skills/security-threat-model/agents/openai.yaml +4 -0
  147. package/skills/security-threat-model/references/prompt-template.md +255 -0
  148. package/skills/security-threat-model/references/security-controls-and-assets.md +32 -0
  149. package/skills/semgrep/SKILL.md +212 -0
  150. package/skills/semgrep/references/rulesets.md +162 -0
  151. package/skills/semgrep/references/scan-modes.md +110 -0
  152. package/skills/semgrep/references/scanner-task-prompt.md +140 -0
  153. package/skills/semgrep/scripts/merge_sarif.py +203 -0
  154. package/skills/semgrep/workflows/scan-workflow.md +311 -0
  155. package/skills/semgrep-rule-creator/SKILL.md +168 -0
  156. package/skills/semgrep-rule-creator/references/quick-reference.md +202 -0
  157. package/skills/semgrep-rule-creator/references/workflow.md +240 -0
  158. package/skills/semgrep-rule-variant-creator/SKILL.md +205 -0
  159. package/skills/semgrep-rule-variant-creator/references/applicability-analysis.md +250 -0
  160. package/skills/semgrep-rule-variant-creator/references/language-syntax-guide.md +324 -0
  161. package/skills/semgrep-rule-variant-creator/references/workflow.md +518 -0
  162. package/skills/sharp-edges/SKILL.md +292 -0
  163. package/skills/sharp-edges/references/auth-patterns.md +252 -0
  164. package/skills/sharp-edges/references/case-studies.md +274 -0
  165. package/skills/sharp-edges/references/config-patterns.md +333 -0
  166. package/skills/sharp-edges/references/crypto-apis.md +190 -0
  167. package/skills/sharp-edges/references/lang-c.md +205 -0
  168. package/skills/sharp-edges/references/lang-csharp.md +285 -0
  169. package/skills/sharp-edges/references/lang-go.md +270 -0
  170. package/skills/sharp-edges/references/lang-java.md +263 -0
  171. package/skills/sharp-edges/references/lang-javascript.md +269 -0
  172. package/skills/sharp-edges/references/lang-kotlin.md +265 -0
  173. package/skills/sharp-edges/references/lang-php.md +245 -0
  174. package/skills/sharp-edges/references/lang-python.md +274 -0
  175. package/skills/sharp-edges/references/lang-ruby.md +273 -0
  176. package/skills/sharp-edges/references/lang-rust.md +272 -0
  177. package/skills/sharp-edges/references/lang-swift.md +287 -0
  178. package/skills/sharp-edges/references/language-specific.md +588 -0
  179. package/skills/spec-to-code-compliance/SKILL.md +357 -0
  180. package/skills/spec-to-code-compliance/resources/COMPLETENESS_CHECKLIST.md +69 -0
  181. package/skills/spec-to-code-compliance/resources/IR_EXAMPLES.md +417 -0
  182. package/skills/spec-to-code-compliance/resources/OUTPUT_REQUIREMENTS.md +105 -0
  183. package/skills/supply-chain-risk-auditor/SKILL.md +67 -0
  184. package/skills/supply-chain-risk-auditor/resources/results-template.md +41 -0
  185. package/skills/variant-analysis/METHODOLOGY.md +327 -0
  186. package/skills/variant-analysis/SKILL.md +142 -0
  187. package/skills/variant-analysis/resources/codeql/cpp.ql +119 -0
  188. package/skills/variant-analysis/resources/codeql/go.ql +69 -0
  189. package/skills/variant-analysis/resources/codeql/java.ql +71 -0
  190. package/skills/variant-analysis/resources/codeql/javascript.ql +63 -0
  191. package/skills/variant-analysis/resources/codeql/python.ql +80 -0
  192. package/skills/variant-analysis/resources/semgrep/cpp.yaml +98 -0
  193. package/skills/variant-analysis/resources/semgrep/go.yaml +63 -0
  194. package/skills/variant-analysis/resources/semgrep/java.yaml +61 -0
  195. package/skills/variant-analysis/resources/semgrep/javascript.yaml +60 -0
  196. package/skills/variant-analysis/resources/semgrep/python.yaml +72 -0
  197. package/skills/variant-analysis/resources/variant-report-template.md +75 -0
  198. package/skills/vuln-report/SKILL.md +137 -0
  199. package/skills/vuln-report/agents/openai.yaml +4 -0
  200. package/skills/vuln-report/references/report-template.md +135 -0
  201. package/skills/wooyun-legacy/SKILL.md +367 -0
  202. package/skills/wooyun-legacy/references/bank-penetration.md +222 -0
  203. package/skills/wooyun-legacy/references/checklists/command-execution-checklist.md +119 -0
  204. package/skills/wooyun-legacy/references/checklists/csrf-checklist.md +74 -0
  205. package/skills/wooyun-legacy/references/checklists/file-upload-checklist.md +108 -0
  206. package/skills/wooyun-legacy/references/checklists/info-disclosure-checklist.md +114 -0
  207. package/skills/wooyun-legacy/references/checklists/logic-flaws-checklist.md +95 -0
  208. package/skills/wooyun-legacy/references/checklists/misconfig-checklist.md +124 -0
  209. package/skills/wooyun-legacy/references/checklists/path-traversal-checklist.md +87 -0
  210. package/skills/wooyun-legacy/references/checklists/rce-checklist.md +93 -0
  211. package/skills/wooyun-legacy/references/checklists/sql-injection-checklist.md +97 -0
  212. package/skills/wooyun-legacy/references/checklists/ssrf-checklist.md +99 -0
  213. package/skills/wooyun-legacy/references/checklists/unauthorized-access-checklist.md +89 -0
  214. package/skills/wooyun-legacy/references/checklists/weak-password-checklist.md +115 -0
  215. package/skills/wooyun-legacy/references/checklists/xss-checklist.md +103 -0
  216. package/skills/wooyun-legacy/references/checklists/xxe-checklist.md +130 -0
  217. package/skills/wooyun-legacy/references/info-disclosure.md +975 -0
  218. package/skills/wooyun-legacy/references/logic-flaws.md +721 -0
  219. package/skills/wooyun-legacy/references/path-traversal.md +1191 -0
  220. package/skills/wooyun-legacy/references/telecom-penetration.md +156 -0
  221. package/skills/wooyun-legacy/references/unauthorized-access.md +980 -0
  222. package/skills/wooyun-legacy/references/xss.md +746 -0
  223. package/skills/zeroize-audit/SKILL.md +371 -0
  224. package/skills/zeroize-audit/configs/c.yaml +21 -0
  225. package/skills/zeroize-audit/configs/default.yaml +128 -0
  226. package/skills/zeroize-audit/configs/rust.yaml +83 -0
  227. package/skills/zeroize-audit/prompts/report_template.md +238 -0
  228. package/skills/zeroize-audit/prompts/system.md +163 -0
  229. package/skills/zeroize-audit/prompts/task.md +97 -0
  230. package/skills/zeroize-audit/references/compile-commands.md +231 -0
  231. package/skills/zeroize-audit/references/detection-strategy.md +191 -0
  232. package/skills/zeroize-audit/references/ir-analysis.md +252 -0
  233. package/skills/zeroize-audit/references/mcp-analysis.md +221 -0
  234. package/skills/zeroize-audit/references/poc-generation.md +470 -0
  235. package/skills/zeroize-audit/references/rust-zeroization-patterns.md +867 -0
  236. package/skills/zeroize-audit/schemas/input.json +83 -0
  237. package/skills/zeroize-audit/schemas/output.json +140 -0
  238. package/skills/zeroize-audit/tools/analyze_asm.sh +202 -0
  239. package/skills/zeroize-audit/tools/analyze_cfg.py +381 -0
  240. package/skills/zeroize-audit/tools/analyze_heap.sh +211 -0
  241. package/skills/zeroize-audit/tools/analyze_ir_semantic.py +429 -0
  242. package/skills/zeroize-audit/tools/diff_ir.sh +135 -0
  243. package/skills/zeroize-audit/tools/diff_rust_mir.sh +189 -0
  244. package/skills/zeroize-audit/tools/emit_asm.sh +67 -0
  245. package/skills/zeroize-audit/tools/emit_ir.sh +77 -0
  246. package/skills/zeroize-audit/tools/emit_rust_asm.sh +178 -0
  247. package/skills/zeroize-audit/tools/emit_rust_ir.sh +150 -0
  248. package/skills/zeroize-audit/tools/emit_rust_mir.sh +158 -0
  249. package/skills/zeroize-audit/tools/extract_compile_flags.py +284 -0
  250. package/skills/zeroize-audit/tools/generate_poc.py +1329 -0
  251. package/skills/zeroize-audit/tools/mcp/apply_confidence_gates.py +113 -0
  252. package/skills/zeroize-audit/tools/mcp/check_mcp.sh +68 -0
  253. package/skills/zeroize-audit/tools/mcp/normalize_mcp_evidence.py +125 -0
  254. package/skills/zeroize-audit/tools/scripts/check_llvm_patterns.py +481 -0
  255. package/skills/zeroize-audit/tools/scripts/check_mir_patterns.py +554 -0
  256. package/skills/zeroize-audit/tools/scripts/check_rust_asm.py +424 -0
  257. package/skills/zeroize-audit/tools/scripts/check_rust_asm_aarch64.py +300 -0
  258. package/skills/zeroize-audit/tools/scripts/check_rust_asm_x86.py +283 -0
  259. package/skills/zeroize-audit/tools/scripts/find_dangerous_apis.py +375 -0
  260. package/skills/zeroize-audit/tools/scripts/semantic_audit.py +923 -0
  261. package/skills/zeroize-audit/tools/track_dataflow.sh +196 -0
  262. package/skills/zeroize-audit/tools/validate_rust_toolchain.sh +298 -0
  263. package/skills/zeroize-audit/workflows/phase-0-preflight.md +150 -0
  264. package/skills/zeroize-audit/workflows/phase-1-source-analysis.md +144 -0
  265. package/skills/zeroize-audit/workflows/phase-2-compiler-analysis.md +139 -0
  266. package/skills/zeroize-audit/workflows/phase-3-interim-report.md +46 -0
  267. package/skills/zeroize-audit/workflows/phase-4-poc-generation.md +46 -0
  268. package/skills/zeroize-audit/workflows/phase-5-poc-validation.md +136 -0
  269. package/skills/zeroize-audit/workflows/phase-6-final-report.md +44 -0
  270. package/skills/zeroize-audit/workflows/phase-7-test-generation.md +42 -0
  271. package/themes/piolium-srcery.json +94 -0
@@ -0,0 +1,231 @@
1
+ # Working with compile_commands.json
2
+
3
+ This reference covers how to generate and use `compile_commands.json` for the zeroize-audit IR/ASM analysis pipeline. Read this before running Step 7 (IR comparison) or Step 8 (assembly analysis) in `task.md`.
4
+
5
+ ---
6
+
7
+ ## Structure
8
+
9
+ `compile_commands.json` is a JSON array where each entry describes the exact compiler invocation for one translation unit (TU):
10
+
11
+ ```json
12
+ [
13
+ {
14
+ "directory": "/path/to/project/build",
15
+ "arguments": [
16
+ "clang", "-std=c11", "-I../include", "-DNDEBUG", "-Wall",
17
+ "-c", "../src/crypto.c", "-o", "crypto.c.o"
18
+ ],
19
+ "file": "../src/crypto.c"
20
+ },
21
+ {
22
+ "directory": "/path/to/project/build",
23
+ "command": "clang++ -std=c++17 -I../include -DNDEBUG -c ../src/aead.cpp -o aead.cpp.o",
24
+ "file": "../src/aead.cpp"
25
+ }
26
+ ]
27
+ ```
28
+
29
+ **`arguments` vs `command`**: Some tools produce an `arguments` array (preferred); others produce a `command` string. `extract_compile_flags.py` handles both forms transparently.
30
+
31
+ **`directory`**: The working directory for the invocation. All relative paths in `arguments`/`command` and `file` are resolved against this field — **not** against the current working directory when running analysis. `extract_compile_flags.py` handles this automatically; manual invocations must account for it.
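The two rules above (prefer `arguments`, fall back to `command`; resolve `file` against `directory`) can be sketched in a few lines of Python. This is an illustrative sketch only, not the code of `extract_compile_flags.py`; the function name `normalize_entry` is hypothetical.

```python
import os
import shlex


def normalize_entry(entry):
    """Return (resolved_source_path, argv) for one compile_commands.json entry.

    Hypothetical helper for illustration; not part of the zeroize-audit tools.
    """
    directory = entry["directory"]
    # Prefer the pre-split "arguments" list; otherwise split the "command"
    # string using shell-style quoting rules.
    if "arguments" in entry:
        argv = list(entry["arguments"])
    else:
        argv = shlex.split(entry["command"])
    # "file" is relative to "directory", not to the caller's CWD.
    # os.path.join leaves an already-absolute "file" untouched.
    source = os.path.normpath(os.path.join(directory, entry["file"]))
    return source, argv


if __name__ == "__main__":
    entry = {
        "directory": "/path/to/project/build",
        "command": "clang++ -std=c++17 -I../include -DNDEBUG -c ../src/aead.cpp -o aead.cpp.o",
        "file": "../src/aead.cpp",
    }
    source, argv = normalize_entry(entry)
    print(source)
    print(argv)
```

Relative include paths such as `-I../include` are relative to `directory` in the same way, so a manual `clang` invocation should either run from that directory or rewrite those paths as well.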
32
+
33
+ ---
34
+
35
+ ## Generating compile_commands.json
36
+
37
+ ### CMake (C/C++)
38
+
39
+ ```bash
40
+ cmake -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
41
+ # Output: build/compile_commands.json
42
+ ```
43
+
44
+ **Constraints**: `CMAKE_EXPORT_COMPILE_COMMANDS` is implemented only by the Makefile and Ninja generators; other generators, such as Xcode and Visual Studio, ignore it. Run from the project root and point `--compile-db` at `build/compile_commands.json`.
45
+
46
+ ### Bear (any Make-based build system)
47
+
48
+ Bear intercepts compiler invocations at the OS level, so it works with any `make`-based or custom build system:
49
+
50
+ ```bash
51
+ # Install: apt install bear OR brew install bear
52
+ bear -- make clean all # clean build recommended for accuracy
53
+ # Output: compile_commands.json in the current directory
54
+ ```
55
+
56
+ Use `make clean all` rather than `make` alone to ensure all TUs are recompiled and captured. Incremental builds will only record the files that were actually recompiled.
57
+
58
+ ### intercept-build (LLVM scan-build companion)
59
+
60
+ ```bash
61
+ intercept-build make
62
+ # Output: compile_commands.json in the current directory
63
+ ```
64
+
65
+ ### Rust / Cargo
66
+
67
+ Cargo does not natively emit `compile_commands.json`. Two options:
68
+
69
+ ```bash
70
+ # Option 1: Bear with cargo check (faster — avoids linking)
71
+ bear -- cargo check
72
+ bear -- cargo build # if cargo check is insufficient
73
+
74
+ # Option 2: compiledb
75
+ pip install compiledb
76
+ compiledb cargo build
77
+ ```
78
+
79
+ **Critical limitation for Rust**: Bear captures `rustc` invocations, not `clang` invocations. `emit_ir.sh` (which calls `clang`) **will not work** directly on Rust TUs. Use `cargo rustc` instead to emit IR and assembly directly:
80
+
81
+ ```bash
82
+ # Preferred: use the emit scripts which handle CARGO_TARGET_DIR isolation:
83
+ {baseDir}/tools/emit_rust_ir.sh --manifest Cargo.toml --opt O0 --out /tmp/crate.O0.ll
84
+ {baseDir}/tools/emit_rust_ir.sh --manifest Cargo.toml --opt O2 --out /tmp/crate.O2.ll
85
+
86
+ # Manual alternative (output goes to an isolated temp dir, not target/debug/deps):
87
+ CARGO_TARGET_DIR=/tmp/zir cargo rustc -- --emit=llvm-ir -C opt-level=0
88
+ CARGO_TARGET_DIR=/tmp/zir cargo rustc -- --emit=llvm-ir -C opt-level=2
89
+
90
+ # Assembly for Rust (use instead of emit_asm.sh):
91
+ cargo rustc -- --emit=asm -C opt-level=2
92
+ # Output: target/debug/deps/*.s (dev profile; pass --release to write under target/release/deps instead)
93
+ ```
94
+
95
+ Pass the resulting `.ll` and `.s` files directly to `diff_ir.sh` and `analyze_asm.sh`.
96
+
97
+ ---
98
+
99
+ ## End-to-End Pipeline
100
+
101
+ The canonical pipeline for C/C++ analysis. Always use a hash of the source path as `<tu_hash>` (not the raw filename) to avoid collisions during parallel TU processing. Clean up temp files on completion or failure.
102
+
103
+ ```bash
104
+ mkdir -p /tmp/zeroize-audit/
105
+
106
+ # Step 1: Extract build-relevant flags for the TU (as a bash array)
107
+ FLAGS=()
108
+ while IFS= read -r flag; do FLAGS+=("$flag"); done < <(
109
+ python {baseDir}/tools/extract_compile_flags.py \
110
+ --compile-db /path/to/build/compile_commands.json \
111
+ --src /path/to/src/crypto.c --format lines)
112
+
113
+ # Step 2: Emit IR at each level in opt_levels (always include O0 as baseline)
114
+ {baseDir}/tools/emit_ir.sh \
115
+ --src /path/to/src/crypto.c \
116
+ --out /tmp/zeroize-audit/<tu_hash>.O0.ll --opt O0 -- "${FLAGS[@]}"
117
+
118
+ {baseDir}/tools/emit_ir.sh \
119
+ --src /path/to/src/crypto.c \
120
+ --out /tmp/zeroize-audit/<tu_hash>.O1.ll --opt O1 -- "${FLAGS[@]}"
121
+
122
+ {baseDir}/tools/emit_ir.sh \
123
+ --src /path/to/src/crypto.c \
124
+ --out /tmp/zeroize-audit/<tu_hash>.O2.ll --opt O2 -- "${FLAGS[@]}"
125
+
126
+ # Step 3: Diff across all levels — O1 is the diagnostic level for simple DSE;
127
+ # O2 catches more aggressive eliminations
128
+ {baseDir}/tools/diff_ir.sh \
129
+ /tmp/zeroize-audit/<tu_hash>.O0.ll \
130
+ /tmp/zeroize-audit/<tu_hash>.O1.ll \
131
+ /tmp/zeroize-audit/<tu_hash>.O2.ll
132
+
133
+ # Step 4: Emit assembly at O2 for register-spill and stack-retention analysis
134
+ {baseDir}/tools/emit_asm.sh \
135
+ --src /path/to/src/crypto.c \
136
+ --out /tmp/zeroize-audit/<tu_hash>.O2.s --opt O2 -- "${FLAGS[@]}"
137
+
138
+ # Step 5: Analyze assembly output
139
+ {baseDir}/tools/analyze_asm.sh /tmp/zeroize-audit/<tu_hash>.O2.s
140
+
141
+ # Cleanup
142
+ rm -rf /tmp/zeroize-audit/<tu_hash>.*
143
+ ```
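The pipeline above leaves open how `<tu_hash>` is computed. One possible approach, sketched here under assumptions, is to hash the absolute source path so that two files with the same basename in different directories get different temp names. The helper names `tu_hash` and `temp_path`, the SHA-256 choice, and the 16-character truncation are illustrative and not taken from the zeroize-audit tools.

```python
import hashlib
import os


def tu_hash(source_path, length=16):
    """Derive a short, stable identifier from a source path (illustrative)."""
    absolute = os.path.abspath(source_path)
    digest = hashlib.sha256(absolute.encode("utf-8")).hexdigest()
    # Truncate for readable filenames; the length is an arbitrary choice.
    return digest[:length]


def temp_path(source_path, opt_level, suffix, root="/tmp/zeroize-audit"):
    """Build a temp file name such as <root>/<tu_hash>.O0.ll."""
    return f"{root}/{tu_hash(source_path)}.{opt_level}.{suffix}"


if __name__ == "__main__":
    # Same basename, different directories: the hashes differ.
    print(temp_path("/path/to/src/util.c", "O0", "ll"))
    print(temp_path("/path/to/lib/util.c", "O0", "ll"))
```

Because the hash depends only on the path, repeated runs over the same TU reuse the same names, which keeps the cleanup glob `<tu_hash>.*` predictable.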
144
+
145
+ Refer to the IR analysis reference (loaded separately from SKILL.md) for how to interpret IR diffs and identify wipe elimination patterns.
146
+
147
+ ---
148
+
149
+ ## Flags Stripped by extract_compile_flags.py
150
+
151
+ These flags are removed because they are irrelevant to or break single-file IR/ASM emission:
152
+
153
+ | Flag(s) | Reason stripped |
154
+ |---|---|
155
+ | `-o <file>` | Emission tools supply their own `-o` |
156
+ | `-c` | IR/ASM emission uses `-S -emit-llvm` / `-S` instead |
157
+ | `-MF`, `-MT`, `-MQ` (+ argument) | Dependency file generation — irrelevant for analysis |
158
+ | `-MD`, `-MMD`, `-MP`, `-MG` | Dependency generation side-effects |
159
+ | `-pipe` | OS pipe between compiler stages; not meaningful for direct calls |
160
+ | `-save-temps` | Saves intermediate files; produces clutter |
161
+ | `-gsplit-dwarf` | Splits debug info to `.dwo`; incompatible with single-file emission |
162
+ | `-fcrash-diagnostics-dir=...` | Crash report output; irrelevant |
163
+ | `-fmodule-file=...`, `-fmodules-cache-path=...` | Clang module paths; may confuse single-TU invocation |
164
+ | `--serialize-diagnostics` | Clang diagnostic binary output; not needed |
165
+ | `-fdebug-prefix-map=...` | Debug info path remapping; harmless to strip |
166
+ | `-fprofile-generate`, `-fprofile-use=...` | PGO instrumentation; distorts IR for analysis |
167
+ | `-fcoverage-mapping` | Coverage instrumentation; alters IR structure |
168
+
169
+ Flags that are **kept** (build-relevant):
170
+
171
+ | Pattern | Reason kept |
172
+ |---|---|
173
+ | `-I`, `-isystem`, `-iquote` | Include paths required to parse the TU |
174
+ | `-D`, `-U` | Preprocessor defines/undefines that affect code paths |
175
+ | `-std=<val>` | Language standard — affects syntax and semantics |
176
+ | `-f*` security/codegen flags | e.g., `-fstack-protector`, `-fPIC`, `-fno-omit-frame-pointer` |
177
+ | `-m<arch>` | Target architecture flags (e.g., `-m64`, `-march=x86-64`, `-mthumb`) |
178
+ | `-W*` | Warning flags — harmless to pass through |
179
+ | `-pthread` | Threading model; affects macro definitions |
180
+ | `--sysroot=`, `-isysroot` | System root for cross-compilation |
181
+ | `-target <triple>` | Cross-compilation target triple; must be preserved |
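The stripping rules in the two tables can be approximated with a small filter. The sketch below is an assumption about how such a filter might look, not the implementation in `extract_compile_flags.py`; the names `strip_flags`, `TAKES_ARGUMENT`, `STANDALONE`, and `PREFIXES` are hypothetical, and joined forms such as `-ofile` or `-save-temps=obj` are deliberately not handled.

```python
# Flags whose following argument must be dropped together with the flag.
# Treating --serialize-diagnostics as taking a separate file argument is an
# assumption based on Clang's usual usage.
TAKES_ARGUMENT = {"-o", "-MF", "-MT", "-MQ", "--serialize-diagnostics"}

# Flags dropped on their own.
STANDALONE = {
    "-c", "-MD", "-MMD", "-MP", "-MG", "-pipe", "-save-temps",
    "-gsplit-dwarf", "-fprofile-generate", "-fcoverage-mapping",
}

# Flags dropped when they start with one of these prefixes.
PREFIXES = (
    "-fcrash-diagnostics-dir=", "-fmodule-file=", "-fmodules-cache-path=",
    "-fdebug-prefix-map=", "-fprofile-use=",
)


def strip_flags(args, source=None):
    """Return args without the flags listed as stripped in the table above.

    `args` is the argument list without the compiler executable. If `source`
    is given, that exact argument is dropped too, since the emission tools
    receive the source file separately via --src.
    """
    kept = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This is the argument of a flag dropped on the previous iteration.
            skip_next = False
            continue
        if arg in TAKES_ARGUMENT:
            skip_next = True
            continue
        if arg in STANDALONE or arg.startswith(PREFIXES):
            continue
        if source is not None and arg == source:
            continue
        kept.append(arg)
    return kept


if __name__ == "__main__":
    argv = ["clang", "-std=c11", "-I../include", "-DNDEBUG", "-Wall",
            "-c", "../src/crypto.c", "-o", "crypto.c.o"]
    print(strip_flags(argv[1:], source="../src/crypto.c"))
```

Everything not explicitly matched passes through, which mirrors the "kept" table: include paths, defines, `-std=`, `-target <triple>`, and similar flags survive unchanged.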
182
+
183
+ ---
184
+
185
+ ## Common Pitfalls
186
+
187
+ ### 1. Relative paths and the `"directory"` field
188
+
189
+ `"file": "../src/crypto.c"` is relative to `"directory"`, not to the CWD when running analysis. Always resolve file paths using `"directory"`. `extract_compile_flags.py` does this automatically; be explicit if invoking `clang` manually.
190
+
191
+ ### 2. Multiple entries for the same file
192
+
193
+ Some build systems emit duplicate entries (e.g., with and without a precompiled header). `extract_compile_flags.py` returns the **first** match. If that entry includes `-fpch-preprocess`, the PCH must exist in the build directory for compilation to succeed. Either regenerate the PCH or strip PCH-related flags manually.
194
+
195
+ ### 3. Stale or incomplete compile DB (most common failure)
196
+
197
+ If `bear` or CMake was run on an incremental build, only recompiled TUs are recorded. TUs compiled in a previous run may be missing or have outdated flags. **Always generate the compile DB from a clean build** (`make clean all`, `cargo clean && cargo build`) to ensure all TUs are captured with current flags.
198
+
199
+ `extract_compile_flags.py` exits with code 2 if a source file is not found in the DB. Common causes:
200
+ - Header-only files (no TU entry — expected)
201
+ - Files added after the last `bear`/CMake run
202
+ - Symlinked paths that resolve differently than recorded
203
+
204
+ Regenerate the compile DB if entries are missing.
205
+
206
+ ### 4. Generated source files
207
+
208
+ Entries may point to generated files in the build directory (e.g., `build/generated/config.c`) that don't exist in a clean checkout. Run the build system to generate them before running analysis. Preflight (Step 1 in `task.md`) will catch this if trial compilation is attempted.
209
+
210
+ ### 5. Cross-compilation targets
211
+
212
+ If the compile DB was generated for a cross-compilation target (e.g., `-target aarch64-linux-gnu` or `-target thumbv7m-none-eabi`), emitted IR and assembly will be for that target, not x86-64. This affects analysis in two ways:
213
+
214
+ - **IR diffs**: Only compare IR files emitted for the same target. Do not mix targets across opt levels.
215
+ - **Assembly analysis**: `analyze_asm.sh` adapts register patterns by target:
216
+ - x86-64: callee-saved registers are `rbx`, `r12`–`r15`; spills use `movq`/`movdqa` to `[rsp+N]`
217
+ - AArch64: callee-saved registers are `x19`–`x28`; spills use `str`/`stp` to `[sp, #N]`
218
+ - Thumb/ARM: callee-saved registers are `r4`–`r11`; spills use `str`/`stm` to `[sp, #N]`
219
+
220
+ Ensure `-target <triple>` is preserved in the extracted flags (it is, per the kept-flags table above).
221
+
222
+ ### 6. `extract_compile_flags.py` exit codes
223
+
224
+ | Exit code | Meaning |
225
+ |---|---|
226
+ | 0 | Flags extracted successfully; output on stdout |
227
+ | 1 | Compile DB not found or not readable |
228
+ | 2 | Source file not found in compile DB |
229
+ | 3 | Compile DB is malformed JSON |
230
+
231
+ Check the exit code before passing flags to emission tools. An empty `FLAGS` array will silently produce incorrect IR.
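A wrapper that enforces this check might look like the sketch below. It assumes the command-line options shown in the pipeline section (`--compile-db`, `--src`, `--format lines`) and the exit codes from the table; the function names `describe_exit` and `extract_flags` are hypothetical, and rejecting an empty flag list is a design choice made here rather than documented tool behavior.

```python
import subprocess
import sys

# Exit-code meanings copied from the table above.
EXIT_MEANINGS = {
    0: "Flags extracted successfully; output on stdout",
    1: "Compile DB not found or not readable",
    2: "Source file not found in compile DB",
    3: "Compile DB is malformed JSON",
}


def describe_exit(code):
    """Translate an exit code into the meaning from the table."""
    return EXIT_MEANINGS.get(code, f"Unexpected exit code {code}")


def extract_flags(script, compile_db, src):
    """Run the extraction script and return its flags, one per output line.

    Raises RuntimeError on a non-zero exit code or an empty flag list, so
    callers never pass an empty list to the emission tools by accident.
    """
    result = subprocess.run(
        [sys.executable, script,
         "--compile-db", compile_db, "--src", src, "--format", "lines"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(describe_exit(result.returncode))
    flags = [line for line in result.stdout.splitlines() if line]
    if not flags:
        raise RuntimeError("no flags extracted; refusing to continue")
    return flags
```

In the bash pipeline the equivalent guard is to test `$?` (or use `set -e`) right after the extraction step and to check that `${#FLAGS[@]}` is non-zero before calling `emit_ir.sh`.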
@@ -0,0 +1,191 @@
1
+ # Detection Strategy
2
+
3
+ Read this during execution to guide per-step analysis. Steps 1–6 are Phase 1 (source-level); Steps 7–12 are Phase 2 (compiler-level).
4
+
5
+ ---
6
+
7
+ ## Phase 1 — Source-Level Analysis
8
+
9
+ ### Step 1 — Preflight Build Context (mandatory)
10
+ - Verify `compile_db` exists and is readable.
11
+ - Verify compile database entries point to existing files/working directories.
12
+ - Verify the codebase is compilable with the captured commands (or equivalent build invocation).
13
+ - Fail fast if preflight fails; do not continue with partial/source-only analysis.
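The first two preflight checks are mechanical and can be sketched as follows. This is a minimal illustration, not the skill's actual preflight code; the function name `preflight` is hypothetical, and the third check (trial compilation) is intentionally left out because it depends on the project's toolchain.

```python
import json
import os


def preflight(compile_db_path):
    """Return a list of problems found in the compile DB (empty if none)."""
    if not os.path.isfile(compile_db_path) or not os.access(compile_db_path, os.R_OK):
        return [f"compile DB not found or not readable: {compile_db_path}"]
    try:
        with open(compile_db_path, encoding="utf-8") as fh:
            entries = json.load(fh)
    except json.JSONDecodeError as exc:
        return [f"compile DB is malformed JSON: {exc}"]
    if not isinstance(entries, list):
        return ["compile DB is not a JSON array"]

    problems = []
    for index, entry in enumerate(entries):
        directory = entry.get("directory", "")
        if not os.path.isdir(directory):
            problems.append(f"entry {index}: missing working directory {directory!r}")
        # "file" is resolved against "directory", as the compile DB format requires.
        source = os.path.normpath(os.path.join(directory, entry.get("file", "")))
        if not os.path.isfile(source):
            problems.append(f"entry {index}: missing source file {source!r}")
    return problems
```

A caller following the fail-fast rule would stop the run as soon as the returned list is non-empty instead of continuing with partial analysis.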
14
+
15
+ ### Step 2 — Identify Sensitive Objects
16
+
17
+ Scan all TUs for objects matching these heuristics. Each heuristic has a confidence level that propagates to findings.
18
+
19
+ **Name patterns (low confidence)** — match substrings case-insensitively:
20
+ `key`, `secret`, `seed`, `priv`, `sk`, `shared_secret`, `nonce`, `token`, `pwd`, `pass`
21
+
22
+ **Type hints (medium confidence)** — byte buffers, fixed-size arrays, or structs whose names or fields match name patterns above.
23
+
24
+ **Explicit annotations (high confidence)**:
25
+ - Rust: `#[secret]`, `Secret<T>` patterns (configurable)
26
+ - C/C++: `__attribute__((annotate("sensitive")))`, `SENSITIVE` macro (configurable via `explicit_sensitive_markers` in `{baseDir}/configs/default.yaml`)
27
+
28
+ Record each sensitive object with: name, type, location (file:line), confidence level, and the heuristic that matched.
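The name-pattern heuristic and the record format can be illustrated with a short sketch. The function names and the dictionary layout are assumptions made for illustration; only the pattern list and the recorded fields come from the text above.

```python
# Substrings from the name-pattern heuristic, matched case-insensitively.
NAME_PATTERNS = (
    "key", "secret", "seed", "priv", "sk",
    "shared_secret", "nonce", "token", "pwd", "pass",
)


def match_name_patterns(identifier):
    """Return every pattern that occurs as a substring of the identifier."""
    lowered = identifier.lower()
    return [pattern for pattern in NAME_PATTERNS if pattern in lowered]


def record_by_name(identifier, type_name, file, line):
    """Build a low-confidence record for an identifier, or None if no match."""
    hits = match_name_patterns(identifier)
    if not hits:
        return None
    return {
        "name": identifier,
        "type": type_name,
        "location": f"{file}:{line}",
        "confidence": "low",
        "heuristic": "name_pattern",
        "matched": hits,
    }


if __name__ == "__main__":
    print(record_by_name("session_key", "uint8_t[32]", "src/crypto.c", 42))
    # Substring matching also hits unrelated names such as "task_count"
    # (contains "sk"), which is why this heuristic alone is low confidence.
    print(record_by_name("task_count", "size_t", "src/sched.c", 7))
```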
29
+
30
+ ### Step 3 — Detect Zeroization Attempts
31
+
32
+ For each sensitive object identified in Step 2, check whether a call to an approved wipe API (see Approved Wipe APIs in SKILL.md) exists within the same scope or a cleanup function reachable from that scope.
33
+
34
+ Record: wipe API used, location, and whether the wipe was found at all.
35
+
36
+ ### Step 4 — MCP Semantic Pass (when available)
37
+
38
+ Run this step **before** correctness validation so that resolved types, aliases, and cross-file references are available to Steps 5 and 6. Skip and continue if MCP is unavailable in `prefer` mode (see Confidence Gating in SKILL.md).
39
+
40
+ - Run `{baseDir}/tools/mcp/check_mcp.sh` to confirm MCP is live. If it fails and `mcp_mode=require`, stop the run.
41
+ - Activate the project with `activate_project` (pass the repository root path). This must succeed before any other Serena tool can be used. If activation fails, treat MCP as unavailable.
42
+ - For each sensitive object and wipe call, resolve symbol definitions using `find_symbol` (by name, with `include_body: true` for type details) and collect cross-file references using `find_referencing_symbols`.
43
+ - Trace callers and cleanup paths using `find_referencing_symbols` on wipe wrapper functions. For outgoing calls, read the function body from `find_symbol` output and resolve called symbols.
44
+ - Use `get_symbols_overview` to get a high-level view of symbols in a file when exploring unfamiliar TUs.
45
+ - Normalize all MCP output: `python {baseDir}/tools/mcp/normalize_mcp_evidence.py`.
46
+
47
+ Prioritize `find_symbol` queries by sensitive-object name first, then wipe wrapper names. Score confidence: name match alone → `needs_review`; name + type resolved → `likely`; name + type + call chain confirmed → `confirmed`.
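The scoring rule above can be written as a small function. This is a sketch under stated assumptions: the function name and the `no_match` label are hypothetical, and the handling of a confirmed call chain without a resolved type (treated here as `needs_review`) is not specified by the text.

```python
def score_confidence(name_match: bool, type_resolved: bool,
                     call_chain_confirmed: bool) -> str:
    """Map the evidence collected in this step to a confidence label."""
    if not name_match:
        return "no_match"       # hypothetical label, not part of the rule above
    if type_resolved and call_chain_confirmed:
        return "confirmed"      # name + type + call chain
    if type_resolved:
        return "likely"         # name + type
    return "needs_review"       # name match alone (assumed for other partial evidence)
```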
48
+
49
+ ### Step 5 — Validate Correctness
50
+
51
+ For each sensitive object with a detected wipe, use type and alias data from Step 4 (if available) to validate:
52
+ - **Size correct**: wipe length matches `sizeof(object)`, not `sizeof(pointer)`. MCP-resolved typedefs and array sizes take precedence over source-level estimates.
53
+ - **All exits covered** (heuristic): wipe is present on normal exit, early return, and error paths visible in source. Flag `NOT_ON_ALL_PATHS` if any path appears uncovered.
54
+ - **Ordering correct**: wipe occurs before `free()` or scope end, not after.
55
+
56
+ Emit `PARTIAL_WIPE` for incorrect size. Emit `NOT_ON_ALL_PATHS` for missing paths (heuristic; CFG analysis in Step 10 provides definitive results).
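As an illustration of the size check, the sketch below compares the wipe length against the resolved object size. The function name is hypothetical, and treating only too-short wipes as `PARTIAL_WIPE` is an assumption; an over-long wipe is a different bug and is not classified here.

```python
from typing import Optional


def check_wipe_size(object_size: int, wipe_length: int) -> Optional[str]:
    """Return 'PARTIAL_WIPE' when the wipe covers fewer bytes than the object."""
    if wipe_length < object_size:
        return "PARTIAL_WIPE"
    return None


# A 32-byte key wiped with sizeof(pointer) == 8 on a 64-bit target:
finding = check_wipe_size(object_size=32, wipe_length=8)
```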
57
+
58
+ ### Step 6 — Data-Flow and Heap Checks
59
+
60
+ Use cross-file reference data from Step 4 (if available) to extend tracking beyond the current TU.
61
+
62
+ **Data-flow (produces `SECRET_COPY`):**
63
+ - Detect `memcpy()`/`memmove()` copying sensitive buffers.
64
+ - Track struct assignments and array copies of sensitive objects.
65
+ - Flag function arguments passed by value (copies on stack).
66
+ - Flag secrets returned by value.
67
+ - Emit `SECRET_COPY` when any of the above copies exist and no approved wipe is tracked for the copy destination.
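The emission rule in the last bullet can be sketched as a filter over detected copies. The tuple layout, the `kind` values, and the function name are assumptions for illustration; the only rule taken from the text is that a copy is reported when no approved wipe is tracked for its destination.

```python
def secret_copy_findings(copies, wiped_destinations):
    """copies: iterable of (kind, source, destination) tuples.

    wiped_destinations: names of destinations with a tracked approved wipe.
    """
    wiped = set(wiped_destinations)
    return [
        {"category": "SECRET_COPY", "kind": kind,
         "source": src, "destination": dst}
        for kind, src, dst in copies
        if dst not in wiped
    ]
```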
68
+
69
+ **Heap (produces `INSECURE_HEAP_ALLOC`):**
70
+ - Detect `malloc`/`calloc`/`realloc` used to allocate sensitive objects.
71
+ - Check for `mlock()`/`madvise(MADV_DONTDUMP)` — note absence as a warning.
72
+ - Recommend secure allocators: `OPENSSL_secure_malloc`, `sodium_malloc`.
73
+
74
+ ---
75
+
76
+ ## Phase 2 — Compiler-Level Analysis
77
+
78
+ All steps in Phase 2 require a valid compile DB and a working `clang` installation. Skip Phase 2 findings if Phase 1 preflight failed.
79
+
80
+ ### Step 7 — IR Comparison (produces `OPTIMIZED_AWAY_ZEROIZE`)
81
+
82
+ For each TU containing sensitive objects:
83
+
84
+ ```bash
85
+ FLAGS=()
86
+ while IFS= read -r flag; do FLAGS+=("$flag"); done < <(
87
+ python {baseDir}/tools/extract_compile_flags.py \
88
+ --compile-db <compile_db> --src <file> --format lines)
89
+
90
+ {baseDir}/tools/emit_ir.sh --src <file> \
91
+ --out /tmp/zeroize-audit/<tu_hash>.O0.ll --opt O0 -- "${FLAGS[@]}"
92
+
93
+ {baseDir}/tools/emit_ir.sh --src <file> \
94
+ --out /tmp/zeroize-audit/<tu_hash>.O1.ll --opt O1 -- "${FLAGS[@]}"
95
+
96
+ {baseDir}/tools/emit_ir.sh --src <file> \
97
+ --out /tmp/zeroize-audit/<tu_hash>.O2.ll --opt O2 -- "${FLAGS[@]}"
98
+
99
+ {baseDir}/tools/diff_ir.sh \
100
+ /tmp/zeroize-audit/<tu_hash>.O0.ll \
101
+ /tmp/zeroize-audit/<tu_hash>.O1.ll \
102
+ /tmp/zeroize-audit/<tu_hash>.O2.ll
103
+ ```
104
+
105
+ Use `<tu_hash>` (a hash of the source path) to avoid collisions when processing multiple TUs.
106
+ `diff_ir.sh` outputs a unified diff to stdout; a non-zero exit code means divergence was detected.
107
+ Clean up `/tmp/zeroize-audit/` on completion or failure.
108
+
109
+ **Interpretation:**
110
+ - Wipe present at O0, absent at O1 → simple dead-store elimination. Flag `OPTIMIZED_AWAY_ZEROIZE`.
111
+ - Wipe present at O1, absent at O2 → aggressive optimization. Flag `OPTIMIZED_AWAY_ZEROIZE`.
112
+ - Include the IR diff as mandatory evidence in the finding.
113
+
114
+ Key IR patterns: `store volatile i8 0` is the primary signal of a wipe that survives optimization. A non-volatile zero store or an `@llvm.memset` call without the volatile flag is elidable; if it is present at O0 but absent at O2, DSE removed it. An `alloca` with `@llvm.lifetime.end` and no `store volatile` in the same function indicates stack retention.
115
+
116
+ ### Step 8 — Assembly Analysis (produces `STACK_RETENTION`, `REGISTER_SPILL`)
117
+
118
+ Skip if `enable_asm=false`.
119
+
120
+ ```bash
121
+ {baseDir}/tools/emit_asm.sh --src <file> \
122
+ --out /tmp/zeroize-audit/<tu_hash>.O2.s --opt O2 -- "${FLAGS[@]}"
123
+
124
+ {baseDir}/tools/analyze_asm.sh \
125
+ --asm /tmp/zeroize-audit/<tu_hash>.O2.s \
126
+ --out /tmp/zeroize-audit/<tu_hash>.asm-analysis.json
127
+ ```
128
+
129
+ `analyze_asm.sh` outputs annotated findings to stdout.
130
+
131
+ Check for:
132
+ - **Register spills**: `movq`/`movdqa` of secret values to stack offsets → flag `REGISTER_SPILL`.
133
+ - **Callee-saved registers**: `rbx`, `r12`–`r15` (x86-64) pushed to stack containing secret values → flag `REGISTER_SPILL`.
134
+ - **Stack retention**: stack frame size and whether secret bytes are cleared before `ret` → flag `STACK_RETENTION`.
135
+
136
+ Include the relevant assembly excerpt as mandatory evidence.
137
+
138
+ ### Step 9 — Semantic IR Analysis (produces `LOOP_UNROLLED_INCOMPLETE`)
139
+
140
+ Skip if `enable_semantic_ir=false`.
141
+
142
+ Parse LLVM IR structurally (do not use regex on raw IR text):
143
+ - Build function and basic block representations.
144
+ - Track memory operations in SSA form after the `mem2reg` pass.
145
+ - Detect loop-unrolled zeroization: 4 or more consecutive zero stores.
146
+ - Verify unrolled stores target the correct addresses and cover the full object size.
147
+ - Identify phi nodes and register-promoted variables that may hide secret values.
148
+
149
+ Flag `LOOP_UNROLLED_INCOMPLETE` when unrolling is detected but does not cover the full object.
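The coverage check can be sketched on top of an already parsed basic block. The sketch assumes a structural parser has produced one `Store` record per store instruction in program order; the `Store` type and function name are hypothetical, and only the longest run of zero stores is examined.

```python
from typing import NamedTuple, Optional, Sequence


class Store(NamedTuple):
    offset: int  # byte offset from the object's base address
    width: int   # number of bytes written
    value: int   # constant being stored


MIN_RUN = 4  # "4 or more consecutive zero stores" from the step above


def check_unrolled_wipe(stores: Sequence[Store], object_size: int) -> Optional[str]:
    """Return 'LOOP_UNROLLED_INCOMPLETE' if an unrolled wipe misses bytes."""
    best, run = [], []
    for s in stores:
        if s.value == 0:
            run.append(s)
        else:
            if len(run) > len(best):
                best = run
            run = []
    if len(run) > len(best):
        best = run
    if len(best) < MIN_RUN:
        return None  # no unrolled wipe detected
    covered = set()
    for s in best:
        covered.update(range(s.offset, s.offset + s.width))
    if covered >= set(range(object_size)):
        return None  # every byte of the object is zeroed
    return "LOOP_UNROLLED_INCOMPLETE"
```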
150
+
151
+ ### Step 10 — Control-Flow Graph Analysis (produces `MISSING_ON_ERROR_PATH`, `NOT_DOMINATING_EXITS`)
152
+
153
+ Skip if `enable_cfg=false`.
154
+
155
+ Build a CFG from source or LLVM IR:
156
+ - Enumerate all execution paths from function entry to exits.
157
+ - Compute dominator sets for all nodes.
158
+ - Verify that a wipe node dominates all exit nodes. If not, flag `NOT_DOMINATING_EXITS`.
159
+ - Identify error paths (early returns, `goto`, exceptions, `longjmp`) that bypass the wipe. Flag `MISSING_ON_ERROR_PATH` for each such path.
160
+
161
+ This step produces definitive results replacing the heuristic `NOT_ON_ALL_PATHS` finding from Step 5. If both are emitted for the same object, keep only the CFG-backed finding.
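The dominance check can be illustrated with the classic iterative dominator algorithm. This is a sketch, not the tool's implementation: the CFG is assumed to be a plain dict from node name to successor list, exits are assumed to be nodes without successors, and both function names are hypothetical.

```python
def dominators(cfg, entry):
    """Iterative dominator sets. cfg maps node -> list of successor nodes."""
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = ({n} | set.intersection(*incoming)) if incoming else {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def exits_missing_wipe(cfg, entry, wipe_node):
    """Return exit nodes that the wipe node does not dominate."""
    dom = dominators(cfg, entry)
    exits = [n for n in dom if not cfg.get(n)]
    return sorted(e for e in exits if wipe_node not in dom[e])


# An early error return bypasses the wipe:
cfg = {"entry": ["check"], "check": ["work", "err_ret"],
       "work": ["wipe"], "wipe": ["ret"], "ret": [], "err_ret": []}
missing = exits_missing_wipe(cfg, "entry", "wipe")
```

Any non-empty result corresponds to `NOT_DOMINATING_EXITS`; each listed exit reached through an error edge is a candidate for `MISSING_ON_ERROR_PATH`.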
162
+
163
+ ### Step 11 — Runtime Validation Test Generation
164
+
165
+ Skip if `enable_runtime_tests=false`.
166
+
167
+ For each confirmed finding, generate:
168
+ - A C test harness that allocates the sensitive object and verifies all bytes are zero after the expected wipe point.
169
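+ - A MemorySanitizer test (`-fsanitize=memory`) to detect reads of uninitialized memory. MSan does not track whether a buffer was zeroed, so un-zeroed secrets are covered by the byte-comparison harness above.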
170
+ - A Valgrind invocation target for leak and memory error detection.
171
+ - A stack canary test to detect stack retention after function return.
172
+
173
+ Output a `Makefile` in `{baseDir}/generated_tests/` that builds and runs all tests with appropriate sanitizer flags.
174
+
175
+ ### Step 12 — PoC Generation (mandatory)
176
+
177
+ Generate proof-of-concept C programs for all findings regardless of confidence. Each PoC exits 0 (exploitable) or 1 (not exploitable):
178
+
179
+ ```bash
180
+ python {baseDir}/tools/generate_poc.py \
181
+ --findings <findings_json> \
182
+ --compile-db <compile_db> \
183
+ --out <poc_output_dir> \
184
+ --categories <poc_categories> \
185
+ --config <config> \
186
+ --no-confidence-filter
187
+ ```
188
+
189
+ After generation, review PoCs for `// TODO` comments and fill them in using source context. Compilation and validation are handled by the orchestrator in Phase 5 (interactive).
190
+
191
+ Key PoC strategies: `OPTIMIZED_AWAY_ZEROIZE` — compile with and without `-O2`, compare memory dumps; `STACK_RETENTION` — call the target function, read stack memory after return; `MISSING_SOURCE_ZEROIZE` — verify bytes are non-zero at function exit. C/C++ findings support all categories. Rust findings support `MISSING_SOURCE_ZEROIZE`, `SECRET_COPY`, and `PARTIAL_WIPE` via `cargo test`; all other Rust categories are marked `poc_supported: false`.
@@ -0,0 +1,252 @@
1
+ # LLVM IR Analysis for Zeroization Auditing
2
+
3
+ This reference covers multi-level IR analysis for detecting compiler-optimized zeroization (dead-store elimination of wipes) and interpreting results. Read this during Step 7 (IR comparison) and Step 9 (semantic IR analysis) in `task.md`. For flag extraction and pipeline setup, refer to the compile-commands reference (loaded separately from SKILL.md).
4
+
5
+ ---
6
+
7
+ ## Optimization Level Semantics
8
+
9
+ | Level | What changes | Relevance to zeroization |
10
+ |---|---|---|
11
+ | **O0** | No optimization. All stores kept. | Baseline — wipe always present if written in source |
12
+ | **O1** | Basic optimizations. Simple dead-store elimination begins. | Diagnostic level: if wipe vanishes here, it's simple DSE. Fix is straightforward. |
13
+ | **O2** | Full DSE, inlining, SROA, alias analysis. | Most production builds. Most non-volatile wipes removed here. |
14
+ | **O3** | Aggressive vectorization, loop transforms, more inlining. | Rarely removes more wipes than O2, but can for loop-based wipes. |
15
+ | **Os/Oz** | Size-optimized. May collapse wipe loops into `memset`. | Verify wipe survives after size optimization; collapsed `memset` may become DSE-vulnerable. |
16
+
17
+ **Always include O0 as the unoptimized baseline**, regardless of the `opt_levels` input. O1 is the diagnostic level — if the wipe disappears there, the cause is simple DSE and the fix is straightforward. If the wipe only disappears at O2 or O3, proceed to the multi-level root cause analysis below.
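The baseline rule can be sketched as a small normalization step applied to the `opt_levels` input before any IR is emitted. The function name is hypothetical; the only behavior taken from the text is that O0 is always present.

```python
def normalize_opt_levels(opt_levels):
    """Return opt_levels with O0 first and duplicates removed, order otherwise kept."""
    seen = []
    for level in ["O0", *opt_levels]:
        if level not in seen:
            seen.append(level)
    return seen
```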
18
+
19
+ ---
20
+
21
+ ## Emitting IR at Multiple Levels
22
+
23
+ Extract flags once, then emit IR for each level in `opt_levels`. Use `<tu_hash>` (a hash of the source path) to avoid collisions during parallel TU processing. Always clean up temp files on completion or failure.
24
+
25
+ ```bash
26
+ mkdir -p /tmp/zeroize-audit/
27
+
28
+ FLAGS=()
29
+ while IFS= read -r flag; do FLAGS+=("$flag"); done < <(
30
+ python {baseDir}/tools/extract_compile_flags.py \
31
+ --compile-db build/compile_commands.json \
32
+ --src src/crypto.c --format lines)
33
+
34
+ # Emit IR for each level in opt_levels (O0 always included as baseline)
35
+ for OPT in O0 O1 O2; do
36
+ {baseDir}/tools/emit_ir.sh \
37
+ --src src/crypto.c \
38
+ --out /tmp/zeroize-audit/<tu_hash>.${OPT}.ll \
39
+ --opt ${OPT} -- "${FLAGS[@]}"
40
+ done
41
+
42
+ # Diff all levels — prints pairwise diffs and a WIPE PATTERN SUMMARY
43
+ {baseDir}/tools/diff_ir.sh \
44
+ /tmp/zeroize-audit/<tu_hash>.O0.ll \
45
+ /tmp/zeroize-audit/<tu_hash>.O1.ll \
46
+ /tmp/zeroize-audit/<tu_hash>.O2.ll
47
+
48
+ # Cleanup
49
+ rm -f /tmp/zeroize-audit/<tu_hash>.*.ll
50
+ ```
51
+
52
+ For Rust TUs, `emit_ir.sh` does not apply. Use `cargo rustc -- --emit=llvm-ir -C opt-level=N` instead and pass the resulting `.ll` files directly to `diff_ir.sh`. Use `bear -- cargo build` to generate `compile_commands.json` for Rust projects.
53
+
54
+ ---
55
+
56
+ ## LLVM IR Zeroization Patterns
57
+
58
+ ### DSE-safe patterns (survive optimization)
59
+
60
+ These indicate a secure wipe the compiler cannot remove.
61
+
62
+ **Volatile memset intrinsic** — the `i1 true` (volatile) flag prevents DSE:
63
+ ```llvm
64
+ call void @llvm.memset.p0i8.i64(i8* %ptr, i8 0, i64 32, i1 true)
65
+ ```
66
+
67
+ **Volatile zero stores** — volatile side effects must be preserved:
68
+ ```llvm
69
+ store volatile i8 0, i8* %ptr, align 1
70
+ store volatile i64 0, i64* %ptr, align 8
71
+ ```
72
+
73
+ **Opaque wipe function calls** — DSE cannot remove calls to external functions with unknown side effects:
74
+ ```llvm
75
+ call void @explicit_bzero(i8* %key, i64 32)
76
+ call void @sodium_memzero(i8* %key, i64 32)
77
+ call void @OPENSSL_cleanse(i8* %key, i64 32)
78
+ call void @SecureZeroMemory(i8* %key, i64 32)
79
+ ```
80
+
81
+ **`memset_s`** — specified in C11 Annex K (an optional extension that many libcs do not ship) so that the call cannot be optimized away:
82
+ ```llvm
83
+ call i32 @memset_s(i8* %key, i64 32, i32 0, i64 32)
84
+ ```
85
+
86
+ **Rust `zeroize` crate** — emits volatile stores via the `Zeroize` trait; look for:
87
+ ```llvm
88
+ store volatile i8 0, i8* %ptr, align 1 ; repeated per byte, or as unrolled loop
89
+ ```
90
+
91
+ ---
92
+
93
+ ### DSE-vulnerable patterns (may be removed at O1 or O2)
94
+
95
+ **Non-volatile memset intrinsic** — `i1 false` is the most common `OPTIMIZED_AWAY_ZEROIZE` pattern:
96
+ ```llvm
97
+ call void @llvm.memset.p0i8.i64(i8* %ptr, i8 0, i64 32, i1 false)
98
+ ```
99
+
100
+ **Non-volatile zero stores** — any non-volatile store to a dead location is DSE-eligible:
101
+ ```llvm
102
+ store i8 0, i8* %ptr, align 1
103
+ store i64 0, i64* %ptr, align 8
104
+ store i32 0, i32* %ptr, align 4
105
+ ```
106
+
107
+ **Standard `memset` inlined to non-volatile intrinsic** — `memset(key, 0, 32)` in source is lowered by Clang to `@llvm.memset ... i1 false`. The source used `memset` but the IR form is DSE-vulnerable. This is the most frequent source of confusion.
108
+
109
+ ---
110
+
111
+ ## Reading an IR Diff: Concrete Before/After Example
112
+
113
+ **Source (C):**
114
+ ```c
115
+ void handle_request(uint8_t session_key[32]) {
116
+ // ... use session_key ...
117
+ memset(session_key, 0, 32); // intended cleanup
118
+ }
119
+ ```
120
+
121
+ **O0 IR — wipe present:**
122
+ ```llvm
123
+ define void @handle_request(i8* %session_key) {
124
+ entry:
125
+ ; ... computation uses session_key ...
126
+ call void @llvm.memset.p0i8.i64(i8* %session_key, i8 0, i64 32, i1 false)
127
+ ret void
128
+ }
129
+ ```
130
+
131
+ **O2 IR — wipe removed by DSE:**
132
+ ```llvm
133
+ define void @handle_request(i8* %session_key) {
134
+ entry:
135
+ ; ... computation ...
136
+ ; llvm.memset REMOVED — no read from session_key after the store;
137
+ ; optimizer treats it as a dead store and eliminates it.
138
+ ret void
139
+ }
140
+ ```
141
+
142
+ **`diff_ir.sh` output:**
143
+ ```
144
+ === DIFF: O0.ll vs O2.ll ===
145
+ - call void @llvm.memset.p0i8.i64(i8* %session_key, i8 0, i64 32, i1 false)
146
+
147
+ === WIPE PATTERN SUMMARY ===
148
+ O0.ll: WIPE PRESENT
149
+ O1.ll: WIPE PRESENT
150
+ O2.ll: WIPE ABSENT <-- first disappearance
151
+ ```
152
+
153
+ Lines starting with `-` are present in the lower-opt file but absent in the higher-opt file. A `-` line containing any of the following tokens is direct evidence of `OPTIMIZED_AWAY_ZEROIZE`:
154
+
155
+ `llvm.memset`, `store i8 0`, `store i64 0`, `store i32 0`, `@explicit_bzero`, `@sodium_memzero`, `@OPENSSL_cleanse`, `@SecureZeroMemory`
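The token check can be sketched as a plain substring scan over the diff text. It is an illustration only: the function name is hypothetical, and skipping `---` file headers is an assumption about the diff format. Because the match is on exact substrings, `store volatile i8 0` does not match the `store i8 0` token.

```python
# Tokens from the list above.
WIPE_TOKENS = ("llvm.memset", "store i8 0", "store i64 0", "store i32 0",
               "@explicit_bzero", "@sodium_memzero", "@OPENSSL_cleanse",
               "@SecureZeroMemory")


def removed_wipe_lines(diff_text: str):
    """Return removed ('-') diff lines that contain a wipe token."""
    hits = []
    for line in diff_text.splitlines():
        if line.startswith("---"):  # unified-diff file header, not a removal
            continue
        if line.startswith("-") and any(tok in line for tok in WIPE_TOKENS):
            hits.append(line[1:].strip())
    return hits
```

A non-empty result is the excerpt to attach as IR diff evidence to the finding.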
156
+
157
+ ---
158
+
159
+ ## Multi-Level Root Cause Analysis
160
+
161
+ The level at which the wipe first disappears narrows the root cause and determines the appropriate fix:
162
+
163
+ ```
164
+ O0 → WIPE PRESENT (baseline — wipe was written in source)
165
+ O1 → WIPE ABSENT → Simple dead-store elimination (basic DSE pass)
166
+ Fix: replace memset with explicit_bzero or volatile wipe loop
167
+ O2 → WIPE ABSENT → One or more of:
168
+ (first disappearance) • DSE + inlining: wipe is in a helper inlined into caller,
169
+ becomes dead store in caller's context
170
+ • SROA: struct/array promoted to scalars; individual
171
+ zero stores become DSE-eligible
172
+ • Alias analysis: proves no live uses after the wipe
173
+ Fix: use explicit_bzero; ensure wipe is not inside
174
+ an inlined callee (see Inlining section below)
175
+ O3 → WIPE ABSENT → Aggressive loop transforms or vectorization eliminated
176
+ (only here) a loop-based wipe
177
+ Fix: replace wipe loop with explicit_bzero or volatile loop
178
+ ```
179
+
180
+ If the wipe disappears at O1, replacing `memset` with `explicit_bzero` or wiping through a `volatile`-qualified pointer is sufficient. If it only disappears at O2 due to inlining, also ensure the wipe is not inside a callee that gets inlined at the call site.
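The decision tree above can be sketched as a lookup on the first level at which the wipe is absent. The function name, the `present` mapping, and the cause strings are illustrative assumptions that condense the diagram; they are not output of `diff_ir.sh`.

```python
LEVEL_ORDER = ["O0", "O1", "O2", "O3"]

# Condensed from the root-cause diagram above.
LIKELY_CAUSE = {
    "O1": "simple dead-store elimination",
    "O2": "DSE after inlining, SROA, or alias analysis",
    "O3": "aggressive loop transforms or vectorization",
}


def first_disappearance(present):
    """present maps level -> bool (wipe present). Returns (level, cause) or None."""
    if not present.get("O0", False):
        return None  # no baseline wipe, so nothing was optimized away
    for level in LEVEL_ORDER[1:]:
        if level in present and not present[level]:
            return level, LIKELY_CAUSE[level]
    return None
```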
181
+
182
+ ---
183
+
184
+ ## Advanced IR Analysis Scenarios
185
+
186
+ ### Inlining and cross-function DSE
187
+
188
+ When a cleanup wrapper (e.g., `zeroize_key()`) is inlined into a caller, the wipe may become a dead store in the caller's context even if it survives in the callee's IR. Always emit IR for the **calling** TU — this is where inlining occurs:
189
+
190
+ ```bash
191
+ # zeroize_key() defined in utils.c, called from crypto.c
192
+ # Emit IR for the caller — inlining happens here:
193
+ FLAGS=()
194
+ while IFS= read -r flag; do FLAGS+=("$flag"); done < <(
195
+ python {baseDir}/tools/extract_compile_flags.py \
196
+ --compile-db build/compile_commands.json --src src/crypto.c --format lines)
197
+
198
+ {baseDir}/tools/emit_ir.sh \
199
+ --src src/crypto.c \
200
+ --out /tmp/zeroize-audit/<tu_hash>.O2.ll --opt O2 -- "${FLAGS[@]}"
201
+ ```
202
+
203
+ If the wipe is present in `utils.c` IR but absent in `crypto.c` IR at O2, the cause is cross-function DSE after inlining. Mark the `OPTIMIZED_AWAY_ZEROIZE` finding on the call site in `crypto.c`, not on `utils.c`.
204
+
205
+ ### SROA (Scalar Replacement of Aggregates)
206
+
207
+ At O1+, SROA promotes small structs and arrays to individual scalar SSA values (registers). A `memset` of a struct may become a series of individual `store i32 0` / `store i8 0` instructions per field — each then eligible for DSE independently. In the diff, look for:
208
+ - O0: single `llvm.memset` covering the struct
209
+ - O1/O2: the `memset` is replaced by per-field zero stores, then those stores are removed
210
+
211
+ This means the wipe may partially survive SROA (some fields zeroed, others eliminated). Check that **all** fields of a sensitive struct are covered, not just the first.
212
+
213
+ ### Loop unrolling of wipe loops
214
+
215
+ A manual wipe loop:
216
+ ```c
217
+ for (int i = 0; i < 32; i++) key[i] = 0;
218
+ ```
219
+ may be unrolled at O2 into 32 consecutive `store i8 0` instructions. If unrolling is incomplete (e.g., only 16 of 32 iterations unrolled and the remainder is a DSE-eligible tail), flag `LOOP_UNROLLED_INCOMPLETE`. Use `{baseDir}/tools/analyze_ir_semantic.py` for automated detection — do not use regex on raw IR text. The semantic tool builds a proper basic block representation and counts consecutive zero stores with address verification.
220
+
221
+ ### Phi nodes and register-promoted secrets
222
+
223
+ After `mem2reg`, secret values that were stack-allocated may be promoted to SSA values tracked through phi nodes. A wipe of the original stack slot may not reach all SSA uses. Look for:
224
+ ```llvm
225
+ %key.0 = phi i64 [ %loaded_key, %entry ], [ 0, %cleanup ]
226
+ ```
227
+ If `%key.0` is used after the phi but the `0` arm is only reached on one path, the secret persists on every path that enters through the non-zero arm. Flag as `NOT_DOMINATING_EXITS` if CFG analysis confirms it.
228
+
229
+ ---
230
+
231
+ ## Populating `compiler_evidence` in the Report
232
+
233
+ For each `OPTIMIZED_AWAY_ZEROIZE` finding, populate the output schema fields as follows. `OPTIMIZED_AWAY_ZEROIZE` is **never valid without IR diff evidence** — do not emit this finding from source-level analysis alone.
234
+
235
+ ```json
236
+ {
237
+ "category": "OPTIMIZED_AWAY_ZEROIZE",
238
+ "compiler_evidence": {
239
+ "opt_levels": ["O0", "O1", "O2"],
240
+ "o0": "call void @llvm.memset.p0i8.i64(i8* %session_key, i8 0, i64 32, i1 false) present at line 88.",
241
+ "o1": "WIPE PRESENT at O1.",
242
+ "o2": "llvm.memset call absent at O2 — dead store eliminated after SROA promotes session_key to registers.",
243
+ "diff_summary": "Wipe first disappears at O2. Non-volatile memset(session_key, 0, 32) eliminated by DSE after SROA. Fix: replace memset with explicit_bzero."
244
+ }
245
+ }
246
+ ```
247
+
248
+ Field usage notes:
249
+ - `opt_levels`: list every level that was emitted, not just the levels where the wipe changed.
250
+ - `o0`, `o1`, `o2` (and `o3` if analyzed): state explicitly whether the wipe is PRESENT or ABSENT at each level, with a short IR excerpt if present.
251
+ - If the wipe only disappears at O3 but is present at O2: set `o2` to `"WIPE PRESENT at O2"` and document the O3 removal in `diff_summary`.
252
+ - `diff_summary`: always identify the first disappearance level and the most likely optimization pass responsible (DSE, inlining, SROA, alias analysis, loop transform).
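The field notes above can be sketched as a small builder. The field names come from the schema example; the function name, the input shape, and the exact wording of the per-level strings are assumptions. The builder rejects an empty `diff_summary` to reflect the rule that the finding is never valid without IR diff evidence.

```python
import json


def build_compiler_evidence(levels, diff_summary):
    """levels: ordered dict like {"O0": (present, ir_excerpt), ...}."""
    if not diff_summary:
        raise ValueError("OPTIMIZED_AWAY_ZEROIZE requires IR diff evidence")
    evidence = {"opt_levels": list(levels)}  # every emitted level, in order
    for level, (present, excerpt) in levels.items():
        if present:
            text = f"WIPE PRESENT at {level}."
            if excerpt:
                text += f" IR: {excerpt}"
        else:
            text = f"WIPE ABSENT at {level}."
        evidence[level.lower()] = text
    evidence["diff_summary"] = diff_summary
    return {"category": "OPTIMIZED_AWAY_ZEROIZE", "compiler_evidence": evidence}


finding = build_compiler_evidence(
    {"O0": (True, "call void @llvm.memset.p0i8.i64(...)"),
     "O1": (True, ""),
     "O2": (False, "")},
    "Wipe first disappears at O2; non-volatile memset eliminated by DSE.",
)
report_json = json.dumps(finding, indent=2)
```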