@vigolium/piolium 0.0.1
- package/LICENSE +21 -0
- package/README.md +117 -0
- package/agents/access-auditor.md +300 -0
- package/agents/assumption-breaker.md +154 -0
- package/agents/attack-designer.md +116 -0
- package/agents/code-scanner.md +139 -0
- package/agents/concurrency-auditor.md +238 -0
- package/agents/confirm-writer.md +257 -0
- package/agents/context-reviewer.md +274 -0
- package/agents/cross-verifier.md +165 -0
- package/agents/cve-scout.md +381 -0
- package/agents/env-builder.md +282 -0
- package/agents/env-profiler.md +205 -0
- package/agents/evidence-collector.md +140 -0
- package/agents/finding-grader.md +142 -0
- package/agents/finding-writer.md +148 -0
- package/agents/flow-tracer.md +106 -0
- package/agents/goal-backtracer.md +146 -0
- package/agents/history-miner.md +467 -0
- package/agents/independent-verifier.md +118 -0
- package/agents/intent-mapper.md +183 -0
- package/agents/longshot-collector.md +128 -0
- package/agents/longshot-prober.md +126 -0
- package/agents/patch-auditor.md +73 -0
- package/agents/poc-author.md +124 -0
- package/agents/poc-runner.md +194 -0
- package/agents/probe-lead.md +269 -0
- package/agents/red-challenger.md +101 -0
- package/agents/report-composer.md +208 -0
- package/agents/review-adjudicator.md +216 -0
- package/agents/spec-auditor.md +155 -0
- package/agents/taint-tracer.md +265 -0
- package/agents/test-locator.md +209 -0
- package/agents/threat-modeler.md +132 -0
- package/agents/variant-scanner.md +108 -0
- package/agents/variant-spotter.md +110 -0
- package/bin/piolium.mjs +376 -0
- package/extensions/piolium/_vendor/yaml.bundle.d.mts +6 -0
- package/extensions/piolium/_vendor/yaml.bundle.mjs +139 -0
- package/extensions/piolium/agent-runner.ts +322 -0
- package/extensions/piolium/agents.ts +266 -0
- package/extensions/piolium/audit-state.ts +522 -0
- package/extensions/piolium/bundled-resources.ts +97 -0
- package/extensions/piolium/candidate-scan.ts +966 -0
- package/extensions/piolium/command-target.ts +177 -0
- package/extensions/piolium/console-stream.ts +57 -0
- package/extensions/piolium/export-results.ts +380 -0
- package/extensions/piolium/findings.ts +448 -0
- package/extensions/piolium/heartbeat.ts +182 -0
- package/extensions/piolium/help.ts +234 -0
- package/extensions/piolium/index.ts +1865 -0
- package/extensions/piolium/longshot.ts +530 -0
- package/extensions/piolium/matcher-suggestions.ts +196 -0
- package/extensions/piolium/matcher-utils.ts +83 -0
- package/extensions/piolium/modes/balanced.ts +750 -0
- package/extensions/piolium/modes/confirm-bootstrap.ts +186 -0
- package/extensions/piolium/modes/confirm.ts +697 -0
- package/extensions/piolium/modes/deep.ts +917 -0
- package/extensions/piolium/modes/diff.ts +177 -0
- package/extensions/piolium/modes/lite.ts +540 -0
- package/extensions/piolium/modes/longshot.ts +595 -0
- package/extensions/piolium/modes/merge.ts +204 -0
- package/extensions/piolium/modes/phase-runner.ts +267 -0
- package/extensions/piolium/modes/reinvest.ts +546 -0
- package/extensions/piolium/modes/revisit.ts +279 -0
- package/extensions/piolium/modes.ts +48 -0
- package/extensions/piolium/phase-labels.ts +123 -0
- package/extensions/piolium/phase-status-strip.ts +92 -0
- package/extensions/piolium/prompt-prefix-editor.ts +39 -0
- package/extensions/piolium/providers/anthropic-vertex.ts +836 -0
- package/extensions/piolium/recon.ts +409 -0
- package/extensions/piolium/result-stats.ts +105 -0
- package/extensions/piolium/retry.ts +120 -0
- package/extensions/piolium/scheduler.ts +212 -0
- package/extensions/piolium/secrets.ts +368 -0
- package/extensions/piolium/tools/web-tools.ts +148 -0
- package/package.json +77 -0
- package/skills/agentic-actions-auditor/SKILL.md +327 -0
- package/skills/agentic-actions-auditor/references/action-profiles.md +186 -0
- package/skills/agentic-actions-auditor/references/cross-file-resolution.md +209 -0
- package/skills/agentic-actions-auditor/references/foundations.md +94 -0
- package/skills/agentic-actions-auditor/references/vector-a-env-var-intermediary.md +77 -0
- package/skills/agentic-actions-auditor/references/vector-b-direct-expression-injection.md +83 -0
- package/skills/agentic-actions-auditor/references/vector-c-cli-data-fetch.md +83 -0
- package/skills/agentic-actions-auditor/references/vector-d-pr-target-checkout.md +88 -0
- package/skills/agentic-actions-auditor/references/vector-e-error-log-injection.md +88 -0
- package/skills/agentic-actions-auditor/references/vector-f-subshell-expansion.md +82 -0
- package/skills/agentic-actions-auditor/references/vector-g-eval-of-ai-output.md +91 -0
- package/skills/agentic-actions-auditor/references/vector-h-dangerous-sandbox-configs.md +102 -0
- package/skills/agentic-actions-auditor/references/vector-i-wildcard-allowlists.md +88 -0
- package/skills/audit/SKILL.md +562 -0
- package/skills/audit/assets/icon.svg +7 -0
- package/skills/audit/hooks/scripts/validate_phase_output.py +550 -0
- package/skills/audit/references/adversarial-review.md +148 -0
- package/skills/audit/references/architecture-aware-sast.md +306 -0
- package/skills/audit/references/audit-workflow.md +737 -0
- package/skills/audit/references/chamber-protocol.md +384 -0
- package/skills/audit/references/creative-attack-modes.md +221 -0
- package/skills/audit/references/deep-analysis.md +273 -0
- package/skills/audit/references/domain-attack-playbooks.md +1129 -0
- package/skills/audit/references/knowledge-base-template.md +513 -0
- package/skills/audit/references/real-env-validation.md +191 -0
- package/skills/audit/references/report-templates.md +417 -0
- package/skills/audit/references/triage-and-prereqs.md +134 -0
- package/skills/audit/scripts/consolidate_drafts.py +554 -0
- package/skills/audit/scripts/partition_findings.py +152 -0
- package/skills/audit/scripts/rg-hotspots.sh +121 -0
- package/skills/audit/scripts/stamp_file_state.py +349 -0
- package/skills/code-reviewer/SKILL.md +65 -0
- package/skills/codeql/SKILL.md +281 -0
- package/skills/codeql/references/build-fixes.md +90 -0
- package/skills/codeql/references/diagnostic-query-templates.md +339 -0
- package/skills/codeql/references/extension-yaml-format.md +209 -0
- package/skills/codeql/references/important-only-suite.md +153 -0
- package/skills/codeql/references/language-details.md +207 -0
- package/skills/codeql/references/macos-arm64e-workaround.md +179 -0
- package/skills/codeql/references/performance-tuning.md +111 -0
- package/skills/codeql/references/quality-assessment.md +172 -0
- package/skills/codeql/references/ruleset-catalog.md +63 -0
- package/skills/codeql/references/run-all-suite.md +92 -0
- package/skills/codeql/references/sarif-processing.md +79 -0
- package/skills/codeql/references/threat-models.md +51 -0
- package/skills/codeql/workflows/build-database.md +280 -0
- package/skills/codeql/workflows/create-data-extensions.md +261 -0
- package/skills/codeql/workflows/run-analysis.md +301 -0
- package/skills/differential-review/SKILL.md +220 -0
- package/skills/differential-review/adversarial.md +203 -0
- package/skills/differential-review/methodology.md +234 -0
- package/skills/differential-review/patterns.md +300 -0
- package/skills/differential-review/reporting.md +369 -0
- package/skills/fp-check/SKILL.md +125 -0
- package/skills/fp-check/references/bug-class-verification.md +114 -0
- package/skills/fp-check/references/deep-verification.md +143 -0
- package/skills/fp-check/references/evidence-templates.md +91 -0
- package/skills/fp-check/references/false-positive-patterns.md +115 -0
- package/skills/fp-check/references/gate-reviews.md +27 -0
- package/skills/fp-check/references/standard-verification.md +78 -0
- package/skills/insecure-defaults/SKILL.md +117 -0
- package/skills/insecure-defaults/references/examples.md +409 -0
- package/skills/last30days/SKILL.md +444 -0
- package/skills/sarif-parsing/SKILL.md +483 -0
- package/skills/sarif-parsing/resources/jq-queries.md +162 -0
- package/skills/sarif-parsing/resources/sarif_helpers.py +331 -0
- package/skills/security-threat-model/LICENSE.txt +201 -0
- package/skills/security-threat-model/SKILL.md +81 -0
- package/skills/security-threat-model/agents/openai.yaml +4 -0
- package/skills/security-threat-model/references/prompt-template.md +255 -0
- package/skills/security-threat-model/references/security-controls-and-assets.md +32 -0
- package/skills/semgrep/SKILL.md +212 -0
- package/skills/semgrep/references/rulesets.md +162 -0
- package/skills/semgrep/references/scan-modes.md +110 -0
- package/skills/semgrep/references/scanner-task-prompt.md +140 -0
- package/skills/semgrep/scripts/merge_sarif.py +203 -0
- package/skills/semgrep/workflows/scan-workflow.md +311 -0
- package/skills/semgrep-rule-creator/SKILL.md +168 -0
- package/skills/semgrep-rule-creator/references/quick-reference.md +202 -0
- package/skills/semgrep-rule-creator/references/workflow.md +240 -0
- package/skills/semgrep-rule-variant-creator/SKILL.md +205 -0
- package/skills/semgrep-rule-variant-creator/references/applicability-analysis.md +250 -0
- package/skills/semgrep-rule-variant-creator/references/language-syntax-guide.md +324 -0
- package/skills/semgrep-rule-variant-creator/references/workflow.md +518 -0
- package/skills/sharp-edges/SKILL.md +292 -0
- package/skills/sharp-edges/references/auth-patterns.md +252 -0
- package/skills/sharp-edges/references/case-studies.md +274 -0
- package/skills/sharp-edges/references/config-patterns.md +333 -0
- package/skills/sharp-edges/references/crypto-apis.md +190 -0
- package/skills/sharp-edges/references/lang-c.md +205 -0
- package/skills/sharp-edges/references/lang-csharp.md +285 -0
- package/skills/sharp-edges/references/lang-go.md +270 -0
- package/skills/sharp-edges/references/lang-java.md +263 -0
- package/skills/sharp-edges/references/lang-javascript.md +269 -0
- package/skills/sharp-edges/references/lang-kotlin.md +265 -0
- package/skills/sharp-edges/references/lang-php.md +245 -0
- package/skills/sharp-edges/references/lang-python.md +274 -0
- package/skills/sharp-edges/references/lang-ruby.md +273 -0
- package/skills/sharp-edges/references/lang-rust.md +272 -0
- package/skills/sharp-edges/references/lang-swift.md +287 -0
- package/skills/sharp-edges/references/language-specific.md +588 -0
- package/skills/spec-to-code-compliance/SKILL.md +357 -0
- package/skills/spec-to-code-compliance/resources/COMPLETENESS_CHECKLIST.md +69 -0
- package/skills/spec-to-code-compliance/resources/IR_EXAMPLES.md +417 -0
- package/skills/spec-to-code-compliance/resources/OUTPUT_REQUIREMENTS.md +105 -0
- package/skills/supply-chain-risk-auditor/SKILL.md +67 -0
- package/skills/supply-chain-risk-auditor/resources/results-template.md +41 -0
- package/skills/variant-analysis/METHODOLOGY.md +327 -0
- package/skills/variant-analysis/SKILL.md +142 -0
- package/skills/variant-analysis/resources/codeql/cpp.ql +119 -0
- package/skills/variant-analysis/resources/codeql/go.ql +69 -0
- package/skills/variant-analysis/resources/codeql/java.ql +71 -0
- package/skills/variant-analysis/resources/codeql/javascript.ql +63 -0
- package/skills/variant-analysis/resources/codeql/python.ql +80 -0
- package/skills/variant-analysis/resources/semgrep/cpp.yaml +98 -0
- package/skills/variant-analysis/resources/semgrep/go.yaml +63 -0
- package/skills/variant-analysis/resources/semgrep/java.yaml +61 -0
- package/skills/variant-analysis/resources/semgrep/javascript.yaml +60 -0
- package/skills/variant-analysis/resources/semgrep/python.yaml +72 -0
- package/skills/variant-analysis/resources/variant-report-template.md +75 -0
- package/skills/vuln-report/SKILL.md +137 -0
- package/skills/vuln-report/agents/openai.yaml +4 -0
- package/skills/vuln-report/references/report-template.md +135 -0
- package/skills/wooyun-legacy/SKILL.md +367 -0
- package/skills/wooyun-legacy/references/bank-penetration.md +222 -0
- package/skills/wooyun-legacy/references/checklists/command-execution-checklist.md +119 -0
- package/skills/wooyun-legacy/references/checklists/csrf-checklist.md +74 -0
- package/skills/wooyun-legacy/references/checklists/file-upload-checklist.md +108 -0
- package/skills/wooyun-legacy/references/checklists/info-disclosure-checklist.md +114 -0
- package/skills/wooyun-legacy/references/checklists/logic-flaws-checklist.md +95 -0
- package/skills/wooyun-legacy/references/checklists/misconfig-checklist.md +124 -0
- package/skills/wooyun-legacy/references/checklists/path-traversal-checklist.md +87 -0
- package/skills/wooyun-legacy/references/checklists/rce-checklist.md +93 -0
- package/skills/wooyun-legacy/references/checklists/sql-injection-checklist.md +97 -0
- package/skills/wooyun-legacy/references/checklists/ssrf-checklist.md +99 -0
- package/skills/wooyun-legacy/references/checklists/unauthorized-access-checklist.md +89 -0
- package/skills/wooyun-legacy/references/checklists/weak-password-checklist.md +115 -0
- package/skills/wooyun-legacy/references/checklists/xss-checklist.md +103 -0
- package/skills/wooyun-legacy/references/checklists/xxe-checklist.md +130 -0
- package/skills/wooyun-legacy/references/info-disclosure.md +975 -0
- package/skills/wooyun-legacy/references/logic-flaws.md +721 -0
- package/skills/wooyun-legacy/references/path-traversal.md +1191 -0
- package/skills/wooyun-legacy/references/telecom-penetration.md +156 -0
- package/skills/wooyun-legacy/references/unauthorized-access.md +980 -0
- package/skills/wooyun-legacy/references/xss.md +746 -0
- package/skills/zeroize-audit/SKILL.md +371 -0
- package/skills/zeroize-audit/configs/c.yaml +21 -0
- package/skills/zeroize-audit/configs/default.yaml +128 -0
- package/skills/zeroize-audit/configs/rust.yaml +83 -0
- package/skills/zeroize-audit/prompts/report_template.md +238 -0
- package/skills/zeroize-audit/prompts/system.md +163 -0
- package/skills/zeroize-audit/prompts/task.md +97 -0
- package/skills/zeroize-audit/references/compile-commands.md +231 -0
- package/skills/zeroize-audit/references/detection-strategy.md +191 -0
- package/skills/zeroize-audit/references/ir-analysis.md +252 -0
- package/skills/zeroize-audit/references/mcp-analysis.md +221 -0
- package/skills/zeroize-audit/references/poc-generation.md +470 -0
- package/skills/zeroize-audit/references/rust-zeroization-patterns.md +867 -0
- package/skills/zeroize-audit/schemas/input.json +83 -0
- package/skills/zeroize-audit/schemas/output.json +140 -0
- package/skills/zeroize-audit/tools/analyze_asm.sh +202 -0
- package/skills/zeroize-audit/tools/analyze_cfg.py +381 -0
- package/skills/zeroize-audit/tools/analyze_heap.sh +211 -0
- package/skills/zeroize-audit/tools/analyze_ir_semantic.py +429 -0
- package/skills/zeroize-audit/tools/diff_ir.sh +135 -0
- package/skills/zeroize-audit/tools/diff_rust_mir.sh +189 -0
- package/skills/zeroize-audit/tools/emit_asm.sh +67 -0
- package/skills/zeroize-audit/tools/emit_ir.sh +77 -0
- package/skills/zeroize-audit/tools/emit_rust_asm.sh +178 -0
- package/skills/zeroize-audit/tools/emit_rust_ir.sh +150 -0
- package/skills/zeroize-audit/tools/emit_rust_mir.sh +158 -0
- package/skills/zeroize-audit/tools/extract_compile_flags.py +284 -0
- package/skills/zeroize-audit/tools/generate_poc.py +1329 -0
- package/skills/zeroize-audit/tools/mcp/apply_confidence_gates.py +113 -0
- package/skills/zeroize-audit/tools/mcp/check_mcp.sh +68 -0
- package/skills/zeroize-audit/tools/mcp/normalize_mcp_evidence.py +125 -0
- package/skills/zeroize-audit/tools/scripts/check_llvm_patterns.py +481 -0
- package/skills/zeroize-audit/tools/scripts/check_mir_patterns.py +554 -0
- package/skills/zeroize-audit/tools/scripts/check_rust_asm.py +424 -0
- package/skills/zeroize-audit/tools/scripts/check_rust_asm_aarch64.py +300 -0
- package/skills/zeroize-audit/tools/scripts/check_rust_asm_x86.py +283 -0
- package/skills/zeroize-audit/tools/scripts/find_dangerous_apis.py +375 -0
- package/skills/zeroize-audit/tools/scripts/semantic_audit.py +923 -0
- package/skills/zeroize-audit/tools/track_dataflow.sh +196 -0
- package/skills/zeroize-audit/tools/validate_rust_toolchain.sh +298 -0
- package/skills/zeroize-audit/workflows/phase-0-preflight.md +150 -0
- package/skills/zeroize-audit/workflows/phase-1-source-analysis.md +144 -0
- package/skills/zeroize-audit/workflows/phase-2-compiler-analysis.md +139 -0
- package/skills/zeroize-audit/workflows/phase-3-interim-report.md +46 -0
- package/skills/zeroize-audit/workflows/phase-4-poc-generation.md +46 -0
- package/skills/zeroize-audit/workflows/phase-5-poc-validation.md +136 -0
- package/skills/zeroize-audit/workflows/phase-6-final-report.md +44 -0
- package/skills/zeroize-audit/workflows/phase-7-test-generation.md +42 -0
- package/themes/piolium-srcery.json +94 -0
@@ -0,0 +1,231 @@
# Working with compile_commands.json

This reference covers how to generate and use `compile_commands.json` for the zeroize-audit IR/ASM analysis pipeline. Read this before running Step 7 (IR comparison) or Step 8 (assembly analysis) in `task.md`.

---

## Structure

`compile_commands.json` is a JSON array where each entry describes the exact compiler invocation for one translation unit (TU):

```json
[
  {
    "directory": "/path/to/project/build",
    "arguments": [
      "clang", "-std=c11", "-I../include", "-DNDEBUG", "-Wall",
      "-c", "../src/crypto.c", "-o", "crypto.c.o"
    ],
    "file": "../src/crypto.c"
  },
  {
    "directory": "/path/to/project/build",
    "command": "clang++ -std=c++17 -I../include -DNDEBUG -c ../src/aead.cpp -o aead.cpp.o",
    "file": "../src/aead.cpp"
  }
]
```

**`arguments` vs `command`**: Some tools produce an `arguments` array (preferred); others produce a `command` string. `extract_compile_flags.py` handles both forms transparently.

**`directory`**: The working directory for the invocation. All relative paths in `arguments`/`command` and `file` are resolved against this field — **not** against the current working directory when running analysis. `extract_compile_flags.py` handles this automatically; manual invocations must account for it.
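
The two rules above can be sketched in Python. This is a minimal illustration, assuming a hypothetical helper named `normalize_entry`; it is not taken from `extract_compile_flags.py`, whose internals are not shown here. It accepts either `arguments` or `command` and resolves `file` against `directory`. Paths are treated as POSIX paths and symlinks are not resolved.

```python
import posixpath
import shlex


def normalize_entry(entry):
    """Return (absolute_source_path, argv) for one compile DB entry.

    Hypothetical helper for illustration only.
    """
    # Prefer the `arguments` array; otherwise split the `command` string
    # the way a POSIX shell would.
    if "arguments" in entry:
        argv = list(entry["arguments"])
    else:
        argv = shlex.split(entry["command"])

    # `file` is resolved against `directory`, never against the caller's CWD.
    # posixpath.join keeps `file` unchanged if it is already absolute.
    source = posixpath.normpath(posixpath.join(entry["directory"], entry["file"]))
    return source, argv
```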

---

## Generating compile_commands.json

### CMake (C/C++)

```bash
cmake -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
# Output: build/compile_commands.json
```

**Constraints**: `CMAKE_EXPORT_COMPILE_COMMANDS` is implemented only by the Makefile and Ninja generators; the Xcode and Visual Studio generators ignore it. Run from the project root and point `--compile-db` at `build/compile_commands.json`.

### Bear (any Make-based build system)

Bear intercepts compiler invocations at the OS level. Works with any `make`-based or custom build system:

```bash
# Install: apt install bear OR brew install bear
bear -- make clean all  # clean build recommended for accuracy
# Output: compile_commands.json in the current directory
```

Use `make clean all` rather than `make` alone to ensure all TUs are recompiled and captured. Incremental builds will only record the files that were actually recompiled.

### intercept-build (LLVM scan-build companion)

```bash
intercept-build make
# Output: compile_commands.json in the current directory
```

### Rust / Cargo

Cargo does not natively emit `compile_commands.json`. Two options:

```bash
# Option 1: Bear with cargo check (faster — avoids linking)
bear -- cargo check
bear -- cargo build  # if cargo check is insufficient

# Option 2: compiledb
pip install compiledb
compiledb cargo build
```

**Critical limitation for Rust**: Bear captures `rustc` invocations, not `clang` invocations. `emit_ir.sh` (which calls `clang`) **will not work** directly on Rust TUs. Use `cargo rustc` instead to emit IR and assembly directly:

```bash
# Preferred: use the emit scripts which handle CARGO_TARGET_DIR isolation:
{baseDir}/tools/emit_rust_ir.sh --manifest Cargo.toml --opt O0 --out /tmp/crate.O0.ll
{baseDir}/tools/emit_rust_ir.sh --manifest Cargo.toml --opt O2 --out /tmp/crate.O2.ll

# Manual alternative (output goes to an isolated temp dir, not target/debug/deps):
CARGO_TARGET_DIR=/tmp/zir cargo rustc -- --emit=llvm-ir -C opt-level=0
CARGO_TARGET_DIR=/tmp/zir cargo rustc -- --emit=llvm-ir -C opt-level=2

# Assembly for Rust (use instead of emit_asm.sh):
cargo rustc -- --emit=asm -C opt-level=2
# Output: target/debug/deps/*.s (dev profile; with --release: target/release/deps/*.s)
```

Pass the resulting `.ll` and `.s` files directly to `diff_ir.sh` and `analyze_asm.sh`.

---

## End-to-End Pipeline

The canonical pipeline for C/C++ analysis. Always use a hash of the source path as `<tu_hash>` (not the raw filename) to avoid collisions during parallel TU processing. Clean up temp files on completion or failure.

```bash
mkdir -p /tmp/zeroize-audit/

# Step 1: Extract build-relevant flags for the TU (as a bash array)
FLAGS=()
while IFS= read -r flag; do FLAGS+=("$flag"); done < <(
  python {baseDir}/tools/extract_compile_flags.py \
    --compile-db /path/to/build/compile_commands.json \
    --src /path/to/src/crypto.c --format lines)

# Step 2: Emit IR at each level in opt_levels (always include O0 as baseline)
{baseDir}/tools/emit_ir.sh \
  --src /path/to/src/crypto.c \
  --out /tmp/zeroize-audit/<tu_hash>.O0.ll --opt O0 -- "${FLAGS[@]}"

{baseDir}/tools/emit_ir.sh \
  --src /path/to/src/crypto.c \
  --out /tmp/zeroize-audit/<tu_hash>.O1.ll --opt O1 -- "${FLAGS[@]}"

{baseDir}/tools/emit_ir.sh \
  --src /path/to/src/crypto.c \
  --out /tmp/zeroize-audit/<tu_hash>.O2.ll --opt O2 -- "${FLAGS[@]}"

# Step 3: Diff across all levels — O1 is the diagnostic level for simple DSE;
# O2 catches more aggressive eliminations
{baseDir}/tools/diff_ir.sh \
  /tmp/zeroize-audit/<tu_hash>.O0.ll \
  /tmp/zeroize-audit/<tu_hash>.O1.ll \
  /tmp/zeroize-audit/<tu_hash>.O2.ll

# Step 4: Emit assembly at O2 for register-spill and stack-retention analysis
{baseDir}/tools/emit_asm.sh \
  --src /path/to/src/crypto.c \
  --out /tmp/zeroize-audit/<tu_hash>.O2.s --opt O2 -- "${FLAGS[@]}"

# Step 5: Analyze assembly output
{baseDir}/tools/analyze_asm.sh /tmp/zeroize-audit/<tu_hash>.O2.s

# Cleanup
rm -rf /tmp/zeroize-audit/<tu_hash>.*
```
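
The pipeline leaves the derivation of `<tu_hash>` open. One possible way to compute it is sketched below in Python; the function name `tu_hash`, the choice of SHA-256, and the 16-character truncation are all assumptions made for illustration, not something the tooling prescribes.

```python
import hashlib
import os


def tu_hash(src_path, length=16):
    """Hypothetical helper: stable, filename-safe ID for one TU.

    Hashes the absolute source path so that two files with the same
    basename in different directories get different temp-file names.
    """
    absolute = os.path.abspath(src_path)
    digest = hashlib.sha256(absolute.encode("utf-8")).hexdigest()
    return digest[:length]
```

The result can then be substituted for `<tu_hash>` in the output paths, for example `/tmp/zeroize-audit/<tu_hash>.O0.ll`.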

Refer to the IR analysis reference (loaded separately from SKILL.md) for how to interpret IR diffs and identify wipe elimination patterns.

---

## Flags Stripped by extract_compile_flags.py

These flags are removed because they are irrelevant to or break single-file IR/ASM emission:

| Flag(s) | Reason stripped |
|---|---|
| `-o <file>` | Emission tools supply their own `-o` |
| `-c` | IR/ASM emission uses `-S -emit-llvm` / `-S` instead |
| `-MF`, `-MT`, `-MQ` (+ argument) | Dependency file generation — irrelevant for analysis |
| `-MD`, `-MMD`, `-MP`, `-MG` | Dependency generation side-effects |
| `-pipe` | OS pipe between compiler stages; not meaningful for direct calls |
| `-save-temps` | Saves intermediate files; produces clutter |
| `-gsplit-dwarf` | Splits debug info to `.dwo`; incompatible with single-file emission |
| `-fcrash-diagnostics-dir=...` | Crash report output; irrelevant |
| `-fmodule-file=...`, `-fmodules-cache-path=...` | Clang module paths; may confuse single-TU invocation |
| `--serialize-diagnostics` | Clang diagnostic binary output; not needed |
| `-fdebug-prefix-map=...` | Debug info path remapping; harmless to strip |
| `-fprofile-generate`, `-fprofile-use=...` | PGO instrumentation; distorts IR for analysis |
| `-fcoverage-mapping` | Coverage instrumentation; alters IR structure |

Flags that are **kept** (build-relevant):

| Pattern | Reason kept |
|---|---|
| `-I`, `-isystem`, `-iquote` | Include paths required to parse the TU |
| `-D`, `-U` | Preprocessor defines/undefines that affect code paths |
| `-std=<val>` | Language standard — affects syntax and semantics |
| `-f*` security/codegen flags | e.g., `-fstack-protector`, `-fPIC`, `-fno-omit-frame-pointer` |
| `-m<arch>` | Target architecture flags (e.g., `-m64`, `-march=x86-64`, `-mthumb`) |
| `-W*` | Warning flags — harmless to pass through |
| `-pthread` | Threading model; affects macro definitions |
| `--sysroot=`, `-isysroot` | System root for cross-compilation |
| `-target <triple>` | Cross-compilation target triple; must be preserved |
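
To make the two tables concrete, here is a rough Python sketch of a filter that drops a subset of the stripped flags and keeps everything else. It is an illustration only: `filter_flags` and the three constant names are hypothetical, the lists cover only part of the stripped-flags table, and the real `extract_compile_flags.py` may work differently.

```python
# Illustrative subset of the stripped-flags table; not the real tool's logic.
DROP_WITH_ARG = {"-o", "-MF", "-MT", "-MQ"}  # flag plus its following argument
DROP_ALONE = {
    "-c", "-MD", "-MMD", "-MP", "-MG", "-pipe", "-save-temps",
    "-gsplit-dwarf", "-fprofile-generate", "-fcoverage-mapping",
}
DROP_PREFIXES = (
    "-fcrash-diagnostics-dir=", "-fmodule-file=", "-fmodules-cache-path=",
    "-fdebug-prefix-map=", "-fprofile-use=",
)


def filter_flags(argv, source):
    """Drop the compiler name, the source path, and emission-irrelevant flags."""
    kept = []
    args = iter(argv[1:])  # argv[0] is the compiler executable
    for arg in args:
        if arg in DROP_WITH_ARG:
            next(args, None)  # also skip the flag's argument
        elif arg in DROP_ALONE or arg.startswith(DROP_PREFIXES):
            continue
        elif arg == source:
            continue  # emission tools pass the source file themselves
        else:
            kept.append(arg)
    return kept
```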

---

## Common Pitfalls

### 1. Relative paths and the `"directory"` field

`"file": "../src/crypto.c"` is relative to `"directory"`, not to the CWD when running analysis. Always resolve file paths using `"directory"`. `extract_compile_flags.py` does this automatically; be explicit if invoking `clang` manually.

### 2. Multiple entries for the same file

Some build systems emit duplicate entries (e.g., with and without a precompiled header). `extract_compile_flags.py` returns the **first** match. If that entry includes `-fpch-preprocess`, the PCH must exist in the build directory for compilation to succeed. Either regenerate the PCH or strip PCH-related flags manually.

### 3. Stale or incomplete compile DB (most common failure)

If `bear` was run on an incremental build, only recompiled TUs are recorded, so TUs compiled in a previous run may be missing or have outdated flags. A CMake-exported DB is written when CMake configures the project, so it goes stale when sources or flags change without re-running CMake. **Always generate the compile DB from a clean build** (`make clean all`, `cargo clean && cargo build`) or a fresh CMake configure to ensure all TUs are captured with current flags.

`extract_compile_flags.py` exits with code 2 if a source file is not found in the DB. Common causes:
- Header-only files (no TU entry — expected)
- Files added after the last `bear`/CMake run
- Symlinked paths that resolve differently than recorded

Regenerate the compile DB if entries are missing.

### 4. Generated source files

Entries may point to generated files in the build directory (e.g., `build/generated/config.c`) that don't exist in a clean checkout. Run the build system to generate them before running analysis. Preflight (Step 1 in `task.md`) will catch this if trial compilation is attempted.

### 5. Cross-compilation targets

If the compile DB was generated for a cross-compilation target (e.g., `-target aarch64-linux-gnu` or `-target thumbv7m-none-eabi`), emitted IR and assembly will be for that target, not x86-64. This affects analysis in two ways:

- **IR diffs**: Only compare IR files emitted for the same target. Do not mix targets across opt levels.
- **Assembly analysis**: `analyze_asm.sh` adapts register patterns by target:
  - x86-64: callee-saved registers are `rbx`, `r12`–`r15`; spills use `movq`/`movdqa` to `[rsp+N]`
  - AArch64: callee-saved registers are `x19`–`x28`; spills use `str`/`stp` to `[sp, #N]`
  - Thumb/ARM: callee-saved registers are `r4`–`r11`; spills use `str`/`stm` to `[sp, #N]`

Ensure `-target` is preserved in the extracted flags (it is, per the kept-flags table above).
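
As a rough illustration of the per-target register sets listed above, the Python sketch below picks a register family from the extracted flags. The register lists are copied from the list above; the function name `target_family`, the triple-prefix matching, and the x86-64 default when no target flag is present are assumptions, not a description of how `analyze_asm.sh` decides.

```python
# Register sets copied from the list above; everything else is an assumption.
CALLEE_SAVED = {
    "x86_64": ["rbx", "r12", "r13", "r14", "r15"],
    "aarch64": [f"x{n}" for n in range(19, 29)],  # x19..x28
    "arm": [f"r{n}" for n in range(4, 12)],       # r4..r11
}


def target_family(flags, default="x86_64"):
    """Guess the register family from `-target <triple>` or `--target=<triple>`."""
    triple = None
    for i, flag in enumerate(flags):
        if flag == "-target" and i + 1 < len(flags):
            triple = flags[i + 1]
        elif flag.startswith("--target="):
            triple = flag.split("=", 1)[1]
    if triple is None:
        return default  # assumed host target; adjust for non-x86-64 hosts
    arch = triple.split("-", 1)[0]
    if arch in ("aarch64", "arm64"):
        return "aarch64"
    if arch.startswith(("arm", "thumb")):
        return "arm"
    if arch in ("x86_64", "amd64"):
        return "x86_64"
    raise ValueError(f"unrecognized target architecture: {arch}")
```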

### 6. `extract_compile_flags.py` exit codes

| Exit code | Meaning |
|---|---|
| 0 | Flags extracted successfully; output on stdout |
| 1 | Compile DB not found or not readable |
| 2 | Source file not found in compile DB |
| 3 | Compile DB is malformed JSON |

Check the exit code before passing flags to emission tools. An empty `FLAGS` array will silently produce incorrect IR.
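
A bash `while ... done < <(...)` loop does not surface the exit status of the command inside the process substitution by itself, so the check needs to be made explicitly. One way to do that from Python is sketched below. The function name `load_flags` is hypothetical; the exit-code meanings are copied from the table above, and the command to run is passed in by the caller rather than assumed.

```python
import subprocess

# Meanings copied from the exit-code table above.
EXIT_MEANINGS = {
    1: "compile DB not found or not readable",
    2: "source file not found in compile DB",
    3: "compile DB is malformed JSON",
}


def load_flags(cmd):
    """Run a flag-extraction command; return its stdout lines or raise."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        reason = EXIT_MEANINGS.get(proc.returncode, "unknown failure")
        raise RuntimeError(f"flag extraction failed ({proc.returncode}): {reason}")
    flags = [line for line in proc.stdout.splitlines() if line]
    if not flags:
        # An empty flag list would silently produce incorrect IR.
        raise RuntimeError("flag extraction returned no flags")
    return flags
```

A caller would pass the same argument list the pipeline uses, for example the `python`, script path, `--compile-db`, `--src`, and `--format lines` arguments as separate list items.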
@@ -0,0 +1,191 @@
# Detection Strategy

Read this during execution to guide per-step analysis. Steps 1–6 are Phase 1 (source-level); Steps 7–12 are Phase 2 (compiler-level).

---

## Phase 1 — Source-Level Analysis

### Step 1 — Preflight Build Context (mandatory)
- Verify `compile_db` exists and is readable.
- Verify compile database entries point to existing files/working directories.
- Verify the codebase is compilable with the captured commands (or equivalent build invocation).
- Fail fast if preflight fails; do not continue with partial/source-only analysis.

### Step 2 — Identify Sensitive Objects

Scan all TUs for objects matching these heuristics. Each heuristic has a confidence level that propagates to findings.

**Name patterns (low confidence)** — match substrings case-insensitively:
`key`, `secret`, `seed`, `priv`, `sk`, `shared_secret`, `nonce`, `token`, `pwd`, `pass`

**Type hints (medium confidence)** — byte buffers, fixed-size arrays, or structs whose names or fields match name patterns above.

**Explicit annotations (high confidence)**:
- Rust: `#[secret]`, `Secret<T>` patterns (configurable)
- C/C++: `__attribute__((annotate("sensitive")))`, `SENSITIVE` macro (configurable via `explicit_sensitive_markers` in `{baseDir}/configs/default.yaml`)

Record each sensitive object with: name, type, location (file:line), confidence level, and the heuristic that matched.
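
The name-pattern heuristic and the record it produces can be sketched as follows. This is a minimal Python illustration, assuming hypothetical names (`SensitiveObject`, `match_name`) and a `name:<pattern>` label format that are not taken from the tooling. It covers only the low-confidence name match, not type hints or annotations.

```python
from dataclasses import dataclass
from typing import Optional

# Patterns copied from the list above.
NAME_PATTERNS = ("key", "secret", "seed", "priv", "sk", "shared_secret",
                 "nonce", "token", "pwd", "pass")


@dataclass
class SensitiveObject:
    name: str
    type: str
    location: str    # "file:line"
    confidence: str  # "low" for a bare name match
    heuristic: str   # which pattern matched


def match_name(name, type_name, location) -> Optional[SensitiveObject]:
    """Case-insensitive substring match against the name patterns."""
    lowered = name.lower()
    for pattern in NAME_PATTERNS:
        if pattern in lowered:
            return SensitiveObject(name, type_name, location, "low", f"name:{pattern}")
    return None
```

Substring matching is deliberately broad: an identifier such as `task_id` matches `sk`, which is one reason a name match on its own carries only low confidence.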
|
|
29
|
+
|
|
30
|
+
### Step 3 — Detect Zeroization Attempts
|
|
31
|
+
|
|
32
|
+
For each sensitive object identified in Step 2, check whether a call to an approved wipe API (see Approved Wipe APIs in SKILL.md) exists within the same scope or a cleanup function reachable from that scope.
|
|
33
|
+
|
|
34
|
+
Record: wipe API used, location, and whether the wipe was found at all.
|
|
35
|
+
|
|
36
|
+
### Step 4 — MCP Semantic Pass (when available)
|
|
37
|
+
|
|
38
|
+
Run this step **before** correctness validation so that resolved types, aliases, and cross-file references are available to Steps 5 and 6. Skip and continue if MCP is unavailable in `prefer` mode (see Confidence Gating in SKILL.md).
|
|
39
|
+
|
|
40
|
+
- Run `{baseDir}/tools/mcp/check_mcp.sh` to confirm MCP is live. If it fails and `mcp_mode=require`, stop the run.
|
|
41
|
+
- Activate the project with `activate_project` (pass the repository root path). This must succeed before any other Serena tool can be used. If activation fails, treat MCP as unavailable.
|
|
42
|
+
- For each sensitive object and wipe call, resolve symbol definitions using `find_symbol` (by name, with `include_body: true` for type details) and collect cross-file references using `find_referencing_symbols`.
|
|
43
|
+
- Trace callers and cleanup paths using `find_referencing_symbols` on wipe wrapper functions. For outgoing calls, read the function body from `find_symbol` output and resolve called symbols.
|
|
44
|
+
- Use `get_symbols_overview` to get a high-level view of symbols in a file when exploring unfamiliar TUs.
|
|
45
|
+
- Normalize all MCP output: `python {baseDir}/tools/mcp/normalize_mcp_evidence.py`.
|
|
46
|
+
|
|
47
|
+
Prioritize `find_symbol` queries by sensitive-object name first, then wipe wrapper names. Score confidence: name match alone → `needs_review`; name + type resolved → `likely`; name + type + call chain confirmed → `confirmed`.
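
The scoring rule can be written as a small function. This is a sketch under one assumption the text does not state: a confirmed call chain without a resolved type is treated the same as a name match alone.

```python
def score_confidence(name_match, type_resolved, call_chain_confirmed):
    """Return the confidence label for one sensitive object, or None."""
    if not name_match:
        return None
    if type_resolved and call_chain_confirmed:
        return "confirmed"     # name + type + call chain
    if type_resolved:
        return "likely"        # name + type
    # Assumption: without a resolved type the object stays at needs_review,
    # even if a call chain was found.
    return "needs_review"
```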
|
|
48
|
+
|
|
49
|
+
### Step 5 — Validate Correctness
|
|
50
|
+
|
|
51
|
+
For each sensitive object with a detected wipe, use type and alias data from Step 4 (if available) to validate:
|
|
52
|
+
- **Size correct**: wipe length matches `sizeof(object)`, not `sizeof(pointer)`. MCP-resolved typedefs and array sizes take precedence over source-level estimates.
|
|
53
|
+
- **All exits covered** (heuristic): wipe is present on normal exit, early return, and error paths visible in source. Flag `NOT_ON_ALL_PATHS` if any path appears uncovered.
|
|
54
|
+
- **Ordering correct**: wipe occurs before `free()` or scope end, not after.
|
|
55
|
+
|
|
56
|
+
Emit `PARTIAL_WIPE` for incorrect size. Emit `NOT_ON_ALL_PATHS` for missing paths (heuristic; CFG analysis in Step 10 provides definitive results).
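
As a rough illustration of the size check, the sketch below compares a resolved object size with the wipe length. The function name, the dictionary fields other than `category`, and the default 8-byte pointer width are assumptions made for the example.

```python
def check_wipe_size(object_size, wipe_length, pointer_size=8):
    """Return a PARTIAL_WIPE finding dict, or None when the wipe covers the object."""
    if wipe_length >= object_size:
        return None
    return {
        "category": "PARTIAL_WIPE",
        "object_size": object_size,
        "wipe_length": wipe_length,
        # A length equal to the pointer width suggests sizeof(ptr) was written
        # where the object or array size was intended.
        "suspected_sizeof_pointer": wipe_length == pointer_size,
    }
```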
|
|
57
|
+
|
|
58
|
+
### Step 6 — Data-Flow and Heap Checks
|
|
59
|
+
|
|
60
|
+
Use cross-file reference data from Step 4 (if available) to extend tracking beyond the current TU.
|
|
61
|
+
|
|
62
|
+
**Data-flow (produces `SECRET_COPY`):**
|
|
63
|
+
- Detect `memcpy()`/`memmove()` copying sensitive buffers.
|
|
64
|
+
- Track struct assignments and array copies of sensitive objects.
|
|
65
|
+
- Flag function arguments passed by value (copies on stack).
|
|
66
|
+
- Flag secrets returned by value.
|
|
67
|
+
- Emit `SECRET_COPY` when any of the above copies exist and no approved wipe is tracked for the copy destination.
|
|
68
|
+
|
|
69
|
+
**Heap (produces `INSECURE_HEAP_ALLOC`):**
|
|
70
|
+
- Detect `malloc`/`calloc`/`realloc` used to allocate sensitive objects.
|
|
71
|
+
- Check for `mlock()`/`madvise(MADV_DONTDUMP)` — note absence as a warning.
|
|
72
|
+
- Recommend secure allocators: `OPENSSL_secure_malloc`, `sodium_malloc`.
|
|
73
|
+
|
|
74
|
+
---
|
|
75
|
+
|
|
76
|
+
## Phase 2 — Compiler-Level Analysis
|
|
77
|
+
|
|
78
|
+
All steps in Phase 2 require a valid compile DB and a working `clang` installation. Skip Phase 2 findings if Phase 1 preflight failed.
|
|
79
|
+
|
|
80
|
+
### Step 7 — IR Comparison (produces `OPTIMIZED_AWAY_ZEROIZE`)
|
|
81
|
+
|
|
82
|
+
For each TU containing sensitive objects:
|
|
83
|
+
|
|
84
|
+
```bash
|
|
85
|
+
FLAGS=()
|
|
86
|
+
while IFS= read -r flag; do FLAGS+=("$flag"); done < <(
|
|
87
|
+
python {baseDir}/tools/extract_compile_flags.py \
|
|
88
|
+
--compile-db <compile_db> --src <file> --format lines)
|
|
89
|
+
|
|
90
|
+
{baseDir}/tools/emit_ir.sh --src <file> \
|
|
91
|
+
--out /tmp/zeroize-audit/<tu_hash>.O0.ll --opt O0 -- "${FLAGS[@]}"
|
|
92
|
+
|
|
93
|
+
{baseDir}/tools/emit_ir.sh --src <file> \
|
|
94
|
+
--out /tmp/zeroize-audit/<tu_hash>.O1.ll --opt O1 -- "${FLAGS[@]}"
|
|
95
|
+
|
|
96
|
+
{baseDir}/tools/emit_ir.sh --src <file> \
|
|
97
|
+
--out /tmp/zeroize-audit/<tu_hash>.O2.ll --opt O2 -- "${FLAGS[@]}"
|
|
98
|
+
|
|
99
|
+
{baseDir}/tools/diff_ir.sh \
|
|
100
|
+
/tmp/zeroize-audit/<tu_hash>.O0.ll \
|
|
101
|
+
/tmp/zeroize-audit/<tu_hash>.O1.ll \
|
|
102
|
+
/tmp/zeroize-audit/<tu_hash>.O2.ll
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
Use `<tu_hash>` (a hash of the source path) to avoid collisions when processing multiple TUs.
|
|
106
|
+
`diff_ir.sh` outputs a unified diff to stdout; a non-zero exit code means divergence was detected.
|
|
107
|
+
Clean up `/tmp/zeroize-audit/` on completion or failure.
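
One way to derive `<tu_hash>` is sketched below. The text only says "a hash of the source path"; the choice of SHA-256, the absolute-path normalization, the 16-character truncation, and both helper names are assumptions.

```python
import hashlib
import os


def tu_hash(src_path, length=16):
    """Stable, filesystem-safe identifier derived from the source path."""
    normalized = os.path.abspath(src_path)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:length]


def ir_path(src_path, opt_level, tmp_dir="/tmp/zeroize-audit"):
    """Build the per-TU output path used for emitted IR."""
    return f"{tmp_dir}/{tu_hash(src_path)}.{opt_level}.ll"
```

Hashing the full path rather than the basename keeps two files with the same name in different directories from overwriting each other's IR.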
|
|
108
|
+
|
|
109
|
+
**Interpretation:**
|
|
110
|
+
- Wipe present at O0, absent at O1 → simple dead-store elimination. Flag `OPTIMIZED_AWAY_ZEROIZE`.
|
|
111
|
+
- Wipe present at O1, absent at O2 → aggressive optimization. Flag `OPTIMIZED_AWAY_ZEROIZE`.
|
|
112
|
+
- Include the IR diff as mandatory evidence in the finding.
|
|
113
|
+
|
|
114
|
+
Key IR patterns: `store volatile i8 0` is the primary wipe signal; its absence at O2 when present at O0 is DSE. `@llvm.memset` without the volatile flag is elidable. `alloca` with `@llvm.lifetime.end` and no `store volatile` in the same function indicates stack retention.
|
|
115
|
+
|
|
116
|
+
### Step 8 — Assembly Analysis (produces `STACK_RETENTION`, `REGISTER_SPILL`)
|
|
117
|
+
|
|
118
|
+
Skip if `enable_asm=false`.
|
|
119
|
+
|
|
120
|
+
```bash
|
|
121
|
+
{baseDir}/tools/emit_asm.sh --src <file> \
|
|
122
|
+
--out /tmp/zeroize-audit/<tu_hash>.O2.s --opt O2 -- "${FLAGS[@]}"
|
|
123
|
+
|
|
124
|
+
{baseDir}/tools/analyze_asm.sh \
|
|
125
|
+
--asm /tmp/zeroize-audit/<tu_hash>.O2.s \
|
|
126
|
+
--out /tmp/zeroize-audit/<tu_hash>.asm-analysis.json
|
|
127
|
+
```
|
|
128
|
+
|
|
129
|
+
`analyze_asm.sh` outputs annotated findings to stdout.
|
|
130
|
+
|
|
131
|
+
Check for:
|
|
132
|
+
- **Register spills**: `movq`/`movdqa` of secret values to stack offsets → flag `REGISTER_SPILL`.
|
|
133
|
+
- **Callee-saved registers**: `rbx`, `r12`–`r15` (x86-64) that hold secret values and are pushed to the stack → flag `REGISTER_SPILL`.
|
|
134
|
+
- **Stack retention**: stack frame size and whether secret bytes are cleared before `ret` → flag `STACK_RETENTION`.
|
|
135
|
+
|
|
136
|
+
Include the relevant assembly excerpt as mandatory evidence.
|
|
137
|
+
|
|
138
|
+
### Step 9 — Semantic IR Analysis (produces `LOOP_UNROLLED_INCOMPLETE`)
|
|
139
|
+
|
|
140
|
+
Skip if `enable_semantic_ir=false`.
|
|
141
|
+
|
|
142
|
+
Parse LLVM IR structurally (do not use regex on raw IR text):
|
|
143
|
+
- Build function and basic block representations.
|
|
144
|
+
- Track memory operations in SSA form after the `mem2reg` pass.
|
|
145
|
+
- Detect loop-unrolled zeroization: 4 or more consecutive zero stores.
|
|
146
|
+
- Verify unrolled stores target the correct addresses and cover the full object size.
|
|
147
|
+
- Identify phi nodes and register-promoted variables that may hide secret values.
|
|
148
|
+
|
|
149
|
+
Flag `LOOP_UNROLLED_INCOMPLETE` when unrolling is detected but does not cover the full object.
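
The coverage check can be sketched on already-parsed instructions, in keeping with the rule against regex on raw IR text. The tuple representation and both function names are hypothetical; a real pass would obtain offsets and widths from a structural IR parser.

```python
def covered_bytes(stores):
    """stores: iterable of (offset, width) zero stores. Return covered byte offsets."""
    covered = set()
    for offset, width in stores:
        covered.update(range(offset, offset + width))
    return covered


def check_unrolled_wipe(instructions, object_size, min_run=4):
    """instructions: ordered items for one basic block, each either
    ("zero_store", offset, width) or ("other",). Return a flag name or None."""
    best_run, current = [], []
    for inst in instructions:
        if inst[0] == "zero_store":
            current.append((inst[1], inst[2]))
            continue
        if len(current) > len(best_run):
            best_run = current
        current = []
    if len(current) > len(best_run):
        best_run = current
    if len(best_run) < min_run:
        return None  # fewer than 4 consecutive zero stores: not an unrolled wipe
    missing = set(range(object_size)) - covered_bytes(best_run)
    return "LOOP_UNROLLED_INCOMPLETE" if missing else None
```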
|
|
150
|
+
|
|
151
|
+
### Step 10 — Control-Flow Graph Analysis (produces `MISSING_ON_ERROR_PATH`, `NOT_DOMINATING_EXITS`)
|
|
152
|
+
|
|
153
|
+
Skip if `enable_cfg=false`.
|
|
154
|
+
|
|
155
|
+
Build a CFG from source or LLVM IR:
|
|
156
|
+
- Enumerate all execution paths from function entry to exits.
|
|
157
|
+
- Compute dominator sets for all nodes.
|
|
158
|
+
- Verify that a wipe node dominates all exit nodes. If not, flag `NOT_DOMINATING_EXITS`.
|
|
159
|
+
- Identify error paths (early returns, `goto`, exceptions, `longjmp`) that bypass the wipe. Flag `MISSING_ON_ERROR_PATH` for each such path.
|
|
160
|
+
|
|
161
|
+
This step produces definitive results replacing the heuristic `NOT_ON_ALL_PATHS` finding from Step 5. If both are emitted for the same object, keep only the CFG-backed finding.
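
The dominance check can be illustrated with the classic iterative dominator algorithm on a small CFG. This is a sketch, not the tool's code: the CFG is a hypothetical dict from node name to successor list, and exits are taken to be nodes without successors.

```python
def dominators(cfg, entry):
    """cfg maps node -> list of successors. Return node -> set of dominators."""
    nodes = set(cfg)
    for succs in cfg.values():
        nodes.update(succs)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def undominated_exits(cfg, entry, wipe_node):
    """Return exit nodes that can be reached without passing through wipe_node."""
    dom = dominators(cfg, entry)
    exits = [n for n in dom if not cfg.get(n)]
    return sorted(e for e in exits if wipe_node not in dom[e])
```

A non-empty result corresponds to `NOT_DOMINATING_EXITS`; each returned exit that sits on an error path is a candidate for `MISSING_ON_ERROR_PATH`.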
|
|
162
|
+
|
|
163
|
+
### Step 11 — Runtime Validation Test Generation
|
|
164
|
+
|
|
165
|
+
Skip if `enable_runtime_tests=false`.
|
|
166
|
+
|
|
167
|
+
For each confirmed finding, generate:
|
|
168
|
+
- A C test harness that allocates the sensitive object and verifies all bytes are zero after the expected wipe point.
|
|
169
|
+
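- A MemorySanitizer test (`-fsanitize=memory`) to detect reads of uninitialized memory. MemorySanitizer does not by itself report memory that was initialized but never zeroed, so pair it with the explicit byte check in the harness above.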
|
|
170
|
+
- A Valgrind invocation target for leak and memory error detection.
|
|
171
|
+
- A stack canary test to detect stack retention after function return.
|
|
172
|
+
|
|
173
|
+
Output a `Makefile` in `{baseDir}/generated_tests/` that builds and runs all tests with appropriate sanitizer flags.
|
|
174
|
+
|
|
175
|
+
### Step 12 — PoC Generation (mandatory)
|
|
176
|
+
|
|
177
|
+
Generate proof-of-concept C programs for all findings regardless of confidence. Each PoC exits 0 (exploitable) or 1 (not exploitable):
|
|
178
|
+
|
|
179
|
+
```bash
|
|
180
|
+
python {baseDir}/tools/generate_poc.py \
|
|
181
|
+
--findings <findings_json> \
|
|
182
|
+
--compile-db <compile_db> \
|
|
183
|
+
--out <poc_output_dir> \
|
|
184
|
+
--categories <poc_categories> \
|
|
185
|
+
--config <config> \
|
|
186
|
+
--no-confidence-filter
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
After generation, review PoCs for `// TODO` comments and fill them in using source context. Compilation and validation are handled by the orchestrator in Phase 5 (interactive).
|
|
190
|
+
|
|
191
|
+
Key PoC strategies: `OPTIMIZED_AWAY_ZEROIZE` — compile with and without `-O2`, compare memory dumps; `STACK_RETENTION` — call the target function, read stack memory after return; `MISSING_SOURCE_ZEROIZE` — verify bytes are non-zero at function exit. C/C++ findings support all categories. Rust findings support `MISSING_SOURCE_ZEROIZE`, `SECRET_COPY`, and `PARTIAL_WIPE` via `cargo test`; all other Rust categories are marked `poc_supported: false`.
|
|
@@ -0,0 +1,252 @@
|
|
|
1
|
+
# LLVM IR Analysis for Zeroization Auditing
|
|
2
|
+
|
|
3
|
+
This reference covers multi-level IR analysis for detecting compiler-optimized zeroization (dead-store elimination of wipes) and interpreting results. Read this during Step 7 (IR comparison) and Step 9 (semantic IR analysis) in `task.md`. For flag extraction and pipeline setup, refer to the compile-commands reference (loaded separately from SKILL.md).
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Optimization Level Semantics
|
|
8
|
+
|
|
9
|
+
| Level | What changes | Relevance to zeroization |
|
|
10
|
+
|---|---|---|
|
|
11
|
+
| **O0** | No optimization. All stores kept. | Baseline — wipe always present if written in source |
|
|
12
|
+
| **O1** | Basic optimizations. Simple dead-store elimination begins. | Diagnostic level: if wipe vanishes here, it's simple DSE. Fix is straightforward. |
|
|
13
|
+
| **O2** | Full DSE, inlining, SROA, alias analysis. | Most production builds. Most non-volatile wipes removed here. |
|
|
14
|
+
| **O3** | Aggressive vectorization, loop transforms, more inlining. | Rarely removes more wipes than O2, but can for loop-based wipes. |
|
|
15
|
+
| **Os/Oz** | Size-optimized. May collapse wipe loops into `memset`. | Verify wipe survives after size optimization; collapsed `memset` may become DSE-vulnerable. |
|
|
16
|
+
|
|
17
|
+
**Always include O0 as the unoptimized baseline**, regardless of the `opt_levels` input. O1 is the diagnostic level — if the wipe disappears there, the cause is simple DSE and the fix is straightforward. If the wipe only disappears at O2 or O3, proceed to the multi-level root cause analysis below.
|
|
18
|
+
|
|
19
|
+
---
|
|
20
|
+
|
|
21
|
+
## Emitting IR at Multiple Levels
|
|
22
|
+
|
|
23
|
+
Extract flags once, then emit IR for each level in `opt_levels`. Use `<tu_hash>` (a hash of the source path) to avoid collisions during parallel TU processing. Always clean up temp files on completion or failure.
|
|
24
|
+
|
|
25
|
+
```bash
|
|
26
|
+
mkdir -p /tmp/zeroize-audit/
|
|
27
|
+
|
|
28
|
+
FLAGS=()
|
|
29
|
+
while IFS= read -r flag; do FLAGS+=("$flag"); done < <(
|
|
30
|
+
python {baseDir}/tools/extract_compile_flags.py \
|
|
31
|
+
--compile-db build/compile_commands.json \
|
|
32
|
+
--src src/crypto.c --format lines)
|
|
33
|
+
|
|
34
|
+
# Emit IR for each level in opt_levels (O0 always included as baseline)
|
|
35
|
+
for OPT in O0 O1 O2; do
|
|
36
|
+
{baseDir}/tools/emit_ir.sh \
|
|
37
|
+
--src src/crypto.c \
|
|
38
|
+
--out /tmp/zeroize-audit/<tu_hash>.${OPT}.ll \
|
|
39
|
+
--opt ${OPT} -- "${FLAGS[@]}"
|
|
40
|
+
done
|
|
41
|
+
|
|
42
|
+
# Diff all levels — prints pairwise diffs and a WIPE PATTERN SUMMARY
|
|
43
|
+
{baseDir}/tools/diff_ir.sh \
|
|
44
|
+
/tmp/zeroize-audit/<tu_hash>.O0.ll \
|
|
45
|
+
/tmp/zeroize-audit/<tu_hash>.O1.ll \
|
|
46
|
+
/tmp/zeroize-audit/<tu_hash>.O2.ll
|
|
47
|
+
|
|
48
|
+
# Cleanup
|
|
49
|
+
rm -f /tmp/zeroize-audit/<tu_hash>.*.ll
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
For Rust TUs, `emit_ir.sh` does not apply. Use `cargo rustc -- --emit=llvm-ir -C opt-level=N` instead and pass the resulting `.ll` files directly to `diff_ir.sh`. Use `bear -- cargo build` to generate `compile_commands.json` for Rust projects.
|
|
53
|
+
|
|
54
|
+
---
|
|
55
|
+
|
|
56
|
+
## LLVM IR Zeroization Patterns
|
|
57
|
+
|
|
58
|
+
### DSE-safe patterns (survive optimization)
|
|
59
|
+
|
|
60
|
+
These indicate a secure wipe the compiler cannot remove.
|
|
61
|
+
|
|
62
|
+
**Volatile memset intrinsic** — the `i1 true` (volatile) flag prevents DSE:
|
|
63
|
+
```llvm
|
|
64
|
+
call void @llvm.memset.p0i8.i64(i8* %ptr, i8 0, i64 32, i1 true)
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
**Volatile zero stores** — volatile side effects must be preserved:
|
|
68
|
+
```llvm
|
|
69
|
+
store volatile i8 0, i8* %ptr, align 1
|
|
70
|
+
store volatile i64 0, i64* %ptr, align 8
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
**Opaque wipe function calls** — DSE cannot remove calls to external functions with unknown side effects:
|
|
74
|
+
```llvm
|
|
75
|
+
call void @explicit_bzero(i8* %key, i64 32)
|
|
76
|
+
call void @sodium_memzero(i8* %key, i64 32)
|
|
77
|
+
call void @OPENSSL_cleanse(i8* %key, i64 32)
|
|
78
|
+
call void @SecureZeroMemory(i8* %key, i64 32)
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
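**`memset_s`** — specified in C11 Annex K (an optional part of the standard) so that the call cannot be optimized away; check that the target libc actually provides it: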
|
|
82
|
+
```llvm
|
|
83
|
+
call i32 @memset_s(i8* %key, i64 32, i32 0, i64 32)
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
**Rust `zeroize` crate** — emits volatile stores via the `Zeroize` trait; look for:
|
|
87
|
+
```llvm
|
|
88
|
+
store volatile i8 0, i8* %ptr, align 1 ; repeated per byte, or as unrolled loop
|
|
89
|
+
```
|
|
90
|
+
|
|
91
|
+
---
|
|
92
|
+
|
|
93
|
+
### DSE-vulnerable patterns (may be removed at O1 or O2)
|
|
94
|
+
|
|
95
|
+
**Non-volatile memset intrinsic** — `i1 false` is the most common `OPTIMIZED_AWAY_ZEROIZE` pattern:
|
|
96
|
+
```llvm
|
|
97
|
+
call void @llvm.memset.p0i8.i64(i8* %ptr, i8 0, i64 32, i1 false)
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
**Non-volatile zero stores** — any non-volatile store to a dead location is DSE-eligible:
|
|
101
|
+
```llvm
|
|
102
|
+
store i8 0, i8* %ptr, align 1
|
|
103
|
+
store i64 0, i64* %ptr, align 8
|
|
104
|
+
store i32 0, i32* %ptr, align 4
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
**Standard `memset` inlined to non-volatile intrinsic** — `memset(key, 0, 32)` in source is lowered by Clang to `@llvm.memset ... i1 false`. The source used `memset` but the IR form is DSE-vulnerable. This is the most frequent source of confusion.
|
|
108
|
+
|
|
109
|
+
---
|
|
110
|
+
|
|
111
|
+
## Reading an IR Diff: Concrete Before/After Example
|
|
112
|
+
|
|
113
|
+
**Source (C):**
|
|
114
|
+
```c
|
|
115
|
+
void handle_request(uint8_t session_key[32]) {
|
|
116
|
+
// ... use session_key ...
|
|
117
|
+
memset(session_key, 0, 32); // intended cleanup
|
|
118
|
+
}
|
|
119
|
+
```
|
|
120
|
+
|
|
121
|
+
**O0 IR — wipe present:**
|
|
122
|
+
```llvm
|
|
123
|
+
define void @handle_request(i8* %session_key) {
|
|
124
|
+
entry:
|
|
125
|
+
; ... computation uses session_key ...
|
|
126
|
+
call void @llvm.memset.p0i8.i64(i8* %session_key, i8 0, i64 32, i1 false)
|
|
127
|
+
ret void
|
|
128
|
+
}
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
**O2 IR — wipe removed by DSE:**
|
|
132
|
+
```llvm
|
|
133
|
+
define void @handle_request(i8* %session_key) {
|
|
134
|
+
entry:
|
|
135
|
+
; ... computation ...
|
|
136
|
+
; llvm.memset REMOVED — no read from session_key after the store;
|
|
137
|
+
; optimizer treats it as a dead store and eliminates it.
|
|
138
|
+
ret void
|
|
139
|
+
}
|
|
140
|
+
```
|
|
141
|
+
|
|
142
|
+
**`diff_ir.sh` output:**
|
|
143
|
+
```
|
|
144
|
+
=== DIFF: O0.ll vs O2.ll ===
|
|
145
|
+
- call void @llvm.memset.p0i8.i64(i8* %session_key, i8 0, i64 32, i1 false)
|
|
146
|
+
|
|
147
|
+
=== WIPE PATTERN SUMMARY ===
|
|
148
|
+
O0.ll: WIPE PRESENT
|
|
149
|
+
O1.ll: WIPE PRESENT
|
|
150
|
+
O2.ll: WIPE ABSENT <-- first disappearance
|
|
151
|
+
```
|
|
152
|
+
|
|
153
|
+
Lines starting with `-` are present in the lower-opt file but absent in the higher-opt file. A `-` line containing any of the following tokens is direct evidence of `OPTIMIZED_AWAY_ZEROIZE`:
|
|
154
|
+
|
|
155
|
+
`llvm.memset`, `store i8 0`, `store i64 0`, `store i32 0`, `@explicit_bzero`, `@sodium_memzero`, `@OPENSSL_cleanse`, `@SecureZeroMemory`
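
A minimal sketch of that check follows. It scans diff text for removed lines containing one of the tokens listed above; the function name is hypothetical, and the input is assumed to be the text that `diff_ir.sh` prints.

```python
WIPE_TOKENS = ("llvm.memset", "store i8 0", "store i64 0", "store i32 0",
               "@explicit_bzero", "@sodium_memzero", "@OPENSSL_cleanse",
               "@SecureZeroMemory")


def removed_wipe_lines(diff_text):
    """Return removed ('-') diff lines that contain a wipe token."""
    hits = []
    for line in diff_text.splitlines():
        # Skip context lines, added lines, and unified-diff file headers.
        if not line.startswith("-") or line.startswith("---"):
            continue
        if any(token in line for token in WIPE_TOKENS):
            hits.append(line[1:].strip())
    return hits
```

An empty result means the diff shows no removed wipe. Any hit should be copied into the finding as evidence together with the surrounding diff context.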
|
|
156
|
+
|
|
157
|
+
---
|
|
158
|
+
|
|
159
|
+
## Multi-Level Root Cause Analysis
|
|
160
|
+
|
|
161
|
+
The level at which the wipe first disappears narrows the root cause and determines the appropriate fix:
|
|
162
|
+
|
|
163
|
+
```
|
|
164
|
+
O0 → WIPE PRESENT (baseline — wipe was written in source)
|
|
165
|
+
O1 → WIPE ABSENT → Simple dead-store elimination (basic DSE pass)
|
|
166
|
+
Fix: replace memset with explicit_bzero or volatile wipe loop
|
|
167
|
+
O2 → WIPE ABSENT → One or more of:
|
|
168
|
+
(first disappearance) • DSE + inlining: wipe is in a helper inlined into caller,
|
|
169
|
+
becomes dead store in caller's context
|
|
170
|
+
• SROA: struct/array promoted to scalars; individual
|
|
171
|
+
zero stores become DSE-eligible
|
|
172
|
+
• Alias analysis: proves no live uses after the wipe
|
|
173
|
+
Fix: use explicit_bzero; ensure wipe is not inside
|
|
174
|
+
an inlined callee (see Inlining section below)
|
|
175
|
+
O3 → WIPE ABSENT → Aggressive loop transforms or vectorization eliminated
|
|
176
|
+
(only here) a loop-based wipe
|
|
177
|
+
Fix: replace wipe loop with explicit_bzero or volatile loop
|
|
178
|
+
```
|
|
179
|
+
|
|
180
|
+
If the wipe disappears at O1, replacing the call with `explicit_bzero` or writing the zeros through a `volatile`-qualified pointer is sufficient. If it only disappears at O2 due to inlining, also ensure the wipe is not inside a callee that gets inlined at the call site.
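
The decision tree above can be sketched as a lookup over per-level results. The function name and the input shape (a dict from level name to a boolean) are assumptions; the cause strings paraphrase the tree.

```python
LEVEL_ORDER = ["O0", "O1", "O2", "O3"]

LIKELY_CAUSE = {
    "O1": "simple dead-store elimination",
    "O2": "DSE after inlining, SROA, or alias analysis",
    "O3": "aggressive loop transforms or vectorization",
}


def first_disappearance(wipe_present):
    """wipe_present maps level name -> bool. Return (level, cause) or None."""
    if not wipe_present.get("O0", False):
        return None  # no baseline wipe, so this is not an optimized-away case
    for level in LEVEL_ORDER[1:]:
        if level in wipe_present and not wipe_present[level]:
            return level, LIKELY_CAUSE[level]
    return None  # wipe survives at every analyzed level
```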
|
|
181
|
+
|
|
182
|
+
---
|
|
183
|
+
|
|
184
|
+
## Advanced IR Analysis Scenarios
|
|
185
|
+
|
|
186
|
+
### Inlining and cross-function DSE
|
|
187
|
+
|
|
188
|
+
When a cleanup wrapper (e.g., `zeroize_key()`) is inlined into a caller, the wipe may become a dead store in the caller's context even if it survives in the callee's IR. Always emit IR for the **calling** TU — this is where inlining occurs:
|
|
189
|
+
|
|
190
|
+
```bash
|
|
191
|
+
# zeroize_key() defined in utils.c, called from crypto.c
|
|
192
|
+
# Emit IR for the caller — inlining happens here:
|
|
193
|
+
FLAGS=()
|
|
194
|
+
while IFS= read -r flag; do FLAGS+=("$flag"); done < <(
|
|
195
|
+
python {baseDir}/tools/extract_compile_flags.py \
|
|
196
|
+
--compile-db build/compile_commands.json --src src/crypto.c --format lines)
|
|
197
|
+
|
|
198
|
+
{baseDir}/tools/emit_ir.sh \
|
|
199
|
+
--src src/crypto.c \
|
|
200
|
+
--out /tmp/zeroize-audit/<tu_hash>.O2.ll --opt O2 -- "${FLAGS[@]}"
|
|
201
|
+
```
|
|
202
|
+
|
|
203
|
+
If the wipe is present in `utils.c` IR but absent in `crypto.c` IR at O2, the cause is cross-function DSE after inlining. Mark the `OPTIMIZED_AWAY_ZEROIZE` finding on the call site in `crypto.c`, not on `utils.c`.
|
|
204
|
+
|
|
205
|
+
### SROA (Scalar Replacement of Aggregates)
|
|
206
|
+
|
|
207
|
+
At O1+, SROA promotes small structs and arrays to individual scalar SSA values (registers). A `memset` of a struct may become a series of individual `store i32 0` / `store i8 0` instructions per field — each then eligible for DSE independently. In the diff, look for:
|
|
208
|
+
- O0: single `llvm.memset` covering the struct
|
|
209
|
+
- O1/O2: the `memset` is replaced by per-field zero stores, then those stores are removed
|
|
210
|
+
|
|
211
|
+
This means the wipe may partially survive SROA (some fields zeroed, others eliminated). Check that **all** fields of a sensitive struct are covered, not just the first.
|
|
212
|
+
|
|
213
|
+
### Loop unrolling of wipe loops
|
|
214
|
+
|
|
215
|
+
A manual wipe loop:
|
|
216
|
+
```c
|
|
217
|
+
for (int i = 0; i < 32; i++) key[i] = 0;
|
|
218
|
+
```
|
|
219
|
+
may be unrolled at O2 into 32 consecutive `store i8 0` instructions. If unrolling is incomplete (e.g., only 16 of 32 iterations unrolled and the remainder is a DSE-eligible tail), flag `LOOP_UNROLLED_INCOMPLETE`. Use `{baseDir}/tools/analyze_ir_semantic.py` for automated detection — do not use regex on raw IR text. The semantic tool builds a proper basic block representation and counts consecutive zero stores with address verification.
|
|
220
|
+
|
|
221
|
+
### Phi nodes and register-promoted secrets
|
|
222
|
+
|
|
223
|
+
After `mem2reg`, secret values that were stack-allocated may be promoted to SSA values tracked through phi nodes. A wipe of the original stack slot may not reach all SSA uses. Look for:
|
|
224
|
+
```llvm
|
|
225
|
+
%key.0 = phi i64 [ %loaded_key, %entry ], [ 0, %cleanup ]
|
|
226
|
+
```
|
|
227
|
+
If `%key.0` is used after the phi but the `0` arm is only reached on one path, the secret may persist in the non-zero arm. Flag as `NOT_DOMINATING_EXITS` if CFG analysis confirms it.
|
|
228
|
+
|
|
229
|
+
---
|
|
230
|
+
|
|
231
|
+
## Populating `compiler_evidence` in the Report
|
|
232
|
+
|
|
233
|
+
For each `OPTIMIZED_AWAY_ZEROIZE` finding, populate the output schema fields as follows. `OPTIMIZED_AWAY_ZEROIZE` is **never valid without IR diff evidence** — do not emit this finding from source-level analysis alone.
|
|
234
|
+
|
|
235
|
+
```json
|
|
236
|
+
{
|
|
237
|
+
"category": "OPTIMIZED_AWAY_ZEROIZE",
|
|
238
|
+
"compiler_evidence": {
|
|
239
|
+
"opt_levels": ["O0", "O1", "O2"],
|
|
240
|
+
"o0": "call void @llvm.memset.p0i8.i64(i8* %session_key, i8 0, i64 32, i1 false) present at line 88.",
|
|
241
|
+
"o1": "WIPE PRESENT at O1.",
|
|
242
|
+
"o2": "llvm.memset call absent at O2 — dead store eliminated after SROA promotes session_key to registers.",
|
|
243
|
+
"diff_summary": "Wipe first disappears at O2. Non-volatile memset(session_key, 0, 32) eliminated by DSE after SROA. Fix: replace memset with explicit_bzero."
|
|
244
|
+
}
|
|
245
|
+
}
|
|
246
|
+
```
|
|
247
|
+
|
|
248
|
+
Field usage notes:
|
|
249
|
+
- `opt_levels`: list every level that was emitted, not just the levels where the wipe changed.
|
|
250
|
+
- `o0` and `o2` (plus `o1` and `o3` if analyzed): state explicitly whether the wipe is PRESENT or ABSENT at each level, with a short IR excerpt if present.
|
|
251
|
+
- If the wipe only disappears at O3 but is present at O2: set `o2` to `"WIPE PRESENT at O2"` and document the O3 removal in `diff_summary`.
|
|
252
|
+
- `diff_summary`: always identify the first disappearance level and the most likely optimization pass responsible (DSE, inlining, SROA, alias analysis, loop transform).
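
The field notes can be turned into a small builder. This is an illustrative sketch only: the function name and its arguments are hypothetical, and the exact wording of each field value is an assumption modeled on the JSON example above.

```python
def build_compiler_evidence(levels, excerpts, likely_pass, fix):
    """levels: ordered list such as ["O0", "O1", "O2"].
    excerpts: level -> IR excerpt string, or None where the wipe is absent."""
    evidence = {"opt_levels": list(levels)}  # every emitted level is listed
    first_absent = None
    for level in levels:
        excerpt = excerpts.get(level)
        if excerpt is None:
            evidence[level.lower()] = f"WIPE ABSENT at {level}."
            if first_absent is None:
                first_absent = level
        else:
            evidence[level.lower()] = f"WIPE PRESENT at {level}: {excerpt}"
    if first_absent is None:
        # The finding is never valid without a level where the wipe is gone.
        raise ValueError("OPTIMIZED_AWAY_ZEROIZE requires IR diff evidence")
    evidence["diff_summary"] = (
        f"Wipe first disappears at {first_absent}. "
        f"Likely pass: {likely_pass}. Fix: {fix}."
    )
    return {"category": "OPTIMIZED_AWAY_ZEROIZE", "compiler_evidence": evidence}
```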
|