aegis-audit 2.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +131 -0
- package/benchmark/README.md +94 -0
- package/benchmark/fixtures/access_control_unprotected.sol +12 -0
- package/benchmark/fixtures/arithmetic_overflow.sol +14 -0
- package/benchmark/fixtures/bad_randomness.sol +12 -0
- package/benchmark/fixtures/clean_safe_vault.sol +24 -0
- package/benchmark/fixtures/reentrancy_simple_dao.sol +16 -0
- package/benchmark/fixtures/time_manipulation.sol +11 -0
- package/benchmark/fixtures/unchecked_call.sol +11 -0
- package/benchmark/mapping.js +45 -0
- package/benchmark/runner.js +164 -0
- package/index.js +41 -0
- package/package.json +38 -0
- package/src/analyzers/claude.js +62 -0
- package/src/analyzers/detectors.js +311 -0
- package/src/analyzers/risk.js +128 -0
- package/src/commands/audit.js +115 -0
- package/src/commands/benchmark.js +123 -0
- package/src/commands/config.js +27 -0
- package/src/knowledge/frameworks.js +97 -0
- package/src/output/formats.js +125 -0
- package/src/ui/banner.js +77 -0
- package/src/ui/report.js +178 -0
- package/src/utils/fetcher.js +115 -0
- package/src/utils/secure-config.js +110 -0
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 rsh1k

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md
ADDED
@@ -0,0 +1,131 @@
<div align="center">

# 🛡️ Aegis

### AI-Powered Smart Contract Security Auditor

*Detect vulnerabilities before attackers do — mapped to OWASP, MITRE ATT&CK, CWE & NIST.*

[License](LICENSE)
[Node.js](https://nodejs.org)
[OWASP SC Top 10](https://owasp.org/www-project-smart-contract-top-10/)
[Benchmark](benchmark/README.md)
[Contributing](CONTRIBUTING.md)

</div>

---

Aegis is a command-line security auditor for Solidity smart contracts. It combines a deterministic static-analysis engine with an AI semantic layer, maps every finding to industry frameworks, and produces enterprise-grade compliance artifacts (SARIF, SBOM, signed audit logs). Its detection accuracy is measured against an academic benchmark — not asserted.

```bash
npm install -g aegis-audit
aegis audit ./contracts/MyToken.sol
```

## Why Aegis

Most scanners hand you a list of bugs. Aegis is built for teams that ship to production:

- **Full OWASP SC Top 10 (2026) coverage** — including the categories that actually cause losses. Access control alone was $953M of $1.42B in 2024 losses, far ahead of reentrancy.
- **Framework traceability** — every finding carries its `SC0X:2026`, `CWE-XXX`, and MITRE `TXXXX` identifiers for audit and compliance reporting.
- **Red-team attack-path synthesis** — chains individual findings into the multi-step exploits an APT would actually run (flash-loan price manipulation, proxy takeover, recursive drain).
- **Offline mode** — `--offline` runs all static detectors without your source code ever leaving the machine. Built for proprietary and regulated codebases.
- **CI/CD native** — SARIF 2.1.0 output, configurable fail thresholds, proper exit codes.
- **NIST SSDF (SP 800-218) outputs** — CycloneDX SBOM generation, encrypted key storage, and a tamper-evident hash-chained audit log.
- **Measured, not claimed** — ships with a benchmark harness scored against the SmartBugs Curated dataset.

## Install

```bash
npm install -g aegis-audit
aegis config   # set encrypted API key (or use --offline)
```

Create an Anthropic API key at [console.anthropic.com](https://console.anthropic.com). Enterprises should instead inject `ANTHROPIC_API_KEY` from a secrets manager, or run with `--offline`.

## Usage

```bash
# Audit a local file, a folder, or a verified on-chain address
aegis audit ./contracts/MyToken.sol
aegis audit ./contracts/
aegis audit 0x1f9840a85d5aF5bf1D1762F925BDADdC4201F984 --network ethereum

# Enterprise / regulated: never transmit source code
aegis audit ./contracts/ --offline

# Generate compliance + CI artifacts
aegis audit ./contracts/ --sarif results.sarif --sbom sbom.json --output report.md

# CI gate (non-zero exit on high+ findings)
aegis audit ./contracts/ --ci --fail-on high

# Measure detector accuracy against labeled datasets
aegis benchmark
aegis benchmark --fetch-smartbugs
```
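
The `--ci --fail-on <level>` gate comes down to a severity comparison. The sketch below only illustrates that idea; it is not taken from `src/commands/audit.js`, which this diff does not show. It assumes each finding carries a `severity` string ordered `low < medium < high < critical`, and the helper name `ciExitCode` is hypothetical.

```javascript
// Hypothetical CI severity gate (illustration only, not Aegis's audit.js).
// Assumption: each finding is an object like { severity: 'high' }.
const SEVERITY_RANK = { low: 0, medium: 1, high: 2, critical: 3 };

// Returns 1 if any finding is at or above the --fail-on threshold, else 0.
function ciExitCode(findings, failOn = 'high') {
  const threshold = SEVERITY_RANK[failOn];
  if (threshold === undefined) {
    throw new Error(`unknown --fail-on level: ${failOn}`);
  }
  // Unknown severities rank below everything, so they never trip the gate.
  const breached = findings.some(
    (f) => (SEVERITY_RANK[f.severity] ?? -1) >= threshold
  );
  return breached ? 1 : 0;
}

const findings = [{ severity: 'medium' }, { severity: 'high' }];
console.log(ciExitCode(findings, 'high'));     // 1
console.log(ciExitCode(findings, 'critical')); // 0
```

A CLI would typically assign the result to `process.exitCode` rather than calling `process.exit()`, so pending asynchronous output can flush before the process ends.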

## What it detects — OWASP Smart Contract Top 10 (2026)

| ID | Category | Detectors |
|----|----------|-----------|
| SC01 | Access Control | Missing modifiers, `tx.origin` auth, unprotected `selfdestruct` |
| SC02 | Business Logic | Precision loss, unbounded loops (DoS), weak randomness, timestamp logic |
| SC03 | Price Oracle Manipulation | Spot-price-as-oracle detection |
| SC04 | Flash Loan | Callback-invariant flags |
| SC05 | Input Validation | Missing zero-address checks |
| SC06 | Unchecked External Calls | Unchecked `.call`, unsafe ERC20 transfer |
| SC07 / SC09 | Arithmetic / Overflow | Pre-0.8 Solidity, `unchecked` blocks |
| SC08 | Reentrancy | External-call-before-state, missing guards |
| SC10 | Proxy & Upgradeability | Unprotected initializer, `delegatecall` |

Plus a **Claude AI semantic layer** for business-logic and economic attacks that pattern matching misses.
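
Many of the detectors in the table are pattern checks over source text. As a rough illustration, here is a hypothetical `tx.origin`-authorization check (SC01). It is not the implementation in `src/analyzers/detectors.js`; the only detail borrowed from this package is that a finding carries an `owasp` id, which `benchmark/runner.js` reads via `f.owasp`. The `line` and `title` fields are assumptions. Matching one line at a time keeps each regex run short, which is one way to respect the regex denial-of-service bound mentioned under the threat model.

```javascript
// Hypothetical line-based detector for tx.origin authorization (SC01).
// Assumed finding shape: { owasp, line, title }.
// Flags tx.origin appearing after "require(" or "if (" and before the first ")" on that line.
const TX_ORIGIN_AUTH = /\b(require|if)\s*\([^)]*\btx\.origin\b/;

function detectTxOriginAuth(source) {
  const findings = [];
  source.split('\n').forEach((text, i) => {
    // Skip whole-line comments so commented-out code is not reported.
    if (text.trim().startsWith('//')) return;
    if (TX_ORIGIN_AUTH.test(text)) {
      findings.push({ owasp: 'SC01:2026', line: i + 1, title: 'tx.origin used for authorization' });
    }
  });
  return findings;
}

const sample = [
  'contract Wallet {',
  '    address owner;',
  '    function pay(address to) public {',
  '        require(tx.origin == owner);',
  '        // if (tx.origin == owner) was the old check',
  '    }',
  '}',
].join('\n');

console.log(detectTxOriginAuth(sample).map((f) => f.line)); // [ 4 ]
```

A check this shallow also fires on `require(tx.origin == msg.sender)`, an EOA-only guard rather than an authorization bug, which is the kind of false positive the benchmark's precision figures capture.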

## Accuracy

Aegis ships with a benchmark harness so detection is evidence-based. On the academic [SmartBugs Curated](https://github.com/smartbugs/smartbugs-curated) dataset (143 labeled contracts), the deterministic static layer alone scores:

| Metric | Value |
|--------|-------|
| Overall recall | 62.6% |
| Reentrancy precision | 94.1% |
| Unchecked-call recall | 76.9% |
| Full-dataset scan time | < 1 second |

The AI layer adds semantic recall on top of this floor. Full per-category numbers and methodology are in [`benchmark/README.md`](benchmark/README.md). For comparison, the ICSE 2020 study found individual mature tools each detect only a fraction of the dataset — which is why running multiple tools, plus a human audit, is the recommended practice.

## CI example (GitHub Actions)

```yaml
- name: Aegis audit
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  run: |
    npm install -g aegis-audit
    aegis audit ./contracts/ --ci --fail-on high --sarif results.sarif
- uses: github/codeql-action/upload-sarif@v3
  if: always()  # upload findings even when the --fail-on gate fails the audit step
  with:
    sarif_file: results.sarif
```

## Security & threat model

This tool is itself part of your software supply chain. State-sponsored groups (e.g. Lazarus Group, MITRE G0032, and the BlueNoroff/APT38 cluster, MITRE G0082) actively target Web3 developer toolchains via malicious packages (T1195). Accordingly:

- API keys are encrypted at rest (AES-256-GCM); enterprises should prefer a secrets manager.
- `--offline` guarantees no source transmission.
- The audit log is append-only and hash-chained — any edit to history is detectable via `aegis config`.
- All detector patterns are bounded against regex denial-of-service.

## Disclaimer

Aegis is an AI-assisted automated scanner. It is **not** a substitute for a professional manual audit, formal verification, or economic review. Automated analysis produces both false positives and false negatives. For high-value or production deployments, commission an independent human audit and, where applicable, formal verification of critical invariants.

## Contributing

Contributions are welcome — see [CONTRIBUTING.md](CONTRIBUTING.md). The most valuable contributions right now are new detectors (front-running/MEV, improved DoS) with corresponding benchmark fixtures.

## License

[MIT](LICENSE) © 2026 rsh1k
package/benchmark/README.md
ADDED
@@ -0,0 +1,94 @@
# Aegis Benchmark

Measures the accuracy of Aegis's **static detector layer** against labeled
vulnerable contracts, so claims about detection are evidence-based, not asserted.

## Run it

```bash
# Quick run against the 7 built-in fixtures (6 labeled, 1 clean; offline, no deps)
aegis benchmark

# Full academic benchmark: clone & score SmartBugs Curated (143 contracts)
aegis benchmark --fetch-smartbugs

# Your own labeled dataset
aegis benchmark --dataset ./my-labeled-contracts/

# Machine-readable output for CI dashboards
aegis benchmark --fetch-smartbugs --output benchmark.json
```

## Dataset

The full benchmark uses [SmartBugs Curated](https://github.com/smartbugs/smartbugs-curated),
a widely used academic benchmark of 143 contracts with 208 tagged vulnerabilities
across 10 DASP categories. It was used to compare 9 analysis tools in the ICSE 2020
study (Durieux et al.).

Ground-truth labels are read from the dataset's own annotations:
`// <yes> <report> CATEGORY` markers, falling back to the category folder name.
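
The label-reading rule fits in a few lines. This standalone sketch reuses the marker regex from `benchmark/runner.js`, but `TOKEN_TO_DASP` here is a trimmed, illustrative subset of the runner's token table, and the folder fallback checks that subset rather than `DASP_TO_OWASP`. The `groundTruth` name is hypothetical.

```javascript
// Standalone sketch of ground-truth label extraction.
// MARKER is the same regex runner.js uses; TOKEN_TO_DASP is a trimmed subset.
const TOKEN_TO_DASP = {
  REENTRANCY: 'reentrancy',
  ACCESS_CONTROL: 'access_control',
  UNCHECKED_LL_CALLS: 'unchecked_low_level_calls',
};
const MARKER = /\/\/\s*<yes>\s*<report>\s*([A-Z_]+)/g;

function groundTruth(source, parentFolder) {
  const labels = new Set();
  for (const m of source.matchAll(MARKER)) {
    const dasp = TOKEN_TO_DASP[m[1]];
    if (dasp) labels.add(dasp);
  }
  // Folder fallback (dataset/<category>/file.sol), used only when no marker matched.
  if (labels.size === 0 && Object.values(TOKEN_TO_DASP).includes(parentFolder)) {
    labels.add(parentFolder);
  }
  return [...labels];
}

const annotated = [
  '// <yes> <report> REENTRANCY',
  'bool res = msg.sender.call.value(amount)();',
].join('\n');

console.log(groundTruth(annotated, 'fixtures'));             // [ 'reentrancy' ]
console.log(groundTruth('contract C {}', 'access_control')); // [ 'access_control' ]
console.log(groundTruth('contract C {}', 'fixtures'));       // []
```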

## How scoring works

- **Positive** for category C = contract carries a `<yes>` marker for C, or, if it
  has no markers at all, sits in a folder named after C.
- **Detection** of C = Aegis emits any OWASP SC 2026 id mapped from C (see `mapping.js`).
- Scoring is **contract-level per category**, matching how SmartBugs tool
  comparisons report.
- We compute, per category and overall: precision, recall, F1, and
  false-negative rate.
- A separate **clean-contract false-positive rate** is measured against
  contracts with no in-scope label: a clean contract counts as a false positive
  if any in-scope category is detected on it.
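
The counting behind these rules is small enough to sketch directly. This standalone version follows the same per-category tally as `runBenchmark` in `benchmark/runner.js`, but takes precomputed `truth` and `detected` arrays per contract instead of running the detectors; the `score` name, input shape and sample contracts are illustrative.

```javascript
// Contract-level, per-category scoring (standalone sketch of the benchmark rules).
// Each contract lists its ground-truth categories and the categories detected.
function score(contracts, categories) {
  const stats = {};
  for (const c of categories) stats[c] = { tp: 0, fp: 0, fn: 0, support: 0 };
  let clean = 0;
  let cleanFlagged = 0;

  for (const { truth, detected } of contracts) {
    if (truth.length === 0) {
      // Clean contract: any in-scope detection is a false alarm.
      clean++;
      if (detected.length > 0) cleanFlagged++;
    }
    for (const c of categories) {
      const isTrue = truth.includes(c);
      const isDet = detected.includes(c);
      if (isTrue) stats[c].support++;
      if (isTrue && isDet) stats[c].tp++;
      else if (isTrue) stats[c].fn++;
      else if (isDet) stats[c].fp++;
    }
  }

  const metrics = {};
  for (const c of categories) {
    const { tp, fp, support } = stats[c];
    const precision = tp + fp ? tp / (tp + fp) : null;
    const recall = support ? tp / support : null;
    const f1 = precision && recall ? (2 * precision * recall) / (precision + recall) : null;
    metrics[c] = { ...stats[c], precision, recall, f1 };
  }
  return { metrics, cleanFalsePositiveRate: clean ? cleanFlagged / clean : null };
}

const result = score(
  [
    { truth: ['reentrancy'], detected: ['reentrancy'] },
    { truth: ['reentrancy'], detected: [] },
    { truth: ['arithmetic'], detected: ['arithmetic', 'reentrancy'] },
    { truth: [], detected: ['arithmetic'] },
  ],
  ['reentrancy', 'arithmetic']
);
console.log(result.metrics.reentrancy.recall);    // 0.5
console.log(result.metrics.reentrancy.precision); // 0.5
console.log(result.cleanFalsePositiveRate);       // 1
```

As in the runner, F1 comes out as `null` rather than 0 for a category with no true positives, and a detection on a clean contract counts toward both that category's `fp` and the clean-contract rate.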

## Honest scope limits

- **Static layer only.** The Claude AI semantic layer is non-deterministic and is
  not scored here. Because AI findings are added on top of the static ones,
  production recall is at least as high as these numbers, but the static floor is
  what you can rely on deterministically.
- **Out of scope:** `short_addresses` (an ABI/calldata-level issue not visible in
  source) and `other` (unspecified) are excluded from recall so the tool isn't
  credited or penalized for classes it doesn't claim to cover.
- **Precision on tiny fixture sets is pessimistic** because vulnerable fixtures
  often contain multiple real issues but are labeled for only one category; the
  extra true findings count against precision. The same effect can also depress
  precision on the full dataset, so treat measured precision as a conservative estimate.

## What good numbers look like

A non-zero false-negative rate is expected and is the entire reason this tool
must not be the only gate before deploying high-value contracts. The benchmark
exists to quantify that gap, not to hide it. Track recall over time as detectors
improve; treat any regression as a release blocker.
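
One way to enforce "treat any regression as a release blocker" is to compare per-category recall between a stored baseline report and a fresh one. The sketch assumes both reports have the shape `runBenchmark` returns in `benchmark/runner.js` (`categories[c].recall`, `overall.microRecall`) and that the JSON written by `--output` mirrors it, which this diff does not show. `findRecallRegressions`, the `tolerance` parameter and the `current` values are hypothetical; the baseline values are taken from the Baseline results table.

```javascript
// Hypothetical recall-regression check between two benchmark reports.
// Assumed shape: { categories: { [c]: { recall } }, overall: { microRecall } }.
function findRecallRegressions(baseline, current, tolerance = 0) {
  const regressions = [];
  for (const [cat, base] of Object.entries(baseline.categories)) {
    const now = current.categories[cat];
    // Skip categories with no recall value (null) or missing from the new report.
    if (base.recall === null || !now || now.recall === null) continue;
    if (now.recall < base.recall - tolerance) {
      regressions.push({ category: cat, before: base.recall, after: now.recall });
    }
  }
  const b = baseline.overall.microRecall;
  const a = current.overall.microRecall;
  if (b !== null && a !== null && a < b - tolerance) {
    regressions.push({ category: 'overall', before: b, after: a });
  }
  return regressions;
}

const baseline = {
  categories: { reentrancy: { recall: 0.516 }, arithmetic: { recall: 0.933 } },
  overall: { microRecall: 0.626 },
};
// Hypothetical numbers for a newer run.
const current = {
  categories: { reentrancy: { recall: 0.548 }, arithmetic: { recall: 0.867 } },
  overall: { microRecall: 0.62 },
};

const found = findRecallRegressions(baseline, current);
console.log(found.map((r) => r.category)); // [ 'arithmetic', 'overall' ]
```

A CI step could then set `process.exitCode = 1` whenever `found` is non-empty.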

## Baseline results (static layer, SmartBugs Curated, 143 contracts)

Measured with `aegis benchmark --fetch-smartbugs`. These are the deterministic
static-layer numbers; the Claude AI layer adds further semantic recall on top.

| Category | Support (labeled contracts) | Recall | Precision |
|---|---|---|---|
| Reentrancy | 31 | 51.6% | 94.1% |
| Access Control | 18 | 50.0% | 30.0% |
| Arithmetic | 15 | 93.3% | 10.2%* |
| Unchecked Low-Level Calls | 52 | 76.9% | 61.5% |
| Denial of Service | 6 | 33.3% | 8.0% |
| Bad Randomness | 8 | 50.0% | 16.0% |
| Front Running | 4 | 0.0%** | 0.0% |
| Time Manipulation | 5 | 40.0% | 8.0% |
| **Overall (micro)** | **139** | **62.6%** | **24.9%** |

\* Arithmetic precision is depressed because most pre-0.8 contracts trip the
overflow detector broadly; on modern 0.8+ code this detector is much less noisy.
\** No dedicated front-running detector exists yet, so this row is scored as 0.
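
The Overall row is a micro average: true positives are summed across categories and divided by total support, as `runBenchmark` does with `totTP / totSupport`, rather than averaging the per-category percentages. The true-positive counts below are back-computed from the table (recall × support, rounded to whole contracts), so they are consistent with the published figures but are not read from a raw report.

```javascript
// Per-category support and recall from the baseline table above.
const rows = [
  { category: 'Reentrancy', support: 31, recall: 0.516 },
  { category: 'Access Control', support: 18, recall: 0.5 },
  { category: 'Arithmetic', support: 15, recall: 0.933 },
  { category: 'Unchecked Low-Level Calls', support: 52, recall: 0.769 },
  { category: 'Denial of Service', support: 6, recall: 0.333 },
  { category: 'Bad Randomness', support: 8, recall: 0.5 },
  { category: 'Front Running', support: 4, recall: 0 },
  { category: 'Time Manipulation', support: 5, recall: 0.4 },
];

// Back-compute true positives; rounding recovers whole contract counts.
const tp = rows.map((r) => Math.round(r.recall * r.support));
const totalTP = tp.reduce((a, b) => a + b, 0);
const totalSupport = rows.reduce((a, r) => a + r.support, 0);

// Micro recall pools the counts across categories.
const microRecall = totalTP / totalSupport;
// Macro recall would instead average the per-category percentages.
const macroRecall = rows.reduce((a, r) => a + r.recall, 0) / rows.length;

console.log(totalTP, totalSupport);             // 87 139
console.log((microRecall * 100).toFixed(1));    // 62.6
console.log((macroRecall * 100).toFixed(1));    // 49.4
```

Micro averaging weights each category by its support, so the large Unchecked Low-Level Calls category (52 contracts at 76.9%) lifts the pooled figure, while small low-recall categories such as Front Running (0 of 4) pull the unweighted macro mean down.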

For comparison, the ICSE 2020 study (Durieux et al.) found that individual
mature tools (Slither, Mythril, etc.) each detected only a fraction of the
dataset, which is why running multiple tools, plus a human audit, is the
recommended practice. Aegis is one layer in that stack, not a replacement
for it.

### Known gaps (tracked for improvement)
- No front-running / MEV detector.
- DoS and time-manipulation recall are low; detectors need refinement.
- Precision is noisy on legacy Solidity; modern-code precision is higher.
- Performance: all detectors are bounded to avoid regex DoS; a full 143-contract
  scan runs in under 1 second.
package/benchmark/fixtures/access_control_unprotected.sol
ADDED
@@ -0,0 +1,12 @@
/*
 * @source: SmartBugs Curated (fixture)
 * @vulnerable_at_lines: 10
 */
pragma solidity ^0.4.24;
contract Unprotected {
    address public owner;
    function Unprotected() public { owner = msg.sender; }
    // <yes> <report> ACCESS_CONTROL
    function setOwner(address newOwner) public { owner = newOwner; }
    function withdraw() public { require(msg.sender == owner); msg.sender.transfer(address(this).balance); }
}
package/benchmark/fixtures/arithmetic_overflow.sol
ADDED
@@ -0,0 +1,14 @@
/*
 * @source: SmartBugs Curated (fixture)
 * @vulnerable_at_lines: 12
 */
pragma solidity ^0.4.24;
contract IntegerOverflow {
    mapping(address => uint256) public balances;
    function transfer(address to, uint256 value) public {
        require(balances[msg.sender] - value >= 0);
        balances[msg.sender] -= value;
        // <yes> <report> ARITHMETIC
        balances[to] += value;
    }
}
package/benchmark/fixtures/bad_randomness.sol
ADDED
@@ -0,0 +1,12 @@
/*
 * @source: SmartBugs Curated (fixture)
 * @vulnerable_at_lines: 9
 */
pragma solidity ^0.4.24;
contract Lottery {
    function pickWinner(uint guess) public view returns (bool) {
        // <yes> <report> BAD_RANDOMNESS
        uint rand = uint(keccak256(abi.encodePacked(block.timestamp, block.difficulty)));
        return guess == rand % 100;
    }
}
package/benchmark/fixtures/clean_safe_vault.sol
ADDED
@@ -0,0 +1,24 @@
/*
 * @source: fixture (intentionally safe — no <yes> markers)
 */
pragma solidity 0.8.24;
import "@openzeppelin/contracts/access/Ownable.sol";
import "@openzeppelin/contracts/utils/ReentrancyGuard.sol";
contract SafeVault is Ownable, ReentrancyGuard {
    mapping(address => uint256) private balances;
    event Deposited(address indexed who, uint256 amount);
    event Withdrawn(address indexed who, uint256 amount);
    constructor() Ownable(msg.sender) {}
    function deposit() external payable {
        require(msg.value > 0, "zero");
        balances[msg.sender] += msg.value;
        emit Deposited(msg.sender, msg.value);
    }
    function withdraw(uint256 amount) external nonReentrant {
        require(balances[msg.sender] >= amount, "insufficient");
        balances[msg.sender] -= amount;
        (bool ok, ) = msg.sender.call{value: amount}("");
        require(ok, "transfer failed");
        emit Withdrawn(msg.sender, amount);
    }
}
package/benchmark/fixtures/reentrancy_simple_dao.sol
ADDED
@@ -0,0 +1,16 @@
/*
 * @source: SmartBugs Curated (fixture)
 * @vulnerable_at_lines: 12
 */
pragma solidity ^0.4.19;
contract SimpleDAO {
    mapping (address => uint) public credit;
    function donate(address to) payable public { credit[to] += msg.value; }
    function withdraw(uint amount) public {
        if (credit[msg.sender] >= amount) {
            // <yes> <report> REENTRANCY
            bool res = msg.sender.call.value(amount)();
            credit[msg.sender] -= amount;
        }
    }
}
package/benchmark/fixtures/time_manipulation.sol
ADDED
@@ -0,0 +1,11 @@
/*
 * @source: SmartBugs Curated (fixture)
 * @vulnerable_at_lines: 9
 */
pragma solidity ^0.4.25;
contract TimedCrowdsale {
    function isSaleFinished() view public returns (bool) {
        // <yes> <report> TIME_MANIPULATION
        return block.timestamp >= 1546300800;
    }
}
package/benchmark/fixtures/unchecked_call.sol
ADDED
@@ -0,0 +1,11 @@
/*
 * @source: SmartBugs Curated (fixture)
 * @vulnerable_at_lines: 9
 */
pragma solidity ^0.4.24;
contract UncheckedReturn {
    function withdraw(address to, uint amount) public {
        // <yes> <report> UNCHECKED_LL_CALLS
        to.call.value(amount)("");
    }
}
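
The `@vulnerable_at_lines` headers in these fixtures are written by hand and are not read by `runner.js`, so nothing checks them. A small lint could: the sketch below assumes the convention seen in `bad_randomness.sol` and `unchecked_call.sol`, where each listed line is the statement immediately after its `// <yes> <report>` marker. `lintFixture` and the sample contract are hypothetical, not part of the package.

```javascript
// Hypothetical fixture lint: @vulnerable_at_lines should list, in order, exactly
// the lines that follow each "// <yes> <report>" marker (1-based line numbers).
function lintFixture(source) {
  const header = source.match(/@vulnerable_at_lines:[ \t]*([\d, \t]+)/);
  const declared = header
    ? header[1].split(',').map((s) => s.trim()).filter((s) => s !== '').map(Number)
    : [];
  const expected = [];
  source.split('\n').forEach((text, i) => {
    // A marker on line i + 1 points at the statement on line i + 2.
    if (/\/\/\s*<yes>\s*<report>/.test(text)) expected.push(i + 2);
  });
  const ok =
    declared.length === expected.length &&
    declared.every((n, k) => n === expected[k]);
  return { ok, declared, expected };
}

// Hypothetical fixture whose header points one line too far.
const sample = [
  '// @vulnerable_at_lines: 5',
  'contract Demo {',
  '    // <yes> <report> TIME_MANIPULATION',
  '    function done() public view returns (bool) { return now > 1; }',
  '}',
].join('\n');

console.log(JSON.stringify(lintFixture(sample)));
// {"ok":false,"declared":[5],"expected":[4]}
```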
package/benchmark/mapping.js
ADDED
@@ -0,0 +1,45 @@
// ─────────────────────────────────────────────────────────────────────────────
// Benchmark category mapping
// SmartBugs Curated uses the DASP taxonomy (// <yes> <report> CATEGORY markers).
// SolGuard emits OWASP SC Top 10 (2026) IDs. This maps between them so we can
// score detections against ground-truth labels.
//
// DASP categories in the dataset directory names:
//   reentrancy, access_control, arithmetic, unchecked_low_level_calls,
//   denial_of_service, bad_randomness, front_running, time_manipulation,
//   short_addresses, other
// ─────────────────────────────────────────────────────────────────────────────

// Map each DASP category -> the OWASP SC 2026 IDs SolGuard would emit for it.
// A detection "counts" for a labeled contract if SolGuard reports ANY of the
// mapped OWASP IDs.
export const DASP_TO_OWASP = {
  reentrancy: ['SC08:2026'],
  access_control: ['SC01:2026'],
  arithmetic: ['SC07:2026', 'SC09:2026'],
  unchecked_low_level_calls: ['SC06:2026'],
  denial_of_service: ['SC02:2026'], // DoS detector tagged SC02 in our engine
  bad_randomness: ['SC02:2026'],    // weak-randomness detector tagged SC02
  front_running: ['SC02:2026'],     // no dedicated detector; logic family
  time_manipulation: ['SC02:2026'], // timestamp detector tagged SC02
  short_addresses: [],              // not detectable from source (ABI-level)
  other: [],                        // unspecified; excluded from scoring
};

// Categories we make NO claim to detect. Excluded from recall scoring so the
// benchmark is honest about scope rather than penalizing undetectable classes.
export const OUT_OF_SCOPE = ['short_addresses', 'other'];

// Human-readable labels for the report.
export const DASP_LABELS = {
  reentrancy: 'Reentrancy',
  access_control: 'Access Control',
  arithmetic: 'Arithmetic',
  unchecked_low_level_calls: 'Unchecked Low-Level Calls',
  denial_of_service: 'Denial of Service',
  bad_randomness: 'Bad Randomness',
  front_running: 'Front Running',
  time_manipulation: 'Time Manipulation',
  short_addresses: 'Short Addresses (out of scope)',
  other: 'Other (out of scope)',
};
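
Because `DASP_TO_OWASP` maps four DASP categories to the same `SC02:2026` id, a single SC02 finding is credited to all four of them by the runner's `detectedCategories` logic. The sketch below copies the mapping inline so it runs without the package and reproduces only that lookup; `categoriesFor` is a hypothetical name.

```javascript
// Copy of DASP_TO_OWASP from benchmark/mapping.js, inlined so this runs standalone.
const DASP_TO_OWASP = {
  reentrancy: ['SC08:2026'],
  access_control: ['SC01:2026'],
  arithmetic: ['SC07:2026', 'SC09:2026'],
  unchecked_low_level_calls: ['SC06:2026'],
  denial_of_service: ['SC02:2026'],
  bad_randomness: ['SC02:2026'],
  front_running: ['SC02:2026'],
  time_manipulation: ['SC02:2026'],
  short_addresses: [],
  other: [],
};

// Same lookup as detectedCategories() in runner.js, minus running the detectors:
// a DASP category counts as detected if any of its mapped OWASP ids was emitted.
function categoriesFor(owaspIds) {
  const hits = new Set(owaspIds);
  return Object.entries(DASP_TO_OWASP)
    .filter(([, ids]) => ids.some((id) => hits.has(id)))
    .map(([dasp]) => dasp);
}

console.log(categoriesFor(['SC02:2026']).join(', '));
// denial_of_service, bad_randomness, front_running, time_manipulation
console.log(categoriesFor(['SC09:2026']).join(', ')); // arithmetic
```

This is consistent with the baseline table: taking true positives as recall × support, the Denial of Service (8.0% = 2/25), Bad Randomness (16.0% = 4/25) and Time Manipulation (8.0% = 2/25) precision figures all imply the same 25 flagged contracts, as expected if one set of SC02 findings drives all four rows.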
package/benchmark/runner.js
ADDED
@@ -0,0 +1,164 @@
// ─────────────────────────────────────────────────────────────────────────────
// SolGuard benchmark runner
// Evaluates the static detector engine against labeled vulnerable contracts
// (SmartBugs Curated format). Computes per-category and overall
// precision / recall / F1 / false-negative-rate.
//
// Methodology notes (stated for honesty):
//   - Ground truth = DASP categories marked with "// <yes> <report> CATEGORY".
//   - A contract is a POSITIVE for category C if it carries a <yes> marker for C.
//   - SolGuard "detects" C if it emits any OWASP id mapped from C (see mapping.js).
//   - Detection is scored at CONTRACT level per category (not line level), matching
//     how most SmartBugs tool comparisons report (Durieux et al., ICSE 2020).
//   - short_addresses and other are OUT OF SCOPE and excluded from recall.
//   - This measures the STATIC layer only; the Claude AI layer is not benchmarked
//     here because its output is non-deterministic. Real recall in production is
//     >= static-only recall.
// ─────────────────────────────────────────────────────────────────────────────

import fs from 'fs';
import path from 'path';
import { fileURLToPath } from 'url';
import { runDetectors } from '../src/analyzers/detectors.js';
import { DASP_TO_OWASP, OUT_OF_SCOPE, DASP_LABELS } from './mapping.js';

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// Map a "<report> CATEGORY" token to a DASP key.
const REPORT_TOKEN_TO_DASP = {
  REENTRANCY: 'reentrancy',
  ACCESS_CONTROL: 'access_control',
  ARITHMETIC: 'arithmetic',
  UNCHECKED_LL_CALLS: 'unchecked_low_level_calls',
  UNCHECKED_LOW_LEVEL_CALLS: 'unchecked_low_level_calls',
  DENIAL_OF_SERVICE: 'denial_of_service',
  DOS: 'denial_of_service',
  BAD_RANDOMNESS: 'bad_randomness',
  FRONT_RUNNING: 'front_running',
  TIME_MANIPULATION: 'time_manipulation',
  SHORT_ADDRESSES: 'short_addresses',
  OTHER: 'other',
};

// Extract ground-truth DASP categories from a contract's annotations.
// Falls back to the parent directory name (SmartBugs organizes by category folder).
export function groundTruthCategories(source, filePath) {
  const cats = new Set();
  const re = /\/\/\s*<yes>\s*<report>\s*([A-Z_]+)/g;
  let m;
  while ((m = re.exec(source)) !== null) {
    const dasp = REPORT_TOKEN_TO_DASP[m[1]];
    if (dasp) cats.add(dasp);
  }
  // Directory-name fallback (full SmartBugs layout: dataset/<category>/file.sol)
  if (cats.size === 0 && filePath) {
    const parent = path.basename(path.dirname(filePath));
    if (DASP_TO_OWASP[parent]) cats.add(parent);
  }
  return [...cats];
}

// What categories did SolGuard detect for this source?
export function detectedCategories(source) {
  const findings = runDetectors(source);
  const owaspHits = new Set(findings.map(f => f.owasp));
  const detected = new Set();
  for (const [dasp, owaspIds] of Object.entries(DASP_TO_OWASP)) {
    if (owaspIds.some(id => owaspHits.has(id))) detected.add(dasp);
  }
  return { detected: [...detected], findings };
}

// Run the benchmark over a directory of .sol files (recursively).
export function runBenchmark(datasetDir) {
  const files = collectSol(datasetDir);

  // Per-category confusion counts
  const cats = Object.keys(DASP_TO_OWASP).filter(c => !OUT_OF_SCOPE.includes(c));
  const stats = {};
  for (const c of cats) stats[c] = { tp: 0, fn: 0, fp: 0, support: 0 };

  let cleanContracts = 0;
  let cleanFalsePositives = 0;
  const perContract = [];

  for (const file of files) {
    const source = fs.readFileSync(file, 'utf8');
    const truth = groundTruthCategories(source, file).filter(c => !OUT_OF_SCOPE.includes(c));
    const { detected } = detectedCategories(source);
    const detInScope = detected.filter(c => !OUT_OF_SCOPE.includes(c));

    // Clean contract (no ground-truth vulns): measure false positives
    const isClean = truth.length === 0;
    if (isClean) {
      cleanContracts++;
      if (detInScope.length > 0) cleanFalsePositives++;
    }

    for (const c of cats) {
      const isTrue = truth.includes(c);
      const isDet = detInScope.includes(c);
      if (isTrue) stats[c].support++;
      if (isTrue && isDet) stats[c].tp++;
      else if (isTrue && !isDet) stats[c].fn++;
      else if (!isTrue && isDet) stats[c].fp++;
    }

    perContract.push({
      file: path.relative(datasetDir, file),
      truth, detected: detInScope,
      hit: truth.length > 0 && truth.every(c => detInScope.includes(c)),
    });
  }

  // Compute metrics
  const report = { categories: {}, overall: {}, clean: {} };
  let totTP = 0, totFN = 0, totFP = 0, totSupport = 0;

  for (const c of cats) {
    const { tp, fn, fp, support } = stats[c];
    const precision = (tp + fp) ? tp / (tp + fp) : null;
    const recall = support ? tp / support : null;
    const f1 = (precision && recall) ? (2 * precision * recall) / (precision + recall) : null;
    report.categories[c] = {
      label: DASP_LABELS[c], support, tp, fn, fp,
      precision, recall, f1,
      falseNegativeRate: support ? fn / support : null,
    };
    totTP += tp; totFN += fn; totFP += fp; totSupport += support;
  }

  const microPrecision = (totTP + totFP) ? totTP / (totTP + totFP) : null;
  const microRecall = totSupport ? totTP / totSupport : null;
  const microF1 = (microPrecision && microRecall)
    ? (2 * microPrecision * microRecall) / (microPrecision + microRecall) : null;

  report.overall = {
    contractsTested: files.length,
    totalVulnInstances: totSupport,
    truePositives: totTP, falseNegatives: totFN, falsePositives: totFP,
    microPrecision, microRecall, microF1,
    falseNegativeRate: totSupport ? totFN / totSupport : null,
  };
  report.clean = {
    cleanContracts,
    cleanFalsePositives,
    falsePositiveRate: cleanContracts ? cleanFalsePositives / cleanContracts : null,
  };
  report.perContract = perContract;
  return report;
}

function collectSol(dir) {
  const out = [];
  if (!fs.existsSync(dir)) return out;
  for (const entry of fs.readdirSync(dir)) {
    const full = path.join(dir, entry);
    const stat = fs.statSync(full);
    if (stat.isDirectory()) out.push(...collectSol(full));
    else if (entry.endsWith('.sol')) out.push(full);
  }
  return out;
}

export const FIXTURES_DIR = path.join(__dirname, 'fixtures');
package/index.js
ADDED
@@ -0,0 +1,41 @@
#!/usr/bin/env node
import { program } from 'commander';
import { auditCommand } from './src/commands/audit.js';
import { configCommand } from './src/commands/config.js';
import { benchmarkCommand } from './src/commands/benchmark.js';
import { banner } from './src/ui/banner.js';

await banner();

program
  .name('aegis')
  .description('AI-powered smart contract security auditor - OWASP SC Top 10 (2026), MITRE, NIST SSDF')
  .version('2.1.0');

program
  .command('audit <target>')
  .description('Audit a contract by address, .sol file, or folder')
  .option('-n, --network <network>', 'network (ethereum|base|arbitrum|polygon|optimism|bsc)', 'ethereum')
  .option('-o, --output <path>', 'save markdown report (e.g. report.md)')
  .option('--sarif <path>', 'write SARIF 2.1.0 for CI ingestion (e.g. results.sarif)')
  .option('--sbom <path>', 'write CycloneDX SBOM (NIST SSDF PS.3)')
  .option('--offline', 'static detectors only; source code never leaves your machine')
  .option('--ci', 'CI mode: exit non-zero when findings breach --fail-on threshold')
  .option('--fail-on <level>', 'CI fail threshold (critical|high|medium)', 'high')
  .option('--json', 'machine-readable JSON output')
  .action(auditCommand);

program
  .command('config')
  .description('Set encrypted API key and verify audit-log integrity')
  .action(configCommand);

program
  .command('benchmark')
  .description('Measure detector accuracy against labeled vulnerable contracts')
  .option('--dataset <dir>', 'custom dataset directory of labeled .sol files')
  .option('--fetch-smartbugs', 'clone & use SmartBugs Curated (143 contracts)')
  .option('-o, --output <path>', 'write full JSON benchmark report')
  .action(benchmarkCommand);

program.parse();
package/package.json
ADDED
@@ -0,0 +1,38 @@
{
  "name": "aegis-audit",
  "version": "2.1.0",
  "description": "AI-powered smart contract security auditor — OWASP SC Top 10 (2026), MITRE ATT&CK, CWE, and NIST SSDF compliance outputs with a benchmarked detection engine",
  "type": "module",
  "main": "index.js",
  "bin": { "aegis": "index.js" },
  "scripts": {
    "start": "node index.js",
    "benchmark": "node index.js benchmark"
  },
  "keywords": [
    "solidity", "smart-contract", "security", "audit", "auditor",
    "blockchain", "ethereum", "web3", "defi", "cli",
    "owasp", "mitre-attack", "cwe", "nist-ssdf", "sarif", "sbom",
    "vulnerability", "static-analysis", "reentrancy"
  ],
  "author": "rsh1k",
  "license": "MIT",
  "homepage": "https://github.com/rsh1k/cd13282-blockchain-with-solidity-project#readme",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/rsh1k/cd13282-blockchain-with-solidity-project.git"
  },
  "bugs": {
    "url": "https://github.com/rsh1k/cd13282-blockchain-with-solidity-project/issues"
  },
  "dependencies": {
    "@inquirer/prompts": "^7.5.2",
    "axios": "^1.16.1",
    "boxen": "^8.0.1",
    "chalk": "^5.6.2",
    "commander": "^15.0.0",
    "ora": "^9.4.0"
  },
  "engines": { "node": ">=18.0.0" },
  "files": ["index.js", "src/", "benchmark/", "LICENSE", "README.md"]
}