sonobat 0.4.0 → 0.5.1
- package/README.md +89 -88
- package/dist/index.js +2800 -2909
- package/dist/index.js.map +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -4,33 +4,55 @@
 
 **AttackDataGraph for autonomous penetration testing.**
 
-sonobat is a
+sonobat is a graph-native data store that ingests tool outputs (nmap, ffuf, nuclei), builds a structured attack graph using generic `nodes` + `edges` tables, and proposes next-step actions based on missing data. It includes a **HackTricks knowledge base** with FTS5 full-text search and exposes an [MCP Server](https://modelcontextprotocol.io/) so that LLM agents can drive the entire reconnaissance-to-exploitation loop autonomously.
 
 ## Features
 
 - **Ingest** — Parse nmap XML, ffuf JSON, and nuclei JSONL into a normalized SQLite graph
-- **
+- **Graph-Native Schema** — Generic `nodes` + `edges` tables with Zod-validated props for 10 node kinds and 13 edge kinds
 - **Propose** — Gap-driven engine suggests what to scan next based on missing data
-- **
-- **
+- **Graph Traversal** — SQLite recursive CTE queries for attack path analysis with preset patterns
+- **Knowledge Base** — HackTricks documentation with auto-clone, incremental indexing, and FTS5 full-text search
+- **Continuous Pentest** — Engagement/run lifecycle, action queue with deduplication, finding tracking with state machine, and time-series risk snapshots
+- **MCP Server** — 6 tools + 4 resources accessible via stdio for LLM agents (Claude Desktop, Claude Code, etc.)
 
 ## Data Model
 
 ```
-
-├──
-
-
-
-
-
-
-
-
-
+nodes (kind + props_json)
+├── host — IP or domain target
+├── vhost — Virtual host
+├── service — Transport + port + protocol
+├── endpoint — HTTP method + path
+├── input — Parameter (query, body, header, etc.)
+├── observation — Observed value for an input
+├── credential — Username + secret
+├── vulnerability — Detected vulnerability
+├── cve — CVE record
+└── svc_observation — Service-level key-value observation
+
+edges (kind + source_id + target_id)
+HOST_SERVICE, HOST_VHOST, SERVICE_ENDPOINT, SERVICE_INPUT,
+SERVICE_CREDENTIAL, SERVICE_VULNERABILITY, SERVICE_OBSERVATION,
+ENDPOINT_INPUT, ENDPOINT_VULNERABILITY, ENDPOINT_CREDENTIAL,
+INPUT_OBSERVATION, VULNERABILITY_CVE, VHOST_ENDPOINT
 ```
 
-Every
+Every node can be linked to an **Artifact** (evidence), ensuring full traceability.
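As a rough illustration of the generic `nodes` + `edges` model in the added README text, the sketch below types the 10 node kinds and 13 edge kinds and checks that an edge connects the kinds its name suggests. The source/target pairing is inferred from the edge names alone (for example, that `SERVICE_OBSERVATION` points at a `svc_observation` node). `EDGE_ENDPOINTS`, `GraphNode`, `GraphEdge`, `isValidEdge`, and the example props are hypothetical names, not sonobat's actual API. The Zod validation of props mentioned in the README is left out.

```typescript
// Node kinds as listed in the README's data model.
type NodeKind =
  | "host" | "vhost" | "service" | "endpoint" | "input"
  | "observation" | "credential" | "vulnerability" | "cve"
  | "svc_observation";

// ASSUMPTION: [source kind, target kind] per edge kind, inferred from the names.
const EDGE_ENDPOINTS = {
  HOST_SERVICE: ["host", "service"],
  HOST_VHOST: ["host", "vhost"],
  SERVICE_ENDPOINT: ["service", "endpoint"],
  SERVICE_INPUT: ["service", "input"],
  SERVICE_CREDENTIAL: ["service", "credential"],
  SERVICE_VULNERABILITY: ["service", "vulnerability"],
  SERVICE_OBSERVATION: ["service", "svc_observation"],
  ENDPOINT_INPUT: ["endpoint", "input"],
  ENDPOINT_VULNERABILITY: ["endpoint", "vulnerability"],
  ENDPOINT_CREDENTIAL: ["endpoint", "credential"],
  INPUT_OBSERVATION: ["input", "observation"],
  VULNERABILITY_CVE: ["vulnerability", "cve"],
  VHOST_ENDPOINT: ["vhost", "endpoint"],
} as const;

type EdgeKind = keyof typeof EDGE_ENDPOINTS;

// Hypothetical in-memory shapes mirroring "kind + props_json" and
// "kind + source_id + target_id".
interface GraphNode {
  id: string;
  kind: NodeKind;
  props: Record<string, unknown>;
}

interface GraphEdge {
  kind: EdgeKind;
  sourceId: string;
  targetId: string;
}

// True when both endpoints exist and have the kinds the edge kind expects.
function isValidEdge(edge: GraphEdge, nodes: Map<string, GraphNode>): boolean {
  const source = nodes.get(edge.sourceId);
  const target = nodes.get(edge.targetId);
  if (!source || !target) return false;
  const [expectedSource, expectedTarget] = EDGE_ENDPOINTS[edge.kind];
  return source.kind === expectedSource && target.kind === expectedTarget;
}

// Example graph; the prop names are illustrative only.
const nodes = new Map<string, GraphNode>([
  ["h1", { id: "h1", kind: "host", props: { address: "10.0.0.1" } }],
  ["s1", { id: "s1", kind: "service", props: { transport: "tcp", port: 443 } }],
]);
```

With this example graph, `isValidEdge({ kind: "HOST_SERVICE", sourceId: "h1", targetId: "s1" }, nodes)` accepts the edge and rejects the same edge with source and target swapped.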
+
+### Operational Tables (v0.5.0)
+
+```
+engagements — Long-lived assessment context (STG continuous testing)
+  └── runs — Execution cycle (manual/scheduled/event-triggered)
+      ├── action_queue — Proposed actions with priority queue + deduplication
+      │   └── action_executions — Attempt history and outcomes
+      ├── findings — Vulnerability lifecycle (open → fixed/accepted_risk)
+      │   └── finding_events — Immutable state transition log
+      └── risk_snapshots — Time-series risk metrics for trend analysis
+```
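The `action_queue` entry above mentions a priority queue with deduplication. One way this could work is sketched below, assuming that two actions are duplicates when they share an action kind and a target node, and that a larger `priority` number is served first. Neither rule is confirmed by the README. `ActionQueue`, `QueuedAction`, and the field names are hypothetical.

```typescript
// Hypothetical shape of a proposed action waiting in the queue.
interface QueuedAction {
  kind: string;      // e.g. "nuclei_scan" or "vhost_discovery"
  targetId: string;  // node the action would run against
  priority: number;  // ASSUMPTION: larger means more urgent
}

class ActionQueue {
  private readonly pending = new Map<string, QueuedAction>();

  // ASSUMPTION: the dedup key is "action kind + target node".
  private static keyOf(action: QueuedAction): string {
    return `${action.kind}:${action.targetId}`;
  }

  // Returns true if the action was queued, false if an equal one is pending.
  enqueue(action: QueuedAction): boolean {
    const key = ActionQueue.keyOf(action);
    if (this.pending.has(key)) return false;
    this.pending.set(key, action);
    return true;
  }

  // Removes and returns the highest-priority pending action, if any.
  dequeue(): QueuedAction | undefined {
    let bestKey: string | undefined;
    let best: QueuedAction | undefined;
    for (const [key, action] of this.pending) {
      if (best === undefined || action.priority > best.priority) {
        best = action;
        bestKey = key;
      }
    }
    if (bestKey !== undefined) this.pending.delete(bestKey);
    return best;
  }

  get size(): number {
    return this.pending.size;
  }
}

const queue = new ActionQueue();
const first = queue.enqueue({ kind: "nuclei_scan", targetId: "s1", priority: 5 });
const duplicate = queue.enqueue({ kind: "nuclei_scan", targetId: "s1", priority: 5 });
const second = queue.enqueue({ kind: "vhost_discovery", targetId: "s1", priority: 9 });
```

A linear scan in `dequeue` keeps the sketch short. A real queue backed by SQLite would more likely use an index and `ORDER BY`.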
+
+`scans` and `artifacts` gain `engagement_id` / `run_id` lineage columns for full traceability.
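The `findings` lifecycle and its immutable `finding_events` log, listed among the operational tables, could be modelled roughly as follows. The three states come from the README text (`open → fixed/accepted_risk`). Treating `fixed` and `accepted_risk` as terminal is an assumption, as are the names `transition`, `Finding`, `FindingEvent`, and `ALLOWED`. The real schema may define more states or allow reopening.

```typescript
type FindingState = "open" | "fixed" | "accepted_risk";

// ASSUMPTION: only "open" has outgoing transitions.
const ALLOWED: Record<FindingState, readonly FindingState[]> = {
  open: ["fixed", "accepted_risk"],
  fixed: [],
  accepted_risk: [],
};

interface Finding {
  readonly id: string;
  readonly state: FindingState;
}

// One row of the append-only transition log.
interface FindingEvent {
  readonly findingId: string;
  readonly from: FindingState;
  readonly to: FindingState;
  readonly at: string; // ISO-8601 timestamp supplied by the caller
}

// Returns the updated finding plus a new log with one event appended.
// Neither the input finding nor the input log is mutated.
function transition(
  finding: Finding,
  to: FindingState,
  log: readonly FindingEvent[],
  at: string,
): { finding: Finding; log: readonly FindingEvent[] } {
  if (!ALLOWED[finding.state].includes(to)) {
    throw new Error(`illegal transition ${finding.state} -> ${to}`);
  }
  const event: FindingEvent = { findingId: finding.id, from: finding.state, to, at };
  return { finding: { ...finding, state: to }, log: [...log, event] };
}

const opened: Finding = { id: "f-1", state: "open" };
const afterFix = transition(opened, "fixed", [], "2024-01-01T00:00:00Z");
```

Returning a new log array instead of pushing into the old one mirrors the "immutable" wording: earlier events are never rewritten, only appended to.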
 
 ## Quick Start
 
@@ -56,86 +78,45 @@ npm test
 
 ## MCP Server
 
-sonobat runs as an MCP server over stdio. LLM agents connect to it and use tools to ingest data, query the graph,
-
-### Available Tools
-
-| Category | Tool | Description |
-|----------|------|-------------|
-| **Ingest** | `ingest_file` | Ingest a tool output file and normalize it into the graph |
-| **Query** | `list_hosts` | List all discovered hosts |
-| | `get_host` | Get host details including services and vhosts |
-| | `list_services` | List services for a host |
-| | `list_endpoints` | List HTTP endpoints for a service |
-| | `list_inputs` | List input parameters for a service |
-| | `list_observations` | List observed values for an input |
-| | `list_credentials` | List credentials (optionally filtered by service) |
-| | `list_vulnerabilities` | List vulnerabilities (optionally filtered by service/severity) |
-| **Propose** | `propose` | Suggest next actions based on missing data |
-| **Mutation** | `add_host` | Manually add a host |
-| | `add_credential` | Add a credential for a service |
-| | `add_vulnerability` | Add a vulnerability for a service |
-| | `link_cve` | Link a CVE record to a vulnerability |
-| **Datalog** | `list_facts` | Show database contents as Datalog facts |
-| | `run_datalog` | Execute a custom Datalog program against the database |
-| | `query_attack_paths` | Run preset or saved attack pattern analysis |
-
-### MCP Resources
-
-| URI | Description |
-|-----|-------------|
-| `sonobat://hosts` | Host list (JSON) |
-| `sonobat://hosts/{id}` | Host detail with full service tree |
-| `sonobat://summary` | Overall statistics |
-
-## Datalog Inference Engine
-
-sonobat includes a built-in Datalog inference engine that enables attack path analysis by reasoning over the normalized database.
-
-### How It Works
-
-1. **Fact Extraction** — Database rows are automatically converted to Datalog facts (e.g., `host("h-001", "10.0.0.1", "IP")`)
-2. **Rule Evaluation** — Naive bottom-up evaluator with fixed-point iteration derives new facts from rules
-3. **Query Answering** — Queries return matching tuples with variable bindings
+sonobat runs as an MCP server over stdio. LLM agents connect to it and use tools to ingest data, query the graph, traverse attack paths, and get next-step proposals.
 
-### Available
+### Available Tools (6)
 
-|
-
-|
-| `
-| `
-| `
-|
-| `
-| `
-| `
-|
-|
-|
+| Tool | Actions / Description |
+|------|----------------------|
+| **`query`** | `list_nodes` — List nodes by kind with optional JSON filters |
+| | `get_node` — Get node detail with adjacent edges and neighbors |
+| | `traverse` — Recursive graph traversal with depth/edge-kind filters |
+| | `summary` — Node and edge counts by kind |
+| | `attack_paths` — Preset pattern analysis (attack_surface, critical_vulns, etc.) |
+| **`mutate`** | `add_node` — Create or upsert a node with validated props |
+| | `add_edge` — Create an edge between two nodes |
+| | `update_node` — Partial update of node props |
+| | `delete_node` — Delete a node (cascades to edges) |
+| **`ingest_file`** | Ingest a tool output file (nmap/ffuf/nuclei) and normalize into the graph |
+| **`propose`** | Suggest next actions based on missing data in the graph |
+| **`search_kb`** | Full-text search the HackTricks knowledge base |
+| **`index_kb`** | Auto-clone/pull HackTricks and incrementally index documentation |
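To make the tool/action split in the table above concrete, here is a minimal sketch of how a client might build a request for the `query` tool. MCP tool invocations are JSON-RPC 2.0 messages with the method `tools/call` and a `params` object holding `name` and `arguments`. The argument names used here (`action`, `kind`) are assumptions about sonobat's input schema, which this diff does not show. `buildToolCall` and `ToolCallRequest` are hypothetical helpers.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

let nextRequestId = 1;

// Builds a request for one of the tools listed in the README table.
function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// ASSUMPTION: the `query` tool selects its behaviour through an "action"
// argument and filters nodes through a "kind" argument.
const listServices = buildToolCall("query", { action: "list_nodes", kind: "service" });

// Over the stdio transport, each message travels as a single line of JSON.
const wireMessage = JSON.stringify(listServices);
```

In practice an MCP client library would handle the framing and ID bookkeeping, so hand-building the message is only useful for seeing what crosses the wire.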
 
-###
+### Attack Path Presets
 
 | Pattern | Description |
 |---------|-------------|
-| `
-| `
-| `
-| `
-| `
-| `
+| `attack_surface` | Host → endpoint + input complete paths |
+| `critical_vulns` | Host → service → vulnerability (critical/high severity) |
+| `credential_exposure` | Service → credential mappings |
+| `unscanned_services` | Services with no endpoints discovered |
+| `vuln_by_host` | Vulnerability count by host |
+| `reachable_services` | All services reachable from a host |
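The presets above are described as graph traversals with depth and edge-kind filters. According to the README, sonobat runs these as SQLite recursive CTE queries. The sketch below swaps in a plain in-memory breadth-first search to show the same idea without a database. `traverse`, `Edge`, and the sample IDs are hypothetical.

```typescript
interface Edge {
  kind: string;
  sourceId: string;
  targetId: string;
}

// Breadth-first traversal from `startId`, following edges source -> target.
// Returns every reached node ID mapped to its depth (the start node is 0).
// `edgeKinds`, when given, restricts which edge kinds may be followed.
function traverse(
  edges: readonly Edge[],
  startId: string,
  maxDepth: number,
  edgeKinds?: ReadonlySet<string>,
): Map<string, number> {
  // Index outgoing edges by source node, applying the edge-kind filter once.
  const outgoing = new Map<string, Edge[]>();
  for (const edge of edges) {
    if (edgeKinds && !edgeKinds.has(edge.kind)) continue;
    const list = outgoing.get(edge.sourceId) ?? [];
    list.push(edge);
    outgoing.set(edge.sourceId, list);
  }

  const depthById = new Map<string, number>([[startId, 0]]);
  let frontier: string[] = [startId];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const edge of outgoing.get(id) ?? []) {
        if (!depthById.has(edge.targetId)) {
          depthById.set(edge.targetId, depth);
          next.push(edge.targetId);
        }
      }
    }
    frontier = next;
  }
  return depthById;
}

const sampleEdges: Edge[] = [
  { kind: "HOST_SERVICE", sourceId: "h1", targetId: "s1" },
  { kind: "HOST_SERVICE", sourceId: "h1", targetId: "s2" },
  { kind: "SERVICE_ENDPOINT", sourceId: "s1", targetId: "e1" },
  { kind: "SERVICE_VULNERABILITY", sourceId: "s1", targetId: "v1" },
  { kind: "VULNERABILITY_CVE", sourceId: "v1", targetId: "c1" },
];

// Roughly "reachable_services": one hop from the host.
const oneHop = traverse(sampleEdges, "h1", 1);

// Roughly "critical_vulns" without the severity filter: host -> service -> vulnerability -> cve.
const vulnPath = traverse(
  sampleEdges,
  "h1",
  3,
  new Set(["HOST_SERVICE", "SERVICE_VULNERABILITY", "VULNERABILITY_CVE"]),
);
```

The visited-set check plays the role that `UNION` deduplication and a depth column play in a recursive CTE: it stops cycles from looping forever and records the shortest hop count.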
 
-###
+### MCP Resources (4)
 
-
-
-
-
-
-
-vulnerability(ServiceId, VulnId, "sqli", Title, Severity, Confidence).
-?- sqli_service(HostId, ServiceId, Title).
-```
+| URI | Description |
+|-----|-------------|
+| `sonobat://nodes` | Node list (optionally filter by kind) |
+| `sonobat://nodes/{id}` | Node detail with edges and neighbors |
+| `sonobat://summary` | Overall statistics |
+| `sonobat://techniques/categories` | Knowledge base categories |
 
 ## Propose Engine
 
@@ -151,6 +132,25 @@ The proposer analyzes missing data in the attack graph and suggests next actions
 | HTTP service has no vhosts | `vhost_discovery` | Virtual host enumeration |
 | HTTP service has no vulnerability scan | `nuclei_scan` | Run vulnerability scanner |
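The two rules visible in this hunk can be read as simple gap checks: look for data that is missing and emit the matching action. A minimal sketch follows, assuming a flattened per-service view whose fields (`isHttp`, `vhostCount`, `vulnScanned`) are hypothetical. The real proposer works on the graph and has further rules that this diff does not show.

```typescript
// Hypothetical flattened view of one service node and what is known about it.
interface ServiceView {
  id: string;
  isHttp: boolean;
  vhostCount: number;
  vulnScanned: boolean;
}

interface Proposal {
  action: string;
  targetId: string;
  reason: string;
}

// Emits one proposal per detected gap, using the two rules shown in the table.
function proposeForServices(services: readonly ServiceView[]): Proposal[] {
  const proposals: Proposal[] = [];
  for (const service of services) {
    if (!service.isHttp) continue; // both visible rules apply to HTTP services only
    if (service.vhostCount === 0) {
      proposals.push({
        action: "vhost_discovery",
        targetId: service.id,
        reason: "HTTP service has no vhosts",
      });
    }
    if (!service.vulnScanned) {
      proposals.push({
        action: "nuclei_scan",
        targetId: service.id,
        reason: "HTTP service has no vulnerability scan",
      });
    }
  }
  return proposals;
}

const proposals = proposeForServices([
  { id: "s1", isHttp: true, vhostCount: 0, vulnScanned: false },
  { id: "s2", isHttp: true, vhostCount: 2, vulnScanned: true },
  { id: "s3", isHttp: false, vhostCount: 0, vulnScanned: false },
]);
```

Here only `s1` produces proposals: `s2` has no gaps and `s3` is not an HTTP service.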
 
+## Knowledge Base (HackTricks)
+
+sonobat includes a built-in knowledge base powered by [HackTricks](https://github.com/HackTricks-wiki/hacktricks). When `index_kb` is called without a path, it automatically:
+
+1. **Clones** HackTricks to `~/.sonobat/data/hacktricks/` (first run)
+2. **Pulls** latest changes (subsequent runs)
+3. **Incrementally indexes** only new/changed files using file mtime comparison
+
+This means `npm install -g sonobat` users get the full knowledge base with a single `index_kb` call — no manual git clone required.
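The third step above, incremental indexing by mtime comparison, comes down to a small selection function. The sketch below assumes the indexer remembers the modification time it saw for each file at the last run and re-indexes a file when it is new or its mtime has moved forward. `selectFilesToIndex`, `FileStat`, and the sample paths are hypothetical. How sonobat stores the previous mtimes is not shown in this diff.

```typescript
// Path plus modification time in milliseconds, as a filesystem stat would report it.
interface FileStat {
  path: string;
  mtimeMs: number;
}

// Returns the paths that need (re-)indexing: files never indexed before,
// or files whose mtime is newer than the one recorded at the last run.
function selectFilesToIndex(
  files: readonly FileStat[],
  lastIndexedMtime: ReadonlyMap<string, number>,
): string[] {
  return files
    .filter((file) => {
      const previous = lastIndexedMtime.get(file.path);
      return previous === undefined || file.mtimeMs > previous;
    })
    .map((file) => file.path);
}

// State recorded by a previous indexing run (illustrative values).
const previousRun = new Map<string, number>([
  ["a.md", 100],
  ["b.md", 100],
]);

const toIndex = selectFilesToIndex(
  [
    { path: "a.md", mtimeMs: 100 }, // unchanged, skipped
    { path: "b.md", mtimeMs: 250 }, // modified since last run
    { path: "c.md", mtimeMs: 50 },  // never indexed
  ],
  previousRun,
);
```

Comparing mtimes is cheap but coarse. A file rewritten with identical content would still be re-indexed, which is harmless for a search index.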
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `path` | `~/.sonobat/data/hacktricks/` | Custom path to a HackTricks directory |
+| `update` | `true` | Set to `false` to skip git pull before indexing |
+
+The data directory can be overridden with the `SONOBAT_DATA_DIR` environment variable.
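How the default path and the `SONOBAT_DATA_DIR` override might combine is sketched below. It assumes the HackTricks clone always lives in a `hacktricks` subdirectory of the data root, which matches the documented default `~/.sonobat/data/hacktricks/` but is not confirmed for the override case. It also assumes an empty variable counts as unset. `resolveHacktricksDir` is a hypothetical helper that joins with `/` to stay dependency-free.

```typescript
// Resolves the directory the HackTricks clone would be stored in.
// `env` and `homeDir` are passed in so the function stays pure and testable.
function resolveHacktricksDir(
  env: Record<string, string | undefined>,
  homeDir: string,
): string {
  const override = env["SONOBAT_DATA_DIR"];
  // ASSUMPTION: an empty value is treated the same as an unset variable.
  const dataDir =
    override !== undefined && override.length > 0
      ? override
      : `${homeDir}/.sonobat/data`;
  // Drop trailing slashes so the join below never produces "//".
  const trimmed = dataDir.replace(/\/+$/, "");
  return `${trimmed}/hacktricks`;
}

const defaultDir = resolveHacktricksDir({}, "/home/alice");
const overriddenDir = resolveHacktricksDir(
  { SONOBAT_DATA_DIR: "/srv/sonobat/" },
  "/home/alice",
);
```

In a Node program the arguments would typically be `process.env` and `os.homedir()`, and `path.join` would replace the manual string handling.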
+
+## Configuration
+
 ### Claude Desktop
 
 Add to `claude_desktop_config.json`:
@@ -198,6 +198,7 @@ npx @modelcontextprotocol/inspector npx tsx src/index.ts
 | Variable | Default | Description |
 |----------|---------|-------------|
 | `SONOBAT_DB_PATH` | `sonobat.db` | Path to the SQLite database file |
+| `SONOBAT_DATA_DIR` | `~/.sonobat/data/` | Root data directory for auto-cloned repositories |
 
 ## Tech Stack
 