sonobat 0.4.0 → 0.5.1

package/README.md CHANGED
@@ -4,33 +4,55 @@
4
4
 
5
5
  **AttackDataGraph for autonomous penetration testing.**
6
6
 
7
- sonobat is a normalized data store that ingests tool outputs (nmap, ffuf, nuclei), builds a structured attack graph, and proposes next-step actions based on missing data. It includes a built-in **Datalog inference engine** for attack path analysis and exposes an [MCP Server](https://modelcontextprotocol.io/) so that LLM agents can drive the entire reconnaissance-to-exploitation loop autonomously.
7
+ sonobat is a graph-native data store that ingests tool outputs (nmap, ffuf, nuclei), builds a structured attack graph using generic `nodes` + `edges` tables, and proposes next-step actions based on missing data. It includes a **HackTricks knowledge base** with FTS5 full-text search and exposes an [MCP Server](https://modelcontextprotocol.io/) so that LLM agents can drive the entire reconnaissance-to-exploitation loop autonomously.
8
8
 
9
9
  ## Features
10
10
 
11
11
  - **Ingest** — Parse nmap XML, ffuf JSON, and nuclei JSONL into a normalized SQLite graph
12
- - **Normalize** — Deduplicate and link hosts, services, endpoints, inputs, observations, credentials, and vulnerabilities
12
+ - **Graph-Native Schema** — Generic `nodes` + `edges` tables with Zod-validated props for 10 node kinds and 13 edge kinds
13
13
  - **Propose** — Gap-driven engine suggests what to scan next based on missing data
14
- - **Datalog Inference** — Built-in Datalog engine for attack path analysis with preset and custom rules
15
- - **MCP Server** — 17 tools + 3 resources accessible via stdio for LLM agents (Claude Desktop, Claude Code, etc.)
14
+ - **Graph Traversal** — SQLite recursive CTE queries for attack path analysis with preset patterns
15
+ - **Knowledge Base** — HackTricks documentation with auto-clone, incremental indexing, and FTS5 full-text search
16
+ - **Continuous Pentest** — Engagement/run lifecycle, action queue with deduplication, finding tracking with state machine, and time-series risk snapshots
17
+ - **MCP Server** — 6 tools + 4 resources accessible via stdio for LLM agents (Claude Desktop, Claude Code, etc.)
16
18
 
17
19
  ## Data Model
18
20
 
19
21
  ```
20
- Host
21
- ├── Vhost
22
- └── Service (transport + port + protocol)
23
- ├── ServiceObservation (key-value)
24
- ├── Credential
25
- ├── HttpEndpoint
26
- │ └── EndpointInput (many-to-many)
27
- ├── Input (location + name)
28
- │ └── Observation (observed values)
29
- └── Vulnerability
30
- └── CVE
22
+ nodes (kind + props_json)
23
+ ├── host — IP or domain target
24
+ ├── vhost — Virtual host
25
+ ├── service — Transport + port + protocol
26
+ ├── endpoint — HTTP method + path
27
+ ├── input — Parameter (query, body, header, etc.)
28
+ ├── observation — Observed value for an input
29
+ ├── credential — Username + secret
30
+ ├── vulnerability — Detected vulnerability
31
+ ├── cve — CVE record
32
+ └── svc_observation — Service-level key-value observation
33
+
34
+ edges (kind + source_id + target_id)
35
+ HOST_SERVICE, HOST_VHOST, SERVICE_ENDPOINT, SERVICE_INPUT,
36
+ SERVICE_CREDENTIAL, SERVICE_VULNERABILITY, SERVICE_OBSERVATION,
37
+ ENDPOINT_INPUT, ENDPOINT_VULNERABILITY, ENDPOINT_CREDENTIAL,
38
+ INPUT_OBSERVATION, VULNERABILITY_CVE, VHOST_ENDPOINT
31
39
  ```
32
40
 
33
- Every fact is linked to an **Artifact** (evidence), ensuring full traceability.
41
+ Every node can be linked to an **Artifact** (evidence), ensuring full traceability.
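As a rough illustration of the generic `nodes` + `edges` layout, the following in-memory sketch models the two row shapes and rejects edges whose endpoints have the wrong node kind. The `id` column, the `Graph` class, the prop names used with it, and the edge-kind-to-node-kind pairing are assumptions inferred from the names above. The package itself stores rows in SQLite and validates props with Zod, both of which are replaced here by plain TypeScript.

```typescript
// Node kinds copied from the data model above; edge kinds are a subset for brevity.
type NodeKind =
  | "host" | "vhost" | "service" | "endpoint" | "input"
  | "observation" | "credential" | "vulnerability" | "cve" | "svc_observation";

type EdgeKind = "HOST_SERVICE" | "HOST_VHOST" | "SERVICE_ENDPOINT" | "ENDPOINT_INPUT";

// Hypothetical row shapes: `id` is an assumed column; the rest mirror the diagram.
interface NodeRow { id: string; kind: NodeKind; props_json: string; }
interface EdgeRow { kind: EdgeKind; source_id: string; target_id: string; }

// Assumed pairing of each edge kind to its endpoint node kinds, inferred from the edge names.
const EDGE_ENDPOINTS: Record<EdgeKind, [NodeKind, NodeKind]> = {
  HOST_SERVICE: ["host", "service"],
  HOST_VHOST: ["host", "vhost"],
  SERVICE_ENDPOINT: ["service", "endpoint"],
  ENDPOINT_INPUT: ["endpoint", "input"],
};

class Graph {
  private nodes = new Map<string, NodeRow>();
  private edges: EdgeRow[] = [];

  // Stores (or overwrites) a node; props are serialized the way a props_json column would hold them.
  addNode(id: string, kind: NodeKind, props: Record<string, unknown>): NodeRow {
    const row: NodeRow = { id, kind, props_json: JSON.stringify(props) };
    this.nodes.set(id, row);
    return row;
  }

  // Adds an edge only if both nodes exist and their kinds match the edge kind.
  addEdge(kind: EdgeKind, sourceId: string, targetId: string): EdgeRow {
    const source = this.nodes.get(sourceId);
    const target = this.nodes.get(targetId);
    if (!source || !target) throw new Error("unknown node id");
    const [wantSource, wantTarget] = EDGE_ENDPOINTS[kind];
    if (source.kind !== wantSource || target.kind !== wantTarget) {
      throw new Error(`${kind} must link ${wantSource} to ${wantTarget}`);
    }
    const row: EdgeRow = { kind, source_id: sourceId, target_id: targetId };
    this.edges.push(row);
    return row;
  }

  edgeCount(): number {
    return this.edges.length;
  }
}
```

Keeping every entity in one `nodes` shape means a new node kind only needs a new props validator and edge pairing, not a new table.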
42
+
43
+ ### Operational Tables (v0.5.0)
44
+
45
+ ```
46
+ engagements — Long-lived assessment context (STG continuous testing)
47
+ └── runs — Execution cycle (manual/scheduled/event-triggered)
48
+ ├── action_queue — Proposed actions with priority queue + deduplication
49
+ │ └── action_executions — Attempt history and outcomes
50
+ ├── findings — Vulnerability lifecycle (open → fixed/accepted_risk)
51
+ │ └── finding_events — Immutable state transition log
52
+ └── risk_snapshots — Time-series risk metrics for trend analysis
53
+ ```
54
+
55
+ `scans` and `artifacts` gain `engagement_id` / `run_id` lineage columns for full traceability.
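The `findings` and `finding_events` pairing can be pictured as a small state machine with an append-only log. The sketch below assumes only the three states named in the diagram (`open`, `fixed`, `accepted_risk`) and assumes both end states are terminal. The real lifecycle may define more states or allow reopening, and the `FindingEvent` fields are hypothetical.

```typescript
// States named in the table comment above; the actual state machine may have more.
type FindingState = "open" | "fixed" | "accepted_risk";

// Assumed transition table: only `open` can move to another state.
const ALLOWED: Record<FindingState, FindingState[]> = {
  open: ["fixed", "accepted_risk"],
  fixed: [],
  accepted_risk: [],
};

// Hypothetical event shape standing in for one `finding_events` row.
interface FindingEvent {
  readonly from: FindingState;
  readonly to: FindingState;
  readonly at: string; // ISO-8601 timestamp supplied by the caller
}

interface Finding {
  readonly state: FindingState;
  readonly events: readonly FindingEvent[];
}

function newFinding(): Finding {
  return { state: "open", events: [] };
}

// Returns a new Finding; the previous value and its event log are never mutated,
// which mirrors the "immutable state transition log" idea.
function transition(finding: Finding, to: FindingState, at: string): Finding {
  if (ALLOWED[finding.state].indexOf(to) === -1) {
    throw new Error(`invalid transition: ${finding.state} -> ${to}`);
  }
  const event: FindingEvent = { from: finding.state, to, at };
  return { state: to, events: [...finding.events, event] };
}
```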
34
56
 
35
57
  ## Quick Start
36
58
 
@@ -56,86 +78,45 @@ npm test
56
78
 
57
79
  ## MCP Server
58
80
 
59
- sonobat runs as an MCP server over stdio. LLM agents connect to it and use tools to ingest data, query the graph, run Datalog inference, and get next-step proposals.
60
-
61
- ### Available Tools
62
-
63
- | Category | Tool | Description |
64
- |----------|------|-------------|
65
- | **Ingest** | `ingest_file` | Ingest a tool output file and normalize it into the graph |
66
- | **Query** | `list_hosts` | List all discovered hosts |
67
- | | `get_host` | Get host details including services and vhosts |
68
- | | `list_services` | List services for a host |
69
- | | `list_endpoints` | List HTTP endpoints for a service |
70
- | | `list_inputs` | List input parameters for a service |
71
- | | `list_observations` | List observed values for an input |
72
- | | `list_credentials` | List credentials (optionally filtered by service) |
73
- | | `list_vulnerabilities` | List vulnerabilities (optionally filtered by service/severity) |
74
- | **Propose** | `propose` | Suggest next actions based on missing data |
75
- | **Mutation** | `add_host` | Manually add a host |
76
- | | `add_credential` | Add a credential for a service |
77
- | | `add_vulnerability` | Add a vulnerability for a service |
78
- | | `link_cve` | Link a CVE record to a vulnerability |
79
- | **Datalog** | `list_facts` | Show database contents as Datalog facts |
80
- | | `run_datalog` | Execute a custom Datalog program against the database |
81
- | | `query_attack_paths` | Run preset or saved attack pattern analysis |
82
-
83
- ### MCP Resources
84
-
85
- | URI | Description |
86
- |-----|-------------|
87
- | `sonobat://hosts` | Host list (JSON) |
88
- | `sonobat://hosts/{id}` | Host detail with full service tree |
89
- | `sonobat://summary` | Overall statistics |
90
-
91
- ## Datalog Inference Engine
92
-
93
- sonobat includes a built-in Datalog inference engine that enables attack path analysis by reasoning over the normalized database.
94
-
95
- ### How It Works
96
-
97
- 1. **Fact Extraction** — Database rows are automatically converted to Datalog facts (e.g., `host("h-001", "10.0.0.1", "IP")`)
98
- 2. **Rule Evaluation** — Naive bottom-up evaluator with fixed-point iteration derives new facts from rules
99
- 3. **Query Answering** — Queries return matching tuples with variable bindings
81
+ sonobat runs as an MCP server over stdio. LLM agents connect to it and use tools to ingest data, query the graph, traverse attack paths, and get next-step proposals.
100
82
 
101
- ### Available Predicates
83
+ ### Available Tools (6)
102
84
 
103
- | Predicate | Arity | Source Table |
104
- |-----------|-------|-------------|
105
- | `host(Id, Authority, Kind)` | 3 | hosts |
106
- | `service(HostId, Id, Transport, Port, AppProto, State)` | 6 | services |
107
- | `http_endpoint(ServiceId, Id, Method, Path, StatusCode)` | 5 | http_endpoints |
108
- | `input(ServiceId, Id, Location, Name)` | 4 | inputs |
109
- | `endpoint_input(EndpointId, InputId)` | 2 | endpoint_inputs |
110
- | `observation(InputId, Id, RawValue, Source, Confidence)` | 5 | observations |
111
- | `credential(ServiceId, Id, Username, SecretType, Source, Confidence)` | 6 | credentials |
112
- | `vulnerability(ServiceId, Id, VulnType, Title, Severity, Confidence)` | 6 | vulnerabilities |
113
- | `vulnerability_endpoint(VulnId, EndpointId)` | 2 | vulnerabilities |
114
- | `cve(VulnId, CveId, CvssScore)` | 3 | cves |
115
- | `vhost(HostId, Id, Hostname, Source)` | 4 | vhosts |
85
+ | Tool | Actions / Description |
86
+ |------|----------------------|
87
+ | **`query`** | `list_nodes` List nodes by kind with optional JSON filters |
88
+ | | `get_node` Get node detail with adjacent edges and neighbors |
89
+ | | `traverse` Recursive graph traversal with depth/edge-kind filters |
90
+ | | `summary` Node and edge counts by kind |
91
+ | | `attack_paths` Preset pattern analysis (attack_surface, critical_vulns, etc.) |
92
+ | **`mutate`** | `add_node` Create or upsert a node with validated props |
93
+ | | `add_edge` Create an edge between two nodes |
94
+ | | `update_node` Partial update of node props |
95
+ | | `delete_node` Delete a node (cascades to edges) |
96
+ | **`ingest_file`** | Ingest a tool output file (nmap/ffuf/nuclei) and normalize into the graph |
97
+ | **`propose`** | Suggest next actions based on missing data in the graph |
98
+ | **`search_kb`** | Full-text search the HackTricks knowledge base |
99
+ | **`index_kb`** | Auto-clone/pull HackTricks and incrementally index documentation |
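To make the `traverse` action's depth and edge-kind filters concrete, here is an in-memory breadth-first sketch. It is a stand-in for illustration only: the README says the package uses SQLite recursive CTE queries, and the function name, option names, and return shape below are assumptions, not the tool's actual parameters.

```typescript
interface Edge { kind: string; source_id: string; target_id: string; }

// Hypothetical options mirroring the "depth/edge-kind filters" wording.
interface TraverseOptions { maxDepth: number; edgeKinds?: string[]; }

// Breadth-first walk from `startId`, following edges from source to target.
// Returns each reached node id with the depth at which it was first seen.
function traverse(edges: Edge[], startId: string, opts: TraverseOptions): Map<string, number> {
  const allowed = opts.edgeKinds ? new Set<string>(opts.edgeKinds) : null;
  const depthById = new Map<string, number>([[startId, 0]]);
  let frontier: string[] = [startId];

  for (let depth = 1; depth <= opts.maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const e of edges) {
        if (e.source_id !== id) continue;
        if (allowed && !allowed.has(e.kind)) continue; // edge-kind filter
        if (depthById.has(e.target_id)) continue; // already visited; also guards against cycles
        depthById.set(e.target_id, depth);
        next.push(e.target_id);
      }
    }
    frontier = next;
  }
  return depthById;
}
```

A recursive CTE expresses the same idea declaratively: the base case selects the start node, and the recursive step joins `edges` on the current frontier while a depth counter enforces the limit.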
116
100
 
117
- ### Preset Attack Patterns
101
+ ### Attack Path Presets
118
102
 
119
103
  | Pattern | Description |
120
104
  |---------|-------------|
121
- | `reachable_services` | Open services reachable on each host |
122
- | `authenticated_access` | Services with known credentials |
123
- | `exploitable_endpoints` | Endpoints with confirmed vulnerabilities |
124
- | `critical_vulns` | Critical and high severity vulnerabilities |
125
- | `attack_surface` | Full attack surface overview |
126
- | `unfuzzed_inputs` | Inputs with observations but no vulnerabilities found yet |
105
+ | `attack_surface` | Complete paths from each host to its endpoints and inputs |
106
+ | `critical_vulns` | Critical and high severity vulnerabilities, with their host and service |
107
+ | `credential_exposure` | Service credential mappings |
108
+ | `unscanned_services` | Services with no endpoints discovered |
109
+ | `vuln_by_host` | Vulnerability count by host |
110
+ | `reachable_services` | All services reachable from a host |
127
111
 
128
- ### Custom Rules
112
+ ### MCP Resources (4)
129
113
 
130
- LLM agents can write and execute custom Datalog rules via the `run_datalog` MCP tool. Rules can be saved to the database with a `generated_by` field (`human` or `ai`) for future reuse.
131
-
132
- ```
133
- % Example: Find all HTTP services with SQL injection vulnerabilities
134
- sqli_service(HostId, ServiceId, Title) :-
135
- service(HostId, ServiceId, "tcp", Port, "http", "open"),
136
- vulnerability(ServiceId, VulnId, "sqli", Title, Severity, Confidence).
137
- ?- sqli_service(HostId, ServiceId, Title).
138
- ```
114
+ | URI | Description |
115
+ |-----|-------------|
116
+ | `sonobat://nodes` | Node list (optionally filter by kind) |
117
+ | `sonobat://nodes/{id}` | Node detail with edges and neighbors |
118
+ | `sonobat://summary` | Overall statistics |
119
+ | `sonobat://techniques/categories` | Knowledge base categories |
139
120
 
140
121
  ## Propose Engine
141
122
 
@@ -151,6 +132,25 @@ The proposer analyzes missing data in the attack graph and suggests next actions
151
132
  | HTTP service has no vhosts | `vhost_discovery` | Virtual host enumeration |
152
133
  | HTTP service has no vulnerability scan | `nuclei_scan` | Run vulnerability scanner |
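The gap-driven idea behind the two rows shown above can be sketched as a list of "missing data" rules evaluated per service. The `HttpServiceFacts` shape and its field names are hypothetical; only the action names and conditions come from the table.

```typescript
// Hypothetical summary of what is already known about one HTTP service.
interface HttpServiceFacts {
  serviceId: string;
  vhostCount: number;
  vulnScanDone: boolean;
}

interface Proposal { serviceId: string; action: string; reason: string; }

interface GapRule {
  action: string;
  reason: string;
  isMissing: (s: HttpServiceFacts) => boolean;
}

// Each rule pairs a "missing data" test with the action named in the table above.
const RULES: GapRule[] = [
  {
    action: "vhost_discovery",
    reason: "HTTP service has no vhosts",
    isMissing: (s) => s.vhostCount === 0,
  },
  {
    action: "nuclei_scan",
    reason: "HTTP service has no vulnerability scan",
    isMissing: (s) => !s.vulnScanDone,
  },
];

// Emits one proposal per (service, rule) pair whose gap condition holds.
function propose(services: HttpServiceFacts[]): Proposal[] {
  const proposals: Proposal[] = [];
  for (const s of services) {
    for (const rule of RULES) {
      if (rule.isMissing(s)) {
        proposals.push({ serviceId: s.serviceId, action: rule.action, reason: rule.reason });
      }
    }
  }
  return proposals;
}
```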
153
134
 
135
+ ## Knowledge Base (HackTricks)
136
+
137
+ sonobat includes a built-in knowledge base powered by [HackTricks](https://github.com/HackTricks-wiki/hacktricks). When `index_kb` is called without a path, it automatically:
138
+
139
+ 1. **Clones** HackTricks to `~/.sonobat/data/hacktricks/` (first run)
140
+ 2. **Pulls** latest changes (subsequent runs)
141
+ 3. **Incrementally indexes** only new/changed files using file mtime comparison
142
+
143
+ As a result, users who install via `npm install -g sonobat` get the full knowledge base with a single `index_kb` call, with no manual `git clone` required.
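The incremental step (step 3 above) might look roughly like the following. This is a sketch of mtime comparison in general, not the package's implementation: the function name and the map-based inputs are assumptions, chosen so the logic can be shown without touching the filesystem.

```typescript
// Path -> modification time in milliseconds (the kind of value a file stat reports).
type MtimeIndex = Map<string, number>;

// Compares the files currently on disk with the mtimes recorded at the last index run
// and returns only the paths that are new or whose mtime has changed.
function filesToReindex(onDisk: MtimeIndex, lastIndexed: MtimeIndex): string[] {
  const changed: string[] = [];
  onDisk.forEach((mtime, path) => {
    const previous = lastIndexed.get(path);
    if (previous === undefined || previous !== mtime) {
      changed.push(path);
    }
  });
  return changed;
}
```

Unchanged files are skipped entirely, so a re-run after a small `git pull` only touches the handful of documents that moved.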
144
+
145
+ | Parameter | Default | Description |
146
+ |-----------|---------|-------------|
147
+ | `path` | `~/.sonobat/data/hacktricks/` | Custom path to a HackTricks directory |
148
+ | `update` | `true` | Set to `false` to skip git pull before indexing |
149
+
150
+ The data directory can be overridden with the `SONOBAT_DATA_DIR` environment variable.
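One plausible way to resolve the clone location from `SONOBAT_DATA_DIR` is sketched below. It assumes the `hacktricks/` subdirectory is kept under whatever root the variable points to and that paths use POSIX separators. The function name is hypothetical, and the environment and home directory are passed in as arguments so the logic stays pure.

```typescript
// Resolves the HackTricks clone directory.
// `env` and `homeDir` are parameters here; in Node they would typically come from
// process.env and os.homedir().
function resolveHackTricksDir(env: Record<string, string | undefined>, homeDir: string): string {
  const override = env["SONOBAT_DATA_DIR"];
  // Fall back to the documented default root, ~/.sonobat/data/.
  const dataDir = override && override.length > 0 ? override : `${homeDir}/.sonobat/data`;
  // Strip trailing slashes so the join below does not produce "//".
  return `${dataDir.replace(/\/+$/, "")}/hacktricks`;
}
```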
151
+
152
+ ## Configuration
153
+
154
154
  ### Claude Desktop
155
155
 
156
156
  Add to `claude_desktop_config.json`:
@@ -198,6 +198,7 @@ npx @modelcontextprotocol/inspector npx tsx src/index.ts
198
198
  | Variable | Default | Description |
199
199
  |----------|---------|-------------|
200
200
  | `SONOBAT_DB_PATH` | `sonobat.db` | Path to the SQLite database file |
201
+ | `SONOBAT_DATA_DIR` | `~/.sonobat/data/` | Root data directory for auto-cloned repositories |
201
202
 
202
203
  ## Tech Stack
203
204