sonobat 0.1.0 → 0.2.0

package/README.md CHANGED
@@ -1,15 +1,18 @@
1
1
  # sonobat
2
2
 
3
+ [![CI](https://github.com/0x6d61/sonobat/actions/workflows/ci.yml/badge.svg)](https://github.com/0x6d61/sonobat/actions/workflows/ci.yml)
4
+
3
5
  **AttackDataGraph for autonomous penetration testing.**
4
6
 
5
- sonobat is a normalized data store that ingests tool outputs (nmap, ffuf, nuclei), builds a structured attack graph, and proposes next-step actions based on missing data. It exposes an [MCP Server](https://modelcontextprotocol.io/) so that LLM agents can drive the entire reconnaissance-to-exploitation loop autonomously.
7
+ sonobat is a normalized data store that ingests tool outputs (nmap, ffuf, nuclei), builds a structured attack graph, and proposes next-step actions based on missing data. It includes a built-in **Datalog inference engine** for attack path analysis and exposes an [MCP Server](https://modelcontextprotocol.io/) so that LLM agents can drive the entire reconnaissance-to-exploitation loop autonomously.
6
8
 
7
9
  ## Features
8
10
 
9
11
  - **Ingest** — Parse nmap XML, ffuf JSON, and nuclei JSONL into a normalized SQLite graph
10
12
  - **Normalize** — Deduplicate and link hosts, services, endpoints, inputs, observations, credentials, and vulnerabilities
11
13
  - **Propose** — Gap-driven engine suggests what to scan next based on missing data
12
- - **MCP Server** — 14 tools + 3 resources accessible via stdio for LLM agents (Claude Desktop, Claude Code, etc.)
14
+ - **Datalog Inference** — Built-in Datalog engine for attack path analysis with preset and custom rules
15
+ - **MCP Server** — 17 tools + 3 resources accessible via stdio for LLM agents (Claude Desktop, Claude Code, etc.)
13
16
 
14
17
  ## Data Model
15
18
 
@@ -53,7 +56,7 @@ npm test
53
56
 
54
57
  ## MCP Server
55
58
 
56
- sonobat runs as an MCP server over stdio. LLM agents connect to it and use tools to ingest data, query the graph, and get next-step proposals.
59
+ sonobat runs as an MCP server over stdio. LLM agents connect to it and use tools to ingest data, query the graph, run Datalog inference, and get next-step proposals.
57
60
 
58
61
  ### Available Tools
59
62
 
@@ -73,6 +76,9 @@ sonobat runs as an MCP server over stdio. LLM agents connect to it and use tools
73
76
  | | `add_credential` | Add a credential for a service |
74
77
  | | `add_vulnerability` | Add a vulnerability for a service |
75
78
  | | `link_cve` | Link a CVE record to a vulnerability |
79
+ | **Datalog** | `list_facts` | Show database contents as Datalog facts |
80
+ | | `run_datalog` | Execute a custom Datalog program against the database |
81
+ | | `query_attack_paths` | Run preset or saved attack pattern analysis |
76
82
 
77
83
  ### MCP Resources
78
84
 
@@ -82,6 +88,69 @@ sonobat runs as an MCP server over stdio. LLM agents connect to it and use tools
82
88
  | `sonobat://hosts/{id}` | Host detail with full service tree |
83
89
  | `sonobat://summary` | Overall statistics |
84
90
 
91
+ ## Datalog Inference Engine
92
+
93
+ sonobat includes a built-in Datalog inference engine that enables attack path analysis by reasoning over the normalized database.
94
+
95
+ ### How It Works
96
+
97
+ 1. **Fact Extraction** — Database rows are automatically converted to Datalog facts (e.g., `host("h-001", "10.0.0.1", "IP")`)
98
+ 2. **Rule Evaluation** — Naive bottom-up evaluator with fixed-point iteration derives new facts from rules
99
+ 3. **Query Answering** — Queries return matching tuples with variable bindings
100
+
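Editorial note: the three steps above can be sketched in TypeScript as follows. This is a minimal illustration of naive bottom-up evaluation with fixed-point iteration, not sonobat's actual implementation. The type names, the helpers (`unify`, `instantiate`, `evaluate`), and the sample fact values are all hypothetical. Only the predicate shapes and the `sqli_service` rule come from the README.

```typescript
// Illustrative sketch only: these types and names are hypothetical,
// not sonobat's actual internals.
type Value = string | number;
type Term = { kind: "var"; name: string } | { kind: "const"; value: Value };
type Atom = { pred: string; args: Term[] };
type Rule = { head: Atom; body: Atom[] };
type Fact = { pred: string; args: Value[] };
type Bindings = Map<string, Value>;

const v = (name: string): Term => ({ kind: "var", name });
const c = (value: Value): Term => ({ kind: "const", value });

// Match one body atom against one fact, extending the bindings or failing.
function unify(atom: Atom, fact: Fact, env: Bindings): Bindings | null {
  if (atom.pred !== fact.pred || atom.args.length !== fact.args.length) return null;
  const next = new Map(env);
  for (let i = 0; i < atom.args.length; i++) {
    const term = atom.args[i];
    const value = fact.args[i];
    if (term.kind === "const") {
      if (term.value !== value) return null;
    } else if (next.has(term.name)) {
      if (next.get(term.name) !== value) return null;
    } else {
      next.set(term.name, value);
    }
  }
  return next;
}

// Build a ground fact from a rule head; head variables must be bound by the body.
function instantiate(head: Atom, env: Bindings): Fact {
  const args = head.args.map((t) => {
    if (t.kind === "const") return t.value;
    const bound = env.get(t.name);
    if (bound === undefined) throw new Error(`unbound head variable ${t.name}`);
    return bound;
  });
  return { pred: head.pred, args };
}

// Naive evaluation: re-apply every rule to all known facts until nothing new appears.
function evaluate(facts: Fact[], rules: Rule[]): Fact[] {
  const key = (f: Fact): string => JSON.stringify([f.pred, f.args]);
  const known = new Map<string, Fact>();
  for (const f of facts) known.set(key(f), f);

  let changed = true;
  while (changed) {
    changed = false;
    const snapshot = Array.from(known.values());
    for (const rule of rules) {
      // Join the body atoms left to right, carrying variable bindings along.
      let envs: Bindings[] = [new Map<string, Value>()];
      for (const atom of rule.body) {
        const extended: Bindings[] = [];
        for (const env of envs) {
          for (const fact of snapshot) {
            const next = unify(atom, fact, env);
            if (next !== null) extended.push(next);
          }
        }
        envs = extended;
      }
      for (const env of envs) {
        const derived = instantiate(rule.head, env);
        if (!known.has(key(derived))) {
          known.set(key(derived), derived);
          changed = true;
        }
      }
    }
  }
  return Array.from(known.values());
}

// Sample facts with made-up values, shaped like the service/vulnerability predicates.
const facts: Fact[] = [
  { pred: "service", args: ["h-001", "s-001", "tcp", 80, "http", "open"] },
  { pred: "service", args: ["h-001", "s-002", "tcp", 22, "ssh", "open"] },
  { pred: "vulnerability", args: ["s-001", "v-001", "sqli", "Login SQLi", "high", "high"] },
];

// The sqli_service rule from the README's Custom Rules example.
const sqliRule: Rule = {
  head: { pred: "sqli_service", args: [v("HostId"), v("ServiceId"), v("Title")] },
  body: [
    { pred: "service", args: [v("HostId"), v("ServiceId"), c("tcp"), v("Port"), c("http"), c("open")] },
    { pred: "vulnerability", args: [v("ServiceId"), v("VulnId"), c("sqli"), v("Title"), v("Severity"), v("Confidence")] },
  ],
};

const sqliServices = evaluate(facts, [sqliRule]).filter((f) => f.pred === "sqli_service");
console.log(sqliServices);
```

The loop stops once a full pass over the rules derives nothing new, which is the fixed point. A naive evaluator rescans every known fact on each pass. That is simple to reason about but does redundant work compared with semi-naive evaluation.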
101
+ ### Available Predicates
102
+
103
+ | Predicate | Arity | Source Table |
104
+ |-----------|-------|-------------|
105
+ | `host(Id, Authority, Kind)` | 3 | hosts |
106
+ | `service(HostId, Id, Transport, Port, AppProto, State)` | 6 | services |
107
+ | `http_endpoint(ServiceId, Id, Method, Path, StatusCode)` | 5 | http_endpoints |
108
+ | `input(ServiceId, Id, Location, Name)` | 4 | inputs |
109
+ | `endpoint_input(EndpointId, InputId)` | 2 | endpoint_inputs |
110
+ | `observation(InputId, Id, RawValue, Source, Confidence)` | 5 | observations |
111
+ | `credential(ServiceId, Id, Username, SecretType, Source, Confidence)` | 6 | credentials |
112
+ | `vulnerability(ServiceId, Id, VulnType, Title, Severity, Confidence)` | 6 | vulnerabilities |
113
+ | `vulnerability_endpoint(VulnId, EndpointId)` | 2 | vulnerabilities |
114
+ | `cve(VulnId, CveId, CvssScore)` | 3 | cves |
115
+ | `vhost(HostId, Id, Hostname, Source)` | 4 | vhosts |
116
+
117
+ ### Preset Attack Patterns
118
+
119
+ | Pattern | Description |
120
+ |---------|-------------|
121
+ | `reachable_services` | Open services reachable on each host |
122
+ | `authenticated_access` | Services with known credentials |
123
+ | `exploitable_endpoints` | Endpoints with confirmed vulnerabilities |
124
+ | `critical_vulns` | Critical and high severity vulnerabilities |
125
+ | `attack_surface` | Full attack surface overview |
126
+ | `unfuzzed_inputs` | Inputs with observations but no vulnerabilities found yet |
127
+
128
+ ### Custom Rules
129
+
130
+ LLM agents can write and execute custom Datalog rules via the `run_datalog` MCP tool. Rules can be saved to the database with a `generated_by` field (`human` or `ai`) for future reuse.
131
+
132
+ ```
133
+ % Example: Find all HTTP services with SQL injection vulnerabilities
134
+ sqli_service(HostId, ServiceId, Title) :-
135
+ service(HostId, ServiceId, "tcp", Port, "http", "open"),
136
+ vulnerability(ServiceId, VulnId, "sqli", Title, Severity, Confidence).
137
+ ?- sqli_service(HostId, ServiceId, Title).
138
+ ```
139
+
140
+ ## Propose Engine
141
+
142
+ The proposer analyzes missing data in the attack graph and suggests next actions:
143
+
144
+ | Missing Data Pattern | Proposed Action | Description |
145
+ |---------------------|----------------|-------------|
146
+ | Host has no services | `nmap_scan` | Port scan the host |
147
+ | HTTP service has no endpoints | `ffuf_discovery` | Directory/file discovery |
148
+ | Endpoint has no inputs | `parameter_discovery` | Find input parameters |
149
+ | Input has no observations | `value_collection` | Collect parameter values |
150
+ | Input has observations but no vulnerabilities | `value_fuzz` | Fuzz the parameter with attack payloads |
151
+ | HTTP service has no vhosts | `vhost_discovery` | Virtual host enumeration |
152
+ | HTTP service has no vulnerability scan | `nuclei_scan` | Run vulnerability scanner |
153
+
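Editorial note: the gap table above can be read as a set of checks over the graph. The TypeScript sketch below is one possible way to express them, not sonobat's proposer. The node shapes, the `vulnScanDone` flag, and the `propose` function are hypothetical, and nesting inputs under endpoints is a simplification. Only the action names come from the README.

```typescript
// Hypothetical shapes for illustration; sonobat's real schema may differ.
interface InputNode { name: string; observations: number; vulnerabilities: number }
interface EndpointNode { path: string; inputs: InputNode[] }
interface ServiceNode {
  id: string;
  appProto: string;
  endpoints: EndpointNode[];
  vhosts: string[];
  vulnScanDone: boolean; // assumed flag: has a vulnerability scan been recorded?
}
interface HostNode { authority: string; services: ServiceNode[] }
interface Proposal { action: string; target: string }

// Walk one host and emit a proposal for each gap listed in the table.
function propose(host: HostNode): Proposal[] {
  const out: Proposal[] = [];
  if (host.services.length === 0) {
    out.push({ action: "nmap_scan", target: host.authority });
    return out;
  }
  for (const svc of host.services) {
    if (svc.appProto !== "http") continue; // the remaining gaps apply to HTTP services
    if (svc.endpoints.length === 0) out.push({ action: "ffuf_discovery", target: svc.id });
    if (svc.vhosts.length === 0) out.push({ action: "vhost_discovery", target: svc.id });
    if (!svc.vulnScanDone) out.push({ action: "nuclei_scan", target: svc.id });
    for (const ep of svc.endpoints) {
      if (ep.inputs.length === 0) {
        out.push({ action: "parameter_discovery", target: `${svc.id} ${ep.path}` });
        continue;
      }
      for (const input of ep.inputs) {
        const target = `${svc.id} ${ep.path} ${input.name}`;
        if (input.observations === 0) {
          out.push({ action: "value_collection", target });
        } else if (input.vulnerabilities === 0) {
          out.push({ action: "value_fuzz", target });
        }
      }
    }
  }
  return out;
}

// Made-up example host: one scanned HTTP service with two endpoints and no vhosts.
const exampleHost: HostNode = {
  authority: "10.0.0.1",
  services: [
    {
      id: "s-001",
      appProto: "http",
      vhosts: [],
      vulnScanDone: true,
      endpoints: [
        { path: "/login", inputs: [{ name: "user", observations: 2, vulnerabilities: 0 }] },
        { path: "/health", inputs: [] },
      ],
    },
  ],
};

const proposedActions = propose(exampleHost).map((p) => p.action);
console.log(proposedActions);
```

Each check corresponds to one row of the table, so adding a new gap pattern means adding one condition and one action name.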
85
154
  ### Claude Desktop
86
155
 
87
156
  Add to `claude_desktop_config.json`:
@@ -142,6 +211,8 @@ npx @modelcontextprotocol/inspector npx tsx src/index.ts
142
211
  | Validation | Zod |
143
212
  | Build | tsup (esbuild) |
144
213
  | Test | Vitest |
214
+ | Linter | ESLint + @typescript-eslint |
215
+ | Formatter | Prettier |
145
216
 
146
217
  ## Development
147
218
 
@@ -151,7 +222,9 @@ npm test # Run all tests
151
222
  npm run test:watch # Watch mode
152
223
  npm run test:coverage # Coverage report
153
224
  npm run lint # ESLint
225
+ npm run lint:fix # ESLint with auto-fix
154
226
  npm run format # Prettier
227
+ npm run format:check # Prettier check
155
228
  npm run typecheck # tsc --noEmit
156
229
  npm run build # Production build
157
230
  ```