voidaccess-1.3.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- voidaccess-1.3.0/LICENSE +21 -0
- voidaccess-1.3.0/PKG-INFO +395 -0
- voidaccess-1.3.0/README.md +349 -0
- voidaccess-1.3.0/analysis/__init__.py +49 -0
- voidaccess-1.3.0/analysis/opsec.py +454 -0
- voidaccess-1.3.0/analysis/patterns.py +202 -0
- voidaccess-1.3.0/analysis/temporal.py +201 -0
- voidaccess-1.3.0/api/__init__.py +1 -0
- voidaccess-1.3.0/api/auth.py +163 -0
- voidaccess-1.3.0/api/main.py +509 -0
- voidaccess-1.3.0/api/routes/__init__.py +1 -0
- voidaccess-1.3.0/api/routes/admin.py +214 -0
- voidaccess-1.3.0/api/routes/auth.py +157 -0
- voidaccess-1.3.0/api/routes/entities.py +871 -0
- voidaccess-1.3.0/api/routes/export.py +359 -0
- voidaccess-1.3.0/api/routes/investigations.py +2567 -0
- voidaccess-1.3.0/api/routes/monitors.py +405 -0
- voidaccess-1.3.0/api/routes/search.py +157 -0
- voidaccess-1.3.0/api/routes/settings.py +851 -0
- voidaccess-1.3.0/auth/__init__.py +1 -0
- voidaccess-1.3.0/auth/token_blacklist.py +108 -0
- voidaccess-1.3.0/cli/__init__.py +3 -0
- voidaccess-1.3.0/cli/adapters/__init__.py +1 -0
- voidaccess-1.3.0/cli/adapters/sqlite.py +273 -0
- voidaccess-1.3.0/cli/browser.py +376 -0
- voidaccess-1.3.0/cli/commands/__init__.py +1 -0
- voidaccess-1.3.0/cli/commands/configure.py +185 -0
- voidaccess-1.3.0/cli/commands/enrich.py +154 -0
- voidaccess-1.3.0/cli/commands/export.py +158 -0
- voidaccess-1.3.0/cli/commands/investigate.py +601 -0
- voidaccess-1.3.0/cli/commands/show.py +87 -0
- voidaccess-1.3.0/cli/config.py +180 -0
- voidaccess-1.3.0/cli/display.py +212 -0
- voidaccess-1.3.0/cli/main.py +154 -0
- voidaccess-1.3.0/cli/tor_detect.py +71 -0
- voidaccess-1.3.0/config.py +180 -0
- voidaccess-1.3.0/crawler/__init__.py +28 -0
- voidaccess-1.3.0/crawler/dedup.py +97 -0
- voidaccess-1.3.0/crawler/frontier.py +115 -0
- voidaccess-1.3.0/crawler/spider.py +462 -0
- voidaccess-1.3.0/crawler/utils.py +122 -0
- voidaccess-1.3.0/db/__init__.py +47 -0
- voidaccess-1.3.0/db/migrations/__init__.py +0 -0
- voidaccess-1.3.0/db/migrations/env.py +80 -0
- voidaccess-1.3.0/db/migrations/versions/0001_initial_schema.py +270 -0
- voidaccess-1.3.0/db/migrations/versions/0002_add_investigation_status_column.py +27 -0
- voidaccess-1.3.0/db/migrations/versions/0002_add_missing_tables.py +33 -0
- voidaccess-1.3.0/db/migrations/versions/0003_add_canonical_value_and_entity_links.py +61 -0
- voidaccess-1.3.0/db/migrations/versions/0004_add_page_posted_at.py +41 -0
- voidaccess-1.3.0/db/migrations/versions/0005_add_extraction_method.py +32 -0
- voidaccess-1.3.0/db/migrations/versions/0006_add_monitor_alerts.py +26 -0
- voidaccess-1.3.0/db/migrations/versions/0007_add_actor_style_profiles.py +23 -0
- voidaccess-1.3.0/db/migrations/versions/0008_add_users_table.py +47 -0
- voidaccess-1.3.0/db/migrations/versions/0009_add_investigation_id_to_relationships.py +29 -0
- voidaccess-1.3.0/db/migrations/versions/0010_add_composite_index_entity_relationships.py +22 -0
- voidaccess-1.3.0/db/migrations/versions/0011_add_page_extraction_cache.py +52 -0
- voidaccess-1.3.0/db/migrations/versions/0013_add_graph_status.py +31 -0
- voidaccess-1.3.0/db/migrations/versions/0015_add_progress_fields.py +41 -0
- voidaccess-1.3.0/db/migrations/versions/0016_backfill_graph_status.py +33 -0
- voidaccess-1.3.0/db/migrations/versions/0017_add_user_api_keys.py +44 -0
- voidaccess-1.3.0/db/migrations/versions/0018_add_user_id_to_investigations.py +33 -0
- voidaccess-1.3.0/db/migrations/versions/0019_add_content_safety_log.py +46 -0
- voidaccess-1.3.0/db/migrations/versions/0020_add_entity_source_tracking.py +50 -0
- voidaccess-1.3.0/db/models.py +618 -0
- voidaccess-1.3.0/db/queries.py +841 -0
- voidaccess-1.3.0/db/session.py +270 -0
- voidaccess-1.3.0/export/__init__.py +34 -0
- voidaccess-1.3.0/export/misp.py +257 -0
- voidaccess-1.3.0/export/sigma.py +342 -0
- voidaccess-1.3.0/export/stix.py +418 -0
- voidaccess-1.3.0/extractor/__init__.py +21 -0
- voidaccess-1.3.0/extractor/llm_extract.py +372 -0
- voidaccess-1.3.0/extractor/ner.py +512 -0
- voidaccess-1.3.0/extractor/normalizer.py +638 -0
- voidaccess-1.3.0/extractor/pipeline.py +401 -0
- voidaccess-1.3.0/extractor/regex_patterns.py +325 -0
- voidaccess-1.3.0/fingerprint/__init__.py +33 -0
- voidaccess-1.3.0/fingerprint/profiler.py +240 -0
- voidaccess-1.3.0/fingerprint/stylometry.py +249 -0
- voidaccess-1.3.0/graph/__init__.py +73 -0
- voidaccess-1.3.0/graph/builder.py +894 -0
- voidaccess-1.3.0/graph/export.py +225 -0
- voidaccess-1.3.0/graph/model.py +83 -0
- voidaccess-1.3.0/graph/queries.py +297 -0
- voidaccess-1.3.0/graph/visualize.py +178 -0
- voidaccess-1.3.0/i18n/__init__.py +24 -0
- voidaccess-1.3.0/i18n/detect.py +76 -0
- voidaccess-1.3.0/i18n/query_expand.py +72 -0
- voidaccess-1.3.0/i18n/translate.py +210 -0
- voidaccess-1.3.0/monitor/__init__.py +27 -0
- voidaccess-1.3.0/monitor/_db.py +74 -0
- voidaccess-1.3.0/monitor/alerts.py +345 -0
- voidaccess-1.3.0/monitor/config.py +118 -0
- voidaccess-1.3.0/monitor/diff.py +75 -0
- voidaccess-1.3.0/monitor/jobs.py +247 -0
- voidaccess-1.3.0/monitor/scheduler.py +184 -0
- voidaccess-1.3.0/pyproject.toml +93 -0
- voidaccess-1.3.0/scraper/__init__.py +0 -0
- voidaccess-1.3.0/scraper/scrape.py +857 -0
- voidaccess-1.3.0/scraper/scrape_js.py +272 -0
- voidaccess-1.3.0/search/__init__.py +318 -0
- voidaccess-1.3.0/search/circuit_breaker.py +240 -0
- voidaccess-1.3.0/search/search.py +334 -0
- voidaccess-1.3.0/setup.cfg +4 -0
- voidaccess-1.3.0/sources/__init__.py +96 -0
- voidaccess-1.3.0/sources/blockchain.py +444 -0
- voidaccess-1.3.0/sources/cache.py +93 -0
- voidaccess-1.3.0/sources/cisa.py +108 -0
- voidaccess-1.3.0/sources/dns_enrichment.py +557 -0
- voidaccess-1.3.0/sources/domain_reputation.py +643 -0
- voidaccess-1.3.0/sources/email_reputation.py +635 -0
- voidaccess-1.3.0/sources/engines.py +244 -0
- voidaccess-1.3.0/sources/enrichment.py +1244 -0
- voidaccess-1.3.0/sources/github_scraper.py +589 -0
- voidaccess-1.3.0/sources/gitlab_scraper.py +624 -0
- voidaccess-1.3.0/sources/hash_reputation.py +856 -0
- voidaccess-1.3.0/sources/historical_intel.py +253 -0
- voidaccess-1.3.0/sources/ip_reputation.py +521 -0
- voidaccess-1.3.0/sources/paste_scraper.py +484 -0
- voidaccess-1.3.0/sources/pastes.py +278 -0
- voidaccess-1.3.0/sources/rss_scraper.py +576 -0
- voidaccess-1.3.0/sources/seed_manager.py +373 -0
- voidaccess-1.3.0/sources/seeds.py +368 -0
- voidaccess-1.3.0/sources/shodan.py +103 -0
- voidaccess-1.3.0/sources/telegram.py +199 -0
- voidaccess-1.3.0/sources/virustotal.py +113 -0
- voidaccess-1.3.0/tests/test_analysis_opsec.py +34 -0
- voidaccess-1.3.0/tests/test_analysis_stylometry.py +50 -0
- voidaccess-1.3.0/tests/test_analysis_temporal.py +69 -0
- voidaccess-1.3.0/tests/test_api.py +588 -0
- voidaccess-1.3.0/tests/test_api_monitors.py +127 -0
- voidaccess-1.3.0/tests/test_blockchain.py +59 -0
- voidaccess-1.3.0/tests/test_config.py +57 -0
- voidaccess-1.3.0/tests/test_crawler.py +936 -0
- voidaccess-1.3.0/tests/test_db.py +855 -0
- voidaccess-1.3.0/tests/test_dns_enrichment.py +269 -0
- voidaccess-1.3.0/tests/test_domain_reputation.py +547 -0
- voidaccess-1.3.0/tests/test_email_reputation.py +552 -0
- voidaccess-1.3.0/tests/test_fingerprint.py +393 -0
- voidaccess-1.3.0/tests/test_github_scraper.py +270 -0
- voidaccess-1.3.0/tests/test_gitlab_scraper.py +336 -0
- voidaccess-1.3.0/tests/test_graph.py +1027 -0
- voidaccess-1.3.0/tests/test_hash_reputation.py +735 -0
- voidaccess-1.3.0/tests/test_i18n.py +320 -0
- voidaccess-1.3.0/tests/test_ip_reputation.py +448 -0
- voidaccess-1.3.0/tests/test_llm.py +138 -0
- voidaccess-1.3.0/tests/test_llm_utils.py +155 -0
- voidaccess-1.3.0/tests/test_model_singleton.py +73 -0
- voidaccess-1.3.0/tests/test_monitor.py +339 -0
- voidaccess-1.3.0/tests/test_pagination.py +100 -0
- voidaccess-1.3.0/tests/test_paste_scraper.py +245 -0
- voidaccess-1.3.0/tests/test_rss_scraper.py +359 -0
- voidaccess-1.3.0/tests/test_scrape_js.py +80 -0
- voidaccess-1.3.0/tests/test_settings.py +295 -0
- voidaccess-1.3.0/tests/test_sources.py +894 -0
- voidaccess-1.3.0/tests/test_sources_enrichment_new.py +380 -0
- voidaccess-1.3.0/tests/test_vector.py +226 -0
- voidaccess-1.3.0/utils/__init__.py +0 -0
- voidaccess-1.3.0/utils/async_utils.py +89 -0
- voidaccess-1.3.0/utils/content_safety.py +193 -0
- voidaccess-1.3.0/utils/defang.py +94 -0
- voidaccess-1.3.0/utils/encryption.py +34 -0
- voidaccess-1.3.0/utils/ioc_freshness.py +124 -0
- voidaccess-1.3.0/utils/user_keys.py +33 -0
- voidaccess-1.3.0/vector/__init__.py +39 -0
- voidaccess-1.3.0/vector/embedder.py +100 -0
- voidaccess-1.3.0/vector/model_singleton.py +49 -0
- voidaccess-1.3.0/vector/search.py +87 -0
- voidaccess-1.3.0/vector/store.py +514 -0
- voidaccess-1.3.0/voidaccess/__init__.py +0 -0
- voidaccess-1.3.0/voidaccess/llm.py +717 -0
- voidaccess-1.3.0/voidaccess/llm_utils.py +696 -0
- voidaccess-1.3.0/voidaccess.egg-info/PKG-INFO +395 -0
- voidaccess-1.3.0/voidaccess.egg-info/SOURCES.txt +176 -0
- voidaccess-1.3.0/voidaccess.egg-info/dependency_links.txt +1 -0
- voidaccess-1.3.0/voidaccess.egg-info/entry_points.txt +2 -0
- voidaccess-1.3.0/voidaccess.egg-info/requires.txt +26 -0
- voidaccess-1.3.0/voidaccess.egg-info/top_level.txt +19 -0
voidaccess-1.3.0/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 VoidAccess Contributors
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
voidaccess-1.3.0/PKG-INFO
ADDED
|
@@ -0,0 +1,395 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: voidaccess
|
|
3
|
+
Version: 1.3.0
|
|
4
|
+
Summary: Dark web OSINT CLI — automated threat intelligence from query to report
|
|
5
|
+
Author: VoidAccess
|
|
6
|
+
License: MIT
|
|
7
|
+
Keywords: osint,darkweb,threat-intelligence,tor,cli
|
|
8
|
+
Classifier: Development Status :: 4 - Beta
|
|
9
|
+
Classifier: Environment :: Console
|
|
10
|
+
Classifier: Intended Audience :: Information Technology
|
|
11
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
12
|
+
Classifier: Operating System :: OS Independent
|
|
13
|
+
Classifier: Programming Language :: Python :: 3
|
|
14
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
16
|
+
Classifier: Topic :: Security
|
|
17
|
+
Requires-Python: >=3.10
|
|
18
|
+
Description-Content-Type: text/markdown
|
|
19
|
+
License-File: LICENSE
|
|
20
|
+
Requires-Dist: typer>=0.12
|
|
21
|
+
Requires-Dist: rich>=13
|
|
22
|
+
Requires-Dist: textual>=0.60
|
|
23
|
+
Requires-Dist: aiohttp>=3.9
|
|
24
|
+
Requires-Dist: aiohttp-socks>=0.8
|
|
25
|
+
Requires-Dist: sqlalchemy>=2.0
|
|
26
|
+
Requires-Dist: aiosqlite>=0.20
|
|
27
|
+
Requires-Dist: langchain>=0.2
|
|
28
|
+
Requires-Dist: langchain-openai>=0.1
|
|
29
|
+
Requires-Dist: langchain-anthropic>=0.1
|
|
30
|
+
Requires-Dist: langchain-google-genai>=1.0
|
|
31
|
+
Requires-Dist: langchain-groq>=0.1
|
|
32
|
+
Requires-Dist: python-dotenv>=1.0
|
|
33
|
+
Requires-Dist: httpx>=0.27
|
|
34
|
+
Requires-Dist: spacy>=3.7
|
|
35
|
+
Requires-Dist: beautifulsoup4>=4.12
|
|
36
|
+
Requires-Dist: feedparser>=6.0
|
|
37
|
+
Requires-Dist: python-dateutil>=2.9
|
|
38
|
+
Requires-Dist: trafilatura>=1.6
|
|
39
|
+
Requires-Dist: requests>=2.31
|
|
40
|
+
Requires-Dist: python-socks>=2.4
|
|
41
|
+
Requires-Dist: tldextract>=5.1
|
|
42
|
+
Provides-Extra: dev
|
|
43
|
+
Requires-Dist: pytest; extra == "dev"
|
|
44
|
+
Requires-Dist: pytest-asyncio; extra == "dev"
|
|
45
|
+
Dynamic: license-file
|
|
46
|
+
|
|
47
|
+
<div align="center">
|
|
48
|
+
<img src="./public/logo_circle.png" width="160" alt="VoidAccess Logo">
|
|
49
|
+
<h1>VoidAccess</h1>
|
|
50
|
+
<p><strong>A self-hosted OSINT platform for dark web threat intelligence.</strong></p>
|
|
51
|
+
<p>Automate the entire investigation workflow from query refinement to relationship mapping in 13 autonomous pipeline steps.</p>
|
|
52
|
+
</div>
|
|
53
|
+
|
|
54
|
+
---
|
|
55
|
+
|
|
56
|
+
## The OSINT Powerhouse
|
|
57
|
+
|
|
58
|
+
Commercial threat intelligence platforms often charge prohibitive annual fees for capabilities that can be run on private hardware. **VoidAccess** democratizes high-end dark web intelligence by providing an automated, end-to-end workflow:
|
|
59
|
+
|
|
60
|
+
- **Query Refinement**: Intelligent search term optimization using LLMs.
|
|
61
|
+
- **Multilingual Search**: Deep-web fan-out across English, Russian, and Chinese engines.
|
|
62
|
+
- **Entity Extraction**: Autonomous identification of wallets, IOCs, PGP keys, and more.
|
|
63
|
+
- **Relationship Mapping**: Dynamic graph generation from extracted data co-occurrence.
|
|
64
|
+
- **Structured Export**: STIX 2.1, MISP, Sigma, and CSV support.
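The relationship-mapping bullet above says the graph comes from co-occurrence of extracted data. A minimal sketch of that idea, assuming two entities are linked whenever they appear on the same page; the function name, the page keys, and the entity labels are hypothetical and not taken from the package:

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(pages: dict[str, set[str]]) -> Counter:
    """Count how often each pair of entities appears on the same page."""
    edges: Counter = Counter()
    for entities in pages.values():
        # Sort so that (a, b) and (b, a) share one undirected edge key.
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical extraction output: page id -> entities found on that page.
pages = {
    "page-1": {"actor:example_handle", "btc:wallet_A", "cve:CVE-2024-0001"},
    "page-2": {"actor:example_handle", "btc:wallet_A"},
}
edges = cooccurrence_edges(pages)
```

The edge weight (the count) gives a rough measure of how strongly two entities are associated; a real builder would likely also record which pages support each edge.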
|
|
65
|
+
|
|
66
|
+
---
|
|
67
|
+
|
|
68
|
+
## Visual Walkthrough
|
|
69
|
+
|
|
70
|
+
### 1. Intuitive Dashboard
|
|
71
|
+
Start investigations with a clean, dark-themed interface designed for high-stakes research.
|
|
72
|
+

|
|
73
|
+
|
|
74
|
+
### 2. Intelligent Scoping
|
|
75
|
+
Refine queries and select investigation depth with precision.
|
|
76
|
+

|
|
77
|
+
|
|
78
|
+
### 3. Real-time Pipeline Tracking
|
|
79
|
+
Monitor the 13-step autonomous pipeline as it crawls and extracts intelligence.
|
|
80
|
+

|
|
81
|
+
|
|
82
|
+
### 4. Interactive Graph Intelligence
|
|
83
|
+
Explore connections between entities, onion sites, and threat actors in a dynamic, high-contrast graph.
|
|
84
|
+

|
|
85
|
+
|
|
86
|
+
### 5. Comprehensive Intel Reports
|
|
87
|
+
Get structured summaries and actionable artifacts once the scan completes.
|
|
88
|
+

|
|
89
|
+
|
|
90
|
+
---
|
|
91
|
+
|
|
92
|
+
## How It Works (The 13-Step Pipeline)
|
|
93
|
+
|
|
94
|
+
VoidAccess handles the complexity of dark web research through a rigorous sequence:
|
|
95
|
+
|
|
96
|
+
1. **LLM Query Refinement**: Optimizes search terms for .onion engine indexing.
|
|
97
|
+
2. **Parallel Collection**: Queries 16+ Tor search engines in parallel, alongside paste sites (Pastebin, dpaste, paste.ee), GitHub, GitLab, and curated RSS security feeds.
|
|
98
|
+
3. **Intelligence Filtering**: LLM filters noise, keeping only relevant intelligence pages.
|
|
99
|
+
4. **Multi-Source Enrichment**: Pulls from AlienVault OTX, abuse.ch, ransomware.live, CISA KEV, Shodan, GreyNoise, AbuseIPDB, Feodo Tracker, C2IntelFeeds, and more — running in parallel with collection.
|
|
100
|
+
5. **Recursive .onion Discovery**: Discovers hidden links via seed URL crawling.
|
|
101
|
+
6. **Vector Cache Check**: Avoids redundant scraping for recently visited pages (24h TTL).
|
|
102
|
+
7. **Tor-Routed Scraping**: Safely fetches page content with a 1MB safety cap.
|
|
103
|
+
8. **Persistence**: Stores new content in the local vector cache.
|
|
104
|
+
9. **Intelligence Merging**: Combines scraped and enriched data for processing.
|
|
105
|
+
10. **Advanced Extraction**: Regex, NER, and LLM-based entity identification.
|
|
106
|
+
11. **Historical Cross-Referencing**: Validates data against seed datasets.
|
|
107
|
+
12. **Graph Construction**: Builds relationship nodes based on co-occurrence.
|
|
108
|
+
13. **Final Intelligence Summary**: LLM generates a structured technical briefing.
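Step 6 in the list above skips scraping for pages visited within the last 24 hours. The sketch below shows one way such a TTL check could work, using a plain in-memory dictionary as a stand-in for the vector cache; the class and method names are hypothetical:

```python
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24h TTL, as stated in step 6


class PageCache:
    """In-memory stand-in for a page cache with a freshness window."""

    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self._ttl = ttl
        self._clock = clock  # injectable so the TTL logic can be tested
        self._entries = {}   # url -> (stored_at, content)

    def put(self, url, content):
        self._entries[url] = (self._clock(), content)

    def get_fresh(self, url):
        """Return cached content, or None if missing or older than the TTL."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, content = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[url]  # expired: force a re-scrape
            return None
        return content
```

A caller would scrape only when `get_fresh(url)` returns `None`, then store the new content with `put`.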
|
|
109
|
+
|
|
110
|
+
---
|
|
111
|
+
|
|
112
|
+
## What It Extracts
|
|
113
|
+
|
|
114
|
+
The extraction pipeline identifies these entity types:
|
|
115
|
+
|
|
116
|
+
| Category | Examples |
|
|
117
|
+
|---|---|
|
|
118
|
+
| **Cryptocurrency** | Bitcoin, Ethereum, Monero wallet addresses |
|
|
119
|
+
| **Network Indicators** | IPv4 addresses, .onion URLs, domains, email addresses, PGP keys |
|
|
120
|
+
| **File Indicators** | MD5, SHA1, SHA256 hashes |
|
|
121
|
+
| **Vulnerabilities** | CVE numbers, MITRE ATT&CK techniques |
|
|
122
|
+
| **Threat Actors** | Actor handles, malware families, ransomware group names |
|
|
123
|
+
| **Paste Sites** | Pastebin, Ghostbin, Rentry, and similar links |
|
|
124
|
+
| **People/Orgs** | Named persons, organization names, locations |
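Several of the entity types in the table above have a regular shape that a regex pass can catch before NER or an LLM is involved. The patterns below are illustrative assumptions, not the ones shipped in `extractor/regex_patterns.py`, and they cover only a few of the listed types:

```python
import re

# Illustrative patterns only; real-world extraction needs more care
# (defanged IOCs, overlapping hash lengths, wallet checksums, and so on).
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
PATTERNS = {
    "ipv4": re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    # v3 onion addresses are 56 base32 characters followed by ".onion".
    "onion_v3": re.compile(r"\b[a-z2-7]{56}\.onion\b"),
}


def extract_entities(text):
    """Return de-duplicated, sorted matches per entity type."""
    return {name: sorted(set(p.findall(text))) for name, p in PATTERNS.items()}


sample = "Seen 198.51.100.7 exploiting CVE-2024-12345"
found = extract_entities(sample)
```

The word boundaries keep the MD5 pattern from matching inside a longer SHA256 string, which is one reason to anchor hash patterns on both sides.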
|
|
125
|
+
|
|
126
|
+
Parallel collection sources (run alongside Tor search):
|
|
127
|
+
|
|
128
|
+
- **Paste sites** — Pastebin, dpaste, paste.ee, Rentry
|
|
129
|
+
- **GitHub** — code search and repository READMEs
|
|
130
|
+
- **GitLab** — code search and project pages
|
|
131
|
+
- **RSS security feeds** — 20 curated feeds (Krebs, BleepingComputer, Talos, Mandiant, CrowdStrike, Unit 42, CISA, and more)
|
|
132
|
+
- **Curated .onion seed catalogue** — 31 vetted seeds across 8 categories, scored per query
|
|
133
|
+
|
|
134
|
+
Enrichment and quality sources (19 total):
|
|
135
|
+
|
|
136
|
+
- **AlienVault OTX** — threat pulses and malware families
|
|
137
|
+
- **MalwareBazaar** — malware samples and signatures
|
|
138
|
+
- **ThreatFox** — recent IOC feed
|
|
139
|
+
- **URLhaus** — malicious URL database
|
|
140
|
+
- **ransomware.live** — ransomware group tracking and leak-site seeds
|
|
141
|
+
- **CISA KEV** — known exploited vulnerabilities catalog
|
|
142
|
+
- **Shodan InternetDB** — passive vulnerability signatures
|
|
143
|
+
- **VirusTotal** — file hash AV detection ratio (API key required)
|
|
144
|
+
- **GreyNoise** — suppresses known benign scanner IPs from results (API key required)
|
|
145
|
+
- **AbuseIPDB** — community IP abuse reports; 1,000 checks/day free
|
|
146
|
+
- **Feodo Tracker + C2IntelFeeds** — confirmed C2 IPs for 6 major frameworks; no key required
|
|
147
|
+
- **crt.sh** — certificate transparency logs; subdomain enumeration; free
|
|
148
|
+
- **URLScan.io** — live domain scan data and malicious verdicts
|
|
149
|
+
- **Wayback Machine** — historical domain snapshots for taken-down infrastructure
|
|
150
|
+
- **Hybrid Analysis** — behavioral sandbox verdict and AV detection ratio for file hashes
|
|
151
|
+
- **HaveIBeenPwned** — breach history for email addresses (paid API key)
|
|
152
|
+
- **EmailRep** — email reputation scoring and disposable detection
|
|
153
|
+
- **CIRCL PDNS + RDAP** — passive DNS history and WHOIS registration data; free
|
|
154
|
+
- **BlockCypher + Etherscan** — blockchain wallet balance and transaction graph
|
|
155
|
+
|
|
156
|
+
Export formats:
|
|
157
|
+
|
|
158
|
+
- **STIX 2.1** — bundles with indicators, threat actors, malware objects
|
|
159
|
+
- **MISP JSON** — events with galaxies for direct import
|
|
160
|
+
- **Sigma rules** — auto-generated detection rules from extracted IOCs
|
|
161
|
+
- **CSV** — flat entity dumps for spreadsheet analysis
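To make the STIX 2.1 export format above concrete, here is a minimal sketch that builds a bundle containing one IPv4 indicator using only the standard library. It sets the properties the STIX 2.1 specification requires for an indicator; the helper names are hypothetical and this is not the code in `export/stix.py`:

```python
import json
import uuid
from datetime import datetime, timezone


def stix_timestamp(dt):
    """Format a datetime as a UTC STIX timestamp with millisecond precision."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def ipv4_indicator(ip, now):
    """Build a STIX 2.1 indicator object for a single IPv4 address."""
    ts = stix_timestamp(now)
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": ts,
        "modified": ts,
        "pattern": f"[ipv4-addr:value = '{ip}']",
        "pattern_type": "stix",
        "valid_from": ts,
    }


def make_bundle(objects):
    """Wrap STIX objects in a bundle."""
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


indicator = ipv4_indicator("198.51.100.7", datetime.now(timezone.utc))
bundle_json = json.dumps(make_bundle([indicator]), indent=2)
```

A fuller export would add threat-actor and malware objects plus relationship objects linking them, as the bullet above describes.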
|
|
162
|
+
|
|
163
|
+
---
|
|
164
|
+
|
|
165
|
+
## LLM & Enrichment Ecosystem
|
|
166
|
+
|
|
167
|
+
### Supported LLM Providers
|
|
168
|
+
|
|
169
|
+
| Provider | Models | Notes |
|
|
170
|
+
|---|---|---|
|
|
171
|
+
| **OpenRouter** | DeepSeek, Llama 3.3, Claude Haiku | Recommended default; free models available |
|
|
172
|
+
| **Groq** | Llama 3.3, Llama 3.1 | Fast inference; free tier |
|
|
173
|
+
| **OpenAI** | GPT-4o Mini | API key required |
|
|
174
|
+
| **Anthropic** | Claude Haiku | Haiku is the tested default; other models work via manual override. |
|
|
175
|
+
| **Google Gemini** | Gemini 1.5 Flash, 2.5 Pro | Free tier via AI Studio |
|
|
176
|
+
| **Ollama** | Any local model | Air-gapped; no API key needed |
|
|
177
|
+
|
|
178
|
+
The default is **DeepSeek via OpenRouter**, which is fast and strong on technical security content. With free options (Groq's free tier, OpenRouter's free models, or a local Ollama model) the LLM cost is **$0**; with paid models such as DeepSeek via OpenRouter it is **under $0.50 per investigation**. For fully air-gapped deployments, Ollama runs entirely locally.
|
|
179
|
+
|
|
180
|
+
### Optional Enrichment API Keys
|
|
181
|
+
|
|
182
|
+
All enrichment sources that require a key degrade gracefully when the key is absent — they are skipped without failing the investigation. Keys marked "free" require registration but have no cost.
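The skip-when-absent behaviour described above can be sketched as a simple check against the environment. The environment variable names come from the table below; the function name, the source labels, and the idea of a source-to-key mapping are assumptions made for illustration:

```python
import os


def configured_sources(required_keys, env=None):
    """Split sources into those that can run and those skipped for a missing key.

    required_keys maps a source label to the env var it needs,
    or to None when the source needs no key.
    """
    env = os.environ if env is None else env
    enabled, skipped = [], []
    for source, key_name in required_keys.items():
        if key_name is None or env.get(key_name):
            enabled.append(source)
        else:
            skipped.append(source)  # skipped quietly; nothing is raised
    return enabled, skipped


# Hypothetical mapping for three of the sources mentioned in this README.
REQUIRED_KEYS = {
    "virustotal": "VT_API_KEY",
    "otx": "OTX_API_KEY",
    "crt.sh": None,
}
enabled, skipped = configured_sources(REQUIRED_KEYS, env={"OTX_API_KEY": "example"})
```

Treating an empty string the same as a missing key avoids sending unauthenticated requests when `.env` contains a blank placeholder.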
|
|
183
|
+
|
|
184
|
+
| Key | What it does | Free | Sign up |
|
|
185
|
+
|---|---|---|---|
|
|
186
|
+
| `OTX_API_KEY` | AlienVault OTX threat pulses | Yes | [otx.alienvault.com](https://otx.alienvault.com) |
|
|
187
|
+
| `VT_API_KEY` | VirusTotal file hash AV detections | Yes (4 req/min) | [virustotal.com](https://www.virustotal.com) |
|
|
188
|
+
| `ABUSECH_API_KEY` | MalwareBazaar, ThreatFox, URLhaus rate limits | Yes | [abuse.ch](https://abuse.ch) |
|
|
189
|
+
| `ABUSEIPDB_API_KEY` | Community IP abuse reports; 1,000 checks/day | Yes | [abuseipdb.com/register](https://www.abuseipdb.com/register) |
|
|
190
|
+
| `GREYNOISE_API_KEY` | Suppresses known scanner/researcher IPs | Free tier available | [greynoise.io/pricing](https://www.greynoise.io/pricing) |
|
|
191
|
+
| `URLSCAN_API_KEY` | Higher rate limits for URLScan.io domain scans | Yes (public results without key) | [urlscan.io/user/signup](https://urlscan.io/user/signup) |
|
|
192
|
+
| `HYBRID_ANALYSIS_API_KEY` | Behavioral sandbox analysis for file hashes | Yes | [hybrid-analysis.com/signup](https://www.hybrid-analysis.com/signup) |
|
|
193
|
+
| `HIBP_API_KEY` | Email breach history — the most valuable email enrichment | No ($3.50/month) | [haveibeenpwned.com/API/Key](https://haveibeenpwned.com/API/Key) |
|
|
194
|
+
| `EMAILREP_API_KEY` | Email reputation scoring; increases rate limits | Yes (reduced rate without key) | [emailrep.io/key](https://emailrep.io/key) |
|
|
195
|
+
| `SECURITYTRAILS_API_KEY` | Richer DNS history for domains | Yes (50 queries/month) | [securitytrails.com/corp/api](https://securitytrails.com/corp/api) |
|
|
196
|
+
| `GITHUB_TOKEN` | Raises GitHub scraping from 10 to 30 req/min | Yes | [github.com/settings/tokens](https://github.com/settings/tokens) |
|
|
197
|
+
| `GITLAB_TOKEN` | Raises GitLab scraping from 15 to 60 req/min | Yes | [gitlab.com/-/profile/personal_access_tokens](https://gitlab.com/-/profile/personal_access_tokens) |
|
|
198
|
+
| `BLOCKCYPHER_TOKEN` | BTC/ETH wallet balance and transaction graph | Yes | [blockcypher.com](https://www.blockcypher.com) |
|
|
199
|
+
| `ETHERSCAN_API_KEY` | ETH wallet lookups | Yes | [etherscan.io/apis](https://etherscan.io/apis) |
|
|
200
|
+
|
|
201
|
+
---
|
|
202
|
+
|
|
203
|
+
## Cost Comparison
|
|
204
|
+
|
|
205
|
+
| Platform | Annual Cost | Self-Hosted | Open Source |
|
|
206
|
+
|---|---|---|---|
|
|
207
|
+
| Recorded Future | ~$25,000 | No | No |
|
|
208
|
+
| DarkOwl | ~$15,000 | No | No |
|
|
209
|
+
| Flare | ~$8,000 | No | No |
|
|
210
|
+
| **VoidAccess** | **Free** | **Yes** | **Yes** |
|
|
211
|
+
|
|
212
|
+
Free with Groq, OpenRouter free models, or Ollama. Under $0.50 per investigation with paid models like DeepSeek.
|
|
213
|
+
|
|
214
|
+
---
|
|
215
|
+
|
|
216
|
+
## What's New in v1.3
|
|
217
|
+
|
|
218
|
+
- **10 new enrichment sources**: GreyNoise (scanner suppression), AbuseIPDB, Feodo Tracker, C2IntelFeeds, crt.sh, URLScan.io, Wayback Machine, Hybrid Analysis, HaveIBeenPwned, EmailRep
|
|
219
|
+
- **4 new clearnet collection sources**: paste sites, GitHub code search, GitLab code search, and 20 curated RSS security feeds
|
|
220
|
+
- **Curated .onion seed list** — 31 seeds across 8 categories, relevance-scored per query
|
|
221
|
+
- **CIRCL passive DNS + RDAP WHOIS** — infrastructure cluster detection for IPs and domains
|
|
222
|
+
- **Investigation cancellation** — cancel a running pipeline at any checkpoint; partial results are preserved
|
|
223
|
+
- **Sources panel** — per-investigation breakdown of which sources ran and what each returned
|
|
224
|
+
- **Infrastructure clusters panel** — groups IPs and domains sharing ASN, CIDR block, or WHOIS registrant
|
|
225
|
+
- **Entity quality badges** — C2, Malicious, Breached, Disposable, Archived, Taken Down, AV ratio
|
|
226
|
+
- **GreyNoise suppression** — known benign scanner IPs are filtered from entity results automatically
|
|
227
|
+
- **MALWARE_FAMILY auto-creation** from confirmed family names returned by hash enrichment
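The infrastructure clusters item above groups IPs that share an ASN, CIDR block, or WHOIS registrant. The sketch below covers only the CIDR case, assuming a fixed /24 prefix; the function name and the prefix length are illustrative choices, not the package's actual logic:

```python
import ipaddress
from collections import defaultdict


def cluster_by_cidr(ips, prefix_len=24):
    """Group IPv4 addresses by enclosing network; keep groups with 2+ members."""
    clusters = defaultdict(list)
    for ip in ips:
        # strict=False lets a host address stand in for its network.
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        clusters[str(network)].append(ip)
    return {net: members for net, members in clusters.items() if len(members) > 1}


clusters = cluster_by_cidr(["198.51.100.7", "198.51.100.200", "203.0.113.5"])
# clusters == {"198.51.100.0/24": ["198.51.100.7", "198.51.100.200"]}
```

ASN and registrant clustering would follow the same grouping pattern, keyed on enrichment results rather than on the address itself.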
|
|
228
|
+
|
|
229
|
+
---
|
|
230
|
+
|
|
231
|
+
## Quick Start
|
|
232
|
+
|
|
233
|
+
### Prerequisites
|
|
234
|
+
- Docker and Docker Compose
|
|
235
|
+
- Python 3 (recommended): `setup.sh` uses it to generate secrets and falls back to `/dev/urandom` on Linux/macOS when it is absent; `setup.bat` on Windows may require it
|
|
236
|
+
- One LLM API key — or Ollama for fully local operation (free)
|
|
237
|
+
|
|
238
|
+
**Free LLM options (no credit card required):**
|
|
239
|
+
- [Groq](https://console.groq.com) — fast, free tier, Llama 3.3 70B
|
|
240
|
+
- [OpenRouter](https://openrouter.ai) — free models including DeepSeek and Llama 3.3
|
|
241
|
+
- [Google AI Studio](https://aistudio.google.com) — Gemini free tier
|
|
242
|
+
- [Ollama](https://ollama.ai) — fully local, no internet required
|
|
243
|
+
|
|
244
|
+
### Installation
|
|
245
|
+
|
|
246
|
+
**macOS / Linux / WSL:**
|
|
247
|
+
```bash
|
|
248
|
+
bash setup.sh
|
|
249
|
+
```
|
|
250
|
+
|
|
251
|
+
**Windows (native):**
|
|
252
|
+
```bat
|
|
253
|
+
setup.bat
|
|
254
|
+
```
|
|
255
|
+
|
|
256
|
+
The interactive wizard creates `.env`, generates `JWT_SECRET` and `POSTGRES_PASSWORD`, prompts for your LLM provider (one of: Groq, OpenRouter, Anthropic, OpenAI, Google Gemini, or Ollama), optionally collects threat-intel keys (`OTX_API_KEY`, `VT_API_KEY`), optionally enables Redis, sets the admin password, and starts the Docker stack.
|
|
257
|
+
|
|
258
|
+
<div align="center">
|
|
259
|
+
<img src="./public/setup_gif.gif" width="100%" alt="Setup walkthrough">
|
|
260
|
+
</div>
|
|
261
|
+
|
|
262
|
+
### Starting and Stopping
|
|
263
|
+
|
|
264
|
+
**macOS / Linux / WSL:**
|
|
265
|
+
```bash
|
|
266
|
+
./start.sh # build and start all services
|
|
267
|
+
./stop.sh # stop all services
|
|
268
|
+
```
|
|
269
|
+
|
|
270
|
+
**Windows (native):**
|
|
271
|
+
```bat
|
|
272
|
+
start.bat :: build and start all services
|
|
273
|
+
stop.bat :: stop all services
|
|
274
|
+
```
|
|
275
|
+
|
|
276
|
+
Once running, open **http://localhost:3001** in your browser.
|
|
277
|
+
|
|
278
|
+
<div align="center">
|
|
279
|
+
<img src="./public/start_gif.gif" width="100%" alt="Starting VoidAccess">
|
|
280
|
+
</div>
|
|
281
|
+
|
|
282
|
+
### Getting a JWT (API access)
|
|
283
|
+
|
|
284
|
+
`setup.sh` creates a default admin account at `admin@voidaccess.tech` with the password you provided during the wizard.
|
|
285
|
+
|
|
286
|
+
```bash
|
|
287
|
+
curl -X POST http://localhost:8000/auth/login \
|
|
288
|
+
-H "Content-Type: application/json" \
|
|
289
|
+
-d '{"email": "admin@voidaccess.tech", "password": "yourpassword"}'
|
|
290
|
+
```
|
|
291
|
+
|
|
292
|
+
Use the returned token in an `Authorization: Bearer <token>` header for API requests.
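For scripted access, the same two requests can be prepared from Python with the standard library. This sketch only builds the request objects and does not send them; the base URL and paths mirror the curl examples in this README, while the helper names are hypothetical. The name of the token field in the login response is not documented here, so extracting it is left to the caller:

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # port taken from the curl examples


def build_login_request(email, password):
    """Prepare the POST /auth/login request shown in the curl example."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_authed_request(path, token, payload=None):
    """Prepare a request carrying the JWT in an Authorization: Bearer header."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Authorization": f"Bearer {token}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(
        f"{API_BASE}{path}", data=data, headers=headers, method=method
    )
```

Passing either request to `urllib.request.urlopen` would send it once the stack is running.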
|
|
293
|
+
|
|
294
|
+
### Running your first investigation (API)
|
|
295
|
+
|
|
296
|
+
```bash
|
|
297
|
+
curl -X POST http://localhost:8000/investigations \
|
|
298
|
+
-H "Authorization: Bearer <your_jwt>" \
|
|
299
|
+
-H "Content-Type: application/json" \
|
|
300
|
+
-d '{"query": "LockBit ransomware infrastructure 2024"}'
|
|
301
|
+
```
|
|
302
|
+
|
|
303
|
+
The investigation starts in `pending`, moves to `processing`, and completes in 3–5 minutes, returning a summary, extracted entities, a relationship graph, and export-ready artifacts.
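Because an investigation runs for several minutes, a client usually polls until the status leaves the in-progress states. The sketch below assumes the caller supplies a `fetch_status` callable that returns the current status string; how that status is fetched, and the name of the terminal state, are not specified in this README and are left open:

```python
import time

IN_PROGRESS = {"pending", "processing"}  # the two states named above


def wait_for_completion(fetch_status, timeout_s=600.0, interval_s=5.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a state outside IN_PROGRESS.

    Raises TimeoutError if the investigation is still running after timeout_s.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status not in IN_PROGRESS:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.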
|
|
304
|
+
|
|
305
|
+
---
|
|
306
|
+
|
|
307
|
+
## Architecture
|
|
308
|
+
|
|
309
|
+
Four Docker services:
|
|
310
|
+
|
|
311
|
+
| Service | Technology | Port |
|
|
312
|
+
|---|---|---|
|
|
313
|
+
| **postgres** | PostgreSQL 16 | 5433 |
|
|
314
|
+
| **tor** | Tor SOCKS5 proxy | 9050 |
|
|
315
|
+
| **fastapi** | Python 3.11, FastAPI, SQLAlchemy | 8000 |
|
|
316
|
+
| **nextjs** | Next.js 14, TypeScript, Tailwind | 3001 |
|
|
317
|
+
|
|
318
|
+
The FastAPI backend runs a 13-step pipeline triggered by `POST /investigations`. Every external call is wrapped in `try/except` with a graceful fallback, so a failing source degrades the results instead of crashing the pipeline. API docs are available at **http://localhost:8000/docs** when running.
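The fallback behaviour described above can be illustrated with a small wrapper. This is a sketch of the general pattern, not the orchestrator's actual code; the function and logger names are hypothetical:

```python
import logging

logger = logging.getLogger("pipeline_sketch")


def call_with_fallback(step_name, func, fallback):
    """Run func(); on any exception, log it and return the fallback value."""
    try:
        return func()
    except Exception as exc:  # broad on purpose: one bad source must not stop the run
        logger.warning("step %s failed: %s", step_name, exc)
        return fallback


def broken_source():
    """Stand-in for an external call that fails."""
    raise ConnectionError("source unreachable")


results = call_with_fallback("enrichment", broken_source, fallback=[])
```

Returning an empty list keeps later steps working on whatever the other sources produced.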
|
|
319
|
+
|
|
320
|
+
### Source Tree
|
|
321
|
+
|
|
322
|
+
```
|
|
323
|
+
voidaccess/
|
|
324
|
+
├── analysis/ # Temporal patterns, OPSEC failure detection, anomaly scoring
|
|
325
|
+
├── api/ # FastAPI routes; investigation pipeline orchestrator
|
|
326
|
+
├── auth/ # JWT authentication and user management
|
|
327
|
+
├── crawler/ # Recursive .onion link discovery spider
|
|
328
|
+
├── db/ # SQLAlchemy ORM models and Alembic migrations
|
|
329
|
+
├── docs/ # Contributing, security, and usage policy documents
|
|
330
|
+
├── export/ # STIX 2.1, MISP, Sigma, and CSV artifact generation
|
|
331
|
+
├── extractor/ # Regex → NER → LLM entity extraction pipeline
|
|
332
|
+
├── fingerprint/ # Stylometry vectors and actor style profiling
|
|
333
|
+
├── graph/ # NetworkX MultiDiGraph builder and pyvis visualization
|
|
334
|
+
├── i18n/ # Language detection, translation, multilingual query expansion
|
|
335
|
+
├── infra/ # Docker Compose, Tor config, Postgres init
|
|
336
|
+
├── monitor/ # APScheduler watches, change diffing, Telegram/SMTP alerts
|
|
337
|
+
├── public/ # Logo, walkthrough screenshots, demo media
|
|
338
|
+
├── scraper/ # Async aiohttp and Playwright scrapers over Tor
|
|
339
|
+
├── scripts/ # Seed imports and operational utilities
|
|
340
|
+
├── search/ # 16+ .onion search engine fan-out with circuit breaker
|
|
341
|
+
├── sources/ # DarkSearch, Telegram, paste sites, threat-intel feeds
|
|
342
|
+
├── tests/ # Pytest suite (one test file per module)
|
|
343
|
+
├── utils/ # Async helpers, content safety, encryption, defang
|
|
344
|
+
├── vector/ # ChromaDB cache with sentence-transformer embeddings
|
|
345
|
+
├── voidaccess/ # LangChain LLM wrappers and provider registry
|
|
346
|
+
└── web/ # Next.js 14 + TypeScript + Tailwind frontend
|
|
347
|
+
```
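The `search/` entry in the tree above mentions a circuit breaker around the search-engine fan-out. A minimal sketch of that pattern follows, assuming one breaker per engine that opens after a few consecutive failures and allows a probe after a cool-down; the class, thresholds, and method names are hypothetical and not the contents of `search/circuit_breaker.py`:

```python
import time


class CircuitBreaker:
    """Stop calling a failing engine for a cool-down period."""

    def __init__(self, failure_threshold=3, reset_after_s=60.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed (calls allowed)

    def allow(self):
        """Return True if a call may be attempted now."""
        if self._opened_at is None:
            return True
        # After the cool-down, let a probe through (half-open state).
        return self._clock() - self._opened_at >= self._reset_after_s

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()  # open, or re-open after a failed probe
```

The fan-out would check `allow()` before querying an engine and report the outcome with `record_success()` or `record_failure()`.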
|
|
348
|
+
|
|
349
|
+
> **Note on `voidaccess/voidaccess/`** — the nested directory holds the core LLM utilities (`llm.py`, `llm_utils.py`) and is imported at runtime by the API routes (`from voidaccess.llm import ...`). The nested naming reflects the original package structure from the project's pre-API baseline.
|
|
350
|
+
|
|
351
|
+
---
|
|
352
|
+
|
|
353
|
+
## Troubleshooting
|
|
354
|
+
|
|
355
|
+
**Services won't start:**
|
|
356
|
+
```bash
|
|
357
|
+
docker compose -f infra/docker-compose.yml --project-directory . ps
|
|
358
|
+
docker compose -f infra/docker-compose.yml --project-directory . logs -f
|
|
359
|
+
```
|
|
360
|
+
|
|
361
|
+
**Port conflicts** (3001 or 8000 already in use):
|
|
362
|
+
- macOS/Linux: `lsof -i :3001` to find what's using it
|
|
363
|
+
- Windows: `netstat -ano | findstr :3001`
|
|
364
|
+
|
|
365
|
+
**Tor not connecting:** The Tor service takes 30–60 seconds to bootstrap on first start. Check health with `./check_health.sh`. This script verifies Tor proxy connectivity, LLM provider reachability, and dark web search engine availability.
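When `./check_health.sh` is not at hand, a quick first check is whether anything is listening on the Tor SOCKS port. The sketch below tests TCP reachability only; it does not confirm that Tor has finished bootstrapping, and it assumes port 9050 from the architecture table is reachable on `127.0.0.1` from where the script runs:

```python
import socket


def port_is_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout_s)
        # connect_ex returns 0 on success instead of raising.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    state = "open" if port_is_open("127.0.0.1", 9050) else "closed"
    print(f"Tor SOCKS port 9050 is {state}")
```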
|
|
366
|
+
|
|
367
|
+
**No .env file:** Run `bash setup.sh` (macOS/Linux/WSL) or `setup.bat` (Windows) before starting.
|
|
368
|
+
|
|
369
|
+
**Docker build takes a long time:** First build downloads ~3GB of layers. Subsequent builds use the Docker layer cache and are much faster.
|
|
370
|
+
|
|
371
|
+
---
|
|
372
|
+
|
|
373
|
+
## Content Safety
|
|
374
|
+
|
|
375
|
+
Every investigation runs through content safety filters before results reach the UI or appear in the graph. CSAM, gore, snuff content, and other prohibited material are blocked at four stages: the query, URL validation, content scanning, and post-extraction entity filtering. These filters are mandatory and cannot be disabled.
|
|
376
|
+
|
|
377
|
+
---
|
|
378
|
+
|
|
379
|
+
## Acceptable Use
|
|
380
|
+
|
|
381
|
+
VoidAccess is for authorized security research, threat intelligence gathering, and law enforcement purposes only. Users are responsible for ensuring compliance with all local laws and ethical standards. See [docs/USAGE_POLICY.md](docs/USAGE_POLICY.md) for the full policy.
|
|
382
|
+
|
|
383
|
+
---
|
|
384
|
+
|
|
385
|
+
## Contributing
|
|
386
|
+
|
|
387
|
+
Contributions are welcome. See [docs/CONTRIBUTING.md](docs/CONTRIBUTING.md) for setup instructions, code standards, and the PR process. Please read [docs/CODE_OF_CONDUCT.md](docs/CODE_OF_CONDUCT.md) before participating.
|
|
388
|
+
|
|
389
|
+
To report a security vulnerability, see [docs/SECURITY.md](docs/SECURITY.md).
|
|
390
|
+
|
|
391
|
+
---
|
|
392
|
+
|
|
393
|
+
## License
|
|
394
|
+
|
|
395
|
+
MIT License. See [LICENSE](LICENSE) for details.
|