@fanboynz/network-scanner 1.0.35

package/README.md ADDED
@@ -0,0 +1,357 @@
1
+ A Puppeteer-based tool that scans websites for third-party (or, optionally, first-party) network requests matching specified patterns and generates Adblock-formatted rules.
2
+
3
+ ## Features
4
+
5
+ - Scan websites and detect matching third-party or first-party resources
6
+ - Output Adblock-formatted blocking rules
7
+ - Support for multiple filters per site
8
+ - Grouped titles (! <url>) before site matches
9
+ - Ignore unwanted domains (global and per-site)
10
+ - Block unwanted domains during scan (simulate adblock)
11
+ - Support Chrome, Firefox, Safari user agents (desktop or mobile)
12
+ - Advanced fingerprint spoofing and referrer header simulation
13
+ - Delay, timeout, reload options per site
14
+ - Verbose and debug modes
15
+ - Dump matched full URLs into `matched_urls.log`
16
+ - Save output in normal Adblock format or localhost (127.0.0.1/0.0.0.0)
17
+ - Subdomain handling (collapse to root or full subdomain)
18
+ - Optionally match only first-party, third-party, or both
19
+ - Enhanced redirect handling with JavaScript and meta refresh detection
20
+
21
+ ---
22
+
23
+ ## Command Line Arguments
24
+
25
+ ### Output Options
26
+
27
+ | Argument | Description |
28
+ |:---------------------------|:------------|
29
+ | `-o, --output <file>` | Output file for rules. If omitted, prints to console |
30
+ | `--compare <file>` | Remove rules that already exist in this file before output |
31
+ | `--color, --colour` | Enable colored console output for status messages |
32
+ | `--append` | Append new rules to output file instead of overwriting (requires `-o`) |
33
+
34
+ ### Output Format Options
35
+
36
+ | Argument | Description |
37
+ |:---------------------------|:------------|
38
+ | `--localhost` | Output as `127.0.0.1 domain.com` |
39
+ | `--localhost-0.0.0.0` | Output as `0.0.0.0 domain.com` |
40
+ | `--plain` | Output just domains (no adblock formatting) |
41
+ | `--dnsmasq` | Output as `local=/domain.com/` (dnsmasq format) |
42
+ | `--dnsmasq-old` | Output as `server=/domain.com/` (dnsmasq old format) |
43
+ | `--unbound` | Output as `local-zone: "domain.com." always_null` (unbound format) |
44
+ | `--privoxy` | Output as `{ +block } .domain.com` (Privoxy format) |
45
+ | `--pihole` | Output as `(^\|\.)domain\.com$` (Pi-hole regex format) |
46
+ | `--adblock-rules` | Generate adblock filter rules with resource type modifiers (requires `-o`) |
47
+
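The output formats in the table above can be illustrated with a small formatter. This is a minimal sketch, not code from the package: `formatDomain` and its format keys are hypothetical names chosen to mirror the CLI flags, and the output strings are copied from the table. The Pi-hole case escapes dots so the result is a usable regex.

```javascript
// Hypothetical helper (not from the package source): render one domain in
// each of the output formats described in the README table.
function formatDomain(domain, format) {
  switch (format) {
    case 'localhost':
      return `127.0.0.1 ${domain}`;
    case 'localhost-0.0.0.0':
      return `0.0.0.0 ${domain}`;
    case 'plain':
      return domain;
    case 'dnsmasq':
      return `local=/${domain}/`;
    case 'dnsmasq-old':
      return `server=/${domain}/`;
    case 'unbound':
      return `local-zone: "${domain}." always_null`;
    case 'privoxy':
      return `{ +block } .${domain}`;
    case 'pihole':
      // Escape every dot in the domain so it is matched literally.
      return `(^|\\.)${domain.replace(/\./g, '\\.')}$`;
    default:
      throw new Error(`Unknown format: ${format}`);
  }
}

console.log(formatDomain('domain.com', 'pihole')); // prints (^|\.)domain\.com$
```

Keeping all formats in one function makes it easy to see that only the wrapping text differs between them; the domain itself is unchanged except for dot escaping in the regex case.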
48
+ ### General Options
49
+
50
+ | Argument | Description |
51
+ |:---------------------------|:------------|
52
+ | `--verbose` | Force verbose mode globally |
53
+ | `--debug` | Force debug mode globally |
54
+ | `--silent` | Suppress normal console logs |
55
+ | `--titles` | Add `! <url>` title before each site's group |
56
+ | `--dumpurls` | Dump matched URLs into matched_urls.log |
57
+ | `--remove-tempfiles` | Remove Chrome/Puppeteer temporary files before exit |
58
+ | `--compress-logs` | Compress log files with gzip (requires `--dumpurls`) |
59
+ | `--sub-domains` | Output full subdomains instead of collapsing to root |
60
+ | `--no-interact` | Disable page interactions globally |
61
+ | `--custom-json <file>` | Use a custom config JSON file instead of config.json |
62
+ | `--headful` | Launch browser with GUI (not headless) |
63
+ | `--cdp` | Enable Chrome DevTools Protocol logging (now per-page if enabled) |
64
+ | `--remove-dupes` | Remove duplicate domains from output (only with `-o`) |
65
+ | `--dry-run` | Console output only: show matching regex, titles, whois/dig/searchstring results, and adblock rules |
66
+ | `--eval-on-doc` | Globally enable evaluateOnNewDocument() for Fetch/XHR interception |
67
+ | `--help`, `-h` | Show this help menu |
68
+ | `--version` | Show script version |
69
+
70
+ ### Validation Options
71
+
72
+ | Argument | Description |
73
+ |:---------------------------|:------------|
74
+ | `--validate-config` | Validate config.json file and exit |
75
+ | `--validate-rules [file]` | Validate rule file format (uses --output/--compare files if no file specified) |
76
+ | `--clean-rules [file]` | Clean rule files by removing invalid lines and optionally duplicates (uses --output/--compare files if no file specified) |
77
+ | `--test-validation` | Run domain validation tests and exit |
78
+
79
+ ---
80
+
81
+ ## config.json Format
82
+
83
+ Example:
84
+
85
+ ```json
86
+ {
87
+ "ignoreDomains": [
88
+ "googleapis.com",
89
+ "googletagmanager.com"
90
+ ],
91
+ "sites": [
92
+ {
93
+ "url": "https://example.com/",
94
+ "userAgent": "chrome",
95
+ "filterRegex": "ads|analytics",
96
+ "resourceTypes": ["script", "xhr", "image"],
97
+ "reload": 2,
98
+ "delay": 5000,
99
+ "timeout": 30000,
100
+ "verbose": 1,
101
+ "debug": 1,
102
+ "interact": true,
103
+ "fingerprint_protection": "random",
104
+ "referrer_headers": {
105
+ "mode": "random_search",
106
+ "search_terms": ["example reviews", "best deals"]
107
+ },
108
+ "custom_headers": {
109
+ "X-Custom-Header": "value"
110
+ },
111
+ "firstParty": 0,
112
+ "thirdParty": 1,
113
+ "subDomains": 0,
114
+ "blocked": [
115
+ "googletagmanager.com",
116
+ ".*tracking.*"
117
+ ]
118
+ }
119
+ ]
120
+ }
121
+ ```
122
+
123
+ ---
124
+
125
+ ## config.json Field Table
126
+
127
+ ### Basic Configuration
128
+
129
+ | Field | Values | Default | Description |
130
+ |:---------------------|:-------|:-------:|:------------|
131
+ | `url` | String or Array | - | Website URL(s) to scan |
132
+ | `userAgent` | `chrome`, `firefox`, `safari` | - | User agent for page (latest versions: Chrome 131, Firefox 133, Safari 18.2) |
133
+ | `filterRegex` | String or Array | `.*` | Regex or list of regexes to match requests |
134
+ | `comments` | String or Array | - | String of comments or references |
135
+ | `resourceTypes` | Array | `["script", "xhr", "image", "stylesheet"]` | What resource types to monitor |
136
+ | `reload` | Integer | `1` | Number of times to reload page |
137
+ | `delay` | Milliseconds | `4000` | Wait time after loading/reloading |
138
+ | `timeout` | Milliseconds | `30000` | Timeout for page load |
139
+ | `verbose` | `0` or `1` | `0` | Enable verbose output per site |
140
+ | `debug` | `0` or `1` | `0` | Dump matching URLs for the site |
141
+ | `interact` | `true` or `false` | `false` | Simulate user interaction (hover, click) |
142
+ | `firstParty` | `0` or `1` | `0` | Match first-party requests |
143
+ | `thirdParty` | `0` or `1` | `1` | Match third-party requests |
144
+ | `subDomains` | `0` or `1` | `0` | 1 = preserve subdomains in output |
145
+ | `blocked` | Array | - | Domains or regexes to block during scanning |
146
+ | `even_blocked` | Boolean | `false` | Add matching rules even if requests are blocked |
147
+
148
+ ### Redirect Handling Options
149
+
150
+ | Field | Values | Default | Description |
151
+ |:---------------------|:-------|:-------:|:------------|
152
+ | `follow_redirects` | Boolean | `true` | Follow redirects to new domains |
153
+ | `max_redirects` | Integer | `10` | Maximum number of redirects to follow |
154
+ | `js_redirect_timeout` | Milliseconds | `5000` | Time to wait for JavaScript redirects |
155
+ | `detect_js_patterns` | Boolean | `true` | Analyze page source for redirect patterns |
156
+ | `redirect_timeout_multiplier` | Number | `1.5` | Increase timeout for redirected URLs |
157
+
158
+ When a page redirects to a new domain, first-party/third-party detection is based on the **final redirected domain**, and all intermediate redirect domains (like `bit.ly`, `t.co`) are automatically excluded from the generated rules.
159
+
160
+
161
+ ### Advanced Stealth & Fingerprinting
162
+
163
+ | Field | Values | Default | Description |
164
+ |:---------------------|:-------|:-------:|:------------|
165
+ | `fingerprint_protection` | `true`, `false`, `"random"` | `false` | Enable navigator/device spoofing |
166
+ | `referrer_headers` | String, Array, or Object | - | Set referrer header for realistic traffic sources |
167
+ | `custom_headers` | Object | - | Add custom HTTP headers to requests |
168
+
169
+ #### Referrer Header Options
170
+
171
+ **Simple formats:**
172
+ ```json
173
+ "referrer_headers": "https://google.com/search?q=example"
174
+ "referrer_headers": ["url1", "url2"]
175
+ ```
176
+
177
+ **Smart modes:**
178
+ ```json
179
+ "referrer_headers": {"mode": "random_search", "search_terms": ["reviews"]}
180
+ "referrer_headers": {"mode": "social_media"}
181
+ "referrer_headers": {"mode": "direct_navigation"}
182
+ "referrer_headers": {"mode": "custom", "custom": ["https://news.ycombinator.com/"]}
183
+ ```
184
+
185
+ ### Protection Bypassing
186
+
187
+ | Field | Values | Default | Description |
188
+ |:---------------------|:-------|:-------:|:------------|
189
+ | `cloudflare_phish` | Boolean | `false` | Auto-click through Cloudflare phishing warnings |
190
+ | `cloudflare_bypass` | Boolean | `false` | Auto-solve Cloudflare "Verify you are human" challenges |
191
+ | `flowproxy_detection` | Boolean | `false` | Enable flowProxy protection detection and handling |
192
+ | `flowproxy_page_timeout` | Milliseconds | `45000` | Page timeout for flowProxy sites |
193
+ | `flowproxy_nav_timeout` | Milliseconds | `45000` | Navigation timeout for flowProxy sites |
194
+ | `flowproxy_js_timeout` | Milliseconds | `15000` | JavaScript challenge timeout |
195
+ | `flowproxy_delay` | Milliseconds | `30000` | Delay for rate limiting |
196
+ | `flowproxy_additional_delay` | Milliseconds | `5000` | Additional processing delay |
197
+
198
+ ### WHOIS/DNS Analysis Options
199
+
200
+ | Field | Values | Default | Description |
201
+ |:---------------------|:-------|:-------:|:------------|
202
+ | `whois` | Array | - | Check whois data for ALL specified terms (AND logic) |
203
+ | `whois-or` | Array | - | Check whois data for ANY specified term (OR logic) |
204
+ | `whois_delay` | Integer | `3000` | Delay whois requests to avoid throttling |
205
+ | `whois_server` | String or Array | - | Custom whois server(s) - single server or randomized list |
206
+ | `whois_server_mode` | String | `"random"` | Server selection mode: `"random"` or `"cycle"` |
207
+ | `whois_max_retries` | Integer | `2` | Maximum retry attempts per domain |
208
+ | `whois_timeout_multiplier` | Number | `1.5` | Timeout increase multiplier per retry |
209
+ | `whois_use_fallback` | Boolean | `true` | Add TLD-specific fallback servers |
210
+ | `whois_retry_on_timeout` | Boolean | `true` | Retry on timeout errors |
211
+ | `whois_retry_on_error` | Boolean | `false` | Retry on connection/other errors |
212
+ | `dig` | Array | - | Check dig output for ALL specified terms (AND logic) |
213
+ | `dig-or` | Array | - | Check dig output for ANY specified term (OR logic) |
214
+ | `dig_subdomain` | Boolean | `false` | Use subdomain for dig lookup instead of root domain |
215
+ | `digRecordType` | String | `"A"` | DNS record type for dig (A, CNAME, MX, etc.) |
216
+
217
+ ### Content Analysis Options
218
+
219
+ | Field | Values | Default | Description |
220
+ |:---------------------|:-------|:-------:|:------------|
221
+ | `searchstring` | String or Array | - | Text to search in response content (OR logic) |
222
+ | `searchstring_and` | String or Array | - | Text to search with AND logic - ALL terms must be present |
223
+ | `curl` | Boolean | `false` | Use curl to download content for analysis |
224
+ | `grep` | Boolean | `false` | Use grep instead of JavaScript for pattern matching (requires curl=true) |
225
+
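The AND/OR distinction used by `whois` / `whois-or`, `dig` / `dig-or`, and `searchstring_and` / `searchstring` can be sketched as follows. The helper names are hypothetical, and case-insensitive substring matching is an assumption; the README does not say how the package compares terms.

```javascript
// Hypothetical helpers (not from the package source).
// Assumption: terms are matched as case-insensitive substrings.

// Accept either a single string or an array, as the field tables allow.
function toArray(value) {
  return Array.isArray(value) ? value : [value];
}

// AND logic: every term must appear in the text.
function containsAll(text, terms) {
  const haystack = text.toLowerCase();
  return toArray(terms).every((term) => haystack.includes(term.toLowerCase()));
}

// OR logic: at least one term must appear in the text.
function containsAny(text, terms) {
  const haystack = text.toLowerCase();
  return toArray(terms).some((term) => haystack.includes(term.toLowerCase()));
}

const whoisOutput = 'Registrar: Example Registrar, Inc.\nName Server: ns1.example.net';
console.log(containsAll(whoisOutput, ['example registrar', 'ns1.example.net']));
console.log(containsAny(whoisOutput, ['cloudflare', 'ns1.example.net']));
```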
226
+ ### Advanced Browser Options
227
+
228
+ | Field | Values | Default | Description |
229
+ |:---------------------|:-------|:-------:|:------------|
230
+ | `goto_options` | Object | `{"waitUntil": "load"}` | Custom page.goto() options |
231
+ | `clear_sitedata` | Boolean | `false` | Clear all cookies, cache, storage before each load |
232
+ | `forcereload` | Boolean | `false` | Force an additional reload after reloads |
233
+ | `isBrave` | Boolean | `false` | Spoof Brave browser detection |
234
+ | `evaluateOnNewDocument` | Boolean | `false` | Inject fetch/XHR interceptor in page |
235
+ | `cdp` | Boolean | `false` | Enable CDP logging for this site |
236
+ | `css_blocked` | Array | - | CSS selectors to hide elements |
237
+ | `source` | Boolean | `false` | Save page source HTML after load |
238
+ | `screenshot` | Boolean | `false` | Capture screenshot on load failure |
239
+ | `headful` | Boolean | `false` | Launch browser with GUI for this site |
240
+ | `adblock_rules` | Boolean | `false` | Generate adblock filter rules with resource types for this site |
241
+
242
+ ### Global Configuration Options
243
+
244
+ These options go at the root level of your config.json:
245
+
246
+ | Field | Values | Default | Description |
247
+ |:---------------------|:-------|:-------:|:------------|
248
+ | `ignoreDomains` | Array | - | Domains to completely ignore (supports wildcards like `*.ads.com`) |
249
+ | `blocked` | Array | - | Global regex patterns to block requests (combined with per-site blocked) |
250
+ | `whois_server_mode` | String | `"random"` | Default server selection mode for all sites |
251
+ | `ignore_similar` | Boolean | `true` | Ignore domains similar to already found domains |
252
+ | `ignore_similar_threshold` | Integer | `80` | Similarity threshold percentage for ignore_similar |
253
+ | `ignore_similar_ignored_domains` | Boolean | `true` | Ignore domains similar to ignoreDomains list |
254
+
255
+ ---
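The wildcard support described for `ignoreDomains` can be approximated as below. This is a sketch under stated assumptions rather than the package's implementation: `isIgnored` and `wildcardToRegex` are hypothetical names, a plain entry is assumed to cover the domain and its subdomains, and `*.ads.com` is assumed not to match the bare `ads.com`.

```javascript
// Hypothetical helpers (not from the package source).

// Escape regex metacharacters; `*` is handled separately by the caller.
function escapeRegex(text) {
  return text.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
}

// Turn a wildcard entry such as "*.ads.com" into an anchored regex.
function wildcardToRegex(pattern) {
  const body = pattern.split('*').map(escapeRegex).join('.*');
  return new RegExp(`^${body}$`, 'i');
}

// Assumption: a plain entry matches the domain itself and any subdomain.
function isIgnored(hostname, ignoreDomains) {
  return ignoreDomains.some((entry) =>
    entry.includes('*')
      ? wildcardToRegex(entry).test(hostname)
      : hostname === entry || hostname.endsWith(`.${entry}`)
  );
}

console.log(isIgnored('tracker.ads.com', ['*.ads.com'])); // prints true
```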
256
+
257
+ ## Usage Examples
258
+
259
+ ### Basic Scanning
260
+ ```bash
261
+ # Scan with default config and output to console
262
+ node nwss.js
263
+
264
+ # Scan and save rules to file
265
+ node nwss.js -o blocklist.txt
266
+
267
+ # Append new rules to existing file
268
+ node nwss.js --append -o blocklist.txt
269
+
270
+ # Clean existing rules and append new ones
271
+ node nwss.js --clean-rules --append -o blocklist.txt
272
+ ```
273
+
274
+ ### Advanced Options
275
+ ```bash
276
+ # Debug mode with URL dumping and colored output
277
+ node nwss.js --debug --dumpurls --color -o rules.txt
278
+
279
+ # Dry run to see what would be matched
280
+ node nwss.js --dry-run --debug
281
+
282
+ # Validate configuration before running
283
+ node nwss.js --validate-config
284
+
285
+ # Clean rule files
286
+ node nwss.js --clean-rules existing_rules.txt
287
+
288
+ # Maximum stealth scanning (stealth settings such as `fingerprint_protection` and `referrer_headers` come from config.json)
289
+ node nwss.js --debug --color -o stealth_rules.txt
290
+ ```
291
+
292
+ ### Stealth Configuration Examples
293
+
294
+ #### E-commerce Site Scanning
295
+ ```json
296
+ {
297
+ "url": "https://shopping-site.com",
298
+ "userAgent": "chrome",
299
+ "fingerprint_protection": "random",
300
+ "referrer_headers": {
301
+ "mode": "random_search",
302
+ "search_terms": ["product reviews", "best deals", "price comparison"]
303
+ },
304
+ "interact": true,
305
+ "delay": 6000,
306
+ "filterRegex": "analytics|tracking|ads"
307
+ }
308
+ ```
309
+
310
+ #### News Site Analysis
311
+ ```json
312
+ {
313
+ "url": "https://news-site.com",
314
+ "userAgent": "firefox",
315
+ "fingerprint_protection": true,
316
+ "referrer_headers": {"mode": "social_media"},
317
+ "custom_headers": {
318
+ "Accept-Language": "en-US,en;q=0.9"
319
+ },
320
+ "filterRegex": "doubleclick|googletagmanager"
321
+ }
322
+ ```
323
+
324
+ #### Tech Blog with Custom Referrers
325
+ ```json
326
+ {
327
+ "url": "https://tech-blog.com",
328
+ "fingerprint_protection": "random",
329
+ "referrer_headers": {
330
+ "mode": "custom",
331
+ "custom": [
332
+ "https://news.ycombinator.com/",
333
+ "https://www.reddit.com/r/programming/",
334
+ "https://lobste.rs/"
335
+ ]
336
+ }
337
+ }
338
+ ```
339
+
340
+ ---
341
+
342
+ ## Notes
343
+
344
+ - If both `firstParty: 0` and `thirdParty: 0` are set for a site, it will be skipped.
345
+ - `ignoreDomains` applies globally across all sites.
346
+ - `ignoreDomains` supports wildcards (e.g., `*.ads.com` matches `tracker.ads.com`).
347
+ - Blocking (`blocked`) can match full domains or regex patterns.
348
+ - If a site's `blocked` field is missing, no extra blocking is applied.
349
+ - `--clean-rules` with `--append` cleans the existing files first, then appends new rules.
350
+ - `--remove-dupes` works with all output modes and removes duplicates from the final output.
351
+ - Validation tools help ensure rule files are properly formatted before use.
352
+ - `--remove-tempfiles` removes Chrome/Puppeteer temporary files before exiting, which helps avoid disk space issues.
353
+ - For maximum stealth, combine `fingerprint_protection: "random"` with appropriate `referrer_headers` modes.
354
+ - User agents are automatically updated to the latest versions (Chrome 131, Firefox 133, Safari 18.2).
355
+ - Referrer headers work independently of fingerprint protection; use both for best results.
356
+
357
+ ---
package/config.json ADDED
@@ -0,0 +1,74 @@
1
+ {
2
+ "ignoreDomains": [
3
+ "googletagmanager.com",
4
+ "googleapis.com",
5
+ "amung.us",
6
+ "cloudflare.com",
7
+ "facebook.net",
8
+ "histats.com",
9
+ "bing.com",
10
+ "zeotap.com",
11
+ "bootstrapcdn.com",
12
+ "cloudfront.net",
13
+ "google.com",
14
+ "gstatic.com",
15
+ "jwpcdn.com",
16
+ "jquery.com",
17
+ "reddit.com",
18
+ "cloudflarestorage.com",
19
+ "youtube-nocookie.com",
20
+ "cloudflareinsights.com",
21
+ "onaudience.com",
22
+ "intensedebate.com",
23
+ "wordpress.com",
24
+ "ouo.io",
25
+ "amazon.com",
26
+ "amazon-adsystem.com",
27
+ "disqus.com",
28
+ "addtoany.com",
29
+ "cutcaptcha.net",
30
+ "yahoo.com",
31
+ "doubleclick.net",
32
+ "youtube.com",
33
+ "yandex.com",
34
+ "yandex.ru",
35
+ "tmdb.org",
36
+ "google-analytics.com"
37
+ ],
38
+ "sites": [
39
+ {
40
+ "url": "https://www.anandtech.com/",
41
+ "filterRegex": ".",
42
+ "resourceTypes": ["script", "xhr", "document"],
43
+ "reload": 1,
44
+ "timeout": 25000,
45
+ "delay": 5000,
46
+ "verbose": 1,
47
+ "blocked": ["somedomain.com", "anotherdomain.com"],
48
+ "css_blocked": ["#advertisement", ".ad-banner", "[data-ad-slot]", "div[class*='sponsored']"],
49
+ "interact": true
50
+ },
51
+ {
52
+ "url": "https://www.tomshardware.com/",
53
+ "filterRegex": ".",
54
+ "resourceTypes": ["all"],
55
+ "reload": 2,
56
+ "timeout": 25000,
57
+ "delay": 15000,
58
+ "verbose": 1,
59
+ "interact": true,
60
+ "fingerprint_protection": "random"
61
+ },
62
+ {
63
+ "url": ["https://www.tomshardware.com/", "https://www.anandtech.com/"],
64
+ "filterRegex": ".",
65
+ "resourceTypes": ["all"],
66
+ "reload": 2,
67
+ "timeout": 25000,
68
+ "delay": 15000,
69
+ "verbose": 1,
70
+ "interact": true,
71
+ "fingerprint_protection": "random"
72
+ }
73
+ ]
74
+ }
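To show how a config like the one above might be turned into individual scan jobs, here is a minimal sketch. It is not taken from the package: `expandSites` is a hypothetical name, the defaults are copied from the README field table, and treating each entry of an array-valued `url` as its own job is an assumption. Sites with both `firstParty` and `thirdParty` set to `0` are skipped, as the README notes.

```javascript
// Hypothetical helper (not from the package source).
// Defaults below are the ones listed in the README field table.
const DEFAULTS = {
  filterRegex: '.*',
  reload: 1,
  delay: 4000,
  timeout: 30000,
  firstParty: 0,
  thirdParty: 1,
  subDomains: 0,
};

// Expand config.sites into one job per URL, applying defaults.
function expandSites(config) {
  const jobs = [];
  for (const site of config.sites ?? []) {
    const merged = { ...DEFAULTS, ...site };
    // A site that matches neither first- nor third-party requests is skipped.
    if (!merged.firstParty && !merged.thirdParty) continue;
    // `url` may be a string or an array (assumption: one job per URL).
    const urls = Array.isArray(merged.url) ? merged.url : [merged.url];
    for (const url of urls) jobs.push({ ...merged, url });
  }
  return jobs;
}

const jobs = expandSites({
  sites: [
    { url: 'https://www.anandtech.com/', reload: 1, timeout: 25000 },
    { url: ['https://www.tomshardware.com/', 'https://www.anandtech.com/'], reload: 2 },
  ],
});
console.log(jobs.map((job) => `${job.url} reload=${job.reload} delay=${job.delay}`));
```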