waymore 4.8__tar.gz → 5.0__tar.gz

@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: waymore
- Version: 4.8
+ Version: 5.0
  Summary: Find way more from the Wayback Machine, Common Crawl, Alien Vault OTX, URLScan & VirusTotal!
  Home-page: https://github.com/xnl-h4ck3r/waymore
  Author: @xnl-h4ck3r
@@ -15,7 +15,7 @@ Requires-Dist: tldextract
 
  <center><img src="https://github.com/xnl-h4ck3r/waymore/blob/main/waymore/images/title.png"></center>
 
- ## About - v4.8
+ ## About - v5.0
 
  The idea behind **waymore** is to find even more links from the Wayback Machine than other existing tools.
 
@@ -23,7 +23,7 @@ The idea behind **waymore** is to find even more links from the Wayback Machine
  👉 Also, other tools do not currenrtly deal with the rate limiting now in place by the sources, and will often just stop with incomplete results and not let you know they are incomplete.
 
  Anyone who does bug bounty will have likely used the amazing [waybackurls](https://github.com/tomnomnom/waybackurls) by @TomNomNoms. This tool gets URLs from [web.archive.org](https://web.archive.org) and additional links (if any) from one of the index collections on [index.commoncrawl.org](http://index.commoncrawl.org/).
- You would have also likely used the amazing [gau](https://github.com/lc/gau) by @hacker\_ which also finds URL's from wayback archive, Common Crawl, but also from Alien Vault and URLScan.
+ You would have also likely used the amazing [gau](https://github.com/lc/gau) by @hacker\_ which also finds URL's from wayback archive, Common Crawl, but also from Alien Vault, URLScan, Virus Total and Intelligence X.
  Now **waymore** gets URL's from ALL of those sources too (with ability to filter more to get what you want):
 
  - Wayback Machine (web.archive.org)
@@ -31,6 +31,7 @@ Now **waymore** gets URL's from ALL of those sources too (with ability to filter
  - Alien Vault OTX (otx.alienvault.com)
  - URLScan (urlscan.io)
  - Virus Total (virustotal.com)
+ - Intelligence X (intelx.io) - PAID SOURCE ONLY
 
  👉 It's a point that many seem to miss, so I'll just add it again :) ... The biggest difference between **waymore** and other tools is that it can also **download the archived responses** for URLs on wayback machine so that you can then search these for even more links, developer comments, extra parameters, etc. etc.
 
@@ -83,9 +84,9 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
  | -n | --no-subs | Don't include subdomains of the target domain (only used if input is not a domain with a specific path). |
  | -f | --filter-responses-only | The initial links from sources will not be filtered, only the responses that are downloaded, e.g. it maybe useful to still see all available paths from the links, even if you don't want to check the content. |
  | -fc | | Filter HTTP status codes for retrieved URLs and responses. Comma separated list of codes (default: the `FILTER_CODE` values from `config.yml`). Passing this argument will override the value from `config.yml` |
- | -ft | | Filter MIME Types for retrieved URLs and responses. Comma separated list of MIME Types (default: the `FILTER_MIME` values from `config.yml`). Passing this argument will override the value from `config.yml`. **NOTE: This will NOT be applied to Alien Vault OTX and Virus Total because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.**. |
+ | -ft | | Filter MIME Types for retrieved URLs and responses. Comma separated list of MIME Types (default: the `FILTER_MIME` values from `config.yml`). Passing this argument will override the value from `config.yml`. **NOTE: This will NOT be applied to Alien Vault OTX, Virus Total and Intelligence X because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.**. |
  | -mc | | Only Match HTTP status codes for retrieved URLs and responses. Comma separated list of codes. Passing this argument overrides the config `FILTER_CODE` and `-fc`. |
- | -mt | | Only MIME Types for retrieved URLs and responses. Comma separated list of MIME types. Passing this argument overrides the config `FILTER_MIME` and `-ft`. **NOTE: This will NOT be applied to Alien Vault OTX and Virus Total because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.**. |
+ | -mt | | Only MIME Types for retrieved URLs and responses. Comma separated list of MIME types. Passing this argument overrides the config `FILTER_MIME` and `-ft`. **NOTE: This will NOT be applied to Alien Vault OTX, Virus Total and Intelligence X because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.**. |
  | -l | --limit | How many responses will be saved (if `-mode R` or `-mode B` is passed). A positive value will get the **first N** results, a negative value will get the **last N** results. A value of 0 will get **ALL** responses (default: 5000) |
  | -from | --from-date | What date to get responses from. If not specified it will get from the earliest possible results. A partial value can be passed, e.g. `2016`, `201805`, etc. |
  | -to | --to-date | What date to get responses to. If not specified it will get to the latest possible results. A partial value can be passed, e.g. `2021`, `202112`, etc. |
@@ -97,6 +98,7 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
  | -xav | | Exclude checks for links from alienvault.com |
  | -xus | | Exclude checks for links from urlscan.io |
  | -xvt | | Exclude checks for links from virustotal.com |
+ | -xix | | Exclude checks for links from Intelligence X.com |
  | -lcc | | Limit the number of Common Crawl index collections searched, e.g. `-lcc 10` will just search the latest `10` collections (default: 1). As of November 2024 there are currently 106 collections. Setting to `0` will search **ALL** collections. If you don't want to search Common Crawl at all, use the `-xcc` option. |
  | -lcy | | Limit the number of Common Crawl index collections searched by the year of the index data. The earliest index has data from 2008. Setting to 0 (default) will search collections or any year (but in conjuction with `-lcc`). For example, if you are only interested in data from 2015 and after, pass `-lcy 2015`. This will override the value of `-lcc` if passed. If you don't want to search Common Crawl at all, use the `-xcc` option. |
  | -t | --timeout | This is for archived responses only! How many seconds to wait for the server to send data before giving up (default: 30) |
@@ -164,8 +166,9 @@ The `config.yml` file (typically in `~/.config/waymore/`) have values that can b
  - `CONTINUE_RESPONSES_IF_PIPED` - If retrieving archive responses doesn't complete, you will be prompted next time whether you want to continue with the previous run. However, if `stdout` is piped to another process it is assumed you don't want to have an interactive prompt. A value of `True` (default) will determine assure the previous run will be continued. if you want a fresh run every time then set to `False`.
  - `WEBHOOK_DISCORD` - If the `--notify-discord` argument is passed, `knoxnl` will send a notification to this Discord wehook when a successful XSS is found.
  - `DEFAULT_OUTPUT_DIR` - This is the default location of any output files written if the `-oU` and `-oR` arguments are not used. If the value of this key is blank, then it will default to the location of the `config.yml` file.
+ - `INTELX_API_KEY` - You can sign up to [intelx.io here](https://intelx.io/product). It requires a paid API key to do the `/phonebook/search` through their API (as of 2024-09-01, the Phonebook service has been restricted to paid users due to constant abuse by spam accounts).
 
- **NOTE: The MIME types cannot be filtered for Alien Vault OTX and Virus Total because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined for a URL. In these cases, URLs will be included regardless of filter or match. Bear this in mind and consider excluding certain providers if this is important.**
+ **NOTE: The MIME types cannot be filtered for Alien Vault OTX, Virus Total and Intelligence X because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined for a URL. In these cases, URLs will be included regardless of filter or match. Bear this in mind and consider excluding certain providers if this is important.**
 
  ## Output
 
@@ -281,7 +284,7 @@ If you come across any problems at all, or have ideas for improvements, please f
 
  ## TODO
 
- - Add an `-oss` argument that accepts a file of Out Of Scope subdomains/URLs that will not be returned in the output, or have any responses downloaded
+ - Add an `-oos` argument that accepts a file of Out Of Scope subdomains/URLs that will not be returned in the output, or have any responses downloaded
 
  ## References
 
@@ -290,6 +293,7 @@ If you come across any problems at all, or have ideas for improvements, please f
  - [Alien Vault OTX API](https://otx.alienvault.com/assets/static/external_api.html)
  - [URLScan API](https://urlscan.io/docs/api/)
  - [VirusTotal API (v2)](https://docs.virustotal.com/v2.0/reference/getting-started)
+ - [Intelligence X SDK](https://github.com/IntelligenceX/SDK?tab=readme-ov-file#intelligence-x-public-sdk)
 
  Good luck and good hunting!
  If you really love the tool (or any others), or they helped you find an awesome bounty, consider [BUYING ME A COFFEE!](https://ko-fi.com/xnlh4ck3r) ☕ (I could use the caffeine!)
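The `-l` / `--limit` row in the options table of this README describes three cases: a positive value keeps the first N responses, a negative value keeps the last N, and `0` keeps everything. A minimal sketch of that selection rule, assuming the candidate responses are already held in an ordered list. The function name `apply_limit` is hypothetical and is not part of waymore:

```python
from typing import List, TypeVar

T = TypeVar("T")


def apply_limit(items: List[T], limit: int) -> List[T]:
    """Hypothetical helper illustrating the documented -l semantics."""
    if limit == 0:
        # 0 means "no limit": keep every item
        return list(items)
    if limit > 0:
        # Positive N: keep the first N items
        return items[:limit]
    # Negative N: keep the last N items
    return items[limit:]
```

Slicing keeps the behaviour safe when the absolute value of the limit is larger than the list: the whole list is returned rather than raising an error.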
@@ -1,6 +1,6 @@
  <center><img src="https://github.com/xnl-h4ck3r/waymore/blob/main/waymore/images/title.png"></center>
 
- ## About - v4.8
+ ## About - v5.0
 
  The idea behind **waymore** is to find even more links from the Wayback Machine than other existing tools.
 
@@ -8,7 +8,7 @@ The idea behind **waymore** is to find even more links from the Wayback Machine
  👉 Also, other tools do not currenrtly deal with the rate limiting now in place by the sources, and will often just stop with incomplete results and not let you know they are incomplete.
 
  Anyone who does bug bounty will have likely used the amazing [waybackurls](https://github.com/tomnomnom/waybackurls) by @TomNomNoms. This tool gets URLs from [web.archive.org](https://web.archive.org) and additional links (if any) from one of the index collections on [index.commoncrawl.org](http://index.commoncrawl.org/).
- You would have also likely used the amazing [gau](https://github.com/lc/gau) by @hacker\_ which also finds URL's from wayback archive, Common Crawl, but also from Alien Vault and URLScan.
+ You would have also likely used the amazing [gau](https://github.com/lc/gau) by @hacker\_ which also finds URL's from wayback archive, Common Crawl, but also from Alien Vault, URLScan, Virus Total and Intelligence X.
  Now **waymore** gets URL's from ALL of those sources too (with ability to filter more to get what you want):
 
  - Wayback Machine (web.archive.org)
@@ -16,6 +16,7 @@ Now **waymore** gets URL's from ALL of those sources too (with ability to filter
  - Alien Vault OTX (otx.alienvault.com)
  - URLScan (urlscan.io)
  - Virus Total (virustotal.com)
+ - Intelligence X (intelx.io) - PAID SOURCE ONLY
 
  👉 It's a point that many seem to miss, so I'll just add it again :) ... The biggest difference between **waymore** and other tools is that it can also **download the archived responses** for URLs on wayback machine so that you can then search these for even more links, developer comments, extra parameters, etc. etc.
 
@@ -68,9 +69,9 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
  | -n | --no-subs | Don't include subdomains of the target domain (only used if input is not a domain with a specific path). |
  | -f | --filter-responses-only | The initial links from sources will not be filtered, only the responses that are downloaded, e.g. it maybe useful to still see all available paths from the links, even if you don't want to check the content. |
  | -fc | | Filter HTTP status codes for retrieved URLs and responses. Comma separated list of codes (default: the `FILTER_CODE` values from `config.yml`). Passing this argument will override the value from `config.yml` |
- | -ft | | Filter MIME Types for retrieved URLs and responses. Comma separated list of MIME Types (default: the `FILTER_MIME` values from `config.yml`). Passing this argument will override the value from `config.yml`. **NOTE: This will NOT be applied to Alien Vault OTX and Virus Total because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.**. |
+ | -ft | | Filter MIME Types for retrieved URLs and responses. Comma separated list of MIME Types (default: the `FILTER_MIME` values from `config.yml`). Passing this argument will override the value from `config.yml`. **NOTE: This will NOT be applied to Alien Vault OTX, Virus Total and Intelligence X because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.**. |
  | -mc | | Only Match HTTP status codes for retrieved URLs and responses. Comma separated list of codes. Passing this argument overrides the config `FILTER_CODE` and `-fc`. |
- | -mt | | Only MIME Types for retrieved URLs and responses. Comma separated list of MIME types. Passing this argument overrides the config `FILTER_MIME` and `-ft`. **NOTE: This will NOT be applied to Alien Vault OTX and Virus Total because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.**. |
+ | -mt | | Only MIME Types for retrieved URLs and responses. Comma separated list of MIME types. Passing this argument overrides the config `FILTER_MIME` and `-ft`. **NOTE: This will NOT be applied to Alien Vault OTX, Virus Total and Intelligence X because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.**. |
  | -l | --limit | How many responses will be saved (if `-mode R` or `-mode B` is passed). A positive value will get the **first N** results, a negative value will get the **last N** results. A value of 0 will get **ALL** responses (default: 5000) |
  | -from | --from-date | What date to get responses from. If not specified it will get from the earliest possible results. A partial value can be passed, e.g. `2016`, `201805`, etc. |
  | -to | --to-date | What date to get responses to. If not specified it will get to the latest possible results. A partial value can be passed, e.g. `2021`, `202112`, etc. |
@@ -82,6 +83,7 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
  | -xav | | Exclude checks for links from alienvault.com |
  | -xus | | Exclude checks for links from urlscan.io |
  | -xvt | | Exclude checks for links from virustotal.com |
+ | -xix | | Exclude checks for links from Intelligence X.com |
  | -lcc | | Limit the number of Common Crawl index collections searched, e.g. `-lcc 10` will just search the latest `10` collections (default: 1). As of November 2024 there are currently 106 collections. Setting to `0` will search **ALL** collections. If you don't want to search Common Crawl at all, use the `-xcc` option. |
  | -lcy | | Limit the number of Common Crawl index collections searched by the year of the index data. The earliest index has data from 2008. Setting to 0 (default) will search collections or any year (but in conjuction with `-lcc`). For example, if you are only interested in data from 2015 and after, pass `-lcy 2015`. This will override the value of `-lcc` if passed. If you don't want to search Common Crawl at all, use the `-xcc` option. |
  | -t | --timeout | This is for archived responses only! How many seconds to wait for the server to send data before giving up (default: 30) |
@@ -149,8 +151,9 @@ The `config.yml` file (typically in `~/.config/waymore/`) have values that can b
  - `CONTINUE_RESPONSES_IF_PIPED` - If retrieving archive responses doesn't complete, you will be prompted next time whether you want to continue with the previous run. However, if `stdout` is piped to another process it is assumed you don't want to have an interactive prompt. A value of `True` (default) will determine assure the previous run will be continued. if you want a fresh run every time then set to `False`.
  - `WEBHOOK_DISCORD` - If the `--notify-discord` argument is passed, `knoxnl` will send a notification to this Discord wehook when a successful XSS is found.
  - `DEFAULT_OUTPUT_DIR` - This is the default location of any output files written if the `-oU` and `-oR` arguments are not used. If the value of this key is blank, then it will default to the location of the `config.yml` file.
+ - `INTELX_API_KEY` - You can sign up to [intelx.io here](https://intelx.io/product). It requires a paid API key to do the `/phonebook/search` through their API (as of 2024-09-01, the Phonebook service has been restricted to paid users due to constant abuse by spam accounts).
 
- **NOTE: The MIME types cannot be filtered for Alien Vault OTX and Virus Total because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined for a URL. In these cases, URLs will be included regardless of filter or match. Bear this in mind and consider excluding certain providers if this is important.**
+ **NOTE: The MIME types cannot be filtered for Alien Vault OTX, Virus Total and Intelligence X because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined for a URL. In these cases, URLs will be included regardless of filter or match. Bear this in mind and consider excluding certain providers if this is important.**
 
  ## Output
 
@@ -266,7 +269,7 @@ If you come across any problems at all, or have ideas for improvements, please f
 
  ## TODO
 
- - Add an `-oss` argument that accepts a file of Out Of Scope subdomains/URLs that will not be returned in the output, or have any responses downloaded
+ - Add an `-oos` argument that accepts a file of Out Of Scope subdomains/URLs that will not be returned in the output, or have any responses downloaded
 
  ## References
 
@@ -275,6 +278,7 @@ If you come across any problems at all, or have ideas for improvements, please f
  - [Alien Vault OTX API](https://otx.alienvault.com/assets/static/external_api.html)
  - [URLScan API](https://urlscan.io/docs/api/)
  - [VirusTotal API (v2)](https://docs.virustotal.com/v2.0/reference/getting-started)
+ - [Intelligence X SDK](https://github.com/IntelligenceX/SDK?tab=readme-ov-file#intelligence-x-public-sdk)
 
  Good luck and good hunting!
  If you really love the tool (or any others), or they helped you find an awesome bounty, consider [BUYING ME A COFFEE!](https://ko-fi.com/xnlh4ck3r) ☕ (I could use the caffeine!)
@@ -0,0 +1 @@
+ __version__="5.0"
@@ -79,6 +79,7 @@ checkCommonCrawl = 0
  checkAlienVault = 0
  checkURLScan = 0
  checkVirusTotal = 0
+ checkIntelx = 0
  argsInputHostname = ''
  responseOutputDirectory = ''
 
@@ -88,6 +89,9 @@ CCRAWL_INDEX_URL = 'https://index.commoncrawl.org/collinfo.json'
  ALIENVAULT_URL = 'https://otx.alienvault.com/api/v1/indicators/{TYPE}/{DOMAIN}/url_list?limit=500'
  URLSCAN_URL = 'https://urlscan.io/api/v1/search/?q=domain:{DOMAIN}&size=10000'
  VIRUSTOTAL_URL = 'https://www.virustotal.com/vtapi/v2/domain/report?apikey={APIKEY}&domain={DOMAIN}'
+ INTELX_SEARCH_URL = 'https://2.intelx.io/phonebook/search'
+ INTELX_RESULTS_URL = 'https://2.intelx.io/phonebook/search/result?id='
+ INTELX_ACCOUNT_URL = 'https://2.intelx.io/authenticate/info'
 
  # User Agents to use when making requests, chosen at random
  USER_AGENT = [
@@ -144,6 +148,7 @@ URLSCAN_API_KEY = ''
  CONTINUE_RESPONSES_IF_PIPED = True
  WEBHOOK_DISCORD = ''
  DEFAULT_OUTPUT_DIR = ''
+ INTELX_API_KEY = ''
 
  API_KEY_SECRET = "aHR0cHM6Ly95b3V0dS5iZS9kUXc0dzlXZ1hjUQ=="
 
@@ -285,7 +290,7 @@ def showOptions():
  """
  Show the chosen options and config settings
  """
- global inputIsDomainANDPath, argsInput, isInputFile
+ global inputIsDomainANDPath, argsInput, isInputFile, INTELX_API_KEY
 
  try:
  write(colored('Selected config and settings:', 'cyan'))
@@ -325,6 +330,9 @@ def showOptions():
  providers = providers + 'URLScan, '
  if not args.xvt:
  providers = providers + 'VirusTotal, '
+ # Only show Intelligence X if the API key wa provided
+ if not args.xix and INTELX_API_KEY != '':
+ providers = providers + 'Intelligence X, '
  if providers == '':
  providers = 'None'
  write(colored('Providers: ' +str(providers.strip(', ')), 'magenta')+colored(' Which providers to check for URLs.','white'))
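The added lines mean Intelligence X is only listed as a provider when it has not been excluded with `-xix` and an API key is configured. A minimal sketch of that gating, reduced to the two providers visible in this hunk. The function name `build_provider_list` and its parameters are hypothetical stand-ins for the `args` flags and the `INTELX_API_KEY` global:

```python
def build_provider_list(xvt: bool, xix: bool, intelx_api_key: str) -> str:
    """Hypothetical reduction of the provider display logic in showOptions()."""
    providers = ''
    if not xvt:
        providers = providers + 'VirusTotal, '
    # Intelligence X is a paid source, so it is only shown when a key exists
    if not xix and intelx_api_key != '':
        providers = providers + 'Intelligence X, '
    if providers == '':
        providers = 'None'
    # Remove the trailing ", " left by the concatenation above
    return providers.strip(', ')
```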
@@ -349,6 +357,11 @@ def showOptions():
  write(colored('VirusTotal API Key:', 'magenta')+colored(' {none} - You can get a FREE or paid API Key at https://www.virustotal.com/gui/join-us which will let you get some extra URLs.','white'))
  else:
  write(colored('VirusTotal API Key: ', 'magenta')+colored(VIRUSTOTAL_API_KEY))
+
+ if INTELX_API_KEY == '':
+ write(colored('Intelligence X API Key:', 'magenta')+colored(' {none} - You require a paid API Key from https://intelx.io/product','white'))
+ else:
+ write(colored('Intelligence X API Key: ', 'magenta')+colored(INTELX_API_KEY))
 
  if args.mode in ['U','B']:
  if args.output_urls != '':
@@ -401,12 +414,12 @@ def showOptions():
  write(colored('Response URL exclusions: ', 'magenta')+colored(FILTER_URL))
 
  if args.mt:
- write(colored('-mt: ' +str(args.mt.lower()), 'magenta')+colored(' Only retrieve URLs and Responses that match these MIME Types.','white')+colored(' NOTE: This will NOT be applied to Alien Vault OTX and Virus Total because they don\'t have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you','yellow'))
+ write(colored('-mt: ' +str(args.mt.lower()), 'magenta')+colored(' Only retrieve URLs and Responses that match these MIME Types.','white')+colored(' NOTE: This will NOT be applied to Alien Vault OTX, Virus Total and Intelligence X because they don\'t have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you','yellow'))
  else:
  if args.ft:
- write(colored('-ft: ' +str(args.ft.lower()), 'magenta')+colored(' Don\'t retrieve URLs and Responses that match these MIME Types.','white')+colored(' NOTE: This will NOT be applied to Alien Vault OTX and Virus Total because they don\'t have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you','yellow'))
+ write(colored('-ft: ' +str(args.ft.lower()), 'magenta')+colored(' Don\'t retrieve URLs and Responses that match these MIME Types.','white')+colored(' NOTE: This will NOT be applied to Alien Vault OTX, Virus Total and Intelligence X because they don\'t have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you','yellow'))
  else:
- write(colored('MIME Type exclusions: ', 'magenta')+colored(FILTER_MIME)+colored(' Don\'t retrieve URLs and Responses that match these MIME Types.','white')+colored(' NOTE: This will NOT be applied to Alien Vault OTX and Virus Total because they don\'t have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you','yellow'))
+ write(colored('MIME Type exclusions: ', 'magenta')+colored(FILTER_MIME)+colored(' Don\'t retrieve URLs and Responses that match these MIME Types.','white')+colored(' NOTE: This will NOT be applied to Alien Vault OTX, Virus Total and Intelligence X because they don\'t have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you','yellow'))
 
  if args.keywords_only and args.keywords_only == '#CONFIG':
  if FILTER_KEYWORDS == '':
@@ -444,7 +457,7 @@ def getConfig():
  """
  Try to get the values from the config file, otherwise use the defaults
  """
- global FILTER_CODE, FILTER_MIME, FILTER_URL, FILTER_KEYWORDS, URLSCAN_API_KEY, VIRUSTOTAL_API_KEY, CONTINUE_RESPONSES_IF_PIPED, subs, path, waymorePath, inputIsDomainANDPath, HTTP_ADAPTER, HTTP_ADAPTER_CC, argsInput, terminalWidth, MATCH_CODE, WEBHOOK_DISCORD, DEFAULT_OUTPUT_DIR, MATCH_MIME
+ global FILTER_CODE, FILTER_MIME, FILTER_URL, FILTER_KEYWORDS, URLSCAN_API_KEY, VIRUSTOTAL_API_KEY, CONTINUE_RESPONSES_IF_PIPED, subs, path, waymorePath, inputIsDomainANDPath, HTTP_ADAPTER, HTTP_ADAPTER_CC, argsInput, terminalWidth, MATCH_CODE, WEBHOOK_DISCORD, DEFAULT_OUTPUT_DIR, MATCH_MIME, INTELX_API_KEY
  try:
 
  # Set terminal width
@@ -580,6 +593,13 @@ def getConfig():
  writerr(colored('Unable to read "VIRUSTOTAL_API_KEY" from config.yml - consider adding (you can get a FREE api key at virustotal.com)', 'red'))
  VIRUSTOTAL_API_KEY = ''
 
+ try:
+ INTELX_API_KEY = config.get('INTELX_API_KEY')
+ if str(INTELX_API_KEY) == 'None':
+ INTELX_API_KEY = ''
+ except Exception as e:
+ INTELX_API_KEY = ''
+
  try:
  FILTER_KEYWORDS = config.get('FILTER_KEYWORDS')
  if str(FILTER_KEYWORDS) == 'None':
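The added block reads `INTELX_API_KEY` from the parsed `config.yml` and falls back to an empty string when the key is absent or has no value, without printing a warning (unlike the VirusTotal key above it). A minimal sketch of the same pattern, assuming the config has already been loaded into a plain `dict`. The helper name `read_optional_key` is hypothetical:

```python
def read_optional_key(config: dict, name: str) -> str:
    """Return the configured value, or '' if the key is missing or has no value."""
    try:
        value = config.get(name)
        # A key that is absent, or present with nothing after the colon, is None
        if value is None:
            return ''
        return str(value)
    except Exception:
        # As in the diff, any problem reading the key is treated as "no key"
        return ''
```

One small difference from the diff: this sketch tests `is None` directly, whereas the original compares `str(...)` with `'None'`, so the original would also blank out a literal string value of `None`.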
@@ -653,6 +673,7 @@ def getConfig():
  FILTER_CODE = DEFAULT_FILTER_CODE
  URLSCAN_API_KEY = ''
  VIRUSTOTAL_API_KEY = ''
+ INTELX_API_KEY = ''
  FILTER_KEYWORDS = ''
  CONTINUE_RESPONSES_IF_PIPED = True
  WEBHOOK_DISCORD = ''
@@ -759,7 +780,7 @@ def linksFoundResponseAdd(link):
  parsed_url = linkWithoutTimestamp
 
  # Don't write it if the link does not contain the requested domain (this can sometimes happen)
- if parsed_url.find(checkInput) >= 0:
+ if parsed_url.lower().find(checkInput.lower()) >= 0:
  linksFound.add(link)
  except Exception as e:
  linksFound.add(link)
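The change in `linksFoundResponseAdd` lowercases both sides before the substring test, so a link whose host is written in a different case from the input is no longer dropped. A small sketch of the difference, using made-up values. The function name `contains_target` is hypothetical:

```python
def contains_target(link: str, target: str) -> bool:
    """Case-insensitive substring test, as in the 5.0 version of the check."""
    return link.lower().find(target.lower()) >= 0


link = "https://WWW.Example.com/login"
target = "example.com"

# 4.8 behaviour: case-sensitive, so this link would have been skipped
old_result = link.find(target) >= 0
# 5.0 behaviour: case-insensitive, so the link is kept
new_result = contains_target(link, target)
```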
@@ -825,8 +846,9 @@ def processArchiveUrl(url):
  # Only create a file if there is a response
  if len(archiveHtml) != 0:
 
- # If the FILTER_CODE includes 404, and it only process if it doesn't seem to be a custom 404 page
- if '404' in FILTER_CODE and not re.findall(REGEX_404, archiveHtml, re.DOTALL|re.IGNORECASE):
+ # If the FILTER_CODE doesn't include 404, OR
+ # If the FILTER_CODE includes 404, and it doesn't seem to be a custom 404 page
+ if '404' not in FILTER_CODE or ('404' in FILTER_CODE and not re.findall(REGEX_404, archiveHtml, re.DOTALL|re.IGNORECASE)):
 
  # Add the URL as a comment at the start of the response
  if args.url_filename:
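In 4.8 a response was only saved when `404` was in `FILTER_CODE` and the page did not look like a custom 404; in 5.0 it is also saved whenever `404` is not being filtered at all. The sketch below reproduces the new condition and a logically equivalent shorter form. `REGEX_404` is not shown in this diff, so the pattern here is a stand-in assumption, as are the function names and the assumption that `FILTER_CODE` is a comma separated string:

```python
import re

# Stand-in pattern only: the real REGEX_404 is defined elsewhere in waymore
REGEX_404_EXAMPLE = r'<title>[^<]*(404|not found)[^<]*</title>'


def looks_like_404(html: str) -> bool:
    return bool(re.findall(REGEX_404_EXAMPLE, html, re.DOTALL | re.IGNORECASE))


def should_save(filter_code: str, html: str) -> bool:
    """The 5.0 condition, written the same way as in the diff."""
    return '404' not in filter_code or ('404' in filter_code and not looks_like_404(html))


def should_save_short(filter_code: str, html: str) -> bool:
    """Equivalent form: (not A) or (A and B) is the same as (not A) or B."""
    return '404' not in filter_code or not looks_like_404(html)
```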
@@ -1014,12 +1036,12 @@ def processURLOutput():
  """
  Show results of the URL output, i.e. getting URLs from archive.org and commoncrawl.org and write results to file
  """
- global linksFound, subs, path, argsInput, checkWayback, checkCommonCrawl, checkAlienVault, checkURLScan, checkVirusTotal, DEFAULT_OUTPUT_DIR
+ global linksFound, subs, path, argsInput, checkWayback, checkCommonCrawl, checkAlienVault, checkURLScan, checkVirusTotal, DEFAULT_OUTPUT_DIR, checkIntelx
 
  try:
 
  if args.check_only:
- totalRequests = checkWayback + checkCommonCrawl + checkAlienVault + checkURLScan + checkVirusTotal
+ totalRequests = checkWayback + checkCommonCrawl + checkAlienVault + checkURLScan + checkVirusTotal + checkIntelx
  minutes = totalRequests*1 // 60
  hours = minutes // 60
  days = hours // 24
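With `checkIntelx` added to the sum, the estimate shown when `args.check_only` is set now counts Intelligence X requests too. The arithmetic visible in this hunk can be sketched as below. The `*1` factor in the diff reads like one second per request, which is an assumption here, and the function name `estimate_duration` is hypothetical:

```python
from typing import Tuple


def estimate_duration(total_requests: int, seconds_per_request: int = 1) -> Tuple[int, int, int]:
    """Mirror the integer arithmetic in the hunk: whole minutes, hours and days."""
    minutes = total_requests * seconds_per_request // 60
    hours = minutes // 60
    days = hours // 24
    return minutes, hours, days
```

Each value is a floor of the total rather than a remainder, so `hours` is the whole run expressed in hours, not the hours left over after removing full days.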
@@ -1284,12 +1306,13 @@ def validateArgProviders(x):
  - otx
  - urlscan
  - virustotal
+ - intelx
  """
  invalid = False
  x = x.lower()
  providers = x.split(',')
  for provider in providers:
- if not re.fullmatch(r'(wayback|commoncrawl|otx|urlscan|virustotal)', provider):
+ if not re.fullmatch(r'(wayback|commoncrawl|otx|urlscan|virustotal|intelx)', provider):
  invalid = True
  break
  if invalid:
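The provider validation lowercases the argument, splits it on commas, and rejects the whole value if any item is not an exact match for a known provider name. A minimal sketch of that check with `intelx` included. What happens after `invalid` is set is not shown in this diff, so returning a boolean is an assumption, as is the function name `providers_are_valid`:

```python
import re

PROVIDER_PATTERN = r'(wayback|commoncrawl|otx|urlscan|virustotal|intelx)'


def providers_are_valid(value: str) -> bool:
    """Return True only if every comma separated item is a known provider."""
    for provider in value.lower().split(','):
        # fullmatch requires the whole item to match, so ' intelx' or 'intelx2' fail
        if not re.fullmatch(PROVIDER_PATTERN, provider):
            return False
    return True
```

Because `re.fullmatch` is used, stray whitespace around a name is enough to fail validation, so `wayback, intelx` is rejected while `wayback,intelx` is accepted.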
@@ -2335,7 +2358,7 @@ def getCommonCrawlUrls():
 
  def processVirusTotalUrl(url):
  """
- Process a specific URL from virustotal.io to determine whether to save the link
+ Process a specific URL from virustotal.com to determine whether to save the link
  """
  global argsInput, argsInputHostname
 
@@ -2377,7 +2400,7 @@ def processVirusTotalUrl(url):
 
  # Add link if it passed filters
  if addLink:
- # Just get the hostname of the urkl
+ # Just get the hostname of the url
  tldExtract = tldextract.extract(url)
  subDomain = tldExtract.subdomain
  if subDomain != '':
@@ -2422,11 +2445,10 @@ def getVirusTotalUrls():
  session = requests.Session()
  session.mount('https://', HTTP_ADAPTER)
  session.mount('http://', HTTP_ADAPTER)
- # Pass the API-Key header too. This can change the max endpoints per page, depending on URLScan subscription
  resp = session.get(url, headers={'User-Agent':userAgent})
  requestsMade = requestsMade + 1
  except Exception as e:
- write(colored(getSPACER('[ ERR ] Unable to get links from virustotal.io: ' + str(e)), 'red'))
+ write(colored(getSPACER('[ ERR ] Unable to get links from virustotal.com: ' + str(e)), 'red'))
  return
 
  # Deal with any errors
@@ -2493,6 +2515,204 @@ def getVirusTotalUrls():
2493
2515
  except Exception as e:
2494
2516
  writerr(colored('ERROR getVirusTotalUrls 1: ' + str(e), 'red'))
2495
2517
 
2518
+ def processIntelxUrl(url):
2519
+ """
2520
+ Process a specific URL from intelx.io to determine whether to save the link
2521
+ """
2522
+ global argsInput, argsInputHostname
2523
+
2524
+ addLink = True
2525
+
2526
+ # If the url passed doesn't have a scheme, prefix with http://
2527
+ match = re.search(r'^[A-za-z]*\:\/\/', url, flags=re.IGNORECASE)
2528
+ if match is None:
2529
+ url = 'http://'+url
2530
+
2531
+ try:
2532
+ # If filters are required then test them
2533
+ if not args.filter_responses_only:
2534
+
2535
+ # If the user requested -n / --no-subs then we don't want to add it if it has a sub domain (www. will not be classed as a sub domain)
2536
+ if args.no_subs:
2537
+ match = re.search(r'^[A-za-z]*\:\/\/(www\.)?'+re.escape(argsInputHostname), url, flags=re.IGNORECASE)
2538
+ if match is None:
2539
+ addLink = False
2540
+
2541
+ # If the user didn't request -f / --filter-responses-only then check http code
2542
+ # Note we can't check MIME filter because it is not returned by the Intelligence X API
2543
+ if addLink and not args.filter_responses_only:
2544
+
2545
+ # Check the URL exclusions
2546
+ if addLink:
2547
+ match = re.search(r'('+re.escape(FILTER_URL).replace(',','|')+')', url, flags=re.IGNORECASE)
2548
+ if match is not None:
2549
+ addLink = False
2550
+
2551
+ # Set keywords filter if -ko argument passed
2552
+ if addLink and args.keywords_only:
2553
+ if args.keywords_only == '#CONFIG':
2554
+ match = re.search(r'('+re.escape(FILTER_KEYWORDS).replace(',','|')+')', url, flags=re.IGNORECASE)
2555
+ else:
2556
+ match = re.search(r'('+args.keywords_only+')', url, flags=re.IGNORECASE)
2557
+ if match is None:
2558
+ addLink = False
2559
+
2560
+ # Add link if it passed filters
2561
+ if addLink:
2562
+ linksFoundAdd(url)
2563
+
2564
+ except Exception as e:
2565
+ writerr(colored('ERROR processIntelxUrl 1: ' + str(e), 'red'))
2566
+
2567
+ def processIntelxType(target, credits):
2568
+ '''
2569
+ target: 1 - Domains
2570
+ target: 3 - URLs
2571
+ '''
2572
+ try:
2573
+ try:
2574
+ requestsMade = 0
2575
+
2576
+ # Choose a random user agent string to use for any requests
2577
+ userAgent = random.choice(USER_AGENT)
2578
+ session = requests.Session()
2579
+ session.mount('https://', HTTP_ADAPTER)
2580
+ session.mount('http://', HTTP_ADAPTER)
2581
+ # Pass the API key in the X-Key header too.
2582
+ resp = session.post(INTELX_SEARCH_URL, data='{"term":"'+quote(argsInputHostname)+'","target":'+str(target)+'}', headers={'User-Agent':userAgent,'X-Key':INTELX_API_KEY})
2583
+ requestsMade = requestsMade + 1
2584
+ except Exception as e:
2585
+ write(colored(getSPACER('[ ERR ] Unable to get links from intelx.io: ' + str(e)), 'red'))
2586
+ return
2587
+
2588
+ # Deal with any errors
2589
+ if resp.status_code == 429:
2590
+ writerr(colored(getSPACER('[ 429 ] IntelX rate limit reached so unable to get links.'),'red'))
2591
+ return
2592
+ elif resp.status_code == 401:
2593
+ writerr(colored(getSPACER('[ 401 ] IntelX: Not authorized. The source requires a paid API key. Check your API key is correct.'),'red'))
2594
+ return
2595
+ elif resp.status_code == 402:
2596
+ if credits.startswith("0/"):
2597
+ writerr(colored(getSPACER('[ 402 ] IntelX: You have run out of daily credits on Intelx ('+credits+').'),'red'))
2598
+ else:
2599
+ writerr(colored(getSPACER('[ 402 ] IntelX: It appears you have run out of daily credits on Intelx.'),'red'))
2600
+ return
2601
+ elif resp.status_code == 403:
2602
+ writerr(colored(getSPACER('[ 403 ] IntelX: Permission denied. Check your API key is correct.'),'red'))
2603
+ return
2604
+ elif resp.status_code != 200:
2605
+ writerr(colored(getSPACER('[ ' + str(resp.status_code) + ' ] Unable to get links from intelx.io'),'red'))
2606
+ return
2607
+
2608
+ # Get the JSON response
2609
+ try:
2610
+ jsonResp = json.loads(resp.text.strip())
2611
+ id = jsonResp['id']
2612
+ except:
2613
+ writerr(colored(getSPACER('[ ERR ] There was an unexpected response from the Intelligence API'),'red'))
2614
+ return
2615
+
2616
+ # Get each page of the results
2617
+ moreResults = True
2618
+ status = 0
2619
+ while moreResults:
2620
+ if stopSource:
2621
+ break
2622
+ try:
2623
+ resp = session.get(INTELX_RESULTS_URL+id, headers={'User-Agent':userAgent,'X-Key':INTELX_API_KEY})
2624
+ requestsMade = requestsMade + 1
2625
+ except Exception as e:
2626
+ write(colored(getSPACER('[ ERR ] Unable to get links from intelx.io: ' + str(e)), 'red'))
2627
+ return
2628
+
2629
+ # Get the JSON response
2630
+ try:
2631
+ jsonResp = json.loads(resp.text.strip())
2632
+ status = jsonResp['status']
2633
+ except:
2634
+ writerr(colored(getSPACER('[ ERR ] There was an unexpected response from the Intelligence API'),'red'))
2635
+ moreResults = False
2636
+
2637
+ try:
2638
+ selector_values = [entry['selectorvalue'] for entry in jsonResp.get('selectors', [])]
2639
+ except Exception as e:
2640
+ selector_values = []
2641
+ try:
2642
+ selector_valuesh = [entry['selectorvalueh'] for entry in jsonResp.get('selectors', [])]
2643
+ except Exception as e:
2644
+ selector_valuesh = []
2645
+
2646
+ # Work out whether to include each url
2647
+ unique_values = list(set(selector_values + selector_valuesh))
2648
+ for ixurl in unique_values:
2649
+ if stopSource:
2650
+ break
2651
+ processIntelxUrl(ixurl)
2652
+
2653
+ if status == 1 or selector_values == []:
2654
+ moreResults = False
2655
+
2656
+ except Exception as e:
2657
+ writerr(colored('ERROR processIntelxType 1: ' + str(e), 'red'))
2658
+
2659
+ def getIntelxAccountInfo() -> str:
2660
+ '''
2661
+ Get the account info and return the number of Credits remaining from the /phonebook/search
2662
+ '''
2663
+ try:
2664
+ # Choose a random user agent string to use for any requests
2665
+ userAgent = random.choice(USER_AGENT)
2666
+ session = requests.Session()
2667
+ session.mount('https://', HTTP_ADAPTER)
2668
+ session.mount('http://', HTTP_ADAPTER)
2669
+ # Pass the API key in the X-Key header too.
2670
+ resp = session.get(INTELX_ACCOUNT_URL, headers={'User-Agent':userAgent,'X-Key':INTELX_API_KEY})
2671
+ jsonResp = json.loads(resp.text.strip())
2672
+ credits = str(jsonResp.get("paths", {}).get("/phonebook/search", {}).get("Credit", "Unknown"))
2673
+ credits_max = str(jsonResp.get("paths", {}).get("/phonebook/search", {}).get("CreditMax", "Unknown"))
2674
+ return credits+"/"+credits_max
2675
+ except:
2676
+ return "Unknown"
2677
+
2678
+ def getIntelxUrls():
2679
+ """
2680
+ Get URLs from the Intelligence X Phonebook search
2681
+ """
2682
+ global INTELX_API_KEY, linksFound, waymorePath, subs, stopProgram, stopSource, argsInput, checkIntelx, argsInputHostname
2683
+
2684
+ # Write the file of URL's for the passed domain/URL
2685
+ try:
2686
+ if args.check_only:
2687
+ write(colored('Get URLs from Intelligence X: ','cyan')+colored('minimum 4 requests','white'))
2688
+ checkIntelx = 4
2689
+ return
2690
+
2691
+ stopSource = False
2692
+ originalLinkCount = len(linksFound)
2693
+ credits = getIntelxAccountInfo()
2694
+ if verbose():
2695
+ write(colored('The Intelligence X URL requested to get links (Credits: '+credits+'): ','magenta')+colored(INTELX_SEARCH_URL+'\n','white'))
2696
+
2697
+ if not args.check_only:
2698
+ write(colored('\rGetting links from intelx.io API...\r','cyan'))
2699
+
2700
+ # Get the domains from Intelligence X if the --no-subs wasn't passed
2701
+ if not args.no_subs:
2702
+ processIntelxType(1, credits)
2703
+
2704
+ # Get the URLs from Intelligence X
2705
+ processIntelxType(3, credits)
2706
+
2707
+ linkCount = len(linksFound) - originalLinkCount
2708
+ if args.xwm and args.xcc and args.xav and args.xus and args.xvt:
2709
+ write(getSPACER(colored('Links found on intelx.io: ', 'cyan')+colored(str(linkCount),'white'))+'\n')
2710
+ else:
2711
+ write(getSPACER(colored('Extra links found on intelx.io: ', 'cyan')+colored(str(linkCount),'white'))+'\n')
2712
+
2713
+ except Exception as e:
2714
+ writerr(colored('ERROR getIntelxUrls 1: ' + str(e), 'red'))
2715
+
2496
2716
  def processResponses():
2497
2717
  """
2498
2718
  Get archived responses from Wayback Machine (archive.org)
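Reviewer sketch (not part of the package): the added Intelligence X functions build a JSON search body by string concatenation, collect `selectorvalue` and `selectorvalueh` from each results page, and format the remaining credits. The example below isolates those three steps without any network calls. All function names are hypothetical, and two choices differ from the diff on purpose: `json.dumps` is used to build the body instead of concatenation with `quote()`, and entries missing a key are skipped individually rather than discarding the whole list.

```python
import json


def build_search_payload(hostname, target):
    """Serialise the search body; json.dumps takes care of quoting and escaping."""
    # target 1 = domains, target 3 = URLs (per the docstring in the diff)
    return json.dumps({"term": hostname, "target": target})


def extract_selectors(page):
    """Return the unique selector strings found in one results page (a parsed dict)."""
    values = set()
    for entry in page.get("selectors", []):
        for key in ("selectorvalue", "selectorvalueh"):
            if key in entry:
                values.add(entry[key])
    return sorted(values)


def format_credits(account):
    """Format the /phonebook/search credits as 'remaining/max'."""
    info = account.get("paths", {}).get("/phonebook/search", {})
    return str(info.get("Credit", "Unknown")) + "/" + str(info.get("CreditMax", "Unknown"))


page = {
    "status": 0,
    "selectors": [
        {"selectorvalue": "http://a.example.com/", "selectorvalueh": "http://a.example.com/"},
        {"selectorvalue": "b.example.com"},
    ],
}
print(extract_selectors(page))  # ['b.example.com', 'http://a.example.com/']
```

Keeping these steps as pure functions makes them testable without an API key; the HTTP calls with the `X-Key` header would wrap around them as in the diff.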
@@ -2920,6 +3140,8 @@ def combineInlineJS():
2920
3140
  for script in uniqueExternalScripts:
2921
3141
  inlineExternalFile.write(script.strip() + '\n')
2922
3142
  write(colored('Created file ','cyan')+colored(responseOutputDirectory+'combinedInlineSrc.txt','white')+colored(' (src of external JS)','cyan'))
3143
+ else:
3144
+ write(colored('No external JS scripts found, so no combined Inline Src file written.\n','cyan'))
2923
3145
 
2924
3146
  # Write files for all combined inline JS
2925
3147
  uniqueScripts = set()
@@ -2977,19 +3199,20 @@ def combineInlineJS():
2977
3199
 
2978
3200
  currentScript += 1
2979
3201
 
2980
- if totalExternal == 0 and totalSections == 0:
2981
- write(colored('No scripts found, so no combined JS files written.\n','cyan'))
2982
- elif fileNumber == 1:
2983
- write(colored('Created file ','cyan')+colored(responseOutputDirectory+'combinedInline1.js','white')+colored(' (contents of inline JS)\n','cyan'))
3202
+ if totalSections == 0:
3203
+ write(colored('No scripts found, so no combined Inline JS files written.\n','cyan'))
2984
3204
  else:
2985
- write(colored('Created files ','cyan')+colored(responseOutputDirectory+'combinedInline{1-'+str(fileNumber)+'}.js','white')+colored(' (contents of inline JS)\n','cyan'))
3205
+ if fileNumber == 1:
3206
+ write(colored('Created file ','cyan')+colored(responseOutputDirectory+'combinedInline1.js','white')+colored(' (contents of inline JS)\n','cyan'))
3207
+ else:
3208
+ write(colored('Created files ','cyan')+colored(responseOutputDirectory+'combinedInline{1-'+str(fileNumber)+'}.js','white')+colored(' (contents of inline JS)\n','cyan'))
2986
3209
 
2987
3210
  except Exception as e:
2988
3211
  writerr(colored('ERROR combineInlineJS 1: ' + str(e), 'red'))
2989
3212
 
2990
3213
  # Run waymore
2991
3214
  def main():
2992
- global args, DEFAULT_TIMEOUT, inputValues, argsInput, linksFound, linkMimes, successCount, failureCount, fileCount, totalResponses, totalPages, indexFile, path, stopSource, stopProgram, VIRUSTOTAL_API_KEY, inputIsSubDomain, argsInputHostname, WEBHOOK_DISCORD, responseOutputDirectory, fileCount
3215
+ global args, DEFAULT_TIMEOUT, inputValues, argsInput, linksFound, linkMimes, successCount, failureCount, fileCount, totalResponses, totalPages, indexFile, path, stopSource, stopProgram, VIRUSTOTAL_API_KEY, inputIsSubDomain, argsInputHostname, WEBHOOK_DISCORD, responseOutputDirectory, fileCount, INTELX_API_KEY
2993
3216
 
2994
3217
  # Tell Python to run the handler() function when SIGINT is received
2995
3218
  signal(SIGINT, handler)
@@ -3047,7 +3270,7 @@ def main():
3047
3270
  parser.add_argument(
3048
3271
  '-ft',
3049
3272
  action='store',
3050
- help='Filter MIME Types for retrieved URLs and responses. Comma separated list of MIME Types (default: the FILTER_MIME values from config.yml). Passing this argument will override the value from config.yml. NOTE: This will NOT be applied to Alien Vault OTX and Virus Total because they don\'t have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.',
3273
+ help='Filter MIME Types for retrieved URLs and responses. Comma separated list of MIME Types (default: the FILTER_MIME values from config.yml). Passing this argument will override the value from config.yml. NOTE: This will NOT be applied to Alien Vault OTX, Virus Total and Intelligence X because they don\'t have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.',
3051
3274
  type=validateArgMimeTypes,
3052
3275
  )
3053
3276
  parser.add_argument(
@@ -3059,7 +3282,7 @@ def main():
3059
3282
  parser.add_argument(
3060
3283
  '-mt',
3061
3284
  action='store',
3062
- help='Only MIME Types for retrieved URLs and responses. Comma separated list of MIME types. Passing this argument overrides the config FILTER_MIME and -ft. NOTE: This will NOT be applied to Alien Vault OTX and Virus Total because they don\'t have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.',
3285
+ help='Only MIME Types for retrieved URLs and responses. Comma separated list of MIME types. Passing this argument overrides the config FILTER_MIME and -ft. NOTE: This will NOT be applied to Alien Vault OTX, Virus Total and Intelligence X because they don\'t have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.',
3063
3286
  type=validateArgMimeTypes,
3064
3287
  )
3065
3288
  parser.add_argument(
@@ -3137,13 +3360,19 @@ def main():
3137
3360
  help='Exclude checks for links from virustotal.com',
3138
3361
  default=False
3139
3362
  )
3363
+ parser.add_argument(
3364
+ '-xix',
3365
+ action='store_true',
3366
+ help='Exclude checks for links from intelx.io',
3367
+ default=False
3368
+ )
3140
3369
  parser.add_argument(
3141
3370
  '--providers',
3142
3371
  action='store',
3143
- help='A comma separated list of source providers that you want to get URLs from. The values can be wayback,commoncrawl,otx,urlscan and virustotal. Passing this will override any exclude arguments (e.g. -xwm,-xcc, etc.) passed to exclude sources, and reset those based on what was passed with this argument.',
3372
+ help='A comma separated list of source providers that you want to get URLs from. The values can be wayback,commoncrawl,otx,urlscan,virustotal and intelx. Passing this will override any exclude arguments (e.g. -xwm,-xcc, etc.) passed to exclude sources, and reset those based on what was passed with this argument.',
3144
3373
  default=[],
3145
3374
  type=validateArgProviders,
3146
- metavar='{wayback,commoncrawl,otx,urlscan,virustotal}'
3375
+ metavar='{wayback,commoncrawl,otx,urlscan,virustotal,intelx}'
3147
3376
  )
3148
3377
  parser.add_argument(
3149
3378
  '-lcc',
@@ -3297,6 +3526,10 @@ def main():
3297
3526
  args.xvt = True
3298
3527
  else:
3299
3528
  args.xvt = False
3529
+ if 'intelx' not in args.providers:
3530
+ args.xix = True
3531
+ else:
3532
+ args.xix = False
3300
3533
 
3301
3534
  # If no input was given, raise an error
3302
3535
  if sys.stdin.isatty():
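Reviewer sketch (not part of the package): the hunk above adds one more `if`/`else` pair so that `--providers` resets `args.xix`, as it already does for `args.xvt`. A table-driven version of that reset is shown below as one possible alternative. The provider-to-flag mapping is an assumption pieced together from the exclude flags named in this diff (`xwm`, `xcc`, `xav`, `xus`, `xvt`, `xix`), and `apply_providers` is a hypothetical name.

```python
from types import SimpleNamespace

# Assumed mapping of provider name -> exclude attribute on the parsed args
PROVIDER_EXCLUDE_FLAGS = {
    'wayback': 'xwm',
    'commoncrawl': 'xcc',
    'otx': 'xav',
    'urlscan': 'xus',
    'virustotal': 'xvt',
    'intelx': 'xix',
}


def apply_providers(args):
    """If providers were passed, exclude every source that is not in the list."""
    if args.providers:
        for provider, flag in PROVIDER_EXCLUDE_FLAGS.items():
            setattr(args, flag, provider not in args.providers)
    return args


args = SimpleNamespace(providers=['wayback', 'intelx'],
                       xwm=True, xcc=False, xav=False, xus=False, xvt=False, xix=True)
apply_providers(args)
print(args.xwm, args.xcc, args.xix)  # False True False
```

With a mapping like this, adding a future source would need one new dictionary entry instead of another four-line branch.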
@@ -3386,6 +3619,10 @@ def main():
3386
3619
  # If not requested to exclude, get URLs from virustotal.com if we have an API key
3387
3620
  if not args.xvt and VIRUSTOTAL_API_KEY != '' and stopProgram is None:
3388
3621
  getVirusTotalUrls()
3622
+
3623
+ # If not requested to exclude, get URLs from intelx.io if we have an API key
3624
+ if not args.xix and INTELX_API_KEY != '' and stopProgram is None:
3625
+ getIntelxUrls()
3389
3626
 
3390
3627
  # Output results of all searches
3391
3628
  processURLOutput()
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.1
2
2
  Name: waymore
3
- Version: 4.8
3
+ Version: 5.0
4
4
  Summary: Find way more from the Wayback Machine, Common Crawl, Alien Vault OTX, URLScan & VirusTotal!
5
5
  Home-page: https://github.com/xnl-h4ck3r/waymore
6
6
  Author: @xnl-h4ck3r
@@ -15,7 +15,7 @@ Requires-Dist: tldextract
15
15
 
16
16
  <center><img src="https://github.com/xnl-h4ck3r/waymore/blob/main/waymore/images/title.png"></center>
17
17
 
18
- ## About - v4.8
18
+ ## About - v5.0
19
19
 
20
20
  The idea behind **waymore** is to find even more links from the Wayback Machine than other existing tools.
21
21
 
@@ -23,7 +23,7 @@ The idea behind **waymore** is to find even more links from the Wayback Machine
23
23
  👉 Also, other tools do not currently deal with the rate limiting now in place by the sources, and will often just stop with incomplete results and not let you know they are incomplete.
24
24
 
25
25
  Anyone who does bug bounty will have likely used the amazing [waybackurls](https://github.com/tomnomnom/waybackurls) by @TomNomNoms. This tool gets URLs from [web.archive.org](https://web.archive.org) and additional links (if any) from one of the index collections on [index.commoncrawl.org](http://index.commoncrawl.org/).
26
- You would have also likely used the amazing [gau](https://github.com/lc/gau) by @hacker\_ which also finds URL's from wayback archive, Common Crawl, but also from Alien Vault and URLScan.
26
+ You would have also likely used the amazing [gau](https://github.com/lc/gau) by @hacker\_ which also finds URL's from wayback archive, Common Crawl, but also from Alien Vault, URLScan, Virus Total and Intelligence X.
27
27
  Now **waymore** gets URL's from ALL of those sources too (with ability to filter more to get what you want):
28
28
 
29
29
  - Wayback Machine (web.archive.org)
@@ -31,6 +31,7 @@ Now **waymore** gets URL's from ALL of those sources too (with ability to filter
31
31
  - Alien Vault OTX (otx.alienvault.com)
32
32
  - URLScan (urlscan.io)
33
33
  - Virus Total (virustotal.com)
34
+ - Intelligence X (intelx.io) - PAID SOURCE ONLY
34
35
 
35
36
  👉 It's a point that many seem to miss, so I'll just add it again :) ... The biggest difference between **waymore** and other tools is that it can also **download the archived responses** for URLs on wayback machine so that you can then search these for even more links, developer comments, extra parameters, etc. etc.
36
37
 
@@ -83,9 +84,9 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
83
84
  | -n | --no-subs | Don't include subdomains of the target domain (only used if input is not a domain with a specific path). |
84
85
  | -f | --filter-responses-only | The initial links from sources will not be filtered, only the responses that are downloaded, e.g. it may be useful to still see all available paths from the links, even if you don't want to check the content. |
85
86
  | -fc | | Filter HTTP status codes for retrieved URLs and responses. Comma separated list of codes (default: the `FILTER_CODE` values from `config.yml`). Passing this argument will override the value from `config.yml` |
86
- | -ft | | Filter MIME Types for retrieved URLs and responses. Comma separated list of MIME Types (default: the `FILTER_MIME` values from `config.yml`). Passing this argument will override the value from `config.yml`. **NOTE: This will NOT be applied to Alien Vault OTX and Virus Total because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.**. |
87
+ | -ft | | Filter MIME Types for retrieved URLs and responses. Comma separated list of MIME Types (default: the `FILTER_MIME` values from `config.yml`). Passing this argument will override the value from `config.yml`. **NOTE: This will NOT be applied to Alien Vault OTX, Virus Total and Intelligence X because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.**. |
87
88
  | -mc | | Only Match HTTP status codes for retrieved URLs and responses. Comma separated list of codes. Passing this argument overrides the config `FILTER_CODE` and `-fc`. |
88
- | -mt | | Only MIME Types for retrieved URLs and responses. Comma separated list of MIME types. Passing this argument overrides the config `FILTER_MIME` and `-ft`. **NOTE: This will NOT be applied to Alien Vault OTX and Virus Total because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.**. |
89
+ | -mt | | Only MIME Types for retrieved URLs and responses. Comma separated list of MIME types. Passing this argument overrides the config `FILTER_MIME` and `-ft`. **NOTE: This will NOT be applied to Alien Vault OTX, Virus Total and Intelligence X because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.**. |
89
90
  | -l | --limit | How many responses will be saved (if `-mode R` or `-mode B` is passed). A positive value will get the **first N** results, a negative value will get the **last N** results. A value of 0 will get **ALL** responses (default: 5000) |
90
91
  | -from | --from-date | What date to get responses from. If not specified it will get from the earliest possible results. A partial value can be passed, e.g. `2016`, `201805`, etc. |
91
92
  | -to | --to-date | What date to get responses to. If not specified it will get to the latest possible results. A partial value can be passed, e.g. `2021`, `202112`, etc. |
@@ -97,6 +98,7 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
97
98
  | -xav | | Exclude checks for links from alienvault.com |
98
99
  | -xus | | Exclude checks for links from urlscan.io |
99
100
  | -xvt | | Exclude checks for links from virustotal.com |
101
+ | -xix | | Exclude checks for links from intelx.io |
100
102
  | -lcc | | Limit the number of Common Crawl index collections searched, e.g. `-lcc 10` will just search the latest `10` collections (default: 1). As of November 2024 there are currently 106 collections. Setting to `0` will search **ALL** collections. If you don't want to search Common Crawl at all, use the `-xcc` option. |
101
103
  | -lcy | | Limit the number of Common Crawl index collections searched by the year of the index data. The earliest index has data from 2008. Setting to 0 (default) will search collections of any year (but in conjunction with `-lcc`). For example, if you are only interested in data from 2015 and after, pass `-lcy 2015`. This will override the value of `-lcc` if passed. If you don't want to search Common Crawl at all, use the `-xcc` option. |
102
104
  | -t | --timeout | This is for archived responses only! How many seconds to wait for the server to send data before giving up (default: 30) |
@@ -164,8 +166,9 @@ The `config.yml` file (typically in `~/.config/waymore/`) have values that can b
164
166
  - `CONTINUE_RESPONSES_IF_PIPED` - If retrieving archive responses doesn't complete, you will be prompted next time whether you want to continue with the previous run. However, if `stdout` is piped to another process it is assumed you don't want to have an interactive prompt. A value of `True` (default) will ensure the previous run is continued. If you want a fresh run every time, set it to `False`.
165
167
  - `WEBHOOK_DISCORD` - If the `--notify-discord` argument is passed, `knoxnl` will send a notification to this Discord webhook when a successful XSS is found.
166
168
  - `DEFAULT_OUTPUT_DIR` - This is the default location of any output files written if the `-oU` and `-oR` arguments are not used. If the value of this key is blank, then it will default to the location of the `config.yml` file.
169
+ - `INTELX_API_KEY` - You can sign up to [intelx.io here](https://intelx.io/product). It requires a paid API key to do the `/phonebook/search` through their API (as of 2024-09-01, the Phonebook service has been restricted to paid users due to constant abuse by spam accounts).
167
170
 
168
- **NOTE: The MIME types cannot be filtered for Alien Vault OTX and Virus Total because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined for a URL. In these cases, URLs will be included regardless of filter or match. Bear this in mind and consider excluding certain providers if this is important.**
171
+ **NOTE: The MIME types cannot be filtered for Alien Vault OTX, Virus Total and Intelligence X because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined for a URL. In these cases, URLs will be included regardless of filter or match. Bear this in mind and consider excluding certain providers if this is important.**
169
172
 
170
173
  ## Output
171
174
 
@@ -281,7 +284,7 @@ If you come across any problems at all, or have ideas for improvements, please f
281
284
 
282
285
  ## TODO
283
286
 
284
- - Add an `-oss` argument that accepts a file of Out Of Scope subdomains/URLs that will not be returned in the output, or have any responses downloaded
287
+ - Add an `-oos` argument that accepts a file of Out Of Scope subdomains/URLs that will not be returned in the output, or have any responses downloaded
285
288
 
286
289
  ## References
287
290
 
@@ -290,6 +293,7 @@ If you come across any problems at all, or have ideas for improvements, please f
290
293
  - [Alien Vault OTX API](https://otx.alienvault.com/assets/static/external_api.html)
291
294
  - [URLScan API](https://urlscan.io/docs/api/)
292
295
  - [VirusTotal API (v2)](https://docs.virustotal.com/v2.0/reference/getting-started)
296
+ - [Intelligence X SDK](https://github.com/IntelligenceX/SDK?tab=readme-ov-file#intelligence-x-public-sdk)
293
297
 
294
298
  Good luck and good hunting!
295
299
  If you really love the tool (or any others), or they helped you find an awesome bounty, consider [BUYING ME A COFFEE!](https://ko-fi.com/xnlh4ck3r) ☕ (I could use the caffeine!)
@@ -1 +0,0 @@
1
- __version__="4.8"
File without changes
File without changes
File without changes