waymore 7.2__tar.gz → 7.5__tar.gz

@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: waymore
3
- Version: 7.2
3
+ Version: 7.5
4
4
  Summary: Find way more from the Wayback Machine, Common Crawl, Alien Vault OTX, URLScan, VirusTotal & Intelligence X!
5
5
  Home-page: https://github.com/xnl-h4ck3r/waymore
6
6
  Author: xnl-h4ck3r
@@ -21,12 +21,12 @@ Dynamic: license-file
21
21
 
22
22
  <center><img src="https://github.com/xnl-h4ck3r/waymore/blob/main/waymore/images/title.png"></center>
23
23
 
24
- ## About - v7.2
24
+ ## About - v7.5
25
25
 
26
- The idea behind **waymore** is to find even more links from the Wayback Machine than other existing tools.
26
+ The idea behind **waymore** is to find even more links from the Wayback Machine (plus other sources) than other existing tools.
27
27
 
28
- 👉 The biggest difference between **waymore** and other tools is that it can also **download the archived responses** for URLs on wayback machine so that you can then search these for even more links, developer comments, extra parameters, etc. etc.
29
- 👉 Also, other tools do not currenrtly deal with the rate limiting now in place by the sources, and will often just stop with incomplete results and not let you know they are incomplete.
28
+ 👉 The biggest difference between **waymore** and other tools is that it can also **download the archived responses** for URLs on wayback machine (and URLScan) so that you can then search these for even more links, developer comments, extra parameters, etc. etc.
29
+ 👉 Also, other tools do not currently deal with the rate limiting now in place by the sources, and will often just stop with incomplete results and not let you know they are incomplete.
30
30
 
31
31
  Anyone who does bug bounty will have likely used the amazing [waybackurls](https://github.com/tomnomnom/waybackurls) by @TomNomNoms. This tool gets URLs from [web.archive.org](https://web.archive.org) and additional links (if any) from one of the index collections on [index.commoncrawl.org](http://index.commoncrawl.org/).
32
32
  You would have also likely used the amazing [gau](https://github.com/lc/gau) by @hacker\_ which also finds URL's from wayback archive, Common Crawl, but also from Alien Vault, URLScan, Virus Total and Intelligence X.
@@ -37,7 +37,7 @@ Now **waymore** gets URL's from ALL of those sources too (with ability to filter
37
37
  - Alien Vault OTX (otx.alienvault.com)
38
38
  - URLScan (urlscan.io)
39
39
  - Virus Total (virustotal.com)
40
- - Intelligence X (intelx.io) - PAID SOURCE ONLY
40
+ - Intelligence X (intelx.io) - ACADEMIA OR PAID TIERS ONLY
41
41
 
42
42
  👉 It's a point that many seem to miss, so I'll just add it again :) ... The biggest difference between **waymore** and other tools is that it can also **download the archived responses** for URLs on wayback machine so that you can then search these for even more links, developer comments, extra parameters, etc. etc.
43
43
 
@@ -120,7 +120,7 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
120
120
  | -urlr | --urlscan-rate-limit-retry | The number of minutes the user wants to wait for a rate limit pause on URLScan.io instead of stopping with a `429` error (default: 1). |
121
121
  | -co | --check-only | This will make a few minimal requests to show you how many requests, and roughly how long it could take, to get URLs from the sources and downloaded responses from Wayback Machine (unfortunately it isn't possible to check how long it will take to download responses from URLScan). |
122
122
  | -nd | --notify-discord | Whether to send a notification to Discord when waymore completes. It requires `WEBHOOK_DISCORD` to be provided in the `config.yml` file. |
123
- | -nt | --notify-telegram | Whether to send a notification to Telegram when waymore completes. It requires `WEBHOOK_TELEGRAM` to be provided in the `config.yml` file. |
123
+ | -nt | --notify-telegram | Whether to send a notification to Telegram when waymore completes. It requires `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` to be provided in the `config.yml` file. |
124
124
  | -oijs | --output-inline-js | Whether to save combined inline javascript of all relevant files in the response directory when `-mode R` (or `-mode B`) has been used. The files are saved with the name `combinedInline{}.js` where `{}` is the number of the file, saving 1000 unique scripts per file. The file `combinedInlineSrc.txt` will also be created, containing the `src` value of all external scripts referenced in the files. |
125
125
  | -v | --verbose | Verbose output |
126
126
  | | --version | Show current version number. |
@@ -172,9 +172,10 @@ The `config.yml` file (typically in `~/.config/waymore/`) have values that can b
172
172
  - `URLSCAN_API_KEY` - You can sign up to [urlscan.io](https://urlscan.io/user/signup) to get a **FREE** API key (there are also paid subscriptions available). It is recommended you get a key and put it into the config file so that you can get more back (and quicker) from their API. NOTE: You will get rate limited unless you have a full paid subscription.
173
173
  - `CONTINUE_RESPONSES_IF_PIPED` - If retrieving archive responses doesn't complete, you will be prompted next time whether you want to continue with the previous run. However, if `stdout` is piped to another process it is assumed you don't want to have an interactive prompt. A value of `True` (default) will ensure the previous run is continued. If you want a fresh run every time then set it to `False`.
174
174
  - `WEBHOOK_DISCORD` - If the `--notify-discord` argument is passed, `waymore` will send a notification to this Discord webhook.
175
- - `WEBHOOK_TELEGRAM` - If the `--notify-telegram` argument is passed, `waymore` will send a notification to this Telegram wehook.
175
+ - `TELEGRAM_BOT_TOKEN` - If the `--notify-telegram` argument is passed, `waymore` will use this token to send a notification to Telegram.
176
+ - `TELEGRAM_CHAT_ID` - If the `--notify-telegram` argument is passed, `waymore` will send the notification to this chat ID.
176
177
  - `DEFAULT_OUTPUT_DIR` - This is the default location of any output files written if the `-oU` and `-oR` arguments are not used. If the value of this key is blank, then it will default to the location of the `config.yml` file.
177
- - `INTELX_API_KEY` - You can sign up to [intelx.io here](https://intelx.io/product). It requires a paid API key to do the `/phonebook/search` through their API (as of 2024-09-01, the Phonebook service has been restricted to paid users due to constant abuse by spam accounts).
178
+ - `INTELX_API_KEY` - You can sign up to [intelx.io here](https://intelx.io/product). It requires an academia or paid API key to do the `/phonebook/search` through their API (as of 2024-09-01, the Phonebook service has been restricted to academia or paid users due to constant abuse by spam accounts). You can get a free API key for academic use if you sign up with a valid academic email address.
178
179
 
179
180
  **NOTE: The MIME types cannot be filtered for Alien Vault OTX, Virus Total and Intelligence X because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined for a URL. In these cases, URLs will be included regardless of filter or match. Bear this in mind and consider excluding certain providers if this is important.**
180
181
 
@@ -1,11 +1,11 @@
1
1
  <center><img src="https://github.com/xnl-h4ck3r/waymore/blob/main/waymore/images/title.png"></center>
2
2
 
3
- ## About - v7.2
3
+ ## About - v7.5
4
4
 
5
- The idea behind **waymore** is to find even more links from the Wayback Machine than other existing tools.
5
+ The idea behind **waymore** is to find even more links from the Wayback Machine (plus other sources) than other existing tools.
6
6
 
7
- 👉 The biggest difference between **waymore** and other tools is that it can also **download the archived responses** for URLs on wayback machine so that you can then search these for even more links, developer comments, extra parameters, etc. etc.
8
- 👉 Also, other tools do not currenrtly deal with the rate limiting now in place by the sources, and will often just stop with incomplete results and not let you know they are incomplete.
7
+ 👉 The biggest difference between **waymore** and other tools is that it can also **download the archived responses** for URLs on wayback machine (and URLScan) so that you can then search these for even more links, developer comments, extra parameters, etc. etc.
8
+ 👉 Also, other tools do not currently deal with the rate limiting now in place by the sources, and will often just stop with incomplete results and not let you know they are incomplete.
9
9
 
10
10
  Anyone who does bug bounty will have likely used the amazing [waybackurls](https://github.com/tomnomnom/waybackurls) by @TomNomNoms. This tool gets URLs from [web.archive.org](https://web.archive.org) and additional links (if any) from one of the index collections on [index.commoncrawl.org](http://index.commoncrawl.org/).
11
11
  You would have also likely used the amazing [gau](https://github.com/lc/gau) by @hacker\_ which also finds URL's from wayback archive, Common Crawl, but also from Alien Vault, URLScan, Virus Total and Intelligence X.
@@ -16,7 +16,7 @@ Now **waymore** gets URL's from ALL of those sources too (with ability to filter
16
16
  - Alien Vault OTX (otx.alienvault.com)
17
17
  - URLScan (urlscan.io)
18
18
  - Virus Total (virustotal.com)
19
- - Intelligence X (intelx.io) - PAID SOURCE ONLY
19
+ - Intelligence X (intelx.io) - ACADEMIA OR PAID TIERS ONLY
20
20
 
21
21
  👉 It's a point that many seem to miss, so I'll just add it again :) ... The biggest difference between **waymore** and other tools is that it can also **download the archived responses** for URLs on wayback machine so that you can then search these for even more links, developer comments, extra parameters, etc. etc.
22
22
 
@@ -99,7 +99,7 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
99
99
  | -urlr | --urlscan-rate-limit-retry | The number of minutes the user wants to wait for a rate limit pause on URLScan.io instead of stopping with a `429` error (default: 1). |
100
100
  | -co | --check-only | This will make a few minimal requests to show you how many requests, and roughly how long it could take, to get URLs from the sources and downloaded responses from Wayback Machine (unfortunately it isn't possible to check how long it will take to download responses from URLScan). |
101
101
  | -nd | --notify-discord | Whether to send a notification to Discord when waymore completes. It requires `WEBHOOK_DISCORD` to be provided in the `config.yml` file. |
102
- | -nt | --notify-telegram | Whether to send a notification to Telegram when waymore completes. It requires `WEBHOOK_TELEGRAM` to be provided in the `config.yml` file. |
102
+ | -nt | --notify-telegram | Whether to send a notification to Telegram when waymore completes. It requires `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` to be provided in the `config.yml` file. |
103
103
  | -oijs | --output-inline-js | Whether to save combined inline javascript of all relevant files in the response directory when `-mode R` (or `-mode B`) has been used. The files are saved with the name `combinedInline{}.js` where `{}` is the number of the file, saving 1000 unique scripts per file. The file `combinedInlineSrc.txt` will also be created, containing the `src` value of all external scripts referenced in the files. |
104
104
  | -v | --verbose | Verbose output |
105
105
  | | --version | Show current version number. |
@@ -151,9 +151,10 @@ The `config.yml` file (typically in `~/.config/waymore/`) have values that can b
151
151
  - `URLSCAN_API_KEY` - You can sign up to [urlscan.io](https://urlscan.io/user/signup) to get a **FREE** API key (there are also paid subscriptions available). It is recommended you get a key and put it into the config file so that you can get more back (and quicker) from their API. NOTE: You will get rate limited unless you have a full paid subscription.
152
152
  - `CONTINUE_RESPONSES_IF_PIPED` - If retrieving archive responses doesn't complete, you will be prompted next time whether you want to continue with the previous run. However, if `stdout` is piped to another process it is assumed you don't want to have an interactive prompt. A value of `True` (default) will ensure the previous run is continued. If you want a fresh run every time then set it to `False`.
153
153
  - `WEBHOOK_DISCORD` - If the `--notify-discord` argument is passed, `waymore` will send a notification to this Discord webhook.
154
- - `WEBHOOK_TELEGRAM` - If the `--notify-telegram` argument is passed, `waymore` will send a notification to this Telegram wehook.
154
+ - `TELEGRAM_BOT_TOKEN` - If the `--notify-telegram` argument is passed, `waymore` will use this token to send a notification to Telegram.
155
+ - `TELEGRAM_CHAT_ID` - If the `--notify-telegram` argument is passed, `waymore` will send the notification to this chat ID.
155
156
  - `DEFAULT_OUTPUT_DIR` - This is the default location of any output files written if the `-oU` and `-oR` arguments are not used. If the value of this key is blank, then it will default to the location of the `config.yml` file.
156
- - `INTELX_API_KEY` - You can sign up to [intelx.io here](https://intelx.io/product). It requires a paid API key to do the `/phonebook/search` through their API (as of 2024-09-01, the Phonebook service has been restricted to paid users due to constant abuse by spam accounts).
157
+ - `INTELX_API_KEY` - You can sign up to [intelx.io here](https://intelx.io/product). It requires an academia or paid API key to do the `/phonebook/search` through their API (as of 2024-09-01, the Phonebook service has been restricted to academia or paid users due to constant abuse by spam accounts). You can get a free API key for academic use if you sign up with a valid academic email address.
157
158
 
158
159
  **NOTE: The MIME types cannot be filtered for Alien Vault OTX, Virus Total and Intelligence X because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined for a URL. In these cases, URLs will be included regardless of filter or match. Bear this in mind and consider excluding certain providers if this is important.**
159
160
 
@@ -0,0 +1 @@
1
+ __version__ = "7.5"
@@ -19,6 +19,7 @@ import threading
19
19
  from datetime import datetime, timedelta
20
20
  from pathlib import Path
21
21
  from signal import SIGINT, signal
22
+ from typing import Optional
22
23
  from urllib.parse import urlparse
23
24
 
24
25
  import requests
@@ -109,6 +110,11 @@ linkCountAlienVault = 0
109
110
  linkCountURLScan = 0
110
111
  linkCountVirusTotal = 0
111
112
  linkCountIntelx = 0
113
+ linksFoundCommonCrawl = set()
114
+ linksFoundAlienVault = set()
115
+ linksFoundURLScan = set()
116
+ linksFoundVirusTotal = set()
117
+ linksFoundIntelx = set()
112
118
 
113
119
  # Thread lock for protecting shared state during concurrent operations
114
120
  links_lock = threading.Lock()
@@ -124,9 +130,64 @@ ALIENVAULT_URL = "https://otx.alienvault.com/api/v1/indicators/{TYPE}/{DOMAIN}/u
  URLSCAN_URL = "https://urlscan.io/api/v1/search/?q=domain:{DOMAIN}{DATERANGE}&size=10000"
  URLSCAN_DOM_URL = "https://urlscan.io/dom/"
  VIRUSTOTAL_URL = "https://www.virustotal.com/vtapi/v2/domain/report?apikey={APIKEY}&domain={DOMAIN}"
- INTELX_SEARCH_URL = "https://2.intelx.io/phonebook/search"
- INTELX_RESULTS_URL = "https://2.intelx.io/phonebook/search/result?id="
- INTELX_ACCOUNT_URL = "https://2.intelx.io/authenticate/info"
+ # Paid endpoint first, free endpoint as fallback
+ INTELX_BASES = ["https://2.intelx.io", "https://free.intelx.io"]
+
+ intelx_tls = threading.local()
+
+
+ def initIntelxTls():
+     """Initialize thread-local storage for IntelX if not already done."""
+     if not hasattr(intelx_tls, "INTELX_BASE"):
+         intelx_tls.INTELX_BASE = INTELX_BASES[0]
+         intelx_tls.INTELX_SEARCH_URL = f"{intelx_tls.INTELX_BASE}/phonebook/search"
+         intelx_tls.INTELX_RESULTS_URL = f"{intelx_tls.INTELX_BASE}/phonebook/search/result?id="
+         intelx_tls.INTELX_ACCOUNT_URL = f"{intelx_tls.INTELX_BASE}/authenticate/info"
+
+
+ def setIntelxBase(base: str):
+     """Update IntelX URLs to use the provided base (thread-local)."""
+     initIntelxTls()
+     intelx_tls.INTELX_BASE = base
+     intelx_tls.INTELX_SEARCH_URL = f"{intelx_tls.INTELX_BASE}/phonebook/search"
+     intelx_tls.INTELX_RESULTS_URL = f"{intelx_tls.INTELX_BASE}/phonebook/search/result?id="
+     intelx_tls.INTELX_ACCOUNT_URL = f"{intelx_tls.INTELX_BASE}/authenticate/info"
+
+
+ def chooseIntelxBase(api_key: str) -> Optional[requests.Response]:
+     """
+     Probe IntelX endpoints in order (paid, then free) and set the first that works.
+     Returns the last response (or None) so callers can inspect status/JSON.
+     """
+     initIntelxTls()
+     try:
+         session = requests.Session()
+         session.mount("https://", HTTP_ADAPTER)
+         session.mount("http://", HTTP_ADAPTER)
+         last_resp = None
+         for base in INTELX_BASES:
+             userAgent = random.choice(USER_AGENT)
+             try:
+                 resp = session.get(
+                     f"{base}/authenticate/info",
+                     headers={"User-Agent": userAgent, "X-Key": api_key},
+                 )
+                 last_resp = resp
+                 if resp.status_code == 200:
+                     setIntelxBase(base)
+                     return resp
+                 if resp.status_code in [401, 403]:
+                     # Try next base
+                     continue
+             except Exception as e:
+                 writerr(colored(f"IntelX - [ ERR ] Problem probing {base}: {e}", "red"))
+             # For other codes or exceptions, try next base anyway instead of breaking prematurely
+             continue
+         return last_resp
+     except Exception as e:
+         writerr(colored(f"IntelX - [ ERR ] Unexpected error in chooseIntelxBase: {e}", "red"))
+         return None
+
 
  # User Agents to use when making requests, chosen at random
  USER_AGENT = [
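The hunk above moves the IntelX URLs from module-level constants into `threading.local()` storage, which is why `initIntelxTls()` is called at the start of each function that reads them: attributes set on a thread-local object in one thread do not exist in another. The following is a minimal sketch of that behaviour only, not the package's code; the names `tls`, `init_tls`, `set_base` and `worker` are hypothetical, and only the two base URLs are taken from the diff.

```python
import threading

# Base URLs in the same order as the diff: paid first, free second.
BASES = ["https://2.intelx.io", "https://free.intelx.io"]

# Hypothetical thread-local container; each thread sees its own attributes.
tls = threading.local()


def init_tls():
    """Give the calling thread a default base if it has none yet."""
    if not hasattr(tls, "base"):
        tls.base = BASES[0]


def set_base(base):
    """Change the base for the calling thread only."""
    init_tls()
    tls.base = base


def worker(results, name, base=None):
    """Record which base this thread ends up using."""
    init_tls()
    if base is not None:
        set_base(base)
    results[name] = tls.base


results = {}
switched = threading.Thread(target=worker, args=(results, "switched", BASES[1]))
default = threading.Thread(target=worker, args=(results, "default"))
switched.start()
switched.join()
default.start()
default.join()
print(results)
```

The second thread still reports the first base even though the first thread switched, which is the isolation the thread-local approach is presumably meant to provide when sources are queried concurrently.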
@@ -182,7 +243,8 @@ FILTER_KEYWORDS = ""
182
243
  URLSCAN_API_KEY = ""
183
244
  CONTINUE_RESPONSES_IF_PIPED = True
184
245
  WEBHOOK_DISCORD = ""
185
- WEBHOOK_TELEGRAM = ""
246
+ TELEGRAM_BOT_TOKEN = ""
247
+ TELEGRAM_CHAT_ID = ""
186
248
  DEFAULT_OUTPUT_DIR = ""
187
249
  INTELX_API_KEY = ""
188
250
 
@@ -553,7 +615,7 @@ def showOptions():
553
615
  write(
554
616
  colored("Intelligence X API Key:", "magenta")
555
617
  + colored(
556
- " {none} - You require a paid API Key from https://intelx.io/product",
618
+ " {none} - You require a Academia or Paid API Key from https://intelx.io/product",
557
619
  "white",
558
620
  )
559
621
  )
@@ -794,16 +856,22 @@ def showOptions():
794
856
  write(colored("Discord Webhook: ", "magenta") + colored(WEBHOOK_DISCORD))
795
857
 
796
858
  if args.notify_telegram:
797
- if WEBHOOK_TELEGRAM == "" or WEBHOOK_TELEGRAM == "YOUR_WEBHOOK":
859
+ if (
860
+ TELEGRAM_BOT_TOKEN == ""
861
+ or TELEGRAM_BOT_TOKEN == "YOUR_TOKEN"
862
+ or TELEGRAM_CHAT_ID == ""
863
+ or TELEGRAM_CHAT_ID == "YOUR_CHAT_ID"
864
+ ):
798
865
  write(
799
- colored("Telegram Webhook: ", "magenta")
866
+ colored("Telegram: ", "magenta")
800
867
  + colored(
801
- "It looks like no Telegram webhook has been set in config.yml file.",
868
+ "It looks like Telegram Bot Token or Chat ID has not been set in config.yml file.",
802
869
  "red",
803
870
  )
804
871
  )
805
872
  else:
806
- write(colored("Telegram Webhook: ", "magenta") + colored(WEBHOOK_TELEGRAM))
873
+ write(colored("Telegram Bot Token: ", "magenta") + colored(TELEGRAM_BOT_TOKEN))
874
+ write(colored("Telegram Chat ID: ", "magenta") + colored(TELEGRAM_CHAT_ID))
807
875
 
808
876
  write(colored("Default Output Directory: ", "magenta") + colored(str(DEFAULT_OUTPUT_DIR)))
809
877
 
@@ -863,7 +931,7 @@ def getConfig():
863
931
  """
864
932
  Try to get the values from the config file, otherwise use the defaults
865
933
  """
866
- global FILTER_CODE, FILTER_MIME, FILTER_URL, FILTER_KEYWORDS, URLSCAN_API_KEY, VIRUSTOTAL_API_KEY, CONTINUE_RESPONSES_IF_PIPED, subs, path, waymorePath, inputIsDomainANDPath, HTTP_ADAPTER, HTTP_ADAPTER_CC, argsInput, terminalWidth, MATCH_CODE, WEBHOOK_DISCORD, DEFAULT_OUTPUT_DIR, MATCH_MIME, INTELX_API_KEY, WEBHOOK_TELEGRAM
934
+ global FILTER_CODE, FILTER_MIME, FILTER_URL, FILTER_KEYWORDS, URLSCAN_API_KEY, VIRUSTOTAL_API_KEY, CONTINUE_RESPONSES_IF_PIPED, subs, path, waymorePath, inputIsDomainANDPath, HTTP_ADAPTER, HTTP_ADAPTER_CC, argsInput, terminalWidth, MATCH_CODE, WEBHOOK_DISCORD, TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID, DEFAULT_OUTPUT_DIR, MATCH_MIME, INTELX_API_KEY
867
935
  try:
868
936
 
869
937
  # Set terminal width
@@ -1130,23 +1198,42 @@ def getConfig():
1130
1198
 
1131
1199
  if args.notify_telegram:
1132
1200
  try:
1133
- WEBHOOK_TELEGRAM = config.get("WEBHOOK_TELEGRAM")
1134
- if str(WEBHOOK_TELEGRAM) == "None" or str(WEBHOOK_TELEGRAM) == "YOUR_WEBHOOK":
1201
+ TELEGRAM_BOT_TOKEN = config.get("TELEGRAM_BOT_TOKEN")
1202
+ if str(TELEGRAM_BOT_TOKEN) == "None" or str(TELEGRAM_BOT_TOKEN) == "YOUR_TOKEN":
1135
1203
  writerr(
1136
1204
  colored(
1137
- 'No value for "WEBHOOK_TELEGRAM" in config.yml - default set',
1205
+ 'No value for "TELEGRAM_BOT_TOKEN" in config.yml - default set',
1138
1206
  "yellow",
1139
1207
  )
1140
1208
  )
1141
- WEBHOOK_TELEGRAM = ""
1209
+ TELEGRAM_BOT_TOKEN = ""
1142
1210
  except Exception:
1143
1211
  writerr(
1144
1212
  colored(
1145
- 'Unable to read "WEBHOOK_TELEGRAM" from config.yml - default set',
1213
+ 'Unable to read "TELEGRAM_BOT_TOKEN" from config.yml - default set',
1146
1214
  "red",
1147
1215
  )
1148
1216
  )
1149
- WEBHOOK_TELEGRAM = ""
1217
+ TELEGRAM_BOT_TOKEN = ""
1218
+
1219
+ try:
1220
+ TELEGRAM_CHAT_ID = config.get("TELEGRAM_CHAT_ID")
1221
+ if str(TELEGRAM_CHAT_ID) == "None" or str(TELEGRAM_CHAT_ID) == "YOUR_CHAT_ID":
1222
+ writerr(
1223
+ colored(
1224
+ 'No value for "TELEGRAM_CHAT_ID" in config.yml - default set',
1225
+ "yellow",
1226
+ )
1227
+ )
1228
+ TELEGRAM_CHAT_ID = ""
1229
+ except Exception:
1230
+ writerr(
1231
+ colored(
1232
+ 'Unable to read "TELEGRAM_CHAT_ID" from config.yml - default set',
1233
+ "red",
1234
+ )
1235
+ )
1236
+ TELEGRAM_CHAT_ID = ""
1150
1237
 
1151
1238
  try:
1152
1239
  DEFAULT_OUTPUT_DIR = config.get("DEFAULT_OUTPUT_DIR")
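The two Telegram settings in the hunk above follow the same pattern: read the key, and fall back to an empty string when it is missing or still holds the template placeholder. A small helper could express that once. This is a sketch under assumptions: `read_setting` is a hypothetical name that does not appear in waymore, the `config` dictionaries are made-up examples, and converting the value to `str` is a choice of this sketch (a YAML loader may return a numeric chat ID as an integer).

```python
def read_setting(config, key, placeholder):
    """Return the configured value as a string.

    Returns "" when the key is absent, set to None, or still equal to the
    template placeholder (for example "YOUR_TOKEN").
    """
    value = config.get(key)
    if value is None or str(value) == placeholder:
        return ""
    return str(value)


# Hypothetical config contents, standing in for a parsed config.yml.
unset = {"TELEGRAM_BOT_TOKEN": "YOUR_TOKEN"}
filled = {"TELEGRAM_BOT_TOKEN": "123:ABC", "TELEGRAM_CHAT_ID": 987654}

print(repr(read_setting(unset, "TELEGRAM_BOT_TOKEN", "YOUR_TOKEN")))
print(repr(read_setting(unset, "TELEGRAM_CHAT_ID", "YOUR_CHAT_ID")))
print(repr(read_setting(filled, "TELEGRAM_CHAT_ID", "YOUR_CHAT_ID")))
```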
@@ -1247,7 +1334,8 @@ def getConfig():
1247
1334
  FILTER_KEYWORDS = ""
1248
1335
  CONTINUE_RESPONSES_IF_PIPED = True
1249
1336
  WEBHOOK_DISCORD = ""
1250
- WEBHOOK_TELEGRAM = ""
1337
+ TELEGRAM_BOT_TOKEN = ""
1338
+ TELEGRAM_CHAT_ID = ""
1251
1339
  DEFAULT_OUTPUT_DIR = os.path.expanduser("~/.config/waymore")
1252
1340
 
1253
1341
  except Exception as e:
@@ -4568,7 +4656,10 @@ def processIntelxUrl(url):
4568
4656
 
4569
4657
  # Add link if it passed filters
4570
4658
  if addLink:
4571
- linksFoundAdd(url, linksFoundIntelx)
4659
+ # Clean the link to remove any █ (\u2588) characters from the link. These can be present in the IntelX results when the Academia plan is used
4660
+ url = url.replace("\u2588", "").strip()
4661
+ if url != "":
4662
+ linksFoundAdd(url, linksFoundIntelx)
4572
4663
 
4573
4664
  except Exception as e:
4574
4665
  writerr(colored("ERROR processIntelxUrl 1: " + str(e), "red"))
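The change above strips `█` (`\u2588`) characters from IntelX results and skips links that become empty. The same cleaning step can be sketched in isolation as follows; `clean_url` and `collect` are hypothetical helper names, and the sample URLs are invented for illustration.

```python
REDACTION_BLOCK = "\u2588"  # the full-block character mentioned in the diff comment


def clean_url(url):
    """Remove every block character, then trim surrounding whitespace."""
    return url.replace(REDACTION_BLOCK, "").strip()


def collect(urls):
    """Return the set of cleaned URLs, dropping any that end up empty."""
    found = set()
    for url in urls:
        cleaned = clean_url(url)
        if cleaned != "":
            found.add(cleaned)
    return found


# Invented sample input: one partly masked URL, one fully masked, one clean.
sample = [
    "https://example.com/a\u2588\u2588",
    "\u2588\u2588\u2588 ",
    "https://example.com/b",
]
print(sorted(collect(sample)))
```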
@@ -4579,86 +4670,107 @@ def processIntelxType(target, credits):
4579
4670
  target: 1 - Domains
4580
4671
  target: 3 - URLs
4581
4672
  """
4673
+ initIntelxTls()
4582
4674
  global intelxAPIIssue
4583
4675
  try:
4584
- try:
4585
- requestsMade = 0
4676
+ attempts = 0
4677
+ resp = None
4678
+ # Choose a random user agent string to use for any requests and reuse session
4679
+ userAgent = random.choice(USER_AGENT)
4680
+ session = requests.Session()
4681
+ session.mount("https://", HTTP_ADAPTER)
4682
+ session.mount("http://", HTTP_ADAPTER)
4586
4683
 
4587
- # Choose a random user agent string to use for any requests
4588
- userAgent = random.choice(USER_AGENT)
4589
- session = requests.Session()
4590
- session.mount("https://", HTTP_ADAPTER)
4591
- session.mount("http://", HTTP_ADAPTER)
4592
- # Pass the API key in the X-Key header too.
4593
- resp = session.post(
4594
- INTELX_SEARCH_URL,
4595
- data='{"term":"' + quote(argsInputHostname) + '","target":' + str(target) + "}",
4596
- headers={"User-Agent": userAgent, "X-Key": INTELX_API_KEY},
4597
- )
4598
- requestsMade = requestsMade + 1
4599
- except Exception as e:
4600
- write(
4601
- colored(
4602
- "IntelX - [ ERR ] Unable to get links from intelx.io: " + str(e),
4603
- "red",
4684
+ while attempts < 2:
4685
+ attempts += 1
4686
+ try:
4687
+ requestsMade = 0
4688
+ # Pass the API key in the X-Key header too.
4689
+ resp = session.post(
4690
+ intelx_tls.INTELX_SEARCH_URL,
4691
+ data='{"term":"' + quote(argsInputHostname) + '","target":' + str(target) + "}",
4692
+ headers={"User-Agent": userAgent, "X-Key": INTELX_API_KEY},
4604
4693
  )
4605
- )
4606
- return
4694
+ requestsMade = requestsMade + 1
4695
+ except Exception as e:
4696
+ write(
4697
+ colored(
4698
+ "IntelX - [ ERR ] Unable to get links from intelx.io: " + str(e),
4699
+ "red",
4700
+ )
4701
+ )
4702
+ return
4607
4703
 
4608
- # Deal with any errors
4609
- if resp.status_code == 429:
4610
- intelxAPIIssue = True
4611
- writerr(
4612
- colored(
4613
- "IntelX - [ 429 ] Rate limit reached so unable to get links.",
4614
- "red",
4704
+ # Deal with any errors
4705
+ if resp.status_code == 200:
4706
+ break
4707
+ elif resp.status_code == 429:
4708
+ intelxAPIIssue = True
4709
+ writerr(
4710
+ colored(
4711
+ "IntelX - [ 429 ] Rate limit reached so unable to get links.",
4712
+ "red",
4713
+ )
4615
4714
  )
4616
- )
4617
- return
4618
- elif resp.status_code == 401:
4619
- intelxAPIIssue = True
4620
- writerr(
4621
- colored(
4622
- "IntelX - [ 401 ] Not authorized. The source requires a paid API key. Check your API key is correct.",
4623
- "red",
4715
+ return
4716
+ elif resp.status_code == 401:
4717
+ # Retry with free endpoint if paid endpoint was used and auth failed
4718
+ if intelx_tls.INTELX_BASE != INTELX_BASES[-1]:
4719
+ setIntelxBase(INTELX_BASES[-1])
4720
+ continue
4721
+ intelxAPIIssue = True
4722
+ writerr(
4723
+ colored(
4724
+ "IntelX - [ 401 ] Not authorized. Check your API key is correct.",
4725
+ "red",
4726
+ )
4624
4727
  )
4625
- )
4626
- return
4627
- elif resp.status_code == 402:
4628
- intelxAPIIssue = True
4629
- if credits.startswith("0/"):
4728
+ return
4729
+ elif resp.status_code == 402:
4730
+ # If we were on paid, fall back to free and retry once
4731
+ if intelx_tls.INTELX_BASE != INTELX_BASES[-1]:
4732
+ setIntelxBase(INTELX_BASES[-1])
4733
+ continue
4734
+ intelxAPIIssue = True
4735
+ if credits.startswith("0/"):
4736
+ writerr(
4737
+ colored(
4738
+ "IntelX - [ 402 ] You have run out of daily credits on Intelx ("
4739
+ + credits
4740
+ + ").",
4741
+ "red",
4742
+ )
4743
+ )
4744
+ else:
4745
+ writerr(
4746
+ colored(
4747
+ "IntelX - [ 402 ] It appears you have run out of daily credits on Intelx.",
4748
+ "red",
4749
+ )
4750
+ )
4751
+ return
4752
+ elif resp.status_code == 403:
4753
+ intelxAPIIssue = True
4630
4754
  writerr(
4631
4755
  colored(
4632
- "IntelX - [ 402 ] You have run out of daily credits on Intelx ("
4633
- + credits
4634
- + ").",
4756
+ "IntelX - [ 403 ] Permission denied. Check your API key is correct.",
4635
4757
  "red",
4636
4758
  )
4637
4759
  )
4760
+ return
4638
4761
  else:
4639
4762
  writerr(
4640
4763
  colored(
4641
- "IntelX - [ 402 ] It appears you have run out of daily credits on Intelx.",
4764
+ "IntelX - [ "
4765
+ + str(resp.status_code)
4766
+ + " ] Unable to get links from intelx.io",
4642
4767
  "red",
4643
4768
  )
4644
4769
  )
4645
- return
4646
- elif resp.status_code == 403:
4647
- intelxAPIIssue = True
4648
- writerr(
4649
- colored(
4650
- "IntelX - [ 403 ] Permission denied. Check your API key is correct.",
4651
- "red",
4652
- )
4653
- )
4654
- return
4655
- elif resp.status_code != 200:
4656
- writerr(
4657
- colored(
4658
- "IntelX - [ " + str(resp.status_code) + " ] Unable to get links from intelx.io",
4659
- "red",
4660
- )
4661
- )
4770
+ return
4771
+
4772
+ # Double check we have a valid response
4773
+ if resp is None or resp.status_code != 200:
4662
4774
  return
4663
4775
 
4664
4776
  # Get the JSON response
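The rewritten `processIntelxType` makes at most two attempts: on a `401` or `402` from the first base it switches to the last base in `INTELX_BASES` and retries, while any other non-`200` status ends the attempt. The control flow can be sketched without any network access by passing in a function that returns a status code. This is an illustration under assumptions, not the package's implementation: `search_with_fallback` and the `send` callback are hypothetical.

```python
BASES = ["https://2.intelx.io", "https://free.intelx.io"]


def search_with_fallback(bases, send, max_attempts=2):
    """Try the first base, falling back to the last one on 401 or 402.

    `send(base)` is a caller-supplied function returning an HTTP status code.
    Returns a tuple of (base used for the final attempt, final status code).
    """
    current = bases[0]
    status = None
    for _ in range(max_attempts):
        status = send(current)
        if status == 200:
            return current, status
        # Only auth or credit failures on a non-final base trigger a retry.
        if status in (401, 402) and current != bases[-1]:
            current = bases[-1]
            continue
        break
    return current, status


def paid_rejects(base):
    """Stand-in for a key that only works on the free endpoint."""
    return 200 if base == BASES[1] else 401


print(search_with_fallback(BASES, paid_rejects))
```

Injecting `send` keeps the retry policy testable on its own; a real caller would wrap the HTTP request in that callback.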
@@ -4682,7 +4794,7 @@ def processIntelxType(target, credits):
4682
4794
  break
4683
4795
  try:
4684
4796
  resp = session.get(
4685
- INTELX_RESULTS_URL + id,
4797
+ intelx_tls.INTELX_RESULTS_URL + id,
4686
4798
  headers={"User-Agent": userAgent, "X-Key": INTELX_API_KEY},
4687
4799
  )
4688
4800
  requestsMade = requestsMade + 1
@@ -4737,19 +4849,13 @@ def processIntelxType(target, credits):
 
  def getIntelxAccountInfo() -> str:
      """
-     Get the account info and return the number of Credits remainiing from the /phonebook/search
+     Get the account info and return the number of Credits remaining from the /phonebook/search
      """
+     initIntelxTls()
      try:
-         # Choose a random user agent string to use for any requests
-         userAgent = random.choice(USER_AGENT)
-         session = requests.Session()
-         session.mount("https://", HTTP_ADAPTER)
-         session.mount("http://", HTTP_ADAPTER)
-         # Pass the API key in the X-Key header too.
-         resp = session.get(
-             INTELX_ACCOUNT_URL,
-             headers={"User-Agent": userAgent, "X-Key": INTELX_API_KEY},
-         )
+         resp = chooseIntelxBase(INTELX_API_KEY)
+         if resp is None or resp.status_code != 200:
+             return "Unknown"
          jsonResp = json.loads(resp.text.strip())
          credits = str(
              jsonResp.get("paths", {}).get("/phonebook/search", {}).get("Credit", "Unknown")
@@ -4766,7 +4872,7 @@ def getIntelxUrls():
4766
4872
  """
4767
4873
  Get URLs from the Intelligence X Phonebook search
4768
4874
  """
4769
- global INTELX_API_KEY, linksFound, waymorePath, subs, stopProgram, stopSourceIntelx, argsInput, checkIntelx, argsInputHostname, intelxAPIIssue, linkCountIntelx
4875
+ global INTELX_API_KEY, linksFound, waymorePath, subs, stopProgram, stopSourceIntelx, argsInput, checkIntelx, argsInputHostname, intelxAPIIssue, linkCountIntelx, linksFoundIntelx
4770
4876
 
4771
4877
  # Write the file of URL's for the passed domain/URL
4772
4878
  try:
@@ -4780,6 +4886,7 @@ def getIntelxUrls():
4780
4886
 
4781
4887
  stopSourceIntelx = False
4782
4888
  linksFoundIntelx = set()
4889
+ initIntelxTls()
4783
4890
 
4784
4891
  credits = getIntelxAccountInfo()
4785
4892
  if verbose():
@@ -4790,7 +4897,7 @@ def getIntelxUrls():
4790
4897
  + "): ",
4791
4898
  "magenta",
4792
4899
  )
4793
- + colored(INTELX_SEARCH_URL + "\n", "white")
4900
+ + colored(intelx_tls.INTELX_SEARCH_URL + "\n", "white")
4794
4901
  )
4795
4902
 
4796
4903
  if not args.check_only:
@@ -5798,13 +5905,15 @@ def notifyDiscord():
 
 
  def notifyTelegram():
-     global WEBHOOK_TELEGRAM, args
+     global TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID, args
      try:
+         url = "https://api.telegram.org/bot" + TELEGRAM_BOT_TOKEN + "/sendMessage"
          data = {
+             "chat_id": TELEGRAM_CHAT_ID,
              "text": "waymore has finished for `-i " + args.input + " -mode " + args.mode + "` ! 🤘",
          }
          try:
-             result = requests.post(WEBHOOK_TELEGRAM, json=data)
+             result = requests.post(url, json=data)
              if 300 <= result.status_code < 200:
                  writerr(
                      colored(
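With this change the notification goes to the Telegram Bot API `sendMessage` method, using a URL built from the bot token and a JSON body that carries `chat_id` and `text`. Note that the unchanged context line `if 300 <= result.status_code < 200:` can never be true, since no status code is both at least 300 and below 200, so the error branch under it is unreachable as written. The sketch below builds the same request without sending it and shows one way a non-2xx check might be written; `build_telegram_request` and `is_failure` are hypothetical names, and the token and chat ID are placeholders.

```python
def build_telegram_request(bot_token, chat_id, text):
    """Return the sendMessage URL and JSON payload for a bot notification."""
    url = "https://api.telegram.org/bot" + bot_token + "/sendMessage"
    payload = {"chat_id": chat_id, "text": text}
    return url, payload


def is_failure(status_code):
    """Treat anything outside the 2xx range as a failed request."""
    return not (200 <= status_code < 300)


# Placeholder credentials, for illustration only.
url, payload = build_telegram_request("123:ABC", "987654", "waymore has finished")
print(url)
print(payload)
print(is_failure(404), is_failure(200))
```

A caller would pass `url` and `payload` to something like `requests.post(url, json=payload)` and then apply `is_failure` to the returned status code.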
@@ -6373,7 +6482,7 @@ def main():
6373
6482
  "-nt",
6374
6483
  "--notify-telegram",
6375
6484
  action="store_true",
6376
- help="Whether to send a notification to Telegram when waymore completes. It requires WEBHOOK_TELEGRAM to be provided in the config.yml file.",
6485
+ help="Whether to send a notification to Telegram when waymore completes. It requires TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID to be provided in the config.yml file.",
6377
6486
  )
6378
6487
  parser.add_argument(
6379
6488
  "-oijs",
@@ -6601,7 +6710,7 @@ def main():
6601
6710
  except Exception:
6602
6711
  pass
6603
6712
  try:
6604
- if args.notify_telegram and WEBHOOK_TELEGRAM != "":
6713
+ if args.notify_telegram and TELEGRAM_BOT_TOKEN != "" and TELEGRAM_CHAT_ID != "":
6605
6714
  notifyTelegram()
6606
6715
  except Exception:
6607
6716
  pass
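The `notifyTelegram()` hunks above replace the single `WEBHOOK_TELEGRAM` URL with a Bot API URL built from `TELEGRAM_BOT_TOKEN`, and add a `chat_id` field taken from `TELEGRAM_CHAT_ID`. A minimal standalone sketch of that request shape follows. The helper names (`build_telegram_request`, `is_failure`, `send_telegram_message`) are hypothetical and not part of waymore, and the sketch uses the standard library's `urllib` where the diff uses `requests`. Note also that the unchanged context line `if 300 <= result.status_code < 200:` is a chained comparison that can never be true, so the sketch tests for a status outside the 2xx range instead.

```python
import json
import urllib.error
import urllib.request

TELEGRAM_API_BASE = "https://api.telegram.org/bot"


def build_telegram_request(bot_token, chat_id, text):
    """Build the sendMessage URL and JSON payload, mirroring the diff."""
    url = TELEGRAM_API_BASE + bot_token + "/sendMessage"
    payload = {"chat_id": chat_id, "text": text}
    return url, payload


def is_failure(status_code):
    """True for any HTTP status outside the 2xx success range."""
    return not (200 <= status_code < 300)


def send_telegram_message(bot_token, chat_id, text, timeout=10):
    """POST the message and return the HTTP status code (performs network I/O)."""
    url, payload = build_telegram_request(bot_token, chat_id, text)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; surface the code so the caller can check it.
        return err.code
```

A caller would check `is_failure(send_telegram_message(token, chat_id, "done"))` and log an error when it returns `True`. Keeping URL construction separate from the network call makes the request shape easy to verify without contacting Telegram.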
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: waymore
3
- Version: 7.2
3
+ Version: 7.5
4
4
  Summary: Find way more from the Wayback Machine, Common Crawl, Alien Vault OTX, URLScan, VirusTotal & Intelligence X!
5
5
  Home-page: https://github.com/xnl-h4ck3r/waymore
6
6
  Author: xnl-h4ck3r
@@ -21,12 +21,12 @@ Dynamic: license-file
21
21
 
22
22
  <center><img src="https://github.com/xnl-h4ck3r/waymore/blob/main/waymore/images/title.png"></center>
23
23
 
24
- ## About - v7.2
24
+ ## About - v7.5
25
25
 
26
- The idea behind **waymore** is to find even more links from the Wayback Machine than other existing tools.
26
+ The idea behind **waymore** is to find even more links from the Wayback Machine (plus other sources) than other existing tools.
27
27
 
28
- 👉 The biggest difference between **waymore** and other tools is that it can also **download the archived responses** for URLs on wayback machine so that you can then search these for even more links, developer comments, extra parameters, etc. etc.
29
- 👉 Also, other tools do not currenrtly deal with the rate limiting now in place by the sources, and will often just stop with incomplete results and not let you know they are incomplete.
28
+ 👉 The biggest difference between **waymore** and other tools is that it can also **download the archived responses** for URLs on wayback machine (and URLScan) so that you can then search these for even more links, developer comments, extra parameters, etc. etc.
29
+ 👉 Also, other tools do not currently deal with the rate limiting now in place by the sources, and will often just stop with incomplete results and not let you know they are incomplete.
30
30
 
31
31
  Anyone who does bug bounty will have likely used the amazing [waybackurls](https://github.com/tomnomnom/waybackurls) by @TomNomNoms. This tool gets URLs from [web.archive.org](https://web.archive.org) and additional links (if any) from one of the index collections on [index.commoncrawl.org](http://index.commoncrawl.org/).
32
32
  You would have also likely used the amazing [gau](https://github.com/lc/gau) by @hacker\_ which also finds URL's from wayback archive, Common Crawl, but also from Alien Vault, URLScan, Virus Total and Intelligence X.
@@ -37,7 +37,7 @@ Now **waymore** gets URL's from ALL of those sources too (with ability to filter
37
37
  - Alien Vault OTX (otx.alienvault.com)
38
38
  - URLScan (urlscan.io)
39
39
  - Virus Total (virustotal.com)
40
- - Intelligence X (intelx.io) - PAID SOURCE ONLY
40
+ - Intelligence X (intelx.io) - ACADEMIA OR PAID TIERS ONLY
41
41
 
42
42
  👉 It's a point that many seem to miss, so I'll just add it again :) ... The biggest difference between **waymore** and other tools is that it can also **download the archived responses** for URLs on wayback machine so that you can then search these for even more links, developer comments, extra parameters, etc. etc.
43
43
 
@@ -120,7 +120,7 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
120
120
  | -urlr | --urlscan-rate-limit-retry | The number of minutes the user wants to wait for a rate limit pause on URLScan.io instead of stopping with a `429` error (default: 1). |
121
121
  | -co | --check-only | This will make a few minimal requests to show you how many requests, and roughly how long it could take, to get URLs from the sources and downloaded responses from Wayback Machine (unfortunately it isn't possible to check how long it will take to download responses from URLScan). |
122
122
  | -nd | --notify-discord | Whether to send a notification to Discord when waymore completes. It requires `WEBHOOK_DISCORD` to be provided in the `config.yml` file. |
123
- | -nt | --notify-telegram | Whether to send a notification to Telegram when waymore completes. It requires `WEBHOOK_TELEGRAM` to be provided in the `config.yml` file. |
123
+ | -nt | --notify-telegram | Whether to send a notification to Telegram when waymore completes. It requires `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` to be provided in the `config.yml` file. |
124
124
  | -oijs | --output-inline-js | Whether to save combined inline javascript of all relevant files in the response directory when `-mode R` (or `-mode B`) has been used. The files are saved with the name `combinedInline{}.js` where `{}` is the number of the file, saving 1000 unique scripts per file. The file `combinedInlineSrc.txt` will also be created, containing the `src` value of all external scripts referenced in the files. |
125
125
  | -v | --verbose | Verbose output |
126
126
  | | --version | Show current version number. |
@@ -172,9 +172,10 @@ The `config.yml` file (typically in `~/.config/waymore/`) have values that can b
172
172
  - `URLSCAN_API_KEY` - You can sign up to [urlscan.io](https://urlscan.io/user/signup) to get a **FREE** API key (there are also paid subscriptions available). It is recommended you get a key and put it into the config file so that you can get more back (and quicker) from their API. NOTE: You will get rate limited unless you have a full paid subscription.
173
173
  - `CONTINUE_RESPONSES_IF_PIPED` - If retrieving archive responses doesn't complete, you will be prompted next time whether you want to continue with the previous run. However, if `stdout` is piped to another process it is assumed you don't want to have an interactive prompt. A value of `True` (default) will determine assure the previous run will be continued. if you want a fresh run every time then set to `False`.
174
174
  - `WEBHOOK_DISCORD` - If the `--notify-discord` argument is passed, `waymore` will send a notification to this Discord wehook.
175
- - `WEBHOOK_TELEGRAM` - If the `--notify-telegram` argument is passed, `waymore` will send a notification to this Telegram wehook.
175
+ - `TELEGRAM_BOT_TOKEN` - If the `--notify-telegram` argument is passed, `waymore` will use this token to send a notification to Telegram.
176
+ - `TELEGRAM_CHAT_ID` - If the `--notify-telegram` argument is passed, `waymore` will send the notification to this chat ID.
176
177
  - `DEFAULT_OUTPUT_DIR` - This is the default location of any output files written if the `-oU` and `-oR` arguments are not used. If the value of this key is blank, then it will default to the location of the `config.yml` file.
177
- - `INTELX_API_KEY` - You can sign up to [intelx.io here](https://intelx.io/product). It requires a paid API key to do the `/phonebook/search` through their API (as of 2024-09-01, the Phonebook service has been restricted to paid users due to constant abuse by spam accounts).
178
+ - `INTELX_API_KEY` - You can sign up to [intelx.io here](https://intelx.io/product). It requires an academia or paid API key to do the `/phonebook/search` through their API (as of 2024-09-01, the Phonebook service has been restricted to academia or paid users due to constant abuse by spam accounts). You can get a free API key for academic use if you sign up with a valid academic email address.
178
179
 
179
180
  **NOTE: The MIME types cannot be filtered for Alien Vault OTX, Virus Total and Intelligence X because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined for a URL. In these cases, URLs will be included regardless of filter or match. Bear this in mind and consider excluding certain providers if this is important.**
180
181
 
@@ -1 +0,0 @@
1
- __version__ = "7.2"
File without changes