waymore 6.4__tar.gz → 6.6__tar.gz

@@ -1,13 +1,16 @@
  Metadata-Version: 2.1
  Name: waymore
- Version: 6.4
- Summary: Find way more from the Wayback Machine, Common Crawl, Alien Vault OTX, URLScan & VirusTotal!
+ Version: 6.6
+ Summary: Find way more from the Wayback Machine, Common Crawl, Alien Vault OTX, URLScan, VirusTotal & Intelligence X!
  Home-page: https://github.com/xnl-h4ck3r/waymore
- Author: @xnl-h4ck3r
+ Author: xnl-h4ck3r
+ License: MIT
+ Requires-Python: >=3.9
  Description-Content-Type: text/markdown
  License-File: LICENSE
+ Requires-Dist: PyYAML
  Requires-Dist: requests
- Requires-Dist: pyyaml
+ Requires-Dist: setuptools
  Requires-Dist: termcolor
  Requires-Dist: psutil
  Requires-Dist: urlparse3
@@ -15,7 +18,7 @@ Requires-Dist: tldextract
 
  <center><img src="https://github.com/xnl-h4ck3r/waymore/blob/main/waymore/images/title.png"></center>
 
- ## About - v6.4
+ ## About - v6.6
 
  The idea behind **waymore** is to find even more links from the Wayback Machine than other existing tools.
 
@@ -88,8 +91,8 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
  | -mc | | Only Match HTTP status codes for retrieved URLs and responses. Comma separated list of codes. Passing this argument overrides the config `FILTER_CODE` and `-fc`. |
  | -mt | | Only MIME Types for retrieved URLs and responses. Comma separated list of MIME types. Passing this argument overrides the config `FILTER_MIME` and `-ft`. **NOTE: This will NOT be applied to Alien Vault OTX, Virus Total and Intelligence X because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.**. |
  | -l | --limit | How many responses will be saved (if `-mode R` or `-mode B` is passed). A positive value will get the **first N** results, a negative value will get the **last N** results. A value of 0 will get **ALL** responses (default: 5000) |
- | -from | --from-date | What date to get responses from. If not specified it will get from the earliest possible results. A partial value can be passed, e.g. `2016`, `201805`, etc. |
- | -to | --to-date | What date to get responses to. If not specified it will get to the latest possible results. A partial value can be passed, e.g. `2021`, `202112`, etc. |
+ | -from | --from-date | What date to get data from. If not specified it will get from the earliest possible results. A partial value can be passed, e.g. `2016`, `201805`, etc. **IMPORTANT: There are some exceptions with sources unable to get URLs within date limits: Virus Total - all known sub domains will still be returned; Intelligence X - all URLs will still be returned.** |
+ | -to | --to-date | What date to get data to. If not specified it will get to the latest possible results. A partial value can be passed, e.g. `2021`, `202112`, etc. **IMPORTANT: There are some exceptions with sources unable to get URLs within date limits: Virus Total - all known sub domains will still be returned; Intelligence X - all URLs will still be returned.** |
  | -ci | --capture-interval | Filters the search on archive.org to only get at most 1 capture per hour (`h`), day (`d`) or month (`m`). This filter is used for responses only. The default is `d` but can also be set to `none` to not filter anything and get all responses. |
  | -ra | --regex-after | RegEx for filtering purposes against links found from all sources of URLs AND responses downloaded. Only positive matches will be output. |
  | -url-filename | | Set the file name of downloaded responses to the URL that generated the response, otherwise it will be set to the hash value of the response. Using the hash value means multiple URLs that generated the same response will only result in one file being saved for that response. |
@@ -100,7 +103,6 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
  | -xvt | | Exclude checks for links from virustotal.com |
  | -xix | | Exclude checks for links from Intelligence X.com |
  | -lcc | | Limit the number of Common Crawl index collections searched, e.g. `-lcc 10` will just search the latest `10` collections (default: 1). As of November 2024 there are currently 106 collections. Setting to `0` will search **ALL** collections. If you don't want to search Common Crawl at all, use the `-xcc` option. |
- | -lcy | | Limit the number of Common Crawl index collections searched by the year of the index data. The earliest index has data from 2008. Setting to 0 (default) will search collections or any year (but in conjuction with `-lcc`). For example, if you are only interested in data from 2015 and after, pass `-lcy 2015`. This will override the value of `-lcc` if passed. If you don't want to search Common Crawl at all, use the `-xcc` option. |
  | -t | --timeout | This is for archived responses only! How many seconds to wait for the server to send data before giving up (default: 30) |
  | -p | --processes | Basic multithreading is done when getting requests for a file of URLs. This argument determines the number of processes (threads) used (default: 1) |
  | -r | --retries | The number of retries for requests that get connection error or rate limited (default: 1). |
@@ -138,7 +140,7 @@ docker build -t waymore .
  Run waymore with this command:
 
  ```bash
- docker run -it --rm -v $PWD/results:/app/results waymore:latest waymore -i example.com -oU example.com.links -oR results/example.com/
+ docker run -it --rm -v $PWD/results:/app/results waymore:latest -i example.com -oU example.com.links -oR results/example.com/
  ```
 
  ## Input and Mode
@@ -1,6 +1,6 @@
  <center><img src="https://github.com/xnl-h4ck3r/waymore/blob/main/waymore/images/title.png"></center>
 
- ## About - v6.4
+ ## About - v6.6
 
  The idea behind **waymore** is to find even more links from the Wayback Machine than other existing tools.
 
@@ -73,8 +73,8 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
  | -mc | | Only Match HTTP status codes for retrieved URLs and responses. Comma separated list of codes. Passing this argument overrides the config `FILTER_CODE` and `-fc`. |
  | -mt | | Only MIME Types for retrieved URLs and responses. Comma separated list of MIME types. Passing this argument overrides the config `FILTER_MIME` and `-ft`. **NOTE: This will NOT be applied to Alien Vault OTX, Virus Total and Intelligence X because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.**. |
  | -l | --limit | How many responses will be saved (if `-mode R` or `-mode B` is passed). A positive value will get the **first N** results, a negative value will get the **last N** results. A value of 0 will get **ALL** responses (default: 5000) |
- | -from | --from-date | What date to get responses from. If not specified it will get from the earliest possible results. A partial value can be passed, e.g. `2016`, `201805`, etc. |
- | -to | --to-date | What date to get responses to. If not specified it will get to the latest possible results. A partial value can be passed, e.g. `2021`, `202112`, etc. |
+ | -from | --from-date | What date to get data from. If not specified it will get from the earliest possible results. A partial value can be passed, e.g. `2016`, `201805`, etc. **IMPORTANT: There are some exceptions with sources unable to get URLs within date limits: Virus Total - all known sub domains will still be returned; Intelligence X - all URLs will still be returned.** |
+ | -to | --to-date | What date to get data to. If not specified it will get to the latest possible results. A partial value can be passed, e.g. `2021`, `202112`, etc. **IMPORTANT: There are some exceptions with sources unable to get URLs within date limits: Virus Total - all known sub domains will still be returned; Intelligence X - all URLs will still be returned.** |
  | -ci | --capture-interval | Filters the search on archive.org to only get at most 1 capture per hour (`h`), day (`d`) or month (`m`). This filter is used for responses only. The default is `d` but can also be set to `none` to not filter anything and get all responses. |
  | -ra | --regex-after | RegEx for filtering purposes against links found from all sources of URLs AND responses downloaded. Only positive matches will be output. |
  | -url-filename | | Set the file name of downloaded responses to the URL that generated the response, otherwise it will be set to the hash value of the response. Using the hash value means multiple URLs that generated the same response will only result in one file being saved for that response. |
@@ -85,7 +85,6 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
  | -xvt | | Exclude checks for links from virustotal.com |
  | -xix | | Exclude checks for links from Intelligence X.com |
  | -lcc | | Limit the number of Common Crawl index collections searched, e.g. `-lcc 10` will just search the latest `10` collections (default: 1). As of November 2024 there are currently 106 collections. Setting to `0` will search **ALL** collections. If you don't want to search Common Crawl at all, use the `-xcc` option. |
- | -lcy | | Limit the number of Common Crawl index collections searched by the year of the index data. The earliest index has data from 2008. Setting to 0 (default) will search collections or any year (but in conjuction with `-lcc`). For example, if you are only interested in data from 2015 and after, pass `-lcy 2015`. This will override the value of `-lcc` if passed. If you don't want to search Common Crawl at all, use the `-xcc` option. |
  | -t | --timeout | This is for archived responses only! How many seconds to wait for the server to send data before giving up (default: 30) |
  | -p | --processes | Basic multithreading is done when getting requests for a file of URLs. This argument determines the number of processes (threads) used (default: 1) |
  | -r | --retries | The number of retries for requests that get connection error or rate limited (default: 1). |
@@ -123,7 +122,7 @@ docker build -t waymore .
  Run waymore with this command:
 
  ```bash
- docker run -it --rm -v $PWD/results:/app/results waymore:latest waymore -i example.com -oU example.com.links -oR results/example.com/
+ docker run -it --rm -v $PWD/results:/app/results waymore:latest -i example.com -oU example.com.links -oR results/example.com/
  ```
 
  ## Input and Mode
@@ -0,0 +1,31 @@
+ [build-system]
+ requires = ["setuptools>=68", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "waymore"
+ description = "Find way more from the Wayback Machine, Common Crawl, Alien Vault OTX, URLScan, VirusTotal & Intelligence X!"
+ readme = "README.md"
+ requires-python = ">=3.9"
+ license = { text = "MIT" }
+ authors = [{ name = "xnl-h4ck3r" }]
+ dynamic = ["version", "dependencies"]
+
+ [project.scripts]
+ waymore = "waymore.waymore:main"
+
+ [tool.setuptools.dynamic]
+ dependencies = { file = ["requirements.txt"] }
+ version = { attr = "waymore.__version__" }
+
+ [tool.ruff]
+ line-length = 100
+ target-version = "py39"
+
+ [tool.ruff.lint]
+ select = ["E", "F", "I", "UP"]
+ ignore = ["E501"] # Ignore line length violations for existing code
+
+ [tool.black]
+ line-length = 100
+ target-version = ["py39"]
@@ -0,0 +1,7 @@
+ PyYAML
+ requests
+ setuptools
+ termcolor
+ psutil
+ urlparse3
+ tldextract
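Note: assuming the seven-line file above is the `requirements.txt` named in `[tool.setuptools.dynamic]`, each non-empty line becomes one `Requires-Dist` entry, which matches the new metadata. The sketch below only illustrates that one-requirement-per-line mapping; `read_requirements` is a hypothetical helper, not setuptools' actual parser.

```python
def read_requirements(text: str) -> list[str]:
    """Return one requirement string per non-empty, non-comment line.

    Simplified illustration only: it does not handle pip options such as
    `-r other.txt`, which a plain dependency list should not contain anyway.
    """
    requirements = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and whole-line comments.
        if not line or line.startswith("#"):
            continue
        requirements.append(line)
    return requirements


if __name__ == "__main__":
    sample = "PyYAML\nrequests\nsetuptools\ntermcolor\npsutil\nurlparse3\ntldextract\n"
    for requirement in read_requirements(sample):
        print("Requires-Dist:", requirement)
```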
@@ -1,7 +1,22 @@
  #!/usr/bin/env python
  import os
+ import re
  import shutil
- from setuptools import setup, find_packages
+
+ from setuptools import find_packages, setup
+
+ # Read version from __init__.py without importing
+
+
+ def get_version():
+     init_path = os.path.join(os.path.dirname(__file__), "waymore", "__init__.py")
+     with open(init_path, encoding="utf-8") as f:
+         content = f.read()
+     match = re.search(r'__version__\s*=\s*["\']([^"\']+)["\']', content)
+     if match:
+         return match.group(1)
+     raise RuntimeError("Unable to find version string.")
+
 
  target_directory = (
      os.path.join(os.getenv("APPDATA", ""), "waymore")
@@ -10,9 +25,7 @@ target_directory = (
          os.path.join(os.path.expanduser("~"), ".config", "waymore")
          if os.name == "posix"
          else (
-             os.path.join(
-                 os.path.expanduser("~"), "Library", "Application Support", "waymore"
-             )
+             os.path.join(os.path.expanduser("~"), "Library", "Application Support", "waymore")
              if os.name == "darwin"
              else None
          )
@@ -26,29 +39,22 @@ if target_directory and os.path.isfile("config.yml"):
      # If file already exists, create a new one
      if os.path.isfile(target_directory + "/config.yml"):
          configNew = True
-         os.rename(
-             target_directory + "/config.yml", target_directory + "/config.yml.OLD"
-         )
+         os.rename(target_directory + "/config.yml", target_directory + "/config.yml.OLD")
          shutil.copy("config.yml", target_directory)
-         os.rename(
-             target_directory + "/config.yml", target_directory + "/config.yml.NEW"
-         )
-         os.rename(
-             target_directory + "/config.yml.OLD", target_directory + "/config.yml"
-         )
+         os.rename(target_directory + "/config.yml", target_directory + "/config.yml.NEW")
+         os.rename(target_directory + "/config.yml.OLD", target_directory + "/config.yml")
      else:
          shutil.copy("config.yml", target_directory)
 
  setup(
      name="waymore",
      packages=find_packages(),
-     version=__import__("waymore").__version__,
+     version=get_version(),
      description="Find way more from the Wayback Machine, Common Crawl, Alien Vault OTX, URLScan & VirusTotal!",
-     long_description=open("README.md").read(),
+     long_description=open("README.md", encoding="utf-8").read(),
      long_description_content_type="text/markdown",
      author="@xnl-h4ck3r",
      url="https://github.com/xnl-h4ck3r/waymore",
-     py_modules=["waymore"],
      install_requires=[
          "requests",
          "pyyaml",
@@ -71,8 +77,4 @@ if configNew:
          + "/config.yml already exists.\nCreating config.yml.NEW but leaving existing config.\nIf you need the new file, then remove the current one and rename config.yml.NEW to config.yml\n\033[0m"
      )
  else:
-     print(
-         "\n\033[92mThe file "
-         + target_directory
-         + "/config.yml has been created.\n\033[0m"
-     )
+     print("\n\033[92mThe file " + target_directory + "/config.yml has been created.\n\033[0m")
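Review note on the `target_directory` expression in this file: `os.name` only takes values such as `"posix"`, `"nt"` and `"java"`, and macOS reports `"posix"`, so the `os.name == "darwin"` branch would not be reached as written and macOS installs would fall into the `~/.config/waymore` case. If a macOS-specific directory is actually intended, one possible approach is to branch on `sys.platform` instead. The sketch below is only an illustration under that assumption; `config_dir` and its parameters are hypothetical names, not part of waymore.

```python
import os
import sys


def config_dir(platform: str, home: str, appdata: str = "") -> str:
    """Pick a per-user config directory from a sys.platform-style string.

    Hypothetical helper for illustration; it is not taken from waymore.
    """
    if platform.startswith("win"):
        # Windows: keep using the APPDATA location from the original code.
        return os.path.join(appdata, "waymore")
    if platform == "darwin":
        # macOS: sys.platform is "darwin" even though os.name is "posix".
        return os.path.join(home, "Library", "Application Support", "waymore")
    # Linux and other POSIX systems.
    return os.path.join(home, ".config", "waymore")


if __name__ == "__main__":
    print(
        config_dir(
            sys.platform,
            os.path.expanduser("~"),
            os.getenv("APPDATA", ""),
        )
    )
```

Taking the platform, home directory and `APPDATA` value as arguments keeps the function easy to check for every operating system from a single machine.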
@@ -0,0 +1,5 @@
+ def test_import_waymore():
+     import importlib
+
+     m = importlib.import_module("waymore")
+     assert m is not None
@@ -0,0 +1 @@
+ __version__ = "6.6"
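Note: this one-line module is what the new `get_version()` in `setup.py` reads with a regular expression, so the version can be obtained without importing the package (and therefore without needing its dependencies at build time). The sketch below applies the same pattern from the diff to an in-memory string; `parse_version` is a hypothetical name used only for this illustration.

```python
import re

# Same pattern as in the setup.py hunk: matches single- or double-quoted values.
VERSION_RE = re.compile(r'__version__\s*=\s*["\']([^"\']+)["\']')


def parse_version(source: str) -> str:
    """Extract the __version__ value from Python source text."""
    match = VERSION_RE.search(source)
    if match:
        return match.group(1)
    raise RuntimeError("Unable to find version string.")


if __name__ == "__main__":
    print(parse_version('__version__ = "6.6"\n'))  # prints 6.6
```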