waymore 6.5.tar.gz → 7.0.tar.gz

This diff shows the content of publicly available package versions as released to one of the supported registries. It is provided for informational purposes only and reflects the changes between those versions as they appear in their public registries.
@@ -1,6 +1,6 @@
- Metadata-Version: 2.1
+ Metadata-Version: 2.4
  Name: waymore
- Version: 6.5
+ Version: 7.0
  Summary: Find way more from the Wayback Machine, Common Crawl, Alien Vault OTX, URLScan, VirusTotal & Intelligence X!
  Home-page: https://github.com/xnl-h4ck3r/waymore
  Author: xnl-h4ck3r
@@ -15,10 +15,13 @@ Requires-Dist: termcolor
  Requires-Dist: psutil
  Requires-Dist: urlparse3
  Requires-Dist: tldextract
+ Requires-Dist: aiohttp
+ Dynamic: home-page
+ Dynamic: license-file

  <center><img src="https://github.com/xnl-h4ck3r/waymore/blob/main/waymore/images/title.png"></center>

- ## About - v6.5
+ ## About - v7.0

  The idea behind **waymore** is to find even more links from the Wayback Machine than other existing tools.

@@ -48,7 +51,7 @@ Now **waymore** gets URL's from ALL of those sources too (with ability to filter

  **NOTE: If you already have a `config.yml` file, it will not be overwritten. The file `config.yml.NEW` will be created in the same directory. If you need the new config, remove `config.yml` and rename `config.yml.NEW` back to `config.yml`.**

- `waymore` supports **Python 3**.
+ `waymore` supports **Python 3.7+** (Python 3.7 or higher required for async/await support).

  Install `waymore` in default (global) python environment.

@@ -91,8 +94,8 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
  | -mc | | Only Match HTTP status codes for retrieved URLs and responses. Comma separated list of codes. Passing this argument overrides the config `FILTER_CODE` and `-fc`. |
  | -mt | | Only MIME Types for retrieved URLs and responses. Comma separated list of MIME types. Passing this argument overrides the config `FILTER_MIME` and `-ft`. **NOTE: This will NOT be applied to Alien Vault OTX, Virus Total and Intelligence X because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.** |
  | -l | --limit | How many responses will be saved (if `-mode R` or `-mode B` is passed). A positive value will get the **first N** results, a negative value will get the **last N** results. A value of 0 will get **ALL** responses (default: 5000) |
- | -from | --from-date | What date to get responses from. If not specified it will get from the earliest possible results. A partial value can be passed, e.g. `2016`, `201805`, etc. |
- | -to | --to-date | What date to get responses to. If not specified it will get to the latest possible results. A partial value can be passed, e.g. `2021`, `202112`, etc. |
+ | -from | --from-date | What date to get data from. If not specified it will get from the earliest possible results. A partial value can be passed, e.g. `2016`, `201805`, etc. **IMPORTANT: There are some exceptions with sources unable to get URLs within date limits: Virus Total - all known sub domains will still be returned; Intelligence X - all URLs will still be returned.** |
+ | -to | --to-date | What date to get data to. If not specified it will get to the latest possible results. A partial value can be passed, e.g. `2021`, `202112`, etc. **IMPORTANT: There are some exceptions with sources unable to get URLs within date limits: Virus Total - all known sub domains will still be returned; Intelligence X - all URLs will still be returned.** |
  | -ci | --capture-interval | Filters the search on archive.org to only get at most 1 capture per hour (`h`), day (`d`) or month (`m`). This filter is used for responses only. The default is `d` but can also be set to `none` to not filter anything and get all responses. |
  | -ra | --regex-after | RegEx for filtering purposes against links found from all sources of URLs AND responses downloaded. Only positive matches will be output. |
  | -url-filename | | Set the file name of downloaded responses to the URL that generated the response, otherwise it will be set to the hash value of the response. Using the hash value means multiple URLs that generated the same response will only result in one file being saved for that response. |
@@ -103,9 +106,8 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
  | -xvt | | Exclude checks for links from virustotal.com |
  | -xix | | Exclude checks for links from Intelligence X.com |
  | -lcc | | Limit the number of Common Crawl index collections searched, e.g. `-lcc 10` will just search the latest `10` collections (default: 1). As of November 2024 there are currently 106 collections. Setting to `0` will search **ALL** collections. If you don't want to search Common Crawl at all, use the `-xcc` option. |
- | -lcy | | Limit the number of Common Crawl index collections searched by the year of the index data. The earliest index has data from 2008. Setting to 0 (default) will search collections of any year (but in conjunction with `-lcc`). For example, if you are only interested in data from 2015 and after, pass `-lcy 2015`. This will override the value of `-lcc` if passed. If you don't want to search Common Crawl at all, use the `-xcc` option. |
  | -t | --timeout | This is for archived responses only! How many seconds to wait for the server to send data before giving up (default: 30) |
- | -p | --processes | Basic multithreading is done when getting requests for a file of URLs. This argument determines the number of processes (threads) used (default: 1) |
+ | -p | --processes | Basic multithreading is done when getting requests for a file of URLs. This argument determines the number of processes (threads) used (default: 2) |
  | -r | --retries | The number of retries for requests that get connection error or rate limited (default: 1). |
  | -m | --memory-threshold | The memory threshold percentage. If the machine's memory goes above the threshold, the program will be stopped and ended gracefully before running out of memory (default: 95) |
  | -ko | --keywords-only | Only return links and responses that contain keywords that you are interested in. This can reduce the time it takes to get results. If you provide the flag with no value, keywords are taken from the comma separated list in the `config.yml` file (typically in `~/.config/waymore/`) with the `FILTER_KEYWORDS` key, otherwise you can pass a specific Regex value to use, e.g. `-ko "admin"` to only get links containing the word `admin`, or `-ko "\.js(\?\|$)"` to only get JS files. The Regex check is NOT case sensitive. |
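The `-p`/`--processes` row above describes basic multithreading over a file of URLs. As a rough, hypothetical sketch of that thread-pool idea (this is not waymore's actual implementation; the `fetch` helper and example URLs are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

import requests  # assumed available; waymore's real download logic differs


def fetch(url: str) -> int:
    # Hypothetical worker: request one URL and return its HTTP status code.
    return requests.get(url, timeout=30).status_code


urls = ["https://example.com/a", "https://example.com/b"]

# -p / --processes corresponds conceptually to the number of worker threads (new default: 2).
with ThreadPoolExecutor(max_workers=2) as pool:
    for url, status in zip(urls, pool.map(fetch, urls)):
        print(url, status)
```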
@@ -1,6 +1,6 @@
  <center><img src="https://github.com/xnl-h4ck3r/waymore/blob/main/waymore/images/title.png"></center>

- ## About - v6.5
+ ## About - v7.0

  The idea behind **waymore** is to find even more links from the Wayback Machine than other existing tools.

@@ -30,7 +30,7 @@ Now **waymore** gets URL's from ALL of those sources too (with ability to filter

  **NOTE: If you already have a `config.yml` file, it will not be overwritten. The file `config.yml.NEW` will be created in the same directory. If you need the new config, remove `config.yml` and rename `config.yml.NEW` back to `config.yml`.**

- `waymore` supports **Python 3**.
+ `waymore` supports **Python 3.7+** (Python 3.7 or higher required for async/await support).

  Install `waymore` in default (global) python environment.

@@ -73,8 +73,8 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
  | -mc | | Only Match HTTP status codes for retrieved URLs and responses. Comma separated list of codes. Passing this argument overrides the config `FILTER_CODE` and `-fc`. |
  | -mt | | Only MIME Types for retrieved URLs and responses. Comma separated list of MIME types. Passing this argument overrides the config `FILTER_MIME` and `-ft`. **NOTE: This will NOT be applied to Alien Vault OTX, Virus Total and Intelligence X because they don't have the ability to filter on MIME Type. Sometimes URLScan does not have a MIME Type defined - these will always be included. Consider excluding sources if this matters to you.** |
  | -l | --limit | How many responses will be saved (if `-mode R` or `-mode B` is passed). A positive value will get the **first N** results, a negative value will get the **last N** results. A value of 0 will get **ALL** responses (default: 5000) |
- | -from | --from-date | What date to get responses from. If not specified it will get from the earliest possible results. A partial value can be passed, e.g. `2016`, `201805`, etc. |
- | -to | --to-date | What date to get responses to. If not specified it will get to the latest possible results. A partial value can be passed, e.g. `2021`, `202112`, etc. |
+ | -from | --from-date | What date to get data from. If not specified it will get from the earliest possible results. A partial value can be passed, e.g. `2016`, `201805`, etc. **IMPORTANT: There are some exceptions with sources unable to get URLs within date limits: Virus Total - all known sub domains will still be returned; Intelligence X - all URLs will still be returned.** |
+ | -to | --to-date | What date to get data to. If not specified it will get to the latest possible results. A partial value can be passed, e.g. `2021`, `202112`, etc. **IMPORTANT: There are some exceptions with sources unable to get URLs within date limits: Virus Total - all known sub domains will still be returned; Intelligence X - all URLs will still be returned.** |
  | -ci | --capture-interval | Filters the search on archive.org to only get at most 1 capture per hour (`h`), day (`d`) or month (`m`). This filter is used for responses only. The default is `d` but can also be set to `none` to not filter anything and get all responses. |
  | -ra | --regex-after | RegEx for filtering purposes against links found from all sources of URLs AND responses downloaded. Only positive matches will be output. |
  | -url-filename | | Set the file name of downloaded responses to the URL that generated the response, otherwise it will be set to the hash value of the response. Using the hash value means multiple URLs that generated the same response will only result in one file being saved for that response. |
@@ -85,9 +85,8 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
  | -xvt | | Exclude checks for links from virustotal.com |
  | -xix | | Exclude checks for links from Intelligence X.com |
  | -lcc | | Limit the number of Common Crawl index collections searched, e.g. `-lcc 10` will just search the latest `10` collections (default: 1). As of November 2024 there are currently 106 collections. Setting to `0` will search **ALL** collections. If you don't want to search Common Crawl at all, use the `-xcc` option. |
- | -lcy | | Limit the number of Common Crawl index collections searched by the year of the index data. The earliest index has data from 2008. Setting to 0 (default) will search collections of any year (but in conjunction with `-lcc`). For example, if you are only interested in data from 2015 and after, pass `-lcy 2015`. This will override the value of `-lcc` if passed. If you don't want to search Common Crawl at all, use the `-xcc` option. |
  | -t | --timeout | This is for archived responses only! How many seconds to wait for the server to send data before giving up (default: 30) |
- | -p | --processes | Basic multithreading is done when getting requests for a file of URLs. This argument determines the number of processes (threads) used (default: 1) |
+ | -p | --processes | Basic multithreading is done when getting requests for a file of URLs. This argument determines the number of processes (threads) used (default: 2) |
  | -r | --retries | The number of retries for requests that get connection error or rate limited (default: 1). |
  | -m | --memory-threshold | The memory threshold percentage. If the machine's memory goes above the threshold, the program will be stopped and ended gracefully before running out of memory (default: 95) |
  | -ko | --keywords-only | Only return links and responses that contain keywords that you are interested in. This can reduce the time it takes to get results. If you provide the flag with no value, keywords are taken from the comma separated list in the `config.yml` file (typically in `~/.config/waymore/`) with the `FILTER_KEYWORDS` key, otherwise you can pass a specific Regex value to use, e.g. `-ko "admin"` to only get links containing the word `admin`, or `-ko "\.js(\?\|$)"` to only get JS files. The Regex check is NOT case sensitive. |
@@ -21,7 +21,10 @@ version = { attr = "waymore.__version__" }
  [tool.ruff]
  line-length = 100
  target-version = "py39"
+
+ [tool.ruff.lint]
  select = ["E", "F", "I", "UP"]
+ ignore = ["E501"] # Ignore line length violations for existing code

  [tool.black]
  line-length = 100
@@ -5,3 +5,4 @@ termcolor
  psutil
  urlparse3
  tldextract
+ aiohttp
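Version 7.0 adds `aiohttp` to the requirements, which lines up with the new "Python 3.7+" async/await note in the README above. A minimal sketch of the kind of asynchronous request that dependency enables (illustrative only, not code taken from the waymore package):

```python
import asyncio

import aiohttp  # new dependency in waymore 7.0


async def fetch_text(url: str) -> str:
    # async/await client usage like this is what requires Python 3.7+.
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.text()


if __name__ == "__main__":
    # asyncio.run() itself was only added in Python 3.7.
    body = asyncio.run(fetch_text("https://example.com"))
    print(len(body))
```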
@@ -2,15 +2,15 @@
  import os
  import re
  import shutil
- from setuptools import setup, find_packages
+
+ from setuptools import find_packages, setup

  # Read version from __init__.py without importing


  def get_version():
-     init_path = os.path.join(os.path.dirname(
-         __file__), "waymore", "__init__.py")
-     with open(init_path, "r", encoding="utf-8") as f:
+     init_path = os.path.join(os.path.dirname(__file__), "waymore", "__init__.py")
+     with open(init_path, encoding="utf-8") as f:
          content = f.read()
      match = re.search(r'__version__\s*=\s*["\']([^"\']+)["\']', content)
      if match:
@@ -25,10 +25,7 @@ target_directory = (
      os.path.join(os.path.expanduser("~"), ".config", "waymore")
      if os.name == "posix"
      else (
-         os.path.join(
-             os.path.expanduser(
-                 "~"), "Library", "Application Support", "waymore"
-         )
+         os.path.join(os.path.expanduser("~"), "Library", "Application Support", "waymore")
          if os.name == "darwin"
          else None
      )
@@ -42,16 +39,10 @@ if target_directory and os.path.isfile("config.yml"):
      # If file already exists, create a new one
      if os.path.isfile(target_directory + "/config.yml"):
          configNew = True
-         os.rename(
-             target_directory + "/config.yml", target_directory + "/config.yml.OLD"
-         )
+         os.rename(target_directory + "/config.yml", target_directory + "/config.yml.OLD")
          shutil.copy("config.yml", target_directory)
-         os.rename(
-             target_directory + "/config.yml", target_directory + "/config.yml.NEW"
-         )
-         os.rename(
-             target_directory + "/config.yml.OLD", target_directory + "/config.yml"
-         )
+         os.rename(target_directory + "/config.yml", target_directory + "/config.yml.NEW")
+         os.rename(target_directory + "/config.yml.OLD", target_directory + "/config.yml")
      else:
          shutil.copy("config.yml", target_directory)

@@ -86,8 +77,4 @@ if configNew:
          + "/config.yml already exists.\nCreating config.yml.NEW but leaving existing config.\nIf you need the new file, then remove the current one and rename config.yml.NEW to config.yml\n\033[0m"
      )
  else:
-     print(
-         "\n\033[92mThe file "
-         + target_directory
-         + "/config.yml has been created.\n\033[0m"
-     )
+     print("\n\033[92mThe file " + target_directory + "/config.yml has been created.\n\033[0m")
@@ -1,4 +1,5 @@
  def test_import_waymore():
      import importlib
+
      m = importlib.import_module("waymore")
      assert m is not None
@@ -0,0 +1 @@
+ __version__ = "7.0"