waymore 4.1.tar.gz → 4.2.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: waymore
- Version: 4.1
+ Version: 4.2
  Summary: Find way more from the Wayback Machine, Common Crawl, Alien Vault OTX, URLScan & VirusTotal!
  Home-page: https://github.com/xnl-h4ck3r/waymore
  Author: @xnl-h4ck3r
@@ -16,7 +16,7 @@ Requires-Dist: tldextract
 
  <center><img src="https://github.com/xnl-h4ck3r/waymore/blob/main/waymore/images/title.png"></center>
 
- ## About - v4.1
+ ## About - v4.2
 
  The idea behind **waymore** is to find even more links from the Wayback Machine than other existing tools.
 
@@ -1,6 +1,6 @@
  <center><img src="https://github.com/xnl-h4ck3r/waymore/blob/main/waymore/images/title.png"></center>
 
- ## About - v4.1
+ ## About - v4.2
 
  The idea behind **waymore** is to find even more links from the Wayback Machine than other existing tools.
 
@@ -0,0 +1 @@
+ __version__="4.2"
@@ -1780,23 +1780,31 @@ def processWayBackPage(url):
              return
 
          # Get the URLs and MIME types. Each line is a separate JSON string
-         for line in resp.iter_lines():
-             results = line.decode("utf-8")
-             # Only get MIME Types if --verbose option was selected
-             if verbose():
-                 try:
-                     linkMimes.add(str(results).split(' ')[2])
-                 except Exception as e:
-                     if verbose():
-                         writerr(colored(getSPACER('ERROR processWayBackPage 2: Cannot get MIME type from line: ' + str(line)),'red'))
-                         write(resp.text)
-             try:
+         try:
+             for line in resp.iter_lines():
+                 results = line.decode("utf-8")
                  foundUrl = fixArchiveOrgUrl(str(results).split(' ')[1])
-                 linksFoundAdd(foundUrl)
-             except Exception as e:
-                 if verbose():
-                     writerr(colored(getSPACER('ERROR processWayBackPage 3: Cannot get link from line: ' + str(line)),'red'))
-                     write(resp.text)
+ 
+                 # Check the URL exclusions
+                 match = re.search(r'('+re.escape(FILTER_URL).replace(',','|')+')', foundUrl, flags=re.IGNORECASE)
+                 if match is None:
+                     # Only get MIME Types if --verbose option was selected
+                     if verbose():
+                         try:
+                             linkMimes.add(str(results).split(' ')[2])
+                         except Exception as e:
+                             if verbose():
+                                 writerr(colored(getSPACER('ERROR processWayBackPage 2: Cannot get MIME type from line: ' + str(line)),'red'))
+                                 write(resp.text)
+                     try:
+                         linksFoundAdd(foundUrl)
+                     except Exception as e:
+                         if verbose():
+                             writerr(colored(getSPACER('ERROR processWayBackPage 3: Cannot get link from line: ' + str(line)),'red'))
+                             write(resp.text)
+         except Exception as e:
+             if verbose():
+                 writerr(colored(getSPACER('ERROR processWayBackPage 4: ' + str(line)),'red'))
      else:
          pass
  except Exception as e:
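The notable changes in this hunk: each Wayback Machine result line is now checked against the URL exclusion list before being recorded, and the whole loop is wrapped in an outer try/except (the new "processWayBackPage 4" error path) so one malformed line no longer aborts processing of the page. The exclusion check turns the comma-separated FILTER_URL string into a single case-insensitive alternation regex. A minimal standalone sketch of that pattern-building trick; the FILTER_URL value below is an assumption for illustration, not waymore's actual default:

import re

# Hypothetical exclusion list in waymore's comma-separated FILTER_URL
# format; the real default value may differ.
FILTER_URL = '.css,.jpg,.jpeg,.png,.svg,.woff,.ttf'

def is_excluded(found_url):
    # re.escape() protects the dots but leaves commas untouched, so
    # replacing commas with '|' afterwards yields a valid alternation
    # like r'(\.css|\.jpg|...)' - the same construction used in the diff
    pattern = r'(' + re.escape(FILTER_URL).replace(',', '|') + r')'
    return re.search(pattern, found_url, flags=re.IGNORECASE) is not None

print(is_excluded('https://example.com/assets/logo.PNG'))  # True: '.png' matches case-insensitively
print(is_excluded('https://example.com/api/users'))        # False: URL is kept

The escape-then-replace order is what makes this safe: escaping first means user-supplied dots and other metacharacters in the filter list cannot change the regex's meaning, while the commas survive escaping unchanged and can be swapped for alternation bars.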
@@ -2422,12 +2430,12 @@ def processResponses():
          # This is useful for filtering out captures that are 'too dense' or when looking for unique captures."
          if args.capture_interval == 'none': # get all
              collapse = ''
-         elif args.capture_interval == 'h': # get at most 1 capture per hour
-             collapse = 'timestamp:10'
-         elif args.capture_interval == 'd': # get at most 1 capture per day
-             collapse = 'timestamp:8'
-         elif args.capture_interval == 'm': # get at most 1 capture per month
-             collapse = 'timestamp:6'
+         elif args.capture_interval == 'h': # get at most 1 capture per URL per hour
+             collapse = 'timestamp:10,original'
+         elif args.capture_interval == 'd': # get at most 1 capture per URL per day
+             collapse = 'timestamp:8,original'
+         elif args.capture_interval == 'm': # get at most 1 capture per URL per month
+             collapse = 'timestamp:6,original'
 
          url = WAYBACK_URL.replace('{DOMAIN}',subs + quote(argsInput) + path).replace('{COLLAPSE}',collapse) + filterMIME + filterCode + filterLimit + filterFrom + filterTo + filterKeywords
 
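This change concerns the Wayback CDX API's collapse feature. Capture timestamps are formatted YYYYMMDDhhmmss, so collapsing on a 10-, 8- or 6-digit prefix groups captures by hour, day or month; appending ',original' (the original-URL field) makes the grouping per URL per interval, as the updated comments say, instead of one capture per interval across all URLs. A minimal sketch of how that value lands in the query string; the WAYBACK_URL template below is a simplified stand-in, not waymore's exact template:

from urllib.parse import quote

# Simplified stand-in for waymore's WAYBACK_URL template (an assumption;
# the real template carries more placeholders and parameters).
WAYBACK_URL = ('https://web.archive.org/cdx/search/cdx?url={DOMAIN}'
               '&collapse={COLLAPSE}&fl=timestamp,original,mimetype,statuscode')

def build_cdx_url(domain, capture_interval):
    # Timestamps look like 20240131235959, so a prefix of 10/8/6 digits
    # identifies the hour/day/month; ',original' adds the URL to the key
    prefix = {'h': 10, 'd': 8, 'm': 6}.get(capture_interval)
    collapse = '' if prefix is None else f'timestamp:{prefix},original'
    return WAYBACK_URL.replace('{DOMAIN}', quote(domain)).replace('{COLLAPSE}', collapse)

print(build_cdx_url('example.com', 'd'))
# .../cdx?url=example.com&collapse=timestamp:8,original&fl=...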
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: waymore
- Version: 4.1
+ Version: 4.2
  Summary: Find way more from the Wayback Machine, Common Crawl, Alien Vault OTX, URLScan & VirusTotal!
  Home-page: https://github.com/xnl-h4ck3r/waymore
  Author: @xnl-h4ck3r
@@ -16,7 +16,7 @@ Requires-Dist: tldextract
 
  <center><img src="https://github.com/xnl-h4ck3r/waymore/blob/main/waymore/images/title.png"></center>
 
- ## About - v4.1
+ ## About - v4.2
 
  The idea behind **waymore** is to find even more links from the Wayback Machine than other existing tools.
 
@@ -1 +0,0 @@
- __version__="4.1"