PyPI - mkdocs-htmlproofer-plugin - Versions diff - 1.0.0.tar.gz → 1.2.0.tar.gz


Files changed (14)

{mkdocs-htmlproofer-plugin-1.0.0/mkdocs_htmlproofer_plugin.egg-info → mkdocs-htmlproofer-plugin-1.2.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: mkdocs-htmlproofer-plugin
-Version: 1.0.0
+Version: 1.2.0
 Summary: A MkDocs plugin that validates URL in rendered HTML files
 Home-page: https://github.com/manuzhang/mkdocs-htmlproofer-plugin
 Author: Manu Zhang
@@ -51,12 +51,6 @@ plugins:
     - htmlproofer
 ```
-To enable cross-page anchor validation, you must set `use_directory_urls = False` in `mkdocs.yml`:
-```yaml
-use_directory_urls: False
-```
 ## Configuring
 ### `enabled`
@@ -101,7 +95,7 @@ plugins:
 ### `raise_error_excludes`
 When specifying `raise_error: True` or `raise_error_after_finish: True`, it is possible to ignore errors
-for combinations of urls (`'*'` means all urls) and status codes with `raise_error_excludes`.
+for combinations of URLs and status codes with `raise_error_excludes`. Each URL supports unix style wildcards `*`, `[]`, `?`, etc.
 ```yaml
 plugins:
@@ -110,10 +104,42 @@ plugins:
       raise_error: True
       raise_error_excludes:
         504: ['https://www.mkdocs.org/']
-        404: ['https://github.com/manuzhang/mkdocs-htmlproofer-plugin']
+        404: ['https://github.com/manuzhang/*']
         400: ['*']
 ```
+### `ignore_urls`
+Avoid validating the given list of URLs by ignoring them altogether. Each URL in the
+list supports unix style wildcards `*`, `[]`, `?`, etc.
+Unlike `raise_error_excludes`, ignored URLs will not be fetched at all.
+```yaml
+plugins:
+  - search
+  - htmlproofer:
+      raise_error: True
+      ignore_urls:
+        - https://github.com/myprivateorg/*
+        - https://app.dynamic-service-of-some-kind.io*
+```
+### `warn_on_ignored_urls`
+Log a warning when ignoring URLs with `ignore_urls` option. Defaults to `false` (no warning).
+```yaml
+plugins:
+  - search
+  - htmlproofer:
+      raise_error: True
+      ignore_urls:
+        - https://github.com/myprivateorg/*
+        - https://app.dynamic-service-of-some-kind.io*
+      warn_on_ignored_urls: true
+```
 ### `validate_external_urls`
 Avoids validating any external URLs (i.e those starting with http:// or https://).
@@ -136,13 +162,24 @@ plugins:
       validate_rendered_template: True
 ```
+### `skip_downloads`
+Optionally skip downloading of a remote URLs content via GET request. This can
+considerably reduce the time taken to validate URLs.
+```yaml
+plugins:
+  - htmlproofer:
+      skip_downloads: True
+```
 ## Compatibility with `attr_list` extension
-If you need to manually specify anchors make use of the `attr_list` [extension](https://python-markdown.github.io/extensions/attr_list) in the markdown.
+If you need to manually specify anchors make use of the `attr_list` [extension](https://python-markdown.github.io/extensions/attr_list) in the markdown.
 This can be useful for multilingual documentation to keep anchors as language neutral permalinks in all languages.
 * A sample for a heading `# Grüße {#greetings}` (the slugified generated anchor `Gre` is overwritten with `greetings`).
-* This also works for images `this is a nice image [](foo-bar.png){#nice-image}`
+* This also works for images `this is a nice image [](foo-bar.png){#nice-image}`
 * And generall for paragraphs:
 ```markdown
 Listing: This is noteworthy.
@@ -155,4 +192,4 @@ More information about plugins in the [MkDocs documentation](http://www.mkdocs.o
 ## Acknowledgement
-This work is based on the [mkdocs-markdownextradata-plugin](https://github.com/rosscdh/mkdocs-markdownextradata-plugin) project and the [Finding and Fixing Website Link Rot with Python, BeautifulSoup and Requests](https://www.twilio.com/blog/2018/07/find-fix-website-link-rot-python-beautifulsoup-requests.html) article.
+This work is based on the [mkdocs-markdownextradata-plugin](https://github.com/rosscdh/mkdocs-markdownextradata-plugin) project and the [Finding and Fixing Website Link Rot with Python, BeautifulSoup and Requests](https://www.twilio.com/en-us/blog/find-fix-website-link-rot-python-beautifulsoup-requests-html) article.

{mkdocs-htmlproofer-plugin-1.0.0 → mkdocs-htmlproofer-plugin-1.2.0}/README.md RENAMED

@@ -28,12 +28,6 @@ plugins:
     - htmlproofer
 ```
-To enable cross-page anchor validation, you must set `use_directory_urls = False` in `mkdocs.yml`:
-```yaml
-use_directory_urls: False
-```
 ## Configuring
 ### `enabled`
@@ -78,7 +72,7 @@ plugins:
 ### `raise_error_excludes`
 When specifying `raise_error: True` or `raise_error_after_finish: True`, it is possible to ignore errors
-for combinations of urls (`'*'` means all urls) and status codes with `raise_error_excludes`.
+for combinations of URLs and status codes with `raise_error_excludes`. Each URL supports unix style wildcards `*`, `[]`, `?`, etc.
 ```yaml
 plugins:
@@ -87,10 +81,42 @@ plugins:
       raise_error: True
       raise_error_excludes:
         504: ['https://www.mkdocs.org/']
-        404: ['https://github.com/manuzhang/mkdocs-htmlproofer-plugin']
+        404: ['https://github.com/manuzhang/*']
         400: ['*']
 ```
+### `ignore_urls`
+Avoid validating the given list of URLs by ignoring them altogether. Each URL in the
+list supports unix style wildcards `*`, `[]`, `?`, etc.
+Unlike `raise_error_excludes`, ignored URLs will not be fetched at all.
+```yaml
+plugins:
+  - search
+  - htmlproofer:
+      raise_error: True
+      ignore_urls:
+        - https://github.com/myprivateorg/*
+        - https://app.dynamic-service-of-some-kind.io*
+```
+### `warn_on_ignored_urls`
+Log a warning when ignoring URLs with `ignore_urls` option. Defaults to `false` (no warning).
+```yaml
+plugins:
+  - search
+  - htmlproofer:
+      raise_error: True
+      ignore_urls:
+        - https://github.com/myprivateorg/*
+        - https://app.dynamic-service-of-some-kind.io*
+      warn_on_ignored_urls: true
+```
 ### `validate_external_urls`
 Avoids validating any external URLs (i.e those starting with http:// or https://).
@@ -113,13 +139,24 @@ plugins:
       validate_rendered_template: True
 ```
+### `skip_downloads`
+Optionally skip downloading of a remote URLs content via GET request. This can
+considerably reduce the time taken to validate URLs.
+```yaml
+plugins:
+  - htmlproofer:
+      skip_downloads: True
+```
 ## Compatibility with `attr_list` extension
-If you need to manually specify anchors make use of the `attr_list` [extension](https://python-markdown.github.io/extensions/attr_list) in the markdown.
+If you need to manually specify anchors make use of the `attr_list` [extension](https://python-markdown.github.io/extensions/attr_list) in the markdown.
 This can be useful for multilingual documentation to keep anchors as language neutral permalinks in all languages.
 * A sample for a heading `# Grüße {#greetings}` (the slugified generated anchor `Gre` is overwritten with `greetings`).
-* This also works for images `this is a nice image [](foo-bar.png){#nice-image}`
+* This also works for images `this is a nice image [](foo-bar.png){#nice-image}`
 * And generall for paragraphs:
 ```markdown
 Listing: This is noteworthy.
@@ -132,4 +169,4 @@ More information about plugins in the [MkDocs documentation](http://www.mkdocs.o
 ## Acknowledgement
-This work is based on the [mkdocs-markdownextradata-plugin](https://github.com/rosscdh/mkdocs-markdownextradata-plugin) project and the [Finding and Fixing Website Link Rot with Python, BeautifulSoup and Requests](https://www.twilio.com/blog/2018/07/find-fix-website-link-rot-python-beautifulsoup-requests.html) article.
+This work is based on the [mkdocs-markdownextradata-plugin](https://github.com/rosscdh/mkdocs-markdownextradata-plugin) project and the [Finding and Fixing Website Link Rot with Python, BeautifulSoup and Requests](https://www.twilio.com/en-us/blog/find-fix-website-link-rot-python-beautifulsoup-requests-html) article.
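
The wildcard behaviour that the README changes above describe for `ignore_urls` and `raise_error_excludes` can be checked in isolation. The following is a minimal sketch, assuming the patterns are matched with Python's `fnmatch.fnmatch`, as the `plugin.py` diff in this release indicates. The helper name `is_ignored` and the sample URLs being tested are illustrative and not part of the package.

```python
import fnmatch
from typing import List


def is_ignored(url: str, patterns: List[str]) -> bool:
    """Hypothetical helper: True if url matches any Unix shell-style pattern."""
    return any(fnmatch.fnmatch(url, pattern) for pattern in patterns)


# Patterns copied from the README example for `ignore_urls`.
ignore_urls = [
    "https://github.com/myprivateorg/*",
    "https://app.dynamic-service-of-some-kind.io*",
]

# `*` matches any run of characters, including `/`, so nested paths match.
print(is_ignored("https://github.com/myprivateorg/repo/issues/1", ignore_urls))

# The first pattern ends in `/*`, so the bare organisation URL does not match.
print(is_ignored("https://github.com/myprivateorg", ignore_urls))

# The second pattern has no `/` before `*`, so the bare host and any path match.
print(is_ignored("https://app.dynamic-service-of-some-kind.io", ignore_urls))
print(is_ignored("https://app.dynamic-service-of-some-kind.io/login", ignore_urls))
```

Because `?` and `[]` are also wildcards in this syntax, a pattern containing a literal query string or bracket will match more loosely than a plain string comparison would.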

{mkdocs-htmlproofer-plugin-1.0.0 → mkdocs-htmlproofer-plugin-1.2.0}/htmlproofer/plugin.py RENAMED

@@ -1,8 +1,9 @@
+import fnmatch
 from functools import lru_cache, partial
 import os.path
 import pathlib
 import re
-from typing import Dict, Optional, Set
+from typing import Dict, List, Optional, Set
 import urllib.parse
 import uuid
@@ -56,7 +57,7 @@ def log_error(msg, *args, **kwargs):
 class HtmlProoferPlugin(BasePlugin):
-    files: Dict[str, File]
+    files: List[File]
     invalid_links = False
     config_scheme = (
@@ -64,8 +65,11 @@ class HtmlProoferPlugin(BasePlugin):
         ('raise_error', config_options.Type(bool, default=False)),
         ('raise_error_after_finish', config_options.Type(bool, default=False)),
         ('raise_error_excludes', config_options.Type(dict, default={})),
+        ('skip_downloads', config_options.Type(bool, default=False)),
         ('validate_external_urls', config_options.Type(bool, default=True)),
         ('validate_rendered_template', config_options.Type(bool, default=False)),
+        ('ignore_urls', config_options.Type(list, default=[])),
+        ('warn_on_ignored_urls', config_options.Type(bool, default=False)),
     )
     def __init__(self):
@@ -73,7 +77,7 @@ class HtmlProoferPlugin(BasePlugin):
         self._session.verify = False
         self._session.headers.update(URL_HEADERS)
         self._session.max_redirects = 5
-        self.files = {}
+        self.files = []
         self.scheme_handlers = {
             "http": partial(HtmlProoferPlugin.resolve_web_scheme, self),
             "https": partial(HtmlProoferPlugin.resolve_web_scheme, self),
@@ -84,15 +88,24 @@ class HtmlProoferPlugin(BasePlugin):
         if self.config['raise_error_after_finish'] and self.invalid_links:
             raise PluginError("Invalid links present.")
-    def on_page_markdown(self, markdown: str, page: Page, config: Config, files: Files) -> None:
+    def on_files(self, files: Files, config: Config) -> None:
         # Store files to allow inspecting Markdown files in later stages.
-        self.files.update({os.path.normpath(file.url): file for file in files})
+        # The values in files at this point are not guaranteed to be the same as the ones in the Page objects.
+        # For example, material blog plugin may modify the files after this event.
+        for f in files:
+            self.files.append(f)
     def on_post_page(self, output_content: str, page: Page, config: Config) -> None:
         if not self.config['enabled']:
             return
-        use_directory_urls = config.data["use_directory_urls"]
+        # Optimization: At this point, we have all the files, so we can create
+        # a dictionary for faster lookups. Prior to this point, files are
+        # still being updated so creating a dictionary before now would result
+        # in incorrect values appearing as the key.
+        opt_files = {}
+        opt_files.update({os.path.normpath(file.url): file for file in self.files})
+        opt_files.update({os.path.normpath(file.src_uri): file for file in self.files})
         # Optimization: only parse links and headings
         # li, sup are used for footnotes
@@ -106,10 +119,13 @@ class HtmlProoferPlugin(BasePlugin):
         for a in soup.find_all('a', href=True):
             url = a['href']
-            url_status = self.get_url_status(url, page.file.src_path, all_element_ids, self.files, use_directory_urls)
-            if self.bad_url(url_status) and self.is_error(self.config, url, url_status):
-                self.report_invalid_url(url, url_status, page.file.src_path)
+            if any(fnmatch.fnmatch(url, ignore_url) for ignore_url in self.config['ignore_urls']):
+                if self.config['warn_on_ignored_urls']:
+                    log_warning(f"ignoring URL {url} from {page.file.src_path}")
+            else:
+                url_status = self.get_url_status(url, page.file.src_path, all_element_ids, opt_files)
+                if self.bad_url(url_status) and self.is_error(self.config, url, url_status):
+                    self.report_invalid_url(url, url_status, page.file.src_path)
     def report_invalid_url(self, url, url_status, src_path):
         error = f'invalid url - {url} [{url_status}] [{src_path}]'
@@ -131,7 +147,13 @@ class HtmlProoferPlugin(BasePlugin):
     @lru_cache(maxsize=1000)
     def resolve_web_scheme(self, url: str) -> int:
         try:
-            response = self._session.get(url, timeout=URL_TIMEOUT)
+            response = self._session.get(url, timeout=URL_TIMEOUT, stream=True)
+            if self.config['skip_downloads'] is False:
+                # Download the entire contents as to not break previous behaviour.
+                for _ in response.iter_content(chunk_size=1024):
+                    pass
             return response.status_code
         except requests.exceptions.Timeout:
             return 504
@@ -145,8 +167,7 @@ class HtmlProoferPlugin(BasePlugin):
             url: str,
             src_path: str,
             all_element_ids: Set[str],
-            files: Dict[str, File],
-            use_directory_urls: bool
+            files: Dict[str, File]
     ) -> int:
         if any(pat.match(url) for pat in LOCAL_PATTERNS):
             return 0
@@ -158,18 +179,13 @@ class HtmlProoferPlugin(BasePlugin):
             return 0
         if fragment and not path:
             return 0 if url[1:] in all_element_ids else 404
-        elif not use_directory_urls:
-            # use_directory_urls = True injects too many challenges for locating the correct target
-            # Markdown file, so disable target anchor validation in this case. Examples include:
-            # ../..#BAD_ANCHOR style links to index.html and extra ../ inserted into relative
-            # links.
+        else:
             is_valid = self.is_url_target_valid(url, src_path, files)
             url_status = 404
             if not is_valid and self.is_error(self.config, url, url_status):
                 log_warning(f"Unable to locate source file for: {url}")
                 return url_status
             return 0
-        return 0
     @staticmethod
     def is_url_target_valid(url: str, src_path: str, files: Dict[str, File]) -> bool:
@@ -209,8 +225,14 @@ class HtmlProoferPlugin(BasePlugin):
             # Convert root/site paths
             search_path = os.path.normpath(url[1:])
         else:
-            # Handle relative links by concatenating the source dir with the destination path
-            search_path = os.path.normpath(str(pathlib.Path(src_path).parent / pathlib.Path(url)))
+            # Handle relative links by looking up the destination url for the
+            # src_path and getting the parent directory.
+            try:
+                dest_uri = files[src_path].dest_uri
+                src_dir = urllib.parse.quote(str(pathlib.Path(dest_uri).parent), safe='/\\')
+                search_path = os.path.normpath(str(pathlib.Path(src_dir) / pathlib.Path(url)))
+            except KeyError:
+                return None
         try:
             return files[search_path]
@@ -270,23 +292,16 @@ class HtmlProoferPlugin(BasePlugin):
     def bad_url(url_status: int) -> bool:
         if url_status == -1:
             return True
-        elif url_status == 401 or url_status == 403:
-            return False
-        elif url_status in (503, 504):
-            # Usually transient
-            return False
-        elif url_status == 999:
-            # Returned by some websites (e.g. LinkedIn) that think you're crawling them.
-            return False
         elif url_status >= 400:
             return True
-        return False
+        else:
+            return False
     @staticmethod
     def is_error(config: Config, url: str, url_status: int) -> bool:
         excludes = config['raise_error_excludes'].get(url_status, [])
-        if '*' in excludes or url in excludes:
+        if any(fnmatch.fnmatch(url, exclude_url) for exclude_url in excludes):
             return False
-        return True
+        else:
+            return True
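
The last hunk above changes which status codes count as failures: 1.0.0 tolerated 401, 403, 503, 504 and 999, while 1.2.0 treats every status of 400 or above as bad. The sketch below reproduces both versions of `bad_url` from the diff side by side. The names `bad_url_1_0_0` and `bad_url_1_2_0`, and the plain-dict form of `is_error`, are simplifications for illustration (the plugin passes a MkDocs `Config` object). The `raise_error_excludes` mapping shown is one possible way to tolerate those codes again, not a documented upgrade path.

```python
import fnmatch
from typing import Dict, List


def bad_url_1_0_0(url_status: int) -> bool:
    """bad_url as removed in this diff (1.0.0 behaviour)."""
    if url_status == -1:
        return True
    elif url_status in (401, 403, 503, 504, 999):
        # Auth-protected, usually transient, or anti-crawler responses were tolerated.
        return False
    return url_status >= 400


def bad_url_1_2_0(url_status: int) -> bool:
    """bad_url as added in this diff (1.2.0 behaviour)."""
    return url_status == -1 or url_status >= 400


def is_error(excludes_by_status: Dict[int, List[str]], url: str, url_status: int) -> bool:
    """Simplified is_error: takes the raise_error_excludes mapping directly (assumption)."""
    excludes = excludes_by_status.get(url_status, [])
    return not any(fnmatch.fnmatch(url, pattern) for pattern in excludes)


previously_tolerated = (401, 403, 503, 504, 999)

for status in previously_tolerated:
    print(status, bad_url_1_0_0(status), bad_url_1_2_0(status))

# Hypothetical configuration that excludes every URL for the previously tolerated codes.
raise_error_excludes = {status: ["*"] for status in previously_tolerated}
url = "https://example.com/private"
reported = [
    status
    for status in previously_tolerated
    if bad_url_1_2_0(status) and is_error(raise_error_excludes, url, status)
]
print(reported)
```

With that mapping in place, none of the five codes is reported, while a status such as 404 is still reported because it has no exclude entry.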

{mkdocs-htmlproofer-plugin-1.0.0 → mkdocs-htmlproofer-plugin-1.2.0/mkdocs_htmlproofer_plugin.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: mkdocs-htmlproofer-plugin
-Version: 1.0.0
+Version: 1.2.0
 Summary: A MkDocs plugin that validates URL in rendered HTML files
 Home-page: https://github.com/manuzhang/mkdocs-htmlproofer-plugin
 Author: Manu Zhang
@@ -51,12 +51,6 @@ plugins:
     - htmlproofer
 ```
-To enable cross-page anchor validation, you must set `use_directory_urls = False` in `mkdocs.yml`:
-```yaml
-use_directory_urls: False
-```
 ## Configuring
 ### `enabled`
@@ -101,7 +95,7 @@ plugins:
 ### `raise_error_excludes`
 When specifying `raise_error: True` or `raise_error_after_finish: True`, it is possible to ignore errors
-for combinations of urls (`'*'` means all urls) and status codes with `raise_error_excludes`.
+for combinations of URLs and status codes with `raise_error_excludes`. Each URL supports unix style wildcards `*`, `[]`, `?`, etc.
 ```yaml
 plugins:
@@ -110,10 +104,42 @@ plugins:
       raise_error: True
       raise_error_excludes:
         504: ['https://www.mkdocs.org/']
-        404: ['https://github.com/manuzhang/mkdocs-htmlproofer-plugin']
+        404: ['https://github.com/manuzhang/*']
         400: ['*']
 ```
+### `ignore_urls`
+Avoid validating the given list of URLs by ignoring them altogether. Each URL in the
+list supports unix style wildcards `*`, `[]`, `?`, etc.
+Unlike `raise_error_excludes`, ignored URLs will not be fetched at all.
+```yaml
+plugins:
+  - search
+  - htmlproofer:
+      raise_error: True
+      ignore_urls:
+        - https://github.com/myprivateorg/*
+        - https://app.dynamic-service-of-some-kind.io*
+```
+### `warn_on_ignored_urls`
+Log a warning when ignoring URLs with `ignore_urls` option. Defaults to `false` (no warning).
+```yaml
+plugins:
+  - search
+  - htmlproofer:
+      raise_error: True
+      ignore_urls:
+        - https://github.com/myprivateorg/*
+        - https://app.dynamic-service-of-some-kind.io*
+      warn_on_ignored_urls: true
+```
 ### `validate_external_urls`
 Avoids validating any external URLs (i.e those starting with http:// or https://).
@@ -136,13 +162,24 @@ plugins:
       validate_rendered_template: True
 ```
+### `skip_downloads`
+Optionally skip downloading of a remote URLs content via GET request. This can
+considerably reduce the time taken to validate URLs.
+```yaml
+plugins:
+  - htmlproofer:
+      skip_downloads: True
+```
 ## Compatibility with `attr_list` extension
-If you need to manually specify anchors make use of the `attr_list` [extension](https://python-markdown.github.io/extensions/attr_list) in the markdown.
+If you need to manually specify anchors make use of the `attr_list` [extension](https://python-markdown.github.io/extensions/attr_list) in the markdown.
 This can be useful for multilingual documentation to keep anchors as language neutral permalinks in all languages.
 * A sample for a heading `# Grüße {#greetings}` (the slugified generated anchor `Gre` is overwritten with `greetings`).
-* This also works for images `this is a nice image [](foo-bar.png){#nice-image}`
+* This also works for images `this is a nice image [](foo-bar.png){#nice-image}`
 * And generall for paragraphs:
 ```markdown
 Listing: This is noteworthy.
@@ -155,4 +192,4 @@ More information about plugins in the [MkDocs documentation](http://www.mkdocs.o
 ## Acknowledgement
-This work is based on the [mkdocs-markdownextradata-plugin](https://github.com/rosscdh/mkdocs-markdownextradata-plugin) project and the [Finding and Fixing Website Link Rot with Python, BeautifulSoup and Requests](https://www.twilio.com/blog/2018/07/find-fix-website-link-rot-python-beautifulsoup-requests.html) article.
+This work is based on the [mkdocs-markdownextradata-plugin](https://github.com/rosscdh/mkdocs-markdownextradata-plugin) project and the [Finding and Fixing Website Link Rot with Python, BeautifulSoup and Requests](https://www.twilio.com/en-us/blog/find-fix-website-link-rot-python-beautifulsoup-requests-html) article.

{mkdocs-htmlproofer-plugin-1.0.0 → mkdocs-htmlproofer-plugin-1.2.0}/setup.py RENAMED

@@ -9,7 +9,7 @@ def read(fname: str):
 setup(
     name='mkdocs-htmlproofer-plugin',
-    version='1.0.0',
+    version='1.2.0',
     description='A MkDocs plugin that validates URL in rendered HTML files',
     long_description=read('README.md'),
     long_description_content_type='text/markdown',