PyPI - markdown-analysis - Versions diffs - 0.0.4__tar.gz → 0.0.5__tar.gz - Mend

markdown-analysis 0.0.4.tar.gz → 0.0.5.tar.gz

This diff shows the contents of publicly available package versions that have been released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.

Files changed (15)

{markdown_analysis-0.0.4 → markdown_analysis-0.0.5}/LICENSE RENAMED

File without changes

markdown_analysis-0.0.5/PKG-INFO ADDED

@@ -0,0 +1,137 @@
+Metadata-Version: 2.1
+Name: markdown_analysis
+Version: 0.0.5
+Summary: UNKNOWN
+Home-page: https://github.com/yannbanas/mrkdwn_analysis
+Author: yannbanas
+Author-email: yannbanas@gmail.com
+License: UNKNOWN
+Description: # mrkdwn_analysis
+        `mrkdwn_analysis` is a Python library designed to analyze Markdown files. With its powerful parsing capabilities, it can extract and categorize various elements within a Markdown document, including headers, sections, links, images, blockquotes, code blocks, and lists. This makes it a valuable tool for anyone looking to parse Markdown content for data analysis, content generation, or for building other tools that utilize Markdown.
+        ## Features
+        - File Loading: The MarkdownAnalyzer can load any given Markdown file provided through the file path.
+        - Header Identification: The tool can extract all headers from the markdown file, ranging from H1 to H6 tags. This allows users to have a quick overview of the document's structure.
+        - Section Identification: The analyzer can recognize different sections of the document. It defines a section as a block of text followed by a line composed solely of = or - characters.
+        - Paragraph Identification: The tool can distinguish between regular text and other elements such as lists, headers, etc., thereby identifying all the paragraphs present in the document.
+        - Blockquote Identification: The analyzer can identify and extract all blockquotes in the markdown file.
+        - Code Block Identification: The tool can extract all code blocks defined in the document, allowing you to separate the programming code from the regular text easily.
+        - List Identification: The analyzer can identify both ordered and unordered lists in the markdown file, providing information about the hierarchical structure of the points.
+        - Table Identification: The tool can identify and extract tables from the markdown file, enabling users to separate and analyze tabular data quickly.
+        - Link Identification and Validation: The analyzer can identify all links present in the markdown file, categorizing them into text and image links. Moreover, it can also verify if these links are valid or broken.
+        - Todo Identification: The tool is capable of recognizing and extracting todos (tasks or action items) present in the document.
+        - Element Counting: The analyzer can count the total number of a specific element type in the file. This can help in quantifying the extent of different elements in the document.
+        - Word Counting: The tool can count the total number of words in the file, providing an estimate of the document's length.
+        - Character Counting: The analyzer can count the total number of characters (excluding spaces) in the file, giving a detailed measure of the document's size.
+        ## Installation
+        You can install `mrkdwn_analysis` from PyPI:
+        ```bash
+        pip install mrkdwn_analysis
+        ```
+        We hope `mrkdwn_analysis` helps you with all your Markdown analyzing needs!
+        ## Usage
+        Using `mrkdwn_analysis` is simple. Just import the `MarkdownAnalyzer` class, create an instance with your Markdown file, and you're good to go!
+        ```python
+        from mrkdwn_analysis import MarkdownAnalyzer
+        analyzer = MarkdownAnalyzer("path/to/your/markdown.md")
+        headers = analyzer.identify_headers()
+        sections = analyzer.identify_sections()
+        ...
+        ```
+        ### Class MarkdownAnalyzer
+        The `MarkdownAnalyzer` class is designed to analyze Markdown files. It has the ability to extract and categorize various elements of a Markdown document.
+        ### `__init__(self, file_path)`
+        The constructor of the class. It opens the specified Markdown file and stores its content line by line.
+        - `file_path`: the path of the Markdown file to analyze.
+        ### `identify_headers(self)`
+        Analyzes the file and identifies all headers (from h1 to h6). Headers are returned as a dictionary where the key is "Header" and the value is a list of all headers found.
+        ### `identify_sections(self)`
+        Analyzes the file and identifies all sections. Sections are defined as a block of text followed by a line composed solely of `=` or `-` characters. Sections are returned as a dictionary where the key is "Section" and the value is a list of all sections found.
+        ### `identify_paragraphs(self)`
+        Analyzes the file and identifies all paragraphs. Paragraphs are defined as a block of text that is not a header, list, blockquote, etc. Paragraphs are returned as a dictionary where the key is "Paragraph" and the value is a list of all paragraphs found.
+        ### `identify_blockquotes(self)`
+        Analyzes the file and identifies all blockquotes. Blockquotes are defined by a line starting with the `>` character. Blockquotes are returned as a dictionary where the key is "Blockquote" and the value is a list of all blockquotes found.
+        ### `identify_code_blocks(self)`
+        Analyzes the file and identifies all code blocks. Code blocks are defined by a block of text surrounded by lines containing only the "```" text. Code blocks are returned as a dictionary where the key is "Code block" and the value is a list of all code blocks found.
+        ### `identify_ordered_lists(self)`
+        Analyzes the file and identifies all ordered lists. Ordered lists are defined by lines starting with a number followed by a dot. Ordered lists are returned as a dictionary where the key is "Ordered list" and the value is a list of all ordered lists found.
+        ### `identify_unordered_lists(self)`
+        Analyzes the file and identifies all unordered lists. Unordered lists are defined by lines starting with a `-`, `*`, or `+`. Unordered lists are returned as a dictionary where the key is "Unordered list" and the value is a list of all unordered lists found.
+        ### `identify_tables(self)`
+        Analyzes the file and identifies all tables. Tables are defined by lines containing `|` to delimit cells and are separated by lines containing `-` to define the borders. Tables are returned as a dictionary where the key is "Table" and the value is a list of all tables found.
+        ### `identify_links(self)`
+        Analyzes the file and identifies all links. Links are defined by the format `[text](url)`. Links are returned as a dictionary where the keys are "Text link" and "Image link" and the values are lists of all links found.
+        ### `check_links(self)`
+        Checks all links identified by `identify_links` to see if they are broken (return a 404 error). Broken links are returned as a list, each item being a dictionary containing the line number, link text, and URL.
+        ### `identify_todos(self)`
+        Analyzes the file and identifies all todos. Todos are defined by lines starting with `- [ ] `. Todos are returned as a list, each item being a dictionary containing the line number and todo text.
+        ### `count_elements(self, element_type)`
+        Counts the total number of a specific element type in the file. The `element_type` should match the name of one of the identification methods (for example, "headers" for `identify_headers`). Returns the total number of elements of this type.
+        ### `count_words(self)`
+        Counts the total number of words in the file. Returns the word count.
+        ### `count_characters(self)`
+        Counts the total number of characters (excluding spaces) in the file. Returns the character count.
+        ## Contributions
+        Contributions are always welcome! If you have a feature request, bug report, or just want to improve the code, feel free to create a pull request or open an issue.
+Platform: UNKNOWN
+Classifier: Development Status :: 2 - Pre-Alpha
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3.11
+Description-Content-Type: text/markdown

{markdown_analysis-0.0.4 → markdown_analysis-0.0.5}/README.md RENAMED

File without changes

markdown_analysis-0.0.5/markdown_analysis.egg-info/PKG-INFO ADDED

@@ -0,0 +1,137 @@
+Metadata-Version: 2.1
+Name: markdown-analysis
+Version: 0.0.5
+Summary: UNKNOWN
+Home-page: https://github.com/yannbanas/mrkdwn_analysis
+Author: yannbanas
+Author-email: yannbanas@gmail.com
+License: UNKNOWN
+Description: # mrkdwn_analysis
+        `mrkdwn_analysis` is a Python library designed to analyze Markdown files. With its powerful parsing capabilities, it can extract and categorize various elements within a Markdown document, including headers, sections, links, images, blockquotes, code blocks, and lists. This makes it a valuable tool for anyone looking to parse Markdown content for data analysis, content generation, or for building other tools that utilize Markdown.
+        ## Features
+        - File Loading: The MarkdownAnalyzer can load any given Markdown file provided through the file path.
+        - Header Identification: The tool can extract all headers from the markdown file, ranging from H1 to H6 tags. This allows users to have a quick overview of the document's structure.
+        - Section Identification: The analyzer can recognize different sections of the document. It defines a section as a block of text followed by a line composed solely of = or - characters.
+        - Paragraph Identification: The tool can distinguish between regular text and other elements such as lists, headers, etc., thereby identifying all the paragraphs present in the document.
+        - Blockquote Identification: The analyzer can identify and extract all blockquotes in the markdown file.
+        - Code Block Identification: The tool can extract all code blocks defined in the document, allowing you to separate the programming code from the regular text easily.
+        - List Identification: The analyzer can identify both ordered and unordered lists in the markdown file, providing information about the hierarchical structure of the points.
+        - Table Identification: The tool can identify and extract tables from the markdown file, enabling users to separate and analyze tabular data quickly.
+        - Link Identification and Validation: The analyzer can identify all links present in the markdown file, categorizing them into text and image links. Moreover, it can also verify if these links are valid or broken.
+        - Todo Identification: The tool is capable of recognizing and extracting todos (tasks or action items) present in the document.
+        - Element Counting: The analyzer can count the total number of a specific element type in the file. This can help in quantifying the extent of different elements in the document.
+        - Word Counting: The tool can count the total number of words in the file, providing an estimate of the document's length.
+        - Character Counting: The analyzer can count the total number of characters (excluding spaces) in the file, giving a detailed measure of the document's size.
+        ## Installation
+        You can install `mrkdwn_analysis` from PyPI:
+        ```bash
+        pip install mrkdwn_analysis
+        ```
+        We hope `mrkdwn_analysis` helps you with all your Markdown analyzing needs!
+        ## Usage
+        Using `mrkdwn_analysis` is simple. Just import the `MarkdownAnalyzer` class, create an instance with your Markdown file, and you're good to go!
+        ```python
+        from mrkdwn_analysis import MarkdownAnalyzer
+        analyzer = MarkdownAnalyzer("path/to/your/markdown.md")
+        headers = analyzer.identify_headers()
+        sections = analyzer.identify_sections()
+        ...
+        ```
+        ### Class MarkdownAnalyzer
+        The `MarkdownAnalyzer` class is designed to analyze Markdown files. It has the ability to extract and categorize various elements of a Markdown document.
+        ### `__init__(self, file_path)`
+        The constructor of the class. It opens the specified Markdown file and stores its content line by line.
+        - `file_path`: the path of the Markdown file to analyze.
+        ### `identify_headers(self)`
+        Analyzes the file and identifies all headers (from h1 to h6). Headers are returned as a dictionary where the key is "Header" and the value is a list of all headers found.
+        ### `identify_sections(self)`
+        Analyzes the file and identifies all sections. Sections are defined as a block of text followed by a line composed solely of `=` or `-` characters. Sections are returned as a dictionary where the key is "Section" and the value is a list of all sections found.
+        ### `identify_paragraphs(self)`
+        Analyzes the file and identifies all paragraphs. Paragraphs are defined as a block of text that is not a header, list, blockquote, etc. Paragraphs are returned as a dictionary where the key is "Paragraph" and the value is a list of all paragraphs found.
+        ### `identify_blockquotes(self)`
+        Analyzes the file and identifies all blockquotes. Blockquotes are defined by a line starting with the `>` character. Blockquotes are returned as a dictionary where the key is "Blockquote" and the value is a list of all blockquotes found.
+        ### `identify_code_blocks(self)`
+        Analyzes the file and identifies all code blocks. Code blocks are defined by a block of text surrounded by lines containing only the "```" text. Code blocks are returned as a dictionary where the key is "Code block" and the value is a list of all code blocks found.
+        ### `identify_ordered_lists(self)`
+        Analyzes the file and identifies all ordered lists. Ordered lists are defined by lines starting with a number followed by a dot. Ordered lists are returned as a dictionary where the key is "Ordered list" and the value is a list of all ordered lists found.
+        ### `identify_unordered_lists(self)`
+        Analyzes the file and identifies all unordered lists. Unordered lists are defined by lines starting with a `-`, `*`, or `+`. Unordered lists are returned as a dictionary where the key is "Unordered list" and the value is a list of all unordered lists found.
+        ### `identify_tables(self)`
+        Analyzes the file and identifies all tables. Tables are defined by lines containing `|` to delimit cells and are separated by lines containing `-` to define the borders. Tables are returned as a dictionary where the key is "Table" and the value is a list of all tables found.
+        ### `identify_links(self)`
+        Analyzes the file and identifies all links. Links are defined by the format `[text](url)`. Links are returned as a dictionary where the keys are "Text link" and "Image link" and the values are lists of all links found.
+        ### `check_links(self)`
+        Checks all links identified by `identify_links` to see if they are broken (return a 404 error). Broken links are returned as a list, each item being a dictionary containing the line number, link text, and URL.
+        ### `identify_todos(self)`
+        Analyzes the file and identifies all todos. Todos are defined by lines starting with `- [ ] `. Todos are returned as a list, each item being a dictionary containing the line number and todo text.
+        ### `count_elements(self, element_type)`
+        Counts the total number of a specific element type in the file. The `element_type` should match the name of one of the identification methods (for example, "headers" for `identify_headers`). Returns the total number of elements of this type.
+        ### `count_words(self)`
+        Counts the total number of words in the file. Returns the word count.
+        ### `count_characters(self)`
+        Counts the total number of characters (excluding spaces) in the file. Returns the character count.
+        ## Contributions
+        Contributions are always welcome! If you have a feature request, bug report, or just want to improve the code, feel free to create a pull request or open an issue.
+Platform: UNKNOWN
+Classifier: Development Status :: 2 - Pre-Alpha
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3.11
+Description-Content-Type: text/markdown

{markdown_analysis-0.0.4 → markdown_analysis-0.0.5}/markdown_analysis.egg-info/SOURCES.txt RENAMED

File without changes

{markdown_analysis-0.0.4 → markdown_analysis-0.0.5}/markdown_analysis.egg-info/dependency_links.txt RENAMED

File without changes

{markdown_analysis-0.0.4 → markdown_analysis-0.0.5}/markdown_analysis.egg-info/requires.txt RENAMED

@@ -1,2 +1,2 @@
-requests
 urllib3
+requests

{markdown_analysis-0.0.4 → markdown_analysis-0.0.5}/markdown_analysis.egg-info/top_level.txt RENAMED

File without changes

{markdown_analysis-0.0.4 → markdown_analysis-0.0.5}/mrkdwn_analysis/__init__.py RENAMED

File without changes

{markdown_analysis-0.0.4 → markdown_analysis-0.0.5}/mrkdwn_analysis/markdown_analyzer.py RENAMED

@@ -3,8 +3,8 @@ import requests
 from collections import defaultdict, Counter
 class MarkdownAnalyzer:
-    def __init__(self, file_path):
-        with open(file_path, 'r') as file:
+    def __init__(self, file_path, encoding='utf-8'):
+        with open(file_path, 'r', encoding=encoding) as file:
             self.lines = file.readlines()
     def identify_headers(self):
@@ -162,19 +162,10 @@ class MarkdownAnalyzer:
     def identify_tables(self):
         result = defaultdict(list)
-        table_pattern = re.compile(r'^ {0,3}\|(?P<table_head>.+)\|[ \t]*\n' +
-                                   r' {0,3}\|(?P<table_align> *[-:]+[-| :]*)\|[ \t]*\n' +
-                                   r'(?P<table_body>(?: {0,3}\|.*\|[ \t]*(?:\n|$))*)\n*')
-        nptable_pattern = re.compile(r'^ {0,3}(?P<nptable_head>\S.*\|.*)\n' +
-                                     r' {0,3}(?P<nptable_align>[-:]+ *\|[-| :]*)\n' +
-                                     r'(?P<nptable_body>(?:.*\|.*(?:\n|$))*)\n*')
-        text = "".join(self.lines)
-        matches_table = re.findall(table_pattern, text)
-        matches_nptable = re.findall(nptable_pattern, text)
-        for match in matches_table + matches_nptable:
-            result["Table"].append(match)
+        table_pattern = re.compile(r'^\|.*\|$', re.MULTILINE)
+        table_rows = table_pattern.findall("".join(self.lines))
+        for table_row in table_rows:
+            result["Table"].append(table_row.strip().split("|"))
         return dict(result)
     def identify_links(self):
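
The `identify_tables` hunk above replaces the two multi-line table patterns with a single per-row pattern. The sketch below copies the added lines into a standalone function to show what the 0.0.5 logic would return. The function name `identify_table_rows` and the sample input are illustrative assumptions, not part of the package.

```python
import re
from collections import defaultdict


def identify_table_rows(lines):
    """Standalone copy of the 0.0.5 identify_tables body (hypothetical helper)."""
    result = defaultdict(list)
    # With re.MULTILINE, ^ and $ anchor at every line, so each row matches on its own.
    table_pattern = re.compile(r'^\|.*\|$', re.MULTILINE)
    table_rows = table_pattern.findall("".join(lines))
    for table_row in table_rows:
        result["Table"].append(table_row.strip().split("|"))
    return dict(result)


# Illustrative input: one small table followed by a plain line.
sample = [
    "| a | b |\n",
    "|---|---|\n",
    "| 1 | 2 |\n",
    "not a table\n",
]

rows = identify_table_rows(sample)["Table"]
for row in rows:
    print(row)
```

Under this logic every matched line becomes one list entry, so the delimiter row is included and rows from separate tables end up in the same list. Splitting on `|` also leaves an empty string at each end of a row. Lines that are indented or have trailing whitespace are not matched, because the pattern requires `|` as both the first and the last character of the line. The removed patterns were compiled without `re.MULTILINE`, so their leading `^` could match only at the very start of the joined text.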

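The first `markdown_analyzer.py` hunk adds an `encoding` parameter, defaulting to `'utf-8'`, to the constructor. Without an explicit `encoding`, Python 3.11's `open()` falls back to a locale-dependent default, so the same file may decode differently from one platform to another. A minimal sketch of the effect, assuming a hypothetical stand-in class `LineLoader` (not part of the package) that mirrors only the added constructor lines:

```python
import os
import tempfile


class LineLoader:
    """Hypothetical stand-in that mirrors the 0.0.5 constructor lines only."""

    def __init__(self, file_path, encoding='utf-8'):
        with open(file_path, 'r', encoding=encoding) as file:
            self.lines = file.readlines()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.md")
    # Write a UTF-8 file containing non-ASCII text ("# Café").
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("# Caf\u00e9\n\nR\u00e9sum\u00e9\n")

    # Default encoding ('utf-8') decodes the file as written.
    utf8_lines = LineLoader(path).lines
    # Forcing a different encoding shows the mismatch the parameter lets callers avoid.
    latin1_lines = LineLoader(path, encoding="latin-1").lines

print(utf8_lines)
print(latin1_lines)
```

The second read turns each two-byte UTF-8 character into two Latin-1 characters. A locale default that does not match the file's encoding can produce this kind of result when no `encoding` is passed.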
{markdown_analysis-0.0.4 → markdown_analysis-0.0.5}/setup.cfg RENAMED

@@ -1,4 +1,4 @@
-[egg_info]
-tag_build =
-tag_date = 0
+[egg_info]
+tag_build =
+tag_date = 0

{markdown_analysis-0.0.4 → markdown_analysis-0.0.5}/setup.py RENAMED

@@ -6,7 +6,7 @@ with open("README.md", "r", encoding="utf-8") as fh:
 setup(
     name='markdown_analysis',
-    version='0.0.4',
+    version='0.0.5',
     long_description=long_description,
     long_description_content_type="text/markdown",
     author='yannbanas',

{markdown_analysis-0.0.4 → markdown_analysis-0.0.5}/test/__init__.py RENAMED

File without changes

markdown_analysis-0.0.4/PKG-INFO DELETED

@@ -1,140 +0,0 @@
-Metadata-Version: 2.1
-Name: markdown_analysis
-Version: 0.0.4
-Summary: UNKNOWN
-Home-page: https://github.com/yannbanas/mrkdwn_analysis
-Author: yannbanas
-Author-email: yannbanas@gmail.com
-License: UNKNOWN
-Platform: UNKNOWN
-Classifier: Development Status :: 2 - Pre-Alpha
-Classifier: Intended Audience :: Developers
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Programming Language :: Python :: 3.11
-Description-Content-Type: text/markdown
-License-File: LICENSE
-# mrkdwn_analysis
-`mrkdwn_analysis` is a Python library designed to analyze Markdown files. With its powerful parsing capabilities, it can extract and categorize various elements within a Markdown document, including headers, sections, links, images, blockquotes, code blocks, and lists. This makes it a valuable tool for anyone looking to parse Markdown content for data analysis, content generation, or for building other tools that utilize Markdown.
-## Features
-- File Loading: The MarkdownAnalyzer can load any given Markdown file provided through the file path.
-- Header Identification: The tool can extract all headers from the markdown file, ranging from H1 to H6 tags. This allows users to have a quick overview of the document's structure.
-- Section Identification: The analyzer can recognize different sections of the document. It defines a section as a block of text followed by a line composed solely of = or - characters.
-- Paragraph Identification: The tool can distinguish between regular text and other elements such as lists, headers, etc., thereby identifying all the paragraphs present in the document.
-- Blockquote Identification: The analyzer can identify and extract all blockquotes in the markdown file.
-- Code Block Identification: The tool can extract all code blocks defined in the document, allowing you to separate the programming code from the regular text easily.
-- List Identification: The analyzer can identify both ordered and unordered lists in the markdown file, providing information about the hierarchical structure of the points.
-- Table Identification: The tool can identify and extract tables from the markdown file, enabling users to separate and analyze tabular data quickly.
-- Link Identification and Validation: The analyzer can identify all links present in the markdown file, categorizing them into text and image links. Moreover, it can also verify if these links are valid or broken.
-- Todo Identification: The tool is capable of recognizing and extracting todos (tasks or action items) present in the document.
-- Element Counting: The analyzer can count the total number of a specific element type in the file. This can help in quantifying the extent of different elements in the document.
-- Word Counting: The tool can count the total number of words in the file, providing an estimate of the document's length.
-- Character Counting: The analyzer can count the total number of characters (excluding spaces) in the file, giving a detailed measure of the document's size.
-## Installation
-You can install `mrkdwn_analysis` from PyPI:
-```bash
-pip install mrkdwn_analysis
-```
-We hope `mrkdwn_analysis` helps you with all your Markdown analyzing needs!
-## Usage
-Using `mrkdwn_analysis` is simple. Just import the `MarkdownAnalyzer` class, create an instance with your Markdown file, and you're good to go!
-```python
-from mrkdwn_analysis import MarkdownAnalyzer
-analyzer = MarkdownAnalyzer("path/to/your/markdown.md")
-headers = analyzer.identify_headers()
-sections = analyzer.identify_sections()
-...
-```
-### Class MarkdownAnalyzer
-The `MarkdownAnalyzer` class is designed to analyze Markdown files. It has the ability to extract and categorize various elements of a Markdown document.
-### `__init__(self, file_path)`
-The constructor of the class. It opens the specified Markdown file and stores its content line by line.
-- `file_path`: the path of the Markdown file to analyze.
-### `identify_headers(self)`
-Analyzes the file and identifies all headers (from h1 to h6). Headers are returned as a dictionary where the key is "Header" and the value is a list of all headers found.
-### `identify_sections(self)`
-Analyzes the file and identifies all sections. Sections are defined as a block of text followed by a line composed solely of `=` or `-` characters. Sections are returned as a dictionary where the key is "Section" and the value is a list of all sections found.
-### `identify_paragraphs(self)`
-Analyzes the file and identifies all paragraphs. Paragraphs are defined as a block of text that is not a header, list, blockquote, etc. Paragraphs are returned as a dictionary where the key is "Paragraph" and the value is a list of all paragraphs found.
-### `identify_blockquotes(self)`
-Analyzes the file and identifies all blockquotes. Blockquotes are defined by a line starting with the `>` character. Blockquotes are returned as a dictionary where the key is "Blockquote" and the value is a list of all blockquotes found.
-### `identify_code_blocks(self)`
-Analyzes the file and identifies all code blocks. Code blocks are defined by a block of text surrounded by lines containing only the "```" text. Code blocks are returned as a dictionary where the key is "Code block" and the value is a list of all code blocks found.
-### `identify_ordered_lists(self)`
-Analyzes the file and identifies all ordered lists. Ordered lists are defined by lines starting with a number followed by a dot. Ordered lists are returned as a dictionary where the key is "Ordered list" and the value is a list of all ordered lists found.
-### `identify_unordered_lists(self)`
-Analyzes the file and identifies all unordered lists. Unordered lists are defined by lines starting with a `-`, `*`, or `+`. Unordered lists are returned as a dictionary where the key is "Unordered list" and the value is a list of all unordered lists found.
-### `identify_tables(self)`
-Analyzes the file and identifies all tables. Tables are defined by lines containing `|` to delimit cells and are separated by lines containing `-` to define the borders. Tables are returned as a dictionary where the key is "Table" and the value is a list of all tables found.
-### `identify_links(self)`
-Analyzes the file and identifies all links. Links are defined by the format `[text](url)`. Links are returned as a dictionary where the keys are "Text link" and "Image link" and the values are lists of all links found.
-### `check_links(self)`
-Checks all links identified by `identify_links` to see if they are broken (return a 404 error). Broken links are returned as a list, each item being a dictionary containing the line number, link text, and URL.
-### `identify_todos(self)`
-Analyzes the file and identifies all todos. Todos are defined by lines starting with `- [ ] `. Todos are returned as a list, each item being a dictionary containing the line number and todo text.
-### `count_elements(self, element_type)`
-Counts the total number of a specific element type in the file. The `element_type` should match the name of one of the identification methods (for example, "headers" for `identify_headers`). Returns the total number of elements of this type.
-### `count_words(self)`
-Counts the total number of words in the file. Returns the word count.
-### `count_characters(self)`
-Counts the total number of characters (excluding spaces) in the file. Returns the character count.
-## Contributions
-Contributions are always welcome! If you have a feature request, bug report, or just want to improve the code, feel free to create a pull request or open an issue.

markdown_analysis-0.0.4/markdown_analysis.egg-info/PKG-INFO DELETED

@@ -1,140 +0,0 @@
-Metadata-Version: 2.1
-Name: markdown-analysis
-Version: 0.0.4
-Summary: UNKNOWN
-Home-page: https://github.com/yannbanas/mrkdwn_analysis
-Author: yannbanas
-Author-email: yannbanas@gmail.com
-License: UNKNOWN
-Platform: UNKNOWN
-Classifier: Development Status :: 2 - Pre-Alpha
-Classifier: Intended Audience :: Developers
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Programming Language :: Python :: 3.11
-Description-Content-Type: text/markdown
-License-File: LICENSE
-# mrkdwn_analysis
-`mrkdwn_analysis` is a Python library designed to analyze Markdown files. With its powerful parsing capabilities, it can extract and categorize various elements within a Markdown document, including headers, sections, links, images, blockquotes, code blocks, and lists. This makes it a valuable tool for anyone looking to parse Markdown content for data analysis, content generation, or for building other tools that utilize Markdown.
-## Features
-- File Loading: The MarkdownAnalyzer can load any given Markdown file provided through the file path.
-- Header Identification: The tool can extract all headers from the markdown file, ranging from H1 to H6 tags. This allows users to have a quick overview of the document's structure.
-- Section Identification: The analyzer can recognize different sections of the document. It defines a section as a block of text followed by a line composed solely of = or - characters.
-- Paragraph Identification: The tool can distinguish between regular text and other elements such as lists, headers, etc., thereby identifying all the paragraphs present in the document.
-- Blockquote Identification: The analyzer can identify and extract all blockquotes in the markdown file.
-- Code Block Identification: The tool can extract all code blocks defined in the document, allowing you to separate the programming code from the regular text easily.
-- List Identification: The analyzer can identify both ordered and unordered lists in the markdown file, providing information about the hierarchical structure of the points.
-- Table Identification: The tool can identify and extract tables from the markdown file, enabling users to separate and analyze tabular data quickly.
-- Link Identification and Validation: The analyzer can identify all links present in the markdown file, categorizing them into text and image links. Moreover, it can also verify if these links are valid or broken.
-- Todo Identification: The tool is capable of recognizing and extracting todos (tasks or action items) present in the document.
-- Element Counting: The analyzer can count the total number of a specific element type in the file. This can help in quantifying the extent of different elements in the document.
-- Word Counting: The tool can count the total number of words in the file, providing an estimate of the document's length.
-- Character Counting: The analyzer can count the total number of characters (excluding spaces) in the file, giving a detailed measure of the document's size.
-## Installation
-You can install `mrkdwn_analysis` from PyPI:
-```bash
-pip install mrkdwn_analysis
-```
-We hope `mrkdwn_analysis` helps you with all your Markdown analyzing needs!
-## Usage
-Using `mrkdwn_analysis` is simple. Just import the `MarkdownAnalyzer` class, create an instance with your Markdown file, and you're good to go!
-```python
-from mrkdwn_analysis import MarkdownAnalyzer
-analyzer = MarkdownAnalyzer("path/to/your/markdown.md")
-headers = analyzer.identify_headers()
-sections = analyzer.identify_sections()
-...
-```
-### Class MarkdownAnalyzer
-The `MarkdownAnalyzer` class is designed to analyze Markdown files. It has the ability to extract and categorize various elements of a Markdown document.
-### `__init__(self, file_path)`
-The constructor of the class. It opens the specified Markdown file and stores its content line by line.
-- `file_path`: the path of the Markdown file to analyze.
-### `identify_headers(self)`
-Analyzes the file and identifies all headers (from h1 to h6). Headers are returned as a dictionary where the key is "Header" and the value is a list of all headers found.
-### `identify_sections(self)`
-Analyzes the file and identifies all sections. Sections are defined as a block of text followed by a line composed solely of `=` or `-` characters. Sections are returned as a dictionary where the key is "Section" and the value is a list of all sections found.
-### `identify_paragraphs(self)`
-Analyzes the file and identifies all paragraphs. Paragraphs are defined as a block of text that is not a header, list, blockquote, etc. Paragraphs are returned as a dictionary where the key is "Paragraph" and the value is a list of all paragraphs found.
-### `identify_blockquotes(self)`
-Analyzes the file and identifies all blockquotes. Blockquotes are defined by a line starting with the `>` character. Blockquotes are returned as a dictionary where the key is "Blockquote" and the value is a list of all blockquotes found.
-### `identify_code_blocks(self)`
-Analyzes the file and identifies all code blocks. Code blocks are defined by a block of text surrounded by lines containing only the "```" text. Code blocks are returned as a dictionary where the key is "Code block" and the value is a list of all code blocks found.
-### `identify_ordered_lists(self)`
-Analyzes the file and identifies all ordered lists. Ordered lists are defined by lines starting with a number followed by a dot. Ordered lists are returned as a dictionary where the key is "Ordered list" and the value is a list of all ordered lists found.
-### `identify_unordered_lists(self)`
-Analyzes the file and identifies all unordered lists. Unordered lists are defined by lines starting with a `-`, `*`, or `+`. Unordered lists are returned as a dictionary where the key is "Unordered list" and the value is a list of all unordered lists found.
-### `identify_tables(self)`
-Analyzes the file and identifies all tables. Tables are defined by lines containing `|` to delimit cells and are separated by lines containing `-` to define the borders. Tables are returned as a dictionary where the key is "Table" and the value is a list of all tables found.
-### `identify_links(self)`
-Analyzes the file and identifies all links. Links are defined by the format `[text](url)`. Links are returned as a dictionary where the keys are "Text link" and "Image link" and the values are lists of all links found.
-### `check_links(self)`
-Checks all links identified by `identify_links` to see if they are broken (return a 404 error). Broken links are returned as a list, each item being a dictionary containing the line number, link text, and URL.
-### `identify_todos(self)`
-Analyzes the file and identifies all todos. Todos are defined by lines starting with `- [ ] `. Todos are returned as a list, each item being a dictionary containing the line number and todo text.
-### `count_elements(self, element_type)`
-Counts the total number of a specific element type in the file. The `element_type` should match the name of one of the identification methods (for example, "headers" for `identify_headers`). Returns the total number of elements of this type.
-### `count_words(self)`
-Counts the total number of words in the file. Returns the word count.
-### `count_characters(self)`
-Counts the total number of characters (excluding spaces) in the file. Returns the character count.
-## Contributions
-Contributions are always welcome! If you have a feature request, bug report, or just want to improve the code, feel free to create a pull request or open an issue.
