markdown-analysis 0.0.5__tar.gz → 0.1.0__tar.gz

@@ -0,0 +1,204 @@
1
+ Metadata-Version: 2.1
2
+ Name: markdown_analysis
3
+ Version: 0.1.0
4
+ Summary: UNKNOWN
5
+ Home-page: https://github.com/yannbanas/mrkdwn_analysis
6
+ Author: yannbanas
7
+ Author-email: yannbanas@gmail.com
8
+ License: UNKNOWN
9
+ Platform: UNKNOWN
10
+ Classifier: Development Status :: 2 - Pre-Alpha
11
+ Classifier: Intended Audience :: Developers
12
+ Classifier: License :: OSI Approved :: MIT License
13
+ Classifier: Programming Language :: Python :: 3.11
14
+ Description-Content-Type: text/markdown
15
+ License-File: LICENSE
16
+
17
+ # mrkdwn_analysis
18
+
19
+ `mrkdwn_analysis` is a powerful Python library designed to analyze Markdown files. It provides extensive parsing capabilities to extract and categorize various elements within a Markdown document, including headers, sections, links, images, blockquotes, code blocks, lists, tables, tasks (todos), footnotes, and even embedded HTML. This makes it a versatile tool for data analysis, content generation, or building other tools that work with Markdown.
20
+
21
+ ## Features
22
+
23
+ - **File Loading**: Load any given Markdown file by providing its file path.
24
+
25
+ - **Header Detection**: Identify all headers (ATX `#` to `######`, and Setext `===` and `---`) in the document, giving you a quick overview of its structure.
26
+
27
+ - **Section Identification (Setext)**: Recognize sections defined by a block of text followed by `=` or `-` lines, helping you understand the document’s conceptual divisions.
28
+
29
+ - **Paragraph Extraction**: Distinguish regular text (paragraphs) from structured elements like headers, lists, or code blocks, making it easy to isolate the body content.
30
+
31
+ - **Blockquote Identification**: Extract all blockquotes defined by lines starting with `>`.
32
+
33
+ - **Code Block Extraction**: Detect fenced code blocks delimited by triple backticks (```), optionally retrieve their language, and separate programming code from regular text.
34
+
35
+ - **List Recognition**: Identify both ordered and unordered lists, including task lists (`- [ ]`, `- [x]`), and understand their structure and hierarchy.
36
+
37
+ - **Tables (GFM)**: Detect GitHub-Flavored Markdown tables, parse their headers and rows, and separate structured tabular data for further analysis.
38
+
39
+ - **Links and Images**: Identify text links (`[text](url)`) and images (`![alt](url)`), as well as reference-style links. This is useful for link validation or content analysis.
40
+
41
+ - **Footnotes**: Extract and handle Markdown footnotes (`[^note1]`), providing a way to process reference notes in the document.
42
+
43
+ - **HTML Blocks and Inline HTML**: Handle HTML blocks (`<div>...</div>`) as a single element, and detect inline HTML elements (`<span style="...">... </span>`) as a unified component.
44
+
45
+ - **Front Matter**: If present, extract YAML front matter at the start of the file.
46
+
47
+ - **Counting Elements**: Count how many occurrences of a certain element type (e.g., how many headers, code blocks, etc.).
48
+
49
+ - **Textual Statistics**: Count the number of words and characters (excluding whitespace). Get a global summary (`analyse()`) of the document’s composition.
50
+
51
+ ## Installation
52
+
53
+ Install `mrkdwn_analysis` from PyPI:
54
+
55
+ ```bash
56
+ pip install markdown-analysis
57
+ ```
58
+
59
+ ## Usage
60
+
61
+ Using `mrkdwn_analysis` is straightforward. Import `MarkdownAnalyzer`, create an instance with your Markdown file path, and then call the various methods to extract the elements you need.
62
+
63
+ ```python
64
+ from mrkdwn_analysis import MarkdownAnalyzer
65
+
66
+ analyzer = MarkdownAnalyzer("path/to/document.md")
67
+
68
+ headers = analyzer.identify_headers()
69
+ paragraphs = analyzer.identify_paragraphs()
70
+ links = analyzer.identify_links()
71
+ ...
72
+ ```
73
+
74
+ ### Example
75
+
76
+ Consider `example.md`:
77
+
78
+ ```markdown
79
+ ---
80
+ title: "Python 3.11 Report"
81
+ author: "John Doe"
82
+ date: "2024-01-15"
83
+ ---
84
+
85
+ Python 3.11
86
+ ===========
87
+
88
+ A major **Python** release with significant improvements...
89
+
90
+ ### Performance Details
91
+
92
+ ```python
93
+ import math
94
+ print(math.factorial(10))
95
+ ```
96
+
97
+ > *Quote*: "Python 3.11 brings the speed we needed"
98
+
99
+ <div class="note">
100
+ <p>HTML block example</p>
101
+ </div>
102
+
103
+ This paragraph contains inline HTML: <span style="color:red;">Red text</span>.
104
+
105
+ - Unordered list:
106
+ - A basic point
107
+ - [ ] A task to do
108
+ - [x] A completed task
109
+
110
+ 1. Ordered list item 1
111
+ 2. Ordered list item 2
112
+ ```
113
+
114
+ After analysis:
115
+
116
+ ```python
117
+ analyzer = MarkdownAnalyzer("example.md")
118
+
119
+ print(analyzer.identify_headers())
120
+ # {"Header": [{"line": X, "level": 1, "text": "Python 3.11"}, {"line": Y, "level": 3, "text": "Performance Details"}]}
121
+
122
+ print(analyzer.identify_paragraphs())
123
+ # {"Paragraph": ["A major **Python** release ...", "This paragraph contains inline HTML: ..."]}
124
+
125
+ print(analyzer.identify_html_blocks())
126
+ # [{"line": Z, "content": "<div class=\"note\">\n <p>HTML block example</p>\n</div>"}]
127
+
128
+ print(analyzer.identify_html_inline())
129
+ # [{"line": W, "html": "<span style=\"color:red;\">Red text</span>"}]
130
+
131
+ print(analyzer.identify_lists())
132
+ # {
133
+ # "Ordered list": [["Ordered list item 1", "Ordered list item 2"]],
134
+ # "Unordered list": [["A basic point", "A task to do [Task]", "A completed task [Task done]"]]
135
+ # }
136
+
137
+ print(analyzer.identify_code_blocks())
138
+ # {"Code block": [{"start_line": X, "content": "import math\nprint(math.factorial(10))", "language": "python"}]}
139
+
140
+ print(analyzer.analyse())
141
+ # {
142
+ # 'headers': 2,
143
+ # 'paragraphs': 2,
144
+ # 'blockquotes': 1,
145
+ # 'code_blocks': 1,
146
+ # 'ordered_lists': 2,
147
+ # 'unordered_lists': 3,
148
+ # 'tables': 0,
149
+ # 'html_blocks': 1,
150
+ # 'html_inline_count': 1,
151
+ # 'words': 42,
152
+ # 'characters': 250
153
+ # }
154
+ ```
155
+
156
+ ### Key Methods
157
+
158
+ - `__init__(self, file_path)`: Load the Markdown file.
159
+ - `identify_headers()`: Returns all headers.
160
+ - `identify_sections()`: Returns setext sections.
161
+ - `identify_paragraphs()`: Returns paragraphs.
162
+ - `identify_blockquotes()`: Returns blockquotes.
163
+ - `identify_code_blocks()`: Returns code blocks with content and language.
164
+ - `identify_lists()`: Returns both ordered and unordered lists (including tasks).
165
+ - `identify_tables()`: Returns any GFM tables.
166
+ - `identify_links()`: Returns text and image links.
167
+ - `identify_footnotes()`: Returns footnotes used in the document.
168
+ - `identify_html_blocks()`: Returns HTML blocks as single tokens.
169
+ - `identify_html_inline()`: Returns inline HTML elements.
170
+ - `identify_todos()`: Returns task items.
171
+ - `count_elements(element_type)`: Counts occurrences of a specific element type.
172
+ - `count_words()`: Counts words in the entire document.
173
+ - `count_characters()`: Counts non-whitespace characters.
174
+ - `analyse()`: Provides a global summary (headers count, paragraphs count, etc.).
175
+
176
+ ### Checking and Validating Links
177
+
178
+ - `check_links()`: Validates text links to see if they are broken (e.g., non-200 status) and returns a list of broken links.
179
+
180
+ ### Global Analysis Example
181
+
182
+ ```python
183
+ analysis = analyzer.analyse()
184
+ print(analysis)
185
+ # {
186
+ # 'headers': X,
187
+ # 'paragraphs': Y,
188
+ # 'blockquotes': Z,
189
+ # 'code_blocks': A,
190
+ # 'ordered_lists': B,
191
+ # 'unordered_lists': C,
192
+ # 'tables': D,
193
+ # 'html_blocks': E,
194
+ # 'html_inline_count': F,
195
+ # 'words': G,
196
+ # 'characters': H
197
+ # }
198
+ ```
199
+
200
+ ## Contributing
201
+
202
+ Contributions are welcome! Feel free to open an issue or submit a pull request for bug reports, feature requests, or code improvements. Your input helps make `mrkdwn_analysis` more robust and versatile.
203
+
204
+
@@ -0,0 +1,186 @@
1
+ # mrkdwn_analysis
2
+
3
+ `mrkdwn_analysis` is a powerful Python library designed to analyze Markdown files. It provides extensive parsing capabilities to extract and categorize various elements within a Markdown document, including headers, sections, links, images, blockquotes, code blocks, lists, tables, tasks (todos), footnotes, and even embedded HTML. This makes it a versatile tool for data analysis, content generation, or building other tools that work with Markdown.
4
+
5
+ ## Features
6
+
7
+ - **File Loading**: Load any given Markdown file by providing its file path.
8
+
9
+ - **Header Detection**: Identify all headers (ATX `#` to `######`, and Setext `===` and `---`) in the document, giving you a quick overview of its structure.
10
+
11
+ - **Section Identification (Setext)**: Recognize sections defined by a block of text followed by `=` or `-` lines, helping you understand the document’s conceptual divisions.
12
+
13
+ - **Paragraph Extraction**: Distinguish regular text (paragraphs) from structured elements like headers, lists, or code blocks, making it easy to isolate the body content.
14
+
15
+ - **Blockquote Identification**: Extract all blockquotes defined by lines starting with `>`.
16
+
17
+ - **Code Block Extraction**: Detect fenced code blocks delimited by triple backticks (```), optionally retrieve their language, and separate programming code from regular text.
18
+
19
+ - **List Recognition**: Identify both ordered and unordered lists, including task lists (`- [ ]`, `- [x]`), and understand their structure and hierarchy.
20
+
21
+ - **Tables (GFM)**: Detect GitHub-Flavored Markdown tables, parse their headers and rows, and separate structured tabular data for further analysis.
22
+
23
+ - **Links and Images**: Identify text links (`[text](url)`) and images (`![alt](url)`), as well as reference-style links. This is useful for link validation or content analysis.
24
+
25
+ - **Footnotes**: Extract and handle Markdown footnotes (`[^note1]`), providing a way to process reference notes in the document.
26
+
27
+ - **HTML Blocks and Inline HTML**: Handle HTML blocks (`<div>...</div>`) as a single element, and detect inline HTML elements (`<span style="...">... </span>`) as a unified component.
28
+
29
+ - **Front Matter**: If present, extract YAML front matter at the start of the file.
30
+
31
+ - **Counting Elements**: Count how many occurrences of a certain element type (e.g., how many headers, code blocks, etc.).
32
+
33
+ - **Textual Statistics**: Count the number of words and characters (excluding whitespace). Get a global summary (`analyse()`) of the document’s composition.
34
+
35
+ ## Installation
36
+
37
+ Install `mrkdwn_analysis` from PyPI:
38
+
39
+ ```bash
40
+ pip install markdown-analysis
41
+ ```
42
+
43
+ ## Usage
44
+
45
+ Using `mrkdwn_analysis` is straightforward. Import `MarkdownAnalyzer`, create an instance with your Markdown file path, and then call the various methods to extract the elements you need.
46
+
47
+ ```python
48
+ from mrkdwn_analysis import MarkdownAnalyzer
49
+
50
+ analyzer = MarkdownAnalyzer("path/to/document.md")
51
+
52
+ headers = analyzer.identify_headers()
53
+ paragraphs = analyzer.identify_paragraphs()
54
+ links = analyzer.identify_links()
55
+ ...
56
+ ```
57
+
58
+ ### Example
59
+
60
+ Consider `example.md`:
61
+
62
+ ```markdown
63
+ ---
64
+ title: "Python 3.11 Report"
65
+ author: "John Doe"
66
+ date: "2024-01-15"
67
+ ---
68
+
69
+ Python 3.11
70
+ ===========
71
+
72
+ A major **Python** release with significant improvements...
73
+
74
+ ### Performance Details
75
+
76
+ ```python
77
+ import math
78
+ print(math.factorial(10))
79
+ ```
80
+
81
+ > *Quote*: "Python 3.11 brings the speed we needed"
82
+
83
+ <div class="note">
84
+ <p>HTML block example</p>
85
+ </div>
86
+
87
+ This paragraph contains inline HTML: <span style="color:red;">Red text</span>.
88
+
89
+ - Unordered list:
90
+ - A basic point
91
+ - [ ] A task to do
92
+ - [x] A completed task
93
+
94
+ 1. Ordered list item 1
95
+ 2. Ordered list item 2
96
+ ```
97
+
98
+ After analysis:
99
+
100
+ ```python
101
+ analyzer = MarkdownAnalyzer("example.md")
102
+
103
+ print(analyzer.identify_headers())
104
+ # {"Header": [{"line": X, "level": 1, "text": "Python 3.11"}, {"line": Y, "level": 3, "text": "Performance Details"}]}
105
+
106
+ print(analyzer.identify_paragraphs())
107
+ # {"Paragraph": ["A major **Python** release ...", "This paragraph contains inline HTML: ..."]}
108
+
109
+ print(analyzer.identify_html_blocks())
110
+ # [{"line": Z, "content": "<div class=\"note\">\n <p>HTML block example</p>\n</div>"}]
111
+
112
+ print(analyzer.identify_html_inline())
113
+ # [{"line": W, "html": "<span style=\"color:red;\">Red text</span>"}]
114
+
115
+ print(analyzer.identify_lists())
116
+ # {
117
+ # "Ordered list": [["Ordered list item 1", "Ordered list item 2"]],
118
+ # "Unordered list": [["A basic point", "A task to do [Task]", "A completed task [Task done]"]]
119
+ # }
120
+
121
+ print(analyzer.identify_code_blocks())
122
+ # {"Code block": [{"start_line": X, "content": "import math\nprint(math.factorial(10))", "language": "python"}]}
123
+
124
+ print(analyzer.analyse())
125
+ # {
126
+ # 'headers': 2,
127
+ # 'paragraphs': 2,
128
+ # 'blockquotes': 1,
129
+ # 'code_blocks': 1,
130
+ # 'ordered_lists': 2,
131
+ # 'unordered_lists': 3,
132
+ # 'tables': 0,
133
+ # 'html_blocks': 1,
134
+ # 'html_inline_count': 1,
135
+ # 'words': 42,
136
+ # 'characters': 250
137
+ # }
138
+ ```
139
+
140
+ ### Key Methods
141
+
142
+ - `__init__(self, file_path)`: Load the Markdown file.
143
+ - `identify_headers()`: Returns all headers.
144
+ - `identify_sections()`: Returns setext sections.
145
+ - `identify_paragraphs()`: Returns paragraphs.
146
+ - `identify_blockquotes()`: Returns blockquotes.
147
+ - `identify_code_blocks()`: Returns code blocks with content and language.
148
+ - `identify_lists()`: Returns both ordered and unordered lists (including tasks).
149
+ - `identify_tables()`: Returns any GFM tables.
150
+ - `identify_links()`: Returns text and image links.
151
+ - `identify_footnotes()`: Returns footnotes used in the document.
152
+ - `identify_html_blocks()`: Returns HTML blocks as single tokens.
153
+ - `identify_html_inline()`: Returns inline HTML elements.
154
+ - `identify_todos()`: Returns task items.
155
+ - `count_elements(element_type)`: Counts occurrences of a specific element type.
156
+ - `count_words()`: Counts words in the entire document.
157
+ - `count_characters()`: Counts non-whitespace characters.
158
+ - `analyse()`: Provides a global summary (headers count, paragraphs count, etc.).
159
+
160
+ ### Checking and Validating Links
161
+
162
+ - `check_links()`: Validates text links to see if they are broken (e.g., non-200 status) and returns a list of broken links.
163
+
164
+ ### Global Analysis Example
165
+
166
+ ```python
167
+ analysis = analyzer.analyse()
168
+ print(analysis)
169
+ # {
170
+ # 'headers': X,
171
+ # 'paragraphs': Y,
172
+ # 'blockquotes': Z,
173
+ # 'code_blocks': A,
174
+ # 'ordered_lists': B,
175
+ # 'unordered_lists': C,
176
+ # 'tables': D,
177
+ # 'html_blocks': E,
178
+ # 'html_inline_count': F,
179
+ # 'words': G,
180
+ # 'characters': H
181
+ # }
182
+ ```
183
+
184
+ ## Contributing
185
+
186
+ Contributions are welcome! Feel free to open an issue or submit a pull request for bug reports, feature requests, or code improvements. Your input helps make `mrkdwn_analysis` more robust and versatile.
@@ -0,0 +1,204 @@
1
+ Metadata-Version: 2.1
2
+ Name: markdown-analysis
3
+ Version: 0.1.0
4
+ Summary: UNKNOWN
5
+ Home-page: https://github.com/yannbanas/mrkdwn_analysis
6
+ Author: yannbanas
7
+ Author-email: yannbanas@gmail.com
8
+ License: UNKNOWN
9
+ Platform: UNKNOWN
10
+ Classifier: Development Status :: 2 - Pre-Alpha
11
+ Classifier: Intended Audience :: Developers
12
+ Classifier: License :: OSI Approved :: MIT License
13
+ Classifier: Programming Language :: Python :: 3.11
14
+ Description-Content-Type: text/markdown
15
+ License-File: LICENSE
16
+
17
+ # mrkdwn_analysis
18
+
19
+ `mrkdwn_analysis` is a powerful Python library designed to analyze Markdown files. It provides extensive parsing capabilities to extract and categorize various elements within a Markdown document, including headers, sections, links, images, blockquotes, code blocks, lists, tables, tasks (todos), footnotes, and even embedded HTML. This makes it a versatile tool for data analysis, content generation, or building other tools that work with Markdown.
20
+
21
+ ## Features
22
+
23
+ - **File Loading**: Load any given Markdown file by providing its file path.
24
+
25
+ - **Header Detection**: Identify all headers (ATX `#` to `######`, and Setext `===` and `---`) in the document, giving you a quick overview of its structure.
26
+
27
+ - **Section Identification (Setext)**: Recognize sections defined by a block of text followed by `=` or `-` lines, helping you understand the document’s conceptual divisions.
28
+
29
+ - **Paragraph Extraction**: Distinguish regular text (paragraphs) from structured elements like headers, lists, or code blocks, making it easy to isolate the body content.
30
+
31
+ - **Blockquote Identification**: Extract all blockquotes defined by lines starting with `>`.
32
+
33
+ - **Code Block Extraction**: Detect fenced code blocks delimited by triple backticks (```), optionally retrieve their language, and separate programming code from regular text.
34
+
35
+ - **List Recognition**: Identify both ordered and unordered lists, including task lists (`- [ ]`, `- [x]`), and understand their structure and hierarchy.
36
+
37
+ - **Tables (GFM)**: Detect GitHub-Flavored Markdown tables, parse their headers and rows, and separate structured tabular data for further analysis.
38
+
39
+ - **Links and Images**: Identify text links (`[text](url)`) and images (`![alt](url)`), as well as reference-style links. This is useful for link validation or content analysis.
40
+
41
+ - **Footnotes**: Extract and handle Markdown footnotes (`[^note1]`), providing a way to process reference notes in the document.
42
+
43
+ - **HTML Blocks and Inline HTML**: Handle HTML blocks (`<div>...</div>`) as a single element, and detect inline HTML elements (`<span style="...">... </span>`) as a unified component.
44
+
45
+ - **Front Matter**: If present, extract YAML front matter at the start of the file.
46
+
47
+ - **Counting Elements**: Count how many occurrences of a certain element type (e.g., how many headers, code blocks, etc.).
48
+
49
+ - **Textual Statistics**: Count the number of words and characters (excluding whitespace). Get a global summary (`analyse()`) of the document’s composition.
50
+
51
+ ## Installation
52
+
53
+ Install `mrkdwn_analysis` from PyPI:
54
+
55
+ ```bash
56
+ pip install markdown-analysis
57
+ ```
58
+
59
+ ## Usage
60
+
61
+ Using `mrkdwn_analysis` is straightforward. Import `MarkdownAnalyzer`, create an instance with your Markdown file path, and then call the various methods to extract the elements you need.
62
+
63
+ ```python
64
+ from mrkdwn_analysis import MarkdownAnalyzer
65
+
66
+ analyzer = MarkdownAnalyzer("path/to/document.md")
67
+
68
+ headers = analyzer.identify_headers()
69
+ paragraphs = analyzer.identify_paragraphs()
70
+ links = analyzer.identify_links()
71
+ ...
72
+ ```
73
+
74
+ ### Example
75
+
76
+ Consider `example.md`:
77
+
78
+ ```markdown
79
+ ---
80
+ title: "Python 3.11 Report"
81
+ author: "John Doe"
82
+ date: "2024-01-15"
83
+ ---
84
+
85
+ Python 3.11
86
+ ===========
87
+
88
+ A major **Python** release with significant improvements...
89
+
90
+ ### Performance Details
91
+
92
+ ```python
93
+ import math
94
+ print(math.factorial(10))
95
+ ```
96
+
97
+ > *Quote*: "Python 3.11 brings the speed we needed"
98
+
99
+ <div class="note">
100
+ <p>HTML block example</p>
101
+ </div>
102
+
103
+ This paragraph contains inline HTML: <span style="color:red;">Red text</span>.
104
+
105
+ - Unordered list:
106
+ - A basic point
107
+ - [ ] A task to do
108
+ - [x] A completed task
109
+
110
+ 1. Ordered list item 1
111
+ 2. Ordered list item 2
112
+ ```
113
+
114
+ After analysis:
115
+
116
+ ```python
117
+ analyzer = MarkdownAnalyzer("example.md")
118
+
119
+ print(analyzer.identify_headers())
120
+ # {"Header": [{"line": X, "level": 1, "text": "Python 3.11"}, {"line": Y, "level": 3, "text": "Performance Details"}]}
121
+
122
+ print(analyzer.identify_paragraphs())
123
+ # {"Paragraph": ["A major **Python** release ...", "This paragraph contains inline HTML: ..."]}
124
+
125
+ print(analyzer.identify_html_blocks())
126
+ # [{"line": Z, "content": "<div class=\"note\">\n <p>HTML block example</p>\n</div>"}]
127
+
128
+ print(analyzer.identify_html_inline())
129
+ # [{"line": W, "html": "<span style=\"color:red;\">Red text</span>"}]
130
+
131
+ print(analyzer.identify_lists())
132
+ # {
133
+ # "Ordered list": [["Ordered list item 1", "Ordered list item 2"]],
134
+ # "Unordered list": [["A basic point", "A task to do [Task]", "A completed task [Task done]"]]
135
+ # }
136
+
137
+ print(analyzer.identify_code_blocks())
138
+ # {"Code block": [{"start_line": X, "content": "import math\nprint(math.factorial(10))", "language": "python"}]}
139
+
140
+ print(analyzer.analyse())
141
+ # {
142
+ # 'headers': 2,
143
+ # 'paragraphs': 2,
144
+ # 'blockquotes': 1,
145
+ # 'code_blocks': 1,
146
+ # 'ordered_lists': 2,
147
+ # 'unordered_lists': 3,
148
+ # 'tables': 0,
149
+ # 'html_blocks': 1,
150
+ # 'html_inline_count': 1,
151
+ # 'words': 42,
152
+ # 'characters': 250
153
+ # }
154
+ ```
155
+
156
+ ### Key Methods
157
+
158
+ - `__init__(self, file_path)`: Load the Markdown file.
159
+ - `identify_headers()`: Returns all headers.
160
+ - `identify_sections()`: Returns setext sections.
161
+ - `identify_paragraphs()`: Returns paragraphs.
162
+ - `identify_blockquotes()`: Returns blockquotes.
163
+ - `identify_code_blocks()`: Returns code blocks with content and language.
164
+ - `identify_lists()`: Returns both ordered and unordered lists (including tasks).
165
+ - `identify_tables()`: Returns any GFM tables.
166
+ - `identify_links()`: Returns text and image links.
167
+ - `identify_footnotes()`: Returns footnotes used in the document.
168
+ - `identify_html_blocks()`: Returns HTML blocks as single tokens.
169
+ - `identify_html_inline()`: Returns inline HTML elements.
170
+ - `identify_todos()`: Returns task items.
171
+ - `count_elements(element_type)`: Counts occurrences of a specific element type.
172
+ - `count_words()`: Counts words in the entire document.
173
+ - `count_characters()`: Counts non-whitespace characters.
174
+ - `analyse()`: Provides a global summary (headers count, paragraphs count, etc.).
175
+
176
+ ### Checking and Validating Links
177
+
178
+ - `check_links()`: Validates text links to see if they are broken (e.g., non-200 status) and returns a list of broken links.
179
+
180
+ ### Global Analysis Example
181
+
182
+ ```python
183
+ analysis = analyzer.analyse()
184
+ print(analysis)
185
+ # {
186
+ # 'headers': X,
187
+ # 'paragraphs': Y,
188
+ # 'blockquotes': Z,
189
+ # 'code_blocks': A,
190
+ # 'ordered_lists': B,
191
+ # 'unordered_lists': C,
192
+ # 'tables': D,
193
+ # 'html_blocks': E,
194
+ # 'html_inline_count': F,
195
+ # 'words': G,
196
+ # 'characters': H
197
+ # }
198
+ ```
199
+
200
+ ## Contributing
201
+
202
+ Contributions are welcome! Feel free to open an issue or submit a pull request for bug reports, feature requests, or code improvements. Your input helps make `mrkdwn_analysis` more robust and versatile.
203
+
204
+