markdown-analysis 0.0.5__tar.gz → 0.1.1__tar.gz

@@ -0,0 +1,204 @@
1
+ Metadata-Version: 2.1
2
+ Name: markdown_analysis
3
+ Version: 0.1.1
4
+ Summary: UNKNOWN
5
+ Home-page: https://github.com/yannbanas/mrkdwn_analysis
6
+ Author: yannbanas
7
+ Author-email: yannbanas@gmail.com
8
+ License: UNKNOWN
9
+ Platform: UNKNOWN
10
+ Classifier: Development Status :: 2 - Pre-Alpha
11
+ Classifier: Intended Audience :: Developers
12
+ Classifier: License :: OSI Approved :: MIT License
13
+ Classifier: Programming Language :: Python :: 3.11
14
+ Description-Content-Type: text/markdown
15
+ License-File: LICENSE
16
+
17
+ # mrkdwn_analysis
18
+
19
+ `mrkdwn_analysis` is a powerful Python library designed to analyze Markdown files. It provides extensive parsing capabilities to extract and categorize various elements within a Markdown document, including headers, sections, links, images, blockquotes, code blocks, lists, tables, tasks (todos), footnotes, and even embedded HTML. This makes it a versatile tool for data analysis, content generation, or building other tools that work with Markdown.
20
+
21
+ ## Features
22
+
23
+ - **File Loading**: Load any given Markdown file by providing its file path.
24
+
25
+ - **Header Detection**: Identify all headers (ATX `#` to `######`, and Setext `===` and `---`) in the document, giving you a quick overview of its structure.
26
+
27
+ - **Section Identification (Setext)**: Recognize sections defined by a block of text followed by `=` or `-` lines, helping you understand the document’s conceptual divisions.
28
+
29
+ - **Paragraph Extraction**: Distinguish regular text (paragraphs) from structured elements like headers, lists, or code blocks, making it easy to isolate the body content.
30
+
31
+ - **Blockquote Identification**: Extract all blockquotes defined by lines starting with `>`.
32
+
33
+ - **Code Block Extraction**: Detect fenced code blocks delimited by triple backticks (```), optionally retrieve their language, and separate programming code from regular text.
34
+
35
+ - **List Recognition**: Identify both ordered and unordered lists, including task lists (`- [ ]`, `- [x]`), and understand their structure and hierarchy.
36
+
37
+ - **Tables (GFM)**: Detect GitHub-Flavored Markdown tables, parse their headers and rows, and separate structured tabular data for further analysis.
38
+
39
+ - **Links and Images**: Identify text links (`[text](url)`) and images (`![alt](url)`), as well as reference-style links. This is useful for link validation or content analysis.
40
+
41
+ - **Footnotes**: Extract and handle Markdown footnotes (`[^note1]`), providing a way to process reference notes in the document.
42
+
43
+ - **HTML Blocks and Inline HTML**: Handle HTML blocks (`<div>...</div>`) as a single element, and detect inline HTML elements (`<span style="...">... </span>`) as a unified component.
44
+
45
+ - **Front Matter**: If present, extract YAML front matter at the start of the file.
46
+
47
+ - **Counting Elements**: Count how many occurrences of a certain element type (e.g., how many headers, code blocks, etc.).
48
+
49
+ - **Textual Statistics**: Count the number of words and characters (excluding whitespace). Get a global summary (`analyse()`) of the document’s composition.
50
+
51
+ ## Installation
52
+
53
+ Install `mrkdwn_analysis` from PyPI:
54
+
55
+ ```bash
56
+ pip install markdown-analysis
57
+ ```
58
+
59
+ ## Usage
60
+
61
+ Using `mrkdwn_analysis` is straightforward. Import `MarkdownAnalyzer`, create an instance with your Markdown file path, and then call the various methods to extract the elements you need.
62
+
63
+ ```python
64
+ from mrkdwn_analysis import MarkdownAnalyzer
65
+
66
+ analyzer = MarkdownAnalyzer("path/to/document.md")
67
+
68
+ headers = analyzer.identify_headers()
69
+ paragraphs = analyzer.identify_paragraphs()
70
+ links = analyzer.identify_links()
71
+ ...
72
+ ```
73
+
74
+ ### Example
75
+
76
+ Consider `example.md`:
77
+
78
+ ```markdown
79
+ ---
80
+ title: "Python 3.11 Report"
81
+ author: "John Doe"
82
+ date: "2024-01-15"
83
+ ---
84
+
85
+ Python 3.11
86
+ ===========
87
+
88
+ A major **Python** release with significant improvements...
89
+
90
+ ### Performance Details
91
+
92
+ ```python
93
+ import math
94
+ print(math.factorial(10))
95
+ ```
96
+
97
+ > *Quote*: "Python 3.11 brings the speed we needed"
98
+
99
+ <div class="note">
100
+ <p>HTML block example</p>
101
+ </div>
102
+
103
+ This paragraph contains inline HTML: <span style="color:red;">Red text</span>.
104
+
105
+ - Unordered list:
106
+ - A basic point
107
+ - [ ] A task to do
108
+ - [x] A completed task
109
+
110
+ 1. Ordered list item 1
111
+ 2. Ordered list item 2
112
+ ```
113
+
114
+ After analysis:
115
+
116
+ ```python
117
+ analyzer = MarkdownAnalyzer("example.md")
118
+
119
+ print(analyzer.identify_headers())
120
+ # {"Header": [{"line": X, "level": 1, "text": "Python 3.11"}, {"line": Y, "level": 3, "text": "Performance Details"}]}
121
+
122
+ print(analyzer.identify_paragraphs())
123
+ # {"Paragraph": ["A major **Python** release ...", "This paragraph contains inline HTML: ..."]}
124
+
125
+ print(analyzer.identify_html_blocks())
126
+ # [{"line": Z, "content": "<div class=\"note\">\n <p>HTML block example</p>\n</div>"}]
127
+
128
+ print(analyzer.identify_html_inline())
129
+ # [{"line": W, "html": "<span style=\"color:red;\">Red text</span>"}]
130
+
131
+ print(analyzer.identify_lists())
132
+ # {
133
+ # "Ordered list": [["Ordered list item 1", "Ordered list item 2"]],
134
+ # "Unordered list": [["A basic point", "A task to do [Task]", "A completed task [Task done]"]]
135
+ # }
136
+
137
+ print(analyzer.identify_code_blocks())
138
+ # {"Code block": [{"start_line": X, "content": "import math\nprint(math.factorial(10))", "language": "python"}]}
139
+
140
+ print(analyzer.analyse())
141
+ # {
142
+ # 'headers': 2,
143
+ # 'paragraphs': 2,
144
+ # 'blockquotes': 1,
145
+ # 'code_blocks': 1,
146
+ # 'ordered_lists': 2,
147
+ # 'unordered_lists': 3,
148
+ # 'tables': 0,
149
+ # 'html_blocks': 1,
150
+ # 'html_inline_count': 1,
151
+ # 'words': 42,
152
+ # 'characters': 250
153
+ # }
154
+ ```
155
+
156
+ ### Key Methods
157
+
158
+ - `__init__(self, file_path)`: Load the Markdown file.
159
+ - `identify_headers()`: Returns all headers.
160
+ - `identify_sections()`: Returns setext sections.
161
+ - `identify_paragraphs()`: Returns paragraphs.
162
+ - `identify_blockquotes()`: Returns blockquotes.
163
+ - `identify_code_blocks()`: Returns code blocks with content and language.
164
+ - `identify_lists()`: Returns both ordered and unordered lists (including tasks).
165
+ - `identify_tables()`: Returns any GFM tables.
166
+ - `identify_links()`: Returns text and image links.
167
+ - `identify_footnotes()`: Returns footnotes used in the document.
168
+ - `identify_html_blocks()`: Returns HTML blocks as single tokens.
169
+ - `identify_html_inline()`: Returns inline HTML elements.
170
+ - `identify_todos()`: Returns task items.
171
+ - `count_elements(element_type)`: Counts occurrences of a specific element type.
172
+ - `count_words()`: Counts words in the entire document.
173
+ - `count_characters()`: Counts non-whitespace characters.
174
+ - `analyse()`: Provides a global summary (headers count, paragraphs count, etc.).
175
+
176
+ ### Checking and Validating Links
177
+
178
+ - `check_links()`: Validates text links to see if they are broken (e.g., non-200 status) and returns a list of broken links.
179
+
180
+ ### Global Analysis Example
181
+
182
+ ```python
183
+ analysis = analyzer.analyse()
184
+ print(analysis)
185
+ # {
186
+ # 'headers': X,
187
+ # 'paragraphs': Y,
188
+ # 'blockquotes': Z,
189
+ # 'code_blocks': A,
190
+ # 'ordered_lists': B,
191
+ # 'unordered_lists': C,
192
+ # 'tables': D,
193
+ # 'html_blocks': E,
194
+ # 'html_inline_count': F,
195
+ # 'words': G,
196
+ # 'characters': H
197
+ # }
198
+ ```
199
+
200
+ ## Contributing
201
+
202
+ Contributions are welcome! Feel free to open an issue or submit a pull request for bug reports, feature requests, or code improvements. Your input helps make `mrkdwn_analysis` more robust and versatile.
203
+
204
+
@@ -0,0 +1,186 @@
1
+ # mrkdwn_analysis
2
+
3
+ `mrkdwn_analysis` is a powerful Python library designed to analyze Markdown files. It provides extensive parsing capabilities to extract and categorize various elements within a Markdown document, including headers, sections, links, images, blockquotes, code blocks, lists, tables, tasks (todos), footnotes, and even embedded HTML. This makes it a versatile tool for data analysis, content generation, or building other tools that work with Markdown.
4
+
5
+ ## Features
6
+
7
+ - **File Loading**: Load any given Markdown file by providing its file path.
8
+
9
+ - **Header Detection**: Identify all headers (ATX `#` to `######`, and Setext `===` and `---`) in the document, giving you a quick overview of its structure.
10
+
11
+ - **Section Identification (Setext)**: Recognize sections defined by a block of text followed by `=` or `-` lines, helping you understand the document’s conceptual divisions.
12
+
13
+ - **Paragraph Extraction**: Distinguish regular text (paragraphs) from structured elements like headers, lists, or code blocks, making it easy to isolate the body content.
14
+
15
+ - **Blockquote Identification**: Extract all blockquotes defined by lines starting with `>`.
16
+
17
+ - **Code Block Extraction**: Detect fenced code blocks delimited by triple backticks (```), optionally retrieve their language, and separate programming code from regular text.
18
+
19
+ - **List Recognition**: Identify both ordered and unordered lists, including task lists (`- [ ]`, `- [x]`), and understand their structure and hierarchy.
20
+
21
+ - **Tables (GFM)**: Detect GitHub-Flavored Markdown tables, parse their headers and rows, and separate structured tabular data for further analysis.
22
+
23
+ - **Links and Images**: Identify text links (`[text](url)`) and images (`![alt](url)`), as well as reference-style links. This is useful for link validation or content analysis.
24
+
25
+ - **Footnotes**: Extract and handle Markdown footnotes (`[^note1]`), providing a way to process reference notes in the document.
26
+
27
+ - **HTML Blocks and Inline HTML**: Handle HTML blocks (`<div>...</div>`) as a single element, and detect inline HTML elements (`<span style="...">... </span>`) as a unified component.
28
+
29
+ - **Front Matter**: If present, extract YAML front matter at the start of the file.
30
+
31
+ - **Counting Elements**: Count how many occurrences of a certain element type (e.g., how many headers, code blocks, etc.).
32
+
33
+ - **Textual Statistics**: Count the number of words and characters (excluding whitespace). Get a global summary (`analyse()`) of the document’s composition.
34
+
35
+ ## Installation
36
+
37
+ Install `mrkdwn_analysis` from PyPI:
38
+
39
+ ```bash
40
+ pip install markdown-analysis
41
+ ```
42
+
43
+ ## Usage
44
+
45
+ Using `mrkdwn_analysis` is straightforward. Import `MarkdownAnalyzer`, create an instance with your Markdown file path, and then call the various methods to extract the elements you need.
46
+
47
+ ```python
48
+ from mrkdwn_analysis import MarkdownAnalyzer
49
+
50
+ analyzer = MarkdownAnalyzer("path/to/document.md")
51
+
52
+ headers = analyzer.identify_headers()
53
+ paragraphs = analyzer.identify_paragraphs()
54
+ links = analyzer.identify_links()
55
+ ...
56
+ ```
57
+
58
+ ### Example
59
+
60
+ Consider `example.md`:
61
+
62
+ ```markdown
63
+ ---
64
+ title: "Python 3.11 Report"
65
+ author: "John Doe"
66
+ date: "2024-01-15"
67
+ ---
68
+
69
+ Python 3.11
70
+ ===========
71
+
72
+ A major **Python** release with significant improvements...
73
+
74
+ ### Performance Details
75
+
76
+ ```python
77
+ import math
78
+ print(math.factorial(10))
79
+ ```
80
+
81
+ > *Quote*: "Python 3.11 brings the speed we needed"
82
+
83
+ <div class="note">
84
+ <p>HTML block example</p>
85
+ </div>
86
+
87
+ This paragraph contains inline HTML: <span style="color:red;">Red text</span>.
88
+
89
+ - Unordered list:
90
+ - A basic point
91
+ - [ ] A task to do
92
+ - [x] A completed task
93
+
94
+ 1. Ordered list item 1
95
+ 2. Ordered list item 2
96
+ ```
97
+
98
+ After analysis:
99
+
100
+ ```python
101
+ analyzer = MarkdownAnalyzer("example.md")
102
+
103
+ print(analyzer.identify_headers())
104
+ # {"Header": [{"line": X, "level": 1, "text": "Python 3.11"}, {"line": Y, "level": 3, "text": "Performance Details"}]}
105
+
106
+ print(analyzer.identify_paragraphs())
107
+ # {"Paragraph": ["A major **Python** release ...", "This paragraph contains inline HTML: ..."]}
108
+
109
+ print(analyzer.identify_html_blocks())
110
+ # [{"line": Z, "content": "<div class=\"note\">\n <p>HTML block example</p>\n</div>"}]
111
+
112
+ print(analyzer.identify_html_inline())
113
+ # [{"line": W, "html": "<span style=\"color:red;\">Red text</span>"}]
114
+
115
+ print(analyzer.identify_lists())
116
+ # {
117
+ # "Ordered list": [["Ordered list item 1", "Ordered list item 2"]],
118
+ # "Unordered list": [["A basic point", "A task to do [Task]", "A completed task [Task done]"]]
119
+ # }
120
+
121
+ print(analyzer.identify_code_blocks())
122
+ # {"Code block": [{"start_line": X, "content": "import math\nprint(math.factorial(10))", "language": "python"}]}
123
+
124
+ print(analyzer.analyse())
125
+ # {
126
+ # 'headers': 2,
127
+ # 'paragraphs': 2,
128
+ # 'blockquotes': 1,
129
+ # 'code_blocks': 1,
130
+ # 'ordered_lists': 2,
131
+ # 'unordered_lists': 3,
132
+ # 'tables': 0,
133
+ # 'html_blocks': 1,
134
+ # 'html_inline_count': 1,
135
+ # 'words': 42,
136
+ # 'characters': 250
137
+ # }
138
+ ```
139
+
140
+ ### Key Methods
141
+
142
+ - `__init__(self, file_path)`: Load the Markdown file.
143
+ - `identify_headers()`: Returns all headers.
144
+ - `identify_sections()`: Returns setext sections.
145
+ - `identify_paragraphs()`: Returns paragraphs.
146
+ - `identify_blockquotes()`: Returns blockquotes.
147
+ - `identify_code_blocks()`: Returns code blocks with content and language.
148
+ - `identify_lists()`: Returns both ordered and unordered lists (including tasks).
149
+ - `identify_tables()`: Returns any GFM tables.
150
+ - `identify_links()`: Returns text and image links.
151
+ - `identify_footnotes()`: Returns footnotes used in the document.
152
+ - `identify_html_blocks()`: Returns HTML blocks as single tokens.
153
+ - `identify_html_inline()`: Returns inline HTML elements.
154
+ - `identify_todos()`: Returns task items.
155
+ - `count_elements(element_type)`: Counts occurrences of a specific element type.
156
+ - `count_words()`: Counts words in the entire document.
157
+ - `count_characters()`: Counts non-whitespace characters.
158
+ - `analyse()`: Provides a global summary (headers count, paragraphs count, etc.).
159
+
160
+ ### Checking and Validating Links
161
+
162
+ - `check_links()`: Validates text links to see if they are broken (e.g., non-200 status) and returns a list of broken links.
163
+
164
+ ### Global Analysis Example
165
+
166
+ ```python
167
+ analysis = analyzer.analyse()
168
+ print(analysis)
169
+ # {
170
+ # 'headers': X,
171
+ # 'paragraphs': Y,
172
+ # 'blockquotes': Z,
173
+ # 'code_blocks': A,
174
+ # 'ordered_lists': B,
175
+ # 'unordered_lists': C,
176
+ # 'tables': D,
177
+ # 'html_blocks': E,
178
+ # 'html_inline_count': F,
179
+ # 'words': G,
180
+ # 'characters': H
181
+ # }
182
+ ```
183
+
184
+ ## Contributing
185
+
186
+ Contributions are welcome! Feel free to open an issue or submit a pull request for bug reports, feature requests, or code improvements. Your input helps make `mrkdwn_analysis` more robust and versatile.
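
Note on the README hunk above: the header-detection feature it describes (ATX `#` to `######`, plus Setext underlines of `=` or `-`) can be illustrated with a small standalone sketch. This is not the library's implementation. The function name `find_headers`, the two regular expressions, and the front-matter and code-fence skipping are all assumptions made for illustration; only the output shape (`line`, `level`, `text`) follows the README's example.

```python
import re

# Hypothetical patterns for illustration; not taken from mrkdwn_analysis.
ATX = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")
SETEXT_UNDERLINE = re.compile(r"^(=+|-+)\s*$")


def find_headers(text):
    """Return ATX and Setext headers as dicts with line, level, and text."""
    lines = text.splitlines()

    # Skip YAML front matter so its closing '---' is not read as an underline.
    start = 0
    if lines and lines[0].strip() == "---":
        for j in range(1, len(lines)):
            if lines[j].strip() == "---":
                start = j + 1
                break

    headers = []
    in_fence = False
    for i in range(start, len(lines)):
        line = lines[i]
        # Ignore everything inside fenced code blocks.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        m = ATX.match(line)
        if m:
            headers.append({"line": i + 1, "level": len(m.group(1)), "text": m.group(2)})
            continue
        # Setext: a non-blank text line followed by a run of '=' or '-'.
        if i + 1 < len(lines) and line.strip() and not SETEXT_UNDERLINE.match(line):
            u = SETEXT_UNDERLINE.match(lines[i + 1])
            if u:
                level = 1 if u.group(1)[0] == "=" else 2
                headers.append({"line": i + 1, "level": level, "text": line.strip()})
    return headers


sample = "\n".join([
    "---",
    'title: "Python 3.11 Report"',
    "---",
    "",
    "Python 3.11",
    "===========",
    "",
    "A major **Python** release...",
    "",
    "### Performance Details",
    "",
    "```python",
    "# not a header",
    "```",
])

headers = find_headers(sample)
print(headers)
```

The sketch treats any text line directly above a run of `-` as a level-2 header, so a paragraph followed by a thematic break with no blank line between them would be misread. A full parser needs more context than this.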
@@ -0,0 +1,204 @@
1
+ Metadata-Version: 2.1
2
+ Name: markdown-analysis
3
+ Version: 0.1.1
4
+ Summary: UNKNOWN
5
+ Home-page: https://github.com/yannbanas/mrkdwn_analysis
6
+ Author: yannbanas
7
+ Author-email: yannbanas@gmail.com
8
+ License: UNKNOWN
9
+ Platform: UNKNOWN
10
+ Classifier: Development Status :: 2 - Pre-Alpha
11
+ Classifier: Intended Audience :: Developers
12
+ Classifier: License :: OSI Approved :: MIT License
13
+ Classifier: Programming Language :: Python :: 3.11
14
+ Description-Content-Type: text/markdown
15
+ License-File: LICENSE
16
+
17
+ # mrkdwn_analysis
18
+
19
+ `mrkdwn_analysis` is a powerful Python library designed to analyze Markdown files. It provides extensive parsing capabilities to extract and categorize various elements within a Markdown document, including headers, sections, links, images, blockquotes, code blocks, lists, tables, tasks (todos), footnotes, and even embedded HTML. This makes it a versatile tool for data analysis, content generation, or building other tools that work with Markdown.
20
+
21
+ ## Features
22
+
23
+ - **File Loading**: Load any given Markdown file by providing its file path.
24
+
25
+ - **Header Detection**: Identify all headers (ATX `#` to `######`, and Setext `===` and `---`) in the document, giving you a quick overview of its structure.
26
+
27
+ - **Section Identification (Setext)**: Recognize sections defined by a block of text followed by `=` or `-` lines, helping you understand the document’s conceptual divisions.
28
+
29
+ - **Paragraph Extraction**: Distinguish regular text (paragraphs) from structured elements like headers, lists, or code blocks, making it easy to isolate the body content.
30
+
31
+ - **Blockquote Identification**: Extract all blockquotes defined by lines starting with `>`.
32
+
33
+ - **Code Block Extraction**: Detect fenced code blocks delimited by triple backticks (```), optionally retrieve their language, and separate programming code from regular text.
34
+
35
+ - **List Recognition**: Identify both ordered and unordered lists, including task lists (`- [ ]`, `- [x]`), and understand their structure and hierarchy.
36
+
37
+ - **Tables (GFM)**: Detect GitHub-Flavored Markdown tables, parse their headers and rows, and separate structured tabular data for further analysis.
38
+
39
+ - **Links and Images**: Identify text links (`[text](url)`) and images (`![alt](url)`), as well as reference-style links. This is useful for link validation or content analysis.
40
+
41
+ - **Footnotes**: Extract and handle Markdown footnotes (`[^note1]`), providing a way to process reference notes in the document.
42
+
43
+ - **HTML Blocks and Inline HTML**: Handle HTML blocks (`<div>...</div>`) as a single element, and detect inline HTML elements (`<span style="...">... </span>`) as a unified component.
44
+
45
+ - **Front Matter**: If present, extract YAML front matter at the start of the file.
46
+
47
+ - **Counting Elements**: Count how many occurrences of a certain element type (e.g., how many headers, code blocks, etc.).
48
+
49
+ - **Textual Statistics**: Count the number of words and characters (excluding whitespace). Get a global summary (`analyse()`) of the document’s composition.
50
+
51
+ ## Installation
52
+
53
+ Install `mrkdwn_analysis` from PyPI:
54
+
55
+ ```bash
56
+ pip install markdown-analysis
57
+ ```
58
+
59
+ ## Usage
60
+
61
+ Using `mrkdwn_analysis` is straightforward. Import `MarkdownAnalyzer`, create an instance with your Markdown file path, and then call the various methods to extract the elements you need.
62
+
63
+ ```python
64
+ from mrkdwn_analysis import MarkdownAnalyzer
65
+
66
+ analyzer = MarkdownAnalyzer("path/to/document.md")
67
+
68
+ headers = analyzer.identify_headers()
69
+ paragraphs = analyzer.identify_paragraphs()
70
+ links = analyzer.identify_links()
71
+ ...
72
+ ```
73
+
74
+ ### Example
75
+
76
+ Consider `example.md`:
77
+
78
+ ```markdown
79
+ ---
80
+ title: "Python 3.11 Report"
81
+ author: "John Doe"
82
+ date: "2024-01-15"
83
+ ---
84
+
85
+ Python 3.11
86
+ ===========
87
+
88
+ A major **Python** release with significant improvements...
89
+
90
+ ### Performance Details
91
+
92
+ ```python
93
+ import math
94
+ print(math.factorial(10))
95
+ ```
96
+
97
+ > *Quote*: "Python 3.11 brings the speed we needed"
98
+
99
+ <div class="note">
100
+ <p>HTML block example</p>
101
+ </div>
102
+
103
+ This paragraph contains inline HTML: <span style="color:red;">Red text</span>.
104
+
105
+ - Unordered list:
106
+ - A basic point
107
+ - [ ] A task to do
108
+ - [x] A completed task
109
+
110
+ 1. Ordered list item 1
111
+ 2. Ordered list item 2
112
+ ```
113
+
114
+ After analysis:
115
+
116
+ ```python
117
+ analyzer = MarkdownAnalyzer("example.md")
118
+
119
+ print(analyzer.identify_headers())
120
+ # {"Header": [{"line": X, "level": 1, "text": "Python 3.11"}, {"line": Y, "level": 3, "text": "Performance Details"}]}
121
+
122
+ print(analyzer.identify_paragraphs())
123
+ # {"Paragraph": ["A major **Python** release ...", "This paragraph contains inline HTML: ..."]}
124
+
125
+ print(analyzer.identify_html_blocks())
126
+ # [{"line": Z, "content": "<div class=\"note\">\n <p>HTML block example</p>\n</div>"}]
127
+
128
+ print(analyzer.identify_html_inline())
129
+ # [{"line": W, "html": "<span style=\"color:red;\">Red text</span>"}]
130
+
131
+ print(analyzer.identify_lists())
132
+ # {
133
+ # "Ordered list": [["Ordered list item 1", "Ordered list item 2"]],
134
+ # "Unordered list": [["A basic point", "A task to do [Task]", "A completed task [Task done]"]]
135
+ # }
136
+
137
+ print(analyzer.identify_code_blocks())
138
+ # {"Code block": [{"start_line": X, "content": "import math\nprint(math.factorial(10))", "language": "python"}]}
139
+
140
+ print(analyzer.analyse())
141
+ # {
142
+ # 'headers': 2,
143
+ # 'paragraphs': 2,
144
+ # 'blockquotes': 1,
145
+ # 'code_blocks': 1,
146
+ # 'ordered_lists': 2,
147
+ # 'unordered_lists': 3,
148
+ # 'tables': 0,
149
+ # 'html_blocks': 1,
150
+ # 'html_inline_count': 1,
151
+ # 'words': 42,
152
+ # 'characters': 250
153
+ # }
154
+ ```
155
+
156
+ ### Key Methods
157
+
158
+ - `__init__(self, file_path)`: Load the Markdown file.
159
+ - `identify_headers()`: Returns all headers.
160
+ - `identify_sections()`: Returns setext sections.
161
+ - `identify_paragraphs()`: Returns paragraphs.
162
+ - `identify_blockquotes()`: Returns blockquotes.
163
+ - `identify_code_blocks()`: Returns code blocks with content and language.
164
+ - `identify_lists()`: Returns both ordered and unordered lists (including tasks).
165
+ - `identify_tables()`: Returns any GFM tables.
166
+ - `identify_links()`: Returns text and image links.
167
+ - `identify_footnotes()`: Returns footnotes used in the document.
168
+ - `identify_html_blocks()`: Returns HTML blocks as single tokens.
169
+ - `identify_html_inline()`: Returns inline HTML elements.
170
+ - `identify_todos()`: Returns task items.
171
+ - `count_elements(element_type)`: Counts occurrences of a specific element type.
172
+ - `count_words()`: Counts words in the entire document.
173
+ - `count_characters()`: Counts non-whitespace characters.
174
+ - `analyse()`: Provides a global summary (headers count, paragraphs count, etc.).
175
+
176
+ ### Checking and Validating Links
177
+
178
+ - `check_links()`: Validates text links to see if they are broken (e.g., non-200 status) and returns a list of broken links.
179
+
180
+ ### Global Analysis Example
181
+
182
+ ```python
183
+ analysis = analyzer.analyse()
184
+ print(analysis)
185
+ # {
186
+ # 'headers': X,
187
+ # 'paragraphs': Y,
188
+ # 'blockquotes': Z,
189
+ # 'code_blocks': A,
190
+ # 'ordered_lists': B,
191
+ # 'unordered_lists': C,
192
+ # 'tables': D,
193
+ # 'html_blocks': E,
194
+ # 'html_inline_count': F,
195
+ # 'words': G,
196
+ # 'characters': H
197
+ # }
198
+ ```
199
+
200
+ ## Contributing
201
+
202
+ Contributions are welcome! Feel free to open an issue or submit a pull request for bug reports, feature requests, or code improvements. Your input helps make `mrkdwn_analysis` more robust and versatile.
203
+
204
+
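
Note on the textual statistics described in the README: `count_words()` is documented as counting words in the entire document and `count_characters()` as counting non-whitespace characters. The sketch below shows one plausible reading of those two definitions on a plain string. The function bodies are assumptions for illustration, and the library may treat Markdown markup differently before counting.

```python
def count_words(text):
    """Count whitespace-separated tokens (assumed definition of a word)."""
    return len(text.split())


def count_characters(text):
    """Count every character that is not whitespace."""
    return sum(1 for ch in text if not ch.isspace())


sample = "Hello  world\nfoo"
print(count_words(sample), count_characters(sample))
```

Under this reading, Markdown syntax characters such as `#` or `*` count toward the character total, which may not match the figures the package reports.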