markdown-analysis 0.0.4__tar.gz → 0.0.5__tar.gz

@@ -0,0 +1,137 @@
+ Metadata-Version: 2.1
+ Name: markdown_analysis
+ Version: 0.0.5
+ Summary: UNKNOWN
+ Home-page: https://github.com/yannbanas/mrkdwn_analysis
+ Author: yannbanas
+ Author-email: yannbanas@gmail.com
+ License: UNKNOWN
+ Description: # mrkdwn_analysis
+
+ `mrkdwn_analysis` is a Python library designed to analyze Markdown files. With its powerful parsing capabilities, it can extract and categorize various elements within a Markdown document, including headers, sections, links, images, blockquotes, code blocks, and lists. This makes it a valuable tool for anyone looking to parse Markdown content for data analysis, content generation, or for building other tools that utilize Markdown.
+
+ ## Features
+
+ - File Loading: The MarkdownAnalyzer can load any given Markdown file provided through the file path.
+
+ - Header Identification: The tool can extract all headers from the markdown file, ranging from H1 to H6 tags. This allows users to have a quick overview of the document's structure.
+
+ - Section Identification: The analyzer can recognize different sections of the document. It defines a section as a block of text followed by a line composed solely of = or - characters.
+
+ - Paragraph Identification: The tool can distinguish between regular text and other elements such as lists, headers, etc., thereby identifying all the paragraphs present in the document.
+
+ - Blockquote Identification: The analyzer can identify and extract all blockquotes in the markdown file.
+
+ - Code Block Identification: The tool can extract all code blocks defined in the document, allowing you to separate the programming code from the regular text easily.
+
+ - List Identification: The analyzer can identify both ordered and unordered lists in the markdown file, providing information about the hierarchical structure of the points.
+
+ - Table Identification: The tool can identify and extract tables from the markdown file, enabling users to separate and analyze tabular data quickly.
+
+ - Link Identification and Validation: The analyzer can identify all links present in the markdown file, categorizing them into text and image links. Moreover, it can also verify if these links are valid or broken.
+
+ - Todo Identification: The tool is capable of recognizing and extracting todos (tasks or action items) present in the document.
+
+ - Element Counting: The analyzer can count the total number of a specific element type in the file. This can help in quantifying the extent of different elements in the document.
+
+ - Word Counting: The tool can count the total number of words in the file, providing an estimate of the document's length.
+
+ - Character Counting: The analyzer can count the total number of characters (excluding spaces) in the file, giving a detailed measure of the document's size.
+
+ ## Installation
+ You can install `mrkdwn_analysis` from PyPI:
+
+ ```bash
+ pip install mrkdwn_analysis
+ ```
+
+ We hope `mrkdwn_analysis` helps you with all your Markdown analyzing needs!
+
+ ## Usage
+ Using `mrkdwn_analysis` is simple. Just import the `MarkdownAnalyzer` class, create an instance with your Markdown file, and you're good to go!
+
+ ```python
+ from mrkdwn_analysis import MarkdownAnalyzer
+
+ analyzer = MarkdownAnalyzer("path/to/your/markdown.md")
+
+ headers = analyzer.identify_headers()
+ sections = analyzer.identify_sections()
+ ...
+ ```
+
+ ### Class MarkdownAnalyzer
+
+ The `MarkdownAnalyzer` class is designed to analyze Markdown files. It has the ability to extract and categorize various elements of a Markdown document.
+
+ ### `__init__(self, file_path)`
+
+ The constructor of the class. It opens the specified Markdown file and stores its content line by line.
+
+ - `file_path`: the path of the Markdown file to analyze.
+
+ ### `identify_headers(self)`
+
+ Analyzes the file and identifies all headers (from h1 to h6). Headers are returned as a dictionary where the key is "Header" and the value is a list of all headers found.
+
+ ### `identify_sections(self)`
+
+ Analyzes the file and identifies all sections. Sections are defined as a block of text followed by a line composed solely of `=` or `-` characters. Sections are returned as a dictionary where the key is "Section" and the value is a list of all sections found.
+
+ ### `identify_paragraphs(self)`
+
+ Analyzes the file and identifies all paragraphs. Paragraphs are defined as a block of text that is not a header, list, blockquote, etc. Paragraphs are returned as a dictionary where the key is "Paragraph" and the value is a list of all paragraphs found.
+
+ ### `identify_blockquotes(self)`
+
+ Analyzes the file and identifies all blockquotes. Blockquotes are defined by a line starting with the `>` character. Blockquotes are returned as a dictionary where the key is "Blockquote" and the value is a list of all blockquotes found.
+
+ ### `identify_code_blocks(self)`
+
+ Analyzes the file and identifies all code blocks. Code blocks are defined by a block of text surrounded by lines containing only the "```" text. Code blocks are returned as a dictionary where the key is "Code block" and the value is a list of all code blocks found.
+
+ ### `identify_ordered_lists(self)`
+
+ Analyzes the file and identifies all ordered lists. Ordered lists are defined by lines starting with a number followed by a dot. Ordered lists are returned as a dictionary where the key is "Ordered list" and the value is a list of all ordered lists found.
+
+ ### `identify_unordered_lists(self)`
+
+ Analyzes the file and identifies all unordered lists. Unordered lists are defined by lines starting with a `-`, `*`, or `+`. Unordered lists are returned as a dictionary where the key is "Unordered list" and the value is a list of all unordered lists found.
+
+ ### `identify_tables(self)`
+
+ Analyzes the file and identifies all tables. Tables are defined by lines containing `|` to delimit cells and are separated by lines containing `-` to define the borders. Tables are returned as a dictionary where the key is "Table" and the value is a list of all tables found.
+
+ ### `identify_links(self)`
+
+ Analyzes the file and identifies all links. Links are defined by the format `[text](url)`. Links are returned as a dictionary where the keys are "Text link" and "Image link" and the values are lists of all links found.
+
+ ### `check_links(self)`
+
+ Checks all links identified by `identify_links` to see if they are broken (return a 404 error). Broken links are returned as a list, each item being a dictionary containing the line number, link text, and URL.
+
+ ### `identify_todos(self)`
+
+ Analyzes the file and identifies all todos. Todos are defined by lines starting with `- [ ] `. Todos are returned as a list, each item being a dictionary containing the line number and todo text.
+
+ ### `count_elements(self, element_type)`
+
+ Counts the total number of a specific element type in the file. The `element_type` should match the name of one of the identification methods (for example, "headers" for `identify_headers`). Returns the total number of elements of this type.
+
+ ### `count_words(self)`
+
+ Counts the total number of words in the file. Returns the word count.
+
+ ### `count_characters(self)`
+
+ Counts the total number of characters (excluding spaces) in the file. Returns the character count.
+
+ ## Contributions
+ Contributions are always welcome! If you have a feature request, bug report, or just want to improve the code, feel free to create a pull request or open an issue.
+
+ Platform: UNKNOWN
+ Classifier: Development Status :: 2 - Pre-Alpha
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3.11
+ Description-Content-Type: text/markdown
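The README text in this hunk describes `identify_todos` as matching lines that start with `- [ ] ` and returning one dictionary per match with the line number and the todo text. As a rough illustration of that description only, here is a minimal sketch. The function name `find_todos`, the dictionary keys `"line"` and `"text"`, and the 1-based line numbering are all assumptions; none of them is taken from the package source.

```python
import re

# Assumed pattern: an unchecked task item at the very start of a line.
TODO_LINE = re.compile(r'^- \[ \] (.*)')

def find_todos(lines):
    """Hypothetical re-implementation; keys and numbering are assumptions."""
    todos = []
    for number, line in enumerate(lines, start=1):
        match = TODO_LINE.match(line)
        if match:
            # group(1) is the text after the checkbox, without the newline.
            todos.append({"line": number, "text": match.group(1).strip()})
    return todos

sample = ["# Plan\n", "- [ ] write tests\n", "- [x] draft README\n", "- [ ] publish\n"]
todos = find_todos(sample)
print(todos)
```

In this sketch the checked item (`- [x]`) is skipped, which matches the README wording that only `- [ ] ` lines count as todos.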
@@ -3,8 +3,8 @@ import requests
  from collections import defaultdict, Counter

  class MarkdownAnalyzer:
-     def __init__(self, file_path):
-         with open(file_path, 'r') as file:
+     def __init__(self, file_path, encoding='utf-8'):
+         with open(file_path, 'r', encoding=encoding) as file:
              self.lines = file.readlines()

      def identify_headers(self):
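The hunk above adds an `encoding` parameter that defaults to `'utf-8'`. Without an explicit encoding, Python's `open()` falls back to a platform-dependent default, so a UTF-8 Markdown file with non-ASCII characters may fail to decode or decode incorrectly on some systems. The sketch below reproduces only the two changed lines in a standalone function. `read_lines` and the temporary sample file are illustrative assumptions, not part of the package.

```python
import os
import tempfile

def read_lines(file_path, encoding='utf-8'):
    """Hypothetical stand-in for the constructor body shown in the hunk."""
    with open(file_path, 'r', encoding=encoding) as file:
        return file.readlines()

# Write a small Markdown file containing non-ASCII text, encoded as UTF-8.
with tempfile.NamedTemporaryFile('w', suffix='.md', encoding='utf-8',
                                 delete=False) as tmp:
    tmp.write("# Résumé\n\nCafé ☕\n")
    path = tmp.name

try:
    # The default encoding argument matches how the file was written.
    lines = read_lines(path)
finally:
    os.remove(path)

print(lines)
```

Callers with files in another encoding would pass it explicitly, for example `read_lines(path, encoding='latin-1')`.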
@@ -162,19 +162,10 @@ class MarkdownAnalyzer:

      def identify_tables(self):
          result = defaultdict(list)
-         table_pattern = re.compile(r'^ {0,3}\|(?P<table_head>.+)\|[ \t]*\n' +
-                                    r' {0,3}\|(?P<table_align> *[-:]+[-| :]*)\|[ \t]*\n' +
-                                    r'(?P<table_body>(?: {0,3}\|.*\|[ \t]*(?:\n|$))*)\n*')
-         nptable_pattern = re.compile(r'^ {0,3}(?P<nptable_head>\S.*\|.*)\n' +
-                                      r' {0,3}(?P<nptable_align>[-:]+ *\|[-| :]*)\n' +
-                                      r'(?P<nptable_body>(?:.*\|.*(?:\n|$))*)\n*')
-
-         text = "".join(self.lines)
-         matches_table = re.findall(table_pattern, text)
-         matches_nptable = re.findall(nptable_pattern, text)
-         for match in matches_table + matches_nptable:
-             result["Table"].append(match)
-
+         table_pattern = re.compile(r'^\|.*\|$', re.MULTILINE)
+         table_rows = table_pattern.findall("".join(self.lines))
+         for table_row in table_rows:
+             result["Table"].append(table_row.strip().split("|"))
          return dict(result)

      def identify_links(self):
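To see what the simplified pattern on the 0.0.5 side of this hunk returns, the following sketch applies the same regular expression and the same `strip().split("|")` call to a small sample string. Only the pattern and the split are copied from the diff. The helper name `rows_matching_new_pattern` and the sample table are illustrative assumptions.

```python
import re

# Pattern copied from the added lines of the hunk above.
TABLE_ROW = re.compile(r'^\|.*\|$', re.MULTILINE)

def rows_matching_new_pattern(text):
    """Hypothetical helper: mimic the new loop body on a plain string."""
    return [row.strip().split("|") for row in TABLE_ROW.findall(text)]

sample = "| a | b |\n|---|---|\n| 1 | 2 |\nnot a table row\n"
rows = rows_matching_new_pattern(sample)
for row in rows:
    print(row)
```

Under these assumptions every line that starts and ends with `|` becomes its own entry, including the `|---|---|` delimiter row, and each split list carries an empty string at both ends because of the outer pipes. Rows with leading indentation, trailing whitespace, or no outer pipes are not matched by this pattern, whereas the removed 0.0.4 patterns allowed for those forms.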
@@ -1,4 +1,4 @@
- [egg_info]
- tag_build =
- tag_date = 0
-
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
@@ -6,7 +6,7 @@ with open("README.md", "r", encoding="utf-8") as fh:

  setup(
      name='markdown_analysis',
-     version='0.0.4',
+     version='0.0.5',
      long_description=long_description,
      long_description_content_type="text/markdown",
      author='yannbanas',
@@ -1,140 +0,0 @@
- Metadata-Version: 2.1
- Name: markdown_analysis
- Version: 0.0.4
- Summary: UNKNOWN
- Home-page: https://github.com/yannbanas/mrkdwn_analysis
- Author: yannbanas
- Author-email: yannbanas@gmail.com
- License: UNKNOWN
- Platform: UNKNOWN
- Classifier: Development Status :: 2 - Pre-Alpha
- Classifier: Intended Audience :: Developers
- Classifier: License :: OSI Approved :: MIT License
- Classifier: Programming Language :: Python :: 3.11
- Description-Content-Type: text/markdown
- License-File: LICENSE
-
- # mrkdwn_analysis
-
- `mrkdwn_analysis` is a Python library designed to analyze Markdown files. With its powerful parsing capabilities, it can extract and categorize various elements within a Markdown document, including headers, sections, links, images, blockquotes, code blocks, and lists. This makes it a valuable tool for anyone looking to parse Markdown content for data analysis, content generation, or for building other tools that utilize Markdown.
-
- ## Features
-
- - File Loading: The MarkdownAnalyzer can load any given Markdown file provided through the file path.
-
- - Header Identification: The tool can extract all headers from the markdown file, ranging from H1 to H6 tags. This allows users to have a quick overview of the document's structure.
-
- - Section Identification: The analyzer can recognize different sections of the document. It defines a section as a block of text followed by a line composed solely of = or - characters.
-
- - Paragraph Identification: The tool can distinguish between regular text and other elements such as lists, headers, etc., thereby identifying all the paragraphs present in the document.
-
- - Blockquote Identification: The analyzer can identify and extract all blockquotes in the markdown file.
-
- - Code Block Identification: The tool can extract all code blocks defined in the document, allowing you to separate the programming code from the regular text easily.
-
- - List Identification: The analyzer can identify both ordered and unordered lists in the markdown file, providing information about the hierarchical structure of the points.
-
- - Table Identification: The tool can identify and extract tables from the markdown file, enabling users to separate and analyze tabular data quickly.
-
- - Link Identification and Validation: The analyzer can identify all links present in the markdown file, categorizing them into text and image links. Moreover, it can also verify if these links are valid or broken.
-
- - Todo Identification: The tool is capable of recognizing and extracting todos (tasks or action items) present in the document.
-
- - Element Counting: The analyzer can count the total number of a specific element type in the file. This can help in quantifying the extent of different elements in the document.
-
- - Word Counting: The tool can count the total number of words in the file, providing an estimate of the document's length.
-
- - Character Counting: The analyzer can count the total number of characters (excluding spaces) in the file, giving a detailed measure of the document's size.
-
- ## Installation
- You can install `mrkdwn_analysis` from PyPI:
-
- ```bash
- pip install mrkdwn_analysis
- ```
-
- We hope `mrkdwn_analysis` helps you with all your Markdown analyzing needs!
-
- ## Usage
- Using `mrkdwn_analysis` is simple. Just import the `MarkdownAnalyzer` class, create an instance with your Markdown file, and you're good to go!
-
- ```python
- from mrkdwn_analysis import MarkdownAnalyzer
-
- analyzer = MarkdownAnalyzer("path/to/your/markdown.md")
-
- headers = analyzer.identify_headers()
- sections = analyzer.identify_sections()
- ...
- ```
-
- ### Class MarkdownAnalyzer
-
- The `MarkdownAnalyzer` class is designed to analyze Markdown files. It has the ability to extract and categorize various elements of a Markdown document.
-
- ### `__init__(self, file_path)`
-
- The constructor of the class. It opens the specified Markdown file and stores its content line by line.
-
- - `file_path`: the path of the Markdown file to analyze.
-
- ### `identify_headers(self)`
-
- Analyzes the file and identifies all headers (from h1 to h6). Headers are returned as a dictionary where the key is "Header" and the value is a list of all headers found.
-
- ### `identify_sections(self)`
-
- Analyzes the file and identifies all sections. Sections are defined as a block of text followed by a line composed solely of `=` or `-` characters. Sections are returned as a dictionary where the key is "Section" and the value is a list of all sections found.
-
- ### `identify_paragraphs(self)`
-
- Analyzes the file and identifies all paragraphs. Paragraphs are defined as a block of text that is not a header, list, blockquote, etc. Paragraphs are returned as a dictionary where the key is "Paragraph" and the value is a list of all paragraphs found.
-
- ### `identify_blockquotes(self)`
-
- Analyzes the file and identifies all blockquotes. Blockquotes are defined by a line starting with the `>` character. Blockquotes are returned as a dictionary where the key is "Blockquote" and the value is a list of all blockquotes found.
-
- ### `identify_code_blocks(self)`
-
- Analyzes the file and identifies all code blocks. Code blocks are defined by a block of text surrounded by lines containing only the "```" text. Code blocks are returned as a dictionary where the key is "Code block" and the value is a list of all code blocks found.
-
- ### `identify_ordered_lists(self)`
-
- Analyzes the file and identifies all ordered lists. Ordered lists are defined by lines starting with a number followed by a dot. Ordered lists are returned as a dictionary where the key is "Ordered list" and the value is a list of all ordered lists found.
-
- ### `identify_unordered_lists(self)`
-
- Analyzes the file and identifies all unordered lists. Unordered lists are defined by lines starting with a `-`, `*`, or `+`. Unordered lists are returned as a dictionary where the key is "Unordered list" and the value is a list of all unordered lists found.
-
- ### `identify_tables(self)`
-
- Analyzes the file and identifies all tables. Tables are defined by lines containing `|` to delimit cells and are separated by lines containing `-` to define the borders. Tables are returned as a dictionary where the key is "Table" and the value is a list of all tables found.
-
- ### `identify_links(self)`
-
- Analyzes the file and identifies all links. Links are defined by the format `[text](url)`. Links are returned as a dictionary where the keys are "Text link" and "Image link" and the values are lists of all links found.
-
- ### `check_links(self)`
-
- Checks all links identified by `identify_links` to see if they are broken (return a 404 error). Broken links are returned as a list, each item being a dictionary containing the line number, link text, and URL.
-
- ### `identify_todos(self)`
-
- Analyzes the file and identifies all todos. Todos are defined by lines starting with `- [ ] `. Todos are returned as a list, each item being a dictionary containing the line number and todo text.
-
- ### `count_elements(self, element_type)`
-
- Counts the total number of a specific element type in the file. The `element_type` should match the name of one of the identification methods (for example, "headers" for `identify_headers`). Returns the total number of elements of this type.
-
- ### `count_words(self)`
-
- Counts the total number of words in the file. Returns the word count.
-
- ### `count_characters(self)`
-
- Counts the total number of characters (excluding spaces) in the file. Returns the character count.
-
- ## Contributions
- Contributions are always welcome! If you have a feature request, bug report, or just want to improve the code, feel free to create a pull request or open an issue.
-
-