mmmd 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 8368bf270f4bdba9a1510c3c1f3106a5a56a43bdaf63673219bd7a082de436c6
4
+ data.tar.gz: f7a9e9a9457be72c8ffae3edfb531d4193be174df5db1714fd215809a4c41825
5
+ SHA512:
6
+ metadata.gz: 53fea17654973076e1c9fab3256d8034577c04ea684ee64aaa620bf71bd228c98bed317f3c2b7028298a0eae7ccc6cef4587b3b95b83bc52396327f366d01544
7
+ data.tar.gz: 469aba569708c8e2b88efa34bddf594ac6cd71fb389365e69ff3ef50fc09d7611eaf96c91bbe182ef52ba1c36505bf5df5a51ea733d0231defcaa4b72bb3ffca
data/README.md ADDED
@@ -0,0 +1,3 @@
1
+ # rubymark
2
+
3
+ Modular, compliant Markdown parser in Ruby
data/architecture.md ADDED
@@ -0,0 +1,278 @@
1
+ Architecture of madness
2
+ =======================
3
+
4
+ Prelude
5
+ -------
6
+
7
+ It needs to be stressed that making the parser modular while keeping it
8
+ relatively simple was a laborious undertaking. There has not been a standard
9
+ more hostile towards the people who dare attempt to implement it than
10
+ CommonMark. It should also be noted that, despite it being titled a
11
+ "Standard" in this document, it is less widely adopted than the GitHub
12
+ Flavored Markdown syntax. GitHub Flavored Markdown, however, is merely
13
+ a subset of this parser's model, albeit requiring a few extensions.
14
+
15
+ Current state (as of March 02, 2025)
16
+ ------------------------------------
17
+
18
+ This parser processes text in what can be boiled down to three phases.
19
+
20
+ - Block/Line phase
21
+ - Overlay phase
22
+ - Inline phase
23
+
24
+ It should be noted that all phases have their own related parser
25
+ classes, and a shared behaviour system, where each parser takes control
26
+ at some point, and may win ambiguous cases by having higher priority
27
+ (see the `#define_child` and `#define_overlay` methods for the priority parameter).
28
+
29
+ ### Block/Line phase ###
30
+
31
+ The first phase breaks down blocks, line by line, into block structures.
32
+ Some blocks (preferably inherited from the Block class) can contain other blocks
32
+ (e.g. QuoteBlock, ULBlock, OLBlock). Other blocks (known as leaf blocks)
34
+ may not contain anything else (except inline content, more on that later).
35
+
36
+ Blocks are designed to be parsed independently. This means that it *should*
37
+ be possible to tear out any standard block and make it not get parsed.
38
+ This, however, isn't thoroughly tested for.
39
+
40
+ Blocks as proper, real classes have a certain lifecycle to follow when
41
+ being constructed:
42
+
43
+ 1. Open condition
44
+ - A block needs to find its first marker on the current line to open
45
+ (see `#begin?` method)
46
+ - Once it's open, it's immediately initialized and fed the line it just
47
+ read (but now as an object, not as a class) (see `#consume` method)
48
+ 2. Marker/Line consumption
49
+ - While it should be kept open, the block parser instance will
50
+ keep reading input through the `#consume` method, returning a pair
51
+ of modified line (after consuming its tokens from it) and
52
+ a boolean value indicating permission of lazy continuation
53
+ (if it's a block like a QuoteBlock or ULBlock that can be lazily
54
+ overflowed).
55
+ Every line the parser needs to record needs to be pushed
56
+ through the `#push` method.
57
+ 3. Closure
58
+ - If the current line no longer belongs to the current block
59
+ (if the block should have been closed on the previous line),
60
+ it simply needs to `return` a pair of `nil`, and a boolean value for
61
+ permission of lazy continuation
62
+ - If a block should be closed on the current line, it should capture it,
63
+ keep track of the "closed" state, then `return` `nil` on the next call
64
+ of `#consume`
65
+ - Once a block is closed, it:
66
+ 1. Receives its content from the parser
67
+ 2. Parser receives the "close" method call
68
+ 3. (optional) Parser may have a callable method `#applyprops`. If
69
+ it exists, it gets called with the current constructed block.
70
+ 4. (optional) All overlays assigned to this block's class are
71
+ processed on the contents of this block (more on that in
72
+ Overlay phase)
73
+ 5. (optional) Parser may return a different class, which
74
+ the current block should be cast into (Overlays may change
75
+ the class as well)
76
+ 6. (optional) If a block can respond to `#parse_inner` method, it
77
+ will get called, allowing the block to parse its own contents.
78
+ - After this point, the block is no longer touched until the document
79
+ fully gets processed.
80
+ 4. Inline processing
81
+ - (Applies only to Paragraph and any child of LeafBlock)
82
+ When the document gets fully processed, the contents of the current
83
+ block are taken, assigned to an InlineRoot instance, and then parsed
84
+ in Inline mode
85
+ 5. Completion
86
+ - The resulting document is then returned.
87
+
88
+ While there is a lot of functionality available in designing blocks, it is
89
+ not necessary for the simplest of the block kinds available. The simplest
90
+ example of a block parser is likely the ThematicBreakParser class, which
91
+ implements the only 2 methods needed for a block parser to function.
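
The lifecycle above can be illustrated with a minimal sketch. It assumes the two required methods are a class-level `begin?` and an instance-level `consume`, as the lifecycle list suggests. The class name `SketchThematicBreakParser`, the driver `sketch_events` and the event symbols are hypothetical and are not part of the gem.

```ruby
# Hypothetical stand-in for a block parser; not the gem's actual class.
class SketchThematicBreakParser
  # Open condition: three or more identical markers (-, _ or *) on one line.
  def self.begin?(line)
    line.match?(/\A {0,3}([-_*])(?:[ \t]*\1){2,}[ \t]*\z/)
  end

  def initialize
    @closed = false
  end

  # Returns [remaining_line_or_nil, lazy_continuation_allowed].
  # The block captures its single line, then reports nil on the next call.
  def consume(_line, lazy: false, parent: nil)
    return [nil, false] if @closed || lazy

    @closed = true
    ["", false]
  end
end

# Hypothetical driver: opens the block, feeds it lines, closes it on nil.
def sketch_events(lines)
  parser = nil
  events = []
  lines.each do |raw|
    line = raw.chomp
    if parser
      rest, = parser.consume(line)
      next unless rest.nil?

      events << :close
      parser = nil
    end
    if SketchThematicBreakParser.begin?(line)
      parser = SketchThematicBreakParser.new
      parser.consume(line)
      events << :open
    else
      events << :text
    end
  end
  events << :close if parser
  events
end
```

In this sketch, closure is detected one line late, which mirrors the "capture the line, then return `nil` on the next call" rule described in step 3.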
92
+
93
+ While parsing text, a block may use additional info:
94
+
95
+ - In consume method: `lazy` hasharg, if the current line is being processed
96
+ in lazy continuation mode (likely only ever matters for Paragraph); and
97
+ `parent` - the parent block containing this block.
98
+
99
+ Block interpretations are tried in decreasing order of their priority
100
+ value, as applied using the `#define_child` method.
101
+
102
+ For blocks to be properly indexed, they need to be a valid child or
103
+ a valid descendant (meaning reachable through child chain) of the
104
+ Document class.
105
+
106
+ ### Overlay phase ###
107
+
108
+ The overlay phase doesn't start at a specific point in time. Rather,
109
+ it happens for every block individually, when that block
110
+ closes.
111
+
112
+ Overlay mechanism can be applied to any DOMObject type, so long as its
113
+ close method is called at some point (this may not be of interest to
114
+ people that do not implement custom syntax, as it generally translates
115
+ to "only block level elements get their overlays processed")
116
+
117
+ Overlay mechanism provides the ability to perform some action on the block
118
+ right after it gets closed and right before it gets interpreted by the
119
+ inline phase. Overlays may do the following:
120
+
121
+ - Change the block's class
122
+ (by returning a class from the `#process` method)
123
+ - Change the block's content (by directly editing it)
124
+ - Change the block's properties (by modifying its `properties` hash)
125
+
126
+ Overlay interpretations are tried in decreasing order of their priority
127
+ value, as defined using the `#define_overlay` method.
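
As a rough illustration of the three overlay capabilities listed above, here is a sketch under stated assumptions: an overlay exposes a class-level `process` method that may return a class, and overlays are applied in decreasing order of priority value. The module `SketchOverlay`, its classes and the `close` helper are all hypothetical and only mimic the described behaviour.

```ruby
# Hypothetical stand-ins; none of these names come from the gem itself.
module SketchOverlay
  class Paragraph
    attr_accessor :content, :properties

    def initialize(content, properties = {})
      @content = content
      @properties = properties
    end
  end

  class Heading < Paragraph; end

  # An overlay inspects a closed block and may edit it or request a new class.
  class HeadingOverlay
    def self.process(block)
      return nil unless block.content.start_with?("# ")

      block.content = block.content.delete_prefix("# ") # change the content
      block.properties[:level] = 1                       # change the properties
      Heading                                            # request a class change
    end
  end

  # Applies [overlay, priority] pairs in decreasing order of priority value.
  def self.close(block, overlays)
    overlays.sort_by { |_, priority| -priority }.each do |overlay, _|
      new_class = overlay.process(block)
      block = new_class.new(block.content, block.properties) if new_class
    end
    block
  end
end

block = SketchOverlay.close(
  SketchOverlay::Paragraph.new("# Title"),
  [[SketchOverlay::HeadingOverlay, 10]]
)
```

Here the "cast" is modelled by constructing a new object of the returned class; how the real parser performs the class change is not stated in this document.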
128
+
129
+ ### Inline phase ###
130
+
131
+ Once all blocks have been processed, and all overlays have been applied
132
+ to their respective block types, the hook in the Document class's
133
+ `#parser` method executes the inline parsing phase of all leaf blocks
134
+ (descendants of the `Leaf` class) and paragraphs.
135
+
136
+ The outer class encompassing all inline children of a block is
137
+ `InlineRoot`. As such, if an inline element is to ever appear within the
138
+ text, it needs to be reachable as a child or a descendant of InlineRoot.
139
+
140
+ Inline parsing works in three parts:
141
+
142
+ - First, the contents are tokenized (every parser marks its own tokens)
143
+ - Second, the forward walk procedure is called
144
+ - Third, the reverse walk procedure is called
145
+
146
+ This process is repeated for every group of parsers with equal priority.
147
+ Within a single step, only parsers of equal priority run together.
148
+ The process then moves on to the next step, with parsers of the next
149
+ higher priority value. Counter-intuitive as it is, this means that
150
+ it moves on to parsers of _lower_ priority.
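
The step ordering described above can be sketched as follows. The `SketchParser` struct, the `priority_steps` helper, the parser names and the priority values are all made up for illustration; only the ordering rule (equal values share a step, smaller values run first) is taken from the text.

```ruby
# Hypothetical parser record: a name plus its numeric priority value.
SketchParser = Struct.new(:name, :priority)

# Groups parsers into steps. Parsers with equal priority values share a step,
# and steps are visited in increasing order of priority value.
def priority_steps(parsers)
  parsers.group_by(&:priority)
         .sort_by { |priority, _| priority }
         .map { |_, group| group.map(&:name) }
end

steps = priority_steps([
  SketchParser.new("emphasis", 1),
  SketchParser.new("code", 0),
  SketchParser.new("strong", 1),
  SketchParser.new("link", 2)
])
```

With these made-up values, the code parser runs alone in the first step, the two emphasis-like parsers share the second step, and the link parser runs last.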
151
+
152
+ At the very end of the process, the remaining strings are concatenated
153
+ within the mixed array of inlines and strings, and turned into Text
154
+ nodes, after which the contents of the array are appended as children to
155
+ the root node.
156
+
157
+ This process is recursively applied to all elements which may have child
158
+ elements. This is ensured when an inline parser calls the "build"
159
+ utility method.
160
+
161
+ The inline parser is a class that implements static methods `tokenize`
162
+ and either `forward_walk` or `reverse_walk`. Both may be implemented at
163
+ the same time, but this isn't advisable.
164
+
165
+ The tokenization process is characterized by calling every parser in the
166
+ current group with every string in tokens array using the `tokenize`
167
+ method. It is expected that the parser breaks the string down into an
168
+ array of other strings and tokens. A token is an array where the first
169
+ element is the literal text representation of the token, the second
170
+ value is the class of the parser, and the _last_ value (_not third_) is
171
+ the `:close` or `:open` symbol (though functionally it may hold any
172
+ symbol value). Any additional information the parser may need in later
173
+ stages may be stored between the last element and the second element.
174
+
175
+ Example:
176
+
177
+ Input:
178
+
179
+ "_this _is a string of_ tokens_"
180
+
181
+ Output:
182
+
183
+ [["_", ::PointBlank::Parsing::EmphInline, :open],
184
+ "this ",
185
+ ["_", ::PointBlank::Parsing::EmphInline, :open],
186
+ "is a string of",
187
+ ["_", ::PointBlank::Parsing::EmphInline, :close],
188
+ " tokens",
189
+ ["_", ::PointBlank::Parsing::EmphInline, :close]]
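
A minimal sketch of a `tokenize` method that reproduces the example above. It assumes a deliberately simplified flanking rule (an underscore opens when preceded by whitespace and followed by non-whitespace, and closes in the mirrored case), which is much weaker than the real CommonMark delimiter rules. The module `SketchTokenize` and the `classify` helper are hypothetical.

```ruby
# Hypothetical parser class; the flanking rule is a simplification.
module SketchTokenize
  class EmphInline
    # Splits a string into plain strings and [text, parser_class, kind] tokens.
    def self.tokenize(string)
      result = []
      buffer = +""
      string.each_char.with_index do |char, index|
        kind = char == "_" ? classify(string, index) : nil
        if kind
          result << buffer unless buffer.empty?
          buffer = +""
          result << ["_", self, kind]
        else
          buffer << char
        end
      end
      result << buffer unless buffer.empty?
      result
    end

    # Decides whether the underscore at `index` opens, closes, or is plain text.
    def self.classify(string, index)
      before = index.zero? ? " " : string[index - 1]
      after = string[index + 1] || " "
      return :open if before.match?(/\s/) && !after.match?(/\s/)
      return :close if !before.match?(/\s/) && after.match?(/\s/)

      nil
    end
  end
end

tokens = SketchTokenize::EmphInline.tokenize("_this _is a string of_ tokens_")
```

Underscores that are neither clearly opening nor clearly closing (for example inside a word) are left in the surrounding string in this sketch.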
190
+
191
+ The forward walk is characterized by calling parsers which implement the
192
+ `#forward_walk` method. When the main class encounters an opening token
193
+ in `forward_walk`, it will call the `#forward_walk` method of the class
194
+ that represents this token. It is expected that the parser class will
195
+ then attempt to build the first available occurrence of the inline
196
+ element it represents, after which it will return the array of all
197
+ tokens and strings that it was passed where the first element will be
198
+ the newly constructed inline element. If it is unable to close the
199
+ block, it should simply return the original contents, unmodified.
200
+
201
+ Example:
202
+
203
+ Original text:
204
+
205
+ this is outside the inline `this is inside the inline` and this
206
+ is right after the inline `and this is the next inline`
207
+
208
+ Input:
209
+
210
+ [["`", ::PointBlank::Parsing::CodeInline, :open],
211
+ "this is inside the inline",
212
+ ["`", ::PointBlank::Parsing::CodeInline, :close],
213
+ " and this is right after the inline ",
214
+ ["`", ::PointBlank::Parsing::CodeInline, :open],
215
+ "and this is the next inline",
216
+ ["`", ::PointBlank::Parsing::CodeInline, :close]]
217
+
218
+ Output:
219
+
220
+ [<::PointBlank::DOM::InlineCode
221
+ @content = "this is inside the inline">,
222
+ " and this is right after the inline ",
223
+ ["`", ::PointBlank::Parsing::CodeInline, :open],
224
+ "and this is the next inline",
225
+ ["`", ::PointBlank::Parsing::CodeInline, :close]]
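
A forward walk along these lines might look like the sketch below. It assumes the method receives an array whose first element is the opening token and returns either a new array headed by the built element or the original array untouched. `SketchForward`, its `InlineCode` struct and the way the content is joined are hypothetical; the real parser would go through its own `build` utility.

```ruby
# Hypothetical stand-ins for the parser class and the DOM element.
module SketchForward
  InlineCode = Struct.new(:content)

  class CodeInline
    # parts[0] is expected to be this parser's opening token.
    def self.forward_walk(parts)
      close_at = (1...parts.length).find do |index|
        part = parts[index]
        part.is_a?(Array) && part[1] == self && part.last == :close
      end
      return parts unless close_at # unable to close: return input unmodified

      # Flatten anything between the delimiters back into literal text.
      inner = parts[1...close_at].map do |part|
        part.is_a?(Array) ? part.first : part.to_s
      end
      [InlineCode.new(inner.join)] + parts[(close_at + 1)..]
    end
  end
end
```

Only the first available occurrence is built, so later tokens of the same parser stay in the returned array for a subsequent call.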
226
+
227
+ The reverse walk is characterized by calling parsers which implement the
228
+ `#reverse_walk` method when the main class encounters a closing token
229
+ for this class (the one that contains the `:close` symbol in the last
230
+ position of the token information array). After that the main class will
231
+ call the parser's `#reverse_walk` method with the current list of
232
+ tokens, inlines and strings. It is expected that the parser will then
233
+ collect all the blocks, strings and inlines that fit within the block
234
+ closed by the last element in the list, and once it encounters the
235
+ appropriate opening token for the closing token in the last position of
236
+ the array, it will then replace the elements fitting within that inline
237
+ with a class containing all the collected elements. If it is unable to
238
+ find a matching opening token for the closing token in the last
239
+ position, it should simply return the original contents, unmodified.
240
+
241
+ Example:
242
+
243
+ Original text:
244
+
245
+ blah blah something something lots of text before the emphasis
246
+ _this is emphasized `and this is an inline` but it's still
247
+ emphasized_
248
+
249
+
250
+ Input:
251
+
252
+ ["blah blah something something lots of text before the emphasis",
253
+ ["_", ::PointBlank::Parsing::EmphInline, :open],
254
+ "this is emphasized",
255
+ <::PointBlank::DOM::InlineCode,
256
+ @content = "and this is an inline">,
257
+ " but it's still emphasized",
258
+ ["_", ::PointBlank::Parsing::EmphInline, :close]]
259
+
260
+ Output:
261
+
262
+ ["blah blah something something lots of text before the emphasis",
263
+ <::PointBlank::DOM::InlineEmphasis,
264
+ children = [...,
265
+ <::PointBlank::DOM::InlineCode ...>
266
+ ...]>]
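
The reverse walk can be sketched in the same spirit. This assumes the method receives an array whose last element is the closing token, scans backwards for the nearest matching opening token, and wraps everything in between. `SketchReverse` and its `InlineEmphasis` struct are hypothetical stand-ins.

```ruby
# Hypothetical stand-ins for the parser class and the DOM element.
module SketchReverse
  InlineEmphasis = Struct.new(:children)

  class EmphInline
    # parts[-1] is expected to be this parser's closing token.
    def self.reverse_walk(parts)
      open_at = (parts.length - 2).downto(0).find do |index|
        part = parts[index]
        part.is_a?(Array) && part[1] == self && part.last == :open
      end
      return parts unless open_at # no matching opener: return input unmodified

      # Everything between the opener and the closer becomes the children.
      children = parts[(open_at + 1)...-1]
      parts[0...open_at] + [InlineEmphasis.new(children)]
    end
  end
end
```

Already built inline objects between the two tokens are kept as children without inspection, which matches the example where an inline code element ends up inside the emphasis.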
267
+
268
+ Both `#forward_walk` and `#reverse_walk` are not restricted to making
269
+ just the changes discussed above, and can arbitrarily modify the token
270
+ arrays. That, however, should be done with great care, so as to not
271
+ accidentally break compatibility with other parsers.
272
+
273
+ To ensure that the collected tokens in the `#reverse_walk` and
274
+ `#forward_walk` are processed correctly, the collected arrays of
275
+ tokens, blocks and inlines should be built into an object that
276
+ represents this parser using the `build` method (it will automatically
277
+ attempt to find the correct class to construct using the
278
+ `#define_parser` directive in the DOMObject subclass definition).
data/bin/mmmdpp ADDED
@@ -0,0 +1,168 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require 'io/console/size'
5
+ require 'optionparser'
6
+ require 'json'
7
+ require 'mmmd'
8
+
9
+ class ParserError < StandardError
10
+ end
11
+
12
+ class OptionNavigator
13
+ def initialize
14
+ @options = {}
15
+ end
16
+
17
+ # Read a definition
18
+ # @param define [String]
19
+ def read_definition(define)
20
+ define.split(";").each do |part|
21
+ locstring, _, value = part.partition(":")
22
+ locstring = deconstruct(locstring.strip)
23
+ assign(locstring, JSON.parse(value))
24
+ end
25
+ end
26
+
27
+ attr_reader :options
28
+
29
+ private
30
+
31
+ def check_unescaped(str, index)
32
+ return true if index.zero?
33
+
34
+ reverse_index = index - 1
35
+ count = 0
36
+ while str[reverse_index] == "\\"
37
+ count += 1
38
+ break if reverse_index.zero?
39
+
40
+ reverse_index -= 1
41
+ end
42
+ count.even?
43
+ end
44
+
45
+ def find_unescaped(str, pattern, index)
46
+ found = str.index(pattern, index)
47
+ return nil unless found
48
+
49
+ until check_unescaped(str, found)
50
+ index = found + 1
51
+ found = str.index(pattern, index)
52
+ return nil unless found
53
+ end
54
+ found
55
+ end
56
+
57
+ def deconstruct(locstring)
58
+ parts = []
59
+ buffer = ""
60
+ part = nil
61
+ until locstring.empty?
62
+ case locstring[0]
63
+ when '"'
64
+ raise ParserError, 'separator missing' unless buffer.empty?
65
+
66
+ closepart = find_unescaped(locstring, '"', 1)
67
+ raise ParserError, 'unclosed string' unless closepart
68
+
69
+ buffer = locstring[0..closepart]
70
+ part = buffer[1..-2]
71
+ locstring = locstring[closepart + 1..]
72
+ when '.'
73
+ parts.append(part)
74
+ buffer = ""
75
+ part = nil
76
+ locstring = locstring[1..]
77
+ when '['
78
+ raise ParserError, 'separator missing' unless buffer.empty?
79
+
80
+ closepart = find_unescaped(locstring, ']', 1)
81
+ raise ParserError, 'unclosed index' unless closepart
82
+
83
+ buffer = locstring[0..closepart]
84
+ part = buffer[1..-2].to_i
85
+ locstring = locstring.delete_prefix(buffer)
86
+ else
87
+ raise ParserError, 'separator missing' unless buffer.empty?
88
+
89
+ buffer = locstring[/\A\w+/] || raise(ParserError, 'invalid key')
90
+ part = buffer.to_sym
91
+ locstring = locstring.delete_prefix(buffer)
92
+ end
93
+ end
94
+ parts.append(part) if part
95
+ parts
96
+ end
97
+
98
+ def assign(keys, value)
99
+ current = @options
100
+ while keys.length > 1
101
+ current_key = keys.shift
102
+ unless current[current_key]
103
+ next_key = keys.first
104
+ case next_key
105
+ when Integer
106
+ current[current_key] = []
107
+ when String
108
+ current[current_key] = {}
109
+ when Symbol
110
+ current[current_key] = {}
111
+ end
112
+ end
113
+ current = current[current_key]
114
+ end
115
+ current[keys.shift] = value
116
+ end
117
+ end
118
+
119
+ options = {
120
+ include: [],
121
+ nav: OptionNavigator.new
122
+ }
123
+ parser = OptionParser.new do |opts|
124
+ opts.banner = "Usage: mmmdpp [OPTIONS] (input|-) (output|-)"
125
+
126
+ opts.on("-r", "--renderer [STRING]", String,
127
+ "Specify renderer to use for this document") do |renderer|
128
+ options[:renderer] = renderer
129
+ end
130
+
131
+ opts.on("-i", "--include [STRING]", String,
132
+ "Script to execute before rendering.\
133
+ May be specified multiple times.") do |inc|
134
+ options[:include].append(inc)
135
+ end
136
+
137
+ opts.on("-o", "--option [STRING]", String,
138
+ "Add option string. Can be repeated. Format: <key>: <JSON value>\n"\
139
+ "<key>: (<\"string\">|<symbol>|<[integer]>)"\
140
+ "[.(<\"string\">|<symbol>|<[integer]>)[...]]\n"\
141
+ "Example: \"style\".\"CodeBlock\".literal.[0]: 50") do |value|
142
+ options[:nav].read_definition(value) if value
143
+ end
144
+ end
145
+ parser.parse!
146
+
147
+ unless ARGV[1]
148
+ warn parser.help
149
+ exit 1
150
+ end
151
+
152
+ Renderers = {
153
+ "HTML" => -> { ::MMMD::Renderers::HTML },
154
+ "Plainterm" => -> { ::MMMD::Renderers::Plainterm }
155
+ }.freeze
156
+
157
+ options[:include].each { |name| Kernel.load(name) }
158
+ renderer_opts = options[:nav].options
159
+ renderer_opts["hsize"] ||= IO.console_size[1]
160
+ input = ARGV[0] == "-" ? $stdin.read : File.read(ARGV[0])
161
+ output = ARGV[1] == "-" ? $stdout : File.open(ARGV[1], "w")
162
+ doc = MMMD.parse(input)
163
+ rclass = Renderers[options[:renderer] || "Plainterm"]
164
+ raise StandardError, "unknown renderer: #{options[:renderer]}" unless rclass
165
+
166
+ renderer = rclass.call.new(doc, renderer_opts)
167
+ output.puts(renderer.render)
168
+ output.close
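
To make the effect of the `-o` option concrete, the following sketch mirrors what `OptionNavigator#assign` appears to do with an already deconstructed key path: intermediate containers are created on demand, as an Array when the next key is an Integer and as a Hash otherwise. The helper name `sketch_assign` is hypothetical, and the key path is the one from the help text example.

```ruby
# Hypothetical re-statement of the nested assignment; not the script's own code.
def sketch_assign(root, keys, value)
  *path, last = keys
  node = path.each_with_index.reduce(root) do |current, (key, index)|
    # Create the missing container based on the type of the next key.
    current[key] ||= keys[index + 1].is_a?(Integer) ? [] : {}
  end
  node[last] = value
  root
end

# Key path for: "style"."CodeBlock".literal.[0]: 50
options = sketch_assign({}, ["style", "CodeBlock", :literal, 0], 50)
```

Quoted parts become String keys, bare words become Symbol keys, and bracketed parts become Integer indexes, so the example yields a hash nested under `"style"` and `"CodeBlock"` whose `:literal` entry is a one-element array.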