mmmd 0.1.1 → 0.1.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: f64700645c2b97717469cd6da2ecd780138d0722f43f1503375e367f9574b718
4
- data.tar.gz: 153b9712bdb5ed0fe66681f64105152360bd50abf9aa907e97e5ef2f33ca3f2b
3
+ metadata.gz: 98817fd73ed29d5b59ed440706d868c6a627632b072b39e8f0b41ade4d7253c6
4
+ data.tar.gz: f60b756923a31d9194aa574ddadc6d46f75f1de564b1e59198f8706fb88a1f2f
5
5
  SHA512:
6
- metadata.gz: dfe1a17ba166b8eb2076ef3097552d40fba6f1ea040b85d741ab650fe0794320ec20528b9887ac83c9b7620d5b6844805d0b12c0c7a8c547fd2cad889af1b046
7
- data.tar.gz: cd476ec80f1022b99115b3e6a0213ceca078d736fe3c3dce47aa3acd7404f15766d41e1324d2d90f84781b82a7f3a959df2c479584a02d28a3bf73b9f5ea5c81
6
+ metadata.gz: 16a7da77e52b739ae8b40be7a5010731d4ddc8dde05388d4626477bfead344bdc9e52fcf6280684004d55d33a18477ec45aae083a962a839f5e62699951fff3f
7
+ data.tar.gz: 45efe890c613a4e0fd19805d7ae585200b3c89f24b217feffafe88c6a742d0e5e33870e72de138ffa3b4ca080cf89784e85406cf5e8d51dadbd84970b99f1448
data/README.md CHANGED
@@ -1,3 +1,59 @@
1
- # rubymark
1
+ MMMD (Mark My Message Down)
2
+ ============
3
+
4
+ (Originally titled Rubymark)
2
5
 
3
6
  Modular, compliant Markdown parser in Ruby
7
+
8
+ Installation
9
+ ------------
10
+
11
+ This package is available as a gem over at
12
+ [rubygems.org](https://rubygems.org/gems/mmmd).
13
+
14
+ Installing it is as simple as executing `gem install mmmd`
15
+
16
+ Usage
17
+ -----
18
+
19
+ This package is generally intended as a library, but it also
20
+ includes a CLI tool which permits the usage of the library
21
+ for simple document translation tasks.
22
+
23
+ Examples:
24
+ ```sh
25
+ # Render the file in a terminal-oriented format
26
+ $ mmmdpp file.md -
27
+
28
+ # Read the markdown contents directly from input, output to stdout
29
+ $ external-program | mmmdpp - -
30
+
31
+ # Render file.md to a complete webpage
32
+ $ mmmdpp -r HTML file.md file.html
33
+
34
+ # Render file.md into a complete webpage and add extra tags to head and
35
+ # wrap all images with a figure tag with figcaption containing image title
36
+ $ mmmdpp -r HTML -o '"head": ["<meta charset=\"UTF-8\">", "<style>img { max-width: 90%; }</style>"]' -o '"mapping"."PointBlank::DOM::InlineImage".figcaption: true' - -
37
+
38
+ # Render file.md into a set of HTML tags, without anything extra
39
+ $ mmmdpp -r HTML -o '"nowrap": true' file.md file.html
40
+ ```
41
+
42
+ A lot more usage options are documented on the Wiki page for the project
43
+
44
+ License
45
+ -------
46
+
47
+ Copyright 2025 yessiest@text.512mb.org
48
+
49
+ Licensed under the Apache License, Version 2.0 (the "License");
50
+ you may not use this file except in compliance with the License.
51
+ You may obtain a copy of the License at
52
+
53
+ http://www.apache.org/licenses/LICENSE-2.0
54
+
55
+ Unless required by applicable law or agreed to in writing, software
56
+ distributed under the License is distributed on an "AS IS" BASIS,
57
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
58
+ See the License for the specific language governing permissions and
59
+ limitations under the License.
data/bin/mmmdpp CHANGED
@@ -17,11 +17,8 @@ class OptionNavigator
17
17
  # Read a definition
18
18
  # @param define [String]
19
19
  def read_definition(define)
20
- define.split(";").each do |part|
21
- locstring, _, value = part.partition(":")
22
- locstring = deconstruct(locstring.strip)
23
- assign(locstring, JSON.parse(value))
24
- end
20
+ locstring, value = deconstruct(define)
21
+ assign(locstring, JSON.parse(value))
25
22
  end
26
23
 
27
24
  attr_reader :options
@@ -83,6 +80,9 @@ class OptionNavigator
83
80
  buffer = locstring[0..closepart]
84
81
  part = locstring[1..-2].to_i
85
82
  locstring = locstring.delete_prefix(buffer)
83
+ when ':'
84
+ locstring = locstring.delete_prefix(':')
85
+ break
86
86
  else
87
87
  raise ParserError, 'separator missing' unless buffer.empty?
88
88
 
@@ -92,7 +92,7 @@ class OptionNavigator
92
92
  end
93
93
  end
94
94
  parts.append(part) if part
95
- parts
95
+ [parts, locstring]
96
96
  end
97
97
 
98
98
  def assign(keys, value)
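
Note on the `read_definition` change above: the sketch below suggests why the old `split(";")` and `partition(":")` steps could not handle the `-o` definitions shown in the README. It uses only plain `String` methods and does not call mmmdpp's `deconstruct`. The two definition strings are adapted from the README examples, and the variable names are illustrative assumptions.

```ruby
# Two -o definitions adapted from the README examples.
image_opt = '"mapping"."PointBlank::DOM::InlineImage".figcaption: true'
head_opt  = '"head": ["<style>img { max-width: 90%; }</style>"]'

# Old step 1: split the definition on ";".
# The ";" inside the CSS rule cuts the JSON value in two.
head_parts = head_opt.split(";")
p head_parts.length

# Old step 2: partition on the first ":".
# The first ":" sits inside the quoted class name, not before the value.
locstring, _, value = image_opt.partition(":")
p locstring
p value
```

In 0.1.3 the path parser itself stops at the separating `:` (the new `when ':'` branch) and hands back the remainder, so colons inside quoted keys and semicolons inside values no longer act as delimiters.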
@@ -367,7 +367,7 @@ module PointBlank
367
367
 
368
368
  # (see ::PointBlank::Parsing::NullParser#consume)
369
369
  def consume(line, parent = nil, lazy: false)
370
- @lazy_triggered = lazy || @lazy_triggered
370
+ @lazy_triggered = lazy || @lazy_triggered unless line.strip.empty?
371
371
  return [nil, nil] if line.match?(/\A {0,3}\Z/)
372
372
  return [nil, nil] if @closed
373
373
  return [nil, nil] if check_candidates(line, parent)
@@ -957,9 +957,9 @@ module PointBlank
957
957
  parts = tokens
958
958
  @valid_parsers.each do |parser|
959
959
  newparts = []
960
- parts.each do |x|
960
+ parts.each_with_index do |x, i|
961
961
  if x.is_a? String
962
- newparts.append(*parser.tokenize(x))
962
+ newparts.append(*parser.tokenize(x, newparts.last, parts[i + 1]))
963
963
  else
964
964
  newparts.append(x)
965
965
  end
@@ -1013,8 +1013,10 @@ module PointBlank
1013
1013
 
1014
1014
  # Tokenize a string
1015
1015
  # @param string [String]
1016
+ # @param before [String, ::PointBlank::DOM::DOMObject]
1017
+ # @param after [String, ::PointBlank::DOM::DOMObject]
1016
1018
  # @return [Array<Array(String, Class, Symbol), String>]
1017
- def self.tokenize(string)
1019
+ def self.tokenize(string, _before, _after)
1018
1020
  [string]
1019
1021
  end
1020
1022
 
@@ -1142,18 +1144,11 @@ module PointBlank
1142
1144
  # Code inline parser
1143
1145
  class CodeInline < NullInline
1144
1146
  # (see ::PointBlank::Parsing::NullInline#tokenize)
1145
- def self.tokenize(string)
1146
- open = {}
1147
+ def self.tokenize(string, *_lookaround)
1147
1148
  iterate_tokens(string, "`") do |_before, current_text, matched|
1148
1149
  if matched
1149
1150
  match = current_text.match(/^`+/)[0]
1150
- if open[match]
1151
- open[match] = nil
1152
- [match, self, :close]
1153
- else
1154
- open[match] = true
1155
- [match, self, :open]
1156
- end
1151
+ [match, self, :open]
1157
1152
  else
1158
1153
  current_text[0]
1159
1154
  end
@@ -1170,9 +1165,9 @@ module PointBlank
1170
1165
  text = (part.is_a?(Array) ? part.first : part)
1171
1166
  buffer += text
1172
1167
  next unless part.is_a? Array
1168
+ next if idx.zero?
1173
1169
 
1174
- break (cutoff = idx) if part.first == opening &&
1175
- part.last == :close
1170
+ break (cutoff = idx) if part.first == opening
1176
1171
  end
1177
1172
  buffer = construct_literal(buffer[opening.length..(-1 - opening.length)])
1178
1173
  [cutoff.positive? ? build([buffer]) : opening, parts[(cutoff + 1)..]]
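
Note on the `CodeInline` change above: every backtick run is now emitted as an `:open` token, and the closer is taken to be the next token with the same text as the opener. A minimal sketch of that search follows, assuming tokens shaped as `[text, parser, symbol]`. The helper name `closing_index` and the `:code` stand-in for the parser class are hypothetical, not part of the gem.

```ruby
# Hypothetical helper mirroring the closing-token search in the hunk above.
# Tokens are [text, parser, symbol]; :code stands in for the parser class.
def closing_index(parts)
  opening = parts.first.first
  parts.each_with_index do |part, idx|
    next unless part.is_a?(Array)
    next if idx.zero? # the opener must not match itself

    return idx if part.first == opening
  end
  0 # no closer found
end

parts = [["``", :code, :open], "a ", ["`", :code, :open], " b",
         ["``", :code, :open], " c"]
p closing_index(parts)
p closing_index([["`", :code, :open], "unterminated"])
```

A single backtick inside a double-backtick span is skipped because its text differs from the opener, which is why the separate `:close` bookkeeping could be dropped.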
@@ -1182,7 +1177,7 @@ module PointBlank
1182
1177
  # Autolink inline parser
1183
1178
  class AutolinkInline < NullInline
1184
1179
  # (see ::PointBlank::Parsing::NullInline#tokenize)
1185
- def self.tokenize(string)
1180
+ def self.tokenize(string, *_lookaround)
1186
1181
  iterate_tokens(string, /[<>]/) do |_before, current_text, matched|
1187
1182
  if matched
1188
1183
  if current_text.start_with?("<")
@@ -1238,11 +1233,10 @@ module PointBlank
1238
1233
  linkinfo = capture[-1][2]
1239
1234
  obj = build(capture[1..-2])
1240
1235
  if linkinfo[:label]
1241
- if (props = doc.root.properties[:linkdefs][linkinfo[:label]])
1242
- linkinfo = props
1243
- else
1236
+ unless (props = doc.root.properties[:linkdefs][linkinfo[:label]])
1244
1237
  return nil
1245
1238
  end
1239
+ linkinfo = props
1246
1240
  end
1247
1241
  obj.properties = linkinfo
1248
1242
  obj
@@ -1277,7 +1271,7 @@ module PointBlank
1277
1271
  end
1278
1272
 
1279
1273
  # (see ::PointBlank::Parsing::NullInline#tokenize)
1280
- def self.tokenize(string)
1274
+ def self.tokenize(string, *_lookaround)
1281
1275
  iterate_tokens(string, /(?:!\[|\]\()/) do |_before, text, matched|
1282
1276
  next text[0] unless matched
1283
1277
  next ["![", self, :open] if text.start_with? "!["
@@ -1296,7 +1290,7 @@ module PointBlank
1296
1290
  end
1297
1291
 
1298
1292
  # (see ::PointBlank::Parsing::NullInline#tokenize)
1299
- def self.tokenize(string)
1293
+ def self.tokenize(string, *_lookaround)
1300
1294
  iterate_tokens(string, /(?:\[|\][(\[])/) do |_before, text, matched|
1301
1295
  next text[0] unless matched
1302
1296
  next ["[", self, :open] if text.start_with? "["
@@ -1308,20 +1302,61 @@ module PointBlank
1308
1302
  end
1309
1303
  end
1310
1304
 
1305
+ # TODO: this seems way too complicated for something that's supposed
1306
+ # to be a goddamn emphasis markup parser. i'd blame it on commonmark's
1307
+ # convoluted specs.
1308
+ # (P.S: it could be possible to make this easier for implementers by
1309
+ # making a claims system with pointers that do not modify the string
1310
+ # while it's being parsed. however that would just move complexity from
1311
+ # the parser into the scanner instead. and it does not resolve the
1312
+ # problem of overlapping claims as efficiently as simply splitting text
1313
+ # into tokens and remaining string bits.)
1314
+
1311
1315
  # Emphasis and strong emphasis inline parser
1312
1316
  class EmphInline < NullInline
1313
1317
  INFIX_TOKENS = /^[^\p{S}\p{P}\p{Zs}_]_++[^\p{S}\p{P}\p{Zs}_]$/
1314
1318
  # (see ::PointBlank::Parsing::NullInline#tokenize)
1315
- def self.tokenize(string)
1319
+ def self.tokenize(string, before, after)
1320
+ bfrb = extract_left(before)
1321
+ afra = extract_right(after)
1316
1322
  iterate_tokens(string, /(?:_++|\*++)/) do |bfr, text, matched|
1317
1323
  token, afr = text.match(/^(_++|\*++)(.?)/)[1..2]
1318
- left = left_token?(bfr[-1] || "", token, afr)
1319
- right = right_token?(bfr[-1] || "", token, afr)
1324
+ bfr = bfr[-1] || bfrb || ""
1325
+ afr = afr.empty? ? afra || "" : afr
1326
+ left = left_token?(bfr, token, afr)
1327
+ right = right_token?(bfr, token, afr)
1320
1328
  break_into_elements(token, [bfr[-1] || "", token, afr].join(''),
1321
1329
  left, right, matched)
1322
1330
  end
1323
1331
  end
1324
1332
 
1333
+ # Extract left-flanking token from before the tokenized string
1334
+ # @param bfr [String, ::PointBlank::DOM::DOMObject, Array(String, Class, Symbol)]
1335
+ # @return [String]
1336
+ def self.extract_left(bfr)
1337
+ case bfr
1338
+ when String
1339
+ bfr[-1]
1340
+ when ::PointBlank::DOM::DOMObject
1341
+ "."
1342
+ when Array
1343
+ bfr.first[-1]
1344
+ end
1345
+ end
1346
+
1347
+ # Extract right-flanking token from after the tokenized string
1348
+ # @param afr [String, ::PointBlank::DOM::DOMObject, Array(String, Class, Symbol)]
1349
+ # @return [String]
1350
+ def self.extract_right(afr)
1351
+ case afr
1352
+ when String
1353
+ afr[0]
1354
+ when ::PointBlank::DOM::DOMObject
1355
+ "."
1356
+ when Array
1357
+ afr.first[0]
1358
+ end
1359
+ end
1325
1360
  # Is this token, given these surrounding characters, left-flanking?
1326
1361
  # @param bfr [String]
1327
1362
  # @param token [String]
@@ -1386,8 +1421,9 @@ module PointBlank
1386
1421
  ((blk.first.length % 3).zero? &&
1387
1422
  (closer.first.length % 3).zero?))
1388
1423
  (open ? capture : before).prepend(blk)
1389
- next unless blk.is_a?(Array)
1390
- return backlog unless blk[1].check_contents(capture)
1424
+ next unless blk.is_a?(Array) && !open
1425
+
1426
+ return backlog unless closer[1].check_contents(capture)
1391
1427
  end
1392
1428
  return backlog if open
1393
1429
 
@@ -1431,7 +1467,7 @@ module PointBlank
1431
1467
  # Hard break
1432
1468
  class HardBreakInline < NullInline
1433
1469
  # (see ::PointBlank::Parsing::NullInline#tokenize)
1434
- def self.tokenize(string)
1470
+ def self.tokenize(string, *_lookaround)
1435
1471
  iterate_tokens(string, /(?: \n|\\\n)/) do |_before, token, matched|
1436
1472
  next ["\n", self, :close] if token.start_with?(" \n")
1437
1473
  next ["\n", self, :close] if matched
@@ -45,7 +45,8 @@ module MMMD
45
45
  tag: "ol"
46
46
  },
47
47
  "PointBlank::DOM::IndentBlock" => {
48
- tag: "pre"
48
+ tag: "pre",
49
+ codeblock: true
49
50
  },
50
51
  "PointBlank::DOM::ULListElement" => {
51
52
  tag: "li"
@@ -87,7 +88,8 @@ module MMMD
87
88
  tag: "pre",
88
89
  outer: {
89
90
  tag: "code"
90
- }
91
+ },
92
+ codeblock: true
91
93
  },
92
94
  "PointBlank::DOM::QuoteBlock" => {
93
95
  tag: "blockquote"
@@ -126,7 +128,11 @@ module MMMD
126
128
 
127
129
  def initialize(overrides)
128
130
  @mapping = self.class.mapping
129
- @mapping = @mapping.merge(overrides["mapping"]) if overrides["mapping"]
131
+ overrides["mapping"]&.each do |key, value|
132
+ next unless @mapping[key]
133
+
134
+ @mapping[key] = @mapping[key].merge(value)
135
+ end
130
136
  end
131
137
 
132
138
  attr_reader :mapping
@@ -154,7 +160,7 @@ module MMMD
154
160
  text = _render(@document, @options, level: @options["init_level"])
155
161
  @options["init_level"].times { text = indent(text) }
156
162
  if @options["nowrap"]
157
- text
163
+ remove_pre_spaces(text)
158
164
  else
159
165
  [
160
166
  preambule,
@@ -226,8 +232,7 @@ module MMMD
226
232
  end
227
233
 
228
234
  def _render(element, options, inline: false, level: 0, literaltext: false)
229
- modeswitch = element.is_a?(::PointBlank::DOM::LeafBlock) ||
230
- element.is_a?(::PointBlank::DOM::Paragraph)
235
+ modeswitch = figure_out_modeswitch(element)
231
236
  inline ||= modeswitch
232
237
  level += 1 unless inline
233
238
  text = if element.children.empty?
@@ -248,21 +253,29 @@ module MMMD
248
253
  literaltext: literaltext)
249
254
  end
250
255
 
256
+ def figure_out_modeswitch(element)
257
+ element.is_a?(::PointBlank::DOM::LeafBlock) ||
258
+ element.is_a?(::PointBlank::DOM::Paragraph)
259
+ end
260
+
251
261
  def run_filters(text, element, level:, inline:, modeswitch:,
252
262
  literaltext:)
253
263
  element_style = @mapping[element.class.name]
254
264
  return text unless element_style
255
265
  return text if literaltext
256
266
 
267
+ codeblock = element_style[:codeblock]
257
268
  hsize = @options["linewrap"] - (level * @options["indent"])
258
- text = wordwrap(text, hsize) if modeswitch
269
+ text = wordwrap(text, hsize) if modeswitch && !codeblock
270
+
259
271
  if element_style[:sanitize]
260
272
  text = MMMD::EntityUtils.encode_entities(text)
261
273
  end
262
274
  if element_style[:inline]
263
275
  innerclose(element, element_style, text)
264
276
  else
265
- openclose(text, element, element_style, inline)
277
+ openclose(text, element, element_style,
278
+ codeblock ? false : inline)
266
279
  end
267
280
  end
268
281
 
@@ -272,20 +285,25 @@ module MMMD
272
285
  opentag + text + closetag
273
286
  else
274
287
  [opentag,
275
- indent(text),
288
+ indent(text.rstrip),
276
289
  closetag].join("\n")
277
290
  end
278
291
  end
279
292
 
280
293
  def innerclose(element, style, text)
281
294
  props = element.properties
282
- tag = "<#{style[:tag]}"
295
+ tag = ""
296
+ tag += "<figure>" if style[:figcaption]
297
+ tag += "<#{style[:tag]}"
283
298
  tag += " style=#{style[:style].inspect}" if style[:style]
284
299
  tag += " href=#{read_link(element)}" if style[:href]
285
300
  tag += " alt=#{text.inspect}" if style[:alt]
286
301
  tag += " src=#{read_link(element)}" if style[:src]
287
302
  tag += " title=#{read_title(element)}" if style[:title] && props[:title]
288
303
  tag += ">"
304
+ if style[:figcaption]
305
+ tag += "<figcaption>#{text}</figcaption></figure>"
306
+ end
289
307
  if style[:outer]
290
308
  outeropen, outerclose = construct_tags(style[:outer], element)
291
309
  tag = outeropen + tag + outerclose
@@ -298,7 +316,11 @@ module MMMD
298
316
 
299
317
  props = element.properties
300
318
  opentag = "<#{style[:tag]}"
319
+ opentag += "<figure>#{opentag}" if style[:figcaption]
301
320
  closetag = "</#{style[:tag]}>"
321
+ if style[:figcaption]
322
+ closetag += "<figcaption>#{text}</figcaption></figure>"
323
+ end
302
324
  opentag += " style=#{style[:style].inspect}" if style[:style]
303
325
  opentag += " href=#{read_link(element)}" if style[:href]
304
326
  opentag += " src=#{read_link(element)}" if style[:src]
@@ -326,7 +348,7 @@ module MMMD
326
348
 
327
349
  def indent(text)
328
350
  text.lines.map do |line|
329
- "#{' ' * @options["indent"]}#{line}"
351
+ "#{' ' * @options['indent']}#{line}"
330
352
  end.join('')
331
353
  end
332
354
 
@@ -334,7 +356,7 @@ module MMMD
334
356
  head = @options['head']
335
357
  headinfo = "#{indent(<<~HEAD.rstrip)}\n " if head
336
358
  <head>
337
- #{head.is_a?(Array) ? head.join("\n") : head}
359
+ #{indent(head.is_a?(Array) ? head.join("\n") : head)}
338
360
  </head>
339
361
  HEAD
340
362
  headinfo ||= " "
@@ -381,7 +381,11 @@ module MMMD
381
381
  def initialize(overrides)
382
382
  @style = self.class.style
383
383
  @effect_priority = self.class.effect_priority
384
- @style = @style.merge(overrides["style"]) if overrides["style"]
384
+ overrides["style"]&.each do |key, value|
385
+ next unless @style[key]
386
+
387
+ @style[key] = @style[key].merge(value)
388
+ end
385
389
  end
386
390
 
387
391
  attr_reader :style, :effect_priority
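
Note on the two `initialize` changes above (HTML `mapping` and terminal `style`): the sketch below contrasts the old top-level `merge` with the new per-entry merge. The default entry and the override values are hypothetical and only shaped like the renderer tables; the real defaults may differ.

```ruby
# Hypothetical defaults, shaped like the renderer's mapping table.
defaults = {
  "PointBlank::DOM::InlineImage" => { tag: "img", src: true, alt: true }
}
overrides = {
  "PointBlank::DOM::InlineImage" => { figcaption: true },
  "Not::A::Known::Class" => { tag: "div" }
}

# 0.1.1 behaviour: a top-level merge replaces each overridden entry whole.
shallow = defaults.merge(overrides)

# 0.1.3 behaviour: merge per entry and ignore keys missing from defaults.
per_entry = defaults.dup
overrides.each do |key, value|
  next unless per_entry[key]

  per_entry[key] = per_entry[key].merge(value)
end

p shallow["PointBlank::DOM::InlineImage"]
p per_entry["PointBlank::DOM::InlineImage"]
p per_entry.key?("Not::A::Known::Class")
```

With the old merge, overriding a single field dropped the rest of the entry (here `tag`, `src` and `alt`). The per-entry merge keeps them and silently skips element classes the renderer does not know about.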
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: mmmd
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.1
4
+ version: 0.1.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Yessiest
@@ -9,7 +9,7 @@ autorequire:
9
9
  bindir:
10
10
  - bin
11
11
  cert_chain: []
12
- date: 2025-03-09 00:00:00.000000000 Z
12
+ date: 2025-04-03 00:00:00.000000000 Z
13
13
  dependencies: []
14
14
  description: |
15
15
  MMMD (short for Mark My Manuscript Down) is a Markdown processor
@@ -21,12 +21,8 @@ executables:
21
21
  extensions: []
22
22
  extra_rdoc_files:
23
23
  - README.md
24
- - architecture.md
25
- - security.md
26
- - test.md
27
24
  files:
28
25
  - README.md
29
- - architecture.md
30
26
  - bin/mmmdpp
31
27
  - lib/mmmd.rb
32
28
  - lib/mmmd/blankshell.rb
@@ -35,8 +31,6 @@ files:
35
31
  - lib/mmmd/renderers/html.rb
36
32
  - lib/mmmd/renderers/plainterm.rb
37
33
  - lib/mmmd/util.rb
38
- - security.md
39
- - test.md
40
34
  homepage: https://adastra7.net/git/Yessiest/rubymark
41
35
  licenses:
42
36
  - AGPL-3.0-or-later
data/architecture.md DELETED
@@ -1,278 +0,0 @@
1
- Architecture of madness
2
- =======================
3
-
4
- Prelude
5
- -------
6
-
7
- It needs to be stressed that making the parser modular while keeping it
8
- relatively simple was a laborous undertaking. There has not been a standard
9
- more hostile towards the people who dare attempt to implement it than
10
- CommonMark. It should also be noted, that despite it being titled a
11
- "Standard" in this document, it is less widely adopted than the Github
12
- Flavored Markdown syntax. Github Flavored Markdown, however, is only but
13
- a mere subset of this parser's model, albeit requiring a few extensions.
14
-
15
- Current state (as of March 02, 2025)
16
- ------------------------------------
17
-
18
- This parser processes text in what can be boiled down to three phases.
19
-
20
- - Block/Line phase
21
- - Overlay phase
22
- - Inline phase
23
-
24
- It should be noted that all phases have their own related parser
25
- classes, and a shared behaviour system, where each parser takes control
26
- at some point, and may win ambiguous cases by having higher priority
27
- (see `#define_child`, `#define_overlay` methods for priority parameter)
28
-
29
- ### Block/Line phase ###
30
-
31
- The first phase breaks down blocks, line by line, into block structures.
32
- Blocks (preferably inherited from the Block class) can contain other blocks.
33
- (i.e. QuoteBlock, ULBlock, OLBlock). Other blocks (known as leaf blocks)
34
- may not contain anything else (except inline content, more on that later).
35
-
36
- Blocks are designed to be parsed independently. This means that it *should*
37
- be possible to tear out any standard block and make it not get parsed.
38
- This, however, isn't thoroughly tested for.
39
-
40
- Blocks as proper, real classes have a certain lifecycle to follow when
41
- being constructed:
42
-
43
- 1. Open condition
44
- - A block needs to find its first marker on the current line to open
45
- (see `#begin?` method)
46
- - Once it's open, it's immediately initialized and fed the line it just
47
- read (but now as an object, not as a class) (see `#consume` method)
48
- 2. Marker/Line consumption
49
- - While it should be kept open, the block parser instance will
50
- keep reading inupt through `#consume` method, returning a pair
51
- of modified line (after consuming its tokens from it) and
52
- a boolean value indicating permission of lazy continuation
53
- (if it's a block like a QuoteBlock or ULBlock that can be lazily
54
- overflowed).
55
- Every line the parser needs to record needs to be pushed
56
- through the `#push` method.
57
- 3. Closure
58
- - If the current line no longer belongs to the current block
59
- (if the block should have been closed on the previous line),
60
- it simply needs to `return` a pair of `nil`, and a boolean value for
61
- permission of lazy continuation
62
- - If a block should be closed on the current line, it should capture it,
63
- keep track of the "closed" state, then `return` `nil` on the next call
64
- of `#consume`
65
- - Once a block is closed, it:
66
- 1. Receives its content from the parser
67
- 2. Parser receives the "close" method call
68
- 3. (optional) Parser may have a callable method `#applyprops`. If
69
- it exists, it gets called with the current constructed block.
70
- 4. (optional) All overlays assigned to this block's class are
71
- processed on the contents of this block (more on that in
72
- Overlay phase)
73
- 5. (optional) Parser may return a different class, which
74
- the current block should be cast into (Overlays may change
75
- the class as well)
76
- 6. (optional) If a block can respond to `#parse_inner` method, it
77
- will get called, allowing the block to parse its own contents.
78
- - After this point, the block is no longer touched until the document
79
- fully gets processed.
80
- 4. Inline processing
81
- - (Applies only to Paragraph and any child of LeafBlock)
82
- When the document gets fully processed, the contents of the current
83
- block are taken, assigned to an InlineRoot instance, and then parsed
84
- in Inline mode
85
- 5. Completion
86
- - The resulting document is then returned.
87
-
88
- While there is a lot of functionality available in desgining blocks, it is
89
- not necessary for the simplest of the block kinds available. The simplest
90
- example of a block parser is likely the ThematicBreakParser class, which
91
- implements the only 2 methods needed for a block parser to function.
92
-
93
- While parsing text, a block may use additional info:
94
-
95
- - In consume method: `lazy` hasharg, if the current line is being processed
96
- in lazy continuation mode (likely only ever matters for Paragraph); and
97
- `parent` - the parent block containing this block.
98
-
99
- Block interpretations are tried in decreasing order of their priority
100
- value, as applied using the `#define_child` method.
101
-
102
- For blocks to be properly indexed, they need to be a valid child or
103
- a valid descendant (meaning reachable through child chain) of the
104
- Document class.
105
-
106
- ### Overlay phase ###
107
-
108
- Overlay phase doesn't start at some specific point in time. Rather,
109
- Overlay phase happens for every block individually - when that block
110
- closes.
111
-
112
- Overlay mechanism can be applied to any DOMObject type, so long as its
113
- close method is called at some point (this may not be of interest to
114
- people that do not implement custom syntax, as it generally translates
115
- to "only block level elements get their overlays processed")
116
-
117
- Overlay mechanism provides the ability to perform some action on the block
118
- right after it gets closed and right before it gets interpreted by the
119
- inline phase. Overlays may do the following:
120
-
121
- - Change the block's class
122
- (by returning a class from the `#process` method)
123
- - Change the block's content (by directly editing it)
124
- - Change the block's properties (by modifying its `properties` hash)
125
-
126
- Overlay interpretations are tried in decreasing order of their priority
127
- value, as defined using the `#define_overlay` method.
128
-
129
- ### Inline phase ###
130
-
131
- Once all blocks have been processed, and all overlays have been applied
132
- to their respective block types, the hook in the Document class's
133
- `#parser` method executes inline parsing phase of all leaf blocks
134
- (descendants of the `Leaf` class) and paragraphs.
135
-
136
- The outer class encompassing all inline children of a block is
137
- `InlineRoot`. As such, if an inline element is to ever appear within the
138
- text, it needs to be reachable as a child or a descendant of InlineRoot.
139
-
140
- Inline parsing works in three parts:
141
-
142
- - First, the contens are tokenized (every parser marks its own tokens)
143
- - Second, the forward walk procedure is called
144
- - Third, the reverse walk procedure is called
145
-
146
- This process is repeated for every group of parsers with equal priority.
147
- At one point in time, only all the parsers of equal priority may run in
148
- the same step. Then, the process goes to the next step, of parsers of
149
- higher priority value. As counter-intuitive as this is, this means that
150
- it goes to the parsers of _lower_ priority.
151
-
152
- At the very end of the process, the remaining strings are concatenated
153
- within the mixed array of inlines and strings, and turned into Text
154
- nodes, after which the contents of the array are appended as children to
155
- the root node.
156
-
157
- This process is recursively applied to all elements which may have child
158
- elements. This is ensured when an inline parser calls the "build"
159
- utility method.
160
-
161
- The inline parser is a class that implements static methods `tokenize`
162
- and either `forward_walk` or `reverse_walk`. Both may be implemented at
163
- the same time, but this isn't advisable.
164
-
165
- The tokenization process is characterized by calling every parser in the
166
- current group with every string in tokens array using the `tokenize`
167
- method. It is expected that the parser breaks the string down into an
168
- array of other strings and tokens. A token is an array where the first
169
- element is the literal text representation of the token, the second
170
- value is the class of the parser, and the _last_ value (_not third_) is
171
- the `:close` or `:open` symbol (though functionally it may hold any
172
- symbol value). Any additional information the parser may need in later
173
- stages may be stored between the last element and the second element.
174
-
175
- Example:
176
-
177
- Input:
178
-
179
- "_this _is a string of_ tokens_"
180
-
181
- Output:
182
-
183
- [["_", ::PointBlank::Parsing::EmphInline, :open],
184
- "this ",
185
- ["_", ::PointBlank::Parsing::EmphInline, :open],
186
- "is a string of",
187
- ["_", ::PointBlank::Parsing::EmphInline, :close],
188
- " tokens",
189
- ["_", ::PointBlank::Parsing::EmphInline, :close]]
190
-
191
- The forward walk is characterized by calling parsers which implement the
192
- `#forward_walk` method. When the main class encounters an opening token
193
- in `forward_walk`, it will call the `#forward_walk` method of the class
194
- that represents this token. It is expected that the parser class will
195
- then attempt to build the first available occurence of the inline
196
- element it represents, after which it will return the array of all
197
- tokens and strings that it was passed where the first element will be
198
- the newly constructed inline element. If it is unable to close the
199
- block, it should simply return the original contents, unmodified.
200
-
201
- Example:
202
-
203
- Original text:
204
-
205
- this is outside the inline `this is inside the inline` and this
206
- is right after the inline `and this is the next inline`
207
-
208
- Input:
209
-
210
- [["`", ::PointBlank::Parsing::CodeInline, :open],
211
- "this is inside the inline"
212
- ["`", ::PointBlank::Parsing::CodeInline, :close],
213
- " and this is right after the inline ",
214
- ["`", ::PointBlank::Parsing::CodeInline, :open],
215
- "and this is the next inline"
216
- ["`", ::PointBlank::Parsing::CodeInline, :close]]
217
-
218
- Output:
219
-
220
- [<::PointBlank::DOM::InlineCode
221
- @content = "this is inside the inline">,
222
- " and this is right after the inline ",
223
- ["`", ::PointBlank::Parsing::CodeInline, :open],
224
- "and this is the next inline"
225
- ["`", ::PointBlank::Parsing::CodeInline, :close]]
226
-
227
- The reverse walk is characterized by calling parsers which implement the
228
- `#reverse_walk` method when the main class encounters a closing token
229
- for this class (the one that contains the `:close` symbol in the last
230
- position of the token information array). After that the main class will
231
- call the parser's `#reverse_walk` method with the current list of
232
- tokens, inlines and strings. It is expected that the parser will then
233
- collect all the blocks, strings and inlines that fit within the block
234
- closed by the last element in the list, and once it encounters the
235
- appropriate opening token for the closing token in the last position of
236
- the array, it will then replace the elements fitting within that inline
237
- with a class containing all the collected elements. If it is unable to
238
- find a matching opening token for the closing token in the last
239
- position, it should simply return the original contents, unmodified.
240
-
241
- Example:
242
-
243
- Original text:
244
-
245
- blah blah something something lots of text before the emphasis
246
- _this is emphasized `and this is an inline` but it's still
247
- emphasized_
248
-
249
-
250
- Input:
251
-
252
- ["blah blah something something lots of text before the emphasis",
253
- ["_", ::PointBlank::Parsing::EmphInline, :open],
254
- "this is emphasized",
255
- <::PointBlank::DOM::InlineCode,
256
- @content = "and this is an inline">,
257
- " but it's still emphasized",
258
- ["_", ::PointBlank::Parsing::EmphInline, :close]]
259
-
260
- Output:
261
-
262
- ["blah blah something something lots of text before the emphasis",
263
- <::PointBlank::DOM::InlineEmphasis,
264
- children = [...,
265
- <::PointBlank::DOM::InlineCode ...>
266
- ...]>]
267
-
268
- Both `#forward_walk` and `#reverse_walk` are not restricted to making
269
- just the changes discussed above, and can arbitrarily modify the token
270
- arrays. That, however, should be done with great care, so as to not
271
- accidentally break compatibility with other parsers.
272
-
273
- To ensure that the collected tokens in the `#reverse_walk` and
274
- `#forward_walk` are processes correctly, the colllected arrays of
275
- tokens, blocks and inlines should be built into an object that
276
- represents this parser using the `build` method (it will automatically
277
- attempt to find the correct class to construct using the
278
- `#define_parser` directive in the DOMObject subclass definition)
data/security.md DELETED
@@ -1,21 +0,0 @@
1
- Security acknowledgements
2
- =========================
3
-
4
- While special care has been taken to prevent some of the more common common
5
- vulnerabilities that might arise from using this parser, it does not prevent
6
- certain issues which **which should be acknowledged**.
7
-
8
- - It is possible to inject a form of one-click XSS into the website. In
9
- particular, there are no restrictions placed on urls embedded within the links
10
- (as per the description of CommonMark specification). As such, something as
11
- simple as `[test](<javascript:dangerous code here>)` would be more than enough
12
- to employ such an exploit.
13
- - While generally speaking the parser acts stable on most tests, and precents
14
- stray HTML tokens from occuring in the output text where appropriate, due to
15
- the nontrivial nature of the task some form of XSS injection may or may not
16
- occur. If such an incident occurs, please report it to the current maintainer
17
- of the project.
18
- - User input should NOT be trusted when it comes to applying options to
19
- rendering. Some renderers, such as the HTML renderer, allow modifying the
20
- style parameter for rendered tags, which when passed control of to an
21
- untrusted party may become an XSS attack vector.
data/test.md DELETED
@@ -1,15 +0,0 @@
1
- Jabronicle Mitch's wild ride
2
- ============================
3
-
4
- crusty cocks
5
-
6
- -------------
7
-
8
- - DIIIIICKS
9
- - - let
10
- - me
11
- - smell
12
- - your
13
- - dick
14
- - [anime feet](https://google.com)
15
- - [goog](https://goog.com 'goog')