kramdown 0.2.0 → 0.3.0

Files changed (67)
  1. data/ChangeLog +267 -0
  2. data/VERSION +1 -1
  3. data/benchmark/benchmark.rb +2 -1
  4. data/benchmark/generate_data.rb +110 -0
  5. data/benchmark/historic-jruby-1.4.0.dat +7 -0
  6. data/benchmark/historic-ruby-1.8.6.dat +7 -0
  7. data/benchmark/historic-ruby-1.8.7.dat +7 -0
  8. data/benchmark/historic-ruby-1.9.1p243.dat +7 -0
  9. data/benchmark/historic-ruby-1.9.2dev.dat +7 -0
  10. data/benchmark/static-jruby-1.4.0.dat +7 -0
  11. data/benchmark/static-ruby-1.8.6.dat +7 -0
  12. data/benchmark/static-ruby-1.8.7.dat +7 -0
  13. data/benchmark/static-ruby-1.9.1p243.dat +7 -0
  14. data/benchmark/static-ruby-1.9.2dev.dat +7 -0
  15. data/benchmark/testing.sh +1 -1
  16. data/doc/index.page +5 -5
  17. data/doc/installation.page +3 -3
  18. data/doc/quickref.page +3 -3
  19. data/doc/syntax.page +133 -101
  20. data/doc/tests.page +9 -1
  21. data/lib/kramdown/compatibility.rb +34 -0
  22. data/lib/kramdown/converter.rb +26 -8
  23. data/lib/kramdown/document.rb +2 -1
  24. data/lib/kramdown/parser.rb +1 -1192
  25. data/lib/kramdown/parser/kramdown.rb +272 -0
  26. data/lib/kramdown/parser/kramdown/attribute_list.rb +102 -0
  27. data/lib/kramdown/parser/kramdown/autolink.rb +42 -0
  28. data/lib/kramdown/parser/kramdown/blank_line.rb +43 -0
  29. data/lib/kramdown/parser/kramdown/blockquote.rb +42 -0
  30. data/lib/kramdown/parser/kramdown/codeblock.rb +62 -0
  31. data/lib/kramdown/parser/kramdown/codespan.rb +57 -0
  32. data/lib/kramdown/parser/kramdown/emphasis.rb +69 -0
  33. data/lib/kramdown/parser/kramdown/eob.rb +39 -0
  34. data/lib/kramdown/parser/kramdown/escaped_chars.rb +38 -0
  35. data/lib/kramdown/parser/kramdown/extension.rb +65 -0
  36. data/lib/kramdown/parser/kramdown/footnote.rb +72 -0
  37. data/lib/kramdown/parser/kramdown/header.rb +81 -0
  38. data/lib/kramdown/parser/kramdown/horizontal_rule.rb +39 -0
  39. data/lib/kramdown/parser/kramdown/html.rb +253 -0
  40. data/lib/kramdown/{deprecated.rb → parser/kramdown/html_entity.rb} +10 -12
  41. data/lib/kramdown/parser/kramdown/line_break.rb +38 -0
  42. data/lib/kramdown/parser/kramdown/link.rb +153 -0
  43. data/lib/kramdown/parser/kramdown/list.rb +225 -0
  44. data/lib/kramdown/parser/kramdown/paragraph.rb +44 -0
  45. data/lib/kramdown/parser/kramdown/typographic_symbol.rb +48 -0
  46. data/lib/kramdown/version.rb +1 -1
  47. data/test/testcases/block/09_html/comment.html +1 -0
  48. data/test/testcases/block/09_html/comment.text +1 -1
  49. data/test/testcases/block/09_html/content_model/tables.text +2 -2
  50. data/test/testcases/block/09_html/not_parsed.html +10 -0
  51. data/test/testcases/block/09_html/not_parsed.text +9 -0
  52. data/test/testcases/block/09_html/parse_as_raw.html +4 -0
  53. data/test/testcases/block/09_html/parse_as_raw.text +2 -0
  54. data/test/testcases/block/09_html/parse_block_html.html +4 -0
  55. data/test/testcases/block/09_html/parse_block_html.text +3 -0
  56. data/test/testcases/block/09_html/processing_instruction.html +1 -0
  57. data/test/testcases/block/09_html/processing_instruction.text +1 -1
  58. data/test/testcases/block/09_html/simple.html +8 -15
  59. data/test/testcases/block/09_html/simple.text +2 -12
  60. data/test/testcases/span/02_emphasis/normal.html +8 -4
  61. data/test/testcases/span/02_emphasis/normal.text +6 -2
  62. data/test/testcases/span/05_html/markdown_attr.html +2 -1
  63. data/test/testcases/span/05_html/markdown_attr.text +2 -1
  64. data/test/testcases/span/05_html/normal.html +6 -2
  65. data/test/testcases/span/05_html/normal.text +4 -0
  66. metadata +35 -4
  67. data/lib/kramdown/parser/registry.rb +0 -62
data/doc/tests.page +9 -1
@@ -31,7 +31,7 @@ kramdown comes with a small benchmark to test how fast it is in regard to four o
31
31
  implementations: Maruku, BlueFeather, BlueCloth and RDiscount. The first two are written using only
32
32
  Ruby, the latter two use the C discount library for the actual hard work (which makes them really
33
33
  fast but they do not provide additional syntax elements). As one can see below, kramdown is
34
- currently (November 2009) ~5x faster than Maruku, ~10x faster than BlueFeather but ~30x slower than
34
+ currently (December 2009) ~5x faster than Maruku, ~10x faster than BlueFeather but ~30x slower than
35
35
  BlueCloth and rdiscount:
36
36
 
37
37
  <pre><code>
@@ -39,5 +39,13 @@ BlueCloth and rdiscount:
39
39
  </code>
40
40
  </pre>
41
41
 
42
+ And here are some graphs which show the execution times on different Ruby interpreters:
43
+
44
+ ![ruby 1.8.6]({relocatable: img/graph-ruby-1.8.6.png})
45
+ ![ruby 1.8.7]({relocatable: img/graph-ruby-1.8.7.png})
46
+ ![ruby 1.9.1p243]({relocatable: img/graph-ruby-1.9.1p243.png})
47
+ ![ruby 1.9.2dev]({relocatable: img/graph-ruby-1.9.2dev.png})
48
+ ![jruby 1.4.0]({relocatable: img/graph-jruby-1.4.0.png})
49
+
42
50
  [Markdown Test Suite]: http://daringfireball.net/projects/downloads/MarkdownTest_1.0.zip
43
51
  [MDTest]: http://www.michelf.com/docs/projets/mdtest-1.0.zip
data/lib/kramdown/compatibility.rb +34 -0
@@ -0,0 +1,34 @@
1
+ # -*- coding: utf-8 -*-
2
+ #
3
+ #--
4
+ # Copyright (C) 2009 Thomas Leitner <t_leitner@gmx.at>
5
+ #
6
+ # This file is part of kramdown.
7
+ #
8
+ # kramdown is free software: you can redistribute it and/or modify
9
+ # it under the terms of the GNU General Public License as published by
10
+ # the Free Software Foundation, either version 3 of the License, or
11
+ # (at your option) any later version.
12
+ #
13
+ # This program is distributed in the hope that it will be useful,
14
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
15
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16
+ # GNU General Public License for more details.
17
+ #
18
+ # You should have received a copy of the GNU General Public License
19
+ # along with this program. If not, see <http://www.gnu.org/licenses/>.
20
+ #++
21
+ #
22
+
23
+ # All the code in this file is backported from Ruby 1.8.7 sothat kramdown works under 1.8.5
24
+
25
+ if RUBY_VERSION == '1.8.5'
26
+ require 'rexml/parsers/baseparser'
27
+ module REXML
28
+ module Parsers
29
+ class BaseParser
30
+ UNAME_STR= "(?:#{NCNAME_STR}:)?#{NCNAME_STR}"
31
+ end
32
+ end
33
+ end
34
+ end
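The new file keys the backport on an exact `RUBY_VERSION` string. As a point of comparison only, here is a minimal sketch of a feature-based guard that defines the constant whenever it is missing. This is not what kramdown does: `DemoParser` and its simplified `NCNAME_STR` pattern are hypothetical stand-ins for `REXML::Parsers::BaseParser`, used so the example runs on its own.

```ruby
# Hypothetical stand-in for REXML::Parsers::BaseParser on an old Ruby
# that lacks the UNAME_STR constant.
class DemoParser
  NCNAME_STR = '[A-Za-z_][\w.-]*' # simplified name pattern, an assumption
end

# Define the constant only when it is missing, instead of comparing
# RUBY_VERSION against a fixed string.
unless DemoParser.const_defined?(:UNAME_STR)
  class DemoParser
    UNAME_STR = "(?:#{NCNAME_STR}:)?#{NCNAME_STR}"
  end
end

# An optional "prefix:" followed by a name, anchored to the whole string.
UNAME_RE = /\A#{DemoParser::UNAME_STR}\z/
```

With these assumed patterns, `UNAME_RE` accepts both plain names such as `div` and prefixed names such as `xml:lang`. A feature check like this keeps working if another interpreter version also lacks the constant, at the cost of being less explicit about which version is targeted.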
data/lib/kramdown/converter.rb +26 -8
@@ -118,10 +118,6 @@ module Kramdown
118
118
  "#{' '*indent}<dt#{options_for_element(el)}>#{inner}</dt>\n"
119
119
  end
120
120
 
121
- def convert_html_raw(el, inner, indent)
122
- el.value + (el.options[:type] == :block ? "\n" : '')
123
- end
124
-
125
121
  HTML_TAGS_WITH_BODY=['div', 'script']
126
122
 
127
123
  def convert_html_element(el, inner, indent)
@@ -131,12 +127,10 @@ module Kramdown
131
127
  "<#{el.value}#{options_for_element(el)}" << (!inner.empty? ? ">#{inner}</#{el.value}>" : " />")
132
128
  else
133
129
  output = ''
134
- output << ' '*indent if !el.options[:no_start_indent] && el.options[:parse_type] != :raw && !el.options[:parent_is_raw]
130
+ output << ' '*indent if el.options[:parse_type] != :raw && !el.options[:parent_is_raw]
135
131
  output << "<#{el.value}#{options_for_element(el)}"
136
- if !inner.empty? && (el.options[:compact] || el.options[:parse_type] != :block)
132
+ if !inner.empty? && el.options[:parse_type] != :block
137
133
  output << ">#{inner}</#{el.value}>"
138
- elsif !inner.empty? && (el.children.first.type == :text || el.children.first.options[:no_start_indent])
139
- output << ">#{inner}" << ' '*indent << "</#{el.value}>"
140
134
  elsif !inner.empty?
141
135
  output << ">\n#{inner}" << ' '*indent << "</#{el.value}>"
142
136
  elsif HTML_TAGS_WITH_BODY.include?(el.value)
@@ -149,11 +143,27 @@ module Kramdown
149
143
  end
150
144
  end
151
145
 
146
+ def convert_html_text(el, inner, indent)
147
+ escape_html(el.value, false)
148
+ end
149
+
150
+ def convert_xml_comment(el, inner, indent)
151
+ el.value + (el.options[:type] == :block ? "\n" : '')
152
+ end
153
+ alias :convert_xml_pi :convert_xml_comment
154
+
152
155
  def convert_br(el, inner, indent)
153
156
  "<br />"
154
157
  end
155
158
 
156
159
  def convert_a(el, inner, indent)
160
+ if el.options[:attr]['href'] =~ /^mailto:/
161
+ el = Marshal.load(Marshal.dump(el)) # so that the original is not changed
162
+ href = obfuscate(el.options[:attr]['href'].sub(/^mailto:/, ''))
163
+ mailto = obfuscate('mailto')
164
+ el.options[:attr]['href'] = "#{mailto}:#{href}"
165
+ end
166
+ inner = obfuscate(inner) if el.options[:obfuscate_text]
157
167
  "<a#{options_for_element(el)}>#{inner}</a>"
158
168
  end
159
169
 
@@ -198,6 +208,14 @@ module Kramdown
198
208
  inner << footnote_content
199
209
  end
200
210
 
211
+ # Helper method for obfuscating the +text+ by using HTML entities.
212
+ def obfuscate(text)
213
+ result = ""
214
+ text.each_byte do |b|
215
+ result += (b > 128 ? b.chr : "&#%03d;" % b)
216
+ end
217
+ result
218
+ end
201
219
 
202
220
  # Return a HTML list with the footnote content for the used footnotes.
203
221
  def footnote_content
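The `convert_a` and `obfuscate` changes above can be tried in isolation. The following is a minimal sketch, not the converter itself: `Link` is a hypothetical stand-in for a kramdown element, and `obfuscated_href` is an illustrative helper that does not exist in kramdown. It shows the two ideas in the hunk: encoding each byte as a decimal HTML entity, and deep-copying the element with `Marshal` so the original is left unchanged.

```ruby
# Hypothetical stand-in for a kramdown element; only the options hash matters here.
Link = Struct.new(:options)

# Encode each byte up to 128 as a zero-padded decimal HTML entity;
# bytes above 128 are passed through unchanged, as in the diff.
def obfuscate(text)
  result = String.new
  text.each_byte do |b|
    result << (b > 128 ? b.chr : format("&#%03d;", b))
  end
  result
end

# Illustrative helper (an assumption, not kramdown API): return the obfuscated
# href of a mailto link without mutating the element that was passed in.
def obfuscated_href(el)
  href = el.options[:attr]['href']
  return href unless href =~ /^mailto:/
  copy = Marshal.load(Marshal.dump(el)) # deep copy, so the caller's element is untouched
  address = obfuscate(href.sub(/^mailto:/, ''))
  copy.options[:attr]['href'] = "#{obfuscate('mailto')}:#{address}"
  copy.options[:attr]['href']
end

link = Link.new({ :attr => { 'href' => 'mailto:a@b' } })
encoded = obfuscated_href(link)
```

Here `encoded` holds the entity-encoded scheme and address joined by a literal colon, while `link` still carries the plain `mailto:a@b` value. The `Marshal` round trip copies the nested options hash as well, which a shallow `dup` would not.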
data/lib/kramdown/document.rb +2 -1
@@ -20,12 +20,13 @@
20
20
  #++
21
21
  #
22
22
 
23
+ require 'kramdown/compatibility'
24
+
23
25
  require 'kramdown/version'
24
26
  require 'kramdown/error'
25
27
  require 'kramdown/parser'
26
28
  require 'kramdown/converter'
27
29
  require 'kramdown/extension'
28
- require 'kramdown/deprecated'
29
30
 
30
31
  module Kramdown
31
32
 
data/lib/kramdown/parser.rb +1 -1192
@@ -20,1204 +20,13 @@
20
20
  #++
21
21
  #
22
22
 
23
- require 'strscan'
24
- require 'stringio'
25
- require 'kramdown/parser/registry'
26
-
27
- #TODO: use [[:alpha:]] in all regexp to allow parsing of international values in 1.9.1
28
- #NOTE: use @src.pre_match only before other check/match?/... operations, otherwise the content is changed
29
-
30
23
  module Kramdown
31
24
 
32
25
  # This module contains all available parsers. Currently, there is only one parser for parsing
33
26
  # documents in kramdown format.
34
27
  module Parser
35
28
 
36
- # Used for parsing a document in kramdown format.
37
- class Kramdown
38
-
39
- include ::Kramdown
40
-
41
- attr_reader :tree
42
- attr_reader :doc
43
-
44
- # Create a new Kramdown parser object for the Kramdown::Document +doc+.
45
- def initialize(doc)
46
- @doc = doc
47
- @src = nil
48
- @tree = nil
49
- @unclosed_html_tags = []
50
- @stack = []
51
- @used_ids = {}
52
- @doc.parse_infos[:ald] = {}
53
- @doc.parse_infos[:link_defs] = {}
54
- @doc.parse_infos[:footnotes] = {}
55
- end
56
- private_class_method(:new, :allocate)
57
-
58
-
59
- # Parse the string +source+ using the Kramdown::Document +doc+ and return the parse tree.
60
- def self.parse(source, doc)
61
- new(doc).parse(source)
62
- end
63
-
64
- # The source string provided on initialization is parsed and the created +tree+ is returned.
65
- def parse(source)
66
- configure_parser
67
- tree = Element.new(:root)
68
- parse_blocks(tree, adapt_source(source))
69
- update_tree(tree)
70
- @doc.parse_infos[:footnotes].each do |name, data|
71
- update_tree(data[:content])
72
- end
73
- tree
74
- end
75
-
76
- # Add the given warning +text+ to the warning array of the Kramdown document.
77
- def warning(text)
78
- @doc.warnings << text
79
- #TODO: add position information
80
- end
81
-
82
- #######
83
- private
84
- #######
85
-
86
- BLOCK_PARSERS = [:blank_line, :codeblock, :codeblock_fenced, :blockquote, :atx_header,
87
- :setext_header, :horizontal_rule, :list, :definition_list, :link_definition, :block_html,
88
- :footnote_definition, :ald, :block_ial, :extension_block, :eob_marker, :paragraph]
89
- SPAN_PARSERS = [:emphasis, :codespan, :autolink, :span_html, :footnote_marker, :link,
90
- :span_ial, :html_entity, :typographic_syms, :line_break, :escaped_chars]
91
-
92
- # Adapt the object to allow parsing like specified in the options.
93
- def configure_parser
94
- @parsers = {}
95
- BLOCK_PARSERS.each do |name|
96
- if Registry.has_parser?(name, :block)
97
- extend(Registry.parser(name).module)
98
- @parsers[name] = Registry.parser(name)
99
- else
100
- raise Kramdown::Error, "Unknown block parser: #{name}"
101
- end
102
- end
103
- SPAN_PARSERS.each do |name|
104
- if Registry.has_parser?(name, :span)
105
- extend(Registry.parser(name).module)
106
- @parsers[name] = Registry.parser(name)
107
- else
108
- raise Kramdown::Error, "Unknown span parser: #{name}"
109
- end
110
- end
111
- @span_start = Regexp.union(*SPAN_PARSERS.map {|name| @parsers[name].start_re})
112
- @span_start_re = /(?=#{@span_start})/
113
- end
114
-
115
- # Parse all block level elements in +text+ (a string or a StringScanner object) into the
116
- # element +el+.
117
- def parse_blocks(el, text)
118
- @stack.push([@tree, @src, @unclosed_html_tags])
119
- @tree, @src, @unclosed_html_tags = el, StringScanner.new(text), []
120
-
121
- while !@src.eos?
122
- BLOCK_PARSERS.any? do |name|
123
- if @src.check(@parsers[name].start_re)
124
- send(@parsers[name].method)
125
- else
126
- false
127
- end
128
- end || begin
129
- warning('Warning: this should not occur - no block parser handled the line')
130
- add_text(@src.scan(/.*\n/))
131
- end
132
- end
133
-
134
- @unclosed_html_tags.reverse.each do |tag|
135
- warning("Automatically closing unclosed html tag '#{tag.value}'")
136
- end
137
-
138
- @tree, @src, @unclosed_html_tags = *@stack.pop
139
- end
140
-
141
- # Update the tree by parsing all <tt>:text</tt> elements with the span level parser (resets
142
- # +@tree+, +@src+ and the +@stack+) and by updating the attributes from the IALs.
143
- def update_tree(element)
144
- element.children.map! do |child|
145
- if child.type == :text
146
- @stack, @tree = [], nil
147
- @src = StringScanner.new(child.value)
148
- parse_spans(child)
149
- child.children
150
- else
151
- update_tree(child)
152
- update_attr_with_ial(child.options[:attr] ||= {}, child.options[:ial]) if child.options[:ial]
153
- child
154
- end
155
- end.flatten!
156
- end
157
-
158
- # Parse all span level elements in the source string.
159
- def parse_spans(el, stop_re = nil)
160
- @stack.push(@tree)
161
- @tree = el
162
-
163
- used_re = (stop_re.nil? ? @span_start_re : /(?=#{Regexp.union(stop_re, @span_start)})/)
164
- stop_re_found = false
165
- while !@src.eos? && !stop_re_found
166
- if result = @src.scan_until(used_re)
167
- add_text(result)
168
- if stop_re && (stop_re_matched = @src.check(stop_re))
169
- stop_re_found = (block_given? ? yield : true)
170
- end
171
- processed = SPAN_PARSERS.any? do |name|
172
- if @src.check(@parsers[name].start_re)
173
- send(@parsers[name].method)
174
- true
175
- else
176
- false
177
- end
178
- end unless stop_re_found
179
- if !processed && !stop_re_found
180
- if stop_re_matched
181
- add_text(@src.scan(/./))
182
- else
183
- raise Kramdown::Error, 'Bug: please report!'
184
- end
185
- end
186
- else
187
- add_text(@src.scan_until(/.*/m)) unless stop_re
188
- break
189
- end
190
- end
191
-
192
- @tree = @stack.pop
193
-
194
- stop_re_found
195
- end
196
-
197
- # Modify the string +source+ to be usable by the parser.
198
- def adapt_source(source)
199
- source.gsub(/\r\n?/, "\n").chomp + "\n"
200
- end
201
-
202
- # This helper method adds the given +text+ either to the last element in the +tree+ if it is a
203
- # text element or creates a new text element.
204
- def add_text(text, tree = @tree)
205
- if tree.children.last && tree.children.last.type == :text
206
- tree.children.last.value << text
207
- elsif !text.empty?
208
- tree.children << Element.new(:text, text)
209
- end
210
- end
211
-
212
- end
213
-
214
-
215
- module ParserMethods
216
-
217
- INDENT = /^(?:\t| {4})/
218
- OPT_SPACE = / {0,3}/
219
-
220
-
221
- # Parse the string +str+ and extract all attributes and add all found attributes to the hash
222
- # +opts+.
223
- def parse_attribute_list(str, opts)
224
- str.scan(ALD_TYPE_ANY).each do |key, sep, val, id_attr, class_attr, ref|
225
- if ref
226
- (opts[:refs] ||= []) << ref
227
- elsif class_attr
228
- opts['class'] = ((opts['class'] || '') + " #{class_attr}").lstrip
229
- elsif id_attr
230
- opts['id'] = id_attr
231
- else
232
- opts[key] = val.gsub(/\\(\}|#{sep})/, "\\1")
233
- end
234
- end
235
- end
236
-
237
- # Update the +ial+ with the information from the inline attribute list +opts+.
238
- def update_ial_with_ial(ial, opts)
239
- (ial[:refs] ||= []) << opts[:refs]
240
- ial['class'] = ((ial['class'] || '') + " #{opts['class']}").lstrip if opts['class']
241
- opts.each {|k,v| ial[k] = v if k != :refs && k != 'class' }
242
- end
243
-
244
- # Update the attributes with the information from the inline attribute list and all referenced ALDs.
245
- def update_attr_with_ial(attr, ial)
246
- ial[:refs].each do |ref|
247
- update_attr_with_ial(attr, ref) if ref = @doc.parse_infos[:ald][ref]
248
- end if ial[:refs]
249
- attr['class'] = ((attr['class'] || '') + " #{ial['class']}").lstrip if ial['class']
250
- ial.each {|k,v| attr[k] = v if k.kind_of?(String) && k != 'class' }
251
- end
252
-
253
- # Generate an alpha-numeric ID from the the string +str+.
254
- def generate_id(str)
255
- gen_id = str.gsub(/[^a-zA-Z0-9 -]/, '').gsub(/^[^a-zA-Z]*/, '').gsub(' ', '-').downcase
256
- gen_id = 'section' if gen_id.length == 0
257
- if @used_ids.has_key?(gen_id)
258
- gen_id += '-' + (@used_ids[gen_id] += 1).to_s
259
- else
260
- @used_ids[gen_id] = 0
261
- end
262
- gen_id
263
- end
264
-
265
- # Helper method for obfuscating the +email+ address by using HTML entities.
266
- def obfuscate_email(email)
267
- result = ""
268
- email.each_byte do |b|
269
- result += (b > 128 ? b.chr : "&#%03d;" % b)
270
- end
271
- result
272
- end
273
-
274
-
275
- BLANK_LINE = /(?:^\s*\n)+/
276
-
277
- # Parse the blank line at the current postition.
278
- def parse_blank_line
279
- @src.pos += @src.matched_size
280
- if @tree.children.last && @tree.children.last.type == :blank
281
- @tree.children.last.value += @src.matched
282
- else
283
- @tree.children << Element.new(:blank, @src.matched)
284
- end
285
- true
286
- end
287
- Registry.define_parser(:block, :blank_line, BLANK_LINE, self)
288
-
289
-
290
- EOB_MARKER = /^\^\s*?\n/
291
-
292
- # Parse the EOB marker at the current location.
293
- def parse_eob_marker
294
- @src.pos += @src.matched_size
295
- @tree.children << Element.new(:eob)
296
- true
297
- end
298
- Registry.define_parser(:block, :eob_marker, EOB_MARKER, self)
299
-
300
-
301
- PARAGRAPH_START = /^#{OPT_SPACE}[^ \t].*?\n/
302
-
303
- # Parse the paragraph at the current location.
304
- def parse_paragraph
305
- @src.pos += @src.matched_size
306
- if @tree.children.last && @tree.children.last.type == :p
307
- @tree.children.last.children.first.value << "\n" << @src.matched.chomp
308
- else
309
- @tree.children << Element.new(:p)
310
- add_text(@src.matched.lstrip.chomp, @tree.children.last)
311
- end
312
- true
313
- end
314
- Registry.define_parser(:block, :paragraph, PARAGRAPH_START, self)
315
-
316
- HEADER_ID=/(?:[ \t]\{#((?:\w|\d)[\w\d-]*)\})?/
317
- SETEXT_HEADER_START = /^(#{OPT_SPACE}[^ \t].*?)#{HEADER_ID}[ \t]*?\n(-|=)+\s*?\n/
318
-
319
- # Parse the Setext header at the current location.
320
- def parse_setext_header
321
- if @tree.children.last && @tree.children.last.type != :blank
322
- return false
323
- end
324
- @src.pos += @src.matched_size
325
- text, id, level = @src[1].strip, @src[2], @src[3]
326
- el = Element.new(:header, nil, :level => (level == '-' ? 2 : 1))
327
- add_text(text, el)
328
- el.options[:attr] = {'id' => id} if id
329
- el.options[:attr] = {'id' => generate_id(text)} if @doc.options[:auto_ids] && !id
330
- @tree.children << el
331
- true
332
- end
333
- Registry.define_parser(:block, :setext_header, SETEXT_HEADER_START, self)
334
-
335
-
336
- ATX_HEADER_START = /^\#{1,6}/
337
- ATX_HEADER_MATCH = /^(\#{1,6})(.+?)\s*?#*#{HEADER_ID}\s*?\n/
338
-
339
- # Parse the Atx header at the current location.
340
- def parse_atx_header
341
- if @tree.children.last && @tree.children.last.type != :blank
342
- return false
343
- end
344
- result = @src.scan(ATX_HEADER_MATCH)
345
- level, text, id = @src[1], @src[2].strip, @src[3]
346
- el = Element.new(:header, nil, :level => level.length)
347
- add_text(text, el)
348
- el.options[:attr] = {'id' => id} if id
349
- el.options[:attr] = {'id' => generate_id(text)} if @doc.options[:auto_ids] && !id
350
- @tree.children << el
351
- true
352
- end
353
- Registry.define_parser(:block, :atx_header, ATX_HEADER_START, self)
354
-
355
-
356
- BLOCKQUOTE_START = /^#{OPT_SPACE}> ?/
357
- BLOCKQUOTE_MATCH = /(^#{OPT_SPACE}>.*?\n)+/
358
-
359
- # Parse the blockquote at the current location.
360
- def parse_blockquote
361
- result = @src.scan(BLOCKQUOTE_MATCH).gsub(BLOCKQUOTE_START, '')
362
- el = Element.new(:blockquote)
363
- @tree.children << el
364
- parse_blocks(el, result)
365
- true
366
- end
367
- Registry.define_parser(:block, :blockquote, BLOCKQUOTE_START, self)
368
-
369
-
370
- CODEBLOCK_START = INDENT
371
- CODEBLOCK_MATCH = /(?:#{INDENT}.*?\S.*?\n)+/
372
-
373
- # Parse the indented codeblock at the current location.
374
- def parse_codeblock
375
- result = @src.scan(CODEBLOCK_MATCH).gsub(INDENT, '')
376
- children = @tree.children
377
- if children.length >= 2 && children[-1].type == :blank && children[-2].type == :codeblock
378
- children[-2].value << children[-1].value.gsub(INDENT, '') << result
379
- children.pop
380
- else
381
- @tree.children << Element.new(:codeblock, result)
382
- end
383
- true
384
- end
385
- Registry.define_parser(:block, :codeblock, CODEBLOCK_START, self)
386
-
387
-
388
- FENCED_CODEBLOCK_START = /^~{3,}/
389
- FENCED_CODEBLOCK_MATCH = /^(~{3,})\s*?\n(.*?)^\1~*\s*?\n/m
390
-
391
- # Parse the fenced codeblock at the current location.
392
- def parse_codeblock_fenced
393
- if @src.check(FENCED_CODEBLOCK_MATCH)
394
- @src.pos += @src.matched_size
395
- @tree.children << Element.new(:codeblock, @src[2])
396
- true
397
- else
398
- false
399
- end
400
- end
401
- Registry.define_parser(:block, :codeblock_fenced, FENCED_CODEBLOCK_START, self)
402
-
403
-
404
- HR_START = /^#{OPT_SPACE}(\*|-|_)[ \t]*\1[ \t]*\1[ \t]*(\1|[ \t])*\n/
405
-
406
- # Parse the horizontal rule at the current location.
407
- def parse_horizontal_rule
408
- @src.pos += @src.matched_size
409
- @tree.children << Element.new(:hr)
410
- true
411
- end
412
- Registry.define_parser(:block, :horizontal_rule, HR_START, self)
413
-
414
-
415
- LIST_START_UL = /^(#{OPT_SPACE}[+*-])([\t| ].*?\n)/
416
- LIST_START_OL = /^(#{OPT_SPACE}\d+\.)([\t| ].*?\n)/
417
- LIST_START = /#{LIST_START_UL}|#{LIST_START_OL}/
418
-
419
- # Parse the ordered or unordered list at the current location.
420
- def parse_list
421
- if @tree.children.last && @tree.children.last.type == :p # last element must not be a paragraph
422
- return false
423
- end
424
-
425
- type, list_start_re = (@src.check(LIST_START_UL) ? [:ul, LIST_START_UL] : [:ol, LIST_START_OL])
426
- list = Element.new(type)
427
-
428
- item = nil
429
- indent_re = nil
430
- content_re = nil
431
- eob_found = false
432
- nested_list_found = false
433
- while !@src.eos?
434
- if @src.check(HR_START)
435
- break
436
- elsif @src.scan(list_start_re)
437
- item = Element.new(:li)
438
- item.value, indentation, content_re, indent_re = parse_first_list_line(@src[1].length, @src[2])
439
- list.children << item
440
-
441
- list_start_re = (type == :ul ? /^( {0,#{[3, indentation - 1].min}}[+*-])([\t| ].*?\n)/ :
442
- /^( {0,#{[3, indentation - 1].min}}\d+\.)([\t| ].*?\n)/)
443
- nested_list_found = false
444
- elsif result = @src.scan(content_re)
445
- result.sub!(/^(\t+)/) { " "*4*($1 ? $1.length : 0) }
446
- result.sub!(indent_re, '')
447
- if !nested_list_found && result =~ LIST_START
448
- parse_blocks(item, item.value)
449
- if item.children.length == 1 && item.children.first.type == :p
450
- item.value = ''
451
- else
452
- item.children.clear
453
- end
454
- nested_list_found = true
455
- end
456
- item.value << result
457
- elsif result = @src.scan(BLANK_LINE)
458
- nested_list_found = true
459
- item.value << result
460
- elsif @src.scan(EOB_MARKER)
461
- eob_found = true
462
- break
463
- else
464
- break
465
- end
466
- end
467
-
468
- @tree.children << list
469
-
470
- last = nil
471
- list.children.each do |item|
472
- temp = Element.new(:temp)
473
- parse_blocks(temp, item.value)
474
- item.children += temp.children
475
- item.value = nil
476
- next if item.children.size == 0
477
-
478
- if item.children.first.type == :p && (item.children.length < 2 || item.children[1].type != :blank ||
479
- (item == list.children.last && item.children.length == 2 && !eob_found))
480
- text = item.children.shift.children.first
481
- text.value += "\n" if !item.children.empty? && item.children[0].type != :blank
482
- item.children.unshift(text)
483
- else
484
- item.options[:first_is_block] = true
485
- end
486
-
487
- if item.children.last.type == :blank
488
- last = item.children.pop
489
- else
490
- last = nil
491
- end
492
- end
493
-
494
- @tree.children << last if !last.nil? && !eob_found
495
-
496
- true
497
- end
498
- Registry.define_parser(:block, :list, LIST_START, self)
499
-
500
- def parse_first_list_line(indentation, content)
501
- if content =~ /^\s*\n/
502
- indentation = 4
503
- else
504
- while content =~ /^ *\t/
505
- temp = content.scan(/^ */).first.length + indentation
506
- content.sub!(/^( *)(\t+)/) {$1 + " "*(4 - (temp % 4)) + " "*($2.length - 1)*4}
507
- end
508
- indentation += content.scan(/^ */).first.length
509
- end
510
- content.sub!(/^\s*/, '')
511
-
512
- indent_re = /^ {#{indentation}}/
513
- content_re = /^(?:(?:\t| {4}){#{indentation / 4}} {#{indentation % 4}}|(?:\t| {4}){#{indentation / 4 + 1}}).*?\n/
514
- [content, indentation, content_re, indent_re]
515
- end
516
-
517
-
518
- DEFINITION_LIST_START = /^(#{OPT_SPACE}:)([\t| ].*?\n)/
519
-
520
- # Parse the ordered or unordered list at the current location.
521
- def parse_definition_list
522
- children = @tree.children
523
- if !children.last || (children.length == 1 && children.last.type != :p ) ||
524
- (children.length >= 2 && children[-1].type != :p && (children[-1].type != :blank || children[-1].value != "\n" || children[-2].type != :p))
525
- return false
526
- end
527
-
528
- first_as_para = false
529
- deflist = Element.new(:dl)
530
- para = @tree.children.pop
531
- if para.type == :blank
532
- para = @tree.children.pop
533
- first_as_para = true
534
- end
535
- para.children.first.value.split("\n").each do |term|
536
- el = Element.new(:dt)
537
- el.children << Element.new(:text, term)
538
- deflist.children << el
539
- end
540
-
541
- item = nil
542
- indent_re = nil
543
- content_re = nil
544
- def_start_re = DEFINITION_LIST_START
545
- while !@src.eos?
546
- if @src.scan(def_start_re)
547
- item = Element.new(:dd)
548
- item.options[:first_as_para] = first_as_para
549
- item.value, indentation, content_re, indent_re = parse_first_list_line(@src[1].length, @src[2])
550
- deflist.children << item
551
-
552
- def_start_re = /^( {0,#{[3, indentation - 1].min}}:)([\t| ].*?\n)/
553
- first_as_para = false
554
- elsif result = @src.scan(content_re)
555
- result.sub!(/^(\t+)/) { " "*4*($1 ? $1.length : 0) }
556
- result.sub!(indent_re, '')
557
- item.value << result
558
- first_as_para = false
559
- elsif result = @src.scan(BLANK_LINE)
560
- first_as_para = true
561
- item.value << result
562
- else
563
- break
564
- end
565
- end
566
-
567
- last = nil
568
- deflist.children.each do |item|
569
- next if item.type == :dt
570
-
571
- parse_blocks(item, item.value)
572
- item.value = nil
573
- next if item.children.size == 0
574
-
575
- if item.children.last.type == :blank
576
- last = item.children.pop
577
- else
578
- last = nil
579
- end
580
- if item.children.first.type == :p && !item.options.delete(:first_as_para)
581
- text = item.children.shift.children.first
582
- text.value += "\n" if !item.children.empty?
583
- item.children.unshift(text)
584
- else
585
- item.options[:first_is_block] = true
586
- end
587
- end
588
-
589
- if @tree.children.length >= 1 && @tree.children.last.type == :dl
590
- @tree.children[-1].children += deflist.children
591
- elsif @tree.children.length >= 2 && @tree.children[-1].type == :blank && @tree.children[-2].type == :dl
592
- @tree.children.pop
593
- @tree.children[-1].children += deflist.children
594
- else
595
- @tree.children << deflist
596
- end
597
-
598
- @tree.children << last if !last.nil?
599
-
600
- true
601
- end
602
- Registry.define_parser(:block, :definition_list, DEFINITION_LIST_START, self)
603
-
604
-
605
- PUNCTUATION_CHARS = "_.:,;!?-"
606
- LINK_ID_CHARS = /[a-zA-Z0-9 #{PUNCTUATION_CHARS}]/
607
- LINK_ID_NON_CHARS = /[^a-zA-Z0-9 #{PUNCTUATION_CHARS}]/
608
- LINK_DEFINITION_START = /^#{OPT_SPACE}\[(#{LINK_ID_CHARS}+)\]:[ \t]*(?:<(.*?)>|([^\s]+))[ \t]*?(?:\n?[ \t]*?(["'])(.+?)\4[ \t]*?)?\n/
609
-
610
- # Parse the link definition at the current location.
611
- def parse_link_definition
612
- @src.pos += @src.matched_size
613
- link_id, link_url, link_title = @src[1].downcase, @src[2] || @src[3], @src[5]
614
- warning("Duplicate link ID '#{link_id}' - overwriting") if @doc.parse_infos[:link_defs][link_id]
615
- @doc.parse_infos[:link_defs][link_id] = [link_url, link_title]
616
- true
617
- end
618
- Registry.define_parser(:block, :link_definition, LINK_DEFINITION_START, self)
619
-
620
-
621
- ALD_ID_CHARS = /[\w\d-]/
622
- ALD_ANY_CHARS = /\\\}|[^\}]/
623
- ALD_ID_NAME = /(?:\w|\d)#{ALD_ID_CHARS}*/
624
- ALD_TYPE_KEY_VALUE_PAIR = /(#{ALD_ID_NAME})=("|')((?:\\\}|\\\2|[^\}\2])+?)\2/
625
- ALD_TYPE_CLASS_NAME = /\.(#{ALD_ID_NAME})/
626
- ALD_TYPE_ID_NAME = /#(#{ALD_ID_NAME})/
627
- ALD_TYPE_REF = /(#{ALD_ID_NAME})/
628
- ALD_TYPE_ANY = /(?:\A|\s)(?:#{ALD_TYPE_KEY_VALUE_PAIR}|#{ALD_TYPE_ID_NAME}|#{ALD_TYPE_CLASS_NAME}|#{ALD_TYPE_REF})(?=\s|\Z)/
629
- ALD_START = /^#{OPT_SPACE}\{:(#{ALD_ID_NAME}):(#{ALD_ANY_CHARS}+)\}\s*?\n/
630
-
631
- # Parse the attribute list definition at the current location.
632
- def parse_ald
633
- @src.pos += @src.matched_size
634
- parse_attribute_list(@src[2], @doc.parse_infos[:ald][@src[1]] ||= {})
635
- true
636
- end
637
- Registry.define_parser(:block, :ald, ALD_START, self)
638
-
639
-
640
- IAL_BLOCK_START = /^#{OPT_SPACE}\{:(?!:)(#{ALD_ANY_CHARS}+)\}\s*?\n/
641
-
642
- # Parse the inline attribute list at the current location.
643
- def parse_block_ial
644
- @src.pos += @src.matched_size
645
- if @tree.children.last && @tree.children.last.type != :blank
646
- parse_attribute_list(@src[1], @tree.children.last.options[:ial] ||= {})
647
- end
648
- true
649
- end
650
- Registry.define_parser(:block, :block_ial, IAL_BLOCK_START, self)
651
-
652
-
653
- EXT_BLOCK_START_STR = "^#{OPT_SPACE}\\{::(%s):(:)?(#{ALD_ANY_CHARS}*)\\}\s*?\n"
654
- EXT_BLOCK_START = /#{EXT_BLOCK_START_STR % ALD_ID_NAME}/
655
-
656
- # Parse the extension block at the current location.
657
- def parse_extension_block
658
- @src.pos += @src.matched_size
659
-
660
- ext = @src[1]
661
- opts = {}
662
- body = nil
663
- parse_attribute_list(@src[3], opts)
664
-
665
- if !@doc.extension.public_methods.map {|m| m.to_s}.include?("parse_#{ext}")
666
- warning("No extension named '#{ext}' found - ignoring extension block")
667
- body = :invalid
668
- end
669
-
670
- if !@src[2]
671
- stop_re = /#{EXT_BLOCK_START_STR % ext}/
672
- if result = @src.scan_until(stop_re)
673
- parse_attribute_list(@src[3], opts)
674
- body = result.sub!(stop_re, '') if body != :invalid
675
- else
676
- body = :invalid
677
- warning("No ending line for extension block '#{ext}' found - ignoring extension block")
678
- end
679
- end
680
-
681
- @doc.extension.send("parse_#{ext}", self, opts, body) if body != :invalid
682
-
683
- true
684
- end
685
- Registry.define_parser(:block, :extension_block, EXT_BLOCK_START, self)
686
-
687
-
688
- FOOTNOTE_DEFINITION_START = /^#{OPT_SPACE}\[\^(#{ALD_ID_NAME})\]:\s*?(.*?\n(?:#{BLANK_LINE}?#{CODEBLOCK_MATCH})*)/
689
-
690
- # Parse the foot note definition at the current location.
691
- def parse_footnote_definition
692
- @src.pos += @src.matched_size
693
-
694
- el = Element.new(:footnote_def)
695
- parse_blocks(el, @src[2].gsub(INDENT, ''))
696
- warning("Duplicate footnote name '#{@src[1]}' - overwriting") if @doc.parse_infos[:footnotes][@src[1]]
697
- (@doc.parse_infos[:footnotes][@src[1]] = {})[:content] = el
698
- end
699
- Registry.define_parser(:block, :footnote_definition, FOOTNOTE_DEFINITION_START, self)
700
-
701
-
702
- require 'rexml/parsers/baseparser'
703
-
704
- #:stopdoc:
705
- # The following regexps are based on the ones used by REXML, with some slight modifications.
706
- #:startdoc:
707
- HTML_COMMENT_RE = /<!--(.*?)-->/m
708
- HTML_INSTRUCTION_RE = /<\?(.*?)\?>/m
709
- HTML_ATTRIBUTE_RE = /\s*(#{REXML::Parsers::BaseParser::UNAME_STR})\s*=\s*(["'])(.*?)\2/m
710
- HTML_TAG_RE = /<((?>#{REXML::Parsers::BaseParser::UNAME_STR}))\s*((?>\s+#{REXML::Parsers::BaseParser::UNAME_STR}\s*=\s*(["']).*?\3)*)\s*(\/)?>/m
711
- HTML_TAG_CLOSE_RE = /<\/(#{REXML::Parsers::BaseParser::NAME_STR})\s*>/
712
-
713
-
714
- HTML_PARSE_AS_BLOCK = %w{applet button blockquote colgroup dd div dl fieldset form iframe li
715
- map noscript object ol table tbody td th thead tfoot tr ul}
716
- HTML_PARSE_AS_SPAN = %w{a abbr acronym address b bdo big cite caption code del dfn dt em
717
- h1 h2 h3 h4 h5 h6 i ins kbd label legend optgroup p pre q rb rbc
718
- rp rt rtc ruby samp select small span strong sub sup tt var}
719
- HTML_PARSE_AS_RAW = %w{script math option textarea}
720
-
721
- HTML_PARSE_AS = Hash.new {|h,k| h[k] = :raw}
722
- HTML_PARSE_AS_BLOCK.each {|i| HTML_PARSE_AS[i] = :block}
723
- HTML_PARSE_AS_SPAN.each {|i| HTML_PARSE_AS[i] = :span}
724
- HTML_PARSE_AS_RAW.each {|i| HTML_PARSE_AS[i] = :raw}
725
-
726
- #:stopdoc:
727
- # Some HTML elements like script belong to both categories (i.e. are valid in block and
728
- # span HTML) and don't appear therefore!
729
- #:startdoc:
730
- HTML_SPAN_ELEMENTS = %w{a abbr acronym b big bdo br button cite code del dfn em i img input
731
- ins kbd label option q rb rbc rp rt rtc ruby samp select small span
732
- strong sub sup textarea tt var}
733
- HTML_BLOCK_ELEMENTS = %w{address applet button blockquote caption col colgroup dd div dl dt fieldset
734
- form h1 h2 h3 h4 h5 h6 hr iframe legend li map ol optgroup p pre table tbody
735
- td th thead tfoot tr ul}
736
- HTML_ELEMENTS_WITHOUT_BODY = %w{area br col hr img input}
737
-
738
- HTML_BLOCK_START = /^#{OPT_SPACE}<(#{REXML::Parsers::BaseParser::UNAME_STR}|\?|!--|\/)/
739
-
740
- # Parse the HTML at the current position as block level HTML.
741
- def parse_block_html
742
- if result = @src.scan(HTML_COMMENT_RE)
743
- @tree.children << Element.new(:html_raw, result, :type => :block)
744
- @src.scan(/.*?\n/)
745
- true
746
- elsif result = @src.scan(HTML_INSTRUCTION_RE)
747
- @tree.children << Element.new(:html_raw, result, :type => :block)
748
- @src.scan(/.*?\n/)
749
- true
750
- else
751
- if (!@src.check(/^#{OPT_SPACE}#{HTML_TAG_RE}/) && !@src.check(/^#{OPT_SPACE}#{HTML_TAG_CLOSE_RE}/)) ||
752
- HTML_SPAN_ELEMENTS.include?(@src[1])
753
- if @tree.type == :html_element && @tree.options[:parse_type] != :block
754
- add_html_text(@src.scan(/.*?\n/), @tree)
755
- add_html_text(@src.scan_until(/(?=#{HTML_BLOCK_START})|\Z/), @tree)
756
- return true
757
- else
758
- return false
759
- end
760
- end
761
-
762
- current_el = (@tree.type == :html_element ? @tree : nil)
763
- @src.scan(/^(#{OPT_SPACE})(.*?)\n/)
764
- if current_el && current_el.options[:parse_type] == :raw
765
- add_html_text(@src[1], current_el)
766
- end
767
- line = @src[2]
768
- stack = []
769
-
770
- while line.size > 0
771
- index_start_tag, index_close_tag = line.index(HTML_TAG_RE), line.index(HTML_TAG_CLOSE_RE)
772
- if index_start_tag && (!index_close_tag || index_start_tag < index_close_tag)
773
- md = line.match(HTML_TAG_RE)
774
- line = md.post_match
775
- add_html_text(md.pre_match, current_el) if current_el
776
- if HTML_SPAN_ELEMENTS.include?(md[1]) || (current_el && current_el.options[:parse_type] == :span)
777
- add_html_text(md.to_s, current_el) if current_el
778
- next
779
- end
780
-
781
- attrs = {}
782
- md[2].scan(HTML_ATTRIBUTE_RE).each {|name,sep,val| attrs[name] = val}
783
-
784
- parse_type = if !current_el || current_el.options[:parse_type] != :raw
785
- (@doc.options[:parse_block_html] ? HTML_PARSE_AS[md[1]] : :raw)
786
- else
787
- :raw
788
- end
789
- if val = get_parse_type(attrs.delete('markdown'))
790
- parse_type = (val == :default ? HTML_PARSE_AS[md[1]] : val)
791
- end
792
- el = Element.new(:html_element, md[1], :attr => attrs, :type => :block, :parse_type => parse_type)
793
- el.options[:no_start_indent] = true if !stack.empty?
794
- el.options[:outer_element] = true if !current_el
795
- el.options[:parent_is_raw] = true if current_el && current_el.options[:parse_type] == :raw
796
-
797
- @tree.children << el
798
- if !md[4] && HTML_ELEMENTS_WITHOUT_BODY.include?(el.value)
799
- warning("The HTML tag '#{el.value}' cannot have any content - auto-closing it")
800
- elsif !md[4]
801
- @unclosed_html_tags.push(el)
802
- @stack.push(@tree)
803
- stack.push(current_el)
804
- @tree = current_el = el
805
- end
806
- elsif index_close_tag
807
- md = line.match(HTML_TAG_CLOSE_RE)
808
- line = md.post_match
809
- add_html_text(md.pre_match, current_el) if current_el
810
-
811
- if @unclosed_html_tags.size > 0 && md[1] == @unclosed_html_tags.last.value
812
- el = @unclosed_html_tags.pop
813
- @tree = @stack.pop
814
- current_el.options[:compact] = true if stack.size > 0
815
- current_el = stack.pop || (@tree.type == :html_element ? @tree : nil)
816
- else
817
- if !HTML_SPAN_ELEMENTS.include?(md[1]) && @tree.options[:parse_type] != :span
818
- warning("Found invalidly used HTML closing tag for '#{md[1]}'")
819
- elsif current_el
820
- add_html_text(md.to_s, current_el)
821
- end
822
- end
823
- else
824
- if current_el
825
- line.rstrip! if current_el.options[:parse_type] == :block
826
- add_html_text(line + "\n", current_el)
827
- else
828
- add_text(line + "\n")
829
- end
830
- line = ''
831
- end
832
- end
833
- if current_el && (current_el.options[:parse_type] == :span || current_el.options[:parse_type] == :raw)
834
- result = @src.scan_until(/(?=#{HTML_BLOCK_START})|\Z/)
835
- last = current_el.children.last
836
- result = "\n" + result if last.nil? || (last.type != :text && last.type != :raw) || last.value !~ /\n\Z/
837
- add_html_text(result, current_el)
838
- end
839
- true
840
- end
841
- end
842
- Registry.define_parser(:block, :block_html, HTML_BLOCK_START, self)
843
-
844
- # Return the HTML parse type defined by the string +val+, i.e. raw when "0", default parsing
845
- # (return value +nil+) when "1", span parsing when "span" and block parsing when "block". If
846
- # +val+ is nil, then the default parsing mode is used.
847
- def get_parse_type(val)
848
- case val
849
- when "0" then :raw
850
- when "1" then :default
851
- when "span" then :span
852
- when "block" then :block
853
- when NilClass then nil
854
- else
855
- warning("Invalid markdown attribute val '#{val}', using default")
856
- nil
857
- end
858
- end
859
-
860
- # Special version of #add_text which either creates a :text element or a :raw element,
861
- # depending on the HTML element type.
862
- def add_html_text(text, tree)
863
- type = (tree.options[:parse_type] == :raw ? :raw : :text)
864
- if tree.children.last && tree.children.last.type == type
865
- tree.children.last.value << text
866
- elsif !text.empty?
867
- tree.children << Element.new(type, text)
868
- end
869
- end
870
-
871
-
872
- ESCAPED_CHARS = /\\([\\.*_+-`()\[\]{}#!])/
873
-
874
- # Parse the backslash-escaped character at the current location.
875
- def parse_escaped_chars
876
- @src.pos += @src.matched_size
877
- add_text(@src[1])
878
- end
879
- Registry.define_parser(:span, :escaped_chars, ESCAPED_CHARS, self)
880
-
881
-
882
- # Parse the HTML entity at the current location.
883
- def parse_html_entity
884
- @src.pos += @src.matched_size
885
- @tree.children << Element.new(:entity, @src.matched)
886
- end
887
- Registry.define_parser(:span, :html_entity, REXML::Parsers::BaseParser::REFERENCE_RE, self)
888
-
889
-
890
- LINE_BREAK = /( |\\\\)(?=\n)/
891
-
892
- # Parse the line break at the current location.
893
- def parse_line_break
894
- @src.pos += @src.matched_size
895
- @tree.children << Element.new(:br)
896
- end
897
- Registry.define_parser(:span, :line_break, LINE_BREAK, self)
898
-
899
-
900
- TYPOGRAPHIC_SYMS = [['---', :mdash], ['--', :ndash], ['...', :ellipsis],
901
- ['\\<<', '&lt;&lt;'], ['\\>>', '&gt;&gt;'],
902
- ['<< ', :laquo_space], [' >>', :raquo_space],
903
- ['<<', :laquo], ['>>', :raquo]]
904
- TYPOGRAPHIC_SYMS_SUBST = Hash[*TYPOGRAPHIC_SYMS.flatten]
905
- TYPOGRAPHIC_SYMS_RE = /#{TYPOGRAPHIC_SYMS.map {|k,v| Regexp.escape(k)}.join('|')}/
906
-
907
- # Parse the typographic symbols at the current location.
908
- def parse_typographic_syms
909
- @src.pos += @src.matched_size
910
- val = TYPOGRAPHIC_SYMS_SUBST[@src.matched]
911
- if val.kind_of?(Symbol)
912
- @tree.children << Element.new(:typographic_sym, val)
913
- else
914
- add_text(val.dup)
915
- end
916
- end
917
- Registry.define_parser(:span, :typographic_syms, TYPOGRAPHIC_SYMS_RE, self)
918
-
919
-
920
- AUTOLINK_START = /<((mailto|https?|ftps?):.*?|\S*?@\S*?)>/
921
-
922
- # Parse the autolink at the current location.
923
- def parse_autolink
924
- @src.pos += @src.matched_size
925
-
926
- text = href = @src[1]
927
- if @src[2].nil? || @src[2] == 'mailto'
928
- text = obfuscate_email(@src[2] ? @src[1].sub(/^mailto:/, '') : @src[1])
929
- mailto = obfuscate_email('mailto')
930
- href = "#{mailto}:#{text}"
931
- end
932
- el = Element.new(:a, nil, {:attr => {'href' => href}})
933
- add_text(text, el)
934
- @tree.children << el
935
- end
936
- Registry.define_parser(:span, :autolink, AUTOLINK_START, self)
937
-
938
-
939
- CODESPAN_DELIMITER = /`+/
940
-
941
- # Parse the codespan at the current scanner location.
942
- def parse_codespan
943
- result = @src.scan(CODESPAN_DELIMITER)
944
- simple = (result.length == 1)
945
- reset_pos = @src.pos
946
-
947
- if simple && @src.pre_match =~ /\s\Z/ && @src.match?(/\s/)
948
- add_text(result)
949
- return
950
- end
951
-
952
- text = @src.scan_until(/#{result}/)
953
- if text
954
- text.sub!(/#{result}\Z/, '')
955
- if !simple
956
- text = text[1..-1] if text[0..0] == ' '
957
- text = text[0..-2] if text[-1..-1] == ' '
958
- end
959
- @tree.children << Element.new(:codespan, text)
960
- else
961
- @src.pos = reset_pos
962
- add_text(result)
963
- end
964
- end
965
- Registry.define_parser(:span, :codespan, CODESPAN_DELIMITER, self)
966
-
967
-
968
- IAL_SPAN_START = /\{:(#{ALD_ANY_CHARS}+)\}/
969
-
970
- # Parse the inline attribute list at the current location.
971
- def parse_span_ial
972
- @src.pos += @src.matched_size
973
- if @tree.children.last && @tree.children.last.type != :text
974
- attr = {}
975
- parse_attribute_list(@src[1], attr)
976
- update_ial_with_ial(@tree.children.last.options[:ial] ||= {}, attr)
977
- update_attr_with_ial(@tree.children.last.options[:attr] ||= {}, attr)
978
- else
979
- warning("Ignoring span IAL because preceding element is just text")
980
- add_text(@src.matched)
981
- end
982
- end
983
- Registry.define_parser(:span, :span_ial, IAL_SPAN_START, self)
984
-
985
-
986
- FOOTNOTE_MARKER_START = /\[\^(#{ALD_ID_NAME})\]/
987
-
988
- # Parse the footnote marker at the current location.
989
- def parse_footnote_marker
990
- @src.pos += @src.matched_size
991
- fn_def = @doc.parse_infos[:footnotes][@src[1]]
992
- if fn_def
993
- valid = fn_def[:marker] && fn_def[:marker].options[:stack][0..-2].zip(fn_def[:marker].options[:stack][1..-1]).all? do |par, child|
994
- par.children.include?(child)
995
- end
996
- if !fn_def[:marker] || !valid
997
- fn_def[:marker] = Element.new(:footnote, nil, :name => @src[1])
998
- fn_def[:marker].options[:stack] = [@stack, @tree, fn_def[:marker]].flatten.compact
999
- @tree.children << fn_def[:marker]
1000
- else
1001
- warning("Footnote marker '#{@src[1]}' already appeared in document, ignoring newly found marker")
1002
- add_text(@src.matched)
1003
- end
1004
- else
1005
- warning("Footnote definition for '#{@src[1]}' not found")
1006
- add_text(@src.matched)
1007
- end
1008
- end
1009
- Registry.define_parser(:span, :footnote_marker, FOOTNOTE_MARKER_START, self)
1010
-
1011
-
1012
- EMPHASIS_START = /(?:\*\*?|__?)/
1013
-
1014
- # Parse the emphasis at the current location.
1015
- def parse_emphasis
1016
- result = @src.scan(EMPHASIS_START)
1017
- element = (result.length == 2 ? :strong : :em)
1018
- type = (result =~ /_/ ? '_' : '*')
1019
- reset_pos = @src.pos
1020
-
1021
- if (type == '_' && @src.pre_match =~ /[[:alpha:]]\Z/ && @src.check(/[[:alpha:]]/)) || @src.check(/\s/)
1022
- add_text(result)
1023
- return
1024
- end
1025
-
1026
- sub_parse = lambda do |delim, elem|
1027
- el = Element.new(elem)
1028
- stop_re = /#{Regexp.escape(delim)}/
1029
- found = parse_spans(el, stop_re) do
1030
- (@src.string[@src.pos-1, 1] !~ /\s/) &&
1031
- (elem != :em || !@src.match?(/#{Regexp.escape(delim*2)}(?!#{Regexp.escape(delim)})/)) &&
1032
- (type != '_' || !@src.match?(/#{Regexp.escape(delim)}[[:alpha:]]/)) && el.children.size > 0
1033
- end
1034
- [found, el, stop_re]
1035
- end
1036
-
1037
- found, el, stop_re = sub_parse.call(result, element)
1038
- if !found && element == :strong
1039
- @src.pos = reset_pos - 1
1040
- found, el, stop_re = sub_parse.call(type, :em)
1041
- end
1042
- if found
1043
- @src.scan(stop_re)
1044
- @tree.children << el
1045
- else
1046
- @src.pos = reset_pos
1047
- add_text(result)
1048
- end
1049
- end
1050
- Registry.define_parser(:span, :emphasis, EMPHASIS_START, self)
1051
-
1052
-
1053
- HTML_SPAN_START = /<(#{REXML::Parsers::BaseParser::UNAME_STR}|\?|!--)/
1054
-
1055
- # Parse the HTML at the current position as span level HTML.
1056
- def parse_span_html
1057
- if result = @src.scan(HTML_COMMENT_RE)
1058
- @tree.children << Element.new(:html_raw, result, :type => :span)
1059
- elsif result = @src.scan(HTML_INSTRUCTION_RE)
1060
- @tree.children << Element.new(:html_raw, result, :type => :span)
1061
- elsif result = @src.scan(HTML_TAG_RE)
1062
- if HTML_BLOCK_ELEMENTS.include?(@src[1])
1063
- add_text(result)
1064
- return
1065
- end
1066
- reset_pos = @src.pos
1067
- attrs = {}
1068
- @src[2].scan(HTML_ATTRIBUTE_RE).each {|name,sep,val| attrs[name] = val.gsub(/\n+/, ' ')}
1069
-
1070
- do_parsing = @doc.options[:parse_span_html]
1071
- if val = get_parse_type(attrs.delete('markdown'))
1072
- if val == :block
1073
- warning("Cannot use block level parsing in span level HTML tag - using default mode")
1074
- elsif val == :span || val == :default
1075
- do_parsing = true
1076
- elsif val == :raw
1077
- do_parsing = false
1078
- end
1079
- end
1080
- do_parsing = false if HTML_PARSE_AS_RAW.include?(@src[1])
1081
-
1082
- el = Element.new(:html_element, @src[1], :attr => attrs, :type => :span)
1083
- stop_re = /<\/#{Regexp.escape(@src[1])}\s*>/
1084
- if @src[4]
1085
- @tree.children << el
1086
- elsif HTML_ELEMENTS_WITHOUT_BODY.include?(el.value)
1087
- warning("The HTML tag '#{el.value}' cannot have any content - auto-closing it")
1088
- @tree.children << el
1089
- else
1090
- if parse_spans(el, stop_re)
1091
- end_pos = @src.pos
1092
- @src.scan(stop_re)
1093
- @tree.children << el
1094
- if !do_parsing
1095
- el.children.clear
1096
- el.children << Element.new(:raw, @src.string[reset_pos...end_pos])
1097
- end
1098
- else
1099
- @src.pos = reset_pos
1100
- add_text(result)
1101
- end
1102
- end
1103
- else
1104
- add_text(@src.scan(/./))
1105
- end
1106
- end
1107
- Registry.define_parser(:span, :span_html, HTML_SPAN_START, self)
1108
-
1109
-
1110
- LINK_TEXT_BRACKET_RE = /\\\[|\\\]|\[|\]/
1111
- LINK_INLINE_ID_RE = /\s*?\[(#{LINK_ID_CHARS}+)?\]/
1112
- LINK_INLINE_TITLE_RE = /\s*?(["'])(.+?)\1\s*?\)/
1113
-
1114
- LINK_START = /!?\[(?=[^^])/
1115
-
1116
- # Parse the link at the current scanner position. This method is used to parse normal links as
1117
- # well as image links.
1118
- def parse_link
1119
- result = @src.scan(LINK_START)
1120
- reset_pos = @src.pos
1121
-
1122
- link_type = (result =~ /^!/ ? :img : :a)
1123
-
1124
- # no nested links allowed
1125
- if link_type == :a && (@tree.type == :img || @tree.type == :a || @stack.any? {|t,s| t && (t.type == :img || t.type == :a)})
1126
- add_text(result)
1127
- return
1128
- end
1129
- el = Element.new(link_type)
1130
-
1131
- stop_re = /\]|!?\[/
1132
- count = 1
1133
- found = parse_spans(el, stop_re) do
1134
- case @src.matched
1135
- when "[", "!["
1136
- count += 1
1137
- when "]"
1138
- count -= 1
1139
- end
1140
- count - el.children.select {|c| c.type == :img}.size == 0
1141
- end
1142
- if !found || el.children.empty?
1143
- @src.pos = reset_pos
1144
- add_text(result)
1145
- return
1146
- end
1147
- alt_text = @src.string[reset_pos...@src.pos]
1148
- conv_link_id = alt_text.gsub(/(\s|\n)+/m, ' ').gsub(LINK_ID_NON_CHARS, '').downcase
1149
- @src.scan(stop_re)
1150
-
1151
- # reference style link or no link url
1152
- if @src.scan(LINK_INLINE_ID_RE) || !@src.check(/\(/)
1153
- link_id = (@src[1] || conv_link_id).downcase
1154
- if @doc.parse_infos[:link_defs].has_key?(link_id)
1155
- add_link(el, @doc.parse_infos[:link_defs][link_id].first, @doc.parse_infos[:link_defs][link_id].last, alt_text)
1156
- else
1157
- warning("No link definition for link ID '#{link_id}' found")
1158
- @src.pos = reset_pos
1159
- add_text(result)
1160
- end
1161
- return
1162
- end
1163
-
1164
- # link url in parentheses
1165
- if @src.scan(/\(<(.*?)>/)
1166
- link_url = @src[1]
1167
- if @src.scan(/\)/)
1168
- add_link(el, link_url, nil, alt_text)
1169
- return
1170
- end
1171
- else
1172
- link_url = ''
1173
- re = /\(|\)|\s/
1174
- nr_of_brackets = 0
1175
- while temp = @src.scan_until(re)
1176
- link_url += temp
1177
- case @src.matched
1178
- when /\s/
1179
- break
1180
- when '('
1181
- nr_of_brackets += 1
1182
- when ')'
1183
- nr_of_brackets -= 1
1184
- break if nr_of_brackets == 0
1185
- end
1186
- end
1187
- link_url = link_url[1..-2]
1188
-
1189
- if nr_of_brackets == 0
1190
- add_link(el, link_url, nil, alt_text)
1191
- return
1192
- end
1193
- end
1194
-
1195
- if @src.scan(LINK_INLINE_TITLE_RE)
1196
- add_link(el, link_url, @src[2], alt_text)
1197
- else
1198
- @src.pos = reset_pos
1199
- add_text(result)
1200
- end
1201
- end
1202
- Registry.define_parser(:span, :link, LINK_START, self)
1203
-
1204
-
1205
- # This helper method adds the appropriate attributes to the element +el+ of type +a+ or +img+
1206
- # and the element itself to the <tt>@tree</tt>.
1207
- def add_link(el, href, title, alt_text = nil)
1208
- el.options[:attr] ||= {}
1209
- el.options[:attr]['title'] = title if title
1210
- if el.type == :a
1211
- el.options[:attr]['href'] = href
1212
- else
1213
- el.options[:attr]['src'] = href
1214
- el.options[:attr]['alt'] = alt_text
1215
- el.children.clear
1216
- end
1217
- @tree.children << el
1218
- end
1219
-
1220
- end
29
+ autoload :Kramdown, 'kramdown/parser/kramdown'
1221
30
 
1222
31
  end
1223
32
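The removed `TYPOGRAPHIC_SYMS` table in the hunk above builds one alternation regexp from an ordered list of pairs, so `---` has to come before `--` for the longer symbol to win. The following is a minimal stand-alone sketch of that idea, not the kramdown implementation: the names `SYMS`, `SYMS_SUBST`, `SYMS_RE` and `tokenize` are hypothetical, the symbol list is shortened, and plain strings stand in for kramdown's `Element` objects.

```ruby
require 'strscan'

# Hypothetical, shortened table modelled on the removed TYPOGRAPHIC_SYMS constant.
# Order matters: regexp alternation tries branches left to right, so '---'
# must be listed before '--'.
SYMS = [['---', :mdash], ['--', :ndash], ['...', :ellipsis],
        ['<<', :laquo], ['>>', :raquo]]
SYMS_SUBST = Hash[*SYMS.flatten]
SYMS_RE = /#{SYMS.map { |k, _v| Regexp.escape(k) }.join('|')}/

# Split +text+ into plain-text chunks (String) and typographic symbols (Symbol).
# This is an illustrative helper, not part of kramdown.
def tokenize(text)
  src = StringScanner.new(text)
  tokens = []
  until src.eos?
    if src.scan(SYMS_RE)
      # A symbol starts exactly at the current position.
      tokens << SYMS_SUBST[src.matched]
    else
      # Consume plain text up to the next symbol or the end of the input.
      tokens << src.scan_until(/(?=#{SYMS_RE})|\z/)
    end
  end
  tokens
end

p tokenize("a -- b --- c...")
```

Listing the pairs in an array rather than a hash literal keeps the matching order explicit, which is presumably why the original code derives both the lookup hash and the regexp from the same array.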