bluefeather 0.10

Files changed (46)
  1. data/Rakefile.rb +168 -0
  2. data/bin/bluefeather +4 -0
  3. data/doc/author-and-license.bfdoc +16 -0
  4. data/doc/base.css +135 -0
  5. data/doc/basic-usage.bfdoc +265 -0
  6. data/doc/black.css +130 -0
  7. data/doc/class-reference.bfdoc +105 -0
  8. data/doc/difference.bfdoc +63 -0
  9. data/doc/en/author-and-license.bfdoc +20 -0
  10. data/doc/en/base.css +136 -0
  11. data/doc/en/basic-usage.bfdoc +266 -0
  12. data/doc/en/black.css +130 -0
  13. data/doc/en/class-reference.bfdoc +6 -0
  14. data/doc/en/difference.bfdoc +72 -0
  15. data/doc/en/format-extension.bfdoc +324 -0
  16. data/doc/en/index.bfdoc +41 -0
  17. data/doc/en/metadata-reference.bfdoc +7 -0
  18. data/doc/format-extension.bfdoc +325 -0
  19. data/doc/index.bfdoc +36 -0
  20. data/doc/metadata-reference.bfdoc +86 -0
  21. data/lib/bluefeather.rb +1872 -0
  22. data/lib/bluefeather/cui.rb +207 -0
  23. data/license/gpl-2.0.txt +339 -0
  24. data/license/gpl.ja.txt +416 -0
  25. data/original-tests/00_Class.tests.rb +42 -0
  26. data/original-tests/05_Markdown.tests.rb +1530 -0
  27. data/original-tests/10_Bug.tests.rb +44 -0
  28. data/original-tests/15_Contrib.tests.rb +130 -0
  29. data/original-tests/bftestcase.rb +278 -0
  30. data/original-tests/data/antsugar.txt +34 -0
  31. data/original-tests/data/ml-announce.txt +17 -0
  32. data/original-tests/data/re-overflow.txt +67 -0
  33. data/original-tests/data/re-overflow2.txt +281 -0
  34. data/readme_en.txt +37 -0
  35. data/readme_ja.txt +33 -0
  36. data/spec/auto-link.rb +100 -0
  37. data/spec/code-block.rb +91 -0
  38. data/spec/dl.rb +182 -0
  39. data/spec/escape-char.rb +18 -0
  40. data/spec/footnote.rb +34 -0
  41. data/spec/header-id.rb +38 -0
  42. data/spec/lib/common.rb +103 -0
  43. data/spec/table.rb +70 -0
  44. data/spec/toc.rb +64 -0
  45. data/spec/warning.rb +61 -0
  46. metadata +99 -0
data/doc/index.bfdoc ADDED
@@ -0,0 +1,36 @@
1
+ CSS: black.css
2
+
3
+ <div class="back"><a> </a></div>
4
+
5
+
6
+ BlueFeather Manual
7
+ ====
8
+
9
+ → [English version](en/index.html)
10
+
11
+
12
+ (2009-02-22, conforms to version 0.10)
13
+
14
+ BlueFeather is software that converts text written in an extended Markdown syntax to html.
15
+ It bundles a command-line tool with a library for performing the conversion from within Ruby scripts.
16
+
17
+ It is based on [BlueCloth][], a widely used Markdown implementation, and adds fixes for known bugs, interface changes, and various extensions to the syntax and features.
18
+
19
+ This manual page is itself an html document generated with BlueFeather. The source text files are bundled in the `doc` directory.
20
+
21
+
22
+ * [Installation and basic usage](basic-usage.html)
23
+ * [Differences from BlueCloth](difference.html)
24
+ * [Markdown syntax extensions](format-extension.html)
25
+
26
+ ~
27
+
28
+ * [Class reference](class-reference.html)
29
+ * [Metadata reference](metadata-reference.html)
30
+
31
+ ~
32
+
33
+ * [Contact and license](author-and-license.html)
34
+ * [BlueFeather distribution site (http://ruby.morphball.net/bluefeather/)](http://ruby.morphball.net/bluefeather/)
35
+
36
+ [BlueCloth]: http://www.deveiate.org/projects/BlueCloth
data/doc/metadata-reference.bfdoc ADDED
@@ -0,0 +1,86 @@
1
+ Title: Metadata Reference - BlueFeather Manual
2
+ CSS: black.css
3
+
4
+ <div class="back"><a href="index.html">BlueFeather Manual</a></div>
5
+
6
+ Metadata Reference
7
+ ====
8
+
9
+ By writing key/value pairs separated by a `:` character (headers) at the top of a document file (*.bfdoc), you can attach information such as a title (metadata) to that document.
10
+
11
+ Title: Document name
12
+ CSS: style.css
13
+ Atom-Feed: info/atom.xml
14
+
15
+ The body starts here
16
+
17
+ This document metadata takes effect only when the text is interpreted with methods such as `parse_document` or `parse_document_file`.
18
+
19
+ Key names are case-insensitive.
20
+
21
+
22
+ {toc}
23
+
24
+
25
+ Important metadata
26
+ ----
27
+
28
+
29
+ ### CSS: {#css}
30
+
31
+ CSS: http://example.net/style.css
32
+
33
+ URL of a CSS style sheet. A link to that style sheet is added inside the head element of the generated html document.
34
+
35
+ ### Encoding: {#encoding}
36
+
37
+ Encoding: utf-8
38
+
39
+ The multibyte encoding of the document. One of utf-8, euc-jp, shift-jis, or ascii is valid (case-insensitive).
40
+ It affects the Content-Type value written inside the html head element, as well as the conversion process.
41
+
42
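+ Note that if the values of other headers are written as multibyte strings, *Encoding must be written before those headers.*
43
+ For that reason, it is recommended to always write this header first in the document file.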
44
+
45
+ If omitted, *the encoding is treated as UTF-8.*
46
+
47
+ ### Title: {#title}
48
+
49
+ Title: Delicious ways to cook carrots
50
+
51
+
52
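+ The name (title) of the document. The value specified here is used for the title element of the generated html document.
53
+ If omitted, the content of a level-1 heading (h1) in the body, if there is one, is used as the title element; otherwise "no title" is used.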
54
+
55
+
56
+
57
+ Supplementary metadata
58
+ ----
59
+
60
+ ### Atom-Feed: {#atom-feed}
61
+ ### RDF-Feed: {#rdf-feed}
62
+ ### RSS-Feed: {#rss-feed}
63
+
64
+ Atom-Feed: example.xml
65
+
66
+
67
+ URL of a news feed. A link like the following is added inside the head element of the generated html document so that RSS readers and similar tools can subscribe to it (autodiscovery).
68
+
69
+ <link rel="alternate" type="application/atom+xml" href="example.xml" />
70
+
71
+ The type attribute value of the generated link element depends on which header name is used.
72
+ As a rule, RDF-Feed is recommended for RSS 1.0, RSS-Feed for RSS 2.0, and Atom-Feed for Atom (Atom Syndication Format).
73
+
74
+ ### Description: {#description}
75
+
76
+ Description: An explanation of easy-to-try, delicious ways to cook carrots.
77
+
78
+
79
+ A description of the document. It becomes the content of `<meta name="description" content="~">`.
80
+
81
+
82
+ ### Keywords: {#keywords}
83
+
84
+ Keywords: carrot,recipe,cooking
85
+
86
+ Keywords that describe the document. They become the content of `<meta name="keywords" content="~">`.
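
The header format described in this reference can be sketched in a few lines of Ruby. This is a minimal illustration under stated assumptions, not BlueFeather's own parser: `split_headers` is a hypothetical helper, the pattern is modelled on the one used by the library's `Document` class, and encoding handling is left out entirely.

```ruby
require 'strscan'

# Hypothetical helper (not part of BlueFeather): split a document into a
# hash of lower-cased header keys and the remaining body text.
HEADER_LINE = /^(.+?)\s*:\s*(.+)(?:\n|\z)/

def split_headers(source)
  scanner = StringScanner.new(source)
  headers = {}

  # Consume "Key: value" lines from the top of the document.
  while scanner.scan(HEADER_LINE)
    headers[scanner[1].downcase] = scanner[2]
  end

  # Skip the blank lines that separate the headers from the body.
  scanner.skip(/\n+/)

  [headers, scanner.rest]
end

headers, body = split_headers("Title: Sample\nCSS: style.css\n\nBody text\n")
```

Because keys are lower-cased on the way in, `Title:`, `title:` and `TITLE:` all end up under the same key, which matches the case-insensitivity rule stated in the reference.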
data/lib/bluefeather.rb ADDED
@@ -0,0 +1,1872 @@
1
+ #
2
+ # BlueFeather - Extended Markdown Converter
3
+ #
4
+ # Author of Original BlueCloth: Michael Granger <ged@FaerieMUD.org>
5
+ # Remaker: Dice <tetradice@gmail.com>
6
+ # Website: http://ruby.morphball.net/bluefeather/
7
+ # License: GPL version 2 or later
8
+ #
9
+ # To learn more about BlueFeather, see the attached document
10
+ # 'doc/index.html' or the website.
11
+ #
12
+ #
13
+ #
14
+ #-- Copyrights & License -------------------------------------------------------
15
+ #
16
+ # Original Markdown:
17
+ # Copyright (c) 2003-2004 John Gruber
18
+ # <http://daringfireball.net/>
19
+ # All rights reserved.
20
+ #
21
+ # Original BlueCloth:
22
+ # Copyright (c) 2004 The FaerieMUD Consortium.
23
+ #
24
+ # BlueFeather:
25
+ # Copyright (c) 2009 Dice
26
+ #
27
+ # BlueFeather is free software; you can redistribute it and/or modify it under
28
+ # the terms of the GNU General Public License as published by the Free Software
29
+ # Foundation; either version 2 of the License, or (at your option) any later
30
+ # version.
31
+ #
32
+ # BlueFeather is distributed in the hope that it will be useful, but WITHOUT ANY
33
+ # WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
34
+ # A PARTICULAR PURPOSE. See the GNU General Public License for more details.
35
+
36
+
37
+ require 'digest/md5'
38
+ require 'logger'
39
+ require 'strscan'
40
+ require 'stringio'
41
+
42
+
43
+ module BlueFeather
44
+ VERSION = '0.10'
45
+ VERSION_NUMBER = 0.10
46
+
47
+
48
+ # Fancy methods
49
+ class << self
50
+ def parse_text(src)
51
+ Parser.new.parse_text(src)
52
+ end
53
+
54
+ alias parse parse_text
55
+
56
+ def parse_document(src)
57
+ Parser.new.parse_document(src)
58
+ end
59
+
60
+
61
+ def parse_text_file(path)
62
+ Parser.new.parse_text_file(path)
63
+ end
64
+
65
+ alias parse_file parse_text_file
66
+
67
+ def parse_document_file(path)
68
+ Parser.new.parse_document_file(path)
69
+ end
70
+ end
71
+
72
+ ### Base exception class for errors raised while BlueFeather is running.
73
+ class Error < ::RuntimeError
74
+ end
75
+
76
+ class EncodingError < Error
77
+ end
78
+
79
+ ### Exception class for formatting errors.
80
+ class FormatError < Error
81
+
82
+ ### Create a new FormatError with the given source +str+ and an optional
83
+ ### message about the +specific+ error.
84
+ def initialize( str, specific=nil )
85
+ if specific
86
+ msg = "Bad markdown format near %p: %s" % [ str, specific ]
87
+ else
88
+ msg = "Bad markdown format near %p" % str
89
+ end
90
+
91
+ super( msg )
92
+ end
93
+ end
94
+
95
+ module EncodingType
96
+ EUC = 'euc-jp'
97
+ SJIS = 'shift-jis'
98
+ UTF8 = 'utf-8'
99
+ ASCII = 'ascii'
100
+
101
+ def self.type_to_kcode(type)
102
+ case type
103
+ when EUC, SJIS, UTF8
104
+ type
105
+ when ASCII
106
+ 'none'
107
+ else
108
+ raise EncodingError, "not adapted encoding type - #{type} (shift-jis, euc-jp, utf-8, or ascii)"
109
+ end
110
+ end
111
+
112
+
113
+ def self.type_to_charset(type)
114
+ case type
115
+ when EUC
116
+ 'euc-jp'
117
+ when SJIS
118
+ 'shift_jis'
119
+ when UTF8
120
+ 'utf-8'
121
+ when ASCII, nil
122
+ nil
123
+ else
124
+ raise EncodingError, "not adapted encoding type - #{type} (shift-jis, euc-jp, utf-8, or ascii)"
125
+ end
126
+ end
127
+ end
128
+
129
+ module Util
130
+ HTML_ESC = {
131
+ '&' => '&amp;',
132
+ '"' => '&quot;',
133
+ '<' => '&lt;',
134
+ '>' => '&gt;'
135
+ }
136
+
137
+ module_function
138
+
139
+ # from http://jp.rubyist.net/magazine/?0010-CodeReview#l28
140
+ # (Author: Minero Aoki)
141
+ def escape_html(str)
142
+ table = HTML_ESC # optimize
143
+ str.gsub(/[&"<>]/) {|s| table[s] }
144
+ end
145
+
146
+ def generate_blank_string_io(encoding_base)
147
+ io = StringIO.new
148
+
149
+ if io.respond_to?(:set_encoding) then
150
+ io.set_encoding(encoding_base.encoding)
151
+ end
152
+
153
+ return io
154
+ end
155
+
156
+ def change_kcode(kcode = nil)
157
+ if defined?(Encoding) then
158
+ # Ruby 1.9 or later
159
+ yield
160
+ else
161
+ # Ruby 1.8 or earlier
162
+ original_kcode = $KCODE
163
+
164
+ begin
165
+ $KCODE = kcode if kcode
166
+ yield
167
+
168
+ ensure
169
+ # recover
170
+ $KCODE = original_kcode
171
+ end
172
+ end # if defined?
173
+ end # def
174
+ end
175
+
176
+ class Document
177
+ HEADER_PATTERN = /^(.+?)\s*\:\s*(.+)(?:\n|\Z)/
178
+ BLANK_LINE_PATTERN = /^\n/
179
+ HEADER_SEQUEL_PATTERN = /^\s+(.+)$/
180
+
181
+ attr_accessor :headers, :body
182
+ alias text body
183
+ alias text= body=
184
+
185
+ class << self
186
+ def parse(source)
187
+ s = StringScanner.new(source)
188
+ headers = {}
189
+
190
+ Util.change_kcode(nil){
191
+ # get headers
192
+ while s.scan(HEADER_PATTERN) do
193
+ key = s[1].downcase; value = s[2]
194
+ headers[key] = value
195
+
196
+ if key == 'encoding' then
197
+ if s.string.respond_to?(:force_encoding) then
198
+ s.string.force_encoding(value)
199
+ else
200
+ $KCODE = EncodingType.type_to_kcode(value.downcase)
201
+ end
202
+ end
203
+ end
204
+
205
+ # skip blank lines
206
+ while s.scan(BLANK_LINE_PATTERN) do
207
+ end
208
+ }
209
+
210
+ body = s.peek(source.length - s.pos + 1)
211
+ return self.new(headers, body)
212
+ end
213
+
214
+ end
215
+
216
+ def initialize(headers = {}, body = '')
217
+ @headers = headers
218
+ @body = body
219
+ end
220
+
221
+ def [](key)
222
+ @headers[key.to_s.downcase]
223
+ end
224
+
225
+ def []=(key, value)
226
+ @headers[key.to_s.downcase] = value.to_s
227
+ end
228
+
229
+ def title
230
+ @headers['title']
231
+ end
232
+
233
+ def css
234
+ @headers['css']
235
+ end
236
+
237
+ def encoding_type
238
+ (@headers['encoding'] ? @headers['encoding'].downcase : 'utf-8')
239
+ end
240
+
241
+ def kcode
242
+ self.encoding_type && EncodingType.type_to_kcode(self.encoding_type)
243
+ end
244
+
245
+ def to_html
246
+ Parser.new.document_to_html(self)
247
+ end
248
+ end
249
+
250
+
251
+ class Parser
252
+ # Rendering state class. Keeps track of URLs, titles, and HTML blocks
253
+ # midway through a render. I prefer this to the globals of the Perl version
254
+ # because globals make me break out in hives. Or something.
255
+ class RenderState
256
+ # Headers struct.
257
+ Header = Struct.new(:id, :level, :content, :content_html)
258
+
259
+ # from Original BlueCloth
260
+ attr_accessor :urls, :titles, :html_blocks, :log
261
+
262
+ # BlueFeather Extension
263
+ attr_accessor :footnotes, :found_footnote_ids, :warnings
264
+ attr_accessor :headers, :block_transform_depth
265
+
266
+ def initialize
267
+ @urls, @titles, @html_blocks = {}, {}, {}
268
+ @log = nil
269
+ @footnotes, @found_footnote_ids, @warnings = {}, [], []
270
+ @headers = []
271
+ @block_transform_depth = 0
272
+ end
273
+
274
+ end
275
+
276
+ # Tab width for #detab! if none is specified
277
+ TabWidth = 4
278
+
279
+ # The tag-closing string -- set to '>' for HTML
280
+ EmptyElementSuffix = " />";
281
+
282
+ # Table of MD5 sums for escaped characters
283
+ EscapeTable = {}
284
+ '\\`*_{}[]()#.!|:~'.split(//).each {|char|
285
+ hash = Digest::MD5::hexdigest( char )
286
+
287
+ EscapeTable[ char ] = {
288
+ :md5 => hash,
289
+ :md5re => Regexp::new( hash ),
290
+ :re => Regexp::new( '\\\\' + Regexp::escape(char) ),
291
+ }
292
+ }
293
+
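
The escape table above maps each special character to its MD5 hex digest, so that backslash-escaped characters can be hidden from the Markdown transforms and swapped back in at the end. The following stand-alone sketch illustrates the idea; the helper names `hide_escapes` and `restore_escapes` are hypothetical and are not part of BlueFeather.

```ruby
require 'digest/md5'

# Same character set as the table above; every name below is illustrative.
SPECIAL_CHARS = '\\`*_{}[]()#.!|:~'.chars
PLACEHOLDERS  = SPECIAL_CHARS.map { |char| [char, Digest::MD5.hexdigest(char)] }.to_h

# Replace each backslash-escaped special character with its MD5 placeholder.
def hide_escapes(text)
  # Handle the doubled backslash first so it is not misread as the start
  # of an escape for the character that follows it.
  result = text.gsub('\\\\') { PLACEHOLDERS['\\'] }
  PLACEHOLDERS.each do |char, digest|
    next if char == '\\'
    result = result.gsub("\\#{char}") { digest }
  end
  result
end

# Swap every placeholder back to the literal character it stands for.
def restore_escapes(text)
  PLACEHOLDERS.reduce(text) { |acc, (char, digest)| acc.gsub(digest) { char } }
end
```

A hex digest contains only the characters 0-9 and a-f, so a hidden character cannot be picked up as emphasis, a list marker, or any other Markdown syntax while the transforms run.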
294
+
295
+ #################################################################
296
+ ### I N S T A N C E M E T H O D S
297
+ #################################################################
298
+
299
+ ### Create a new BlueFeather parser.
300
+ def initialize(*restrictions)
301
+ @log = Logger::new( $stderr )
302
+ @log.level = $DEBUG ?
303
+ Logger::DEBUG :
304
+ ($VERBOSE ? Logger::INFO : Logger::WARN)
305
+ @scanner = nil
306
+
307
+ # Add any restrictions, and set the line-folding attribute to reflect
308
+ # what happens by default.
309
+ @filter_html = nil
310
+ @filter_styles = nil
311
+ restrictions.flatten.each {|r| __send__("#{r}=", true) }
312
+ @fold_lines = true
313
+
314
+ @use_header_id = true
315
+ @display_warnings = true
316
+
317
+ @log.debug "String is: %p" % self
318
+ end
319
+
320
+
321
+ ######
322
+ public
323
+ ######
324
+
325
+ # Filters for controlling what gets output for untrusted input. (But really,
326
+ # you're filtering bad stuff out of untrusted input at submission-time via
327
+ # untainting, aren't you?)
328
+ attr_accessor :filter_html, :filter_styles
329
+
330
+ # RedCloth-compatibility accessor. Line-folding is part of Markdown syntax,
331
+ # so this isn't used by anything.
332
+ attr_accessor :fold_lines
333
+
334
+ # BlueFeather Extension: display warnings on the top of output html (default: true)
335
+ attr_accessor :display_warnings
336
+
337
+ # BlueFeather Extension: add id to each header, for toc and anchors. (default: true)
338
+ attr_accessor :use_header_id
339
+
340
+ ### Render Markdown-formatted text in this string object as HTML and return
341
+ ### it. The parameter is for compatibility with RedCloth, and is currently
342
+ ### unused, though that may change in the future.
343
+ def parse_text(source, rs = nil)
344
+ rs ||= RenderState.new
345
+
346
+ # Create a StringScanner we can reuse for various lexing tasks
347
+ @scanner = StringScanner::new( '' )
348
+
349
+ # Make a copy of the string with normalized line endings, tabs turned to
350
+ # spaces, and a couple of guaranteed newlines at the end
351
+
352
+ text = detab(source.gsub( /\r\n?/, "\n" ))
353
+ text += "\n\n"
354
+ @log.debug "Normalized line-endings: %p" % text
355
+
356
+ # Filter HTML if we're asked to do so
357
+ if self.filter_html
358
+ text.gsub!( "<", "&lt;" )
359
+ text.gsub!( ">", "&gt;" )
360
+ @log.debug "Filtered HTML: %p" % text
361
+ end
362
+
363
+ # Simplify blank lines
364
+ text.gsub!( /^ +$/, '' )
365
+ @log.debug "Tabs -> spaces/blank lines stripped: %p" % text
366
+
367
+ # Replace HTML blocks with placeholders
368
+ text = hide_html_blocks( text, rs )
369
+ @log.debug "Hid HTML blocks: %p" % text
370
+ @log.debug "Render state: %p" % rs
371
+
372
+ # Strip footnote definitions, store in render state
373
+ text = strip_footnote_definitions( text, rs )
374
+ @log.debug "Stripped footnote definitions: %p" % text
375
+ @log.debug "Render state: %p" % rs
376
+
377
+
378
+ # Strip link definitions, store in render state
379
+ text = strip_link_definitions( text, rs )
380
+ @log.debug "Stripped link definitions: %p" % text
381
+ @log.debug "Render state: %p" % rs
382
+
383
+
384
+
385
+ # Escape meta-characters
386
+ text = escape_special_chars( text )
387
+ @log.debug "Escaped special characters: %p" % text
388
+
389
+ # Transform block-level constructs
390
+ text = apply_block_transforms( text, rs )
391
+ @log.debug "After block-level transforms: %p" % text
392
+
393
+ # Now swap back in all the escaped characters
394
+ text = unescape_special_chars( text )
395
+ @log.debug "After unescaping special characters: %p" % text
396
+
397
+ # Extend footnotes
398
+ unless rs.footnotes.empty? then
399
+ text << %Q|<div class="footnotes"><hr#{EmptyElementSuffix}\n<ol>\n|
400
+ rs.found_footnote_ids.each do |id|
401
+ content = rs.footnotes[id]
402
+ html = apply_block_transforms(content.sub(/\n+\Z/, '') + %Q| <a href="#footnote-ref:#{id}" rev="footnote">&#8617;</a>|, rs)
403
+ text << %Q|<li id="footnote:#{id}">\n#{html}\n</li>|
404
+ end
405
+ text << %Q|</ol>\n</div>\n|
406
+ end
407
+
408
+ # Display warnings
409
+ if @display_warnings then
410
+ unless rs.warnings.empty? then
411
+ html = %Q|<pre><strong>[WARNINGS]\n|
412
+ html << rs.warnings.map{|x| Util.escape_html(x)}.join("\n")
413
+ html << %Q|</strong></pre>|
414
+
415
+ text = html + text
416
+ end
417
+ end
418
+
419
+ return text
420
+ end
421
+
422
+ alias parse parse_text
423
+
424
+ # Returns both the html and the render state (mainly for testing).
425
+ def parse_text_with_render_state(str, rs = nil)
426
+ rs ||= RenderState.new
427
+ html = parse_text(str, rs)
428
+
429
+ return [html, rs]
430
+ end
431
+
432
+ def parse_text_file(path)
433
+ parse_text(File.read(path))
434
+ end
435
+
436
+ alias parse_file parse_text_file
437
+
438
+
439
+ def parse_document(source)
440
+ document_to_html(Document.parse(source))
441
+ end
442
+
443
+ def parse_document_file(path)
444
+ parse_document(File.read(path))
445
+ end
446
+
447
+
448
+ def document_to_html(doc)
449
+ rs = RenderState.new
450
+
451
+ body_html = nil
452
+
453
+ if doc.encoding_type then
454
+ Util.change_kcode(doc.kcode){
455
+ # for the case where the encoding header was changed after Document.parse
456
+ if doc.body.respond_to?(:force_encoding) then
457
+ doc.body.force_encoding(doc.encoding_type)
458
+ end
459
+
460
+ body_html = parse_text(doc.body, rs)
461
+ }
462
+ else
463
+ body_html = parse_text(doc.body, rs)
464
+ end
465
+
466
+ out = Util.generate_blank_string_io(doc.body)
467
+
468
+ # XHTML declaration
469
+ out.puts %Q|<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">|
470
+
471
+ # html start
472
+ out.puts %Q|<html>|
473
+
474
+ # head
475
+ out.puts %Q|<head>|
476
+
477
+ h1 = rs.headers.find{|x| x.level == 1}
478
+ h1_content = (h1 ? h1.content : nil)
479
+ title = Util.escape_html(doc.title || h1_content || 'no title (Generated by BlueFeather)')
480
+ out.puts %Q|<title>#{title}</title>|
481
+
482
+ if doc.encoding_type and (charset = EncodingType.type_to_charset(doc.encoding_type)) then
483
+ out.puts %Q|<meta http-equiv="Content-Type" content="text/html; charset=#{charset}">|
484
+ end
485
+
486
+ %w(description keywords).each do |name|
487
+ if doc[name] then
488
+ content = Util.escape_html(doc[name])
489
+ out.puts %Q|<meta name="#{name}" content="#{content}">|
490
+ end
491
+ end
492
+
493
+
494
+ if doc['css'] then
495
+ href = Util.escape_html(doc.css)
496
+ out.puts %Q|<link rel="stylesheet" type="text/css" href="#{href}" />|
497
+
498
+ end
499
+
500
+ if doc['rdf-feed'] then
501
+ href = Util.escape_html(doc['rdf-feed'])
502
+ out.puts %Q|<link rel="alternate" type="application/rdf+xml" href="#{href}" />|
503
+ end
504
+
505
+
506
+
507
+ if doc['rss-feed'] then
508
+ href = Util.escape_html(doc['rss-feed'])
509
+ out.puts %Q|<link rel="alternate" type="application/rss+xml" href="#{href}" />|
510
+ end
511
+
512
+ if doc['atom-feed'] then
513
+ href = Util.escape_html(doc['atom-feed'])
514
+ out.puts %Q|<link rel="alternate" type="application/atom+xml" href="#{href}" />|
515
+ end
516
+
517
+ out.puts %Q|</head>|
518
+
519
+ # body
520
+ out.puts %Q|<body>|
521
+ out.puts
522
+ out.puts body_html
523
+ out.puts
524
+ out.puts %Q|</body>|
525
+
526
+ # html end
527
+ out.puts %Q|</html>|
528
+
529
+
530
+ return out.string
531
+ end
532
+
533
+ alias doc2html document_to_html
534
+
535
+
536
+
537
+
538
+ #######
539
+ #private
540
+ #######
541
+
542
+ ### Convert tabs in +str+ to spaces.
543
+ ### (this method was reworked from the original BlueCloth into a function-style method that returns its result)
544
+ def detab( str, tabwidth=TabWidth )
545
+ re = str.split( /\n/ ).collect {|line|
546
+ line.gsub( /(.*?)\t/ ) do
547
+ $1 + ' ' * (tabwidth - $1.length % tabwidth)
548
+ end
549
+ }.join("\n")
550
+
551
+ re
552
+ end
553
+
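
The tab expansion above pads each tab out to the next tab stop rather than replacing it with a fixed number of spaces. A stand-alone sketch of the same technique, using a hypothetical `expand_tabs` helper that is not part of BlueFeather:

```ruby
# Hypothetical stand-alone version of the tab expansion technique.
def expand_tabs(text, tab_width = 4)
  text.split("\n").map { |line|
    # $1 is the text between the previous tab stop and this tab, so its
    # length modulo the tab width gives the current column offset.
    line.gsub(/(.*?)\t/) { $1 + ' ' * (tab_width - $1.length % tab_width) }
  }.join("\n")
end

expand_tabs("ab\tc\td")   # "ab" is padded to column 4, "c" to column 8
```

Splitting on newlines and joining again drops any trailing newlines, which is why the caller appends its own after detabbing.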
554
+
555
+
556
+
557
+ ### Do block-level transforms on a copy of +str+ using the specified render
558
+ ### state +rs+ and return the results.
559
+ def apply_block_transforms( str, rs )
560
+ rs.block_transform_depth += 1
561
+
562
+ # Port: This was called '_runBlockGamut' in the original
563
+
564
+ @log.debug "Applying block transforms to:\n %p" % str
565
+ text = str
566
+ text = pretransform_block_separators(text, rs)
567
+ text = pretransform_fenced_code_blocks( text, rs )
568
+
569
+ text = transform_headers( text, rs )
570
+ text = transform_toc(text, rs)
571
+
572
+ text = transform_hrules( text, rs )
573
+ text = transform_lists( text, rs )
574
+ text = transform_definition_lists( text, rs ) # BlueFeather Extension
575
+ text = transform_code_blocks( text, rs )
576
+ text = transform_block_quotes( text, rs )
577
+ text = transform_auto_links( text, rs )
578
+ text = hide_html_blocks( text, rs )
579
+
580
+ text = transform_tables(text, rs)
581
+ text = form_paragraphs( text, rs )
582
+
583
+ rs.block_transform_depth -= 1
584
+ @log.debug "Done with block transforms:\n %p" % text
585
+ return text
586
+ end
587
+
588
+
589
+ ### Apply Markdown span transforms to a copy of the specified +str+ with the
590
+ ### given render state +rs+ and return it.
591
+ def apply_span_transforms( str, rs )
592
+ @log.debug "Applying span transforms to:\n %p" % str
593
+
594
+ str = transform_code_spans( str, rs )
595
+ str = encode_html( str )
596
+ str = transform_images( str, rs )
597
+ str = transform_anchors( str, rs )
598
+ str = transform_italic_and_bold( str, rs )
599
+
600
+ # Hard breaks
601
+ str.gsub!( / {2,}\n/, "<br#{EmptyElementSuffix}\n" )
602
+
603
+ @log.debug "Done with span transforms:\n %p" % str
604
+ return str
605
+ end
606
+
607
+
608
+ # The list of tags which are considered block-level constructs and an
609
+ # alternation pattern suitable for use in regexps made from the list
610
+ StrictBlockTags = %w[ p div h[1-6] blockquote pre table dl ol ul script noscript
611
+ form fieldset iframe math ins del ]
612
+ StrictTagPattern = StrictBlockTags.join('|')
613
+
614
+ LooseBlockTags = StrictBlockTags - %w[ins del]
615
+ LooseTagPattern = LooseBlockTags.join('|')
616
+
617
+ # Nested blocks:
618
+ # <div>
619
+ # <div>
620
+ # tags for inner block must be indented.
621
+ # </div>
622
+ # </div>
623
+ StrictBlockRegexp = %r{
624
+ ^ # Start of line
625
+ <(#{StrictTagPattern}) # Start tag: \1
626
+ \b # word break
627
+ (.*\n)*? # Any number of lines, minimal match
628
+ </\1> # Matching end tag
629
+ [ ]* # trailing spaces
630
+ $ # End of line or document
631
+ }ix
632
+
633
+ # More-liberal block-matching
634
+ LooseBlockRegexp = %r{
635
+ ^ # Start of line
636
+ <(#{LooseTagPattern}) # start tag: \1
637
+ \b # word break
638
+ (.*\n)*? # Any number of lines, minimal match
639
+ .*</\1> # Anything + Matching end tag
640
+ [ ]* # trailing spaces
641
+ $ # End of line or document
642
+ }ix
643
+
644
+ # Special case for <hr />.
645
+ HruleBlockRegexp = %r{
646
+ ( # $1
647
+ \A\n? # Start of doc + optional \n
648
+ | # or
649
+ .*\n\n # anything + blank line
650
+ )
651
+ ( # save in $2
652
+ # BlueFeather fix: Not allow any space on line top
653
+ <hr # Tag open
654
+ \b # Word break
655
+ ([^<>])*? # Attributes
656
+ /?> # Tag close
657
+ $ # followed by a blank line or end of document
658
+ )
659
+ }ix
660
+
661
+ ### Replace all blocks of HTML in +str+ that start in the left margin with
662
+ ### tokens.
663
+ def hide_html_blocks( str, rs )
664
+ @log.debug "Hiding HTML blocks in %p" % str
665
+
666
+ # Tokenizer proc to pass to gsub
667
+ tokenize = lambda {|match|
668
+ key = Digest::MD5::hexdigest( match )
669
+ rs.html_blocks[ key ] = match
670
+ @log.debug "Replacing %p with %p" % [ match, key ]
671
+ "\n\n#{key}\n\n"
672
+ }
673
+
674
+ rval = str.dup
675
+
676
+ @log.debug "Finding blocks with the strict regex..."
677
+ rval.gsub!( StrictBlockRegexp, &tokenize )
678
+
679
+ @log.debug "Finding blocks with the loose regex..."
680
+ rval.gsub!( LooseBlockRegexp, &tokenize )
681
+
682
+ @log.debug "Finding hrules..."
683
+ rval.gsub!( HruleBlockRegexp ) {|match| $1 + tokenize[$2] }
684
+
685
+ return rval
686
+ end
687
+
688
+
689
+ # Link defs are in the form: ^[id]: url "optional title"
690
+ LinkRegexp = %r{
691
+ ^[ ]{0,#{TabWidth - 1}} # BlueFeather fix: indent < tab width
692
+ \[(.+)\]: # id = $1
693
+ [ ]*
694
+ \n? # maybe *one* newline
695
+ [ ]*
696
+ <?(\S+?)>? # url = $2
697
+ [ ]*
698
+ \n? # maybe one newline
699
+ [ ]*
700
+ (?:
701
+ # Titles are delimited by "quotes" or (parens).
702
+ ["(]
703
+ (.+?) # title = $3
704
+ [")] # Matching ) or "
705
+ [ ]*
706
+ )? # title is optional
707
+ (?:\n+|\Z)
708
+ }x
709
+
710
+ ### Strip link definitions from +str+, storing them in the given RenderState
711
+ ### +rs+.
712
+ def strip_link_definitions( str, rs )
713
+ str.gsub( LinkRegexp ) {|match|
714
+ id, url, title = $1, $2, $3
715
+
716
+ rs.urls[ id.downcase ] = encode_html( url )
717
+ unless title.nil?
718
+ rs.titles[ id.downcase ] = title.gsub( /"/, "&quot;" )
719
+ end
720
+
721
+ ""
722
+ }
723
+ end
724
+
725
+ # Footnotes defs are in the form: [^id]: footnote contents.
726
+ FootnoteDefinitionRegexp = %r{
727
+ ^[ ]{0,#{TabWidth - 1}}
728
+ \[\^(.+?)\]\: # id = $1
729
+ [ ]*
730
+ (.*) # first line content = $2
731
+ (?:\n|\Z)
732
+
733
+ ( # second or more lines content = $3
734
+ (?:
735
+ [ ]{#{TabWidth},} # indented
736
+ .*
737
+ (?:\n|\Z)
738
+ |
739
+ \n # blank line
740
+ )*
741
+ )?
742
+
743
+ }x
744
+
745
+ FootnoteIdRegexp = /^[a-zA-Z0-9\:\._-]+$/
746
+
747
+ def strip_footnote_definitions(str, rs)
748
+ str.gsub( FootnoteDefinitionRegexp ) {|match|
749
+ id = $1; content1 = $2; content2 = $3
750
+
751
+ unless id =~ FootnoteIdRegexp then
752
+ rs.warnings << "illegal footnote id - #{id} (legal chars: a-zA-Z0-9_-.:)"
753
+ end
754
+
755
+ if content2 then
756
+ @log.debug " Stripping multi-line definition %p, %p" % [$2, $3]
757
+ content = content1 + "\n" + outdent(content2.chomp)
758
+ @log.debug " Stripped multi-line definition %p, %p" % [id, content]
759
+ rs.footnotes[id] = content
760
+ else
761
+ content = content1 || ''
762
+ @log.debug " Stripped single-line definition %p, %p" % [id, content]
763
+ rs.footnotes[id] = content
764
+ end
765
+
766
+
767
+
768
+ ""
769
+ }
770
+ end
771
+
772
+
773
+ ### Escape special characters in the given +str+
774
+ def escape_special_chars( str )
775
+ @log.debug " Escaping special characters"
776
+ text = ''
777
+
778
+ # The original Markdown source has something called '$tags_to_skip'
779
+ # declared here, but it's never used, so I don't define it.
780
+
781
+ tokenize_html( str ) {|token, str|
782
+ @log.debug " Adding %p token %p" % [ token, str ]
783
+ case token
784
+
785
+ # Within tags, encode * and _
786
+ when :tag
787
+ text += str.
788
+ gsub( /\*/, EscapeTable['*'][:md5] ).
789
+ gsub( /_/, EscapeTable['_'][:md5] )
790
+
791
+ # Encode backslashed stuff in regular text
792
+ when :text
793
+ text += encode_backslash_escapes( str )
794
+ else
795
+ raise TypeError, "Unknown token type %p" % token
796
+ end
797
+ }
798
+
799
+ @log.debug " Text with escapes is now: %p" % text
800
+ return text
801
+ end
802
+
803
+
804
+ ### Swap escaped special characters in a copy of the given +str+ and return
805
+ ### it.
806
+ def unescape_special_chars( str )
807
+ EscapeTable.each {|char, hash|
808
+ @log.debug "Unescaping escaped %p with %p" % [ char, hash[:md5re] ]
809
+ str.gsub!( hash[:md5re], char )
810
+ }
811
+
812
+ return str
813
+ end
814
+
815
+
816
+ ### Return a copy of the given +str+ with any backslashed special character
817
+ ### in it replaced with MD5 placeholders.
818
+ def encode_backslash_escapes( str )
819
+ # Make a copy with any double-escaped backslashes encoded
820
+ text = str.gsub( /\\\\/, EscapeTable['\\'][:md5] )
821
+
822
+ EscapeTable.each_pair {|char, esc|
823
+ next if char == '\\'
824
+ text.gsub!( esc[:re], esc[:md5] )
825
+ }
826
+
827
+ return text
828
+ end
829
+
830
+
831
+ def pretransform_block_separators(str, rs)
832
+ str.gsub(/^[ ]{0,#{TabWidth - 1}}[~][ ]*\n/){
833
+ "\n~\n\n"
834
+ }
835
+ end
836
+
837
+
838
+ TOCRegexp = %r{
839
+ ^\{ # bracket on line-head
840
+ [ ]* # optional inner space
841
+ toc
842
+
843
+ (?:
844
+ (?:
845
+ [:] # colon
846
+ | # or
847
+ [ ]+ # 1 or more space
848
+ )
849
+ (.+?) # $1 = parameter
850
+ )?
851
+
852
+ [ ]* # optional inner space
853
+ \} # closer
854
+ [ ]*$ # optional space on line-foot
855
+ }ix
856
+
857
+ TOCStartLevelRegexp = /^h([1-6])[.]{2,}$/i # $1 = level
858
+
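
As a rough illustration of how the two patterns above fit together, the sketch below reads a single line and reports the start level it requests. The constants are single-line copies of the patterns, and `toc_start_level` is a hypothetical helper, not a BlueFeather method.

```ruby
# Assumed single-line copies of the two patterns above.
TOC_MARKER      = /^\{[ ]*toc(?:(?:[:]|[ ]+)(.+?))?[ ]*\}[ ]*$/i
TOC_START_LEVEL = /^h([1-6])[.]{2,}$/i

# Returns nil when the line is not a TOC marker, the requested start level
# when the parameter is valid, 2 when no parameter is given, and :invalid
# when a parameter is present but malformed.
def toc_start_level(line)
  marker = TOC_MARKER.match(line)
  return nil unless marker

  param = marker[1]
  return 2 if param.nil?

  level = TOC_START_LEVEL.match(param)
  level ? level[1].to_i : :invalid
end

toc_start_level('{toc}')       # bare marker, default level
toc_start_level('{toc:h3..}')  # explicit start level
```

In the library itself the malformed case adds a warning to the render state and falls back to the default level; the sketch returns a marker value instead so the three outcomes are easy to tell apart.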
859
+ ### Transform table-of-contents markers ({toc}) in a copy of the specified
860
+ ### +str+ into nested lists of header links and return it.
861
+ def transform_toc( str, rs )
862
+ @log.debug " Transforming tables of contents"
863
+ str.gsub(TOCRegexp){
864
+ start_level = 2 # default
865
+
866
+ param = $1
867
+ if param then
868
+ if param =~ TOCStartLevelRegexp then
869
+ start_level = $1.to_i
870
+ else
871
+ rs.warnings << "illegal TOC parameter - #{param} (valid example: 'h2..')"
872
+ end
873
+ end
874
+
875
+ ul_text = "\n\n"
876
+ rs.headers.each do |header|
877
+ if header.level >= start_level then
878
+ ul_text << ' ' * TabWidth * (header.level - start_level)
879
+ ul_text << '* '
880
+ ul_text << %Q|<a href="##{header.id}" rel="toc">#{header.content_html}</a>|
881
+ ul_text << "\n"
882
+ end
883
+ end
884
+ ul_text << "\n"
885
+
886
+ ul_text
887
+ }
888
+ end
889
+
890
+ TableRegexp = %r{
891
+ (?:
892
+ ^([ ]{0,#{TabWidth - 1}}) # not indented
893
+ (?:[|][ ]*) # NOT optional border
894
+
895
+ \S.*? # 1st cell content
896
+
897
+ (?: # 2nd cell or later
898
+ [|] # cell splitter
899
+ .+? # content
900
+ )+ # 1 or more..
901
+
902
+ [|]? # optional border
903
+ (?:\n|\Z) # line end
904
+ )+
905
+ }x
906
+
907
+ # Transform tables.
908
+ def transform_tables(str, rs)
909
+ str.gsub(TableRegexp){
910
+ transform_table_rows($~[0], rs)
911
+ }
912
+ end
913
+
914
+ TableSeparatorCellRegexp = %r{
915
+ ^
916
+ [ ]*
917
+ ([:])? # $1 = left-align symbol
918
+ [ ]*
919
+ [-]+ # border
920
+ [ ]*
921
+ ([:])? # $2 = right-align symbol
922
+ [ ]*
923
+ $
924
+ }x
925
+
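
The separator-cell pattern above decides column alignment from the colons on either side of the dashes. A minimal sketch of that decision, assuming a hypothetical `column_alignment` helper and a single-line copy of the pattern:

```ruby
# Assumed single-line copy of the separator-cell pattern above.
SEPARATOR_CELL = /^[ ]*([:])?[ ]*[-]+[ ]*([:])?[ ]*$/

# Returns 'center', 'right' or 'left' for a marked separator cell, :none
# for an unmarked one, and nil when the cell is not a separator at all.
def column_alignment(cell)
  match = SEPARATOR_CELL.match(cell)
  return nil unless match

  left, right = match[1], match[2]
  if left && right then 'center'
  elsif right      then 'right'
  elsif left       then 'left'
  else :none
  end
end

%w[:-- :--: --:].map { |cell| column_alignment(cell) }
```

A row only counts as a separator when every one of its cells matches, which is why the header check that follows tests the whole second row before treating the first row as a header.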
926
+ def transform_table_rows(str, rs)
927
+ # split cells to 2-d array
928
+ data = str.split("\n").map{|x| x.split('|')}
929
+
930
+ # drop the empty leading cell produced when the optional side border is included
931
+ data.each do |row|
932
+ row.shift if row.first.empty?
933
+ end
934
+
935
+ column_attrs = []
936
+
937
+ re = ''
938
+ re << "<table>\n"
939
+
940
+ # head is exist?
941
+ if data.size >= 3 and data[1].all?{|x| x =~ TableSeparatorCellRegexp} then
942
+ head_row = data.shift
943
+ separator_row = data.shift
944
+
945
+ separator_row.each do |cell|
946
+ cell.match TableSeparatorCellRegexp
947
+ left = $1; right = $2
948
+
949
+ if left and right then
950
+ column_attrs << ' style="text-align: center"'
951
+ elsif right then
952
+ column_attrs << ' style="text-align: right"'
953
+ elsif left then
954
+ column_attrs << ' style="text-align: left"'
955
+ else
956
+ column_attrs << ''
957
+ end
958
+ end
959
+
960
+ re << "\t<thead><tr>\n"
961
+ head_row.each_with_index do |cell, i|
962
+ re << "\t\t<th#{column_attrs[i]}>#{apply_span_transforms(cell.strip, rs)}</th>\n"
963
+ end
964
+ re << "\t</tr></thead>\n"
965
+ end
966
+
967
+ # data row
968
+ re << "\t<tbody>\n"
969
+ data.each do |row|
970
+ re << "\t\t<tr>\n"
971
+ row.each_with_index do |cell, i|
972
+ re << "\t\t\t<td#{column_attrs[i]}>#{apply_span_transforms(cell.strip, rs)}</td>\n"
973
+ end
974
+ re << "\t\t</tr>\n"
975
+ end
976
+ re << "\t</tbody>\n"
977
+
978
+ re << "</table>\n"
979
+
980
+ re
981
+ end
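The alignment handling in transform_table_rows can be sketched in isolation. This is a minimal illustration, not BlueFeather's API: SEPARATOR_CELL is a stand-alone copy of the separator-cell pattern, and alignment_attr is a hypothetical helper name.

```ruby
# Stand-alone copy of the separator-cell pattern (name is hypothetical).
SEPARATOR_CELL = /^[ ]*(:)?[ ]*-+[ ]*(:)?[ ]*$/

# Map one separator cell to the attribute string used for its column.
# Returns nil when the cell is not a separator cell at all.
def alignment_attr(cell)
  m = SEPARATOR_CELL.match(cell)
  return nil unless m
  left, right = m[1], m[2]

  if left && right
    ' style="text-align: center"'
  elsif right
    ' style="text-align: right"'
  elsif left
    ' style="text-align: left"'
  else
    ''
  end
end

# Split a separator row the same way the method above splits every row.
cells = "|:---|:---:|---:|---|".split('|')
cells.shift if cells.first.empty?   # drop the cell before the left border
column_attrs = cells.map { |cell| alignment_attr(cell) }
```

A colon on the left, on the right, or on both sides of the dashes selects left, right, or center alignment; a cell with no colon leaves the column unstyled.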
982
+
983
+
984
+ ### Transform any Markdown-style horizontal rules in a copy of the specified
985
+ ### +str+ and return it.
986
+ def transform_hrules( str, rs )
987
+ @log.debug " Transforming horizontal rules"
988
+ str.gsub( /^( ?[\-\*_] ?){3,}$/, "\n<hr#{EmptyElementSuffix}\n" )
989
+ end
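A short sketch of what the horizontal-rule pattern accepts. The pattern is copied from the method above; replace_hrules is a hypothetical name, and the " />" suffix is an assumption standing in for EmptyElementSuffix, whose value is not shown in this part of the diff.

```ruby
# Same pattern as in transform_hrules.
HRULE_PATTERN = /^( ?[\-\*_] ?){3,}$/

# suffix is an assumed stand-in for EmptyElementSuffix.
def replace_hrules(str, suffix = " />")
  str.gsub(HRULE_PATTERN, "\n<hr#{suffix}\n")
end

with_rule  = replace_hrules("above\n\n* * *\n\nbelow")  # rule line is replaced
too_short  = replace_hrules("--")                        # fewer than 3 markers: unchanged
mixed_rule = replace_hrules("-*_")                       # mixed markers also match
```

Note that the character class does not require all three markers to be the same character, so a line such as `-*_` is also treated as a rule.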
990
+
991
+
992
+
993
+ # Patterns to match and transform lists
994
+ ListMarkerOl = %r{\d+\.}
995
+ ListMarkerUl = %r{[*+-]}
996
+ ListMarkerAny = Regexp::union( ListMarkerOl, ListMarkerUl )
997
+
998
+ ListRegexp = %r{
999
+ (?:
1000
+ ^[ ]{0,#{TabWidth - 1}} # Indent < tab width
1001
+ (#{ListMarkerAny}) # unordered or ordered ($1)
1002
+ [ ]+ # At least one space
1003
+ )
1004
+ (?m:.+?) # item content (include newlines)
1005
+ (?:
1006
+ \z # Either EOF
1007
+ | # or
1008
+ \n{2,} # Blank line...
1009
+ (?=\S) # ...followed by non-space
1010
+ (?![ ]* # ...but not another item
1011
+ (#{ListMarkerAny})
1012
+ [ ]+)
1013
+ )
1014
+ }x
1015
+
1016
+ ### Transform Markdown-style lists in a copy of the specified +str+ and
1017
+ ### return it.
1018
+ def transform_lists( str, rs )
1019
+ @log.debug " Transforming lists at %p" % (str[0,100] + '...')
1020
+
1021
+ str.gsub( ListRegexp ) {|list|
1022
+ @log.debug " Found list %p" % list
1023
+ bullet = $1
1024
+ list_type = (ListMarkerUl.match(bullet) ? "ul" : "ol")
1025
+ list.gsub!( /\n{2,}/, "\n\n\n" )
1026
+
1027
+ %{<%s>\n%s</%s>\n} % [
1028
+ list_type,
1029
+ transform_list_items( list, rs ),
1030
+ list_type,
1031
+ ]
1032
+ }
1033
+ end
1034
+
1035
+
1036
+ # Pattern for transforming list items
1037
+ ListItemRegexp = %r{
1038
+ (\n)? # leading line = $1
1039
+ (^[ ]*) # leading whitespace = $2
1040
+ (#{ListMarkerAny}) [ ]+ # list marker = $3
1041
+ ((?m:.+?) # list item text = $4
1042
+ (\n{1,2}))
1043
+ (?= \n* (\z | \2 (#{ListMarkerAny}) [ ]+))
1044
+ }x
1045
+
1046
+ ### Transform list items in a copy of the given +str+ and return it.
1047
+ def transform_list_items( str, rs )
1048
+ @log.debug " Transforming list items"
1049
+
1050
+ # Trim trailing blank lines
1051
+ str = str.sub( /\n{2,}\z/, "\n" )
1052
+
1053
+ str.gsub( ListItemRegexp ) {|line|
1054
+ @log.debug " Found item line %p" % line
1055
+ leading_line, item = $1, $4
1056
+
1057
+ if leading_line or /\n{2,}/.match( item )
1058
+ @log.debug " Found leading line or item has a blank"
1059
+ item = apply_block_transforms( outdent(item), rs )
1060
+ else
1061
+ # Recursion for sub-lists
1062
+ @log.debug " Recursing for sublist"
1063
+ item = transform_lists( outdent(item), rs ).chomp
1064
+ item = apply_span_transforms( item, rs )
1065
+ end
1066
+
1067
+ %{<li>%s</li>\n} % item
1068
+ }
1069
+ end
1070
+
1071
+ DefinitionListRegexp = %r{
1072
+ (?:
1073
+ (?:^.+\n)+ # dt
1074
+ \n*
1075
+ (?:
1076
+ ^[ ]{0,#{TabWidth - 1}} # Indent < tab width
1077
+ \: # dd marker (line head)
1078
+ [ ]* # space
1079
+ ((?m:.+?)) # dd content
1080
+ (?:
1081
+ \s*\z # end of string
1082
+ | # or
1083
+ \n{2,} # blank line
1084
+ (?=[ ]{0,#{TabWidth - 1}}\S) # ...followed by a line indented less than a tab width
1085
+ )
1086
+ )+
1087
+ )+
1088
+ }x
1089
+
1090
+ def transform_definition_lists(str, rs)
1091
+ @log.debug " Transforming definition lists at %p" % (str[0,100] + '...')
1092
+ str.gsub( DefinitionListRegexp ) {|list|
1093
+ @log.debug " Found definition list %p (captures=%p)" % [list, $~.captures]
1094
+ transform_definition_list_items(list, rs)
1095
+ }
1096
+ end
1097
+
1098
+ DDLineRegexp = /^\:[ ]{0,#{TabWidth - 1}}(.*)/
1099
+
1100
+
1101
+ def transform_definition_list_items(str, rs)
1102
+ buf = Util.generate_blank_string_io(str)
1103
+ buf.puts %Q|<dl>|
1104
+
1105
+ lines = str.split("\n")
1106
+ until lines.empty? do
1107
+
1108
+ dts = []
1109
+
1110
+ # get dt items
1111
+ while lines.first =~ /^(?!\:).+$/ do
1112
+ dts << lines.shift
1113
+ end
1114
+
1115
+
1116
+ dd_as_block = false
1117
+
1118
+ # skip blank lines
1119
+ while not lines.empty? and lines.first.empty? do
1120
+ lines.shift
1121
+ dd_as_block = true
1122
+ end
1123
+
1124
+
1125
+ dds = []
1126
+ while lines.first =~ DDLineRegexp do
1127
+ dd_buf = []
1128
+
1129
+ # dd first line
1130
+ unless (line = lines.shift).empty? then
1131
+ dd_buf << $1 << "\n"
1132
+ end
1133
+
1134
+ # second and later dd lines (contiguous with the first line)
1135
+ until lines.empty? or # stop if read all
1136
+ lines.first =~ /^[ ]{0,#{TabWidth - 1}}$/ or # stop if blank line
1137
+ lines.first =~ DDLineRegexp do # stop if new dd found
1138
+ dd_buf << outdent(lines.shift) << "\n"
1139
+ end
1140
+
1141
+ # second and later dd lines (separated from the first line by blank lines)
1142
+ until lines.empty? do # stop if all was read
1143
+ if lines.first.empty? then
1144
+ # blank line (skip)
1145
+ lines.shift
1146
+ dd_buf << "\n"
1147
+ elsif lines.first =~ /^[ ]{#{TabWidth},}/ then
1148
+ # indented body
1149
+ dd_buf << outdent(lines.shift) << "\n"
1150
+ else
1151
+ # not indented body
1152
+ break
1153
+ end
1154
+
1155
+ end
1156
+
1157
+
1158
+ dds << dd_buf.join
1159
+
1160
+ # skip blank lines
1161
+ unless lines.empty? then
1162
+ while lines.first.empty? do
1163
+ lines.shift
1164
+ end
1165
+ end
1166
+ end
1167
+
1168
+ # html output
1169
+ dts.each do |dt|
1170
+ buf.puts %Q| <dt>#{apply_span_transforms(dt, rs)}</dt>|
1171
+ end
1172
+
1173
+ dds.each do |dd|
1174
+ if dd_as_block then
1175
+ buf.puts %Q| <dd>#{apply_block_transforms(dd, rs)}</dd>|
1176
+ else
1177
+ dd.gsub!(/\n+\z/, '') # chomp linefeeds
1178
+ buf.puts %Q| <dd>#{apply_span_transforms(dd.chomp, rs)}</dd>|
1179
+ end
1180
+ end
1181
+ end
1182
+
1183
+ buf.puts %Q|</dl>|
1184
+
1185
+ return(buf.string)
1186
+ end
1187
+
1188
+ # old
1189
+
1190
+
1191
+ # Pattern for matching codeblocks
1192
+ CodeBlockRegexp = %r{
1193
+ (?:\n\n|\A)
1194
+ ( # $1 = the code block
1195
+ (?:
1196
+ (?:[ ]{#{TabWidth}} | \t) # a tab or tab-width of spaces
1197
+ .*\n+
1198
+ )+
1199
+ )
1200
+ (^[ ]{0,#{TabWidth - 1}}\S|\Z) # $2 = non-space at
1201
+ # line-start, or end of doc
1202
+ }x
1203
+
1204
+
1205
+ ### Transform Markdown-style codeblocks in a copy of the specified +str+ and
1206
+ ### return it.
1207
+ def transform_code_blocks( str, rs )
1208
+ @log.debug " Transforming code blocks"
1209
+
1210
+ str.gsub( CodeBlockRegexp ) {|block|
1211
+ codeblock = $1
1212
+ remainder = $2
1213
+
1214
+ # Generate the codeblock
1215
+ %{\n\n<pre><code>%s\n</code></pre>\n\n%s} %
1216
+ [ encode_code( outdent(codeblock), rs ).rstrip, remainder ]
1217
+ }
1218
+ end
1219
+
1220
+ FencedCodeBlockRegexp = /^(\~{3,})\n((?m:.+?)\n)\1\n/
1221
+
1222
+ def pretransform_fenced_code_blocks( str, rs )
1223
+ @log.debug " Transforming fenced code blocks => standard code blocks"
1224
+
1225
+ str.gsub( FencedCodeBlockRegexp ) {|block|
1226
+ "\n" + transform_code_blocks(indent($2), rs) + "\n"
1227
+ }
1228
+ end
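The fenced-block pre-transform works by rewriting a `~~~` block into an ordinary indented code block. A minimal sketch of the matching and indenting steps follows; TAB_WIDTH = 4 is an assumption (TabWidth is defined outside this part of the diff), and FENCED is a stand-alone copy of the pattern.

```ruby
TAB_WIDTH = 4   # assumption: the real TabWidth is defined elsewhere in the file
FENCED = /^(\~{3,})\n((?m:.+?)\n)\1\n/

# Same idea as the indent helper at the end of the class.
def indent(str)
  str.gsub(/^/, ' ' * TAB_WIDTH)
end

source   = "~~~\nputs 'hi'\n~~~\n"
match    = FENCED.match(source)
body     = match[2]        # text between the fences, including its final newline
indented = indent(body)    # now looks like a standard Markdown code block

# The closing fence is matched with \1, so it must have exactly the same
# number of tildes as the opening fence.
mismatched = FENCED.match("~~~\ncode\n~~~~\n")
```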
1229
+
1230
+
1231
+
1232
+ # Pattern for matching Markdown blockquote blocks
1233
+ BlockQuoteRegexp = %r{
1234
+ (?:
1235
+ ^[ ]*>[ ]? # '>' at the start of a line
1236
+ .+\n # rest of the first line
1237
+ (?:.+\n)* # subsequent consecutive lines
1238
+ \n* # blanks
1239
+ )+
1240
+ }x
1241
+ PreChunk = %r{ ( ^ \s* <pre> .+? </pre> ) }xm
1242
+
1243
+ ### Transform Markdown-style blockquotes in a copy of the specified +str+
1244
+ ### and return it.
1245
+ def transform_block_quotes( str, rs )
1246
+ @log.debug " Transforming block quotes"
1247
+
1248
+ str.gsub( BlockQuoteRegexp ) {|quote|
1249
+ @log.debug "Making blockquote from %p" % quote
1250
+
1251
+ quote.gsub!( /^ *> ?/, '' ) # Trim one level of quoting
1252
+ quote.gsub!( /^ +$/, '' ) # Trim whitespace-only lines
1253
+
1254
+ indent = " " * TabWidth
1255
+ quoted = %{<blockquote>\n%s\n</blockquote>\n\n} %
1256
+ apply_block_transforms( quote, rs ).
1257
+ gsub( /^/, indent ).
1258
+ gsub( PreChunk ) {|m| m.gsub(/^#{indent}/o, '') }
1259
+ @log.debug "Blockquoted chunk is: %p" % quoted
1260
+ quoted
1261
+ }
1262
+ end
1263
+
1264
+
1265
+ # BlueFeather change:
1266
+ # allow loosely formed URLs and addresses (BlueCloth is very strict)
1267
+ #
1268
+ # loose examples:
1269
+ # <skype:tetra-dice> (other protocol)
1270
+ # <ema+il@example.com> (ex: gmail alias)
1271
+ #
1272
+ # unsupported addresses:
1273
+ # <"Abc@def"@example.com> (refer to quoted-string of RFC 5321)
1274
+
1275
+ AutoAnchorURLRegexp = /<([a-z]+:[^'">\s]+)>/ # $1 = url
1276
+
1277
+ AutoAnchorEmailRegexp = /<([^'">\s]+?\@[^'">\s]+[.][a-zA-Z]+)>/ # $1 = address
1278
+
1279
+ ### Transform URLs in a copy of the specified +str+ into links and return
1280
+ ### it.
1281
+ def transform_auto_links( str, rs )
1282
+ @log.debug " Transforming auto-links"
1283
+ str.gsub( AutoAnchorURLRegexp, %{<a href="\\1">\\1</a>}).
1284
+ gsub( AutoAnchorEmailRegexp ) {|addr|
1285
+ encode_email_address( unescape_special_chars($1) )
1286
+ }
1287
+ end
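To illustrate the "loose" URL matching described in the comment above, here is a small sketch of the URL half of the transform. AUTO_URL copies the pattern; auto_link_urls is a hypothetical name, and the email half is left out.

```ruby
AUTO_URL = /<([a-z]+:[^'">\s]+)>/   # $1 = url

def auto_link_urls(str)
  str.gsub(AUTO_URL, %{<a href="\\1">\\1</a>})
end

http_link  = auto_link_urls("<http://example.com/>")
skype_link = auto_link_urls("<skype:tetra-dice>")   # any lowercase scheme is accepted
plain_tag  = auto_link_urls("<b>")                  # no "scheme:" part, left alone
```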
1288
+
1289
+
1290
+ # Encoder functions to turn characters of an email address into encoded
1291
+ # entities.
1292
+ Encoders = [
1293
+ lambda {|char| "&#%03d;" % char},
1294
+ lambda {|char| "&#x%X;" % char},
1295
+ lambda {|char| char.chr },
1296
+ ]
1297
+
1298
+ ### Transform a copy of the given email +addr+ into an escaped version safer
1299
+ ### for posting publicly.
1300
+ def encode_email_address( addr )
1301
+
1302
+ rval = ''
1303
+ ("mailto:" + addr).each_byte {|b|
1304
+ case b
1305
+ when ?:
1306
+ rval += ":"
1307
+ when ?@
1308
+ rval += Encoders[ rand(2) ][ b ]
1309
+ else
1310
+ r = rand(100)
1311
+ rval += (
1312
+ r > 90 ? Encoders[2][ b ] :
1313
+ r < 45 ? Encoders[1][ b ] :
1314
+ Encoders[0][ b ]
1315
+ )
1316
+ end
1317
+ }
1318
+
1319
+ return %{<a href="%s">%s</a>} % [ rval, rval.sub(/.+?:/, '') ]
1320
+ end
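The obfuscation scheme can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: obfuscate and decode_entities are hypothetical names, a seeded Random replaces the global rand so the result is repeatable, and byte comparisons use `.ord` because the `?:` and `?@` literals in the method above are integers only on Ruby 1.8 (they are one-character strings on Ruby 1.9 and later).

```ruby
ENCODERS = [
  lambda { |byte| "&#%03d;" % byte },   # decimal entity
  lambda { |byte| "&#x%X;" % byte },    # hexadecimal entity
  lambda { |byte| byte.chr },           # literal character
]

# Encode "mailto:" + addr, choosing an encoder per byte.
def obfuscate(addr, rng = Random.new(42))
  out = String.new
  ("mailto:" + addr).each_byte do |b|
    out << if b == ':'.ord
      ':'                               # the colon is always left as-is
    elsif b == '@'.ord
      ENCODERS[rng.rand(2)][b]          # the at-sign is always an entity
    else
      r = rng.rand(100)
      r > 90 ? ENCODERS[2][b] : r < 45 ? ENCODERS[1][b] : ENCODERS[0][b]
    end
  end
  out
end

# Reverse the encoding, as a browser would when rendering the link.
def decode_entities(str)
  str.gsub(/&#x([0-9A-F]+);/) { $1.hex.chr }.
      gsub(/&#(\d+);/) { $1.to_i.chr }
end

encoded = obfuscate("me@example.com")
decoded = decode_entities(encoded)
```

Roughly 45% of the ordinary bytes become hexadecimal entities, 46% decimal entities, and the rest stay literal, so the address never appears verbatim in the HTML source.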
1321
+
1322
+
1323
+ # Regexp for matching Setext-style headers
1324
+ SetextHeaderRegexp = %r{
1325
+ (.+?) # The title text ($1)
1326
+
1327
+ (?: # Markdown Extra: Header Id Attribute (optional)
1328
+ [ ]* # space after the title text
1329
+ \{\#
1330
+ (\S+?) # $2 = Id
1331
+ \}
1332
+ [ \t]* # allowed lazy spaces
1333
+ )?
1334
+ \n
1335
+ ([\-=])+ # Match a line of = or -. Save only one in $3.
1336
+ [ ]*\n+
1337
+ }x
1338
+
1339
+ # Regexp for matching ATX-style headers
1340
+ AtxHeaderRegexp = %r{
1341
+ ^(\#+) # $1 = string of #'s
1342
+ [ ]*
1343
+ (.+?) # $2 = Header text
1344
+ [ ]*
1345
+ \#* # optional closing #'s (not counted)
1346
+
1347
+ (?: # Markdown Extra: Header Id Attribute (optional)
1348
+ [ ]* # space after closing #'s
1349
+ \{\#
1350
+ (\S+?) # $3 = Id
1351
+ \}
1352
+ [ \t]* # allowed lazy spaces
1353
+ )?
1354
+
1355
+ \n+
1356
+ }x
1357
+
1358
+ HeaderRegexp = Regexp.union(SetextHeaderRegexp, AtxHeaderRegexp)
1359
+
1360
+ IdRegexp = /^[a-zA-Z][a-zA-Z0-9\:\._-]*$/
1361
+
1362
+ ### Apply Markdown header transforms to a copy of the given +str+ and render
1363
+ ### state +rs+ and return the result.
1364
+ def transform_headers( str, rs )
1365
+ @log.debug " Transforming headers"
1366
+
1367
+ # Setext-style headers:
1368
+ # Header 1
1369
+ # ========
1370
+ #
1371
+ # Header 2
1372
+ # --------
1373
+ #
1374
+ str.
1375
+ gsub( HeaderRegexp ) {|m|
1376
+ if $1 then
1377
+ @log.debug "Found setext-style header"
1378
+ title, id, hdrchar = $1, $2, $3
1379
+
1380
+ case hdrchar
1381
+ when '='
1382
+ level = 1
1383
+ when '-'
1384
+ level = 2
1385
+ end
1386
+ else
1387
+ @log.debug "Found ATX-style header"
1388
+ hdrchars, title, id = $4, $5, $6
1389
+ level = hdrchars.length
1390
+
1391
+ if level >= 7 then
1392
+ rs.warnings << "illegal header level - h#{level} ('#' symbols are too many)"
1393
+ end
1394
+ end
1395
+
1396
+
1397
+ title_html = apply_span_transforms( title, rs )
1398
+ id ||= "bfheader-#{Digest::MD5.hexdigest(title)}"
1399
+
1400
+ unless id =~ IdRegexp then
1401
+ rs.warnings << "illegal header id - #{id} (legal chars: a-zA-Z0-9_-. | 1st: a-zA-Z)"
1402
+ end
1403
+
1404
+ if rs.block_transform_depth == 1 then
1405
+ rs.headers << RenderState::Header.new(id, level, title, title_html)
1406
+ end
1407
+
1408
+ if @use_header_id then
1409
+ %{<h%d id="%s">%s</h%d>\n\n} % [ level, id, title_html, level ]
1410
+ else
1411
+ %{<h%d>%s</h%d>\n\n} % [ level, title_html, level ]
1412
+ end
1413
+ }
1414
+ end
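The id handling in transform_headers falls back to an MD5 digest of the title when no `{#id}` attribute is given. A minimal sketch, with header_id as a hypothetical helper name and ID_PATTERN as a copy of IdRegexp:

```ruby
require 'digest/md5'

ID_PATTERN = /^[a-zA-Z][a-zA-Z0-9\:\._-]*$/

# Use the explicit id when present, otherwise derive one from the title.
def header_id(title, explicit_id = nil)
  explicit_id || "bfheader-#{Digest::MD5.hexdigest(title)}"
end

generated = header_id("Introduction")
explicit  = header_id("Introduction", "intro")
```

Generated ids always satisfy the pattern because they start with a letter and contain only hex digits and a hyphen after the prefix; an explicit id such as `1st` would trigger the "illegal header id" warning.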
1415
+
1416
+
1417
+ ### Wrap all remaining paragraph-looking text in a copy of +str+ inside <p>
1418
+ ### tags and return it.
1419
+ def form_paragraphs( str, rs )
1420
+ @log.debug " Forming paragraphs"
1421
+ grafs = str.
1422
+ sub( /\A\n+/, '' ).
1423
+ sub( /\n+\z/, '' ).
1424
+ split( /\n{2,}/ )
1425
+
1426
+ rval = grafs.collect {|graf|
1427
+
1428
+ # Unhashify HTML blocks if this is a placeholder
1429
+ if rs.html_blocks.key?( graf )
1430
+ rs.html_blocks[ graf ]
1431
+
1432
+ # no output if this is a block separator
1433
+ elsif graf == '~' then
1434
+ ''
1435
+
1436
+ # Otherwise, wrap in <p> tags
1437
+ else
1438
+ apply_span_transforms(graf, rs).
1439
+ sub( /^[ ]*/, '<p>' ) + '</p>'
1440
+ end
1441
+ }.join( "\n\n" )
1442
+
1443
+ @log.debug " Formed paragraphs: %p" % rval
1444
+ return rval
1445
+ end
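The paragraph-forming step can be sketched without the render state. This simplified version (wrap_paragraphs is a hypothetical name) omits the HTML-block placeholder lookup and the span transforms, and keeps only the trimming, splitting, and wrapping.

```ruby
def wrap_paragraphs(str)
  grafs = str.
    sub(/\A\n+/, '').     # drop leading blank lines
    sub(/\n+\z/, '').     # drop trailing blank lines
    split(/\n{2,}/)       # one element per blank-line-separated chunk

  grafs.collect { |graf|
    if graf == '~'
      ''                                   # block separator: no output
    else
      graf.sub(/^[ ]*/, '<p>') + '</p>'    # strip leading spaces, wrap
    end
  }.join("\n\n")
end

html = wrap_paragraphs("\nfirst\n\n  second\n")
```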
1446
+
1447
+
1448
+ # Pattern to match the linkid part of an anchor tag for reference-style
1449
+ # links.
1450
+ RefLinkIdRegexp = %r{
1451
+ [ ]? # Optional leading space
1452
+ (?:\n[ ]*)? # Optional newline + spaces
1453
+ \[
1454
+ (.*?) # Id = $1
1455
+ \]
1456
+ }x
1457
+
1458
+ InlineLinkRegexp = %r{
1459
+ \( # Literal paren
1460
+ [ ]* # Zero or more spaces
1461
+ <?(.+?)>? # URI = $1
1462
+ [ ]* # Zero or more spaces
1463
+ (?: #
1464
+ ([\"\']) # Opening quote char = $2
1465
+ (.*?) # Title = $3
1466
+ \2 # Matching quote char
1467
+ )? # Title is optional
1468
+ \)
1469
+ }x
1470
+
1471
+ ### Apply Markdown anchor transforms to a copy of the specified +str+ with
1472
+ ### the given render state +rs+ and return it.
1473
+ def transform_anchors( str, rs )
1474
+ @log.debug " Transforming anchors"
1475
+ @scanner.string = str.dup
1476
+ text = ''
1477
+
1478
+ # Scan the whole string
1479
+ until @scanner.empty?
1480
+
1481
+ if @scanner.scan( /\[/ )
1482
+ link = ''; linkid = ''
1483
+ depth = 1
1484
+ startpos = @scanner.pos
1485
+ @log.debug " Found a bracket-open at %d" % startpos
1486
+
1487
+ # Scan the rest of the tag, allowing unlimited nested []s. If
1488
+ # the scanner runs out of text before the opening bracket is
1489
+ # closed, append the text and return (wasn't a valid anchor).
1490
+ while depth.nonzero?
1491
+ linktext = @scanner.scan_until( /\]|\[/ )
1492
+
1493
+ if linktext
1494
+ @log.debug " Found a bracket at depth %d: %p" % [ depth, linktext ]
1495
+ link += linktext
1496
+
1497
+ # Decrement depth for each closing bracket
1498
+ depth += ( linktext[-1, 1] == ']' ? -1 : 1 )
1499
+ @log.debug " Depth is now #{depth}"
1500
+
1501
+ # If there's no more brackets, it must not be an anchor, so
1502
+ # just abort.
1503
+ else
1504
+ @log.debug " Missing closing brace, assuming non-link."
1505
+ link += @scanner.rest
1506
+ @scanner.terminate
1507
+ return text + '[' + link
1508
+ end
1509
+ end
1510
+ link.slice!( -1 ) # Trim final ']'
1511
+ @log.debug " Found leading link %p" % link
1512
+
1513
+
1514
+
1515
+ # Markdown Extra: Footnote
1516
+ if link =~ /^\^(.+)/ then
1517
+ id = $1
1518
+ if rs.footnotes[id] then
1519
+ rs.found_footnote_ids << id
1520
+ label = "[#{rs.found_footnote_ids.size}]"
1521
+ else
1522
+ rs.warnings << "undefined footnote id - #{id}"
1523
+ label = '[?]'
1524
+ end
1525
+
1526
+ text += %Q|<sup id="footnote-ref:#{id}"><a href="#footnote:#{id}" rel="footnote">#{label}</a></sup>|
1527
+
1528
+ # Look for a reference-style second part
1529
+ elsif @scanner.scan( RefLinkIdRegexp )
1530
+ linkid = @scanner[1]
1531
+ linkid = link.dup if linkid.empty?
1532
+ linkid.downcase!
1533
+ @log.debug " Found a linkid: %p" % linkid
1534
+
1535
+ # If there's a matching link in the link table, build an
1536
+ # anchor tag for it.
1537
+ if rs.urls.key?( linkid )
1538
+ @log.debug " Found link key in the link table: %p" % rs.urls[linkid]
1539
+ url = escape_md( rs.urls[linkid] )
1540
+
1541
+ text += %{<a href="#{url}"}
1542
+ if rs.titles.key?(linkid)
1543
+ text += %{ title="%s"} % escape_md( rs.titles[linkid] )
1544
+ end
1545
+ text += %{>#{link}</a>}
1546
+
1547
+ # If the link referred to doesn't exist, just append the raw
1548
+ # source to the result
1549
+ else
1550
+ @log.debug " Linkid %p not found in link table" % linkid
1551
+ @log.debug " Appending original string instead: "
1552
+ @log.debug "%p" % @scanner.string[ startpos-1 .. @scanner.pos-1 ]
1553
+
1554
+ rs.warnings << "link-id not found - #{linkid}"
1555
+ text += @scanner.string[ startpos-1 .. @scanner.pos-1 ]
1556
+ end
1557
+
1558
+ # ...or for an inline style second part
1559
+ elsif @scanner.scan( InlineLinkRegexp )
1560
+ url = @scanner[1]
1561
+ title = @scanner[3]
1562
+ @log.debug " Found an inline link to %p" % url
1563
+
1564
+ text += %{<a href="%s"} % escape_md( url )
1565
+ if title
1566
+ title.gsub!( /"/, "&quot;" )
1567
+ text += %{ title="%s"} % escape_md( title )
1568
+ end
1569
+ text += %{>#{link}</a>}
1570
+
1571
+ # No linkid part: just append the first part as-is.
1572
+ else
1573
+ @log.debug "No linkid, so no anchor. Appending literal text."
1574
+ text += @scanner.string[ startpos-1 .. @scanner.pos-1 ]
1575
+ end # if linkid
1576
+
1577
+ # Plain text
1578
+ else
1579
+ @log.debug " Scanning to the next link from %p" % @scanner.rest
1580
+ text += @scanner.scan( /[^\[]+/ )
1581
+ end
1582
+
1583
+ end # until @scanner.empty?
1584
+
1585
+ return text
1586
+ end
1587
+
1588
+
1589
+ # Pattern to match strong emphasis in Markdown text
1590
+ BoldRegexp = %r{ (\*\*|__) (\S|\S.*?\S) \1 }x
1591
+
1592
+ # Pattern to match normal emphasis in Markdown text
1593
+ ItalicRegexp = %r{ (\*|_) (\S|\S.*?\S) \1 }x
1594
+
1595
+ ### Transform italic- and bold-encoded text in a copy of the specified +str+
1596
+ ### and return it.
1597
+ def transform_italic_and_bold( str, rs )
1598
+ @log.debug " Transforming italic and bold"
1599
+
1600
+ str.
1601
+ gsub( BoldRegexp, %{<strong>\\2</strong>} ).
1602
+ gsub( ItalicRegexp, %{<em>\\2</em>} )
1603
+ end
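A small sketch of the two emphasis patterns applied in order, bold first and then italic. The patterns are copied from above; emphasize is a hypothetical name.

```ruby
BOLD   = %r{ (\*\*|__) (\S|\S.*?\S) \1 }x
ITALIC = %r{ (\*|_) (\S|\S.*?\S) \1 }x

def emphasize(str)
  str.
    gsub(BOLD, %{<strong>\\2</strong>}).
    gsub(ITALIC, %{<em>\\2</em>})
end

mixed = emphasize("**bold** and *em*")
snake = emphasize("snake_case_name")   # intra-word underscores also match
```

Because the patterns do not check word boundaries, underscores inside an identifier are treated as emphasis markers, which is why such text is usually placed in a code span.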
1604
+
1605
+
1606
+ ### Transform backticked spans into <code> spans.
1607
+ def transform_code_spans( str, rs )
1608
+ @log.debug " Transforming code spans"
1609
+
1610
+ # Set up the string scanner and just return the string unless there's at
1611
+ # least one backtick.
1612
+ @scanner.string = str.dup
1613
+ unless @scanner.exist?( /`/ )
1614
+ @scanner.terminate
1615
+ @log.debug "No backticks found for code span in %p" % str
1616
+ return str
1617
+ end
1618
+
1619
+ @log.debug "Transforming code spans in %p" % str
1620
+
1621
+ # Build the transformed text anew
1622
+ text = ''
1623
+
1624
+ # Scan to the end of the string
1625
+ until @scanner.empty?
1626
+
1627
+ # Scan up to an opening backtick
1628
+ if pre = @scanner.scan_until( /.?(?=`)/m )
1629
+ text += pre
1630
+ @log.debug "Found backtick at %d after '...%s'" % [ @scanner.pos, text[-10, 10] ]
1631
+
1632
+ # Make a pattern to find the end of the span
1633
+ opener = @scanner.scan( /`+/ )
1634
+ len = opener.length
1635
+ closer = Regexp::new( opener )
1636
+ @log.debug "Scanning for end of code span with %p" % closer
1637
+
1638
+ # Scan until the end of the closing backtick sequence. Chop the
1639
+ # backticks off the resultant string, strip leading and trailing
1640
+ # whitespace, and encode any entities contained in it.
1641
+ codespan = @scanner.scan_until( closer ) or
1642
+ raise FormatError::new( @scanner.rest[0,20],
1643
+ "No %p found before end" % opener )
1644
+
1645
+ @log.debug "Found close of code span at %d: %p" % [ @scanner.pos - len, codespan ]
1646
+ codespan.slice!( -len, len )
1647
+ text += "<code>%s</code>" %
1648
+ encode_code( codespan.strip, rs )
1649
+
1650
+ # If there's no more backticks, just append the rest of the string
1651
+ # and move the scan pointer to the end
1652
+ else
1653
+ text += @scanner.rest
1654
+ @scanner.terminate
1655
+ end
1656
+ end
1657
+
1658
+ return text
1659
+ end
1660
+
1661
+
1662
+ # Next, handle inline images: ![alt text](url "optional title")
1663
+ # Don't forget: encode * and _
1664
+ InlineImageRegexp = %r{
1665
+ ( # Whole match = $1
1666
+ !\[ (.*?) \] # alt text = $2
1667
+ \([ ]*
1668
+ <?(\S+?)>? # source url = $3
1669
+ [ ]*
1670
+ (?: #
1671
+ (["']) # quote char = $4
1672
+ (.*?) # title = $5
1673
+ \4 # matching quote
1674
+ [ ]*
1675
+ )? # title is optional
1676
+ \)
1677
+ )
1678
+ }x #"
1679
+
1680
+
1681
+ # Reference-style images
1682
+ ReferenceImageRegexp = %r{
1683
+ ( # Whole match = $1
1684
+ !\[ (.*?) \] # Alt text = $2
1685
+ [ ]? # Optional space
1686
+ (?:\n[ ]*)? # One optional newline + spaces
1687
+ \[ (.*?) \] # id = $3
1688
+ )
1689
+ }x
1690
+
1691
+ ### Turn image markup into image tags.
1692
+ def transform_images( str, rs )
1693
+ @log.debug " Transforming images %p" % str
1694
+
1695
+ # Handle reference-style labeled images: ![alt text][id]
1696
+ str.
1697
+ gsub( ReferenceImageRegexp ) {|match|
1698
+ whole, alt, linkid = $1, $2, $3.downcase
1699
+ @log.debug "Matched %p" % match
1700
+ res = nil
1701
+ alt.gsub!( /"/, '&quot;' )
1702
+
1703
+ # for shortcut links like ![this][].
1704
+ linkid = alt.downcase if linkid.empty?
1705
+
1706
+ if rs.urls.key?( linkid )
1707
+ url = escape_md( rs.urls[linkid] )
1708
+ @log.debug "Found url '%s' for linkid '%s' " % [ url, linkid ]
1709
+
1710
+ # Build the tag
1711
+ result = %{<img src="%s" alt="%s"} % [ url, alt ]
1712
+ if rs.titles.key?( linkid )
1713
+ result += %{ title="%s"} % escape_md( rs.titles[linkid] )
1714
+ end
1715
+ result += EmptyElementSuffix
1716
+
1717
+ else
1718
+ result = whole
1719
+ end
1720
+
1721
+ @log.debug "Replacing %p with %p" % [ match, result ]
1722
+ result
1723
+ }.
1724
+
1725
+ # Inline image style
1726
+ gsub( InlineImageRegexp ) {|match|
1727
+ @log.debug "Found inline image %p" % match
1728
+ whole, alt, title = $1, $2, $5
1729
+ url = escape_md( $3 )
1730
+ alt.gsub!( /"/, '&quot;' )
1731
+
1732
+ # Build the tag
1733
+ result = %{<img src="%s" alt="%s"} % [ url, alt ]
1734
+ unless title.nil?
1735
+ title.gsub!( /"/, '&quot;' )
1736
+ result += %{ title="%s"} % escape_md( title )
1737
+ end
1738
+ result += EmptyElementSuffix
1739
+
1740
+ @log.debug "Replacing %p with %p" % [ match, result ]
1741
+ result
1742
+ }
1743
+ end
1744
+
1745
+
1746
+ # Regexp to match special characters in a code block
1747
+ CodeEscapeRegexp = %r{( \* | _ | \{ | \} | \[ | \] | \\ )}x
1748
+
1749
+ ### Escape any characters special to HTML and encode any characters special
1750
+ ### to Markdown in a copy of the given +str+ and return it.
1751
+ def encode_code( str, rs )
1752
+ str.gsub( %r{&}, '&amp;' ).
1753
+ gsub( %r{<}, '&lt;' ).
1754
+ gsub( %r{>}, '&gt;' ).
1755
+ gsub( CodeEscapeRegexp ) {|match| EscapeTable[match][:md5]}
1756
+ end
1757
+
1758
+
1759
+
1760
+ #################################################################
1761
+ ### U T I L I T Y F U N C T I O N S
1762
+ #################################################################
1763
+
1764
+ ### Escape any markdown characters in a copy of the given +str+ and return
1765
+ ### it.
1766
+ def escape_md( str )
1767
+ str.
1768
+ gsub( /\*/, EscapeTable['*'][:md5] ).
1769
+ gsub( /_/, EscapeTable['_'][:md5] )
1770
+ end
1771
+
1772
+
1773
+ # Matching constructs for tokenizing X/HTML
1774
+ HTMLCommentRegexp = %r{ <! ( -- .*? -- \s* )+ > }mx
1775
+ XMLProcInstRegexp = %r{ <\? .*? \?> }mx
1776
+ MetaTag = Regexp::union( HTMLCommentRegexp, XMLProcInstRegexp )
1777
+
1778
+ HTMLTagOpenRegexp = %r{ < [a-z/!$] [^<>]* }imx
1779
+ HTMLTagCloseRegexp = %r{ > }x
1780
+ HTMLTagPart = Regexp::union( HTMLTagOpenRegexp, HTMLTagCloseRegexp )
1781
+
1782
+ ### Break the HTML source in +str+ into a series of tokens and return
1783
+ ### them. The tokens are just 2-element Array tuples with a type and the
1784
+ ### actual content. If this function is called with a block, the type and
1785
+ ### text parts of each token will be yielded to it one at a time as they are
1786
+ ### extracted.
1787
+ def tokenize_html( str )
1788
+ depth = 0
1789
+ tokens = []
1790
+ @scanner.string = str.dup
1791
+ type, token = nil, nil
1792
+
1793
+ until @scanner.empty?
1794
+ @log.debug "Scanning from %p" % @scanner.rest
1795
+
1796
+ # Match comments and PIs without nesting
1797
+ if (( token = @scanner.scan(MetaTag) ))
1798
+ type = :tag
1799
+
1800
+ # Do nested matching for HTML tags
1801
+ elsif (( token = @scanner.scan(HTMLTagOpenRegexp) ))
1802
+ tagstart = @scanner.pos
1803
+ @log.debug " Found the start of a plain tag at %d" % tagstart
1804
+
1805
+ # Start the token with the opening angle
1806
+ depth = 1
1807
+ type = :tag
1808
+
1809
+ # Scan the rest of the tag, allowing unlimited nested <>s. If
1810
+ # the scanner runs out of text before the tag is closed, raise
1811
+ # an error.
1812
+ while depth.nonzero?
1813
+
1814
+ # Scan either an opener or a closer
1815
+ chunk = @scanner.scan( HTMLTagPart ) or
1816
+ raise "Malformed tag at character %d: %p" %
1817
+ [ tagstart, token + @scanner.rest ]
1818
+
1819
+ @log.debug " Found another part of the tag at depth %d: %p" % [ depth, chunk ]
1820
+
1821
+ token += chunk
1822
+
1823
+ # If the last character of the token so far is a closing
1824
+ # angle bracket, decrement the depth. Otherwise increment
1825
+ # it for a nested tag.
1826
+ depth += ( token[-1, 1] == '>' ? -1 : 1 )
1827
+ @log.debug " Depth is now #{depth}"
1828
+ end
1829
+
1830
+ # Match text segments
1831
+ else
1832
+ @log.debug " Looking for a chunk of text"
1833
+ type = :text
1834
+
1835
+ # Scan forward, always matching at least one character to move
1836
+ # the pointer beyond any non-tag '<'.
1837
+ token = @scanner.scan_until( /[^<]+/m )
1838
+ end
1839
+
1840
+ @log.debug " type: %p, token: %p" % [ type, token ]
1841
+
1842
+ # If a block is given, feed it one token at a time. Add the token to
1843
+ # the token list to be returned regardless.
1844
+ if block_given?
1845
+ yield( type, token )
1846
+ end
1847
+ tokens << [ type, token ]
1848
+ end
1849
+
1850
+ return tokens
1851
+ end
1852
+
1853
+
1854
+ ### Return a copy of +str+ with angle brackets and ampersands HTML-encoded.
1855
+ def encode_html( str )
1856
+ str.gsub( /&(?!#?[x]?(?:[0-9a-f]+|\w+);)/i, "&amp;" ).
1857
+ gsub( %r{<(?![a-z/?\$!])}i, "&lt;" )
1858
+ end
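The two lookaheads in encode_html leave existing entities and tag-like text alone. A stand-alone sketch, using the same two substitutions:

```ruby
def encode_html(str)
  str.gsub(/&(?!#?[x]?(?:[0-9a-f]+|\w+);)/i, "&amp;").   # bare ampersands only
      gsub(%r{<(?![a-z/?\$!])}i, "&lt;")                 # '<' that cannot start a tag
end

encoded = encode_html("AT&T &amp; 1 < 2 <b>")
```

The `&` in `AT&T` is escaped, the existing `&amp;` is kept, the `<` before a space is escaped, and `<b>` passes through untouched.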
1859
+
1860
+
1861
+ ### Remove one level of line-leading tabs or spaces from a copy of +str+ and
1862
+ ### return it.
1863
+ def outdent( str )
1864
+ str.gsub( /^(\t|[ ]{1,#{TabWidth}})/, '')
1865
+ end
1866
+
1867
+ def indent(str)
1868
+ str.gsub( /^/, ' ' * TabWidth)
1869
+ end
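The two helpers are near inverses of each other. A minimal sketch, assuming TAB_WIDTH = 4 (the real TabWidth constant is defined outside this part of the diff):

```ruby
TAB_WIDTH = 4   # assumption: stands in for TabWidth

# Remove one tab, or up to TAB_WIDTH spaces, from the start of each line.
def outdent(str)
  str.gsub(/^(\t|[ ]{1,#{TAB_WIDTH}})/, '')
end

# Prefix every line with TAB_WIDTH spaces.
def indent(str)
  str.gsub(/^/, ' ' * TAB_WIDTH)
end

outdented = outdent("    a\n      b\n\tc")
indented  = indent("a\nb")
```

outdent removes at most one level, so a line indented by six spaces keeps two of them; that is what lets nested list items and definition bodies be re-parsed one level at a time.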
1870
+
1871
+ end
1872
+ end