pseudohikiparser 0.0.0.6.develop → 0.0.0.7.develop

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: 0cd301fcbf579cecea32ac473e1bffd351cb9e21
-   data.tar.gz: 5871802bd7b9dc0e71df8f3a5c4805339caf1833
+   metadata.gz: ac5860d6ae01c25224992ed3bdaeb09080446bab
+   data.tar.gz: fce47183ada974d2416789045023fe0b31e9f4a0
  SHA512:
-   metadata.gz: 411eeb634503a5f79d9f563ddcfcfa45140e2889c78463a494275d044c3fb7bbbaa50591a9a27d61604bda7adaf8e5ff43cdacd1b52c7b10bda672729b98d394
-   data.tar.gz: 9d58afc0b126a74e1b2ba1e9fc32efff8262a1a248e98037cb0c18371144bc3420bfd5b72a58d4a219dbcb48ff61653dca0b4a4abfb194bbe974fccd0c9be303
+   metadata.gz: de7070b265af53ea7fbcfe1bb09bb5d0c0cc0256e41e7074b712ada33483d0f913fea346c0ae4f5aaf2d961fbfb778b652a385b4621a02934f90d272517f8332
+   data.tar.gz: 897b062e0a00e9f3064d1237d1731dfba42934888997b0bbe4b0b08467000ef67dbb01e228c2c995cba5f90c48956050f88a188277093e93a67daf3c80cf6c9c
data/README.md CHANGED
@@ -8,10 +8,10 @@ Currently, only a limited range of notations can be converted into HTML4 or XHTM
  I am writing this tool with following objectives in mind,

  * provide some additional features that do not exist in the original Hiki notation
- * make the notation more line oriented
- * allow to assign ids to elements such as headings
+ * make the notation more line oriented
+ * allow to assign ids to elements such as headings
  * support several formats other than HTML
- * The visitor pattern is adopted for the implementation, so you only have to add a visitor class to support a certain format.
+ * The visitor pattern is adopted for the implementation, so you only have to add a visitor class to support a certain format.

  And, it would not be compatible with the original Hiki notation.

@@ -30,14 +30,14 @@ gem install pseudohikiparser --pre

  ### Samples

- [A sample text](https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage.txt) in Hiki notation and [a result of conversion](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage.html), and [another result of conversion](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage_with_toc.html)
+ [A sample text](https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage.txt) in Hiki notation and [a result of conversion](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage.html), [another result of conversion](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage_with_toc.html) and [yet another result](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage_html5_with_toc.html).

  You will find those samples in [develop branch](https://github.com/nico-hn/PseudoHikiParser/tree/develop/samples).


  ### pseudohiki2html.rb

- After the installation of PseudoHikiParser, you could use a command, _pseudohiki2html.rb_.
+ After the installation of PseudoHikiParser, you could use a command: **pseudohiki2html.rb**.

  _Please note that pseudohiki2html.rb is currently provided as a showcase of PseudoHikiParser, and the options will be continuously changed at this stage of development._

@@ -90,7 +90,7 @@ For more options, please try `pseudohiki2html.rb --help`

  If you save the lines below as a ruby script and execute it:

- ```
+ ```ruby
  #!/usr/bin/env ruby

  require 'pseudohikiparser'
@@ -106,7 +106,7 @@ puts html
  ```
  you will get the following output:

- ```
+ ```html
  <div class="section h2">
  <h2> The first heading
  </h2>
@@ -119,17 +119,17 @@ The first paragraph

  Other than PseudoHiki::HtmlFormat, you can choose PseudoHiki::XhtmlFormat, PseudoHiki::Xhtml5Format, PseudoHiki::PlainTextFormat.

- ## Development status of features from the original [Hiki notation](http://hikiwiki.org/en/TextFormattingRules.html)
+ ## Development status of features from the original [Hiki notation](http://rabbit-shocker.org/en/hiki.html)

  * Paragraphs - Usable
  * Links
- * WikiNames - Not supported (and would never be)
- * Linking to other Wiki pages - Not supported as well
- * Linking to an arbitrary URL - Maybe usable
+ * WikiNames - Not supported (and would never be)
+ * Linking to other Wiki pages - Not supported as well
+ * Linking to an arbitrary URL - Maybe usable
  * Preformatted text - Usable
  * Text decoration - Partly supported
- * Currently, there is no means of escaping tags for inline decorations.
- * The notation with backquote tags(``) for inline literals is not supported.
+ * Currently, there is no means of escaping tags for inline decorations.
+ * The notation with backquote tags(``) for inline literals is not supported.
  * Headings - Usable
  * Horizontal lines - Usable
  * Lists - Usable
@@ -197,7 +197,62 @@ cell 3-1 || || cell 3-4 cell 3-5
  cell 4-1 cell 4-2 cell 4-3 cell 4-4 cell 4-5
  ```
  #### A visitor for HTML5
- The visitor, [Xhtml5Format](https://github.com/nico-hn/PseudoHikiParser/blob/develop/lib/pseudohiki/htmlformat.rb#L225) is currently available only in the [develop branch](https://github.com/nico-hn/PseudoHikiParser/tree/develop).
+ The visitor, [Xhtml5Format](https://github.com/nico-hn/PseudoHikiParser/blob/develop/lib/pseudohiki/htmlformat.rb#L222) is currently available only in the [develop branch](https://github.com/nico-hn/PseudoHikiParser/tree/develop).
+
+ #### A visitor for (GitHub Flavored) Markdown
+
+ The visitor [MarkDownFormat](https://github.com/nico-hn/PseudoHikiParser/blob/develop/lib/pseudohiki/markdownformat.rb) is also currently available only in the [develop branch](https://github.com/nico-hn/PseudoHikiParser/blob/develop/).
+
+ It is still at an experimental stage. For example, it cannot yet properly convert HTML elements that appear in Hiki notation text.
+
+ The following are a sample script and its output:
+
+ ```ruby
+ #!/usr/bin/env ruby
+
+ require 'pseudohiki/markdownformat'
+
+ md = PseudoHiki::MarkDownFormat.create
+ gfm = PseudoHiki::MarkDownFormat.create(gfm_style: true)
+
+ hiki = <<TEXT
+ !! The first heading
+
+ The first paragraph

+ ||!header 1||!header 2
+ ||''cell 1''||cell2
+
+ TEXT
+
+ tree = PseudoHiki::BlockParser.parse(hiki)
+ md_text = md.format(tree).to_s
+ gfm_text = gfm.format(tree).to_s
+ puts md_text
+ puts "===================="
+ puts gfm_text
+ ```
+
+ (You will get the following output.)
+
+ ```
+ ## The first heading
+
+ The first paragraph
+
+ <table>
+ <tr><th>header 1</th><th>header 2</th></tr>
+ <tr><td><em>cell 1</em></td><td>cell2</td></tr>
+ </table>
+
+ ====================
+ ## The first heading
+
+ The first paragraph
+
+ |header 1|header 2|
+ |--------|--------|
+ |_cell 1_|cell2 |
+ ```

  ### Not Implemented Yet
data/lib/htmlelement.rb CHANGED
@@ -5,6 +5,16 @@ require 'kconv'
  class HtmlElement
    class Children < Array
      alias to_s join
+
+     def traverse(&block)
+       each do |child|
+         if child.kind_of? HtmlElement or child.kind_of? Children
+           child.traverse(&block)
+         else
+           yield child
+         end
+       end
+     end
    end

    module CHARSET
@@ -64,11 +74,11 @@ class HtmlElement
    end

    def HtmlElement.urlencode(str)
-     str.toutf8.gsub(/[^\w\.\-]/n) {|ch| format('%%%02X', ch[0]) }
+     str.toutf8.gsub(/[^\w\.\-]/o) {|utf8_char| utf8_char.unpack("C*").map {|b| '%%%02X'%[b] }.join }
    end

    def HtmlElement.urldecode(str)
-     utf = str.gsub(/%\w\w/) {|ch| [ch[-2,2]].pack('H*') }
+     utf = str.gsub(/%\w\w/) {|ch| [ch[-2,2]].pack('H*') }.toutf8
      return utf.tosjis if $KCODE =~ /^s/io
      return utf.toeuc if $KCODE =~ /^e/io
      utf
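Review note on the `urlencode` change: on Ruby 1.9 and later, `ch[0]` is a one-character string rather than a byte value, so the new version expands each matched character into its bytes with `unpack("C*")` and percent-encodes every byte. The following is a minimal stand-alone sketch of that idea, not the gem's code: `percent_encode` is a hypothetical name, and `String#encode` is assumed here as a stand-in for kconv's `toutf8`.

```ruby
# Hypothetical helper illustrating byte-wise percent-encoding.
# Assumption: String#encode("UTF-8") replaces kconv's toutf8 for this sketch.
def percent_encode(str)
  str.encode("UTF-8").gsub(/[^\w.\-]/) do |char|
    # A multibyte character yields several bytes; each becomes one %XX group.
    char.unpack("C*").map {|byte| format("%%%02X", byte) }.join
  end
end

puts percent_encode("a b")       # a%20b
puts percent_encode("caf\u00e9") # caf%C3%A9
```

Since `\w` matches only ASCII word characters in Ruby regular expressions by default, every non-ASCII character falls into the negated class and is encoded byte by byte.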
@@ -84,7 +94,7 @@ class HtmlElement
    end

    def HtmlElement.escape(str)
-     str.gsub(/[&"<>]/on) {|pat| ESC[pat] }
+     str.gsub(/[&"<>]/o) {|pat| ESC[pat] }
    end

    def HtmlElement.decode(str)
@@ -144,6 +154,11 @@ class HtmlElement
      self.class::TagFormats[@tagname]%[@tagname, format_attributes, @children, @tagname]
    end
    alias to_str to_s
+
+   def traverse(&block)
+     yield self
+     @children.traverse(&block)
+   end
  end

  class XhtmlElement < HtmlElement
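Review note on the two new `traverse` methods: together they implement a pre-order walk, where an element yields itself first and then delegates to its children, and the children collection recurses into nested elements while yielding plain values directly. The sketch below reproduces the pattern with two hypothetical classes, `SketchElement` and `SketchChildren`, which stand in for `HtmlElement` and `HtmlElement::Children`; it uses `respond_to?(:traverse)` instead of the `kind_of?` checks in the diff, which is an assumption made to keep the example self-contained.

```ruby
# Hypothetical stand-ins for HtmlElement::Children and HtmlElement.
class SketchChildren < Array
  def traverse(&block)
    each do |child|
      if child.respond_to?(:traverse)
        child.traverse(&block) # recurse into nested elements
      else
        yield child            # plain values (e.g. strings) are leaves
      end
    end
  end
end

class SketchElement
  attr_reader :tagname

  def initialize(tagname)
    @tagname = tagname
    @children = SketchChildren.new
  end

  def push(child)
    @children.push(child)
    self
  end

  def traverse(&block)
    yield self                 # visit the element itself first (pre-order)
    @children.traverse(&block)
  end
end

div = SketchElement.new("div").push(SketchElement.new("h2").push("Title")).push("text")
visited = []
div.traverse {|node| visited << (node.is_a?(SketchElement) ? node.tagname : node) }
p visited # => ["div", "h2", "Title", "text"]
```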
@@ -24,10 +24,9 @@ module PseudoHiki
    # return unless tree[0].kind_of? Array ** block_leaf:[inline_node:[token or inline_node]]
    head = leaf[0]
    return unless head.kind_of? String
-   m = ID_TAG_PAT.match(head)
-   if m
+   if m = ID_TAG_PAT.match(head)
      node.node_id = m[1]
-     leaf[0] = head.sub(ID_TAG_PAT,"")
+     leaf[0] = head.sub(ID_TAG_PAT, "")
    end
    node
  end
@@ -47,8 +46,7 @@ module PseudoHiki

  class BlockLeaf < BlockStack::Leaf
    @@head_re = {}
-   attr_accessor :nominal_level
-   attr_accessor :node_id
+   attr_accessor :nominal_level, :node_id

    def self.head_re=(head_regex)
      @@head_re[self] = head_regex
@@ -63,9 +61,8 @@ module PseudoHiki
  end

  def self.create(line, inline_parser=InlineParser)
-   line.sub!(self.head_re,"") if self.head_re
-   leaf = self.new
-   leaf.concat(inline_parser.parse(line))
+   line.sub!(self.head_re, "") if self.head_re
+   new.concat(inline_parser.parse(line)) #leaf = self.new
  end

  def self.assign_head_re(head, need_to_escape=true, reg_pat="(%s)")
@@ -105,7 +102,7 @@ module PseudoHiki
  include TreeStack::Mergeable

  def self.create(line)
-   line.sub!(self.head_re,"") if self.head_re
+   line.sub!(self.head_re, "") if self.head_re
    self.new.tap {|leaf| leaf.push line }
  end

@@ -142,7 +139,6 @@ module PseudoHiki
  class ListTypeLeaf < NestedBlockLeaf; end

  class BlockNode < BlockStack::Node
-   attr_accessor :base_level, :relative_level_from_base
    attr_accessor :node_id

    def nominal_level
@@ -160,6 +156,24 @@ module PseudoHiki
    end

    def parse_leafs; end
+
+   def in_link_tag?(preceding_str)
+     preceding_str[-2, 2] == "[[" or preceding_str[-1, 1] == "|"
+   end
+
+   def tagfy_link(line)
+     line.gsub(URI_RE) {|url| in_link_tag?($`) ? url : "[[#{url}]]" }
+   end
+
+   def add_leaf(line, blockparser)
+     if LINE_PAT::VERBATIM_BEGIN =~ line
+       return blockparser.stack.push BlockElement::VerbatimNode.new.tap {|node| node.in_block_tag = true }
+     end
+     line = tagfy_link(line) unless BlockElement::VerbatimLeaf.head_re =~ line
+     leaf = blockparser.select_leaf_type(line).create(line)
+     blockparser.stack.pop while blockparser.breakable?(leaf)
+     blockparser.stack.push leaf
+   end
  end

  class NonNestedBlockNode < BlockNode
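Review note on `tagfy_link`, which this hunk moves from the parser into `BlockNode`: it wraps bare URLs in `[[...]]` unless the text immediately before the match shows that the URL already sits inside a link tag. The sketch below illustrates the same technique as top-level methods. `URL_PATTERN` is a simplified, hypothetical stand-in for the gem's `URI_RE`, whose actual definition is not shown in this diff.

```ruby
# Hypothetical, simplified URL pattern; the gem's URI_RE is not shown in the diff.
URL_PATTERN = %r{https?://[^\s\]|]+}

# True when the text before the URL ends with "[[" or "|",
# i.e. the URL is already the target of a link tag.
def in_link_tag?(preceding_str)
  preceding_str.end_with?("[[") || preceding_str.end_with?("|")
end

def tagfy_link(line)
  # $` holds the part of the line that precedes the current match.
  line.gsub(URL_PATTERN) {|url| in_link_tag?($`) ? url : "[[#{url}]]" }
end

puts tagfy_link("see http://example.com/ now")   # see [[http://example.com/]] now
puts tagfy_link("[[label|http://example.com/]]") # unchanged
```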
@@ -203,13 +217,23 @@ module PseudoHiki
    def push_self(stack); end
  end

+ class BlockElement::VerbatimNode
+   attr_writer :in_block_tag
+
+   def add_leaf(line, blockparser)
+     return @stack.pop if LINE_PAT::VERBATIM_END =~ line
+     return super(line, blockparser) unless @in_block_tag
+     line = " ".concat(line) if BlockElement::BlockNodeEnd.head_re =~ line and not @in_block_tag
+     @stack.push BlockElement::VerbatimLeaf.create(line, @in_block_tag)
+   end
+ end
+
  class BlockElement::QuoteNode
    def parse_leafs
      self[0] = BlockParser.parse(self[0])
    end
  end

- # class HeadingNode
  class BlockElement::HeadingNode
    def breakable?(breaker)
      kind_of?(breaker.block) and nominal_level >= breaker.nominal_level
@@ -217,8 +241,8 @@ module PseudoHiki
  end

  class BlockElement::VerbatimLeaf
-   def self.create(line)
-     line.sub!(self.head_re,"") if self.head_re
+   def self.create(line, in_block_tag=nil)
+     line.sub!(self.head_re, "") if self.head_re and not in_block_tag
      self.new.tap {|leaf| leaf.push line }
    end
  end
@@ -297,46 +321,16 @@ module PseudoHiki
      @stack.current_node.breakable?(breaker)
    end

-   def in_link_tag?(preceding_str)
-     preceding_str[-2,2] == "[[" or preceding_str[-1,1] == "|"
-   end
-
-   def tagfy_link(line)
-     line.gsub(URI_RE) {|url| in_link_tag?($`) ? url : "[[#{url}]]" }
-   end
-
    def select_leaf_type(line)
      [BlockNodeEnd, HrLeaf].each {|leaf| return leaf if leaf.head_re =~ line }
      matched = HEAD_RE.match(line)
-     return HeadToLeaf[matched[0]]||HeadToLeaf[line[0,1]] || HeadToLeaf['\s'] if matched
+     return HeadToLeaf[matched[0]]||HeadToLeaf[line[0, 1]] || HeadToLeaf['\s'] if matched
      ParagraphLeaf
    end

-   def add_verbatim_block(lines)
-     until lines.empty? or LINE_PAT::VERBATIM_END =~ lines.first
-       lines[0] = " " + lines[0] if BlockNodeEnd.head_re =~ lines.first
-       @stack.push(VerbatimLeaf.create(lines.shift))
-     end
-     lines.shift if LINE_PAT::VERBATIM_END =~ lines.first
-   end
-
-   def add_leaf(line)
-     leaf = select_leaf_type(line).create(line)
-     while breakable?(leaf)
-       @stack.pop
-     end
-     @stack.push leaf
-   end
-
    def read_lines(lines)
-     while line = lines.shift
-       if LINE_PAT::VERBATIM_BEGIN =~ line
-         add_verbatim_block(lines)
-       else
-         line = self.tagfy_link(line) unless VerbatimLeaf.head_re =~ line
-         add_leaf(line)
-       end
-     end
+     each_line = lines.respond_to?(:each_line) ? :each_line : :each
+     lines.send(each_line) {|line| @stack.current_node.add_leaf(line, self) }
      @stack.pop
    end
  end
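Review note on the rewritten `read_lines`: instead of destructively calling `shift` on an array, it now picks the iteration method at run time, so the input may be a string (or any object that responds to `each_line`) as well as an array of lines. A minimal sketch of that dispatch follows; `collect_lines` is a hypothetical helper that gathers the lines rather than feeding them to a parser stack.

```ruby
# Hypothetical helper showing the respond_to?-based dispatch used by read_lines.
def collect_lines(lines)
  iterator = lines.respond_to?(:each_line) ? :each_line : :each
  collected = []
  lines.send(iterator) {|line| collected << line }
  collected
end

p collect_lines("a\nb\n")       # => ["a\n", "b\n"]
p collect_lines(["c\n", "d\n"]) # => ["c\n", "d\n"]
```

One consequence of the new form is that the caller's array is no longer emptied as a side effect of parsing.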
@@ -168,6 +168,7 @@ module PseudoHiki
      super(tree).tap do |element|
        element["rowspan"] = tree.rowspan if tree.rowspan > 1
        element["colspan"] = tree.colspan if tree.colspan > 1
+       # element.push "&#160;" if element.empty? # &#160; = &nbsp; this line would be necessary for HTML 4 or XHTML 1.0
      end
    end
  end
@@ -62,8 +62,7 @@ module PseudoHiki
  def convert_last_node_into_leaf
    last_node = remove_current_node
    tag_head = NodeTypeToHead[last_node.class]
-   tag_head_leaf = InlineLeaf.create(tag_head)
-   self.push tag_head_leaf
+   self.push InlineLeaf.create(tag_head)
    last_node.each {|leaf| self.push_as_leaf leaf }
  end

@@ -73,23 +72,20 @@ module PseudoHiki

  def treated_as_node_end(token)
    return self.pop if current_node.class == TAIL[token]
-   if node_in_ancestors?(TAIL[token])
-     convert_last_node_into_leaf until current_node.class == TAIL[token]
-     return self.pop
-   end
-   nil
+   return nil unless node_in_ancestors?(TAIL[token])
+   convert_last_node_into_leaf until current_node.class == TAIL[token]
+   self.pop
  end

  def split_into_tokens(str)
-   result = []
+   tokens = []
    while m = token_pat.match(str)
-     result.push m.pre_match if m.pre_match
-     result.push m[0]
+     tokens.push m.pre_match unless m.pre_match.empty?
+     tokens.push m[0]
      str = m.post_match
    end
-   result.push str unless str.empty?
-   result.delete_if {|token| token.empty? }
-   result
+   tokens.push str unless str.empty?
+   tokens
  end

  def parse
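Review note on `split_into_tokens`: the loop repeatedly matches the token pattern, emits the text before the match and the match itself, and continues with the remainder. The new version skips empty pre-match strings up front, which makes the final `delete_if` pass unnecessary. The sketch below runs the same loop as a top-level method; `SKETCH_TOKEN_PAT` is a hypothetical pattern chosen for illustration, since the gem builds its real `token_pat` elsewhere.

```ruby
# Hypothetical token pattern; the gem derives its own pattern from its tag tables.
SKETCH_TOKEN_PAT = /''|==|\[\[|\]\]/

def split_into_tokens(str, token_pat = SKETCH_TOKEN_PAT)
  tokens = []
  while (m = token_pat.match(str))
    tokens.push m.pre_match unless m.pre_match.empty? # plain text before the tag
    tokens.push m[0]                                  # the tag itself
    str = m.post_match                                # keep scanning the rest
  end
  tokens.push str unless str.empty?
  tokens
end

p split_into_tokens("a ''b'' c") # => ["a ", "''", "b", "''", " c"]
```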
@@ -102,15 +98,22 @@ module PseudoHiki
    end

    def self.parse(str)
-     parser = new(str)
-     parser.parse.tree
+     new(str).parse.tree #parser = new(str)
    end
  end

  class TableRowParser < InlineParser
+   TD, TH, ROW_EXPANDER, COL_EXPANDER, TH_PAT = %w(td th ^ > !)
+   MODIFIED_CELL_PAT = /^!?[>^]*/o
+
    module InlineElement
      class TableCellNode < InlineParser::InlineElement::InlineNode
        attr_accessor :cell_type, :rowspan, :colspan
+
+       def initialize
+         super
+         @cell_type, @rowspan, @colspan = TD, 1, 1
+       end
      end
    end
    include InlineElement
@@ -118,27 +121,22 @@ module PseudoHiki
  TAIL[TableSep] = TableCellNode
  TokenPat[self] = InlineParser::TokenPat[InlineParser]

- TD, TH, ROW_EXPANDER, COL_EXPANDER, TH_PAT = %w(td th ^ > !)
- MODIFIED_CELL_PAT = /^!?[>^]*/o
-
  class InlineElement::TableCellNode
-   def parse_first_token(token)
-     @cell_type, @rowspan, @colspan, parsed_token = TD, 1, 1, token.dup
-     return token if token.kind_of? InlineParser::InlineNode
-     token_str = parsed_token[0]
-     m = MODIFIED_CELL_PAT.match(token_str) #if token.kind_of? String
-
-     if m
-       cell_modifiers = m[0].split(//o)
-       if cell_modifiers.first == TH_PAT
-         cell_modifiers.shift
-         @cell_type = TH
-       end
-       parsed_token[0] = token_str.sub(MODIFIED_CELL_PAT,"")
-       @rowspan = cell_modifiers.count(ROW_EXPANDER) + 1
-       @colspan = cell_modifiers.count(COL_EXPANDER) + 1
+   def parse_cellspan(token_str)
+     return token_str if m = MODIFIED_CELL_PAT.match(token_str) and m[0].empty? #if token.kind_of? String
+     cell_modifiers = m[0]
+     if cell_modifiers[0].chr == TH_PAT
+       cell_modifiers[0] = ""
+       @cell_type = TH
      end
-     parsed_token
+     @rowspan = cell_modifiers.count(ROW_EXPANDER) + 1
+     @colspan = cell_modifiers.count(COL_EXPANDER) + 1
+     token_str.sub(MODIFIED_CELL_PAT, "")
+   end
+
+   def parse_first_token(orig_tokens)
+     return orig_tokens if orig_tokens.kind_of? InlineParser::InlineNode
+     orig_tokens.dup.tap {|tokens| tokens[0] = parse_cellspan(tokens[0]) }
    end

    def push(token)
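Review note on `parse_cellspan`: the prefix of a table cell encodes its type and span, with `!` marking a header cell and each `^` or `>` adding one to the rowspan or colspan. The sketch below shows that decoding as a pure function returning a hash. `parse_cell_modifiers` and `SKETCH_CELL_PAT` are hypothetical names, and the hash result is an assumption made for illustration; the gem stores these values in instance variables of `TableCellNode` instead.

```ruby
# Same pattern as MODIFIED_CELL_PAT in the diff, under a hypothetical name.
SKETCH_CELL_PAT = /^!?[>^]*/

# Hypothetical pure-function variant of parse_cellspan.
def parse_cell_modifiers(token_str)
  modifiers = SKETCH_CELL_PAT.match(token_str)[0] # always matches, possibly ""
  {
    cell_type: modifiers.start_with?("!") ? "th" : "td",
    rowspan: modifiers.chars.count("^") + 1,       # each "^" extends the cell one row
    colspan: modifiers.chars.count(">") + 1,       # each ">" extends the cell one column
    text: token_str.sub(SKETCH_CELL_PAT, "")
  }
end

p parse_cell_modifiers("!^>header")
p parse_cell_modifiers("cell")
```

Counting over `chars` rather than calling `String#count` directly is a deliberate choice in this sketch, since it sidesteps the special meaning a leading `^` has in `String#count` character sets.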