pseudohikiparser 0.0.0.6.develop → 0.0.0.7.develop

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 0cd301fcbf579cecea32ac473e1bffd351cb9e21
- data.tar.gz: 5871802bd7b9dc0e71df8f3a5c4805339caf1833
+ metadata.gz: ac5860d6ae01c25224992ed3bdaeb09080446bab
+ data.tar.gz: fce47183ada974d2416789045023fe0b31e9f4a0
  SHA512:
- metadata.gz: 411eeb634503a5f79d9f563ddcfcfa45140e2889c78463a494275d044c3fb7bbbaa50591a9a27d61604bda7adaf8e5ff43cdacd1b52c7b10bda672729b98d394
- data.tar.gz: 9d58afc0b126a74e1b2ba1e9fc32efff8262a1a248e98037cb0c18371144bc3420bfd5b72a58d4a219dbcb48ff61653dca0b4a4abfb194bbe974fccd0c9be303
+ metadata.gz: de7070b265af53ea7fbcfe1bb09bb5d0c0cc0256e41e7074b712ada33483d0f913fea346c0ae4f5aaf2d961fbfb778b652a385b4621a02934f90d272517f8332
+ data.tar.gz: 897b062e0a00e9f3064d1237d1731dfba42934888997b0bbe4b0b08467000ef67dbb01e228c2c995cba5f90c48956050f88a188277093e93a67daf3c80cf6c9c
data/README.md CHANGED
@@ -8,10 +8,10 @@ Currently, only a limited range of notations can be converted into HTML4 or XHTM
  I am writing this tool with following objectives in mind,
 
  * provide some additional features that do not exist in the original Hiki notation
- * make the notation more line oriented
- * allow to assign ids to elements such as headings
+ * make the notation more line oriented
+ * allow to assign ids to elements such as headings
  * support several formats other than HTML
- * The visitor pattern is adopted for the implementation, so you only have to add a visitor class to support a certain format.
+ * The visitor pattern is adopted for the implementation, so you only have to add a visitor class to support a certain format.
 
  And, it would not be compatible with the original Hiki notation.
 
@@ -30,14 +30,14 @@ gem install pseudohikiparser --pre
 
  ### Samples
 
- [A sample text](https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage.txt) in Hiki notation and [a result of conversion](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage.html), and [another result of conversion](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage_with_toc.html)
+ [A sample text](https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage.txt) in Hiki notation and [a result of conversion](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage.html), [another result of conversion](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage_with_toc.html) and [yet another result](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage_html5_with_toc.html).
 
  You will find those samples in [develop branch](https://github.com/nico-hn/PseudoHikiParser/tree/develop/samples).
 
 
  ### pseudohiki2html.rb
 
- After the installation of PseudoHikiParser, you could use a command, _pseudohiki2html.rb_.
+ After the installation of PseudoHikiParser, you could use a command: **pseudohiki2html.rb**.
 
  _Please note that pseudohiki2html.rb is currently provided as a showcase of PseudoHikiParser, and the options will be continuously changed at this stage of development._
 
@@ -90,7 +90,7 @@ For more options, please try `pseudohiki2html.rb --help`
 
  If you save the lines below as a ruby script and execute it:
 
- ```
+ ```ruby
  #!/usr/bin/env ruby
 
  require 'pseudohikiparser'
@@ -106,7 +106,7 @@ puts html
  ```
  you will get the following output:
 
- ```
+ ```html
  <div class="section h2">
  <h2> The first heading
  </h2>
@@ -119,17 +119,17 @@ The first paragraph
 
  Other than PseudoHiki::HtmlFormat, you can choose PseudoHiki::XhtmlFormat, PseudoHiki::Xhtml5Format, PseudoHiki::PlainTextFormat.
 
- ## Development status of features from the original [Hiki notation](http://hikiwiki.org/en/TextFormattingRules.html)
+ ## Development status of features from the original [Hiki notation](http://rabbit-shocker.org/en/hiki.html)
 
  * Paragraphs - Usable
  * Links
- * WikiNames - Not supported (and would never be)
- * Linking to other Wiki pages - Not supported as well
- * Linking to an arbitrary URL - Maybe usable
+ * WikiNames - Not supported (and would never be)
+ * Linking to other Wiki pages - Not supported as well
+ * Linking to an arbitrary URL - Maybe usable
  * Preformatted text - Usable
  * Text decoration - Partly supported
- * Currently, there is no means of escaping tags for inline decorations.
- * The notation with backquote tags(``) for inline literals is not supported.
+ * Currently, there is no means of escaping tags for inline decorations.
+ * The notation with backquote tags(``) for inline literals is not supported.
  * Headings - Usable
  * Horizontal lines - Usable
  * Lists - Usable
@@ -197,7 +197,62 @@ cell 3-1 || || cell 3-4 cell 3-5
  cell 4-1 cell 4-2 cell 4-3 cell 4-4 cell 4-5
  ```
  #### A visitor for HTML5
- The visitor, [Xhtml5Format](https://github.com/nico-hn/PseudoHikiParser/blob/develop/lib/pseudohiki/htmlformat.rb#L225) is currently available only in the [develop branch](https://github.com/nico-hn/PseudoHikiParser/tree/develop).
+ The visitor, [Xhtml5Format](https://github.com/nico-hn/PseudoHikiParser/blob/develop/lib/pseudohiki/htmlformat.rb#L222) is currently available only in the [develop branch](https://github.com/nico-hn/PseudoHikiParser/tree/develop).
+
+ #### A vistor for (Git Flavored) Markdown
+
+ The visitor, [MarkDownFormat](https://github.com/nico-hn/PseudoHikiParser/blob/develop/lib/pseudohiki/markdownformat.rb) too, is currently available only in the [develop branch](https://github.com/nico-hn/PseudoHikiParser/blob/develop/).
+
+ It's just in experimental stage. For example, it cannot properly convert html elements appeared in hiki notation text yet.
+
+ The following are a sample script and its output:
+
+ ```ruby
+ #!/usr/bin/env ruby
+
+ require 'pseudohiki/markdownformat'
+
+ md = PseudoHiki::MarkDownFormat.create
+ gfm = PseudoHiki::MarkDownFormat.create(gfm_style: true)
+
+ hiki = <<TEXT
+ !! The first heading
+
+ The first paragraph
 
+ ||!header 1||!header 2
+ ||''cell 1''||cell2
+
+ TEXT
+
+ tree = PseudoHiki::BlockParser.parse(hiki)
+ md_text = md.format(tree).to_s
+ gfm_text = gfm.format(tree).to_s
+ puts md_text
+ puts "===================="
+ puts gfm_text
+ ```
+
+ (You will get the following output.)
+
+ ```
+ ## The first heading
+
+ The first paragraph
+
+ <table>
+ <tr><th>header 1</th><th>header 2</th></tr>
+ <tr><td><em>cell 1</em></td><td>cell2</td></tr>
+ </table>
+
+ ====================
+ ## The first heading
+
+ The first paragraph
+
+ |header 1|header 2|
+ |--------|--------|
+ |_cell 1_|cell2 |
+ ```
 
  ### Not Implemented Yet
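In the GFM-style output of the README sample, the table becomes a pipe table with a dash separator row. A minimal sketch of that layout follows, assuming cells are left-aligned and padded to the widest cell of each column; `gfm_table` is a hypothetical helper and does not reproduce MarkDownFormat's actual implementation.

```ruby
# Hypothetical helper for illustration; not part of PseudoHiki::MarkDownFormat.
def gfm_table(header, rows)
  all_rows = [header] + rows
  # Each column is as wide as its longest cell; keep at least three dashes
  # per column, a common convention for the separator row.
  widths = header.each_index.map {|i| [all_rows.map {|row| row[i].to_s.length }.max, 3].max }
  format_row = ->(cells) { "|" + cells.each_with_index.map {|cell, i| cell.to_s.ljust(widths[i]) }.join("|") + "|" }
  separator = "|" + widths.map {|w| "-" * w }.join("|") + "|"
  [format_row.(header), separator, *rows.map(&format_row)].join("\n")
end

puts gfm_table(["header 1", "header 2"], [["_cell 1_", "cell2"]])
```

Under the padding assumption, the last row comes out as `|_cell 1_|cell2   |`, with `cell2` padded to the width of `header 2`.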
data/lib/htmlelement.rb CHANGED
@@ -5,6 +5,16 @@ require 'kconv'
  class HtmlElement
  class Children < Array
  alias to_s join
+
+ def traverse(&block)
+ each do |child|
+ if child.kind_of? HtmlElement or child.kind_of? Children
+ child.traverse(&block)
+ else
+ yield child
+ end
+ end
+ end
  end
 
  module CHARSET
@@ -64,11 +74,11 @@ class HtmlElement
  end
 
  def HtmlElement.urlencode(str)
- str.toutf8.gsub(/[^\w\.\-]/n) {|ch| format('%%%02X', ch[0]) }
+ str.toutf8.gsub(/[^\w\.\-]/o) {|utf8_char| utf8_char.unpack("C*").map {|b| '%%%02X'%[b] }.join }
  end
 
  def HtmlElement.urldecode(str)
- utf = str.gsub(/%\w\w/) {|ch| [ch[-2,2]].pack('H*') }
+ utf = str.gsub(/%\w\w/) {|ch| [ch[-2,2]].pack('H*') }.toutf8
  return utf.tosjis if $KCODE =~ /^s/io
  return utf.toeuc if $KCODE =~ /^e/io
  utf
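In Ruby 1.9 and later, `ch[0]` is a one-character String rather than a byte value, so the old `format('%%%02X', ch[0])` no longer yields a byte code. The new `urlencode` instead unpacks each matched character into its UTF-8 bytes and percent-encodes every byte. The sketch below shows both directions as plain functions; it assumes `String#encode` and `String#b` in place of kconv's `toutf8`, so it runs without kconv, and it is not the gem's own code.

```ruby
# Sketch of the byte-wise percent-encoding used by the new HtmlElement.urlencode.
# Assumption: String#encode stands in for kconv's toutf8.
def urlencode(str)
  # \w is ASCII-only in Ruby regexps, so every non-ASCII character is encoded.
  str.encode("UTF-8").gsub(/[^\w.\-]/) do |utf8_char|
    # A multibyte character is split into its UTF-8 bytes, each emitted as %XX.
    utf8_char.unpack("C*").map {|b| format("%%%02X", b) }.join
  end
end

# Inverse: decode %XX sequences on a binary copy, then re-tag the bytes as UTF-8.
def urldecode(str)
  str.b.gsub(/%(\h\h)/) { [$1].pack("H2") }.force_encoding(Encoding::UTF_8)
end

puts urlencode("日本 wiki")            # => %E6%97%A5%E6%9C%AC%20wiki
puts urldecode(urlencode("日本 wiki")) # => 日本 wiki
```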
@@ -84,7 +94,7 @@ class HtmlElement
  end
 
  def HtmlElement.escape(str)
- str.gsub(/[&"<>]/on) {|pat| ESC[pat] }
+ str.gsub(/[&"<>]/o) {|pat| ESC[pat] }
  end
 
  def HtmlElement.decode(str)
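`HtmlElement.escape` keeps the same character substitution but drops the `n` (ASCII-8BIT) flag from its regexp. A standalone sketch of the substitution on UTF-8 input follows; the contents of `ESC` are an assumption, since the table is not part of this diff.

```ruby
# Assumed contents of the ESC table; the diff does not show it.
ESC = { "&" => "&amp;", '"' => "&quot;", "<" => "&lt;", ">" => "&gt;" }.freeze

# Same substitution as the new HtmlElement.escape (regexp without the n flag).
def escape(str)
  str.gsub(/[&"<>]/) {|pat| ESC[pat] }
end

puts escape(%q(<a title="日本">&</a>))
# => &lt;a title=&quot;日本&quot;&gt;&amp;&lt;/a&gt;
```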
@@ -144,6 +154,11 @@ class HtmlElement
  self.class::TagFormats[@tagname]%[@tagname, format_attributes, @children, @tagname]
  end
  alias to_str to_s
+
+ def traverse(&block)
+ yield self
+ @children.traverse(&block)
+ end
  end
 
  class XhtmlElement < HtmlElement
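The new `traverse` methods walk an element tree depth-first: `HtmlElement#traverse` yields the element itself and then delegates to its `Children`, which recurse into nested elements and yield every other child (typically a string) directly. The self-contained sketch below reproduces that logic on a stand-in class; `MiniElement` is hypothetical and mirrors only the two methods added here, not the rest of `HtmlElement`.

```ruby
# MiniElement is a hypothetical stand-in for HtmlElement, reduced to the
# parts that the added traverse methods rely on.
class MiniElement
  class Children < Array
    # Recurse into elements (and nested Children); yield any other child as is.
    def traverse(&block)
      each do |child|
        if child.kind_of?(MiniElement) || child.kind_of?(Children)
          child.traverse(&block)
        else
          yield child
        end
      end
    end
  end

  attr_reader :tagname, :children

  def initialize(tagname, *children)
    @tagname = tagname
    @children = Children.new.concat(children)
  end

  # Pre-order: the element itself first, then everything below it.
  def traverse(&block)
    yield self
    @children.traverse(&block)
  end
end

tree = MiniElement.new("div",
                       MiniElement.new("p", "Hello ", MiniElement.new("em", "world")),
                       "!")
visited = []
tree.traverse {|node| visited << (node.kind_of?(MiniElement) ? node.tagname : node) }
p visited # => ["div", "p", "Hello ", "em", "world", "!"]
```

Filtering the yielded values for strings gives the text content of the tree, e.g. `"Hello world!"` here.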
@@ -24,10 +24,9 @@ module PseudoHiki
  # return unless tree[0].kind_of? Array ** block_leaf:[inline_node:[token or inline_node]]
  head = leaf[0]
  return unless head.kind_of? String
- m = ID_TAG_PAT.match(head)
- if m
+ if m = ID_TAG_PAT.match(head)
  node.node_id = m[1]
- leaf[0] = head.sub(ID_TAG_PAT,"")
+ leaf[0] = head.sub(ID_TAG_PAT, "")
  end
  node
  end
@@ -47,8 +46,7 @@ module PseudoHiki
 
  class BlockLeaf < BlockStack::Leaf
  @@head_re = {}
- attr_accessor :nominal_level
- attr_accessor :node_id
+ attr_accessor :nominal_level, :node_id
 
  def self.head_re=(head_regex)
  @@head_re[self] = head_regex
@@ -63,9 +61,8 @@ module PseudoHiki
  end
 
  def self.create(line, inline_parser=InlineParser)
- line.sub!(self.head_re,"") if self.head_re
- leaf = self.new
- leaf.concat(inline_parser.parse(line))
+ line.sub!(self.head_re, "") if self.head_re
+ new.concat(inline_parser.parse(line)) #leaf = self.new
  end
 
  def self.assign_head_re(head, need_to_escape=true, reg_pat="(%s)")
@@ -105,7 +102,7 @@ module PseudoHiki
  include TreeStack::Mergeable
 
  def self.create(line)
- line.sub!(self.head_re,"") if self.head_re
+ line.sub!(self.head_re, "") if self.head_re
  self.new.tap {|leaf| leaf.push line }
  end
 
@@ -142,7 +139,6 @@ module PseudoHiki
  class ListTypeLeaf < NestedBlockLeaf; end
 
  class BlockNode < BlockStack::Node
- attr_accessor :base_level, :relative_level_from_base
  attr_accessor :node_id
 
  def nominal_level
@@ -160,6 +156,24 @@ module PseudoHiki
  end
 
  def parse_leafs; end
+
+ def in_link_tag?(preceding_str)
+ preceding_str[-2, 2] == "[[" or preceding_str[-1, 1] == "|"
+ end
+
+ def tagfy_link(line)
+ line.gsub(URI_RE) {|url| in_link_tag?($`) ? url : "[[#{url}]]" }
+ end
+
+ def add_leaf(line, blockparser)
+ if LINE_PAT::VERBATIM_BEGIN =~ line
+ return blockparser.stack.push BlockElement::VerbatimNode.new.tap {|node| node.in_block_tag = true }
+ end
+ line = tagfy_link(line) unless BlockElement::VerbatimLeaf.head_re =~ line
+ leaf = blockparser.select_leaf_type(line).create(line)
+ blockparser.stack.pop while blockparser.breakable?(leaf)
+ blockparser.stack.push leaf
+ end
  end
 
  class NonNestedBlockNode < BlockNode
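`in_link_tag?` and `tagfy_link` move from `BlockParser` to `BlockNode` here with their behavior unchanged: every URL found in a line is wrapped in `[[...]]` unless the text right before it ends with `[[` or `|`, that is, unless the URL is already the target of a link tag. The sketch below runs the same two methods standalone. `URL_PAT` is a simplified, hypothetical stand-in for the parser's `URI_RE`, whose definition is not part of this diff.

```ruby
# Hypothetical, simplified stand-in for the parser's URI_RE.
URL_PAT = %r{https?://[^\s\[\]|]+}

# True when the URL is already inside "[[url]]" or "[[label|url]]".
def in_link_tag?(preceding_str)
  preceding_str[-2, 2] == "[[" or preceding_str[-1, 1] == "|"
end

# $` is the text before the current match, set by gsub for each match.
def tagfy_link(line)
  line.gsub(URL_PAT) {|url| in_link_tag?($`) ? url : "[[#{url}]]" }
end

puts tagfy_link("see http://example.com now")   # => see [[http://example.com]] now
puts tagfy_link("[[site|http://example.com]]") # => [[site|http://example.com]]
```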
@@ -203,13 +217,23 @@ module PseudoHiki
  def push_self(stack); end
  end
 
+ class BlockElement::VerbatimNode
+ attr_writer :in_block_tag
+
+ def add_leaf(line, blockparser)
+ return @stack.pop if LINE_PAT::VERBATIM_END =~ line
+ return super(line, blockparser) unless @in_block_tag
+ line = " ".concat(line) if BlockElement::BlockNodeEnd.head_re =~ line and not @in_block_tag
+ @stack.push BlockElement::VerbatimLeaf.create(line, @in_block_tag)
+ end
+ end
+
  class BlockElement::QuoteNode
  def parse_leafs
  self[0] = BlockParser.parse(self[0])
  end
  end
 
- # class HeadingNode
  class BlockElement::HeadingNode
  def breakable?(breaker)
  kind_of?(breaker.block) and nominal_level >= breaker.nominal_level
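With this change, a line matching `LINE_PAT::VERBATIM_BEGIN` pushes a `VerbatimNode`, and while it is on top of the stack that node takes every following line verbatim until a line matching `LINE_PAT::VERBATIM_END` pops it; previously `read_lines` consumed the whole block itself. The sketch below models that line-by-line state change in isolation. It assumes `<<<` and `>>>` as the begin and end markers (the actual `LINE_PAT` patterns are not part of this diff), and `group_lines` is a hypothetical name.

```ruby
# Assumed markers; the real LINE_PAT::VERBATIM_BEGIN/END are not shown in the diff.
VERBATIM_BEGIN = /\A<<<\s*\z/
VERBATIM_END = /\A>>>\s*\z/

# Hypothetical helper: returns [:line, text] for ordinary lines and
# [:verbatim, [lines...]] for each block between the markers.
def group_lines(text)
  groups = []
  verbatim = nil # non-nil while inside a verbatim block
  text.each_line do |line|
    if verbatim
      if VERBATIM_END =~ line
        groups << [:verbatim, verbatim] # end marker closes the block
        verbatim = nil
      else
        verbatim << line                # kept as-is, no markup parsing
      end
    elsif VERBATIM_BEGIN =~ line
      verbatim = []                     # begin marker opens a block
    else
      groups << [:line, line]
    end
  end
  groups << [:verbatim, verbatim] if verbatim # unterminated block at end of input
  groups
end

p group_lines("intro\n<<<\n!! not a heading\n>>>\nouter\n")
```

Unlike this flat sketch, the parser keeps the state on its node stack: the begin line pushes a node and the end line pops it.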
@@ -217,8 +241,8 @@ module PseudoHiki
  end
 
  class BlockElement::VerbatimLeaf
- def self.create(line)
- line.sub!(self.head_re,"") if self.head_re
+ def self.create(line, in_block_tag=nil)
+ line.sub!(self.head_re, "") if self.head_re and not in_block_tag
  self.new.tap {|leaf| leaf.push line }
  end
  end
@@ -297,46 +321,16 @@ module PseudoHiki
  @stack.current_node.breakable?(breaker)
  end
 
- def in_link_tag?(preceding_str)
- preceding_str[-2,2] == "[[" or preceding_str[-1,1] == "|"
- end
-
- def tagfy_link(line)
- line.gsub(URI_RE) {|url| in_link_tag?($`) ? url : "[[#{url}]]" }
- end
-
  def select_leaf_type(line)
  [BlockNodeEnd, HrLeaf].each {|leaf| return leaf if leaf.head_re =~ line }
  matched = HEAD_RE.match(line)
- return HeadToLeaf[matched[0]]||HeadToLeaf[line[0,1]] || HeadToLeaf['\s'] if matched
+ return HeadToLeaf[matched[0]]||HeadToLeaf[line[0, 1]] || HeadToLeaf['\s'] if matched
  ParagraphLeaf
  end
 
- def add_verbatim_block(lines)
- until lines.empty? or LINE_PAT::VERBATIM_END =~ lines.first
- lines[0] = " " + lines[0] if BlockNodeEnd.head_re =~ lines.first
- @stack.push(VerbatimLeaf.create(lines.shift))
- end
- lines.shift if LINE_PAT::VERBATIM_END =~ lines.first
- end
-
- def add_leaf(line)
- leaf = select_leaf_type(line).create(line)
- while breakable?(leaf)
- @stack.pop
- end
- @stack.push leaf
- end
-
  def read_lines(lines)
- while line = lines.shift
- if LINE_PAT::VERBATIM_BEGIN =~ line
- add_verbatim_block(lines)
- else
- line = self.tagfy_link(line) unless VerbatimLeaf.head_re =~ line
- add_leaf(line)
- end
- end
+ each_line = lines.respond_to?(:each_line) ? :each_line : :each
+ lines.send(each_line) {|line| @stack.current_node.add_leaf(line, self) }
  @stack.pop
  end
  end
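`read_lines` no longer shifts lines off an Array itself. It iterates with `each_line` when the input responds to it (a String, for instance) and with `each` otherwise (an Array of lines), and hands every line to `@stack.current_node.add_leaf`, so the node on top of the stack decides how the line is handled. The dispatch idiom can be tried in isolation; `each_input_line` is a hypothetical name used only for this sketch.

```ruby
# Hypothetical helper mirroring the dispatch in the new read_lines:
# Strings (and IO objects) respond to each_line, Arrays only to each.
def each_input_line(lines, &block)
  iterator = lines.respond_to?(:each_line) ? :each_line : :each
  lines.public_send(iterator, &block)
end

collected = []
each_input_line("!! heading\nparagraph\n") {|line| collected << line }
each_input_line(["!! heading\n", "paragraph\n"]) {|line| collected << line }
p collected
```

Both calls feed the same two lines to the block, so a caller may pass either the raw text or a pre-split array.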
@@ -168,6 +168,7 @@ module PseudoHiki
  super(tree).tap do |element|
  element["rowspan"] = tree.rowspan if tree.rowspan > 1
  element["colspan"] = tree.colspan if tree.colspan > 1
+ # element.push "&#160;" if element.empty? # &#160; = &nbsp; this line would be necessary for HTML 4 or XHTML 1.0
  end
  end
  end
@@ -62,8 +62,7 @@ module PseudoHiki
  def convert_last_node_into_leaf
  last_node = remove_current_node
  tag_head = NodeTypeToHead[last_node.class]
- tag_head_leaf = InlineLeaf.create(tag_head)
- self.push tag_head_leaf
+ self.push InlineLeaf.create(tag_head)
  last_node.each {|leaf| self.push_as_leaf leaf }
  end
 
@@ -73,23 +72,20 @@ module PseudoHiki
 
  def treated_as_node_end(token)
  return self.pop if current_node.class == TAIL[token]
- if node_in_ancestors?(TAIL[token])
- convert_last_node_into_leaf until current_node.class == TAIL[token]
- return self.pop
- end
- nil
+ return nil unless node_in_ancestors?(TAIL[token])
+ convert_last_node_into_leaf until current_node.class == TAIL[token]
+ self.pop
  end
 
  def split_into_tokens(str)
- result = []
+ tokens = []
  while m = token_pat.match(str)
- result.push m.pre_match if m.pre_match
- result.push m[0]
+ tokens.push m.pre_match unless m.pre_match.empty?
+ tokens.push m[0]
  str = m.post_match
  end
- result.push str unless str.empty?
- result.delete_if {|token| token.empty? }
- result
+ tokens.push str unless str.empty?
+ tokens
  end
 
  def parse
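The rewritten `split_into_tokens` skips an empty `pre_match` as soon as it is produced, so the final `delete_if` pass that the old version needed is gone. The old guard `if m.pre_match` never filtered anything, because `MatchData#pre_match` always returns a String, possibly empty. A standalone sketch of the same loop follows; `TOKEN_PAT` is a hypothetical subset of inline markup tokens, not the parser's actual `token_pat`.

```ruby
# Hypothetical subset of inline markup tokens: '' (emphasis), [[, ]] and |.
TOKEN_PAT = /''|\[\[|\]\]|\|/

def split_into_tokens(str, token_pat = TOKEN_PAT)
  tokens = []
  while m = token_pat.match(str)
    tokens.push m.pre_match unless m.pre_match.empty? # text before the token
    tokens.push m[0]                                  # the token itself
    str = m.post_match
  end
  tokens.push str unless str.empty?                   # trailing text
  tokens
end

p split_into_tokens("a ''b'' [[c|d]]")
p split_into_tokens("''''") # adjacent tokens produce no empty strings
```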
@@ -102,15 +98,22 @@ module PseudoHiki
  end
 
  def self.parse(str)
- parser = new(str)
- parser.parse.tree
+ new(str).parse.tree #parser = new(str)
  end
  end
 
  class TableRowParser < InlineParser
+ TD, TH, ROW_EXPANDER, COL_EXPANDER, TH_PAT = %w(td th ^ > !)
+ MODIFIED_CELL_PAT = /^!?[>^]*/o
+
  module InlineElement
  class TableCellNode < InlineParser::InlineElement::InlineNode
  attr_accessor :cell_type, :rowspan, :colspan
+
+ def initialize
+ super
+ @cell_type, @rowspan, @colspan = TD, 1, 1
+ end
  end
  end
  include InlineElement
@@ -118,27 +121,22 @@ module PseudoHiki
  TAIL[TableSep] = TableCellNode
  TokenPat[self] = InlineParser::TokenPat[InlineParser]
 
- TD, TH, ROW_EXPANDER, COL_EXPANDER, TH_PAT = %w(td th ^ > !)
- MODIFIED_CELL_PAT = /^!?[>^]*/o
-
  class InlineElement::TableCellNode
- def parse_first_token(token)
- @cell_type, @rowspan, @colspan, parsed_token = TD, 1, 1, token.dup
- return token if token.kind_of? InlineParser::InlineNode
- token_str = parsed_token[0]
- m = MODIFIED_CELL_PAT.match(token_str) #if token.kind_of? String
-
- if m
- cell_modifiers = m[0].split(//o)
- if cell_modifiers.first == TH_PAT
- cell_modifiers.shift
- @cell_type = TH
- end
- parsed_token[0] = token_str.sub(MODIFIED_CELL_PAT,"")
- @rowspan = cell_modifiers.count(ROW_EXPANDER) + 1
- @colspan = cell_modifiers.count(COL_EXPANDER) + 1
+ def parse_cellspan(token_str)
+ return token_str if m = MODIFIED_CELL_PAT.match(token_str) and m[0].empty? #if token.kind_of? String
+ cell_modifiers = m[0]
+ if cell_modifiers[0].chr == TH_PAT
+ cell_modifiers[0] = ""
+ @cell_type = TH
  end
- parsed_token
+ @rowspan = cell_modifiers.count(ROW_EXPANDER) + 1
+ @colspan = cell_modifiers.count(COL_EXPANDER) + 1
+ token_str.sub(MODIFIED_CELL_PAT, "")
+ end
+
+ def parse_first_token(orig_tokens)
+ return orig_tokens if orig_tokens.kind_of? InlineParser::InlineNode
+ orig_tokens.dup.tap {|tokens| tokens[0] = parse_cellspan(tokens[0]) }
  end
 
  def push(token)
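`parse_cellspan` reads the modifier prefix of a table cell: an optional leading `!` marks a header cell, each `^` extends the cell by one row and each `>` by one column, and the prefix is then stripped from the cell text. The standalone sketch below reproduces those rules. `parse_cell` and the `Cell` struct are hypothetical, and the sketch anchors the pattern with `\A` rather than the original `^` so that only the start of the string is considered.

```ruby
TD, TH, ROW_EXPANDER, COL_EXPANDER, TH_PAT = %w(td th ^ > !)
# \A anchors at the start of the string only (the original uses ^).
MODIFIED_CELL_PAT = /\A!?[>^]*/

Cell = Struct.new(:cell_type, :rowspan, :colspan, :text)

# Hypothetical standalone version of TableCellNode#parse_cellspan.
def parse_cell(token_str)
  modifiers = MODIFIED_CELL_PAT.match(token_str)[0] # always matches, possibly ""
  cell_type = modifiers.start_with?(TH_PAT) ? TH : TD
  # Counting over an Array of characters sidesteps String#count's
  # character-set syntax, where a leading ^ can mean negation.
  chars = modifiers.chars
  Cell.new(cell_type,
           chars.count(ROW_EXPANDER) + 1,
           chars.count(COL_EXPANDER) + 1,
           token_str.sub(MODIFIED_CELL_PAT, ""))
end

p parse_cell("!^>header 1").to_a # => ["th", 2, 2, "header 1"]
p parse_cell("cell2").to_a       # => ["td", 1, 1, "cell2"]
```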