RubyGems - pseudohikiparser - Versions diffs - 0.0.0.6.develop → 0.0.0.7.develop - Mend

pseudohikiparser 0.0.0.6.develop → 0.0.0.7.develop

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

checksums.yaml +4 -4
data/README.md +69 -14
data/lib/htmlelement.rb +18 -3
data/lib/pseudohiki/blockparser.rb +40 -46
data/lib/pseudohiki/htmlformat.rb +1 -0
data/lib/pseudohiki/inlineparser.rb +32 -34
data/lib/pseudohiki/markdownformat.rb +369 -0
data/lib/pseudohiki/plaintextformat.rb +38 -52
data/lib/pseudohiki/version.rb +1 -1
data/test/test_htmlelement.rb +30 -0
data/test/test_htmlformat.rb +66 -0
data/test/test_markdownformat.rb +436 -0
metadata +5 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 0cd301fcbf579cecea32ac473e1bffd351cb9e21
-  data.tar.gz: 5871802bd7b9dc0e71df8f3a5c4805339caf1833
+  metadata.gz: ac5860d6ae01c25224992ed3bdaeb09080446bab
+  data.tar.gz: fce47183ada974d2416789045023fe0b31e9f4a0
 SHA512:
-  metadata.gz: 411eeb634503a5f79d9f563ddcfcfa45140e2889c78463a494275d044c3fb7bbbaa50591a9a27d61604bda7adaf8e5ff43cdacd1b52c7b10bda672729b98d394
-  data.tar.gz: 9d58afc0b126a74e1b2ba1e9fc32efff8262a1a248e98037cb0c18371144bc3420bfd5b72a58d4a219dbcb48ff61653dca0b4a4abfb194bbe974fccd0c9be303
+  metadata.gz: de7070b265af53ea7fbcfe1bb09bb5d0c0cc0256e41e7074b712ada33483d0f913fea346c0ae4f5aaf2d961fbfb778b652a385b4621a02934f90d272517f8332
+  data.tar.gz: 897b062e0a00e9f3064d1237d1731dfba42934888997b0bbe4b0b08467000ef67dbb01e228c2c995cba5f90c48956050f88a188277093e93a67daf3c80cf6c9c
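
The SHA512 entries above can be compared against the `metadata.gz` and `data.tar.gz` files inside a downloaded gem. The following is a minimal sketch, assuming the `.gem` file (a tar archive) has already been unpacked; `sha512_matches?` is a hypothetical helper name and not part of RubyGems.

```ruby
require "digest/sha2"

# Hypothetical helper: returns true when the SHA512 hex digest of the file
# at +path+ equals +expected_hex+ (for example, a value from checksums.yaml).
def sha512_matches?(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex.strip.downcase
end
```

A call such as `sha512_matches?("data.tar.gz", expected)` would then be made once per entry, with `expected` copied from the listing.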

data/README.md CHANGED

@@ -8,10 +8,10 @@ Currently, only a limited range of notations can be converted into HTML4 or XHTM
 I am writing this tool with following objectives in mind,
 * provide some additional features that do not exist in the original Hiki notation
- * make the notation more line oriented
- * allow to assign ids to elements such as headings
+  * make the notation more line oriented
+  * allow to assign ids to elements such as headings
 * support several formats other than HTML
- * The visitor pattern is adopted for the implementation, so you only have to add a visitor class to support a certain format.
+  * The visitor pattern is adopted for the implementation, so you only have to add a visitor class to support a certain format.
 And, it would not be compatible with the original Hiki notation.
@@ -30,14 +30,14 @@ gem install pseudohikiparser --pre
 ### Samples
-[A sample text](https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage.txt) in Hiki notation and [a result of conversion](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage.html), and [another result of conversion](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage_with_toc.html)
+[A sample text](https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage.txt) in Hiki notation and [a result of conversion](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage.html), [another result of conversion](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage_with_toc.html) and [yet another result](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage_html5_with_toc.html).
 You will find those samples in [develop branch](https://github.com/nico-hn/PseudoHikiParser/tree/develop/samples).
 ### pseudohiki2html.rb
-After the installation of PseudoHikiParser, you could use a command, _pseudohiki2html.rb_.
+After the installation of PseudoHikiParser, you could use a command: **pseudohiki2html.rb**.
 _Please note that pseudohiki2html.rb is currently provided as a showcase of PseudoHikiParser, and the options will be continuously changed at this stage of development._
@@ -90,7 +90,7 @@ For more options, please try `pseudohiki2html.rb --help`
 If you save the lines below as a ruby script and execute it:
-```
+```ruby
 #!/usr/bin/env ruby
 require 'pseudohikiparser'
@@ -106,7 +106,7 @@ puts html
 ```
 you will get the following output:
-```
+```html
 <div class="section h2">
 <h2> The first heading
 </h2>
@@ -119,17 +119,17 @@ The first paragraph
 Other than PseudoHiki::HtmlFormat, you can choose PseudoHiki::XhtmlFormat, PseudoHiki::Xhtml5Format, PseudoHiki::PlainTextFormat.
-## Development status of features from the original [Hiki notation](http://hikiwiki.org/en/TextFormattingRules.html)
+## Development status of features from the original [Hiki notation](http://rabbit-shocker.org/en/hiki.html)
 * Paragraphs - Usable
 * Links
- * WikiNames - Not supported (and would never be)
- * Linking to other Wiki pages - Not supported as well
- * Linking to an arbitrary URL - Maybe usable
+  * WikiNames - Not supported (and would never be)
+  * Linking to other Wiki pages - Not supported as well
+  * Linking to an arbitrary URL - Maybe usable
 * Preformatted text - Usable
 * Text decoration - Partly supported
- * Currently, there is no means of escaping tags for inline decorations.
- * The notation with backquote tags(``) for inline literals is not supported.
+  * Currently, there is no means of escaping tags for inline decorations.
+  * The notation with backquote tags(``) for inline literals is not supported.
 * Headings - Usable
 * Horizontal lines - Usable
 * Lists - Usable
@@ -197,7 +197,62 @@ cell 3-1	||	||	cell 3-4	cell 3-5
 cell 4-1	cell 4-2	cell 4-3	cell 4-4	cell 4-5
 ```
 #### A visitor for HTML5
-The visitor, [Xhtml5Format](https://github.com/nico-hn/PseudoHikiParser/blob/develop/lib/pseudohiki/htmlformat.rb#L225) is currently available only in the [develop branch](https://github.com/nico-hn/PseudoHikiParser/tree/develop).
+The visitor, [Xhtml5Format](https://github.com/nico-hn/PseudoHikiParser/blob/develop/lib/pseudohiki/htmlformat.rb#L222) is currently available only in the [develop branch](https://github.com/nico-hn/PseudoHikiParser/tree/develop).
+#### A vistor for (Git Flavored) Markdown
+The visitor, [MarkDownFormat](https://github.com/nico-hn/PseudoHikiParser/blob/develop/lib/pseudohiki/markdownformat.rb) too, is currently available only in the [develop branch](https://github.com/nico-hn/PseudoHikiParser/blob/develop/).
+It's just in experimental stage. For example, it cannot properly convert html elements appeared in hiki notation text yet.
+The following are a sample script and its output:
+```ruby
+#!/usr/bin/env ruby
+require 'pseudohiki/markdownformat'
+md = PseudoHiki::MarkDownFormat.create
+gfm = PseudoHiki::MarkDownFormat.create(gfm_style: true)
+hiki = <<TEXT
+!! The first heading
+The first paragraph
+||!header 1||!header 2
+||''cell 1''||cell2
+TEXT
+tree = PseudoHiki::BlockParser.parse(hiki)
+md_text = md.format(tree).to_s
+gfm_text = gfm.format(tree).to_s
+puts md_text
+puts "===================="
+puts gfm_text
+```
+(You will get the following output.)
+```
+## The first heading
+The first paragraph
+<table>
+<tr><th>header 1</th><th>header 2</th></tr>
+<tr><td><em>cell 1</em></td><td>cell2</td></tr>
+</table>
+====================
+## The first heading
+The first paragraph
+|header 1|header 2|
+|--------|--------|
+|_cell 1_|cell2   |
+```
 ### Not Implemented Yet

data/lib/htmlelement.rb CHANGED

@@ -5,6 +5,16 @@ require 'kconv'
 class HtmlElement
   class Children < Array
     alias to_s join
+    def traverse(&block)
+      each do |child|
+        if child.kind_of? HtmlElement or child.kind_of? Children
+          child.traverse(&block)
+        else
+          yield child
+        end
+      end
+    end
   end
   module CHARSET
@@ -64,11 +74,11 @@ class HtmlElement
   end
   def HtmlElement.urlencode(str)
-    str.toutf8.gsub(/[^\w\.\-]/n) {|ch| format('%%%02X', ch[0]) }
+    str.toutf8.gsub(/[^\w\.\-]/o) {|utf8_char| utf8_char.unpack("C*").map {|b| '%%%02X'%[b] }.join }
   end
   def HtmlElement.urldecode(str)
-    utf = str.gsub(/%\w\w/) {|ch| [ch[-2,2]].pack('H*') }
+    utf = str.gsub(/%\w\w/) {|ch| [ch[-2,2]].pack('H*') }.toutf8
     return utf.tosjis if $KCODE =~ /^s/io
     return utf.toeuc if $KCODE =~ /^e/io
     utf
@@ -84,7 +94,7 @@ class HtmlElement
   end
   def HtmlElement.escape(str)
-    str.gsub(/[&"<>]/on) {|pat| ESC[pat] }
+    str.gsub(/[&"<>]/o) {|pat| ESC[pat] }
   end
   def HtmlElement.decode(str)
@@ -144,6 +154,11 @@ class HtmlElement
     self.class::TagFormats[@tagname]%[@tagname, format_attributes, @children, @tagname]
   end
   alias to_str to_s
+  def traverse(&block)
+    yield self
+    @children.traverse(&block)
+  end
 end
 class XhtmlElement < HtmlElement
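
The `urlencode` change above replaces `format('%%%02X', ch[0])` with a per-byte expansion. On Ruby 1.9 and later, `ch[0]` is a one-character string rather than an integer, so the old call can no longer format it as a byte value. The following is a minimal sketch of the new approach; it assumes `String#encode` as a stand-in for kconv's `toutf8`, and `urlencode_sketch` is a hypothetical name.

```ruby
# Sketch of byte-wise percent-encoding, modeled on the new urlencode.
# Assumption: String#encode stands in for kconv's String#toutf8.
def urlencode_sketch(str)
  str.encode("UTF-8").gsub(/[^\w.\-]/) do |utf8_char|
    # A multibyte character expands to one %XX escape per byte.
    utf8_char.unpack("C*").map { |b| "%%%02X" % [b] }.join
  end
end

puts urlencode_sketch("a b.txt")  # prints "a%20b.txt"
puts urlencode_sketch("\u00e9")   # prints "%C3%A9"
```

Because `\w` in a Ruby regular expression matches only ASCII word characters by default, every non-ASCII character goes through the block and is escaped byte by byte.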

data/lib/pseudohiki/blockparser.rb CHANGED

@@ -24,10 +24,9 @@ module PseudoHiki
 #      return unless tree[0].kind_of? Array ** block_leaf:[inline_node:[token or inline_node]]
       head = leaf[0]
       return unless head.kind_of? String
-      m = ID_TAG_PAT.match(head)
-      if m
+      if m = ID_TAG_PAT.match(head)
         node.node_id = m[1]
-        leaf[0] = head.sub(ID_TAG_PAT,"")
+        leaf[0] = head.sub(ID_TAG_PAT, "")
       end
       node
     end
@@ -47,8 +46,7 @@ module PseudoHiki
     class BlockLeaf < BlockStack::Leaf
       @@head_re = {}
-      attr_accessor :nominal_level
-      attr_accessor :node_id
+      attr_accessor :nominal_level, :node_id
       def self.head_re=(head_regex)
         @@head_re[self] = head_regex
@@ -63,9 +61,8 @@ module PseudoHiki
       end
       def self.create(line, inline_parser=InlineParser)
-        line.sub!(self.head_re,"") if self.head_re
-        leaf = self.new
-        leaf.concat(inline_parser.parse(line))
+        line.sub!(self.head_re, "") if self.head_re
+        new.concat(inline_parser.parse(line)) #leaf = self.new
       end
       def self.assign_head_re(head, need_to_escape=true, reg_pat="(%s)")
@@ -105,7 +102,7 @@ module PseudoHiki
       include TreeStack::Mergeable
       def self.create(line)
-        line.sub!(self.head_re,"") if self.head_re
+        line.sub!(self.head_re, "") if self.head_re
         self.new.tap {|leaf| leaf.push line }
       end
@@ -142,7 +139,6 @@ module PseudoHiki
     class ListTypeLeaf < NestedBlockLeaf; end
     class BlockNode < BlockStack::Node
-      attr_accessor :base_level, :relative_level_from_base
       attr_accessor :node_id
       def nominal_level
@@ -160,6 +156,24 @@ module PseudoHiki
       end
       def parse_leafs; end
+      def in_link_tag?(preceding_str)
+        preceding_str[-2, 2] == "[[" or preceding_str[-1, 1] == "|"
+      end
+      def tagfy_link(line)
+        line.gsub(URI_RE) {|url| in_link_tag?($`) ? url : "[[#{url}]]" }
+      end
+      def add_leaf(line, blockparser)
+        if LINE_PAT::VERBATIM_BEGIN =~ line
+          return blockparser.stack.push BlockElement::VerbatimNode.new.tap {|node| node.in_block_tag = true }
+        end
+        line = tagfy_link(line) unless BlockElement::VerbatimLeaf.head_re =~ line
+        leaf = blockparser.select_leaf_type(line).create(line)
+        blockparser.stack.pop while blockparser.breakable?(leaf)
+        blockparser.stack.push leaf
+      end
     end
     class NonNestedBlockNode < BlockNode
@@ -203,13 +217,23 @@ module PseudoHiki
       def push_self(stack); end
     end
+    class BlockElement::VerbatimNode
+      attr_writer :in_block_tag
+      def add_leaf(line, blockparser)
+        return @stack.pop if LINE_PAT::VERBATIM_END =~ line
+        return super(line, blockparser) unless @in_block_tag
+        line = " ".concat(line) if BlockElement::BlockNodeEnd.head_re =~ line and not @in_block_tag
+        @stack.push BlockElement::VerbatimLeaf.create(line, @in_block_tag)
+      end
+    end
     class BlockElement::QuoteNode
       def parse_leafs
         self[0] = BlockParser.parse(self[0])
       end
     end
-#    class HeadingNode
     class BlockElement::HeadingNode
       def breakable?(breaker)
         kind_of?(breaker.block) and nominal_level >= breaker.nominal_level
@@ -217,8 +241,8 @@ module PseudoHiki
     end
     class BlockElement::VerbatimLeaf
-      def self.create(line)
-        line.sub!(self.head_re,"") if self.head_re
+      def self.create(line, in_block_tag=nil)
+        line.sub!(self.head_re, "") if self.head_re and not in_block_tag
         self.new.tap {|leaf| leaf.push line }
       end
     end
@@ -297,46 +321,16 @@ module PseudoHiki
       @stack.current_node.breakable?(breaker)
     end
-    def in_link_tag?(preceding_str)
-      preceding_str[-2,2] == "[[" or preceding_str[-1,1] == "|"
-    end
-    def tagfy_link(line)
-      line.gsub(URI_RE) {|url| in_link_tag?($`) ? url : "[[#{url}]]" }
-    end
     def select_leaf_type(line)
       [BlockNodeEnd, HrLeaf].each {|leaf| return leaf if leaf.head_re =~ line }
       matched = HEAD_RE.match(line)
-      return HeadToLeaf[matched[0]]||HeadToLeaf[line[0,1]] || HeadToLeaf['\s'] if matched
+      return HeadToLeaf[matched[0]]||HeadToLeaf[line[0, 1]] || HeadToLeaf['\s'] if matched
       ParagraphLeaf
     end
-    def add_verbatim_block(lines)
-      until lines.empty? or LINE_PAT::VERBATIM_END =~ lines.first
-        lines[0] = " " + lines[0] if BlockNodeEnd.head_re =~ lines.first
-        @stack.push(VerbatimLeaf.create(lines.shift))
-      end
-      lines.shift if LINE_PAT::VERBATIM_END =~ lines.first
-    end
-    def add_leaf(line)
-      leaf = select_leaf_type(line).create(line)
-      while breakable?(leaf)
-        @stack.pop
-      end
-      @stack.push leaf
-    end
     def read_lines(lines)
-      while line = lines.shift
-        if LINE_PAT::VERBATIM_BEGIN =~ line
-          add_verbatim_block(lines)
-        else
-          line = self.tagfy_link(line) unless VerbatimLeaf.head_re =~ line
-          add_leaf(line)
-        end
-      end
+      each_line = lines.respond_to?(:each_line) ? :each_line : :each
+      lines.send(each_line) {|line| @stack.current_node.add_leaf(line, self) }
       @stack.pop
     end
   end
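
The `tagfy_link` and `in_link_tag?` methods that this diff moves into `BlockNode` wrap bare URLs in `[[...]]` unless the URL already sits inside a link tag. The following is a minimal standalone sketch of that behavior. `URI_RE` is not shown in this diff, so `URI_RE_SKETCH` below is a simplified, hypothetical pattern and not the library's own.

```ruby
# Assumption: a simplified URL pattern standing in for the library's URI_RE.
URI_RE_SKETCH = %r{(?:https?|ftp)://[^\s\[\]|]+}

# True when the text before the URL ends with "[[" or "|",
# i.e. the URL is already the target of a link tag.
def in_link_tag?(preceding_str)
  preceding_str[-2, 2] == "[[" || preceding_str[-1, 1] == "|"
end

# Wraps each bare URL in [[...]]; URLs already inside a link tag are kept.
def tagfy_link(line)
  line.gsub(URI_RE_SKETCH) { |url| in_link_tag?($`) ? url : "[[#{url}]]" }
end

puts tagfy_link("see http://example.com now")    # prints "see [[http://example.com]] now"
puts tagfy_link("[[label|http://example.com]]")  # prints "[[label|http://example.com]]"
```

Inside the `gsub` block, `` $` `` holds the text preceding the current match, which is what lets the check look only at the characters directly before each URL.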

data/lib/pseudohiki/htmlformat.rb CHANGED

@@ -168,6 +168,7 @@ module PseudoHiki
         super(tree).tap do |element|
           element["rowspan"] = tree.rowspan if tree.rowspan > 1
           element["colspan"] = tree.colspan if tree.colspan > 1
+          # element.push "&#160;" if element.empty? # &#160; = &nbsp; this line would be necessary for HTML 4 or XHTML 1.0
         end
       end
     end

data/lib/pseudohiki/inlineparser.rb CHANGED

@@ -62,8 +62,7 @@ module PseudoHiki
     def convert_last_node_into_leaf
       last_node = remove_current_node
       tag_head = NodeTypeToHead[last_node.class]
-      tag_head_leaf = InlineLeaf.create(tag_head)
-      self.push tag_head_leaf
+      self.push InlineLeaf.create(tag_head)
       last_node.each {|leaf| self.push_as_leaf leaf }
     end
@@ -73,23 +72,20 @@ module PseudoHiki
     def treated_as_node_end(token)
       return self.pop if current_node.class == TAIL[token]
-      if node_in_ancestors?(TAIL[token])
-        convert_last_node_into_leaf until current_node.class == TAIL[token]
-        return self.pop
-      end
-      nil
+      return nil unless node_in_ancestors?(TAIL[token])
+      convert_last_node_into_leaf until current_node.class == TAIL[token]
+      self.pop
     end
     def split_into_tokens(str)
-      result = []
+      tokens = []
       while m = token_pat.match(str)
-        result.push m.pre_match if m.pre_match
-        result.push m[0]
+        tokens.push m.pre_match unless m.pre_match.empty?
+        tokens.push m[0]
         str = m.post_match
       end
-      result.push str unless str.empty?
-      result.delete_if {|token| token.empty? }
-      result
+      tokens.push str unless str.empty?
+      tokens
     end
     def parse
@@ -102,15 +98,22 @@ module PseudoHiki
     end
     def self.parse(str)
-      parser = new(str)
-      parser.parse.tree
+      new(str).parse.tree #parser = new(str)
     end
   end
   class TableRowParser < InlineParser
+    TD, TH, ROW_EXPANDER, COL_EXPANDER, TH_PAT = %w(td th ^ > !)
+    MODIFIED_CELL_PAT = /^!?[>^]*/o
     module InlineElement
       class TableCellNode < InlineParser::InlineElement::InlineNode
         attr_accessor :cell_type, :rowspan, :colspan
+        def initialize
+          super
+          @cell_type, @rowspan, @colspan = TD, 1, 1
+        end
       end
     end
     include InlineElement
@@ -118,27 +121,22 @@ module PseudoHiki
     TAIL[TableSep] = TableCellNode
     TokenPat[self] = InlineParser::TokenPat[InlineParser]
-    TD, TH, ROW_EXPANDER, COL_EXPANDER, TH_PAT = %w(td th ^ > !)
-    MODIFIED_CELL_PAT = /^!?[>^]*/o
     class InlineElement::TableCellNode
-      def parse_first_token(token)
-        @cell_type, @rowspan, @colspan, parsed_token = TD, 1, 1, token.dup
-        return token if token.kind_of? InlineParser::InlineNode
-        token_str = parsed_token[0]
-        m = MODIFIED_CELL_PAT.match(token_str) #if token.kind_of? String
-        if m
-          cell_modifiers = m[0].split(//o)
-          if cell_modifiers.first == TH_PAT
-            cell_modifiers.shift
-            @cell_type = TH
-          end
-          parsed_token[0] = token_str.sub(MODIFIED_CELL_PAT,"")
-          @rowspan = cell_modifiers.count(ROW_EXPANDER) + 1
-          @colspan = cell_modifiers.count(COL_EXPANDER) + 1
+      def parse_cellspan(token_str)
+        return token_str if m = MODIFIED_CELL_PAT.match(token_str) and m[0].empty? #if token.kind_of? String
+        cell_modifiers = m[0]
+        if cell_modifiers[0].chr == TH_PAT
+          cell_modifiers[0] = ""
+          @cell_type = TH
         end
-        parsed_token
+        @rowspan = cell_modifiers.count(ROW_EXPANDER) + 1
+        @colspan = cell_modifiers.count(COL_EXPANDER) + 1
+        token_str.sub(MODIFIED_CELL_PAT, "")
+      end
+      def parse_first_token(orig_tokens)
+        return orig_tokens if orig_tokens.kind_of? InlineParser::InlineNode
+        orig_tokens.dup.tap {|tokens| tokens[0] = parse_cellspan(tokens[0]) }
       end
       def push(token)