pseudohikiparser 0.0.0.4.develop → 0.0.0.5.develop

data/LICENSE ADDED
@@ -0,0 +1,23 @@
1
+ Copyright (c) 2011, HASHIMOTO Naoki
2
+ All rights reserved.
3
+
4
+ Redistribution and use in source and binary forms, with or without modification,
5
+ are permitted provided that the following conditions are met:
6
+
7
+ * Redistributions of source code must retain the above copyright notice, this
8
+ list of conditions and the following disclaimer.
9
+
10
+ * Redistributions in binary form must reproduce the above copyright notice, this
11
+ list of conditions and the following disclaimer in the documentation and/or
12
+ other materials provided with the distribution.
13
+
14
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
15
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
16
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
17
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
18
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
19
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
20
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
21
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
22
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
23
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/README.md ADDED
@@ -0,0 +1,203 @@
1
+ PseudoHikiParser
2
+ ================
3
+
4
+ PseudoHikiParser converts texts written in a [Hiki](http://hikiwiki.org/en/)-like notation into HTML or other formats.
5
+
6
+ Currently, only a limited range of notations can be converted into HTML4 or XHTML1.0.
7
+
8
+ I am writing this tool with the following objectives in mind:
9
+
10
+ * provide some additional features that do not exist in the original Hiki notation
11
+ * make the notation more line oriented
12
+ * allow ids to be assigned to elements such as headings
13
+ * support several formats other than HTML
14
+ * The visitor pattern is adopted for the implementation, so you only have to add a visitor class to support a certain format.
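The visitor idea mentioned in the list above can be sketched as follows. This is only an illustration: the node classes and the `visit_*` method names are hypothetical and do not mirror PseudoHikiParser's real tree or formatter classes. It shows why supporting a new output format only requires one more visitor class.

```ruby
# Hypothetical node classes (NOT PseudoHikiParser's real ones).
# Each node hands itself to the visitor method that matches its type.
class TextNode
  attr_reader :text
  def initialize(text); @text = text; end
  def accept(visitor); visitor.visit_text(self); end
end

class HeadingNode
  attr_reader :children
  def initialize(*children); @children = children; end
  def accept(visitor); visitor.visit_heading(self); end
end

class ParagraphNode
  attr_reader :children
  def initialize(*children); @children = children; end
  def accept(visitor); visitor.visit_paragraph(self); end
end

# One visitor per output format; the node classes stay untouched.
class HtmlVisitor
  def visit_text(node); node.text; end
  def visit_heading(node); "<h2>#{visit_children(node)}</h2>\n"; end
  def visit_paragraph(node); "<p>#{visit_children(node)}</p>\n"; end

  private

  def visit_children(node)
    node.children.map {|child| child.accept(self) }.join
  end
end

class PlainTextVisitor
  def visit_text(node); node.text; end
  def visit_heading(node); visit_children(node) + "\n"; end
  def visit_paragraph(node); visit_children(node) + "\n"; end

  private

  def visit_children(node)
    node.children.map {|child| child.accept(self) }.join
  end
end

# Render a list of top-level nodes with the given visitor.
def render(nodes, visitor)
  nodes.map {|node| node.accept(visitor) }.join
end

SAMPLE_TREE = [
  HeadingNode.new(TextNode.new("The first heading")),
  ParagraphNode.new(TextNode.new("The first paragraph"))
]
```

Calling `render(SAMPLE_TREE, HtmlVisitor.new)` and `render(SAMPLE_TREE, PlainTextVisitor.new)` walks the same tree twice and produces two different formats.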
15
+
16
+ As a result, it will not be compatible with the original Hiki notation.
17
+
18
+ ## License
19
+
20
+ BSD 2-Clause License
21
+
22
+ ## Installation
23
+
24
+ ```
25
+ gem install pseudohikiparser --pre
26
+ ```
27
+
28
+
29
+ ## Usage
30
+
31
+ ### Samples
32
+
33
+ [A sample text](https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage.txt) in Hiki notation and [a result of conversion](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage.html), and [another result of conversion](http://htmlpreview.github.com/?https://github.com/nico-hn/PseudoHikiParser/blob/develop/samples/wikipage_with_toc.html)
34
+
35
+ You will find those samples in [develop branch](https://github.com/nico-hn/PseudoHikiParser/tree/develop/samples).
36
+
37
+
38
+ ### pseudohiki2html.rb
39
+
40
+ After installing PseudoHikiParser, you can use the command _pseudohiki2html.rb_.
41
+
42
+ _Please note that pseudohiki2html.rb is currently provided as a showcase of PseudoHikiParser, and its options are likely to keep changing at this stage of development._
43
+
44
+ Typing the following lines at the command prompt:
45
+
46
+ ```
47
+ pseudohiki2html.rb <<TEXT
48
+ !! The first heading
49
+ The first paragraph
50
+ TEXT
51
+ ```
52
+ will return the following result to stdout:
53
+
54
+ ```html
55
+ <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
56
+ "http://www.w3.org/TR/html4/loose.dtd">
57
+ <html lang="en">
58
+ <head>
59
+ <meta content="en" http-equiv="Content-Language">
60
+ <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
61
+ <meta content="text/javascript" http-equiv="Content-Script-Type">
62
+ <title>-</title>
63
+ <link href="default.css" rel="stylesheet" type="text/css">
64
+ </head>
65
+ <body>
66
+ <div class="section h2">
67
+ <h2> The first heading
68
+ </h2>
69
+ <p>
70
+ The first paragraph
71
+ </p>
72
+ <!-- end of section h2 -->
73
+ </div>
74
+ </body>
75
+ </html>
76
+ ```
77
+ And if you specify a file name with the `--output` option:
78
+
79
+ ```
80
+ pseudohiki2html.rb --output first_example.html <<TEXT
81
+ !! The first heading
82
+ The first paragraph
83
+ TEXT
84
+ ```
85
+ the result will be saved in first_example.html.
86
+
87
+ For more options, please try `pseudohiki2html.rb --help`
88
+
89
+ ### module PseudoHiki
90
+
91
+ If you save the lines below as a ruby script and execute it:
92
+
93
+ ```
94
+ #!/usr/bin/env ruby
95
+
96
+ require 'pseudohikiparser'
97
+
98
+ plain = <<TEXT
99
+ !! The first heading
100
+ The first paragraph
101
+ TEXT
102
+
103
+ tree = PseudoHiki::BlockParser.parse(plain.lines.to_a)
104
+ html = PseudoHiki::HtmlFormat.format(tree)
105
+ puts html
106
+ ```
107
+ you will get the following output:
108
+
109
+ ```
110
+ <div class="section h2">
111
+ <h2> The first heading
112
+ </h2>
113
+ <p>
114
+ The first paragraph
115
+ </p>
116
+ <!-- end of section h2 -->
117
+ </div>
118
+ ```
119
+
120
+ Besides PseudoHiki::HtmlFormat, you can choose PseudoHiki::XhtmlFormat, PseudoHiki::Xhtml5Format, or PseudoHiki::PlainTextFormat.
121
+
122
+ ## Development status of features from the original [Hiki notation](http://hikiwiki.org/en/TextFormattingRules.html)
123
+
124
+ * Paragraphs - Usable
125
+ * Links
126
+ * WikiNames - Not supported (and would never be)
127
+ * Linking to other Wiki pages - Not supported as well
128
+ * Linking to an arbitrary URL - Maybe usable
129
+ * Preformatted text - Usable
130
+ * Text decoration - Partly supported
131
+ * Currently, there is no means of escaping tags for inline decorations.
132
+ * The notation with backquote tags(``) for inline literals is not supported.
133
+ * Headings - Usable
134
+ * Horizontal lines - Usable
135
+ * Lists - Usable
136
+ * Quotations - Usable
137
+ * Definitions - Usable
138
+ * Tables - Usable
139
+ * Comments - Usable
140
+ * Plugins - Not supported (and will not be compatible with the original one)
141
+
142
+ ## Additional Features
143
+ ### Already Implemented
144
+ #### Assigning ids
145
+ If you add [name_of_id] just after the marks that denote a heading or a list item, it becomes the id attribute of the resulting HTML element. Below is an example.
146
+
147
+ ```
148
+ !![heading_id]heading
149
+
150
+ *[list_id]list
151
+ ```
152
+ will be rendered as
153
+
154
+ ```html
155
+ <div class="section h2">
156
+ <h2 id="HEADING_ID">heading
157
+ </h2>
158
+ <ul>
159
+ <li id="LIST_ID">list
160
+ </li>
161
+ </ul>
162
+ <!-- end of section h2 -->
163
+ </div>
164
+ ```
165
+
166
+ ### Partly Implemented
167
+ #### A visitor that removes markups and returns plain texts
168
+ The visitor, [PlainTextFormat](https://github.com/nico-hn/PseudoHikiParser/blob/develop/lib/pseudohiki/plaintextformat.rb), is currently available only in the [develop branch](https://github.com/nico-hn/PseudoHikiParser/tree/develop). Below are examples:
169
+
170
+ ```
171
+ :tel:03-xxxx-xxxx
172
+ ::03-yyyy-yyyy
173
+ :fax:03-xxxx-xxxx
174
+ ```
175
+ will be rendered as
176
+
177
+ ```
178
+ tel: 03-xxxx-xxxx
179
+ 03-yyyy-yyyy
180
+ fax: 03-xxxx-xxxx
181
+ ```
182
+
183
+ And
184
+
185
+ ```
186
+ ||cell 1-1||>>cell 1-2,3,4||cell 1-5
187
+ ||cell 2-1||^>cell 2-2,3 3-2,3||cell 2-4||cell 2-5
188
+ ||cell 3-1||cell 3-4||cell 3-5
189
+ ||cell 4-1||cell 4-2||cell 4-3||cell 4-4||cell 4-5
190
+ ```
191
+ will be rendered as
192
+
193
+ ```
194
+ cell 1-1 cell 1-2,3,4 == == cell 1-5
195
+ cell 2-1 cell 2-2,3 3-2,3 == cell 2-4 cell 2-5
196
+ cell 3-1 || || cell 3-4 cell 3-5
197
+ cell 4-1 cell 4-2 cell 4-3 cell 4-4 cell 4-5
198
+ ```
199
+ #### A visitor for HTML5
200
+ The visitor, [Xhtml5Format](https://github.com/nico-hn/PseudoHikiParser/blob/develop/lib/pseudohiki/htmlformat.rb#L225) is currently available only in the [develop branch](https://github.com/nico-hn/PseudoHikiParser/tree/develop).
201
+
202
+
203
+ ### Not Implemented Yet
@@ -22,7 +22,8 @@ OPTIONS = {
22
22
  :template => nil,
23
23
  :output => nil,
24
24
  :force => false,
25
- :toc => nil
25
+ :toc => nil,
26
+ :split_main_heading => false
26
27
  }
27
28
 
28
29
  ENCODING_REGEXP = {
@@ -37,7 +38,7 @@ HTML_VERSIONS = %w(html4 xhtml1 html5)
37
38
  FILE_HEADER_PAT = /^(\xef\xbb\xbf)?\/\//
38
39
  WRITTEN_OPTION_PAT = {}
39
40
  OPTIONS.keys.each {|opt| WRITTEN_OPTION_PAT[opt] = /^(\xef\xbb\xbf)?\/\/#{opt}:\s*(.*)$/ }
40
- HEADING_WITH_ID_PAT = /^(!{2,3})\[([A-Za-z][0-9A-Za-z_\-.:]*)\]/o
41
+ HEADING_WITH_ID_PAT = /^(!{2,3})\[([A-Za-z][0-9A-Za-z_\-.:]*)\]\s*/o
41
42
 
42
43
  PlainFormat = PlainTextFormat.create
43
44
 
@@ -46,7 +47,12 @@ class InputManager
46
47
  @formatter ||= OPTIONS.html_template.new
47
48
  end
48
49
 
50
+ def to_plain(line)
51
+ PlainFormat.format(BlockParser.parse(line.lines.to_a)).to_s.chomp
52
+ end
53
+
49
54
  def create_table_of_contents(lines)
55
+ return "" unless OPTIONS[:toc]
50
56
  toc_lines = lines.grep(HEADING_WITH_ID_PAT).map do |line|
51
57
  m = HEADING_WITH_ID_PAT.match(line)
52
58
  heading_depth, id = m[1].length, m[2].upcase
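The effect of the trailing `\s*` added to `HEADING_WITH_ID_PAT` can be checked in isolation. The sketch below copies both versions of the pattern from this diff; the helper `heading_parts` and the use of `post_match` for the heading text are assumptions made for illustration, since the hunk does not show how the rest of the line is consumed.

```ruby
# Both patterns are copied from the diff (the /o flag is dropped; it has no
# effect on a pattern without interpolation).
OLD_HEADING_WITH_ID_PAT = /^(!{2,3})\[([A-Za-z][0-9A-Za-z_\-.:]*)\]/
HEADING_WITH_ID_PAT     = /^(!{2,3})\[([A-Za-z][0-9A-Za-z_\-.:]*)\]\s*/

# Hypothetical helper: returns [depth, id, remaining text], or nil when the
# line is not a heading with an id.
def heading_parts(line, pat = HEADING_WITH_ID_PAT)
  m = pat.match(line)
  return nil unless m
  # As in create_table_of_contents, the depth is the number of "!" marks
  # and the id is upcased.
  [m[1].length, m[2].upcase, m.post_match.chomp]
end
```

With the new pattern, `heading_parts("!![intro] Introduction\n")` yields a heading text without the leading space, while passing `OLD_HEADING_WITH_ID_PAT` as the second argument keeps it.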
@@ -55,7 +61,15 @@ class InputManager
55
61
  OPTIONS.formatter.format(BlockParser.parse(toc_lines))
56
62
  end
57
63
 
58
- def create_main(toc, body)
64
+ def split_main_heading(input_lines)
65
+ return "" unless OPTIONS[:split_main_heading]
66
+ h1_pos = input_lines.find_index {|line| /^![^!]/o =~ line }
67
+ return "" unless h1_pos
68
+ tree = BlockParser.parse([input_lines.delete_at(h1_pos)])
69
+ OPTIONS.formatter.format(tree)
70
+ end
71
+
72
+ def create_main(toc, body, h1)
59
73
  return nil unless OPTIONS[:toc]
60
74
  toc_container = formatter.create_element("section").tap do |element|
61
75
  element["id"] = "toc"
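The core of the new `split_main_heading` method is locating the first h1 line (a single `!` not followed by another `!`) and removing it from the input. A minimal standalone sketch, assuming a hypothetical helper name `extract_main_heading` and leaving out the `OPTIONS` check and the formatter call that the real method performs:

```ruby
# Same test as in the diff: one "!" at the start of a line, not followed by "!".
H1_PAT = /^![^!]/

# Hypothetical helper: removes the first h1 line from input_lines (the array
# is mutated, as with delete_at in the diff) and returns it, or nil if the
# input has no h1 line.
def extract_main_heading(input_lines)
  h1_pos = input_lines.find_index {|line| H1_PAT =~ line }
  return nil unless h1_pos
  input_lines.delete_at(h1_pos)
end
```

Because the line is deleted in place, anything that processes `input_lines` afterwards no longer sees the main heading. That appears to be why `compose_html` calls `split_main_heading` before building the table of contents and the body.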
@@ -68,6 +82,7 @@ class InputManager
68
82
  end
69
83
  main = formatter.create_element("section").tap do |element|
70
84
  element["id"] = "main"
85
+ element.push h1 unless h1.empty?
71
86
  element.push toc_container
72
87
  element.push contents_container
73
88
  end
@@ -88,11 +103,12 @@ class InputManager
88
103
  end
89
104
 
90
105
  def compose_html(input_lines)
106
+ h1 = split_main_heading(input_lines)
91
107
  css = OPTIONS[:css]
92
108
  toc = create_table_of_contents(input_lines)
93
109
  body = compose_body(input_lines)
94
110
  title = OPTIONS.title
95
- main = create_main(toc,body)
111
+ main = create_main(toc,body, h1)
96
112
 
97
113
  if OPTIONS[:template]
98
114
  erb = ERB.new(OPTIONS.read_template_file)
@@ -107,10 +123,6 @@ class InputManager
107
123
  end
108
124
  end
109
125
 
110
- def to_plain(line)
111
- PlainFormat.format(BlockParser.parse(line.lines.to_a)).to_s.chomp
112
- end
113
-
114
126
  def win32?
115
127
  true if RUBY_PLATFORM =~ /win/i
116
128
  end
@@ -228,7 +240,7 @@ end
228
240
  OptionParser.new("** Convert texts written in a Hiki-like notation into HTML **
229
241
  USAGE: #{File.basename(__FILE__)} [options]") do |opt|
230
242
  opt.on("-h [html_version]", "--html_version [=html_version]",
231
- "HTML version to be used. Choose html4 or xhtml1 (default: #{OPTIONS[:html_version]})") do |version|
243
+ "HTML version to be used. Choose html4, xhtml1 or html5 (default: #{OPTIONS[:html_version]})") do |version|
232
244
  OPTIONS.set_html_version(version)
233
245
  end
234
246
 
@@ -254,7 +266,7 @@ USAGE: #{File.basename(__FILE__)} [options]") do |opt|
254
266
  end
255
267
 
256
268
  opt.on("-C [path_to_css_file]", "--embed-css [=path_to_css_file]",
257
- "Set the path to a css file to be used (default: not to embed)") do |path_to_css_file|
269
+ "Set the path to a css file to embed (default: not to embed)") do |path_to_css_file|
258
270
  OPTIONS[:embed_css] = path_to_css_file
259
271
  end
260
272
 
@@ -284,6 +296,11 @@ USAGE: #{File.basename(__FILE__)} [options]") do |opt|
284
296
  OPTIONS[:toc] = toc_title
285
297
  end
286
298
 
299
+ opt.on("-s", "--split-main-heading",
300
+ "Split the first h1 element") do |should_be_split|
301
+ OPTIONS[:split_main_heading] = should_be_split
302
+ end
303
+
287
304
  opt.parse!
288
305
  end
289
306
 
@@ -304,7 +321,7 @@ when 1
304
321
  OPTIONS.read_input_filename(ARGV[0])
305
322
  end
306
323
 
307
- input_lines = ARGF.lines.to_a
324
+ input_lines = ARGF.readlines
308
325
 
309
326
  OPTIONS.set_options_from_input_file(input_lines)
310
327
  OPTIONS.default_title = OPTIONS.input_file_basename
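The `-s`/`--split-main-heading` switch added above takes no argument, so `OptionParser` passes `true` to its block when the switch is given. The following sketch isolates that behaviour; `parse_options` and the local `options` hash are hypothetical stand-ins for the script's `OPTIONS` object.

```ruby
require 'optparse'

# Hypothetical wrapper: parses argv and returns a plain options hash.
def parse_options(argv)
  options = { :split_main_heading => false }
  parser = OptionParser.new do |opt|
    # A switch declared without an argument placeholder yields true.
    opt.on("-s", "--split-main-heading",
           "Split the first h1 element") do |should_be_split|
      options[:split_main_heading] = should_be_split
    end
  end
  parser.parse(argv) # non-destructive; argv itself is left unchanged
  options
end
```

Unlike the script, which calls `opt.parse!` on the real command line, this sketch uses `parse` so that the argument array passed in is not modified.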
data/lib/htmlelement.rb CHANGED
@@ -4,9 +4,7 @@ require 'kconv'
4
4
 
5
5
  class HtmlElement
6
6
  class Children < Array
7
- def to_s
8
- self.join
9
- end
7
+ alias to_s join
10
8
  end
11
9
 
12
10
  module CHARSET
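The `alias to_s join` line replaces the explicit `to_s` definition without changing behaviour. A small standalone check, using a copy of the `Children` class outside `HtmlElement`:

```ruby
# Standalone copy of the class from the diff, for illustration only.
class Children < Array
  # to_s now refers to Array#join, which concatenates the elements
  # (calling join with no argument uses no separator by default).
  alias to_s join
end

# String interpolation calls to_s, so it picks up the joined form.
def wrap_in_div(children)
  "<div>#{children}</div>"
end
```

For example, `wrap_in_div(Children.new(["<p>", "text", "</p>"]))` interpolates the joined string rather than the inspected array.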
@@ -311,14 +311,12 @@ module PseudoHiki
311
311
  @stack.current_node.breakable?(breaker)
312
312
  end
313
313
 
314
+ def in_link_tag?(preceding_str)
315
+ preceding_str[-2,2] == "[[" or preceding_str[-1,1] == "|"
316
+ end
317
+
314
318
  def tagfy_link(line)
315
- line.gsub(URI_RE) do |url|
316
- unless ($`)[-2,2] == "[[" or ($`)[-1,1] == "|"
317
- "[[#{url}]]"
318
- else
319
- url
320
- end
321
- end
319
+ line.gsub(URI_RE) {|url| in_link_tag?($`) ? url : "[[#{url}]]" }
322
320
  end
323
321
 
324
322
  def select_leaf_type(line)
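The refactored `tagfy_link` wraps bare URLs in `[[...]]` unless they already sit inside a link tag. The sketch below reproduces that logic as top-level methods; `SIMPLE_URI_RE` is a deliberately simplified stand-in for the library's `URI_RE`, whose definition is not shown in this diff.

```ruby
# Assumption: a simplified URL pattern, NOT the library's URI_RE.
SIMPLE_URI_RE = %r{https?://[^\s\]|]+}

# True when the text before the URL ends with "[[" or "|", i.e. the URL is
# already part of a [[url]] or [[label|url]] link tag.
def in_link_tag?(preceding_str)
  preceding_str[-2,2] == "[[" or preceding_str[-1,1] == "|"
end

# Inside the gsub block, $` holds the part of the line before the match.
def tagfy_link(line)
  line.gsub(SIMPLE_URI_RE) {|url| in_link_tag?($`) ? url : "[[#{url}]]" }
end
```

A bare URL such as the one in `"see http://example.com here"` is wrapped, while `"[[http://example.com]]"` and `"[[name|http://example.com]]"` are returned unchanged.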
@@ -142,21 +142,15 @@ module PseudoHiki
142
142
  end
143
143
 
144
144
  def push(token)
145
- if self.empty?
146
- super(parse_first_token(token))
147
- else
148
- super(token)
149
- end
145
+ return super(token) unless self.empty?
146
+ super(parse_first_token(token))
150
147
  end
151
148
  end
152
149
 
153
150
  def treated_as_node_end(token)
154
- if token == TableSep
155
- self.pop
156
- return (self.push TableCellNode.new)
157
- end
158
-
159
- super(token)
151
+ return super(token) unless token == TableSep
152
+ self.pop
153
+ self.push TableCellNode.new
160
154
  end
161
155
 
162
156
  def parse
@@ -1,7 +1,6 @@
1
1
  #!/usr/bin/env ruby
2
2
 
3
3
  class TreeStack
4
-
5
4
  class NotLeafError < Exception; end
6
5
 
7
6
  module Mergeable; end
@@ -59,6 +58,7 @@ class TreeStack
59
58
  nil
60
59
  end
61
60
  end
61
+
62
62
  attr_reader :node_end, :last_leaf
63
63
 
64
64
  def initialize(root_node=Node.new)
@@ -1,3 +1,3 @@
1
1
  module PseudoHiki
2
- VERSION = "0.0.0.4.develop"
2
+ VERSION = "0.0.0.5.develop"
3
3
  end
@@ -64,6 +64,16 @@ TEXT
64
64
  @verbose_formatter.format(tree).to_s)
65
65
  end
66
66
 
67
+ def test_link_url2
68
+ text = <<TEXT
69
+ !![develepment_status] Development status of features from the original [[Hiki notation|http://hikiwiki.org/en/TextFormattingRules.html]]
70
+ TEXT
71
+ tree = BlockParser.parse(text.lines.to_a)
72
+ assert_equal(" Development status of features from the original Hiki notation\n", @formatter.format(tree).to_s)
73
+ assert_equal(" Development status of features from the original Hiki notation (http://hikiwiki.org/en/TextFormattingRules.html)\n",
74
+ @verbose_formatter.format(tree).to_s)
75
+ end
76
+
67
77
  def test_link_image
68
78
  text = <<TEXT
69
79
  A test string with an [[image|image.jpg]] is here.
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: pseudohikiparser
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.0.4.develop
4
+ version: 0.0.0.5.develop
5
5
  prerelease: 8
6
6
  platform: ruby
7
7
  authors:
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2013-09-10 00:00:00.000000000 Z
12
+ date: 2013-10-19 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: bundler
@@ -52,6 +52,8 @@ executables:
52
52
  extensions: []
53
53
  extra_rdoc_files: []
54
54
  files:
55
+ - README.md
56
+ - LICENSE
55
57
  - lib/pseudohikiparser.rb
56
58
  - lib/pseudohiki/treestack.rb
57
59
  - lib/pseudohiki/inlineparser.rb
@@ -71,9 +73,9 @@ files:
71
73
  - test/test_htmlformat.rb
72
74
  - test/test_htmlplugin.rb
73
75
  - bin/pseudohiki2html.rb
74
- homepage: https://github.com/hashimoto-naoki/PseudoHikiParser/wiki
76
+ homepage: https://github.com/nico-hn/PseudoHikiParser/wiki
75
77
  licenses:
76
- - Not decided yet
78
+ - BSD 2-Clause license
77
79
  post_install_message:
78
80
  rdoc_options: []
79
81
  require_paths: