html2textile 1.0.0.beta1
- data/README.mdown +19 -0
- data/example.rb +35 -0
- data/lib/html2textile.rb +255 -0
- data/lib/sgml_parser.rb +333 -0
- metadata +75 -0
data/README.mdown
ADDED
@@ -0,0 +1,19 @@
# HTML2Textile #

A quick and simple way to convert HTML to Textile.

    parser = HTMLToTextileParser.new
    parser.feed(your_html)
    puts parser.to_textile

## Introduction From 2007 ##

One of the many tricky decisions to be made when building content management tools is how to let users control the basic formatting of their input without breaking your carefully crafted layouts or injecting nasty hacks into your pages. One long-standing approach is to provide your own markup language: instead of letting users write HTML, let them use BBCode, Markdown or Textile, whose more controlled vocabularies and rules make problems much less likely.

Textile in particular has a nice, simple syntax and is increasingly popular thanks to its adoption in products like those of 37signals. In Ruby, the RedCloth library makes it fast and easy to convert Textile to HTML. The one problem arises when you already have a body of user-generated HTML in a legacy system that needs converting. That’s the situation I found myself in this week, and I needed a tool to translate the content quickly so that I could get on with the more interesting parts of the system.

Searching for options, I found the ClothRed library, which offers some translation but doesn’t handle important elements like links. I considered patching it to handle the elements I need, but in the end I decided to take a different approach and used the SGML parsing library found here to port a Python html2textile parser.

Porting code from Python to Ruby is a pretty straightforward process, as the languages are so similar on a number of levels. However, there were several issues to work through, particularly relating to scoping, and quite a few methods to change to make them feel a little more Ruby-ish. I’ve not converted all of the entity handling as I didn’t really need it, so some work may remain to make sure character set issues are properly taken care of.

The end result is a piece of code that’s now served its purpose and that I’m unlikely to need again for quite a while. It’s not something I’m particularly proud of, and it could almost certainly be implemented more neatly, but I thought I’d throw it out there in case it’s useful to someone else. Should you be inspired to take it, twist it and turn it into a well-heeled, more robust and properly distributable solution, feel free, but please let me know so that at the very least I can update this entry.
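The introduction notes that the entity handling was only partly ported. The sketch below is a rough illustration of what fuller handling could involve, assuming UTF-8 output. It uses a hypothetical `decode_refs` helper, which is not part of the gem. The helper resolves the named entities from `SgmlParser::Entitydefs` and decimal character references up to U+10FFFF, and it leaves unknown named entities untouched.

```ruby
# Named entities, copied from SgmlParser::Entitydefs.
ENTITY_DEFS = { 'lt' => '<', 'gt' => '>', 'amp' => '&', 'quot' => '"', 'apos' => "'" }

# Hypothetical helper: replace &name; and &#NNN; references in UTF-8 text.
def decode_refs(text)
  text.gsub(/&(#([0-9]+)|[a-zA-Z][a-zA-Z0-9]*);/) do
    if $2
      code = $2.to_i # parsed as decimal, so "039" is 39 rather than octal
      code <= 0x10FFFF ? [code].pack('U') : $&
    else
      ENTITY_DEFS.fetch($1, $&) # unknown names are left as written
    end
  end
end

p decode_refs('Tom &amp; Jerry&#039;s &#8220;show&#8221; &nbsp;')
```

A pre-pass like this would run on the HTML before it is fed to the parser. Note that it decodes `&lt;` into a literal `<`, which the parser would then read as markup. For real input, you would decode only the character references, or decode inside `handle_data` instead.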
data/example.rb
ADDED
@@ -0,0 +1,35 @@
```ruby
require 'html2textile'

first_block = <<END
<div class="column span-3">
<h3 class="storytitle entry-title" id="post-312">
<a href="http://jystewart.net/process/2007/11/converting-html-to-textile-with-ruby/" rel="bookmark">Converting HTML to Textile with Ruby</a>
</h3>

<p>
<span>23 November 2007</span>
(<abbr class="updated" title="2007-11-23T19:51:54+00:00">7:51 pm</abbr>)
</p>

<p>
By <span class="author vcard fn">James Stewart</span>
<br />filed under:
<a href="http://jystewart.net/process/category/snippets/" title="View all posts in Snippets" rel="category tag">Snippets</a>
<br />tagged: <a href="http://jystewart.net/process/tag/content-management/" rel="tag">content management</a>,
<a href="http://jystewart.net/process/tag/conversion/" rel="tag">conversion</a>,
<a href="http://jystewart.net/process/tag/html/" rel="tag">html</a>,
<a href="http://jystewart.net/process/tag/python/" rel="tag">Python</a>,
<a href="http://jystewart.net/process/tag/ruby/" rel="tag">ruby</a>,
<a href="http://jystewart.net/process/tag/textile/" rel="tag">textile</a>
</p>


<div class="feedback">
<script src="http://feeds.feedburner.com/~s/jystewart/iLiN?i=http://jystewart.net/process/2007/11/converting-html-to-textile-with-ruby/" type="text/javascript" charset="utf-8"></script>
</div>
</div>
END

parser = HTMLToTextileParser.new
parser.feed(first_block)
puts parser.to_textile
```
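The standalone sketch below previews how the example's `class` and `id` attributes come out. It copies the modifier logic of `build_styles_ids_and_classes`, along with the block and quicktag prefixes that `HTMLToTextileParser` writes. It does not load the gem, and `textile_modifiers`, `block_prefix` and `quicktag` are hypothetical names. The element text is assumed to be whitespace-normalised and stripped already, as `handle_data` does.

```ruby
# "(class#id){style}", built the same way as build_styles_ids_and_classes.
def textile_modifiers(attributes)
  idclass = ''
  idclass += attributes['class'] if attributes.has_key?('class')
  idclass += "\##{attributes['id']}" if attributes.has_key?('id')
  idclass = "(#{idclass})" unless idclass.empty?
  style = attributes.has_key?('style') ? "{#{attributes['style']}}" : ''
  "#{idclass}#{style}"
end

# Prefix written when a block element such as <h3> or <p> opens.
def block_prefix(tag, attributes)
  "#{tag}#{textile_modifiers(attributes)}. "
end

# Inline element wrapped in its quicktag and padded with a space on each side.
def quicktag(wrapchar, attributes, text)
  " #{wrapchar}#{textile_modifiers(attributes)}#{text}#{wrapchar} "
end

p block_prefix('h3', { 'class' => 'storytitle entry-title', 'id' => 'post-312' })
# => "h3(storytitle entry-title#post-312). "
p quicktag('%', { 'class' => 'author vcard fn' }, 'James Stewart')
# => " %(author vcard fn)James Stewart% "
```

The padding spaces explain why converted inline elements never butt up against neighbouring words. The strip in `handle_data` removes the original whitespace, and the quicktag puts a space back.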
data/lib/html2textile.rb
ADDED
@@ -0,0 +1,255 @@
```ruby
require 'sgml_parser'

# A class to convert HTML to textile. Based on the python parser
# found at http://aftnn.org/content/code/html2textile/
#
# Read more at http://jystewart.net/process/2007/11/converting-html-to-textile-with-ruby
#
# Author::    James Stewart (mailto:james@ketlai.co.uk)
# Copyright:: Copyright (c) 2010 James Stewart
# License::   Distributes under the same terms as Ruby

# This class is an implementation of an SgmlParser designed to convert
# HTML to textile.
#
# Example usage:
#   parser = HTMLToTextileParser.new
#   parser.feed(input_html)
#   puts parser.to_textile
class HTMLToTextileParser < SgmlParser

  attr_accessor :result
  attr_accessor :in_block
  attr_accessor :data_stack
  attr_accessor :a_href
  attr_accessor :in_ul
  attr_accessor :in_ol

  # Tags and attributes to pass through as raw HTML. Both lists are empty,
  # so tags without a start_/end_ handler are dropped (their text is kept).
  @@permitted_tags = []
  @@permitted_attrs = []

  def initialize(verbose=nil)
    @output = String.new
    self.in_block = false
    self.result = []
    self.data_stack = []
    super(verbose)
  end

  # Normalise space in the same manner as HTML. Any substring of multiple
  # whitespace characters will be replaced with a single space char.
  def normalise_space(s)
    s.to_s.gsub(/\s+/x, ' ')
  end

  # Build a Textile attribute modifier such as "(class#id){style}".
  def build_styles_ids_and_classes(attributes)
    idclass = ''
    idclass += attributes['class'] if attributes.has_key?('class')
    idclass += "\##{attributes['id']}" if attributes.has_key?('id')
    idclass = "(#{idclass})" if idclass != ''

    style = attributes.has_key?('style') ? "{#{attributes['style']}}" : ""
    "#{idclass}#{style}"
  end

  # Block elements become "tag(modifiers). content" followed by a blank line.
  def make_block_start_pair(tag, attributes)
    attributes = attrs_to_hash(attributes)
    class_style = build_styles_ids_and_classes(attributes)
    write("#{tag}#{class_style}. ")
    start_capture(tag)
  end

  def make_block_end_pair
    stop_capture_and_write
    write("\n\n")
  end

  # Inline elements are wrapped in their Textile quicktag, e.g. " *bold* ".
  def make_quicktag_start_pair(tag, wrapchar, attributes)
    attributes = attrs_to_hash(attributes)
    class_style = build_styles_ids_and_classes(attributes)
    write([" ", "#{wrapchar}#{class_style}"])
    start_capture(tag)
  end

  def make_quicktag_end_pair(wrapchar)
    stop_capture_and_write
    write([wrapchar, " "])
  end

  # Append output. While fewer than two captures are open, text goes
  # straight to the result; otherwise it is buffered in the innermost capture.
  def write(d)
    if self.data_stack.size < 2
      self.result += Array(d)
    else
      self.data_stack[-1] += Array(d)
    end
  end

  def start_capture(tag)
    self.in_block = tag
    self.data_stack.push([])
  end

  def stop_capture_and_write
    self.in_block = false
    self.write(self.data_stack.pop)
  end

  def handle_data(data)
    write(normalise_space(data).strip) unless data.nil? or data == ''
  end

  %w[1 2 3 4 5 6].each do |num|
    define_method "start_h#{num}" do |attributes|
      make_block_start_pair("h#{num}", attributes)
    end

    define_method "end_h#{num}" do
      make_block_end_pair
    end
  end

  PAIRS = { 'blockquote' => 'bq', 'p' => 'p' }
  QUICKTAGS = { 'b' => '*', 'strong' => '*',
    'i' => '_', 'em' => '_', 'cite' => '??', 's' => '-',
    'sup' => '^', 'sub' => '~', 'code' => '@', 'span' => '%'}

  PAIRS.each do |key, value|
    define_method "start_#{key}" do |attributes|
      make_block_start_pair(value, attributes)
    end

    define_method "end_#{key}" do
      make_block_end_pair
    end
  end

  QUICKTAGS.each do |key, value|
    define_method "start_#{key}" do |attributes|
      make_quicktag_start_pair(key, value, attributes)
    end

    define_method "end_#{key}" do
      make_quicktag_end_pair(value)
    end
  end

  def start_ol(attrs)
    self.in_ol = true
  end

  def end_ol
    self.in_ol = false
    write("\n")
  end

  def start_ul(attrs)
    self.in_ul = true
  end

  def end_ul
    self.in_ul = false
    write("\n")
  end

  # List items: "# " inside <ol>, "* " otherwise.
  def start_li(attrs)
    if self.in_ol
      write("# ")
    else
      write("* ")
    end

    start_capture("li")
  end

  def end_li
    stop_capture_and_write
    write("\n")
  end

  # Links become "text":href; anchors without an href are ignored.
  def start_a(attrs)
    attrs = attrs_to_hash(attrs)
    self.a_href = attrs['href']

    if self.a_href
      write(" \"")
      start_capture("a")
    end
  end

  def end_a
    if self.a_href
      stop_capture_and_write
      write(["\":", self.a_href, " "])
      self.a_href = false
    end
  end

  def attrs_to_hash(array)
    array.inject({}) { |collection, part| collection[part[0].downcase] = part[1]; collection }
  end

  def start_img(attrs)
    attrs = attrs_to_hash(attrs)
    write([" !", attrs["src"], "! "])
  end

  def end_img
  end

  # Table rows come out as "|cell|cell|": each cell opens with "|" and
  # end_tr writes the closing "|".
  def start_tr(attrs)
  end

  def end_tr
    write("|\n")
  end

  def start_td(attrs)
    write("|")
    start_capture("td")
  end

  def end_td
    stop_capture_and_write
  end

  def start_br(attrs)
    write("\n")
  end

  # Pass permitted tags through as HTML; everything else is dropped.
  def unknown_starttag(tag, attrs)
    if @@permitted_tags.include?(tag)
      write(["<", tag])
      attrs.each do |key, value|
        if @@permitted_attrs.include?(key)
          write([" ", key, "=\"", value, "\""])
        end
      end
      write(">")
    end
  end

  def unknown_endtag(tag)
    if @@permitted_tags.include?(tag)
      write(["</", tag, ">"])
    end
  end

  # Return the textile after processing
  def to_textile
    result.join
  end

  # UNCONVERTED PYTHON METHODS
  #
  # def handle_charref(self, tag):
  #     self._write(unichr(int(tag)))
  #
  # def handle_entityref(self, tag):
  #     if self.entitydefs.has_key(tag):
  #         self._write(self.entitydefs[tag])
  #
  # def handle_starttag(self, tag, method, attrs):
  #     method(dict(attrs))
  #

end
```
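`HTMLToTextileParser` never builds a document tree. As it meets each tag, `SgmlParser` looks up a `start_<tag>` or `end_<tag>` method with `respond_to?` and calls it with `send`. The handlers then append Textile fragments to an output array. The sketch below isolates that dispatch pattern in a hypothetical `MiniTextile` class. It takes pre-tokenised events instead of raw HTML, so it runs without the gem, and it covers only `<p>`, `<a>`, `<ol>`/`<li>` and three quicktags.

```ruby
# Hypothetical MiniTextile: a cut-down illustration of handler dispatch.
class MiniTextile
  # Inline tags and their Textile quicktag characters (subset of QUICKTAGS).
  QUICKTAGS = { 'strong' => '*', 'em' => '_', 'code' => '@' }

  def initialize
    @out = []
    @in_ol = false
    @href = nil
  end

  # Events: [:start, tag, attrs_hash], [:end, tag] or [:data, text].
  # Tag events go to start_<tag>/end_<tag> when such a method exists;
  # tags without a handler are skipped, but their text is still kept.
  def convert(events)
    events.each do |type, value, attrs|
      case type
      when :data
        text = value.gsub(/\s+/, ' ').strip
        @out << text unless text.empty?
      when :start
        handler = "start_#{value}"
        send(handler, attrs || {}) if respond_to?(handler)
      when :end
        handler = "end_#{value}"
        send(handler) if respond_to?(handler)
      end
    end
    @out.join
  end

  # Generate one start_/end_ pair per quicktag, as the gem does.
  QUICKTAGS.each do |tag, mark|
    define_method("start_#{tag}") { |_attrs| @out << " #{mark}" }
    define_method("end_#{tag}") { @out << "#{mark} " }
  end

  def start_p(_attrs)
    @out << 'p. '
  end

  def end_p
    @out << "\n\n"
  end

  def start_ol(_attrs)
    @in_ol = true
  end

  def end_ol
    @in_ol = false
    @out << "\n"
  end

  # "# " for ordered lists, "* " otherwise.
  def start_li(_attrs)
    @out << (@in_ol ? '# ' : '* ')
  end

  def end_li
    @out << "\n"
  end

  # Links become "text":href; anchors without an href are ignored.
  def start_a(attrs)
    @href = attrs['href']
    @out << ' "' if @href
  end

  def end_a
    return unless @href
    @out << "\":#{@href} "
    @href = nil
  end
end

events = [[:start, 'p', {}], [:data, 'Read'], [:start, 'strong', {}],
          [:data, 'this'], [:end, 'strong'], [:data, ' and '],
          [:start, 'a', { 'href' => 'http://example.com/' }],
          [:data, 'that'], [:end, 'a'], [:end, 'p']]
p MiniTextile.new.convert(events)
# => "p. Read *this* and \"that\":http://example.com/ \n\n"
```

Adding support for a new tag in this design means defining one more pair of methods. No central switch statement has to change, which is why the gem can generate most of its handlers from the `PAIRS` and `QUICKTAGS` tables.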
data/lib/sgml_parser.rb
ADDED
@@ -0,0 +1,333 @@
```ruby
# A parser for SGML, using the derived class as static DTD.

class SgmlParser

  # Regular expressions used for parsing:
  Interesting = /[&<]/
  Incomplete = Regexp.compile('&([a-zA-Z][a-zA-Z0-9]*|#[0-9]*)?|' +
                              '<([a-zA-Z][^<>]*|/([a-zA-Z][^<>]*)?|' +
                              '![^<>]*)?')

  Entityref = /&([a-zA-Z][-.a-zA-Z0-9]*)[^-.a-zA-Z0-9]/
  Charref = /&#([0-9]+)[^0-9]/

  Starttagopen = /<[>a-zA-Z]/
  Endtagopen = /<\/[<>a-zA-Z]/
  Endbracket = /[<>]/
  Special = /<![^<>]*>/
  Commentopen = /<!--/
  Commentclose = /--[ \t\n]*>/
  Tagfind = /[a-zA-Z][a-zA-Z0-9.-]*/
  Attrfind = Regexp.compile('[\s,]*([a-zA-Z_][a-zA-Z_0-9.-]*)' +
                            '(\s*=\s*' +
                            "('[^']*'" +
                            '|"[^"]*"' +
                            '|[-~a-zA-Z0-9,./:+*%?!()_#=]*))?')

  Entitydefs =
    {'lt'=>'<', 'gt'=>'>', 'amp'=>'&', 'quot'=>'"', 'apos'=>'\''}

  def initialize(verbose=false)
    @verbose = verbose
    reset
  end

  def reset
    @rawdata = ''
    @stack = []
    @lasttag = '???'
    @nomoretags = false
    @literal = false
  end

  def has_context(gi)
    @stack.include? gi
  end

  def setnomoretags
    @nomoretags = true
    @literal = true
  end

  def setliteral(*args)
    @literal = true
  end

  # Append data to the buffer and parse as much of it as possible.
  def feed(data)
    @rawdata << data
    goahead(false)
  end

  # Parse whatever is left in the buffer, flushing incomplete markup as data.
  def close
    goahead(true)
  end

  # Scan the buffer, dispatching text, tags, comments and references.
  # Incomplete markup at the end is kept for the next call unless _end is true.
  def goahead(_end)
    rawdata = @rawdata
    i = 0
    n = rawdata.length
    while i < n
      if @nomoretags
        handle_data(rawdata[i..(n-1)])
        i = n
        break
      end
      j = rawdata.index(Interesting, i)
      j = n unless j
      if i < j
        handle_data(rawdata[i..(j-1)])
      end
      i = j
      break if (i == n)
      if rawdata[i] == ?<
        if rawdata.index(Starttagopen, i) == i
          if @literal
            handle_data(rawdata[i, 1])
            i += 1
            next
          end
          k = parse_starttag(i)
          break unless k
          i = k
          next
        end
        if rawdata.index(Endtagopen, i) == i
          k = parse_endtag(i)
          break unless k
          i = k
          @literal = false
          next
        end
        if rawdata.index(Commentopen, i) == i
          if @literal
            handle_data(rawdata[i,1])
            i += 1
            next
          end
          k = parse_comment(i)
          break unless k
          i += k
          next
        end
        if rawdata.index(Special, i) == i
          if @literal
            handle_data(rawdata[i, 1])
            i += 1
            next
          end
          k = parse_special(i)
          break unless k
          i += k
          next
        end
      elsif rawdata[i] == ?&
        if rawdata.index(Charref, i) == i
          i += $&.length
          handle_charref($1)
          i -= 1 unless rawdata[i-1] == ?;
          next
        end
        if rawdata.index(Entityref, i) == i
          i += $&.length
          handle_entityref($1)
          i -= 1 unless rawdata[i-1] == ?;
          next
        end
      else
        raise RuntimeError, 'neither < nor & ??'
      end
      # We get here only if incomplete matches but
      # nothing else
      match = rawdata.index(Incomplete, i)
      unless match == i
        handle_data(rawdata[i, 1])
        i += 1
        next
      end
      j = match + $&.length
      break if j == n # Really incomplete
      handle_data(rawdata[i..(j-1)])
      i = j
    end
    # end while
    if _end and i < n
      handle_data(@rawdata[i..(n-1)])
      i = n
    end
    @rawdata = rawdata[i..-1]
  end

  def parse_comment(i)
    rawdata = @rawdata
    if rawdata[i, 4] != '<!--'
      raise RuntimeError, 'unexpected call to handle_comment'
    end
    match = rawdata.index(Commentclose, i)
    return nil unless match
    matched_length = $&.length
    j = match
    handle_comment(rawdata[i+4..(j-1)])
    j = match + matched_length
    return j-i
  end

  def parse_starttag(i)
    rawdata = @rawdata
    j = rawdata.index(Endbracket, i + 1)
    return nil unless j
    attrs = []
    if rawdata[i+1] == ?>
      # SGML shorthand: <> == <last open tag seen>
      k = j
      tag = @lasttag
    else
      match = rawdata.index(Tagfind, i + 1)
      unless match
        raise RuntimeError, 'unexpected call to parse_starttag'
      end
      k = i + 1 + ($&.length)
      tag = $&.downcase
      @lasttag = tag
    end
    while k < j
      # Attrfind must match exactly at k, not somewhere later in the input
      break unless rawdata.index(Attrfind, k) == k
      matched_length = $&.length
      attrname, rest, attrvalue = $1, $2, $3
      if not rest
        attrvalue = '' # was: = attrname
      elsif (attrvalue[0] == ?' && attrvalue[-1] == ?') or
            (attrvalue[0] == ?" && attrvalue[-1] == ?")
        attrvalue = attrvalue[1..-2]
      end
      attrs << [attrname.downcase, attrvalue]
      k += matched_length
    end
    if rawdata[j] == ?>
      j += 1
    end
    finish_starttag(tag, attrs)
    return j
  end

  def parse_endtag(i)
    rawdata = @rawdata
    j = rawdata.index(Endbracket, i + 1)
    return nil unless j
    tag = (rawdata[i+2..j-1].strip).downcase
    if rawdata[j] == ?>
      j += 1
    end
    finish_endtag(tag)
    return j
  end

  # Call start_<tag> (pushing the tag on the stack) or do_<tag>;
  # otherwise fall back to unknown_starttag.
  def finish_starttag(tag, attrs)
    method = 'start_' + tag
    if self.respond_to?(method)
      @stack << tag
      handle_starttag(tag, method, attrs)
      return 1
    else
      method = 'do_' + tag
      if self.respond_to?(method)
        handle_starttag(tag, method, attrs)
        return 0
      else
        unknown_starttag(tag, attrs)
        return -1
      end
    end
  end

  # Close every open tag down to the most recent occurrence of +tag+.
  def finish_endtag(tag)
    if tag == ''
      found = @stack.length - 1
      if found < 0
        unknown_endtag(tag)
        return
      end
    else
      unless @stack.include? tag
        method = 'end_' + tag
        if self.respond_to?(method)
          report_unbalanced(tag)
        else
          unknown_endtag(tag)
        end
        return
      end
      found = @stack.rindex(tag)
    end
    while @stack.length > found
      tag = @stack[-1]
      method = 'end_' + tag
      if respond_to?(method)
        handle_endtag(tag, method)
      else
        unknown_endtag(tag)
      end
      @stack.pop
    end
  end

  def parse_special(i)
    rawdata = @rawdata
    match = rawdata.index(Endbracket, i+1)
    return nil unless match
    matched_length = $&.length
    handle_special(rawdata[i+1..(match-1)])
    return match - i + matched_length
  end

  def handle_starttag(tag, method, attrs)
    self.send(method, attrs)
  end

  def handle_endtag(tag, method)
    self.send(method)
  end

  def report_unbalanced(tag)
    if @verbose
      print '*** Unbalanced </' + tag + '>', "\n"
      print '*** Stack:', @stack.inspect, "\n"
    end
  end

  def handle_charref(name)
    # Character references are decimal; to_i avoids Integer()'s octal
    # reading of zero-padded values such as "039".
    n = name.to_i
    if !(0 <= n && n <= 255)
      unknown_charref(name)
      return
    end
    handle_data(n.chr)
  end

  def handle_entityref(name)
    table = Entitydefs
    if table.include?(name)
      handle_data(table[name])
    else
      unknown_entityref(name)
      return
    end
  end

  # Hooks for subclasses; the defaults do nothing.
  def handle_data(data)
  end

  def handle_comment(data)
  end

  def handle_special(data)
  end

  def unknown_starttag(tag, attrs)
  end
  def unknown_endtag(tag)
  end
  def unknown_charref(ref)
  end
  def unknown_entityref(ref)
  end

end
```
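`parse_starttag` reads a tag's attributes one at a time with the `Attrfind` pattern, stripping matching quotes and lower-casing names. The sketch below reproduces that step with Ruby's standard `StringScanner`. Its `scan` only succeeds at the current scan position, so parsing stops at the first character that cannot start an attribute, such as the slash in `<br />`. `parse_attrs` is a hypothetical helper, not part of `SgmlParser`.

```ruby
require 'strscan'

# Same pattern as SgmlParser::Attrfind: optional separators, a name, and an
# optional "= value" where the value is single-quoted, double-quoted or bare.
ATTRFIND = Regexp.compile('[\s,]*([a-zA-Z_][a-zA-Z_0-9.-]*)' +
                          '(\s*=\s*' +
                          "('[^']*'" +
                          '|"[^"]*"' +
                          '|[-~a-zA-Z0-9,./:+*%?!()_#=]*))?')

# Parse the text after a tag name into [name, value] pairs.
def parse_attrs(text)
  scanner = StringScanner.new(text)
  attrs = []
  until scanner.eos?
    break unless scanner.scan(ATTRFIND) # anchored at the scan pointer
    name, rest, value = scanner[1], scanner[2], scanner[3]
    if rest.nil?
      value = '' # bare attribute such as "checked"
    elsif value =~ /\A(['"]).*\1\z/m
      value = value[1..-2] # strip matching quotes
    end
    attrs << [name.downcase, value]
  end
  attrs
end

p parse_attrs(%q{ class="entry" id=post-312 /})
# => [["class", "entry"], ["id", "post-312"]]
```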
metadata
ADDED
@@ -0,0 +1,75 @@
```yaml
--- !ruby/object:Gem::Specification
name: html2textile
version: !ruby/object:Gem::Version
  hash: -1848230051
  prerelease: true
  segments:
  - 1
  - 0
  - 0
  - beta1
  version: 1.0.0.beta1
platform: ruby
authors:
- James Stewart
autorequire:
bindir: bin
cert_chain: []

date: 2011-05-05 00:00:00 +10:00
default_executable:
dependencies: []

description: Provides an SGML parser to convert HTML into the Textile format
email: james@ketlai.co.uk
executables: []

extensions: []

extra_rdoc_files: []

files:
- lib/html2textile.rb
- lib/sgml_parser.rb
- example.rb
- README.mdown
has_rdoc: true
homepage: http://jystewart.net/process/2007/11/converting-html-to-textile-with-ruby
licenses: []

post_install_message:
rdoc_options: []

require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      hash: 57
      segments:
      - 1
      - 8
      - 7
      version: 1.8.7
required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      hash: 23
      segments:
      - 1
      - 3
      - 6
      version: 1.3.6
requirements: []

rubyforge_project:
rubygems_version: 1.3.7
signing_key:
specification_version: 3
summary: Converter from HTML to Textile
test_files: []
```
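This metadata is the YAML form of a `Gem::Specification`. The sketch below shows a gemspec that would yield equivalent fields. It is only a sketch, since no `.gemspec` file is among the released files. The values are copied from the metadata, and fields such as `has_rdoc` and the version `hash`/`segments` entries are left out.

```ruby
require 'rubygems'

# Sketch of a specification matching the metadata fields listed above.
spec = Gem::Specification.new do |s|
  s.name                      = 'html2textile'
  s.version                   = '1.0.0.beta1'
  s.authors                   = ['James Stewart']
  s.email                     = 'james@ketlai.co.uk'
  s.homepage                  = 'http://jystewart.net/process/2007/11/converting-html-to-textile-with-ruby'
  s.summary                   = 'Converter from HTML to Textile'
  s.description               = 'Provides an SGML parser to convert HTML into the Textile format'
  s.files                     = %w[lib/html2textile.rb lib/sgml_parser.rb example.rb README.mdown]
  s.require_paths             = ['lib']
  s.required_ruby_version     = '>= 1.8.7'
  s.required_rubygems_version = '>= 1.3.6'
end

puts spec.full_name # => html2textile-1.0.0.beta1
```

The `beta1` suffix is what makes RubyGems treat the release as a prerelease (`prerelease: true` in the YAML). Users therefore only get it with `gem install html2textile --pre` or an explicit version.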