fotonauts-premailer 1.7.3

data/.gitignore ADDED
@@ -0,0 +1,7 @@
1
+ .DS_Store
2
+ *.gem
3
+ Gemfile.lock
4
+ bin/*.html
5
+ html/
6
+ vendor/
7
+ rdoc/
data/.travis.yml ADDED
@@ -0,0 +1,8 @@
1
+ notifications:
2
+ disabled: true
3
+ rvm:
4
+ - 1.8.7
5
+ - 1.9.2
6
+ - 1.9.3
7
+ - jruby
8
+ - ree
data/Gemfile ADDED
@@ -0,0 +1,9 @@
1
+ source 'https://rubygems.org'
2
+ gem 'css_parser', :git => 'https://github.com/alexdunae/css_parser.git'
3
+ gem 'webmock', :group => [:development, :test]
4
+
5
+ platforms :jruby do
6
+ gem 'jruby-openssl'
7
+ end
8
+
9
+ gemspec
data/LICENSE.rdoc ADDED
@@ -0,0 +1,11 @@
1
+ = Premailer License
2
+
3
+ Copyright (c) 2007-2011, Alex Dunae. All rights reserved.
4
+
5
+ Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
6
+
7
+ * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
8
+ * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
9
+ * Neither the name of Premailer, Alex Dunae nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
10
+
11
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/README.rdoc ADDED
@@ -0,0 +1,89 @@
1
+ = Premailer README
2
+
3
+ === What is this?
4
+
5
+ For the best HTML e-mail delivery results, CSS should be inline. This is a
6
+ huge pain and a simple newsletter becomes unmanageable very quickly. This
7
+ script is my solution.
8
+
9
+ * CSS styles are converted to inline style attributes
10
+ Checks <tt>style</tt> and <tt>link[rel=stylesheet]</tt> tags and preserves existing inline attributes
11
+ * Relative paths are converted to absolute paths
12
+ Checks links in <tt>href</tt>, <tt>src</tt> and CSS <tt>url('')</tt>
13
+ * CSS properties are checked against e-mail client capabilities
14
+ Based on the Email Standards Project's guides
15
+ * A plain text version is created
16
+ Optional
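The list above mentions converting relative paths to absolute ones without showing how. A minimal sketch of that step, assuming only Ruby's standard `uri` library: `absolutize` and `absolutize_css_urls` are hypothetical helpers, not Premailer methods, and Premailer's own resolution code is not part of this listing.

```ruby
require 'uri'

# Hypothetical helper: resolve an href/src value against the page's base URL.
# Fragment-only links and mailto: links are returned unchanged.
def absolutize(base_url, path)
  return path if path.nil? || path.empty? || path.start_with?('#', 'mailto:')
  URI.join(base_url, path).to_s
end

# Hypothetical helper: rewrite every url(...) reference in a CSS string.
def absolutize_css_urls(base_url, css)
  css.gsub(/url\(\s*['"]?([^'")]+)['"]?\s*\)/) do
    "url('#{absolutize(base_url, $1)}')"
  end
end

base = 'http://example.com/newsletters/issue.html'
puts absolutize(base, 'images/logo.png')  # http://example.com/newsletters/images/logo.png
puts absolutize(base, '/css/mail.css')    # http://example.com/css/mail.css
puts absolutize_css_urls(base, "background: url('bg.gif')")
```

Delegating to `URI.join` keeps the relative-reference rules (`../`, leading `/`, already-absolute URLs) in the standard library instead of hand-written string handling.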
17
+
18
+ === Premailer 2.0 is coming
19
+
20
+ I'm looking for input on a version 2.0 update to Premailer. Please visit the {Premailer 2.0 Planning Page}[https://github.com/alexdunae/premailer/wiki/Premailer-2.0-Planning] and give me your feedback.
21
+
22
+ === Installation
23
+
24
+ Download the Premailer gem from RubyGems.
25
+
26
+ sudo gem install premailer
27
+
28
+ === Example
29
+ premailer = Premailer.new('http://example.com/myfile.html', :warn_level => Premailer::Warnings::SAFE)
30
+
31
+ # Write the HTML output
32
+ fout = File.open("output.html", "w")
33
+ fout.puts premailer.to_inline_css
34
+ fout.close
35
+
36
+ # Write the plain-text output
37
+ fout = File.open("output.txt", "w")
38
+ fout.puts premailer.to_plain_text
39
+ fout.close
40
+
41
+ # Output any CSS warnings
42
+ premailer.warnings.each do |w|
43
+ puts "#{w[:message]} (#{w[:level]}) may not render properly in #{w[:clients]}"
44
+ end
45
+
46
+ === Ruby Compatibility
47
+
48
+ Premailer is tested on Ruby 1.8.7, Ruby 1.9.2 and Ruby 1.9.3 (preview 1). It also works on REE. JRuby support is close; contributors are welcome. Check out the latest build status on the {Travis CI dashboard}[http://travis-ci.org/#!/alexdunae/premailer].
49
+
50
+ === Premailer-specific CSS
51
+
52
+ Premailer looks for a few CSS properties that make working with tables a bit easier.
53
+
54
+ +-premailer-width+:: Available on <tt>table</tt>, <tt>th</tt> and <tt>td</tt> elements
55
+ +-premailer-height+:: Available on <tt>table</tt>, <tt>tr</tt>, <tt>th</tt> and <tt>td</tt> elements
56
+ +-premailer-cellpadding+:: Available on <tt>table</tt> elements
57
+ +-premailer-cellspacing+:: Available on <tt>table</tt> elements
58
+
59
+ Each of these CSS declarations is copied to the corresponding HTML attribute of the element.
60
+
61
+ For example,
62
+
63
+ table { -premailer-cellspacing: 5; -premailer-width: 500;}
64
+
65
+ will result in
66
+
67
+ <table cellspacing='5' width='500'>
68
+
69
+
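The attribute copying described in the Premailer-specific CSS section can be pictured as a lookup from property name to attribute name. In Premailer itself this goes through `Premailer::RELATED_ATTRIBUTES` (used in the adapters below), whose contents are not part of this listing; the `PREMAILER_ATTRIBUTES` hash here is a hypothetical stand-in built from the four properties documented above, and the sketch ignores the per-element restrictions.

```ruby
# Hypothetical mapping built from the four documented -premailer-* properties.
PREMAILER_ATTRIBUTES = {
  '-premailer-width'       => 'width',
  '-premailer-height'      => 'height',
  '-premailer-cellpadding' => 'cellpadding',
  '-premailer-cellspacing' => 'cellspacing'
}

# Collect the HTML attributes implied by a CSS declaration block.
def premailer_attributes(declarations)
  attrs = {}
  declarations.split(';').each do |decl|
    prop, value = decl.split(':', 2).map { |s| s.strip }
    next if prop.nil? || prop.empty? || value.nil? || value.empty?
    html_att = PREMAILER_ATTRIBUTES[prop.downcase]
    attrs[html_att] = value if html_att
  end
  attrs
end

# Render an opening tag from the collected attributes.
def open_tag(name, attrs)
  pairs = attrs.map { |att, value| " #{att}='#{value}'" }.join
  "<#{name}#{pairs}>"
end

attrs = premailer_attributes('-premailer-cellspacing: 5; -premailer-width: 500;')
puts open_tag('table', attrs)  # <table cellspacing='5' width='500'>
```

Ordinary properties such as `color` are simply skipped, so the same declaration block can carry both regular CSS and the Premailer-specific hints.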
70
+ === Contributions
71
+
72
+ Contributions are most welcome. Premailer was rotting away in a private SVN repository for too long and could use some TLC. Fork and patch to your heart's content. Please don't increment the version numbers, though.
73
+
74
+ A few areas that are particularly in need of love:
75
+ * Improved test coverage
76
+ * Move non-repeating background images defined in CSS to <tt><td background=""></tt> for Outlook
77
+
78
+ === Credits and code
79
+
80
+ Thanks to {all the wonderful contributors}[https://github.com/alexdunae/premailer/contributors] for their updates.
81
+
82
+ Thanks to {Greenhood + Company}[http://www.greenhood.com/] for sponsoring some of the 1.5.6 updates,
83
+ and to {Campaign Monitor}[http://www.campaignmonitor.com] for supporting the web interface.
84
+
85
+ The web interface can be found at {premailer.dialect.ca}[http://premailer.dialect.ca].
86
+
87
+ The source code can be found on {GitHub}[https://github.com/alexdunae/premailer].
88
+
89
+ Copyright by Alex Dunae (dunae.ca, e-mail 'code' at the same domain), 2007-2011. See LICENSE.rdoc for license details.
data/bin/premailer ADDED
@@ -0,0 +1,7 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ # This binary is used only in a RubyGems environment, as part of the installed gem
4
+
5
+ require 'rubygems'
6
+ require 'premailer/executor'
7
+
data/init.rb ADDED
@@ -0,0 +1 @@
1
+ require 'premailer'
data/lib/premailer.rb ADDED
@@ -0,0 +1,10 @@
1
+ require 'yaml'
2
+ require 'open-uri'
3
+ require 'digest/md5'
4
+ require 'cgi'
5
+ require 'css_parser'
6
+
7
+ require 'premailer/adapter'
8
+ require 'premailer/html_to_plain_text'
9
+ require 'premailer/premailer'
10
+
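Of the libraries required above, `digest/md5` is what the adapters' `:remove_ids` option relies on: in-page link targets are replaced by their MD5 hex digest, and any ID that no link points to is dropped. The sketch below mirrors that pass using plain Ruby hashes in place of Hpricot or Nokogiri nodes; `remove_ids!` and the hash-based element model are hypothetical.

```ruby
require 'digest/md5'

# Elements are modelled as plain hashes (a hypothetical stand-in for DOM
# nodes) so the logic runs without Hpricot or Nokogiri.
def remove_ids!(links, elements)
  # Hash the target of every in-page link ("#name").
  targets = []
  links.each do |link|
    next unless link[:href].to_s.start_with?('#')
    target = link[:href][1..-1]
    targets << target
    link[:href] = '#' + Digest::MD5.hexdigest(target)
  end

  # Hash IDs that are link targets; drop every other ID.
  elements.each do |el|
    next unless el.key?(:id)
    if targets.include?(el[:id])
      el[:id] = Digest::MD5.hexdigest(el[:id])
    else
      el.delete(:id)
    end
  end
end

links    = [{ :href => '#top' }, { :href => 'http://example.com/' }]
elements = [{ :id => 'top' }, { :id => 'sidebar' }]
remove_ids!(links, elements)
```

Because the link and its target are hashed with the same function, the in-page anchor still resolves after the rewrite, while unreferenced IDs no longer leak into the e-mail.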
data/lib/premailer/adapter.rb ADDED
@@ -0,0 +1,59 @@
1
+ # = Premailer::Adapter
2
+ #
3
+ # Manages the adapter classes. Currently supports:
4
+ #
5
+ # * nokogiri
6
+ # * hpricot
7
+ class Premailer
8
+ module Adapter
9
+
10
+ autoload :Hpricot, 'premailer/adapter/hpricot'
11
+ autoload :Nokogiri, 'premailer/adapter/nokogiri'
12
+
13
+ REQUIREMENT_MAP = [
14
+ ["hpricot", :hpricot],
15
+ ["nokogiri", :nokogiri],
16
+ ]
17
+
18
+ # Returns the adapter to use.
19
+ def self.use
20
+ return @use if @use
21
+ self.use = self.default
22
+ @use
23
+ end
24
+
25
+ # The default adapter based on what you currently have loaded and
26
+ # installed. First checks to see if any adapters are already loaded,
27
+ # then checks to see which are installed if none are loaded.
28
+ def self.default
29
+ return :hpricot if defined?(::Hpricot)
30
+ return :nokogiri if defined?(::Nokogiri)
31
+
32
+ REQUIREMENT_MAP.each do |(library, adapter)|
33
+ begin
34
+ require library
35
+ return adapter
36
+ rescue LoadError
37
+ next
38
+ end
39
+ end
40
+
41
+ raise "No suitable adapter for Premailer was found, please install hpricot or nokogiri"
42
+ end
43
+
44
+ # Sets the +adapter+ to use. Raises an +ArgumentError+ unless the +adapter+ exists.
45
+ def self.use=(new_adapter)
46
+ @use = find(new_adapter)
47
+ end
48
+
49
+ # Returns an +adapter+. Raises an +ArgumentError+ unless the +adapter+ exists.
50
+ def self.find(adapter)
51
+ return adapter if adapter.is_a?(Module)
52
+
53
+ Premailer::Adapter.const_get("#{adapter.to_s.split('_').map{|s| s.capitalize}.join('')}")
54
+ rescue NameError
55
+ raise ArgumentError, "Invalid adapter: #{adapter}"
56
+ end
57
+
58
+ end
59
+ end
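`Adapter.find` turns an adapter name into a constant name by splitting on underscores and capitalizing each part, and `Adapter.default` falls back to the first library in `REQUIREMENT_MAP` that can be `require`d. The sketch below isolates those two pieces of logic; `AdapterSketch` is a hypothetical module, and the library names in the usage lines are placeholders chosen so the example runs with the standard library alone (`json` stands in for an installed adapter).

```ruby
module AdapterSketch
  # Same name mangling as Adapter.find: :my_adapter -> "MyAdapter"
  def self.constant_name(adapter)
    adapter.to_s.split('_').map { |s| s.capitalize }.join('')
  end

  # Same fallback as Adapter.default: try each library in order and return
  # the adapter symbol of the first one that loads.
  def self.first_loadable(requirement_map)
    requirement_map.each do |(library, adapter)|
      begin
        require library
        return adapter
      rescue LoadError
        next
      end
    end
    raise "No suitable adapter was found"
  end
end

puts AdapterSketch.constant_name(:nokogiri)    # Nokogiri
puts AdapterSketch.constant_name(:my_adapter)  # MyAdapter
puts AdapterSketch.first_loadable([
  ['surely_not_installed_lib_xyz', :missing],
  ['json', :json]
])                                             # json
```

Rescuing `LoadError` per entry makes the order of the map the order of preference, which is why Hpricot is chosen over Nokogiri when both are installed and neither is loaded yet.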
data/lib/premailer/adapter/hpricot.rb ADDED
@@ -0,0 +1,193 @@
1
+ require 'hpricot'
2
+
3
+ class Premailer
4
+ module Adapter
5
+ module Hpricot
6
+
7
+ # Merge CSS into the HTML document.
8
+ #
9
+ # Returns a string.
10
+ def to_inline_css
11
+ doc = @processed_doc
12
+ @unmergable_rules = CssParser::Parser.new
13
+
14
+ # Give all styles already in style attributes a specificity of 1000
15
+ # per http://www.w3.org/TR/CSS21/cascade.html#specificity
16
+ doc.search("*[@style]").each do |el|
17
+ el['style'] = '[SPEC=1000[' + el.attributes['style'] + ']]'
18
+ end
19
+
20
+ # Iterate through the rules and merge them into the HTML
21
+ @css_parser.each_selector(:all) do |selector, declaration, specificity|
22
+ # Drop :link pseudo-classes so that a:link rules are applied to a elements
23
+ selector.gsub!(/:link([\s]*)+/i) {|m| $1 }
24
+
25
+ # Convert element names to lower case
26
+ selector.gsub!(/([\s]|^)([\w]+)/) {|m| $1.to_s + $2.to_s.downcase }
27
+
28
+ if selector =~ Premailer::RE_UNMERGABLE_SELECTORS
29
+ @unmergable_rules.add_rule_set!(CssParser::RuleSet.new(selector, declaration)) unless @options[:preserve_styles]
30
+ else
31
+ begin
32
+ if selector =~ Premailer::RE_RESET_SELECTORS
33
+ # this is in place to preserve the MailChimp CSS reset: http://github.com/mailchimp/Email-Blueprints/
34
+ # however, this doesn't mean for testing pur
35
+ @unmergable_rules.add_rule_set!(CssParser::RuleSet.new(selector, declaration)) if @options[:preserve_reset]
36
+ end
37
+
38
+ # Change single ID CSS selectors into xpath so that we can match more
39
+ # than one element. Added to work around dodgy generated code.
40
+ selector.gsub!(/\A\#([\w_\-]+)\Z/, '*[@id=\1]')
41
+
42
+ # convert attribute selectors to hpricot's format
43
+ selector.gsub!(/\[([\w]+)\]/, '[@\1]')
44
+ selector.gsub!(/\[([\w]+)([\=\~\^\$\*]+)([\w\s]+)\]/, '[@\1\2\'\3\']')
45
+
46
+ doc.search(selector).each do |el|
47
+ if el.elem? and (el.name != 'head' and el.parent.name != 'head')
48
+ # Add a style attribute or append to the existing one
49
+ block = "[SPEC=#{specificity}[#{declaration}]]"
50
+ el['style'] = el.attributes['style'].to_s + ' ' + block
51
+ end
52
+ end
53
+ rescue ::Hpricot::Error, RuntimeError, ArgumentError
54
+ $stderr.puts "CSS syntax error with selector: #{selector}" if @options[:verbose]
55
+ next
56
+ end
57
+ end
58
+ end
59
+
60
+ # Read STYLE attributes and perform folding
61
+ doc.search("*[@style]").each do |el|
62
+ style = el.attributes['style'].to_s
63
+
64
+ declarations = []
65
+
66
+ style.scan(/\[SPEC\=([\d]+)\[(.[^\]\]]*)\]\]/).each do |declaration|
67
+ rs = CssParser::RuleSet.new(nil, declaration[1].to_s, declaration[0].to_i)
68
+ declarations << rs
69
+ end
70
+ # Perform style folding
71
+ merged = CssParser.merge(declarations)
72
+ merged.expand_shorthand!
73
+
74
+ # Duplicate CSS attributes as HTML attributes
75
+ if Premailer::RELATED_ATTRIBUTES.has_key?(el.name)
76
+ Premailer::RELATED_ATTRIBUTES[el.name].each do |css_att, html_att|
77
+ el[html_att] = merged[css_att].gsub(/url\('(.*)'\)/,'\1').gsub(/;$/, '').strip if el[html_att].nil? and not merged[css_att].empty?
78
+ end
79
+ end
80
+
81
+ merged.create_dimensions_shorthand!
82
+ merged.create_border_shorthand!
83
+
84
+ # write the inline STYLE attribute
85
+ el['style'] = Premailer.escape_string(merged.declarations_to_s)
86
+ end
87
+
88
+ doc = write_unmergable_css_rules(doc, @unmergable_rules)
89
+
90
+ if @options[:remove_classes] or @options[:remove_comments]
91
+ doc.search('*').each do |el|
92
+ if el.comment? and @options[:remove_comments]
93
+ lst = el.parent.children
94
+ el.parent = nil
95
+ lst.delete(el)
96
+ elsif el.elem?
97
+ el.remove_attribute('class') if @options[:remove_classes]
98
+ end
99
+ end
100
+ end
101
+
102
+ if @options[:remove_ids]
103
+ # Find the targets of all in-page anchors and hash them
104
+ targets = []
105
+ doc.search("a[@href^='#']").each do |el|
106
+ target = el.get_attribute('href')[1..-1]
107
+ targets << target
108
+ el.set_attribute('href', "#" + Digest::MD5.hexdigest(target))
109
+ end
110
+ # Hash IDs that are link targets; remove all other IDs
111
+ doc.search("*[@id]").each do |el|
112
+ id = el.get_attribute('id')
113
+ if targets.include?(id)
114
+ el.set_attribute('id', Digest::MD5.hexdigest(id))
115
+ else
116
+ el.remove_attribute('id')
117
+ end
118
+ end
119
+ end
120
+
121
+ @processed_doc = doc
122
+
123
+ @processed_doc.to_original_html
124
+ end
125
+
126
+ # Create a <tt>style</tt> element with un-mergable rules (e.g. <tt>:hover</tt>)
127
+ # and write it into the <tt>body</tt>.
128
+ #
129
+ # <tt>doc</tt> is an Hpricot document and <tt>unmergable_rules</tt> is a CssParser::Parser.
130
+ #
131
+ # Returns an Hpricot document.
132
+ def write_unmergable_css_rules(doc, unmergable_rules) # :nodoc:
133
+ styles = ''
134
+ unmergable_rules.each_selector(:all, :force_important => true) do |selector, declarations, specificity|
135
+ styles += "#{selector} { #{declarations} }\n"
136
+ end
137
+
138
+ unless styles.empty?
139
+ style_tag = "\n<style type=\"text/css\">\n#{styles}</style>\n"
140
+ if (body = doc.search('body')).any?
141
+ body.append(style_tag)
142
+ else
143
+ doc.inner_html= doc.inner_html << style_tag
144
+ end
145
+ end
146
+ doc
147
+ end
148
+
149
+
150
+ # Converts the HTML document to a format suitable for plain-text e-mail.
151
+ #
152
+ # If present, uses the <body> element as its base; otherwise uses the whole document.
153
+ #
154
+ # Returns a string.
155
+ def to_plain_text
156
+ html_src = ''
157
+ begin
158
+ html_src = @doc.search("body").inner_html
159
+ rescue; end
160
+
161
+ html_src = @doc.to_html unless html_src and not html_src.empty?
162
+ convert_to_text(html_src, @options[:line_length], @html_encoding)
163
+ end
164
+
165
+
166
+ # Returns the original HTML as a string.
167
+ def to_s
168
+ @doc.to_original_html
169
+ end
170
+
171
+ # Load the HTML file and convert it into an Hpricot document.
172
+ #
173
+ # Returns an Hpricot document.
174
+ def load_html(input) # :nodoc:
175
+ thing = nil
176
+
177
+ # TODO: duplicate options
178
+ if @options[:with_html_string] or @options[:inline] or input.respond_to?(:read)
179
+ thing = input
180
+ elsif @is_local_file
181
+ @base_dir = File.dirname(input)
182
+ thing = File.open(input, 'r')
183
+ else
184
+ thing = open(input)
185
+ end
186
+
187
+ # TODO: deal with Hpricot seg faults on empty input
188
+ thing ? Hpricot(thing) : nil
189
+ end
190
+
191
+ end
192
+ end
193
+ end
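Before calling `doc.search`, the Hpricot adapter rewrites each selector with a chain of regular expressions. The helper below (`hpricot_selector` is a hypothetical name) replays those same regexes, copied from the listing above, on plain strings so the effect of each can be seen without Hpricot installed; the sample selectors are made up.

```ruby
# Replays the selector rewrites from the Hpricot adapter's to_inline_css.
def hpricot_selector(selector)
  s = selector.dup
  s.gsub!(/:link([\s]*)+/i) { $1 }                                 # a:link -> a
  s.gsub!(/([\s]|^)([\w]+)/) { $1.to_s + $2.to_s.downcase }        # TD -> td
  s.gsub!(/\A\#([\w_\-]+)\Z/, '*[@id=\1]')                         # #main -> *[@id=main]
  s.gsub!(/\[([\w]+)\]/, '[@\1]')                                  # [href] -> [@href]
  s.gsub!(/\[([\w]+)([\=\~\^\$\*]+)([\w\s]+)\]/, '[@\1\2\'\3\']')  # [type=text] -> [@type='text']
  s
end

['#main', 'TD', 'a[href]', 'input[type=text]', 'a:link'].each do |sel|
  puts "#{sel.ljust(18)} => #{hpricot_selector(sel)}"
end
```

The Nokogiri adapter keeps only the `#id` rewrite and passes attribute selectors through unchanged, since Nokogiri accepts standard CSS attribute syntax.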
data/lib/premailer/adapter/nokogiri.rb ADDED
@@ -0,0 +1,203 @@
1
+ require 'nokogiri'
2
+
3
+ class Premailer
4
+ module Adapter
5
+ module Nokogiri
6
+
7
+ # Merge CSS into the HTML document.
8
+ #
9
+ # Returns a string.
10
+ def to_inline_css
11
+ doc = @processed_doc
12
+ @unmergable_rules = CssParser::Parser.new
13
+
14
+ # Give all styles already in style attributes a specificity of 1000
15
+ # per http://www.w3.org/TR/CSS21/cascade.html#specificity
16
+ doc.search("*[@style]").each do |el|
17
+ el['style'] = '[SPEC=1000[' + el.attributes['style'] + ']]'
18
+ end
19
+
20
+ # Iterate through the rules and merge them into the HTML
21
+ @css_parser.each_selector(:all) do |selector, declaration, specificity|
22
+ # Drop :link pseudo-classes so that a:link rules are applied to a elements
23
+ selector.gsub!(/:link([\s]*)+/i) {|m| $1 }
24
+
25
+ # Convert element names to lower case
26
+ selector.gsub!(/([\s]|^)([\w]+)/) {|m| $1.to_s + $2.to_s.downcase }
27
+
28
+ if selector =~ Premailer::RE_UNMERGABLE_SELECTORS
29
+ @unmergable_rules.add_rule_set!(CssParser::RuleSet.new(selector, declaration)) unless @options[:preserve_styles]
30
+ else
31
+ begin
32
+ # Change single ID CSS selectors into xpath so that we can match more
33
+ # than one element. Added to work around dodgy generated code.
34
+ selector.gsub!(/\A\#([\w_\-]+)\Z/, '*[@id=\1]')
35
+
36
+ doc.search(selector).each do |el|
37
+ if el.elem? and (el.name != 'head' and el.parent.name != 'head')
38
+ # Add a style attribute or append to the existing one
39
+ block = "[SPEC=#{specificity}[#{declaration}]]"
40
+ el['style'] = el.attributes['style'].to_s + ' ' + block
41
+ end
42
+ end
43
+ rescue ::Nokogiri::SyntaxError, RuntimeError, ArgumentError
44
+ $stderr.puts "CSS syntax error with selector: #{selector}" if @options[:verbose]
45
+ next
46
+ end
47
+ end
48
+ end
49
+
50
+ # Read STYLE attributes and perform folding
51
+ doc.search("*[@style]").each do |el|
52
+ style = el.attributes['style'].to_s
53
+
54
+ declarations = []
55
+ style.scan(/\[SPEC\=([\d]+)\[(.[^\]\]]*)\]\]/).each do |declaration|
56
+ rs = CssParser::RuleSet.new(nil, declaration[1].to_s, declaration[0].to_i)
57
+ declarations << rs
58
+ end
59
+
60
+ # Perform style folding
61
+ merged = CssParser.merge(declarations)
62
+ merged.expand_shorthand!
63
+
64
+ # Duplicate CSS attributes as HTML attributes
65
+ if Premailer::RELATED_ATTRIBUTES.has_key?(el.name)
66
+ Premailer::RELATED_ATTRIBUTES[el.name].each do |css_att, html_att|
67
+ el[html_att] = merged[css_att].gsub(/url\('(.*)'\)/,'\1').gsub(/;$/, '').strip if el[html_att].nil? and not merged[css_att].empty?
68
+ end
69
+ end
70
+
71
+ merged.create_dimensions_shorthand!
72
+ merged.create_border_shorthand!
73
+
74
+ # write the inline STYLE attribute
75
+ el['style'] = Premailer.escape_string(merged.declarations_to_s)
76
+ end
77
+
78
+ doc = write_unmergable_css_rules(doc, @unmergable_rules)
79
+
80
+ if @options[:remove_classes] or @options[:remove_comments]
81
+ doc.traverse do |el|
82
+ if el.comment? and @options[:remove_comments]
83
+ el.remove
84
+ elsif el.element?
85
+ el.remove_attribute('class') if @options[:remove_classes]
86
+ end
87
+ end
88
+ end
89
+
90
+ if @options[:remove_ids]
91
+ # Find the targets of all in-page anchors and hash them
92
+ targets = []
93
+ doc.search("a[@href^='#']").each do |el|
94
+ target = el.get_attribute('href')[1..-1]
95
+ targets << target
96
+ el.set_attribute('href', "#" + Digest::MD5.hexdigest(target))
97
+ end
98
+ # Hash IDs that are link targets; remove all other IDs
99
+ doc.search("*[@id]").each do |el|
100
+ id = el.get_attribute('id')
101
+ if targets.include?(id)
102
+ el.set_attribute('id', Digest::MD5.hexdigest(id))
103
+ else
104
+ el.remove_attribute('id')
105
+ end
106
+ end
107
+ end
108
+
109
+ @processed_doc = doc
110
+ if is_xhtml?
111
+ # we don't want to encode carriage returns
112
+ @processed_doc.to_xhtml(:encoding => nil).gsub(/&\#xD;/i, "\r")
113
+ else
114
+ @processed_doc.to_html
115
+ end
116
+ end
117
+
118
+ # Create a <tt>style</tt> element with un-mergable rules (e.g. <tt>:hover</tt>)
119
+ # and write it into the <tt>body</tt>.
120
+ #
121
+ # <tt>doc</tt> is a Nokogiri document and <tt>unmergable_rules</tt> is a CssParser::Parser.
122
+ #
123
+ # Returns a Nokogiri document.
124
+ def write_unmergable_css_rules(doc, unmergable_rules) # :nodoc:
125
+ styles = ''
126
+ unmergable_rules.each_selector(:all, :force_important => true) do |selector, declarations, specificity|
127
+ styles += "#{selector} { #{declarations} }\n"
128
+ end
129
+
130
+ unless styles.empty?
131
+ style_tag = "<style type=\"text/css\">\n#{styles}</style>"
132
+ if body = doc.at_css('body')
133
+ body.children.before(::Nokogiri::XML.fragment(style_tag))
134
+ else
135
+ doc.inner_html = style_tag + doc.inner_html
136
+ end
137
+ end
138
+ doc
139
+ end
140
+
141
+
142
+ # Converts the HTML document to a format suitable for plain-text e-mail.
143
+ #
144
+ # If present, uses the <body> element as its base; otherwise uses the whole document.
145
+ #
146
+ # Returns a string.
147
+ def to_plain_text
148
+ html_src = ''
149
+ begin
150
+ html_src = @doc.at("body").inner_html
151
+ rescue; end
152
+
153
+ html_src = @doc.to_html unless html_src and not html_src.empty?
154
+ convert_to_text(html_src, @options[:line_length], @html_encoding)
155
+ end
156
+
157
+ # Returns the original HTML as a string.
158
+ def to_s
159
+ if is_xhtml?
160
+ @doc.to_xhtml(:encoding => nil)
161
+ else
162
+ @doc.to_html(:encoding => nil)
163
+ end
164
+ end
165
+
166
+ # Load the HTML file and convert it into a Nokogiri document.
167
+ #
168
+ # Returns a Nokogiri document.
169
+ def load_html(input) # :nodoc:
170
+ thing = nil
171
+
172
+ # TODO: duplicate options
173
+ if @options[:with_html_string] or @options[:inline] or input.respond_to?(:read)
174
+ thing = input
175
+ elsif @is_local_file
176
+ @base_dir = File.dirname(input)
177
+ thing = File.open(input, 'r')
178
+ else
179
+ thing = open(input)
180
+ end
181
+
182
+ if thing.respond_to?(:read)
183
+ thing = thing.read
184
+ end
185
+
186
+ return nil unless thing
187
+ doc = nil
188
+
189
+ # Default encoding is ASCII-8BIT (binary) per http://groups.google.com/group/nokogiri-talk/msg/0b81ef0dc180dc74
190
+ if thing.is_a?(String) and RUBY_VERSION =~ /1.9/
191
+ thing = thing.force_encoding('ASCII-8BIT').encode!
192
+ doc = ::Nokogiri::HTML(thing) {|c| c.recover }
193
+ else
194
+ default_encoding = RUBY_PLATFORM == 'java' ? nil : 'BINARY'
195
+ doc = ::Nokogiri::HTML(thing, nil, @options[:inputencoding] || default_encoding) {|c| c.recover }
196
+ end
197
+
198
+ return doc
199
+ end
200
+
201
+ end
202
+ end
203
+ end
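Both adapters fold styles in two passes: every matching rule is appended to the element's `style` attribute as `[SPEC=<specificity>[<declarations>]]` (existing inline styles get 1000), and the markers are then parsed back and merged. The sketch below reuses the adapters' marker regex but replaces `CssParser.merge` with a deliberately simplified merge of its own: for each property the block with the highest specificity wins, and later blocks win ties. It is not css_parser's algorithm (for example, it ignores `!important`), and `fold_style` is a hypothetical name.

```ruby
# Marker regex copied from the adapters: captures [specificity, declarations].
SPEC_BLOCK = /\[SPEC\=([\d]+)\[(.[^\]\]]*)\]\]/

# Simplified merge: highest specificity wins per property; ties go to the
# block that appears later in the attribute.
def fold_style(style)
  winners = {} # property => [specificity, value]
  style.scan(SPEC_BLOCK).each do |spec, declarations|
    specificity = spec.to_i
    declarations.split(';').each do |decl|
      prop, value = decl.split(':', 2).map { |s| s.strip }
      next if prop.nil? || prop.empty? || value.nil? || value.empty?
      current = winners[prop]
      winners[prop] = [specificity, value] if current.nil? || specificity >= current[0]
    end
  end
  winners.map { |prop, (_, value)| "#{prop}: #{value};" }.join(' ')
end

# An existing inline style (SPEC=1000) followed by two stylesheet rules.
style = '[SPEC=1000[color: red]] [SPEC=1[color: blue; font-size: 12px]] [SPEC=11[font-size: 14px]]'
puts fold_style(style)  # color: red; font-size: 14px;
```

Encoding the specificity next to each declaration block lets the first pass stay a simple string append per matching rule, leaving all conflict resolution to a single merge per element.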