html2md 0.1

data/LICENSE.md ADDED
@@ -0,0 +1,202 @@
+ Apache License
+
+ Version 2.0, January 2004
+
+ [http://www.apache.org/licenses/](http://www.apache.org/licenses/)
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ **1. Definitions**.
+
+ "License" shall mean the terms and conditions for use, reproduction, and
+ distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by the
+ copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all other
+ entities that control, are controlled by, or are under common control with
+ that entity. For the purposes of this definition, "control" means (i) the
+ power, direct or indirect, to cause the direction or management of such
+ entity, whether by contract or otherwise, or (ii) ownership of fifty
+ percent (50%) or more of the outstanding shares, or (iii) beneficial
+ ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity exercising
+ permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation source,
+ and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical transformation
+ or translation of a Source form, including but not limited to compiled
+ object code, generated documentation, and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or Object form,
+ made available under the License, as indicated by a copyright notice that
+ is included in or attached to the work (an example is provided in the
+ Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object form,
+ that is based on (or derived from) the Work and for which the editorial
+ revisions, annotations, elaborations, or other modifications represent, as
+ a whole, an original work of authorship. For the purposes of this License,
+ Derivative Works shall not include works that remain separable from, or
+ merely link (or bind by name) to the interfaces of, the Work and Derivative
+ Works thereof.
+
+ "Contribution" shall mean any work of authorship, including the original
+ version of the Work and any modifications or additions to that Work or
+ Derivative Works thereof, that is intentionally submitted to Licensor for
+ inclusion in the Work by the copyright owner or by an individual or Legal
+ Entity authorized to submit on behalf of the copyright owner. For the
+ purposes of this definition, "submitted" means any form of electronic,
+ verbal, or written communication sent to the Licensor or its
+ representatives, including but not limited to communication on electronic
+ mailing lists, source code control systems, and issue tracking systems that
+ are managed by, or on behalf of, the Licensor for the purpose of discussing
+ and improving the Work, but excluding communication that is conspicuously
+ marked or otherwise designated in writing by the copyright owner as "Not a
+ Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity on
+ behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ **2. Grant of Copyright License**. Subject to the
+ terms and conditions of this License, each Contributor hereby grants to You
+ a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of, publicly
+ display, publicly perform, sublicense, and distribute the Work and such
+ Derivative Works in Source or Object form.
+
+ **3. Grant of Patent License**. Subject to the terms
+ and conditions of this License, each Contributor hereby grants to You a
+ perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made, use,
+ offer to sell, sell, import, and otherwise transfer the Work, where such
+ license applies only to those patent claims licensable by such Contributor
+ that are necessarily infringed by their Contribution(s) alone or by
+ combination of their Contribution(s) with the Work to which such
+ Contribution(s) was submitted. If You institute patent litigation against
+ any entity (including a cross-claim or counterclaim in a lawsuit) alleging
+ that the Work or a Contribution incorporated within the Work constitutes
+ direct or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate as of the
+ date such litigation is filed.
+
+ **4. Redistribution**. You may reproduce and
+ distribute copies of the Work or Derivative Works thereof in any medium,
+ with or without modifications, and in Source or Object form, provided that
+ You meet the following conditions:
+
+
+ 1. You must give any other recipients of the Work or Derivative Works a
+ copy of this License; and
+
+
+ 2. You must cause any modified files to carry prominent notices stating
+ that You changed the files; and
+
+
+ 3. You must retain, in the Source form of any Derivative Works that You
+ distribute, all copyright, patent, trademark, and attribution notices from
+ the Source form of the Work, excluding those notices that do not pertain to
+ any part of the Derivative Works; and
+
+
+ 4. If the Work includes a "NOTICE" text file as part of its distribution,
+ then any Derivative Works that You distribute must include a readable copy
+ of the attribution notices contained within such NOTICE file, excluding
+ those notices that do not pertain to any part of the Derivative Works, in
+ at least one of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or documentation,
+ if provided along with the Derivative Works; or, within a display generated
+ by the Derivative Works, if and wherever such third-party notices normally
+ appear. The contents of the NOTICE file are for informational purposes only
+ and do not modify the License. You may add Your own attribution notices
+ within Derivative Works that You distribute, alongside or as an addendum to
+ the NOTICE text from the Work, provided that such additional attribution
+ notices cannot be construed as modifying the License.
+ You may add Your own copyright statement to Your modifications and may
+ provide additional or different license terms and conditions for use,
+ reproduction, or distribution of Your modifications, or for any such
+ Derivative Works as a whole, provided Your use, reproduction, and
+ distribution of the Work otherwise complies with the conditions stated in
+ this License.
+
+
+ **5. Submission of Contributions**. Unless You
+ explicitly state otherwise, any Contribution intentionally submitted for
+ inclusion in the Work by You to the Licensor shall be under the terms and
+ conditions of this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify the
+ terms of any separate license agreement you may have executed with Licensor
+ regarding such Contributions.
+
+ **6. Trademarks**. This License does not grant
+ permission to use the trade names, trademarks, service marks, or product
+ names of the Licensor, except as required for reasonable and customary use
+ in describing the origin of the Work and reproducing the content of the
+ NOTICE file.
+
+ **7. Disclaimer of Warranty**. Unless required by
+ applicable law or agreed to in writing, Licensor provides the Work (and
+ each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT
+ WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including,
+ without limitation, any warranties or conditions of TITLE,
+ NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You
+ are solely responsible for determining the appropriateness of using or
+ redistributing the Work and assume any risks associated with Your exercise
+ of permissions under this License.
+
+ **8. Limitation of Liability**. In no event and
+ under no legal theory, whether in tort (including negligence), contract, or
+ otherwise, unless required by applicable law (such as deliberate and
+ grossly negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a result
+ of this License or out of the use or inability to use the Work (including
+ but not limited to damages for loss of goodwill, work stoppage, computer
+ failure or malfunction, or any and all other commercial damages or losses),
+ even if such Contributor has been advised of the possibility of such
+ damages.
+
+ **9. Accepting Warranty or Additional Liability**.
+ While redistributing the Work or Derivative Works thereof, You may choose
+ to offer, and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this License.
+ However, in accepting such obligations, You may act only on Your own behalf
+ and on Your sole responsibility, not on behalf of any other Contributor,
+ and only if You agree to indemnify, defend, and hold each Contributor
+ harmless for any liability incurred by, or claims asserted against, such
+ Contributor by reason of your accepting any such warranty or additional
+ liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate
+ notice, with the fields enclosed by brackets "[]" replaced with your own
+ identifying information. (Don't include the brackets!) The text should be
+ enclosed in the appropriate comment syntax for the file format. We also
+ recommend that a file or class name and description of purpose be included
+ on the same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+
+ ```
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ```
data/README.md ADDED
File without changes
data/Rakefile ADDED
@@ -0,0 +1,16 @@
+ require 'bundler/setup'
+ require "cucumber/rake/task"
+ lib = File.expand_path('../lib/', __FILE__)
+ $:.unshift lib unless $:.include?(lib)
+
+ require 'html2md'
+
+ Cucumber::Rake::Task.new do |t|
+   t.cucumber_opts = %w{--format pretty}
+ end
+
+ desc "Test"
+ task :t, [] => [] do |task, args|
+   t = Html2Md.new(File.read('test.html'))
+   puts t.parse
+ end
data/features/markdown.feature ADDED
@@ -0,0 +1,82 @@
+ Feature: Markdown
+   We need to convert basic HTML to markdown
+
+   Scenario: Create a horizontal rule (HR) element
+     * HTML <hr/>
+     * I say parse
+     * The markdown should be (\n* * * * *\n)
+
+   Scenario: Create a hard break (BR) element
+     * HTML <br/>
+     * I say parse
+     * The markdown should be ( \n)
+
+   Scenario: Paragraph (P) elements should end with a blank line
+     * HTML <p>
+     * I say parse
+     * The markdown should be (\n\n)
+
+   Scenario: Link (a href=) elements should convert
+     * HTML <a href="/some/link.html"> Link </a>
+     * I say parse
+     * The markdown should be ([ Link ](/some/link.html))
+
+   Scenario: Other anchors should be ignored
+     * HTML <a name="link"> Link </a>
+     * I say parse
+     * The markdown should be ( Link )
+
+   Scenario: Anchors should reset after being used once
+     * HTML <a href="/some/link.html"> Link </a> <a name="link"> Link </a>
+     * I say parse
+     * The markdown should be ([ Link ](/some/link.html) Link )
+
+   Scenario: Other (a) elements should be ignored
+     * HTML <a> Text </a>
+     * I say parse
+     * The markdown should be ( Text )
+
+   Scenario: An ordered list
+     * HTML <ol><li>First</li><li>Second</li><ol>
+     * I say parse
+     * The markdown should be (\n 1. First\n 2. Second\n\n)
+
+   Scenario: An unordered list
+     * HTML <ul><li>First</li><li>Second</li><ul>
+     * I say parse
+     * The markdown should be (\n - First\n - Second\n\n)
+
+   Scenario: Complex List
+     * HTML <ul><li>First</li><li> <ol><li>First<ul><li>First</li><li>Second</li></ul></li><li>Second</li> </ol>Second</li><ul>
+     * I say parse
+     * The markdown should be (\n - First\n - \n 1. First\n - First\n - Second\n\n 2. Second\nSecond\n\n)
+
+   Scenario: Emphasis (em) element
+     * HTML <em>Emphasis</em>
+     * I say parse
+     * The markdown should be (_Emphasis_)
+
+   Scenario: Strong (strong) element
+     * HTML <strong>Emphasis</strong>
+     * I say parse
+     * The markdown should be (**Emphasis**)
+
+   Scenario: Pre (pre) element
+     * HTML <pre>This is some preformatted code</pre>
+     * I say parse
+     * The markdown should be (\n```\nThis is some preformatted code\n```\n)
+
+   Scenario: Other HTML Elements (div) should be ignored
+     * HTML <div>This is in a div</div>
+     * I say parse
+     * The markdown should be (This is in a div)
+
+   Scenario: Other HTML Elements (span) should be ignored
+     * HTML <span>This is in a span</span>
+     * I say parse
+     * The markdown should be (This is in a span)
+
+   Scenario: Character data should not have new lines
+     * HTML This is character data \n
+     * I say parse
+     * The markdown should be (This is character data \n\n)
data/features/step_definitions/markdown_steps.rb ADDED
@@ -0,0 +1,24 @@
+ # encoding: utf-8
+ require 'rspec/expectations'
+ require 'cucumber/formatter/unicode'
+ $:.unshift(File.dirname(__FILE__) + '/../../lib')
+ require 'html2md'
+
+ Before do
+   @html2md = Html2Md.new
+ end
+
+ After do
+ end
+
+ Given /HTML (.*)/ do |n|
+   @html2md.source = n.gsub("\\n", "\n")
+ end
+
+ When /I say parse/ do
+   @result = @html2md.parse
+ end
+
+ Then /The markdown should be \((.*)\)/ do |result|
+   @result.should == result.gsub("\\n", "\n")
+ end
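Gherkin passes step text through as a plain string, so the feature file writes each newline as the two characters `\` and `n`; both the `Given` and the `Then` steps turn them back into real newlines with `gsub("\\n", "\n")`. A standalone sketch of that conversion, using a hypothetical helper name `unescape_newlines` that is not part of the gem:

```ruby
# Hypothetical helper mirroring the gsub in the Given and Then steps.
# In double quotes, "\\n" is the two characters backslash + n, while
# "\n" is a single real newline character.
def unescape_newlines(text)
  text.gsub("\\n", "\n")
end

# Single quotes keep \n as a literal backslash followed by n,
# which is how the step text arrives from the feature file.
step_text = '\n* * * * *\n'

puts step_text.length                     # prints 13
puts unescape_newlines(step_text).length  # prints 11
puts unescape_newlines(step_text).inspect # prints "\n* * * * *\n"
```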
data/lib/html2md.rb ADDED
@@ -0,0 +1,19 @@
+ require 'nokogiri'
+ require 'html2md/document'
+
+ class Html2Md
+   attr_accessor :options, :source
+
+   def initialize(source = nil, options = {})
+     @options = options
+     @source = source
+   end
+
+   def parse()
+     doc = Html2Md::Document.new()
+     doc.relative_url = options[:relative_url]
+     parser = Nokogiri::HTML::SAX::Parser.new(doc)
+     parser.parse(source)
+     parser.document.markdown
+   end
+ end
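`parse` hands `options[:relative_url]` to the document handler, whose `end_a` method rewrites any href whose scheme is not `http` or `https` by assigning it as the path of that base URL, and keeps the original href if that raises. The sketch below isolates the logic with Ruby's standard `uri` library; `resolve_href` is a hypothetical name, not part of the gem, and the example URLs are made up.

```ruby
require 'uri'

# Hypothetical standalone version of the href rewriting in Document#end_a.
# relative_url plays the role of options[:relative_url].
def resolve_href(href, relative_url)
  # As in end_a, absolute http/https links are kept as-is. This check sits
  # outside the rescue, so an unparsable href would raise here.
  return href if %w[http https].include?(URI(href).scheme)

  begin
    base = URI(relative_url) # ArgumentError when relative_url is nil
    base.path = href         # replaces the entire path of the base URL
    base.to_s
  rescue StandardError
    href                     # any failure leaves the href unchanged
  end
end

puts resolve_href('/some/link.html', 'http://example.com/docs/index.html')
# prints http://example.com/some/link.html
puts resolve_href('/some/link.html', nil)
# prints /some/link.html
```

Because `path=` replaces the whole path and, once a scheme is set, only accepts absolute paths, an href such as `page.html` falls through to the unchanged original. `URI.join(relative_url, href)` is the standard-library call that would resolve such a relative href against the base instead.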
data/lib/html2md/VERSION.rb ADDED
@@ -0,0 +1,3 @@
+ class Html2Md
+   VERSION = "0.1"
+ end
data/lib/html2md/document.rb ADDED
@@ -0,0 +1,176 @@
+ require 'nokogiri'
+ require 'uri'
+
+ class Html2Md
+   class Document < Nokogiri::XML::SAX::Document
+
+     attr_reader :markdown
+     attr_accessor :relative_url
+
+     def start_document
+       @markdown = ''
+       @last_href = nil
+       @allowed_tags = ['tr','td','th','table']
+       @current_list = -1
+       @list_tree = []
+
+     end
+
+     def start_tag(name,attributes = [])
+       if @allowed_tags.include? name
+         "<#{name}>"
+       else
+         ''
+       end
+     end
+
+     def end_tag(name,attributes = [])
+       if @allowed_tags.include? name
+         "</#{name}>"
+       else
+         ''
+       end
+     end
+
+     def start_element name, attributes = []
+       start_name = "start_#{name}".to_sym
+       both_name = "start_and_end_#{name}".to_sym
+
+       if self.respond_to?(both_name)
+         self.send( both_name, attributes )
+       elsif self.respond_to?(start_name)
+         self.send( start_name, attributes )
+       else
+         @markdown << start_tag(name)
+       end
+
+     end
+
+     def end_element name, attributes = []
+       end_name = "end_#{name}".to_sym
+       both_name = "start_and_end_#{name}".to_sym
+
+       if self.respond_to?(both_name)
+         self.send( both_name, attributes )
+       elsif self.respond_to?(end_name)
+         self.send( end_name, attributes )
+       else
+         @markdown << end_tag(name)
+       end
+     end
+
+     def start_hr(attributes)
+       @markdown << "\n* * * * *\n"
+     end
+
+     def end_hr(attributes)
+
+     end
+
+     def start_and_end_em(attributes)
+       @markdown << '_'
+     end
+
+     def start_and_end_strong(attributes)
+       @markdown << '**'
+     end
+
+     def start_br(attributes)
+       @markdown << " \n"
+     end
+
+     def end_br(attributes)
+
+     end
+
+     def start_p(attributes)
+
+     end
+
+     def end_p(attributes)
+       @markdown << "\n\n"
+     end
+
+     def start_a(attributes)
+       attributes.each do | attrib |
+         if attrib[0].downcase.eql? 'href'
+           @markdown << '['
+           @last_href = attrib[1]
+         end
+       end
+     end
+
+     def start_pre(attributes)
+       @markdown << "\n```\n"
+     end
+
+     def end_pre(attributes)
+       @markdown << "\n```\n"
+     end
+
+     def end_a(attributes)
+       if @last_href and not (['http','https'].include? URI(@last_href).scheme)
+         begin
+           rp = URI(relative_url)
+           rp.path = @last_href
+           @last_href = rp.to_s
+         rescue
+         end
+       end
+
+       @markdown << "](#{@last_href})" if @last_href
+       @last_href = nil if @last_href
+
+     end
+
+     def start_ul(attributes)
+       @list_tree.push( { :type => :ul, :current_element => 0 } )
+       @markdown << "\n"
+     end
+
+     def end_ul(attributes)
+       @list_tree.pop
+     end
+
+     def start_ol(attributes)
+       @list_tree.push( { :type => :ol, :current_element => 0 } )
+       @markdown << "\n"
+     end
+
+     def end_ol(attributes)
+       @list_tree.pop
+     end
+
+     def start_li(attributes)
+
+       @list_tree.length.times do
+         @markdown << " "
+       end
+
+       @list_tree[-1][:current_element] += 1
+
+       case @list_tree[-1][:type]
+       when :ol
+         @markdown << "#{ @list_tree[-1][:current_element] }. "
+       when :ul
+         @markdown << "- "
+       end
+
+     end
+
+     def end_li(attributes)
+       @markdown << "\n"
+     end
+
+     def characters c
+       if @list_tree[-1]
+         @markdown << c.chomp.lstrip.rstrip
+       else
+         @markdown << c.chomp
+       end
+     end
+
+
+   end
+ end
+
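`Document` routes every SAX event by tag name: `start_element` calls `start_and_end_<tag>` if it exists, otherwise `start_<tag>`, otherwise falls back to `start_tag` (which only passes table tags through), and `end_element` mirrors this. Lists are tracked in `@list_tree`, a stack holding each open list's type and item counter, and each `<li>` is indented by one space per open list. The sketch below reproduces just that dispatch and list logic in plain Ruby with no Nokogiri dependency, feeding hand-written events instead of parsed HTML. The class `MiniListRenderer` and the helper `render` are hypothetical, and unknown tags are simply dropped rather than routed through `start_tag`/`end_tag`.

```ruby
# Hypothetical, dependency-free sketch of Html2Md::Document's event
# dispatch and list handling. Events are supplied by hand instead of by
# Nokogiri's SAX parser.
class MiniListRenderer
  attr_reader :markdown

  def initialize
    @markdown = +''
    @list_tree = [] # stack of open lists; the innermost list is last
  end

  def start_element(name, attributes = [])
    dispatch(name, "start_#{name}", attributes)
  end

  def end_element(name)
    dispatch(name, "end_#{name}", [])
  end

  # <em> uses one handler for both the opening and the closing tag.
  def start_and_end_em(_attributes)
    @markdown << '_'
  end

  def start_ul(_attributes)
    @list_tree.push({ type: :ul, current_element: 0 })
    @markdown << "\n"
  end

  def start_ol(_attributes)
    @list_tree.push({ type: :ol, current_element: 0 })
    @markdown << "\n"
  end

  def end_ul(_attributes)
    @list_tree.pop
  end

  def end_ol(_attributes)
    @list_tree.pop
  end

  def start_li(_attributes)
    # One space of indentation per open list, then a number or a dash.
    @markdown << (' ' * @list_tree.length)
    list = @list_tree.last
    list[:current_element] += 1
    @markdown << (list[:type] == :ol ? "#{list[:current_element]}. " : '- ')
  end

  def end_li(_attributes)
    @markdown << "\n"
  end

  def characters(text)
    # Trim whitespace inside lists; elsewhere only drop a trailing newline.
    @markdown << (@list_tree.empty? ? text.chomp : text.strip)
  end

  private

  # A start_and_end_<tag> handler wins over the start_/end_ specific one.
  # Tags with no handler are ignored in this sketch.
  def dispatch(name, specific, attributes)
    both = "start_and_end_#{name}"
    if respond_to?(both)
      send(both, attributes)
    elsif respond_to?(specific)
      send(specific, attributes)
    end
  end
end

# Replays [:start | :end | :text, value] events and returns the markdown.
def render(events)
  renderer = MiniListRenderer.new
  events.each do |kind, value|
    case kind
    when :start then renderer.start_element(value)
    when :end   then renderer.end_element(value)
    when :text  then renderer.characters(value)
    end
  end
  renderer.markdown
end

nested = render([
  [:start, 'ul'], [:start, 'li'], [:text, 'A'],
  [:start, 'ol'], [:start, 'li'], [:text, 'B'], [:end, 'li'], [:end, 'ol'],
  [:end, 'li'], [:end, 'ul']
])
puts nested.inspect # prints "\n - A\n  1. B\n\n"
```

The nested case shows where the blank line after an inner list in the Complex List scenario comes from: `end_li` appends a newline once for the inner item and again for the enclosing one.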
metadata ADDED
@@ -0,0 +1,54 @@
+ --- !ruby/object:Gem::Specification
+ name: html2md
+ version: !ruby/object:Gem::Version
+   version: '0.1'
+   prerelease:
+ platform: ruby
+ authors:
+ - Paul Morton
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2012-03-18 00:00:00.000000000 Z
+ dependencies: []
+ description: ! ' Converts Basic HTML to markdown
+
+ '
+ email: geeksitk@gmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - README.md
+ - Rakefile
+ - LICENSE.md
+ - lib/html2md/document.rb
+ - lib/html2md/VERSION.rb
+ - lib/html2md.rb
+ - features/markdown.feature
+ - features/step_definitions/markdown_steps.rb
+ homepage: http://github.com/pmorton/html2md
+ licenses: []
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 1.8.15
+ signing_key:
+ specification_version: 3
+ summary: A library for converting basic html to markdown
+ test_files: []