html2md 0.1

data/LICENSE.md ADDED
@@ -0,0 +1,202 @@
1
+ Apache License
2
+
3
+ Version 2.0, January 2004
4
+
5
+ [http://www.apache.org/licenses/](http://www.apache.org/licenses/)
6
+
7
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
8
+
9
+ **1. Definitions**.
10
+
11
+ "License" shall mean the terms and conditions for use, reproduction, and
12
+ distribution as defined by Sections 1 through 9 of this document.
13
+
14
+ "Licensor" shall mean the copyright owner or entity authorized by the
15
+ copyright owner that is granting the License.
16
+
17
+ "Legal Entity" shall mean the union of the acting entity and all other
18
+ entities that control, are controlled by, or are under common control with
19
+ that entity. For the purposes of this definition, "control" means (i) the
20
+ power, direct or indirect, to cause the direction or management of such
21
+ entity, whether by contract or otherwise, or (ii) ownership of fifty
22
+ percent (50%) or more of the outstanding shares, or (iii) beneficial
23
+ ownership of such entity.
24
+
25
+ "You" (or "Your") shall mean an individual or Legal Entity exercising
26
+ permissions granted by this License.
27
+
28
+ "Source" form shall mean the preferred form for making modifications,
29
+ including but not limited to software source code, documentation source,
30
+ and configuration files.
31
+
32
+ "Object" form shall mean any form resulting from mechanical transformation
33
+ or translation of a Source form, including but not limited to compiled
34
+ object code, generated documentation, and conversions to other media types.
35
+
36
+ "Work" shall mean the work of authorship, whether in Source or Object form,
37
+ made available under the License, as indicated by a copyright notice that
38
+ is included in or attached to the work (an example is provided in the
39
+ Appendix below).
40
+
41
+ "Derivative Works" shall mean any work, whether in Source or Object form,
42
+ that is based on (or derived from) the Work and for which the editorial
43
+ revisions, annotations, elaborations, or other modifications represent, as
44
+ a whole, an original work of authorship. For the purposes of this License,
45
+ Derivative Works shall not include works that remain separable from, or
46
+ merely link (or bind by name) to the interfaces of, the Work and Derivative
47
+ Works thereof.
48
+
49
+ "Contribution" shall mean any work of authorship, including the original
50
+ version of the Work and any modifications or additions to that Work or
51
+ Derivative Works thereof, that is intentionally submitted to Licensor for
52
+ inclusion in the Work by the copyright owner or by an individual or Legal
53
+ Entity authorized to submit on behalf of the copyright owner. For the
54
+ purposes of this definition, "submitted" means any form of electronic,
55
+ verbal, or written communication sent to the Licensor or its
56
+ representatives, including but not limited to communication on electronic
57
+ mailing lists, source code control systems, and issue tracking systems that
58
+ are managed by, or on behalf of, the Licensor for the purpose of discussing
59
+ and improving the Work, but excluding communication that is conspicuously
60
+ marked or otherwise designated in writing by the copyright owner as "Not a
61
+ Contribution."
62
+
63
+ "Contributor" shall mean Licensor and any individual or Legal Entity on
64
+ behalf of whom a Contribution has been received by Licensor and
65
+ subsequently incorporated within the Work.
66
+
67
+ **2. Grant of Copyright License**. Subject to the
68
+ terms and conditions of this License, each Contributor hereby grants to You
69
+ a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70
+ copyright license to reproduce, prepare Derivative Works of, publicly
71
+ display, publicly perform, sublicense, and distribute the Work and such
72
+ Derivative Works in Source or Object form.
73
+
74
+ **3. Grant of Patent License**. Subject to the terms
75
+ and conditions of this License, each Contributor hereby grants to You a
76
+ perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77
+ (except as stated in this section) patent license to make, have made, use,
78
+ offer to sell, sell, import, and otherwise transfer the Work, where such
79
+ license applies only to those patent claims licensable by such Contributor
80
+ that are necessarily infringed by their Contribution(s) alone or by
81
+ combination of their Contribution(s) with the Work to which such
82
+ Contribution(s) was submitted. If You institute patent litigation against
83
+ any entity (including a cross-claim or counterclaim in a lawsuit) alleging
84
+ that the Work or a Contribution incorporated within the Work constitutes
85
+ direct or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate as of the
87
+ date such litigation is filed.
88
+
89
+ **4. Redistribution**. You may reproduce and
90
+ distribute copies of the Work or Derivative Works thereof in any medium,
91
+ with or without modifications, and in Source or Object form, provided that
92
+ You meet the following conditions:
93
+
94
+
95
+ 1. You must give any other recipients of the Work or Derivative Works a
96
+ copy of this License; and
97
+
98
+
99
+ 2. You must cause any modified files to carry prominent notices stating
100
+ that You changed the files; and
101
+
102
+
103
+ 3. You must retain, in the Source form of any Derivative Works that You
104
+ distribute, all copyright, patent, trademark, and attribution notices from
105
+ the Source form of the Work, excluding those notices that do not pertain to
106
+ any part of the Derivative Works; and
107
+
108
+
109
+ 4. If the Work includes a "NOTICE" text file as part of its distribution,
110
+ then any Derivative Works that You distribute must include a readable copy
111
+ of the attribution notices contained within such NOTICE file, excluding
112
+ those notices that do not pertain to any part of the Derivative Works, in
113
+ at least one of the following places: within a NOTICE text file distributed
114
+ as part of the Derivative Works; within the Source form or documentation,
115
+ if provided along with the Derivative Works; or, within a display generated
116
+ by the Derivative Works, if and wherever such third-party notices normally
117
+ appear. The contents of the NOTICE file are for informational purposes only
118
+ and do not modify the License. You may add Your own attribution notices
119
+ within Derivative Works that You distribute, alongside or as an addendum to
120
+ the NOTICE text from the Work, provided that such additional attribution
121
+ notices cannot be construed as modifying the License.
122
+ You may add Your own copyright statement to Your modifications and may
123
+ provide additional or different license terms and conditions for use,
124
+ reproduction, or distribution of Your modifications, or for any such
125
+ Derivative Works as a whole, provided Your use, reproduction, and
126
+ distribution of the Work otherwise complies with the conditions stated in
127
+ this License.
128
+
129
+
130
+ **5. Submission of Contributions**. Unless You
131
+ explicitly state otherwise, any Contribution intentionally submitted for
132
+ inclusion in the Work by You to the Licensor shall be under the terms and
133
+ conditions of this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify the
135
+ terms of any separate license agreement you may have executed with Licensor
136
+ regarding such Contributions.
137
+
138
+ **6. Trademarks**. This License does not grant
139
+ permission to use the trade names, trademarks, service marks, or product
140
+ names of the Licensor, except as required for reasonable and customary use
141
+ in describing the origin of the Work and reproducing the content of the
142
+ NOTICE file.
143
+
144
+ **7. Disclaimer of Warranty**. Unless required by
145
+ applicable law or agreed to in writing, Licensor provides the Work (and
146
+ each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT
147
+ WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including,
148
+ without limitation, any warranties or conditions of TITLE,
149
+ NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You
150
+ are solely responsible for determining the appropriateness of using or
151
+ redistributing the Work and assume any risks associated with Your exercise
152
+ of permissions under this License.
153
+
154
+ **8. Limitation of Liability**. In no event and
155
+ under no legal theory, whether in tort (including negligence), contract, or
156
+ otherwise, unless required by applicable law (such as deliberate and
157
+ grossly negligent acts) or agreed to in writing, shall any Contributor be
158
+ liable to You for damages, including any direct, indirect, special,
159
+ incidental, or consequential damages of any character arising as a result
160
+ of this License or out of the use or inability to use the Work (including
161
+ but not limited to damages for loss of goodwill, work stoppage, computer
162
+ failure or malfunction, or any and all other commercial damages or losses),
163
+ even if such Contributor has been advised of the possibility of such
164
+ damages.
165
+
166
+ **9. Accepting Warranty or Additional Liability**.
167
+ While redistributing the Work or Derivative Works thereof, You may choose
168
+ to offer, and charge a fee for, acceptance of support, warranty, indemnity,
169
+ or other liability obligations and/or rights consistent with this License.
170
+ However, in accepting such obligations, You may act only on Your own behalf
171
+ and on Your sole responsibility, not on behalf of any other Contributor,
172
+ and only if You agree to indemnify, defend, and hold each Contributor
173
+ harmless for any liability incurred by, or claims asserted against, such
174
+ Contributor by reason of your accepting any such warranty or additional
175
+ liability.
176
+
177
+ END OF TERMS AND CONDITIONS
178
+
179
+ APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate
180
+ notice, with the fields enclosed by brackets "[]" replaced with your own
181
+ identifying information. (Don't include the brackets!) The text should be
182
+ enclosed in the appropriate comment syntax for the file format. We also
183
+ recommend that a file or class name and description of purpose be included
184
+ on the same "printed page" as the copyright notice for easier
185
+ identification within third-party archives.
186
+
187
+
188
+ ```
189
+ Copyright [yyyy] [name of copyright owner]
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
202
+ ```
data/README.md ADDED
File without changes
data/Rakefile ADDED
@@ -0,0 +1,16 @@
1
+ require 'bundler/setup'
2
+ require "cucumber/rake/task"
3
+ lib = File.expand_path('../lib/', __FILE__)
4
+ $:.unshift lib unless $:.include?(lib)
5
+
6
+ require 'html2md'
7
+
8
+ Cucumber::Rake::Task.new do |t|
9
+ t.cucumber_opts = %w{--format pretty}
10
+ end
11
+
12
+ desc "Test"
13
+ task :t, [] => [] do |task, args|
14
+ t = Html2Md.new(File.read('test.html'))
15
+ puts t.parse
16
+ end
data/features/markdown.feature ADDED
@@ -0,0 +1,82 @@
1
+ Feature: Markdown
2
+ We need to convert basic HTML to markdown
3
+
4
+ Scenario: Create a horizontal rule (HR) element
5
+ * HTML <hr/>
6
+ * I say parse
7
+ * The markdown should be (\n* * * * *\n)
8
+
9
+ Scenario: Create a hard break (BR) element
10
+ * HTML <br/>
11
+ * I say parse
12
+ * The markdown should be ( \n)
13
+
14
+ Scenario: Paragraph (P) elements should be a single hard return
15
+ * HTML <p>
16
+ * I say parse
17
+ * The markdown should be (\n\n)
18
+
19
+ Scenario: Link (a href=) elements should convert
20
+ * HTML <a href="/some/link.html"> Link </a>
21
+ * I say parse
22
+ * The markdown should be ([ Link ](/some/link.html))
23
+
24
+ Scenario: Other anchors should be ignored
25
+ * HTML <a name="link"> Link </a>
26
+ * I say parse
27
+ * The markdown should be ( Link )
28
+
29
+ Scenario: Anchors should reset after being used once
30
+ * HTML <a href="/some/link.html"> Link </a> <a name="link"> Link </a>
31
+ * I say parse
32
+ * The markdown should be ([ Link ](/some/link.html) Link )
33
+
34
+ Scenario: Other (a) elements should be ignored
35
+ * HTML <a> Text </a>
36
+ * I say parse
37
+ * The markdown should be ( Text )
38
+
39
+ Scenario: An ordered list
40
+ * HTML <ol><li>First</li><li>Second</li><ol>
41
+ * I say parse
42
+ * The markdown should be (\n 1. First\n 2. Second\n\n)
43
+
44
+ Scenario: An unordered list
45
+ * HTML <ul><li>First</li><li>Second</li><ul>
46
+ * I say parse
47
+ * The markdown should be (\n - First\n - Second\n\n)
48
+
49
+ Scenario: Complex List
50
+ * HTML <ul><li>First</li><li> <ol><li>First<ul><li>First</li><li>Second</li></ul></li><li>Second</li> </ol>Second</li><ul>
51
+ * I say parse
52
+ * The markdown should be (\n - First\n - \n 1. First\n - First\n - Second\n\n 2. Second\nSecond\n\n)
53
+
54
+ Scenario: Emphasis (em) element
55
+ * HTML <em>Emphasis</em>
56
+ * I say parse
57
+ * The markdown should be (_Emphasis_)
58
+
59
+ Scenario: Strong (strong) element
60
+ * HTML <strong>Emphasis</strong>
61
+ * I say parse
62
+ * The markdown should be (**Emphasis**)
63
+
64
+ Scenario: Pre (pre) element
65
+ * HTML <pre>This is some preformatted code</pre>
66
+ * I say parse
67
+ * The markdown should be (\n```\nThis is some preformatted code\n```\n)
68
+
69
+ Scenario: Other HTML Elements (div) should be ignored
70
+ * HTML <div>This is in a div</div>
71
+ * I say parse
72
+ * The markdown should be (This is in a div)
73
+
74
+ Scenario: Other HTML Elements (span) should be ignored
75
+ * HTML <span>This is in a span</span>
76
+ * I say parse
77
+ * The markdown should be (This is in a span)
78
+
79
+ Scenario: Character data should not have new lines
80
+ * HTML This is character data \n
81
+ * I say parse
82
+ * The markdown should be (This is character data \n\n)
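The list scenarios above imply that the converter tracks which list is currently open, how deeply it is nested, and how many items it has emitted. A minimal stdlib-only sketch of that idea follows. `ListRenderer` and its method names are hypothetical and not part of html2md, and the two-space indent per nesting level is an assumption, since the exact whitespace is not recoverable from this rendering.

```ruby
# Hypothetical helper, not part of html2md: tracks nested lists with a stack.
class ListRenderer
  INDENT = "  " # assumption: two spaces per nesting level

  attr_reader :markdown

  def initialize
    @markdown = +""
    @stack = [] # one entry per currently open list
  end

  # Open a list of the given type (:ol or :ul).
  def open_list(type)
    @stack.push({ type: type, count: 0 })
    @markdown << "\n"
  end

  # Close the innermost open list.
  def close_list
    @stack.pop
  end

  # Emit one item in the innermost open list.
  def item(text)
    list = @stack.last
    list[:count] += 1
    marker = list[:type] == :ol ? "#{list[:count]}. " : "- "
    @markdown << (INDENT * @stack.length) << marker << text << "\n"
  end
end

renderer = ListRenderer.new
renderer.open_list(:ol)
renderer.item("First")
renderer.open_list(:ul)
renderer.item("Nested")
renderer.close_list
renderer.item("Second")
renderer.close_list
puts renderer.markdown
```

Keeping the counter inside each stack entry means an inner list never disturbs the numbering of the outer one, which is the behaviour the "Complex List" scenario exercises.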
data/features/step_definitions/markdown_steps.rb ADDED
@@ -0,0 +1,24 @@
1
+ # encoding: utf-8
2
+ require 'rspec/expectations'
3
+ require 'cucumber/formatter/unicode'
4
+ $:.unshift(File.dirname(__FILE__) + '/../../lib')
5
+ require 'html2md'
6
+
7
+ Before do
8
+ @html2md = Html2Md.new
9
+ end
10
+
11
+ After do
12
+ end
13
+
14
+ Given /HTML (.*)/ do |n|
15
+ @html2md.source = n.gsub("\\n", "\n")
16
+ end
17
+
18
+ When /I say parse/ do
19
+ @result = @html2md.parse
20
+ end
21
+
22
+ Then /The markdown should be \((.*)\)/ do |result|
23
+ @result.should == result.gsub("\\n", "\n")
24
+ end
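The step definitions above rely on two small details: feature files carry a literal backslash followed by `n`, which `gsub("\\n", "\n")` turns into a real newline, and the `Then` pattern captures greedily up to the last closing parenthesis. The sketch below illustrates both with plain Ruby. The helper name `unescape_newlines` is hypothetical and only wraps the same `gsub` call.

```ruby
# Hypothetical helper wrapping the gsub used in the step definitions.
def unescape_newlines(text)
  text.gsub("\\n", "\n")
end

# Single quotes keep the backslash and the "n" as two separate characters.
step_text = 'This is character data \n'
unescaped = unescape_newlines(step_text)

# The greedy (.*) runs to the last ")", so parentheses inside the expected
# markdown (for example a link target) stay part of the capture.
pattern = /The markdown should be \((.*)\)/
captured = 'The markdown should be ([ Link ](/some/link.html))'[pattern, 1]

puts unescaped.inspect
puts captured
```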
data/lib/html2md.rb ADDED
@@ -0,0 +1,19 @@
1
+ require 'nokogiri'
2
+ require 'html2md/document'
3
+
4
+ class Html2Md
5
+ attr_accessor :options, :source
6
+
7
+ def initialize(source = nil, options = {})
8
+ @options = options
9
+ @source = source
10
+ end
11
+
12
+ def parse
13
+ doc = Html2Md::Document.new
14
+ doc.relative_url = options[:relative_url]
15
+ parser = Nokogiri::HTML::SAX::Parser.new(doc)
16
+ parser.parse(source)
17
+ parser.document.markdown
18
+ end
19
+ end
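The `relative_url` option that `parse` hands to the document is used to turn non-HTTP links into absolute ones. The sketch below shows one way to express that idea with the standard library. `absolutize` is a hypothetical helper, not part of html2md, and it uses `URI.join`, which is a swapped-in technique rather than the path assignment the library performs.

```ruby
require 'uri'

# Hypothetical helper, not part of html2md. Resolves href against base_url
# with URI.join; links that already use http or https are returned untouched.
def absolutize(href, base_url)
  return href if base_url.nil?
  return href if %w[http https].include?(URI(href).scheme)

  URI.join(base_url, href).to_s
rescue URI::Error
  href # fall back to the original link if either URL is malformed
end

relative = absolutize('/some/link.html', 'http://example.com')
absolute = absolutize('https://other.example/page', 'http://example.com')
no_base  = absolutize('/some/link.html', nil)

puts relative
puts absolute
puts no_base
```

Rescuing only `URI::Error` keeps unrelated failures visible, whereas a bare `rescue` would hide them.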
data/lib/html2md/VERSION.rb ADDED
@@ -0,0 +1,3 @@
1
+ class Html2Md
2
+ VERSION = "0.1"
3
+ end
data/lib/html2md/document.rb ADDED
@@ -0,0 +1,176 @@
1
+ require 'nokogiri'
2
+ require 'uri'
3
+
4
+ class Html2Md
5
+ class Document < Nokogiri::XML::SAX::Document
6
+
7
+ attr_reader :markdown
8
+ attr_accessor :relative_url
9
+
10
+ def start_document
11
+ @markdown = ''
12
+ @last_href = nil
13
+ @allowed_tags = ['tr','td','th','table']
14
+ @current_list = -1
15
+ @list_tree = []
16
+
17
+ end
18
+
19
+ def start_tag(name,attributes = [])
20
+ if @allowed_tags.include? name
21
+ "<#{name}>"
22
+ else
23
+ ''
24
+ end
25
+ end
26
+
27
+ def end_tag(name,attributes = [])
28
+ if @allowed_tags.include? name
29
+ "</#{name}>"
30
+ else
31
+ ''
32
+ end
33
+ end
34
+
35
+ def start_element name, attributes = []
36
+ start_name = "start_#{name}".to_sym
37
+ both_name = "start_and_end_#{name}".to_sym
38
+
39
+ if self.respond_to?(both_name)
40
+ self.send( both_name, attributes )
41
+ elsif self.respond_to?(start_name)
42
+ self.send( start_name, attributes )
43
+ else
44
+ @markdown << start_tag(name)
45
+ end
46
+
47
+ end
48
+
49
+ def end_element name, attributes = []
50
+ end_name = "end_#{name}".to_sym
51
+ both_name = "start_and_end_#{name}".to_sym
52
+
53
+ if self.respond_to?(both_name)
54
+ self.send( both_name, attributes )
55
+ elsif self.respond_to?(end_name)
56
+ self.send( end_name, attributes )
57
+ else
58
+ @markdown << end_tag(name)
59
+ end
60
+ end
61
+
62
+ def start_hr(attributes)
63
+ @markdown << "\n* * * * *\n"
64
+ end
65
+
66
+ def end_hr(attributes)
67
+
68
+ end
69
+
70
+ def start_and_end_em(attributes)
71
+ @markdown << '_'
72
+ end
73
+
74
+ def start_and_end_strong(attributes)
75
+ @markdown << '**'
76
+ end
77
+
78
+ def start_br(attributes)
79
+ @markdown << " \n"
80
+ end
81
+
82
+ def end_br(attributes)
83
+
84
+ end
85
+
86
+ def start_p(attributes)
87
+
88
+ end
89
+
90
+ def end_p(attributes)
91
+ @markdown << "\n\n"
92
+ end
93
+
94
+ def start_a(attributes)
95
+ attributes.each do | attrib |
96
+ if attrib[0].downcase.eql? 'href'
97
+ @markdown << '['
98
+ @last_href = attrib[1]
99
+ end
100
+ end
101
+ end
102
+
103
+ def start_pre(attributes)
104
+ @markdown << "\n```\n"
105
+ end
106
+
107
+ def end_pre(attributes)
108
+ @markdown << "\n```\n"
109
+ end
110
+
111
+ def end_a(attributes)
112
+ if @last_href && !%w[http https].include?(URI(@last_href).scheme)
113
+ begin
114
+ rp = URI(relative_url)
115
+ rp.path = @last_href
116
+ @last_href = rp.to_s
117
+ rescue # keep @last_href unchanged when relative_url is unset or cannot be combined with the link
118
+ end
119
+ end
120
+
121
+ @markdown << "](#{@last_href})" if @last_href
122
+ @last_href = nil
123
+
124
+ end
125
+
126
+ def start_ul(attributes)
127
+ @list_tree.push( { :type => :ul, :current_element => 0 } )
128
+ @markdown << "\n"
129
+ end
130
+
131
+ def end_ul(attributes)
132
+ @list_tree.pop
133
+ end
134
+
135
+ def start_ol(attributes)
136
+ @list_tree.push( { :type => :ol, :current_element => 0 } )
137
+ @markdown << "\n"
138
+ end
139
+
140
+ def end_ol(attributes)
141
+ @list_tree.pop
142
+ end
143
+
144
+ def start_li(attributes)
145
+
146
+ @list_tree.length.times do
147
+ @markdown << " "
148
+ end
149
+
150
+ @list_tree[-1][:current_element] += 1
151
+
152
+ case @list_tree[-1][:type]
153
+ when :ol
154
+ @markdown << "#{ @list_tree[-1][:current_element] }. "
155
+ when :ul
156
+ @markdown << "- "
157
+ end
158
+
159
+ end
160
+
161
+ def end_li(attributes)
162
+ @markdown << "\n"
163
+ end
164
+
165
+ def characters c
166
+ if @list_tree[-1]
167
+ @markdown << c.strip
168
+ else
169
+ @markdown << c.chomp
170
+ end
171
+ end
172
+
173
+
174
+ end
175
+ end
176
+
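The document class above routes each element event by building a method name from the tag (`start_and_end_<tag>` first, then `start_<tag>` or `end_<tag>`) and calling it only if the handler responds to it. The sketch below reproduces that dispatch pattern without Nokogiri so it can be run on its own. `TinyHandler` and `dispatch` are hypothetical names, and unlike the library this sketch emits nothing for unknown tags instead of passing table tags through.

```ruby
# Hypothetical stand-in for the SAX handler; no Nokogiri required.
class TinyHandler
  attr_reader :markdown

  def initialize
    @markdown = +""
  end

  def start_element(name)
    dispatch("start_and_end_#{name}", "start_#{name}")
  end

  def end_element(name)
    dispatch("start_and_end_#{name}", "end_#{name}")
  end

  def characters(text)
    @markdown << text
  end

  # Symmetric markers: the same method serves the opening and closing tag.
  def start_and_end_strong
    @markdown << "**"
  end

  def start_and_end_em
    @markdown << "_"
  end

  # Paragraphs only emit output when they close.
  def end_p
    @markdown << "\n\n"
  end

  private

  # Prefer the shared handler, then the specific one; unknown tags are dropped.
  def dispatch(shared, specific)
    if respond_to?(shared)
      send(shared)
    elsif respond_to?(specific)
      send(specific)
    end
  end
end

handler = TinyHandler.new
handler.start_element("p")
handler.start_element("strong")
handler.characters("bold")
handler.end_element("strong")
handler.characters(" and ")
handler.start_element("em")
handler.characters("soft")
handler.end_element("em")
handler.end_element("p")
puts handler.markdown
```

With this layout, supporting a new tag only requires defining a method with the matching name. No central `case` statement has to change.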
metadata ADDED
@@ -0,0 +1,54 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: html2md
3
+ version: !ruby/object:Gem::Version
4
+ version: '0.1'
5
+ prerelease:
6
+ platform: ruby
7
+ authors:
8
+ - Paul Morton
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2012-03-18 00:00:00.000000000 Z
13
+ dependencies: []
14
+ description: ! ' Converts Basic HTML to markdown
15
+
16
+ '
17
+ email: geeksitk@gmail.com
18
+ executables: []
19
+ extensions: []
20
+ extra_rdoc_files: []
21
+ files:
22
+ - README.md
23
+ - Rakefile
24
+ - LICENSE.md
25
+ - lib/html2md/document.rb
26
+ - lib/html2md/VERSION.rb
27
+ - lib/html2md.rb
28
+ - features/markdown.feature
29
+ - features/step_definitions/markdown_steps.rb
30
+ homepage: http://github.com/pmorton/html2md
31
+ licenses: []
32
+ post_install_message:
33
+ rdoc_options: []
34
+ require_paths:
35
+ - lib
36
+ required_ruby_version: !ruby/object:Gem::Requirement
37
+ none: false
38
+ requirements:
39
+ - - ! '>='
40
+ - !ruby/object:Gem::Version
41
+ version: '0'
42
+ required_rubygems_version: !ruby/object:Gem::Requirement
43
+ none: false
44
+ requirements:
45
+ - - ! '>='
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ requirements: []
49
+ rubyforge_project:
50
+ rubygems_version: 1.8.15
51
+ signing_key:
52
+ specification_version: 3
53
+ summary: A library for converting basic html to markdown
54
+ test_files: []