docx_converter 1.0.0

@@ -0,0 +1,7 @@
1
+ *.gem
2
+ .bundle
3
+ Gemfile.lock
4
+ pkg/*
5
+ *~
6
+ *.swp
7
+ *.kate-swp
data/Gemfile ADDED
@@ -0,0 +1,3 @@
1
+ source "http://rubygems.org"
2
+
3
+ gemspec
@@ -0,0 +1,57 @@
1
+ docx_converter
2
+ =================
3
+
4
+ This Ruby library (gem) parses and translates `.docx` Word documents into kramdown syntax, which allows for easy subsequent translation into `html` or `TeX` code via the excellent `kramdown` library. `kramdown` is a superset of `Markdown`. See http://kramdown.gettalong.org/ for more details.
5
+
6
+ A `.docx` file as written by modern versions of Microsoft Office is just a `.zip` file in disguise. It contains a directory tree of XML files. Parsing these compressed XML trees is rather straightforward, thanks to the `rubyzip` and `nokogiri` Ruby libraries.
7
+
8
+ `docx_converter` contains a parser that translates the most common Word document elements into the corresponding `kramdown` syntax. It also extracts embedded images and converts them to `.jpg` files, scaled down so that neither width nor height exceeds 800 pixels.
9
+
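The 800-pixel limit amounts to scaling an image so that its longer side is at most 800 pixels while the aspect ratio is preserved. The following is a minimal sketch of that arithmetic in plain Ruby. `fit_within` is a hypothetical helper written for illustration, not part of the gem, which delegates the resizing to RMagick; the exact rounding of the image library may differ.

```ruby
# Hypothetical helper: scale (width, height) to fit inside a max x max box
# while preserving the aspect ratio. Images already inside the box are
# returned unchanged, because only oversized images are resized.
def fit_within(width, height, max = 800)
  return [width, height] if width <= max && height <= max

  scale = max.to_f / [width, height].max
  [(width * scale).round, (height * scale).round]
end

fit_within(1600, 1200) # => [800, 600]
fit_within(640, 480)   # => [640, 480]
```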
10
+ Output files and directories are created according to the `webgen` conventions, which is useful if you want to generate a static website with the `webgen` gem after converting your `.docx` file into `html`. File names follow the pattern `ss.nnnn.ll.page`, where `ss` is a 2-digit sort number, `nnnn` is the main file name, and `ll` is the language code. For more information on `webgen`, see http://webgen.gettalong.org/
11
+
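As an illustration of the `ss.nnnn.ll.page` pattern, here is a small sketch that assembles such a file name. `webgen_page_name` is a hypothetical helper introduced only for this example; it is not a function of the gem or of `webgen`.

```ruby
# Hypothetical helper illustrating the `ss.nnnn.ll.page` naming pattern:
# a zero-padded 2-digit sort number, the main file name, the language code.
def webgen_page_name(sort_number, name, language)
  format("%02d.%s.%s.page", sort_number, name, language)
end

webgen_page_name(1, "chapter01", "en") # => "01.chapter01.en.page"
```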
12
+ `docx_converter` was written for our project `publishr_web`, see http://documentation.red-e.eu/publishr/index.html
13
+
14
+ Supported Word elements:
15
+
16
+ * Paragraph
17
+ * Line break
18
+ * Page break
19
+ * Bold
20
+ * Italic
21
+ * Paragraph styles like Heading1, Heading2 and Title
22
+ * Character styles like Strong and Quote
23
+ * Footnotes
24
+ * Images including captions
25
+ * Non-breaking spaces
26
+
27
+ Installation
28
+ ----------
29
+
30
+ `gem install docx_converter`
31
+
32
+ Usage
33
+ ----------
34
+
35
+ From the command line:
36
+
37
+ `docx_converter` `inputfile` `format` `output_directory`
38
+
39
+ `format` can be either `kramdown`, `html` or `latex`. For example:
40
+
41
+ `docx_converter` `~/Downloads/testdoc1.docx` `latex` `/tmp/docxoutput`
42
+
43
+ `output_directory` will be created if it doesn't exist. The output files are written to a `src` subdirectory inside it, which merely follows the `webgen` file system standard.
44
+
45
+ If you want to use `docx_converter` from a Ruby script, you can use the API like this:
46
+
47
+ r = DocxConverter::Render.new(options)
48
+ rendered_filepaths = r.render(:html)
49
+
50
+ `options` is a hash with the following keys:
51
+
52
+ * `:output_dir`: The directory into which the output files are written. It will be created if it doesn't exist. Note that the command-line tool appends `src` to the directory you pass it, following the `webgen` file system standard; the API uses `:output_dir` exactly as given.
53
+ * `:inputfile`: The path to the `.docx` file to be parsed
54
+ * `:image_subdir_filesystem`: The subdirectory name into which images will be put. It will be created below the `/src` subdirectory.
55
+ * `:image_subdir_kramdown`: Usually this is identical to `:image_subdir_filesystem` and should only be different when you do further manual postprocessing with the kramdown output. This string will be added as a prefix for images in the final kramdown output. An example: `![image description](/image_subdir_kramdown/imagename.jpg)`.
56
+ * `:language`: The language to be used for the generated file names. See `webgen` conventions above.
57
+ * `:split_chapters`: when `true`, the output files will be split between headings which have the Word paragraph style "Heading1". This is useful for large documents. When `false`, no splitting is done and all content will be output to the file `01.chapter01.ll.page`. Footnotes will be split correctly into the various chapters.
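The option keys listed above can be assembled with a few defaults before they are handed to the renderer. This is a sketch only: `DEFAULT_OPTIONS`, `REQUIRED_KEYS` and `build_options` are hypothetical names, and the default values are an assumption that simply mirrors what the command-line script passes.

```ruby
# Hypothetical defaults; they mirror the values used by the command-line script.
DEFAULT_OPTIONS = {
  :image_subdir_filesystem => "images",
  :image_subdir_kramdown   => "images",
  :language                => "en",
  :split_chapters          => true
}.freeze

REQUIRED_KEYS = [:inputfile, :output_dir].freeze

# Merge caller-supplied options over the defaults and check that the
# required keys are present.
def build_options(overrides)
  options = DEFAULT_OPTIONS.merge(overrides)
  missing = REQUIRED_KEYS.reject { |key| options.key?(key) }
  raise ArgumentError, "missing options: #{missing.join(', ')}" unless missing.empty?
  options
end

options = build_options(:inputfile => "testdoc1.docx", :output_dir => "/tmp/docxoutput/src")
# The resulting hash could then be passed to DocxConverter::Render.new(options).
```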
@@ -0,0 +1 @@
1
+ require "bundler/gem_tasks"
@@ -0,0 +1,51 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ # docx_converter -- Converts docx files into html or LaTeX via the kramdown syntax
4
+ # Copyright (C) 2013 Red (E) Tools Ltd. (www.thebigrede.net)
5
+ #
6
+ # This program is free software: you can redistribute it and/or modify
7
+ # it under the terms of the GNU Affero General Public License as
8
+ # published by the Free Software Foundation, either version 3 of the
9
+ # License, or (at your option) any later version.
10
+ #
11
+ #
12
+ # This program is distributed in the hope that it will be useful,
13
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
14
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15
+ # GNU Affero General Public License for more details.
16
+ #
17
+ # You should have received a copy of the GNU Affero General Public License
18
+ # along with this program. If not, see <http://www.gnu.org/licenses/>.
19
+
20
+ $LOAD_PATH.unshift File.expand_path('../../lib', __FILE__)
21
+
22
+ require 'docx_converter'
23
+
24
+ if ARGV[2].nil?
25
+ puts 'Error: Arguments missing! See https://github.com/michaelfranzl/docx_converter or README.md for documentation.'
26
+ Process.exit!
27
+ end
28
+
29
+ options = {
30
+ :inputfile => ARGV[0],
31
+ :image_subdir_filesystem => "images",
32
+ :image_subdir_kramdown => "images",
33
+ :output_dir => File.join(ARGV[2], "src"),
34
+ :language => "en",
35
+ :split_chapters => true
36
+ }
37
+
38
+ output_format = ARGV[1].to_sym
39
+
40
+ supported_output_formats = [:kramdown, :html, :latex]
41
+ unless supported_output_formats.include? output_format
42
+ puts "Output format must be one of #{ supported_output_formats.join(", ") }. Exiting."
43
+ Process.exit!
44
+ end
45
+
46
+ r = DocxConverter::Render.new(options)
47
+ rendered_filepaths = r.render(output_format)
48
+
49
+ if rendered_filepaths.any?
50
+ puts "Rendered files #{ rendered_filepaths.join(", ") } in directory #{ File.join(ARGV[2], "src") }"
51
+ end
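The argument handling in the script above can also be expressed as a function that returns the options and the output format, which makes it easier to exercise without a real `.docx` file. This is a sketch under assumptions: `parse_cli_args` and `SUPPORTED_FORMATS` are hypothetical names, and raising `ArgumentError` instead of printing and exiting is a choice made for this example.

```ruby
SUPPORTED_FORMATS = [:kramdown, :html, :latex].freeze

# Hypothetical helper: validate ARGV-style arguments and return
# [options, output_format]. Raises ArgumentError on missing arguments
# or on an unsupported format.
def parse_cli_args(argv)
  inputfile, format, output_directory = argv
  raise ArgumentError, "expected: inputfile format output_directory" if output_directory.nil?

  output_format = format.to_sym
  unless SUPPORTED_FORMATS.include?(output_format)
    raise ArgumentError, "format must be one of #{SUPPORTED_FORMATS.join(', ')}"
  end

  options = {
    :inputfile  => inputfile,
    :output_dir => File.join(output_directory, "src") # webgen-style src directory
  }
  [options, output_format]
end
```

A caller could rescue `ArgumentError`, print the message and exit with a non-zero status.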
@@ -0,0 +1,26 @@
1
+ # -*- encoding: utf-8 -*-
2
+ $:.push File.expand_path("../lib", __FILE__)
3
+ require "docx_converter/version"
4
+
5
+ Gem::Specification.new do |s|
6
+ s.name = "docx_converter"
7
+ s.version = DocxConverter::VERSION
8
+ s.authors = ["Michael Franzl"]
9
+ s.email = ["office@michaelfranzl.com"]
10
+ s.homepage = "https://github.com/michaelfranzl/docx_converter"
11
+ s.summary = %q{Converts Word docx files into html or LaTeX via the kramdown syntax}
12
+ s.description = %q{Converts Word docx files into html or LaTeX via the kramdown syntax. It supports Word's most common paragraph, character and mixed styles (Title, Heading, Strong, Quote), footnotes, line breaks, page breaks, non-breaking spaces and images with captions. The output is in kramdown syntax (see http://kramdown.gettalong.org/) which can be converted into beautiful html and LaTeX code.}
13
+ s.rubyforge_project = "docx_converter"
14
+
15
+ s.files = `git ls-files`.split("\n")
16
+ s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
17
+ s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
18
+ s.require_paths = ["lib"]
19
+
20
+ # specify any dependencies here; for example:
21
+ s.add_runtime_dependency 'kramdown'
22
+ s.add_runtime_dependency 'nokogiri'
23
+ s.add_runtime_dependency 'rubyzip'
24
+ s.add_runtime_dependency 'rmagick'
25
+ s.add_runtime_dependency 'ruby-filemagic'
26
+ end
@@ -0,0 +1,28 @@
1
+ # docx_converter -- Converts Word docx files into html or LaTeX via the kramdown syntax
2
+ # Copyright (C) 2013 Red (E) Tools Ltd. (www.thebigrede.net)
3
+ #
4
+ # This program is free software: you can redistribute it and/or modify
5
+ # it under the terms of the GNU Affero General Public License as
6
+ # published by the Free Software Foundation, either version 3 of the
7
+ # License, or (at your option) any later version.
8
+ #
9
+ #
10
+ # This program is distributed in the hope that it will be useful,
11
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
12
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
13
+ # GNU Affero General Public License for more details.
14
+ #
15
+ # You should have received a copy of the GNU Affero General Public License
16
+ # along with this program. If not, see <http://www.gnu.org/licenses/>.
17
+
18
+ require 'fileutils'
19
+ require 'kramdown'
20
+ require 'nokogiri'
21
+ require 'zip/zipfilesystem'
22
+ require 'filemagic'
23
+ require 'RMagick'
24
+
25
+ dir = File.dirname(__FILE__)
26
+ Dir[File.expand_path("#{dir}/docx_converter/*.rb")].uniq.each do |file|
27
+ require file
28
+ end
@@ -0,0 +1,241 @@
1
+ # encoding: UTF-8
2
+
3
+ # docx_converter -- Converts Word docx files into html or LaTeX via the kramdown syntax
4
+ # Copyright (C) 2013 Red (E) Tools Ltd. (www.thebigrede.net)
5
+ #
6
+ # This program is free software: you can redistribute it and/or modify
7
+ # it under the terms of the GNU Affero General Public License as
8
+ # published by the Free Software Foundation, either version 3 of the
9
+ # License, or (at your option) any later version.
10
+ #
11
+ #
12
+ # This program is distributed in the hope that it will be useful,
13
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
14
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15
+ # GNU Affero General Public License for more details.
16
+ #
17
+ # You should have received a copy of the GNU Affero General Public License
18
+ # along with this program. If not, see <http://www.gnu.org/licenses/>.
19
+
20
+ module DocxConverter
21
+ class Parser
22
+ def initialize(options)
23
+ @output_dir = options[:output_dir]
24
+ @docx_filepath = options[:inputfile]
25
+
26
+ @image_subdir_filesystem = options[:image_subdir_filesystem]
27
+ @image_subdir_kramdown = options[:image_subdir_kramdown]
28
+
29
+ @relationships_hash = {}
30
+
31
+ @zipfile = Zip::ZipFile.new(@docx_filepath)
32
+ end
33
+
34
+ def parse
35
+ document_xml = unzip_read("word/document.xml")
36
+ footnotes_xml = unzip_read("word/footnotes.xml")
37
+ relationships_xml = unzip_read("word/_rels/document.xml.rels")
38
+
39
+ content = Nokogiri::XML(document_xml)
40
+ footnotes = Nokogiri::XML(footnotes_xml)
41
+ relationships = Nokogiri::XML(relationships_xml)
42
+
43
+ @relationships_hash = parse_relationships(relationships)
44
+
45
+ footnote_definitions = parse_footnotes(footnotes)
46
+ output_content = parse_content(content.elements.first,0)
47
+
48
+ return {
49
+ :content => output_content,
50
+ :footnote_definitions => footnote_definitions
51
+ }
52
+ end
53
+
54
+ private
55
+
56
+ def unzip_read(zip_path)
57
+ file = @zipfile.find_entry(zip_path)
58
+ contents = ""
59
+ file.get_input_stream do |f|
60
+ contents = f.read
61
+ end
62
+ return contents
63
+ end
64
+
65
+ # this is only needed for embedded images
66
+ def extract_image(zip_path)
67
+ file_contents = unzip_read(zip_path)
68
+ extract_basename = File.basename(zip_path, ".*") + ".jpg"
69
+ extract_path = File.join(@output_dir, @image_subdir_filesystem, extract_basename)
70
+
71
+ fm = FileMagic.new
72
+ filetype = fm.buffer(file_contents)
73
+ case filetype
74
+ when /^JPEG image data/, /^PNG image data/
75
+ img = Magick::Image.from_blob(file_contents)[0]
76
+ if img.columns > 800 || img.rows > 800
77
+ img.resize_to_fit!(800)
78
+ end
79
+ ret = img.write(extract_path) {
80
+ self.format = "JPEG"
81
+ self.quality = 80
82
+ }
83
+ end
84
+ if @image_subdir_kramdown.empty?
85
+ kramdown_path = extract_basename
86
+ else
87
+ kramdown_path = File.join(@image_subdir_kramdown, extract_basename)
88
+ end
89
+ return kramdown_path
90
+ end
91
+
92
+ def parse_relationships(relationships)
93
+ output = {}
94
+ relationships.children.first.children.each do |rel|
95
+ rel_id = rel.attributes["Id"].value
96
+ rel_target = rel.attributes["Target"].value
97
+ output[rel_id] = rel_target
98
+ end
99
+ return output
100
+ end
101
+
102
+ def parse_footnotes(node)
103
+ output = {}
104
+ node.xpath("//w:footnote").each do |fnode|
105
+ footnote_number = fnode.attributes["id"].value
106
+ if ["-1", "0"].include?(footnote_number)
107
+ # Word outputs -1 and 0 as 'magic' footnotes
108
+ next
109
+ end
110
+ output[footnote_number] = parse_content(fnode,0).strip
111
+ end
112
+ return output
113
+ end
114
+
115
+ def parse_content(node,depth)
116
+ output = ""
117
+ depth += 1
118
+ children_count = node.children.length
119
+ i = 0
120
+
121
+ while i < children_count
122
+ add = ""
123
+ nd = node.children[i]
124
+
125
+ case nd.name
126
+ when "body"
127
+ # This is just a container element.
128
+ add = parse_content(nd,depth)
129
+
130
+ when "document"
131
+ # This is just a container element.
132
+ add = parse_content(nd,depth)
133
+
134
+ when "p"
135
+ # This is a paragraph. In kramdown, paragraphs are separated by an empty line.
136
+ add = parse_content(nd,depth) + "\n\n"
137
+
138
+ when "pPr"
139
+ # This is Word's paragraph-level preset
140
+ add = parse_content(nd,depth)
141
+
142
+ when "pStyle"
143
+ # This is a reference to one of Word's paragraph-level styles
144
+ case nd["w:val"]
145
+ when "Title"
146
+ add = "{: .class = 'title' }\n"
147
+ when "Heading1"
148
+ add = "# "
149
+ when "Heading2"
150
+ add = "## "
151
+ when "Quote"
152
+ add = "> "
153
+ end
154
+
155
+ when "r"
156
+ # This corresponds to Word's character/inline node. Word's XML does not nest formatting, so we cannot descend recursively and 'close' kramdown's formatting on the way back up. Instead, we have to look ahead to see whether this node is formatted and, if so, set a formatting prefix and postfix as required by kramdown (e.g. **bold**).
157
+ prefix = postfix = ""
158
+ first_child = nd.children.first
159
+
160
+ case first_child.name
161
+ when "rPr"
162
+ # This inline node is formatted. The first child always specifies the formatting of the subsequent 't' (text) node.
163
+ format_node = first_child.children.first
164
+ case format_node.name
165
+ when "b"
166
+ # This is regular (non-style) bold
167
+ prefix = postfix = "**"
168
+ when "i"
169
+ # This is regular (non-style) italic
170
+ prefix = postfix = "*"
171
+ when "rStyle"
172
+ # This is a reference to one of Word's style names
173
+ case format_node.attributes["val"].value
174
+ when "Strong"
175
+ # "Strong" is a predefined Word style
176
+ # This node is missing the xml:space="preserve" attribute, so we need to set the spaces ourselves.
177
+ prefix = " **"
178
+ postfix = "** "
179
+ end
180
+ end
181
+ add = prefix + parse_content(nd,depth) + postfix
182
+ when "br"
183
+ if first_child.attributes.empty?
184
+ # This is a line break. In kramdown, this corresponds to two spaces followed by a newline.
185
+ add = "  \n"
186
+ elsif first_child.attributes["type"] && first_child.attributes["type"].value == "page"
187
+ # this is a Word page break
188
+ add = "<br style='page-break-before:always;'>"
189
+ end
190
+
191
+ else
192
+ add = parse_content(nd,depth)
193
+ end
194
+
195
+
196
+ when "t"
197
+ # this is a regular text node
198
+ add = nd.text
199
+
200
+ when "footnoteReference"
201
+ # output the Kramdown footnote syntax
202
+ footnote_number = nd.attributes["id"].value
203
+ add = "[^#{ footnote_number }]"
204
+
205
+ when "tbl"
206
+ # parse the table recursively
207
+ add = parse_content(nd,depth)
208
+
209
+ when "tr"
210
+ # select all paragraph nodes below the table row and render them into Kramdown syntax
211
+ table_paragraphs = nd.xpath(".//w:p")
212
+ td = []
213
+ table_paragraphs.each do |tp|
214
+ td << parse_content(tp,depth)
215
+ end
216
+ add = "|" + td.join("|") + "|\n"
217
+
218
+ when "drawing"
219
+ image_nodes = nd.xpath(".//a:blip", :a => 'http://schemas.openxmlformats.org/drawingml/2006/main')
220
+ image_node = image_nodes.first
221
+ image_id = image_node.attributes["embed"].value
222
+ image_path_zip = File.join("word", @relationships_hash[image_id])
223
+
224
+ extracted_imagename = extract_image(image_path_zip)
225
+
226
+ add = "![](#{ extracted_imagename })\n"
227
+ else
228
+ # ignore those nodes
229
+ # puts ' ' * depth + "ELSE: #{ nd.name }"
230
+ end
231
+
232
+ output += add
233
+ i += 1
234
+ end
235
+
236
+ depth -= 1
237
+ return output
238
+ end
239
+
240
+ end
241
+ end
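The prefix/postfix approach used for formatted runs, and the way a table row is joined with pipes, can be illustrated without any XML. In this sketch each run is modelled as a plain hash; that model, `MARKERS`, `runs_to_kramdown` and `row_to_kramdown` are all assumptions made for illustration and are not part of the parser.

```ruby
# Simplified stand-in for Word's inline runs: each run is a hash with a
# :text and an optional :format (:bold or :italic). Not the gem's API.
MARKERS = { :bold => "**", :italic => "*" }.freeze

# Wrap each run's text in the kramdown marker for its format. Because the
# runs are flat rather than nested, the marker is applied as both prefix
# and postfix of the same run.
def runs_to_kramdown(runs)
  runs.map do |run|
    marker = MARKERS.fetch(run[:format], "")
    marker + run[:text] + marker
  end.join
end

# Join already-rendered cell texts into one kramdown table row.
def row_to_kramdown(cells)
  "|" + cells.join("|") + "|\n"
end

runs = [
  { :text => "Plain, " },
  { :text => "bold", :format => :bold },
  { :text => " and " },
  { :text => "italic", :format => :italic }
]
runs_to_kramdown(runs) # => "Plain, **bold** and *italic*"
```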
@@ -0,0 +1,75 @@
1
+ # encoding: UTF-8
2
+
3
+ # docx_converter -- Converts Word docx files into html or LaTeX via the kramdown syntax
4
+ # Copyright (C) 2013 Red (E) Tools Ltd. (www.thebigrede.net)
5
+ #
6
+ # This program is free software: you can redistribute it and/or modify
7
+ # it under the terms of the GNU Affero General Public License as
8
+ # published by the Free Software Foundation, either version 3 of the
9
+ # License, or (at your option) any later version.
10
+ #
11
+ #
12
+ # This program is distributed in the hope that it will be useful,
13
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
14
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15
+ # GNU Affero General Public License for more details.
16
+ #
17
+ # You should have received a copy of the GNU Affero General Public License
18
+ # along with this program. If not, see <http://www.gnu.org/licenses/>.
19
+
20
+ module DocxConverter
21
+ class PostProcessor
22
+
23
+ def initialize(content, footnote_definitions)
24
+ @content = content
25
+ @footnote_definitions = footnote_definitions
26
+
27
+ @chapters = []
28
+ end
29
+
30
+ # getter method
31
+ def chapters
32
+ return @chapters
33
+ end
34
+
35
+ def join_blockquotes
36
+ lines = @content.split("\n")
37
+ processed_lines = []
38
+ lines.size.times do |i|
39
+ if i > 0 && /^> /.match(lines[i-1]) && /^> /.match(lines[i+1])
40
+ processed_lines << ">" + lines[i]
41
+ else
42
+ processed_lines << lines[i]
43
+ end
44
+ end
45
+ @content = processed_lines.join("\n")
46
+ @chapters[0] = @content
47
+ return @content
48
+ end
49
+
50
+ def split_into_chapters
51
+ chapter_number = 0
52
+ @chapters[chapter_number] = ""
53
+ @content.split("\n").each do |line|
54
+ if /^# /.match(line)
55
+ # this is the style Heading1. A new chapter begins here.
56
+ chapter_number += 1
57
+ @chapters[chapter_number] = ""
58
+ end
59
+ @chapters[chapter_number] += line + "\n"
60
+ end
61
+ return @chapters
62
+ end
63
+
64
+ def add_foonote_definitions
65
+ @chapters.size.times do |n|
66
+ footnote_ids = @chapters[n].scan(/\[\^(.+?)\]/).flatten
67
+ @chapters[n] += "\n\n"
68
+ footnote_ids.each do |i|
69
+ @chapters[n] += "[^#{ i }]: #{ @footnote_definitions[i] }\n\n"
70
+ end
71
+ end
72
+ end
73
+
74
+ end
75
+ end
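Chapter splitting and per-chapter footnote handling can be sketched as two small functions. This is an illustration under assumptions rather than the class above: `split_chapters` and `with_footnotes` are hypothetical names, and unlike the original, this sketch emits each footnote definition only once per chapter even if it is referenced several times.

```ruby
# Split kramdown text into chapters at level-1 headings ("# ").
# Index 0 holds any text that precedes the first heading.
def split_chapters(content)
  chapters = [""]
  content.each_line do |line|
    chapters << "" if line.start_with?("# ")
    chapters[-1] += line
  end
  chapters
end

# Append to a chapter only the footnote definitions it references.
# References are deduplicated (a deviation chosen for this sketch).
def with_footnotes(chapter, definitions)
  ids = chapter.scan(/\[\^(.+?)\]/).flatten.uniq
  chapter + ids.map { |id| "\n[^#{id}]: #{definitions[id]}\n" }.join
end

content = "Intro\n# One\nText[^1]\n# Two\nMore[^2]\n"
chapters = split_chapters(content)
# chapters => ["Intro\n", "# One\nText[^1]\n", "# Two\nMore[^2]\n"]
with_footnotes(chapters[1], "1" => "first", "2" => "second")
# => "# One\nText[^1]\n\n[^1]: first\n"
```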
@@ -0,0 +1,119 @@
1
+ # encoding: UTF-8
2
+
3
+ # docx_converter -- Converts Word docx files into html or LaTeX via the kramdown syntax
4
+ # Copyright (C) 2013 Red (E) Tools Ltd. (www.thebigrede.net)
5
+ #
6
+ # This program is free software: you can redistribute it and/or modify
7
+ # it under the terms of the GNU Affero General Public License as
8
+ # published by the Free Software Foundation, either version 3 of the
9
+ # License, or (at your option) any later version.
10
+ #
11
+ #
12
+ # This program is distributed in the hope that it will be useful,
13
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
14
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15
+ # GNU Affero General Public License for more details.
16
+ #
17
+ # You should have received a copy of the GNU Affero General Public License
18
+ # along with this program. If not, see <http://www.gnu.org/licenses/>.
19
+
20
+ module DocxConverter
21
+ class Render
22
+
23
+ def initialize(options)
24
+ @options = options
25
+ @output_dir = options[:output_dir]
26
+
27
+ @chapters = nil
28
+ end
29
+
30
+ def render(output_format)
31
+ FileUtils.mkdir_p(File.join(@output_dir, @options[:image_subdir_filesystem]))
32
+
33
+ output = DocxConverter::Parser.new(@options).parse
34
+
35
+ content = output[:content]
36
+ footnote_definitions = output[:footnote_definitions]
37
+
38
+ p = DocxConverter::PostProcessor.new(content, footnote_definitions)
39
+ p.join_blockquotes
40
+
41
+ if @options[:split_chapters] == true
42
+ p.split_into_chapters
43
+ end
44
+
45
+ p.add_foonote_definitions
46
+ @chapters = p.chapters
47
+
48
+ case output_format
49
+ when :kramdown
50
+ filepaths = render_kramdown
51
+ when :html
52
+ filepaths = render_html
53
+ when :latex
54
+ filepaths = render_latex
55
+ else
56
+ filepaths = []
57
+ end
58
+ return filepaths
59
+ end
60
+
61
+ private
62
+
63
+ def render_kramdown
64
+ # Output uses the .page file extension. This is merely a convention taken from the webgen file system structure.
65
+ rendered_kramdown_file_paths = []
66
+ if @options[:split_chapters] == true
67
+ @chapters.size.times do |n|
68
+ filename = "%02i.chapter%02i.%s.page" % [n, n, @options[:language]]
69
+ file_path = File.join(@output_dir, filename)
70
+ File.write(file_path, @chapters[n])
71
+ rendered_kramdown_file_paths << filename
72
+ end
73
+ else
74
+ filename = "01.chapter01.%s.page" % [@options[:language]]
75
+ file_path = File.join(@output_dir, filename)
76
+ File.write(file_path, @chapters[0])
77
+ rendered_kramdown_file_paths << filename
78
+ end
79
+ return rendered_kramdown_file_paths
80
+ end
81
+
82
+ def render_html
83
+ rendered_kramdown_file_paths = render_kramdown
84
+ rendered_html_file_paths = []
85
+ rendered_kramdown_file_paths.each do |kfp|
86
+ filename = kfp.gsub("page", "html")
87
+ file_path = File.join(@output_dir, filename)
88
+ kramdown = File.read(File.join(@output_dir, kfp))
89
+ html = Kramdown::Document.new(
90
+ kramdown,
91
+ :input => 'kramdown',
92
+ :line_width => 100000
93
+ ).to_html
94
+ File.write(file_path, html)
95
+ rendered_html_file_paths << filename
96
+ end
97
+ return rendered_html_file_paths
98
+ end
99
+
100
+ def render_latex
101
+ rendered_kramdown_file_paths = render_kramdown
102
+ rendered_latex_file_paths = []
103
+ rendered_kramdown_file_paths.each do |kfp|
104
+ filename = kfp.gsub("page", "tex")
105
+ file_path = File.join(@output_dir, filename)
106
+ kramdown = File.read(File.join(@output_dir, kfp))
107
+ latex = Kramdown::Document.new(
108
+ kramdown,
109
+ :input => 'kramdown',
110
+ :line_width => 100000
111
+ ).to_latex
112
+ File.write(file_path, latex)
113
+ rendered_latex_file_paths << filename
114
+ end
115
+ return rendered_latex_file_paths
116
+ end
117
+
118
+ end
119
+ end
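When an `.html` or `.tex` file name is derived from a `.page` file name, replacing only the trailing extension avoids touching other occurrences of the word "page" in the name. The file names generated above (`chapterNN`) are not affected by this, so the following is only a defensive sketch; `swap_page_extension` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper: replace only a trailing ".page" extension and leave
# any other occurrence of "page" in the file name untouched.
def swap_page_extension(filename, new_extension)
  filename.sub(/\.page\z/, ".#{new_extension}")
end

swap_page_extension("01.chapter01.en.page", "html") # => "01.chapter01.en.html"
swap_page_extension("02.homepage.en.page", "tex")   # => "02.homepage.en.tex"
```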
@@ -0,0 +1,20 @@
1
+ # docx_converter -- Converts Word docx files into html or LaTeX via the kramdown syntax
2
+ # Copyright (C) 2013 Red (E) Tools Ltd. (www.thebigrede.net)
3
+ #
4
+ # This program is free software: you can redistribute it and/or modify
5
+ # it under the terms of the GNU Affero General Public License as
6
+ # published by the Free Software Foundation, either version 3 of the
7
+ # License, or (at your option) any later version.
8
+ #
9
+ #
10
+ # This program is distributed in the hope that it will be useful,
11
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
12
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
13
+ # GNU Affero General Public License for more details.
14
+ #
15
+ # You should have received a copy of the GNU Affero General Public License
16
+ # along with this program. If not, see <http://www.gnu.org/licenses/>.
17
+
18
+ module DocxConverter
19
+ VERSION = "1.0.0"
20
+ end
metadata ADDED
@@ -0,0 +1,141 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: docx_converter
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.0
5
+ prerelease:
6
+ platform: ruby
7
+ authors:
8
+ - Michael Franzl
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2013-12-30 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: kramdown
16
+ requirement: !ruby/object:Gem::Requirement
17
+ none: false
18
+ requirements:
19
+ - - ! '>='
20
+ - !ruby/object:Gem::Version
21
+ version: '0'
22
+ type: :runtime
23
+ prerelease: false
24
+ version_requirements: !ruby/object:Gem::Requirement
25
+ none: false
26
+ requirements:
27
+ - - ! '>='
28
+ - !ruby/object:Gem::Version
29
+ version: '0'
30
+ - !ruby/object:Gem::Dependency
31
+ name: nokogiri
32
+ requirement: !ruby/object:Gem::Requirement
33
+ none: false
34
+ requirements:
35
+ - - ! '>='
36
+ - !ruby/object:Gem::Version
37
+ version: '0'
38
+ type: :runtime
39
+ prerelease: false
40
+ version_requirements: !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ! '>='
44
+ - !ruby/object:Gem::Version
45
+ version: '0'
46
+ - !ruby/object:Gem::Dependency
47
+ name: rubyzip
48
+ requirement: !ruby/object:Gem::Requirement
49
+ none: false
50
+ requirements:
51
+ - - ! '>='
52
+ - !ruby/object:Gem::Version
53
+ version: '0'
54
+ type: :runtime
55
+ prerelease: false
56
+ version_requirements: !ruby/object:Gem::Requirement
57
+ none: false
58
+ requirements:
59
+ - - ! '>='
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ - !ruby/object:Gem::Dependency
63
+ name: rmagick
64
+ requirement: !ruby/object:Gem::Requirement
65
+ none: false
66
+ requirements:
67
+ - - ! '>='
68
+ - !ruby/object:Gem::Version
69
+ version: '0'
70
+ type: :runtime
71
+ prerelease: false
72
+ version_requirements: !ruby/object:Gem::Requirement
73
+ none: false
74
+ requirements:
75
+ - - ! '>='
76
+ - !ruby/object:Gem::Version
77
+ version: '0'
78
+ - !ruby/object:Gem::Dependency
79
+ name: ruby-filemagic
80
+ requirement: !ruby/object:Gem::Requirement
81
+ none: false
82
+ requirements:
83
+ - - ! '>='
84
+ - !ruby/object:Gem::Version
85
+ version: '0'
86
+ type: :runtime
87
+ prerelease: false
88
+ version_requirements: !ruby/object:Gem::Requirement
89
+ none: false
90
+ requirements:
91
+ - - ! '>='
92
+ - !ruby/object:Gem::Version
93
+ version: '0'
94
+ description: Converts Word docx files into html or LaTeX via the kramdown syntax.
95
+ It supports Word's most common paragraph, character and mixed styles (Title, Heading,
96
+ Strong, Quote), footnotes, line breaks, page breaks, non-breaking spaces and images
97
+ with captions. The output is in kramdown syntax (see http://kramdown.gettalong.org/)
98
+ which can be converted into beautiful html and LaTeX code.
99
+ email:
100
+ - office@michaelfranzl.com
101
+ executables:
102
+ - docx_converter
103
+ extensions: []
104
+ extra_rdoc_files: []
105
+ files:
106
+ - .gitignore
107
+ - Gemfile
108
+ - README.md
109
+ - Rakefile
110
+ - bin/docx_converter
111
+ - docx_converter.gemspec
112
+ - lib/docx_converter.rb
113
+ - lib/docx_converter/parser.rb
114
+ - lib/docx_converter/postprocessor.rb
115
+ - lib/docx_converter/render.rb
116
+ - lib/docx_converter/version.rb
117
+ homepage: https://github.com/michaelfranzl/docx_converter
118
+ licenses: []
119
+ post_install_message:
120
+ rdoc_options: []
121
+ require_paths:
122
+ - lib
123
+ required_ruby_version: !ruby/object:Gem::Requirement
124
+ none: false
125
+ requirements:
126
+ - - ! '>='
127
+ - !ruby/object:Gem::Version
128
+ version: '0'
129
+ required_rubygems_version: !ruby/object:Gem::Requirement
130
+ none: false
131
+ requirements:
132
+ - - ! '>='
133
+ - !ruby/object:Gem::Version
134
+ version: '0'
135
+ requirements: []
136
+ rubyforge_project: docx_converter
137
+ rubygems_version: 1.8.23
138
+ signing_key:
139
+ specification_version: 3
140
+ summary: Converts Word docx files into html or LaTeX via the kramdown syntax
141
+ test_files: []