omnidocx 0.1.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: eb2d9b3c0660ef87f01bbae03091086530c53356
4
+ data.tar.gz: 06199961c7335fc951154813e6f20ba7905f3cce
5
+ SHA512:
6
+ metadata.gz: 0db84fb799547a87ed70d396959846117f9a8fea6f5426e0a42f72b12b889866f685cfcfad71e86be6d468cbbfc36a8ac420590d1bb2b158e15996ae655972e2
7
+ data.tar.gz: e079825e879615772658e23cd28e7adeee0177e8fab882f3190b258434bf395d19a5e5b24b698de007994a2b25575404d9ce00e2bf2822c9bd7a6995a13c965b
Binary file
@@ -0,0 +1,9 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --format documentation
2
+ --color
@@ -0,0 +1,5 @@
1
+ sudo: false
2
+ language: ruby
3
+ rvm:
4
+ - 2.2.0
5
+ before_install: gem install bundler -v 1.13.6
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at parth.nagori@idfy.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in omnidocx.gemspec
4
+ gemspec
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2017 TODO: Write your name
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
@@ -0,0 +1,107 @@
1
+ # Omnidocx
2
+
3
+ Omnidocx is a Ruby gem that lets you merge multiple docx (Microsoft Word) files into one, write images to a docx file, and make string replacements in the header, footer, or main document content.
4
+ This gem works for docx files generated by Microsoft Word as well as Google Docs.
5
+
6
+ ## Installation
7
+
8
+ Add this line to your application's Gemfile:
9
+
10
+ ```ruby
11
+ gem 'omnidocx'
12
+ ```
13
+
14
+ And then execute:
15
+
16
+ $ bundle
17
+
18
+ Or install it yourself as:
19
+
20
+ $ gem install omnidocx
21
+
22
+ ## To Merge Documents
23
+
24
+ If you plan to have a header and footer applied to all the pages of the final document, then pass the document with header and footer as the first document in the array. Currently multiple different headers and footers are not supported.
25
+
26
+ When passing the documents array, make sure all the documents come from the same source, i.e., either Microsoft Word or Google Docs. Passing a mix of documents created in Microsoft Word and Google Docs might raise namespace errors.
27
+
28
+
29
+ ```ruby
30
+ require 'omnidocx'
31
+
32
+ # To merge multiple docx files into one, you can use the following
33
+ # documents_to_merge is an array of the documents (file paths) to be merged; page_break is a boolean that controls whether page breaks are inserted between documents
34
+ Omnidocx::Docx.merge_documents(documents_to_merge, output_document_path, page_break)
35
+
36
+ # e.g. to merge two documents, pass their paths in an array; if you need a page break between documents, pass the page_break flag as true
37
+ Omnidocx::Docx.merge_documents(['tmp/doc1.docx', 'tmp/doc2.docx'], 'tmp/output_doc.docx', true)
38
+ ```
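The library source returns an error string when fewer than two documents are passed. A caller may want to check the arguments up front; the following is only a sketch of such a check. `merge_input_problems` is a hypothetical helper, not part of omnidocx, and the `.docx` extension test is an assumption about what callers want to enforce.

```ruby
# Hypothetical helper (not part of omnidocx): collects problems with the
# arguments before they are handed to Omnidocx::Docx.merge_documents.
def merge_input_problems(paths)
  problems = []
  # The gem itself refuses to merge fewer than two documents.
  problems << "need at least two documents, got #{paths.length}" if paths.length < 2
  paths.each do |path|
    problems << "not a .docx file: #{path}" unless File.extname(path).downcase == ".docx"
    problems << "file not found: #{path}" unless File.file?(path)
  end
  problems
end

# Prints one line per problem found; prints nothing when the input looks fine.
puts merge_input_problems(["tmp/doc1.docx"])
```

An empty result means the paths can be passed on to `merge_documents`. The check says nothing about whether the documents come from the same source (Microsoft Word or Google Docs).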
39
+
40
+ ## To Write Images to a Document
41
+ ```ruby
42
+ require 'omnidocx'
43
+
44
+ # To write images to a document, you can use the following
45
+ # images_to_write is an array of hashes, where each hash stores information about one image
46
+ Omnidocx::Docx.write_images_to_doc(images_to_write, input_document_path, output_document_path)
47
+
48
+ # Below is an example of the images_to_write array that you can pass in for images to be written to the doc
49
+ # image path, height and width are mandatory
50
+
51
+ images_to_write = [ {
52
+ :path => "tmp/image1.jpg", #URL || local path
53
+ :height => 500,
54
+ :width => 500,
55
+ :hdpi => 115, #optional
56
+ :vdpi => 115 #optional
57
+ }, {
58
+ :path => "https://xyz.com/abc.jpeg", #URL || local path
59
+ :height => 800,
60
+ :width => 500,
61
+ :hdpi => 115, #optional
62
+ :vdpi => 115 #optional
63
+ }
64
+ ]
65
+
66
+ ```
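The library source converts the pixel `height` and `width` to EMUs (English Metric Units, 914400 per inch) using the dpi values, falling back to 115 horizontal and 117 vertical dpi when none are given. The sketch below illustrates that arithmetic under those constants. `image_extent_emus` is a hypothetical helper, and its use of float division and rounding is a choice made here, not necessarily what the gem does.

```ruby
EMUS_PER_INCH = 914_400 # same value as EMUSPERINCH in the gem source
DEFAULT_HDPI  = 115     # HORIZONTAL_DPI in the gem source
DEFAULT_VDPI  = 117     # VERTICAL_DPI in the gem source

# Hypothetical helper: returns [width_emus, height_emus] for one image hash.
def image_extent_emus(img)
  [:path, :height, :width].each do |key|
    raise ArgumentError, "missing mandatory key #{key.inspect}" unless img.key?(key)
  end
  hdpi = (img[:hdpi] || DEFAULT_HDPI).to_f
  vdpi = (img[:vdpi] || DEFAULT_VDPI).to_f
  # Float division keeps fractional inches; integer division would truncate them.
  width_emus  = (img[:width] / hdpi * EMUS_PER_INCH).round
  height_emus = (img[:height] / vdpi * EMUS_PER_INCH).round
  [width_emus, height_emus]
end

# 500 px at 100 dpi is 5 inches, i.e. 5 * 914400 EMUs.
p image_extent_emus(path: "tmp/image1.jpg", height: 500, width: 500, hdpi: 100, vdpi: 100)
# → [4572000, 4572000]
```

A higher dpi value produces a physically smaller image in the document for the same pixel dimensions.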
67
+
68
+ ## For String Replacements
69
+
70
+ Three methods are available for string replacements:
71
+
72
+ ```ruby
73
+ require 'omnidocx'
74
+
75
+ # replacement_hash maps keys present in the document to the values that should replace them
76
+
77
+ # For document content, you can use the following
78
+ Omnidocx::Docx.replace_doc_content(replacement_hash, input_document_path, output_document_path)
79
+
80
+ # For header content, you can use the following
81
+ Omnidocx::Docx.replace_header_content(replacement_hash, input_document_path, output_document_path)
82
+
83
+ # For footer content, you can use the following
84
+ Omnidocx::Docx.replace_footer_content(replacement_hash, input_document_path, output_document_path)
85
+
86
+ # Below is an example of how replacement_hash can be constructed
87
+ replacement_hash = { "first_name" => "John", "last_name" => "Doe" }
88
+
89
+ ```
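In the library source, each replacement is a plain `gsub!` of the key with the value over the raw XML of the relevant part. The sketch below shows the same idea on an in-memory string, so the effect can be tried without a docx file. `apply_replacements` is a hypothetical helper, and the `to_s` calls are an added safeguard that the gem does not necessarily apply.

```ruby
# Hypothetical helper mirroring the gsub loop in the gem source.
def apply_replacements(xml, replacement_hash)
  # Work on a copy so the caller's string is left untouched.
  result = xml.dup.force_encoding("UTF-8")
  replacement_hash.each do |key, value|
    # to_s guards against non-string keys or values such as integers.
    result.gsub!(key.to_s, value.to_s)
  end
  result
end

xml = "<w:t>Dear first_name last_name</w:t>"
puts apply_replacements(xml, "first_name" => "John", "last_name" => "Doe")
# → <w:t>Dear John Doe</w:t>
```

Because the substitution runs over raw XML, a key that is a substring of another key or of markup will also be replaced, and values containing characters such as `<` or `&` are not escaped. Distinctive placeholder keys avoid most surprises.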
90
+
91
+ Test specs will be added soon.
92
+
93
+ ## Development
94
+
95
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
96
+
97
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
98
+
99
+ ## Contributing
100
+
101
+ Bug reports and pull requests are welcome on GitHub at https://github.com/parthnagori/omnidocx. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
102
+
103
+
104
+ ## License
105
+
106
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
107
+
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "omnidocx"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,539 @@
1
+ require "omnidocx/version"
2
+ require 'nokogiri'
3
+ require 'zip'
4
+ require 'tempfile'
5
+ require 'mime/types'
6
+ require 'open-uri'
7
+
8
+
9
+ module Omnidocx
10
+ class Docx
11
+ DOCUMENT_FILE_PATH = 'word/document.xml'
12
+ RELATIONSHIP_FILE_PATH = 'word/_rels/document.xml.rels'
13
+ CONTENT_TYPES_FILE = '[Content_Types].xml'
14
+ HEADER_RELS_FILE_PATH = 'word/_rels/header1.xml.rels'
15
+ FOOTER_RELS_FILE_PATH = 'word/_rels/footer1.xml.rels'
16
+ STYLES_FILE_PATH = "word/styles.xml"
17
+ HEADER_FILE_PATH = "word/header1.xml"
18
+ FOOTER_FILE_PATH = "word/footer1.xml"
19
+
20
+ MEDIA_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
21
+
22
+ EMUSPERINCH = 914400
23
+ EMUSPERCM = 360000
24
+ HORIZONTAL_DPI = 115
25
+ VERTICAL_DPI = 117
26
+
27
+ NAMESPACES = {
28
+ "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
29
+ "wp": "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing",
30
+ "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
31
+ "pic": "http://schemas.openxmlformats.org/drawingml/2006/picture",
32
+ "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
33
+ }
34
+
35
+ IMAGE_ELEMENT = '<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" w:rsidR="00F127EA" w:rsidRDefault="00F127EA" w:rsidP="00BF4C96"><w:pPr><w:jc w:val="center"/></w:pPr><w:r><w:rPr><w:noProof/><w:lang w:eastAsia="en-IN"/></w:rPr><w:drawing><wp:inline xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" distT="0" distB="0" distL="0" distR="0"><wp:extent cx="" cy=""/><wp:effectExtent l="0" t="0" r="2540" b="1905"/><wp:docPr id="" name=""/><wp:cNvGraphicFramePr><a:graphicFrameLocks xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" noChangeAspect="1"/></wp:cNvGraphicFramePr><a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"><a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture"><pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture"><pic:nvPicPr><pic:cNvPr id="" name=""/><pic:cNvPicPr/></pic:nvPicPr><pic:blipFill><a:blip xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" r:embed=""><a:extLst><a:ext uri="{28A0092B-C50C-407E-A947-70E740481C1C}"><a14:useLocalDpi xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main" val="0"/></a:ext></a:extLst></a:blip><a:stretch><a:fillRect/></a:stretch></pic:blipFill><pic:spPr><a:xfrm><a:off x="0" y="0"/><a:ext cx="" cy=""/></a:xfrm><a:prstGeom prst="rect"><a:avLst/></a:prstGeom></pic:spPr></pic:pic></a:graphicData></a:graphic></wp:inline></w:drawing></w:r></w:p>'
36
+
37
+
38
+ def self.write_images_to_doc(images_to_write=[], doc_path, final_path)
39
+
40
+ temp_file = Tempfile.new('docxedit-')
41
+
42
+ #every docx file is ultimately a zip file with the extension as docx
43
+ @document_zip = Zip::File.new(doc_path)
44
+ #reading the document content xml from the zip
45
+ @document_content = @document_zip.read(DOCUMENT_FILE_PATH)
46
+ @document_xml = Nokogiri::XML @document_content
47
+
48
+ #every docx file has one body tag which essentially contains all the content of the doc
49
+ @body = @document_xml.xpath("//w:body")
50
+
51
+ @rel_doc = ""
52
+ @cont_type_doc = ""
53
+
54
+ cnt = 20
55
+ media_hash = {}
56
+
57
+ #to maintain a list of all the content type info to be added upon adding media with different extensions
58
+ media_content_type_hash = {}
59
+
60
+
61
+ @document_zip.entries.each do |e|
62
+ if e.name == RELATIONSHIP_FILE_PATH
63
+ in_stream = e.get_input_stream.read
64
+ @rel_doc = Nokogiri::XML in_stream #Relationships XML
65
+ end
66
+ if e.name == CONTENT_TYPES_FILE
67
+ in_stream = e.get_input_stream.read
68
+ @cont_type_doc = Nokogiri::XML in_stream #Content types XML to be updated later on with the additional media type info
69
+ end
70
+ end
71
+
72
+ Zip::OutputStream.open(temp_file.path) do |zos|
73
+
74
+ @document_zip.entries.each do |e|
75
+ unless [DOCUMENT_FILE_PATH, RELATIONSHIP_FILE_PATH, CONTENT_TYPES_FILE].include?(e.name)
76
+ #writing the files not needed to be edited back to the new zip
77
+ zos.put_next_entry(e.name)
78
+ zos.print e.get_input_stream.read
79
+ end
80
+ end
81
+
82
+ images_to_write.each_with_index do |img, index|
83
+ data = ''
84
+
85
+ #checking if image path is a url or a local path
86
+ uri = URI.parse(img[:path])
87
+ if %w( http https ).include?(uri.scheme)
88
+ data = Kernel.open(img[:path]).read rescue nil
89
+ else
90
+ File.open(img[:path], 'rb') do |f|
91
+ data = f.read rescue nil
92
+ end
93
+ end
94
+
95
+ #if image path is readable
96
+ if data && !data.empty?
97
+ extension = File.extname(img[:path]).split(".").last
98
+
99
+ if !media_content_type_hash.keys.include?(extension.split(".").last)
100
+ #making an entry for a new media type
101
+ media_content_type_hash["#{extension}"] = MIME::Types.type_for(img[:path])[0].to_s
102
+ end
103
+
104
+ zos.put_next_entry("word/media/image#{cnt}.#{extension}")
105
+ zos.print data #storing the image in the new zip
106
+
107
+ new_rel_node = Nokogiri::XML::Node.new("Relationship", @rel_doc)
108
+ new_rel_node["Id"] = "rid#{cnt}"
109
+ new_rel_node["Type"] = MEDIA_TYPE
110
+ new_rel_node["Target"] = "media/image#{cnt}.#{extension}"
111
+ @rel_doc.at('Relationships').add_child(new_rel_node) #adding a new relationship node to the relationships xml
112
+
113
+ hdpi = img[:hdpi] || HORIZONTAL_DPI
114
+ vdpi = img[:vdpi] || VERTICAL_DPI
115
+
116
+ #calculating the width and height of the image in EMUs, the format accepted by docx files
117
+ widthEmus = (img[:width].to_f / hdpi.to_f * EMUSPERINCH).round
118
+ heightEmus = (img[:height].to_f / vdpi.to_f * EMUSPERINCH).round
119
+
120
+ #creating a new drawing element with info like rid, height, width,etc.
121
+ @image_element_xml = Nokogiri::XML IMAGE_ELEMENT
122
+ @image_element_xml.xpath("//w:drawing", NAMESPACES).each do |dr_node|
123
+ docPr = dr_node.xpath(".//wp:docPr", NAMESPACES).last
124
+ docPr["name"] = "image#{cnt}.#{extension}"
125
+ docPr["id"] = "#{cnt}"
126
+
127
+ extent = dr_node.xpath(".//wp:extent", NAMESPACES).last
128
+ extent["cx"] = widthEmus.to_s
129
+ extent["cy"] = heightEmus.to_s
130
+
131
+ ext = dr_node.xpath(".//a:ext", NAMESPACES).last
132
+ ext["cx"] = widthEmus.to_s
133
+ ext["cy"] = heightEmus.to_s
134
+
135
+ pic_cNvPr = dr_node.xpath(".//pic:cNvPr", NAMESPACES).last
136
+ pic_cNvPr["name"] = "image#{cnt}.#{extension}"
137
+ pic_cNvPr["id"] = "#{cnt}"
138
+
139
+ blip = dr_node.xpath(".//a:blip", NAMESPACES).last
140
+ blip.attributes["embed"].value = "rid#{cnt}"
141
+ end
142
+
143
+ #appending the drawing element to the document's body
144
+ @body.children.last.add_previous_sibling(@image_element_xml.xpath("//w:p").last.to_xml)
145
+
146
+ media_hash[cnt] = index
147
+ end
148
+ cnt+=1
149
+ end
150
+
151
+ #updating the content type info
152
+ media_content_type_hash.each do |ext, cont_type|
153
+ new_default_node = Nokogiri::XML::Node.new("Default", @cont_type_doc)
154
+ new_default_node["Extension"] = ext
155
+ new_default_node["ContentType"] = cont_type
156
+ @cont_type_doc.at("Types").add_child(new_default_node)
157
+ end
158
+
159
+ #writing the content types xml to the new zip
160
+ zos.put_next_entry CONTENT_TYPES_FILE
161
+ zos.print @cont_type_doc.to_xml
162
+
163
+ #writing the relationships xml to the new zip
164
+ zos.put_next_entry RELATIONSHIP_FILE_PATH
165
+ zos.print @rel_doc.to_xml
166
+
167
+ #writing the updated document content xml to the new zip
168
+ zos.put_next_entry DOCUMENT_FILE_PATH
169
+ zos.print @document_xml.to_xml
170
+ end
171
+
172
+ #moving the temporary docx file to the final_path specified by the user
173
+ FileUtils.mv(temp_file.path, final_path)
174
+ end
175
+
176
+ def self.merge_documents(documents_to_merge=[], final_path, page_break)
177
+ temp_file = Tempfile.new('docxedit-')
178
+
179
+ if documents_to_merge.count < 2
180
+ return "Pass at least two documents to be merged" #minimum two documents required to merge
181
+ end
182
+
183
+ #first document to which the others will be appended (header/footer will be picked from this document)
184
+ @main_document_zip = Zip::File.new(documents_to_merge.first)
185
+ @main_document_content = @main_document_zip.read(DOCUMENT_FILE_PATH)
186
+ @main_document_xml = Nokogiri::XML @main_document_content
187
+ @main_body = @main_document_xml.xpath("//w:body")
188
+ @rel_nodes = ""
189
+ @rel_doc = ""
190
+ @cont_type_doc = ""
191
+ @style_doc = ""
192
+ doc_cnt = 0
193
+ #cnt variable to construct relationship ids; starts at a high value (100) to avoid duplication
194
+ cnt = 100
195
+ tbl_cnt = 10
196
+ #hash to store information about the media files and their corresponding new names
197
+ media_hash = {}
198
+ #rid_hash to store relationship information
199
+ rid_hash = {}
200
+ #table hash to store information if any tables present
201
+ table_hash = {}
202
+ #head_foot_media hash to store if any media files present in header/footer
203
+ head_foot_media = {}
204
+ #a counter for docPr element in the main document body
205
+ docPr_id = 100
206
+
207
+ #array to store content type information about media extensions
208
+ default_extensions = []
209
+ #array to store override content type information
210
+ override_partnames = []
211
+
212
+ #array to store information about additional content types other than the ones present in the first(main) document
213
+ additional_cont_type_entries = []
214
+
215
+
216
+ @main_document_zip.entries.each do |e|
217
+ if e.name == RELATIONSHIP_FILE_PATH
218
+ @in_stream = e.get_input_stream.read
219
+ @rel_doc = Nokogiri::XML @in_stream #Relationship XML
220
+ @rel_nodes = @rel_doc.css "Relationship"
221
+ end
222
+ if e.name == CONTENT_TYPES_FILE
223
+ in_stream = e.get_input_stream.read
224
+ @cont_type_doc = Nokogiri::XML in_stream #Content types XML to be updated later on with the additional media type info
225
+ default_nodes = @cont_type_doc.css "Default"
226
+ override_nodes = @cont_type_doc.css "Override"
227
+ default_nodes.each do |node|
228
+ default_extensions << node["Extension"]
229
+ end
230
+ override_nodes.each do |node|
231
+ override_partnames << node["PartName"]
232
+ end
233
+ end
234
+ if e.name == STYLES_FILE_PATH
235
+ in_stream = e.get_input_stream.read
236
+ @style_doc = Nokogiri::XML in_stream #Styles XML to be updated later on with the additional tables info
237
+ end
238
+ end
239
+
240
+
241
+ #opening a new zip for the final document
242
+ Zip::OutputStream.open(temp_file.path) do |zos|
243
+ documents_to_merge.each do |doc_path|
244
+ media_hash["doc#{doc_cnt}"] = {}
245
+ rid_hash["doc#{doc_cnt}"] = {}
246
+ head_foot_media["doc#{doc_cnt}"] = []
247
+ table_hash["doc#{doc_cnt}"] = {}
248
+ zip_file = Zip::File.new(doc_path)
249
+
250
+ zip_file.entries.each do |e|
251
+ if [HEADER_RELS_FILE_PATH, FOOTER_RELS_FILE_PATH].include?(e.name)
252
+ hf_content = e.get_input_stream.read
253
+ hf_xml = Nokogiri::XML hf_content
254
+ hf_xml.css("Relationship").each do |rel_node|
255
+ #media file names in header & footer need not be changed as they will be picked from the first document only and not the subsequent documents, so no chance of duplication
256
+ head_foot_media["doc#{doc_cnt}"] << rel_node["Target"].gsub("media/","")
257
+ end
258
+ end
259
+ if e.name == CONTENT_TYPES_FILE
260
+ cont_types = e.get_input_stream.read
261
+ cont_type_xml = Nokogiri::XML cont_types
262
+ default_nodes = cont_type_xml.css "Default"
263
+ override_nodes = cont_type_xml.css "Override"
264
+
265
+ default_nodes.each do |node|
266
+ #checking if extension type already present in the content types xml extracted from the first document
267
+ if !default_extensions.include?(node["Extension"]) && !node.to_xml.empty?
268
+ additional_cont_type_entries << node
269
+ default_extensions << node["Extension"] #extra extension type to be added to the content types XML
270
+ end
271
+ end
272
+
273
+ override_nodes.each do |node|
274
+ #checking if override content type info already present in the content types xml extracted from the first document
275
+ if !override_partnames.include?(node["PartName"]) && !node.to_xml.empty?
276
+ additional_cont_type_entries << node
277
+ override_partnames << node["PartName"] #extra content type info to be added to the content types XML
278
+ end
279
+ end
280
+ end
281
+ end
282
+
283
+
284
+ zip_file.entries.each do |e|
285
+ unless e.name == DOCUMENT_FILE_PATH || [RELATIONSHIP_FILE_PATH, CONTENT_TYPES_FILE, STYLES_FILE_PATH].include?(e.name)
286
+ if e.name.include?("word/media/image")
287
+ if !head_foot_media["doc#{doc_cnt}"].include?(e.name.gsub("word/media/",""))
288
+ #renaming media files with a higher counter to avoid duplication in case multiple documents have images present
289
+ e_name = e.name.gsub(/image[0-9]*./,"image#{cnt}.")
290
+ #writing the media file back to the new zip with the new name
291
+ zos.put_next_entry(e_name)
292
+ zos.print e.get_input_stream.read
293
+ #storing the old media file name to new media file name to mapping in the media hash
294
+ media_hash["doc#{doc_cnt}"][e.name.gsub("word/media/","")] = cnt
295
+ cnt+=1
296
+ else
297
+ #writing the media files present in the header and footer as their names are not needed to be changed
298
+ zos.put_next_entry(e.name)
299
+ zos.print e.get_input_stream.read
300
+ end
301
+ else
302
+ #writing the files not needed to be edited back to the new zip (only from the first document, so as to avoid duplication)
303
+ if doc_cnt == 0
304
+ zos.put_next_entry(e.name)
305
+ zos.print e.get_input_stream.read
306
+ end
307
+ end
308
+ end
309
+ end
310
+
311
+ if doc_cnt == 0
312
+ doc_content = @main_document_xml #first document's content XML
313
+ else
314
+ document_content = zip_file.read(DOCUMENT_FILE_PATH)
315
+ doc_content = Nokogiri::XML document_content #subsequent documents' content XML
316
+ end
317
+
318
+ #updating the style ids in the table elements present in the document content XML
319
+ doc_content.xpath("//w:tbl").each do |tbl_node|
320
+ tblStyle = tbl_node.xpath('.//w:tblStyle').last
321
+
322
+ table_hash["doc#{doc_cnt}"]["#{tblStyle.attributes['val'].value}"] = tbl_cnt
323
+ tblStyle.attributes['val'].value = tblStyle.attributes['val'].value.gsub(/[0-9]+/,"#{tbl_cnt}")
324
+ tbl_cnt+=1
325
+ end
326
+
327
+ #updating the relationship ids with the new media file names in the relationships XML
328
+ if doc_cnt == 0
329
+ zip_file.entries.each do |e|
330
+ if e.name == RELATIONSHIP_FILE_PATH
331
+ @rel_nodes.each do |node|
332
+ if node.values.to_s.include?("image")
333
+ i = media_hash["doc#{doc_cnt}"]["#{node['Target']}".gsub("media/","")]
334
+ target_val = node["Target"].gsub(/image[0-9]*./,"image#{i}.")
335
+ node["Target"] = target_val
336
+ rid_hash["doc#{doc_cnt}"]["#{node['Id']}"] = "#{i}"
337
+ node.attributes["Id"].value = node.attributes["Id"].value.gsub(/[0-9]+/,"#{i}")
338
+ end
339
+ end
340
+ end
341
+ #adding the table style information to the styles xml, if any tables present in the document being merged
342
+ if e.name == STYLES_FILE_PATH
343
+ table_nodes = @style_doc.xpath('//w:style').select{|n| n.attributes["type"].value == "table"}
344
+ table_nodes.each do |table_node|
345
+ tab_val = table_hash["doc#{doc_cnt}"]["#{table_node.attributes['styleId'].value}"]
346
+ table_node.attributes['styleId'].value = table_node.attributes['styleId'].value.gsub(/[0-9]+/,"#{tab_val}")
347
+ end
348
+ end
349
+ end
350
+ else
351
+ zip_file.entries.each do |e|
352
+ if e.name == RELATIONSHIP_FILE_PATH
353
+ input_stream = e.get_input_stream.read
354
+ rel_xml = Nokogiri::XML input_stream
355
+ rel_xml_nodes = rel_xml.css "Relationship"
356
+ rel_xml_nodes.each do |node|
357
+ if node.values.to_s.include?("image")
358
+ i = media_hash["doc#{doc_cnt}"]["#{node['Target']}".gsub("media/","")]
359
+ target_val = node["Target"].gsub(/image[0-9]*./,"image#{i}.")
360
+ rid_hash["doc#{doc_cnt}"]["#{node['Id']}"] = "#{i}"
361
+
362
+ new_rel_node = Nokogiri::XML::Node.new("Relationship", @rel_doc)
363
+ new_rel_node["Id"] = node.attributes["Id"].value.gsub(/[0-9]+/,"#{i}")
364
+ new_rel_node["Type"] = node["Type"]
365
+ new_rel_node["Target"] = target_val
366
+
367
+ #adding the extra relationship nodes for the media files from the subsequent documents (apart from first) to the relationship XML
368
+ @rel_doc.at('Relationships').add_child(new_rel_node)
369
+ end
370
+ end
371
+ end
372
+
373
+ if e.name == STYLES_FILE_PATH
374
+ input_stream = e.get_input_stream.read
375
+ style_xml = Nokogiri::XML input_stream
376
+ table_nodes = style_xml.xpath("//w:style").select{|n| n.attributes["type"].value == "table" && n.attributes["styleId"].value != "TableNormal"}
377
+ table_nodes.each do |table_node|
378
+ tab_val = table_hash["doc#{doc_cnt}"]["#{table_node.attributes['styleId'].value}"]
379
+ table_node.attributes['styleId'].value = table_node.attributes['styleId'].value.gsub(/[0-9]+/,"#{tab_val}")
380
+ #adding extra table style nodes to the styles xml, if any tables present in the document being merged
381
+ @style_doc.xpath("//w:styles").children.last.add_next_sibling(table_node.to_xml)
382
+ end
383
+ end
384
+ end
385
+ end
386
+
387
+ #updating the id and rid values for every drawing element in the document XML with the new counters
388
+ doc_content.xpath("//w:drawing").each do |dr_node|
389
+ blip = dr_node.xpath(".//a:blip", NAMESPACES).last
390
+ i = rid_hash["doc#{doc_cnt}"][blip.attributes["embed"].value]
391
+ blip.attributes["embed"].value = blip.attributes["embed"].value.gsub(/[0-9]+/,i)
392
+ docPr = dr_node.xpath(".//wp:docPr").last
393
+ docPr["id"] = docPr_id.to_s
394
+ docPr_id+=1
395
+ end
396
+
397
+
398
+ if doc_cnt > 0
399
+ w_p_nodes = doc_content.xpath("//w:p")
400
+ #pulling out the <w:p> elements from the document body to be appended to the main document's body
401
+ body_nodes = doc_content.xpath('//w:body').children[0..doc_content.xpath('//w:body').children.count-2]
402
+
403
+ #adding a page break between documents being merged
404
+ if doc_cnt > 1 && page_break
405
+ @main_body.children.last.add_previous_sibling('<w:p><w:r><w:br w:type="page"/></w:r></w:p>')
406
+ end
407
+ #appending the body_nodes to main document's body
408
+ @main_body.children.last.add_previous_sibling(body_nodes.to_xml)
409
+ end
410
+
411
+ doc_cnt+=1
412
+ end
413
+
414
+ #writing the updated styles XML to the new zip
415
+ zos.put_next_entry(STYLES_FILE_PATH)
416
+ zos.print @style_doc.to_xml
417
+
418
+ #writing the updated relationships XML to the new zip
419
+ zos.put_next_entry(RELATIONSHIP_FILE_PATH)
420
+ zos.print @rel_doc.to_xml
421
+
422
+ zos.put_next_entry(CONTENT_TYPES_FILE)
423
+ additional_cont_type_entries.each do |node|
424
+ #adding additional content type nodes to the content type XML
425
+ @cont_type_doc.at("Types").add_child(node)
426
+ end
427
+ #writing the updated content types XML to the new zip
428
+ zos.print @cont_type_doc.to_xml
429
+
430
+ #writing the updated document content XML to the new zip
431
+ zos.put_next_entry(DOCUMENT_FILE_PATH)
432
+ zos.print @main_document_xml.to_xml
433
+ end
434
+
435
+ #moving the temporary docx file to the final_path specified by the user
436
+ FileUtils.mv(temp_file.path, final_path)
437
+ end
438
+
439
+ def self.replace_doc_content(replacement_hash={}, template_path, final_path)
440
+ @template_zip = Zip::File.new(template_path)
441
+ @template_content = @template_zip.read(DOCUMENT_FILE_PATH)
442
+
443
+ #replacing the keys with values in the document content xml
444
+ replacement_hash.each do |key,value|
445
+ @template_content.force_encoding("UTF-8").gsub!(key,value)
446
+ end
447
+
448
+ temp_file = Tempfile.new('docxedit-')
449
+
450
+ Zip::OutputStream.open(temp_file.path) do |zos|
451
+
452
+ @template_zip.entries.each do |e|
453
+ unless e.name == DOCUMENT_FILE_PATH
454
+ #writing the files not needed to be edited back to the new zip
455
+ zos.put_next_entry(e.name)
456
+ zos.print e.get_input_stream.read
457
+ end
458
+ end
459
+
460
+ #writing the updated document content xml to the new zip
461
+ zos.put_next_entry DOCUMENT_FILE_PATH
462
+ zos.print @template_content
463
+ end
464
+
465
+ #moving the temporary docx file to the final_path specified by the user
466
+ FileUtils.mv(temp_file.path, final_path)
467
+ end
468
+
469
+ def self.replace_header_content(replacement_hash={}, template_path, final_path)
470
+ @template_zip = Zip::File.new(template_path)
471
+
472
+ @header_content = ''
473
+ @template_zip.entries.each do |e|
474
+ if e.name == HEADER_FILE_PATH
475
+ @header_content = e.get_input_stream.read
476
+ end
477
+ end
478
+
479
+ replacement_hash.each do |key,value|
480
+ @header_content.force_encoding("UTF-8").gsub!(key,value)
481
+ end
482
+
483
+ temp_file = Tempfile.new('docxedit-')
484
+
485
+ Zip::OutputStream.open(temp_file.path) do |zos|
486
+
487
+ @template_zip.entries.each do |e|
488
+ unless e.name == HEADER_FILE_PATH
489
+ #writing the files not needed to be edited back to the new zip
490
+ zos.put_next_entry(e.name)
491
+ zos.print e.get_input_stream.read
492
+ end
493
+ end
494
+
495
+ #writing the updated header content xml to the new zip
496
+ zos.put_next_entry HEADER_FILE_PATH
497
+ zos.print @header_content
498
+ end
499
+
500
+ #moving the temporary docx file to the final_path specified by the user
501
+ FileUtils.mv(temp_file.path, final_path)
502
+ end
503
+
504
+ def self.replace_footer_content(replacement_hash={}, template_path, final_path)
505
+ @template_zip = Zip::File.new(template_path)
506
+
507
+ @footer_content = ''
508
+ @template_zip.entries.each do |e|
509
+ if e.name == FOOTER_FILE_PATH
510
+ @footer_content = e.get_input_stream.read
511
+ end
512
+ end
513
+
514
+ replacement_hash.each do |key,value|
515
+ @footer_content.force_encoding("UTF-8").gsub!(key,value)
516
+ end
517
+
518
+ temp_file = Tempfile.new('docxedit-')
519
+
520
+ Zip::OutputStream.open(temp_file.path) do |zos|
521
+
522
+ @template_zip.entries.each do |e|
523
+ unless e.name == FOOTER_FILE_PATH
524
+ #writing the files not needed to be edited back to the new zip
525
+ zos.put_next_entry(e.name)
526
+ zos.print e.get_input_stream.read
527
+ end
528
+ end
529
+
530
+ #writing the updated footer content xml to the new zip
531
+ zos.put_next_entry FOOTER_FILE_PATH
532
+ zos.print @footer_content
533
+ end
534
+
535
+ #moving the temporary docx file to the final_path specified by the user
536
+ FileUtils.mv(temp_file.path, final_path)
537
+ end
538
+ end
539
+ end
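The three `replace_*_content` methods above share one core step: forcing the part's XML to UTF-8 and running `gsub!` for each key/value pair in `replacement_hash`. A minimal sketch of that step on a plain string follows. It assumes a hypothetical helper name (`apply_replacements`) and hypothetical `{{...}}` placeholder markers, neither of which appears in the gem, and it leaves out the zip handling entirely.

```ruby
# Hypothetical helper, not part of omnidocx: it isolates the
# force_encoding + gsub! loop used by the replace_* methods so the
# behaviour can be tried on a plain string.
def apply_replacements(xml, replacement_hash = {})
  # Work on a copy so the caller's string is left untouched.
  content = xml.dup.force_encoding("UTF-8")
  replacement_hash.each do |key, value|
    # A String key is matched literally, not as a regular expression.
    content.gsub!(key, value)
  end
  content
end

# The {{...}} markers are an assumed placeholder convention.
sample_xml = "<w:t>Dear {{name}}, your order {{id}} has shipped.</w:t>"
result = apply_replacements(sample_xml, "{{name}}" => "Ada", "{{id}}" => "1234")
puts result
```

Because the replacement runs on raw XML, a placeholder that Word has split across several runs would probably not match, and replacement values are inserted without XML escaping.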
@@ -0,0 +1,3 @@
1
+ module Omnidocx
2
+ VERSION = "0.1.0"
3
+ end
@@ -0,0 +1,40 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'omnidocx/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "omnidocx"
8
+ spec.version = Omnidocx::VERSION
9
+ spec.authors = ["Parth Nagori"]
10
+ spec.email = ["nagori.parth@gmail.com"]
11
+
12
+ spec.summary = %q{A gem to merge docx files, write images to docx and other utilities.}
13
+ spec.description = %q{A gem for merging multiple docx (Microsoft Word) files into one, writing images to a docx file, and making string replacements in the header, footer, or main document content.}
14
+ spec.homepage = "https://github.com/parthnagori/omnidocx"
15
+ spec.license = "MIT"
16
+
17
+ # Prevent pushing this gem to RubyGems.org. To allow pushes either set the 'allowed_push_host'
18
+ # to allow pushing to a single host or delete this section to allow pushing to any host.
19
+ # if spec.respond_to?(:metadata)
20
+ # spec.metadata['allowed_push_host'] = "TODO: Set to 'http://mygemserver.com'"
21
+ # else
22
+ # raise "RubyGems 2.0 or newer is required to protect against " \
23
+ # "public gem pushes."
24
+ # end
25
+
26
+ spec.files = `git ls-files -z`.split("\x0").reject do |f|
27
+ f.match(%r{^(test|spec|features)/})
28
+ end
29
+ spec.bindir = "exe"
30
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
31
+ spec.require_paths = ["lib"]
32
+
33
+ spec.add_dependency 'nokogiri', '~> 1.6'
34
+ spec.add_runtime_dependency 'rubyzip', '~> 1.1', '>= 1.1.6'
35
+
36
+ spec.add_development_dependency "bundler", "~> 1.13"
37
+ spec.add_development_dependency "rake", "~> 10.0"
38
+ spec.add_development_dependency "rspec", "~> 3.0"
39
+
40
+ end
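The `rubyzip` dependency in the gemspec above combines two constraints, `'~> 1.1'` and `'>= 1.1.6'`. As a small sketch of what that pair accepts, the same requirement can be evaluated with RubyGems' own `Gem::Requirement` class. The version numbers tested here are arbitrary examples, not versions the gem is known to have been tested against.

```ruby
require "rubygems"

# Same constraint pair as the gemspec's rubyzip dependency.
rubyzip_requirement = Gem::Requirement.new("~> 1.1", ">= 1.1.6")

# Arbitrary sample versions, chosen only to probe the boundaries.
sample_versions = %w[1.1.5 1.1.6 1.3.0 2.0.0]

results = sample_versions.map do |v|
  [v, rubyzip_requirement.satisfied_by?(Gem::Version.new(v))]
end.to_h

results.each { |version, ok| puts "#{version}: #{ok}" }
```

`~> 1.1` allows any 1.x release from 1.1 upward but excludes 2.0, and `>= 1.1.6` raises the floor within that range, so only 1.1.6 and 1.3.0 in the sample list satisfy both.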
metadata ADDED
@@ -0,0 +1,136 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: omnidocx
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Parth Nagori
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2017-02-07 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: nokogiri
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.6'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.6'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rubyzip
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '1.1'
34
+ - - ">="
35
+ - !ruby/object:Gem::Version
36
+ version: 1.1.6
37
+ type: :runtime
38
+ prerelease: false
39
+ version_requirements: !ruby/object:Gem::Requirement
40
+ requirements:
41
+ - - "~>"
42
+ - !ruby/object:Gem::Version
43
+ version: '1.1'
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: 1.1.6
47
+ - !ruby/object:Gem::Dependency
48
+ name: bundler
49
+ requirement: !ruby/object:Gem::Requirement
50
+ requirements:
51
+ - - "~>"
52
+ - !ruby/object:Gem::Version
53
+ version: '1.13'
54
+ type: :development
55
+ prerelease: false
56
+ version_requirements: !ruby/object:Gem::Requirement
57
+ requirements:
58
+ - - "~>"
59
+ - !ruby/object:Gem::Version
60
+ version: '1.13'
61
+ - !ruby/object:Gem::Dependency
62
+ name: rake
63
+ requirement: !ruby/object:Gem::Requirement
64
+ requirements:
65
+ - - "~>"
66
+ - !ruby/object:Gem::Version
67
+ version: '10.0'
68
+ type: :development
69
+ prerelease: false
70
+ version_requirements: !ruby/object:Gem::Requirement
71
+ requirements:
72
+ - - "~>"
73
+ - !ruby/object:Gem::Version
74
+ version: '10.0'
75
+ - !ruby/object:Gem::Dependency
76
+ name: rspec
77
+ requirement: !ruby/object:Gem::Requirement
78
+ requirements:
79
+ - - "~>"
80
+ - !ruby/object:Gem::Version
81
+ version: '3.0'
82
+ type: :development
83
+ prerelease: false
84
+ version_requirements: !ruby/object:Gem::Requirement
85
+ requirements:
86
+ - - "~>"
87
+ - !ruby/object:Gem::Version
88
+ version: '3.0'
89
+ description: A gem for merging multiple docx (Microsoft Word) files into
90
+ one, writing images to a docx file, and making string replacements in the header,
91
+ footer, or main document content.
92
+ email:
93
+ - nagori.parth@gmail.com
94
+ executables: []
95
+ extensions: []
96
+ extra_rdoc_files: []
97
+ files:
98
+ - ".DS_Store"
99
+ - ".gitignore"
100
+ - ".rspec"
101
+ - ".travis.yml"
102
+ - CODE_OF_CONDUCT.md
103
+ - Gemfile
104
+ - LICENSE.txt
105
+ - README.md
106
+ - Rakefile
107
+ - bin/console
108
+ - bin/setup
109
+ - lib/omnidocx.rb
110
+ - lib/omnidocx/version.rb
111
+ - omnidocx.gemspec
112
+ homepage: https://github.com/parthnagori/omnidocx
113
+ licenses:
114
+ - MIT
115
+ metadata: {}
116
+ post_install_message:
117
+ rdoc_options: []
118
+ require_paths:
119
+ - lib
120
+ required_ruby_version: !ruby/object:Gem::Requirement
121
+ requirements:
122
+ - - ">="
123
+ - !ruby/object:Gem::Version
124
+ version: '0'
125
+ required_rubygems_version: !ruby/object:Gem::Requirement
126
+ requirements:
127
+ - - ">="
128
+ - !ruby/object:Gem::Version
129
+ version: '0'
130
+ requirements: []
131
+ rubyforge_project:
132
+ rubygems_version: 2.4.8
133
+ signing_key:
134
+ specification_version: 4
135
+ summary: A gem to merge docx files, write images to docx and other utilities.
136
+ test_files: []